Machine Learning and Data Science: Introduction


By Dr. Donald Kinghorn, Research and Development, Puget Systems

This is the first post in a series that I have wanted to do for some time. Machine Learning and Data Science, fascinating! This is one of the most interesting areas of endeavor today. It has truly blossomed over the last few years. There are numerous high quality tools and frameworks for handling and working with data sets, a seemingly endless number of application domains, and lots and lots of data. Data Science has become “a thing”. Many Universities are offering graduate programs in “Data Science”. It has existed for ages as part of Statistics, Informatics, Business Intelligence/Analytics, Applied Mathematics, etc., but it is now taking on a multidisciplinary life of its own.

In this series I’ll be exploring the algorithms and tools of Machine Learning and Data Science. It will be tutorials, guides, how-to, reviews, “real world” application, and whatever I feel like writing about. It will be documenting my own journey. Even though I am familiar with the mathematics many of the algorithms and tools are new to me. This is partly a revival of the fascination I had with neural networks as a graduate student in the mid 1990’s but never had the time to pursue. It’s about time I did it!

What is Machine Learning

I wrote a blog post in 2016 titled What is Machine Learning. I looked at it again and decided I like the definition I came up with so here it is.

Machine Learning — Machine Learning (ML) is a multidisciplinary field focused on implementing computer algorithms capable of drawing predictive insight from static or dynamic data sources using analytic or probabilistic models and using refinement via training and feedback. It make use of pattern recognition, artificial intelligence learning methods, and statistical data modeling. Learning is achieved in two primary ways; Supervised learning, where the desired outcome is known and an annotated data set or measured values are used to train/fit a model to accurately predict values or labels outside of the training set. This is basically “regression” for values and “classification” for labels. Unsupervised learning uses “unlabeled” data and seeks to find inherent partitions or clusters of characteristics present in the data. ML draws from the fields of computer science, statistics, probability, applied mathematics, optimization, information theory, graph and network theory, biology, and neuroscience.

Read the source article at Puget Systems.