Chapter 1

Introduction to Machine Learning


More Data, More Questions, Better Answers

Machine learning algorithms find natural patterns in data that generate insight and help you make better decisions and predictions. They are used every day to make critical decisions in medical diagnosis, stock trading, energy load forecasting, and more. Media sites rely on machine learning to sift through millions of options to give you song or movie recommendations. Retailers use it to gain insight into their customers’ purchasing behavior.

Real-World Applications:

  • Automotive and manufacturing, for predictive maintenance
  • Computational finance, for credit scoring and algorithmic trading
  • Image processing and computer vision, for face recognition and object detection
  • Computational biology, for tumor detection, drug discovery, and DNA sequencing
  • Energy production, for price and load forecasting
  • Natural language processing


How Machine Learning Works

Machine learning uses two types of techniques: supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data. Supervised learning relies on classification and regression techniques to build predictive models.

Classification techniques predict discrete responses—for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Classification models classify input data into categories. Typical applications include medical imaging, speech recognition, and credit scoring.
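To make the idea of predicting a discrete response concrete, here is a minimal sketch of a nearest-neighbor classifier for the spam example. The two features (number of links and number of exclamation marks per email), the training data, and the `classify` function are all hypothetical and chosen only for illustration; a real spam filter would use far richer features and a properly validated model.

```python
import math

# Hypothetical training data (an assumption for illustration):
# (number of links, number of exclamation marks) -> known label.
training = [
    ((0, 0), "genuine"),
    ((1, 0), "genuine"),
    ((1, 1), "genuine"),
    ((6, 4), "spam"),
    ((8, 5), "spam"),
    ((7, 7), "spam"),
]

def classify(features, examples):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(features, example[0])
    )
    return nearest_label

# Each new email is assigned to one of the two discrete categories.
print(classify((0, 1), training))
print(classify((9, 6), training))
```

The model never outputs anything other than one of the labels it saw during training, which is what distinguishes classification from regression.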

Regression techniques predict continuous responses—for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
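As a rough illustration of predicting a continuous response, the sketch below fits a straight line to a handful of made-up temperature and power-demand readings using ordinary least squares. The data values, units, and the names `fit_line` and `predict` are assumptions for this example only; real load forecasting typically involves many more variables and more flexible models.

```python
# Hypothetical observations (assumed for illustration):
# outdoor temperature in degrees C and power demand in MW.
temperatures = [20.0, 25.0, 30.0, 35.0]
demands = [50.0, 61.0, 69.0, 80.0]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(temperatures, demands)

def predict(temperature):
    """Predict demand for a temperature using the fitted line."""
    return slope * temperature + intercept

print(f"demand = {slope:.2f} * temperature + {intercept:.2f}")
print(f"predicted demand at 40 degrees C: {predict(40.0):.1f} MW")
```

Unlike the classifier, this model can return any real number, including values it never saw in the training data.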

Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from data sets consisting of input data without labeled responses.

Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data.

Applications for clustering include gene sequence analysis, market research, and object recognition.
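The grouping idea behind clustering can be sketched with a small k-means loop. This is a simplified illustration under stated assumptions: the points are invented, the starting centroids are picked by hand, and the loop runs for a fixed number of iterations rather than checking for convergence. The function name `k_means` is hypothetical and not taken from any library.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: no responses are provided, only inputs.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Note that the algorithm is never told which group a point belongs to; the groupings emerge from the distances between the inputs alone.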

How Do You Decide Which Algorithm to Use?

Choosing the right algorithm can seem overwhelming: there are dozens of supervised and unsupervised machine learning algorithms, and each takes a different approach to learning. There is no single best method and no one-size-fits-all solution. Finding the right algorithm is partly a matter of trial and error; even highly experienced data scientists can’t tell whether an algorithm will work without trying it out. But algorithm selection also depends on the size and type of data you’re working with, the insights you want to get from the data, and how those insights will be used.

Classification (supervised learning):

  • Support Vector Machines
  • Discriminant Analysis
  • Naive Bayes
  • Nearest Neighbor

Regression (supervised learning):

  • Linear Regression, GLM
  • SVR, GPR
  • Ensemble Methods
  • Decision Trees
  • Neural Networks

Clustering (unsupervised learning):

  • K-Means, K-Medoids
  • Fuzzy C-Means
  • Hierarchical
  • Gaussian Mixture
  • Neural Networks
  • Hidden Markov Model
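The trial-and-error side of algorithm selection can be sketched as follows: try each candidate on data it has not seen and compare the scores. Everything here is an assumption made for illustration. The one-dimensional data set is invented, the two candidates (a nearest-neighbor classifier and a simple nearest-centroid classifier, the latter not among the algorithms listed above) were picked because they are short to write, and leave-one-out accuracy is just one of several reasonable ways to score them.

```python
import math

# Hypothetical labeled data: class "a" forms two separate clumps,
# with class "b" sitting between them.
data = [
    ((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
    ((9.0,), "a"), ((10.0,), "a"), ((11.0,), "a"),
    ((4.0,), "b"), ((5.0,), "b"), ((6.5,), "b"),
]

def nearest_neighbor(train, x):
    """Predict the label of the single closest training example."""
    return min(train, key=lambda example: math.dist(x, example[0]))[1]

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for label, rows in by_label.items()
    }
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def leave_one_out_accuracy(classifier, examples):
    """Hold out each example in turn, train on the rest, and score it."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        train = examples[:i] + examples[i + 1:]
        if classifier(train, features) == label:
            correct += 1
    return correct / len(examples)

scores = {
    candidate.__name__: leave_one_out_accuracy(candidate, data)
    for candidate in (nearest_neighbor, nearest_centroid)
}
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

On this particular toy data the nearest-neighbor candidate scores higher, because a single class mean cannot represent a class split into two clumps. A different data set could easily reverse the ranking, which is why trying the candidates out matters.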

When Should You Use Machine Learning?

Consider using machine learning when you have a complex task or problem involving a large amount of data and lots of variables, but no existing formula or equation. For example, machine learning is a good option if you need to handle situations like these:

Handwritten rules and equations are too complex—as in face recognition and speech recognition.

The nature of the data keeps changing, and the program needs to adapt—as in automated trading, energy demand forecasting, and predicting shopping trends.

The rules of a task are constantly changing—as in fraud detection from transaction records.