Automated machine learning (AutoML)

What Is AutoML?

Automated machine learning (AutoML) automates the manual steps required to go from a data set to a predictive model. AutoML also lowers the level of expertise needed to build accurate models, so you can use it whether you are a machine learning expert or have limited experience. By automating repetitive tasks, AutoML streamlines complex phases of the machine learning workflow, such as:

  • Data exploration and preprocessing: Identify variables with low predictive power and highly correlated variables that should be eliminated (see the sketch below the figure).
  • Feature extraction and selection: Extract features automatically and—among a large feature set—identify those with high predictive power.
  • Model selection and tuning: Automatically tune model hyperparameters and identify the best performing model.
  • Preparation for deployment: With code generation, you can transform high-level machine learning code into lower-level languages such as C/C++ for deployment on embedded devices with limited memory and low power consumption.

Streamlining machine learning workflows with AutoML. Steps where AutoML applies are shown in light grey.
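
As a minimal sketch of the data exploration and preprocessing step, the following MATLAB code flags near-constant predictors and highly correlated predictor pairs in a numeric table; the file name and both thresholds are assumptions chosen for illustration, not recommendations.

    % Sketch: flag low-information and redundant predictors (illustrative thresholds).
    T    = readtable('sensorData.csv');        % hypothetical data set
    Tnum = T(:, vartype('numeric'));           % keep numeric predictors only
    M    = Tnum{:,:};

    % Near-constant variables carry little predictive power.
    lowVar = var(M, 0, 1, 'omitnan') < 1e-6;
    disp('Low-variance predictors:')
    disp(Tnum.Properties.VariableNames(lowVar))

    % Highly correlated pairs are candidates for elimination.
    R      = corrcoef(M, 'Rows', 'pairwise');  % pairwise to tolerate missing values
    [i, j] = find(triu(abs(R) > 0.95, 1));     % upper triangle, excluding the diagonal
    names  = Tnum.Properties.VariableNames;
    pairs  = [names(i(:).'); names(j(:).')].'; % k-by-2 list of correlated name pairs
    disp('Highly correlated pairs (|r| > 0.95):')
    disp(pairs)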

You can use MATLAB with AutoML to support many workflows, such as feature extraction and selection, and model selection and tuning.

Feature Extraction and Selection

Feature extraction reduces the high dimensionality and variability present in the raw data and identifies variables that capture the salient and distinctive parts of the input signal. The process of feature engineering typically progresses from generating initial features from the raw data to selecting a small subset of the most suitable features. But feature engineering is an iterative process, and other methods such as feature transformation and dimensionality reduction can play a role.

Depending on the type of data, many approaches are available to generate features from raw data:

  • Wavelet scattering applies predefined wavelet and scaling filters to obtain low-variance features from signal and image data (see the sketch after this list).
  • Unsupervised learning approaches such as reconstruction ICA and sparse filtering learn efficient representations: reconstruction ICA uncovers independent components of the data, while sparse filtering optimizes for sparsity in the feature distribution.
  • Other functions for images and audio signals can be found in Computer Vision Toolbox™ and Audio Toolbox™.
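
The first two approaches can be sketched in MATLAB roughly as follows, assuming Wavelet Toolbox and Statistics and Machine Learning Toolbox are available; the signal, matrix sizes, and the number of learned features are placeholders.

    % Wavelet scattering features from a 1-D signal (Wavelet Toolbox).
    fs = 1000;                                 % hypothetical sampling rate in Hz
    x  = randn(4096, 1);                       % placeholder signal
    sn = waveletScattering('SignalLength', numel(x), 'SamplingFrequency', fs);
    scatFeatures = featureMatrix(sn, x);       % scattering-path-by-time-window features

    % Unsupervised feature learning on a numeric matrix (Statistics and ML Toolbox).
    X = randn(500, 20);                        % placeholder data, observations in rows
    q = 10;                                    % number of features to learn
    icaFeatures = transform(rica(X, q), X);    % reconstruction ICA representation
    sfFeatures  = transform(sparsefilt(X, q), X);  % sparse filtering representation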

Feature selection identifies a subset of features that retains most of the predictive power while yielding a smaller, simpler model. Various methods for automated feature selection are available, including ranking features by their predictive power and learning feature importance along with the model parameters. Other feature selection methods iteratively search for the feature set that optimizes model performance.
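
For example, you can rank predictors by predictive power with an MRMR-style filter in MATLAB and keep only the top-ranked ones; the table, response name, and cutoff below are assumptions for illustration.

    % Rank predictors for a classification problem (Statistics and Machine Learning Toolbox).
    data      = readtable('featureTable.csv');               % hypothetical feature table
    predNames = setdiff(data.Properties.VariableNames, {'Label'}, 'stable');
    [idx, scores] = fscmrmr(data, 'Label');                  % minimum redundancy maximum relevance
    bar(scores(idx))                                         % inspect the ranking
    topFeatures = predNames(idx(1:10));                      % keep the 10 best (arbitrary cutoff)
    reduced     = data(:, [topFeatures, {'Label'}]);

Wrapper methods such as sequentialfs iteratively add or remove features while cross-validating a model, and embedded methods (for example, lasso or tree-based predictor importance) learn feature importance together with the model parameters.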

Model Selection and Tuning

At the core of developing a machine learning model is identifying which of the many available models performs best for the task at hand, and then tuning its hyperparameters to optimize performance. AutoML can optimize both the model and its associated hyperparameters in a single step. Efficient implementations of one-step model optimization apply meta-learning to narrow the search to a subset of candidate models based on characteristics of the features, and then tune the hyperparameters of each candidate efficiently with Bayesian optimization rather than the computationally more intensive grid or random searches.
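
In MATLAB, this one-step optimization is available through fitcauto for classification and fitrauto for regression; a minimal sketch, assuming a predictor matrix X and a label vector Y, looks roughly like this (the evaluation budget is an arbitrary choice).

    % Jointly choose a model type and tune its hyperparameters (Statistics and ML Toolbox).
    [Mdl, results] = fitcauto(X, Y, ...
        'Learners', 'auto', ...                 % candidate learners chosen from data characteristics
        'HyperparameterOptimizationOptions', ...
        struct('MaxObjectiveEvaluations', 50)); % Bayesian optimization by default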

If a promising model is identified by other means (e.g., trial and error), its hyperparameters can be optimized individually with grid search, random search, or Bayesian optimization, as mentioned above.
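
For a single model family, many fitting functions accept an 'OptimizeHyperparameters' argument; a sketch for an ensemble classifier, again assuming X and Y, might look like this.

    % Tune only the hyperparameters of one chosen model type.
    Mdl = fitcensemble(X, Y, ...
        'OptimizeHyperparameters', 'auto', ...
        'HyperparameterOptimizationOptions', ...
        struct('Optimizer', 'bayesopt', ...     % alternatives: 'gridsearch', 'randomsearch'
               'MaxObjectiveEvaluations', 30));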

Once you have identified the best-performing model, you can deploy the optimized model without additional coding, either by applying automated code generation or by integrating it into a simulation environment such as Simulink®.
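
One possible path, assuming MATLAB Coder and a trained classification model Mdl, is to save the model, write an entry-point prediction function, and generate C code from it; the file and function names here are placeholders.

    % Step 1: save the trained model in a form compatible with code generation.
    saveLearnerForCoder(Mdl, 'autoMdl');

    % Step 2: entry-point function (saved as predictEntryPoint.m, a hypothetical name).
    %   function labels = predictEntryPoint(X) %#codegen
    %       mdl    = loadLearnerForCoder('autoMdl');
    %       labels = predict(mdl, X);
    %   end

    % Step 3: generate C code for the entry point (requires MATLAB Coder);
    % numFeatures should match the number of predictors used in training.
    % codegen predictEntryPoint -args {zeros(1, numFeatures)}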

See also: Statistics and Machine Learning Toolbox, machine learning, supervised learning, feature extraction, feature selection, data fitting, wavelet transforms, Wavelet Toolbox, machine learning models, biomedical signal processing, Surrogate Optimization