Statistics and Machine Learning Toolbox

Analyze and model data using statistics and machine learning

Statistics and Machine Learning Toolbox™ provides functions and apps to describe, analyze, and model data. You can use descriptive statistics, visualizations, and clustering for exploratory data analysis, fit probability distributions to data, generate random numbers for Monte Carlo simulations, and perform hypothesis tests. Regression and classification algorithms let you draw inferences from data and build predictive models either interactively, using the Classification Learner and Regression Learner apps, or programmatically, using AutoML.

For multidimensional data analysis and feature extraction, the toolbox provides principal component analysis (PCA), regularization, dimensionality reduction, and feature selection methods that let you identify variables with the best predictive power.

The toolbox provides supervised, semi-supervised, and unsupervised machine learning algorithms, including support vector machines (SVMs), boosted decision trees, k-means, and other clustering methods. You can apply interpretability techniques such as partial dependence plots and local interpretable model-agnostic explanations (LIME), and automatically generate C/C++ code for embedded deployment. Many toolbox algorithms can be used on data sets that are too big to be stored in memory.

Get Started

Learn the basics of Statistics and Machine Learning Toolbox

Descriptive Statistics and Visualization

Data import and export, descriptive statistics, visualization

Probability Distributions

Data frequency models, random sample generation, parameter estimation

Hypothesis Tests

t-test, F-test, chi-square goodness-of-fit test, and more

Cluster Analysis and Anomaly Detection

Unsupervised learning techniques to find natural groupings, patterns, and anomalies in data

ANOVA

Analysis of variance and covariance, multivariate ANOVA, repeated measures ANOVA

Regression

Linear, generalized linear, nonlinear, and nonparametric techniques for supervised learning

Classification

Supervised and semi-supervised learning algorithms for binary and multiclass problems

Dimensionality Reduction and Feature Extraction

PCA, factor analysis, feature selection, feature extraction, and more

Industrial Statistics

Design of experiments (DOE); survival and reliability analysis; statistical process control

Analysis of Big Data with Tall Arrays

Analyze data sets that are too big to be stored in memory

Speed Up Statistical Computations

Parallel or distributed computation of statistical functions

Code Generation

Generate C/C++ code and MEX functions for Statistics and Machine Learning Toolbox functions

Statistics and Machine Learning Applications

Apply statistics and machine learning methods to industry-specific workflows