Statistics and Machine Learning Toolbox Example Data Sets

Statistics and Machine Learning Toolbox™ includes a variety of data sets in different file formats and sizes. Documentation examples use these data sets to demonstrate software capabilities. This topic describes some of the available data sets; it is not a comprehensive list.

Data Sets Available with Product Installation

This list describes data sets available when you install Statistics and Machine Learning Toolbox. The File Contents column displays the output of the whos command, which you can enter after you load the file into the workspace.
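As a minimal sketch of that workflow, the following code loads one of the files described in this topic and inspects its contents. It assumes the toolbox is installed so that carsmall.mat is on the MATLAB path. The variable name S is illustrative and does not come from the data set.

```matlab
% Load the data set into the workspace and list the variables.
% whos also lists any variables that were already in the workspace.
load carsmall.mat
whos

% Alternatively, load the file into a structure so that existing
% workspace variables are not overwritten. S is a hypothetical name.
S = load("carsmall.mat");
fieldnames(S)      % names of the variables stored in the file
size(S.MPG)        % dimensions of one variable
```

Loading into a structure is a design choice for scripts that already define variables with common names such as Weight or Model.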

Filename | Description | How to Load | File Contents
acetylene.mat

Chemical reaction data with correlated predictors

load acetylene.mat

  Name              Size             Bytes  Class     Attributes

  Description      16x105             3360  char                
  x1               16x1                128  double              
  x2               16x1                128  double              
  x3               16x1                128  double              
  y                16x1                128  double              
For more information, read the Description variable.

carbig.mat

Measurements of cars from 1970–1982

load carbig.mat

  Name                Size            Bytes  Class     Attributes

  Acceleration      406x1              3248  double              
  Cylinders         406x1              3248  double              
  Displacement      406x1              3248  double              
  Horsepower        406x1              3248  double              
  MPG               406x1              3248  double              
  Mfg               406x13            10556  char                
  Model             406x36            29232  char                
  Model_Year        406x1              3248  double              
  Origin            406x7              5684  char                
  Weight            406x1              3248  double              
  cyl4              406x5              4060  char                
  org               406x7              5684  char                
  when              406x5              4060  char                

carsmall.mat

Subset of carbig.mat containing measurements of cars from 1970, 1976, and 1982

load carsmall.mat

  Name                Size            Bytes  Class     Attributes

  Acceleration      100x1               800  double              
  Cylinders         100x1               800  double              
  Displacement      100x1               800  double              
  Horsepower        100x1               800  double              
  MPG               100x1               800  double              
  Mfg               100x13             2600  char                
  Model             100x33             6600  char                
  Model_Year        100x1               800  double              
  Origin            100x7              1400  char                
  Weight            100x1               800  double              

census1994.mat

US Census Bureau demographic data from the UCI machine learning repository

load census1994.mat

  Name                 Size              Bytes  Class    Attributes

  Description         20x74               2960  char               
  adultdata        32561x15            1872566  table              
  adulttest        16281x15             944466  table              
For more information, read the Description variable.

cereal.mat

Breakfast cereal ingredients

load cereal.mat

  Name            Size            Bytes  Class     Attributes

  Calories       77x1               616  double              
  Carbo          77x1               616  double              
  Cups           77x1               616  double              
  Fat            77x1               616  double              
  Fiber          77x1               616  double              
  Mfg            77x1               154  char                
  Name           77x1             10288  cell                
  Potass         77x1               616  double              
  Protein        77x1               616  double              
  Shelf          77x1               616  double              
  Sodium         77x1               616  double              
  Sugars         77x1               616  double              
  Type           77x1               616  double              
  Variables      15x2              4134  cell                
  Vitamins       77x1               616  double              
  Weight         77x1               616  double              

cities.mat

Quality-of-life ratings for US metropolitan areas, given in [4]

load cities.mat

  Name              Size            Bytes  Class     Attributes

  categories        9x14              252  char                
  names           329x43            28294  char                
  ratings         329x9             23688  double              

discrim.mat

A version of cities.mat used for discriminant analysis

load discrim.mat

  Name              Size            Bytes  Class     Attributes

  big              26x43             2236  char                
  categories        9x14              252  char                
  group           329x1              2632  double              
  idx              26x1               208  double              
  names           329x43            28294  char                
  ratings         329x9             23688  double              

examgrades.mat

Exam grades on a scale of 0–100

load examgrades.mat

  Name          Size            Bytes  Class     Attributes

  grades      120x5              4800  double              

fisheriris.mat or fisheriris.csv

Fisher's 1936 iris data

load fisheriris.mat

  Name           Size            Bytes  Class     Attributes

  meas         150x4              4800  double              
  species      150x1             18100  cell                

fisheriris = readtable("fisheriris.csv");

  Name              Size            Bytes  Class    Attributes

  fisheriris      150x5             24805  table              

flu.mat

ILI (influenza-like illness) percentage estimated by Google Flu Trends for various regions of the US, and the CDC-weighted ILI percentage based on sentinel provider reports

load flu.mat

  Name              Size             Bytes  Class      Attributes

  Description       1x306              612  char                 
  flu              52x11             14640  dataset              
For more information, read the Description variable.

gas.mat

Gasoline prices in the state of Massachusetts in 1993

load gas.mat

  Name         Size            Bytes  Class     Attributes

  price1      20x1               160  double              
  price2      20x1               160  double              

hald.mat

Heat of cement vs. mix of ingredients

load hald.mat

  Name              Size            Bytes  Class     Attributes

  Description      22x58             2552  char                
  hald             13x5               520  double              
  heat             13x1               104  double              
  ingredients      13x4               416  double              
For more information, read the Description variable.

hogg.mat

Bacteria counts in different shipments of milk

load hogg.mat

  Name      Size            Bytes  Class     Attributes

  hogg      6x5               240  double              
  x1        6x1                48  double              
  x2        6x1                48  double              
  x3        6x1                48  double              
  x4        6x1                48  double              
  x5        6x1                48  double              

hospital.xls or hospital.mat

Simulated hospital data

hospital = readtable("hospital.xls");

  Name            Size            Bytes  Class    Attributes

  hospital      100x12            44579  table              

load hospital.mat

  Name               Size            Bytes  Class      Attributes

  Description        1x23               46  char                 
  hospital         100x7             43784  dataset              
For more information, read the Description variable.

imports-85.mat

1985 Auto Imports Database from the UCI machine learning repository

load imports-85.mat

  Name               Size            Bytes  Class     Attributes

  Description        9x79             1422  char                
  X                205x26            42640  double              
For more information, read the Description variable.

indomethacin.mat

Concentrations of the drug indomethacin in the bloodstream of 6 subjects over 8 hours

load indomethacin.mat

  Name                Size            Bytes  Class     Attributes

  Description        14x50             1400  char                
  concentration      66x1               528  double              
  subject            66x1               528  double              
  time               66x1               528  double              
For more information, read the Description variable.

ionosphere.mat

Ionosphere data set from the UCI machine learning repository

load ionosphere.mat

  Name               Size            Bytes  Class     Attributes

  Description        5x79              790  char                
  X                351x34            95472  double              
  Y                351x1             37206  cell                
For more information, read the Description variable.

kmeansdata.mat

Four-dimensional clustered data

load kmeansdata.mat

  Name        Size            Bytes  Class     Attributes

  X         560x4             17920  double              

lawdata.mat

Grade point averages and LSAT scores from 15 law schools

load lawdata.mat

  Name       Size            Bytes  Class     Attributes

  gpa       15x1               120  double              
  lsat      15x1               120  double              

mileage.mat

Mileage data for three car models from two factories

load mileage.mat

  Name         Size            Bytes  Class     Attributes

  mileage      6x3               144  double              

moore.mat

Biochemical oxygen demand on five predictors

load moore.mat

  Name        Size            Bytes  Class     Attributes

  moore      20x6               960  double              

morse.mat

Recognition of Morse code distinctions by non-coders

load morse.mat

  Name                  Size             Bytes  Class     Attributes

  Y0                   36x8               2304  double              
  dissimilarities       1x630             5040  double              
  morseChars           36x2               7824  cell                

parts.mat

Dimensional run-out on 36 circular parts

load parts.mat

  Name         Size            Bytes  Class     Attributes

  runout      36x4              1152  double              

polydata.mat

Sample data for polynomial fitting

load polydata.mat

  Name      Size             Bytes  Class     Attributes

  x         1x43               344  double              
  x1        1x101              808  double              
  y         1x43               344  double              
  y1        1x101              808  double              

popcorn.mat

Popcorn yield by popper type and brand

load popcorn.mat

  Name         Size            Bytes  Class     Attributes

  popcorn      6x3               144  double              

reaction.mat

Reaction kinetics for Hougen-Watson model

load reaction.mat

  Name            Size            Bytes  Class     Attributes

  beta            5x1                40  double              
  model           1x6                12  char                
  rate           13x1               104  double              
  reactants      13x3               312  double              
  xn              3x10               60  char                
  yn              1x13               26  char                

repeatedmeas.mat

Simulated repeated measures data

load repeatedmeas.mat

  Name          Size            Bytes  Class    Attributes

  between      30x12             6415  table              
  within        8x2              1863  table              

stockreturns.mat

Simulated stock returns

load stockreturns.mat

  Name          Size            Bytes  Class     Attributes

  stocks      100x10             8000  double              
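Several entries in this list include a Description variable, store data as a dataset array, or store text as padded character arrays. The following sketch shows one way to handle each case, assuming the toolbox data files are on the MATLAB path. The names fluTable and originCat are hypothetical.

```matlab
% Display the Description variable that accompanies some data sets.
load hald.mat
disp(Description)

% flu.mat stores its data as a dataset array. Convert it to a table
% if you prefer to work with tables. Loading flu.mat also replaces the
% Description variable loaded from hald.mat.
load flu.mat
fluTable = dataset2table(flu);
size(fluTable)

% carbig.mat stores text as padded character arrays. cellstr removes
% the trailing padding, and categorical groups the repeated labels.
load carbig.mat
originCat = categorical(cellstr(Origin));
summary(originCat)
```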

Data Sets Available with Specific Examples

This list describes some of the data sets available when you open specific Statistics and Machine Learning Toolbox examples. The list is not comprehensive. The File Contents column displays the output of the whos command, which you can enter after you load the file into the workspace.

Filename | Description | How to Load | File Contents
arrhythmia.mat

Patient information and response variables that indicate the presence or absence of cardiac arrhythmia

openExample("arrhythmia.mat")
load arrhythmia.mat

  Name               Size               Bytes  Class     Attributes

  Description        8x69                1104  char                
  VarNames           1x279              41570  cell                
  X                452x279            1008864  double              
  Y                452x1                 3616  double              
For more information, read the Description variable.

batterysmall.mat

Sensor data (voltage, current, and temperature) and state of charge for a Li-ion battery; a subset of the data in [1]

openExample("batterysmall.mat")
load batterysmall.mat

  Name                   Size              Bytes  Class     Attributes

  dataLarge              1x1             1886400  struct              
  testDataSmall       1319x6               65361  table               
  trainDataSmall      6773x6              327153  table               

CreditRating_Historical.dat

Financial ratios, industry sectors, and credit ratings for a list of corporate customers

openExample("CreditRating_Historical.dat")
creditrating = readtable("CreditRating_Historical.dat");

  Name                 Size             Bytes  Class    Attributes

  creditrating      3932x8             649029  table              

humanactivity.mat

Human activity recognition data for five activities: sitting, standing, walking, running, and dancing

openExample("humanactivity.mat")
load humanactivity.mat

  Name                 Size               Bytes  Class     Attributes

  Description         29x1                 5918  string              
  actid            24075x1               192600  double              
  actnames             1x5                  592  cell                
  feat             24075x60            11556000  double              
  featlabels          60x1                 8292  cell                
For more information, read the Description variable.

nlpdata.mat

Natural language processing data extracted from the MathWorks® documentation

openExample("nlpdata.mat")
load nlpdata.mat

  Name                 Size                  Bytes  Class          Attributes

  Description         26x68                   3536  char                     
  X                31572x34023            36716304  double         sparse    
  Y                31572x1                   33094  categorical              
  corpus           31572x1                 6149252  cell                     
  dictionary       34023x1                 4137912  cell                     
For more information, read the Description variable.

NYCHousing2015.mat

Information on the sales of properties in New York City in 2015

openExample("NYCHousing2015.mat")
load NYCHousing2015.mat

  Name                    Size               Bytes  Class    Attributes

  NYCHousing2015      91446x10            32103067  table              

ovariancancer.mat

Grouped observations on 4000 predictors for ovarian cancer, given in [2] and [3]

openExample("ovariancancer.mat")
load ovariancancer.mat

  Name        Size                Bytes  Class     Attributes

  grp       216x1                 25056  cell                
  obs       216x4000            3456000  single              

spectra.mat

NIR spectra and octane numbers for 60 gasoline samples

openExample("spectra.mat")
load spectra.mat

  Name              Size              Bytes  Class      Attributes

  Description      11x72               1584  char                 
  NIR              60x401            192480  double               
  octane           60x1                 480  double               
  spectra          60x2              195660  dataset              
For more information, read the Description variable.
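Because the files in this list are available only after you open the associated example, a script might check for the file before loading it. The following is a sketch under that assumption. The existence check and the message text are illustrative additions, not part of the documented workflow.

```matlab
% exist returns 2 when the name refers to a file in the current folder
% or on the MATLAB path.
if exist("ovariancancer.mat", "file") == 2
    load ovariancancer.mat
    tabulate(grp)      % count the observations in each group
    size(obs)          % dimensions of the predictor matrix
else
    disp("Open the associated example first to obtain ovariancancer.mat.")
end
```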

References

[1] Kollmeyer, Phillip, Carlos Vidal, Mina Naguib, and Michael Skells. "LG 18650HG2 Li-ion Battery Data and Example Deep Neural Network xEV SOC Estimator Script." Mendeley 3 (March 2020). https://doi.org/10.17632/CP3473X7XV.3.

[2] Conrads, Thomas P., Vincent A. Fusaro, Sally Ross, Don Johann, Vinodh Rajapakse, Ben A. Hitt, Seth M. Steinberg, et al. "High-Resolution Serum Proteomic Features for Ovarian Cancer Detection." Endocrine-Related Cancer 11 (2004): 163–78.

[3] Petricoin, Emanuel F., Ali M. Ardekani, Ben A. Hitt, Peter J. Levine, Vincent A. Fusaro, Seth M. Steinberg, Gordon B. Mills, et al. "Use of Proteomic Patterns in Serum to Identify Ovarian Cancer." The Lancet 359, no. 9306 (February 2002): 572–77.

[4] Boyer, Richard, and David Savageau. Rand McNally Places Rated Almanac. Rand McNally & Company, 1985.
