
simulationEnsembleDatastore

Manage ensemble data generated by generateSimulationEnsemble or by logging simulation data in Simulink

Description

A simulationEnsembleDatastore object is a datastore specialized for use in developing algorithms for condition monitoring and predictive maintenance using simulated data.

This object specifies the data variables, independent variables, and condition variables stored in a collection of MATLAB® data files (MAT files). The data files contain Simulink.SimulationData.Dataset variables that are the result of logging data during Simulink® model simulation.

For a detailed example illustrating the use of a simulated ensemble datastore, see Generate and Use Simulated Data Ensemble. For general information about data ensembles in Predictive Maintenance Toolbox™, see Data Ensembles for Condition Monitoring and Predictive Maintenance.

Creation

To create a simulationEnsembleDatastore object:

  1. Generate and log simulation data from a Simulink model. You can do so using generateSimulationEnsemble or any other means of logging simulation data to disk.

  2. Create a simulationEnsembleDatastore object that points to the generated simulation data using the simulationEnsembleDatastore command (described below).

If you have simulation data previously generated with generateSimulationEnsemble or other means, you can use the creation function simulationEnsembleDatastore to create a new simulation ensemble datastore object at any time.

Description

ensemble = simulationEnsembleDatastore(location) creates a simulation ensemble from data previously generated using generateSimulationEnsemble in the folder specified by location. The function identifies ensemble variables in the generated data from information stored in the generated MAT files. The function populates the DataVariables and SelectedVariables properties of ensemble with the names of these ensemble variables.


ensemble = simulationEnsembleDatastore(location,signallog) uses signallog to determine which variable in the MAT files contains logged signals. Use the variable name specified in the Signal logging configuration parameter of the Simulink model from which the data is generated. Specifying this variable allows the ensemble to treat those signals as ensemble data variables, rather than the signallog variable itself. The other variables in the MAT file are also returned as ensemble data variables.


ensemble = simulationEnsembleDatastore(location,signallog,Name,Value) specifies additional properties of the object using one or more name-value pair arguments. For example, using 'IndependentVariables',["Age";"ID"] specifies the independent variables when you create the object.

Input Arguments


location

File path to the folder that contains the simulation data, specified as a string or a character vector. The file path can be any location supported by MATLAB datastores, including an IRI path pointing to a remote location. However, when you use a simulationEnsembleDatastore to manage remote data, you cannot use writeToLastMemberRead to add data to the ensemble datastore. For more information about working with remote data in MATLAB, see Work with Remote Data.

Example: pwd + "\simResults"
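The example value above uses a Windows-style backslash. A minimal, platform-neutral alternative is to build the path with fullfile, which inserts the separator appropriate to the current operating system. The folder name simResults is only an illustration, and the folder is assumed to already hold generated simulation data.

```matlab
% Build the location argument with fullfile so that the path separator
% matches the current platform; "simResults" is an example folder name
location = fullfile(pwd, "simResults");
```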

signallog

Variable name of logged signals, specified as a string or a character vector. This input argument tells simulationEnsembleDatastore which variable in the stored MAT files contains the logged simulation data. This variable name is specified in the Signal logging configuration parameter of the Simulink model from which the data is generated. When you use generateSimulationEnsemble to generate simulation data for the ensemble, each generated MAT file contains a variable, PMSignalLogName, specifying the variable name of the logged signals.

Example: "logsout"

Properties


DataVariables

Data variables in the ensemble, specified as a string array. Data variables are the main content of the members of an ensemble. Data variables can include measured data or derived data for analysis and development of predictive maintenance algorithms. For example, your data variables might include measured or simulated vibration signals and derived values such as mean vibration value or peak vibration frequency. In practice, your data variables, independent variables, and condition variables are all distinct sets of variables.

simulationEnsembleDatastore sets the initial value of DataVariables to the names of all the logged signals in the data generated with generateSimulationEnsemble. simulationEnsembleDatastore also adds the variables SimulationInput and SimulationMetadata to DataVariables. These variables contain information about how the simulation was performed.

You can also specify DataVariables using a cell array of character vectors, such as {'Vibration';'Tacho'}, but the variable names are always stored as a string array, ["Vibration";"Tacho"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.
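The conversion described above can be illustrated with base MATLAB. This sketch only mimics the documented result using string and colon indexing; it does not call the datastore, and the column-by-column flattening order shown here is an assumption, since the text states only that a matrix becomes a column vector.

```matlab
% Illustration only: how cell-array and matrix inputs map to the stored form
namesCell = {'Vibration';'Tacho'};     % cell array of character vectors
namesStr  = string(namesCell);         % 2x1 string array ["Vibration";"Tacho"]

% A 2x2 matrix of names; (:) flattens it column by column into a 4x1 column
nameMatrix = ["Vibration" "FaultState"; "Tacho" "Temperature"];
namesFlat  = nameMatrix(:);
```

The same normalization applies to IndependentVariables, ConditionVariables, and SelectedVariables.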

IndependentVariables

Independent variables in the ensemble, specified as a string array. You typically use independent variables to order the members of an ensemble. Examples are timestamps, number of operating hours, or miles driven. Set this property to the names of such variables in your ensemble. In practice, your data variables, independent variables, and condition variables are all distinct sets of variables.

You can also specify IndependentVariables using a cell array of character vectors, such as {'Time';'Age'}, but the variable names are always stored as a string array, ["Time";"Age"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.

ConditionVariables

Condition variables in the ensemble, specified as a string array. Use condition variables to label the members of an ensemble according to the fault condition or other operating condition under which each member was collected. In practice, your data variables, independent variables, and condition variables are all distinct sets of variables.

You can also specify ConditionVariables using a cell array of character vectors, such as {'GearFault';'Temperature'}, but the variable names are always stored as a string array, ["GearFault";"Temperature"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.

SelectedVariables

Variables to read from the ensemble, specified as a string array. Use this property to specify which variables are extracted to the MATLAB workspace when you use the read command to read data from the ensemble. read returns a table row containing a table variable for each name specified in SelectedVariables. For example, suppose that you have an ensemble, ensemble, that contains six variables, and you want to read only two of them, Vibration and FaultState. Set the SelectedVariables property and call read.

ensemble.SelectedVariables = ["Vibration";"FaultState"];
data = read(ensemble)

SelectedVariables can be any combination of the variables in the DataVariables, ConditionVariables, and IndependentVariables properties. If SelectedVariables is empty, read generates an error.
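The rule above can be expressed as a simple membership test. This is a hypothetical check written in base MATLAB, not a toolbox function: the variable lists are stand-ins for the DataVariables, ConditionVariables, and IndependentVariables properties, and the name Pressure is made up to show a selection that falls outside them.

```matlab
% Hypothetical stand-ins for the ensemble's variable-list properties
dataVars  = ["SimulationInput";"SimulationMetadata";"Tacho";"Vibration"];
condVars  = "ToothFault";
indepVars = strings(0,1);
knownVars = [dataVars; condVars; indepVars];

% A selection is usable if it is nonempty and every name appears in one of the lists
selected = ["Vibration";"ToothFault"];
isValidSelection = ~isempty(selected) && all(ismember(selected, knownVars));

badSelection = ["Vibration";"Pressure"];   % "Pressure" is not in any list
isValidBad = ~isempty(badSelection) && all(ismember(badSelection, knownVars));
```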

simulationEnsembleDatastore sets the initial value of SelectedVariables to the same list of variables as DataVariables, so by default read returns all the variables in the data generated with generateSimulationEnsemble.

You can specify SelectedVariables using a cell array of character vectors, such as {'Vibration';'Tacho'}, but the variable names are always stored as a string array, ["Vibration";"Tacho"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.

ReadSize

Number of members to read from the ensemble datastore at once, specified as a positive integer that is smaller than the total number of members in the ensemble. By default, the read command returns a one-row table containing data from one ensemble member. To read data from multiple members in a single read operation, set this property to an integer value greater than one. For example, if ReadSize = 3, then read returns a three-row table where each row contains data from a different ensemble member. If fewer than ReadSize members are unread, then read returns a table with as many rows as there are remaining members.

The ensemble datastore property LastMemberRead contains the names of all files read during the most recent read operation. Thus, for instance, if ReadSize = 3, then a read operation sets LastMemberRead to a string vector containing three file names.
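The batching behavior of ReadSize and LastMemberRead described above can be modeled in a few lines of base MATLAB. This is a hypothetical model, not the toolbox implementation: the file names Data_01.mat through Data_05.mat stand in for ensemble.Files, and the loop imitates successive read calls on a five-member ensemble with ReadSize = 3.

```matlab
% Hypothetical model of ReadSize batching; no toolbox functions are called
files    = "Data_0" + (1:5)' + ".mat";   % stand-in for ensemble.Files
readSize = 3;                            % stand-in for ensemble.ReadSize

nextMember  = 1;
rowsPerRead = [];
batches     = {};
while nextMember <= numel(files)         % plays the role of hasdata(ensemble)
    idx = nextMember:min(nextMember + readSize - 1, numel(files));
    lastMemberRead = files(idx);         % what LastMemberRead would hold after this read
    batches{end+1} = lastMemberRead;     %#ok<AGROW>
    rowsPerRead(end+1) = numel(idx);     %#ok<AGROW>
    nextMember = idx(end) + 1;
end
```

With five members, the model yields a three-row read followed by a two-row read whose LastMemberRead lists only the fourth and fifth files.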

When you use writeToLastMemberRead, specify the data to write as a table with a number of rows equal to ReadSize. The writeToLastMemberRead command updates the members specified by LastMemberRead, writing one table row to each specified file.
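As a sketch of the table form described above, the following builds one row of derived data per member read. The ToothFault values are hypothetical, ReadSize is assumed to be 3, and the actual write is left as a comment because it needs Predictive Maintenance Toolbox and an ensemble that has just been read, with ToothFault already listed in its DataVariables.

```matlab
readSize   = 3;                          % assumed value of ensemble.ReadSize
toothFault = [false; true; true];        % hypothetical derived values, one per member read
newData    = table(toothFault, 'VariableNames', {'ToothFault'});

% The table needs one row per member listed in LastMemberRead
assert(height(newData) == readSize)

% With the toolbox and a real ensemble, the write would then be:
% writeToLastMemberRead(ensemble, newData);
```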

Changing the ReadSize property also resets the ensemble to its unread state. For instance, suppose that you read some ensemble members one at a time (ReadSize = 1), and then change ReadSize to 3. The next read operation returns data from the first three ensemble members.

NumMembers

This property is read-only.

Number of members in the ensemble, specified as a positive integer.

LastMemberRead

This property is read-only.

File name of last ensemble member read into the MATLAB workspace, specified as a string. When you use the read command to read data from an ensemble datastore, the software determines which ensemble member to read next, and reads data from the corresponding file. The LastMemberRead property contains the path to the most recently read file. When the ensemble datastore has not yet been read, or has been reset, LastMemberRead is an empty string.

When you call writeToLastMemberRead to add data back to the ensemble datastore, that function writes to the file specified in LastMemberRead.

By default, read reads data from one ensemble member at a time (the ReadSize property of the ensemble datastore is 1). When ReadSize > 1, LastMemberRead is a string array containing the paths to all files read in the most recent read operation.

Files

This property is read-only.

List of files in the ensemble datastore, specified as a column string vector of length NumMembers. Each entry contains the full path to a file in the datastore. The files are in the order in which the read command reads ensemble members.

Example: ["C:\Data\Data_01.mat"; "C:\Data\Data_02.mat"; "C:\Data\Data_03.mat"]

Object Functions

The read and writeToLastMemberRead functions are specialized for Predictive Maintenance Toolbox ensemble data. Other functions, such as reset and hasdata, are identical to those used with datastore objects in MATLAB. To extract specific ensemble members into a smaller or more specialized ensemble datastore, use subset. To transfer all the member data into a table or cell array with a single command, use readall. To partition an ensemble datastore, use the partition(ds,n,index) syntax of the partition function.

read                    Read member data from an ensemble datastore
writeToLastMemberRead   Write data to member of an ensemble datastore
subset                  Create new ensemble datastore from subset of existing ensemble datastore
reset                   Reset datastore to initial state
hasdata                 Determine if data is available to read
progress                Determine how much data has been read
readall                 Read all data in datastore
numpartitions           Number of datastore partitions
partition               Partition a datastore
tall                    Create tall array
transform               Transform datastore
isPartitionable         Determine whether datastore is partitionable
isShuffleable           Determine whether datastore is shuffleable

Examples


Generate Simulation Ensemble Datastore

Generate a simulation ensemble datastore of data representing a machine operating under fault conditions by simulating a Simulink® model of the machine while varying a fault parameter.

Load the Simulink model. This model is a simplified version of the gear-box model described in Using Simulink to Generate Fault Data. For this example, only one fault mode is modeled, a gear-tooth fault.

mdl = 'TransmissionCasingSimplified';
open_system(mdl)

The gear-tooth fault is modeled as a disturbance in the Gear Tooth fault subsystem. The magnitude of the disturbance is controlled by the model variable ToothFaultGain, where ToothFaultGain = 0 corresponds to no gear-tooth fault (healthy operation). To generate the ensemble of fault data, you use generateSimulationEnsemble to simulate the model at different values of ToothFaultGain, ranging from -2 to zero. This function uses an array of Simulink.SimulationInput objects to configure the Simulink model for each member in the ensemble. Each simulation generates a separate member of the ensemble in its own data file. Create such an array, and use setVariable to assign a tooth-fault gain value for each run.

toothFaultValues  = -2:0.5:0; % 5 ToothFaultGain values 

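for ct = numel(toothFaultValues):-1:1
    % Looping backward sizes the full simin array on the first iteration
    simin(ct) = Simulink.SimulationInput(mdl);
    % Assign this run's gear-tooth fault gain to the model variable
    simin(ct) = setVariable(simin(ct),'ToothFaultGain',toothFaultValues(ct));
end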

For this example, the model is already configured to log certain signal values, Vibration and Tacho (see Save Signal Data Using Signal Logging (Simulink)). generateSimulationEnsemble further configures the model to:

  • Save logged data to files in the folder you specify.

  • Use the timetable format for signal logging.

  • Store each Simulink.SimulationInput object in the saved file with the corresponding logged data.

Specify a location for the generated data. For this example, save the data to a folder called Data within your current folder. The output status is 1 (true) if all the simulations complete without error.

mkdir Data
location = fullfile(pwd,'Data');
[status,E] = generateSimulationEnsemble(simin,location);
[05-Sep-2024 18:49:28] Running simulations...
[05-Sep-2024 18:49:33] Completed 1 of 5 simulation runs
[05-Sep-2024 18:49:35] Completed 2 of 5 simulation runs
[05-Sep-2024 18:49:37] Completed 3 of 5 simulation runs
[05-Sep-2024 18:49:39] Completed 4 of 5 simulation runs
[05-Sep-2024 18:49:42] Completed 5 of 5 simulation runs

Inside the Data folder, examine one of the files. Each file is a MAT file containing the following MATLAB® variables:

  • SimulationInput — The Simulink.SimulationInput object that was used to configure the model for generating the data in the file. You can use this to extract information about the conditions (such as faulty or healthy) under which this simulation was run.

  • logsout — A Dataset object containing all the data that the Simulink model is configured to log.

  • PMSignalLogName — The name of the variable that contains the logged data ('logsout' in this example). The simulationEnsembleDatastore command uses this name to parse the data in the file.

  • SimulationMetadata — Other information about the simulation that generated the data logged in the file.
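To inspect a generated file without loading the logged signals, you can list its contents with whos and load only PMSignalLogName. So that the sketch runs on its own, it first saves a stand-in MAT file with placeholder (empty) values in a temporary location; with real data, point matFile at one of the files in the Data folder instead, for example the first entry returned by dir(fullfile(location,'*.mat')).

```matlab
% Create a stand-in MAT file; the values are placeholders, not logged data
matFile = [tempname '.mat'];
PMSignalLogName = 'logsout';
logsout = []; SimulationInput = []; SimulationMetadata = [];
save(matFile, 'PMSignalLogName', 'logsout', 'SimulationInput', 'SimulationMetadata');

% List the variables stored in the file without loading them
info     = whos('-file', matFile);
varNames = sort(string({info.name}))';

% Load only the name of the variable that holds the logged signals
S = load(matFile, 'PMSignalLogName');
logName = S.PMSignalLogName;

delete(matFile)                          % remove the stand-in file
```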

Now you can create the simulation ensemble datastore using the generated data. The resulting simulationEnsembleDatastore object points to the generated data. The object lists the data variables in the ensemble, and by default all the variables are selected for reading. Examine the DataVariables and SelectedVariables properties of the ensemble to confirm these designations.

ensemble = simulationEnsembleDatastore(location)
ensemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [4x1 string]
    IndependentVariables: [0x0 string]
      ConditionVariables: [0x0 string]
       SelectedVariables: [4x1 string]
                ReadSize: 1
              NumMembers: 5
          LastMemberRead: [0x0 string]
                   Files: [5x1 string]

ensemble.DataVariables
ans = 4x1 string
    "SimulationInput"
    "SimulationMetadata"
    "Tacho"
    "Vibration"

ensemble.SelectedVariables
ans = 4x1 string
    "SimulationInput"
    "SimulationMetadata"
    "Tacho"
    "Vibration"

You can now use ensemble to read and analyze the generated data in the ensemble datastore. See simulationEnsembleDatastore for more information.

Select Variables to Read

In general, you use the read command to extract data from a simulationEnsembleDatastore object into the MATLAB® workspace. Often, your ensemble contains more variables than you need to use for a particular analysis. Use the SelectedVariables property of the simulationEnsembleDatastore object to select a subset of variables for reading.

For this example, use the following code to create a simulationEnsembleDatastore object using data previously generated by running a Simulink® model at various fault values (see generateSimulationEnsemble). The ensemble includes simulation data for five different values of a model parameter, ToothFaultGain. Because of the volume of data, the unzip operation takes a few minutes.

unzip simEnsData.zip  % extract compressed files
ensemble = simulationEnsembleDatastore(pwd,'logsout')
ensemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [5x1 string]
    IndependentVariables: [0x0 string]
      ConditionVariables: [0x0 string]
       SelectedVariables: [5x1 string]
                ReadSize: 1
              NumMembers: 5
          LastMemberRead: [0x0 string]
                   Files: [5x1 string]

The model that generated the data, TransmissionCasingSimplified, was configured such that the resulting ensemble contains variables including accelerometer data, Vibration, and tachometer data, Tacho. By default, the simulationEnsembleDatastore object designates all these variables as both data variables and selected variables, as shown in the DataVariables and SelectedVariables properties.

ensemble.DataVariables
ans = 5x1 string
    "PMSignalLogName"
    "SimulationInput"
    "SimulationMetadata"
    "Tacho"
    "Vibration"

ensemble.SelectedVariables
ans = 5x1 string
    "PMSignalLogName"
    "SimulationInput"
    "SimulationMetadata"
    "Tacho"
    "Vibration"

Suppose that for the analysis you want to do, you need only the Vibration data and the Simulink.SimulationInput object that describes the conditions under which this member data was simulated. Set ensemble.SelectedVariables to specify the variables you want to read. The read command then extracts those variables from the current ensemble member.

ensemble.SelectedVariables = ["Vibration";"SimulationInput"];
data1 = read(ensemble)
data1=1×2 table
         Vibration                SimulationInput        
    ___________________    ______________________________

    {20202x1 timetable}    {1x1 Simulink.SimulationInput}

data1.Vibration is a cell array containing one timetable that stores the simulation times and the corresponding vibration signal. You can now process this data as needed. For instance, extract the vibration data from the table and plot it.

vibdata1 = data1.Vibration{1};
plot(vibdata1.Time,vibdata1.Data)
title('Vibration - First Ensemble Member')

[Figure: Vibration - First Ensemble Member]
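The structure of the value returned by read, a table variable holding a cell that contains a timetable, can be reproduced with synthetic data. In this sketch the 50 Hz sine wave and 1 kHz sampling are made up and stand in for the simulated vibration signal; the RMS computation is just one example of further processing.

```matlab
% Synthetic stand-in for one row returned by read; only the structure mirrors the output above
Time  = seconds((0:999)'/1000);                       % 1 s of row times at 1 kHz
vib   = timetable(Time, sin(2*pi*50*seconds(Time)), 'VariableNames', {'Data'});
data1 = table({vib}, 'VariableNames', {'Vibration'});

% data1.Vibration is a 1x1 cell array; brace indexing returns the timetable
vibdata1 = data1.Vibration{1};

% One possible derived quantity: the RMS value of the vibration signal
rmsVib = sqrt(mean(vibdata1.Data.^2));
```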

The next time you call read on this ensemble, the last-read member designation advances to the next member of the ensemble (see Data Ensembles for Condition Monitoring and Predictive Maintenance). Read the selected variables from the next member of the ensemble.

data2 = read(ensemble)
data2=1×2 table
         Vibration                SimulationInput        
    ___________________    ______________________________

    {20215x1 timetable}    {1x1 Simulink.SimulationInput}

To confirm that data1 and data2 contain data from different ensemble members, examine the values of the varied model parameter, ToothFaultGain. For each ensemble member, this value is stored in the Variables property of the Simulink.SimulationInput object held in the SimulationInput variable.

data1.SimulationInput{1}.Variables
ans = 
  Variable with properties:

           Name: 'ToothFaultGain'
          Value: -2
      Workspace: 'global-workspace'
        Context: ''
    Description: ""

data2.SimulationInput{1}.Variables
ans = 
  Variable with properties:

           Name: 'ToothFaultGain'
          Value: -1.5000
      Workspace: 'global-workspace'
        Context: ''
    Description: ""

This result confirms that data1 is from the ensemble member with ToothFaultGain = –2, and data2 is from the member with ToothFaultGain = –1.5.
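In this ensemble, each SimulationInput object sets only ToothFaultGain. If a setup sets several variables, Variables would presumably hold several entries, and selecting the gain by name is more robust than relying on its position. The sketch below uses a struct array with Name and Value fields as a hypothetical stand-in for those entries; ShaftSpeed and its value are invented for illustration.

```matlab
% Hypothetical stand-in for a Variables array with more than one entry
vars = struct('Name', {'ToothFaultGain', 'ShaftSpeed'}, 'Value', {-2, 1500});

% Select the entry by name instead of by position
isGain = strcmp({vars.Name}, 'ToothFaultGain');
gain   = vars(isGain).Value;
```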

Append Derived Data to Ensemble Members

You can process data in an ensemble datastore and add derived variables to the ensemble members. For this example, process a variable value to compute a label that indicates whether the ensemble member contains data obtained with a fault present. You then add that label to the ensemble.

For this example, use the following code to create a simulationEnsembleDatastore object using data previously generated by running a Simulink® model at various fault values. (See generateSimulationEnsemble.) The ensemble includes simulation data for five different values of a model parameter, ToothFaultGain. The model was configured to log the simulation data to a variable named logsout in the MAT files that are stored for this example in simEnsData.zip. Because of the volume of data, the unzip operation might take a minute or two.

unzip simEnsData.zip  % extract compressed files
ensemble = simulationEnsembleDatastore(pwd,'logsout')
ensemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [5x1 string]
    IndependentVariables: [0x0 string]
      ConditionVariables: [0x0 string]
       SelectedVariables: [5x1 string]
                ReadSize: 1
              NumMembers: 5
          LastMemberRead: [0x0 string]
                   Files: [5x1 string]

Read the data from the first member in the ensemble. The software determines which ensemble member is first, and updates the property ensemble.LastMemberRead to reflect the name of the corresponding file.

data = read(ensemble)
data=1×5 table
    PMSignalLogName           SimulationInput                   SimulationMetadata                   Tacho                Vibration     
    _______________    ______________________________    _________________________________    ___________________    ___________________

      {'logsout'}      {1x1 Simulink.SimulationInput}    {1x1 Simulink.SimulationMetadata}    {20202x1 timetable}    {20202x1 timetable}

By default, all the variables stored in the ensemble data are designated as SelectedVariables. Therefore, the returned table row includes all ensemble variables, including a variable SimulationInput, which contains the Simulink.SimulationInput object that configured the simulation for this ensemble member. That object includes the ToothFaultGain value used for the ensemble member, stored in a data structure in its Variables property. Examine that value. (For more information about how the simulation configuration is stored, see Simulink.SimulationInput (Simulink).)

data.SimulationInput{1}
ans = 
  SimulationInput with properties:

               ModelName: 'TransmissionCasingSimplified'
            InitialState: [0x0 Simulink.op.ModelOperatingPoint]
           ExternalInput: []
         ModelParameters: [0x0 Simulink.Simulation.ModelParameter]
         BlockParameters: [0x0 Simulink.Simulation.BlockParameter]
               Variables: [1x1 Simulink.Simulation.Variable]
               PreSimFcn: []
              PostSimFcn: []
              UserString: ''
    VariantConfiguration: ''

Inputvars = data.SimulationInput{1}.Variables;
Inputvars.Name
ans = 
'ToothFaultGain'
Inputvars.Value
ans = 
-2

Suppose that you want to convert the ToothFaultGain values for each ensemble member into a binary indicator of whether or not a tooth fault is present. Suppose further that you know from your experience with the system that tooth-fault gain values less than 0.1 in magnitude are small enough to be considered healthy operation. Convert the gain value for this ensemble member into an indicator that is 0 (no fault) for –0.1 < gain < 0.1, and 1 (fault) otherwise.

sT = abs(Inputvars.Value) >= 0.1;   % 1 (fault) when |gain| >= 0.1, 0 (no fault) otherwise
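The indicator rule stated above can be checked against all five ToothFaultGain values used to build this ensemble at once. This vectorized sketch is plain MATLAB and does not touch the ensemble; the 0.1 threshold is the experience-based assumption from the preceding paragraph.

```matlab
% The five ToothFaultGain values used to generate this ensemble
toothFaultValues = -2:0.5:0;
threshold = 0.1;     % gains smaller than this in magnitude count as healthy

% 1 (fault) when |gain| >= threshold, 0 (no fault) otherwise
toothFault = abs(toothFaultValues) >= threshold;
```

Only the last member, generated with ToothFaultGain = 0, is labeled healthy.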

To append the new tooth-fault indicator to the corresponding ensemble data, first expand the list of data variables in the ensemble to include a variable for the indicator.

ensemble.DataVariables = [ensemble.DataVariables; "ToothFault"];
ensemble.DataVariables
ans = 6x1 string
    "PMSignalLogName"
    "SimulationInput"
    "SimulationMetadata"
    "Tacho"
    "Vibration"
    "ToothFault"

This operation is conceptually equivalent to adding a column to the table of ensemble data. Now that DataVariables contains the new variable name, assign the derived value to that column of the member using writeToLastMemberRead.

writeToLastMemberRead(ensemble,'ToothFault',sT);

In practice, you want to append the tooth-fault indicator to every member in the ensemble. To do so, reset the ensemble datastore to its unread state, so that the next read operation starts at the first ensemble member. Then, loop through all the ensemble members, computing ToothFault for each member and appending it. The reset operation does not change ensemble.DataVariables, so "ToothFault" is still present in that list.

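reset(ensemble);

while hasdata(ensemble)
    data = read(ensemble);
    % Extract the ToothFaultGain value that configured this member's simulation
    InputVars = data.SimulationInput{1}.Variables;
    TFGain = InputVars.Value;
    % 1 (fault) when |gain| >= 0.1, 0 (no fault) otherwise
    sT = abs(TFGain) >= 0.1;
    writeToLastMemberRead(ensemble,'ToothFault',sT);
end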

Finally, designate the new tooth-fault indicator as a condition variable in the ensemble datastore. You can use this designation to track and refer to variables in the ensemble data that represent conditions under which the member data was generated.

ensemble.ConditionVariables = "ToothFault";
ensemble.ConditionVariables
ans = 
"ToothFault"

You can add the new variable to ensemble.SelectedVariables when you want to read it out for further analysis. For an example that shows more ways to manipulate and analyze data stored in a simulationEnsembleDatastore object, see Using Simulink to Generate Fault Data.
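Once each member carries a fault label, one use is to split the ensemble by condition. In this sketch the ToothFault values are hypothetical, one per member in file order, matching the labels the threshold rule gives the five gains in this example. The subset call is left as a comment because it needs Predictive Maintenance Toolbox and a real ensemble.

```matlab
% Hypothetical ToothFault values, one per member in file order
toothFault = logical([1 1 1 1 0]);

faultyIdx  = find(toothFault);      % members generated with a tooth fault
healthyIdx = find(~toothFault);     % members generated without one

% With the toolbox, these indices could be passed to subset, for example:
% healthyEnsemble = subset(ensemble, healthyIdx);
```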

Read Multiple Ensemble Members in One Operation

To read data from multiple ensemble members in one call to the read command, use the ReadSize property of an ensemble datastore. This example uses simulationEnsembleDatastore, but you can use the same technique for fileEnsembleDatastore.

Use the following code to create a simulationEnsembleDatastore object using data previously generated by running a Simulink model at various fault values (see generateSimulationEnsemble). The ensemble includes simulation data for five different values of a model parameter, ToothFaultGain. (Because of the volume of data, the unzip operation might take a minute or two.) Specify some of the data variables to read.

unzip simEnsData.zip  % extract compressed files
ensemble = simulationEnsembleDatastore(pwd,'logsout');
ensemble.SelectedVariables = ["Vibration";"SimulationInput"];

By default, calling read on this ensemble datastore returns a single-row table containing the values of the Vibration and SimulationInput variables for the first ensemble member. Change the ReadSize property to read three members at once.

ensemble.ReadSize = 3;
data1 = read(ensemble)
data1=3×2 table
         Vibration                SimulationInput        
    ___________________    ______________________________

    {20202x1 timetable}    {1x1 Simulink.SimulationInput}
    {20215x1 timetable}    {1x1 Simulink.SimulationInput}
    {20204x1 timetable}    {1x1 Simulink.SimulationInput}

read returns a three-row table whose rows contain data from the first, second, and third ensemble members, respectively. read also updates the LastMemberRead property of the ensemble datastore to a string array containing the paths of the three corresponding files. Avoid setting ReadSize so large that loading the data risks running out of memory.

If the ensemble contains three or more additional members, the next read operation returns data from the fourth, fifth, and sixth members. Because the ensemble of this example contains only five members total, the next read operation returns only two rows.

data2 = read(ensemble)
data2=2×2 table
         Vibration                SimulationInput        
    ___________________    ______________________________

    {20213x1 timetable}    {1x1 Simulink.SimulationInput}
    {20224x1 timetable}    {1x1 Simulink.SimulationInput}
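To choose a ReadSize that stays within memory, a rough back-of-envelope estimate can help. This sketch counts only the Vibration timetable, assumes 16 bytes per sample (an 8-byte duration row time plus an 8-byte double value), ignores Tacho and all table and cell overhead, and uses a hypothetical 100 MB budget, so treat the result as an upper bound rather than a safe setting.

```matlab
% Rough, hypothetical estimate of how many members fit in a memory budget
samplesPerMember = 20224;   % largest Vibration length in the output above
bytesPerSample   = 16;      % assumption: 8-byte duration row time + 8-byte double value
budgetBytes      = 100e6;   % hypothetical budget for one read operation

bytesPerMember = samplesPerMember * bytesPerSample;
maxReadSize    = floor(budgetBytes / bytesPerMember);
```

Under these assumptions each member needs about 0.32 MB, so roughly 309 members would fit; real usage is higher once other variables and overhead are included.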

Version History

Introduced in R2018a