Working with Big Data
This example shows how Simulink® models handle big data as input to and output from a simulation.
Big data refers to data that is too large to load into system memory all at once. Simulink simulations can produce big data as simulation output and consume big data as simulation input. Big data for both input and output is stored in a MAT-file on the hard disk. Only small chunks of this data are loaded into system memory at any time during simulation. This approach is known as streaming. Simulink simulations can stream data to and from a MAT-file. Streaming solves memory issues because the hard disk capacity of a system is typically much greater than the capacity of the random access memory.
The software uses logging to file to stream big data as the output of a simulation. Streaming from file then supplies big data as input to a simulation.
This example demonstrates strategies for big data simulations. To keep the run time short, it uses a shorter simulation duration and generates less data than most big data simulations.
Open Example
Open the example, which uses the model named sldemo_mdlref_bus.
openExample('simulink_features/WorkingWithBigDataExample');
If you close the model, you can reopen it from the example folder.
open_system('sldemo_mdlref_bus');
Set Up Logging to File
To stream output data to a MAT-file, enable logging to file by selecting the Configuration Parameters > Data Import/Export > Log Dataset data to file option. You can also specify the name of the file that will contain the result.
To enable logging to file programmatically, set the model configuration parameter LoggingToFile to on.
When logging to file is enabled on a model, simulation of that model streams logged signals directly into the MAT-file. Additionally, if the States or Output configuration parameters are enabled and the Format configuration parameter is set to Dataset, those values are streamed into the same MAT-file.
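The programmatic setup described above can be sketched as follows. This is a minimal sketch, not the only way to configure the model: it assumes the model sldemo_mdlref_bus is on the MATLAB path and that its configuration set can be edited directly. The parameter names LoggingToFile and LoggingFileName are the ones named in this example; the variable loggingState is introduced here only for illustration.

```matlab
% Load the model into memory without opening its window.
load_system('sldemo_mdlref_bus');

% Enable logging to file and choose the MAT-file name.
% Assumption: the model's configuration set is editable (not read-only).
set_param('sldemo_mdlref_bus', ...
    'LoggingToFile','on', ...
    'LoggingFileName','top.mat');

% Read the setting back to confirm it took effect.
loggingState = get_param('sldemo_mdlref_bus','LoggingToFile');
disp(loggingState);
```

Setting the parameters this way changes the model loaded in memory. Passing the same name-value pairs to the sim function, as the next section does, applies them only for that simulation run.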
Simulate Model
Call the sim function to simulate the model, specifying these configuration parameters as name-value arguments.
To specify the name of the Dataset object to hold the result of signal logging, set the SignalLoggingName configuration parameter to topOut.
To specify the name of the resulting MAT-file, set the LoggingFileName configuration parameter to top.mat. Set the StopTime configuration parameter to 5000 seconds. Most big data simulations use a much larger stop time, which produces many more data samples to log.
sim('sldemo_mdlref_bus','SignalLoggingName','topOut',...
    'LoggingToFile','on','LoggingFileName','top.mat',...
    'StopTime','5000');
Create DatasetRef Object to Reference Logged Dataset Within MAT-File
Use a DatasetRef object to reference the resulting Dataset object in the logged MAT-file. When you use a DatasetRef object, the referenced MAT-file is not loaded into memory. The DatasetRef object is a light wrapper object for referencing a Dataset object that is stored in a file. In contrast, if you call the load function on this file, the software loads the entire file into memory, which might not be possible if the Dataset object contains big data.
dsr = Simulink.SimulationData.DatasetRef('top.mat','topOut');
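Before creating the reference, it can help to check what the MAT-file contains without loading it. The following is a hedged sketch that assumes top.mat exists in the current folder from the simulation above, and that the DatasetRef class provides the static getDatasetVariableNames method and the numElements method as described in its reference page. The variables varNames and n are introduced here only for illustration.

```matlab
% List the Dataset variables stored in the MAT-file without loading them.
% Assumption: top.mat was produced by the earlier call to sim.
varNames = Simulink.SimulationData.DatasetRef.getDatasetVariableNames('top.mat');
disp(varNames);

% Create the lightweight reference and count its elements.
dsr = Simulink.SimulationData.DatasetRef('top.mat','topOut');
n = dsr.numElements;
fprintf('topOut holds %d elements\n', n);
```

Both calls inspect the file contents by reference, so the logged signal data stays on disk.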
Obtain Reference to Logged Signal
You can use { } indexing of DatasetRef objects to reference individual signals within a Dataset object without loading the signals into memory. For example, to reference the second signal, enter this command.
sig2 = dsr{2};
The Values field of sig2 is a SimulationDatastore object, which is a lightweight reference to the data of signal 2, stored on disk.
sig2.Values
ans =
  SimulationDatastore with properties:

      ReadSize: 100
    NumSamples: 50001
      FileName: '/tmp/Bdoc24b_2679053_690045/tpe1cce742/simulink_features-ex31145726/top.mat'

    Data Preview:

      Time       Data
    _______    ______

    0 sec      1    5
    0.1 sec    1    5
    0.2 sec    2    6
    0.3 sec    2    6
    0.4 sec    3    7
       :         :
Obtain More References to Other Logged Signals
You can use some of the logged signals as inputs to the simulation of the referenced model. Create lightweight references for each of the logged signals that connect to the input ports of the Model block. In this example, buses connect to the Model block input ports. The resulting Values fields are structures of SimulationDatastore objects. Each structure reflects the hierarchy of the original bus.
counterbus = dsr{1};
incrementbus = dsr{3};
Create Dataset Object to Use as Simulation Input
Specify the input signals to a simulation using a Dataset object. Each element in this Dataset object provides input data to the Inport block that corresponds to the same index. Create an empty Dataset object named ds. Then, place the references to the logged signals into the Dataset object as elements number one and two.
Use { } indexing on the Dataset object to order elements.
ds = Simulink.SimulationData.Dataset;
ds{1} = counterbus;
ds{2} = incrementbus;
Within each element of the Dataset object, you can mix references to signal data, such as SimulationDatastore objects, with in-memory data, such as timeseries objects. To change one of the upper saturation limits from 30 to 37, enter this command.
ds{1}.Values(2).limits.upper_saturation_limit = timeseries(int32(37),0);
Stream Input Data into Simulation
Open the referenced model sldemo_mdlref_counter_bus as a top model, outside of the context of the model sldemo_mdlref_bus.
open_system('sldemo_mdlref_counter_bus');
Simulate the model sldemo_mdlref_counter_bus. Use the Dataset object named ds as input.
out = sim('sldemo_mdlref_counter_bus', ...
    'LoadExternalInput','on','ExternalInput','ds');
The data that is referenced by SimulationDatastore objects is streamed into the simulation in chunks, so it never has to fit in system memory all at once.
The data for the upper saturation limit is not streamed because that signal is specified as an in-memory timeseries. The change in saturation limit is reflected at around time 6 in the scope. The signal now saturates to a value of 37 instead of 30.
Analyze Logged Data Incrementally in MATLAB®
SimulationDatastore objects let you analyze the logged data incrementally in MATLAB. Return to the reference to the second logged signal and assign the datastore to a new variable to simplify access to it.
dst = sig2.Values;
SimulationDatastore objects allow incremental reading of the referenced data. The reading is done in chunks and is controlled by the ReadSize property. The default value for ReadSize is 100 samples. Each sample for a signal is the data logged for a single time step of simulation. Change the ReadSize value to 1000. Each read of the datastore returns a timetable representation of the data.
dst.ReadSize = 1000;
tt = dst.read;
Each read on the datastore advances the read counter. You can reset this counter and start reading from the beginning.
dst.reset;
Use SimulationDatastore objects for incremental access to the logged simulation data for big data analysis. You can iterate over the entire data record in chunks.
while dst.hasdata
    next_chunk = dst.read;
end
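The loop above only reads each chunk. As an illustration of incremental analysis, the following sketch accumulates a per-column maximum and a sample count while holding only one chunk in memory at a time. It assumes top.mat exists from the earlier simulation and that each chunk is a timetable with a Data variable, as the data preview for the second signal suggests. The variables runningMax, numRead, and chunk are hypothetical names chosen for this sketch.

```matlab
% Re-create the lightweight reference to the second logged signal.
% Assumption: top.mat and the topOut Dataset exist from the earlier simulation.
dsr = Simulink.SimulationData.DatasetRef('top.mat','topOut');
dst = dsr{2}.Values;
dst.ReadSize = 1000;   % samples returned per read
dst.reset;             % start from the first sample

runningMax = [];       % per-column maximum seen so far
numRead = 0;           % total number of samples processed

while dst.hasdata
    chunk = dst.read;  % timetable with up to ReadSize rows
    % Combine the previous maximum with this chunk and reduce along rows.
    runningMax = max([runningMax; chunk.Data], [], 1);
    numRead = numRead + height(chunk);
end

disp(runningMax);
fprintf('Processed %d samples\n', numRead);
```

Starting runningMax as an empty array avoids mixing a double placeholder with the logged data type. Only the current chunk and the small accumulator variables occupy memory, so the same pattern applies to much longer logs.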
Consider Longer Simulations
This example shows how logging to persistent storage streams data from the first simulation into a MAT-file. A second simulation then streams the data from that file as input.
A more realistic big data example would have a larger value for the model StopTime configuration parameter, resulting in a larger logged MAT-file. The second simulation could also be configured for a longer stop time. However, even with the larger data files for output and input, the memory requirements for the longer simulations remain the same, because only small chunks of the data are in memory at any time.
See Also
matlab.io.datastore.SimulationDatastore | Simulink.SimulationData.DatasetRef