Predictive Maintenance Using a Digital Twin

By Steve Miller, MathWorks


When industrial equipment breaks, the bigger problem is often not the cost of replacing that equipment but the forced downtime. A production line standing still may mean thousands of dollars lost every minute. Performing regular maintenance can help avoid unplanned downtime, but it does not guarantee that equipment will not fail.

What if the machine could indicate when one of its parts was about to fail? What if the machine could even tell you which part needed to be replaced? Unplanned downtime would be reduced considerably. Planned maintenance would be performed only when necessary rather than at fixed intervals. This is the goal of predictive maintenance: avoiding downtime by using sensor data to predict when maintenance is necessary.

At the heart of developing any predictive maintenance algorithm is sensor data, which can be used to train a classification algorithm for fault detection. Meaningful features are extracted from this data in a preprocessing step and used to train a machine learning algorithm for predictive maintenance. This algorithm is exported to simulation software such as Simulink® for verification and then deployed as code to the control unit of the machine.

It is not always possible to acquire data from physical equipment in the field under typical fault conditions. Permitting faults to occur in the field may lead to catastrophic failure and result in destroyed equipment. Generating faults intentionally under more controlled circumstances may be time-consuming, costly, or even unfeasible.

A solution to this challenge is to create a digital twin of the equipment and generate sensor data for various fault conditions through simulation. This approach enables engineers to generate all sensor data needed for a predictive maintenance workflow, including tests with all possible fault combinations and faults of varying severity.

This article discusses the design of a predictive maintenance algorithm for a triplex pump using MATLAB®, Simulink, and Simscape™ (Figure 1). A digital twin of the actual pump is created in Simscape and tuned to match measured data, and machine learning is used to create the predictive maintenance algorithm. The algorithm needs only the outlet pump pressure to recognize which components or combinations of components are about to fail.

Figure 1. Predictive maintenance workflow.

Building the Digital Twin

A triplex pump has three plungers driven by a crankshaft (Figure 2). The plungers are laid out so that one chamber is always discharging, making the flow smoother, reducing pressure variation, and thereby lowering material strain as compared with a single-piston pump. Typical failure conditions of such a pump are worn crankshaft bearings, leaking plunger seals, and blocked inlets.

Figure 2. Triplex pump schematic and plot showing volumetric flow rate.

CAD models for pumps, which are often available from the manufacturer, can be imported into Simulink and used to build a mechanical model of the pump for 3D multibody simulation. To capture the dynamic behavior of the whole system, this mechanical model must then be complemented with hydraulic and electrical elements.

Some of the parameters needed for creating a digital twin, such as bore, stroke, and shaft diameter, can be found in the manufacturer’s data sheet, but others may be missing or are specified only in terms of ranges. In this example, we need the upper and lower pressures at which the three check valves feeding the outlet will open and close. We do not have exact values for these pressures, as they depend on the temperature of the fluid transported.

The plot in Figure 3 shows that simulating the pump with rough estimates (blue line) does not match the field data (black line) closely enough. The blue line follows the general shape of the measured curve, but the differences are clearly too large.

Figure 3. Estimating parameters using measured data.

We use Simulink Design Optimization™ to automatically tune the parameter values so that the model generates results that match the measured data. The parameters selected for optimization are found in the Check Valve Outlet block in Simscape (Figure 4). Simulink Design Optimization selects parameter values, runs a simulation, and calculates the difference between the simulated and measured curves. Based on this result, new parameter values are selected, and a new simulation is run. The gradient of this difference with respect to the parameters is calculated to determine the direction in which each parameter should be adjusted. Convergence is achieved quickly in this example, since only two parameters are tuned. For more complex scenarios with more parameters, it is important to use capabilities that accelerate the tuning process.
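
The tuning loop described above can be illustrated with a small sketch. It is written in Python rather than MATLAB, and it is not how Simulink Design Optimization is implemented: `simulate` is a hypothetical toy model standing in for the pump simulation, the two parameters `crack` and `full_open` are invented stand-ins for the two check valve pressures, and the "measured" data is generated from known values so that the result can be checked.

```python
import math

def simulate(crack, full_open, n=50):
    """Hypothetical toy model: a pressure trace shaped by two valve pressures."""
    return [crack + (full_open - crack) * (0.5 + 0.5 * math.sin(2 * math.pi * k / n))
            for k in range(n)]

def cost(params, measured):
    """Mean squared difference between the simulated and measured curves."""
    simulated = simulate(params[0], params[1], n=len(measured))
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)

def gradient(params, measured, h=1e-6):
    """Central finite-difference estimate of d(cost)/d(parameter)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += h
        down[i] -= h
        grad.append((cost(up, measured) - cost(down, measured)) / (2 * h))
    return grad

def tune(initial, measured, step=1.0, max_iterations=100, tolerance=1e-12):
    """Repeat: simulate, compare, move each parameter against its gradient."""
    params = list(initial)
    for _ in range(max_iterations):
        if cost(params, measured) < tolerance:
            break
        grad = gradient(params, measured)
        params = [p - step * g for p, g in zip(params, grad)]
    return params

# Stand-in for field data: produced by the toy model with known values.
measured = simulate(0.2, 1.5)
tuned = tune([0.5, 1.0], measured)  # start from rough estimates
```

Starting from the rough estimates `[0.5, 1.0]`, `tuned` ends up close to the values used to generate `measured`. The fixed step size works here only because the toy cost function is well conditioned; a real estimation tool chooses its steps far more carefully.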

Figure 4. Tuning parameter values in Simscape.

Creating the Predictive Model

Now that we have a digital twin of our pump, the next step is to add the behavior of failed components to the model.

There are various ways to add fault behavior. Many Simulink blocks have dropdown menus for typical faults such as short or open circuits. Simply changing parameter values can model effects such as friction or fading. In this example, three fault types will be considered: increased friction due to a worn bearing, a reduced passage area caused by a blocked inlet, and seal leakage at the plungers. The first two faults require the adjustment of block parameters. To model leakage, we need to add a path to the hydraulic system.

As shown in Figure 5, the selected fault conditions can be switched on and off either from a user interface or from the command line in MATLAB. In the model presented here, all fault conditions are toggled using MATLAB commands. This way, the whole process can be automated using scripts.

Figure 5. Modeling leakage in the triplex pump. Parameters can be modified using the Pump block dialog box (top) or the command line (bottom).

In the simulation of the pump shown at the top of Figure 6, two faults have been enabled: a blocked inlet and a seal leakage at plunger 3. These faults are indicated by the red circles. The plot in Figure 6 shows the simulation results for outlet pressure both as a continuous line (blue) and sampled with noise (yellow). The data generated by the simulation must include sensor noise such as quantization effects because we need to train our fault detection algorithm with data that is as realistic as possible.
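
As a rough illustration of what "sampled with noise" can mean, the following Python sketch adds Gaussian noise to a clean trace and rounds each sample to the nearest quantization step. The function name `sample_sensor`, the noise level, the step size, and the sinusoidal test signal are all assumptions made for this example; the article does not state the sensor characteristics used in the pump model.

```python
import math
import random

def sample_sensor(signal, noise_std, step, rng):
    """Add Gaussian noise, then round each sample to the nearest quantization step."""
    return [round((x + rng.gauss(0.0, noise_std)) / step) * step for x in signal]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical clean pressure trace: a constant level with a periodic ripple.
clean = [7.0 + 0.5 * math.sin(2 * math.pi * 3 * k / 200) for k in range(200)]

# Assumed sensor: noise with standard deviation 0.05 and a resolution of 0.1.
noisy = sample_sensor(clean, noise_std=0.05, step=0.1, rng=rng)
```

Changing the seed gives a different noisy realization of the same clean trace, which is one way to obtain several training samples per fault scenario.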

Figure 6. Top: Pump schematic showing the blocked inlet and seal leakage. Bottom: Plot of the outlet pressure simulation (blue line) and sampled with noise (yellow line).

The green box in Figure 6 indicates the normal value range for outlet pressure. There are spikes clearly leaving the normal range, indicating some fault. This plot alone would tell an engineer or operator that something is wrong with the pump, but it is still impossible to judge exactly what the fault is.

We use this simulation to generate pressure data for the pump under all possible combinations of fault conditions. Approximately 200 scenarios were created for the digital twin. Each scenario must be simulated numerous times to account for quantization effects in the sensor. Since this approach requires several thousand simulations, we want to be able to speed up the data generation process.
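
A scenario list of this kind can be built by enumerating every combination of fault settings and repeating each one several times. The Python sketch below shows the idea only: the fault names, the three severity levels per fault, and the repeat count are hypothetical and do not reproduce the roughly 200 scenarios used for the pump.

```python
import itertools

# Hypothetical severity levels (0.0 means the fault is switched off).
SEVERITY_LEVELS = {
    "bearing_friction": [0.0, 0.5, 1.0],
    "inlet_blockage": [0.0, 0.5, 1.0],
    "seal_leak": [0.0, 0.5, 1.0],
}

def build_scenarios(levels, repeats):
    """Return one entry per fault combination and repeat."""
    names = sorted(levels)
    scenarios = []
    for combo in itertools.product(*(levels[name] for name in names)):
        for run in range(repeats):
            # The run index can seed the sensor noise for that repeat.
            scenarios.append({"faults": dict(zip(names, combo)), "seed": run})
    return scenarios

scenarios = build_scenarios(SEVERITY_LEVELS, repeats=10)
```

With three faults at three levels each and ten repeats, this produces 27 combinations and 270 simulation runs, which shows how quickly the number of runs grows.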

One typical approach is to distribute simulations across the cores of a multicore machine, or across several machines or a computer cluster. Parallel Computing Toolbox™ and MATLAB Parallel Server™ support this approach; which setup is appropriate depends on the complexity of the problem, time constraints, and available resources.

Another approach is to use the Fast Restart feature in Simulink, which takes advantage of the fact that many systems require a certain settling time until a steady state is achieved. With Fast Restart, this portion of the test needs to be simulated only once. All subsequent simulations start from the point where the system has reached steady state. In the current example, the settling time makes up about 70% of the simulation time required for a single test (Figure 7). Consequently, about two-thirds of the simulation time can be saved using Fast Restart. Since Fast Restart can be configured from the MATLAB command line and from scripts, it is well suited to automating the training process.
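
The saving from simulating the settling phase only once can be shown with a small Python sketch. It is not the Simulink feature itself: the first-order lag in `step` is a hypothetical stand-in for the pump dynamics, and the 700/300 split between settling and test steps is chosen only to mirror the roughly 70% settling share mentioned above.

```python
def step(state, drive, dt=0.01, tau=0.5):
    """One explicit Euler step of a first-order lag (toy stand-in for the pump)."""
    return state + dt * (drive - state) / tau

def run(state, drive, n_steps):
    """Advance the toy system by n_steps and return the final state."""
    for _ in range(n_steps):
        state = step(state, drive)
    return state

SETTLE_STEPS = 700  # assumed settling phase, about 70% of one full test
TEST_STEPS = 300    # assumed phase in which the data of interest is recorded

# Simulate the settling phase once and keep the resulting state.
steady = run(0.0, 1.0, SETTLE_STEPS)

# Every test case starts from the stored steady state.
drives = [0.8, 0.9, 1.1]
results = [run(steady, drive, TEST_STEPS) for drive in drives]

steps_with_reuse = SETTLE_STEPS + len(drives) * TEST_STEPS
steps_without_reuse = len(drives) * (SETTLE_STEPS + TEST_STEPS)
```

For these three test cases the reuse cuts the work from 3000 steps to 1600. As more test cases are added, the saving approaches the 70% share of the settling phase.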

Figure 7. Using the Fast Restart feature in Simulink to reduce simulation time.

The next step is to use the simulation results to extract training data for the machine learning algorithm. Predictive Maintenance Toolbox™ provides various options for extracting training data. Since the signal we are looking at here is a periodic one, a fast Fourier transform (FFT) appears most promising. As shown in Figure 8, the result is a small number of clearly separated spikes of different magnitudes for individual faults as well as for fault combinations. This is the kind of data that a machine learning algorithm can handle very well. 
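
The idea of turning a periodic signal into a few frequency and magnitude features can be sketched as follows. This Python example does not use Predictive Maintenance Toolbox, and it computes a plain discrete Fourier transform directly rather than with a fast algorithm; the result is the same spectrum an FFT would give, only more slowly. The test signal, its two ripple frequencies, and the function names are assumptions for illustration.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Return (frequency, amplitude) pairs from a direct discrete Fourier transform."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the constant offset
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        spectrum.append((k * sample_rate / n, 2 * abs(coeff) / n))
    return spectrum

def top_peaks(spectrum, count):
    """Keep the `count` frequency bins with the largest amplitude."""
    return sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:count]

# Hypothetical pressure signal: offset 7.0 plus ripples at 16 Hz and 48 Hz.
sample_rate = 256.0
signal = [7.0
          + 0.5 * math.sin(2 * math.pi * 16 * i / sample_rate)
          + 0.2 * math.sin(2 * math.pi * 48 * i / sample_rate)
          for i in range(256)]

peaks = top_peaks(magnitude_spectrum(signal, sample_rate), count=2)
```

Here `peaks` holds the 16 Hz and 48 Hz components with amplitudes near 0.5 and 0.2. A handful of such pairs per simulation run forms a compact feature vector for the classifier.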

Figure 8. Using a fast Fourier transform to extract training data.

The FFT results for each fault scenario are extracted to a table containing the inserted faults plus the observed signal frequencies and magnitudes. As a result, the number of parameters to consider is comparatively small.

Now that all the data required for training a fault detection algorithm is available, it can be imported into Statistics and Machine Learning Toolbox™. We will use a subset of the generated data to verify the trained algorithm.

We visualize the results of the training process in Statistics and Machine Learning Toolbox. These visualizations enable us to compare the strengths and weaknesses of different algorithms and determine whether additional training data is needed. We select the trained algorithm that achieved the highest accuracy for determining the pump fault from the measured data. We import that algorithm into the digital twin for verification using seven test cases saved for this purpose (Figure 9). As the final results show, the classification algorithm detects all seven scenarios reliably. It is now ready for deployment on the control unit.
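
The train-then-verify pattern can be illustrated with a deliberately simple classifier. The Python sketch below uses a nearest-centroid rule, which is a stand-in chosen for brevity and is not the algorithm selected in the article. The fault labels, the two-dimensional feature prototypes, and the 45/15 split are hypothetical.

```python
import math
import random

def train_centroids(samples):
    """Average the feature vectors of each label. samples: list of (features, label)."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, features) == label for features, label in samples)
    return hits / len(samples)

rng = random.Random(1)

# Hypothetical feature prototypes, e.g. two spectral peak magnitudes per class.
prototypes = {
    "healthy": [0.1, 0.1],
    "blocked_inlet": [0.9, 0.2],
    "seal_leak": [0.2, 0.8],
}
data = [([c + rng.uniform(-0.1, 0.1) for c in proto], label)
        for label, proto in prototypes.items() for _ in range(20)]
rng.shuffle(data)

# Hold back part of the data for verification, as described in the text.
train, held_out = data[:45], data[45:]
model = train_centroids(train)
held_out_accuracy = accuracy(model, held_out)
```

Because the toy classes are widely separated, every held-out sample is classified correctly. Real pump data is less tidy, which is why comparing several algorithms on data they were not trained on matters.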

Figure 9. Exporting the most accurate model for verification.

A real-world application of this workflow is industrial equipment that will be used across the world under widely divergent environmental conditions. Such equipment may be subject to change: A new seal or valve supplier may be selected, or the pump may be operated with various kinds of fluids or in new environments with different daily temperature ranges. All these factors affect the pressure measured by the sensor, possibly making the fault detection algorithm unreliable or even useless. The ability to quickly update the algorithm to account for new conditions is critical for using this equipment in new markets.

The workflow described here can be automated using scripts in MATLAB, and most of the work can be reused. The only step that needs to be repeated is data acquisition under conditions comparable to those the pump will face in the field.

With the latest advances in smart interconnectivity, it will even be possible for machine makers to deliver equipment to customers with provisional settings, remotely collect data under actual onsite conditions, train the fault detection algorithm, and then remotely redeploy it to the machine. This will open up new customer support opportunities, including the retraining of fault detection on equipment that has been in use for some time under site-specific conditions. The insights gathered on numerous machines will benefit both customers and manufacturers.

Summary

Predictive maintenance helps engineers determine exactly when equipment needs maintenance. It reduces downtime and prevents equipment failure by enabling maintenance to be scheduled based on actual need rather than a predetermined schedule. Often it is too costly or even impossible to create the fault conditions necessary for training a predictive maintenance algorithm on the actual machine. A solution to this challenge is to use field data from the fully working machine to tune a physical 3D model and create a digital twin. The digital twin can then be used to design a fault detection algorithm for deployment to the controller of the actual equipment. The process can be automated, enabling quick adjustment to varying conditions, materials handled, and equipment configurations.

Published 2019