Optimizing a Regression Learner App for an Electrochemical NO2 Sensor: Dealing with Drift and Input Variations

Hello,
I am currently using the Regression Learner App to develop a GPR Exponential model for my Electrochemical NO2 Sensor. This sensor outputs a voltage, and I use reference data alongside temperature and humidity measurements to train my model.
Initially, after creating a model with the App, I find that the GPR Exponential model aligns reasonably well with the sensor data. However, over time, I have noticed a slight drift in the data. I don't believe that this drift is a result of the sensor itself. Instead, it may be influenced by new combinations of sensor output voltage, temperature, humidity, and reference data values, which the model might not have encountered during the training process.
If I rerun the Regression Learner App to update or create a new GPR Exponential model, the sensor output appears to be accurate again. This leads me to believe that the need to retrain the model might be due to changes in the combination of the input parameters.
Considering the potential for a wide array of different parameter combinations, how can I optimize my model to predict more accurately?
Moreover, could the nature of my temperature input impact the prediction? Specifically, would there be a noticeable difference in the accuracy of predictions if I input the absolute temperature compared to inputting the temperature segmented into smaller blocks?
I'm curious to know whether anyone else has had similar experiences with their models. Any insights or suggestions to enhance the performance of my GPR Exponential model would be greatly appreciated.
  6 Comments
dpb on 8 Sep 2023
Edited: dpb on 9 Sep 2023
A. You can always compute something outside the model range; how accurate it will be is clearly dependent upon how accurate the model is to begin with plus how well it does predict what the response will be outside that range.
Clearly, if a sensor's response were purely linear over the entire range, it wouldn't matter; a straight line is a straight line. That is never the case in practice; how nonlinear the response is, and how well the fitted model holds up, depends on how closely the particular data and model track what the sensor output actually is for a given input. Higher-degree polynomials are particularly notorious for "blowing up" as the range gets larger. A quadratic term alone grows by about 2X for every 1.4X increase in input (1.4^2 is about 2); in other words, a 40% increase in T would double that term's contribution to the predicted sensor output. Even going from 32 to 38 gives (38/32)^2 ==> 1.4, a 40% increase from that term alone. Remember that the slope of a parabola always grows in magnitude away from the vertex, whether it points up or down.
B. You clearly can't measure every single combination of all parameters; that's not what experiment design is about. You should, however, cover the RANGE of each parameter over the ranges that can exist jointly. Picking that set of points is the subject of experiment design; one method that has generally been found helpful in fitting quadratic response surface models is the central composite design. Again, I recommend Box, Hunter and Hunter to you as an essential background tool to get an idea of the issues and of the techniques designed to avoid the pitfalls.
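The arithmetic behind the quadratic-term remark can be checked in a few lines. This is only an illustrative sketch in Python; the function name is made up for the example, and it looks at a pure x^2 term in isolation, not at any fitted sensor model.

```python
def quadratic_term_ratio(x_old, x_new):
    """Factor by which a pure x**2 term grows when the input moves from x_old to x_new."""
    return (x_new / x_old) ** 2


# The 32 -> 38 example from the comment above: roughly a 41% increase in the term.
ratio_32_to_38 = quadratic_term_ratio(32.0, 38.0)

# A 40% increase in the input: the term almost doubles (1.4**2 = 1.96).
ratio_40_percent = quadratic_term_ratio(1.0, 1.4)

print(f"32 -> 38 scales the quadratic term by {ratio_32_to_38:.2f}")
print(f"+40% input scales the quadratic term by {ratio_40_percent:.2f}")
```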
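As a rough illustration of what a central composite design looks like, the sketch below lists the design points for three factors in coded units (-1 to +1) and maps them onto physical ranges. The factor ranges (temperature, humidity, NO2 reference level) are assumptions made up for the example, not values from this thread, and the face-centered choice alpha = 1 is just one common variant that keeps every point inside the stated ranges.

```python
from itertools import product


def central_composite(n_factors, alpha=1.0):
    """Return corner, axial, and center points of a central composite design in coded units.

    alpha=1.0 gives a face-centered design, so no point leaves the -1..+1 cube.
    """
    corners = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * n_factors]
    return corners + axial + center


def decode(point, lows, highs):
    """Map coded values (-1..+1) onto the physical range of each factor."""
    return [lo + (c + 1.0) / 2.0 * (hi - lo) for c, lo, hi in zip(point, lows, highs)]


# Hypothetical ranges: temperature in degC, relative humidity in %, NO2 reference in ppb.
lows = [0.0, 20.0, 0.0]
highs = [40.0, 90.0, 200.0]

design = central_composite(3)
settings = [decode(p, lows, highs) for p in design]
for row in settings:
    print(row)
```

For three factors this yields 8 corner points, 6 axial points, and 1 center point. In the field the conditions usually cannot be set at will, so a design like this is more useful as a checklist of regions the training data should cover than as a run sheet.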
Dharmesh Joshi on 21 Sep 2023
Thanks for the update.
Yes, I can add some additional computation after the model if necessary. My concern is this: if I train my model only on data with temperatures below 20 degrees, and the temperature then reaches 35 degrees when the model is used in the real world, how would the model behave? Would it simply not know, or would it somehow learn and predict?
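A trained regression model does not learn anything new at prediction time; at 35 degrees it can only extrapolate from what it saw below 20. One simple guard is to record the range of each input during training and flag any new sample that falls outside it. The sketch below is in Python with made-up feature names and values; it is an assumption about how such a check could look, not part of the Regression Learner App.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature seen in the training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_range(sample, ranges):
    """Return the names of features whose value lies outside the training range."""
    return [
        name
        for name, value in sample.items()
        if not (ranges[name][0] <= value <= ranges[name][1])
    ]


# Hypothetical training data, all recorded below 20 degrees.
training = [
    {"voltage": 0.41, "temperature": 8.0, "humidity": 40.0},
    {"voltage": 0.47, "temperature": 12.0, "humidity": 55.0},
    {"voltage": 0.52, "temperature": 18.0, "humidity": 70.0},
]
ranges = fit_ranges(training)

# A field sample at 35 degrees is flagged as extrapolation in temperature.
flags = out_of_range({"voltage": 0.45, "temperature": 35.0, "humidity": 50.0}, ranges)
print(flags)
```

A per-feature check like this does not catch a new combination of values that are each individually in range, so it is a first filter only.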
If I retrain the model, is it possible to see what new elements are learned from the new data?
I have a large amount of data which is being inputted into the regression learner app, and it becomes very slow. When I want to see the effects of temperature using the "Partial Dependence Plot", can I simply import my model into my script, keep all variables (apart from temperature) static, and observe the effect of varying the temperature value?
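The idea of holding everything but temperature fixed can be sketched as follows. The code is Python and `stand_in_predict` is a hypothetical placeholder with invented coefficients; in practice it would be replaced by a call to the exported model. Note that holding the other inputs at one fixed baseline gives a single response curve, while a partial dependence curve averages the prediction over many data rows; both are shown, under those assumptions.

```python
def stand_in_predict(row):
    """Hypothetical placeholder for the trained model's prediction function."""
    return 120.0 * row["voltage"] - 0.8 * row["temperature"] + 0.1 * row["humidity"]


def sweep_feature(predict, baseline, name, values):
    """Vary one feature over `values` while every other feature stays at `baseline`."""
    result = []
    for v in values:
        row = dict(baseline)
        row[name] = v
        result.append((v, predict(row)))
    return result


def partial_dependence(predict, rows, name, values):
    """Average the prediction over all rows with one feature forced to each value."""
    result = []
    for v in values:
        preds = [predict({**row, name: v}) for row in rows]
        result.append((v, sum(preds) / len(preds)))
    return result


baseline = {"voltage": 0.5, "temperature": 20.0, "humidity": 50.0}
temperatures = [10.0, 20.0, 30.0]

curve = sweep_feature(stand_in_predict, baseline, "temperature", temperatures)
pd_curve = partial_dependence(
    stand_in_predict,
    [baseline, {"voltage": 0.6, "temperature": 15.0, "humidity": 60.0}],
    "temperature",
    temperatures,
)
for t, y in curve:
    print(f"T = {t:5.1f}  prediction = {y:7.2f}")
```

Running the averaged version on a random subsample of the data, instead of the full set, is one way to keep it fast when the dataset is large.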

Answers (1)

Kaustab Pal on 19 Aug 2024
For the model to work well, it needs to see inputs that are similar to what it saw during training. For example, if the input and output had a linear relationship during training, the model will do well if this relationship stays the same. But if the relationship changes to something like exponential during testing, the model might not perform well, and you'll need to retrain it.
To make your model more accurate, try to gather a large dataset that shows different types of input-output relationships. You can also improve the model by updating it regularly with new data it hasn't seen before.
The way you input temperature data can also affect how well the model works. It's important to use the same method of representing temperature both when training the model and when using it to make predictions. This consistency is key to maintaining accuracy.
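One way to keep the representation consistent is to route every temperature value, in training and in prediction, through the same encoding function with the same saved settings. The sketch below is Python with hypothetical names and a made-up 5-degree bin width; it also shows that binning maps different temperatures inside one block to the same input, so the model can no longer tell them apart.

```python
import math


def encode_temperature(temp_c, mode="raw", bin_width=5.0):
    """Encode a temperature either as-is or as the lower edge of its bin."""
    if mode == "raw":
        return float(temp_c)
    if mode == "binned":
        return math.floor(temp_c / bin_width) * bin_width
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical settings, stored alongside the trained model and reused at prediction time.
ENCODING = {"mode": "binned", "bin_width": 5.0}

train_feature = encode_temperature(21.0, **ENCODING)
predict_feature = encode_temperature(24.9, **ENCODING)

# Both values fall into the same 20-25 degree block.
print(train_feature, predict_feature)
```

Whether the lost within-block detail matters depends on how strongly the sensor responds to temperature over a span that size.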
I hope this helps clear things up!

Release

R2023a
