Machine Learning: Use cross-validation between time series
Hello,
I am working on the following task:
Given: About 30 measurement series, each recorded until the system fails. I have divided the data points of each lifetime into 5 classes (A = data points in the 0-20% section of the lifetime, B = 20-40%, ... E = 80-100%).
Goal: I want to determine which state a given system is in.
Solution: I used the function "fitcauto" to train many classification algorithms and choose the best one. However, there is a problem: the algorithm uses cross-validation, which divides the input data into training and validation data, and this division cuts across the measurement series. That is, data points from the same series end up in both the training data and the validation data. This makes the task too easy, because the algorithm only has to interpolate the missing sections of series it has already seen. When it is given a completely new series after training, it will perform very badly. The solution I want to try is to do the cross-validation at the level of the measurement series, so that all data points of one series are either in the training data or in the validation data.
Question: Is this type of cross-validation possible in MATLAB, especially with the "fitcauto" function? If yes, how? If not, is there an alternative MATLAB function?
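The overlap described above can be made concrete with a small sketch. It uses toy series IDs (hypothetical: 30 series with 50 data points each) and a plain random point-level split as a stand-in for ordinary K-fold cross-validation, then counts how many series end up on both sides of the split. It needs only base MATLAB.

```matlab
% Toy stand-in data (hypothetical): 30 series with 50 data points each
nSeries = 30;
SeriesID = repelem((1:nSeries)', 50);
n = numel(SeriesID);
K = 5;

% Point-level random folds, a stand-in for ordinary K-fold cross-validation
foldIdx = mod(randperm(n)' - 1, K) + 1;
testIdx = (foldIdx == 1);

% Series that contribute rows to both the training and the test side
sharedSeries = intersect(SeriesID(testIdx), SeriesID(~testIdx));
fprintf('%d of %d series appear in both sets\n', numel(sharedSeries), nSeries);
```

With a split like this, nearly every series is typically shared between the two sides, which is the situation the question describes.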
1 Comment
Magsud Hasanov
on 22 Jul 2022
Hi Paul,
I am also working on time series forecasting and have been looking for a cross-validation implementation in MATLAB as well.
Hope we'll find the answer.
All the best,
Magsud
Answers (1)
Ayush Aniket
on 11 Jun 2025
You can use grouped cross-validation built on the cvpartition function, which keeps all data points from a single measurement series in either the training set or the validation set. The snippet below partitions the unique series IDs into folds and then maps each fold back to the rows of the data. Refer to the following documentation: https://www.mathworks.com/help/stats/cvpartition.html#mw_9d9b6de7-30dc-4a1c-9349-370602efa9f2
% Assumes X (features), Y (class labels), and SeriesID (measurement series
% of each data point) all have one row per data point
K = 10; % Number of folds
seriesGroups = unique(SeriesID); % Unique measurement series
cvp = cvpartition(numel(seriesGroups), 'KFold', K); % Partition the series, not the individual data points
accuracy = zeros(K, 1); % Preallocate the per-fold accuracy
% Prepare training and test sets based on the grouped partition
for i_fold = 1:K
    testSeries = seriesGroups(cvp.test(i_fold)); % Test series
    trainSeries = seriesGroups(cvp.training(i_fold)); % Training series
    % Select data points belonging to the respective series
    trainIdx = ismember(SeriesID, trainSeries);
    testIdx = ismember(SeriesID, testSeries);
    trainX = X(trainIdx, :);
    trainY = Y(trainIdx);
    testX = X(testIdx, :);
    testY = Y(testIdx);
    % Train model using fitcauto (it only sees the training series)
    trainedModel = fitcauto(trainX, trainY);
    % Evaluate model on the held-out series
    predictions = predict(trainedModel, testX);
    % Compare as strings so that categorical, cellstr, and numeric labels all work
    accuracy(i_fold) = mean(string(predictions) == string(testY));
end
meanAccuracy = mean(accuracy); % Average accuracy on unseen series
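Before trusting the per-fold accuracies, it can help to confirm that a split really is grouped. The following is a minimal sketch of such a check, not part of any toolbox: noLeak is a hypothetical helper name, and the six-row SeriesID vector is toy data.

```matlab
% Hypothetical helper: true when no series contributes rows to both masks
noLeak = @(ids, trainIdx, testIdx) ~any(trainIdx & testIdx) && ...
    isempty(intersect(ids(trainIdx), ids(testIdx)));

% Toy example: 3 series with 2 data points each
SeriesID = [1; 1; 2; 2; 3; 3];
groupedTest = ismember(SeriesID, 3);        % whole series 3 held out
pointTest = logical([1; 0; 0; 0; 0; 1]);    % rows picked individually

okGrouped = noLeak(SeriesID, ~groupedTest, groupedTest);  % true
okPoint = noLeak(SeriesID, ~pointTest, pointTest);        % false: series 1 and 3 are split
```

Inside the loop, the same check could be written as assert(noLeak(SeriesID, trainIdx, testIdx)).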
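Additionally, if fitcauto does not support grouped cross-validation directly, you can train a specific model manually, for example with fitcecoc (multiclass SVM) or fitcensemble (ensemble learning), and keep the same grouped folds. Note that cvp above partitions the series, not the rows of X, so it cannot be passed to the 'CVPartition' option as is; that option needs a partition defined over the observations, in which all rows of a series share one fold. The simplest route is to swap the training call inside the loop:
trainedModel = fitcecoc(trainX, trainY);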
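To use 'CVPartition' with grouped folds, every row needs a fold label that it shares with the rest of its series. A minimal sketch of building such labels in base MATLAB follows; the SeriesID vector is toy data, and the random fold assignment is a stand-in for the cvpartition call on the series. The commented lines at the end assume a release whose cvpartition accepts a 'CustomPartition' argument (I believe R2023b or later; please check the cvpartition documentation for your version).

```matlab
% Toy stand-in data (hypothetical): 6 series with 4 data points each
SeriesID = repelem((1:6)', 4);
K = 3;

seriesGroups = unique(SeriesID);
% Balanced random fold label per series (column vector)
foldOfSeries = mod(randperm(numel(seriesGroups))' - 1, K) + 1;

% Map each row to the position of its series, then to that series' fold
[~, seriesIdx] = ismember(SeriesID, seriesGroups);
foldIdx = foldOfSeries(seriesIdx);   % one fold label per data point

% Assumed to work only in releases with custom partitions:
% cvpObs = cvpartition('CustomPartition', foldIdx);
% cvModel = fitcecoc(X, Y, 'CVPartition', cvpObs);
% cvLoss = kfoldLoss(cvModel);
```

Because foldIdx has one entry per row of X, a partition built from it matches the data passed to the fitting function, which the series-level cvp does not.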