Compare Data from two tables and show how close the comparison is as a percentage
Aman Sood on 13 Apr 2020 (edited by Raymond MacNeil on 13 Apr 2020)
I'm trying to create a function that takes two tables imported into MATLAB, compares the like-for-like data points, and then shows their resemblance as a percentage. After referring to the documentation, which shows different ways to do this, I'm unsure how to proceed.
I have attached the two data sets I'm using to this question. Essentially, I want to compare the data in the columns that have the same headings in each file (for example GyroX, GyroY, GyroZ) and then show how close the data is as a percentage. The data doesn't need to be sorted: each row is the next entry in time, so it is already in the correct order.
Finally, I'm doing everything within a script rather than just in the Command Window, so that the script can be run multiple times with different data.
Thanks in advance.
Accepted Answer
Raymond MacNeil on 13 Apr 2020
How are you loading in the data? If you're currently using readtable(), I strongly encourage you to use textscan() instead. Doing so, you will learn about format specifications for parsing text, which are incredibly useful and give you skills that generalize to other languages. Moreover, textscan() is much faster than readtable(), although the time cost is small if you're only dealing with a couple of files.
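To make the idea of a format specification concrete, here is a minimal sketch that parses comma-delimited text with a header row. It assumes a hypothetical three-column layout (GyroX, GyroY, AccelX) with made-up values; `%f` reads a numeric column and `%*f` reads and discards one, which is the same mechanism the function further down relies on.

```matlab
% Hypothetical file contents (header row plus two made-up samples)
txt = sprintf('GyroX, GyroY, AccelX\n1.5, 0.2, 0.10\n1.6, 0.3, 0.12\n');
% %f keeps a column, %*f skips it: here GyroY is discarded
C = textscan(txt, '%f %*f %f', 'HeaderLines', 1, 'Delimiter', ',');
% textscan returns one cell per kept column; stack them into a matrix
M = cell2mat(C);   % 2x2 matrix with columns [GyroX, AccelX]
disp(M)
```

With a real file you would pass the file identifier returned by fopen instead of the text, as the function below does.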
Here is a function you can use to batch-read all of your data files. If you run it line by line as a script, you'll get a better idea of what's going on. Generally, it is much easier to manipulate data in matrix form (as opposed to tables), especially if you're only dealing with numeric data!
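function out = getMyData(idx)
% Batch-read selected numeric columns from comma-delimited .txt files.
% idx: numeric indices of the header columns to import (default: all columns).
if nargin < 1 || isempty(idx)
    idx = [];
elseif ~isnumeric(idx)
    error('idx must be a numeric vector of column indices');
end
%% Use GUI for file lookup
[fileNames, fileDir] = uigetfile('*.txt', 'MultiSelect', 'on');
if isequal(fileNames, 0)
    error('No file was selected.');
end
%% Get the file count
% Account for case of single trial selection (uigetfile then returns a char array)
if ischar(fileNames)
    fileNames = cellstr(fileNames);
end
FILEcount = numel(fileNames);
%% Determine number and name of headers from the first file
fid = fopen(fullfile(fileDir, fileNames{1}), 'r');
headerNames = strsplit(fgetl(fid), ', ');
fclose(fid);
if isempty(idx)
    warning('No argument supplied for idx. Reading all %d columns.', numel(headerNames));
    idx = 1:numel(headerNames);
elseif any(idx < 1 | idx > numel(headerNames) | idx ~= round(idx))
    error('idx must contain integers from 1 to %d', numel(headerNames));
end
%% Specify formatSpec: %*f skips a column, %f reads it
formSpec = [repelem({'%*f'}, 1, numel(headerNames)), {'\r\n'}];
formSpec(idx) = {'%f'};
% Preallocate cell for data import
out = cell(FILEcount, 1);
% Loop to load in all files needed
for ii = 1:FILEcount
    fid = fopen(fullfile(fileDir, fileNames{ii}), 'r');
    if fid < 0
        warning('Could not open file: %s', fileNames{ii});
        continue
    end
    try
        out{ii} = single(cell2mat(textscan(fid, [formSpec{:}], ...
            'HeaderLines', 1, 'Delimiter', ',')));
    catch ME
        warning('There was a problem loading file %s: %s', fileNames{ii}, ME.message);
    end
    fclose(fid);  % always close the file, even if parsing failed
end
end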
The input argument idx lets you choose which variables to read in, by giving the numeric index n of each corresponding header. If you find it difficult to keep track of which variable is which, you can always modify the function to return the headerNames variable as well.
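If you do return headerNames, you can look up the indices of the columns you want by name instead of counting them. A minimal sketch, assuming a hypothetical header row (Time and AccelY are placeholder names, not taken from the attached files):

```matlab
% Hypothetical header row; in getMyData this would be the headerNames variable
headerNames = strsplit('Time, AccelX, AccelY, GyroX, GyroY, GyroZ', ', ');
wanted = {'GyroX', 'GyroY', 'GyroZ'};
% found(k) is true if wanted{k} exists; idx(k) is its column position
[found, idx] = ismember(wanted, headerNames);
assert(all(found), 'Some requested headers are missing.');
disp(idx)   % these indices could then be passed on, e.g. getMyData(idx)
```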
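So, let's say I want to get the percentage ratio of AccelX in IMUREF.txt to AccelX in IMUDATA.txt over the first m observations, where m is the number of rows the two files share.
data = getMyData();   % select IMUDATA.txt and IMUREF.txt in the dialog
% Let's just pull the matrices out of the cell array for demonstration purposes.
% Indexing into the cells directly is certainly possible, without much difficulty,
% e.g. data{1}(:,2) for the AccelX column vector of IMUDATA.txt
a = data{1};   % IMUDATA.txt
b = data{2};   % IMUREF.txt
m = min(size(a, 1), size(b, 1));   % number of rows both files share
my_X_accel_proportion = b(1:m,2) ./ a(1:m,2)
my_X_accel_percent = 100 * my_X_accel_proportion
my_X_accel_difference = b(1:m,2) - a(1:m,2)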
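The ratio and difference above are per sample. If you want a single "how close" percentage per column, as the question asks, you have to pick a metric. The sketch below shows two common choices, neither of which comes from the original answer: the mean absolute percentage error (MAPE), and a similarity score of 100 * (1 - NRMSE), where NRMSE is the RMSE divided by the range of the reference signal. The sample values are made up.

```matlab
% Hypothetical AccelX samples from the two files (values made up)
ref  = [0.10; 0.12; 0.11; 0.09];
meas = [0.11; 0.12; 0.10; 0.09];
m = min(numel(ref), numel(meas));   % compare only the overlapping rows
err = meas(1:m) - ref(1:m);
% Mean absolute percentage error (undefined where ref == 0)
mape = 100 * mean(abs(err ./ ref(1:m)));
% RMSE normalized by the reference range; 100% similarity means identical
nrmse = sqrt(mean(err.^2)) / (max(ref(1:m)) - min(ref(1:m)));
similarity = 100 * (1 - nrmse);
fprintf('MAPE: %.2f%%, similarity: %.2f%%\n', mape, similarity);
```

MAPE blows up when reference values are near zero, which is common for gyro and accelerometer signals, so the range-normalized score is often the safer choice; note that it can go negative when the error exceeds the signal range.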
Most likely, you'll want to put together a few functions to streamline your analysis pipeline. Use what you can.
Raymond MacNeil on 13 Apr 2020 (edited on 13 Apr 2020)
Yes, table2array() was the other thing I was going to mention. I work with time-series data very similar to yours, so I adapted that function from something I had previously coded; I really only changed a few lines. I thought it would be useful for you because you alluded to the possibility that you needed to read in multiple files, and textscan() gets everything into the workspace much faster.
I definitely appreciate that tables are handy for keeping track of what's what, and, as you've suggested, I will often use an intermediate table2array() call myself, particularly if I have textual data I want to read in. But if you modularize your analysis pipeline, you can use the above function to specify which variables you're working with at any given time, which makes them easy to keep track of.
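If you'd rather keep tables for bookkeeping, here is a minimal sketch of the intermediate table2array() approach applied to the original question: match columns by heading with intersect, convert only those columns to matrices, and summarize each one as a mean absolute percent difference. The tables T1 and T2 and their values are hypothetical stand-ins for the imported files, the metric is just one choice among several, and table support requires MATLAB R2013b or later.

```matlab
% Two hypothetical tables standing in for the imported files (values made up)
T1 = table([1; 2; 3], [0.5; 0.6; 0.7], [9; 9; 9], ...
    'VariableNames', {'GyroX', 'GyroY', 'Extra'});
T2 = table([1.1; 2; 2.9], [0.5; 0.5; 0.7], ...
    'VariableNames', {'GyroX', 'GyroY'});
% Headings present in both tables, in T1's column order
common = intersect(T1.Properties.VariableNames, T2.Properties.VariableNames, 'stable');
m = min(height(T1), height(T2));        % overlapping rows only
A = table2array(T1(1:m, common));       % same column order in both matrices
B = table2array(T2(1:m, common));
pctDiff = 100 * abs(B - A) ./ abs(A);   % per-sample percent difference
meanPct = mean(pctDiff, 1);             % one value per common column
for k = 1:numel(common)
    fprintf('%s: mean abs difference = %.2f%%\n', common{k}, meanPct(k));
end
```

Because the matching is done by heading, extra columns in either file (like Extra here) are ignored automatically; as with any percent difference, rows where the reference value is zero need special handling.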
You'll also find it helpful to write functions for the specific sub-analyses and plots you're generating. You'll be able to reuse them later for similar projects, just like I did here. :)