How to import multiple csv files into a single appropriate data structure for processing and plotting

30 views (last 30 days)
Hello! I am pretty new to Matlab and a pretty poor programmer in any language. I am having a tough time navigating all of the different data types and knowing how to choose between them.
Problem Statement: Import multiple csv files of time history data and then filter the data based on certain criteria. Here are some details about the data:
  • Each csv file (in Excel) contains time in column A and a dependent variable (e.g. temperature, state) in column B.
  • For example, temp1.csv contains T1 vs time, temp2.csv contains T2 vs time .... and so on.
  • The time column is identical time for each file (i.e. all sensors sampled at same rate and on same system for same time range).
  • The first row of each csv has a header: "time" and "T1".
  • Most of the csv's contain floating point temperature values but there is also a csv file that contains text values representing the system state: state vs time where state values could be A,B or C (i.e. text). This is the variable I will most frequently need to filter data on.
My desired workflow would be to do something like the following:
1.) Import all csv files in directory into a single array or container or table (or whatever is appropriate for the steps that follow) where column 1 is time and subsequent columns are the corresponding data. Call this workspace variable telem for example.
2.) The header name for each column should be retained.
An example of how telem would look when done:
telem= time T1 T2 STATE
0.25 25.3 55.3 A
0.50 25.47 55.6 B
0.75 25.9 57.9 B
1.00 27.0 60.0 C
and so on
3.) Filter the data into a new workspace variable like so:
telemFiltered = telem.state == B
telemFiltered = 0.50 25.47 55.6 B
0.75 25.9 57.9 B
4.) Plot the filtered data on separate plots for each column in telemFiltered:
plot filtered T1 vs time
plot filtered T2 vs time
Here is what I have so far, but I am not sure if it is the right approach given what I want to accomplish. The directory has 4 csv files in it for this example (i.e. T1.csv, T2.csv ... State.csv):
files = dir('*.csv');
telem = cell(length(files), 1);
for i =1:length(files)
telem{i} = readtable(files(i).name,VariableNamingRule="preserve");
end
The result is a cell array where each of 4 rows is a 9x2 table
It looks like to hold onto the header names I have to keep track of the property telem{i,j). VariableNames.
Question: How can I combine the columns of all the tables in telem so that I get a single 9x5 table: time, T1,T2,T3,State ?
Is this a reasonable approach or would you use different data types.
Thanks for reading.

Answers (1)

Walter Roberson
Walter Roberson on 22 Mar 2024
  2 Comments
casey
casey on 22 Mar 2024
@Walter Roberson thank you for your reply. Forgive my ignorance, but when I read the documentation on outerjoin I am interpreting that when I supply Tleft and Tright, whatever outerjoin is doing under the hood, it is some sort of comparison/test using keyvariables. I failed to mention that my actual .csv files have at minimum 10^6 rows and so perhaps that will be a bottleneck, perhaps not. It would seem though that since I can attest that the keyvariables (time) are all identical that there might be way to bypass having Matlab check for me and simply "slap a new column" on the end. I could also be reading incorrectly (it's been a long week). Either way, I will give this a go and report back. Thank you again.
Voss
Voss on 22 Mar 2024
Edited: Voss on 22 Mar 2024
outerjoin is a good general-purpose approach, and I wouldn't expect any problems with it handling tables with a million rows.
In this case, since you attest that the times are all the same and in the same order in each file, you can do something like this to slap new columns on the end (assuming in each file time is the 1st column and the column to be appended to the full table is the 2nd):
files = dir('*.csv');
T = table();
for i = 1:length(files)
temp = readtable(files(i).name,VariableNamingRule="preserve");
if i == 1
T.time = temp.time;
end
T.(temp.Properties.VariableNames{2}) = temp.(2);
end

Sign in to comment.

Products


Release

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!