Open only the rows matching a specific value in a large CSV file

Hi there,
I have a large CSV file (~7 GB) to open.
The first column, with the variable name 'Datenstand', contains 3 unique values: 13122021, 2282021 and 12312020.
Is it possible to open all columns but only the rows corresponding to the value 2282021?
Thank you in advance for your help.

Answers (1)

larush 2 minutes ago
Hey there BdS
I believe the issue can be resolved by combining datastore and tall arrays. datastore and tall are MATLAB tools for working with data that does not fit in memory: the datastore reads the file in smaller chunks, and tall arrays let you write ordinary table operations that MATLAB then evaluates chunk by chunk (in parallel, if Parallel Computing Toolbox is available). Be careful with gather: it loads the result into memory, so it can still run out of memory if the filtered data is large. Below is a sample implementation:
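filePath = 'file_name.csv'; % change to your filename
ds = datastore(filePath);   % creates a TabularTextDatastore for the CSV
tt = tall(ds);              % tall table backed by the datastore (no data read yet)
% keep only the rows whose Datenstand equals 2282021
% (numeric values like these are normally imported as double;
% if preview(ds) shows the column as text, compare with "2282021" instead)
result = tt(tt.Datenstand == 2282021, :);
% gather evaluates the deferred operations and loads the result into memory,
% which may fill up the memory if many rows match
resultInMemory = gather(result);
writetable(resultInMemory, 'filtered_data.csv');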
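If gather still runs out of memory, a different approach that avoids tall arrays is to read the file chunk by chunk with tabularTextDatastore and append only the matching rows to the output file, so the filtered result never has to be held in memory at once. The sketch below assumes a few things not in the answer above: filterCsvByValue is a hypothetical helper name, the chunk size of 100000 rows is an arbitrary choice, and appending with writetable's 'WriteMode' option requires R2020a or newer.

```matlab
function nRows = filterCsvByValue(inFile, outFile, varName, targetValue)
% FILTERCSVBYVALUE  Hypothetical helper: copy only the rows of a large CSV
% whose column varName equals targetValue into outFile, reading in chunks
% so the whole file never has to fit in memory. Returns the number of rows
% written; if no row matches, outFile is not created.

ds = tabularTextDatastore(inFile);
ds.ReadSize = 100000;   % rows per chunk (assumed value; tune to available memory)

nRows = 0;
while hasdata(ds)
    chunk = read(ds);   % table with up to ReadSize rows
    col = chunk.(varName);
    if isnumeric(col)
        match = col == targetValue;                   % column imported as numbers
    else
        match = string(col) == string(targetValue);   % column imported as text
    end
    keep = chunk(match, :);
    if isempty(keep)
        continue
    end
    if nRows == 0
        % first matching chunk: create (or overwrite) the file with a header row
        writetable(keep, outFile);
    else
        % later chunks: append data rows only (WriteMode needs R2020a or newer)
        writetable(keep, outFile, 'WriteMode', 'append', 'WriteVariableNames', false);
    end
    nRows = nRows + height(keep);
end
end
```

For the file in the question, the call would be filterCsvByValue('file_name.csv', 'filtered_data.csv', 'Datenstand', 2282021). Checking the column type inside the loop means the same helper works whether the datastore imports Datenstand as numbers or as text.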
To learn more about datastore and tall arrays, refer to their pages in the MATLAB documentation.
