Extract data from EEG text file

I need help writing a script that extracts the MCAP samples, together with the time at which each one occurred, into a separate file and plots them, so that I can use these samples in a signal processing application in MATLAB. This is only part of the data; the file contains tens of CAP samples, so I need general code to extract them.
Time Date Sample # Type Sub Chan Num Aux
[22:16:05.000 01/01/2007] 0 " 0 0 0 ## time resolution: 256
[22:16:05.000 01/01/2007] 0 0 0 0
[22:34:35.000 01/01/2007] 284160 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:35:05.000 01/01/2007] 291840 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:35:35.000 01/01/2007] 299520 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:36:05.000 01/01/2007] 307200 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:36:35.000 01/01/2007] 314880 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:37:05.000 01/01/2007] 322560 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:37:35.000 01/01/2007] 330240 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:38:05.000 01/01/2007] 337920 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:38:35.000 01/01/2007] 345600 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:39:05.000 01/01/2007] 353280 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:39:35.000 01/01/2007] 360960 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:40:05.000 01/01/2007] 368640 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:40:35.000 01/01/2007] 376320 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:41:05.000 01/01/2007] 384000 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:41:35.000 01/01/2007] 391680 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:41:37.000 01/01/2007] 392192 " 0 0 0 MCAP-A3 17 S1 EEG-F4-C4
[22:41:57.000 01/01/2007] 397312 " 0 0 0 MCAP-A3 9 S1 EEG-F4-C4
[22:42:05.000 01/01/2007] 399360 " 0 0 0 SLEEP-S2 30 S2 ROC-LOC
[22:42:13.000 01/01/2007] 401408 " 0 0 0 MCAP-A3 11 S2 EEG-F4-C4
[22:42:28.000 01/01/2007] 405248 " 0 0 0 MCAP-A3 23 S2 EEG-F4-C4
[22:42:35.000 01/01/2007] 407040 " 0 0 0 SLEEP-S2 30 S2 ROC-LOC
[22:42:57.000 01/01/2007] 412672 " 0 0 0 MCAP-A3 10 S2 EEG-F4-C4
[22:43:05.000 01/01/2007] 414720 " 0 0 0 SLEEP-S2 30 S2 ROC-LOC
[22:43:11.000 01/01/2007] 416256 " 0 0 0 MCAP-A2 6 S2 EEG-F4-C

 Accepted Answer

Cedric on 27 Apr 2019
Edited: Cedric on 28 Apr 2019
Using the data text file that you provided elsewhere (renamed and attached to this answer), here is a short example of one way to parse it. Note that it is not the best way, but it is good enough for starting the discussion:
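% Read the whole annotation file into one character vector.
buffer = fileread( 'data01.txt' ) ;
% One token per column; only lines whose Aux field starts with MCAP- match.
pattern = '\[([^\]]+)\]\s+(\d+)\s+"\s+(\d+)\s+(\d+)\s+(\d+)\s+MCAP-(\S+)\s+(\S+)\s+(\S+)\s+(\S+)' ;
data = regexp( buffer, pattern, 'tokens' ) ;
% Stack the per-match 1x9 cells into an N-by-9 cell array.
data = vertcat( data{:} ) ;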
Running it outputs a cell array of 830 rows associated with MCAP entries, as follows:
EDIT 04/28/2019@1:59pm UTC: I updated the pattern so REGEXP extracts all other "numeric" columns.
>> data
data =
830×9 cell array
{'22:41:37.000 01…'} {'392192' } {'0'} {'0'} {'0'} {'A3'} {'17'} {'S1'} {'EEG-F4-C4' }
{'22:41:57.000 01…'} {'397312' } {'0'} {'0'} {'0'} {'A3'} {'9' } {'S1'} {'EEG-F4-C4' }
{'22:42:13.000 01…'} {'401408' } {'0'} {'0'} {'0'} {'A3'} {'11'} {'S2'} {'EEG-F4-C4' }
{'22:42:28.000 01…'} {'405248' } {'0'} {'0'} {'0'} {'A3'} {'23'} {'S2'} {'EEG-F4-C4' }
...
{'07:08:22.000 02…'} {'8175872'} {'0'} {'0'} {'0'} {'A1'} {'8' } {'S4'} {'EEG-F4-C4' }
{'07:11:27.000 02…'} {'8223232'} {'0'} {'0'} {'0'} {'A1'} {'8' } {'S4'} {'EEG-Fp2-F4'}
{'07:12:08.000 02…'} {'8233728'} {'0'} {'0'} {'0'} {'A1'} {'6' } {'S4'} {'EEG-Fp2-F4'}
{'07:18:31.000 02…'} {'8331776'} {'0'} {'0'} {'0'} {'A1'} {'6' } {'S4'} {'EEG-F4-C4' }
{'07:18:53.000 02…'} {'8337408'} {'0'} {'0'} {'0'} {'A1'} {'7' } {'S4'} {'EEG-F4-C4' }
{'07:19:27.000 02…'} {'8346112'} {'0'} {'0'} {'0'} {'A1'} {'15'} {'S4'} {'EEG-F4-C4' }
{'07:20:29.000 02…'} {'8361984'} {'0'} {'0'} {'0'} {'A1'} {'11'} {'S4'} {'EEG-F4-C4' }
{'07:20:48.000 02…'} {'8366848'} {'0'} {'0'} {'0'} {'A1'} {'12'} {'S4'} {'EEG-F4-C4' }
Now, depending on what you want to accomplish, you may prefer using a TIMETABLE or a TIMESERIES object, or just some conversion of these columns.
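As a rough illustration of the TIMETABLE option, here is a minimal sketch. The three rows are copied from the output above and typed in by hand so the snippet runs on its own. The variable names, the day/month/year reading of the date, and the interpretation of column 7 as a duration in seconds are assumptions on my part, not something the file states.

```matlab
% Rows in the 9-column layout produced by the REGEXP call (hand-typed sample).
data = { ...
    '22:41:37.000 01/01/2007', '392192', '0', '0', '0', 'A3', '17', 'S1', 'EEG-F4-C4' ; ...
    '22:41:57.000 01/01/2007', '397312', '0', '0', '0', 'A3', '9',  'S1', 'EEG-F4-C4' ; ...
    '22:42:13.000 01/01/2007', '401408', '0', '0', '0', 'A3', '11', 'S2', 'EEG-F4-C4' } ;

% Assumption: the date part is day/month/year.
time = datetime( data(:,1), 'InputFormat', 'HH:mm:ss.SSS dd/MM/yyyy' ) ;

% Numeric columns: sample number and (assumed) event duration in seconds.
sampleId = str2double( data(:,2) ) ;
durSec   = str2double( data(:,7) ) ;

% Text columns become categorical so they are easy to group and filter.
tt = timetable( time, sampleId, categorical( data(:,6) ), durSec, ...
    categorical( data(:,8) ), categorical( data(:,9) ), ...
    'VariableNames', {'Sample', 'Subtype', 'Duration', 'Stage', 'Channel'} ) ;
```

With this layout, something like tt(tt.Subtype == 'A3', :) selects a single CAP subtype.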
So now you should define which part of the data you are interested in, and how you are planning to process it.
Let me know if you have any questions.

21 Comments

I attached the complete file. I ran the code you provided and it shows the array of data. For the next step: as you can see in the complete text file, there are events with MCAP, like MCAP-A3 17 S1 EEG-F4-C4, until the end of the file, which I need to extract separately, and I need to plot those samples as well.
The code extracts all rows with the MCAP keyword, but does not store the keyword itself (it records 'A1' and 'A3' instead of 'MCAP-A1' and 'MCAP-A3'). In your file, there are 830 such rows, hence the output.
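If the full keyword is needed, it can be rebuilt from the subtype column afterwards. A small sketch, using a hand-typed stand-in for the sixth column (the variable names are only illustrative):

```matlab
% Hypothetical stand-in for data(:,6), the MCAP subtype column.
subtype = { 'A3' ; 'A3' ; 'A2' ; 'A1' ; 'A3' } ;

% Put the keyword back in front of each subtype.
keyword = strcat( 'MCAP-', subtype ) ;

% Count how many entries there are of each kind.
[names, ~, idx] = unique( keyword ) ;
counts = accumarray( idx, 1 ) ;
```

Here names lists the distinct keywords in sorted order and counts holds the number of rows for each.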
I don't understand what you mean by "separately". Run the code, have a look at the whole output by displaying the value of variable data, and let me know how you need to group these entries (if needed) and what you need to extract/process further and plot.
Thanks a lot for your help.
The CAP data was extracted into the data cell array correctly, as you mentioned. I didn't realize that A1, A2, A3 were the CAP data.
My point in extracting these samples is to plot and display them in the Signal Analyzer app. When I open the app, the cell data is in the workspace but it doesn't show in the app, so I think I need extra code to convert it so that it can be displayed in the app.
I used this code:
% plot of the patient with sleep disorder with respect to samples
[tm,signal,Fs,siginfo]=rdmat('sdb4_edfm');
x_sdb = tm;
y_sdb = signal;
figure(2)
plot(x_sdb,y_sdb);
title('Sleep disorder patient 4');
xlabel('samples');
ylabel('sdb signal');
grid on
The data and the function are from PhysioNet. With this code I managed to have all the data read and loaded into the workspace, and then easily displayed in the Signal Analyzer app.
I am trying to prepare this data for the feature extraction and classification steps next.
Where did you get the rdmat function? Can you attach it to your question? It seems that it was designed to parse the file already and that you just want to limit the output to MCAP (and not re-implement the whole parsing).
It is a function in the PhysioNet WFDB toolbox. Yes, the function reads all the data, and it was easy to use it to convert the signals to physical units and display them in the Signal Analyzer app. It might be a good idea to edit this code to extract only the MCAP samples with their times; I had thought of extracting them from the samples text file instead. The code with the rdmat function needs only the three data files provided on PhysioNet.
rdmat
function varargout=rdmat(varargin)

[tm,signal,Fs,siginfo]=rdmat(recordName)

Import a signal in physical units from a *.mat file generated by WFDB2MAT.

Required Parameters:

recorName
    String specifying the name of the *.mat file.

Outputs are:

tm
    A Nx1 array of doubles specifying the time in seconds.
signal
    A NxM matrix of doubles contain the signals in physical units.
Fs
    A 1x1 integer specifying the sampling frequency in Hz for the entire record.
siginfo
    A LxN cell array specifying the signal siginfo. Currently it is a structure with the following fields:
    siginfo.Units
    siginfo.Baseline
    siginfo.Gain
    siginfo.Description

NOTE: You can use the WFDB2MAT command in order to convert the record data into a *.mat file, which can then be loaded into MATLAB/Octave's workspace using the LOAD command. This sequence of procedures is quicker (by several orders of magnitude) than calling RDSAMP. The LOAD command will load the signal data in raw units, use RDMAT to load the signal in physical units.

KNOWN LIMITATIONS: This function currently does support several of the features described in the WFDB record format (such as multiresolution signals) :
http://www.physionet.org/physiotools/wag/header-5.htm
If you are not sure that the record (or database format) you are reading is supported, you can do an integrity check by comparing the output with RDSAMP:

[tm,signal,Fs,siginfo]=rdmat('200m');
[tm2,signal2]=rdsamp('200m');
if(sum(abs(signal-signal2)) !=0);
    error('Record not compatible with RDMAT');
end

Written by Ikaro Silva, 2014
Last Modified: November 26, 2014
Version 1.2
Since 0.9.7

%Example:
wfdb2mat('mitdb/200')
tic;[tm,signal,Fs,siginfo]=rdmat('200m');toc
tic;[signal2]=rdsamp('200m');toc
sum(abs(signal-signal2))

See also RDSAMP, WFDB2MAT
This isn't the function, but I just downloaded the toolbox from physionet. Now what code are you using for processing the text data file that you attached? The simplest approach will probably consist in post-processing the output to filter out all entries that are not MCAP. But for this I need to be able to import your data.
D. Ali on 28 Apr 2019
Edited: per isakson on 28 Apr 2019
I downloaded all the data from PhysioBank ATM: choose CAP Sleep, set the data length to the end, and pick the toolbox option to download for MATLAB. Then I downloaded three files: .mat, .hea and .info. I chose, for example, subject sdb4. Once all the files are downloaded, I can use the code provided to plot and convert to physical signals. For importing the data I just used the import option and the generate script option provided in the import window. I hope this part is clear; please let me know if you need more explanation.
I will give it a try on Monday.
Thanks a lot, I will stay tuned.
Ok, I used the code that you gave above to plot a signal, looked at the content of sdb4_edfm.hea, sdb4_edfm.info, and sdb4_edfm.mat, and they can all be processed fairly easily, but where does the file of annotations come from and how does it relate to the sdb4_edfm files? In the 3 files that I list above, there is no mention of MCAP or CAP.
The annotations of the data are always provided on PhysioNet from the same link; just choose to export the annotations as text instead of .mat files. I thought it would be easy to extract the CAP events from the text file, as you guided me, because they are recorded there in detail, as you saw. That is why I wanted to extract MCAP from the text file. But I could not plot or process these samples from the cell array they were extracted into; I would appreciate your help on this point. Thanks.
The problem is not technical, but that you don't know (yet) how these files and data are related. As far as I can see, we have a MAT-File with 1 million records, an annotation file with 1889 records, a signal text file with 15360 records, etc.
Unless you are able to define how these data/files are related, it is difficult to implement any approach. In other words, we can load/parse each one of these files, but then what do we do?
OK, if we focus on the annotations text file, where the code extracted the MCAP samples into the 833×9 data cell array, can we plot all the MCAP samples against time from this cell array? Thanks.
I started a new question with this last issue, about plotting from a cell array.
The title is "General method to plot data from cell array".
Where is the data that you want to plot in the cell array? There are several columns, but I don't see "signal" type of data.
The cell array contains strings. We can easily convert these to numeric if they correspond to numbers, or extract the first part of the strings if they contain mixed data.
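To make the two cases concrete, here is a short sketch on a few hand-typed values in the same form as the cell array columns (the variable names are placeholders of mine):

```matlab
% Case 1: strings that are plain numbers (e.g. the sample # column).
sampleStr = { '392192' ; '397312' ; '401408' } ;
sampleId  = str2double( sampleStr ) ;          % numeric column vector

% Case 2: mixed strings, keeping only the part before the first space
% (here the clock time of the time stamp column).
stamps = { '22:41:37.000 01/01/2007' ; '22:41:57.000 01/01/2007' } ;
[clockStr, dateStr] = strtok( stamps ) ;       % split at the first whitespace
dateStr = strtrim( dateStr ) ;                 % drop the leading space

% Same idea with another delimiter: first part of the channel name.
chanPrefix = strtok( { 'EEG-F4-C4' ; 'EEG-Fp2-F4' }, '-' ) ;
```

STR2DOUBLE returns NaN for entries that are not numbers, which is a convenient way to spot columns that need the second treatment.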
I want to plot the sample number against time, and show the Aux value as a label on the resulting figure.
I created a text file of the extracted CAP entries to clarify the data.
But the problem is not to extract CAP entries, this is technical, we know how to do this.
Currently the problem, at least on my side, is that I still don't understand what you need to do with this. If I pick a series of lines associated with CAP, from the source file:
[22:41:37.000 01/01/2007] 392192 " 0 0 0 MCAP-A3 17 S1 EEG-F4-C4
[22:41:57.000 01/01/2007] 397312 " 0 0 0 MCAP-A3 9 S1 EEG-F4-C4
[22:42:13.000 01/01/2007] 401408 " 0 0 0 MCAP-A3 11 S2 EEG-F4-C4
[22:42:28.000 01/01/2007] 405248 " 0 0 0 MCAP-A3 23 S2 EEG-F4-C4
Ok, now what do I do with this? Are there data to extract from there that need to be converted to numeric for plotting? This file apparently does not contain signal information, so are these lines only defining time stamps for CAP?
If so, where do you need to add these labels? Is it on the plot of the signal(s) that you generate after calling RDMAT?
If so, is it something that is already done for all labels (how?) and you'd like to keep only CAP labels, or is it something that must be implemented from scratch?
The extracted CAP text file I attached shows the sample number and the time at which it was recorded (second and 4th columns), which is the main response of the data we were trying to plot. For the labels, I meant to create a kind of connection between time and Aux to show on the plot itself for extra clarification (if that is not possible, it was just a thought for more clarification).
After calling rdmat we can plot the physical signal from the .mat files, as you saw before, but not from the cell data that we created after extraction. If we can use the data extracted into the cell array together with rdmat and then plot it, that would be great.
I didn't try to add labels before, I just drew the signal itself.
I just thought of trying to add labels for more clarification of the data, but wasn't sure whether it can actually be done.
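For what it is worth, adding labels to a signal plot is possible. The sketch below shows one way, on a synthetic sine wave standing in for the rdmat signal, with made-up event positions and labels. It assumes the annotation sample numbers index the signal at Fs = 256 samples per second, which has not been confirmed for these files.

```matlab
Fs     = 256 ;                          % assumed sampling rate (samples/s)
tm     = (0 : 10*Fs - 1).' / Fs ;       % 10 s of time stamps, in seconds
signal = sin( 2*pi*2*tm ) ;             % synthetic stand-in for one channel

% Hypothetical events: sample numbers and the Aux text to display.
evSample = [512 ; 1536] ;
evLabel  = { 'MCAP-A3 17 S1' ; 'MCAP-A1 8 S2' } ;
evTime   = evSample / Fs ;              % sample number -> seconds

figure ;
plot( tm, signal ) ;
hold on ;
yl = ylim ;
for k = 1 : numel( evTime )
    % Vertical dashed line at the event, with its label at the top.
    plot( [evTime(k) evTime(k)], yl, 'r--' ) ;
    text( evTime(k), yl(2), evLabel{k}, 'VerticalAlignment', 'top' ) ;
end
hold off ;
xlabel( 'time (s)' ) ;
ylabel( 'signal' ) ;
grid on
```

The same loop should work on the real tm and signal once the events are known to fall inside the plotted range.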
Ok, now it makes more sense. I will give it a try tonight (EST).
Perfect, thanks a lot for your time and patience.
No problem!
Next issue though: rdmat outputs arrays that suggest that there are 1e6 samples:
>> [tm,signal,Fs,siginfo]=rdmat('sdb4_edfm');
>> whos
  Name          Size                  Bytes  Class     Attributes

  Fs            1x1                       8  double
  siginfo       1x18                  11040  struct
  signal        1000000x18        144000000  double
  tm            1x1000000           8000000  double
Here you see tm, the vector of times I suppose, that has 1 million elements and the array of signals has 1 million rows (I guess each corresponding to a sample).
Now after converting the sample # from your annotation file to numeric:
buffer = fileread( 'annotations sdb4.txt' ) ;
pattern = '\[([^\]]+)\]\s+(\d+)\s+"\s+(\d+)\s+(\d+)\s+(\d+)\s+MCAP-(\S+)\s+(\S+)\s+(\S+)\s+(\S+)' ;
annotations = regexp( buffer, pattern, 'tokens' ) ;
annotations = vertcat( annotations{:} ) ;
sampleId = str2double( annotations(:,2) ) ;
I see sample # values (or IDs) up to 8,366,848, which is way above 1 million. So most of the sample IDs correspond to regions that are outside of the plot (?)
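One thing that can be checked from the annotation file alone: its header says "time resolution: 256", and dividing a sample # by 256 and adding the result to the start time 22:16:05 reproduces the time stamp on the same line (392192 / 256 = 1532 s = 25 min 32 s, giving 22:41:37). The sketch below does that conversion and flags which events would fall within the first 1e6 samples. Whether the annotation sample numbers line up with the rows returned by rdmat is still an assumption, and the variable names are mine.

```matlab
Fs = 256 ;   % from the header line "## time resolution: 256"

% Start of the recording, taken from the first line of the annotation file.
% Assumption: the date part is day/month/year.
t0 = datetime( '22:16:05.000 01/01/2007', ...
    'InputFormat', 'HH:mm:ss.SSS dd/MM/yyyy' ) ;

% A few sample numbers copied from the MCAP rows (first two and last).
sampleId = [392192 ; 397312 ; 8366848] ;

% Sample number -> absolute time.
evTime = t0 + seconds( sampleId / Fs ) ;

% Events that fall inside a 1e6-sample signal, if the indices are comparable.
nSamples = 1000000 ;
inRange  = sampleId <= nSamples ;
```

For these three values, only the first two are in range; the last one lands at 07:20:48 on the following day, far past the end of a 1e6-sample record at this rate.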

Asked on 27 Apr 2019
Edited on 2 May 2019
