MATLAB Answers

extract data from EEG text file

Asked by D. Ali
on 27 Apr 2019
Latest activity Edited by Cedric Wannaz
on 2 May 2019
I need help writing a script that extracts the MCAP samples, together with the times at which they occurred, into a separate file and plots them, so that I can use these samples in a signal processing application in MATLAB. This is only part of the data; the file contains tens of CAP samples, so I need general code to extract them.
Time Date Sample # Type Sub Chan Num Aux
[22:16:05.000 01/01/2007] 0 " 0 0 0 ## time resolution: 256
[22:16:05.000 01/01/2007] 0 0 0 0
[22:34:35.000 01/01/2007] 284160 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:35:05.000 01/01/2007] 291840 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:35:35.000 01/01/2007] 299520 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:36:05.000 01/01/2007] 307200 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:36:35.000 01/01/2007] 314880 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:37:05.000 01/01/2007] 322560 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:37:35.000 01/01/2007] 330240 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:38:05.000 01/01/2007] 337920 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:38:35.000 01/01/2007] 345600 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:39:05.000 01/01/2007] 353280 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:39:35.000 01/01/2007] 360960 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:40:05.000 01/01/2007] 368640 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:40:35.000 01/01/2007] 376320 " 0 0 0 SLEEP-S0 30 W ROC-LOC
[22:41:05.000 01/01/2007] 384000 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:41:35.000 01/01/2007] 391680 " 0 0 0 SLEEP-S1 30 S1 ROC-LOC
[22:41:37.000 01/01/2007] 392192 " 0 0 0 MCAP-A3 17 S1 EEG-F4-C4
[22:41:57.000 01/01/2007] 397312 " 0 0 0 MCAP-A3 9 S1 EEG-F4-C4
[22:42:05.000 01/01/2007] 399360 " 0 0 0 SLEEP-S2 30 S2 ROC-LOC
[22:42:13.000 01/01/2007] 401408 " 0 0 0 MCAP-A3 11 S2 EEG-F4-C4
[22:42:28.000 01/01/2007] 405248 " 0 0 0 MCAP-A3 23 S2 EEG-F4-C4
[22:42:35.000 01/01/2007] 407040 " 0 0 0 SLEEP-S2 30 S2 ROC-LOC
[22:42:57.000 01/01/2007] 412672 " 0 0 0 MCAP-A3 10 S2 EEG-F4-C4
[22:43:05.000 01/01/2007] 414720 " 0 0 0 SLEEP-S2 30 S2 ROC-LOC
[22:43:11.000 01/01/2007] 416256 " 0 0 0 MCAP-A2 6 S2 EEG-F4-C

1 Answer

Answer by Cedric Wannaz
on 27 Apr 2019
Edited by Cedric Wannaz
on 28 Apr 2019
 Accepted Answer

Using the data text file that you provided elsewhere (renamed and attached to this answer), here is a short example of one way to parse it. Note that it is not the best way, but it is good enough for starting the discussion:
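% Read the whole file into one char vector.
buffer = fileread( 'data01.txt' ) ;
% One token per column; only lines whose type starts with MCAP- match.
pattern = '\[([^\]]+).\s+(\d+)\s+"\s+(\d+)\s+(\d+)\s+(\d+)\s+MCAP-(\S+)\s+(\S+)\s+(\S+)\s+(\S+)' ;
data = regexp( buffer, pattern, 'tokens' ) ;
% Stack the per-match token lists into an N-by-9 cell array.
data = vertcat( data{:} ) ;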
EDIT 04/28/2019@1:59pm UTC: I updated the pattern so REGEXP extracts all other "numeric" columns.
Running it outputs a cell array of 830 rows associated with MCAP entries, as follows:
>> data
data =
830×9 cell array
{'22:41:37.000 01…'} {'392192' } {'0'} {'0'} {'0'} {'A3'} {'17'} {'S1'} {'EEG-F4-C4' }
{'22:41:57.000 01…'} {'397312' } {'0'} {'0'} {'0'} {'A3'} {'9' } {'S1'} {'EEG-F4-C4' }
{'22:42:13.000 01…'} {'401408' } {'0'} {'0'} {'0'} {'A3'} {'11'} {'S2'} {'EEG-F4-C4' }
{'22:42:28.000 01…'} {'405248' } {'0'} {'0'} {'0'} {'A3'} {'23'} {'S2'} {'EEG-F4-C4' }
...
{'07:08:22.000 02…'} {'8175872'} {'0'} {'0'} {'0'} {'A1'} {'8' } {'S4'} {'EEG-F4-C4' }
{'07:11:27.000 02…'} {'8223232'} {'0'} {'0'} {'0'} {'A1'} {'8' } {'S4'} {'EEG-Fp2-F4'}
{'07:12:08.000 02…'} {'8233728'} {'0'} {'0'} {'0'} {'A1'} {'6' } {'S4'} {'EEG-Fp2-F4'}
{'07:18:31.000 02…'} {'8331776'} {'0'} {'0'} {'0'} {'A1'} {'6' } {'S4'} {'EEG-F4-C4' }
{'07:18:53.000 02…'} {'8337408'} {'0'} {'0'} {'0'} {'A1'} {'7' } {'S4'} {'EEG-F4-C4' }
{'07:19:27.000 02…'} {'8346112'} {'0'} {'0'} {'0'} {'A1'} {'15'} {'S4'} {'EEG-F4-C4' }
{'07:20:29.000 02…'} {'8361984'} {'0'} {'0'} {'0'} {'A1'} {'11'} {'S4'} {'EEG-F4-C4' }
{'07:20:48.000 02…'} {'8366848'} {'0'} {'0'} {'0'} {'A1'} {'12'} {'S4'} {'EEG-F4-C4' }
Now, depending on what you want to accomplish, you may prefer using a TIMETABLE or a TIMESERIES object, or just some conversion of these columns.
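As a rough sketch of the TIMETABLE route (not necessarily the best one), the tokens can be converted column by column. Two lines copied from the question stand in for the real file so that the snippet runs on its own. The variable names Sample, Subtype, Duration, Stage and Channel are my own guesses at what the columns mean, and the date layout dd/MM/yyyy is an assumption, since the sample only shows 01/01.

```matlab
% Two annotation lines copied from the question stand in for the real file.
buffer = sprintf( '%s\n', ...
    '[22:41:37.000 01/01/2007] 392192 " 0 0 0 MCAP-A3 17 S1 EEG-F4-C4', ...
    '[22:42:13.000 01/01/2007] 401408 " 0 0 0 MCAP-A3 11 S2 EEG-F4-C4' ) ;

pattern = '\[([^\]]+)\]\s+(\d+)\s+"\s+(\d+)\s+(\d+)\s+(\d+)\s+MCAP-(\S+)\s+(\S+)\s+(\S+)\s+(\S+)' ;
data = regexp( buffer, pattern, 'tokens' ) ;
data = vertcat( data{:} ) ;          % N-by-9 cell array of char vectors

% ASSUMPTION: day/month/year order; swap to MM/dd/yyyy if the file is month-first.
when = datetime( data(:,1), 'InputFormat', 'HH:mm:ss.SSS dd/MM/yyyy' ) ;

% ASSUMPTION: the variable names below are hypothetical labels for the columns.
tt = timetable( when, ...
    str2double( data(:,2) ), ...     % sample #
    categorical( data(:,6) ), ...    % text after "MCAP-"
    str2double( data(:,7) ), ...     % number following the type
    categorical( data(:,8) ), ...    % S1, S2, ...
    categorical( data(:,9) ), ...    % channel label
    'VariableNames', {'Sample', 'Subtype', 'Duration', 'Stage', 'Channel'} ) ;
```

If a separate file is what you need, something like writetimetable( tt, 'mcap_events.csv' ) should do (the file name is just a placeholder), and tt.Sample can then be used to index or mark the signal for plotting.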
So now you should define which part of the data you are interested in, and how you are planning to process it.
Let me know if you have any questions.

  21 Comments

Ok, now it makes more sense. I will give it a try tonight (EST).
D. Ali
on 1 May 2019
Perfect, thanks a lot for your time and patience.
No problem!
Next issue though: rdmat outputs arrays that suggest that there are 1e6 samples:
>> [tm,signal,Fs,siginfo]=rdmat('sdb4_edfm');
>> whos
  Name            Size                Bytes  Class     Attributes

  Fs              1x1                     8  double
  siginfo         1x18                11040  struct
  signal    1000000x18            144000000  double
  tm              1x1000000         8000000  double
Here you see tm (the vector of times, I suppose), which has 1 million elements, and the array of signals, which has 1 million rows (each corresponding to a sample, I guess).
Now, after converting the sample # from your annotation file to numeric:
buffer = fileread( 'annotations sdb4.txt' ) ;
pattern = '\[([^\]]+).\s+(\d+)\s+"\s+(\d+)\s+(\d+)\s+(\d+)\s+MCAP-(\S+)\s+(\S+)\s+(\S+)\s+(\S+)' ;
annotations = regexp( buffer, pattern, 'tokens' ) ;
annotations = vertcat( annotations{:} ) ;
sampleId = str2double( annotations(:,2) ) ;
I see sample # values (or IDs) up to 8,366,848, which is way above 1 million. So most of the sample IDs correspond to regions that are outside of the plot (?)
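A quick way to quantify the mismatch, as a sketch only: the annotation header says "time resolution: 256", and the listed timestamps are consistent with sample # counting at 256 per second from the 22:16:05 start (392192 / 256 = 1532 s, i.e. 25 min 32 s later, which is 22:41:37). Assuming that, and assuming sample # is a 0-based offset into the loaded signal (both are guesses on my side, and Fs returned by rdmat should be compared against 256), the check below uses a few values copied from the listings instead of the real files.

```matlab
% Values copied from the thread so the snippet runs on its own.
nSamples   = 1000000 ;                    % size( signal, 1 ) in the whos output
resolution = 256 ;                        % from "## time resolution: 256"
sampleId   = [392192; 397312; 8366848] ;  % three sample # values from the listings

% ASSUMPTION: sample # counts at 256 per second from the start of the recording.
elapsed = seconds( sampleId / resolution ) ;
elapsed.Format = 'hh:mm:ss' ;

% ASSUMPTION: sample # is a 0-based offset into the rows of the loaded signal.
inRange = sampleId < nSamples ;
fprintf( '%d of %d annotations fall inside the loaded signal\n', ...
    nnz( inRange ), numel( inRange ) ) ;
```

With the real data, sampleId would be the vector computed from the annotations and nSamples would be size( signal, 1 ). Under the same assumptions, 1e6 samples at 256 per second cover only about 65 minutes of recording.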
