Removing unrecognised characters and superfluous data from a string
I have a set of data that I have imported as a string; however, when it is imported, the blank rows are filled with arrow characters, as shown below:
NC(N)=N→59.0717→C→"5N3→-1.71840001642704→0.022000011056661606→75.89000129699707"→→→→→→→→→→→→"
"→→→→→→→→→→→→→"
"OC(=O)CCC(O)=O→118.08764→C4→"6O4→-0.6665999889373779→-0.4739999994635582→74.5999984741211"→→→→→→→→→→→→"
"→→→→→→→→→→→→→"
"CC(=O)NC1=CC=C(O)C=C1→151.16446→C8→"9NO2→1.0175999999046325→-1.6619999147951603→49.32999897003174"→→→→→→→→→→→→"
"→→→→→→→→→→→→→"
"NC(N)=N→59.0717→C→"5N3→-1.71840001642704→0.022000011056661606→75.89000129699707"→→→→→→→→→→→→"
"→→→→→→→→→→→→→"
I can't remove these by the method I would normally use below:
Data(Data == "→→→→→→→→→→→→→") = []
In the original file they appear to be blank lines, in between the actual lines of text.
Is there a way to remove these?
11 Comments
@Duncan: Remember that, just like with numeric data, how text data is displayed is not the same thing as the data saved in a file! Those arrows look like a perfectly normal way for a text editor to display a horizontal tab character (commonly called a "tab"). It is unlikely that your file contains "arrow" characters, but we cannot check this unless you upload a sample file.
"In the original file they appear to be blank lines, in between the actual lines of text."
That would be consistent with them being horizontal tab characters.
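A minimal sketch of why the comparison in the question never matches, assuming the arrows are tab characters (char 9) as suggested above: `Data == "→→→→→→→→→→→→→"` looks for literal arrow characters, which are not actually in the data. The `Data` array below is hypothetical, built only to resemble the imported rows; a row is treated as filler when nothing remains after erasing tabs and double quotes.

```matlab
% Assumption: the "arrows" are horizontal tab characters (char 9).
tab = sprintf('\t');

% Hypothetical sample resembling the imported string array:
% one data row followed by a filler row of quotes and tabs.
Data = ["NC(N)=N" + tab + "59.0717" + tab + "C"; ...
        """" + repmat(tab, 1, 13) + """"];

% A row is filler if nothing is left once tabs and double quotes are erased.
isFiller = erase(Data, [string(tab), """"]) == "";

% Delete the filler rows, keeping the real data rows.
Data(isFiller) = [];
```

Testing for "empty after erasing" rather than comparing against one exact string means the check does not depend on how many tabs each filler row happens to contain.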
"Is there a way to remove these?"
Most likely yes, but without information about the file format, how you import that data, a sample file, etc., you didn't give us much to work with.
Jan
on 16 Jul 2019
Where do you see the arrows? Some text editors display TABs as arrows.
"I can't remove these by the method I would normally use below:" - Why not? Please explain what happens and what you expect instead.
Duncan
on 16 Jul 2019
@Duncan: the file appears to be a (rather broken) Tab-Separated Values text file, so all of the data fields are also separated by tabs. If you remove all of the tab characters, then your data values will not be separated by any delimiter. Is that really what you want to do?
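If the tabs really are the field delimiters, the alternative to deleting them is to split each line on the tab and then trim only the empty trailing fields. This is a sketch; `rowText` is a hypothetical line, not taken from the attached file. Dropping only the trailing empties keeps any genuinely empty fields in the middle in their original column position.

```matlab
tab = sprintf('\t');

% Hypothetical line: tab-separated fields followed by empty padding fields.
rowText = "OC(=O)CCC(O)=O" + tab + "118.08764" + tab + "C4" + repmat(tab, 1, 4);

% Split on the tab delimiter instead of erasing it (returns a column string array).
fields = split(rowText, tab);

% Keep everything up to the last non-empty field; drop the trailing padding.
lastNonEmpty = find(fields ~= "", 1, 'last');
fields = fields(1:lastNonEmpty);
```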
"In the original file they appear to be blank lines, in between the actual lines of text."
Not when I look at the file: every single line is contained within one pair of double quotes. Double quotes are not "blank".
To be honest the format of that file is a mess: there are escaped double quotes spread randomly around the file, every line is contained within double quotes, and those lines of tab characters have no obvious purpose... can you fix the SW that created these awful files?
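A possible clean-up for the quoting, sketched under two assumptions: each line is wrapped in exactly one pair of double quotes (as the file dump shows for the header row), and a doubled quote `""` inside a line stands for a single literal `"`, in the usual CSV escaping style. The thread does not establish whether the second assumption holds for this particular file. The variable `raw` is a hypothetical line.

```matlab
tab = sprintf('\t');

% Hypothetical raw line, modelled on the header row in the file dump:
% the whole line is wrapped in one pair of double quotes.
raw = """Structure" + tab + "mw" + tab + "MF""";

% Remove one enclosing pair of double quotes, if present.
unwrapped = regexprep(raw, '^"(.*)"$', '$1');

% Assumption: "" is a CSV-style escape for a single " character.
cleaned = replace(unwrapped, """""", """");
```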
dpb
on 16 Jul 2019
It may have the suffix .csv, but it is not a CSV file; the "arrows" are tabs. Here's a portion of the input file as shown by a file dump utility:
0000 0000 ef bb bf 43 6f 6c 75 6d 6e 31 0d 0a 22 53 74 72 Column1.."Str
0000 0010 75 63 74 75 72 65 09 6d 77 09 4d 46 09 6c 6f 67 ucture.mw.MF.log
0000 0020 50 09 6c 6f 67 53 09 50 6f 6c 61 72 20 73 75 72 P.logS.Polar sur
0000 0030 66 61 63 65 20 61 72 65 61 09 09 09 09 09 09 09 face area.......
0000 0040 09 09 09 09 09 09 22 0d 0a 22 09 09 09 09 09 09 ......".."......
0000 0050 09 09 09 09 09 09 09 22 0d 0a 22 4e 5b 43 40 40 .......".."N[C@@
0000 0060 09 22 22 5d 28 43 43 43 28 4e 29 3d 4f 29 43 28 .""](CCC(N)=O)C(
0000 0070 4f 29 3d 4f 09 31 34 36 2e 31 34 35 34 09 43 35 O)=O.146.1454.C5
0000 0080 22 22 09 22 22 31 30 4e 32 4f 33 09 2d 33 2e 37 "".""10N2O3.-3.7
0000 0090 35 36 34 30 30 30 37 38 35 33 35 30 38 09 2d 30 5640007853508.-0
0000 00a0 2e 34 39 36 39 39 39 39 39 39 35 30 38 32 36 31 .496999999508261
0000 00b0 37 09 31 30 36 2e 34 30 39 39 39 39 38 34 37 34 7.106.4099998474
0000 00c0 31 32 31 31 22 22 09 09 09 09 09 09 09 09 09 09 1211""..........
The 09 bytes are tabs. There are some non-ASCII characters at the beginning of the file "ef bb bf" -- not sure what those are about in what otherwise does appear to be a text file.
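For reference, the bytes EF BB BF are the UTF-8 encoding of U+FEFF, i.e. the UTF-8 byte order mark (BOM) that some programs write at the start of a text file. The sketch below writes a small hypothetical temporary file that starts with a BOM, reads the raw bytes back, drops the BOM if it is present, and decodes the rest as UTF-8.

```matlab
% Write a small hypothetical test file that starts with EF BB BF.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fwrite(fid, uint8([239 187 191]));        % EF BB BF = UTF-8 byte order mark
fwrite(fid, uint8(['Column1' 13 10]));    % header text + CR LF, as in the dump
fclose(fid);

% Read the raw bytes as a row vector, then remove the temporary file.
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(fname);

% Drop a leading UTF-8 BOM if there is one, then decode the remaining bytes.
bom = uint8([239 187 191]);
if numel(bytes) >= 3 && isequal(bytes(1:3), bom)
    bytes = bytes(4:end);
end
txt = native2unicode(bytes, 'UTF-8');
```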
Duncan
on 16 Jul 2019
"There are some non-ASCII characters at the beginning of the file "ef bb bf" -- not sure what those are about in what otherwise does appear to be a text file."
@Duncan: please upload the original file, before you made any changes at all, i.e. the raw file that was output by some badly-written third-party SW. We will see what we can do...
dpb
on 16 Jul 2019
@Stephen -- I figured it was something of the sort, but I wasn't aware enough of the specific header bytes to recognize where they came from.
Duncan
on 16 Jul 2019
Duncan
on 17 Jul 2019
Accepted Answer