How can I replace each character of a character array with a vector?
I have sequences stored as character arrays. I need to search for particular characters and replace them with vectors (Boolean representations).
So in the end I need a 3D matrix.
It works for one sequence, but I have 96000 more. I tried to do it with loops, but I get an error.
This is my code for one sequence, but I need to do it for all 96000 sequences.
I need your help with this issue. Thanks in advance.
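p1_1 = sequences;
% select the first sequence and convert it to a character array
Chp1_1 = char(p1_1(1,:));
% go through every character of the sequence and replace it with its Boolean representation
SeqL = length(Chp1_1);
M = zeros(SeqL, numel(A1)); % preallocate; assumes A1, C1, ... are row vectors of equal length
for i = 1:SeqL
X = Chp1_1(1,i); % semicolon added so each character is not printed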
switch X
case 'A'
M(i,:) = A1;
case 'C'
M(i,:) = C1;
case 'D'
M(i,:) = D1;
case 'E'
M(i,:) = E1;
case 'F'
M(i,:) = F1;
case 'G'
M(i,:) = G1;
case 'H'
M(i,:) = H1;
case 'I'
M(i,:) = I1;
case 'K'
M(i,:) = K1;
case 'L'
M(i,:) = L1;
case 'M'
M(i,:) = M1;
case 'N'
M(i,:) = N1;
case 'P'
M(i,:) = P1;
case 'Q'
M(i,:) = Q1;
case 'R'
M(i,:) = R1;
case 'S'
M(i,:) = S1;
case 'T'
M(i,:) = T1;
case 'V'
M(i,:) = V1;
case 'W'
M(i,:) = W1;
case 'Y'
M(i,:) = Y1;
end
end
Guillaume
on 26 Nov 2019
Edited: Guillaume
on 26 Nov 2019
It's important to use notation that actually reflects your data; otherwise, the code we give you might not work. It's also important to use the proper notation, because right now we're left wondering:
- Do you have numbered variables, as per your Protein_1, Protein_2, etc.?
- Do you have a cell array of char vectors, as per your "{1,96000}", which is cell array notation?
- Do you have a string array, as per your "in the [...] string array"?
Answers (3)
Guillaume
on 25 Nov 2019
First, probably the most important thing: numbered or sequentially named variables are always a very bad idea. They always make the code more complicated, not easier, to write. For example, with your protein_1, protein_2, ... protein_96000 you cannot easily apply the same code to each variable, whereas if you just had one variable, for example a cell array called protein, you could simply use a loop to apply the same code to each sequence:
for p = 1:numel(protein)
dosomethingwith(protein{p});
end
The same goes for your horrible switch...case and your A1, C1, etc. You end up writing the same thing many times with only one variation, with an increased risk of making a mistake on one line. Computers are very good at doing repetitive things, so why do the repetition yourself?
Anything that is numbered or sequentially named should be just one variable that you index instead.
So, for your transformation, first create two variables: the first holding the list of letters to transform, the second holding what they need to be transformed into, e.g.:
letters = 'ACDEFGHIKLMNPQRSTVWY'.'; %column vector of letters, one per amino acid in your switch
acid = [1 0 0 0 0;
0 1 0 0 0;
0 0 1 0 0;
0 0 0 1 0;
% ...etc., one row per letter
];
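If the Boolean vectors happen to be one-hot codes (an assumption: the question never shows the actual contents of A1, C1, etc.), the whole acid matrix can be generated instead of typed out row by row. A minimal sketch covering all 20 letters from the question's switch, which also checks that each letter appears once and that there is exactly one code row per letter:

```matlab
% Assumption: each amino acid is encoded as a one-hot row vector.
% If your codes differ, build acid from your own vectors instead, e.g. acid = [A1; C1; D1; ...]
letters = 'ACDEFGHIKLMNPQRSTVWY'.';   % the 20 letters handled by the switch, one per row
acid = eye(numel(letters));           % row k is the code for letters(k)

% Sanity checks: no duplicate letters, one code row per letter
assert(numel(unique(letters)) == numel(letters), 'duplicate letters in the list');
assert(size(acid, 1) == numel(letters), 'need exactly one row of acid per letter');
```

Keeping the checks next to the definition means a missing or duplicated letter is caught immediately rather than producing a silently wrong row later.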
For pretty display we could even put them into a table:
map = table(letters, acid);
Now that we have that, transforming a sequence of letters into a 2D matrix is trivial:
prot = 'ACDKLMEGAC'; %content and length don't matter
[found, whichrow] = ismember(prot, map.letters); %find which row of letters corresponds to each letter of prot
assert(all(found), 'some letters of the input are invalid');
transformed = map.acid(whichrow, :); %and use the corresponding row of acid instead
%all done!
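For comparison, here is a sketch that swaps ismember for a direct lookup table indexed by character code. It does not use a table object (so it should also run in GNU Octave, which has no table type). It assumes plain ASCII input and uses one-hot codes for acid as placeholders for the real vectors:

```matlab
% Hypothetical one-hot codes standing in for the real A1, C1, ... vectors
letters = 'ACDEFGHIKLMNPQRSTVWY';
acid = eye(numel(letters));

% lut(c) holds the row of acid for the character with code c (0 = not a valid letter)
lut = zeros(1, 127);
lut(double(letters)) = 1:numel(letters);

prot = 'ACDKLMEGAC';
codes = double(prot);
assert(all(codes >= 1 & codes <= 127), 'non-ASCII characters in input');
whichrow = lut(codes);                       % row of acid for each character
assert(all(whichrow > 0), 'some letters of the input are invalid');
transformed = acid(whichrow, :);             % numel(prot) x size(acid, 2)
```

The lookup replaces a search with a single indexing operation; in practice ismember is already fast, so this mainly matters if the conversion shows up in a profile.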
And assuming protein is the above-mentioned cell array, where all the sequences are the same length:
transformed = zeros(numel(protein{1}), size(map.acid, 2), numel(protein)); %preallocated 3D array: sequence length x code length x number of sequences
for p = 1:numel(protein)
[found, whichrow] = ismember(protein{p}, map.letters); %find which row of letters corresponds to each letter of protein{p}
assert(all(found), 'some letters of protein %d are invalid', p);
transformed(:, :, p) = map.acid(whichrow, :); %and use the corresponding row of acid instead
end
See how short the code can be once you don't have numbered variables and use indexing instead?
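When every sequence has the same length, the loop can be dropped entirely: stack the sequences into one char matrix, look them all up with a single ismember call, then reshape and permute into the sequence-length x code-length x number-of-sequences layout used above. A sketch under the same one-hot placeholder assumption, with three short made-up sequences:

```matlab
letters = 'ACDEFGHIKLMNPQRSTVWY'.';
acid = eye(numel(letters));                  % placeholder one-hot codes
protein = {'ACDK', 'YWVT', 'AAAA'};          % made-up, equal-length sequences

S = char(protein);                           % numSeq x seqLen char matrix
[numSeq, seqLen] = size(S);
[found, whichrow] = ismember(S, letters);
assert(all(found(:)), 'some letters are invalid');

% Transpose first so the rows come out position-by-position within each sequence
rows = acid(reshape(whichrow.', [], 1), :);  % (seqLen*numSeq) x numCodes
transformed = reshape(rows, seqLen, numSeq, size(acid, 2));
transformed = permute(transformed, [1 3 2]); % seqLen x numCodes x numSeq
```

For 96000 sequences this builds the whole result in one go, so memory (seqLen x numCodes x 96000 doubles) is the thing to watch; storing acid as logical would cut it by a factor of 8.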
Philippe Lebel
on 25 Nov 2019
I am not sure what you are trying to do as a whole, but if you want to quickly find where there are occurrences of a certain string, use strfind().
a = 'aasdasffwfdasda';
your_sequence_of_bools_for_letter_a = [true false true];
idx = strfind(a,'a')
idx =
1 2 5 12 15
M=cell(1,length(a));
for i=1:length(idx)
M{idx(i)} = your_sequence_of_bools_for_letter_a;
end
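Since only a single character is being matched here, a logical mask does the same job as strfind and can fill every matching cell in one assignment, because assigning a 1-by-1 cell to several cells copies it into each one. A short sketch reusing the variables from this answer:

```matlab
a = 'aasdasffwfdasda';
your_sequence_of_bools_for_letter_a = [true false true];

mask = (a == 'a');                                 % true wherever the character is 'a'
M = cell(1, numel(a));
M(mask) = {your_sequence_of_bools_for_letter_a};   % same vector copied into each matching cell
```

strfind is still the right tool when the pattern is longer than one character.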
Philippe Lebel
on 25 Nov 2019
Now I understand.
Here is a solution that you can easily expand.
clear
protein(1).name = 'A';
protein(1).bool_value = [1 0 0];
protein(2).name = 'B';
protein(2).bool_value = [0 1 0];
protein(3).name = 'C';
protein(3).bool_value = [0 0 1];
protein_name_list = [protein.name]; % 'ABC'
sequences = ['ABC';'CCC';'CAB']; % one sequence per row
M = cell(1, size(sequences, 1));
for i = 1:size(sequences, 1)
resulting_bool = [];
sequence = sequences(i,:);
for j = 1:length(sequence)
idx = strfind(protein_name_list, sequence(j)); % position of this letter in the name list
resulting_bool = [resulting_bool; protein(idx).bool_value];
end
M{i} = resulting_bool;
end
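The repeated strfind search over the name list can be replaced by a containers.Map keyed by letter, which looks up all characters of a sequence in one values call. A sketch reproducing the toy example in this answer (letters A, B, C with 3-element codes):

```matlab
% Map each letter to its Boolean code (same toy codes as in the answer)
codeMap = containers.Map({'A', 'B', 'C'}, {[1 0 0], [0 1 0], [0 0 1]});

sequences = ['ABC'; 'CCC'; 'CAB'];
M = cell(1, size(sequences, 1));
for i = 1:size(sequences, 1)
    sequence = sequences(i, :);
    codes = values(codeMap, num2cell(sequence));  % one cell per character
    M{i} = vertcat(codes{:});                     % one row per character
end
```

values errors on a letter that is not a key, which doubles as input validation.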