Loop through the DNA array and record all of the locations of the triplets (codons): ‘AAA’, ‘ATC’ and ‘CGG’.

My code so far is functional, but I don't think that it's correct. I am supposed to loop through the cell array and record the location of each codon, skipping any triplet that contains a character from the preceding codon. For example, if part of the sequence contains [A,T,C,C,G,G], then the triplet CCG should be skipped. I'm not sure what the best way to do that would be.
Here is what I have so far:
fid = fopen('sequence_long.txt','r')
A = textscan(fid,'%3s');
DNA = A{1};
fclose(fid);
i = 1;
%loops through array and counts codon occurrences
%finds the index location of individual codons
while i < length(DNA)
    i = i + 1;
    if strcmp(DNA(i),'AAA')
        num_AAA = nnz(strcmp(DNA,'AAA'));
        loc_AAA = find(strcmp(DNA,'AAA'));
    elseif strcmp(DNA(i),'ATC')
        num_ATC = nnz(strcmp(DNA,'ATC'));
        loc_ATC = find(strcmp(DNA,'ATC'));
    elseif strcmp(DNA(i),'CGG')
        num_CGG = nnz(strcmp(DNA,'CGG'));
        loc_CGG = find(strcmp(DNA,'CGG'));
    end
end
fprintf('The number of AAA values is: %.f',num_AAA)
fprintf('The index location of AAA values: %.f\n',loc_AAA(1:10))
fprintf('The number of ATC values is: %.f',num_ATC)
fprintf('The index location of ATC values: %.f\n',loc_ATC(1:10))
fprintf('The number of CGG values is: %.f',num_CGG)
fprintf('The index location of CGG values: %.f\n',loc_CGG(1:10))

Accepted Answer

Sai Veeramachaneni on 17 Nov 2020
Edited: Sai Veeramachaneni on 17 Nov 2020
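One workaround is to iterate over the sequence one character at a time and, whenever a codon is found, skip the next two characters so that they cannot be reused in another match.
The code below, which treats the sequence as a single character vector, shows this approach.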
DNA = 'AAATCATCGGCGGATC'; % Example sequence
i = 1;
loc_AAA = [];
loc_ATC = [];
loc_CGG = [];
num_AAA = 0;
num_ATC = 0;
num_CGG = 0;
while i <= length(DNA)-2
    if DNA(i)=='A' && DNA(i+1)=='A' && DNA(i+2)=='A'
        loc_AAA = [loc_AAA i];
        num_AAA = num_AAA + 1;
        i = i + 3; % Skip the next two characters
    elseif DNA(i)=='A' && DNA(i+1)=='T' && DNA(i+2)=='C'
        loc_ATC = [loc_ATC i];
        num_ATC = num_ATC + 1;
        i = i + 3;
    elseif DNA(i)=='C' && DNA(i+1)=='G' && DNA(i+2)=='G'
        loc_CGG = [loc_CGG i];
        num_CGG = num_CGG + 1;
        i = i + 3;
    else
        i = i + 1;
    end
end
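
The same idea can be written more compactly by keeping the codons in a list and comparing each three-character window against it. The following is only a sketch under a few assumptions: the names seq, codons and locs are made up for illustration, and the cell array stands in for what textscan with '%3s' returns in the question, so it is first joined into one character vector. A plain strfind call is not used here because it also reports overlapping matches.

```matlab
% Sketch: non-overlapping codon scan driven by a list of codons.
% Assumption: DNA is either a cell array of short strings (as textscan
% with '%3s' returns) or already a character vector.
DNA = {'AAA'; 'TCA'; 'TCG'; 'GCG'; 'GAT'; 'C'};  % same bases as 'AAATCATCGGCGGATC'
if iscell(DNA)
    seq = [DNA{:}];            % join the pieces into one char vector
else
    seq = DNA;
end

codons = {'AAA', 'ATC', 'CGG'};   % codons to look for
locs = cell(size(codons));        % locs{k} collects start indices of codons{k}

i = 1;
while i <= length(seq) - 2
    % Compare the current 3-character window against every codon
    k = find(strcmp(seq(i:i+2), codons), 1);
    if isempty(k)
        i = i + 1;                % no codon starts here; advance one base
    else
        locs{k}(end+1) = i;       % record where the codon starts
        i = i + 3;                % jump past the matched codon
    end
end

% Report the count and start positions for each codon
for k = 1:numel(codons)
    fprintf('%s: %d found, start positions: %s\n', ...
        codons{k}, numel(locs{k}), mat2str(locs{k}));
end
```

For the example sequence this records AAA at position 1, ATC at positions 6 and 14, and CGG at position 11. The CGG that starts at position 8 is not counted because its first character belongs to the ATC found at position 6. Adding another codon only requires extending the codons list.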
