How to add leading zero to a letter in matlab
I have written code that accepts a sequence and converts it to a specific numerical sequence. The code is given below:
function[xA]=dnatobinary(sample_dna)
for i=1:length(sample_dna)
if sample_dna(i)=='A' || sample_dna(i)=='a'
xA(i)=01;
elseif sample_dna(i)=='T' || sample_dna(i)=='t'
xA(i)=11;
end
end
end
This code should convert 'ATTTA' into '0111111101'. Instead it converts it to '11111111', i.e. the leading zero for A=01 is missing. Please provide appropriate code.
2 Comments
Jan
on 27 Aug 2017
Edited: Jan
on 27 Aug 2017
This is not twitter: I've removed the # from the tags.
Note that leading zeros of numbers are meaningless and therefore ignored in Matlab. xA(i)=01 is exactly the same as xA(i)=1. If you need the leading zeros, stay at a character vector or use two numbers per element.
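Jan's point can be checked directly. The following is a minimal sketch (the variable names are illustrative and not from the thread) showing that the literal 01 is stored as the number 1, and that a leading zero only appears once the value is formatted as text:

```matlab
% Numeric literals carry no leading zeros: 01 and 1 are the same double.
a = 01;
sameValue = isequal(a, 1);      % true

% A leading zero exists only in a character representation.
asText = sprintf('%02d', a);    % the two-character vector '01'

% Formatting a whole numeric vector, two digits per element.
x = [1 11 11 11 1];
joined = sprintf('%02d', x);    % '0111111101'
```

Here sprintf reuses the '%02d' format for every element of x, which is why the result is a single character vector.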
Cedric
on 17 Sep 2017
I think that you are confusing floating point doubles, binary codes, and strings. It seems that you want to convert a string into another string, made of '0' and '1' characters. For this I wouldn't use regular expressions but indexing or basic arithmetic, especially if you need to operate quickly on large sequences.
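One way to read the indexing suggestion is a lookup table addressed by character code. This is only a sketch: the table name lut and the '??' placeholder for unmapped characters are assumptions made for illustration, and only A and T are mapped, as in the original question.

```matlab
% Hypothetical lookup table: one row of two characters per character code.
lut = repmat('??', 256, 1);                  % '??' marks unmapped characters
lut(double('Aa'), :) = repmat('01', 2, 1);   % A or a -> '01'
lut(double('Tt'), :) = repmat('11', 2, 1);   % T or t -> '11'

str  = 'ATTTA';
bits = lut(double(str), :);                  % N-by-2 char matrix, one row per letter
out  = reshape(bits.', 1, []);               % flatten row by row: '0111111101'
```

The table only covers character codes 1 to 256, so input outside that range would need to be checked first.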
Accepted Answer
Stephen23
on 27 Aug 2017
Edited: Stephen23
on 27 Aug 2017
>> str = 'ATTTA';
>> regexprep(str,{'A','T'},{'01','11'},'ignorecase')
ans = 0111111101
or
>> [~,idx] = ismember(upper(str),'AT');
>> C = {'01','11'};
>> [C{idx}]
ans = 0111111101
You will simply waste a lot of time trying to replicate functionality that already exists.
46 Comments
SUBHAJIT KAR
on 17 Sep 2017
Thanks for the answer. But what should I do if I have to convert a FASTA file into the corresponding 2-bit binary sequence governed by A=01, T=11, C=00, G=10? My code is:
clear all;
close all;
clc
sample = fastaread('sequence.fasta');
sample = struct2cell(sample);
sample = cell2mat(sample(2));
function[x] = dna_binary(sample)
for i= 1:length(sample)
if sample(i) == 'A'
x(i) = 01;
elseif sample(i) =='T'
x(i) = 11;
elseif sample(i) == 'C'
x(i) = 00;
elseif sample(i) == 'G'
x(i) = 10;
end
end
end
This code does not print the leading zero for A and C: it replaces C with a single zero and A with a single one. Can I use the previous suggestions in this code, or does the entire code have to be changed? Please help.
SUBHAJIT KAR
on 18 Sep 2017
Thanks for your suggestions. But if I replace 'str' by the 'sample' sequence in Stephen's code, it shows an error. I have to apply this to a given sample sequence, but Stephen's program is applied to a given string called 'str'. It may be very simple. Kindly help me with the code.
Cedric
on 18 Sep 2017
Edited: Cedric
on 18 Sep 2017
You have to tell us exactly what error you get, and to clarify what you get with:
sample = fastaread('sequence.fasta');
sample = struct2cell(sample);
sample = cell2mat(sample(2));
If possible give an example of what sample is after each one of these three calls. Based on your example, Stephen could only assume that what you call sample is a string. If not, you have to tell us what it is.
SUBHAJIT KAR
on 18 Sep 2017
I have to read a FASTA file which is a sequence of the letters A, C, T, G. This sequence is to be converted into a 2-bit binary representation according to the previous rule. I have modified the program as:
clear all;
close all;
clc
sample = fastaread('sequence.fasta');
sample = struct2cell(sample);
sample = cell2mat(sample(2));
[~,x] = ismember(sample,'ATCG');
c = {'00','01','10','11'}
result= cell2mat(c(x));
There is no output. Please help.
Stephen23
on 18 Sep 2017
Edited: Stephen23
on 18 Sep 2017
@SUBHAJIT KAR: These three lines are not good code, and could easily cause some problems for you:
sample = fastaread(...);
sample = struct2cell(sample);
sample = cell2mat(sample(2));
The documentation states that fastaread returns a structure with two fields Header and Sequence, and it makes little sense to combine these into one cell array, especially as the field order is not specified. You would be much better off accessing the field directly by name:
S = fastaread(...);
sample = S.Sequence;
Note also that you can simply concatenate the substrings directly, it is not required to create a cell array and then call cell2mat, i.e. instead of this:
result= cell2mat(c(x));
all you need is this:
result = [c{x}];
Thus better code would be:
S = fastaread(...);
[~,idx] = ismember(S.Sequence,'ATCG');
C = {'00','01','10','11'};
result = [C{idx}];
And you will probably find that it works correctly (once you determine the correct order of the strings in C).
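For the mapping stated earlier in the thread (A=01, T=11, C=00, G=10), the order of C has to follow the order of the letters in 'ATCG'. The sketch below uses a made-up input string and also guards against characters that ismember does not find; the skipping behaviour is an assumption for illustration, not something the thread prescribes.

```matlab
seq = 'ATCGNatcg';                        % made-up input; 'N' is not in the alphabet
[tf, idx] = ismember(upper(seq), 'ATCG');

% idx is 0 for unmatched characters, and 0 cannot be used as an index,
% so only the matched positions are kept here.
nSkipped = nnz(~tf);

C = {'01','11','00','10'};                % same order as 'ATCG': A, T, C, G
result = [C{idx(tf)}];
```

Whether unmatched characters should be skipped, rejected with an error, or mapped to a placeholder depends on the data, so the guard is worth adapting.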
SUBHAJIT KAR
on 18 Sep 2017
Edited: SUBHAJIT KAR
on 18 Sep 2017
Thanks a lot. Now the error message says "Index exceeds matrix dimensions". It is a good learning experience.
SUBHAJIT KAR
on 19 Sep 2017
Here is the complete code. Please suggest the correct one. Here X.fasta is a FASTA file which contains a sequence of A, C, T, G. I also want to find the mean amplitude, mean frequency and standard deviation of Y. Is it possible?
S = fastaread('X.fasta');
S = S(7021:15080);
[~,idx] = ismember(S.Sequence,'ATCG');
C = {'00','01','10','11'};
z = [C{idx}];
R = 0.992;
theta = (2*pi)/3;
b = [1 0 -1];
a = [1 -2*R*cos(theta) R^2];
u = filter(b, a, z);
Y = abs(u).^2;
plot(Y/max(Y));
axis([0 8000 0 1.05]);
SUBHAJIT KAR
on 19 Sep 2017
Now the previous code generates the error as "??? Index exceeds matrix dimensions. "
Error in ==> stephan at 5 sample_dna = sample_dna(7021:15080);
Stephen23
on 19 Sep 2017
Edited: Stephen23
on 19 Sep 2017
@SUBHAJIT KAR: does the FASTA file contain multiple sequences?
Given that S is a structure, what is this supposed to do?:
S = S(7021:15080);
Did you really mean to index into the sequence (a character vector)?:
S.Sequence(7021:15080)
I notice that the error message that you show us
sample_dna = sample_dna(7021:15080);
indicates that the error occurred in some other code which you did not show us. Please show the actual code where the error occurred, as well as the size and class of the variables used on that line.
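The difference between indexing the structure and indexing its field can be reproduced without a FASTA file. In this sketch the struct S is a hand-made stand-in for the output of fastaread, with made-up field values:

```matlab
% Stand-in for the output of fastaread: a 1-by-1 struct (values made up).
S = struct('Header', 'demo', 'Sequence', 'ACGTACGTAC');

n = numel(S);                 % 1: there is only one struct element
part = S.Sequence(3:6);       % 'GTAC': indexes into the character vector

% S(3:6) asks for struct elements 3 to 6, which do not exist.
try
    S(3:6);
    failed = false;
catch
    failed = true;            % an out-of-range index error was thrown
end
```

With a 1-by-1 struct, any index above 1 applied to S itself fails in this way, whatever the length of the sequence stored inside it.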
SUBHAJIT KAR
on 19 Sep 2017
Edited: SUBHAJIT KAR
on 20 Sep 2017
The FASTA file contains a big sequence comprising A, C, T, G. It may be a sequence 20,000 long. But I want to analyse the portion ranging from 7021 to 15080. I do not want to consider the letters before 7021 and after 15080. I have given you the entire code I have run. The only change is that sample_dna is replaced by S. Is there any way out?
SUBHAJIT KAR
on 27 Sep 2017
ok. I am giving you the entire code as a doc file. Please tell if any other document is needed. Thank you.
SUBHAJIT KAR
on 28 Sep 2017
Sorry. Did you mean the 'sequence.fasta' file ? A fasta file could not be attached in it.
Stephen23
on 29 Sep 2017
@SUBHAJIT KAR: here is what I asked for in my earlier comment:
sample = fastaread('sequence.fasta');
save('SUBHAJIT_KAR.mat','sample')
Please provide what I asked for. I did not ask for a FASTA file, or a DOC file, or a text file. I even gave you the two lines of code that you need to use.
SUBHAJIT KAR
on 2 Oct 2017
Thanks for your constant support. But I am still getting the error: ??? Attempt to reference field of non-structure array.
Error in ==> stephan at 9 [~,idx] = ismember(sample_dna.Sequence,'ATCG');
Stephen23
on 2 Oct 2017
Edited: Stephen23
on 23 Nov 2017
@SUBHAJIT KAR: I have asked you three times for the same data, which so far you have not provided. This is my fourth time asking for the same data. Please upload the raw data that is imported by fastaread. You will need to do something like this:
sample = fastaread('sequence.fasta');
save('SUBHAJIT_KAR.mat','sample')
Stephen23
on 3 Oct 2017
@SUBHAJIT KAR. The code that I have given in my comments above works perfectly for me:
load('SUBHAJIT KAR.mat')
seq = sample_dna.Sequence(7021:15080);
[~,idx] = ismember(seq,'ATCG');
C = {'00','01','10','11'};
result = [C{idx}];
and checking:
>> result
result = 0101110000010110000001010000000010000111100101010101011111111111010000000000110011100000100000000000000
1010101010110000000100111111111000000011010110110010111111110011000000101010111100110101100001001010011011110101
101010101010111100110100010101... etc
SUBHAJIT KAR
on 3 Oct 2017
Edited: SUBHAJIT KAR
on 3 Oct 2017
Thank you very much. Please help me on this also. I want to filter this result. So I have written this code
clear all;
close all;
clc
load('SUBHAJIT KAR.mat');
seq = sample_dna.Sequence(7021:15080);
[~,idx] = ismember(seq,'ATCG');
C = {'00','01','10','11'};
z = [C{idx}];
R = 0.992;
theta = (2*pi)/3;
b = [1 0 -1];
a = [1 -2*R*cos(theta) R^2];
u = filter(b,a,z);
Y = abs(u).^2;
plot(Y/max(Y));
axis([0 8000 0 1.05]);
But running this code generates an error----??? Error using ==> filter Arguments must be single or double.
Error in ==> stephan at 13 u = filter(b,a,z); Please give a solution.
Stephen23
on 3 Oct 2017
Edited: Stephen23
on 3 Oct 2017
The solution is easy: read the error message. And the filter documentation. What kind of inputs does filter accept? (hint: single and double). What is the class of z ? (hint: character).
Is char the same as single or double ? No, it isn't.
Solution: convert from char to double, e.g. using: z-'0'
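The conversion works because subtracting one character from another yields a double, and the digit characters have consecutive codes starting at '0'. A small sketch with a made-up four-bit input and a simplified filter (not the thread's coefficients):

```matlab
z = '0111';                    % char vector of digit characters
zClass = class(z);             % 'char', which filter rejects
d = z - '0';                   % [0 1 1 1], class double
u = filter([1 0 -1], 1, d);    % y(n) = d(n) - d(n-2), now accepted
```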
SUBHAJIT KAR
on 3 Oct 2017
Edited: SUBHAJIT KAR
on 3 Oct 2017
Thanks for helping. I have used m = double(z-'0') for conversion after z = [C{idx}]. Can I write it in a single line? Another problem is that, after the conversion, a single cell is containing a single bit instead of two bit. That is why I am not getting the desired output. Need help.
SUBHAJIT KAR
on 3 Oct 2017
sample_dna = struct2cell(sample_dna); sample_dna = cell2mat(sample_dna(2));
These two functions are used to convert the sequence from structure array to string before numeric conversion.
Stephen23
on 14 Oct 2017
"I have used m = double(z-'0') for conversion after z = [C{idx}]. Can I write it in a single line?"
You can write
z = [C{idx}]-'0';
if you want, if that is what you mean. As you do not use m in your code it is not clear what you want to achieve.
" after the conversion, a single cell is containing a single bit instead of two bit."
This is not clear. What "cell" ? What "bits" ?
SUBHAJIT KAR
on 14 Nov 2017
Thanks a lot for the response. A single bit means either '0' or '1'. When I check the value of z in MATLAB, it is shown in an Excel sheet. Here "cell" refers to each cell of the Excel sheet. Can I send you a screenshot?
SUBHAJIT KAR
on 16 Nov 2017
Please see the screenshot. Here each cell contains one bit instead of two. The value of 'A'= '00' should be occupied in a single cell in order to get desired result.
SUBHAJIT KAR
on 16 Nov 2017
It is possible. Please run the following program you will understand
clear all;
close all;
clc
sample_dna = fastaread('sequence_AF099922.fasta');
%save('SUBHAJIT KAR.mat','sample_dna');
sample_dna = struct2cell(sample_dna);
sample_dna = cell2mat(sample_dna(2));
sample_dna = sample_dna(7021:15020);
for i = 1:length(sample_dna)
if sample_dna(i)=='A' || sample_dna(i)=='a'
x(i) = 01;
elseif sample_dna(i)=='T' || sample_dna(i)=='t'
x(i) = 11;
elseif sample_dna(i)=='C' || sample_dna(i)=='c'
x(i) = 00;
elseif sample_dna(i)=='G' || sample_dna(i)=='g'
x(i) = 10;
end
end
Please check the value of x after running the code. But the problem with this code is it is ignoring the initial zero in case of '00' and '01'. Please suggest.
Stephen23
on 16 Nov 2017
Edited: Stephen23
on 16 Nov 2017
"But the problem with this code..."
is that it is buggy code that I did not write.
"Please suggest."
I already showed you neater, simpler ways to solve this! You have gone back to using the exact same buggy loop that you were using two months and 32 comments ago. What is the point of asking me for advice?
You should also read Jan Simon's comment from the 17th of September 2017 (two months ago), because your concept will continue to be buggy for as long as you ignore Jan Simon's advice.
"Please run the following program ..."
I do not have your datafile, so I cannot run your code.
"It is possible"
What is possible? Putting two values into one element of a numeric array? Please show me an example of that. I suspect all regular contributors here would be quite interested in how you would achieve that.
...
Note that you can easily use the second method I showed you to return a cell array of char vectors, which will give you exactly those two characters per cell:
>> str = 'ATTTA';
>> [~,idx] = ismember(upper(str),'ATCG');
>> C = {'01','11','00','10'};
>> C(idx)
ans =
'01' '11' '11' '11' '01'
Is that what you are looking for?
SUBHAJIT KAR
on 23 Nov 2017
Yes, this code gives two bits per cell. But when converted into double in order to apply the filter, it splits, and therefore each cell contains only one bit. Thanks for your support. There should be a way out. Please help. I have enclosed the data again.
SUBHAJIT KAR
on 23 Nov 2017
Edited: SUBHAJIT KAR
on 23 Nov 2017
Is this correct code?
clear all;
close all;
clc
load('SUBHAJIT KAR.mat');
seq = sample_dna.Sequence(7021:15080);
%z = regexprep(seq,{'A','T','C','G'},{'00','01','10','11'},'ignorecase')-'0';
[val ,idx] = ismember(upper(seq),'ATCG');
C = {'00','01','10','11'};
z = cellfun(@(s)s-'0',C(idx),'uni',0);
z{:};
R = 0.992;
theta = (2*pi)/3;
b = [1 0 -1];
a = [1 -2*R*cos(theta) R^2];
u = filter(b,a,z);
Y = abs(u).^2;
plot(Y/max(Y));
axis([0 8000 0 1.05]);
Stephen23
on 23 Nov 2017
Edited: Stephen23
on 23 Nov 2017
You should always load data into a variable (a structure), just like I showed you in September:
S = fastaread(...);
sample = S.Sequence;
and not just load variables magically into the workspace. Read this to know why:
The input to filter must be a numeric array. You could convert the cell array to numeric like this:
D = C(idx);
D = cellfun(@(s)s-'0',D(:),'uni',0);
M = cell2mat(D);
You need to add comments to your code, because I have no idea what it should be doing. Code comments tell people what code should be doing, and why.
SUBHAJIT KAR
on 23 Nov 2017
Yes it is working and giving desired result. But it looks like one bit per cell. Attaching the output.
Stephen23
on 23 Nov 2017
Edited: Stephen23
on 23 Nov 2017
M is not a cell array, and therefore those are not cells. In MATLAB they are called elements of an array. As has been discussed many times on this forum (and we discussed one week ago on this thread) it is not possible to store two values in one element of a numeric array. If you care to disagree with this, please demonstrate.
I showed you how to define a cell array where each cell contains a two-element numeric vector.
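Since a numeric array holds one value per element, the two bits of a base either occupy two elements or are merged into a single number. The sketch below shows both options on the short example string from the answer; the variable names are made up for illustration.

```matlab
str = 'ATTTA';
[~, idx] = ismember(upper(str), 'ATCG');
C = {'01','11','00','10'};             % A, T, C, G

bitsChar = vertcat(C{idx});            % 5-by-2 char matrix, one base per row
bitsNum  = bitsChar - '0';             % 5-by-2 double matrix of 0s and 1s
asValue  = bitsNum * [2; 1];           % one number per base: 1 3 3 3 1
flat     = reshape(bitsNum.', 1, []);  % 1-by-10 bit stream
```

Which form is appropriate depends on what the filter is meant to process: flat treats every bit as a sample, while asValue gives one sample per nucleotide.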
SUBHAJIT KAR
on 23 Nov 2017
I am a novice in MATLAB and hence my knowledge is limited. I said what I saw. Could you please explain why the previous buggy code gives two element in a single cell?
Stephen23
on 23 Nov 2017
Edited: Stephen23
on 23 Nov 2017
"Could you please explain why the previous buggy code gives two element in a single cell?"
I have no idea what code you are referring to: what "previous buggy code" do you mean? As I wrote before, it is perfectly possible to have a two-element numeric array inside one cell of a cell array, if that is what you are referring to.
You seem to be getting confused between cell arrays (which can contain any sized arrays in their cells) and numeric arrays (which contain exactly one value in each element). I would suggest that you revise basic MATLAB array types, and how to access their data:
Note that filter requires its input to be a numeric array.
SUBHAJIT KAR
on 23 Nov 2017
I am talking about this code.
clear all;
close all;
clc
sample_dna = fastaread('sequence_AF099922.fasta');
%save('SUBHAJIT KAR.mat','sample_dna');
sample_dna = struct2cell(sample_dna);
sample_dna = cell2mat(sample_dna(2));
sample_dna = sample_dna(7021:15020);
for i = 1:length(sample_dna)
if sample_dna(i)=='A' || sample_dna(i)=='a'
x(i) = 01;
elseif sample_dna(i)=='T' || sample_dna(i)=='t'
x(i) = 11;
elseif sample_dna(i)=='C' || sample_dna(i)=='c'
x(i) = 00;
elseif sample_dna(i)=='G' || sample_dna(i)=='g'
x(i) = 10;
end
end
Stephen23
on 23 Nov 2017
Edited: Stephen23
on 23 Nov 2017
"Could you please explain why the previous buggy code gives two element in a single cell?"
The code that you show does not put "two element in a single cell":
- there is no cell because there is no cell array
- that code sets one element of a numeric array to be one value (i.e. zero, one, ten, or eleven). That was already explained by Jan Simon in a comment to this thread on the 17th of September 2017.
If you simply want to define numeric values like 0, 1, 10, or 11 (i.e. zero, one, ten, or eleven) then you can do this with any numeric class (e.g. double), exactly as that code does.
Instead of asking me to repeat myself, please revise the basic MATLAB array classes. I gave you the link earlier.
SUBHAJIT KAR
on 4 Jun 2018
Edited: Stephen23
on 4 Jun 2018
Sir, I am facing a problem computing the DFT of the previous sequence. Could you please help me? I have learned a lot from our previous conversation, and the previous code is running smoothly. Please check the code below. It is meant to compute the FFT of the given sequence and plot the output PSD. But it is not generating the desired result.
sample_dna = fastaread('sequence_AF099922.fasta');
seq = sample_dna.sequence(7021:15080);
for i = 1:length(sample_dna)
if sample_dna(i)=='A' || sample_dna(i)=='a'
x(i) = 0.1260;
elseif sample_dna(i)=='T' || sample_dna(i)=='t'
x(i) = 0.1335;
elseif sample_dna(i)=='C' || sample_dna(i)=='c'
x(i) = 0.1340;
elseif sample_dna(i)=='G' || sample_dna(i)=='g'
x(i) = 0.0806;
end
end
end
Y = abs(fft(x)).^2;
plot(Y/max(Y));
axis([0 8000 0 1.05])
xlabel('Nucleotide position');
ylabel('Output');
title('Detection of period 3 behaviour using DFT');
The attached file contains the sequence.
Stephen23
on 4 Jun 2018
Edited: Stephen23
on 4 Jun 2018
Using ismember, as I showed in my answer, makes your code simpler and more efficient:
S = fastaread('sequence_AF099922.fasta');
V = S.Sequence(7021:15080); % your uploaded file uses title case.
R = [0.1260,0.1335,0.1340,0.0806];
[~,idx] = ismember(upper(V),'ATCG');
x = R(idx);
"But it is not generating the desired result."
As you did not explain what the "desired result" is, I can only advise you to debug your code:
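As a way of checking the pipeline on data with a known answer, the same mapping can be applied to a synthetic sequence. In the sketch below V is a made-up sequence with exact period 3 rather than the thread's FASTA data, and removing the mean before the FFT is an added assumption so that the DC bin does not dominate the plot.

```matlab
% Synthetic stand-in for S.Sequence(7021:15080); not the thread's data.
V = repmat('ATG', 1, 100);               % 300 bases with exact period 3
R = [0.1260, 0.1335, 0.1340, 0.0806];    % values for A, T, C, G from the thread
[~, idx] = ismember(upper(V), 'ATCG');
x = R(idx);

x = x - mean(x);                         % remove DC (assumption, see lead-in)
Y = abs(fft(x)).^2;                      % power at each DFT bin
N = numel(x);
[~, k] = max(Y(1:floor(N/2)));           % strongest bin in the first half
% For a period-3 signal the peak sits at frequency bin N/3, i.e. index N/3 + 1.
```

Note that the horizontal axis of plot(Y/max(Y)) is the DFT bin index, not the nucleotide position.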
SUBHAJIT KAR
on 9 Jun 2019
Sir,
Can you please help me with a sliding-window DFT? I want to find the DFT of a sequence of 8000 points. I want to compute the DFT of the first 351 points, then slide the window by one point and calculate the DFT for the next window, until the window reaches the last point. I want to compute the signal-to-noise ratio of each window segment and plot it. I have written this code, but it only plots for the first 351 points.
for idx = 1:351:length(x)
slice = x(1+(idx-1):(idx-1)+351);
spectrum = fft(slice);
signal_noise = max(spectrum)/mean(spectrum);
end
But as the window slides along the sequence there should be 8000 signal-to-noise ratio points, which would be plotted against the index. Please help.
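This question was not answered in the thread. A possible sketch follows, under stated assumptions: the window advances by one sample, the results are stored in a preallocated vector instead of being overwritten, and the "signal to noise ratio" is taken to be the power in the period-3 bin divided by the mean power of the non-DC bins. That definition is an assumption made for illustration, and the random input is a placeholder for the mapped sequence.

```matlab
x = rand(1, 1000);                   % placeholder signal, not real data
W = 351;                             % window length from the question
nWin = numel(x) - W + 1;             % number of full windows with step 1
snr3 = zeros(1, nWin);               % one value per window
k3 = round(W/3) + 1;                 % index of the period-3 bin (351/3 = 117)

for s = 1:nWin
    slice = x(s : s + W - 1);        % window starting at sample s
    P = abs(fft(slice)).^2;          % real-valued power spectrum
    snr3(s) = P(k3) / mean(P(2:end));   % assumed ratio, DC bin excluded
end
```

The result can be shown with plot(snr3). A sequence of N points yields N - W + 1 full windows, so an 8000-point input gives 7650 values rather than 8000 unless the ends are padded.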