
int2nt

Convert nucleotide sequence from integer to letter representation

Syntax

SeqChar = int2nt(SeqInt)
SeqChar = int2nt(SeqInt, ...'Alphabet', AlphabetValue, ...)
SeqChar = int2nt(SeqInt, ...'Unknown', UnknownValue, ...)
SeqChar = int2nt(SeqInt, ...'Case', CaseValue, ...)

Input Arguments

SeqInt Row vector of integers specifying a nucleotide sequence. For valid integers, see the table Mapping Nucleotide Integers to Letter Codes. Integers are arbitrarily assigned to IUB/IUPAC letters.
AlphabetValue Character vector or string specifying a nucleotide alphabet. Choices are:
  • 'DNA' (default) — Uses the symbols A, C, G, and T.

  • 'RNA' — Uses the symbols A, C, G, and U.

UnknownValue Character to represent unknown nucleotides, that is, 0 or any integer ≥ 17. Choices are any character other than the nucleotide characters A, C, G, T, and U and the ambiguous nucleotide characters N, R, Y, K, M, S, W, B, D, H, and V. Default is *.
CaseValue Character vector or string specifying whether the output letters are uppercase or lowercase. Choices are 'upper' (default) or 'lower'.

Output Arguments

SeqChar Nucleotide sequence specified by a character vector of codes.

Description

SeqChar = int2nt(SeqInt) converts SeqInt, a row vector of integers specifying a nucleotide sequence, to SeqChar, a character vector of codes specifying the same nucleotide sequence. For valid codes, see the table Mapping Nucleotide Integers to Letter Codes.

Mapping Nucleotide Integers to Letter Codes

Nucleotide                                  Integer      Code
Adenosine                                   1            A
Cytidine                                    2            C
Guanine                                     3            G
Thymidine                                   4            T
Uridine (if 'Alphabet' set to 'RNA')        4            U
Purine (A or G)                             5            R
Pyrimidine (T or C)                         6            Y
Keto (G or T)                               7            K
Amino (A or C)                              8            M
Strong interaction (3 H bonds) (G or C)     9            S
Weak interaction (2 H bonds) (A or T)       10           W
Not A (C or G or T)                         11           B
Not C (A or G or T)                         12           D
Not G (A or C or T)                         13           H
Not T or U (A or C or G)                    14           V
Any nucleotide (A or C or G or T or U)      15           N
Gap of indeterminate length                 16           -
Unknown (any integer not in table)          0 or ≥ 17    * (default)
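
The table amounts to a positional lookup: integer k selects the k-th letter of a fixed code string, and anything outside 1 to 16 falls back to the unknown symbol. The following is a minimal sketch of that idea in base MATLAB, assuming the DNA alphabet and uppercase output. It is an illustrative stand-in, not the toolbox implementation of int2nt, and the variable names codes, seqInt, valid, and seqChar are hypothetical.

```matlab
% Illustrative stand-in for the lookup in the table (not the int2nt source).
% Position k of codes holds the letter code for integer k (DNA alphabet).
codes = 'ACGTRYKMSWBDHVN-';

seqInt = [1 2 4 3 5 16 0 17];

% Start with the default unknown symbol everywhere.
seqChar = repmat('*', size(seqInt));

% Integers 1 through 16 are valid table entries; 0 and values >= 17 are not.
valid = seqInt >= 1 & seqInt <= numel(codes);
seqChar(valid) = codes(seqInt(valid));

disp(seqChar)   % ACTGR-**
```

Switching to the RNA alphabet in this sketch would only mean replacing the fourth letter of codes with U.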

SeqChar = int2nt(SeqInt, ...'PropertyName', PropertyValue, ...) calls int2nt with optional properties specified as property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks. Property names are case insensitive. The property name/property value pairs are as follows:

SeqChar = int2nt(SeqInt, ...'Alphabet', AlphabetValue, ...) specifies a nucleotide alphabet. AlphabetValue can be 'DNA', which uses the symbols A, C, G, and T, or 'RNA', which uses the symbols A, C, G, and U. Default is 'DNA'.
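
As a short illustration of the 'Alphabet' property, the sketch below converts the same integer vector under both alphabets. It assumes Bioinformatics Toolbox is installed. The variable names seqInt, dna, and rna are chosen for illustration only.

```matlab
% Requires Bioinformatics Toolbox (int2nt).
seqInt = [1 2 4 3 2 4];

% Default alphabet is 'DNA', so integer 4 maps to T.
dna = int2nt(seqInt);                      % ACTGCT

% With the 'RNA' alphabet, integer 4 maps to U instead.
rna = int2nt(seqInt, 'Alphabet', 'RNA');   % ACUGCU
```

Only integer 4 changes between the two calls; every other code in the mapping table is shared by both alphabets.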

SeqChar = int2nt(SeqInt, ...'Unknown', UnknownValue, ...) specifies the character to represent unknown nucleotides, that is, 0 or any integer ≥ 17. UnknownValue can be any character other than the nucleotide characters A, C, G, T, and U and the ambiguous nucleotide characters N, R, Y, K, M, S, W, B, D, H, and V. Default is *.

SeqChar = int2nt(SeqInt, ...'Case', CaseValue, ...) specifies whether the output letters are uppercase or lowercase. CaseValue can be 'upper' (default) or 'lower'.
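
The sketch below illustrates the 'Case' property on its own and then combined with the other properties in a single call, since the pairs can be given in any order. It assumes Bioinformatics Toolbox is installed. The variable names and the choice of ? as the unknown symbol are illustrative.

```matlab
% Requires Bioinformatics Toolbox (int2nt).

% Lowercase output for a sequence of valid integers.
lowerSeq = int2nt([1 2 4 3 2 4 1 3 2], 'Case', 'lower');   % actgctagc

% Several properties in one call, in arbitrary order.
% 0 and 20 are not in the mapping table, so they become the unknown symbol.
mixed = int2nt([1 0 4 20], 'Unknown', '?', 'Case', 'lower', 'Alphabet', 'RNA');
disp(mixed)
```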

Examples

  • Convert a nucleotide sequence from integer to letter representation.

    s = int2nt([1 2 4 3 2 4 1 3 2])
    
    s =
    ACTGCTAGC
    
  • Convert a nucleotide sequence from integer to letter representation and define # as the symbol for unknown integers (0, or 17 and greater).

    si = [1 2 4 20 2 4 40 3 2];
    s = int2nt(si, 'unknown', '#')
    
    s =
    ACT#CT#GC
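
A quick way to sanity-check a conversion is a round trip through nt2int, the companion function in Bioinformatics Toolbox that maps letters back to integers. This sketch assumes the toolbox is installed. The variable names original, asInt, and restored are illustrative.

```matlab
% Requires Bioinformatics Toolbox (nt2int and int2nt).
original = 'ACTGCTAGC';

% Letters to integers, then back to letters.
asInt = nt2int(original);
restored = int2nt(asInt);

% For sequences that use only codes from the mapping table,
% the round trip should reproduce the input.
isequal(restored, original)
```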
    

Version History

Introduced before R2006a