
soapread

Read data from Short Oligonucleotide Analysis Package (SOAP) file

Description

SOAPStruct = soapread(File) reads File, a SOAP-formatted file (version 2.15), and returns the data in SOAPStruct, a MATLAB® array of structures.


SOAPStruct = soapread(File,Name,Value) specifies additional options using one or more name-value arguments. For example, to read only entry 10 of the file, use SOAPStruct = soapread(File,BlockRead=10).


Examples


Read the alignment records (entries) from the sample01.soap file.

data = soapread("sample01.soap")
data=17×1 struct array with fields:
    QueryName
    Sequence
    Quality
    NumHits
    PairedEndSourceFile
    Length
    Strand
    ReferenceName
    Position
    AlignDetails

View the quality score for the 6th entry.

data(6).Quality
ans = 
'<>.>>>8>;:1>>>3>6>'

Determine the strand direction (forward or reverse) of the reference sequence to which the 12th entry aligns.

data(12).Strand
ans = 
'-'

Read a block of six alignment records (entries 5 through 10) from the sample01.soap file.

data_5_10 = soapread('sample01.soap',BlockRead=[5 10])
data_5_10=6×1 struct array with fields:
    QueryName
    Sequence
    Quality
    NumHits
    PairedEndSourceFile
    Length
    Strand
    ReferenceName
    Position
    AlignDetails

Input Arguments


File

File to read, specified as a path to a SOAP-formatted file (version 2.15) or as a file name. If you specify only a file name, that file must be on the MATLAB search path or in the current folder.

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: SOAPStruct = soapread(File,BlockRead=10)

The names are case-insensitive. For example, you can use aligndetails instead of AlignDetails.
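
As a brief illustration of the two syntaxes described above, the following sketch reads the same entry both ways. It assumes that the sample01.soap file used in the Examples section is on the MATLAB search path; the variable names entryA and entryB are illustrative only.

```matlab
% Name=Value syntax (R2021a and later)
entryA = soapread("sample01.soap", BlockRead=10);

% Comma-separated syntax with the name in quotes,
% which is also the form to use before R2021a
entryB = soapread('sample01.soap', 'BlockRead', 10);

% Compare the two results; both calls request the same record
sameRecord = isequal(entryA, entryB)
```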

AlignDetails

Indication to include the AlignDetails field in the SOAPStruct output argument, specified as true (include the field) or false (do not include the field).

Example: false

Data Types: logical
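
The following minimal sketch shows one way to confirm the effect of this option. It assumes the sample01.soap file from the Examples section is on the MATLAB search path; hasDetails is an illustrative variable name.

```matlab
% Read the records without the AlignDetails field
data = soapread("sample01.soap", AlignDetails=false);

% Check whether the AlignDetails field is present in the output
hasDetails = isfield(data, 'AlignDetails')

% The remaining fields, such as Sequence, are still available
firstSequence = data(1).Sequence;
```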

BlockRead

Entries to read, specified as a positive integer or as a two-element positive integer vector.

  • To read entry N in File, specify a positive integer N.

  • To read the block of entries starting at N1 and ending at N2, specify a positive integer vector [N1 N2] with N1 < N2. To read all the entries starting at N1, specify Inf for N2.

Example: [10,19]

Data Types: single | double
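
The three forms described above can be sketched as follows, assuming the sample01.soap file from the Examples section is on the MATLAB search path. The variable names are illustrative only.

```matlab
file = "sample01.soap";

% Read a single entry: entry 10
oneEntry = soapread(file, BlockRead=10);

% Read a block of entries: entries 5 through 10
% (six records, as in the Examples section)
block = soapread(file, BlockRead=[5 10]);

% Read from entry 10 through the end of the file
tail = soapread(file, BlockRead=[10 Inf]);

% Number of records returned by each call
counts = [numel(oneEntry) numel(block) numel(tail)]
```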

Output Arguments


SOAPStruct

Sequence alignment and mapping information, returned as an N-by-1 array of structures, where N is the number of alignment records read from File (all records in the file unless you specify BlockRead). Each structure contains the following fields.

Field | Description
QueryName | Name of the aligned read sequence.
Sequence | Character vector containing the letter representations of the read sequence. It is the reverse complement if the read sequence aligns to the reverse strand of the reference sequence.
Quality | Character vector containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.
NumHits | Total number of instances in which this read sequence aligned to an identical length of bases on another area of the reference sequence.
PairedEndSourceFile | Flag (a or b) specifying the source file to which the read sequence belongs. This field applies only to read sequences that are paired in the alignment.
Length | Scalar specifying the length of the read sequence.
Strand | + or - specifying the direction (forward or reverse) of the reference sequence to which the read sequence aligns.
ReferenceName | Name or numeric ID of the reference sequence to which the read sequence aligns.
Position | Position (one-based offset) on the forward reference sequence where the left-most base of the alignment of the read sequence starts.
AlignDetails | Information on mismatches, insertions, and deletions in the alignment. For SOAP-formatted files v2.15, this field includes CIGAR strings.
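
Because the output is a structure array, you can gather one field across all records with comma-separated list syntax. The following sketch, which assumes the sample01.soap file from the Examples section is on the MATLAB search path, selects the reads that align to the reverse strand. The variable names are illustrative only.

```matlab
data = soapread("sample01.soap");

% Collect the Strand field of every record into a cell array
strands = {data.Strand};

% Logical index of the reads that align to the reverse strand
isReverse = strcmp(strands, '-');

% Names of the reads that align to the reverse strand
reverseNames = {data(isReverse).QueryName}';

% Number of reverse-strand reads
numReverse = nnz(isReverse)
```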

Tips

If your SOAP-formatted file is too large to read using available memory, try either of the following:

  • Use the BlockRead name-value argument to read a subset of entries.

  • Create a BioIndexedFile object from the SOAP-formatted file (using 'TABLE' for the Format), and then access the entries using methods of the BioIndexedFile class.
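
The second approach can be sketched as follows. This is a minimal illustration, not a definitive workflow: it uses the small sample01.soap file in place of a large file, and writing the index file to tempdir is an assumption made here so that the folder containing the source file does not need to be writable.

```matlab
% Locate the sample file on the MATLAB search path
sourceFile = which('sample01.soap');

% Index the file as a tab-delimited table and store the
% index file in the temporary folder (assumption, see above)
bif = BioIndexedFile('TABLE', sourceFile, tempdir);

% Number of entries found by the index
numEntries = bif.NumEntries

% Retrieve the raw text of entries 5 through 10 without
% loading the rest of the file into memory
rawEntries = getEntryByIndex(bif, 5:10);
```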

References

[1] Li, R., Yu, C., Li, Y., Lam, T., Yiu, S., Kristiansen, K., and Wang, J. (2009). SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15), 1966–1967.

[2] Li, R., Li, Y., Kristiansen, K., and Wang, J. (2008). SOAP: short oligonucleotide alignment program. Bioinformatics 24(5), 713–714.

Version History

Introduced in R2010b