ilmnbslookup
Look up Illumina BeadStudio target (probe) sequence and annotation information
Syntax
AnnotStruct = ilmnbslookup(AnnotationFile, ID)
AnnotStruct = ilmnbslookup(AnnotationFile, ID, 'LookUpField', LookUpFieldValue)
Input Arguments
AnnotationFile | Character vector or string specifying a file name or a path and file name of an Illumina® annotation file (CSV, BGX, or TXT format). If you specify only a file name, that file must be on the MATLAB® search path or in the current folder. Tip: You can download Illumina annotation files. |
ID | Character vector, string, string vector, or cell array of character vectors representing one or more unique identifiers for targets (probes) on an Illumina microarray. Tip: By default, ID is looked up in the Search_key field of the annotation file. |
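LookUpFieldValue | Field in the annotation file in which to look for ID. Default is the Search_key field. Tip: Set this property so that it corresponds to the type of identifier specified by ID. |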
Output Arguments
AnnotStruct | Structure containing the probe sequence and annotation information for one or more targets (probes) specified by ID. |
Description
AnnotStruct = ilmnbslookup(AnnotationFile, ID) returns AnnotStruct, a structure containing probe sequence and annotation information for one or more targets (probes) specified by ID, as found in AnnotationFile, an Illumina annotation file (CSV, BGX, or TXT format).
AnnotStruct contains the same fields as AnnotationFile. The fields are described in the following two tables.
Structure Created from Illumina CSV Annotation File
Field | Description |
---|---|
Search_key | Internal identifier for the target, useful for custom-design arrays |
Target | Unique identifier for the target |
ProbeId | Illumina probe identifier |
Gid | GenBank® identifier for the gene |
Transcript | Illumina internal transcript identifier |
Accession | GenBank accession number for the gene |
Symbol | Typically, the gene symbol |
Type | Probe type |
Start | Starting position of the probe sequence in the GenBank record |
Probe_Sequence | Sequence of the probe |
Definition | Definition field from the GenBank record |
Ontology | Gene Ontology terms associated with the gene |
Synonym | Synonyms for the gene (from the GenBank record) |
Structure Created from a BGX or TXT Annotation File
Field | Description |
---|---|
Accession | GenBank accession number for the gene |
Array_Address_Id | Decoder identifier |
Chromosome | Chromosome on which the gene is located |
Cytoband | Cytogenetic banding region of the chromosome on which the gene associated with the target is located |
Definition | Definition field from the GenBank record |
Entrez_Gene_ID | Entrez Gene database identifier for the gene |
GI | GenBank identifier for the gene |
ILMN_Gene | Illumina internal gene symbol |
Obsolete_Probe_Id | Probe identifier used before BGX annotation files |
Ontology_Component | Gene Ontology cellular components associated with the gene |
Ontology_Function | Gene Ontology molecular functions associated with the gene |
Ontology_Process | Gene Ontology biological processes associated with the gene |
Probe_Chr_Orientation | Orientation of the probe on the NCBI genome build |
Probe_Coordinates | Genomic position of the probe on the NCBI genome build |
Probe_Id | Illumina probe identifier |
Probe_Sequence | Sequence of the probe |
Probe_Start | Start position of the probe relative to the 5' end of the source transcript sequence |
Probe_Type | Information about what the probe is targeting |
Protein_Product | NCBI protein accession number |
RefSeq_ID | Identifier from the NCBI RefSeq database |
Reporter_Composite_map | Information associated with control probes |
Reporter_Group_Name | Information associated with control probes |
Reporter_Group_id | Information associated with control probes |
Search_Key | Internal identifier for the target, useful for custom-design arrays |
Source | Source from which the transcript sequence was obtained |
Source_Reference_ID | Source's identifier |
Species | Species associated with the gene |
Symbol | Typically, the gene symbol |
Synonyms | Synonyms for the gene (from the GenBank record) |
Transcript | Illumina internal transcript identifier |
Unigene_ID | Identifier from the NCBI UniGene database |
AnnotStruct = ilmnbslookup(AnnotationFile, ID, 'LookUpField', LookUpFieldValue) looks for ID in the annotation file in the field specified by LookUpFieldValue. Default is the Search_key field.
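As a minimal sketch of the 'LookUpField' property, the following looks up a target by its Illumina probe identifier instead of the default Search_key field. It assumes the BGX annotation file used in the examples on this page is on the MATLAB path (the file is not provided with the toolbox). The probe identifier is copied from the example output on this page, and the variable names annotFile, probeID, and byProbe are illustrative only.

```matlab
% Assumed: this annotation file is on the MATLAB path or in the current folder.
annotFile = 'HumanRef-8_V3_0_R0_11282963_A.bgx';
probeID   = 'ILMN_2136495';   % Probe_Id value copied from the example output on this page

if exist(annotFile, 'file') == 2
    % Match probeID against the Probe_Id field instead of Search_key.
    byProbe = ilmnbslookup(annotFile, probeID, 'LookUpField', 'Probe_Id');
    disp(byProbe.Symbol)
else
    % The lookup is skipped when the annotation file is not available.
    fprintf('Annotation file %s not found; skipping lookup.\n', annotFile);
end
```

The value passed after 'LookUpField' should be one of the field names listed in the table that matches the annotation file format (CSV, or BGX/TXT).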
Examples
Note
The gene expression file, TumorAdjacent-probe-raw.txt, and the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx, used in the following examples are not provided with the Bioinformatics Toolbox™ software.
Read the contents of a tab-delimited file exported from the Illumina BeadStudio™ software into a MATLAB structure.
    ilmnStruct = ilmnbsread('TumorAdjacent-probe-raw.txt')

    ilmnStruct =

                 Header: [1x1 struct]
               TargetID: {22184x1 cell}
            ColumnNames: {1x37 cell}
                   Data: [22184x37 double]
        TextColumnNames: {1x23 cell}
               TextData: {22184x23 cell}
Find the number of the Search_key column in the TextColumnNames cell array, which is returned in the ilmnStruct structure by the ilmnbsread function.

    srchCol = find(strcmpi('Search_Key',ilmnStruct.TextColumnNames))

    srchCol =

         1
Look up the probe sequence and annotation information for the 10th target in ilmnStruct, using the annotation file HumanRef-8_V3_0_R0_11282963_A.bgx.

    annotation = ilmnbslookup('HumanRef-8_V3_0_R0_11282963_A.bgx',...
                              ilmnStruct.TextData{10,srchCol})

    annotation =

                     Accession: 'NM_144670.2'
              Array_Address_Id: '0004050154'
                    Chromosome: '12'
                      Cytoband: '12p13.31b'
                    Definition: 'Homo sapiens alpha-2-macroglobulin-like 1 (A2ML1), mRNA.'
                Entrez_Gene_ID: '144568'
                            GI: '74271844'
                     ILMN_Gene: 'A2ML1'
             Obsolete_Probe_Id: ''
            Ontology_Component: ''
             Ontology_Function: 'endopeptidase inhibitor activity [goid 4866] [evidence IEA]'
              Ontology_Process: ''
         Probe_Chr_Orientation: '+'
             Probe_Coordinates: '8920412-8920461'
                      Probe_Id: 'ILMN_2136495'
                Probe_Sequence: 'TGTAATCGCAGCCCCTTGGAAGGCCAAGGCAGGAGAATCGCCTCAACACT'
                   Probe_Start: '4889'
                    Probe_Type: 'S'
               Protein_Product: 'NP_653271.2'
                     RefSeq_ID: 'NM_144670.2'
        Reporter_Composite_map: ''
           Reporter_Group_Name: ''
             Reporter_Group_id: ''
                    Search_Key: 'ILMN_17375'
                        Source: 'RefSeq'
           Source_Reference_ID: 'NM_144670.2'
                       Species: 'Homo sapiens'
                        Symbol: 'A2ML1'
                      Synonyms: [1x141 char]
                    Transcript: 'ILMN_17375'
                    Unigene_ID: ''
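The output above shows every field as text, including numeric-looking values such as Probe_Start and Probe_Coordinates, so they need converting before any arithmetic. The following is a sketch only: it uses a hand-built structure that copies three field values from the output above in place of a real ilmnbslookup call, and it assumes Probe_Coordinates holds a single start-stop range. The variable names are illustrative.

```matlab
% Stand-in for the structure returned above; values copied from that output.
annotation = struct( ...
    'Probe_Sequence', 'TGTAATCGCAGCCCCTTGGAAGGCCAAGGCAGGAGAATCGCCTCAACACT', ...
    'Probe_Coordinates', '8920412-8920461', ...
    'Probe_Start', '4889');

% Field values are character vectors, so convert them before doing arithmetic.
probeStart  = str2double(annotation.Probe_Start);
coords      = sscanf(annotation.Probe_Coordinates, '%d-%d');   % [start; stop]
probeLength = length(annotation.Probe_Sequence);

% Length implied by the genomic coordinates (inclusive range).
spanLength = coords(2) - coords(1) + 1;
fprintf('Probe length: %d bases; coordinate span: %d bases\n', probeLength, spanLength);
```

Comparing the sequence length with the coordinate span is a quick consistency check on a looked-up probe. Entries with empty or multi-range coordinates would need extra handling that this sketch does not attempt.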
Use the ilmnbslookup function with the 'LookUpField' property to look up the annotation information for all targets located on chromosome 12 in the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx.

    chr12annotation = ilmnbslookup('HumanRef-8_V3_0_R0_11282963_A.bgx',...
                                   '12','LookUpField','Chromosome')

    chr12annotation =

                     Accession: {1x1186 cell}
              Array_Address_Id: {1x1186 cell}
                    Chromosome: {1x1186 cell}
                      Cytoband: {1x1186 cell}
                    Definition: {1x1186 cell}
                Entrez_Gene_ID: {1x1186 cell}
                            GI: {1x1186 cell}
                     ILMN_Gene: {1x1186 cell}
             Obsolete_Probe_Id: {1x1186 cell}
            Ontology_Component: {1x1186 cell}
             Ontology_Function: {1x1186 cell}
              Ontology_Process: {1x1186 cell}
         Probe_Chr_Orientation: {1x1186 cell}
             Probe_Coordinates: {1x1186 cell}
                      Probe_Id: {1x1186 cell}
                Probe_Sequence: {1x1186 cell}
                   Probe_Start: {1x1186 cell}
                    Probe_Type: {1x1186 cell}
               Protein_Product: {1x1186 cell}
                     RefSeq_ID: {1x1186 cell}
        Reporter_Composite_map: ''
           Reporter_Group_Name: ''
             Reporter_Group_id: ''
                    Search_Key: {1x1186 cell}
                        Source: {1x1186 cell}
           Source_Reference_ID: {1x1186 cell}
                       Species: {1x1186 cell}
                        Symbol: {1x1186 cell}
                      Synonyms: {1x1186 cell}
                    Transcript: {1x1186 cell}
                    Unigene_ID: {1x1186 cell}
The output structure indicates that there are 1,186 targets located on chromosome 12.
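The two example outputs suggest that fields are character vectors when a single target matches and cell arrays when several match. Assuming that pattern holds, wrapping a field in cellstr gives a uniform way to count matches. The helper name countTargets and the placeholder identifiers ID_A, ID_B, and ID_C are hypothetical, and the structures are hand-built stand-ins for ilmnbslookup output.

```matlab
% Hypothetical helper: number of targets in a structure returned by ilmnbslookup.
% cellstr turns a character vector into a 1x1 cell and leaves a cell array as is.
countTargets = @(s) numel(cellstr(s.Probe_Id));

% Stand-ins for real output (placeholder identifiers, not real probes).
single = struct('Probe_Id', 'ILMN_2136495');
many   = struct('Probe_Id', {{'ID_A', 'ID_B', 'ID_C'}});

fprintf('%d target(s), %d target(s)\n', countTargets(single), countTargets(many));
```

The count is taken from Probe_Id because some fields, such as Reporter_Group_Name in the output above, are empty even when many targets match. Applied to chr12annotation, the helper would be expected to return 1186, the size of each populated cell array.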
Version History
Introduced in R2008a