# swalign

Locally align two sequences using the Smith-Waterman algorithm

## Syntax

``Score = swalign(Seq1,Seq2)``
``[___, Alignment] = swalign(Seq1,Seq2)``
``[___,___,Start] = swalign(Seq1,Seq2)``
``swalign(___,Name,Value)``

## Description

`Score = swalign(Seq1,Seq2)` returns the optimal local alignment score in bits. The scale factor used to calculate the score is provided by the scoring matrix.

`[___, Alignment] = swalign(Seq1,Seq2)` returns a 3-by-N character array showing the two sequences, `Seq1` and `Seq2`, in the first and third rows, and symbols representing the optimal local alignment between them in the second row. The symbol `|` indicates amino acids or nucleotides that match exactly. The symbol `:` indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).

`[___,___,Start] = swalign(Seq1,Seq2)` returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.

`swalign(___,Name,Value)` calls `swalign` with additional options specified as name-value pairs. You can specify one or more pairs in any order.

## Examples

Locally align two amino acid sequences using the `BLOSUM50` scoring matrix. Return the optimal local alignment score.

```
[Score] = swalign('VSPAGMASGYD','IPGKASYD')
```
```
Score = 8.6667
```

Locally align two amino acid sequences specifying the `PAM250` scoring matrix and a gap open penalty of `5`. Return the optimal local alignment score in bits and the alignment character array.

```
[Score, Alignment] = swalign('HEAGAWGHEE','PAWHEAE',...
                             'ScoringMatrix', 'pam250',...
                             'GapOpen',5)
```
```
Score = 8

Alignment = 3×6 char array
    'GAWGHE'
    ':|| ||'
    'PAW-HE'
```

Locally align two amino acid sequences, returning the `Score` in nat units (nats) by specifying a scale factor of `log(2)`. Return the optimal local alignment score in nats, the alignment character array, and the starting indices of the alignment.

```
[Score, Alignment, Start] = swalign('HEAGAWGHEE','PAWHEAE',...
                                    'Scale',log(2))
```
```
Score = 6.4694

Alignment = 3×5 char array
    'AWGHE'
    '|| ||'
    'AW-HE'

Start =
     5
     2
```

## Input Arguments

`Seq1`, `Seq2`: Amino acid or nucleotide sequences, each specified as a structure containing a `Sequence` field, a character vector, a string, or an integer vector.

For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.

Example: `'HEAGAWGHEE','PAWHEAE'`

Example: `'VSPAGMASGYD','IPGKASYD'`
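The input forms listed above can be sketched as follows. This is an illustrative sketch, not a tested recipe: it assumes the Bioinformatics Toolbox function `aa2int` is available to produce the integer representation, and the variable names are hypothetical. All three calls should describe the same pair of sequences.

```matlab
s1 = 'HEAGAWGHEE';
s2 = 'PAWHEAE';

% 1) Character vectors
scoreChar = swalign(s1, s2);

% 2) Structures containing a Sequence field
q1 = struct('Sequence', s1);
q2 = struct('Sequence', s2);
scoreStruct = swalign(q1, q2);

% 3) Integer vectors (assumption: aa2int maps letters to the integer codes
%    that swalign expects for amino acids)
scoreInt = swalign(aa2int(s1), aa2int(s2));
```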

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: `swalign('HEAGAWGHEE','PAWHEAE','Scale',log(2))`

`Alphabet`: The type of sequence, specified as a character vector or string: `"AA"` for amino acids or `"NT"` for nucleotides.

Example: `"AA"`

`ScoringMatrix`: The scoring matrix used for the local alignment, specified as one of the following:

• `BLOSUM50` (the default when `Alphabet` is `AA`)

• `NUC44` (the default when `Alphabet` is `NT`)

The above scoring matrices, provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the `Scale` property to specify an additional scale factor to convert the output score from bits to another unit.

• Matrix representing the scoring matrix to use for the local alignment, such as returned by the `blosum`, `pam`, `dayhoff`, `gonnet`, or `nuc44` function.

If you use a scoring matrix that you created or that was created by one of the above functions, the matrix does not include a scale factor, so the output score is returned in the same units as the scoring matrix. You can use the `Scale` property to specify a scale factor that converts the output score to another unit.

• `DAYHOFF`

• `GONNET`

• `BLOSUM62`

• `BLOSUM30`

• `BLOSUM35`

• `BLOSUM40`

• `BLOSUM45`

• `BLOSUM50`

• `BLOSUM55`

• `BLOSUM60`

• `BLOSUM65`

• `BLOSUM70`

• `BLOSUM75`

• `BLOSUM80`

• `BLOSUM85`

• `BLOSUM90`

• `BLOSUM100`

• `PAM10`

• `PAM20`

• `PAM30`

• `PAM40`

• `PAM50`

• `PAM60`

• `PAM70`

• `PAM80`

• `PAM90`

• `PAM100`

• `PAM110`

• `PAM120`

• `PAM130`

• `PAM140`

• `PAM150`

• `PAM160`

• `PAM170`

• `PAM180`

• `PAM190`

• `PAM200`

• `PAM210`

• `PAM220`

• `PAM230`

• `PAM240`

• `PAM250`

• `PAM260`

• `PAM270`

• `PAM280`

• `PAM290`

• `PAM300`

• `PAM310`

• `PAM320`

• `PAM330`

• `PAM340`

• `PAM350`

• `PAM360`

• `PAM370`

• `PAM380`

• `PAM390`

• `PAM400`

• `PAM410`

• `PAM420`

• `PAM430`

• `PAM440`

• `PAM450`

• `PAM460`

• `PAM470`

• `PAM480`

• `PAM490`

• `PAM500`

If you need to compile `swalign` into a stand-alone application or software component using MATLAB® Compiler™, use a matrix instead of a character vector or string for `ScoringMatrix`.

Example: `"BLOSUM75"`

Example: `"PAM420"`

Example: `"GONNET"`

Example: `"DAYHOFF"`
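As described above, a numeric scoring matrix carries no scale factor. The following sketch shows one way to pass such a matrix and then rescale the score explicitly. It assumes that `blosum` returns an information structure with a `Scale` field as its second output; verify this against the `blosum` documentation for your release before relying on it.

```matlab
% Assumption: the second output of blosum is a structure with a Scale field.
[M, info] = blosum(50);

% A numeric matrix has no attached scale factor, so this score is in the
% matrix's own units.
rawScore = swalign('HEAGAWGHEE', 'PAWHEAE', 'ScoringMatrix', M);

% Supplying the scale factor through Scale rescales the output score.
scaledScore = swalign('HEAGAWGHEE', 'PAWHEAE', ...
    'ScoringMatrix', M, 'Scale', info.Scale);
```

Passing the matrix rather than its name is also the form suggested above for code compiled with MATLAB Compiler.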

`Scale`: The scale factor applied to the output score, which controls the units of the output score, specified as any positive value.

For example, if the output score is initially determined in bits, and you enter `log(2)`, then `swalign` returns the `Score` in nats.

If the `ScoringMatrix` property also specifies a scale factor, then `swalign` uses it first to scale the output score, then applies the provided scale factor to rescale the output score.

Before comparing alignment scores from multiple alignments, ensure the scores are in the same units. You can use the `Scale` property to control the units of the output scores.

Example: `5`

Example: `log(2)`
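The unit conversion is plain arithmetic: one bit equals `log(2)` nats, so multiplying a score in bits by `log(2)` gives nats, and dividing reverses it. A minimal sketch using the nat score reported in the third example (variable names are illustrative only):

```matlab
scoreNats = 6.4694;                   % nat score reported in the Examples section
scoreBits = scoreNats / log(2);       % the same score expressed in bits
backToNats = scoreBits * log(2);      % what 'Scale',log(2) applies to a bit score

fprintf('%.4f nats = %.4f bits\n', scoreNats, scoreBits);
```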

`GapOpen`: The penalty for opening a gap in the alignment, specified as any positive value.

Example: `16`

`ExtendGap`: The penalty for extending a gap using the affine gap penalty scheme, specified as any positive value.

If you specify this value, `swalign` uses the affine gap penalty scheme: it scores the first position of a gap using the `GapOpen` value and each subsequent position of that gap using the `ExtendGap` value. If you do not specify this value, `swalign` scores all gap positions equally, using the `GapOpen` penalty.

Example: `12`
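To make the difference between the two schemes concrete, the sketch below computes the total penalty for gaps of length 1 to 5 under each scheme. The penalty values are hypothetical, chosen only for illustration, and the formulas reflect the description above (first gap position costs `GapOpen`, later positions cost `ExtendGap`) rather than the internal implementation of `swalign`.

```matlab
gapOpen   = 8;     % hypothetical gap open penalty
extendGap = 1;     % hypothetical gap extension penalty
gapLength = 1:5;   % number of consecutive gap positions

% ExtendGap not specified: every gap position costs GapOpen.
linearCost = gapOpen * gapLength;

% ExtendGap specified: GapOpen once, then ExtendGap per additional position.
affineCost = gapOpen + extendGap * (gapLength - 1);

% Rows: gap length, cost without ExtendGap, cost with ExtendGap
disp([gapLength; linearCost; affineCost]);
```

With these values, a long gap is much cheaper under the affine scheme, which favors a few long gaps over many short ones.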

`Showscore`: Controls the display of the scoring space and winning path of the alignment, specified as `true` or `false`.

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (`n1,n2`) coordinate in the scoring space represents the best score for the pairing of subsequences `Seq1(s1:n1)` and `Seq2(s2:n2)`, where `n1` is a position in `Seq1`, `n2` is a position in `Seq2`, `s1` is any position in `Seq1` between `1:n1`, and `s2` is any position in `Seq2` between `1:n2`. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal local alignment. The color of the last point (lower right) of the winning path represents the optimal local alignment score for the two sequences and is the `Score` output returned by `swalign`.

The scoring space visually shows tandem repeats, small segments that potentially align, and partial alignments of domains from rearranged sequences.

Example: `true`
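The scoring space described above can be illustrated with a small dynamic-programming fill in base MATLAB. This is a simplified sketch, not the `swalign` implementation: it uses a hypothetical match/mismatch scheme and a single per-position gap penalty instead of a scoring matrix, and all variable names are illustrative.

```matlab
seq1 = 'HEAGAWGHEE';
seq2 = 'PAWHEAE';
matchScore = 2; mismatchScore = -1; gapPenalty = 2;   % hypothetical values

n1 = numel(seq1);
n2 = numel(seq2);
F = zeros(n1 + 1, n2 + 1);   % first row and column represent empty prefixes

for i = 1:n1
    for j = 1:n2
        if seq1(i) == seq2(j)
            s = matchScore;
        else
            s = mismatchScore;
        end
        % Best score of any local alignment ending at (i, j); the 0 lets
        % an alignment restart anywhere, which makes the alignment local.
        F(i+1, j+1) = max([0, ...
            F(i, j) + s, ...              % pair seq1(i) with seq2(j)
            F(i, j+1) - gapPenalty, ...   % seq1(i) aligned to a gap
            F(i+1, j) - gapPenalty]);     % seq2(j) aligned to a gap
    end
end

% The largest entry is the optimal local score; its position marks the
% last point of a winning path.
[bestScore, idx] = max(F(:));
[endRow, endCol] = ind2sub(size(F), idx);
end1 = endRow - 1;   % end position in seq1
end2 = endCol - 1;   % end position in seq2
```

Displaying `F(2:end,2:end)` with `imagesc` gives a heat map comparable in spirit to the scoring space, although the colors will differ from those produced with a real scoring matrix.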

## Output Arguments

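`Score`: Optimal local alignment score, in bits unless rescaled with the `Scale` property or computed from a scoring matrix that has no scale factor.

Example: `8.6667`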

`Alignment`: 3-by-N character array showing the two sequences, `Seq1` and `Seq2`, in the first and third rows, and symbols representing the optimal local alignment between them in the second row.

Example: `['AWGHE'; '|| ||'; 'AW-HE']`
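Because the alignment is an ordinary character array, simple summary statistics can be read directly from its rows. A minimal sketch in base MATLAB, using the alignment from the Examples section (the statistic names are illustrative, not outputs of `swalign`):

```matlab
Alignment = ['AWGHE'; '|| ||'; 'AW-HE'];

nMatches = nnz(Alignment(2, :) == '|');        % exact matches
nSimilar = nnz(Alignment(2, :) == ':');        % related, non-identical pairs
nGaps    = nnz(Alignment([1 3], :) == '-');    % gap characters in either sequence

% Percent identity over the aligned columns
pctIdentity = 100 * nMatches / size(Alignment, 2);
```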

`Start`: 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.

Example: `[3; 2]`
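The starting indices can be combined with the alignment array to recover where the aligned region ends in each sequence. A minimal sketch in base MATLAB, using the values from the third example in the Examples section (`Stop`, `sub1`, and `sub2` are hypothetical names, not outputs of `swalign`):

```matlab
Seq1 = 'HEAGAWGHEE';
Seq2 = 'PAWHEAE';
Alignment = ['AWGHE'; '|| ||'; 'AW-HE'];
Start = [5; 2];

% Count the residues (non-gap characters) each sequence contributes.
len1 = nnz(Alignment(1, :) ~= '-');
len2 = nnz(Alignment(3, :) ~= '-');

% Last aligned position in each sequence
Stop = Start + [len1; len2] - 1;

% The aligned subsequences, without gap characters
sub1 = Seq1(Start(1):Stop(1));
sub2 = Seq2(Start(2):Stop(2));
```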

## References

[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).

[2] Smith, T., and Waterman, M. (1981). Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197.

## Version History

Introduced before R2006a