seqsplitpe

Split merged paired-end sequences into separate files

Syntax

seqsplitpe(fastqFile)

seqsplitpe(___,Name,Value)

[outFiles,N] = seqsplitpe(___)

Description

seqsplitpe(fastqFile) splits the merged paired-end sequences in fastqFile into two separate files. Each sequence is split in the middle: the first half is saved in the first output file and the second half in the second output file. By default, each output file name consists of the input file name with the suffix '_1' or '_2' appended before the file extension.
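The split itself can be pictured on a single in-memory read. This is an illustration of the documented behavior, not the seqsplitpe implementation. The read and quality strings are made up, and this page does not say how odd-length reads are handled or whether quality scores are cut at the same position, so the sketch uses an even-length read and assumes the quality string is split at the same point to keep each half a valid FASTQ record.

```matlab
% Illustration only: emulate the middle split on one hypothetical read.
% Assumes an even-length read and that quality scores split at the same point.
seq  = 'ACGTACGTTTGGCCAA';   % merged read (hypothetical data)
qual = 'IIIIHHHHGGGGFFFF';   % matching quality string (hypothetical data)

mid   = numel(seq) / 2;      % split point in the middle of the read
read1 = seq(1:mid);          % first half, destined for the '_1' file
read2 = seq(mid+1:end);      % second half, destined for the '_2' file
qual1 = qual(1:mid);         % quality scores for the first half
qual2 = qual(mid+1:end);     % quality scores for the second half
```

Concatenating read1 and read2 restores the original merged read.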


seqsplitpe(___,Name,Value) specifies additional options using one or more name-value arguments.


[outFiles,N] = seqsplitpe(___) also returns the names of the output files in the cell array outFiles and the number of sequences saved in each output file in the vector N.
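A self-contained sketch of using both outputs, assuming Bioinformatics Toolbox is installed. It writes a two-read merged FASTQ file with made-up reads (the name demo_merged.fastq is hypothetical) to a fresh temporary folder, splits it with OutputDir pointing at that folder so nothing is written to the current folder, and prints each output file name next to its sequence count.

```matlab
% Create a scratch folder and a small merged FASTQ file (hypothetical data).
outDir = tempname;
mkdir(outDir);
inFile = fullfile(outDir, 'demo_merged.fastq');
fid = fopen(inFile, 'w');
fprintf(fid, '@read1\nACGTACGTTTGGCCAA\n+\nIIIIHHHHGGGGFFFF\n');
fprintf(fid, '@read2\nGGGGCCCCAAAATTTT\n+\nIIIIIIIIIIIIIIII\n');
fclose(fid);

% Split the reads and report how many sequences went to each output file.
[outFiles, N] = seqsplitpe(inFile, 'OutputDir', outDir);
for k = 1:numel(outFiles)
    fprintf('%s: %d sequences\n', outFiles{k}, N(k));
end
```

Because every merged read contributes one half to each output file, both entries of N should equal the number of reads in the input, which is 2 here.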


Examples


Split merged paired-end sequences into separate files


Split each merged paired-end sequence in half, and save the two halves in separate output files.

[outFiles, N] = seqsplitpe('SXX123456_merged.fastq');

Check the number of sequences in each output file.

N

Input Arguments


`fastqFile` — Names of FASTQ files with sequence and quality information
character vector | string | string vector | cell array of character vectors

Names of FASTQ files with sequence and quality information, specified as a character vector, string, string vector, or cell array of character vectors.
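Because fastqFile accepts several names, you can split multiple merged files in one call. The sketch below, which assumes Bioinformatics Toolbox, writes two one-read files with hypothetical names (lane1_merged.fastq and lane2_merged.fastq) and passes them as a cell array; string(inFiles) would pass the same names as a string vector. It assumes each input produces its own pair of output files, which this page does not state explicitly, so the check only confirms that outFiles and N have matching lengths.

```matlab
% Write two small merged FASTQ files with hypothetical reads.
outDir = tempname;
mkdir(outDir);
inFiles = {fullfile(outDir, 'lane1_merged.fastq'), ...
           fullfile(outDir, 'lane2_merged.fastq')};
reads = {'ACGTACGTTTGGCCAA', 'GGGGCCCCAAAATTTT'};
for k = 1:numel(inFiles)
    fid = fopen(inFiles{k}, 'w');
    % One record per file; quality is a constant 'I' for every base.
    fprintf(fid, '@read%d\n%s\n+\n%s\n', k, reads{k}, ...
            repmat('I', 1, numel(reads{k})));
    fclose(fid);
end

% Split all files in one call; N has one entry per output file.
[outFiles, N] = seqsplitpe(inFiles, 'OutputDir', outDir);
```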

Example: 'SRR005164_1_50.fastq'

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'OutputSuffix','PairedEnd_split' inserts the custom suffix PairedEnd_split into the output file names.

`OutputDir` — Relative or absolute path to output file directory
character vector | string

Relative or absolute path to the output file directory, specified as a character vector or string. The default is the current directory.

Example: 'OutputDir','F:\results'

`OutputSuffix` — Custom suffix to use in output file names
`''` (default) | character vector | string

Custom suffix to use in the output file names, specified as a character vector or string. It is inserted after the input file name and before the suffix '_1' or '_2'. The default is ''.

Example: 'OutputSuffix','_MisMatches2'
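The naming rule described above can be reproduced with fileparts to predict the output names. This is a hypothetical illustration of the documented rule, not code that seqsplitpe needs; it uses the example file name and suffix from this page and does not include any OutputDir path.

```matlab
% Predict output file names from the documented rule:
%   <input name><OutputSuffix>_1<ext> and <input name><OutputSuffix>_2<ext>
inFile = 'SRR005164_1_50.fastq';
suffix = '_MisMatches2';               % value passed as OutputSuffix

[~, name, ext] = fileparts(inFile);    % 'SRR005164_1_50' and '.fastq'
defaultNames = {[name '_1' ext], [name '_2' ext]};               % OutputSuffix = ''
customNames  = {[name suffix '_1' ext], [name suffix '_2' ext]}; % custom suffix
disp(defaultNames)
disp(customNames)
```

Because the suffix is inserted verbatim, include a leading underscore (as in '_MisMatches2') if you want it separated from the input file name.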

`UseParallel` — Option to perform computations in parallel
`"off"` (default) | `"auto"` | `"on"`

Option to perform computations in parallel using a parallel pool of workers, specified as one of these values:

"off" — Run in serial on the MATLAB® client.
"auto" — Use a parallel pool if one is open or if MATLAB can automatically create one. If a parallel pool is not available, run in serial on the MATLAB client.
"on" — Use a parallel pool if one is open or if MATLAB can automatically create one. If a parallel pool is not available, throw an error.

If you do not have a parallel pool open and automatic pool creation is enabled, MATLAB opens a pool using the default cluster profile. To use a parallel pool to run computations in MATLAB, you must have Parallel Computing Toolbox™.

Before R2026a: You can specify this argument as true or false only. The default value is false. To run computations in parallel, set this argument to true.

Note

Sharing large input files across workers in a distributed environment adds overhead, so running in parallel does not always improve performance.

Example: 'UseParallel',"auto"
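A minimal sketch of the "auto" setting, assuming R2026a or later with Bioinformatics Toolbox. It writes a one-read merged FASTQ file with made-up data to a temporary location and writes the outputs to tempdir. Per the option descriptions above, "auto" falls back to serial execution on the MATLAB client when no parallel pool is available, so the call should also run without Parallel Computing Toolbox.

```matlab
% Small merged FASTQ file with one hypothetical read.
inFile = [tempname '.fastq'];
fid = fopen(inFile, 'w');
fprintf(fid, '@read1\nACGTACGTTTGGCCAA\n+\nIIIIHHHHGGGGFFFF\n');
fclose(fid);

% "auto": use a parallel pool if one is open or can be created
% automatically; otherwise run in serial on the MATLAB client.
[outFiles, N] = seqsplitpe(inFile, 'OutputDir', tempdir, ...
                           'UseParallel', "auto");
```

Use "on" instead when the code must fail rather than silently run in serial.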

Output Arguments


`outFiles` — Output file names
cell array of character vectors

Output file names, returned as a cell array of character vectors. By default, the name of each output file consists of the input file name appended with a suffix '_1' or '_2' before the file extension.

`N` — Number of sequences saved in each output file
vector

Number of sequences saved in each output file, returned as an n-by-1 vector, where n is the number of output files. The order of the elements in N matches the order of the file names in outFiles.

Extended Capabilities


Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

seqsplitpe has automatic parallel support.

To run computations in parallel, set the UseParallel argument to "on" or "auto".

Version History

Introduced in R2016b


R2026a: Enhanced control over parallel execution with `UseParallel` argument

The UseParallel name-value argument now accepts "off", "auto", or "on" instead of true or false. This change gives you more control over when to use a parallel pool for parallel execution.

Specifying the UseParallel argument as true or false is not recommended.

This table shows how to update your code depending on your goal.

| Goal | Not recommended | Recommended |
| --- | --- | --- |
| Write code that runs on the MATLAB client. | `seqsplitpe(fastqFile,UseParallel=false)` | `seqsplitpe(fastqFile,UseParallel="off")` (default) |
| Write portable code that runs on a parallel pool and, if a pool is not available, runs on the MATLAB client. | `seqsplitpe(fastqFile,UseParallel=true)` | `seqsplitpe(fastqFile,UseParallel="auto")` |
| Write code that runs on a parallel pool and errors if a pool is not available. | N/A | `seqsplitpe(fastqFile,UseParallel="on")` |

There are no plans to remove support for true or false values.
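Code that must run in releases both before and after R2026a can choose the UseParallel value at run time. A minimal sketch, assuming the isMATLABReleaseOlderThan function (available since R2020b); the variable name parallelOpt is hypothetical.

```matlab
% Pick a UseParallel value that is valid in the running release.
if isMATLABReleaseOlderThan("R2026a")
    parallelOpt = true;     % before R2026a, only true or false is accepted
else
    parallelOpt = "auto";   % R2026a and later: pool if available, else client
end
% Then call, for example: seqsplitpe(fastqFile, 'UseParallel', parallelOpt)
```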
