

Split merged paired-end sequences into separate files



seqsplitpe(fastqFile) splits merged paired-end sequences from fastqFile into two separate files. Each sequence is split at its midpoint: the first half is saved in the first output file and the second half in the second output file. By default, each output file name is the input file name with the suffix '_1' or '_2' inserted before the file extension.
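The midpoint split described above can be illustrated on a single read with plain character-vector indexing. This is only a sketch of the idea, not the function's implementation: the read and its quality string are made-up values, an even read length is assumed, and any other processing that seqsplitpe may apply to either half is not modeled.

```matlab
% Hypothetical merged read: two 6-base mates stored back to back.
mergedSeq  = 'ACGTACTTGCAA';
mergedQual = 'IIIIIIHHHHHH';

% Midpoint of the read (an even length is assumed in this sketch).
half = numel(mergedSeq) / 2;

% First half goes to the '_1' file, second half to the '_2' file.
mate1Seq  = mergedSeq(1:half);
mate2Seq  = mergedSeq(half+1:end);
mate1Qual = mergedQual(1:half);
mate2Qual = mergedQual(half+1:end);

fprintf('mate 1: %s  %s\n', mate1Seq, mate1Qual);
fprintf('mate 2: %s  %s\n', mate2Seq, mate2Qual);
```

The quality string is cut at the same index as the sequence so that each base keeps its own quality score.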


seqsplitpe(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments.


[outFiles,N] = seqsplitpe(___) returns the names of the output files in a cell array outFiles. N is a vector containing the number of sequences saved in each output file.



Split each of the paired-end sequences in half, and store each half in separate output files.

[outFiles, N] = seqsplitpe('SXX123456_merged.fastq');

Check the number of sequences in each output file.

N = 2×1


Input Arguments


fastqFile: Names of FASTQ files with sequence and quality information, specified as a character vector, string, string vector, or cell array of character vectors.

Example: 'SRR005164_1_50.fastq'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'OutputSuffix','PairedEnd_split' specifies to use the custom suffix in the output file names.

'OutputDir': Relative or absolute path to the output file directory, specified as a character vector or string. The default is the current directory.

Example: 'OutputDir','F:\results'

'OutputSuffix': Custom suffix to use in the output file names, specified as a character vector or string. It is inserted after the input file name and before the suffix '_1' or '_2'. The default is ''.

Example: 'OutputSuffix','_MisMatches2'
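As a rough illustration of the naming rule, the following sketch builds the file names one would expect from the description above, using only fileparts and fullfile. The output directory 'results' is a hypothetical value, and whether seqsplitpe returns full paths or bare file names in outFiles is not stated here, so treat the result as an expectation to compare against rather than a guaranteed match.

```matlab
inFile       = 'SXX123456_merged.fastq';   % input name from the earlier example
customSuffix = '_MisMatches2';             % value passed as 'OutputSuffix'
outDir       = 'results';                  % hypothetical 'OutputDir' value

% Separate the base name from the extension.
[~, baseName, ext] = fileparts(inFile);

% Custom suffix first, then '_1' or '_2', then the original extension.
name1 = [baseName customSuffix '_1' ext];
name2 = [baseName customSuffix '_2' ext];

expectedFiles = {fullfile(outDir, name1); fullfile(outDir, name2)};
disp(expectedFiles)
```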

'UseParallel': Boolean indicating whether to perform computation in parallel, specified as true or false.

For parallel computing, you must have Parallel Computing Toolbox™. If a parallel pool does not exist, one is created automatically when the auto-creation option is enabled in your parallel preferences. Otherwise, computation runs in serial mode.


There is a cost associated with sharing large input files across workers in a distributed environment. In some cases, running in parallel may not be beneficial in terms of performance.

Example: 'UseParallel',true
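One way to use this option is to request parallel mode only when a Parallel Computing Toolbox license is available. The sketch below assumes that the file from the earlier example is in the current folder and skips the call otherwise. The license check is an optional convenience chosen for this example, not something seqsplitpe requires.

```matlab
fastqFile = 'SXX123456_merged.fastq';   % file name from the earlier example

% Request parallel mode only if a Parallel Computing Toolbox license is available.
hasPCT = license('test', 'Distrib_Computing_Toolbox') == 1;

if isfile(fastqFile)
    [outFiles, N] = seqsplitpe(fastqFile, 'UseParallel', hasPCT);
    disp(outFiles)
    disp(N)
else
    % Nothing to split; avoid an error when the example file is missing.
    fprintf('%s not found; skipping the split.\n', fastqFile);
end
```

For small inputs, the cost of starting a pool and sharing the file across workers may outweigh any gain, so timing both modes on representative data is a reasonable first step.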

Output Arguments


outFiles: Output file names, returned as a cell array of character vectors. By default, the name of each output file consists of the input file name with the suffix '_1' or '_2' inserted before the file extension.

N: Number of sequences saved in each output file, returned as an n-by-1 vector where n is the number of output files. If there are multiple output files, the order within N corresponds to the order of the output files.
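Because N follows the order of outFiles, the two outputs can be reported side by side. The sketch below uses hypothetical stand-in values instead of a real call, so the file names and counts are illustrative only. The final check assumes that every merged read contributes one half to each file, in which case the counts for a single input should be equal.

```matlab
% Hypothetical values standing in for the outputs of seqsplitpe.
outFiles = {'SXX123456_merged_1.fastq'; 'SXX123456_merged_2.fastq'};
N        = [250; 250];

% One count is expected per output file.
assert(numel(N) == numel(outFiles), 'Expected one count per output file.');

% Report each file next to its sequence count.
for k = 1:numel(outFiles)
    fprintf('%s: %d sequences\n', outFiles{k}, N(k));
end

% Sanity check under the assumption stated above.
countsMatch = all(N == N(1));
```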


Introduced in R2016b