Split merged paired-end sequences into separate files
outFiles = seqsplitpe(fastqFile) splits merged paired-end sequences from
fastqFile into two separate files. Each sequence is split in the middle.
The first half of the sequence is saved in the first output file and the
second half in the second output file. By default, each output file name
consists of the input file name appended with the suffix '_1' or '_2'
before the file extension.
outFiles = seqsplitpe(fastqFile,Name,Value) uses additional options
specified by one or more Name,Value pair arguments.
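The "split in the middle" step can be pictured with a minimal sketch on a single read. This is only an illustration of the idea, not the function's actual implementation: the sequence and quality strings are made up, and the sketch assumes an even-length read (how odd-length reads are handled is not stated here).

```matlab
% Hypothetical merged read: two 8-base mates joined end to end.
seq  = 'ACGTACGTTTGGCCAA';   % illustrative sequence (not from a real file)
qual = 'IIIIIIIIHHHHHHHH';   % illustrative quality string, same length

mid = numel(seq) / 2;        % midpoint; assumes an even-length read

read1 = seq(1:mid);          % first half, destined for the first output file
read2 = seq(mid+1:end);      % second half, destined for the second output file
qual1 = qual(1:mid);         % quality scores are split at the same position
qual2 = qual(mid+1:end);

fprintf('%s | %s\n', read1, read2);
```

Splitting the quality string at the same index keeps each base paired with its score in both halves.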
Split each of the paired-end sequences in half, and store each half in separate output files.
[outFiles, N] = seqsplitpe('SXX123456_merged.fastq');
Check the number of sequences in each output file.
N = 2×1

    50
    50
fastqFile— Names of FASTQ files with sequence and quality information
Names of FASTQ files with sequence and quality information, specified as a character vector, string, string vector, or cell array of character vectors.
Specify optional comma-separated pairs of Name,Value arguments. Name is
the argument name and Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'OutputSuffix','PairedEnd_split' specifies to use the custom suffix in the output file names.
'OutputDir'— Relative or absolute path to output file directory
Relative or absolute path to the output file directory, specified as a character vector or string. The default is the current directory.
'OutputSuffix'— Custom suffix to use in output file names
'' (default) | character vector | string
Custom suffix to use in the output file names, specified as a character vector or string. It
is inserted after the input file name and before the suffix '_1' or
'_2'. The default is '' (an empty character vector), which adds no custom suffix.
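To make the naming rule concrete, the following sketch builds the expected output names by hand. It is an assumption-laden illustration: it takes the description literally (custom suffix placed directly after the input file name, then '_1' or '_2', then the extension, with no extra separator), and the variable names are hypothetical. The actual names are whatever seqsplitpe returns in outFiles.

```matlab
inFile       = 'SXX123456_merged.fastq';   % input name from the example above
customSuffix = 'PairedEnd_split';          % value passed as 'OutputSuffix'

% Separate the base name from the extension.
[~, baseName, ext] = fileparts(inFile);

% Assumed layout: <input name><custom suffix><_1 or _2><extension>
outName1 = [baseName customSuffix '_1' ext];
outName2 = [baseName customSuffix '_2' ext];

disp(outName1)
disp(outName2)
```

With an empty suffix (the default), the same construction reduces to the input name followed by '_1' or '_2' and the extension.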
'UseParallel'— Boolean indicating whether to perform computation in parallel
Boolean indicating whether to perform computation in parallel, specified as true or false.
For parallel computing, you must have Parallel Computing Toolbox™. If a parallel pool does not exist, one is created automatically when the auto-creation option is enabled in your parallel preferences. Otherwise, computation runs in serial mode.
There is a cost associated with sharing large input files across workers in a distributed environment. In some cases, running in parallel may not be beneficial in terms of performance.
outFiles— Output file names
Output file names, returned as a cell array of character vectors.
By default, the name of each output file consists of the input file
name appended with the suffix '_1' or '_2' before
the file extension.
N— Number of sequences saved in each output file
Number of sequences saved in each output file, returned as an n-by-1 vector,
where n is the number of output files. If there
are multiple output files, the order within N corresponds
to the order of the output files.
To run in parallel, set 'UseParallel' to true.
For more information, see the
'UseParallel' name-value pair argument.
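A call with parallel execution requested might look like the sketch below. It assumes Parallel Computing Toolbox is installed and that the example file SXX123456_merged.fastq is on the path. The isfile guard and the useParallel variable are illustrative additions, not part of the function's interface.

```matlab
inFile      = 'SXX123456_merged.fastq';   % file name from the example above
useParallel = true;                       % request parallel computation

if isfile(inFile)
    % Split the merged reads, using a parallel pool if one is available.
    [outFiles, N] = seqsplitpe(inFile, 'UseParallel', useParallel);
    disp(outFiles)
    disp(N)
else
    % Guard added for this sketch only: skip the call when the file is missing.
    fprintf('Input file %s not found; nothing was split.\n', inFile);
end
```

Because sharing a large input file across workers has a cost, it can be worth timing the call with 'UseParallel' set to false as well before settling on either mode.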