View the project on GitHub. jakob-schuster/matchbox

Manipulating read metadata

Read fields

Reads have several fields that store metadata. Which fields are available depends on the type of input reads you provide.

Fields by input format:

FASTA (.fa, .fasta)
    seq: Str           The sequence of the read.
    id: Str            The read ID. Everything from > to the first whitespace in the description line.
    desc: Str          The description line. Everything after the read ID.

FASTQ (.fq, .fastq)
    seq: Str           The sequence of the read.
    id: Str            The read ID. Everything from @ to the first whitespace in the description line.
    desc: Str          The description line. Everything after the read ID.
    qual: Str          The quality string.

SAM (.sam, .bam)
    id | qname: Str    The read ID. Defaults to '*' if unavailable.
    flag: Num          Combination of bitwise flags.
    rname: Str         Name of the reference sequence to which the read is aligned. Defaults to '*' if unavailable or unmapped.
    pos: Num           1-based leftmost mapping position of the first CIGAR operation that consumes a reference base. Defaults to 0 if unmapped.
    mapq: Num          Mapping quality.
    cigar: Str         CIGAR string.
    rnext: Str         Reference sequence name of the primary alignment of the next read in the template. Defaults to '*' if unavailable or unmapped.
    pnext: Num         1-based position of the primary alignment of the next read in the template. Defaults to 0 if unavailable.
    tlen: Num          Signed observed template length. Defaults to 0 if unavailable.
    seq: Str           The sequence of the read.
    qual: Str          The quality string.
    desc: Str          The optional fields, as a single string.

Paired (any formats)
    r1: Read           The first read, from the file given as the mandatory argument.
    r2: Read           The second read, from the file given via the --paired-with optional argument.
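To make the id / desc split for FASTA and FASTQ reads concrete, here is a minimal Python sketch of the rule described above. It is an illustration only, not matchbox's own code: the helper name split_header is hypothetical, and it assumes the leading > or @ marker is not part of the ID.

```python
from typing import Tuple


def split_header(line: str) -> Tuple[str, str]:
    """Split a FASTA/FASTQ header line into (id, desc).

    Assumption: the leading '>' or '@' marker is dropped from the ID.
    """
    body = line.rstrip("\n")[1:]      # drop the newline and the leading marker
    parts = body.split(None, 1)       # split once, at the first run of whitespace
    read_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return read_id, desc


print(split_header(">read1 sample=A length=150"))
# ('read1', 'sample=A length=150')
print(split_header("@read2"))
# ('read2', '')
```

A header with no whitespace yields an empty desc in this sketch; how matchbox itself represents a missing description is not stated here.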

Reverse-complementing

Reads can be reverse-complemented. For FASTA reads, this reverse-complements the sequence. For FASTQ and SAM reads, it also reverses the quality string, so that each base still lines up with its quality character.

# reverse complement each read
-read.out!('reversed.fq')
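The effect on the sequence and quality string can be sketched in Python. This is an illustration of the behaviour described above, not matchbox's implementation; the function name reverse_complement and the handling of N and lowercase bases are assumptions.

```python
# Complement table for the common DNA alphabet (N and lowercase are assumptions).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq, qual=None):
    """Return (reverse-complemented seq, reversed qual).

    qual is None for FASTA-style reads, which carry no quality string.
    """
    rc_seq = seq.translate(_COMPLEMENT)[::-1]            # complement, then reverse
    rc_qual = qual[::-1] if qual is not None else None   # reverse only, no complement
    return rc_seq, rc_qual


print(reverse_complement("AACGT", "IIH#5"))
# ('ACGTT', '5#HII')
```

In the output, the final T (the complement of the original first A) is paired with I, the quality character that belonged to that first base.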

Tagging

Information such as a UMI, a barcode, or any other data derived from processing with matchbox can be added to the description line of a read.

# tag each read with its length
read.tag('length={read.seq.len()}')
Note: The string given to tag is appended to the read's desc, separated by a space. To change this prefix (for example, when adding an optional field to a SAM read, where fields are tab-delimited), set the optional argument prefix.
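The appending rule can be sketched in Python as plain string concatenation. This models only the behaviour stated above; the function below is a hypothetical stand-in for matchbox's tag, and what matchbox does when desc is empty is not covered by this page.

```python
def tag(desc, text, prefix=" "):
    """Append text to desc, joined by prefix (a single space by default)."""
    return desc + prefix + text


seq = "ACGTACGT"

# Default: space-separated, as for FASTA/FASTQ description lines.
print(tag("sample=A", f"length={len(seq)}"))
# sample=A length=8

# Tab prefix, as would suit SAM optional fields (BC:Z:ACGT is an example value).
print(repr(tag("NM:i:0", "BC:Z:ACGT", prefix="\t")))
# 'NM:i:0\tBC:Z:ACGT'
```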