Manipulating read metadata
Read fields
Reads have several fields that store metadata. Which fields are available depends on the type of input reads you provide.
| Input format | Field | Type | Description |
|---|---|---|---|
| FASTA (`.fa`, `.fasta`) | `seq` | Str | The sequence of the read. |
| | `id` | Str | The read ID: everything between `>` and the first whitespace in the description line. |
| | `desc` | Str | The description: the rest of the description line after the read ID. |
| FASTQ (`.fq`, `.fastq`) | `seq` | Str | The sequence of the read. |
| | `id` | Str | The read ID: everything between `@` and the first whitespace in the description line. |
| | `desc` | Str | The description: the rest of the description line after the read ID. |
| | `qual` | Str | The quality string. |
| SAM (`.sam`, `.bam`) | `id` or `qname` | Str | The read ID. Defaults to `*` if unavailable. |
| | `flag` | Num | Bitwise combination of flags. |
| | `rname` | Str | Name of the reference sequence to which the read is aligned. Defaults to `*` if unavailable or unmapped. |
| | `pos` | Num | 1-based leftmost mapping position of the first CIGAR operation that consumes a reference base. Defaults to 0 if unmapped. |
| | `mapq` | Num | Mapping quality. |
| | `cigar` | Str | CIGAR string. |
| | `rnext` | Str | Reference sequence name of the primary alignment of the next read in the template. Defaults to `*` if unavailable or unmapped. |
| | `pnext` | Num | 1-based position of the primary alignment of the next read in the template. Defaults to 0 if unavailable. |
| | `tlen` | Num | Signed observed template length. Defaults to 0 if unavailable. |
| | `seq` | Str | The sequence of the read. |
| | `qual` | Str | The quality string. |
| | `desc` | Str | The optional fields, as a single string. |
| Paired (any format) | `r1` | Read | The first read, from the file given as the mandatory argument. |
| | `r2` | Read | The second read, from the file given via the optional `--paired-with` argument. |
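The `id`/`desc` split and the numeric `flag` field can be illustrated outside matchbox. The Python sketch below is not matchbox code. `split_header` and `decode_flag` are hypothetical helpers. The sketch assumes the whitespace separating the ID from the description is not kept in `desc`, and it names only a few of the FLAG bits defined in the SAM specification.

```python
from typing import Dict, Tuple

def split_header(line: str) -> Tuple[str, str]:
    """Split a FASTA ('>') or FASTQ ('@') header line into (id, desc)."""
    body = line.rstrip("\n")[1:]    # drop the leading '>' or '@'
    parts = body.split(maxsplit=1)  # split once, at the first whitespace run
    read_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return read_id, desc

# A subset of the FLAG bits defined in the SAM specification.
SAM_FLAG_BITS = {
    "paired": 0x1,
    "unmapped": 0x4,
    "reverse_complemented": 0x10,
    "secondary": 0x100,
    "supplementary": 0x800,
}

def decode_flag(flag: int) -> Dict[str, bool]:
    """Report which of the selected bits are set in a SAM flag value."""
    return {name: (flag & bit) != 0 for name, bit in SAM_FLAG_BITS.items()}

print(split_header(">read1 length=150 sample=A"))  # ('read1', 'length=150 sample=A')
print(decode_flag(16)["reverse_complemented"])      # True
```

Testing a single bit with `&` is how a combined flag value such as 2064 (0x800 + 0x10) can be read as "supplementary and reverse-complemented".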
Reverse-complementing
Reads can be reverse-complemented. For FASTA reads, this reverse-complements the sequence. For FASTQ and SAM reads, it also reverses the quality string, so that each base still lines up with its quality character.
```
# reverse complement each read
-read.out!('reversed.fq')
```
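Reversing the quality string keeps base/quality pairs aligned. The minimal Python sketch below shows why. It is not matchbox's implementation, and `reverse_complement` is a hypothetical name. As an assumption of the sketch, only A, C, G, T and N are complemented. Other characters are left as-is and are only moved by the reversal.

```python
from typing import Optional, Tuple

# Complement table for A, C, G, T and N in either case (assumption: other
# IUPAC codes are not complemented by this sketch, only reversed).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str, qual: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Reverse-complement seq and, if given, reverse qual to keep pairs aligned."""
    rc_seq = seq.translate(_COMPLEMENT)[::-1]
    rev_qual = qual[::-1] if qual is not None else None  # FASTA reads have no qualities
    return rc_seq, rev_qual

print(reverse_complement("AACGT", "ABCDE"))  # ('ACGTT', 'EDCBA')
```

In the example, the first base `A` has quality `A`. After the operation, its complement `T` is the last base and is still paired with quality `A` at the end of the reversed string.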
Tagging
Information such as a UMI, a barcode, or any other data derived from processing with matchbox can be added to the description line of reads.
```
# tag each read with its length
read.tag('length={read.seq.len()}')
```
Note: the string given to `tag` is appended to the read's `desc`, preceded by a space. To use a different separator, set the optional `prefix` argument. For example, when adding an optional field to a SAM read, the separator should be a tab, because SAM optional fields are tab-delimited.
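The append-with-prefix behaviour described above can be mimicked in a few lines of Python. This sketch is not matchbox's API. The `tag` function here is hypothetical. The sketch assumes the prefix is always inserted, because how matchbox handles an empty `desc` is not described here. `XL` is likewise a made-up tag name. The SAM specification reserves tags beginning with X, Y or Z for end users.

```python
def tag(desc: str, text: str, prefix: str = " ") -> str:
    """Append text to desc, preceded by prefix (a space by default)."""
    return desc + prefix + text

# Tag a FASTQ-style description with the read length.
seq = "ACGTACGT"
print(tag("sample=A", f"length={len(seq)}"))       # sample=A length=8

# Add a user-defined SAM optional field; SAM optional fields are tab-separated.
print(repr(tag("NM:i:0", "XL:i:8", prefix="\t")))  # 'NM:i:0\tXL:i:8'
```

Passing the separator as a parameter, rather than hard-coding a space, lets the same operation produce valid SAM optional fields as well as free-form FASTA/FASTQ descriptions.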