View the project on GitHub. jakob-schuster/matchbox
In matchbox, each variable has a type, and each function has a type signature, only accepting arguments of the expected types. This allows matchbox to catch errors when the program starts, and offer helpful error messages. Most programming languages have types, although not all languages enforce them.
Generally, matchbox users don't need to worry about learning the types. However, understanding them may help you interpret error messages and put together more complex scripts.
Bool
A Boolean value. Either true or false.
a = true
b = false
# c will be true
c = a or b
# d will be false
d = a and b
if a => {
# this will get printed
stdout!('a was true')
}
if d and c => {
# this will not get printed!
stdout!('d and c were both true')
}
Num
A numeric value. Can be negative, can include a decimal component. Represented internally as a 32-bit floating point number.
a = 10000
# b will be -1000
b = a / -10
# c will be -999.99
c = b + 0.01
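If Num is stored as a 32-bit float, as stated above, then a value such as -999.99 is held only approximately, and not every integer above 2^24 can be represented. The Python sketch below (an analogy, not matchbox code) illustrates this with a hypothetical helper, to_f32, which rounds a value to IEEE 754 single precision. How matchbox rounds numbers when printing them is not covered here.

```python
import struct

def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

a = to_f32(10000)      # small integers are stored exactly
b = to_f32(a / -10)    # -1000 is also exact
c = to_f32(b + 0.01)   # -999.99 has no exact 32-bit representation

print(a == 10000, b == -1000)         # both comparisons hold
print(c == -999.99)                   # False: c is the nearest representable value
print(abs(c - (-999.99)) < 1e-4)      # True: the rounding error is tiny

# above 2**24, neighbouring integers start to collapse onto the same value
print(to_f32(2**24 + 1) == 2**24)
```

In practice this only matters for very large counts or when comparing decimal values for exact equality.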
Str
A string value, consisting of a sequence of ASCII characters. May represent a nucleotide sequence, or more generally any string data.
Strings must be constructed using single quotes; for example, 'hello world'. Values can be inserted into strings using curly braces {}.
message = 'read named {read.id} is {read.seq.len()} bases'
stdout!(message)
[]
A list of values all belonging to the same type.
Lists can be iterated over during pattern matching, which is useful when working with barcodes.
# a list of records derived from the CSV header
# in this case, with type [{ name: Str, seq: Str }]
refs = csv('references.csv')
# within a pattern, you can iterate
# over all the values in a list
if read is [_ r.seq _] for r in refs =>
r.name |> stdout()
{}
A record: a set of fields, each of which has a name and stores a value. Records are used for grouping related data together. Fields can be accessed using a dot.
# creating a record, assigning values to some fields
rec = {
primer = AAGTCGATGCTAGTG,
output = 'out.fq',
}
# accessing a field of a record with a dot
if read is [_ rec.primer _] => read.out!(rec.output)
Read
Reads are a special kind of record that represents a sequencing read. Reads contain a seq field, and they can be sliced, reverse-complemented and pattern-matched. Different input formats will produce different kinds of Read. See Manipulating read metadata.
Only the above types are relevant to matchbox users; the remaining types are only really relevant to those working on the matchbox backend. Don't worry if they seem confusing!
(..) -> ..
A function, which can be applied to a sequence of arguments to return a value. Each function expects arguments of a particular type, and guarantees a return value of a particular type.
A function of type (Num, Num) -> Str takes two arguments of type Num and returns a value of type Str.
An example of such a function is:
f = (n1: Num, n2: Num) => '{n1} and {n2}'
Functions can also have optional parameters. A function of the type (Num, opt_arg: Num = 0) -> Num has one mandatory positional argument, and one optional argument called opt_arg, of type Num, with the default value 0.
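As a rough analogy in Python (not matchbox syntax), a function of type (Num, opt_arg: Num = 0) -> Num behaves much like a function with one positional parameter and one named parameter that has a default. The name add_offset is hypothetical, and how matchbox itself spells such a definition or call is not shown here.

```python
def add_offset(n: float, *, opt_arg: float = 0) -> float:
    """One mandatory positional argument, one optional named argument."""
    return n + opt_arg

# opt_arg omitted: it falls back to its default value, 0
print(add_offset(5))

# opt_arg supplied by name
print(add_offset(5, opt_arg=2))
```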
For two function types to be equivalent:
Any
A value of any type. Concrete values will always have a more specific type than Any; it merely exists so that some functions can have sufficiently generic type signatures. For example, to_str can convert any value to a Str regardless of its type, so its type is (Any) -> Str.
Type
A value which is itself a type. All instances of the above types (e.g. Str, { age: Num }, (Bool, Bool) -> Bool) are themselves values of this type. Hence, you can create type aliases by assigning a type to a variable:
# aliasing a list of Num type as NumList
NumList = [Num]
Users probably shouldn't worry about this! But treating types as values is helpful in writing functions whose types depend on the evaluation of other functions.
For example, the function csv_ty takes a filename, opens it as a CSV, and returns a record type with each of the CSV's columns as a Str field. Therefore, csv_ty has the type (Str) -> Type.
# header_ty has type Type
# and its value is { first_name: Str, last_name: Str }
# (because those are the headers in this particular CSV)
header_ty = csv_ty('friends.csv')
The csv function loads a CSV and produces a list of records, where the type of each record depends on executing csv_ty on the file. Hence, the return type of csv is [csv_ty(filename)], which evaluates to a concrete type in the presence of a specific filename.
# rows has type [{ first_name: Str, last_name: Str }]
# (the result from evaluating csv_ty on this filename,
# wrapped in a list)
rows = csv('friends.csv')
# this is an error, because we know each row
# does NOT include a 'seq' field
if read is [_ row.seq _] for row in rows =>
read |> file('trimmed.fq')
We do this so that, if a user loads a CSV and tries to access a field which doesn't exist in their particular CSV, matchbox can give them a type error.
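The idea can be sketched in Python (an analogy, not the matchbox implementation): derive a record type from the CSV header first, then check every field access against it before any row is processed. To stay self-contained, this csv_ty takes the CSV text rather than a filename. The helper check_field and the sample data are hypothetical.

```python
import csv
import io

def csv_ty(text: str) -> dict[str, type]:
    """Derive a record type from the CSV header: every column becomes a str field."""
    header = next(csv.reader(io.StringIO(text)))
    return {name: str for name in header}

def check_field(record_ty: dict[str, type], field: str) -> type:
    """Return the field's type, or fail up front if the record type lacks it."""
    if field not in record_ty:
        raise TypeError(f"record has no field '{field}'; fields are {sorted(record_ty)}")
    return record_ty[field]

# stand-in for the contents of friends.csv (made-up data)
FRIENDS = "first_name,last_name\nAda,Lovelace\n"

row_ty = csv_ty(FRIENDS)

# accessing a column that exists is fine
print(check_field(row_ty, "first_name"))

# accessing 'seq' is rejected from the header alone, without reading any rows
try:
    check_field(row_ty, "seq")
except TypeError as err:
    print("type error:", err)
```

The point of the design is that the check needs only the header, so a misspelled or missing column can be reported before any reads are processed.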