View the project on GitHub. jakob-schuster/matchbox

Types

In matchbox, each variable has a type, and each function has a type signature, only accepting arguments of the expected types. This allows matchbox to catch errors when the program starts, and offer helpful error messages. Most programming languages have types, although not all languages enforce them.

Generally, matchbox users don't need to worry about learning the types. However, understanding them may help you interpret error messages and put together more complex scripts.


Boolean Bool

A Boolean value. Either true or false.

a = true
b = false
# c will be true
c = a or b
# d will be false
d = a and b


if a => {
    # this will get printed
    stdout!('a was true')
}

if d and c => {
    # this will not get printed!
    stdout!('d and c were both true')
}


Numeric Num

A numeric value, which can be negative and can include a fractional part. Represented internally as a 32-bit floating-point number.

a = 10000

# b will be -1000
b = a / -10

# c will be -999.99
c = b + 0.01

String Str

A string value, consisting of a sequence of ASCII characters. May represent a nucleotide sequence, or more generally any string data.

Strings must be constructed using single quotes; for example, 'hello world'. Values can be inserted into strings using curly braces {}.

message = 'read named {read.id} is {read.seq.len()} bases'

stdout!(message)

List []

A list of values all belonging to the same type.

Lists can be iterated over during pattern matching, which is useful when working with barcodes.

# a list of records derived from the CSV header
# in this case, with type [{ name: Str, seq: Str }]
refs = csv('references.csv')

# within a pattern, you can iterate
# over all the values in a list
if read is [_ r.seq _] for r in refs =>
    stdout!(r.name)

Record {}

A set of fields, each of which has a name and stores a value. Used for grouping related data together. Fields can be accessed using a dot.

# creating a record, assigning values to some fields
rec = {
    primer = AAGTCGATGCTAGTG,
    output = 'out.fq',
}

# accessing a field of a record with a dot
if read is [_ rec.primer _] => read.out!(rec.output)

Read Read

A Read is a special kind of record that represents a sequencing read.

Reads contain a seq field, and they can be sliced, reverse-complemented and pattern-matched.

Different input formats will produce different kinds of Read. See Manipulating read metadata.
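
As a small sketch of working with a read, using only operations that appear elsewhere on this page (the primer sequence and the filename primer.fq are placeholders):

# the seq field holds the read's sequence
stdout!('this read is {read.seq.len()} bases long')

# reads can be pattern-matched directly
if read is [_ AAGTCGATGCTAGTG _] => read.out!('primer.fq')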


Advanced types

The types above are the only ones most matchbox users will need. The remaining types mainly matter to those working on the matchbox backend. Don't worry if they seem confusing!


Function (..) -> ..

A function, which can be applied to a sequence of arguments to return a value. Each function expects arguments of a particular type, and guarantees a return value of a particular type.

A function of type (Num, Num) -> Str takes two arguments of type Num and returns a value of type Str.

An example of such a function is:

f = (n1: Num, n2: Num) => '{n1} and {n2}'
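
For illustration, here is how calls to such a function would be checked, assuming functions are called with parenthesised arguments in the same way as csv. The variable names ok and bad are placeholders.

# the same function as above
f = (n1: Num, n2: Num) => '{n1} and {n2}'

# accepted: both arguments are Num, and the result is a Str
ok = f(1, 2)

# rejected when the program starts: 'two' is a Str, not a Num
bad = f(1, 'two')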

Functions can also have optional parameters. A function of the type (Num, opt_arg: Num = 0) -> Num has one mandatory positional argument, and one optional argument called opt_arg, of type Num, with the default value 0.

For two function types to be equivalent:

  • They must have the same number of mandatory positional arguments, and the types of each argument must be equivalent.
  • They must have the same optional named arguments, with equivalent names, types, and default values.
  • They must have equivalent return types.
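
The rules above can be illustrated with type aliases. This is only a sketch: it assumes that function types, including those with optional arguments, can be written as values in the same notation used on this page. All alias names are made up for the example.

# equivalent: same mandatory argument types, same return type
Fmt = (Num, Num) -> Str
Join = (Num, Num) -> Str

# not equivalent to Fmt: a different number of mandatory arguments
Single = (Num) -> Str

# not equivalent to each other: the optional argument opt_arg
# has the same name and type, but a different default value
WithZero = (Num, opt_arg: Num = 0) -> Num
WithOne = (Num, opt_arg: Num = 1) -> Num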

Any Any

A value of any type. Concrete values will always have a more specific type than Any; it exists only so that some functions can have sufficiently generic type signatures. For example, to_str can convert any value to a Str regardless of its type, so its type is (Any) -> Str.
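
For example, each of the following calls would type-check, because every argument has a type more specific than Any. This sketch assumes to_str is called like any other function. The variable names are placeholders.

# a Num, a Bool and a Str are all accepted by (Any) -> Str
n = to_str(10)
b = to_str(true)
s = to_str('hello')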


Type Type

A value which is itself a type. All instances of the above types (e.g. Str, { age: Num }, (Bool, Bool) -> Bool) are themselves values of this type. Hence, you can create type aliases by assigning a type to a variable:

# aliasing a list of Num type as NumList
NumList = [Num]

Most users won't need to worry about this! But treating types as values is helpful when writing functions whose types depend on the evaluation of other functions.

For example, the function csv_ty takes a filename, opens it as a CSV, and returns a record type with each of the CSV's columns as a Str field. Therefore, csv_ty has the type (Str) -> Type.

# header_ty has type Type
# and its value is { first_name: Str, last_name: Str }
# (because those are the headers in this particular CSV)
header_ty = csv_ty('friends.csv')

The csv function loads a CSV and produces a list of records, where the type of each record is the result of evaluating csv_ty on the file. Hence, the return type of csv is [csv_ty(filename)], which evaluates to a concrete type once a specific filename is given.

# rows has type [{ first_name: Str, last_name: Str }]
# (the result from evaluating csv_ty on this filename,
#  wrapped in a list)
rows = csv('friends.csv')

# this is an error, because we know each row
# does NOT include a 'seq' field
if read is [_ row.seq _] for row in rows =>
    read.out!('trimmed.fq')

We do this so that, if a user loads a CSV and tries to access a field which doesn't exist in their particular CSV, matchbox can give them a type error.