View the project on GitHub. jakob-schuster/matchbox

Expressions

Expressions are the smallest unit of matchbox syntax.


Variables

A variable expression is the name of a variable that was defined earlier in the script.

Variable names can include letters, numbers and underscores. A name must start with a letter or an underscore. Variable names must be unique, and can't be overwritten.

a = 'hello'
# expressions can refer to bound variables
b = a

# variables bound in patterns are accessible 
# inside the body of the branch
if read is [fst:|5| _] => fst.seq.stdout!()

# but NOT outside it! 
fst.seq.len().average!()
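
The naming rules above can be sketched as follows. The names are hypothetical and chosen only for illustration, and the comments describe what the rules imply rather than any particular error message.

```
# valid: starts with an underscore, then letters, numbers and underscores
_primer_2 = 'hello'

# NOT valid: a name can't start with a number
2nd_primer = 'hello'

# NOT valid: _primer_2 is already bound and can't be overwritten
_primer_2 = 'goodbye'
```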

Boolean literals

There are only two Bool literals: true and false.
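
A minimal sketch of Bool literals in use, assuming only the assignment syntax and the and / or operators described elsewhere on this page. The variable names are hypothetical.

```
keep = true
strict = false

# either will be true
either = keep or strict

# both will be false
both = keep and strict
```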


Numeric literals

Numeric literals can be negative, and can include a decimal component.

a = 10000

# b will be -1000
b = a / -10

# c will be -999.99
c = b + 0.01

String literals

Strings must be constructed using single quotes; for example, 'hello world'. Values can be inserted into strings using curly braces {}.

message = 'read named {read.id} is {read.seq.len()} bases'

stdout!(message)

Record literals

A record literal is a pair of curly braces containing any number of fields. Each field has a name and an expression, separated by an equals sign (=).

Fields can be accessed using a dot.

# creating a record, assigning values to some fields
rec = {
    primer = AAGTCGATGCTAGTG,
    output = 'out.fq',
}

# accessing a field of a record with a dot
if read is [_ rec.primer _] => read.out!(rec.output)

Function literals

New functions can be defined using function literal syntax. Arguments must be declared with their types, in parentheses, separated by commas. The body of the function comes after an arrow =>. The function's return type is inferred from its body.

Variable names can be assigned to functions, just like any other value.

# a function that doubles numbers
f = (n: Num) => n * 2

# a function which formats the result of f into a Str
g = (n: Num) => '{n} times two equals {f(n)}'

Functions can also take optional named arguments. These must come after positional arguments in the function definition, and they have a default value which is an expression.

print_both = (v1: Str, v2: Str, separator: Str = ' & ') =>
    '{v1}{separator}{v2}'
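
A function defined with an optional named argument can be called with or without that argument. The sketch below assumes the call syntax described under Function application. The function join_pair and its argument names are hypothetical.

```
join_pair = (a: Str, b: Str, sep: Str = '-') =>
    '{a}{sep}{b}'

# sep is omitted, so its default value '-' is used
plain = join_pair('x', 'y')

# sep is given by name, after the positional arguments
custom = join_pair('x', 'y', sep = '+')
```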

Function application

A function can be applied by writing the function name followed by parentheses enclosing a comma-separated list of arguments.

n = len(read.seq)

Alternatively, a function can be applied with the first argument in front and a dot before the function name. All of the remaining arguments are still written inside the parentheses.

# equivalent
n = read.seq.len()

Similarly, the pipe operator |> can also be used to apply functions.

# also equivalent
n = read.seq |> len()

# useful when chaining functions together!
read 
    |> tag('length={len(read.seq)}') 
    |> out!('file.fq')
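
When a function takes more than one argument, only the first argument moves in front of the dot or the pipe. This is a minimal sketch, assuming that user-defined functions can be applied in all three styles just like built-in ones. The function scale_by and the variable names are hypothetical.

```
scale_by = (n: Num, factor: Num) => n * factor

width = 5

# all three apply scale_by with width as the first argument
# and 2 as the second
a = scale_by(width, 2)
b = width.scale_by(2)
c = width |> scale_by(2)
```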

Some functions take optional named arguments. These must be given after all the mandatory arguments. The optional arguments themselves can then be given in any order.

read.describe(
    { polya = AAAAAAAAAA }, 
    reverse_complement = true
).count!()

Operators

A number of common operators are built in. Unary operators are written in prefix position, and binary operators are written in infix position.

# + and > are both operators
if 10 + 2 > 11 => 'basic maths' |> stdout!()

Some operators bind more tightly than others. The full list of operators is given below, from tightest to loosest precedence.

Precedence  Operator           Description
0           All other expressions
1           -Num               Unary negation
1           -( Str | Read )    Reverse-complementation
2           Num * Num          Multiplication
2           Num / Num          Division
2           Num % Num          Modulo
3           Num + Num          Addition
3           Num - Num          Subtraction
4           Num < Num          Less than
4           Num > Num          Greater than
4           Num <= Num         Less than or equal
4           Num >= Num         Greater than or equal
5           Any == Any         Equality
5           Any != Any         Inequality
6           Bool and Bool      Logical AND
7           Bool or Bool       Logical OR
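
The precedence levels can be read off a compound expression. In the sketch below, the comments show the grouping implied by the list above. The parentheses in the comments are explanatory notation only, and the variable names are hypothetical.

```
# * (level 2) binds tighter than + (level 3):
# grouped as 1 + (2 * 3)
total = 1 + 2 * 3

# == (level 5) and > (level 4) both bind tighter than and (level 6):
# grouped as (total == 7) and (10 > 2)
ok = total == 7 and 10 > 2

# and (level 6) binds tighter than or (level 7):
# grouped as false or (true and ok)
result = false or true and ok
```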