Baysig is a modern statistical programming language combining functional programming with Bayesian statistical inference in the style of WinBUGS and related modeling languages. To users of functional languages such as Haskell or Standard ML, Baysig brings powerful statistical inference in arbitrarily complex statistical models while retaining the full uncertainty in inference and prediction. To users of BUGS, WinBUGS, Stan or JAGS, Baysig enhances their ability to compose complex models out of simple models and to describe computations that follow from statistical inference.
In addition to the Baysig language, we have built a web application called BayesHive that allows users with little to no programming experience to build statistical models in the Baysig language and run these models against their own or public data. The easiest way to learn Baysig is probably to build models in BayesHive and then look at and modify the generated Baysig code. Once you've done this, the Baysig documentation will show you how to build even more flexible models.
We have written an informal quick tour of the Baysig language that we suggest you start with if you want to learn Baysig. This reference Manual covers many of the concepts in more detail. In addition, large amounts of example Baysig code can be found in the Baysig test suite.
There is also a list of frequently asked questions with answers.
This and the other help files are themselves written as Baysig documents. You can view the source on the GitHub repository for BayesHive and Baysig documentation.
All Baysig programs are embedded within documents that interleave written language (e.g. in English) with Baysig code that defines functions and statistical models and with questions about the value of variables, probability distributions, and graphical plots.
The interpretation of a line in a Baysig document depends on the first one or two characters.
| Line starts with | What it is | Example |
|---|---|---|
> |
Code | > f x = x+2 |
?> |
Question | ?> f 5 |
| anything else | Text | Hello world! |
When the document is typeset, questions are replaced by their answers. This allows you to build a full report or any document describing a particular analysis within the Baysig environment.
Code is typeset with a red bar. Questions and answers are typeset with a green bar.
myString = "This is a string"
myString ⇒ "This is a string"
Documents are further described in the Documents help file.
Baysig is white-space sensitive like Python and Haskell. Several identifiers (let, case, prob, where) start a block of indented code that terminated only when a symbol that has less indentation than the first symbol of the indented code block.
At the core of Baysig is a functional programming language that should feel very familiar to users of Haskell or Standard ML. But in fact much of the notation comes directly from mathematics. This is because in functional programming expressions have no side effects and therefore there is no need for syntax to describe anything else than essentially mathematical expressions.
Numbers in Baysig are subject to essentially the same arithmetic syntax as in most other programming languages. Standard arithmetic operators such as +, -, *, /, and functions such as sin, cos can be used directly as written in mathematical notation.
| Operator | What it is | Precedence | Example |
|---|---|---|---|
+ |
Addition | 6 | 2+2 |
- |
Substraction | 6 | 1-5 |
* |
Multiplication | 7 | x*5 |
/ |
Division | 7 | 4/2.3 |
- |
Unary minus | -5 | |
^ |
Exponentiation | 8 | 2.5^1.01 |
| Operator | What it is | Example |
|---|---|---|
| exp | natural exponentiation | exp 1 |
| log | natural logarithm | log 3 |
| sin, cos | sine and cosine | sin 4.3 |
When a number is assigned to a variable that variable can be used to represent the number itself in a calculation.
x = 5
y = 3+x
y ⇒ 8
However, since Baysig is a functional programming language, there is no way to change the value of a variable once it has been assigned.
You can write numbers that are very big or very small using the scientific notation. For example, 1e3 = 1000 and 2.1e-3 = 0.0021.
Functions allow you to abstracts out common calculations based on arguments. Simple functions are written with a lightweight syntax that is nearly identical to that found in mathematics
f x = 2+cos(x)
The function is used by applying a value to it, which results in a value that can be calculated by substituting the function's argument name with the value to which the function is applied. It is possible to follow this process along by hand:
f 2.1 = 2+cos(2.1) = 2+ (-0.50485) = 1.49514
In Baysig, there is no need to enclose the functions arguments in parentheses, but you can do so if you like: f 2 = f(2).
Functions can also take multiple arguments:
f x y = x + y
f 5 6 => 11
When using functions with multiple arguments together with parentheses, it is important to wrap each argument in its own parentheses, like this: f(5)(6). Functions can also be used in conjunction with pairs (see below) to support a more traditional syntax for multiple arguments.
Functions can also be declared as infix operators. First, the new operator's precedence and associativity has to be declared on a separate line. For instance,
infixl 7 **
declares that ** is to be treated as an infix operator with:
Left associativity (infixl rather than infixr). Thus a**b**c will be interpreted as (a**b)**c.
Precedence level 7. Since this is higher than that of addition (+, level 6; see table above) but higher than exponentiation (^, level 8), a + b ** c is parsed as a + (b ** c) but a ^ b ** c is parsed as (a ^ b) ** c.
After the fixity declaration, you then need to define the function implementing the operator itself. This is done by surrounding the operator by arguments on the left-hand side, and giving the rule for the result on the right hand side:
x ** y = x + y + 5
This is done in the same way as function definition -- in fact, infix operators are just functions. We could equally have declared:
** = \x -> \y -> x + y + 5
Infix operators can take as arguments and return any type, not just numbers, just like any other function.
A useful operator defined in this way, which is used extensively in Baysig, is the dollar operator ($). This has a very simple defintion:
infixr 0 $
f $ y = f x
The dollar operator $ simply applies the argument to its right to the function on its left. This is only useful in that it prevents us from writing a lot of parentheses. For instance, if we want to calculate
we can write this with either parentheses:
sin(1.1+2.2)
or with the dollar operator
sin $ 1.1+2.2
On the other hand sin 1.1+2.2 would be interpreted as (sin 1.1)+2.2 because function application takes precedence over any infix operator application.
lambda and top level function declarations. let.
variable rules - not start with upper case.
functions as first class entities. higher-order functions.
shadowing.
deceptively simple but very powerful. refer to existing literature.
static type checking and type inference. type declarations
parametric polymophism
types table
Strings in Baysig are always enclosed in double quotes.
x = "This is a String"
soIsThis = "A"
You can't really do much with strings in Baysig for the moment. You can compare two strings for equality, though.
x = "Foo"
y = "Bar"
x==y => False
x==x => True
If you want a number 0 or 1 to be the outcome of an equality test, you have to convert the Boolean to a real numebr.
boolToReal (x==y) => 0.0
boolToReal (x==x) => 1.0
There is a little helper function for that:
x .==. y => 0.0
x .==. x => 1.0
Pairs hold two values of which may be of different types. A pair is created with a comma:
myPair = (1,"Hello World!")
You can retrieve these two elements with the functions fst and snd, respectively.
fst myPair => 1
snd myPair => "Hello World!"
If you want to store more than two elements in this way, you can use nested pairs ((1,(2,False))). However, this quickly becomes unwieldy and you may want to consider using records instead.
Lists are homogenous sequences of values of a given type. That is, a list can contain values of any type but they must all be of the same type. Literal lists can be created by separating values with commas and enclosing them in square brackets: [1,2,3] and ["Hello", "World"] are lists of integers and strings, respectively. The empty list can be constructed by [].
There are several functions in Baysig to operate on lists:
| function name | type | description | example |
|---|---|---|---|
length |
List a -> Int |
length | length [1,2] => 2 |
head |
List a -> a |
first element | head [1,2] => 1 |
take |
Int -> List a -> List a |
first n elements | |
drop |
Int -> List a -> List a |
drop first n elements |
TODO: mutation vs persistence
Records are inhomogeneous collections of values; each value in a record is given a name (the field name) which is used to reference it. Records are built using the following syntax:
{name=>"Tom"; age => 35}
The values corresponding to different fields can therefore have different types. The overall type of the record created above is
{name=>String; age=>Int}
Given a record value, inidividual field values can be accessed with the # operator used as record#field form, for instance:
r : {height => Real; age=>Int}
r = {height => 1.80; age=>42}
h = r#height
In Baysig, we often use lists of records for datasets that might come from a spreadsheet. This representation gives some flexibility when building statistical models in terms of controlling which aspects of a dataset are observed and which are not modelled.
There are some special syntactic constructs to make it easy to work with lists of records.
To transform a list of records, use with to define a transformation based on the individual record. The first argument to with is the list of records. The second argument is an expression that is evaluated in the context of each record where every field value can be accessed directly as a variable given by the field name, i.e. without using the # notation. The new list has the same length as the one that is being transformed.
For example, if we have a dataset of that contains personal information about persons, it might be stored in a variable of the following type:
people : [{name:String; age:Int;height:Real}]
people = [{name=>"John"; age=>34; height=>1.85}, ...]
heights : [Real]
heights = with people height
height_and_age : [(Real,Int)]
height_and_age = with people (height, age)
with can also be written as an infix operator. The following is an equivalent definition
height_and_age = people <#> (height,age)
You might remember <#> by it accessing record fields (#) inside a container (< >), which here is a list. Later, we will see that with and <#> work not only on lists but also on probability distributions.
A related contruct is when and its infix operator equivalent <|>. This allows you to remove some items in a list of records while preserving its type. The first argument is again a list of records. The second argument is an expression, evaluated again in the context of each record being opened such that fields are directly accessible by their name. The final type of this expression must be a boolean (True/False) value. If this is true, the record is retained in the new list, otherwise it is rejected. Following the previous example, we can make a list of voting-age adults from our list of all persons:
voters = when people (age>17)
or equivalently
voters = people <|> age > 17
With and when can be combined, which works best when they are used as infix operators
height_of_voters : [Real]
height_of_voters = people <|> age > 17 <#> height
In functions, let- and case-expressions, in addition to individual variables, more complex patterns can be used to deconstruct complex data into simpler values. As an example, we will use a value v of type (Int, String):
v = (4,"Hello World!")
If we then define a function intended to operate on pair data, we can directly deconstruct the pair into its constituents, for instance:
swap (x,y) = (y,x)
or
swap = \(x,y) -> (y,x)
for swap v => ("Hello World!", 4)
instead of
swap thePair = (snd thePair, fst thePair)
Pairs can contain variable names (as in the pair (x,y) which contains the variables x and y). Instead of variables, literal values can also appear. If the value the pair is being matched against does not match the literal value, and the pair is used in the argument to a function, a runtime error will occur
f (3,x) = "match!"
f v => runtime error
However, patterns can also be used in case expressions which allow multiple patterns to be tested against a value
s = case v of
(1,x) -> "one is the loneliest number"
(2,y) -> "two can be as bad as one"
(3,y) -> "three is a crowd"
(x,y) -> "party time!"
s => "three is a crowd"
If we do not care about the value in a certain place in the pattern, we can use the pattern _ which matches anything. For instance in the above definition of s we did not really care about the values denoted by x and y. For instance, fst is defined as
fst (x,_) = x
Patterns can also be used to match against lists and user-defined data types.
Finally, Baysig features an experimental record wildcard pattern {..} that matches a record and brings all the fields into scope. We previously defined a record r:
r : {height => Real; age=>Int}
r = {height => 1.80; age=>42}
we can now access fields by matching the record against the wildcard pattern
h = (\{..}-> height) r
The constructs with and when are defined in terms of the wildcard pattern.
Baysig has a special pattern that allows the introduction of ordinary differential equations. By adding a D in front a function definition, you are telling Baysig that the right-hand side defines the derivative of the function body. You also need to give an initial value for the function by assigning a value to the varialbe which has "_0" appended to the function name. For instance, to solve the differential equation
would be written in Baysig:
y_0 = 1
D y t = y t + sin t
Here, we used the shorthand form of function that puts the function argument on the left hand side. The following are also valid definitions of y:
D y = \t-> y t + sin t
another differential equation with a simple right hand sinde might be defined as:
D f = sin
Baysig does not currently support higher-order differential equations.
In addition to defining the function and its initial value, the variables dt and tmax also need to be in scope which will define the timestep and the duration of integration. For instance
tmax = 2 dt = 0.001
The value of these variables are taken from the local scope in which the numerical integration is performed, so it can be reassigned locally:
ysig = let dt = 0.01
tmax = 10.0
y_0 = 1
D y t = 1 - y t
in y
The result of defining a differential equation is a function that can be evaluated at any timepoint with standatd function application (e.g. y 0.5) up until tmax. If it is evaluated beyond this timepoint, the program may crash.
definition of wiener and d on rhs of SDE.
Vectors are homogeneous collections of elements.
To build a vector
fillV : Int -> (Int -> a) -> Vector a
v = fillV 5 \i-> 0
w = fillV 3 \i -> cos (unround i)
You can pattern match against a vector:
<x,y,z> = fillV 3 \i -> sin (unround i)
brings into scope variables x, y and z (but no vectors).
Operations on vectors:
| function name | type | description | example |
|---|---|---|---|
dim |
Vector a -> Int |
vector length | dim v => 5 |
! |
Vector a -> Int -> a |
index elements | v ! 2 => 0 |
*^ |
Real -> Vector Real -> Vector Real |
multiply by scalar | 5.1 *^ w |
^+^ |
Vector Real -> Vector Real -> Vector Real |
add two vectors | w ^+^ w |
^*% |
Vector Real -> Matrix Real -> Vector Real |
Vector times Matrix | |
%*^ |
Matrix Real -> Vector Real -> Matrix Real |
Matrix times Vector | |
vcons |
a -> Vector a -> Vector a |
prepend an element | |
vmap |
(a->b) -> Vector a -> Vector b |
apply a function to every element | |
slice |
Int -> Int -> Vector a -> Vector a |
extract a subvector given starting index and length |
in some of these operators, the symbol ^ reminds you on which side of the operator the vector goes.
To build a matrix:
fillM : (Int,Int) -> ((Int,Int) -> a) -> Matrix a
ident n = fillM (n,n) \(i,j) -> if i==j then 1 else 0
mymat = fillM (5,5) \(i,j) -> cos (unround i)+sin(unround j)
Operations on matrices:
| function name | type | description | example |
|---|---|---|---|
mdims |
Matrix a -> (Int,Int) |
matrix size | mdims m => (5,5) |
!! |
Matrix a -> (Int,Int) -> a |
index elements | ident 4 !! (2,2) => 1 |
*% |
Real -> Matrix Real -> Matrix Real |
multiply by scalar | 5.1 *% mymat |
%+% |
Matrix Real -> Matrix Real -> Matrix Real |
add two vectors | mymat %+% mymat |
%*% |
Matrix Real -> Matrix Real -> Matrix Real |
multiply two matrices | mymat %*% mymat |
^*% |
Vector Real -> Matrix Real -> Vector Real |
Vector times Matrix | |
%*^ |
Matrix Real -> Vector Real -> Matrix Real |
Matrix times Vector | |
svd |
Matrix Real -> List (Matrix Real) |
Singular value decomposition | |
transM |
Matrix a -> Matrix a |
transpose | transM mymat |
diag |
Vector a-> Matrix a |
create matrix from diagonal | |
mmap |
(a->b) -> Matrix a -> Matrix b |
apply a function to every matrix element |
In some of these operators, the symbol % reminds you on which side of the operator the matrix goes.
If you use matrix or vector arithmetic with inconsistent dimensions, Terrible Things happen.
Maybe, Either, ()
Definition of List, Bool
pure functions and side effects
Probability distributions in Baysig are first-class entities - they can be build from simpler parts, assigned to variables and passed as arguments to functions. Probability distributions have type Prob a where the type variable a indicates the type of values that can be drawn from the distribution. For instance the standard normal distribution (unormal, with mean 0 and variance 1) has type Prob Real which means that value that represent real numbers (0.1, 3.141 etc) can be drawn from this distribution. The non-standard normal distribution normal has type Real -> Real -> Prob Real, because it takes two arguments, the parameters (mean and variance, respectively).
The fundamental probability distribution in Baysig is unit with no parameters, from which numbers of type Real and between 0 and 1 are drawn. We cannot do very much with probability distributions yet, but at least we can plot samples from them:
distPlot unit
That may not look very uniform but these outputs are approximated at the latest possible point, i.e. when that data is shown to the user in the plot. In calculations and statistical inference, the unit really is uniform.
One simple way to manipulate probability distributions is the function fmap, which applies a function to values drawn from the distribution. Thus the distribution fmap (\x -> x+1.5) unit yields values between 1.5 and 2.5 (because we add 1.5 to the values drawn fron unit).
You can look at distribution in a few more ways, apart from plotting them. If you just ask for a probability distribution on a line in a document using ?>, Baysig will try to summarise it in the output. For real-valued probability distributions, that means printing the mean plus/minus the standard deviation:
fmap (\x -> x+1.5) unit ⇒ 2.00 ± 0.30
The bernoulli distribution is already defined in Baysig's standard library. But if it were not, we could already define it using unit and fmap:
myBernoulli p = fmap (\x-> x < p) unit
Defining distributions in this way is a lot of fun (except for the gamma distribution, which is no fun at all).
If we "ask" for a Boolean probability distribution, Baysig gives you the proportion that is True (based on a finite number of samples):
myBernoulli 0.3 ⇒ 29%
Sometimes we need something more powerful than fmap to build new distributions. Sometimes you want to sequence draws, such that one value is sampled, then another, and then perhaps the two samples are combined. We can use the "prob" and "~" notation for that. For instance, in the definition of unormal based on the Box-Muller transform:
unormal = prob
u1 ~ unit
u2 ~ unit
n = sqrt (-2 * log u1) * cos(2 * pi * u2)
return n
Four things are happening in this code snippet:
the "prob" keyword introduces a block for sequential sampling. This block starts on the next symbol (here u1) and continues until there next appears a symbol with less indentation then u1.
two values (u1 and u2) are drawn independently from the unit distribution
a new value n is calculated in a deterministic way from an equation. This equation is allowed to reference the values drawn from the previous distribution. In the same way, further draws from probability distributions can reference values previously drawn or calculated.
The "prob" block must end in a probability distribution which defines that value of overall distribution (here unormal). The function return takes an expression (here n) that is deterministic, given all the values that have been drawn, and gives a probability distribtion that always returns that value.
Statisticians will recognize the "prob/~" notation as being very similar to the hierarchical statistical notation, but a bit more formal and less ambiguous. Functional programmers will recognize this as the construction of a monad.
This notation also allows us to build some simple utility functions to help build more complex probability distributions. The function repeat can be used to repeatedly draw from independently from identical distributions. If p is a probability distribution of type Prob a, then repeat 10 p is a probability distribution over lists of type a, each of which having 10 elements. For instance, repeat 20 unormal has type Prob (List Real).
| Distribution | Name | Parameters | Value type |
|---|---|---|---|
| uniform | uniform |
lo : Real hi : Real |
Real |
| improper uniform | improper_uniform |
none | Real |
| improper uniform positive | improper_uniform_positive |
none | Real |
| list choice | oneOf |
xs : List a |
a |
| choice 1..n | oneTo |
n : Int |
Int |
| unit normal | unormal |
none | Real |
| normal | normal |
mean : Real variance : Real |
Real |
| binomial | binomial |
n : Int p : Real |
Int |
| bernoulli | bernoulli |
p : Real |
Bool |
| bernoulli01 | bernoulli |
p : Real |
Real |
| exponential | exponential |
lambda : Real |
Real |
| poisson | poisson |
rate : Real |
Int |
| beta | beta |
a : Real b : Real |
Real |
| gamma | gamma |
k : Real theta : Real |
Real |
| inverse gamma | invGamma |
a : Real b : Real |
Real |
| multivariate normal | multiNormal |
mean : Vector Real cov : Matrix Real |
Vector Real |
| wiener process | wiener |
none | Real -> Real |
Baysig also supports stochastic differential equations (SDEs). These are introduced by drawing a sample from the wiener process distribution, defining an initial condition and then introducing the SDE itself.
The function wiener defines a probability distribution over functions which are realisations of the Wiener process. You can sample from this process within the prob and ~ notation. For instance, we may be interested in the distribution of the values of the wiener process at a timepoint 1.0 (dimensionless)
wienersAtOne = prob w ~ wiener return (w 1.0)
wienersAtOne ⇒ -0.00 ± 0.98
This prints the mean plus/minus the standard deviation of the distribution based on a numerical approximation (the theoretical value should be mean 0, standard deviation 1).
A stochastic diffferential equation is then introduced based on the wiener process realisation. As with ordinary differential equations, the initial value need to be defined first by assigning a value (either with equality = or by sampling from a probability distribution with ~) to the name of the function to be integrated with "_0" appended. Then, the pattern d preceeds the function definition on the left hand side and the operator d is used to access the differential of the wiener process realisation on the right hand side
sdeAtOne = prob s_0 ~ gamma 1 1 w ~ wiener d s t = - s t + d w t return (s 1.0)
wienersAtOne ⇒ -0.02 ± 0.92
Note that unlike in the mathematical notation, the deterministic part of the SDE (in the above, - s t) is not multiplied by an infinitesimal timestep (typically written as dt).
Readers with a background in typed programming may at this point be puzzled by how the definitions of differential equations fit in with the type system.
In ordinary differenial equations, the differential operator D on the left-hand side implies that its argument must have type Real -> Real and that the value the pattern D f matches against must also have type Real -> Real. Conceptually, the D operators is itself a function operating on functions, with D having type (Real -> Real) -> (Real -> Real). On the right-hand side of this defintion, the function being defined is in scope, and its value can be read by applying a time value to it.
In stochastic differential equations, the d operator on the left-hand side has the same conceptual type as the D operator for ordinary differential equations. On the right hand side, d also has type (Real -> Real) -> (Real -> Real) and acts as a function rather than a pattern. Thus, in the definition of a SDE:
d s = \t -> a + b * d w t
the wiener process realisation w is a function of type Real -> Real. The application of w to d also has the combined type Real -> Real, and finally this is applied to a timevalue t of type Real giving an overall type of d w t of a single Real number.
sampling
repeat
Baysig is desinged to enable statistical inference in complex models and to enable the exploitation of the result of statistical estimation. Estimation in Baysig is invoked with a single call to a procedure based on (1) a probabilistic model written as a probability distribution and (2) a value representing the observed data. The outcome of statistical estimation is a probability distribution over the parameters in the statistical model, where each parameter is represented as a field in a record. This resulting probability distribution is a representation of what Bayesians call the posterior. The complete call takes this form
posterior <* estimate model observed_data
estimate is not a function, because it needs to "look inside" the definition of the model. It is therefore called using <* instead of =.
The resultant type of running estimate, here assigned to the posterior, is a probability distribution over the unobserved parameters in the model given the date. We explain this in detail below.
Statistical models passed to the estimate procedure have type Prob a and several parameters that are sampled using the ~ notation in the definition of the model, typically with the prob notation. For instance, a model for univariate linear regression may be written as:
univariateLinearRegression = prob
slope ~ normal 0 1
offset ~ normal 0 1
variance ~ gamma 1 1
repeat 20 $ prob
x ~ uniform 0 1
y ~ normal (offset + slope * x) variance
return (x,y)
Since the Prob monad just keeps track of a seed for pseudo-random number generation this definition is equivalent to the following pseudocode in an imperative programming language:
procedure univariateLinearRegression
slope := genNormal(0,1)
offset := genNormal(0,1)
variance := genGamma(1,1)
for i := 1 to 20
x[i] := genUniform(0,1)
y[i] := genNormal(offset + slope * x,variance)
end
return [x,y]
end
In the next section we will see that the defintion univariateLinearRegression also has a second interpretation that enables it to be used in statistical inference.
The distributions from which the model parameters (here slope, offset and variance) are drawn (normal 0 1, normal 0 1 and gamma 1 1) are known in Bayesian statistics as the prior and must have type Prob a where a is the type of the parameter. The priors are therefore an intrinsic part of the model itself rather than being a different aspect.
Sometimes, it may be desirable to use an improper prior, which does not represent a valid probability distribution but can nevertheless be used for inference. Baysig has these improper distributions built in:
improper_uniform: a prior that is flat everywhere, i.e. completely insensitive to the parameter value
improper_uniform_positive: a prior that is flat for any positive value.
Because these are not valid probability distributions, you cannot sample (generate data) from such a model. The model can only be used to estimate parameters. You may later update the model and you can sample from it.
One way in which Baysig differs from BUGS-like languages is that because we are explicitly specifying data-generating mechanism we must give a distribution for the observed independent variables (x in the above example). This is the case even if the independent variables in the data to be analysed have been controlled in a systematic fashion. However, the distribution from which these observed variables are drawn (uniform 0 1) above does not affect the estimation unless that distribution is itself parametrised by other parameters (e.g. uniform lo hi where lo and hi then themselves will be estimated.)
The probability distribution any can be used as an improper distribution for observed data in this case. It has the same defintion as improper_uniform, but has the special effect that estimate will remember the observed values after an update such that replicated data datasets sampled in this way will have the same independent variable values.
The previous example can then be written using improper priors in this way:
improperLinearRegression = prob
slope ~ improper_uniform
offset ~ improper_uniform
variance ~ improper_uniform_positive
repeat 20 $ prob
x ~ any
y ~ normal (offset + slope * x) variance
return (x,y)
We stress that improperLinearRegression is not necessarily better than univariateLinearRegression; the general advice in Bayesian statistics textbooks is that if you have prior knowledge, you should use it.
A value drawn from a probability distribution of type Prob a in the ~ within a prob block denotes a single value of type a. For instance, in
myValueModel = prob
mean ~ improper_uniform
var ~ gamma 1 1
x ~ normal mean var
return x
the type of the overall model is Prob Real and the variable x within the prob-block has type Real because is is drawn from a probability distribution of type Prob Real, namely normal mean var.
Models for several values must explicitly use a looping construct to describe how the different values are related to each other and whether parameters are identical for all values or each value rely on different parameter draws.
The simplest looping construct is repeat, which takes two arguments: an integer representing the number of repetitions, and a probability distribution which is to be repeated (the base distribution). If the base distribution has type a, then the overall type of repeat n base has type Prob (List a), i.e. a probability distribution over lists of values of type a. For instance, repeat 20 (normal 0 1) is a probability distribution over lists of real numbers; each list drawn from this distribution has 20 elements. In statistical jargon, we would say that repeat implies that the elements are independent and indentically distributed, often abbreviated to iid.
The base distribution passed to repeat can itself contain draws from further parameters, and repetitions can be nested to create hierarchical models. In all of these cases, it is simple to determine whether parameters are shared across loop repetitions: if the parameter is drawn inside the loop, the parameter is not shared and can have different values across repetitions. If the parameter is drawn outside (i.e. before a repeat, then the parameter is just drawn once and the same value is used across repetitions. For instance, in the following hierarchical model:
myNestedModel = prob
x ~ normal 0 1
repeat 10 $ prob
y ~ normal x 1
repeat 20 $ prob
z ~ normal y 1
return z
the variable x is drawn once, the draw of y is repeated across the outermost loop (the repeat 10 line) but shared across the innermost loop (the repeat 20 line) for each repetition of the outermost loop.
TODO: unfold
TODO: stochastic differential equations in models
Models do not have to have loops or timeseries constructs. myValueModel, defined above as a statistical model for a single value, is an entirely valid statistical model. There is no need as such for repetition in Baysig or elsewhere in Bayesian statistics.
The last line of the model defines the overall type of the model. This last line must be either a return statement or must be a valid probability distribution. For instance, the following valid model:
modelA = prob
x ~ normal 0 1
y ~ normal x 1
return y
is equivalent to:
modelB = prob
x ~ normal 0 1
normal x 1
The type of the expression passed to return determines the overall type of the model, such that modelA and modelB both have type Prob Real because normal m v has type Prob Real. However, if the return is called within a repeat then the type of value passed to return gets lifted into a List, possibly multiple times for nested repetitions. For instance, univariateLinearRegression has type Prob (List (Real, Real)) and myNestedModel has type Prob (List (List Real))
The syntax and the semantics of the statistical models in Baysig are derived from the monadic properties of probability distributions. While it is not necessary to understand what a monad is to use Baysig, it helps when building complex models.
A monad is a data type m for which we can define operations:
return : a -> t a
and
>>= : m a -> (a -> m b) -> m b
Notice that m here must be a higher-order type that takes a type parameter to form a concrete type. Types like Real and String cannot be monads — only types like List and Maybe can be monads, because they need an additional type to form the type of a concrete value (a value cannot have a type List, but it can have a type List Real).
The type Prob forms a monad, which means that the operations return and >>= are available for probability distributions.
return turns a single value into a probability distribution. When sampled, this probability distribution always yields the value that was passed to return.
return 3.14 ⇒ 3.14000000 ± 0.00000026
TODO >>=
In addition to the existence of these operations, for a type formaing a monad has to saitsfy certain laws. TODO
We can now see how the useful function fmap can be defined in terms of return and >>=:
fmap f p
= p >>= (return . f)
= p >>= (\x-> return (f x))
= prob x ~ p
return (f x)
The estimate procedure allows you to go beyond simulating data from models. Instead, it calculates the likely values of the model parameters given observed data. That it, estimate calculates the probability distribution over model parameters given the observed data.
estimate must be called at the top level and cannot be called inside let statements or function definitions. The syntax for using esitmate is:
parameter_pat <* estimate model_expr observed_data_expr
where
parameter_pat is the pattern that will be bound to the probability distribution over the unobserved parameters in the model
model_expr is the expression of the model which must have type Prob a for some type a.
observed_data_expr is the expression for the observed data.
None of these have to be primitive variable names. You can compose the model or the observed data using function calls, e.g.
myModel distx disty = prob
x ~ distx
y ~ disty
return (x,y)
pars <* esitmate (myModel (normal 0 1) (gamma 1 1))
(3.1, 2.3)
In many cases, the type of the data should be identical to the type of the model (wrapped in Prob). However, the type-checking rule for estimate is slightly more relaxed: model_expr must have type Prob t0 and observed_data_expr must have type t1 where t0 and t1 are structurally similar:
a type t is structurally similar to itself
List x is structurally similar to List y if and only if x is structurally similar to y
Two record types are structurally similar if and only if their common fields are all structurally similar
This can be used to control what is observed in the model such that a general model can written without knowing exactly which components are observed.
TODO: example: stochastic volatility
estimate can be used to estimate parameters from new probability distributions that you write yourself. Any elementary probability distribution used must, howver, be associated with a valid log-domain probability density function (PDF). Baysig uses a simple naming convention to associate probability distributions with PDFs. If you introduce a new probability distribution, for instance named myDist, you must also define its PDF in the variable myDistLogPdf.
TODO: Example
Not every model that can be written use the prob notation or is of a valid type of the form Prob a for some type a can be used in statistical inference with estimate. The limitations in the models that can be used with estimate come from two sources: the first is that models must be associated with a valid probability density function (pdf) for estimate to be able to apply Bayes theorem to it. This is not the case for all models of type Prob a in Baysig. For instance, here is a model that is not assocatiated with a valid probability density function (see below for the reason why this is so):
notBayesianModel = prob
x ~ normal 0 1
return $ x^2
This model is not associated with a well-defined pdf, because it is not in general possible to calculate the pdf resulting from a non-linear transformations of values drawn from probability distributions. One way to "rescue" this model is to allow a further noise distribution after the non-linear transformation, for instance
bayesianModel = prob
sigma ~ gamma 1 1
x ~ normal 0 1
normal (x^2) sigma
This may or may not be sufficiently close to the model you had in mind.
This limitation of using estimate can be summarized in this rule:
A model passed to
estimatemust end in a draw or simple combination of draws
According to this rule, a model can end directly in a draw from a probability distribution for which the log-pdf function is defined. For instance, in
model = prob
x ~ uniform 0 1
sigma ~ gamma 1 1
normal x sigma
the last line is itself a probability distribution, so it is OK to use this model in estimate.
If the last line is not itself a probability distribution, it must a combination of draws from such distributions. In the simplest case, you can reference a variable denoting a draw from a probability distribution:
model = prob
x ~ uniform 0 1
sigma ~ gamma 1 1
y ~ normal x sigma
return y
Pairs and records that reference variables which are draws from probability distributions are also simple combinations that can be used in the last line of models passed to estimate. For instance,
model = prob
x ~ uniform 0 1
sigma ~ gamma 1 1
y1 ~ normal x sigma
y2 ~ normal x sigma
return (y1,y2)
model = prob
x ~ uniform 0 1
sigma ~ gamma 1 1
y1 ~ normal x sigma
y2 ~ normal x sigma
return {a=>y1; b=>y2}
are both acceptable models.
In models that use unfold, it is very common that only a portion of the state space is observed. You can use map in the last line of a model to control this:
--- NOTE: CURRENTLY BROKEN - still true?
model = prob
v ~ gamma 1 1
x0 ~ normal 0 0.1
y0 ~ normal 0 0.1
ys ~ unfold 100 (x0,y0) $ \(xp,yp) -> prob
x ~ normal xp v
y ~ normal yp v
return (x,y)
return $ map fst ys
Here, only the x's in the two-dimensional state space model are observed.
Finally, you can end the model in a repeat. The same rules about ending in a draw or a simple combination of draws apply within repeat loops. For instance,
regress = prob
offset ~ normal 0 1.0
sigma ~ gamma 1 1
slope ~ normal 0.0 1.0
repeat 50 $ prob w ~ uniform 0.0 1.0
y ~ normal (offset+slope*w) sigma
return $ { foo=>w;bar=>y }
can also be used within estimate.
The second source of limitations is that the implementation of the estimate procedure is still incomplete and does not support as many cases as we'd like. We are actively working on removing these limitations.
Please do not call any recursive functions
We would like to remove this limitation, but it is not clear how feasible it is to do this while maintaining reasonably high performance inference.
We do not support integer, boolean or string-valued parameters. We would like to remove this limitation, but it is a lot of work.
Please do not reference lists, vectors or matrices that are defined in global scope. All observed data should be returned by the model and pass through the second argument to estimate. Numbers, functions or probability distributions are OK.
We will remove this limitation ASAP.
You cannot define or use local function in you model.
badModel = prob
y ~ normal 0 1
f x = 2+1
normal (f y) 1
instead, make these functions global
f x = 2+1
goodModel = prob
y ~ normal 0 1
normal (f y) 1
We will remove this limitation ASAP.
Finally, estimate is prototype software and it will not be hard to find additional cases where it breaks. Please report all bugs to [email protected].
syntax example
use for generating data from model with vague priors
use for posterior predictive
example
what it does under the hood.