The Net Language

The HUGIN Development Environment uses a special-purpose language, called the net language, for describing Bayesian networks and LIMID models. This language allows the user to create complete descriptions of such models, containing specifications of the structure of the network model, the conditional probability functions, the policies, and the utility functions.

This chapter describes the third revision of the net language. This revision is substantially different from the first revision, and somewhat different from the second revision. The reason is that the first revision of the language used a fixed format (i.e., the semantics of the different elements were determined by their position within the description). This implied that it was impossible to extend this language in such a way that descriptions in the old language retained their meaning in the new language. The second revision was designed with that goal in mind. This third revision contains a minor modification of the second revision, enabling specification of object-oriented networks (i.e., a net file now can contain instance nodes, and means for specifying interface nodes).

Nodes

The basic element of a network model is the node. In ordinary Bayesian networks, a node represents a random variable (discrete or continuous). In a LIMID model, a node may also represent a decision, controlled by the decision maker, or a utility function, which is used to assign preferences to different configurations of variables. In object-oriented networks, there is an additional category of nodes, namely instance nodes, representing instances of other networks; that is, an instance node represents a sub-network in the network in which it appears.

Example 1

The following node description is taken from the “Chest Clinic” example (Lauritzen & Spiegelhalter 1988):

node T
{
  states = ("yes" "no");
  label = "Has tuberculosis?";
  position = (25 275);
}
  • This describes a binary random variable named T, with states labeled “yes” and “no”. The description also gives the label and position, which are used by the HUGIN Graphical User Interface.

A node description is introduced by one of the keywords [<prefix>] node, decision, utility, or instance, where the optional prefix on node is either discrete or continuous (omitting the prefix makes discrete the default). The keyword is followed by a name that must be unique within the model. For an instance node, the name is followed by the name of the network (or class) of which it is an instance and by the bindings of the input nodes of the instance node. To be precise, the syntax of an instance node declaration is

instance <name> : <class> ({<input> = <node>}*)

where <class> is the name of the network of which <name> is an instance, and where {<input> = <node>}* denotes a comma-separated list of zero or more elements of the kind <input> = <node>. Here, <input> is the name of an input node in <class>, and <node> is the name of a node appearing in the network or an output node of an instance of a network included in the network. Note that <input> and <node> must be type consistent; i.e., they must be of the same category, kind, and subtype, and have identical state labels/values if they are discrete.

Example 2

The following specifies an instance node with name Person_1. This node represents an instance of a network named Person, and the nodes named I and E have been bound to the input nodes named Income and Education, respectively, of the instance of Person used in our network.

instance Person_1 : Person (Income = I, Education = E)
{
    label = "Bill Yates";
    position = (101 156);
}
  • The description also gives the label and position, which are used by the HUGIN Graphical User Interface.
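
As an illustration of the instance syntax above, the following Python sketch splits an instance header into its name, class, and bindings. It is not part of HUGIN: the function name parse_instance_header is hypothetical, and the sketch assumes that every <node> is written as a plain name.

```python
import re

# A name follows the C identifier rules (letters, digits, underscore).
NAME = r"[A-Za-z_][A-Za-z0-9_]*"
HEADER = re.compile(rf"\s*instance\s+({NAME})\s*:\s*({NAME})\s*\((.*)\)\s*$")
BINDING = re.compile(rf"\s*({NAME})\s*=\s*({NAME})\s*")


def parse_instance_header(line):
    """Return (name, class_name, bindings) for one instance header line."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"not an instance header: {line!r}")
    name, class_name, body = match.groups()
    bindings = {}
    if body.strip():  # a list of zero bindings is allowed
        for item in body.split(","):
            pair = BINDING.fullmatch(item)
            if pair is None:
                raise ValueError(f"malformed binding: {item!r}")
            bindings[pair.group(1)] = pair.group(2)  # <input> -> <node>
    return name, class_name, bindings


print(parse_instance_header("instance Person_1 : Person (Income = I, Education = E)"))
```

The type-consistency requirement on <input> and <node> is not checked here; that would need the declarations of both networks.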

Then, for all kinds of nodes, follows a sequence of name/value pairs of the form

<name> = <value>;

enclosed in braces.

Example 1 shows the field names currently defined in the net language for nodes: states, label, and position. All of these fields are optional; if any field is absent, a default value is supplied.

  • states specifies the states of the node (here, the decisions of a decision node are also referred to as states); the states are indicated by a list of strings. The list must be non-empty. The strings in the list comprise the labels of the individual states; the labels of a node need not be unique (and may even be empty strings). Only the length of the list (i.e., the number of states) is relevant to the HUGIN Decision Engine; the state labels are, however, used by the companion system, the HUGIN Graphical User Interface. The default value is a list of length one, containing an empty string (i.e., the node will have one state). The states field is not allowed for utility and continuous nodes.

  • label is a string that is used by the HUGIN Graphical User Interface when displaying the nodes. The label is not used by the inference engine. The default value is the empty string.

  • position is a list of integers (the list must have length two). It indicates the position within the graphical display of the network by the HUGIN Graphical User Interface. The position is not used by the inference engine. The default position is at (0,0).
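
The defaults described in the list above can be summarized in a small Python sketch. The function node_fields and the kind tags are hypothetical names chosen for this illustration; they are not part of the net language or of any HUGIN API.

```python
def node_fields(kind, given=None):
    """Fill in the defaults for the states, label, and position fields.

    kind is one of "discrete", "continuous", "decision", or "utility"
    (hypothetical tags used only in this sketch).
    """
    given = dict(given or {})
    has_states = kind in ("discrete", "decision")
    if "states" in given and not has_states:
        raise ValueError(f"states is not allowed for {kind} nodes")
    fields = {"label": "", "position": (0, 0)}
    if has_states:
        fields["states"] = [""]  # default: one state with an empty label
    fields.update(given)  # explicitly given fields override the defaults
    if has_states and len(fields["states"]) == 0:
        raise ValueError("the states list must be non-empty")
    if len(fields["position"]) != 2:
        raise ValueError("position must have length two")
    return fields


print(node_fields("discrete", {"states": ["yes", "no"]}))
```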

Apart from these fields, you can specify your own fields for nodes. These can be used for a specific application needing some extra information about the nodes.

Example 3

In this example, the T node has been assigned the application-specific field MY_APPL_my_field:

node T
{
    states = ("yes" "no");
    label = "Has tuberculosis?";
    position = (25 275);
    MY_APPL_my_field = "1000";
}

The value of such an application-specific field can only be a text string enclosed in quote characters (") (see section “Lexical Matters” for the precise definition of text strings).

It is considered good style to start field names with an application-specific prefix to avoid confusion (in Example 3, the MY_APPL prefix).

Example 4

In the HUGIN Graphical User Interface, some extra fields are used to save descriptions of both nodes and their states. These are the fields prefixed with HR_:

node T
{
    states = ("yes" "no");
    label = "Has tuberculosis?";
    position = (25 275);
    HR_State_0 = "Yes, the patient HAS tuberculosis.";
    HR_State_1 = "No, the patient has NOT tuberculosis.";
    HR_Desc = "Represents the fact that the patient has\
tuberculosis or not.";
}

The Structure of the Model

The structure (i.e., the edges of the underlying graph) is specified indirectly. We have two kinds of edges: directed and undirected edges.

Example 5

This is a typical specification of directed edges:

potential ( A | B C ) { }

This specifies that node A has two parents: B and C. That is, there is a directed edge from B to A, and there is a directed edge from C to A.

The model may also contain undirected edges. Such a model is called a chain graph model.

Example 6

potential ( A B | C D ) { }

This specifies that there is an undirected edge between A and B. Moreover, as usual, it specifies that both A and B have C and D as parents.

If there are no parents, the vertical bar may be omitted.

A maximal set of nodes, connected by undirected edges, is called a chain graph component.

Not all graphs are permitted. The following restrictions are imposed on the structure of the network.

The graph may not contain any (directed) cycles.

Example 7

The following specification is not allowed, because of the cycle \(A \rightarrow B \rightarrow C \rightarrow A\).

potential ( B | A ) { }
potential ( C | B ) { }
potential ( A | C ) { }

However, the following specification is legal.

potential ( B | A ) { }
potential ( C | B ) { }
potential ( C | A ) { }

Example 8

The following specification is not allowed either, since there is a cycle \(A \rightarrow B \rightarrow C \sim A\) (the edge between A and C counts as “bidirectional”).

potential ( B | A ) { }
potential ( C | B ) { }
potential ( A C ) { }

However, the following specification is legal.

potential ( A | B ) { }
potential ( C | B ) { }
potential ( A C ) { }
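
The cycle restriction illustrated in Examples 7 and 8 can be tested mechanically. The following Python sketch is one common way to do it, not necessarily the way HUGIN does it: it collapses each chain graph component into a single vertex and then looks for a directed cycle among the collapsed vertices. The function name and the (children, parents) representation of a potential are assumptions made for this illustration.

```python
def has_forbidden_cycle(potentials):
    """potentials is a list of (children, parents) pairs of node-name lists."""
    leader = {}  # union-find structure over node names

    def find(node):
        leader.setdefault(node, node)
        while leader[node] != node:
            leader[node] = leader[leader[node]]  # path halving
            node = leader[node]
        return node

    # Nodes joined by undirected edges end up in the same component.
    for children, parents in potentials:
        for other in children[1:]:
            leader[find(children[0])] = find(other)

    # Directed edges, lifted to the level of components.
    successors = {}
    for children, parents in potentials:
        for parent in parents:
            for child in children:
                successors.setdefault(find(parent), set()).add(find(child))

    visiting, done = set(), set()

    def on_cycle(component):
        visiting.add(component)
        for nxt in successors.get(component, ()):
            if nxt in visiting:  # back edge (or self-loop): cycle found
                return True
            if nxt not in done and on_cycle(nxt):
                return True
        visiting.discard(component)
        done.add(component)
        return False

    return any(c not in done and on_cycle(c) for c in list(successors))


# The illegal and the legal specification from Example 8.
print(has_forbidden_cycle([(["B"], ["A"]), (["C"], ["B"]), (["A", "C"], [])]))
print(has_forbidden_cycle([(["A"], ["B"]), (["C"], ["B"]), (["A", "C"], [])]))
```

A directed edge between two nodes of the same component shows up as a self-loop after collapsing and is therefore reported as a cycle as well.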

Continuous chance nodes are not allowed in LIMID models, i.e., there cannot be continuous nodes in a net also containing utility or decision nodes.

Utility nodes may not have any children in the graph. This implies that utility nodes may only appear to the left of the vertical bar (never to the right).

Undirected edges can only appear between discrete chance nodes.

Continuous nodes can only have continuous nodes as children.

If a decision node appears to the left of the vertical bar, it must appear alone. In this case, so-called informational links are specified; such links specify which variables are known when the decision is to be made.

Example 9

Assume we want to specify a LIMID with two decisions, D1 and D2, and with three discrete chance variables, A, B, and C. First, A is observed; then, decision D1 is made; then, B is observed; finally, decision D2 is made. This sequence of events can be specified as follows:

potential ( D1 | A ) { }
potential ( D2 | D1 B ) { }

Finally, no node may be referenced in any potential-specification before it has been declared by a node-, decision-, or a utility-specification.
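
Several of the restrictions above concern a single potential-specification and can be checked one potential at a time. The Python sketch below illustrates this under stated assumptions: the function check_potential, the kind tags, and the error messages are hypothetical, and declared is assumed to hold only the nodes declared before the potential being checked.

```python
def check_potential(children, parents, declared):
    """Return a list of violated restrictions for one potential.

    declared maps each already declared node name to one of the
    hypothetical tags "discrete", "continuous", "decision", "utility".
    """
    undeclared = [n for n in list(children) + list(parents) if n not in declared]
    if undeclared:
        return [f"{n} is referenced before it is declared" for n in undeclared]

    errors = []
    if any(declared[p] == "utility" for p in parents):
        errors.append("a utility node may not have children")
    # More than one node to the left of the bar means undirected edges,
    # which are only allowed between discrete chance nodes. This also
    # covers the rule that a decision node must appear alone.
    if len(children) > 1 and any(declared[c] != "discrete" for c in children):
        errors.append("undirected edges only between discrete chance nodes")
    if any(declared[p] == "continuous" for p in parents) and any(
        declared[c] != "continuous" for c in children
    ):
        errors.append("continuous nodes can only have continuous children")
    return errors


declared = {"A": "discrete", "B": "discrete", "D1": "decision", "D2": "decision"}
print(check_potential(["D1"], ["A"], declared))        # Example 9, first potential
print(check_potential(["D2"], ["D1", "B"], declared))  # Example 9, second potential
```

The restrictions that concern the whole model (no cycles, no continuous nodes in a LIMID) are not covered by this per-potential check.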

Potentials

We also need to specify the quantitative part of the model. This part consists of conditional probability functions for random variables, policies for decision nodes and the values a utility function may assume. Thus, we distinguish between discrete probability, continuous probability, and utility potentials.

The types of potentials differ in the numerical specification given between the braces of the potential-specification.

Example 10

The following description is taken from the “Chest Clinic” example and specifies the conditional probability table of the discrete variable T.

potential ( T | A )
{
    data = (( 0.05 0.95 )          %  A=yes
            ( 0.01 0.99 ));        %  A=no
}

This specifies that the probability of tuberculosis given a trip to Asia is 5 %, whereas it is only 1 % if the subject has not been to Asia. The data field may also be specified as an unstructured list of numbers.

potential ( T | A )
{
    data = ( 0.05 0.95           %  A=yes
             0.01 0.99 );        %  A=no
}

As the example shows, the numerical data is specified through the data field of a potential-specification. This data has the form of a list of real numbers. The structure of the list must either correspond to that of a multi-dimensional table with node list comprised of the parent nodes followed by the child nodes, or it must be a flat list with no structure at all. The “layout” of the data list is row-major (see section “Row-major Representation”).

Example 11

potential ( D E F | A B C ) { }

The data field of this potential-specification corresponds to a multi-dimensional table with dimension list <A, B, C, D, E, F>.
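
To make the relation between a potential-specification and the shape of its data field concrete, here is a minimal Python sketch. The helper names and the num_states mapping are assumptions made for illustration; the sketch only covers discrete probability potentials, where the dimension list consists of the parents followed by the children.

```python
from math import prod


def dimension_list(children, parents):
    """Parents first, then children, as described above."""
    return list(parents) + list(children)


def flatten(data):
    """Turn a nested data list into the equivalent flat list."""
    if isinstance(data, (list, tuple)):
        return [value for item in data for value in flatten(item)]
    return [data]


def check_data(children, parents, num_states, data):
    """Return the flat data list, or raise if its length is wrong."""
    expected = prod(num_states[n] for n in dimension_list(children, parents))
    flat = flatten(data)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} numbers, got {len(flat)}")
    return flat


# The table for T given A from Example 10; both nodes have two states.
table = ((0.05, 0.95), (0.01, 0.99))
print(check_data(["T"], ["A"], {"T": 2, "A": 2}, table))
```

Because the layout is row-major, the structured and the flat form of the data list contain the same numbers in the same order.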

The data field of a utility potential has only the dimensions of the nodes on the right side of the vertical bar.

Example 12

The following description is taken from the “Oil Wildcatter” example and shows a utility potential. Drillpay is a utility node while Oil is a discrete chance node with three states and Drill is a decision node with two states.

potential (Drillpay | Oil Drill)
{
    data = (( -70 0 )         %  dr
            ( 50 0 )          %  wt
            ( 200 0 ));       %  sk
}

The data field of this potential-specification corresponds to a multi-dimensional table with dimension list <Oil, Drill>.

The table in the data field of a continuous probability potential has the dimensions of the discrete chance nodes to the right of the vertical bar. All the discrete chance nodes must be listed first on the right side of the vertical bar (then follow the continuous nodes). However, the items in the multi-dimensional table are no longer values but continuous distribution functions. Currently, only the Gaussian (normal) distribution can be used. A normal distribution is specified by its mean and variance. In the following example, a continuous probability potential is described.

Example 13

Suppose A is a continuous node with parents B and C which are both discrete. Also, both B and C have two states: B has states b1 and b2 while C has states c1 and c2.

potential (A | B C)
{
    data = (( normal ( 0, 2 )       %  b1  c1
              normal ( 3, 2 ) )     %  b1  c2
            ( normal ( 1, 2 )       %  b2  c1
              normal ( 2, 2 ) ));   %  b2  c2
}

The data field of this potential-specification is a table with the dimension list <B, C>. Each entry contains a probability distribution for the continuous node A.

All entries in the above example contain a Gaussian (normal) distribution (the only continuous distribution currently available). A normal distribution is specified with the keyword normal followed by a list of two parameters. The first parameter is the mean and the second is the variance of the normal distribution.

Example 14

In this example, suppose A is a continuous node with one discrete parent B and one continuous parent C. B has two states b1 and b2 and C has a normal distribution.

potential (A | B C)
{
    data = ( normal ( 1 + C, 2 )            %  b1
             normal ( 1 + 2 * C, 2 ) );     %  b2
}

The data field of this potential-specification is a table with the dimension list <B> (B is the only discrete parent which is then listed first on the right side of the vertical bar). Each entry again contains a continuous distribution function for A. The influence of C on A now comes from the use of C in an expression specifying the mean parameter of the normal distributions.

Only the mean parameter of a normal distribution can be specified as an expression. The variance parameter must be a numeric constant. The operators allowed in the expression are +, -, and * (addition, subtraction, and multiplication).
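
The following Python sketch restates the potential of Example 14 as plain data, to show how the value of the continuous parent C enters only through the mean. It is an illustration, not HUGIN code; the names entries, conditional_normal, and sample_a are hypothetical. Note that the second parameter of normal is a variance, whereas Python's random.gauss expects a standard deviation.

```python
import math
import random

# One (mean as a function of C, variance) pair per state of B.
entries = {
    "b1": (lambda c: 1 + c, 2),
    "b2": (lambda c: 1 + 2 * c, 2),
}


def conditional_normal(b_state, c_value):
    """Return (mean, variance) of A given B = b_state and C = c_value."""
    mean_of, variance = entries[b_state]
    return mean_of(c_value), variance


def sample_a(b_state, c_value, rng=random):
    """Draw one value of A from the conditional distribution."""
    mean, variance = conditional_normal(b_state, c_value)
    # Convert the variance to a standard deviation for random.gauss.
    return rng.gauss(mean, math.sqrt(variance))


print(conditional_normal("b2", 3.0))
```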

Since a decision node has no function assigned, it cannot have a data field. Thus, the decision potential specification does not really specify a potential but is rather a trick for specification of informational links.

If the data field is omitted from a potential-specification, a list of ones is supplied for discrete probability potentials, whereas a list of zeros is supplied for utility potentials. For a continuous probability potential, a list of normal distributions with both mean and variance set to 0 is supplied.

The values of the data field of discrete probability potentials may only contain non-negative numbers. In the specification of a normal distribution for a continuous probability potential, only non-negative numbers are allowed for the variance parameter. There is no such restriction on the values of utility potentials or the mean parameter of a normal distribution.
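
The default values and the sign restrictions described in the last two paragraphs can be summarized as follows. This Python sketch is only illustrative: the function names and kind tags are hypothetical, and a normal distribution is represented as a (keyword, mean, variance) tuple with a constant mean.

```python
def default_data(kind, size):
    """Data supplied when the data field is omitted from a potential."""
    if kind == "discrete":
        return [1.0] * size
    if kind == "utility":
        return [0.0] * size
    if kind == "continuous":
        return [("normal", 0.0, 0.0)] * size  # mean 0, variance 0
    raise ValueError(f"unknown potential kind: {kind!r}")


def check_values(kind, data):
    """Return data unchanged, or raise if a sign restriction is violated."""
    for entry in data:
        if kind == "discrete" and entry < 0:
            raise ValueError(f"negative probability value: {entry}")
        if kind == "continuous" and entry[2] < 0:
            raise ValueError(f"negative variance: {entry[2]}")
    return data  # utilities and means are unrestricted


print(default_data("discrete", 4))
print(check_values("utility", [-70, 0, 50, 0, 200, 0]))
```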

Global Information

Information pertaining to the Bayesian network or LIMID model as a whole can be specified at the beginning of the description, initiated by the keyword class (in the second revision of the net language, this keyword was net); see Example 15. Notice that, except for the first line (i.e., “class <net name>”), the entire specification must be enclosed in curly braces (this syntax differs from the syntax used in the second revision).

Example 15

class test
{
    inputs = ();
    outputs = ();
    node_size = (80 40);
    ...
}

This specifies the lists of names of the nodes of the network that serve as input nodes and output nodes, respectively (both lists are empty here). It also specifies that nodes are drawn in the HUGIN Graphical User Interface with a width of 80 pixels and a height of 40 pixels.

Currently, only the inputs, outputs, and node_size fields have been defined for net-specifications. However, as with nodes, you can add all the additional fields you want.

Example 16

The newest version of the HUGIN Graphical User Interface uses a series of application specific fields. Some of them are shown here:

class
{
    inputs = ();
    outputs = ();
    node_size = (80 40);
    HR_Grid_X = "10";
    HR_Grid_Y = "10";
    HR_Grid_GridSnap = "1";
    HR_Grid_GridShow = "0";
    HR_Font_Name = "Arial";
    HR_Font_Size = "-12";
    HR_Font_Weight = "400";
    HR_Font_Italic = "0";
    HR_Propagate_Auto = "0";
    ...
}

The HUGIN Graphical User Interface uses the prefix HR_ on all of its fields.

Lexical Matters

A name has the same structure as an identifier in the C programming language. This means that a name is a non-empty sequence of letters and digits, beginning with a letter. In this context, the underscore character ( _ ) is considered a letter. The case of letters is significant. The sequence of letters and digits forming a name extends as far as possible; it is terminated by the first non-letter/digit character (for example, braces or whitespace).

A string is a sequence of characters not containing a quote character ( " ) or a newline character; its start and end are indicated by quote characters.

A number is comprised of an optional sign, followed by a sequence of digits, possibly containing a decimal point character, and an optional exponent field containing an ‘E’ or ‘e’ followed by a possibly signed integer.

Comments can be placed anywhere in a net description (except within a name, a number, or another multi-character lexical element). A comment is considered equivalent to whitespace. It is introduced by a percent character ( % ) and extends to the end of the line.
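
The lexical rules of this section can be sketched as a small tokenizer in Python. This is an illustration under stated assumptions, not the lexer used by HUGIN: the token names, the set of punctuation characters, and the exact form of the number pattern (digits with at most one decimal point) are choices made for this sketch.

```python
import re

TOKEN = re.compile(r"""
    (?P<comment>%[^\n]*)                                    # to end of line
  | (?P<string>"[^"\n]*")                                   # no quote or newline inside
  | (?P<number>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)   # optional sign and exponent
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)                        # C identifier, case-sensitive
  | (?P<punct>[{}()=;:,|*+-])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, text) pairs, skipping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in ("comment", "space"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize('states = ("yes" "no");  % two states'))
```

Because the string alternative is tried at the opening quote, a percent character inside a string does not start a comment. A sign that is directly followed by a digit is read as part of the number, so an expression such as 1+2 would need extra care in a real lexer.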

Row-Major Representation

This section describes the row-major “layout” of a table. If this subject is not of interest to you, you can skip this section.

To find a value corresponding to a specific configuration in the row-major representation of a table, we index the values from \(0\) to \(s-1\), where \(s\) is the number of values in the list. Suppose that the corresponding list of nodes is \((A_1, A_2, \ldots, A_n)\) and that node \(A_i\) has \(s_i\) states indexed from \(0\) to \(s_i-1\). Then,

\[s = \prod_{i = 1}^{n} s_i\]

What we want is the index \(x\) of a configuration \((a_1, a_2, \ldots, a_n)\). Now, suppose that the state index of \(a_i\) is \(j_i\) \((0 \leq j_i \leq s_i-1)\). Then, \(x\) can be computed as

\[x = \sum_{i = 1}^{n} a_i j_i\]

where

\[\begin{split}a_i = \begin{cases} s_{i+1} a_{i+1} &\text{ if }i < n, \\ 1 &\text{ if }i = n. \end{cases}\end{split}\]
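
The index computation can be sketched in Python as follows, assuming the usual row-major weights: the weight of the last node is 1, and the weight of node i is the number of states of node i+1 times the weight of node i+1. The function names are hypothetical. The sample data is the flat form of the utility table of Drillpay from Example 12, with dimension list <Oil, Drill>.

```python
def weights(state_counts):
    """Row-major weights: the last node varies fastest."""
    a = [1] * len(state_counts)
    for i in range(len(state_counts) - 2, -1, -1):
        a[i] = state_counts[i + 1] * a[i + 1]
    return a


def row_major_index(state_counts, state_indices):
    """Index of the configuration with the given state indices."""
    for s_i, j_i in zip(state_counts, state_indices):
        if not 0 <= j_i < s_i:
            raise IndexError(f"state index {j_i} out of range 0..{s_i - 1}")
    return sum(a_i * j_i for a_i, j_i in zip(weights(state_counts), state_indices))


# Oil has three states (dr, wt, sk) and Drill has two.
drillpay = [-70, 0, 50, 0, 200, 0]
x = row_major_index([3, 2], [1, 0])  # Oil = wt, first state of Drill
print(x, drillpay[x])
```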