Data Files

The HUGIN Graphical User Interface supports saving and loading of data files. A data file contains a set of cases, as illustrated in Figure 1 and Figure 2.

../../../_images/datafiles.png

Figure 1: A sample data file.

../../../_images/datafiles1.png

Figure 2: Another sample data file.

As indicated in Figure 1, the first line lists the names of the nodes as they appear in the network, whereas each of the following lines specifies a case in which each variable is assigned a value, i.e., a set of observations with one observation for each node. An observation may be missing in a case. A missing value is specified using N/A.
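The layout described above (one header line of node names, then one case per line, with N/A marking a missing observation) can be read with a few lines of Python. This is only a minimal sketch and not part of HUGIN: it assumes a comma-separated file, the function name read_cases is hypothetical, and the sample values are made up for illustration rather than taken from Figure 1.

```python
import csv
import io

# Hypothetical file contents: the node names follow the text,
# the observed values are invented for this example.
SAMPLE = """\
E,T,L,S,A,D,B,X
no,no,no,yes,no,yes,yes,no
N/A,no,no,no,yes,N/A,no,no
"""


def read_cases(text):
    """Return (node_names, cases); each case maps node name -> value or None."""
    # Skip completely blank lines; the first remaining row is the header.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    names = [name.strip() for name in rows[0]]
    cases = []
    for row in rows[1:]:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} items, got {len(row)}")
        values = [item.strip() for item in row]
        # N/A marks a missing observation; represent it as None.
        cases.append({n: (None if v == "N/A" else v)
                      for n, v in zip(names, values)})
    return names, cases


names, cases = read_cases(SAMPLE)
missing = sum(v is None for case in cases for v in case.values())
print(names, len(cases), missing)
```

Representing a missing observation as None keeps it distinct from any state label that a node might legitimately have.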

Notice that the first line of the data file always contains the names of (a subset of) the nodes in the network. It is important to specify the name and not the label of each node, since the name is the unique identifier of the node. Thus, the data file in Figure 1 assumes that the network has nodes with names E, T, L, S, A, D, B and X. These could be the node names of the network shown in Figure 3 (where node labels are shown in the nodes).

../../../_images/asia.png

Figure 3: Bayesian-network representation of “Chest Clinic”.

See also notes on data files for OOBN EM.

The format of a data file can be described by the following grammar:

<Data file> ::= <Header> <Case>*
<Header> ::= # <Separator> <Node list> | <Node list>
<Separator> ::= , | <Empty>
<Node list> ::= <Node name> | <Node list> <Separator> <Node name>
<Case> ::= <Case count> <Separator> <Data list> | <Data list>
<Case count> ::= <Nonnegative real number>
<Data list> ::= <Data> | <Data list> <Separator> <Data>
<Data> ::= <Value> | N/A | <Empty>
<Value> ::= <State index> | <Label> | <Real number> | true | false
<State index> ::= #<Integer>

Where:

  • The <Header> must occupy a single line in the file. Likewise, each <Case> must occupy a single line.

  • If # is the first element of <Header>, then each <Case> must include a <Case count>.

  • Each <Case> must contain a <Data> item for each node specified in the <Header>. The ith <Data> item (if it is a <Value>) in the <Data list> must be valid for the ith node in the <Node list> of the <Header>.

  • If <Data> is *, ?, or <Empty>, then the data is taken as ‘missing’.

  • If <Separator> is <Empty>, then none of the separated items is allowed to be <Empty>.

  • <Value> is as defined in Section 8.8, with the exception that <Likelihood> is not allowed.

  • <Real number> is a valid specification for CG, numbered, and interval nodes. For numbered and interval nodes, the acceptable values are defined by the state values of the named node.

  • <Label> is a valid specification for labeled nodes. The label (a double-quoted string) must match a unique state label of the named node. Quotes can be omitted if the label is a single word.

  • true and false are valid specifications for boolean nodes.
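The grammar and rules above can be illustrated with a small Python sketch that parses a header line and a case line. It is not HUGIN's own parser, and it rests on several assumptions: items are separated by commas only, quoted labels do not contain commas, a case without a <Case count> is given a count of 1, and every missing-value marker mentioned in this section (N/A, *, ?, and the empty item) is accepted. The function names parse_header, parse_data and parse_case are hypothetical. The sketch classifies each item by its spelling alone; a real reader would also check the value against the type and states of the corresponding node.

```python
import re

# All missing-value markers mentioned in this section (an assumption:
# the sketch accepts every one of them).
MISSING = {"", "N/A", "*", "?"}

# Plain decimal or scientific notation; deliberately excludes "nan" and "inf".
REAL = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def parse_header(line):
    """Return (has_counts, node_names) for a comma-separated <Header>."""
    items = [s.strip() for s in line.split(",")]
    has_counts = items[0] == "#"
    return has_counts, (items[1:] if has_counts else items)


def parse_data(item):
    """Classify one <Data> item and return a (kind, value) tuple."""
    item = item.strip()
    if item in MISSING:
        return ("missing", None)
    if re.fullmatch(r"#\d+", item):
        return ("state_index", int(item[1:]))
    if item in ("true", "false"):
        return ("boolean", item == "true")
    if REAL.fullmatch(item):
        return ("real", float(item))
    # Anything else is treated as a label; strip optional double quotes.
    if len(item) >= 2 and item[0] == '"' and item[-1] == '"':
        item = item[1:-1]
    return ("label", item)


def parse_case(line, node_names, has_counts):
    """Return (count, {node_name: (kind, value)}) for one <Case> line."""
    items = [s.strip() for s in line.split(",")]
    count = 1.0  # assumed default when the header has no leading '#'
    if has_counts:
        count = float(items[0])
        if count < 0:
            raise ValueError("case count must be nonnegative")
        items = items[1:]
    if len(items) != len(node_names):
        raise ValueError("each case needs one item per node in the header")
    return count, {n: parse_data(i) for n, i in zip(node_names, items)}


has_counts, names = parse_header("#, A, B, C")
count, case = parse_case('2.5, #1, "yes", ', names, has_counts)
print(count, case)
```

Because the header starts with #, the first item of the case line is read as the case count; the trailing empty item is taken as a missing observation for node C.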