Structure Learning Tutorial
The HUGIN Graphical User Interface provides the user with powerful structure learning capabilities, i.e., learning the structure of a Bayesian-network model from data (a set of cases). Structure learning can be performed via the Learning Wizard (which allows data to be read from data files, to be preprocessed, etc.) or by activating one of the structure learning algorithms directly. In this tutorial, the structure learning algorithms are activated directly.
There are several algorithms to choose from; this tutorial looks only at the PC algorithm and the NPC algorithm. We assume that you already know how these algorithms work. For details on the algorithms, use the “Help” button at the bottom left of the window.
Consider the data file asia.dat, sampled from the Chest Clinic network shown in Figure 1.
Figure 2 shows the first few lines of the data file, which consists of 10000 cases altogether.
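To make the origin of such a data file concrete, the following is a minimal sketch of forward sampling from a Chest Clinic-like network. The probabilities are the values commonly quoted for this textbook network and are an assumption here, since the tutorial does not list them. The node names A, S, and B, the functions `sample_case` and `sample_cases`, and the dictionary representation of a case are likewise illustrative; this is not HUGIN's sampler or its data file format.

```python
import random

# Assumed parameters: values commonly quoted for the Chest Clinic (Asia)
# network. They are not taken from this tutorial.
P_A = 0.01                                   # P(A = yes), visit to Asia
P_S = 0.50                                   # P(S = yes), smoker
P_T = {True: 0.05, False: 0.01}              # P(T = yes | A)
P_L = {True: 0.10, False: 0.01}              # P(L = yes | S)
P_B = {True: 0.60, False: 0.30}              # P(B = yes | S)
P_X = {True: 0.98, False: 0.05}              # P(X = yes | E)
P_D = {(True, True): 0.90, (True, False): 0.70,
       (False, True): 0.80, (False, False): 0.10}  # P(D = yes | E, B)


def sample_case(rng):
    """Draw one case by sampling every node after its parents."""
    a = rng.random() < P_A
    s = rng.random() < P_S
    t = rng.random() < P_T[a]
    l = rng.random() < P_L[s]
    b = rng.random() < P_B[s]
    e = t or l                               # E is a deterministic logical OR
    x = rng.random() < P_X[e]
    d = rng.random() < P_D[(e, b)]
    return {"A": a, "S": s, "T": t, "L": l, "B": b, "E": e, "X": x, "D": d}


def sample_cases(n, seed=0):
    """Draw n independent cases with a reproducible random generator."""
    rng = random.Random(seed)
    return [sample_case(rng) for _ in range(n)]


cases = sample_cases(10000)
```

Each case assigns a state to every node, and E is never sampled on its own: it is computed from T and L. That detail matters later, when the learned structure is compared with the original network.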
The structure learning functionality is available under the “File” menu and through the structure learning button of the tool bar of the Main Window. The structure learning button is shown in Figure 3.
The structure learning functionality is available under the “Wizards” menu as shown in Figure 3. This wizard will take you through a series of steps required in order to use the feature.
When this option is chosen, the structure learning dialog appears (see Figure 4).
Press the “Load File” button and choose the file from which the structure is to be estimated. Once a file has been selected, the “Next” button is enabled. Clicking “Next” shows a node summary of the file, and clicking “Next” again opens the Feature Selection tool, which can be used to narrow the set of nodes. For this example, we just click “Next” again without selecting any nodes. Next you need to specify any structure constraints, that is, any known dependencies or independencies in the data set (see Figure 5). Once again we just click “Next”.
On the next page you need to choose the structure learning algorithm to run.
First choose the PC algorithm and specify the level of significance for the statistical independence tests performed during structure learning. Then click the “Next” button to move to the Data Dependencies dialog, where you can inspect the relative strengths of the dependencies (links) found in the data. You may use the slider on the right of the window.
Once finished, press “Next” one last time to start the structure learning algorithm and load the resulting network into the network pane. Here you should be able to arrange the nodes so that the network looks like Figure 8.
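To illustrate what a level of significance means for an independence test, here is a minimal sketch of a G-squared test between two binary variables. The choice of the G-squared statistic, the helper names `g_test_binary` and `independent`, and the toy data are assumptions made for illustration; the tutorial does not state which test HUGIN uses, so this should not be read as its implementation.

```python
import math
from collections import Counter


def g_test_binary(pairs):
    """G-squared statistic and p-value (1 degree of freedom) for two binary variables."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    g2 = 0.0
    for (x, y), observed in joint.items():
        expected = row[x] * col[y] / n       # count expected under independence
        g2 += 2.0 * observed * math.log(observed / expected)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2.0))
    return g2, p_value


def independent(pairs, significance=0.05):
    """Accept independence when the p-value exceeds the significance level."""
    _, p_value = g_test_binary(pairs)
    return p_value > significance


# Toy data: the first sample is perfectly balanced, the second is strongly associated.
balanced = [(0, 0)] * 25 + [(0, 1)] * 25 + [(1, 0)] * 25 + [(1, 1)] * 25
associated = [(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10

print(independent(balanced), independent(associated))
```

In this sketch a smaller significance level makes it harder to reject independence, so fewer dependencies are declared and a constraint-based learner working this way would tend to keep fewer links.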
If the PC algorithm was selected, the result should appear as in Figure 8. Clearly, the structure of the original network has not been recovered perfectly. The problem is that variable E depends deterministically on variables T and L through a logical OR (E stands for Tuberculosis or Lung cancer). This means that (i) T and X are conditionally independent given E, (ii) L and X are conditionally independent given E, and (iii) E and X are conditionally independent given L and T. Thus, the PC algorithm concludes that there should be no links between T and X, between L and X, and between E and X. The last of these conclusions is wrong, since the original network contains a link between E and X. The same reasoning leads the PC algorithm to leave D unconnected to T, L, and E. Also, as the PC algorithm directs links more or less randomly (respecting, of course, the conditional independence and dependence statements derived from data), the directions of some of the links are wrong.
If the NPC algorithm was selected and there are uncertain links, or links whose direction could not be determined with certainty, the user is presented with an intuitive graphical interface for resolving these structural uncertainties (see the description of the NPC algorithm for details). Figure 9 shows the (intermediate) result of the NPC structure learning algorithm applied to the same data.
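The effect of the deterministic OR can be checked numerically. The sketch below builds the exact joint distribution over T, L, E, and X and computes conditional mutual information, which is zero exactly when the two variables are conditionally independent. The marginals for T and L and the conditional distribution of X are assumptions derived from the values commonly quoted for the Chest Clinic network, and the helpers `marginal` and `cmi` are illustrative names, not part of HUGIN.

```python
import math
from itertools import product

P_T, P_L = 0.0104, 0.055          # assumed marginals P(T = yes), P(L = yes)
P_X = {True: 0.98, False: 0.05}   # assumed P(X = yes | E)

# Exact joint distribution over (T, L, E, X); E is the logical OR of T and L.
joint = {}
for t, l, x in product([True, False], repeat=3):
    e = t or l
    p = (P_T if t else 1 - P_T) * (P_L if l else 1 - P_L)
    p *= P_X[e] if x else 1 - P_X[e]
    joint[(t, l, e, x)] = p

NAMES = ("T", "L", "E", "X")


def marginal(variables):
    """Sum the joint distribution down to the named variables."""
    idx = [NAMES.index(v) for v in variables]
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cmi(a, b, given=()):
    """Conditional mutual information I(a; b | given) in nats."""
    given = tuple(given)
    p_abz = marginal((a, b) + given)
    p_az = marginal((a,) + given)
    p_bz = marginal((b,) + given)
    p_z = marginal(given)
    total = 0.0
    for (va, vb, *vz), p in p_abz.items():
        vz = tuple(vz)
        total += p * math.log(p * p_z[vz] / (p_az[(va,) + vz] * p_bz[(vb,) + vz]))
    return total


print("I(E; X)        =", cmi("E", "X"))
print("I(E; X | T, L) =", cmi("E", "X", ("T", "L")))
print("I(T; X | E)    =", cmi("T", "X", ("E",)))
```

Under these assumptions E and X are clearly dependent on their own, yet the dependence vanishes once T and L are known, because T and L already fix the value of E. A learner that removes a link whenever it finds some conditioning set that renders the two variables independent will therefore drop the link between E and X.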
Depending on the choices made by the user, many different final structures (including the original structure shown in Figure 1) can be generated from this intermediate structure.
The HUGIN Graphical User Interface implements a number of different algorithms for learning the structure of a Bayesian network from data; see Structure Learning for more information.
Once the structure of the Bayesian network has been generated, the conditional probability distributions of the network can be estimated from the data using the EM-learning algorithm; see the EM tutorial.