Structure Learning Tutorial

The HUGIN Graphical User Interface provides powerful structure learning capabilities, that is, learning the structure of a Bayesian-network model from data (a set of cases). Structure learning can be performed via the Learning Wizard (which allows data to be read from data files, preprocessed, etc.) or by activating one of the structure learning algorithms directly. In this tutorial, the structure learning algorithms are activated directly.

There are several algorithms to choose from; this example looks only at the NPC algorithm and the PC algorithm. We assume that you already know how these algorithms work. For details on the algorithms, use the “Help” button at the bottom left of the window.

Consider the data file asia.dat sampled from the Chest Clinic network shown in Figure 1.

../../../_images/chestclinic3.png

Figure 1: The structure from which the data were sampled.

Figure 2 shows the first few lines of the data file, which consists of 10000 cases altogether.

../../../_images/asia_dat1.png

Figure 2: First few lines of the data file.
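To get a feel for such a data file outside the GUI, the following is a minimal Python sketch that reads cases and counts the states of one variable. It assumes a header line of variable names followed by one comma-separated case per line; that layout, the inline sample, the state labels, and the helper names read_cases and state_counts are illustrative assumptions, so check Figure 2 for the actual format.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for asia.dat; the real file's layout
# and state labels may differ from this illustration.
SAMPLE = """A,S,T,L,B,E,X,D
no,yes,no,no,yes,no,no,yes
no,no,no,no,no,no,no,no
yes,yes,no,yes,no,yes,yes,yes
"""


def read_cases(stream):
    """Return (variable_names, cases); each case maps a variable name to a state."""
    reader = csv.reader(stream, skipinitialspace=True)
    names = [name.strip() for name in next(reader)]
    cases = [
        dict(zip(names, (value.strip() for value in row)))
        for row in reader
        if row  # skip blank lines
    ]
    return names, cases


def state_counts(cases, variable):
    """Count how often each state of `variable` occurs in the cases."""
    return Counter(case[variable] for case in cases)


names, cases = read_cases(io.StringIO(SAMPLE))
print(names, len(cases), dict(state_counts(cases, "S")))
```

To try it on a real file, pass an opened file object to read_cases instead of the in-memory sample.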

The structure learning functionality is available under the “File” menu and through the structure learning button of the tool bar of the Main Window. The structure learning button is shown in Figure 3.

The structure learning functionality is available under the “Wizards” menu, as shown in Figure 3. The wizard takes you through the steps required to use the feature.

../../../_images/Structure_Learning_Wizard_button.png

Figure 3: The option to activate the structure learning functionality.

When this option is chosen, the structure learning dialog appears (see Figure 4).

../../../_images/Structure_Learning_Wizard_menu.png

Figure 4: The structure learning dialog.

Press the “Load File” button and choose the file from which the structure is to be estimated. Once a file has been selected, the “Next” button is enabled. Clicking “Next” shows a node summary of your file, and clicking “Next” again opens the Feature Selection tool, which you can use to narrow the set of nodes. For this example, we just click “Next” again without selecting any nodes. You can then specify structure constraints, that is, any known dependencies or independencies in the data set (see Figure 5). Once again, we just click “Next”.

../../../_images/Structure_Learning_Wizard_constrains.png

Figure 5: The Structure Constraints dialog.

On the next page, select the structure learning algorithm to run (see Figure 6).

../../../_images/structure_learning_wizard_features.png

Figure 6: The Structure Learning dialog

First choose the PC algorithm and set the level of significance for the statistical independence tests performed during structure learning. Then click the “Next” button to go to the Data Dependencies dialog (see Figure 7), where you can inspect the relative strengths of the dependencies (links) found in the data. You can use the slider on the right-hand side of the window.

../../../_images/structure_learning_wizard_dependencies.png

Figure 7: The Data Dependencies dialog
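To illustrate what the level of significance controls, here is a minimal Python sketch of a single marginal independence test between two binary variables. It uses Pearson's chi-square statistic with one degree of freedom; this is an assumption made for illustration, not a description of the test statistic HUGIN uses, and the PC algorithm also runs conditional tests that are not shown. The function names and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts.

    table[i][j] is the number of cases with the first variable in state i
    and the second in state j. All row and column totals must be nonzero.
    With one degree of freedom the p-value is erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def independent(table, significance=0.05):
    """Accept independence unless the test rejects it at `significance`."""
    _, p_value = chi_square_2x2(table)
    return p_value >= significance


# Hypothetical counts, chosen only to show both outcomes.
balanced = [[250, 250], [250, 250]]
skewed = [[400, 100], [100, 400]]
print(independent(balanced), independent(skewed))
```

A lower significance level makes the test reject independence less often, so fewer links survive; a higher level keeps more links.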

Once finished, press “Next” one last time to run the structure learning algorithm and load the resulting network into the network pane. There you can rearrange the nodes so that the network looks like Figure 8.

../../../_images/structure_learning_wizard_PC.png

Figure 8: The structure learned by the PC algorithm

If the PC algorithm was selected, the result should appear as in Figure 8. Clearly, the structure of the original network has not been recovered perfectly. The problem is that variable E depends deterministically on variables T and L through a logical OR (E stands for “Tuberculosis or Lung cancer”). This means that (i) T and X are conditionally independent given E, (ii) L and X are conditionally independent given E, and (iii) E and X are conditionally independent given L and T. Thus, the PC algorithm concludes that there should be no links between T and X, between L and X, and between E and X. The last conclusion is wrong, since X depends directly on E in the original network. The same reasoning leads the PC algorithm to leave D unconnected to T, L, and E. Also, as the PC algorithm directs links more or less randomly (respecting, of course, the conditional independence and dependence statements derived from data), the directions of some of the links are wrong.
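The three conditional independence statements can be checked numerically. The following Python sketch builds an exact joint distribution over T, L, E, and X with E defined as the logical OR of T and L, and computes conditional mutual information, which is zero exactly when two variables are conditionally independent. The probabilities are illustrative assumptions, not the parameters of the Chest Clinic network, and the helper names are hypothetical.

```python
import itertools
import math
from collections import defaultdict

# Illustrative probabilities (assumptions, not the Chest Clinic parameters).
P_T = 0.3                          # P(T = 1)
P_L = 0.2                          # P(L = 1)
P_X_GIVEN_E = {0: 0.05, 1: 0.9}    # P(X = 1 | E)

NAMES = ("T", "L", "E", "X")


def joint():
    """Exact joint distribution over (T, L, E, X) with E = T or L."""
    dist = {}
    for t, l, x in itertools.product((0, 1), repeat=3):
        e = int(t or l)  # deterministic OR
        p_t = P_T if t else 1 - P_T
        p_l = P_L if l else 1 - P_L
        p_x = P_X_GIVEN_E[e] if x else 1 - P_X_GIVEN_E[e]
        dist[(t, l, e, x)] = p_t * p_l * p_x
    return dist


def marginal(dist, variables):
    """Sum the joint distribution down to the given variables."""
    idx = [NAMES.index(v) for v in variables]
    out = defaultdict(float)
    for state, p in dist.items():
        out[tuple(state[i] for i in idx)] += p
    return out


def cond_mutual_info(dist, a, b, given=()):
    """I(a; b | given) in nats; zero exactly under conditional independence."""
    given = tuple(given)
    p_abz = marginal(dist, (a, b) + given)
    p_az = marginal(dist, (a,) + given)
    p_bz = marginal(dist, (b,) + given)
    p_z = marginal(dist, given)
    total = 0.0
    for (va, vb, *vz), p in p_abz.items():
        if p > 0:
            vz = tuple(vz)
            ratio = p * p_z[vz] / (p_az[(va,) + vz] * p_bz[(vb,) + vz])
            total += p * math.log(ratio)
    return total


dist = joint()
print("I(T;X|E)   =", cond_mutual_info(dist, "T", "X", ("E",)))
print("I(L;X|E)   =", cond_mutual_info(dist, "L", "X", ("E",)))
print("I(E;X|T,L) =", cond_mutual_info(dist, "E", "X", ("T", "L")))
print("I(E;X)     =", cond_mutual_info(dist, "E", "X"))
```

The first three values are zero up to floating-point rounding, while the marginal dependence between E and X is clearly positive. A constraint-based algorithm that removes a link whenever it finds any separating set therefore removes the link between E and X as well, even though X depends directly on E.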

If the NPC algorithm was selected and there are uncertain links, or links whose direction could not be determined with certainty, the user is presented with a graphical interface for resolving these structural uncertainties (see the description of the NPC algorithm for details). Figure 9 shows the (intermediate) result of the NPC structure learning algorithm applied to the same data.

../../../_images/structure_learning_wizard_NPC.png

Figure 9: The intermediate structure learned by the NPC algorithm

Depending on the choices made by the user, many different final structures (including the original structure shown in Figure 1) can be generated from this intermediate structure.

The HUGIN Graphical User Interface implements a number of different algorithms for learning the structure of a Bayesian network from data; see Structure Learning for more information.

Once the structure of the Bayesian network has been generated, the conditional probability distributions of the network can be estimated from the data using the EM-learning algorithm; see the EM tutorial.
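As a rough illustration of what estimating a conditional probability distribution involves, the Python sketch below handles only the simplest situation: when every case is complete (no missing values), the maximum-likelihood estimate reduces to relative frequencies. It is not HUGIN's EM implementation, which also handles missing values; the function name estimate_cpt and the four cases are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpt(cases, child, parents):
    """Relative-frequency estimate of P(child | parents) from complete cases.

    Returns a dict mapping each observed parent configuration (a tuple of
    states) to a dict of child-state probabilities.
    """
    counts = defaultdict(Counter)
    for case in cases:
        config = tuple(case[p] for p in parents)
        counts[config][case[child]] += 1
    cpt = {}
    for config, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[config] = {state: n / total for state, n in child_counts.items()}
    return cpt


# Hypothetical complete cases, for illustration only.
cases = [
    {"E": "yes", "X": "yes"},
    {"E": "yes", "X": "yes"},
    {"E": "yes", "X": "no"},
    {"E": "no", "X": "no"},
]
cpt = estimate_cpt(cases, child="X", parents=("E",))
print(cpt)
```

Parent configurations that never occur in the data get no entry in this sketch, and child states that were never observed are simply absent from an entry.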