# EM Learning Tutorial¶

It often happens that many (or all) of the probability distributions of the variables in a Bayesian network are unknown, and that we want to learn these probabilities (parameters) from data (i.e., series of observations obtained by performing experiments, from the literature, or from other sources). An algorithm known as the EM (Expectation-Maximization) algorithm is particularly useful for such parametric learning, and it is the algorithm used by HUGIN for learning from data. EM tries to estimate the model parameters (probability distributions) of the network from observed (but often incomplete) data.

The EM algorithm is well-suited for batch parameter estimation (i.e., estimating the parameters of the conditional probability tables from a set of cases), whereas the Adaptation Algorithm is well-suited for sequential parameter updating.
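To make the batch setting concrete, here is a minimal EM sketch in plain Python for a tiny two-node network A → B with binary states, where A is sometimes unobserved. This is an illustration of the general EM idea, not HUGIN's implementation; all names are invented for the example.

```python
# Minimal EM sketch for a two-node Bayesian network A -> B (both binary),
# where A is missing in some cases. Illustrative only; not HUGIN's code.

def em(cases, iters=50):
    """cases: list of (a, b) pairs; a is True/False or None (missing)."""
    # Start from uniform ("ignorant") distributions.
    p_a = 0.5                      # P(A = yes)
    p_b = {True: 0.5, False: 0.5}  # P(B = yes | A = a)
    for _ in range(iters):
        # E-step: accumulate expected counts, distributing each
        # incomplete case over the values of A by its posterior P(A | B).
        n = n_a = 0.0
        n_ab = {True: 0.0, False: 0.0}   # expected count of (A=a, B=yes)
        n_par = {True: 0.0, False: 0.0}  # expected count of A=a
        for a, b in cases:
            if a is None:
                w_yes = p_a * (p_b[True] if b else 1 - p_b[True])
                w_no = (1 - p_a) * (p_b[False] if b else 1 - p_b[False])
                w = w_yes / (w_yes + w_no)  # posterior P(A=yes | B=b)
            else:
                w = 1.0 if a else 0.0
            n += 1
            n_a += w
            n_par[True] += w
            n_par[False] += 1 - w
            if b:
                n_ab[True] += w
                n_ab[False] += 1 - w
        # M-step: re-estimate the parameters from the expected counts.
        p_a = n_a / n
        p_b = {a: n_ab[a] / n_par[a] for a in (True, False)}
    return p_a, p_b
```

With complete data this reduces to ordinary frequency counting; with missing entries, each incomplete case is split fractionally between the possible states according to the current parameters, which is exactly the expected-counts idea EM is built on.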

## Experience Tables In EM Learning¶

Experience Tables are used to specify prior experience on the parameters of the conditional probability distributions. Notice that EM parameter estimation is disabled for nodes without an experience table.

## Example EM Learning¶

We use the example from the Experience Tables tutorial to illustrate the use of EM learning. The Bayesian network is shown in Figure 1.

Go to `Chest Clinic` to learn more about the domain of this network.

To illustrate the use of EM learning, we assume that the structure of the network is known to be as shown in Figure 1. In addition, assume that the conditional probability distribution of “Tuberculosis or cancer” given “Has tuberculosis” and “Has lung cancer” is known to be a disjunction (i.e., the child is in state yes if and only if at least one of the parents is in state yes).
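Since the disjunction CPT is deterministic, it can be written out directly. The sketch below builds that table in Python; the state names follow the tutorial's network, while the variable names are invented for the example.

```python
# Deterministic CPT for "Tuberculosis or cancer": the child is "yes"
# if and only if at least one parent ("Has tuberculosis",
# "Has lung cancer") is "yes". Illustrative sketch only.

states = ("yes", "no")

# Map each parent configuration to P(Tuberculosis or cancer = "yes").
or_cpt = {
    (tb, lc): 1.0 if "yes" in (tb, lc) else 0.0
    for tb in states
    for lc in states
}
```

Because these entries are fixed by the known logical relationship, this node is excluded from EM parameter estimation in the steps below (it is given no experience table).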

We do not assume any prior knowledge on the remaining set of distributions. Thus, we set all entries of the conditional probability distributions to one except for the “Tuberculosis or cancer” node. This will turn all the probability distributions for all but the “Tuberculosis or cancer” node into uniform distributions because HUGIN will normalize the values of the distribution tables if the sum of the column values differs from one. A uniform distribution signifies ignorance (i.e., we have no a priori knowledge about the probability distribution of a variable). Figure 2 shows the initial marginal probability distributions prior to parameter estimation.
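The normalization step mentioned above is simple to illustrate: each column of a table is divided by its sum, so a column of ones becomes a uniform distribution. A minimal sketch (not HUGIN's code):

```python
# Normalize one column of a conditional probability table so its
# entries sum to one; a column of equal values becomes uniform.

def normalize_column(col):
    total = sum(col)
    return [v / total for v in col]

print(normalize_column([1.0, 1.0]))        # -> [0.5, 0.5]
print(normalize_column([1.0, 1.0, 1.0]))   # three equal entries -> uniform
```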

Now we will try to learn the probabilities from a data file associated with this network. The EM-Learning Wizard will help guide you through the process. Go to Edit Mode and select the EM-Learning Wizard from the Wizard menu as shown in Figure 3.

Click the “Load File” button and choose a file from which the conditional probability distributions are to be learned. Choose the `asia.dat` file, which is located in the same directory as the network. The first few lines of the file are shown below.

The first line contains the names of the nodes as they appear in the network; each remaining line holds the evidence from one experiment/observation. After the file is selected, click the “Load” button and then the “Next” button to open the prior distribution knowledge dialog shown in Figure 5. Now create Experience Tables for all variables except “Tuberculosis or cancer” by selecting each of them and pressing the “Create” button under Table Existence, as shown in Figure 6.
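For readers working outside the GUI, a case file of this shape can be read with a few lines of Python. This sketch assumes a simple comma-separated layout (a header row of node names, then one row of states per case) and a hypothetical `N/A` marker for missing values; the exact HUGIN case-file syntax may differ, so adjust the parsing to match your file.

```python
# Hedged sketch: read a header-plus-rows case file such as asia.dat.
# The comma-separated layout and the "N/A" missing-value marker are
# assumptions for this example, not a specification of HUGIN's format.
import csv

def load_cases(path, missing_marker="N/A"):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    names, data = rows[0], rows[1:]
    # Represent each case as a dict node -> state, dropping missing values.
    return [
        {name: value for name, value in zip(names, row)
         if value != missing_marker}
        for row in data
    ]
```

Each returned case is then a partial assignment of states to nodes, which is precisely the kind of (possibly incomplete) evidence EM is designed to learn from.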

Once you have created an experience table for all variables, click the “All” button under Initialize. This starts the EM learning algorithm, which computes the conditional probability distribution for each node from the data. Figure 5 shows the new conditional probability distributions after EM learning finishes. As can be seen from Figure 6, all the conditional probability values have changed to reflect the cases in the case file.

Figure 7 shows the marginal probability distributions for the network from which the set of cases was generated. Note that the probability distributions shown here are for the original `Chest Clinic` network.

The EM algorithm can estimate both the conditional probability tables of discrete chance nodes and the conditional probability densities of continuous chance nodes.