This section describes how to use data (i.e., a set of cases) to estimate the conditional probability distributions and densities of a domain for which only the graphical structure is given. This is known as batch learning. Note that batch learning can also be used when initial conditional probability distributions and densities are given.
Batch learning requires that all data be available when the learning process starts, whereas sequential learning allows the knowledge to be updated incrementally. On the other hand, sequential learning requires that initial assessments of the conditional probabilities be given, whereas batch learning requires only the graphical structure. The batch learning method currently implemented in HUGIN is the EM algorithm (also known as EM learning), developed by Lauritzen (1995); see also the work of Cowell & Dawid (1992).
A case is an assignment of values to some or all of the nodes of a domain. If values have been assigned to all nodes, the case is said to be complete; otherwise, it is said to be incomplete. The data used by the learning algorithm consists of a set of cases, numbered sequentially from 0 to N-1, where N is the total number of cases.
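The distinction between complete and incomplete cases can be illustrated with a small sketch. The node names and the dictionary representation below are purely illustrative assumptions, not part of the HUGIN API:

```python
# Hypothetical sketch: a set of cases for batch learning.
# Node names and the dict-based representation are illustrative only.
MISSING = None  # marker for a node with no assigned value

# Each case assigns values to some or all nodes of the domain.
cases = [
    {"Smoker": "yes", "Bronchitis": "yes", "Dyspnoea": "yes"},   # complete
    {"Smoker": "no",  "Bronchitis": MISSING, "Dyspnoea": "no"},  # incomplete
]

def is_complete(case, nodes):
    """A case is complete if every node has been assigned a value."""
    return all(n in case and case[n] is not MISSING for n in nodes)

nodes = ["Smoker", "Bronchitis", "Dyspnoea"]
print([is_complete(c, nodes) for c in cases])  # [True, False]
```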
Before learning can take place, the set of nodes for which conditional probability distributions and densities should be estimated must be specified, in addition to the data set. This set is specified as the set of nodes having experience tables (described in section Learning - Adaptation). The input to the learning algorithm is the set of cases, which must be non-empty. Note that if the domain is a LIMID model, the domain’s decision nodes must be instantiated in all cases.
The learning algorithm needs to perform inference, so the domain must be compiled before the algorithm is called. If successful, the procedure updates the conditional probability table (or density) and the experience table of each node of the domain that has an experience table. Moreover, the inference engine is reset to use the new conditional probability tables (and densities).
If the contents of the experience tables are all zeros, then the algorithm computes the best (“maximum likelihood”) conditional probability tables and densities, assuming that any table is valid (i.e., there are no restrictions on the form of the tables). If the contents are not all zeros, then the experience tables and the conditional probability tables and densities are used to form counts (“prior counts”) that are added to those derived from the data set. This is known as “penalized EM”. If the experience count for a parent configuration is negative, parameter estimation is disabled for that parent configuration.
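The role of the prior counts can be sketched for a single parent configuration of a discrete node. The function below is an illustrative assumption about how such an update could look, not the HUGIN implementation: the experience count is multiplied by the prior probabilities to form prior counts, these are added to the counts derived from the data, and the result is normalized.

```python
# Sketch of a penalized-EM update for one parent configuration of a discrete
# node; names and structure are illustrative, not the HUGIN API.
def update_distribution(experience, prior_probs, data_counts):
    """Combine prior counts (experience * prior probability) with counts
    derived from the data, then normalize.  A negative experience count
    disables parameter estimation for this parent configuration."""
    if experience < 0:
        return list(prior_probs)  # parameters are kept fixed
    counts = [experience * p + n for p, n in zip(prior_probs, data_counts)]
    total = sum(counts)
    return [c / total for c in counts]

# Zero experience: pure maximum likelihood from the data counts.
print(update_distribution(0, [0.5, 0.5], [30, 10]))   # [0.75, 0.25]
# Penalized EM: 40 prior counts pull the estimate toward the prior (0.5, 0.5).
print(update_distribution(40, [0.5, 0.5], [30, 10]))  # [0.625, 0.375]
```

The second call shows the penalizing effect: the more experience accumulated, the more the data must “outvote” the prior counts to move the estimate.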
The starting point of the EM algorithm is the conditional probability tables and densities specified prior to calling the algorithm (the domain must have been compiled with these tables). If no table has been specified for a discrete node, a uniform distribution is assumed; for a continuous node, initial values of the means and variances are computed from the data. Sometimes it is desirable to enforce zeros in the conditional probability tables for configurations that should be impossible (i.e., have zero probability).
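Enforcing zeros through the starting point works because an EM-style update preserves them: a state with zero current probability receives zero expected count, so it keeps zero probability after normalization. A minimal single-variable sketch (illustrative only, not the HUGIN implementation):

```python
# Minimal sketch: one EM-style update for a single discrete variable where
# some observations are missing.  A state whose current probability is zero
# receives zero expected count, so the zero survives every iteration --
# this is how impossible configurations are enforced via the initial tables.
def em_update(probs, observed_counts, n_missing):
    # E-step: distribute each missing observation according to current probs.
    expected = [c + n_missing * p for c, p in zip(observed_counts, probs)]
    # M-step: normalize the expected counts.
    total = sum(expected)
    return [e / total for e in expected]

probs = [0.5, 0.5, 0.0]   # state 2 declared impossible in the initial table
counts = [6, 2, 0]        # observed cases per state
probs = em_update(probs, counts, n_missing=4)
print(probs)              # state 2 remains at probability 0.0
```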
The EM algorithm performs a number of iterations. In each iteration, the logarithm of the probability of the case data under the current joint probability distribution is computed. This quantity is known as the log-likelihood, and the EM algorithm attempts to maximize it. There are two ways in which the number of iterations can be controlled:
A maximum number of iterations can be specified (zero means no maximum).
If a maximum number of iterations has been specified but not yet reached, the EM algorithm terminates when the relative difference between the log-likelihoods of two successive iterations becomes sufficiently small; that is, when this relative difference falls below a tolerance, which must be a positive number. By default, the value of the tolerance is 10⁻⁴ (i.e., 0.0001).
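The two stopping criteria above can be sketched as a simple driver loop. The function below is an illustrative assumption about how the controls interact, not the HUGIN API; `step` stands for one EM iteration returning the log-likelihood:

```python
# Sketch of the two stopping criteria (illustrative, not the HUGIN API):
# a maximum iteration count (zero = no maximum) and a positive tolerance on
# the relative change in log-likelihood between successive iterations.
def run_em(step, max_iterations=0, tolerance=1e-4):
    """Run `step` (one EM iteration, returning the log-likelihood) until
    either criterion is met; return the number of iterations performed."""
    prev = None
    i = 0
    while max_iterations == 0 or i < max_iterations:
        ll = step()
        i += 1
        if prev is not None and abs((ll - prev) / prev) < tolerance:
            break
        prev = ll
    return i

# Toy "EM" whose log-likelihood converges toward -10.0.
lls = iter([-20.0, -12.0, -10.2, -10.01, -10.001, -10.0001])
print(run_em(lambda: next(lls)))  # stops once the relative change < 1e-4
```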
See also notes on EM learning in OOBNs.