Structure Learning Help¶
At this point, the Learning Wizard is ready to learn the structure of the model from the (possibly preprocessed) data, with the help of any structure constraints that have been specified.
The user can select from six different algorithms for learning the structure of the model from data: the PC algorithm, the NPC algorithm, the Greedy search-and-score algorithm, the Chow-Liu tree algorithm, the Rebane-Pearl polytree algorithm, and the Tree Augmented Naive Bayes algorithm. The first two algorithms (PC and NPC) are constraint-based approaches, whereas the last two (Rebane-Pearl polytree and Tree Augmented Naive Bayes) are both based on the Chow-Liu algorithm for learning a tree representation of the underlying joint distribution.
Note that, in this step of the Learning Wizard, only the structure of the network is determined. The conditional probability tables (CPTs) associated with the learned structure will be learned in a subsequent step.
Constraint-Based Algorithms¶
Two pieces of information are needed in order to start the learning process using the PC or NPC algorithm:
* the level of significance to use for the learning, and
* the learning algorithm to be used; either the PC algorithm or the NPC algorithm.
Level of Significance¶
The PC and NPC structure learning algorithms are based on making dependence tests that calculate a test statistic which is asymptotically chi-squared distributed assuming (conditional) independence. If the test statistic is large for a given independence hypothesis, the hypothesis is rejected; otherwise, it is accepted. The probability of rejecting a true independence hypothesis is given by the level of significance:
The level of significance can be entered into this text field. The default value is set to 0.05, which should be appropriate for most learning sessions, but it can take on any value in the open interval (0, 1). In general, the higher the significance level, the more links will be included in the learned structure. Reducing the level of significance will usually also reduce the running time of the structure learning algorithms.
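To make the procedure concrete, the sketch below shows how a simple marginal chi-squared independence test of this kind could look in Python. The function name, the data layout, and the use of scipy.stats are assumptions made for this example only; they do not reflect the tool's actual implementation, which also performs conditional tests.

```python
# Illustrative sketch (not the tool's implementation): a Pearson chi-squared
# test of marginal independence between two discrete variables.
import numpy as np
from scipy.stats import chi2

def independent(x, y, alpha=0.05):
    """Return True if the independence hypothesis for X and Y is accepted."""
    # Contingency table of observed counts.
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    observed = np.zeros((len(xs), len(ys)))
    np.add.at(observed, (x_idx, y_idx), 1)

    # Expected counts under the independence hypothesis.
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    # The test statistic is asymptotically chi-squared distributed under
    # independence, with (|X|-1)(|Y|-1) degrees of freedom.
    statistic = ((observed - expected) ** 2 / expected).sum()
    dof = (len(xs) - 1) * (len(ys) - 1)
    p_value = 1.0 - chi2.cdf(statistic, dof)

    # Reject the hypothesis when the p-value falls below the level of
    # significance; a higher alpha therefore rejects more hypotheses and
    # keeps more links in the learned structure.
    return p_value >= alpha
```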
The PC Algorithm¶
The PC algorithm, which is a variant of the original PC algorithm due to Peter Spirtes and Clark Glymour, belongs to the class of constraint-based learning algorithms. The basic idea of these algorithms is to derive a set of conditional independence and dependence statements (CIDs) by statistical tests.
The algorithm performs the following steps (a simplified sketch of the first steps is shown after the list):
Statistical tests for conditional independence are performed for all pairs of variables (except for those pairs for which a structure constraint has been specified).
An undirected link is added between each pair of variables for which no conditional independences were found. The resulting undirected graph is referred to as the skeleton of the learned structure.
Colliders are then identified, ensuring that no directed cycles occur. (A collider is a pair of links directed such that they meet in a node.) For example, if we find that A and B are dependent and B and C are dependent, but A and C are conditionally independent given a set S not containing B, then this can be represented by the structure A -> B <- C.
Next, directions are enforced for those links whose direction can be derived from the conditional independences found and the colliders identified.
Finally, the remaining undirected links are directed randomly, ensuring that no directed cycles occur.
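As an illustration of the first two steps, here is a simplified Python sketch of the skeleton construction. The helper cond_independent(x, y, S) stands in for the statistical test described above and is an assumption of the example, not part of the tool's API.

```python
# Simplified sketch of the PC skeleton phase, assuming a test function
# cond_independent(x, y, S) that returns True when X and Y are judged
# conditionally independent given the set S.
from itertools import combinations

def pc_skeleton(variables, cond_independent, max_cond_size=2):
    # Start from the complete undirected graph over the variables.
    links = {frozenset(pair) for pair in combinations(variables, 2)}
    sepsets = {}  # separation sets, needed later to identify colliders

    for size in range(max_cond_size + 1):
        for link in list(links):
            x, y = tuple(link)
            # Candidate conditioning sets are drawn from the neighbours of x.
            neighbours = {v for l in links if x in l for v in l} - {x, y}
            for s in combinations(sorted(neighbours), size):
                if cond_independent(x, y, set(s)):
                    links.discard(link)      # no direct link between x and y
                    sepsets[link] = set(s)   # remember the separating set
                    break
    return links, sepsets

# Collider identification would then orient x -> z <- y for every pair of
# links x - z and y - z where x - y is absent and z is not in sepsets[{x, y}].
```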
One important thing to note about the PC algorithm is that, in general, it will not be able to derive the direction of all the links from data, and thus some links will be directed randomly. This means that the learned structure should be inspected, and if any links seem counterintuitive (e.g., sweat causes fever, instead of the other way around), one might consider going back and inserting a constraint specifying the direction of the link, or using the NPC algorithm, which allows the user to interactively decide on the directionality of undirected links.
Traditional constraint-based learning algorithms produce provably correct structures under the assumptions of infinite data sets, perfect tests, and DAG faithfulness (i.e., that the data can be assumed to be simulated from a DAG). In the case of limited data sets, however, these algorithms often derive too many conditional independence statements. Also, they may in some cases leave out important dependence relations.
Generally, it is recommended to use the NPC algorithm, as the resulting graph will be a better map of the (conditional) independence relations represented in the data. In particular, when the data set is small, the NPC algorithm should be preferred. The NPC algorithm, however, has longer running times than the PC algorithm.
The NPC Algorithm¶
NPC stands for Necessary Path Condition, and it is a criterion developed by researchers at Siemens in Munich for solving some of the problems of constraint-based learning algorithms like the PC algorithm. The basic machinery is the same in the PC and the NPC algorithms (i.e., they are both based on generating a skeleton derived through statistical tests for conditional independence).
The NPC algorithm seeks to repair the deficiencies of the PC algorithm, which occur especially in the face of limited data sets. The solution provided by the NPC algorithm is based on inclusion of a criterion known as the necessary path condition. This criterion forms the basis for introducing the notion of ambiguous regions, which in turn provide a language for selecting among sets of inter-dependent uncertain links. The resolution of ambiguous regions is performed in interaction with the user (see the next page of the Learning Wizard).
In constraint-based learning algorithms, the skeleton of the graph is constructed by not including a link in the induced graph whenever the corresponding nodes are found to be conditionally independent. There can, however, be inconsistencies among the set of conditional independence and dependence statements (CIDs) derived from limited data sets. That is, not all CIDs can be represented simultaneously. The inconsistencies are assumed to stem solely from sampling noise; i.e., we still assume that there exists a perfect independence map of the (unknown) probability distribution from which the data is generated.
The number of inconsistencies in the set of CIDs reflects the uncertainty of the model structure. Thus, the number of inconsistencies is a confidence measure for the learned structure and can as such be used as an indication of whether or not sufficient data has been used to perform the learning. The inconsistent CIDs produce multiple solutions when inducing a directed acyclic graph (DAG) from them. These solutions differ with respect to the set of links included.
To resolve the inconsistencies, the NPC algorithm relies on user interaction where the user gets the opportunity to decide on directionality of undirected links and to resolve the ambiguous regions.
The Necessary Path Condition¶
Informally, the necessary path condition says that in order for two variables X and Y to be independent conditional on a set S, with no proper subset of S for which this holds, there must exist a path between X and every Z in S (not crossing Y) and between Y and every Z in S (not crossing X). Otherwise, the inclusion of Z in S is unexplained. Thus, in order for an independence statement to be valid, a number of links are required to be present in the graph.
Ambiguous Regions¶
When the absence of a link, a, depends on the presence of another link, b, and vice versa, we define a and b to be inter-dependent. Both a and b then constitute what we call uncertain links. An ambiguous region is a maximal set of inter-dependent links. That is, an ambiguous region consists of a set of uncertain links.
The main goal is to obtain as few and small ambiguous regions as possible. It should be noted that deterministic relations between variables will also produce ambiguous regions.
The user is offered the possibility of providing information as to how the ambiguous regions should be resolved. The next page of the Learning Wizard will provide an intuitive graphical interface for this interaction.
References¶
The interested reader is referred to the literature for a more detailed description of the notions behind the NPC algorithm. See e.g.
* Steck, H. (2001). Constraint-Based Structure Learning in Bayesian Networks Using Finite Data Sets, PhD Thesis, Institut für Informatik der Technischen Universität München.
* Steck, H. and Tresp, V. (1999). Bayesian Belief Networks for Data Mining, Proceedings of the 2nd Workshop on Data Mining und Data Warehousing als Grundlage Moderner Entscheidungsunterstützender Systeme, DWDW99, Sammelband, Universität Magdeburg, September 1999.
* Steck, H., Hofmann, R., and Tresp, V. (1999). Concept for the PRONEL Learning Algorithm, Siemens AG, Munich.
The Greedy Search-And-Score Algorithm¶
The greedy search-and-score algorithm for learning the structure of a Bayesian network uses a score function to evaluate the goodness of a candidate network structure. The algorithm performs a search through the space of possible network structures and returns the structure with the highest score. The operators used to perform the search given a current candidate are add an arc, remove an arc and reverse an arc.
In order to compute the score of a candidate structure, the candidate network structure is provided with a maximum likelihood estimate of the conditional probability tables associated with the nodes of the network. If the data is complete, this can be done by counting cases and normalizing the counts; if the data is incomplete, the EM algorithm must be used.
For simplicity, we assume that the data is complete (by ignoring the missing data).
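As a minimal sketch of the complete-data case, counting and normalizing could look as follows. The function name and the data layout (one list of (parent configuration, child state) pairs per node) are assumptions made for this example.

```python
# Illustrative maximum likelihood estimation of a single CPT from complete
# data: count the cases for each parent configuration and normalize.
import numpy as np

def ml_cpt(n_parent_configs, n_child_states, cases):
    counts = np.zeros((n_parent_configs, n_child_states))
    for parent_config, child_state in cases:
        counts[parent_config, child_state] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Fall back to a uniform distribution for unseen parent configurations.
    uniform = np.full_like(counts, 1.0 / n_child_states)
    return np.divide(counts, totals, out=uniform, where=totals > 0)
```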
We also assume that the score function is decomposable. This means that the total score can be computed as a sum of scores of the nodes in the network. This makes it easy to compute the effect of a simple modification to the network (such as adding or removing an edge).
The user can choose between using the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) as the score function.
The user can specify an upper limit on the number of parents of each node in the network structure.
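The sketch below illustrates how such a greedy search might be organized around a decomposable score and a parent limit. The helper local_score(node, parents, data), standing in for the per-node AIC/BIC term, is a hypothetical placeholder and not the tool's API; for brevity, only the add-arc operator is shown.

```python
# Illustrative greedy search-and-score over network structures.
def greedy_search(variables, data, local_score, max_parents=3):
    parents = {v: set() for v in variables}          # start from the empty graph
    score = {v: local_score(v, parents[v], data) for v in variables}

    improved = True
    while improved:
        improved = False
        best_gain, best_move = 0.0, None
        for child in variables:
            if len(parents[child]) >= max_parents:
                continue                              # respect the parent limit
            for parent in variables:
                if parent == child or parent in parents[child]:
                    continue
                if creates_cycle(parents, parent, child):
                    continue                          # keep the graph acyclic
                new = local_score(child, parents[child] | {parent}, data)
                gain = new - score[child]             # decomposability: only one term changes
                if gain > best_gain:
                    best_gain, best_move = gain, (parent, child)
        if best_move is not None:
            parent, child = best_move
            parents[child].add(parent)
            score[child] = local_score(child, parents[child], data)
            improved = True
    return parents

def creates_cycle(parents, new_parent, child):
    """True if adding new_parent -> child would create a directed cycle,
    i.e., if child is already an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False
```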
Maximum Number of Parents¶
The user has the option to specify an upper limit on the number of parents in the network structure.
Chow-Liu Tree Based Algorithms¶
The Chow-Liu Tree Algorithm¶
The Chow-Liu algorithm is a structure learning algorithm that finds the best tree representation of the joint distribution induced by the data.
A Chow-Liu tree is the best possible tree-shaped network approximation to a probability distribution such that all edges are directed away from the root. The quality of the approximation is measured using the Kullback-Leibler distance between the true distribution and the distribution defined by the Chow-Liu tree. When we learn from data, the true distribution is defined by the frequencies of the observations. Chow and Liu (1968) show that the optimum tree can be found as the maximum-weight spanning tree over all variables, where the weight of each edge is given as the mutual information between the variables connected by the edge.
A Chow-Liu tree can be constructed as follows (a sketch follows the list):
* Calculate the mutual information MI(Xi, Xj) for each pair (Xi, Xj).
* Consider the complete MI-weighted graph: the complete undirected graph over {X1,…,Xn}, where the links (Xi, Xj) have the weight MI(Xi, Xj).
* Build a maximal-weight spanning tree for the complete MI-weighted graph.
* Direct the resulting tree by choosing any variable as a root and setting the directions of the links to be outward from it.
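A rough Python sketch of this construction is shown below; the pairwise mutual information function is assumed to be supplied by the caller, and the names are illustrative rather than the tool's API.

```python
# Illustrative Chow-Liu tree construction: maximum-weight spanning tree over
# the complete MI-weighted graph, then directed outward from a chosen root.
from itertools import combinations

def chow_liu_tree(variables, mutual_information):
    """variables is a list; the first variable is used as the root."""
    # Kruskal-style maximum-weight spanning tree using a union-find forest.
    edges = sorted(combinations(variables, 2),
                   key=lambda e: mutual_information(*e), reverse=True)
    component = {v: v for v in variables}

    def find(v):
        while component[v] != v:
            component[v] = component[component[v]]
            v = component[v]
        return v

    tree = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                       # adding the edge keeps it acyclic
            component[ra] = rb
            tree.append((a, b))

    # Direct the links outward from the root by a simple traversal.
    root = variables[0]
    directed, frontier, visited = [], [root], {root}
    while frontier:
        node = frontier.pop()
        for a, b in tree:
            if node in (a, b):
                other = b if node == a else a
                if other not in visited:
                    directed.append((node, other))   # node -> other
                    visited.add(other)
                    frontier.append(other)
    return directed
```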
The Rebane-Pearl Polytree Algorithm¶
The Rebane-Pearl polytree algorithm constructs a polytree-shaped belief network approximation to a probability distribution (Rebane and Pearl, 1987).
The Rebane-Pearl algorithm is similar to the Chow-Liu tree algorithm for constructing the “best possible” tree-shaped belief network approximation to a probability distribution such that all edges are directed away from the root. Instead of constructing a tree-shaped belief network approximation, the algorithm constructs a polytree-shaped belief network.
Rebane-Pearl polytree recovery algorithm: The Chow-Liu algorithm constructs a simple tree structure (each node except the root has exactly one parent). The Rebane-Pearl algorithm turns this tree into a polytree (each node can have multiple parents, but the underlying undirected structure is still a tree).
The Tree Augmented Naive Bayes Algorithm¶
The Tree Augmented Naive (TAN) Bayes algorithm is based on the Chow-Liu algorithm.
The TAN algorithm is useful for constructing classification networks where a specific node of the model is the target of the reasoning. The target node is used to construct a conditional Chow-Liu tree (i.e., a Chow-Liu tree over all nodes except the selected target) with the selected root as the root of the tree. Weights are defined as conditional mutual information (conditional on the target), and all nodes (except the target) have the target as an extra parent.
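Building on the Chow-Liu sketch above, a TAN structure could be assembled roughly as follows. The helper conditional_mi(a, b, target, data), computing the mutual information between two features conditional on the target, is an assumption of this example.

```python
# Illustrative TAN construction, reusing the chow_liu_tree sketch above.
def tan_structure(features, target, root, conditional_mi, data):
    # Link weights: mutual information between two features, conditional
    # on the target variable.
    def weight(a, b):
        return conditional_mi(a, b, target, data)

    # Conditional Chow-Liu tree over the features only (target excluded),
    # rooted at the selected root (chow_liu_tree roots at the first variable).
    ordered = [root] + [f for f in features if f != root]
    feature_links = chow_liu_tree(ordered, weight)

    # Finally, every feature gets the target as an extra parent.
    target_links = [(target, f) for f in features]
    return feature_links + target_links
```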
References¶
* C. K. Chow and C. N. Liu (1968). Approximating Discrete Probability Distributions with Dependence Trees.
* G. Rebane and J. Pearl (1987). The recovery of causal polytrees from statistical data.