PC AlgorithmΒΆ

Two constraint-based algorithms are available for structure learning: The PC algorithm and the NPC algorithm.

The HUGIN PC algorithm, which is a variant of the original PC algorithm due to Spirtes, Glymour & Scheines (2000), belongs to the class of constraint-based learning algorithms. The basic idea of these algorithms is to derive a set of conditional independence and dependence statements (CIDs) by statistical tests.

The algorithm performs the following steps:

  • Statistical tests for conditional independence are performed for all pairs of variables (except for those pairs for which a structural constraint has been specified – for more information, see the help page for the Structural Constraints page of the Learning Wizard).

  • An undirected link is added between each pair of variables for which no conditional independences were found. The resulting undirected graph is referred to as the skeleton of the learned structure.

  • Colliders are then identified, ensuring that no directed cycles occur. (A collider is a pair of links directed such that they meet in a node.) For example, if we find that A and B are dependent, B and C are dependent, but A and C are conditionally independent given S, not containing B, then this can be represented by the structure A –> B <– C.

  • Next, directions are enforced for those links whose direction can be derived from the conditional independences found and the colliders identified.

  • Finally, the remaining undirected links are directed randomly, ensuring that no directed cycles occur.

One important thing to note about the PC algorithm is that, in general, it will not be able to derive the direction of all the links from data, and thus some links will be directed randomly. This means that the learned structure should be inspected, and if any links seem counterintuitive (e.g., sweat causes fever, instead of the other way around), one might consider using the Learning Wizard (which provides a means of specifying structural domain knowledge) or the NPC algorithm, which allows the user to interactively decide on the directionality of undirected links.

Traditional constraint-based learning algorithms produce provably correct structures under the assumptions of infinite data sets, perfect tests, and DAG faithfulness (i.e., that the data can be assumed to be simulated from a probability distribution that factorizes according to a DAG). In the case of limited data sets, however, these algorithms often derive too many conditional independence statements. Also, they may in some cases leave out important dependence relations.

Generally, it is recommended to use the NPC algorithm, as the resulting graph will be a better map of the (conditional) independence relations represented in the data. In particular, when the data set is small, the NPC algorithm should be the one preferred. The NPC algorithm, however, has longer running times than the PC algorithm.