Structural Constraints¶
The Learning Wizard allows you to specify available knowledge about dependences or independences among pairs of variables in the data set. That is, if a pair of variables are known to be marginally dependent (i.e., there must be a link between them) or conditionally independent (i.e., there must not be a link between them), such knowledge can be specified by imposing structural constraints upon the graphical model learned from the data. You can specify four different kinds of constraints:
* Existence of a causal link
* Non-existence of a causal link
* Existence of a link
* Non-existence of a link
Please note that only one constraint can be specified per pair of variables.
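As an illustration of the "one constraint per pair" rule, the four kinds of constraints can be modelled as a small data structure keyed by the unordered pair of variables. This is only a sketch: the names `Kind` and `ConstraintSet` are hypothetical, and rejecting a second constraint with an error is an assumption, since the wizard may instead replace the earlier constraint.

```python
from enum import Enum


class Kind(Enum):
    ARROW = "arrow"        # a causal link a -> b must exist
    NO_ARROW = "no-arrow"  # a causal link a -> b must not exist
    LINK = "link"          # some link between a and b must exist
    NO_LINK = "no-link"    # no link between a and b may exist


class ConstraintSet:
    """Holds at most one constraint per unordered pair of variables."""

    def __init__(self):
        self._by_pair = {}

    def add(self, kind, a, b):
        if a == b:
            raise ValueError("a constraint needs two distinct variables")
        pair = frozenset((a, b))
        if pair in self._by_pair:
            # Assumption: a second constraint on the same pair is rejected.
            raise ValueError(f"a constraint already exists for {a} and {b}")
        # The order (a, b) only matters for ARROW and NO_ARROW.
        self._by_pair[pair] = (kind, a, b)

    def get(self, a, b):
        """Return the constraint on the pair {a, b}, or None."""
        return self._by_pair.get(frozenset((a, b)))


constraints = ConstraintSet()
constraints.add(Kind.ARROW, "A", "B")
constraints.add(Kind.NO_LINK, "A", "C")
```

Keying on a `frozenset` makes the lookup independent of the order in which the two variables are given, while the stored tuple keeps the direction needed by the two causal kinds.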
Existence of a Causal Link¶
If a variable, say B, is known to be causally dependent on another variable, say A, then this knowledge can be specified using the Arrow Constraint Tool, which is activated by pressing its associated button:
Once activated, the causal link from A to B is specified by pressing the left mouse button at A, dragging the mouse to B, and then releasing the button. Please note that the tool remains activated until you select another tool.
Non-Existence of a Causal Link¶
If a variable, say B, is known not to be causally dependent on another variable, say A, then this knowledge can be specified using the No-Arrow Constraint Tool, which is activated by pressing its associated button:
Once activated, the constraint that a causal link from A to B is forbidden is specified by pressing the left mouse button at A, dragging the mouse to B, and then releasing the button. Please note that the tool remains activated until you select another tool.
Notice that specifying that a causal link is not allowed from A to B does not disallow a causal link from B to A.
Existence of a Link¶
If a pair of variables, say A and B, are known to be marginally dependent (i.e., no matter what evidence is given, A and B are dependent), but it is not known which variable causes which, then this knowledge can be specified using the Link Constraint Tool, which is activated by pressing its associated button:
Once activated, the link between A and B is specified by pressing the left mouse button at one of the nodes, dragging the mouse to the other node, and then releasing the button. Please note that the tool remains activated until you select another tool.
Notice that in the model resulting from structural learning, the link between A and B will be directed.
Non-Existence of a Link¶
If a pair of variables, say A and B, are known to be conditionally independent (i.e., with some, possibly empty, set of evidence given, A and B are independent), then this knowledge can be specified using the No-Link Constraint Tool, which is activated by pressing its associated button:
Once activated, the constraint that no link is allowed between A and B is specified by pressing the left mouse button at one of the nodes, dragging the mouse to the other node, and then releasing the button. Please note that the tool remains activated until you select another tool.
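The meaning of the four constraint kinds can be summarised by checking a learned structure against them. The sketch below is not part of the tool: the function `violations` and the string labels for the kinds are hypothetical, and a learned model is assumed to be given as a set of directed (parent, child) edges.

```python
def violations(edges, constraints):
    """Return the constraints that a set of directed edges fails to satisfy.

    edges: set of (parent, child) tuples.
    constraints: list of (kind, a, b) tuples, where kind is one of
    "arrow", "no-arrow", "link", "no-link".
    """
    failed = []
    for kind, a, b in constraints:
        forward = (a, b) in edges
        backward = (b, a) in edges
        satisfied = {
            "arrow": forward,                      # a -> b must be present
            "no-arrow": not forward,               # only a -> b is forbidden
            "link": forward or backward,           # either direction will do
            "no-link": not (forward or backward),  # neither direction allowed
        }[kind]
        if not satisfied:
            failed.append((kind, a, b))
    return failed


learned = {("B", "A"), ("C", "B"), ("D", "C")}
checks = [
    ("no-arrow", "A", "B"),  # satisfied: B -> A is still allowed
    ("link", "B", "C"),      # satisfied by the directed edge C -> B
    ("no-link", "A", "C"),   # satisfied: A and C are not adjacent
    ("arrow", "C", "D"),     # violated: the edge runs D -> C
]
print(violations(learned, checks))  # [('arrow', 'C', 'D')]
```

Note how the "no-arrow" check inspects only one direction, whereas "link" and "no-link" are symmetric in the two variables.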
Usefulness of Enforced Links¶
Specifying domain knowledge in the form of link constraints is not only useful for guiding the learning algorithm towards the best possible model; in some cases it is indispensable.
Consider, for example, a case where for three variables, A, B, and C, the following statements hold true:
* A and B are independent given C
* A and C are independent given B
* B and C are independent given A
* A and B are marginally dependent
* A and C are marginally dependent
* B and C are marginally dependent
It is not possible to represent all of these statements in a single DAG. The DAG, however, must represent all dependence statements (i.e., if a pair of variables are marginally dependent, there must be a link between them), as the posterior probabilities will otherwise be wrong. Unrepresented independence statements may lead to more complex inference, but do not lead to wrong probabilities. Therefore, the following DAG is a correct, but not a complete, representation of the above statements.
However, if for a pair of variables, say X and Y, it holds true that X and Y are conditionally independent given a third variable (or a set of variables), then the PC learning algorithm makes sure that X and Y do not get connected by a link. Thus, the DAG resulting from the PC algorithm completely represents the independences found in the data, but it does not necessarily represent all dependences.
Thus, in the above example, the following structure will be generated.
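The effect can be reproduced with a small sketch of the link-removal rule described above: start from a complete graph and drop the link between X and Y as soon as some conditioning set makes them independent. This is a simplification and not HUGIN's implementation (the actual PC algorithm restricts conditioning sets to neighbouring nodes and works on statistical tests rather than a lookup table). The names `independences`, `independent`, `skeleton`, and the `required` parameter are hypothetical; treating enforced links as exempt from removal is an assumption about how link constraints act.

```python
from itertools import combinations

variables = ["A", "B", "C"]

# The conditional independence statements of the example,
# stored as (pair of variables, conditioning set).
independences = {
    (frozenset("AB"), frozenset("C")),
    (frozenset("AC"), frozenset("B")),
    (frozenset("BC"), frozenset("A")),
}


def independent(x, y, given):
    """Oracle that stands in for a statistical independence test."""
    return (frozenset((x, y)), frozenset(given)) in independences


def skeleton(variables, independent, required=()):
    """Remove every link whose end points are independent given some set.

    required: pairs whose link is enforced and therefore never removed.
    """
    required = {frozenset(pair) for pair in required}
    edges = {frozenset(pair) for pair in combinations(variables, 2)}
    for x, y in combinations(variables, 2):
        if frozenset((x, y)) in required:
            continue
        others = [v for v in variables if v not in (x, y)]
        for size in range(len(others) + 1):
            if any(independent(x, y, s) for s in combinations(others, size)):
                edges.discard(frozenset((x, y)))
                break
    return edges


print(skeleton(variables, independent))  # set()
```

Under this rule all three links are removed, although every pair is marginally dependent. Passing, for instance, `required=[("A", "B"), ("B", "C")]` keeps those two links and removes only the link between A and C.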
Therefore, it can be necessary to manually specify some dependences when using the PC algorithm. Alternatively, one may wish to use the NPC algorithm, which takes care of situations like this (see the help page for the next page of the Learning Wizard). Such marginal dependences can be identified on the subsequent Data Dependences page.
Saving / Importing Information¶
If the learning process is to be repeated a number of times, it can be rather cumbersome to specify the same constraints over and over again. To avoid this, the model information, including the constraints, can be saved to a net-file using the “Save” button.
This allows for later import of the saved information by using the “Import” button.
Note that this allows for import of all the model information, such as node positions, labels, sizes, etc. This can be very useful if the data relates to a network whose structure is known. In that case, you can simply import the labels and positions of the nodes. The learned network can then easily be compared to the existing one.