The Stud Farm

(Requires minimal knowledge of genealogy)

A Constructed Example from a Stud Farm

The stallion Alan has sired Betsy with the mare Ann and Benny with the mare Alice. Betsy has borne Carl, sired by Bill, and Benny has sired Cecily with Bonnie. Both Bill and Bonnie were borne by Ann, but their sires (A1 and A2) are in no way related. Carl and Cecily have just had a colt, Dennis.

../../../_images/studfarm.png

Figure 1: Dennis’s genealogy.

It turns out that Dennis suffers from a life-threatening hereditary disease carried by a recessive gene a. The corresponding dominant gene is A. The disease is so serious that Dennis is put down immediately, and since the stud farm wants the gene out of production, Carl and Cecily are taken out of breeding. Dennis must have received one a gene from each parent, and neither parent is sick, so both must be carriers with genotype Aa.

Now the problem is: which other horses should be taken out of breeding? Bonnie is a very fine mare, whereas Alan can more easily be replaced. What is the best course of action for the stud farm? It would be useful to know the probability that each horse is a carrier of the sick gene. With no other information, the probability that a horse is a carrier is known to be 0.01.

Bayesian Networks

The inheritance of genes in the stud farm can easily be modeled by a Bayesian network (BN). In fact, the genealogy in Figure 1 only needs a conditional probability table (CPT) on each node to become a BN. First we specify the states of the nodes. All horses except Dennis are either carriers (Aa) or non-carriers (AA), since none of them is sick, so we give them the states “AA” and “Aa”. Each node in the upper layer of Figure 1 has the CPT shown in Table 1. The other nodes, except for Dennis, have the CPT shown in Table 2. Dennis has the CPT shown in Table 3.

../../../_images/studfarmtabel.png

Table 1: CPT of the nodes in the upper layer (Alan used as an example).

../../../_images/studfarmtabel1.png

Table 2: CPT of the nodes in the middle layers (Betsy used as an example).

../../../_images/studfarmtabel2.png

Table 3: CPT of the node Dennis: P(Dennis | Carl, Cecily).
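The entries of such tables follow from Mendelian inheritance. The sketch below is one way to derive them, not a reproduction of the tables above: it assumes that each parent passes on one of its two genes with probability 1/2, and that the CPTs for the healthy horses are obtained by excluding the state aa and renormalising. The function names are illustrative and are not part of HUGIN.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(father, mother):
    """Distribution over the child's genotype (AA, Aa, aa).

    Each parent genotype is a two-character string such as "Aa";
    the child receives one gene from each parent, chosen uniformly.
    """
    dist = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for gene_f, gene_m in product(father, mother):
        # Sorting puts "A" before "a", so "aA" and "Aa" are the same state.
        genotype = "".join(sorted(gene_f + gene_m))
        dist[genotype] += Fraction(1, 4)
    return dist


def healthy_child_cpt(father, mother):
    """Same distribution, conditioned on the child not being sick (not aa)."""
    dist = offspring_distribution(father, mother)
    healthy = dist["AA"] + dist["Aa"]
    return {"AA": dist["AA"] / healthy, "Aa": dist["Aa"] / healthy}


if __name__ == "__main__":
    for father, mother in product(["AA", "Aa"], repeat=2):
        print(father, mother, healthy_child_cpt(father, mother))
```

Under these assumptions, a healthy foal of two carriers is itself a carrier with probability 2/3, and a foal of two carriers is sick (aa) with probability 1/4.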

This BN has been implemented using the HUGIN Graphical User Interface in less than half an hour. Then, the evidence that Dennis is aa is entered and sum propagation is performed. The result is shown in Figure 2.

../../../_images/studfarm1.png

Figure 2: The probabilities of the horses being carriers (Aa) of the sick gene.
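For readers without HUGIN at hand, the same kind of query can be approximated in a few lines of Python. The sketch below is not HUGIN's sum propagation: it simply enumerates every combination of carrier states, which is feasible because the network is tiny. The prior of 0.01 and the family structure come from the text; the CPT entries (0, 1/2, 2/3 for healthy offspring and 1/4 for a sick foal of two carriers) are assumptions derived from Mendelian inheritance, and all names are illustrative.

```python
from itertools import product

PRIOR_CARRIER = 0.01  # from the text: prior probability of being a carrier

FOUNDERS = ["Alan", "Ann", "Alice", "A1", "A2"]
# child: (sire, dam), as described in the genealogy
PARENTS = {
    "Betsy": ("Alan", "Ann"),
    "Benny": ("Alan", "Alice"),
    "Bill": ("A1", "Ann"),
    "Bonnie": ("A2", "Ann"),
    "Carl": ("Bill", "Betsy"),
    "Cecily": ("Benny", "Bonnie"),
}
HORSES = FOUNDERS + list(PARENTS)


def p_carrier(sire_is_carrier, dam_is_carrier):
    """P(child = Aa | parents), assuming the child is known not to be aa."""
    carrier_parents = sire_is_carrier + dam_is_carrier  # True counts as 1
    return (0.0, 0.5, 2.0 / 3.0)[carrier_parents]


def joint(world):
    """Probability of one full assignment together with Dennis = aa."""
    p = 1.0
    for horse in FOUNDERS:
        p *= PRIOR_CARRIER if world[horse] else 1.0 - PRIOR_CARRIER
    for child, (sire, dam) in PARENTS.items():
        q = p_carrier(world[sire], world[dam])
        p *= q if world[child] else 1.0 - q
    # Dennis can only be aa if both Carl and Cecily are carriers.
    p *= 0.25 if (world["Carl"] and world["Cecily"]) else 0.0
    return p


def posterior_carrier(known=None):
    """P(horse = Aa | Dennis = aa, known) for every horse, by enumeration."""
    known = known or {}
    total = 0.0
    mass = dict.fromkeys(HORSES, 0.0)
    for values in product([False, True], repeat=len(HORSES)):
        world = dict(zip(HORSES, values))
        if any(world[h] != v for h, v in known.items()):
            continue
        p = joint(world)
        total += p
        for horse in HORSES:
            if world[horse]:
                mass[horse] += p
    return {horse: mass[horse] / total for horse in HORSES}


if __name__ == "__main__":
    for horse, prob in posterior_carrier().items():
        print(f"{horse:7s} {prob:.3f}")
```

Calling `posterior_carrier({"Alan": True})` additionally fixes Alan as a carrier, which makes it possible to check how the belief in Ann changes once Alan's status is assumed known. Because the CPTs here are assumptions, the numbers need not match the figures exactly.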

In Figure 2, we can see that Betsy is very likely a carrier of the sick gene. Both her parents (Ann and Alan) also have a high probability of being carriers. However, a more thorough investigation shows that it is very unlikely that both of them are carriers at the same time. In Figure 3 we see that if Alan is known to be a carrier, it becomes most unlikely that Ann is also a carrier. This is because a carrier inherits the sick gene from only one parent, so one carrier parent is enough to explain the evidence. Figure 3 shows that in this case the gene has most likely been passed from Alan to Betsy and Benny, and from them to Carl and Cecily.

The conclusion to draw from these results depends very much on how sure the farmer wants to be of getting the sick gene out of production. He can never be absolutely sure that he gets rid of the right horses, but he should at least get rid of Betsy, Ann and Bonnie. If he also wants to get rid of Alan because he is easily replaced, this will have no effect unless he also gets rid of Benny, since Benny has probably inherited the sick gene if Alan has it.

../../../_images/studfarm2.png

Figure 3: If we assume that Alan carries the sick gene, Ann is probably not a carrier.

This network has been installed on your computer with the HUGIN software. Open the network in the HUGIN Graphical User Interface (note: not all browsers can open HUGIN directly). You can find the network in the directory where you installed HUGIN (e.g. C:\Program Files\Hugin\Hugin Lite\Samples).

You can also find the samples at the Hugin download area.

Comments

Many application areas share essential characteristics with the example above, e.g. medical diagnosis and treatment, credit valuation of customers, search for minerals, monitoring of biological production plants, image understanding, information retrieval, and fault analysis.

The areas are characterized by a cause-effect structure, where effects are not completely determined. Sometimes an event has one effect and sometimes it has another. This phenomenon is called causal uncertainty. A domain characterized by causal uncertainty can be modeled by a BN.

Another characteristic of these areas is that a number of essential properties cannot be observed directly. This is the diagnosis problem: you know only the symptoms, and from them you must infer the causes. You must, so to speak, reason in the direction opposite to the links in the network.
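Reasoning against the direction of a link amounts to applying Bayes' rule. The following minimal sketch shows this for a single hypothetical cause with one symptom. The function name and the numbers in the example are invented for illustration and do not come from the stud farm network.

```python
def posterior_cause(prior, p_symptom_given_cause, p_symptom_given_no_cause):
    """P(cause | symptom observed) for a two-node network cause -> symptom."""
    # Total probability of seeing the symptom, with or without the cause.
    p_symptom = (prior * p_symptom_given_cause
                 + (1.0 - prior) * p_symptom_given_no_cause)
    # Bayes' rule: invert the direction of the link.
    return prior * p_symptom_given_cause / p_symptom


if __name__ == "__main__":
    # Hypothetical numbers: a rare cause that usually produces the symptom,
    # and a symptom that occasionally appears without the cause.
    print(posterior_cause(0.01, 0.9, 0.05))
```

Even though the symptom is much more likely when the cause is present, the posterior stays moderate in this example because the cause is rare to begin with.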