# Hierarchical Naive Bayes¶

Naive Bayes models are a popular type of classification model. The
*target* (or *class*) variable has a state for each possible class. The
attribute nodes are the observations used to decide the class. In a
Naive Bayes model, the attribute nodes are children of the target
node. When this structure hs been created and case data has been
entered, it is simply a matter of running the *EM algorithm* in order to
estimate the parameters of the model.

One of the problems with the Navie Bayes model is that some of the
attribute variables can be more or less correlated. If two attribute
variables are strongly correlated, then their combined effect on the
target variable can be overestimated. We can compensate for this
overestimation by introducing an extra (latent)variable, this
variable will be the (only) parent of the two correlated attribute
variables, and the latent variable will be a child of the target
variable (or some other latent variable). This is called a
*Hierarchical Naive Bayes* model by Langseth and Nielsen (2006).

HUGIN provides a simplified version of the algorithm proposed by Langseth and Nielsen (2006). The algorithm successively identifies pairs of variables to be combined via a latent variable.

One iteration of the algorithm follows this scheme:

- Identify a pair of variables (the list of eligible variables
initially contains all non-

*target*variables) that are strongly correlated given the*target*variable. This is done using the same test that is used to determine [in]dependence in the PC algorithm. The pair with the highest score (correlation) is selected.

The algorithm stops when no pair of correlated variables can be found, that is, the test of the PC algorithm indicates

*independence*for all remaining pairs of variables. The threshold value for this decision is controlled by*significance level*.Let

*X*and*Y*be the selected pair of varaibles. Create a latent variable*L*with*X*and*Y*as children. The states of*L*are defined as one state for each combination of states of*X*and*Y*, and the conditional probability tables for*X*and*Y*are defined (according to the meaning of the states of*L*to be deterministic functions of*L*. Then it is tested (see Langseth and Nielsen (2006)) for details) whether some of the states of*L*contribute the*same information*with respect to the classification task. (if*X*and*Y*are irrelevant to the classification task, then all states of*L*will be collapsed to a single state) These states are then*collapsed*to a single state. An additional latent node*L’*is created with*L*as the only child to represent the collapsed state space, and a conditional probability table for*L*is defined in the natural way. (if the collapsed state space is identical to the original state space,*L’*will not be created) Since the state spaces of*L*and*L’*are deterministically determined by*X*and*Y*, we can create case data for*L*and*L’*to be used in the following iterations of the algorithm.Finally,

*X*and*Y*are removed from the list of eligible variables from which the next pair of variables will be chosen, and*L’*(If*L’*has not been created, then*L*is used instead.) is added to that list.

When the algorithm above stops, the nodes (except the *target* node)
without parents will be given *target* as parent. These nodes also
have no conditional probability table. In order to finalize the
model, these nodes can be equipped with experience tables before
estimating the CPTs using the EM algorithm. The last part must be done
explicitly (i.e., parameter estimation is not performed as part of the
algorithm).

**Authors**

Langseth, T. D. Nielsen (2006).

*Classification using Hierarchical Naive Bayes models*.