Generally speaking, it is a wise idea to relate any proposed machine learning method to a Bayesian method, to better understand its assumptions, strengths and weaknesses. If it can be cast, at least approximately, into a form of Bayesian learning, then you can check whether the implied prior probabilities are suitable for the problem. If it does not even roughly correspond to any form of Bayesian learning, then there is little guarantee of the validity of its results, and it should only be used if it has other valuable qualities, such as being particularly simple or fast.
The Netica learning algorithm is equivalent to true Bayesian learning under two assumptions: that the conditional probabilities being learned are independent of each other, and that the prior distributions are Dirichlet (for a node with 2 states, a Dirichlet reduces to a beta distribution). For more information see Spiegelhalter, Dawid, Lauritzen and Cowell (1993), section 4.1, where their term “precision” corresponds to our “experience”.
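As a concrete illustration (a minimal sketch in Python, not Netica’s actual code or API), Bayesian learning of this kind updates each conditional distribution by treating the prior probabilities scaled by the experience as Dirichlet pseudo-counts, adding the observed case counts to them, and renormalizing:

    # A minimal sketch of Dirichlet updating for one conditional distribution
    # P(node | parent configuration), under the assumptions stated above.
    # Function and variable names are illustrative, not Netica's API.
    def dirichlet_update(prior_probs, experience, counts):
        # Dirichlet parameters are probability * precision ("pseudo-counts").
        alphas = [p * experience for p in prior_probs]
        # Conjugacy: observed case counts simply add to the pseudo-counts.
        posterior_alphas = [a + n for a, n in zip(alphas, counts)]
        # Experience grows by the number of cases seen.
        new_experience = experience + sum(counts)
        # Posterior mean gives the new conditional probabilities.
        posterior_probs = [a / new_experience for a in posterior_alphas]
        return posterior_probs, new_experience

    # Example: a 2-state node (the beta case) with prior [0.7, 0.3] and
    # experience 10, after observing 8 cases: 2 in state 0, 6 in state 1.
    probs, exp_ = dirichlet_update([0.7, 0.3], 10.0, [2, 6])
    print(probs, exp_)   # [0.5, 0.5] 18.0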
Assuming the prior distributions to be Dirichlet generally does not result in a significant loss of accuracy, since precise priors are rarely available anyway, and Dirichlet functions are flexible enough to fit a wide variety of simple distributions. Assuming the conditional probabilities to be independent, on the other hand, generally results in poor performance when the number of usable cases is not large compared to the number of parent configurations of each node to be learned.
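To see why the independence assumption can hurt, note that each node requires a separate conditional distribution for every configuration of its parents, and the number of configurations grows multiplicatively with the number of parents. The short sketch below (illustrative Python, not part of Netica) counts the configurations:

    # Illustrative only: the number of conditional distributions a node needs
    # is the product of its parents' state counts.
    from math import prod

    def num_parent_configs(parent_state_counts):
        return prod(parent_state_counts)

    # A node with 4 parents of 3 states each has 3**4 = 81 configurations,
    # so with only a few hundred cases many configurations receive little
    # or no data, and each one is learned in isolation.
    print(num_parent_configs([3, 3, 3, 3]))   # 81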