Generally speaking, it is a wise idea to relate any proposed machine learning method to a Bayesian method, to better understand its assumptions, strengths and weaknesses. If it can be cast, at least approximately, into a form of Bayesian learning, then you can check whether the implied prior probabilities are suitable for the problem. If it does not even roughly correspond to any form of Bayesian learning, then there is little guarantee of the validity of its results, and it should only be used if it has other valuable qualities, such as being particularly simple or fast.
The Netica learning algorithm is equivalent to true Bayesian learning under two assumptions: that the conditional probabilities being learned are independent of each other, and that the prior distributions are Dirichlet (for a node with 2 states, a Dirichlet reduces to a beta distribution). For more information see Spiegelhalter, Dawid, Lauritzen and Cowell (1993), section 4.1, where their term “precision” corresponds to our “experience”.
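As a concrete illustration (a minimal sketch in Python, not Netica’s actual code or API), Bayesian learning of this kind updates each conditional distribution by treating the prior probabilities scaled by the experience as Dirichlet pseudo-counts, adding the observed case counts to them, and renormalizing:

    # A minimal sketch of Dirichlet updating for one conditional distribution
    # P(node | parent configuration), under the assumptions stated above.
    # Function and variable names are illustrative, not Netica's API.
    def dirichlet_update(prior_probs, experience, counts):
        # Dirichlet parameters are probability * precision ("pseudo-counts").
        alphas = [p * experience for p in prior_probs]
        # Conjugacy: observed case counts simply add to the pseudo-counts.
        posterior_alphas = [a + n for a, n in zip(alphas, counts)]
        # Experience grows by the number of cases seen.
        new_experience = experience + sum(counts)
        # Posterior mean gives the new conditional probabilities.
        posterior_probs = [a / new_experience for a in posterior_alphas]
        return posterior_probs, new_experience

    # Example: a 2-state node (the beta case) with prior [0.7, 0.3] and
    # experience 10, after observing 8 cases: 2 in state 0, 6 in state 1.
    probs, exp_ = dirichlet_update([0.7, 0.3], 10.0, [2, 6])
    print(probs, exp_)   # [0.5, 0.5] 18.0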
Assuming the prior distributions to be Dirichlet generally does not result in a significant loss of accuracy, since precise priors are rarely available anyway, and Dirichlet functions are flexible enough to fit a wide variety of simple distributions. Assuming the conditional probabilities to be independent, on the other hand, generally results in poor performance when the number of usable cases is not large compared to the number of parent configurations of each node to be learned.
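To see why the independence assumption can hurt, note that each node requires a separate conditional distribution for every configuration of its parents, and the number of configurations grows multiplicatively with the number of parents. The short sketch below (illustrative Python, not part of Netica) counts the configurations:

    # Illustrative only: the number of conditional distributions a node needs
    # is the product of its parents' state counts.
    from math import prod

    def num_parent_configs(parent_state_counts):
        return prod(parent_state_counts)

    # A node with 4 parents of 3 states each has 3**4 = 81 configurations,
    # so with only a few hundred cases many configurations receive little
    # or no data, and each one is learned in isolation.
    print(num_parent_configs([3, 3, 3, 3]))   # 81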