B. Basic Netica Operation | Copyright © 2024 Norsys Software Corp. |
3. Defining Node Relationships
In the previous tutorial we saw how to build the basic structure of a net, that is, how to define nodes and link them up. Here we will learn how to define the probability relationships between the nodes that have been linked up. Although you might think that links would naturally house these relationships, it turns out that this is not ideal, since it makes it difficult to specify any interdependence between the relationships. It turns out best if the node holds the relationships that it bears with its parents. Therefore you will find conditional probabilities associated with any link by examining the child node of that link. You can find the relationship by either:
3.1 Defining probability tables manually

The most basic and straightforward way to define a conditional probability relation between a node and its parents is to explicitly define what is termed the Conditional Probability Table, or CPT for short. The CPT is simply a table that has one probability for every possible combination of parent and child states. This is an N+1 dimensional table, where N is the number of parents. However, the table can be "flattened" into two dimensions by listing all combinations of parent states along one dimension and all child states along the other. This is what Netica does, since multidimensional tables are hard to visualize.

Let us look at the CPT for node Dyspnea in net Asia. Dyspnea has two parents, TbOrCa and Bronchitis, each of which has two states. On the left, presented vertically, are all possible combinations of the parent states. On the top right are all possible states of Dyspnea, "Present" and "Absent". The probabilities for each combination of parent states and child state are then given at the bottom right.

Rows must sum to 1.0

Note that the probabilities in each row of the table must sum exactly to 1.0. This is because each row summarizes the probabilities for one possible world, the one where the parents are in the given states. For that possible world, the chances of the child being in each of its states must sum to 1.0.

You can edit the probabilities in the table by clicking at the appropriate row-column location and typing in a new value. When done, click on "Okay". If any of the rows do not total 1.0, an error dialog is raised; simply make the necessary corrections and then click "Okay" again.

Terminology: Deterministic vs. Probabilistic relations

Sometimes a child node has exactly one possible value for each possible configuration of parent states. Such a node is said to be deterministic, since its value is determined exactly by its parents; there is no element of chance involved.
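The flattened table layout and the row-sum rule can be sketched outside of Netica in a few lines of Python. This is only an illustration, not Netica code: the probability values are made-up placeholders rather than numbers read from the Asia net, the helper names are hypothetical, and a small floating-point tolerance is assumed when checking that a row totals 1.0.

```python
from itertools import product

# Parent nodes of Dyspnea and their states, as named in the text.
parents = {"TbOrCa": ["True", "False"], "Bronchitis": ["Present", "Absent"]}
child_states = ["Present", "Absent"]

# One row per combination of parent states (the flattened N+1 dimensional table).
# The numbers are illustrative assumptions, not values read from the Asia net.
cpt = {
    ("True", "Present"): [0.9, 0.1],
    ("True", "Absent"): [0.7, 0.3],
    ("False", "Present"): [0.8, 0.2],
    ("False", "Absent"): [0.1, 0.9],
}

def rows_not_summing_to_one(table, n_child_states, tol=1e-9):
    """Return the parent configurations whose row is malformed or does not total 1.0."""
    return [
        config
        for config, row in table.items()
        if len(row) != n_child_states or abs(sum(row) - 1.0) > tol
    ]

def is_deterministic(table):
    """True when every row has a single 1.0 entry and 0.0 everywhere else."""
    return all(
        sorted(row) == [0.0] * (len(row) - 1) + [1.0] for row in table.values()
    )

# Every combination of parent states must have a row in the table.
expected_rows = list(product(*parents.values()))
missing_rows = [config for config in expected_rows if config not in cpt]

print("rows:", len(expected_rows), "missing:", missing_rows)
print("bad rows:", rows_not_summing_to_one(cpt, len(child_states)))
print("deterministic:", is_deterministic(cpt))
```

The `is_deterministic` helper corresponds to the deterministic relations defined above.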
In the CPT of a deterministic node, each row has one column with a value of 1.0, and all the other columns have a value of 0.0.

Load Button

Clicking the Load button updates all the probabilities in the table to reflect their current state in the network. It is the inverse of the "Apply" button, which copies the table into the net. The "Load" feature is not used much by beginners, but it can be very handy when you are doing complex "what if" analyses (it is perfectly fine in Netica to have multiple table dialogs open for the same node, each with different probabilities that you are experimenting with). Also, sometimes the underlying net is being updated by a program (say, when the net is learning its probabilities from cases that occur on the fly) and you want to keep abreast of the current probabilities; you can then just keep clicking on "Load" to see the latest values.

3.2 Defining probability tables by equation

Tables can sometimes be cumbersome to enter by hand, especially if there are many parent states to consider. Netica offers the ability to create a convenient shorthand description of the conditional probability tables using equations. The equation language is complete and powerful, and follows the syntax of the popular programming languages C, C++, and Java. Equations can be used whether the nodes are continuous or discrete, and whether the relation is probabilistic or deterministic.

All equations must be converted to tables before compiling a network, doing network transforms, or solving decision problems. The tables are then used in the same way as if you had entered them by hand. Because tables assume a discrete set of states for both parents and child, any continuous nodes taking part in an equation must first have been discretized.

We will learn how to use equations by learning their basic syntactic form, and then by looking at a few examples. You should then try to create a few for yourself.
The syntax for equations varies slightly depending on whether the node's value is deterministically determined by its parents (it always has a unique value for each parent state configuration) or is probabilistically determined.

Deterministic nodes

Syntax:
Child (Parent1, Parent2, ... ParentN) = some expression that yields legal state values of Child

Examples:
/* convert Fahrenheit to Centigrade */
C (F) = 5.0/9 * (F-32)

/* total distance traveled, X, is the average velocity * time traveled + initial distance */
X (Vel, dt, X0) = X0 + Vel * dt

/* if taste is sour, choose a blue color; else if taste is sweet, choose red; else if taste is salty, choose green; else choose gray; */
Color (Taste) = Taste==sour? blue: Taste==sweet? red: Taste==salty? green: gray

Probabilistic nodes

Syntax:
p (Child | Parent1, Parent2, ... ParentN) = some expression that yields probabilities, that is, numbers in the range 0.0 to 1.0

Examples:
/* the total distance traveled, X, follows a normal distribution with a mean of Vel*dt+X0, and a standard deviation of 'spread' */
p (X | Vel, dt, X0, spread) = NormalDist (X, Vel*dt+X0, spread)

/* the chemical's color is a probabilistic function of the temperature: if the temperature is high, the color is always yellow; if the temperature is medium, the color is always orange; but if the temperature is low, the color can be orange 20% of the time and red 80% of the time. */
p (Color | Temp) =
    Temp == high ? (Color==yellow ? 1.0 : 0.0) :
    Temp == med ? (Color==orange ? 1.0 : 0.0) :
    Temp == low ? (Color==orange ? 0.2 : Color==red ? 0.8 : 0.0) : 0

Note that spaces and carriage returns have no bearing whatsoever on the equation syntax; you can use as many or as few as you like, to suit your taste. You can also add C-style comments (/* ... */) anywhere you like.

There are some rules that must be followed for an equation to make sense to Netica:
Netica's on-screen help contains a complete listing of all the available equation functions, including a detailed reference manual describing their parameters and exact function. There are over 50 of them, from simple mathematical ones to complex statistical ones. Here is a listing of them by name, just so you have an idea of what is available.
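To give a feel for what a statistical function such as NormalDist contributes once a continuous node has been discretized, the Python sketch below computes one table row for the distance example. It is a rough illustration under stated assumptions, not a description of how Netica performs the conversion: the interval edges and parent values are invented, the helper names are hypothetical, and the row is renormalized over the intervals that are covered.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def discretized_normal_row(edges, mean, sd):
    """Probability of each interval [edges[i], edges[i+1]] under Normal(mean, sd).

    The masses are renormalized so the row sums to 1.0 over the covered range.
    """
    masses = [
        normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
        for lo, hi in zip(edges, edges[1:])
    ]
    total = sum(masses)
    return [m / total for m in masses]

# Assumed discretization of X and one assumed configuration of the parents.
edges = [0.0, 10.0, 20.0, 30.0, 40.0]
vel, dt, x0, spread = 5.0, 3.0, 5.0, 4.0

# Mean follows the equation in the text: Vel*dt + X0.
row = discretized_normal_row(edges, vel * dt + x0, spread)
print([round(p, 4) for p in row])
```

Each parent configuration would yield one such row, so the full table has as many rows as there are combinations of discretized parent states.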
Tips
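One way to check an equation before relying on it is to expand it by hand into the table it stands for. The Python sketch below does this for the p (Color | Temp) example from section 3.2. It is not Netica code and does not use Netica's equation engine; it assumes that Color has exactly the states yellow, orange, and red, and that Temp has the states high, med, and low. The function names are hypothetical.

```python
temp_states = ["high", "med", "low"]
color_states = ["yellow", "orange", "red"]  # assumed to be the full state list

def p_color_given_temp(color, temp):
    """Mirror of the nested ?: expression in the p (Color | Temp) equation."""
    if temp == "high":
        return 1.0 if color == "yellow" else 0.0
    if temp == "med":
        return 1.0 if color == "orange" else 0.0
    if temp == "low":
        if color == "orange":
            return 0.2
        return 0.8 if color == "red" else 0.0
    return 0.0

def equation_to_table(equation, parent_states, child_states):
    """Evaluate the equation for every parent state and child state."""
    return {
        parent: [equation(child, parent) for child in child_states]
        for parent in parent_states
    }

table = equation_to_table(p_color_given_temp, temp_states, color_states)

# Each row must total 1.0, just as for a table entered by hand.
for temp, row in table.items():
    assert abs(sum(row) - 1.0) < 1e-9, f"row for Temp={temp} does not sum to 1.0"
    print(temp, row)
```

If a branch of the equation were mistyped (say 0.3 instead of 0.2 for orange), the row for Temp = low would no longer total 1.0 and the assertion would flag it.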
3.3 Learning probability tables In the two preceding sections we learned how to define the probabilistic relation between a node and its parents by either manually editing a table of probabilities or by writing an equation that is a short-hand expression for such a table. In this section we discover a third way that Netica allows these conditional probabilities to be defined. This is by learning them from a collection of cases. If the collection of cases is a sample from the population we are modeling, then we can use the frequency information implicit in that data as approximations of the desired probabilities. This is a very powerful and easy-to-use feature of Netica. Here is how to use it:
Here is a sample case file for the net Asia. Notice that we have added a special column called 'IDnum'. It is not required, but is a good idea for data handling purposes.

// ~->[CASE-1]->~
IDnum  VisitAsia  Tuberculosis  Smoking    Cancer   TbOrCa  XRay      Bronchitis  Dyspnea
1      No_Visit   Present       Smoker     Absent   True    Abnormal  Absent      Present
2      No_Visit   Absent        Smoker     Absent   False   Normal    Present     Present
3      No_Visit   *             Smoker     Present  True    Abnormal  *           Present
4      No_Visit   Absent        NonSmoker  Absent   False   Normal    Absent      Absent
5      No_Visit   Absent        Smoker     Present  True    Abnormal  Present     Present
6      No_Visit   Absent        Smoker     Absent   False   Abnormal  Present     Present
...

The exact CASE file format is given here. It describes some other nice features about Case files, including how to comment them, add a multiplicity factor, and so forth.

As a teaching exercise, let us create a case file for both defining our nodes (as we saw how to do earlier in tutorial section B.2.1.6) and for learning the probabilities. Perform the following:
More on Learning
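The idea of using case frequencies as approximations of the desired probabilities can be sketched in plain Python. This is a simplified illustration and makes no claim to match Netica's own learning procedure: it assumes whitespace-separated columns, treats lines starting with // as comments, and simply skips any case in which a needed value is missing (shown as *). The function names are hypothetical, and only the six cases listed in the sample file are used.

```python
from collections import Counter, defaultdict

CASE_TEXT = """\
// ~->[CASE-1]->~
IDnum VisitAsia Tuberculosis Smoking Cancer TbOrCa XRay Bronchitis Dyspnea
1 No_Visit Present Smoker Absent True Abnormal Absent Present
2 No_Visit Absent Smoker Absent False Normal Present Present
3 No_Visit * Smoker Present True Abnormal * Present
4 No_Visit Absent NonSmoker Absent False Normal Absent Absent
5 No_Visit Absent Smoker Present True Abnormal Present Present
6 No_Visit Absent Smoker Absent False Abnormal Present Present
"""

def parse_cases(text):
    """Turn the case text into a list of {column name: value} dictionaries."""
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.startswith("//")
    ]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

def learn_cpt(cases, child, parents):
    """Estimate P(child | parents) as relative frequencies over complete cases."""
    counts = defaultdict(Counter)
    for case in cases:
        parent_values = tuple(case[p] for p in parents)
        child_value = case[child]
        if "*" in parent_values or child_value == "*":
            continue  # assumption: ignore cases with a missing value
        counts[parent_values][child_value] += 1
    return {
        config: {state: n / sum(c.values()) for state, n in c.items()}
        for config, c in counts.items()
    }

cases = parse_cases(CASE_TEXT)
cpt = learn_cpt(cases, "Dyspnea", ["Bronchitis"])
for config, row in cpt.items():
    print(config, row)
```

With so few cases the estimates are crude: a parent configuration that never occurs gets no row at all, and a child state that never occurs for a configuration is missing from its row.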