norsys.netica
Class Learner

java.lang.Object
  |
  +--norsys.netica.Learner

public class Learner
extends java.lang.Object

An object that manages batch-mode learning of CPTs from case data, using algorithms such as EM or gradient descent.

Currently only batch-mode learning is supported, but it is intended that, in the future, all modes of learning will be managed by this class.

Since:
2.27
Version:
5.04 - January 21, 2012
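A minimal sketch of typical usage, combining the calls documented below (the file names "mynet.dne" and "cases.cas" are placeholders, not part of this API):

```java
import norsys.netica.*;

public class LearnDemo {
    public static void main(String[] args) throws NeticaException {
        Environ env = new Environ(null);                // null license string: default environment
        Net net = new Net(new Streamer("mynet.dne"));   // placeholder net file name
        NodeList nodes = net.getNodes();

        Learner learner = new Learner(Learner.EM_LEARNING);
        learner.setMaxIterations(200);                  // at most 200 passes through the data
        learner.setMaxTolerance(1.0e-4);                // or stop when log likelihood stops improving

        Caseset cases = new Caseset();
        cases.addCases(new Streamer("cases.cas"), 1.0, null);  // placeholder case file name

        learner.learnCPTs(nodes, cases, 1.0);           // degree 1.0: each case counts once

        cases.finalize();                               // free Netica resources when done
        learner.finalize();
        net.finalize();
    }
}
```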

Field Summary
static int COUNTING_LEARNING

Indicates the case counting learning algorithm.

static int EM_LEARNING

Indicates the EM (Expectation Maximization) learning algorithm.

static int GRADIENT_DESCENT_LEARNING

Indicates the Gradient Descent learning algorithm.

 
Constructor Summary
Learner(int method)

Creates and returns a new Learner object for use in learning of CPTs from case data, and associates it with the default Netica environment.

Learner(int method, java.lang.String info, Environ env)

Creates and returns a new Learner object for use in learning of CPTs from case data, and associates it with a given Netica environment.

 
Method Summary
 void finalize()

Removes the Learner object and frees all its resources (e.g., memory).

 Environ getEnviron()

Returns the Environ that this object belongs to.

 int getMaxIterations()

Returns the maximum number of learning-step iterations for learnCPTs.

 double getMaxTolerance()

Returns the tolerance for the minimum change in data log likelihood between consecutive passes through the data, as a termination condition for any learning to be done by learner.

 int getMethod()

Returns the algorithmic method used by this learner, one of COUNTING_LEARNING, EM_LEARNING, or GRADIENT_DESCENT_LEARNING.

 void learnCPTs(NodeList nodeList, Caseset caseset, double degree)

Performs learning of CPT tables from data.

 void setMaxIterations(int maxIterations)

Sets the maximum number of learning-step iterations (i.e., complete passes through the data) which will be done when learner is used, after which learning will be automatically terminated.

 void setMaxTolerance(double logLikelihoodTolerance)

Sets the tolerance for the minimum change in data log likelihood between consecutive passes through the data, as a termination condition for any learning to be done by learner.

 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

public static int COUNTING_LEARNING 
Indicates the case counting learning algorithm. Pass this to a Learner constructor.


public static int EM_LEARNING 
Indicates the EM (Expectation Maximization) learning algorithm. Pass this to a Learner constructor.


public static int GRADIENT_DESCENT_LEARNING 
Indicates the Gradient Descent learning algorithm. Pass this to a Learner constructor.

Constructor Detail
public Learner (
 int  method 
) throws NeticaException
Creates and returns a new Learner object for use in learning of CPTs from case data, and associates it with the default Netica environment.

After creating this object, you use it to set the learning parameters you want, and then you pass it to a learning method, such as learnCPTs, to actually perform the learning on some net using some data file.

method should be one of COUNTING_LEARNING, EM_LEARNING, or GRADIENT_DESCENT_LEARNING. See learnCPTs for a description of how each learning algorithm operates.

Calling this constructor is identical to calling   new Learner(method, null, Environ.getDefaultEnviron()).

Parameters:
int    method    The type of learning algorithm that this Learner should use: one of COUNTING_LEARNING, EM_LEARNING, or GRADIENT_DESCENT_LEARNING.

Version:

Versions 2.26 and later have this method.
See Also:
Learner(int,String,Environ)    Same, but for any environment
setMaxIterations    Set the maximum number of iterations (if applicable) it will do when learning
setMaxTolerance    Set the maximum tolerance (if applicable) it will allow before termination
learnCPTs    Performs the learning
finalize    Discard the Learner


public Learner (
 int  method,
 String  info,
 Environ  env 
) throws NeticaException
Creates and returns a new Learner object for use in learning of CPTs from case data, and associates it with a given Netica environment.

After creating this object, you use it to set the learning parameters you want, and then you pass it to a learning method, such as learnCPTs, to actually perform the learning on some net using some data file. When done, you discard the Learner with finalize.

Pass null for info; it is only for future expansion.

method must be one of COUNTING_LEARNING, EM_LEARNING, or GRADIENT_DESCENT_LEARNING. See learnCPTs for a description of how each learning algorithm operates.

Parameters:
int    method    The type of learning method that this Learner should use: one of COUNTING_LEARNING, EM_LEARNING, or GRADIENT_DESCENT_LEARNING.
String    info    For future expandability. Pass null for now.
Environ    env    The Environ in which this new Learner will be placed.

Version:

Versions 2.26 and later have this method.
In the C Version of the API, this function is named NewLearner_bn.
See Also:
setMaxIterations    Set the maximum number of iterations (if applicable) it will do when learning
setMaxTolerance    Set the maximum tolerance (if applicable) it will allow before termination
learnCPTs    Performs the learning
finalize    Discard the Learner
RandomGenerator    May also want this to control randomization

Example:
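A sketch of constructing a Learner in an explicitly chosen environment (the null license string simply selects the default mode):

```java
import norsys.netica.*;

public class LearnerConstructDemo {
    public static void main(String[] args) throws NeticaException {
        Environ env = new Environ(null);  // null license string: default environment
        // info must be null for now (reserved for future expansion)
        Learner learner = new Learner(Learner.GRADIENT_DESCENT_LEARNING, null, env);
        System.out.println(learner.getMethod() == Learner.GRADIENT_DESCENT_LEARNING);
        learner.finalize();               // discard the Learner when done
        env.finalize();
    }
}
```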
Method Detail
public void finalize ( ) throws NeticaException
Removes the Learner object and frees all its resources (e.g., memory).

Version:
Versions 2.26 and later have this method.
In the C Version of the API, this function is named DeleteLearner_bn.
See Also:
Learner    Create a new Learner

Overrides:
finalize in class java.lang.Object

public Environ getEnviron ( )
Returns the Environ that this object belongs to.

Version:
Versions 2.26 and later have this method.

public int getMaxIterations ( ) throws NeticaException
Returns the maximum number of learning-step iterations for learnCPTs.

See setMaxIterations for additional documentation.

Version:

Versions 2.26 and later have this method.
In the C Version of the API, this function is named GetLearnerMaxIters_bn.
See Also:
Learner    Create a new Learner
setMaxIterations    Sets it
getMaxTolerance    Retrieves another termination parameter
learnCPTs    Performs the learning iterations


public double getMaxTolerance ( ) throws NeticaException
Returns the tolerance for the minimum change in data log likelihood between consecutive passes through the data, as a termination condition for any learning to be done by learner. This applies to EM_LEARNING and GRADIENT_DESCENT_LEARNING only, since they are iterative by nature.

See setMaxTolerance for additional documentation.

Version:

Versions 2.26 and later have this method.
In the C Version of the API, this function is named GetLearnerMaxTol_bn.
See Also:
setMaxTolerance    Sets it
getMaxIterations    Retrieves another termination parameter
learnCPTs    Performs the learning


public int getMethod ( )
Returns the algorithmic method used by this learner, one of COUNTING_LEARNING, EM_LEARNING, or GRADIENT_DESCENT_LEARNING. This method is originally set in the Learner's constructor (see Learner).

Version:
Versions 2.26 and later have this method.

public void learnCPTs (
 NodeList  nodeList,
 Caseset  caseset,
 double  degree 
) throws NeticaException
Performs learning of CPT tables from data. For EM or gradient descent algorithms this is done until a termination condition is met.

nodeList is the list of nodes whose experience and conditional probability tables are to be updated by learning. They must all be from the same net. Other nodes in that net will not be modified.

caseset is the set of cases to be used for learning.

degree is the frequency factor to apply to each case in the case set. It must be greater than zero. It is multiplied by the "NumCases" (multiplicity) value that appears with each case in the file; if no such value appears, it is taken as 1.

When you create the Learner (see Learner), you choose the algorithm you wish, which may be one of:

1. Counting Learning: This is traditional one-pass learning (see Net.reviseCPTsByFindings). It is the preferred learning method if there are no hidden (also known as 'latent') nodes in the net and no missing values in the case data. If there are hidden variables, that is, variables for which you have no observations but which you suspect exist and can be useful for modeling your world, or if there are a substantial number of missing values in the case data, then the iterative learning algorithms may yield better results.
Because this learning method is not iterative, setMaxIterations and setMaxTolerance have no effect on it.

2. EM Learning: EM learning optimizes the net's CPTs using the well-known expectation maximization algorithm, in an attempt to maximize the probability of the data set given the net (i.e., to minimize the negative log likelihood of the data). If the nodes have CPT and experience tables before the learning starts, those tables will be treated as part of the data (properly weighted using the experience table), so the knowledge from the data set is combined with the knowledge already in the net. If you do not want this effect, be sure to delete the tables first (see deleteTables). During EM learning, for each case in the case file, only the CPTs of nodes with findings and their ancestor nodes are modified, so only those nodes will have their experience tables incremented.

3. Gradient Descent Learning: Gradient descent learning works similarly to EM learning, but uses a very different algorithm internally. It uses conjugate gradient descent to maximize the probability of the data given the net, by adjusting the CPT entries. Generally speaking, this algorithm converges faster than EM learning, but may be more susceptible to local maxima. It has similarities to the neural-net back-propagation algorithm.

After the Learner is created, you can set its termination conditions. For both EM learning and gradient descent learning, the two possible termination conditions are the maximum number of iterations over the whole batch of cases (see setMaxIterations) and the minimum change in log likelihood from one pass through the batch to the next (see setMaxTolerance). Termination occurs when either condition is met. For counting learning, there are currently no termination conditions to set.

Parameters:
NodeList    nodeList    The list of nodes whose experience and conditional probability tables are to be updated by the learning. They must all be from the same net.
Caseset    caseset    The case set whose cases will be used for learning.
double    degree    The frequency factor to apply to each case in the case set.

Version:

Versions 2.26 and later have this method.
In the C Version of the API, this function is named LearnCPTs_bn.
See Also:
Learner    Creates the learner
setMaxIterations    Sets a learning termination parameter: the maximum number of batch iterations
setMaxTolerance    Sets a learning termination parameter: the minimum log likelihood increase
Caseset    Creates the Caseset
reviseCPTsByCaseFile    Uses a different learning algorithm (better suited if there is little missing data)
deleteTables    May want to do this before learning

Example:
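A sketch of a full learnCPTs call under EM, deleting any existing tables first so that learning starts from the case data alone (the file names "mynet.dne" and "cases.cas" are placeholders):

```java
import norsys.netica.*;

public class LearnCPTsDemo {
    public static void main(String[] args) throws NeticaException {
        Net net = new Net(new Streamer("mynet.dne"));     // placeholder net file name
        NodeList nodes = net.getNodes();
        for (int i = 0; i < nodes.size(); i++) {
            nodes.getNode(i).deleteTables();              // forget prior CPTs and experience
        }
        Caseset cases = new Caseset();
        cases.addCases(new Streamer("cases.cas"), 1.0, null);  // placeholder case file name

        Learner learner = new Learner(Learner.EM_LEARNING);
        learner.learnCPTs(nodes, cases, 1.0);             // degree 1.0 keeps case multiplicities as-is

        cases.finalize();
        learner.finalize();
        net.finalize();
    }
}
```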

public void setMaxIterations (
 int  maxIterations 
) throws NeticaException
Sets the maximum number of learning-step iterations (i.e., complete passes through the data) which will be done when learner is used, after which learning will be automatically terminated. This applies to EM_LEARNING and GRADIENT_DESCENT_LEARNING only, since they are iterative by nature. Learning by the COUNTING_LEARNING method is not affected by this method.

Learning may be terminated earlier, if it first reaches another limit, such as learner's maximum tolerance limit (see setMaxTolerance).

maxIterations must be greater than 0. The default is 1000.

Parameters:
int    maxIterations    The maximum number of learning-step iterations that learnCPTs is allowed to perform.

Version:

Versions 2.26 and later have this method.
In the C Version of the API, this function is named SetLearnerMaxIters_bn.
See Also:
Learner    Creates the Learner
setMaxTolerance    Sets another termination parameter
getMaxIterations    Retrieves value
learnCPTs    Performs the learning using this parameter


public void setMaxTolerance (
 double  logLikelihoodTolerance 
) throws NeticaException
Sets the tolerance for the minimum change in data log likelihood between consecutive passes through the data, as a termination condition for any learning to be done by learner. This applies to EM_LEARNING and GRADIENT_DESCENT_LEARNING only, since they are iterative by nature. Learning by the COUNTING_LEARNING method is not affected by this method.

When learning is performed, with each iteration (i.e., pass through the complete data set), the "log likelihood" of the data given the net is computed. The log likelihood is the per-case average of the negative of the logarithm of the probability of the case given the current Bayes net (structure + CPTs). When the difference between the computed log-likelihoods for two consecutive passes falls below this tolerance, the algorithm is halted. So, the closer this tolerance is to zero, the longer the algorithm may take.

The algorithm may terminate earlier if another termination condition is met, such as the maximum number of iterations (see setMaxIterations).

logLikelihoodTolerance must be greater than 0.0. The default is 1.0e-5.
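A sketch of setting both termination parameters together; a tolerance stricter than the 1.0e-5 default means more passes before convergence, so pairing it with an iteration cap is prudent:

```java
import norsys.netica.*;

public class ToleranceDemo {
    public static void main(String[] args) throws NeticaException {
        Learner learner = new Learner(Learner.EM_LEARNING);
        learner.setMaxTolerance(1.0e-6);  // stricter than the 1.0e-5 default: learning runs longer
        learner.setMaxIterations(500);    // safety cap in case convergence is slow
        learner.finalize();
    }
}
```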

Parameters:
double    logLikelihoodTolerance    The value to make this tolerance.

Version:

Versions 2.26 and later have this method.
In the C Version of the API, this function is named SetLearnerMaxTol_bn.
See Also:
setMaxIterations    Sets another termination parameter
getMaxTolerance    Retrieves value
learnCPTs    Performs the learning using this parameter