Components

The theta package comes with the following modules:

  • theta.rtbm – the Riemann-Theta Boltzmann Machine (RTBM)
  • theta.model – the container for mixture models and theta neural networks
  • theta.layers – the available layers
  • theta.minimizer – the CMA-ES, SGD and BFGS minimizers
  • theta.gradientschemes – the gradient descent schemes
  • theta.activations – the activation functions
  • theta.initializers – the parameter initializers
  • theta.costfunctions – the cost functions
  • theta.stopping – the stopping conditions

These modules provide all the required components to train Riemann-Theta Boltzmann Machines for probability density estimation, regression and classification.


RTBM

The theta.rtbm module contains the definition of class RTBM. This object provides the simplest interface to the parameters of the RTBM and to the probability and expectation values.

Schematically, the RTBM is given by the network configuration

[Figure: schematic of the RTBM network configuration]

where \(T\) is the connection matrix of the visible sector with \(N_v\) visible units, \(Q\) is the connection matrix of the hidden sector with \(N_h\) hidden units, and \(W\) is the matrix of inter-connections between the two sectors.

class theta.rtbm.RTBM(visible_units, hidden_units, mode=0, init_max_param_bound=2, random_bound=1, phase=1, diagonal_T=False)

This class implements the Riemann-Theta Boltzmann Machine.

Parameters:
  • visible_units (int) – number of visible units.
  • hidden_units (int) – number of hidden units.
  • mode (theta.rtbm.RTBM.Mode) – set the working mode among: probability mode (Mode.Probability), log of probability (Mode.LogProbability) and expectation (Mode.Expectation), see theta.rtbm.RTBM.Mode.
  • init_max_param_bound (float) – maximum value allowed for all parameters during the CMA-ES minimization.
  • random_bound (float) – selects the maximum random value for the Schur complement initialization.
  • phase (complex) – number which multiplies w and bh: phase=1 for Phase I and phase=1j for Phase II.
  • diagonal_T (bool) – force T diagonal, by default T is symmetric.
  • check_positivity (bool) – enable positivity condition check in set_parameters.

Properties (setters and getters):

  • mode (theta.rtbm.RTBM.Mode) - sets and returns the RTBM mode.
  • bv (numpy.array) - sets and returns the Bv bias vector.
  • t (numpy.array) - sets and returns the T matrix.
  • bh (numpy.array) - sets and returns the Bh bias vector.
  • w (numpy.array) - sets and returns the W matrix.
  • q (numpy.array) - sets and returns the Q matrix.

Example

import numpy as np
from theta.rtbm import RTBM

m = RTBM(1, 2)   # allocate an RTBM with Nv=1 and Nh=2
print(m.size())  # total number of parameters
x = np.random.uniform(-1, 1, (1, 10))  # input data, shape (Nv, Ndata)
output = m(x)    # evaluate the model prediction at x
predict(x)

Performs prediction with the trained model. This method has a shortcut defined by the parenthesis operator, i.e. model.predict(x) and model(x) are equivalent.

Parameters:x (numpy.array) – input data, shape (Nv, Ndata)
Returns:the model predictions evaluated at x.
Return type:numpy.array
random_init(bound)

Random initializer which satisfies the Schur complement positivity condition. If diagonal_T=True the initial Q and T are diagonal and W is set to zero.

Parameters:bound (float) – the maximum value for the random matrix X used by the Schur complement.
mean()

Computes the first moment estimator (mean).

Returns:the mean of the probability distribution.
Return type:float
Raises:AssertionError – if mode is not theta.rtbm.RTBM.Mode.Probability.
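
A minimal sketch (assuming the default mode=0 corresponds to Mode.Probability):

from theta.rtbm import RTBM

m = RTBM(1, 1)   # one visible and one hidden unit
print(m.mean())  # first moment of the model density P(v)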
size()
Returns:the total number of parameters of the RTBM.
Return type:int
get_parameters()
Returns:flat array with all RTBM parameters.
Return type:numpy.array
get_gradients()
Returns:flat array with calculated gradients [Gbh,Gbv,Gw,Gt,Gq].
Return type:numpy.array
set_bounds(param_bound)

Sets the parameter bound for each parameter.

Parameters:param_bound (float) – the maximum absolute value for parameter variation.
get_bounds()
Returns:two arrays with min and max of each parameter for the GA.
Return type:list of numpy.array
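
A short sketch combining the accessors above (the bound values are illustrative):

from theta.rtbm import RTBM

m = RTBM(1, 2)
m.random_init(0.5)             # re-draw a Schur-positive starting point
params = m.get_parameters()    # flat array of all parameters
m.set_bounds(2)                # bound each parameter in [-2, 2]
lower, upper = m.get_bounds()  # per-parameter min/max arrays for the GA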

Model

The theta package provides a dedicated container for RTBM probability mixture models and theta neural networks (TNNs) through the class Model stored in theta.model.

This module allows the concatenation of objects for building mixture models based on multiple RTBMs and a final NormAddLayer, such as:

[Figure: mixture model built from multiple RTBMs and a final NormAddLayer]

and the possibility to build TNNs, e.g.:

[Figure: example of a theta neural network (TNN)]

Further information about the layers implemented for TNNs is given in the Layers section of this document.

class theta.model.Model

The model class, which holds the layers for building mixture models and theta neural networks.

Example

from theta.model import Model
from theta.layers import ThetaUnitLayer, NormAddLayer

m = Model()                  # allocate an empty model container
m.add(ThetaUnitLayer(1, 2))  # a layer of two RTBM probability units with Nin=1
m.add(NormAddLayer(2, 1))    # normalized weighted sum of the two units
predict(x)

Performs prediction with the trained model. This method has a shortcut defined by the parenthesis operator, i.e. model.predict(x) and model(x) are equivalent.

Parameters:x (numpy.array) – input data, shape (Nv, Ndata)
Returns:the model predictions evaluated at x.
Return type:numpy.array
add(layer)

Add layer to the model instance.

Parameters:layer (theta.layers) – any layer implemented in theta.layers (Layers).

Warning

The layer input size must match the output size of the previous layer!

size()
Returns:the total number of parameters of the model.
Return type:int
get_parameters()

Collects all parameters and returns a flat array.

Returns:flat array with the current weights of all layers.
Return type:numpy.array
get_gradients()

Collects all gradients and returns a flat array.

Returns:flat array with calculated gradients.
Return type:numpy.array
get_layer(N)
Parameters:N (int) – the layer number.
Returns:the N-th layer stored in the model.
Return type:theta.layers
get_bounds()
Returns:two arrays with min and max of each parameter of all layers.
Return type:list of numpy.array
gradient_check(g, x, epsilon)

Performs a numerical check of the g-th gradient.

Parameters:
  • g (int) – id of gradient to check.
  • x (numpy.array) – input data shape (Ninput, Ndata).
  • epsilon (float) – infinitesimal variation of parameter.
Returns:the numerical and analytical gradients.
Return type:floats
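
For example, to compare the first gradient of the mixture model above against its finite-difference estimate (a sketch; data and epsilon are illustrative):

import numpy as np

x = np.random.uniform(-1, 1, (1, 100))   # input data, shape (Ninput, Ndata)
num, ana = m.gradient_check(0, x, 1e-6)  # numerical vs analytical gradient
print(abs(num - ana))                    # should be close to zero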


Layers

The theta package implements the following layers:

  • Theta Probability Unit: provides a layer with multiple RTBMs setup in probability mode. This layer is used to build probability and mixture models.
  • Theta Diagonal Expectation Unit: A layer consisting of a RTBM in expectation mode with diagonal \(Q\). This layer is suitable for regression and classification applications, and can be combined with other layers into a deep model.
  • Normalized Additive: performs a weighted sum of the inputs. This layer guarantees a positive and normalized output and is used to build mixture models.
  • Linear: a standard linear layer for testing and benchmarking purposes.
  • Non-Linear: a non linear layer for testing and benchmarking purposes.

All layers inherit from the theta.layers.Layer class, so custom layers can be implemented by extending that class.

Theta Probability Unit

class theta.layers.ThetaUnitLayer(Nin, Nout, Nhidden=1, init_max_param_bound=2, random_bound=1, phase=1, diagonal_T=False)

Allocates a Theta Unit Layer working in probability mode.

Parameters:
  • Nin (int) – number of input nodes
  • Nout (int) – number of output nodes (i.e. # of RTBMs)
  • Nhidden (int) – number of hidden units per RTBM
  • init_max_param_bound (float) – maximum bound value for CMA
  • random_bound (float) – the maximum value for the random matrix X used by initialization
  • phase (complex) – number which multiplies w and bh: phase=1 for Phase I and phase=1j for Phase II.
  • diagonal_T (bool) – force T diagonal, by default T is symmetric.
get_unit(N)

Returns the N-th RTBM unit of the layer.

Parameters:N (int) – the index of the RTBM unit.
Returns:the N-th RTBM unit.
Return type:theta.rtbm.RTBM
get_parameters()
Returns:the parameters as a flat array [b,w,q].
Return type:numpy.array
get_bounds()
Returns:two arrays with min and max of each parameter of the layer for the GA.
Return type:list of numpy.array
size()
Returns:total number of parameters.
Return type:int
get_gradients()
Returns:gradients for all RTBM units as a flat array.
Return type:numpy.array
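
For instance, the individual units can be inspected after building the layer:

from theta.layers import ThetaUnitLayer

layer = ThetaUnitLayer(1, 3)  # three RTBMs, each with one input
rtbm = layer.get_unit(0)      # the first underlying theta.rtbm.RTBM
print(rtbm.size(), layer.size())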

Theta Diagonal Expectation Unit

class theta.layers.DiagExpectationUnitLayer(Nin, Nout, W_init=<theta.initializers.glorot_uniform object>, B_init=<theta.initializers.null object>, Q_init=<theta.initializers.uniform object>, param_bound=16, phase=1)

A layer of log-gradient theta units.

Parameters:
  • Nin (int) – number of inputs.
  • Nout (int) – number of outputs.
  • W_init (theta.initializers) – random initialization for W
  • B_init (theta.initializers) – random initialization for B
  • Q_init (theta.initializers) – random initialization for Q
  • param_bound (float) – maximum value allowed for the optimization via genetic optimizer.
  • phase (complex) – the RTBM phase (default=1)
show_activation(N, bound=2)

Plots the N-th activation function on the interval [-bound, +bound].

Parameters:
  • N (int) – the index of the activation function to plot.
  • bound (float) – min/max value of the plot range.
get_parameters()
Returns:the parameters as a flat array [bh,w,q]
Return type:numpy.array
get_bounds()
Returns:two arrays with min and max of each parameter of the layer for the GA.
Return type:list of numpy.array
get_gradients()
Returns:B, W and Q gradients as a flat array
Return type:numpy.array
size()
Returns:total number of parameters.
Return type:int
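
Since this layer targets regression and classification, a typical use is as the hidden layer of a small TNN; a sketch with illustrative sizes:

from theta.model import Model
from theta.layers import DiagExpectationUnitLayer, Linear

tnn = Model()
tnn.add(DiagExpectationUnitLayer(1, 5))  # five theta expectation units
tnn.add(Linear(5, 1))                    # linear read-out layer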

Normalized Additive

class theta.layers.NormAddLayer(Nin, Nout, W_init=<theta.initializers.null object>, param_bound=10)

Linearly combines the inputs, with the output normalized by the sum of the (exponentiated) weights:

\[M(v) = \frac{1}{\sum_{i=1}^N e^{\omega_i}} \sum_{i=1}^{N} e^{\omega_i} P^{(i)}(v)\]
Parameters:
  • Nin (int) – number of input nodes.
  • Nout (int) – number of output nodes.
  • W_init (theta.initializers) – random initialization for weights.
  • param_bound (float) – maximum value allowed for the optimization via genetic optimizer.
get_parameters()
Returns:the parameters as a flat array [w].
Return type:numpy.array
get_gradients()
Returns:W gradients as a flat array
Return type:numpy.array
get_bounds()
Returns:two arrays with min and max of each parameter of the layer for the GA.
Return type:list of numpy.array
size()
Returns:total number of parameters.
Return type:int
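
The operation itself is simple; a standalone NumPy sketch of the formula above (not the package's internal code):

import numpy as np

def norm_add(P, w):
    # M(v) for unit outputs P[i] = P^(i)(v) and weights w = omega
    ew = np.exp(w)
    return ew @ P / ew.sum()  # normalized weighted sum over the N inputs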

Linear

class theta.layers.Linear(Nin, Nout, W_init=<theta.initializers.glorot_uniform object>, B_init=<theta.initializers.null object>, param_bound=10)

Linear layer.

Parameters:
  • Nin (int) – number of inputs.
  • Nout (int) – number of outputs.
  • W_init (theta.initializers) – random initialization for weights.
  • B_init (theta.initializers) – random initialization for biases.
  • param_bound (float) – maximum value allowed for the optimization via genetic optimizer.
get_parameters()
Returns:the parameters as a flat array [b,w].
Return type:numpy.array
get_gradients()
Returns:B and W gradients as a flat array
Return type:numpy.array
get_bounds()
Returns:two arrays with min and max of each parameter of the layer for the GA.
Return type:list of numpy.array
size()
Returns:total number of parameters.
Return type:int

Non-Linear

class theta.layers.NonLinear(Nin, Nout, activation=<class 'theta.activations.tanh'>, W_init=<theta.initializers.glorot_uniform object>, B_init=<theta.initializers.null object>, param_bound=10)

Non-Linear layer.

Parameters:
  • Nin (int) – number of inputs.
  • Nout (int) – number of outputs.
  • activation (theta.activations) – the non-linear activation function.
  • W_init (theta.initializers) – random initialization for weights.
  • B_init (theta.initializers) – random initialization for biases.
  • param_bound (float) – maximum value allowed for the optimization via genetic optimizer.
get_parameters()
Returns:the parameters as a flat array [b,w].
Return type:numpy.array
get_gradients()
Returns:B and W gradients as a flat array
Return type:numpy.array
get_bounds()
Returns:two arrays with min and max of each parameter of the layer for the GA.
Return type:list of numpy.array
size()
Returns:total number of parameters.
Return type:int

Minimizers

The theta package provides two minimizers:

  • theta.minimizer.CMA – an evolutionary algorithm based on CMA-ES.
  • theta.minimizer.SGD – stochastic gradient descent.

We also provide the theta.minimizer.BFGS optimizer for testing purposes.

Evolutionary algorithm

class theta.minimizer.CMA(parallel=False, ncores=0)

Implements a genetic algorithm (GA) using the CMA-ES library (the cma package). This class provides a basic CMA-ES implementation for RTBMs.

Parameters:
  • parallel (bool) – if set to True the algorithm uses multi-processing.
  • ncores (int) – limit the number of cores when parallel=True.
train(cost, model, x_data, y_data=None, tolfun=1e-11, popsize=None, maxiter=None, use_grad=False)

Trains the model using the custom cost function.

Parameters:
  • cost (theta.costfunctions) – the cost function.
  • model (theta.model.Model or theta.rtbm.RTBM) – the model to be trained.
  • x_data (numpy.array) – the support data with shape (Nv, Ndata).
  • y_data (numpy.array) – the target prediction.
  • tolfun (float) – the maximum tolerance of the cost function fluctuation to stop the minimization.
  • popsize (int) – the population size.
  • maxiter (int) – the maximum number of iterations.
  • use_grad (bool) – if True the gradients for the cost and model are used in the minimization.
Returns:the optimal parameters.
Return type:numpy.array

Note

The parameters of the model are changed by this algorithm.
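
A typical density-estimation run might look as follows (a sketch; the toy data and settings are illustrative):

import numpy as np
from theta.rtbm import RTBM
from theta.minimizer import CMA
from theta.costfunctions import logarithmic

data = np.random.normal(0, 1, (1, 500))  # toy samples, shape (Nv, Ndata)
m = RTBM(1, 2)
solution = CMA().train(logarithmic, m, data, maxiter=100)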

Gradient descent

class theta.minimizer.SGD

Stochastic gradient descent.

train(cost, model, x_data, y_data=None, validation_split=0, validation_x_data=None, validation_y_data=None, stopping=None, scheme=None, maxiter=100, batch_size=0, shuffle=False, lr=0.001, decay=0, momentum=0, nesterov=False, noise=0, cplot=True)

Trains the given model with stochastic gradient descent methods.

Parameters:
  • cost (theta.costfunctions) – the cost function class
  • model (theta.rtbm.RTBM or theta.model.Model) – the model to be trained
  • x_data (numpy.array) – the target data support
  • y_data (numpy.array) – the target data prediction
  • validation_split (float) – fraction of data used for validation only
  • validation_x_data (numpy.array) – external set of validation support
  • validation_y_data (numpy.array) – external set of validation target
  • stopping (theta.stopping) – the stopping class (see theta.stopping)
  • scheme (theta.gradientschemes) – the SGD scheme (adagrad, RMSprop, adadelta, adam; see Gradient descent schemes)
  • maxiter (int) – maximum number of allowed iterations
  • batch_size (int) – the batch size
  • shuffle (bool) – shuffle the data on each iteration
  • lr (float) – learning rate
  • decay (float) – learning rate decay rate
  • momentum (float) – add momentum
  • nesterov (bool) – add nesterov momentum
  • noise (bool) – add Gaussian noise
  • cplot (bool) – if True shows the cost function evolution
Returns:iterations, cost and validation functions.
Return type:dictionary

Note

The parameters of the model are changed by this algorithm.
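
A sketch of a regression fit with RMSprop and a validation split (toy data; settings are illustrative):

import numpy as np
from theta.model import Model
from theta.layers import DiagExpectationUnitLayer, Linear
from theta.minimizer import SGD
from theta.costfunctions import mse
from theta.gradientschemes import RMSprop

x = np.linspace(-1, 1, 100).reshape(1, -1)  # support, shape (Nv, Ndata)
y = np.sin(np.pi * x)                       # target
m = Model()
m.add(DiagExpectationUnitLayer(1, 4))
m.add(Linear(4, 1))
result = SGD().train(mse, m, x, y, scheme=RMSprop(),
                     maxiter=500, lr=0.01, validation_split=0.2)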

class theta.minimizer.BFGS

Implements the BFGS method.

train(cost, model, x_data, y_data=None, tolfun=1e-11, maxiter=100)
Parameters:
  • cost (theta.costfunctions) – the cost function.
  • model (theta.model.Model or theta.rtbm.RTBM) – the model to be trained.
  • x_data (numpy.array) – the support data with shape (Nv, Ndata).
  • y_data (numpy.array) – the target prediction.
  • tolfun (float) – the maximum tolerance of the cost function fluctuation to stop the minimization.
  • maxiter (int) – the maximum number of iterations.
Returns:the optimal parameters.
Return type:numpy.array

Note

The parameters of the model are changed by this algorithm.

Gradient descent schemes

class theta.gradientschemes.adagrad(epsilon=1e-05)

The Adagrad scheme.

Parameters:epsilon (float) – smoothing term to avoid division by zero
getupdate(G, lr)

Get updates.

Parameters:
  • G (numpy.array) – gradients
  • lr (float) – learning rate
Returns:the updated gradient.
Return type:numpy.array
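
For reference, the standard Adagrad rule these parameters correspond to (a sketch, not the package's internal code):

import numpy as np

class adagrad_sketch:
    def __init__(self, epsilon=1e-5):
        self.epsilon = epsilon
        self.acc = 0.0  # running sum of squared gradients
    def getupdate(self, G, lr):
        self.acc = self.acc + G**2
        return lr * G / (np.sqrt(self.acc) + self.epsilon)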

class theta.gradientschemes.RMSprop(rate=0.9, epsilon=1e-05)

The RMS propagation scheme.

Parameters:
  • rate (float) – weighting of the previous squared gradient expectation value
  • epsilon (float) – smoothing term to avoid division by zero
getupdate(G, lr)

Get updates.

Parameters:
  • G (numpy.array) – gradients
  • lr (float) – learning rate
Returns:the updated gradient.
Return type:numpy.array

class theta.gradientschemes.adadelta(rate=0.9, epsilon=1e-05)

The Adadelta scheme.

Parameters:
  • rate (float) – weighting of the previous squared gradient expectation value
  • epsilon (float) – smoothing term to avoid division by zero
getupdate(G, lr)

Get updates.

Parameters:
  • G (numpy.array) – gradients
  • lr (float) – learning rate
Returns:the updated gradient.
Return type:numpy.array

class theta.gradientschemes.adam(b1=0.9, b2=0.999, epsilon=1e-08)

The Adam scheme.

Parameters:
  • b1 (float) – weight of the previous first moment of the gradient estimate
  • b2 (float) – weight of the previous second moment of the gradient estimate
  • epsilon (float) – smoothing term to avoid division by zero
getupdate(G, lr)

Get updates.

Parameters:
  • G (numpy.array) – gradients
  • lr (float) – learning rate
Returns:the updated gradient.
Return type:numpy.array
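
Again for reference, the textbook Adam update matching these parameters (a sketch, not the package's internal code):

import numpy as np

class adam_sketch:
    def __init__(self, b1=0.9, b2=0.999, epsilon=1e-08):
        self.b1, self.b2, self.epsilon = b1, b2, epsilon
        self.m, self.v, self.t = 0.0, 0.0, 0
    def getupdate(self, G, lr):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * G     # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * G**2  # second moment
        mhat = self.m / (1 - self.b1**self.t)  # bias correction
        vhat = self.v / (1 - self.b2**self.t)
        return lr * mhat / (np.sqrt(vhat) + self.epsilon)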


Activations

All activation functions inherit from the theta.activations.actfunc class, so custom activations can be implemented by extending that class.

The current code contains the following activation functions:

Linear

class theta.activations.linear

A linear pass through.

static activation(x)

Evaluates the activation function.

Parameters:x (numpy.array) – the input data.
Returns:the activation function evaluation.
Return type:numpy.array
static gradient(x)

Evaluates the gradient of the activation function.

Parameters:x (numpy.array) – the input data.
Returns:the gradient of the activation function.
Return type:numpy.array

Sigmoid

class theta.activations.sigmoid

The sigmoid activation.

static activation(x)

Evaluates the activation function.

Parameters:x (numpy.array) – the input data.
Returns:the activation function evaluation.
Return type:numpy.array
static gradient(x)

Evaluates the gradient of the activation function.

Parameters:x (numpy.array) – the input data.
Returns:the gradient of the activation function.
Return type:numpy.array

Tanh

class theta.activations.tanh

The tanh activation.

static activation(x)

Evaluates the activation function.

Parameters:x (numpy.array) – the input data.
Returns:the activation function evaluation.
Return type:numpy.array
static gradient(x)

Evaluates the gradient of the activation function.

Parameters:x (numpy.array) – the input data.
Returns:the gradient of the activation function.
Return type:numpy.array

Custom activation functions can be implemented by extending the theta.activations.actfunc class and providing the two static methods documented above.
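
As an illustration, a hypothetical ReLU activation following that interface (a sketch, not part of the package):

import numpy as np
from theta.activations import actfunc

class relu(actfunc):
    # hypothetical rectified linear activation

    @staticmethod
    def activation(x):
        return np.maximum(0, x)

    @staticmethod
    def gradient(x):
        return (x > 0).astype(float)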


Initializers

All initializers inherit from the theta.initializers.initializer class, so custom initializers can be implemented by extending that class.

The current code contains the following parameter initializers:

Uniform

class theta.initializers.uniform(bound=1, center=0)

Uniformly distributed initialization.

Parameters:
  • bound (float) – half-width of the distribution [-bound,+bound]
  • center (float) – location of the center

Normal

class theta.initializers.normal(mean=0, sdev=1)

Normal distribution initialization.

Parameters:
  • mean (float) – mean of the normal distribution
  • sdev (float) – standard deviation

Null

class theta.initializers.null

Initialize all parameters to zero.

Glorot normal

class theta.initializers.glorot_normal

Initializes with the Glorot normal distribution.

Glorot uniform

class theta.initializers.glorot_uniform

Initializes with the Glorot uniform distribution.

Custom initialization schemes can be easily implemented by extending the theta.initializers.initializer class.


Cost functions

All cost functions inherit from the theta.costfunctions.costfunction class, so custom costs can be implemented by extending that class.

The current code contains the following cost functions:

MSE

class theta.costfunctions.mse

Mean squared error.

Logarithmic

class theta.costfunctions.logarithmic

Logarithmic total cost.

Sum

class theta.costfunctions.sum

Sum total cost.

RMSE

class theta.costfunctions.rmse

Root mean squared error.


Stopping conditions

The stopping conditions can be used with the theta.minimizer.SGD minimizer. The validation data is monitored and, if a specific condition is met, the optimization is stopped. Custom stopping conditions can be implemented by extending the abstract stopping class in theta.stopping.

The current code contains the following stopping algorithms:

Early Stop

class theta.stopping.earlystop(delta=10)

A simple implementation of early stopping. If the validation loss function increases after delta iterations, the stop signal is sent to the minimizer.

Parameters:delta (int) – the number of iterations to pass until the stopping condition check becomes active.
do_stop(v)

Function which tests if the stop condition is reached.

Parameters:v (numpy.array) – history of the validation loss function.
Returns:True if the validation loss is growing within the delta window, False otherwise.
Return type:bool
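
A usage sketch with the SGD minimizer (reusing the model and data from the SGD example above):

from theta.minimizer import SGD
from theta.costfunctions import mse
from theta.stopping import earlystop

result = SGD().train(mse, m, x, y, validation_split=0.2,
                     stopping=earlystop(delta=20), maxiter=1000)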