
This page demonstrates some uses of Theano.

Symbolic Manipulations

This section discusses using the symbolic manipulation and automatic differentiation features of Theano.

Defining New Functions (Ops)

Theano works well if you can express your result in terms of operations that Theano already knows, but the mechanism for teaching Theano about new functions is not particularly transparent: one needs to define a new Op. The documentation (as of version 0.5rc1) has the following examples:

  • Making arithmetic Ops on double: This section discusses some methods of the Op, but how to properly write `make_node` and `perform` was not completely clear to me at the start (one needs to know how to properly create an `Apply` node, for example), and it is quite far removed from the core need of specifying what the new Op does in terms of the math. This snippet might help.
  • Upgrade a Scalar Op: This snippet of documentation from the sandbox gives an example of writing a scalar operation using the functionality in theano.scalar. Unfortunately, this is undocumented and carries a warning not to use the code! I assume this warning is only trying to redirect users to the tensor module for daily use rather than suggesting the code be avoided, since this module seems to be at the core of the Op-definition functionality. The various subclasses of theano.scalar.ScalarOp allow you to define a new function in a much more intuitive manner and later upgrade it to a tensor Op. This is the approach I have taken below. Somebody please correct me if I should be doing this in another way :-)

Boilerplate

All of the examples below assume that this code has been executed:

import theano.compile.mode
import theano.scalar
import theano.tensor
import theano.gof

# This compile mode will optimize the expression tree but will not compile
# any of the code to C.
_FAST_PY = theano.compile.mode.Mode('py', 'fast_run')

def _is_symbolic(v, kw):
    r"""Return `True` if any of the arguments are symbolic."""
    for _v in list(v) + kw.values():
        if isinstance(_v, theano.gof.Variable):
            return True
    return False
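As a quick check of this helper (a minimal sketch; the variable `x` is created here only for the demonstration), `_is_symbolic` distinguishes Theano variables from plain numbers:

>>> x = theano.tensor.dscalar('x')
>>> _is_symbolic((x, 1.0), {})
True
>>> _is_symbolic((1.0, 2.0), {})
False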

Simple Example: LegendreP

Here is a simple example of this approach, teaching Theano about the LegendreP function:

import scipy.special

class LegendreP(theano.scalar.ScalarOp):
    r"""Theano wrapper for :func:`_LegendreP`"""
    def __init__(self):
        self.name = "P"
        self.output_types_preference = theano.scalar.upgrade_to_float

    def impl(self, alpha, x):
        r"""Just compute the numerical result here."""
        return scipy.special.lpmv(0, alpha, x)

    def grad(self, (alpha, x), (gz,)):
        r"""Return the derivative.  Note: I don't know the derivative with
        respect to alpha, so I return `None`.  Note also that you must
        multiply by `gz` here to carry through the chain rule as specified
        in the documentation."""
        dP_dx = (alpha + 1.0)*(x*LegendreP()(alpha, x)
                               - LegendreP()(1.0 + alpha, x))/(1.0 - x*x)
        return [None, gz * dP_dx]

class Elemwise(theano.tensor.Elemwise):
    r"""Wrapper for :class:`theano.tensor.Elemwise` that overloads
    :meth:`__call__` to directly call the implementation for numerical
    arguments."""
    def __call__(self, *v, **kw):
        if _is_symbolic(v, kw):
            return theano.tensor.Elemwise.__call__(self, *v, **kw)
        else:
            return self.scalar_op.impl(*v, **kw)

legendre_P_scalar = LegendreP()
legendre_P = Elemwise(legendre_P_scalar)

The theano.tensor.Elemwise wrapper converts the scalar Op into a tensor Op that can, for example, be applied to arrays. (I don't yet have a deep understanding of what is needed here: someone please explain or point me to the best documentation about this. Why can one not just work with the scalar Op?) My wrapper makes one modification to the __call__ method: it calls the underlying numerical routine directly if numbers are passed as arguments. I find this useful for interactive work and testing. (Will this muck up any optimizations or other behaviour?)

Here is an example of the usage:

>>> alpha, x = theano.tensor.dscalars(['alpha', 'x'])
>>> _P = legendre_P(alpha,x)
>>> _dP = theano.tensor.grad(_P,x)
>>> _ddP = theano.tensor.grad(_dP,x)
>>> P = theano.function([alpha, x], [_P, _dP, _ddP], mode='FAST_COMPILE')
>>> P(alpha=2.0, x=1.2)
[array(1.660000000000001), array(3.5999999999999925), array(3.000000000000128)]

Here is the result using Maple:

> [LegendreP, D[2](LegendreP), D[2](D[2](LegendreP))](2.0,1.2);
                    [1.660000000, 3.600000000, 3.00000000]
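Because of the overloaded __call__ method, the wrapped Op can also be evaluated directly with plain numbers, bypassing the symbolic machinery entirely. A minimal sketch of this convenience (the value is simply what scipy.special.lpmv(0, 2.0, 1.2) returns):

>>> legendre_P(2.0, 1.2)   # no symbolic arguments: impl is called directly
1.66...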

Functions with Derivatives

Here is an example of a class that wraps functions which can compute their own derivatives. We add a parameter `d` that represents the order of the derivative and include it in the hash. (Is this correct? I am not completely clear on the use of the hash function.)

import numpy as np
import inspect
class UnaryScalarOp(theano.scalar.UnaryScalarOp):
    r"""Theano wrapper for functions of a single variable that can compute their
    own derivative via a flag `d`.  To use this, simply subclass and define
    :meth:`impl`.  Several things happen automatically:

    1) If not provided, the name will be set to the lowercase version of the
       class name, with '^(d)' appended to denote the order of the derivative.
    2) The :attr:`output_types_preference` is set to
       :attr:`theano.scalar.upgrade_to_float` (this can be overridden by
       defining it as a class variable.)
    3) :meth:`__call__` is overloaded to call :meth:`impl` if none of the inputs
       is a theano variable.  This will use variables names as listed in the
       arguments of :meth:`impl` as the theano variables, and will either use
       the class variables of the same name or will use :attr:`_variable_type`
       to create the variables.
    4) Provides a convenience method :meth:`get_function` that returns the
       compiled function, inspecting :meth:`impl` to determine the arguments and
       using :attr:`_variable_type` to define the types.

    .. note:: Be sure to wrap the generated Op with
       :class:`theano.tensor.Elemwise` if you want to apply this to arrays.
    """
    _variable_type = theano.tensor.dscalar
    _compile_mode = _FAST_PY
    def __hash__(self):
        return hash(type(self)) ^ hash(self.d)

    def __eq__(self, other):
        return hash(self) == hash(other)

    def __init__(self, d=0, name=None, _proxy=None):
        self.d = d
        if name is None:
            name = self.__class__.__name__.lower()

        self.name = "%s^(%i)" % (name, d)

        if not hasattr(self, 'output_types_preference'):
            self.output_types_preference = theano.scalar.upgrade_to_float
        self._proxy = _proxy    # This can provide the implementation.
        
    def impl(self, *v, **kw):
        r"""Required.  Compute and return the `self.d`th derivative
        numerically."""
        if self._proxy:
            return self._proxy.impl(*v, **kw)
        else:
            raise NotImplementedError

    def grad(self, inputs, output_gradients):
        x, = inputs
        gx, = output_gradients
        return [gx * self.__class__(d=self.d + 1, _proxy=self._proxy)(x)]

    def __call__(self, *v, **kw):
        if _is_symbolic(v, kw):
            return theano.scalar.UnaryScalarOp.__call__(self, *v, **kw)
        else:
            return self.impl(*v, **kw)

    def get_function(self, vectorize=True):
        r"""Return the op as a compiled function.  Inspects :meth:`impl` to
        determine the names of the arguments.  This will of course fail if
        `_proxy` is used."""
        op = self
        if vectorize:
            op = theano.tensor.Elemwise(op)
        names = inspect.getargs(self.impl.im_func.func_code).args[1:]
        vars = [getattr(self.__class__, _n, self._variable_type(_n))
                for _n in names]
        return theano.function(vars, op(*vars), mode=self._compile_mode)

Here is an example of using this code:

>>> def _f(x, d=0):
...     r"""Here is our function.  Assume this is some numerical computation that
...     cannot be expressed in terms of simple operations.  Note that all
...     derivatives can be computed numerically here."""
...     d = d % 2
...     if 0 == d:
...         return np.sinh(x)
...     else:
...         return np.cosh(x) 
>>> class _F(UnaryScalarOp):
...     def impl(self, x):
...         return _f(x, d=self.d)
>>> F = _F().get_function()
>>> dF = _F(d=1).get_function()
>>> ddF = _F(d=2).get_function()
>>> F(1.0), dF(1.0), ddF(1.0)
(array(1.1752011936438014),
 array(1.5430806348152437),
 array(1.1752011936438014))
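Because :meth:`grad` returns the Op for the next-higher derivative, symbolic differentiation also works through this wrapper. A minimal sketch, assuming the definitions above (sinh(1.0) ≈ 1.1752, cosh(1.0) ≈ 1.5431):

>>> x = theano.tensor.dscalar('x')
>>> _Fx = theano.tensor.Elemwise(_F())(x)
>>> _dFx = theano.tensor.grad(_Fx, x)
>>> f = theano.function([x], [_Fx, _dFx], mode=_FAST_PY)
>>> f(1.0)
[array(1.1752011936438014), array(1.5430806348152437)]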

Inverse Function

Now that we can wrap a function and its derivatives, suppose we need to numerically solve for the inverse function. Here is a strategy for wrapping the inverse function, based on the previous code:

class InverseUnaryScalarOp(UnaryScalarOp):
    r"""Class skeleton to represent the inverse of an operator.  You must
    provide the implementation (presumably using a root-finder of some sort.
    Presently you need to provide an operator (not applied to its arguments) for
    both `f` and its derivative `df`.  This could be relaxed if we figure out
    how to "unapply" an operation.

    Notation, `y=f(x)`, this function thus has argument `y`.
    """
    f = NotImplemented
    df = NotImplemented
    def __hash__(self):
        return hash(type(self)) ^ hash(self.f) ^ hash(self.df)

    def __init__(self, name=None):
        if name is None:
            name = self.__class__.__name__.lower()
        self.name = name

        if not hasattr(self, 'output_types_preference'):
            self.output_types_preference = \
                theano.scalar.upgrade_to_float_no_complex
        
    def impl(self, *v, **kw):
        r"""Required.  Compute and return the inverse function numerically."""
        raise NotImplementedError

    def grad(self, (y,), (gz,)):
        r"""This is the key: we use `self` to compute `x`, then return
        `gz/df(x)` via the inverse function theorem."""
        x = self(y)
        # self.df(x).__rdiv__(gz) computes gz / df(x).
        return [self.df(x).__rdiv__(gz)]

Here is an example of using this. Note that this code uses the definition `_f(x) = sinh(x)` from the previous example and shows how to wrap its inverse.

>>> def _finv(x):
...     r"""Here is the inverse to the previous function `_f`.  We assume
...     we can only compute the value (perhaps using a root-finding algorithm)
...     and want to use Theano to compute the derivatives."""
...     return np.arcsinh(x)
>>> class _Finv(InverseUnaryScalarOp):
...     f = _F(d=0)
...     df = _F(d=1)
...     def impl(self, y):
...         return _finv(y)
>>> x = theano.tensor.dscalar('x')
>>> Finv_x = theano.tensor.Elemwise(_Finv())(x)
>>> dFinv_x = theano.tensor.grad(Finv_x, x)
>>> ddFinv_x = theano.tensor.grad(dFinv_x, x)
>>> Finv = theano.function([x], [Finv_x, dFinv_x, ddFinv_x], mode=_FAST_PY)
>>> Finv(1.0)
[array(0.881373587019543),
 array(0.7071067811865475),
 array(-0.3535533905932737)]

Here is the result using Maple:

> [arcsinh, D(arcsinh), D(D(arcsinh))](1.0);                        
                  [0.8813735870, 0.7071067814, -0.3535533909]
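If no closed form for the inverse is available, :meth:`impl` could instead call a numerical root finder. Here is a minimal sketch of that idea (the class name `_FinvNumeric` and the bracket `[-10, 10]` are illustrative assumptions, not part of the code above):

import scipy.optimize

class _FinvNumeric(InverseUnaryScalarOp):
    f = _F(d=0)
    df = _F(d=1)
    def impl(self, y):
        # Solve _f(x) = y for x numerically.  The bracket [-10, 10] is an
        # arbitrary choice for this sketch; pick one that brackets the root
        # for your function.
        return scipy.optimize.brentq(lambda x: _f(x) - y, -10.0, 10.0)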

Polynomial

The goal here is to make a polynomial that can accept an arbitrary set of coefficients at runtime. There is some example code showing how to do this with scan, but unfortunately it fails when derivatives are computed (see this thread). Here is my workaround until this is fixed:

class Polynomial(theano.Op):
    r"""Theano wrapper for polynomials.

    Examples
    --------
    >>> import theano.tensor as T
    >>> x = T.scalar('x')
    >>> c = T.vector('c')
    >>> p = Polynomial()(c, x)
    >>> dp = T.grad(p, x)
    >>> ddp = T.grad(dp, x)
    >>> f = theano.function([c, x], [p, dp, ddp], mode='FAST_COMPILE')
    >>> _c = [1.0,2.0,-3.0,4.0]
    >>> f(_c, 2.0)
    [25.0, array(38.0), array(42.0)]
    """
    def __init__(self, d=0):
        self.d = d
    def __eq__(self, other):
        return type(self) == type(other) and self.d == other.d
    def __hash__(self):
        return hash(type(self)) + hash(self.d)
    def make_node(self, c, x):
        x_ = theano.tensor.as_tensor_variable(x)
        c_ = theano.tensor.as_tensor_variable(c)
        return theano.Apply(self, 
                            inputs=[c_, x_],
                            outputs=[x_.type()])
    def perform(self, node, inputs, output_storage):
        c, x = inputs
        # Evaluate the d'th derivative of the polynomial with coefficients
        # `c` at the point `x`.
        output_storage[0][0] = np.polynomial.Polynomial(c).deriv(self.d)(x)
    def grad(self, inputs, output_gradients):
        r"""We do not yet support differentiation wrt `c` though this should be
        easy (except for broadcasting issues)."""
        c, x = inputs
        dC_dp = output_gradients[0]
        dp_dx = Polynomial(d=self.d+1)(c, x)
        return [None, dC_dp * dp_dx]

Loading/storing parameter values (weight/bias)

Sometimes it is necessary to reload certain parameters of a Theano model. In the case of neural classifiers, for example, we want to reload the trained weight matrix and corresponding bias values of our model. To achieve this we add three pieces of functionality to our code:

   (1) fetch the parameters of your model (get_params())
   (2) pickle the parameters to file (save_params())
   (3) load the parameters again (load_params())

Executable example code with class constructors is shown below. The cPickle calls can also be replaced with numpy.save(); this may be faster and consume less memory (a sketch of that alternative follows the example).

import theano
import numpy
import cPickle

# ------------------------------------------------------------------------------
class Model(object):
    r"""Defines a pseudo model."""

    def __init__(self, name='Model', n_in=100, n_hidden=100, params=None):

        # set up parameters
        if params is not None:
            # restore previously saved weights/bias
            W_in = theano.shared(params[str(name) + '_W_in'],
                                 name=str(name) + '_W_in', borrow=True)
            b_in = theano.shared(params[str(name) + '_b_in'],
                                 name=str(name) + '_b_in', borrow=True)
            print '... restoring model weights/bias'
        else:
            weights_in, bias_in = self.__init_wb((n_in, n_hidden))
            # input to hidden layer weights
            W_in = theano.shared(weights_in, name=str(name) + '_W_in', borrow=True)
            # input bias
            b_in = theano.shared(bias_in, name=str(name) + '_b_in', borrow=True)
            print '... creating model weights/bias'

        # store to params
        self.params = []
        self.params.extend([W_in, b_in])

    # ------------------------------------------------------------------------------
    def __init_wb(self, size, mu=0, sigma=0.1):
        # Gaussian-initialized weights, zero-initialized bias.
        weights = numpy.random.normal(mu, sigma, size)
        bias = numpy.zeros((size[1],))
        return weights, bias

    # --------------------------------------------------------------------------
    def get_params(self):

        params = {}
        for param in self.params:
            params[param.name] = param.get_value()
        return params

    # ------------------------------------------------------------------------------
    # add additional functions here

# ------------------------------------------------------------------------------
class Engine(object):

    # --------------------------------------------------------------------------
    def __init__(self):

        # path to store parameters
        path = './params.dat'
        # build your model
        model = Model()
        # store parameters of model
        self.save_params(model.get_params(), path)
        # reload parameters
        params = self.load_params(path)
        # create a new model
        reloaded_model = Model(params=params)

    # --------------------------------------------------------------------------
    def load_params(self, path):
        f = open(path, 'rb')
        obj = cPickle.load(f)
        f.close()
        return obj

    # --------------------------------------------------------------------------
    def save_params(self, obj, path):
        f = open(path, 'wb')
        cPickle.dump(obj, f, protocol=cPickle.HIGHEST_PROTOCOL)
        f.close()

# ------------------------------------------------------------------------------
if __name__ == '__main__':

    engine = Engine()
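As mentioned above, the cPickle calls can be replaced with numpy. Here is a minimal sketch of that alternative (the function names save_params_npz and load_params_npz are my own, not part of the example above):

def save_params_npz(params, path):
    # `params` is the dict returned by Model.get_params(); numpy.savez
    # stores each array in a single .npz file under its parameter name.
    numpy.savez(path, **params)

def load_params_npz(path):
    data = numpy.load(path)
    return dict((name, data[name]) for name in data.files)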