Cookbook
This page demonstrates some uses of Theano.
This section discusses using the symbolic manipulation and automatic differentiation features of Theano.
Theano works well if you can express your result in terms of operations that Theano knows, but the mechanism for teaching Theano how to work with new functions is not particularly transparent: one needs to define a new Op. The documentation (as of version 0.5rc1) has the following examples:
- Making arithmetic Ops on double: This section discusses some methods of the Op; however, how to properly write `make_node` and `perform` was not completely clear to me at the start (one needs to know how to properly create an Apply node, for example), and it is quite far removed from the core task of specifying what the new Op does in terms of the math. The snippet below might help.
- Upgrade a Scalar Op: This piece of documentation from the sandbox gives an example of writing a scalar operation using the functionality in `theano.scalar`. Unfortunately, this is undocumented and carries a warning not to use the code! I assume this warning is only trying to redirect users to the `tensor` module for daily use, rather than suggesting that the code should not be used, since this module seems to be at the core of the Op-definition functionality. The various subclasses of `theano.scalar.ScalarOp` allow you to define a new function in a much more intuitive manner and later upgrade it to a tensor Op. This is the approach I have taken below. Somebody please correct me if I should be doing this in another way :-)
All of the examples below assume that this code has been executed:
import theano.compile.mode
import theano.scalar
import theano.gof

# This compile mode will optimize the expression tree but will not compile
# any C code.
_FAST_PY = theano.compile.mode.Mode('py', 'fast_run')

def _is_symbolic(v, kw):
    r"""Return `True` if any of the positional or keyword arguments are
    symbolic Theano variables."""
    for _v in list(v) + kw.values():
        if isinstance(_v, theano.gof.Variable):
            return True
    return False
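For example, `_is_symbolic` distinguishes Theano variables from plain numbers (a quick check, assuming `theano.tensor` has been imported):

>>> x = theano.tensor.dscalar('x')
>>> _is_symbolic((x, 2.0), {})
True
>>> _is_symbolic((1.0, 2.0), {})
False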
Here is a simple example of this approach, teaching Theano about the Legendre function `LegendreP`:
import scipy.special

class LegendreP(theano.scalar.ScalarOp):
    r"""Theano wrapper for :func:`scipy.special.lpmv`."""
    def __init__(self):
        self.name = "P"
        self.output_types_preference = theano.scalar.upgrade_to_float

    def impl(self, alpha, x):
        r"""Just compute the numerical result here."""
        return scipy.special.lpmv(0, alpha, x)

    def grad(self, (alpha, x), (gz,)):
        r"""Return the gradient.  Note: I don't know the derivative with
        respect to `alpha`, so I return `None` for it.  Note also that you
        must multiply by `gz` here to carry the chain rule through, as
        specified in the documentation."""
        dP_dx = (alpha + 1.0)*(x*LegendreP()(alpha, x)
                               - LegendreP()(1.0 + alpha, x))/(1.0 - x*x)
        return [None, gz * dP_dx]
class Elemwise(theano.tensor.Elemwise):
    r"""Wrapper for :class:`theano.tensor.Elemwise` that overloads
    :meth:`__call__` to directly call the implementation for numerical
    arguments."""
    def __call__(self, *v, **kw):
        if _is_symbolic(v, kw):
            return theano.tensor.Elemwise.__call__(self, *v, **kw)
        else:
            return self.scalar_op.impl(*v, **kw)

legendre_P_scalar = LegendreP()
legendre_P = Elemwise(legendre_P_scalar)
The `theano.tensor.Elemwise` wrapper converts the scalar Op into a tensor Op that can, for example, be applied to arrays. (I don't have a deep understanding of what is needed here yet: someone please explain or point me to the best documentation about this. Why can one not just work with the scalar op?) My wrapper makes one modification to the `__call__` method: it calls the underlying numerical routine directly if numbers are passed as arguments. I find this useful for interactive work and testing. (Will this muck up any optimizations or other behaviour?)
Here is an example of the usage:
>>> alpha, x = theano.tensor.dscalars(['alpha', 'x'])
>>> _P = legendre_P(alpha,x)
>>> _dP = theano.tensor.grad(_P,x)
>>> _ddP = theano.tensor.grad(_dP,x)
>>> P = theano.function([alpha, x], [_P, _dP, _ddP], mode='FAST_COMPILE')
>>> P(alpha=2.0, x=1.2)
[array(1.660000000000001), array(3.5999999999999925), array(3.000000000000128)]
Here is the result using Maple:
> [LegendreP, D[2](LegendreP), D[2](D[2](LegendreP))](2.0,1.2);
[1.660000000, 3.600000000, 3.00000000]
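Because the overloaded `__call__` dispatches to `impl` for plain numbers, the op can also be evaluated directly, with no graph compilation at all (a quick illustration; the displayed precision may vary):

>>> legendre_P(2.0, 1.2)   # calls scipy.special.lpmv(0, 2.0, 1.2) directly
1.660000000000001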
Here is an example of a class to wrap functions that can compute their own derivatives. We add a parameter `d` that represents the order of the derivative and include it in the hash. (Is this correct? I am not completely clear on the use of the hash function; as far as I can tell, Theano uses `__eq__` and `__hash__` to decide when two Op instances are interchangeable in a graph, so `d` must participate so that different derivative orders are kept distinct.)
import numpy as np
import inspect

class UnaryScalarOp(theano.scalar.UnaryScalarOp):
    r"""Theano wrapper for functions of a single variable that can compute
    their own derivative via a flag `d`.  To use this, simply subclass and
    define :meth:`impl`.  Several things happen automatically:

    1) If not provided, the name will be set to the lowercase version of the
       class name, with '^(d)' appended to denote the derivative order.
    2) The :attr:`output_types_preference` is set to
       :attr:`theano.scalar.upgrade_to_float` (override this by defining it
       as a class variable).
    3) :meth:`__call__` is overloaded to call :meth:`impl` if none of the
       inputs is a Theano variable.
    4) A convenience method :meth:`get_function` is provided that returns the
       compiled function.  It inspects :meth:`impl` to determine the argument
       names, and will either use class variables of the same name or use
       :attr:`_variable_type` to create the Theano variables.

    .. note:: Be sure to wrap the generated Op with
       :class:`theano.tensor.Elemwise` if you want to apply it to arrays.
    """
    _variable_type = theano.tensor.dscalar
    _compile_mode = _FAST_PY

    def __hash__(self):
        return hash(type(self)) ^ hash(self.d)

    def __eq__(self, other):
        return hash(self) == hash(other)

    def __init__(self, d=0, name=None, _proxy=None):
        self.d = d
        if name is None:
            name = self.__class__.__name__.lower()
        self.name = "%s^(%i)" % (name, d)
        if not hasattr(self, 'output_types_preference'):
            self.output_types_preference = theano.scalar.upgrade_to_float
        self._proxy = _proxy    # This can provide the implementation.

    def impl(self, *v, **kw):
        r"""Required.  Compute and return the `self.d`'th derivative
        numerically."""
        if self._proxy:
            return self._proxy.impl(*v, **kw)
        else:
            raise NotImplementedError

    def grad(self, inputs, output_gradients):
        x, = inputs
        gx, = output_gradients
        # The derivative is just the same op with `d` incremented by one.
        return [gx * self.__class__(d=self.d + 1, _proxy=self._proxy)(x)]

    def __call__(self, *v, **kw):
        if _is_symbolic(v, kw):
            return theano.scalar.UnaryScalarOp.__call__(self, *v, **kw)
        else:
            return self.impl(*v, **kw)

    def get_function(self, vectorize=True):
        r"""Return the op as a compiled function.  Inspects :meth:`impl` to
        determine the names of the arguments.  This will of course fail if
        `_proxy` is used."""
        op = self
        if vectorize:
            op = theano.tensor.Elemwise(op)
        names = inspect.getargs(self.impl.im_func.func_code).args[1:]
        vars = [getattr(self.__class__, _n, self._variable_type(_n))
                for _n in names]
        return theano.function(vars, op(*vars), mode=self._compile_mode)
Here is an example of using this code:
>>> def _f(x, d=0):
...     r"""Here is our function.  Assume this is some numerical computation
...     that cannot be expressed in terms of simple operations.  Note that
...     all derivatives can be computed numerically here."""
...     d = d % 2
...     if 0 == d:
...         return np.sinh(x)
...     else:
...         return np.cosh(x)
>>> class _F(UnaryScalarOp):
...     def impl(self, x):
...         return _f(x, d=self.d)
>>> F = _F().get_function()
>>> dF = _F(d=1).get_function()
>>> ddF = _F(d=2).get_function()
>>> F(1.0), dF(1.0), ddF(1.0)
(array(1.1752011936438014),
 array(1.5430806348152437),
 array(1.1752011936438014))
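Since :meth:`grad` simply returns the same op with `d` incremented, Theano can also differentiate through the wrapped function symbolically. Here is a minimal check using the `_F` class above (the values are sinh(1) and cosh(1) as before):

>>> x = theano.tensor.dscalar('x')
>>> f_x = theano.tensor.Elemwise(_F())(x)
>>> df_x = theano.tensor.grad(f_x, x)
>>> f = theano.function([x], [f_x, df_x], mode=_FAST_PY)
>>> f(1.0)
[array(1.1752011936438014), array(1.5430806348152437)]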
Now we can wrap a function and its derivatives. Suppose we need to numerically solve for the inverse function. Here is a strategy for wrapping the inverse function based on the previous code:
class InverseUnaryScalarOp(UnaryScalarOp):
    r"""Class skeleton to represent the inverse of an operator.  You must
    provide the implementation (presumably using a root-finder of some
    sort).  Presently you need to provide an operator (not applied to its
    arguments) for both `f` and its derivative `df`.  This could be relaxed
    if we figure out how to "unapply" an operation.

    Notation: `y = f(x)`, so this op takes the argument `y`.
    """
    f = NotImplemented
    df = NotImplemented

    def __hash__(self):
        return hash(type(self)) ^ hash(self.f) ^ hash(self.df)

    def __init__(self, name=None):
        if name is None:
            name = self.__class__.__name__.lower()
        self.name = name
        if not hasattr(self, 'output_types_preference'):
            self.output_types_preference = \
                theano.scalar.upgrade_to_float_no_complex

    def impl(self, *v, **kw):
        r"""Required.  Compute and return the inverse function numerically."""
        raise NotImplementedError

    def grad(self, (y,), (gz,)):
        r"""This is the key: we use `self` to compute `x = f^{-1}(y)`, then
        apply the inverse function theorem: dx/dy = 1/f'(x)."""
        x = self(y)
        return [gz / self.df(x)]
Here is an example of using this. Note: this code uses the definition `_f(x) = sinh(x)` from the previous example and shows how to wrap the inverse.
>>> def _finv(x):
...     r"""Here is the inverse of the previous function `_f`.  We assume
...     we can only compute the value (perhaps using a root-finding
...     algorithm) and want to use Theano to compute the derivatives."""
...     return np.arcsinh(x)
>>> class _Finv(InverseUnaryScalarOp):
...     f = _F(d=0)
...     df = _F(d=1)
...     def impl(self, y):
...         return _finv(y)
>>> x = theano.tensor.dscalar('x')
>>> Finv_x = theano.tensor.Elemwise(_Finv())(x)
>>> dFinv_x = theano.tensor.grad(Finv_x, x)
>>> ddFinv_x = theano.tensor.grad(dFinv_x, x)
>>> Finv = theano.function([x], [Finv_x, dFinv_x, ddFinv_x], mode=_FAST_PY)
>>> Finv(1.0)
[array(0.881373587019543),
array(0.7071067811865475),
array(-0.3535533905932737)]
Here is the result using Maple:
> [arcsinh, D(arcsinh), D(D(arcsinh))](1.0);
[0.8813735870, 0.7071067814, -0.3535533909]
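If no closed form for the inverse is available, :meth:`impl` can instead invert `f` with a numerical root finder, as the class docstring suggests. Here is a minimal sketch using `scipy.optimize.brentq` (the class name is illustrative, and the bracket relies on `|sinh(x)| >= |x|`, so it is specific to this example):

import scipy.optimize

class _Finv_rootfind(InverseUnaryScalarOp):
    f = _F(d=0)
    df = _F(d=1)
    def impl(self, y):
        # Solve f(x) - y = 0 for x.  Since sinh is monotone and
        # |sinh(x)| >= |x|, the root is bracketed by [-|y| - 1, |y| + 1].
        return scipy.optimize.brentq(lambda x: self.f(x) - y,
                                     -abs(y) - 1.0, abs(y) + 1.0)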
The goal here is to make a polynomial that can accept an arbitrary set of coefficients at runtime. There is some example code showing how to do this with `scan`, but unfortunately this fails when derivatives are computed (see this thread). Here is my work-around until this is fixed.
class Polynomial(theano.Op):
    r"""Theano wrapper for polynomials.

    Examples
    --------
    >>> import theano.tensor as T
    >>> x = T.scalar('x')
    >>> c = T.vector('c')
    >>> p = Polynomial()(c, x)
    >>> dp = T.grad(p, x)
    >>> ddp = T.grad(dp, x)
    >>> f = theano.function([c, x], [p, dp, ddp], mode='FAST_COMPILE')
    >>> _c = [1.0, 2.0, -3.0, 4.0]
    >>> f(_c, 2.0)
    [25.0, array(38.0), array(42.0)]
    """
    def __init__(self, d=0):
        self.d = d

    def __eq__(self, other):
        return type(self) == type(other) and self.d == other.d

    def __hash__(self):
        return hash(type(self)) ^ hash(self.d)

    def make_node(self, c, x):
        x_ = theano.tensor.as_tensor_variable(x)
        c_ = theano.tensor.as_tensor_variable(c)
        return theano.Apply(self,
                            inputs=[c_, x_],
                            outputs=[x_.type()])

    def perform(self, node, inputs, output_storage):
        c, x = inputs
        # Evaluate the `self.d`'th derivative of the polynomial with
        # coefficients `c` at `x`.
        output_storage[0][0] = np.polynomial.Polynomial(c).deriv(self.d)(x)

    def grad(self, inputs, output_gradients):
        r"""We do not yet support differentiation wrt `c`, though this
        should be easy (except for broadcasting issues)."""
        c, x = inputs
        dC_dp = output_gradients[0]
        dp_dx = Polynomial(d=self.d + 1)(c, x)
        return [None, dC_dp * dp_dx]
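As a sanity check: with `_c = [1.0, 2.0, -3.0, 4.0]` the polynomial is `p(x) = 1 + 2x - 3x^2 + 4x^3`, so `p(2) = 25`, `p'(2) = 2 - 6*2 + 12*2^2 = 38`, and `p''(2) = -6 + 24*2 = 42`, in agreement with the doctest output above.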
Sometimes it is necessary to reload certain parameters of a Theano model. In the case of neural classifiers, for example, we want to reload the trained weight matrix and the corresponding bias values of our model. To achieve this we add three functions to our code:
(1) the parameters of the model have to be fetched (get_params()),
(2) the parameters have to be pickled to file (save_params()), and
(3) the parameters have to be loaded again (load_params()).
An executable example with class constructors is shown below. The `cPickle` calls can also be replaced with `numpy.save()`/`numpy.load()`, which might be faster and consume less memory (see the sketch after the code).
import theano
import numpy
import cPickle

# ------------------------------------------------------------------------------
class Model(object):
    """Defines a pseudo model."""
    def __init__(self, name='Model', n_in=100, n_hidden=100, params=None):
        # set up parameters
        if params is not None:
            W_in = theano.shared(params[str(name) + '_W_in'],
                                 name=str(name) + '_W_in', borrow=True)
            b_in = theano.shared(params[str(name) + '_b_in'],
                                 name=str(name) + '_b_in', borrow=True)
            print '... restoring model weights/bias'
        else:
            weights_in, bias_in = self.__init_wb((n_in, n_hidden))
            # input to hidden layer weights
            W_in = theano.shared(weights_in, name=str(name) + '_W_in',
                                 borrow=True)
            # input bias
            b_in = theano.shared(bias_in, name=str(name) + '_b_in',
                                 borrow=True)
            print '... creating model weights/bias'
        # store to params
        self.params = []
        self.params.extend([W_in, b_in])

    # --------------------------------------------------------------------------
    def __init_wb(self, size, mu=0, sigma=0.1):
        weights = numpy.random.normal(mu, sigma, size)
        bias = numpy.zeros((size[1],))
        return weights, bias

    # --------------------------------------------------------------------------
    def get_params(self):
        params = {}
        for param in self.params:
            params[param.name] = param.get_value()
        return params

    # --------------------------------------------------------------------------
    # add additional functions here
    # --------------------------------------------------------------------------

class Engine(object):
    # --------------------------------------------------------------------------
    def __init__(self):
        # path to store parameters
        path = './params.dat'
        # build your model
        model = Model()
        # store parameters of model
        self.save_params(model.get_params(), path)
        # reload parameters
        params = self.load_params(path)
        # create a new model
        reloaded_model = Model(params=params)

    # --------------------------------------------------------------------------
    def load_params(self, path):
        f = open(path, 'rb')
        obj = cPickle.load(f)
        f.close()
        return obj

    # --------------------------------------------------------------------------
    def save_params(self, obj, path):
        f = open(path, 'wb')
        cPickle.dump(obj, f, protocol=cPickle.HIGHEST_PROTOCOL)
        f.close()

# ------------------------------------------------------------------------------
if __name__ == '__main__':
    engine = Engine()
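As mentioned above, the pickle-based persistence can be swapped for numpy's native format. Here is a minimal sketch (the function names and the `.npz` path are illustrative, and all parameter values are assumed to be numpy arrays):

def save_params_np(params, path):
    # `params` is the dict returned by Model.get_params(); savez stores
    # each array under its parameter name in a single .npz archive.
    numpy.savez(path, **params)

def load_params_np(path):
    data = numpy.load(path)
    params = dict((k, data[k]) for k in data.files)
    data.close()
    return params

These can be used in place of `save_params`/`load_params` with, e.g., `path = './params.npz'`.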