# Useful Sub Packages
This wiki page goes over some of the useful sub packages that are used internally by RL.ts and are available for you to use as well. These include cluster tools for distributed training, as well as handy functions for dealing with Tensors from tfjs and NdArrays from numjs.
Explanations here will use TypeScript examples. If you are unfamiliar with TypeScript, just remember that

```ts
import pkg from 'location'; // or
import * as pkg from 'location';
```

is like

```ts
const pkg = require('location');
```

and whenever a `:` follows a variable name, it is just a type annotation.
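For example, the annotations below are checked at compile time and erased at runtime, so the code behaves exactly like the same JavaScript without them:

```ts
// The `: number` annotations only describe types; they have no runtime effect.
const forkCount: number = 2;
function double(x: number): number {
  return x * 2;
}
```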
## Cluster Tools

For those familiar with the tools used by OpenAI in their Baselines and Spinning Up libraries, this sub package is most equivalent to the MPI tooling often used to fork a training process into multiple parallel processes for trajectory sampling, gradient computations, etc.
You can find the source code for the cluster tools at https://github.com/StoneT2000/rl-ts/tree/main/src/utils/clusterTools
This sub package relies on the Node.js cluster module. Note that it has only been tested in Node.js environments and probably does not work in web environments yet (where web workers would likely be used instead).
To import this tooling, you can either import it directly like so:

```ts
import * as ct from 'rl-ts/lib/utils/clusterTools';
```
or, if you imported the RL package as a whole, you can access the cluster tools sub package through `RL.ct`:

```ts
import * as RL from 'rl-ts';
RL.ct; // access the cluster tools library from this entrypoint
```
The first key function you should call before doing anything else is

```ts
await ct.fork(forkCount);
```

This will automatically create `forkCount` worker processes in addition to your primary process (the process that was started when you ran the script) and will synchronize all processes to that point. Note that the `await` keyword is important here, as we need to synchronize all the workers with the primary process.
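As a minimal sketch (assuming a standalone Node.js script; the structure of your own training script may differ), forking looks like this:

```ts
import * as ct from 'rl-ts/lib/utils/clusterTools';

async function main() {
  // Create 2 worker processes; together with the primary process there are
  // now 3 processes, all synchronized to this point and all running this
  // same script.
  await ct.fork(2);
  // Everything below this line runs in all 3 processes.
}
main();
```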
Now you can start performing operations across processes; these operate element-wise. For example, suppose we created 2 worker processes, meaning we have 3 processes in total including the primary one. These are sample calculations:
```ts
import * as tf from '@tensorflow/tfjs';

const val = await ct.sumNumber(15);
// val will equal 15 + 15 + 15 = 45

const res = await ct.avg(tf.tensor([[-5, 10]]));
// res will equal tf.tensor([[(-5 + -5 + -5) / 3, (10 + 10 + 10) / 3]]) = tf.tensor([[-5, 10]])
```
Of course, this is not very useful if all processes have the same hardcoded data. Suppose instead we have a tensor `x` equal to `[0, 1]` in the first process, `[1, 2]` in the second process, and `[2, 3]` in the third process. Then we can see how this really processes data element-wise:
```ts
const res = await ct.avg(x);
// res will equal tf.tensor([(0 + 1 + 2) / 3, (1 + 2 + 3) / 3]) = tf.tensor([1, 2])
```
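To make this concrete, here is a hedged end-to-end sketch. Deriving a per-process rank from Node's `cluster` module (`cluster.worker` is undefined in the primary process and carries ids starting at 1 in the workers) is an illustrative assumption, not part of the documented `clusterTools` API:

```ts
import * as tf from '@tensorflow/tfjs';
import cluster from 'cluster';
import * as ct from 'rl-ts/lib/utils/clusterTools';

async function main() {
  await ct.fork(2); // 1 primary + 2 workers = 3 processes in total

  // Rank 0 for the primary, 1 and 2 for the workers (assumption for
  // illustration only).
  const rank = cluster.worker ? cluster.worker.id : 0;

  const x = tf.tensor([rank, rank + 1]); // [0, 1], [1, 2], or [2, 3]
  const res = await ct.avg(x);
  res.print(); // [1, 2] in every process
}
main();
```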
Some other provided operations include `ct.sum`, `ct.max`, and `ct.min`.
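Assuming these follow the same element-wise, cross-process pattern as `ct.avg` (worth verifying against the source linked above), they would be used the same way:

```ts
// Continuing from the per-process tensor x above:
const total = await ct.sum(x);     // element-wise sum across all processes
const largest = await ct.max(x);   // element-wise max across all processes
const smallest = await ct.min(x);  // element-wise min across all processes
```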
There is also a special `ct.averageGradients` function built specifically for TensorFlow based models. It takes a gradient computation produced by e.g. an optimizer's `computeGradients` function and averages the gradients across all processes element-wise. Example usage of this gradient averaging can be found in the `vpg` and `ppo` implementations.
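As a hedged sketch of what such a synchronized update step could look like (the exact `ct.averageGradients` signature is not shown here, so treat this as an assumption and consult the `vpg`/`ppo` sources; `lossFn` is a hypothetical scalar loss function):

```ts
import * as tf from '@tensorflow/tfjs';
import * as ct from 'rl-ts/lib/utils/clusterTools';

declare function lossFn(): tf.Scalar; // hypothetical model loss

async function updateStep(optimizer: tf.Optimizer) {
  // Each process computes gradients on its own locally sampled data.
  const { grads } = optimizer.computeGradients(() => lossFn());

  // Average gradients element-wise across all processes so every process
  // applies the same update (assumes averageGradients accepts the
  // NamedTensorMap returned by computeGradients).
  const avgGrads = await ct.averageGradients(grads);
  optimizer.applyGradients(avgGrads);
}
```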