# Useful Sub Packages
This wiki page goes over some of the useful sub packages that are used internally by RL.ts and are available for you to use as well. These include cluster tools for distributed training, as well as handy functions for dealing with Tensors from tfjs and NdArrays from numjs.
Explanations here will use TypeScript examples. If you are unfamiliar with TypeScript, just remember that

```ts
import pkg from 'location'; // or
import * as pkg from 'location';
```

is like

```ts
const pkg = require('location');
```

and whenever a `:` follows a variable name, it is just a type annotation.
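For example, the annotations below are checked at compile time and erased at runtime, so the code behaves exactly like the same JavaScript without them:

```ts
// The `: number` annotations only describe types; they have no runtime effect.
const forkCount: number = 2;
function double(x: number): number {
  return x * 2;
}
```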
## Cluster Tools

For those familiar with the tools used by OpenAI in their Baselines and Spinning Up libraries, this sub package is most equivalent to the MPI tooling often used to fork a training process into multiple parallel processes for trajectory sampling, gradient computations, etc.
You can find the source code for the cluster tools at https://github.com/StoneT2000/rl-ts/tree/main/src/utils/clusterTools
This sub package relies on the Node.js cluster module. Note that it has only been tested in Node.js environments and probably does not work in web environments yet (where web workers would likely be used instead).
To import this tooling, you can either import it directly like so:

```ts
import * as ct from 'rl-ts/lib/utils/clusterTools';
```
or, if you imported the RL package as a whole, you can access the cluster tools sub package through `RL.ct`:

```ts
import * as RL from 'rl-ts';
RL.ct; // access the cluster tools library from this entrypoint
```
The first key function you should call before doing anything else is

```ts
await ct.fork(forkCount);
```

This will automatically create `forkCount` worker processes in addition to your primary process (the process that was started when you ran the script) and will synchronize all processes to that point. Note that the `await` keyword is important here, as we need to synchronize all the workers with the primary process.
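As a minimal sketch (assuming a standalone Node.js script; the structure of your own training script may differ), forking looks like this:

```ts
import * as ct from 'rl-ts/lib/utils/clusterTools';

async function main() {
  // Create 2 worker processes; together with the primary process there are
  // now 3 processes, all synchronized to this point and all running this
  // same script.
  await ct.fork(2);
  // Everything below this line runs in all 3 processes.
}
main();
```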
Now you can start performing operations across processes; these operate element-wise. For example, suppose we created 2 worker processes, meaning we have 3 processes in total including the primary one. These are sample calculations:
```ts
import * as tf from '@tensorflow/tfjs';

const val = await ct.sumNumber(15);
// val will equal 15 + 15 + 15 = 45

const res = await ct.avg(tf.tensor([[-5, 10]]));
// res will equal tf.tensor([[(-5 + -5 + -5) / 3, (10 + 10 + 10) / 3]]) = tf.tensor([[-5, 10]])
```
Of course, this is not very useful if all processes have the same hardcoded data. Suppose instead we have a tensor `x` equal to `[0, 1]` in the first process, `[1, 2]` in the second process, and `[2, 3]` in the third process. Then we can see how this really processes data element-wise:
```ts
const res = await ct.avg(x);
// res will equal tf.tensor([(0 + 1 + 2) / 3, (1 + 2 + 3) / 3]) = tf.tensor([1, 2])
```
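To make this concrete, here is a hedged end-to-end sketch. Deriving a per-process rank from Node's `cluster` module (`cluster.worker` is undefined in the primary process and carries ids starting at 1 in the workers) is an illustrative assumption, not part of the documented `clusterTools` API:

```ts
import * as tf from '@tensorflow/tfjs';
import cluster from 'cluster';
import * as ct from 'rl-ts/lib/utils/clusterTools';

async function main() {
  await ct.fork(2); // 1 primary + 2 workers = 3 processes in total

  // Rank 0 for the primary, 1 and 2 for the workers (assumption for
  // illustration only).
  const rank = cluster.worker ? cluster.worker.id : 0;

  const x = tf.tensor([rank, rank + 1]); // [0, 1], [1, 2], or [2, 3]
  const res = await ct.avg(x);
  res.print(); // [1, 2] in every process
}
main();
```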
Some other provided operations include `ct.sum`, `ct.max`, and `ct.min`.
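Assuming these follow the same element-wise, cross-process pattern as `ct.avg` (worth verifying against the source linked above), they would be used the same way:

```ts
// Continuing from the per-process tensor x above:
const total = await ct.sum(x);     // element-wise sum across all processes
const largest = await ct.max(x);   // element-wise max across all processes
const smallest = await ct.min(x);  // element-wise min across all processes
```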
There is also a special `ct.averageGradients` function built specifically for TensorFlow based models. It takes a gradient computation produced by e.g. an optimizer's `computeGradients` function and averages the gradients across all processes element-wise. Example usage of this gradient averaging can be found in the `vpg` and `ppo` implementations.
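As a hedged sketch of what such a synchronized update step could look like (the exact `ct.averageGradients` signature is not shown here, so treat this as an assumption and consult the `vpg`/`ppo` sources; `lossFn` is a hypothetical scalar loss function):

```ts
import * as tf from '@tensorflow/tfjs';
import * as ct from 'rl-ts/lib/utils/clusterTools';

declare function lossFn(): tf.Scalar; // hypothetical model loss

async function updateStep(optimizer: tf.Optimizer) {
  // Each process computes gradients on its own locally sampled data.
  const { grads } = optimizer.computeGradients(() => lossFn());

  // Average gradients element-wise across all processes so every process
  // applies the same update (assumes averageGradients accepts the
  // NamedTensorMap returned by computeGradients).
  const avgGrads = await ct.averageGradients(grads);
  optimizer.applyGradients(avgGrads);
}
```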