Releases: lsds/KungFu
KungFu 0.2.1 Release
This release contains the following updates:
- We have added support for different logging levels in KungFu. See PR #239 for details.
- We have improved the scalability of KungFu in cloud environments where network bandwidth is often limited. This improvement comes from building multiple aggregation/broadcast trees to utilise the bandwidth on all available network paths. See PR #242 for details and a performance comparison with Horovod NCCL.
KungFu 0.2.0 Release
Release notes
The KungFu team has received much valuable feedback from the SOSP audience and early industry users. We have integrated this feedback to improve the usability of KungFu, which is the focus of the 0.2.0 release. The following are the main new features of this release:
New framework support
KungFu supports TensorFlow 1/2, TensorLayer 1/2, and Keras. This covers most models trained with TensorFlow. We have released examples that show how to use KungFu within various TensorFlow programs. Check here.
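As a quick illustration, the sketch below shows the typical pattern from the released Keras examples: wrap an existing optimiser with a KungFu distributed optimiser and compile the model as usual. The module path kungfu.tensorflow.optimizers and the toy model are illustrative; please refer to the released examples for the exact API.

```python
import tensorflow as tf
# Illustrative import path; see the released KungFu examples for the exact location.
from kungfu.tensorflow.optimizers import SynchronousSGDOptimizer

# Build a small Keras model as usual.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Wrap the local optimiser with a KungFu distributed optimiser.
optimizer = SynchronousSGDOptimizer(tf.keras.optimizers.SGD(learning_rate=0.01))

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(...) is then run on every worker, e.g. launched via kungfu-run.
```

The released examples also broadcast the initial model state to all workers before training starts; see the examples for the exact helper used.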
New advanced examples
KungFu provides many advanced examples that show how to enable KungFu within complex AI models, including:
- Google BERT
- Generative Adversarial Learning (CycleGAN)
- Reinforcement learning (Alpha Zero)
- ResNet and other popular DNNs for ImageNet
- Pose estimation network (OpenPose)
New distributed optimiser
We release a new distributed optimiser, SynchronousAveragingOptimizer. This optimiser aims to preserve the properties of small-batch training when using many parallel workers, making it a useful option for AI models that are restricted to training with small batch sizes. Check here for more details.
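As a rough sketch (the module path and wrapper pattern are illustrative, following the same style as the example above), switching to the new optimiser only changes which wrapper is applied:

```python
import tensorflow as tf
# Illustrative import path; see the KungFu examples for the exact API.
from kungfu.tensorflow.optimizers import SynchronousAveragingOptimizer

# Each worker keeps its own small local batch; instead of summing gradients
# as in S-SGD, the wrapper synchronously averages model state across workers,
# which is intended to retain small-batch training behaviour.
optimizer = SynchronousAveragingOptimizer(tf.keras.optimizers.SGD(learning_rate=0.01))
```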
Better performance
We have greatly improved the performance of asynchronous training.
KungFu 0.1.0 pre-release
This is the first release of KungFu.
This release contains two features:
- SynchronousSGDOptimizer: This optimiser implements the classical synchronous SGD (S-SGD) algorithm for distributed training.
- PairAveragingOptimizer: This optimiser implements communication-efficient asynchronous training while reaching the same evaluation accuracy as S-SGD.
We have tested and deployed these optimisers in a cloud testbed and a production cluster. Check out their performance in the Benchmark section of the README.
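For illustration, the sketch below shows how one of these optimisers might be applied in a TensorFlow 1.x script. The import path and launcher command are taken from the later examples and may differ slightly in this early release.

```python
import tensorflow as tf
# Illustrative import path; check the 0.1.0 examples for the exact module.
from kungfu.tensorflow.optimizers import PairAveragingOptimizer

# A toy TF1-style linear model, just to make the sketch self-contained.
x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Wrap a standard optimiser; each worker averages its model with a selected
# peer rather than synchronising gradients with every worker.
optimizer = PairAveragingOptimizer(tf.train.GradientDescentOptimizer(0.01))
train_op = optimizer.minimize(loss)

# Each worker runs this script; a launcher such as kungfu-run starts the peers,
# e.g. `kungfu-run -np 4 python train.py`.
```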