Releases: lsds/KungFu
KungFu 0.2.1 Release
This release contains the following updates:
- We have added support for different logging levels in KungFu. See PR #239 for details.
- We have improved the scalability of KungFu in cloud environments where network bandwidth is often limited. This improvement comes from building multiple aggregation/broadcast trees to utilise the bandwidth on all available network paths. See PR #242 for details and a performance comparison with Horovod NCCL.
KungFu 0.2.0 Release
Release notes
The KungFu team has received much valuable feedback from the SOSP audience and early industry users. We have integrated this feedback to improve the usability of KungFu, which is the focus of the 0.2.0 release. The following are the main new features of this release:
New framework support
KungFu supports TensorFlow 1/2, TensorLayer 1/2, and Keras. This covers most models trained with TensorFlow. We have released examples that show how to use KungFu within various TensorFlow programs. Check here.
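As a quick illustration, the sketch below shows the typical pattern from the released Keras examples: wrap an existing optimiser with a KungFu distributed optimiser and compile the model as usual. The module path kungfu.tensorflow.optimizers and the toy model are illustrative; please refer to the released examples for the exact API.

```python
import tensorflow as tf
# Illustrative import path; see the released KungFu examples for the exact location.
from kungfu.tensorflow.optimizers import SynchronousSGDOptimizer

# Build a small Keras model as usual.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Wrap the local optimiser with a KungFu distributed optimiser.
optimizer = SynchronousSGDOptimizer(tf.keras.optimizers.SGD(learning_rate=0.01))

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(...) is then run on every worker, e.g. launched via kungfu-run.
```

The released examples also broadcast the initial model state to all workers before training starts; see the examples for the exact helper used.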
New advanced examples
KungFu provides many advanced examples that show how to enable KungFu within complex AI models, including:
- Google BERT
- Generative Adversarial Learning (CycleGAN)
- Reinforcement learning (Alpha Zero)
- ResNet and other popular DNNs for ImageNet
- Pose estimation network (OpenPose)
New distributed optimiser
We release a new distributed optimiser, SynchronousAveragingOptimizer. This optimiser aims to preserve the properties of small-batch training when using many parallel workers, making it a useful option for AI models that are restricted to training with small batch sizes. Check here for more details.
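As a rough sketch (the module path and wrapper pattern are illustrative, following the same style as the example above), switching to the new optimiser only changes which wrapper is applied:

```python
import tensorflow as tf
# Illustrative import path; see the KungFu examples for the exact API.
from kungfu.tensorflow.optimizers import SynchronousAveragingOptimizer

# Each worker keeps its own small local batch; instead of summing gradients
# as in S-SGD, the wrapper synchronously averages model state across workers,
# which is intended to retain small-batch training behaviour.
optimizer = SynchronousAveragingOptimizer(tf.keras.optimizers.SGD(learning_rate=0.01))
```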
Better performance
We have greatly improved the performance of asynchronous training.
KungFu 0.1.0 pre-release
This is the first release of KungFu.
This release contains two features:
- SynchronousSGDOptimizer: This optimiser implements the classical synchronous SGD (S-SGD) algorithm for distributed training.
- PairAveragingOptimizer: This optimiser implements communication-efficient asynchronous training while reaching the same evaluation accuracy as S-SGD.
We have tested and deployed these optimisers in a cloud testbed and a production cluster. Check out their performance in the Benchmark section of the README.
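For illustration, the sketch below shows how one of these optimisers might be applied in a TensorFlow 1.x script. The import path and launcher command are taken from the later examples and may differ slightly in this early release.

```python
import tensorflow as tf
# Illustrative import path; check the 0.1.0 examples for the exact module.
from kungfu.tensorflow.optimizers import PairAveragingOptimizer

# A toy TF1-style linear model, just to make the sketch self-contained.
x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Wrap a standard optimiser; each worker averages its model with a selected
# peer rather than synchronising gradients with every worker.
optimizer = PairAveragingOptimizer(tf.train.GradientDescentOptimizer(0.01))
train_op = optimizer.minimize(loss)

# Each worker runs this script; a launcher such as kungfu-run starts the peers,
# e.g. `kungfu-run -np 4 python train.py`.
```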