feat: Implement python packaging (#42)
- Fix bug in gentun.algorithms
- Add PR Lint check workflow
- Add code lint check workflow
- Automate CICD workflows for publishing
- Add unit tests and coverage
- Add repo badges
- Add an index to the README
- Create CONTRIBUTE.md
gmontamat authored Sep 3, 2024
1 parent d945f8c commit a5d9719
Showing 27 changed files with 1,102 additions and 149 deletions.
44 changes: 44 additions & 0 deletions .github/workflows/check-code.yml
@@ -0,0 +1,44 @@
name: "Code linter check and unit tests"

on: [push, pull_request]

jobs:
  main:
    name: Check code formatting and run unit tests
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install flit
        run: |
          python -m pip install --upgrade pip
          python -m pip install 'flit>=3.8.0'
      - name: Install project dependencies
        run: |
          flit install --deps develop --extras tensorflow,xgboost
      - name: Run isort
        run: |
          isort --check src/
          isort --check tests/
      - name: Run black
        run: |
          black --check src/
          black --check tests/
      - name: Run pytest
        run: |
          pytest
      - name: Run pylint
        run: |
          pylint src/ || true
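
The blocking checks in this workflow can also be run locally before pushing. The following is a minimal sketch, assuming `isort`, `black` and `pytest` are installed in the active environment. The helper names `build_checks` and `run_checks` are illustrative and not part of gentun, and the non-blocking `pylint` step is left out.

```python
import subprocess
import sys
from typing import List, Sequence


def build_checks(paths: Sequence[str] = ("src/", "tests/")) -> List[List[str]]:
    """Build the commands in the same order as the workflow steps."""
    checks: List[List[str]] = []
    for tool in ("isort", "black"):
        for path in paths:
            # Both tools support a --check mode that reports without rewriting files
            checks.append([sys.executable, "-m", tool, "--check", path])
    checks.append([sys.executable, "-m", "pytest"])
    return checks


def run_checks(checks: List[List[str]]) -> int:
    """Run every command and return how many exited with a non-zero status."""
    failures = 0
    for command in checks:
        result = subprocess.run(command, check=False)
        if result.returncode != 0:
            failures += 1
    return failures
```

Calling `run_checks(build_checks())` from the repository root returns the number of failed checks, so a return value of `0` suggests the formatting and test steps of the workflow should pass as well.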
63 changes: 63 additions & 0 deletions .github/workflows/generate-release.yml
@@ -0,0 +1,63 @@
name: "Generate Tag and Release"

on:
  workflow_dispatch:

jobs:
  release:
    name: Create Tag and Release
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Get version from __init__.py
        id: get_version
        run: |
          VERSION=$(python -c "import re; content=open('src/gentun/__init__.py').read(); version=re.search(r'__version__ = \"(.*?)\"', content).group(1); print(version)")
          # Write the step output to $GITHUB_OUTPUT (the ::set-output command is deprecated)
          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
      - name: Create Tag
        run: |
          git tag v${{ steps.get_version.outputs.version }}
          git push origin v${{ steps.get_version.outputs.version }}
      - name: Generate Release Notes
        id: generate_release_notes
        uses: actions/github-script@v6
        with:
          script: |
            // github-script v5 and later expose REST endpoints under github.rest
            const { data: releases } = await github.rest.repos.listReleases({
              owner: context.repo.owner,
              repo: context.repo.repo
            });
            let releaseNotes;
            if (releases.length === 0) {
              releaseNotes = `Release of version ${{ steps.get_version.outputs.version }}\n\n` +
                `This is the first release.`;
            } else {
              const latestRelease = releases[0];
              releaseNotes = `Release of version ${{ steps.get_version.outputs.version }}\n\n` +
                `Changes since last release:\n` +
                `${latestRelease.body}`;
            }
            // Set the named output that the "Create Release" step reads
            core.setOutput("releaseNotes", releaseNotes);
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Create Release
        uses: actions/create-release@v1
        with:
          tag_name: v${{ steps.get_version.outputs.version }}
          release_name: Release ${{ steps.get_version.outputs.version }}
          body: ${{ steps.generate_release_notes.outputs.releaseNotes }}
          draft: true
          prerelease: false
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
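
The "Get version from __init__.py" step packs its logic into a single `python -c` call. As a readable sketch of the same idea, the functions below apply the workflow's regular expression to a module's source. The names `extract_version` and `read_version` are illustrative and not part of gentun, and raising `ValueError` on a missing version is an assumption (the one-liner would fail with an `AttributeError` instead).

```python
import re
from pathlib import Path

# Same pattern as the workflow's one-liner: __version__ = "X.Y.Z"
VERSION_PATTERN = re.compile(r'__version__ = "(.*?)"')


def extract_version(source: str) -> str:
    """Return the version string declared in the given module source."""
    match = VERSION_PATTERN.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def read_version(init_path: str = "src/gentun/__init__.py") -> str:
    """Read the file used by the workflow and extract its version."""
    return extract_version(Path(init_path).read_text(encoding="utf-8"))
```

The tag created by the workflow is this version prefixed with `v`, so a module declaring `__version__ = "1.2.3"` would be tagged `v1.2.3`.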
36 changes: 36 additions & 0 deletions .github/workflows/publish-pypi.yml
@@ -0,0 +1,36 @@
name: "Publish to PyPI"

on:
  release:
    types: [published]

jobs:
  publish:
    name: Publish to PyPI
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install flit
        run: |
          python -m pip install --upgrade pip
          python -m pip install 'flit>=3.8.0'
      - name: Build the package
        run: |
          flit build
      - name: Publish to PyPI
        env:
          FLIT_INDEX_URL: https://upload.pypi.org/legacy/
          FLIT_USERNAME: __token__
          FLIT_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
        run: |
          flit publish --repository pypi
37 changes: 37 additions & 0 deletions .github/workflows/publish-test-pypi.yml
@@ -0,0 +1,37 @@
name: "Publish to Test PyPI"

on:
  push:
    branches:
      - develop

jobs:
  publish:
    name: Publish to Test PyPI
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install flit
        run: |
          python -m pip install --upgrade pip
          python -m pip install 'flit>=3.8.0'
      - name: Build the package
        run: |
          flit build
      - name: Publish to Test PyPI
        env:
          FLIT_INDEX_URL: https://test.pypi.org/legacy/
          FLIT_USERNAME: __token__
          FLIT_PASSWORD: ${{ secrets.TEST_PYPI_API_TOKEN }}
        run: |
          flit publish --repository testpypi
@@ -1,4 +1,4 @@
name: "Lint PR"
name: "Semantic PR Check"

on:
  pull_request_target:
@@ -7,6 +7,8 @@ on:
      - edited
      - synchronize
      - reopened
    branches:
      - develop

permissions:
  pull-requests: read
28 changes: 28 additions & 0 deletions CONTRIBUTE.md
@@ -0,0 +1,28 @@
# Contributing

When contributing to this repository, please first discuss the change you wish to make with the
owners of this repository via an issue, email, or any other method.

Here are areas that can be improved in this library:

- Add [genes](src/gentun/genes.py#L11-L47) and [models](src/gentun/models/base.py#L9-L25) to
support more paper implementations
- Add a method to share the training data between the controller node and workers
- Add some type of proof-of-work validation for workers

You can also help us speed up hyperparameter search by contributing your spare GPU time.

There was a major refactor of this library in 2024; the old version is still available in
the [`old` branch](https://github.com/gmontamat/gentun/tree/old). Some cool forks added
features to this release.

## Pull Request Process

1. Ensure your branch is linted with `black` and `isort` using
the [pyproject.toml](./pyproject.toml) configurations.
2. Update the README.md with details of changes to the interface, including new environment
variables, exposed ports, useful file locations, and container parameters.
3. Increase the version numbers in any example files and the README.md to the new version that this
Pull Request would represent. The versioning scheme we use is [SemVer](http://semver.org/).
4. Use [The Conventional Commits specification](https://www.conventionalcommits.org/en/v1.0.0/) in
your PR title and description.
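
A PR title can be checked against the Conventional Commits format before it is submitted. The following is a minimal sketch, assuming a simplified reading of the title format `type(scope)!: description`. The list of allowed types is an assumption borrowed from common convention (the specification itself only defines `feat` and `fix`), and `parse_title` is a hypothetical helper, not part of gentun or of this repository's PR check.

```python
import re
from typing import Dict, Optional

# Assumed type list; adjust it to whatever the project's PR check accepts
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# type, optional (scope), optional ! for breaking changes, then ": description"
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_title(title: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parts of a conforming title, or None if it does not conform."""
    match = TITLE_PATTERN.match(title)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.groupdict()
```

For example, `parse_title("feat: Implement python packaging (#42)")` returns a dictionary whose `type` is `feat`, while a title without a type prefix returns `None`.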
64 changes: 49 additions & 15 deletions README.md
@@ -1,5 +1,10 @@
# gentun: distributed genetic algorithm for hyperparameter tuning

[![PyPI](https://img.shields.io/pypi/v/gentun)](https://pypi.org/project/gentun/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/gentun)](https://pypi.org/project/gentun/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gentun)](https://pypi.org/project/gentun/)
[![PyPI - License](https://img.shields.io/pypi/l/gentun)](https://pypi.org/project/gentun/)

The goal of this project is to create a simple framework
for [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) tuning of machine learning models,
like Neural Networks and Gradient Boosting Trees, using a genetic algorithm. Evaluating the fitness of an individual in
@@ -15,25 +20,31 @@ and mutation.
which inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
[Genetic CNN](https://arxiv.org/abs/1703.01513) paper

## :construction: Supported models

This project supports hyperparameter tuning for the following models:
- [Installation](#installation)
- [Usage](#usage)
  - [Single node](#single-node)
    - [Pre-defined individuals](#adding-pre-defined-individuals)
    - [Grid search](#performing-a-grid-search)
  - [Multiple nodes](#multiple-nodes)
    - [Redis setup](#redis-setup)
    - [Controller](#controller-node)
    - [Workers](#worker-nodes)
- [Supported models](#supported-models)
- [Contributing](#contributing)
- [References](#references)

## Installation

- [x] XGBoost regressor and classifier
- [x] Scikit-learn regressor and classifier
- [x] [Genetic CNN](https://arxiv.org/pdf/1703.01513.pdf) with Tensorflow
- [ ] [A Genetic Programming Approach to Designing Convolutional Neural Network Architectures](https://arxiv.org/pdf/1704.00764.pdf)

## :construction: Contributing

Feel free to submit your custom [`gentun.models.Handler`](src/gentun/models/base.py#L9-L25)
and [`gentun.genes.Gene`](src/gentun/genes.py#L12-L44) subclasses to enhance the project. You can also help us speed up
hyperparameter search with your spare GPU time. Check our documentation on [how to contribute](./CONTRIBUTE.md).
```bash
pip install gentun
```

## :construction: Installation
To set up a development environment, run:

```bash
pip install gentun
python -m pip install --upgrade pip
pip install 'flit>=3.8.0'
flit install --deps develop --extras tensorflow,xgboost
```

## Usage
@@ -203,6 +214,29 @@ worker = RedisWorker("experiment", XGBoostCV, host="localhost", port=6379)
worker.run(x_train, y_train)
```

## Supported models

This project supports hyperparameter tuning for the following models:

- [x] XGBoost regressor and classifier
- [x] Scikit-learn regressor and classifier
- [x] [Genetic CNN](https://arxiv.org/pdf/1703.01513.pdf) with Tensorflow
- [ ] [A Genetic Programming Approach to Designing Convolutional Neural Network Architectures](https://arxiv.org/pdf/1704.00764.pdf)

## Contributing

We welcome contributions to enhance this library. You can submit your custom subclasses for:
- [`gentun.models.Handler`](src/gentun/models/base.py#L9-L25)
- [`gentun.genes.Gene`](src/gentun/genes.py#L11-L47)

Our roadmap includes:
- Training data sharing between the controller and worker nodes
- Proof-of-work validation of what worker nodes submit

You can also help us speed up hyperparameter search by contributing your spare GPU time.

For more details on how to contribute, please check our [contribution guide](./CONTRIBUTE.md).

## References

### Genetic algorithms
18 changes: 17 additions & 1 deletion examples/README.md
@@ -1,4 +1,20 @@
# gentun examples

This directory contains examples of how to use the `gentun` package to optimize model hyperparameters.
Be sure to download the required datasets using [the script provided](./get_datasets.sh) before testing these scripts.
Be sure to [install the gentun package](../README.md#installation) and download the required datasets
using [the script provided](./get_datasets.sh) before running these examples.

## Sample distributed algorithm

To run this example, [set up your Redis server](../README.md#redis-setup) in addition to the previous
steps. Next, start the controller node:

```bash
python sample_controller.py
```

Then, for every worker node, start the worker:

```bash
python sample_worker.py
```
12 changes: 4 additions & 8 deletions examples/geneticcnn_mnist.py
@@ -6,14 +6,15 @@
http://arxiv.org/pdf/1703.01513
"""

import os
import random
import sys
from typing import Tuple

import numpy as np

sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "src")))
from gentun.algorithms import RussianRoulette
from gentun.genes import Binary
from gentun.models.tensorflow import GeneticCNN
from gentun.populations import Population


def load_mnist(file_name: str, sample_size: int = 10000) -> Tuple[np.ndarray, np.ndarray]:
@@ -37,11 +38,6 @@ def load_mnist(file_name: str, sample_size: int = 10000) -> Tuple[np.ndarray, np


if __name__ == "__main__":
from gentun.algorithms import RussianRoulette
from gentun.genes import Binary
from gentun.models.tensorflow import GeneticCNN
from gentun.populations import Population

# Genetic CNN static parameters
kwargs = {
"nodes": (3, 5),
15 changes: 5 additions & 10 deletions tests/test_controller.py → examples/sample_controller.py
@@ -4,18 +4,13 @@
which sums hyperparameter values.
"""

import os
import sys

sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "src")))
from gentun.algorithms import RussianRoulette, Tournament
from gentun.genes import RandomChoice
from gentun.models.base import Dummy
from gentun.populations import Population
from gentun.services import RedisController

if __name__ == "__main__":
from gentun.algorithms import RussianRoulette, Tournament
from gentun.genes import RandomChoice
from gentun.models.base import Dummy
from gentun.populations import Population
from gentun.services import RedisController

genes = [RandomChoice(f"hyperparam_{i}", [0, 1, 2]) for i in range(10)]
# This assumes you're running a Redis server on localhost in port 6379
# The simplest way to set it up is via docker: