This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

Merge branch 'tai50' into 'master'
Taiyaki 5.0

See merge request algorithm/taiyaki!150
tmassingham-ont committed Oct 1, 2019
2 parents db411ab + 933187c · commit 94f3adc
Showing 12 changed files with 50 additions and 49 deletions.
CHANGELOG.md: 15 changes (14 additions, 1 deletion)
@@ -4,6 +4,20 @@ Version numbers: major.minor.patch
* Minor version bump indicates a change in functionality that may affect users.
* Patch version bump indicates bug-fixes or minor improvements not expected to affect users.

+## v5.0.0
+* Based on pytorch version 1.2
+* Improved training stability: gradient capping and warm-up
+* Merged mod-base and canonical entry points
+* Custom model definitions should now take an
+  `alphabet_info` argument rather than `outsize`
+* Improved RNA support: tools can reverse references and basecalls
+* Basecaller changes:
+    * chunk size argument now matches guppy
+    * CPU calling enabled
+    * lower memory usage
+* Multi-GPU training enabled
+* Bug fixes
+
## v4.1.0
* Ab initio ("bootstrap") training of models

@@ -16,7 +30,6 @@ Version numbers: major.minor.patch
* Training walk-through
* Tweaks to optimisation parameters

-
## v3.0.2
* Improved training parameters
* Use orthonormal initialisation of starting weights
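
The v5.0.0 entry above changes the contract for custom model definitions: a definition now receives an `alphabet_info` object rather than a precomputed `outsize` integer. A minimal sketch of what a definition might look like under the new convention, assuming an `alphabet_info` carrying an `nbase` count and using layer names from `taiyaki.layers` (the actual model templates are not part of this commit):

    # Hypothetical Taiyaki >= 5.0 model definition (sketch only).
    # alphabet_info replaces the old outsize argument; nbase is assumed
    # to count canonical plus modified bases in the alphabet.
    from taiyaki.layers import (Convolution, GlobalNormFlipFlop, GruMod,
                                Reverse, Serial)

    def network(insize=1, size=256, winlen=19, stride=2, alphabet_info=None):
        nbase = 4 if alphabet_info is None else alphabet_info.nbase
        return Serial([
            Convolution(insize, size, winlen, stride=stride),
            GruMod(size, size),
            Reverse(GruMod(size, size)),
            GlobalNormFlipFlop(size, nbase),  # output size derived from nbase
        ])
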
Makefile: 8 changes (3 additions, 5 deletions)
@@ -11,19 +11,17 @@ CUDA ?= $(shell (which nvcc && nvcc --version) | grep -oP "(?<=release )[0-9.]+"


# Determine correct torch package to install
-TORCH_CUDA_8.0 = cu80
-TORCH_CUDA_9.0 = cu90
TORCH_CUDA_9.2 = cu92
TORCH_CUDA_10.0 = cu100
TORCH_PLATFORM ?= $(if $(TORCH_CUDA_$(CUDA)),$(TORCH_CUDA_$(CUDA)),cpu)
PY3_MINOR = $(shell $(PYTHON) -c "import sys; print(sys.version_info.minor)")
-TORCH_Linux = http://download.pytorch.org/whl/${TORCH_PLATFORM}/torch-1.0.0-cp3${PY3_MINOR}-cp3${PY3_MINOR}m-linux_x86_64.whl
+TORCH_Linux = http://download.pytorch.org/whl/${TORCH_PLATFORM}/torch-1.2.0-cp3${PY3_MINOR}-cp3${PY3_MINOR}m-manylinux1_x86_64.whl
TORCH_Darwin = torch
TORCH ?= $(TORCH_$(shell uname -s))


# determine correct cupy package to install
-CUPY_8.0 = cupy-cuda80
-CUPY_9.0 = cupy-cuda90
CUPY_9.2 = cupy-cuda92
CUPY_10.0 = cupy-cuda100
CUPY ?= $(CUPY_$(CUDA))

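
Context for the Makefile change: torch 1.2 publishes wheels for CUDA 9.2 and 10.0 only, so the cu80/cu90 mappings are dropped and any other detected CUDA version falls back to the CPU wheel. The same lookup logic as a Python sketch (illustrative only; the build does this in Make):

    # Mirror of the Makefile's TORCH_PLATFORM selection; the dict stands in
    # for the TORCH_CUDA_* variables. Unknown CUDA versions fall back to cpu.
    TORCH_CUDA = {"9.2": "cu92", "10.0": "cu100"}

    def torch_platform(cuda_version):
        # Same semantics as $(if $(TORCH_CUDA_$(CUDA)),$(TORCH_CUDA_$(CUDA)),cpu)
        return TORCH_CUDA.get(cuda_version, "cpu")

    assert torch_platform("10.0") == "cu100"
    assert torch_platform("8.0") == "cpu"  # no torch-1.2 wheel for CUDA 8.0
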
README.md: 22 changes (12 additions, 10 deletions)
Expand Up @@ -31,9 +31,9 @@ expect to get your hands dirty.
# Contents

1. [Installing system prerequisites](#installing-system-prerequisites)
-2. [Installing Taiyaki](#installation)
+2. [Installing Taiyaki](#installing-taiyaki)
3. [Tests](#tests)
-4. [Walk through](#walk-through)
+4. [Walk through](#walk-throughs-and-further-documentation)
5. [Workflows](#workflows)<br>
* [Using the workflow Makefile](#using-the-workflow-makefile)<br>
* [Steps from fast5 files to basecalling](#steps-from-fast5-files-to-basecalling)<br>
@@ -86,7 +86,7 @@ Windows is not supported.
If you intend to use Taiyaki with a GPU, make sure you have installed and set up [CUDA](#cuda) before proceeding.
---

-## Install Taiyaki in a new virtual environment
+## Install Taiyaki in a new virtual environment (RECOMMENDED)

We recommend installing Taiyaki in a self-contained [virtual environment](https://docs.python.org/3/tutorial/venv.html).

@@ -99,6 +99,9 @@ You will need to run `source venv/bin/activate` at the start of each session whe

## Install Taiyaki system-wide or into activated Python environment

+This is not the recommended installation method: we recommend that you install Taiyaki in its
+[own virtual environment](#install-taiyaki-in-a-new-virtual-environment) if possible.

Taiyaki can be installed from source using either:

    python3 setup.py install
@@ -111,14 +114,13 @@ Alternatively, you can use pip with either:

# Tests

-Tests can be run as follows:
-
-    make workflow #runs scripts which carry out the workflow for basecall-network training and for squiggle-predictor training
-    make acctest #runs acceptance tests
-    make unittest #runs unit tests
-    make multiGPU_test #runs multi-GPU test (GPUs 0 and 1 must be available, and CUDA must be installed - see below)
+Tests can be run as follows, provided that the recommended `make install` installation method was used:

-If Taiyaki has been installed in a virtual environment, it will have to be activated before running tests: `source venv/bin/activate`. To deactivate, run `deactivate`.
+    source venv/bin/activate # activates taiyaki virtual environment (do this first)
+    make workflow # runs scripts which carry out the workflow for basecall-network training and for squiggle-predictor training
+    make acctest # runs acceptance tests
+    make unittest # runs unit tests
+    make multiGPU_test # runs multi-GPU test (GPUs 0 and 1 must be available, and CUDA must be installed - see below)

# Walk throughs and further documentation
For a walk-through of Taiyaki model training, including how to obtain sample training data, see [docs/walkthrough.rst](docs/walkthrough.rst).
bin/basecall.py: 10 changes (5 additions, 5 deletions)
@@ -30,21 +30,21 @@

add_common_command_args(parser, 'alphabet device input_folder input_strand_list limit output quiet recursive version'.split())

-parser.add_argument("--chunk_size", type=Positive(int),
+parser.add_argument("--chunk_size", type=Positive(int), metavar="blocks",
                    default=basecall_helpers._DEFAULT_CHUNK_SIZE,
-                    help="Size of signal chunks sent to GPU")
+                    help="Size of signal chunks sent to GPU is chunk_size * model stride")
parser.add_argument("--max_concurrent_chunks", type=Positive(int),
                    default=128, help="Maximum number of chunks to call at "
                    "once. Lower values will consume less (GPU) RAM.")
-parser.add_argument("--modified_base_output", action=FileAbsent, default=None,
+parser.add_argument("--modified_base_output", action=FileAbsent, default=None, metavar="mod_basecalls.hdf5",
                    help="Output filename for modified base output.")
-parser.add_argument("--overlap", type=NonNegative(int),
+parser.add_argument("--overlap", type=NonNegative(int), metavar="blocks",
                    default=basecall_helpers._DEFAULT_OVERLAP,
                    help="Overlap between signal chunks sent to GPU")
parser.add_argument('--reverse', default=False, action=AutoBool,
                    help='Reverse sequences in output')
parser.add_argument('--scaling', action=FileExists, default=None,
-                    help='Per-read scaling params')
+                    help='Path to TSV containing per-read scaling params')
parser.add_argument("model", action=FileExists,
                    help="Model checkpoint file to use for basecalling")

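
The new `metavar="blocks"` annotations make the units explicit: `--chunk_size` and `--overlap` count output blocks, so the raw-signal length of a chunk is the block count times the model stride, matching guppy's convention. A sketch of the conversion (helper name and example numbers invented for illustration):

    # Convert block-denominated basecaller arguments into raw signal samples.
    # Not taiyaki's actual code; stride would come from the loaded model.
    def blocks_to_samples(chunk_size, overlap, stride):
        return chunk_size * stride, overlap * stride

    # e.g. a stride-5 model with 1000-block chunks and 100 blocks of overlap:
    print(blocks_to_samples(1000, 100, 5))  # -> (5000, 500)
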
bin/train_abinitio.py: 4 changes (3 additions, 1 deletion)
@@ -138,7 +138,6 @@ def save_model(network, outdir, index=None):


for i in range(args.niteration):
-    lr_scheduler.step()

    idx = np.random.randint(len(chunks), size=args.batch_size)
    indata = chunks[idx].transpose(1, 0)
@@ -186,4 +185,7 @@ def save_model(network, outdir, index=None):
        total_samples = 0
        t0 = tn

+    lr_scheduler.step()
+
+
save_model(network, args.outdir)
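
This diff, and the matching ones in bin/train_flipflop.py and bin/train_squiggle.py below, all make the same fix: `lr_scheduler.step()` moves from the top of the training loop to after the weight update. From PyTorch 1.1 onwards the scheduler must be stepped after `optimizer.step()`; the old order effectively skips the first value of the schedule and triggers a UserWarning on the PyTorch 1.2 that this release targets. A minimal sketch of the corrected pattern with a toy model (not Taiyaki's actual loop):

    import torch

    model = torch.nn.Linear(10, 1)  # stand-in for the real network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)

    for i in range(1000):
        optimizer.zero_grad()
        loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()  # after optimizer.step(), as PyTorch >= 1.1 requires
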
bin/train_flipflop.py: 5 changes (3 additions, 2 deletions)
@@ -419,8 +419,6 @@ def main():

    for i in range(args.niteration):

-        lr_scheduler.step()
-
        # Chunk length is chosen randomly in the range given but forced to
        # be a multiple of the stride
        batch_chunk_len = (np.random.randint(
@@ -520,6 +518,9 @@ def main():
            # log.write("* GPU{} params:".format(args.local_rank))
            #log.write("{}...{}\n".format(v,u))

+        lr_scheduler.step()
+
+
    if is_lead_process:
        helpers.save_model(network, args.outdir,
                           model_skeleton=network_save_skeleton)
bin/train_squiggle.py: 2 changes (1 addition, 1 deletion)
@@ -129,7 +129,6 @@ def main():
    total_chunks = 0

    for i in range(args.niteration):
-        lr_scheduler.step()
        # If the logging threshold is 0 then we log all chunks, including those rejected, so pass the log
        # object into assemble_batch
        # chunk_batch is a list of dicts.
@@ -194,6 +193,7 @@ def main():
            log.write(" {:.1%} chunks filtered".format(n_fail / n_tot))
            log.write("\n")

+        lr_scheduler.step()

    helpers.save_model(conv_net, args.outdir)

requirements.txt: 2 changes (1 addition, 1 deletion)
@@ -7,4 +7,4 @@ ont_fast5_api == 1.2.0
pysam >= 0.15.0
matplotlib >= 2.0.0
scipy >= 1
-torch >= 1, < 1.1
+torch == 1.2
setup.py: 2 changes (1 addition, 1 deletion)
@@ -20,7 +20,7 @@
    "matplotlib >= 2.0.0",
    "pysam >= 0.15.0",
    "scipy >= 1",
-    "torch >= 1, < 1.1",
+    "torch == 1.2"
]


taiyaki/__init__.py: 4 changes (2 additions, 2 deletions)
@@ -1,7 +1,7 @@
"""Custard owns my heart!"""
__version_info__ = {
-    'major': 4,
-    'minor': 1,
+    'major': 5,
+    'minor': 0,
    'revision': 0,
}
__version__ = "{major}.{minor}.{revision}".format(**__version_info__)
taiyaki/layers.py: 23 changes (4 additions, 19 deletions)
@@ -19,7 +19,8 @@
def init_(param, value):
    """Set parameter value (inplace) from tensor, numpy array, list or tuple"""
    value_as_tensor = torch.tensor(value, dtype=param.data.dtype)
-    param.data.detach_().set_(value_as_tensor)
+    with torch.no_grad():
+        param.set_(value_as_tensor)


def random_orthonormal(n, m=None):
@@ -595,24 +596,8 @@ def birnn(forward, backward):


@torch.jit.script
-def logaddexp_fwdbwd(x, y):
-    z = torch.max(x, y) + torch.log1p(torch.exp(-torch.abs(x - y)))
-    return z, (x-z).exp(), (y-z).exp()
-
-
-class LogAddExp(torch.autograd.Function):
-    @staticmethod
-    def forward(ctx, x, y):
-        z, xmz, ymz = logaddexp_fwdbwd(x, y)
-        ctx.save_for_backward(xmz, ymz)
-        return z
-
-    @staticmethod
-    def backward(ctx, outgrad):
-        xmz, ymz = ctx.saved_tensors
-        return outgrad * xmz, outgrad * ymz
-
-logaddexp = LogAddExp.apply
+def logaddexp(x, y):
+    return torch.max(x, y) + torch.log1p(torch.exp(-torch.abs(x - y)))


@torch.jit.script
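
Two PyTorch 1.2 idioms are at work in this file. In `init_`, writing to a parameter inside a `torch.no_grad()` block replaces the older `.data.detach_().set_()` dance. In the second hunk, the hand-written `LogAddExp` autograd Function is deleted because the closed form `max(x, y) + log1p(exp(-|x - y|))` is numerically stable as written and autograd differentiates it directly, producing the same softmax-weight gradients `exp(x - z)` and `exp(y - z)` that the custom `backward` computed by hand. A self-contained check of that claim (a sketch, independent of the repository's tests):

    import torch

    def logaddexp(x, y):
        # Stable log(exp(x) + exp(y)): exp() only ever sees a
        # non-positive argument, so it cannot overflow.
        return torch.max(x, y) + torch.log1p(torch.exp(-torch.abs(x - y)))

    x = torch.tensor([0.5, 800.0], requires_grad=True)  # 800 overflows naive exp()
    y = torch.tensor([1.5, 799.0], requires_grad=True)
    z = logaddexp(x, y)
    z.sum().backward()
    print(z)               # finite even for the large inputs
    print(x.grad, y.grad)  # equal to (x - z).exp() and (y - z).exp()
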
test/unit/test_layers.py: 2 changes (1 addition, 1 deletion)
@@ -326,7 +326,7 @@ def test_cupy_and_non_cupy_same(self):
        # rtol before softmax = atol after softmax. Therefore I've replaced
        # the atol with the default value for rtol.
        print((abs(x1.grad - x2.grad)).max())
-        self.assertTrue(torch.allclose(x1.grad, x2.grad, atol=1e-05))
+        self.assertTrue(torch.allclose(x1.grad, x2.grad, atol=1e-04))


class UpSampleTest(LayerTest, unittest.TestCase):
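
For context on the tolerance bump: `torch.allclose(a, b, rtol=1e-05, atol=1e-08)` accepts each element when `|a - b| <= atol + rtol * |b|`. Post-softmax gradients have near-zero magnitude, so the `rtol` term contributes almost nothing and `atol` dominates the comparison, which is what the comment above is getting at. A toy illustration of the semantics (values invented, not from the test):

    import torch

    a = torch.tensor([1.0e-05, 2.0e-05])
    b = torch.tensor([1.9e-05, 2.9e-05])  # differs by 9e-06 elementwise

    # Accepted iff |a - b| <= atol + rtol * |b| for every element.
    print(torch.allclose(a, b, atol=1e-04))  # True:  9e-06 <= 1e-04
    print(torch.allclose(a, b, atol=1e-06))  # False: 9e-06 >  ~1e-06
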
