I am training ConvTasNet on the LibriMix train-100 dataset. Training works fine in sep_noisy mode, but it raises the following error in enh_single mode:
Results from the following experiment will be stored in exp/train_convtasnet_3rd_causal
Stage 2: Training
/O/asteroid/asteroid/models/conv_tasnet.py:89: UserWarning: In causal configuration cumulative layer normalization (cgLN) or channel-wise layer normalization (chanLN) must be used. Changing cLN to cLN
warnings.warn(
/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/lightning_fabric/plugins/environments/slurm.py:204: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python train.py --exp_dir exp/train_convtasnet_3rd_causal - ...
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W CUDAAllocatorConfig.h:30] Warning: expandable_segments not supported on this platform (function operator())
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------------------
0 | model | ConvTasNet | 5.1 M
1 | loss_func | PITLossWrapper | 0
---------------------------------------------
5.1 M Trainable params
0 Non-trainable params
5.1 M Total params
20.202 Total estimated model params size (MB)
{'data': {'n_src': 2,
'sample_rate': 8000,
'segment': 3,
'task': 'enh_single',
'train_dir': 'data/wav8k/min/train-100',
'valid_dir': 'data/wav8k/min/dev'},
'filterbank': {'kernel_size': 16, 'n_filters': 512, 'stride': 8},
'main_args': {'exp_dir': 'exp/train_convtasnet_3rd_causal', 'help': None},
'masknet': {'bn_chan': 128,
'hid_chan': 512,
'mask_act': 'relu',
'n_blocks': 8,
'n_repeats': 3,
'skip_chan': 128},
'optim': {'lr': 0.001, 'optimizer': 'adam', 'weight_decay': 0.0},
'positional arguments': {},
'training': {'batch_size': 14,
'early_stop': True,
'epochs': 200,
'half_lr': True,
'num_workers': 4}}
Drop 0 utterances from 13900 (shorter than 3 seconds)
Drop 0 utterances from 13900 (shorter than 3 seconds)
Sanity Checking: | | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
File "O/asteroid/egs/librimix/ConvTasNet/train.py", line 146, in <module>
main(arg_dic)
File "O/asteroid/egs/librimix/ConvTasNet/train.py", line 112, in main
trainer.fit(system)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1031, in _run_stage
self._run_sanity_check()
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1060, in _run_sanity_check
val_loop.run()
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 128, in run
batch, batch_idx, dataloader_idx = next(data_fetcher)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 133, in __next__
batch = super().__next__()
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 60, in __next__
batch = next(self.iterator)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 341, in __next__
out = next(self._iterator)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 142, in __next__
out = next(self.iterators[0])
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'source_2_path'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "O/asteroid/asteroid/data/librimix_dataset.py", line 106, in __getitem__
source_path = row[f"source_{i + 1}_path"]
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/series.py", line 1112, in __getitem__
return self._get_value(key)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/series.py", line 1228, in _get_value
loc = self.index.get_loc(label)
File "/home/ionotronics/.pyenv/versions/audio_tf/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: 'source_2_path'
And here is my run.sh file:
#!/bin/bash
# Exit on error
set -e
set -o pipefail
# If you haven't generated LibriMix start from stage 0
# Main storage directory. You'll need disk space to store LibriSpeech, WHAM noises
# and LibriMix. This is about 500 Gb
storage_dir=O/asteroid/datasets
# After running the recipe a first time, you can run it from stage 3 directly to train new models.
# Path to the python you'll use for the experiment. Defaults to the current python
# You can run ./utils/prepare_python_env.sh to create a suitable python environment, paste the output here.
python_path=python
# Example usage
# ./run.sh --stage 3 --tag my_tag --task sep_noisy --id 0,1
# General
stage=0 # Controls from which stage to start
tag="" # Controls the directory name associated to the experiment
# You can ask for several GPUs using id (passed to CUDA_VISIBLE_DEVICES)
id=$CUDA_VISIBLE_DEVICES
out_dir=librimix # Controls the directory name associated to the evaluation results inside the experiment directory
# Network config
n_blocks=8 # Number of conv blocks in each repeat
n_repeats=3 # Number of repeats in the Conv-TasNet
mask_act=relu
# Training config
epochs=200
batch_size=14
num_workers=4
half_lr=yes
early_stop=yes
# Optim config
optimizer=adam
lr=0.001
weight_decay=0.
# Data config
sample_rate=8000
mode=min # LibriMix mixture mode: min (truncate to the shortest source) or max (pad to the longest)
n_src=2 # Number of speech sources in the mixture
segment=3
task=enh_single # one of 'enh_single', 'enh_both', 'sep_clean', 'sep_noisy'
eval_use_gpu=1
# Pass --compute_wer 1 --eval_mode max so the user knows that all the metrics
# are computed in max mode.
compute_wer=0
eval_mode=
. utils/parse_options.sh
sr_string=$(($sample_rate/1000))
suffix=wav${sr_string}k/$mode
if [ -z "$eval_mode" ]; then
eval_mode=$mode
fi
train_dir=data/$suffix/train-100
valid_dir=data/$suffix/dev
test_dir=data/wav${sr_string}k/$eval_mode/test
if [[ $stage -le 0 ]]; then
echo "Stage 0: Generating Librimix dataset"
if [ -z "$storage_dir" ]; then
echo "Need to fill in the storage_dir variable in run.sh to run stage 0. Exiting"
exit 1
fi
. local/generate_librimix.sh --storage_dir $storage_dir --n_src $n_src
fi
if [[ $stage -le 1 ]]; then
echo "Stage 1: Generating csv files including wav path and duration"
. local/prepare_data.sh --storage_dir $storage_dir --n_src $n_src
fi
# Generate a random ID for the run if no tag is specified
uuid=$($python_path -c 'import uuid, sys; print(str(uuid.uuid4())[:8])')
if [[ -z ${tag} ]]; then
tag=${uuid}
fi
expdir=exp/train_convtasnet_${tag}
mkdir -p $expdir && echo $uuid >> $expdir/run_uuid.txt
echo "Results from the following experiment will be stored in $expdir"
if [[ $stage -le 2 ]]; then
echo "Stage 2: Training"
mkdir -p logs
CUDA_VISIBLE_DEVICES=$id $python_path train.py --exp_dir $expdir \
--n_blocks $n_blocks \
--n_repeats $n_repeats \
--mask_act $mask_act \
--epochs $epochs \
--batch_size $batch_size \
--num_workers $num_workers \
--half_lr $half_lr \
--early_stop $early_stop \
--optimizer $optimizer \
--lr $lr \
--weight_decay $weight_decay \
--train_dir $train_dir \
--valid_dir $valid_dir \
--sample_rate $sample_rate \
--n_src $n_src \
--task $task \
--segment $segment | tee logs/train_${tag}.log
cp logs/train_${tag}.log $expdir/train.log
# Get ready to publish
mkdir -p $expdir/publish_dir
echo "librimix/ConvTasNet" > $expdir/publish_dir/recipe_name.txt
fi
if [[ $stage -le 3 ]]; then
echo "Stage 3 : Evaluation"
if [[ $compute_wer -eq 1 ]]; then
if [[ $eval_mode != "max" ]]; then
echo "Cannot compute WER without max mode. Start again with --stage 2 --compute_wer 1 --eval_mode max"
exit 1
fi
# Install espnet if not installed
if ! python -c "import espnet" &> /dev/null; then
echo 'This recipe requires espnet. Installing requirements.'
$python_path -m pip install espnet_model_zoo
$python_path -m pip install jiwer
$python_path -m pip install tabulate
fi
fi
$python_path eval.py \
--exp_dir $expdir \
--test_dir $test_dir \
--out_dir $out_dir \
--use_gpu $eval_use_gpu \
--compute_wer $compute_wer \
--task $task | tee logs/eval_${tag}.log
cp logs/eval_${tag}.log $expdir/eval.log
fi
Could you please tell me whether there is any issue in the run.sh configuration?
Thanks,
Colin