v1.5.1: charlm & transformer integration in depparse
Features
depparse can have transformer as an embedding ee171cd
Lemmatizer can remember (word, pos) pairs it has seen before, enabled with a flag #1263 a87ffd0
Scoring scripts for Flair and spaCy NER models (requires the appropriate packages, of course) 63dc212 c42aed5 eab0623
SceneGraph connection for the CoreNLP client d21a95c
Update constituency parser to reduce the learning rate on plateau. Fiddling with the learning rates significantly improves performance f753a4f
Tokenize [] based on () rules if the original dataset doesn't have [] in it 063b4ba
Attempt to finetune the charlm when building models (have not found effective settings for this yet) 048fdc9
Add the charlm to the lemmatizer - this will not be the default, since it is slower, but it is more accurate e811f52 66add6d f086de2
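As an illustration of the reduce-on-plateau idea mentioned for the constituency parser above, here is a minimal pure-Python sketch. It is not the parser's actual training code: the class name `PlateauScheduler` and its parameters (`factor`, `patience`, `min_lr`) are hypothetical and chosen only for this example. PyTorch ships a scheduler of this kind as `torch.optim.lr_scheduler.ReduceLROnPlateau`; these notes do not say which settings the parser uses.

```python
class PlateauScheduler:
    """Hypothetical helper: shrink the learning rate when a dev score stalls."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr              # current learning rate
        self.factor = factor      # multiplier applied on a plateau
        self.patience = patience  # non-improving epochs tolerated before a cut
        self.min_lr = min_lr      # lower bound on the learning rate
        self.best = None          # best dev score seen so far
        self.bad_epochs = 0       # consecutive epochs without improvement

    def step(self, score):
        """Record one epoch's dev score (higher is better); return the LR to use next."""
        if self.best is None or score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                # Plateau detected: reduce the rate and restart the count.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

In a training loop, `step` would be called once per evaluation with the dev score, and the returned value written back to the optimizer.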
Bugfixes
Forgot to include the lemmatizer in CoreNLP 4.5.3, now in 4.5.4 4dda14b bjascob/LemmInflect#14 (comment)
prepare_ner_dataset was always creating an Armenian pipeline, even for non-Armenian languages 78ff85c
Fix an empty bulk_process throwing an exception 5e2d15d #1278
Unroll the recursion in the Tarjan part of the Chu-Liu-Edmonds algorithm - should remove stack overflow errors e0917b0
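To show what unrolling the recursion in Tarjan's algorithm can look like, the sketch below finds strongly connected components with an explicit work stack instead of recursive calls, so the depth of the graph no longer counts against Python's recursion limit. This is an illustrative version, not the code from the commit above: the function name and the dict-of-lists graph format are assumptions made for the example.

```python
def strongly_connected_components(graph):
    """Iterative Tarjan SCC. `graph` maps each node to a list of successors."""
    index = {}        # discovery order of each visited node
    lowlink = {}      # smallest index reachable from the node's DFS subtree
    on_stack = set()  # nodes currently on the SCC stack
    stack = []        # Tarjan's SCC stack
    components = []
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        # Explicit replacement for the call stack: (node, iterator over successors).
        work = [(root, iter(graph.get(root, ())))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    # "Recurse" into an unvisited successor by pushing a new frame.
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            # All successors handled: this is the "return" from the recursive call.
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                # `node` is the root of a component: pop its members off the stack.
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(component)
    return components
```

Any component with more than one node is a cycle. Because each frame keeps its own successor iterator, a frame resumes exactly where it left off once its child frame has been popped.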
Minor updates
Put NER and POS scores on one line to make it easier to grep for: da2ae33 8c4cb04
Switch all pretrains to use a name which indicates their source, rather than the dataset they are used for: d1c68ed and many others
Pipeline uses torch.no_grad() for a slight speed boost 36ab82e
Generalize save names, which eventually allows for putting transformer, charlm or nocharlm in the save name - this lets us distinguish different complexities of model cc08458 for constituency, and others for the other models
Add the model's flags to the --help for the run scripts, such as 83c0901 7c171dd 8e1d112
Remove the dependency on six 6daf971 (thank you @BLKSerene)
New Models
VLSP constituency 500435d
VLSP constituency -> tagging cb0f22d
CTB 5.1 constituency f2ef62b
Add support for CTB 9.0, although those models are not distributed yet 1e3ea8a
Added an Indonesian charlm
Indonesian constituency from ICON treebank #1218
All languages with pretrained charlms now have an option to use that charlm for dependency parsing
French combined models out of GSD, ParisStories, Rhapsodie, and Sequoia ba64d37
UD 2.12 support 4f987d2