v1.5.1: charlm & transformer integration in depparse

@AngledLuffa released this 08 Sep 22:22

Features

depparse can use a transformer as an embedding ee171cd

With a flag, the lemmatizer can remember (word, POS) pairs it has seen before #1263 a87ffd0

Scoring scripts for Flair and spaCy NER models (requires the appropriate packages, of course) 63dc212 c42aed5 eab0623

SceneGraph connection for the CoreNLP client d21a95c

Update constituency parser to reduce the learning rate on plateau. Fiddling with the learning rates significantly improves performance f753a4f

Tokenize [] based on () rules if the original dataset doesn't have [] in it 063b4ba

Attempt to finetune the charlm when building models (have not found effective settings for this yet) 048fdc9

Add the charlm to the lemmatizer - this will not be the default, since it is slower, but it is more accurate e811f52 66add6d f086de2
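
To illustrate the "reduce the learning rate on plateau" item above, here is a minimal pure-Python sketch of the general technique. It is not the constituency parser's actual code: the class name `PlateauScheduler` and the parameters `factor`, `patience`, and `min_lr` are hypothetical, chosen only for illustration. The idea is to track the best dev score seen so far and shrink the learning rate once the score has failed to improve for more than `patience` consecutive evaluations.

```python
class PlateauScheduler:
    """Hypothetical helper: lower the learning rate when the dev score stalls."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr              # current learning rate
        self.factor = factor      # multiplier applied on each reduction
        self.patience = patience  # evaluations without improvement to tolerate
        self.min_lr = min_lr      # never reduce below this value
        self.best = None          # best dev score seen so far
        self.bad_epochs = 0       # consecutive evaluations without improvement

    def step(self, dev_score):
        """Record one dev score (higher is better) and return the learning rate to use next."""
        if self.best is None or dev_score > self.best:
            self.best = dev_score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

In a training loop, `step` would be called once per evaluation and the returned value copied into the optimizer. PyTorch provides a comparable scheduler as `torch.optim.lr_scheduler.ReduceLROnPlateau`; whether the parser uses that class or its own logic is not stated in these notes.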

Bugfixes

The lemmatizer was missing from CoreNLP 4.5.3; it is now included in 4.5.4 4dda14b bjascob/LemmInflect#14 (comment)

prepare_ner_dataset was always creating an Armenian pipeline, even for non-Armenian languages 78ff85c

Fix an empty bulk_process throwing an exception 5e2d15d #1278

Unroll the recursion in the Tarjan part of the Chu-Liu-Edmonds algorithm, which should remove stack overflow errors e0917b0
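
As a rough illustration of what unrolling the Tarjan recursion can look like, the sketch below finds strongly connected components with an explicit work stack instead of recursive calls. It is a generic version written for these notes, not the code from the commit: the function name `tarjan_scc` and the adjacency-dict input format are assumptions.

```python
from typing import Dict, List


def tarjan_scc(graph: Dict[int, List[int]]) -> List[List[int]]:
    """Strongly connected components via Tarjan's algorithm, without recursion."""
    index: Dict[int, int] = {}    # discovery order of each node
    lowlink: Dict[int, int] = {}  # smallest index reachable from each node
    on_stack = set()              # nodes currently on scc_stack
    scc_stack: List[int] = []     # nodes not yet assigned to a component
    components: List[List[int]] = []
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        scc_stack.append(root)
        on_stack.add(root)
        # Each work item pairs a node with an iterator over its successors,
        # which plays the role of a suspended recursive call.
        work = [(root, iter(graph.get(root, [])))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    # "Recurse" into an unvisited successor.
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    scc_stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, []))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            # All successors handled: "return" to the parent frame.
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                # node is the root of a component; pop its members.
                component = []
                while True:
                    member = scc_stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(component)
    return components
```

Because the pending work lives in a Python list rather than on the call stack, a long chain of nodes no longer risks exceeding the interpreter's recursion limit.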

Minor updates

Put NER and POS scores on one line to make it easier to grep for: da2ae33 8c4cb04

Switch all pretrains to use a name which indicates their source, rather than the dataset they are used for: d1c68ed and many others

Pipeline uses torch.no_grad() for a slight speed boost 36ab82e

Generalize save names, which eventually allows putting transformer, charlm, or nocharlm in the save name. This lets us distinguish models of different complexity: cc08458 for constituency, and other commits for the other models

Add the model's flags to the --help for the run scripts, for example in 83c0901 7c171dd 8e1d112

Remove the dependency on six 6daf971 (thank you @BLKSerene )

New Models

VLSP constituency 500435d

VLSP constituency -> tagging cb0f22d

CTB 5.1 constituency f2ef62b

Add support for CTB 9.0, although those models are not distributed yet 1e3ea8a

Added an Indonesian charlm

Indonesian constituency from ICON treebank #1218

All languages with pretrained charlms now have an option to use that charlm for dependency parsing

French combined models out of GSD, ParisStories, Rhapsodie, and Sequoia ba64d37

UD 2.12 support 4f987d2