Conllu data - number of edge labels is different from Yamakata et al. 2020 #3

Open
paolo-gajo opened this issue Nov 4, 2024 · 0 comments

paolo-gajo commented Nov 4, 2024

Hello, I was trying to rebuild the data directly from Yamakata et al.'s original data. While going through the gold training data and the dataloaders, I noticed that the parser training data (e.g. train.conllu) contains 17 distinct edge labels once loaded:

{'root': 18802, 't': 4863, 'o': 2191, 'd': 1564, 'f-eq': 742, 't-comp': 511, 'v-tm': 444, 'a': 434, 'f-part-of': 294, 'f-comp': 235, 'a-eq': 137, 't-eq': 135, 't-part-of': 115, 's': 78, 'v': 58, 'f-set': 8, '-': 2}

This seems inconsistent with the r_NE classes in Yamakata et al. 2020, which number 13 (14 when 'root' is used for the absence of an edge).
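For reference, here is a minimal sketch of how a label count like the one above can be obtained straight from the file, without going through the dataloaders. It assumes the edge label sits in the standard CoNLL-U DEPREL column (the 8th tab-separated field), which may not match how this repo lays out its files; `count_deprels` and the sample rows are hypothetical names and data used only for illustration.

```python
from collections import Counter


def count_deprels(lines):
    """Count the values found in the DEPREL column of CoNLL-U lines.

    Assumption: labels are stored in the 8th tab-separated field,
    as in standard CoNLL-U. Adjust the index if the files differ.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators (blank lines) and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore malformed rows that do not reach the DEPREL column.
        if len(fields) < 8:
            continue
        counts[fields[7]] += 1
    return counts


# Hypothetical two-token sample, only to show the expected input shape.
sample = [
    "# sent_id = 1",
    "1\tCut\t_\t_\t_\t_\t0\troot\t_\t_",
    "2\tonion\t_\t_\t_\t_\t1\tt\t_\t_",
    "",
]
counts = count_deprels(sample)
print(len(counts), dict(counts))

# For the real file, something like:
#   with open("train.conllu", encoding="utf-8") as f:
#       counts = count_deprels(f)
```

Comparing `len(counts)` against the 14 classes expected from the paper (13 plus 'root') is then a quick way to spot the extra labels.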

Could you confirm whether this is indeed a bug, or whether I am missing something? Thank you!
