telugu_treebank

The IIIT-H Telugu treebank, consisting of around 1600 sentences, was made available in the ICON 2009 tools contest. This treebank is combined with the HCU Telugu treebank, containing approximately 2000 sentences, and another 200 sentences annotated at IIIT Hyderabad. The treebanks are annotated using Paninian dependency grammar. We cleaned up the combined treebank by removing sentences with an incorrect format or incomplete parse trees. The final treebank consists of 3220 sentences.

The intra-chunk dependencies are annotated using the treebank expander available at https://github.com/ltrc/Shift-Reduce-Chunk-Expander, which is based on Context Free Grammar (CFG) rules for Telugu. Grammar files for Telugu are provided for both the BIS and AnnCorra part-of-speech schemas.

The intra-chunk annotated treebank is available in both SSF and CoNLL-X formats, in WX notation.
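For readers who want to load the CoNLL-X version, the following is a minimal sketch of a reader, not a tool shipped with this repository. It assumes the standard ten-column, tab-separated CoNLL-X layout with blank lines between sentences. The function name `read_conllx`, the two-token sample sentence, and its tags and dependency labels are illustrative assumptions and are not taken from the treebank files.

```python
from typing import Dict, Iterable, Iterator, List

# Standard CoNLL-X column names: ten tab-separated fields per token.
CONLLX_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper: assumes blank lines separate sentences and
    every token line has exactly ten tab-separated columns.
    """
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(CONLLX_FIELDS, cols)))
    # Emit the last sentence when the input has no trailing blank line.
    if sentence:
        yield sentence


# Made-up sample in WX notation; tags and labels are placeholders only.
SAMPLE = (
    "1\trAmudu\trAmudu\tn\tNNP\t_\t2\tk1\t_\t_\n"
    "2\tvaccAdu\tvaccu\tv\tVM\t_\t0\tmain\t_\t_\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["id"], token["form"], token["head"], token["deprel"])
```

To read a file instead of the sample string, pass an open file object, for example `read_conllx(open(path, encoding="utf-8"))`, where `path` points to one of the CoNLL-X files in this repository.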
