Tracking Progress in Rich Context

The Coleridge Initiative at NYU has been researching Rich Context to enhance search and discovery of datasets used in scientific research (see the Background Info section for more details). Partnering with experts throughout academia and industry, NYU-CI has worked to leverage the closely adjacent fields of NLP/NLU, knowledge graphs, recommender systems, scholarly infrastructure, data mining from scientific literature, dataset discovery, linked data, open vocabularies, metadata management, and data governance. Leaderboards are published here on GitHub to track state-of-the-art (SOTA) progress among the top results.

Leaderboard 1

Entity Linking for Datasets in Publications

The first challenge is to identify the datasets used in research publications, with an initial focus on the problem of entity linking. Research papers generally mention the datasets they use, but there is no formal, machine-readable way to describe that metadata.

Identifying dataset mentions typically requires:

extracting text from an open access PDF
some NLP parsing of the text
feature engineering (e.g., paying attention to where the text is located in the paper)
modeling to identify up to 5 datasets per publication
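
As a rough illustration of the last two steps above, here is a minimal sketch that links a paper's already-extracted text to a list of known dataset titles by normalized string matching and returns at most 5 candidates. It is a naive baseline for illustration only, not the method used by any leaderboard entry; the function names, the dataset ids, and the `known_datasets` mapping are hypothetical.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link_datasets(paper_text, known_datasets, max_results=5):
    """Return up to `max_results` dataset ids whose title occurs in the text.

    `known_datasets` maps a (hypothetical) dataset id to its title.
    Results are ranked by how often the normalized title is mentioned.
    """
    haystack = normalize(paper_text)
    counts = Counter()
    for dataset_id, title in known_datasets.items():
        needle = normalize(title)
        if not needle:
            continue  # skip empty titles, which would match everywhere
        pattern = rf"\b{re.escape(needle)}\b"
        hits = len(re.findall(pattern, haystack))
        if hits:
            counts[dataset_id] = hits
    return [dataset_id for dataset_id, _ in counts.most_common(max_results)]


# Hypothetical ids and example text, for illustration only.
datasets = {
    "ds-001": "National Health Interview Survey",
    "ds-002": "Current Population Survey",
    "ds-003": "American Community Survey",
}
text = (
    "We use the National Health Interview Survey. The national health "
    "interview survey covers many topics. We also draw on the "
    "Current Population Survey."
)
print(link_datasets(text, datasets))
```

Exact title matching misses abbreviations and paraphrased mentions, which is why the feature engineering and modeling steps matter in practice.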

See Evaluating Models for Entity Linking with Datasets for details about how the Top5uptoD leaderboard metric is calculated.
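
To make the idea of a top-5 precision score concrete, the sketch below computes precision over the top k predictions per publication, with k assumed to be the smaller of 5 and the number of datasets in the ground truth. This is only one plausible reading of the name "Top5uptoD" and may differ from the official calculation, which is defined in the document referenced above. The function names and the example data are hypothetical.

```python
def top_k_precision(predicted, truth, max_k=5):
    """Precision over the top k predictions for one publication.

    ASSUMPTION: k = min(max_k, number of true datasets). The official
    leaderboard metric may be defined differently.
    """
    k = min(max_k, len(truth))
    if k == 0:
        return 0.0
    top = predicted[:k]  # predictions are assumed to be ranked best-first
    return len(set(top) & set(truth)) / k


def mean_precision(predictions, ground_truth, max_k=5):
    """Average per-publication precision, reported as a percentage."""
    scores = [
        top_k_precision(predictions.get(pub_id, []), truth, max_k)
        for pub_id, truth in ground_truth.items()
    ]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Hypothetical publication and dataset ids, for illustration only.
predictions = {"pub-1": ["a", "b", "c"], "pub-2": ["x"]}
ground_truth = {"pub-1": {"a", "c"}, "pub-2": {"y"}}
print(mean_precision(predictions, ground_truth))
```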

Current SOTA

source	precision	repo	corpus	date	contact
LARC	78.36	link	v0.1.5	2019-09-26	@philipskokoh

Instructions

The use of open source and open standards is especially important to further the cause of effective, reproducible research. We're hosting this competition to focus on the research challenges of specific machine learning use cases encountered within Rich Context (see the Workflow Stages section).

If you have any questions about the Rich Context leaderboard competition – and especially if you identify any problems in the corpus (e.g., data quality, incorrect metadata, broken links, etc.) – please use the GitHub issues for this repo and pull requests to report, discuss, and resolve them.
