Türkçe GloVe - Repository for Turkish GloVe Word Embeddings
We used the official GloVe repository both to create the word embeddings and to evaluate them. GloVe Github Repository
- 570K vocab, cased, 300d vectors, 1.6 GB text, 2.6 GB binary link
- 253K vocab, uncased, 300d vectors, 720 MB text, 1.2 GB binary link
The corpus was collected from the January-December 2018 Common Crawl dumps. It contains 2.736B tokens (corpus size: 5.4 GB).
Corpus Link
Paper Link
This benchmark dataset is used for intrinsic evaluation on the analogy task; we used the synonym, capital, and antonym categories. Benchmark Dataset Link
| Semantic Evaluation | Antonyms Analogy Task | Capitals Analogy Task | Synonyms Analogy Task | Total Accuracy |
| --- | --- | --- | --- | --- |
| GloVe Uncased | 21.70 | 47.74 | 19.48 | 27.88 |
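The analogy task scored above can be sketched with the standard 3CosAdd rule: answer "a is to b as c is to ?" by taking the word whose vector maximizes cosine similarity to b - a + c, excluding a, b, and c. A minimal numpy sketch; the toy 2d vectors below are illustrative stand-ins for the real 300d embeddings, not values from the released model.

```python
import numpy as np

# Toy 2d embeddings standing in for the real 300d Turkish GloVe vectors.
emb = {
    "almanya": np.array([1.0, 0.0]),
    "berlin":  np.array([1.0, 1.0]),
    "fransa":  np.array([0.0, 0.0]),
    "paris":   np.array([0.0, 1.0]),
}

def analogy(a, b, c):
    """3CosAdd: return the word closest to b - a + c (a, b, c excluded)."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -2.0
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = np.dot(target, v) / (np.linalg.norm(target) * np.linalg.norm(v) + 1e-9)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("almanya", "berlin", "fransa"))  # capital-of analogy -> "paris"
```

Accuracy on a category (e.g. capitals) is simply the fraction of such questions where the top-ranked word matches the expected answer.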
This dataset is used for extrinsic evaluation on text categorization. The dataset has 7 different classes.
|  | SVC | Logistic Regression |
| --- | --- | --- |
| GloVe Cased | 0.89306 | 0.89959 |
| GloVe Uncased | 0.89956 | 0.90530 |
|  | SVC | Logistic Regression |
| --- | --- | --- |
| GloVe Cased | 0.89388 | 0.89864 |
| GloVe Uncased | 0.90015 | 0.90619 |
|  | SVC | Logistic Regression |
| --- | --- | --- |
| GloVe Cased | 0.89306 | 0.89796 |
| GloVe Uncased | 0.89959 | 0.90531 |
We used these machine learning classifiers with scikit-learn's default hyperparameters.
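A minimal sketch of this extrinsic evaluation setup: represent each document as the average of its word vectors, then train SVC and LogisticRegression with scikit-learn defaults. The random stand-in vocabulary and tiny two-class toy corpus below are assumptions for illustration; the real setup uses the released 300d Turkish vectors and the 7-class dataset linked below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Random 300d vectors as a stand-in for the real Turkish GloVe model.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=300) for w in ["kedi", "köpek", "fransa", "berlin"]}

def doc_vector(tokens, dim=300):
    # Average the embeddings of in-vocabulary tokens; zeros if none match.
    vecs = [vocab[t] for t in tokens if t in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy corpus standing in for the 7-class text categorization dataset.
docs = [["kedi", "köpek"], ["fransa", "berlin"]] * 20
labels = [0, 1] * 20
X = np.stack([doc_vector(d) for d in docs])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

# Default hyperparameters, as in the evaluation above.
for clf in (SVC(), LogisticRegression()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```

The reported numbers above are the scores returned by this kind of `clf.score` evaluation on the real dataset.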
Text Categorization Dataset Link
Example queries (with a gensim-style `most_similar` API):

```python
model.most_similar(positive=['fransa', 'berlin'], negative=['almanya'])  # capital-of analogy
model.most_similar(positive=['geliyor', 'gitmek'], negative=['gelmek'])  # verb-inflection analogy
model.most_similar("kedi")  # nearest neighbours of "kedi" (cat)
```
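The queries above assume a loaded model object; a dependency-free sketch of the same lookup, assuming the released text files use the standard GloVe line format `word v1 v2 ... v300` (the three-word, 3d sample here is an inline stand-in for the real 1.6 GB file):

```python
import io
import numpy as np

# Inline stand-in for the released GloVe text file ("word v1 ... v300" lines).
sample = "kedi 0.1 0.2 0.3\nköpek 0.1 0.2 0.25\nfransa 0.9 0.1 0.0\n"

vectors = {}
for line in io.StringIO(sample):
    word, *vals = line.rstrip("\n").split(" ")
    vectors[word] = np.array([float(v) for v in vals])

def most_similar(word, topn=1):
    # Plain cosine nearest-neighbour lookup over the loaded vocabulary.
    q = vectors[word]
    sims = {
        w: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for w, v in vectors.items() if w != word
    }
    return sorted(sims.items(), key=lambda kv: -kv[1])[:topn]

print(most_similar("kedi"))  # most similar word is "köpek"
```

For real use, the same text files can be loaded with an off-the-shelf reader such as gensim's `KeyedVectors` instead of this hand-rolled loop.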
- https://cs224d.stanford.edu/lecture_notes/notes2.pdf
- https://nlp.stanford.edu/pubs/glove.pdf