Cohere-Parallel-Language-Sentence-Alignment

Cohere-Align

This repo takes two text files in the source and target languages, and returns sentences that are most likely translations of each other.

Before running, create a Cohere account to get your API key.

Then install the cohere package with the following command:

pip install cohere

To align sentences, create two text files, one for the source language and one for the target language, with one sentence per line. Then run the following command:

Cohere

python3 scripts/cohere_align.py \
   --cohere_api_key '<api_key>' \
   -m 'embed-multilingual-v2.0' \
   -s src.txt \
   -t trg.txt \
   -o cohere \
   --retrieval 'nn' \
   --dot \
   --cuda

For comparison, you can also align the same files with the LASER autoencoder:

Laser

python3 scripts/laser_align.py \
  -s src.txt \
  -t trg.txt \
  -o cohere \
  --src_lang ha \
  --trg_lang en \
  --retrieval 'nn' \
  --dot \
  --cuda

In these commands, -m is the model name (used in the Cohere command), -s is the source text path, -t is the target text path, and -o is the output directory path. Add the --cuda option if you have a GPU. For more parameters, see the alignment scripts.
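As a rough illustration of what the alignment step does, here is a minimal sketch. It assumes that --retrieval 'nn' means nearest-neighbor retrieval and that --dot means scoring by dot product; the actual scripts may work differently. The function names and toy vectors are hypothetical and stand in for real sentence embeddings.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nearest_neighbor_align(
    src_vecs: List[Sequence[float]],
    trg_vecs: List[Sequence[float]],
) -> List[Tuple[int, int, float]]:
    """For each source vector, return (src_index, best_trg_index, score)."""
    pairs = []
    for i, s in enumerate(src_vecs):
        # Score this source sentence against every target sentence.
        scores = [dot(s, t) for t in trg_vecs]
        # Keep the target with the highest dot-product score.
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


# Toy 2-dimensional "embeddings" (hypothetical values, for illustration only).
src_vecs = [[1.0, 0.0], [0.0, 1.0]]
trg_vecs = [[0.1, 0.9], [0.8, 0.2]]
alignments = nearest_neighbor_align(src_vecs, trg_vecs)
```

In this toy example, source sentence 0 is paired with target sentence 1 and source sentence 1 with target sentence 0. With real embeddings, each pair would be a candidate translation, and low-scoring pairs could be filtered out.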

You can also use the Jupyter notebook, Cohere_Align_Sentences.ipynb, to align the sentences.
