Should url be transcribed? #12

Closed
ssoto opened this issue Jun 30, 2021 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@ssoto
Contributor

ssoto commented Jun 30, 2021

I ran this test:

import requests
from urllib.parse import urlencode

content = "Hola esto es una URL:    https://twitter.com/OficinaEspanol  Respetala porfa"
encoded_url_params = urlencode({"spanish": content})
response = requests.get(f"https://api.andaluh.es/epa?{encoded_url_params}")

response.json()
>  {'spanish': 'Hola esto es una URL:    https://twitter.com/OficinaEspanol  Respetala porfa',
>   'andaluh': 'Ola êtto êh una ÛL-L:    ttps://twîttêh.com/OfiçinaÊppanôh  Rêppetala porfa',
>  'rules': {'vaf': 'ç', 'vvf': 'h', 'escapeLinks': False}}

As you can see, the transcription algorithm runs over the URL, applying its rules and changing it.
Is this behaviour expected?

@ssoto ssoto added the question Further information is requested label Jun 30, 2021
@fontanon
Member

Hi @ssoto, please use the escapeLinks URL parameter to avoid transcribing URLs. In other words: add "escapeLinks": True to the parameters you encode into encoded_url_params.

A curl example below:

curl "https://api.andaluh.es/epa?spanish=Vente%20a%20https://andaluh.es%20a%20ver%20la%20p%C3%A1gina&vaf=s&vvf=j&escapeLinks=true"

Also, please open API-related issues in the andaluh-api repo: https://github.com/andalugeeks/andaluh-api
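
For readers following along in Python, the same request URL can be built as in the minimal sketch below. It only constructs the URL and makes no network call. `build_epa_url` is a hypothetical helper name, and the lowercase "true" value is an assumption copied from the curl example above rather than documented API behaviour.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "https://api.andaluh.es/epa"  # endpoint used earlier in this thread


def build_epa_url(text, escape_links=True):
    """Return the GET URL for transcribing `text` (hypothetical helper)."""
    params = {"spanish": text}
    if escape_links:
        # Lowercase "true" mirrors the curl example; other spellings are untested here.
        params["escapeLinks"] = "true"
    # urlencode percent-encodes the text, including the URL embedded in it.
    return f"{API_URL}?{urlencode(params)}"


url = build_epa_url("Vente a https://andaluh.es a ver la página")
print(url)
# Decode the query string again to check that nothing was lost in encoding.
print(parse_qs(urlsplit(url).query))
```

The resulting string can be passed to requests.get in place of the f-string used in the original test.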

@ssoto
Contributor Author

ssoto commented Jun 30, 2021

Sorry to insist, but did you see the transcription of URL to ÛL-L? Let me know whether this behaviour belongs to andaluh-api or to andaluh-py.
Regards!

@penyaskito

penyaskito commented Jun 30, 2021

It looks like the regex is matching only the first part of the URL:

curl "https://api.andaluh.es/epa?spanish=Vente%20a%20https://andaluh.es/esto/es/el/resto/de/una/url%20a%20ver%20la%20p%C3%A1gina&vaf=s&vvf=j&escapeLinks=true"

returns

{
	  spanish: "Vente a https://andaluh.es/esto/es/el/resto/de/una/url a ver la página",
	  andaluh: "Bente a https://andaluh.es/êtto/êh/el/rêtto/de/una/ûl-l a bêh la pájina",
	  rules: {
		    vaf: "s",
		    vvf: "j",
		    escapeLinks: true
	  }
} 
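
The symptom above (host preserved, path transcribed) is what one would expect from a link pattern that stops at the first "/" after the host. The sketch below illustrates the difference with two hypothetical patterns. Neither is taken from the andaluh-py source, so treat it as an illustration of the failure mode rather than a diagnosis.

```python
import re

text = "Vente a https://andaluh.es/esto/es/el/resto/de/una/url a ver la página"

# Hypothetical pattern that covers only scheme and host: it stops at the first "/".
HOST_ONLY = re.compile(r"https?://[\w.-]+")

# Hypothetical pattern that consumes everything up to the next whitespace.
FULL_URL = re.compile(r"https?://\S+")

print(HOST_ONLY.search(text).group(0))  # host part only; the path is left unprotected
print(FULL_URL.search(text).group(0))   # the whole URL, path included
```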

@fontanon fontanon reopened this Jul 1, 2021
@fontanon
Member

fontanon commented Jul 1, 2021

Confirmed this behaviour with andaluh-py, reopening this issue.

$ python3
>>> import andaluh
>>> print(andaluh.epa("Hola esto es una URL:    https://twitter.com/OficinaEspanol  Respetala porfa", escape_links=True))
Ola êtto êh una ÛL-L:    https://twitter.com/OfiçinaÊppanôh  Rêppetala porfa
>>> 
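
For context, one common way to keep links intact is to split the text on a URL pattern and transform only the non-URL pieces. The sketch below assumes that approach. `transform_outside_links` and `URL_RE` are hypothetical names, `str.upper` stands in for the real transcription, and none of this is claimed to be how andaluh-py implements escape_links.

```python
import re

# The capturing group makes re.split keep the matched URLs in the result list.
URL_RE = re.compile(r"(https?://\S+)")


def transform_outside_links(text, transform):
    """Apply `transform` to everything except URLs (hypothetical helper)."""
    parts = URL_RE.split(text)
    # With one capturing group, odd indices hold the URLs and even indices the rest.
    return "".join(
        part if index % 2 else transform(part)
        for index, part in enumerate(parts)
    )


sample = "Hola esto es una URL:    https://twitter.com/OficinaEspanol  Respetala porfa"
print(transform_outside_links(sample, str.upper))
```

Note that the word "URL" itself is still transformed here, since it is ordinary text and not a link.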

@fontanon
Member

fontanon commented Jul 1, 2021

Related: andalugeeks/andaluh-api#3

@fontanon fontanon added bug Something isn't working and removed question Further information is requested labels Jul 1, 2021
@fontanon
Member

fontanon commented Jul 1, 2021

> Sorry to insist, but did you see the transcription of URL to ÛL-L? Let me know whether this behaviour belongs to andaluh-api or to andaluh-py.
> Regards!

Please prefix with a hashtag any acronyms or foreign-language words you do not want transcribed. The andaluh-py library simply transcribes everything; it cannot be aware of those situations.

@ssoto
Contributor Author

ssoto commented Jul 1, 2021

Thanks for your analysis, guys!
Is anyone working on a fix? I can try to fix it; I'm going to assign it to myself!

@ssoto ssoto self-assigned this Jul 1, 2021
@fontanon
Member

fontanon commented Jul 1, 2021

I was already working on it; it was an easy fix. Thanks.

@fontanon fontanon self-assigned this Jul 1, 2021
@ssoto ssoto removed their assignment Jul 1, 2021
@fontanon
Member

fontanon commented Jul 1, 2021

@ssoto @penyaskito please use master to test now.

0e9a405

@ssoto ssoto closed this as completed Aug 16, 2021