remove non printable chars from titles #31 #32

Draft
wants to merge 1 commit into
base: main

Conversation

lioman
Contributor

@lioman lioman commented Apr 12, 2023

My initial idea works only for the mentioned \u00ad character. I stripped the non-printable characters by replacing them with " ".
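A minimal sketch of the replacement idea described above, not the code from this pull request. The helper name `replace_non_printable` is hypothetical, and it assumes that Python's `str.isprintable()` is an acceptable definition of "non-printable" (it rejects the soft hyphen \u00ad, control characters, and non-ASCII separators such as the no-break space).

```python
def replace_non_printable(title: str, replacement: str = " ") -> str:
    """Return title with every non-printable character swapped for replacement.

    Hypothetical helper: "non-printable" here means whatever
    str.isprintable() rejects, which includes the soft hyphen (U+00AD).
    """
    return "".join(ch if ch.isprintable() else replacement for ch in title)


# The soft hyphen between "Soft" and "hyphen" becomes a plain space.
cleaned = replace_non_printable("Soft\u00adhyphen title")
print(cleaned)
```

Replacing with a space rather than an empty string keeps adjacent words apart, but for a soft hyphen inside a single word it would split that word, so the choice of replacement may need to depend on the character.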

I'm not sure whether we need json.dumps, or what it was needed for in the first place. If we do need it, we should add a test case for that.

@justinmayer
Contributor

justinmayer commented Apr 13, 2023

Thanks, Lioman. According to the description in #23, the json.dumps() method:

> should handle any arbitrary punctuation marks which may happen to be in the Title - ",',\,*,...etc.

I just tried putting those characters in article titles, and I didn't have any problems with the existing code in main, except that I see a backslash before double-quotation marks in the search result titles. The escaping logic from #15 is adding a backslash where there shouldn't be one.
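For reference, a small sketch of what the standard-library `json.dumps()` does to a title containing double-quotation marks or a soft hyphen. The title strings are made up for illustration, and whether this encoding step is what puts the stray backslash into the search results is an assumption, not something established in this thread.

```python
import json

# Hypothetical title containing double-quotation marks.
title = 'A "quoted" title'

# json.dumps wraps the string in quotes and escapes the inner ones
# with a backslash.
encoded = json.dumps(title)
print(encoded)

# Decoding restores the original title, so a backslash should only be
# visible if the encoded form is displayed without being decoded.
assert json.loads(encoded) == title

# With the default ensure_ascii=True, the soft hyphen is written as an
# escape sequence instead of the raw character.
soft_hyphen_encoded = json.dumps("a\u00adb")
print(soft_hyphen_encoded)
```

If the encoded value is decoded again before it is shown, neither the backslashes nor the escape sequence should appear in the rendered title.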

(I moved the rest of this comment to a more relevant issue.)
