Add text splitting into small parts #3

Open
AigizK opened this issue Sep 29, 2021 · 0 comments

AigizK commented Sep 29, 2021

The current version ignores the H1-H5 headers added by the user. But when a book is translated, the text of chapter 1 remains the text of chapter 1 in the other language.
You can use this fact to split a big text into small parts.
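
A minimal sketch of the header-based split, under loud assumptions: the issue does not say how the headers are stored, so this assumes HTML-style `<h1>`...`<h5>` tags without attributes, and the names `split_by_headers` and `pair_chapters` are hypothetical, not part of the project. It splits each text at its headers and pairs the parts by position.

```python
import re

# Assumption: headers are HTML-style <h1>...</h1> through <h5>...</h5> tags without attributes.
HEADER_RE = re.compile(r"<h([1-5])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def split_by_headers(text):
    """Return (header, body) pairs; text before the first header gets header None."""
    parts = []
    last_end = 0
    last_header = None
    for match in HEADER_RE.finditer(text):
        body = text[last_end:match.start()].strip()
        # Skip an empty preamble, but keep empty bodies of real chapters.
        if body or last_header is not None:
            parts.append((last_header, body))
        last_header = match.group(2).strip()
        last_end = match.end()
    tail = text[last_end:].strip()
    if tail or last_header is not None:
        parts.append((last_header, tail))
    return parts


def pair_chapters(original, translated):
    """Pair the n-th part of the original with the n-th part of the translation."""
    left = split_by_headers(original)
    right = split_by_headers(translated)
    if len(left) != len(right):
        raise ValueError("header counts differ; cannot pair chapters by position")
    return list(zip(left, right))
```

Each pair can then be aligned on its own, which keeps the sentence search within a single chapter. Pairing by position only works if both texts carry the same number of headers, so the sketch raises an error otherwise.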

Next idea: try to split a big text into small blocks automatically.
Select a few sentences from the original text (for example, 10 sentences) and, in a loop, try to find the matching block in the translated text.

You can use the following pseudocode:

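left_start = 100
left_array = original_sentences[left_start:left_start + 10]
scores = {}
for i in range(50, 150):
    right_array_candidate = translated_sentences[i:i + 10]
    # sum of cosine similarities between the two blocks (higher = closer match)
    scores[i] = sum(cosine_similarity(left_array, right_array_candidate))

# start index of the translated block with the highest score
right_start = max(scores, key=scores.get)

left_text_split_index = left_start
right_text_split_index = right_start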
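
The pseudocode above could be sketched in plain Python roughly as follows. This is one possible reading, not the project's implementation: it assumes each sentence has already been turned into an embedding vector by some multilingual model (not shown), it compares the k-th sentence of the original block with the k-th sentence of the candidate block, and the names `cosine_similarity` and `find_matching_block` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_matching_block(left_vectors, right_vectors, left_start,
                        block_size, search_start, search_end):
    """Slide a window over right_vectors and return (best_start_index, best_score).

    The score of a window is the sum of cosine similarities between the
    k-th sentence of the original block and the k-th sentence of the window.
    Returns (None, -inf) if the search range contains no full window.
    """
    left_block = left_vectors[left_start:left_start + block_size]
    best_index, best_score = None, float("-inf")
    # Only consider windows that fit entirely inside right_vectors.
    last_start = min(search_end, len(right_vectors) - block_size + 1)
    for i in range(max(search_start, 0), last_start):
        candidate = right_vectors[i:i + block_size]
        score = sum(cosine_similarity(l, r) for l, r in zip(left_block, candidate))
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score
```

With the values from the pseudocode, the call would be `find_matching_block(left_vectors, right_vectors, 100, 10, 50, 150)`. The pair `(left_start, best_index)` then gives the split indices for the two texts.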