Improve concurrent downloading capacity of script. #18
Labels: good first issue (Good for newcomers)
Comments
fffoivos changed the title from "Fix downloader script" to "Improve concurrent downloading capacity of script." on Sep 29, 2024
Can I work on this issue?
Hi @Sadique982, sure. Look at `scraping/download_and_extract_scripts/downloader.py` and try to increase the number of PDFs downloaded per minute for the list in `scraping/json_sitemaps/pergamos_list_pdf.json`. Make sure you don't DDoS the site and don't get blocked! If you do get blocked, one option is to run multiple downloaders through `torify`.
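A minimal sketch of one way to honor the "don't DDoS the site" constraint: a client-side rate limiter that spaces out request starts. It assumes an asyncio-based downloader, which may not match how `downloader.py` is actually written; `RateLimiter` and `demo` are hypothetical names, and a suitable rate for the Pergamos site is not stated in this thread.

```python
from __future__ import annotations

import asyncio


class RateLimiter:
    """Hypothetical helper: space request starts at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate  # seconds between consecutive request starts
        self._next_slot = 0.0        # earliest loop time for the next start

    async def wait(self) -> None:
        loop = asyncio.get_running_loop()
        now = loop.time()
        # Reserve the next free slot. There is no await between reading and
        # updating _next_slot, so two tasks can never claim the same slot.
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        await asyncio.sleep(slot - now)


async def demo(n: int, rate: float) -> list[float]:
    """Start n fake requests concurrently and return their start offsets (s)."""
    limiter = RateLimiter(rate)
    loop = asyncio.get_running_loop()
    start = loop.time()
    starts: list[float] = []

    async def fake_request() -> None:
        await limiter.wait()
        starts.append(loop.time() - start)  # a real HTTP request would go here

    await asyncio.gather(*(fake_request() for _ in range(n)))
    return starts
```

For example, `asyncio.run(demo(5, 2.0))` should return five offsets spaced roughly 0.5 s apart. Because the limiter caps the start rate regardless of how many tasks are waiting, concurrency can be raised without raising the load on the server.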
Sadique982 added a commit to Sadique982/glossAPI that referenced this issue on Nov 2, 2024
In this update, I have increased the number of PDFs downloaded per minute while making sure we do not overwhelm the server. The changes are in `scraping/download_and_extract_scripts/downloader.py`. I manage concurrent downloads more effectively by capping the number of simultaneous requests with a semaphore and by adding sleep intervals. I have also considered strategies to avoid getting blocked, such as running multiple downloaders through `torify` if necessary.
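The semaphore-plus-sleep approach described in this comment can be sketched as follows. This is an illustrative asyncio version, not the code from the commit; `download_all` and the injected `fetch` coroutine are hypothetical, and in practice `fetch` would wrap whatever HTTP client the script uses.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: fetch(url) -> PDF bytes. It is injected so the
# sketch stays independent of any particular HTTP library.
Fetch = Callable[[str], Awaitable[bytes]]


async def download_all(
    urls: List[str],
    fetch: Fetch,
    max_concurrent: int = 4,
    min_delay: float = 0.5,
    max_delay: float = 1.5,
) -> Dict[str, bytes]:
    """Download every URL with at most `max_concurrent` requests in flight.

    After each request a worker sleeps for a random interval in
    [min_delay, max_delay] while still holding its semaphore slot, so each
    slot completes at most one request per min_delay seconds.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    results: Dict[str, bytes] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            try:
                results[url] = await fetch(url)
            except Exception as exc:  # keep going if a single PDF fails
                print(f"failed: {url}: {exc}")
            # Polite, jittered pause before releasing the slot.
            await asyncio.sleep(random.uniform(min_delay, max_delay))

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

Throughput is tuned mainly through `max_concurrent`, while the jittered pause keeps the overall rate at or below about `max_concurrent / min_delay` requests per second. For very long URL lists, a fixed pool of worker tasks reading from an `asyncio.Queue` would avoid creating one task per URL.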
I want to work on this issue.