[Improve concurrent downloading capacity of script. #18] #23

Closed
wants to merge 1 commit into from

Commits on Nov 2, 2024

  1. [Improve concurrent downloading capacity of script. eellak#18]

    This update increases the number of PDFs downloaded per minute without overwhelming the server. The changes are in `scraping/download_and_extract_scripts/downloader.py`. Concurrent downloads are now managed with a semaphore limit and sleep intervals between requests. To reduce the risk of being blocked, I also considered running multiple downloaders through `torify` if necessary.
    Sadique982 committed Nov 2, 2024
    e4e5d68
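
The commit message describes capping concurrency with a semaphore and pausing between requests. A minimal sketch of that pattern follows. It is not the code from `downloader.py`: the names `download_all`, `fetch`, `max_concurrent`, `min_delay`, and `max_delay` are hypothetical, and the real script may structure this differently.

```python
import asyncio
import random


async def download_all(urls, fetch, max_concurrent=3, min_delay=0.0, max_delay=0.0):
    """Run fetch(url) for every URL with at most max_concurrent in flight.

    `fetch` is any coroutine function taking a URL (hypothetical; in a real
    downloader it would perform the HTTP request and save the PDF).
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with semaphore:
            result = await fetch(url)
            # Sleep while still holding the slot, so the overall request
            # rate stays bounded even when individual downloads are fast.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return result

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))


if __name__ == "__main__":
    async def fake_fetch(url):
        # Stand-in for a real network call, so the sketch runs offline.
        await asyncio.sleep(0.01)
        return f"downloaded {url}"

    demo_urls = [f"https://example.org/doc{i}.pdf" for i in range(5)]
    print(asyncio.run(download_all(demo_urls, fake_fetch, max_concurrent=2)))
```

Holding the semaphore during the sleep is a deliberate choice in this sketch: it trades some throughput for a predictable upper bound on requests per second. Releasing the slot before sleeping would raise throughput but weaken that bound.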