GitHub - bernardopires/wikipedia-crawler: A simple wikipedia web crawler.

A simple Wikipedia crawler in Python.

Running (one worker per queue):

    celery worker -A crawler.tasks --loglevel=info -Q fetch_queue -n 'fetcher'
    celery worker -A crawler.tasks --loglevel=info -Q parse_queue -n 'parser'
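The two workers above suggest a two-stage pipeline: one stage fetches pages, the other parses them and feeds newly found links back to the fetcher. The following is a minimal in-memory sketch of that idea, not the repository's actual task code. The function name `crawl`, its parameters, and the use of `deque` in place of the RabbitMQ queues are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed: str,
          fetch: Callable[[str], str],
          parse: Callable[[str], Iterable[str]],
          max_pages: int = 10) -> List[str]:
    """Run a two-stage fetch/parse loop and return URLs in fetch order.

    Hypothetical stand-in for the Celery setup: the two deques play the
    role of fetch_queue and parse_queue, and `fetch` / `parse` are
    supplied by the caller.
    """
    fetch_queue = deque([seed])   # URLs waiting to be downloaded
    parse_queue = deque()         # (url, html) pairs waiting to be parsed
    seen = {seed}                 # URLs already scheduled, to avoid loops
    visited: List[str] = []

    while (fetch_queue or parse_queue) and len(visited) < max_pages:
        if fetch_queue:
            url = fetch_queue.popleft()
            parse_queue.append((url, fetch(url)))
            visited.append(url)
        if parse_queue:
            _, html = parse_queue.popleft()
            for link in parse(html):
                if link not in seen:
                    seen.add(link)
                    fetch_queue.append(link)
    return visited
```

Passing `fetch` and `parse` in as arguments keeps the sketch free of network access, so it can be exercised with a small fake site before being wired to real HTTP requests or to Celery tasks.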

For monitoring:

    celery -A crawler.tasks flower --broker=amqp://guest:guest@localhost:5672// --broker_api=http://guest:guest@localhost:15672/api/

The broker API requires the RabbitMQ management plugin (see https://www.rabbitmq.com/management.html):

    rabbitmq-plugins enable rabbitmq_management

Flower: http://localhost:5555/

RabbitMQ: http://localhost:15672/

Why only Wikipedia? Its pages are pretty much guaranteed to have sane HTML.
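Because the markup is predictable, even the standard library's HTML parser is enough to pull article links out of a page. The sketch below illustrates this and is not taken from the repository. The names `WikiLinkParser` and `extract_article_links` are hypothetical, and skipping any link that contains a colon is an assumed heuristic for namespace pages such as `File:` or `Special:` (it would also skip the occasional article whose title contains a colon).

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that look like Wikipedia article links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: article links are relative, start with /wiki/,
        # and contain no namespace separator (":").
        if href and href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def extract_article_links(html):
    """Return article-style links in document order."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links
```

A function like this could serve as the `parse` step of a crawler, with the returned relative paths joined to the site's base URL before they are fetched.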

