Release v0.3.7 · binux/pyspider

ThreadBaseScheduler added to improve the performance of the scheduler
robots.txt supported!
elasticsearch database backend supported!
new script callback on_finished, http://docs.pyspider.org/en/latest/About-Projects/#on_finished-callback
you can now set the delay time between retries:

retry_delay is a dict that specifies retry intervals. Its items are
{retried: seconds}, and the special key '' (empty string) sets the
default delay for any retry count that is not listed.
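
As a rough illustration of the lookup described above, the following sketch resolves the delay for a given retry count. The helper name `resolve_retry_delay` and the sample intervals are assumptions made for this example; they are not pyspider's actual code or defaults.

```python
# Hypothetical retry schedule in the {retried: seconds} form.
# The '' key holds the default used for any retry count not listed.
RETRY_DELAY = {
    0: 30,            # retry count 0: wait 30 seconds
    1: 5 * 60,        # retry count 1: wait 5 minutes
    2: 60 * 60,       # retry count 2: wait 1 hour
    '': 6 * 60 * 60,  # any other retry count: wait 6 hours
}


def resolve_retry_delay(retry_delay, retried):
    """Return the delay in seconds for the given retry count.

    Illustrative helper only: assumes the dict contains the '' default key.
    """
    if retried in retry_delay:
        return retry_delay[retried]
    return retry_delay['']


if __name__ == '__main__':
    for count in (0, 1, 2, 7):
        print(count, resolve_retry_delay(RETRY_DELAY, count))
```

Keeping the default under a single '' key means only the first few retries need to be listed explicitly; every later retry shares one interval.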

dict parameters in crawl_config, @config will be merged (e.g. headers), thanks to @ihipop
add parameter max_redirects in self.crawl to control the maximum number of redirects followed during a fetch, thanks to @AtaLuZiK
add parameter validate_cert in self.crawl to ignore errors in the server’s certificate.
new property etree for Response, etree is a cached lxml.html.HtmlElement object, thanks to @waveyeung
you can now pass arguments to phantomjs from command line or config file.
support for pymongo 3.0
local.projectdb now accepts a glob path (e.g. script/*.py) to load multiple projects from the local filesystem.
queue size in the dashboard is not working for osx, thanks to @xyb
counters in the dashboard will be shown for stopped projects
other bug fixes
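
The merging of dict parameters mentioned above (e.g. headers set in both crawl_config and @config) can be pictured with a small stand-alone sketch. The function `merge_crawl_params` is hypothetical, and the precedence order (crawl_config, then @config, then per-call arguments) is an assumption for illustration rather than a statement of pyspider's internals.

```python
def merge_crawl_params(crawl_config, config, kwargs):
    """Combine three parameter layers, from least to most specific.

    Illustrative only: dict-valued parameters are merged key by key,
    while any other parameter is overridden by the later layer.
    """
    merged = {}
    for layer in (crawl_config, config, kwargs):
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Merge into a copy so the caller's dicts are not mutated.
                combined = dict(merged[key])
                combined.update(value)
                merged[key] = combined
            else:
                merged[key] = value
    return merged


params = merge_crawl_params(
    {'headers': {'User-Agent': 'pyspider'}, 'max_redirects': 5},  # crawl_config
    {'headers': {'Accept': 'text/html'}},                         # @config
    {'validate_cert': False},                                     # self.crawl arguments
)
print(params)
```

Here both header entries survive in the result, whereas a plain override would have kept only the headers from the later layer.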
