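If I know the API that a dynamic page loads its data from and can fetch the JSON, how can I plug that endpoint into the framework for further crawling? #158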

Open
smil4ever opened this issue Apr 23, 2021 · 1 comment

Comments

@smil4ever

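I want to crawl a news site whose content is loaded dynamically. In the browser I can see the endpoint it calls to load more data, and I can request that endpoint and download the data myself, but the response is JSON. In effect, I can already get the list-page data. This scenario is fairly common, so which middleware or hook in the framework lets me plug this in and continue crawling from there? The only approach I have found so far is to use addUrl to tell the framework the URLs of all the detail pages in advance.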

@owner888
Owner

owner888 commented Jul 1, 2021

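> I want to crawl a news site whose content is loaded dynamically. In the browser I can see the endpoint it calls to load more data, and I can request that endpoint and download the data myself, but the response is JSON. In effect, I can already get the list-page data. This scenario is fairly common, so which middleware or hook in the framework lets me plug this in and continue crawling from there? The only approach I have found so far is to use addUrl to tell the framework the URLs of all the detail pages in advance.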

Yes, this can be done with addUrl.
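
For anyone landing here later, the step the thread leaves implicit is turning the JSON list response into detail-page URLs before handing them to addUrl. Below is a minimal sketch of that step only, written in Python for brevity rather than in the framework's own language. The field names (`items`, `url`), the sample payload, and the `news.example.com` host are all made up for illustration; the real endpoint's structure will differ, and the exact addUrl signature is not shown in this thread.

```python
import json
from typing import List
from urllib.parse import urljoin


def extract_detail_urls(payload: str, base_url: str,
                        list_key: str = "items",
                        url_key: str = "url") -> List[str]:
    """Return absolute, de-duplicated detail-page URLs from a JSON list response.

    list_key and url_key are hypothetical field names; adjust them to
    whatever the real endpoint returns.
    """
    data = json.loads(payload)
    seen = set()
    urls = []
    for item in data.get(list_key, []):
        href = item.get(url_key)
        if not href:
            continue  # skip entries without a link
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical response body from a "load more" endpoint.
SAMPLE = json.dumps({
    "items": [
        {"title": "A", "url": "/news/1.html"},
        {"title": "B", "url": "/news/2.html"},
        {"title": "A again", "url": "/news/1.html"},
    ]
})

if __name__ == "__main__":
    for url in extract_detail_urls(SAMPLE, "https://news.example.com/api/list?page=1"):
        # In the crawler, each of these would be passed to addUrl.
        print(url)
```

Each URL returned this way would then be queued through addUrl, so the framework crawls the detail pages as usual. De-duplicating before queuing is a design choice in this sketch, not something the thread says the framework requires.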
