Refactor the crawling and configuration workflow #6
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), wontfix (This will not be worked on)
Crawling workflow refactoring progress:
zidoshare added a commit that referenced this issue on Dec 4, 2018
zidoshare added a commit that referenced this issue on Dec 4, 2018
zidoshare added a commit that referenced this issue on Dec 5, 2018
zidoshare added the enhancement, help wanted, and wontfix labels on Dec 8, 2018
Merged
zidoshare added a commit that referenced this issue on Dec 18, 2018
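The target rule in the refactored API, `matchUrl("zido.site/?$")`, is a regular expression over the page URL. The issue does not say whether the framework applies it with `Matcher.find()` (substring search) or `Matcher.matches()` (whole string). The sketch below assumes `find()` and shows which URLs the pattern would select. Because the `.` is unescaped it matches any character, so `zido\.site/?$` (written `"zido\\.site/?$"` in Java) is the stricter spelling.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TargetUrlDemo {
    // Pattern copied from the issue's example; "/?$" allows one optional trailing slash at the end.
    static final Pattern TARGET = Pattern.compile("zido.site/?$");

    // Assumption: the framework uses find() (substring search), not matches() (whole string).
    static boolean isTarget(String url) {
        return TARGET.matcher(url).find();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "https://zido.site",
                "https://zido.site/",
                "https://zido.site/blog/1",
                "https://zidoXsite/"); // unescaped '.' also matches 'X'
        for (String url : urls) {
            System.out.println(url + " -> " + isTarget(url));
        }
    }
}
```

Under this assumption the program prints `true`, `true`, `false`, `true`: only the site root (with or without a trailing slash) is a target page, but the unescaped dot also lets `zidoXsite` through.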
The API after the refactoring is as follows:

```java
spider.of(response -> {
    response.modelName("blog");
    response.asTarget().matchUrl("zido.site/?$");
    response.asContent().url().save("source_url");
    PartitionDescriptor partition = response.asPartition(new CssSelector(".page-container>.blog"));
    partition.field().css("h2.blog-header-title").text().save("title");
    partition.field().css("p.blog-content").text().save("description");
    response.asContent().url().save("url").nullable(false);
    // After obtaining the task handle, add an event listener
});
```

Specific updates:
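Changed the configuration-override mechanism (#4).
The overall crawling workflow is as follows:
Build the Spider object
Components
Event listeners
The supported events are listed below:
Configuration options
Defining the crawler
Crawl configuration can be provided in the following ways:
Crawl configuration methods that will be implemented: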
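Less likely crawl configuration methods: one of JavaScript, Lua, or Groovy would be chosen for implementation; suggestions and discussion are welcome here...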
Adding entry points
Use the Spider object's methods for adding entry URLs directly.
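The issue does not list the Spider methods for adding entry URLs. As a rough illustration of what such a method typically maintains, the sketch below uses a hypothetical `SeedFrontier` class (its name and its `addUrl`/`next` methods are assumptions, not the project's API): entry URLs go into a FIFO queue, and a set of already-seen URLs ensures that an entry added twice is fetched only once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the project's Spider API.
public class SeedFrontier {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    // Adds an entry URL; returns false if it was already queued or already handed out.
    public boolean addUrl(String url) {
        if (!seen.add(url)) {
            return false;
        }
        queue.addLast(url);
        return true;
    }

    // Returns the next URL to fetch in insertion order, or null when the queue is empty.
    public String next() {
        return queue.pollFirst();
    }

    // Number of URLs still waiting to be fetched.
    public int pending() {
        return queue.size();
    }

    public static void main(String[] args) {
        SeedFrontier frontier = new SeedFrontier();
        frontier.addUrl("https://zido.site/");
        frontier.addUrl("https://zido.site/");      // duplicate, ignored
        frontier.addUrl("https://zido.site/about"); // hypothetical second entry
        String url;
        while ((url = frontier.next()) != null) {
            System.out.println("fetch " + url);
        }
    }
}
```

Keeping the seen-set separate from the queue means a URL that has already been fetched is still rejected if it is added again later, which is usually what a crawler wants for entry points.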