gocolly/colly

cache and proxy

Open

#187 created on July 10, 2018

(2 comments) (0 reactions) (0 assignees) Go (24,898 stars) (1,837 forks)
Labels: enhancement, help wanted

Description

Hey guys,

I have an issue. In the real world of scrapers that use proxies, checking whether the response code is 200 or 500 is simply not enough. Plenty of proxies, or even the website itself, can return a response with code 200 whose content is invalid from a developer's point of view, and yet it still gets cached. Is there any way to apply some kind of custom logic that decides whether a response is what we want before it is cached? For example:

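// Proposed API (hypothetical): colly.CacheDir currently accepts only a path.
// The callback would decide whether a given response may be written to the cache.
colly.CacheDir("some/dir", func(resp *colly.Response) bool {
	if bytes.Contains(resp.Body, []byte("some magic marker")) {
		return false // invalid page despite status 200: don't cache
	}
	return true // cache everything else
})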

I'm new to colly and I was digging through the source code for a bit, but didn't find such a feature. It would also be useful without proxies.
