m1guelpf/browser-agent

Respect `robots.txt`

Open

#2 opened on Mar 26, 2023

1 comment, 0 reactions, 0 assignees. Rust, 721 stars, 68 forks.
Labels: enhancement, good first issue, help wanted

Description

ML-based bots need to respect the same norms as every other bot on the web. That means sending an identifiable user agent, fetching `robots.txt`, and not requesting any path that `robots.txt` disallows for it.
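To make the request concrete, here is a minimal sketch of the path check in Rust, using only the standard library. It is not taken from the project's code: the `USER_AGENT` value and the `is_allowed` function are hypothetical names chosen for illustration. It assumes the `robots.txt` body has already been fetched as a string, matches rules by plain path prefix (no `*` or `$` wildcards), and prefers the longest matching rule, with `Allow` winning ties.

```rust
/// Hypothetical identifiable user agent; the name and URL are placeholders.
const USER_AGENT: &str = "browser-agent-example/0.1 (+https://example.com/bot)";

/// Returns true if `agent` may fetch `path` according to `robots_txt`.
/// Simplified sketch: prefix matching only, no wildcard support.
fn is_allowed(robots_txt: &str, agent: &str, path: &str) -> bool {
    let agent = agent.to_ascii_lowercase();
    let mut specific: Vec<(bool, String)> = Vec::new(); // rules from groups naming our agent
    let mut wildcard: Vec<(bool, String)> = Vec::new(); // rules from `User-agent: *` groups
    let mut saw_specific = false; // did any group name our agent?
    let mut matches_specific = false; // does the current group name our agent?
    let mut matches_wildcard = false; // does the current group apply to `*`?
    let mut in_agent_lines = false; // true while reading consecutive User-agent lines

    for line in robots_txt.lines() {
        // Drop comments and surrounding whitespace, then split "key: value".
        let line = line.split('#').next().unwrap_or("").trim();
        let (key, value) = match line.split_once(':') {
            Some(kv) => kv,
            None => continue,
        };
        let key = key.trim().to_ascii_lowercase();
        let value = value.trim();

        match key.as_str() {
            "user-agent" => {
                if !in_agent_lines {
                    // A User-agent line after rule lines starts a new group.
                    matches_specific = false;
                    matches_wildcard = false;
                    in_agent_lines = true;
                }
                let token = value.to_ascii_lowercase();
                if token == "*" {
                    matches_wildcard = true;
                } else if !token.is_empty() && agent.contains(&token) {
                    matches_specific = true;
                    saw_specific = true;
                }
            }
            "allow" | "disallow" => {
                in_agent_lines = false;
                if value.is_empty() {
                    continue; // an empty rule places no restriction
                }
                let rule = (key == "allow", value.to_string());
                if matches_specific {
                    specific.push(rule.clone());
                }
                if matches_wildcard {
                    wildcard.push(rule);
                }
            }
            _ => {} // other fields (Sitemap, Crawl-delay, ...) are ignored here
        }
    }

    // A group that names our agent takes precedence over the `*` group.
    let rules = if saw_specific { &specific } else { &wildcard };

    // Longest matching prefix wins; on equal length, Allow beats Disallow.
    let mut best: Option<(usize, bool)> = None;
    for (allow, prefix) in rules {
        if path.starts_with(prefix.as_str()) {
            let better = match best {
                None => true,
                Some((len, best_allow)) => {
                    prefix.len() > len || (prefix.len() == len && *allow && !best_allow)
                }
            };
            if better {
                best = Some((prefix.len(), *allow));
            }
        }
    }

    // No matching rule means the path is allowed.
    best.map_or(true, |(_, allow)| allow)
}
```

A caller would fetch `/robots.txt` from the target host with `USER_AGENT` set in the request header, then call `is_allowed(&body, USER_AGENT, path)` before each navigation and skip any path for which it returns `false`. A production implementation would also need wildcard patterns, percent-encoding normalization, and caching of the fetched file, none of which this sketch covers.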
