ACM-VIT/scrag

A flexible web scraper that intelligently adapts to different website structures using multiple extraction strategies (newspaper3k, readability-lxml, BeautifulSoup, and optional headless rendering). It outputs clean, structured data for RAG pipelines or local LLMs, with an optional extension to automatically build RAG indexes from web queries.
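The multi-strategy idea in the description can be illustrated with a fallback chain: try each extractor in order and keep the first result that looks usable. This is a minimal sketch, not scrag's actual code. The names `extract_with_fallback`, `paragraph_strategy`, `failing_strategy`, and the `min_chars` threshold are all hypothetical, and a standard-library parser stands in for newspaper3k, readability-lxml, and BeautifulSoup so the example runs without extra dependencies.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


class _ParagraphText(HTMLParser):
    """Collects text found inside <p> tags (stdlib stand-in for a real extractor)."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that appears inside a paragraph.
        if self._depth and data.strip():
            self.chunks.append(data.strip())


def paragraph_strategy(html: str) -> Optional[str]:
    """Hypothetical strategy: join all paragraph text, or None if there is none."""
    parser = _ParagraphText()
    parser.feed(html)
    return " ".join(parser.chunks) or None


def failing_strategy(html: str) -> Optional[str]:
    """Hypothetical strategy that always fails, to show the fallback path."""
    raise ValueError("simulated extractor failure")


Strategy = Tuple[str, Callable[[str], Optional[str]]]


def extract_with_fallback(
    html: str, strategies: List[Strategy], min_chars: int = 20
) -> Optional[dict]:
    """Return the first result with at least min_chars of text, else None."""
    for name, strategy in strategies:
        try:
            text = strategy(html)
        except Exception:
            continue  # a broken strategy should not stop the chain
        if text and len(text) >= min_chars:
            return {"strategy": name, "text": text}
    return None


if __name__ == "__main__":
    page = "<html><body><p>Hello world, this is article text.</p></body></html>"
    chain = [("failing", failing_strategy), ("paragraphs", paragraph_strategy)]
    print(extract_with_fallback(page, chain))
```

In a real pipeline, each entry in the chain would wrap one of the libraries named above, and the returned dictionary would carry whatever fields the downstream RAG index expects.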

Language: Python · Stars: 3 · Forks: 13 · Watchers: 3 · Open issues: 3 · License: MIT License
Details
Repository information
Owner: ACM-VIT
Last pushed: 2025-11-12
Last updated: 2025-12-14
