batch import
help wanted
Description
-
It should support the basic form of a content address, such as "https://tumblr.blahblah.com/blah".
When an ISP blocks access to a particular Tumblr blog over HTTP, try https://, or use HTTPS as the basic form.
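A minimal sketch of the HTTPS suggestion, assuming the crawler handles blog and content addresses as plain strings before requesting them. `to_https` is a hypothetical helper name, not part of the project. It forces the scheme to https and treats scheme-less input as a bare host plus path.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return `url` with its scheme forced to https (hypothetical helper).

    Scheme-less input such as "blah.tumblr.com/post" is treated as host + path.
    """
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    # Keep host, path, query and fragment; only the scheme changes.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
```

Normalizing every address up front keeps the rest of the download code unaware of which scheme the user originally typed.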
-
There should be a way to suppress repeated download attempts after a download has failed once.
For example, save a dummy file with the target file name.
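One way the dummy-file idea could look, as a sketch only: after a failed download, an empty marker file is written next to the target, and later runs skip any target that has a marker. The `.failed` suffix, the `download_once` name and its status strings are assumptions for illustration; `fetch` is any callable that returns the file bytes and raises `OSError` on failure (both `urllib` errors and `requests` exceptions derive from `OSError`).

```python
from pathlib import Path
from typing import Callable

FAILED_SUFFIX = ".failed"  # hypothetical marker suffix, not from the project


def marker_for(target: Path) -> Path:
    """Path of the empty dummy file that records a failed download of `target`."""
    return target.with_name(target.name + FAILED_SUFFIX)


def download_once(url: str, target: Path, fetch: Callable[[str], bytes]) -> str:
    """Download `url` to `target` unless it exists or has failed before.

    Returns "exists", "skipped", "ok" or "failed".
    """
    if target.exists():
        return "exists"
    marker = marker_for(target)
    if marker.exists():
        # A previous run already failed on this file; do not retry.
        return "skipped"
    try:
        data = fetch(url)
    except OSError:
        # Record the failure with an empty dummy file so later runs skip it.
        marker.touch()
        return "failed"
    target.write_bytes(data)
    return "ok"
```

Deleting the `.failed` file is then enough to force a retry, which keeps the state visible and editable without a separate database.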
-
When the address has the form "https://www.tumblr.com/dashboard/blog/blah", downloads are skipped.
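A possible fix for the dashboard case, sketched under the assumption that the crawler ultimately needs a blog name and a `https://<name>.tumblr.com/` base URL. `blog_name_from_url` and `canonical_blog_url` are hypothetical names. Only the two URL forms named here are recognized; custom domains such as the "https://tumblr.blahblah.com/blah" example are not resolved by this sketch and return `None`.

```python
from typing import Optional
from urllib.parse import urlsplit


def blog_name_from_url(url: str) -> Optional[str]:
    """Extract a Tumblr blog name from a URL (hypothetical helper).

    Recognized forms:
      https://www.tumblr.com/dashboard/blog/<name>
      http(s)://<name>.tumblr.com/...
    """
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    if host in ("tumblr.com", "www.tumblr.com"):
        # Dashboard form: /dashboard/blog/<name>
        if len(segments) >= 3 and segments[:2] == ["dashboard", "blog"]:
            return segments[2]
        return None
    if host.endswith(".tumblr.com"):
        # Subdomain form: <name>.tumblr.com
        return host[: -len(".tumblr.com")]
    return None


def canonical_blog_url(url: str) -> Optional[str]:
    """Map any recognized form to https://<name>.tumblr.com/."""
    name = blog_name_from_url(url)
    return f"https://{name}.tumblr.com/" if name else None
```

Returning `None` for unrecognized input lets the caller log the address instead of silently skipping it.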
- Issue type
- feature
- Research direction
- Review the existing URL parsing logic in the source code, likely in a Python module like `crawler.py`. Check the issue comments for additional context from the maintainer or other contributors. The three suggestions involve: (1) supporting HTTPS as the base URL format, (2) suppressing repeat downloads by saving a dummy file, and (3) correctly handling dashboard blog URLs. Implement a URL normalizer function and a download state tracker to address these. Look at the repository's test files to understand expected behavior.