ArchiveBox/ArchiveBox
`COOKIES_FILE` isn't used when fetching page titles, leading to saving captcha-page titles like "Before you continue to YouTube..."
Open
#761 opened on June 5, 2021
Labels: good first ticket, help wanted, size: easy, status: backlog, why: functionality
Description
Describe the bug
The title becomes 'Before you continue to YouTube' instead of the video title, because YouTube redirects to a cookie consent form. This could be solved by passing a cookie file to the curl command that is run:
["curl", "--silent", "--location", "--compressed", "--max-time", "60", "--user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.76.0 (amd64-portbld-freebsd12.2)", "https://www.youtube.com/watch?v=aP8sRCun63M"]
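To illustrate the suggestion, here is a rough sketch of how a cookie file could be appended to that argument list. The function name `build_title_curl_cmd` and its parameters are made up for this example and are not ArchiveBox's actual code. It relies only on curl's `--cookie` option, which reads cookies from a file when its argument contains no `=`.

```python
from pathlib import Path
from typing import List, Optional


def build_title_curl_cmd(
    url: str,
    user_agent: str,
    timeout: int = 60,
    cookies_file: Optional[str] = None,
) -> List[str]:
    """Build a curl argument list like the one above, optionally with cookies.

    Hypothetical helper for illustration only; not ArchiveBox's actual code.
    """
    cmd = [
        "curl",
        "--silent",
        "--location",
        "--compressed",
        "--max-time", str(timeout),
        "--user-agent", user_agent,
    ]
    # curl treats the --cookie argument as a filename when it contains no "=".
    # Only add the flag if the file actually exists on disk.
    if cookies_file and Path(cookies_file).is_file():
        cmd += ["--cookie", cookies_file]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    print(build_title_curl_cmd(
        "https://www.youtube.com/watch?v=aP8sRCun63M",
        user_agent="Mozilla/5.0 ArchiveBox/0.6.2",
        cookies_file="cookies.txt",  # assumed path to a Netscape-format cookie file
    ))
```

When no cookie file is configured or the path does not exist, the command is the same as the one currently run, so the change would be opt-in.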
Steps to reproduce
archivebox add https://www.youtube.com/watch?v=aP8sRCun63M
- Title becomes 'Before you continue to YouTube' when it should be 'ArchiveBox'
Screenshots or log output
N/A
ArchiveBox version
ArchiveBox v0.6.2
Cpython FreeBSD FreeBSD-12.2-RELEASE-p6-amd64-64bit-ELF amd64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
[i] Dependency versions:
√ ARCHIVEBOX_BINARY v0.6.2 valid ./.local/bin/archivebox
√ PYTHON_BINARY v3.7.10 valid /usr/local/bin/python3.7
√ DJANGO_BINARY v3.1.12 valid ./.local/lib/python3.7/site-packages/django/bin/django-admin.py
√ CURL_BINARY v7.76.0 valid /usr/local/bin/curl
√ WGET_BINARY v1.21 valid /usr/local/bin/wget
√ NODE_BINARY v14.16.1 valid /usr/local/bin/node
√ SINGLEFILE_BINARY v0.3.13 valid ./node_modules/single-file/cli/single-file
√ READABILITY_BINARY v0.1.0 valid ./node_modules/readability-extractor/readability-extractor
√ MERCURY_BINARY v1.0.0 valid ./node_modules/@postlight/mercury-parser/cli.js
√ GIT_BINARY v2.31.1 valid /usr/local/bin/git
√ YOUTUBEDL_BINARY v2021.05.16 valid /home/archivebox/.local/bin/youtube-dl
√ CHROME_BINARY v90.0.4430.212 valid /usr/local/bin/chrome
√ RIPGREP_BINARY v12.1.1 valid /usr/local/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 23 files valid ./.local/lib/python3.7/site-packages/archivebox
√ TEMPLATES_DIR 3 files valid ./.local/lib/python3.7/site-packages/archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled
[i] Secrets locations:
√ CHROME_USER_DATA_DIR 1 files valid ./~/.config/chromium
- COOKIES_FILE - disabled
[i] Data locations: