pypa/pip

Performance Issue: Too many hashes in RequirementPreparer

Open

#12589 opened on Mar 23, 2024

Labels: help wanted, type: performance

Description

RequirementPreparer hashes files more often than necessary: a file that is already present in download_dir is hashed twice. This causes poor performance for large wheels, in the gigabyte size range.

A call to prepare_linked_requirement calls down to _check_download_dir. https://github.com/pypa/pip/blob/f5e4ee104e7b171a7cfb2843c9c602abf7a4e346/src/pip/_internal/operations/prepare.py#L501

If the file exists in download_dir, it is hashed and then marked as downloaded. https://github.com/pypa/pip/blob/f5e4ee104e7b171a7cfb2843c9c602abf7a4e346/src/pip/_internal/operations/prepare.py#L516

Execution then continues into _prepare_linked_requirement, which eventually hashes the same file again. https://github.com/pypa/pip/blob/main/src/pip/_internal/operations/prepare.py#L612
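To make the double pass concrete, here is a minimal, self-contained model of the flow described above. It does not use pip's internals: `sha256_of`, `check_download_dir`, `prepare`, and `demo` are hypothetical stand-ins written for this illustration, and the counter only shows that the same file is read in full twice.

```python
import hashlib
import os
import tempfile

hash_calls = 0  # number of full read-and-hash passes over a file


def sha256_of(path, chunk_size=1024 * 1024):
    """Read the whole file once and return its SHA-256 hex digest."""
    global hash_calls
    hash_calls += 1
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download_dir(path, expected):
    """Hypothetical stand-in for the download_dir lookup: first hash pass."""
    if os.path.exists(path) and sha256_of(path) == expected:
        return path
    return None


def prepare(path, expected):
    """Hypothetical stand-in for the later preparation step: second hash pass."""
    found = check_download_dir(path, expected)
    if found is None:
        raise FileNotFoundError(path)
    # The file was verified a moment ago, but it is hashed again here.
    if sha256_of(found) != expected:
        raise ValueError("hash mismatch")
    return found


def demo():
    """Prepare one small file and report how many hash passes were made."""
    global hash_calls
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.whl")
        payload = b"x" * 1024
        with open(path, "wb") as f:
            f.write(payload)
        expected = hashlib.sha256(payload).hexdigest()
        hash_calls = 0
        prepare(path, expected)
        return hash_calls
```

In this model `demo()` reports two passes for a single file. For a multi-gigabyte wheel, each pass means reading the entire file from disk, so the second pass roughly doubles the verification time.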

Potential Fix

Files that have passed the hash check can be marked as verified, so that they are not hashed again.
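One way this could look, as a rough sketch rather than a patch against pip: remember each verified file together with its size and modification time, and skip the hash when nothing has changed. The `VerifiedFiles` class and its methods are hypothetical names introduced for this example; where such state would live inside RequirementPreparer is left open.

```python
import hashlib
import os
import tempfile


class VerifiedFiles:
    """Remembers files that already passed a hash check (hypothetical helper)."""

    def __init__(self):
        self._verified = {}   # absolute path -> (expected digest, fingerprint)
        self.hash_passes = 0  # number of full read-and-hash passes performed

    @staticmethod
    def _fingerprint(path):
        # Size and mtime are used as a cheap signal that the file changed.
        st = os.stat(path)
        return (st.st_size, st.st_mtime_ns)

    def _sha256(self, path, chunk_size=1024 * 1024):
        self.hash_passes += 1
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def check(self, path, expected):
        """Return True if the file matches `expected`, hashing only when needed."""
        key = os.path.abspath(path)
        fingerprint = self._fingerprint(path)
        if self._verified.get(key) == (expected, fingerprint):
            return True  # already verified and unchanged since then
        ok = self._sha256(path) == expected
        if ok:
            self._verified[key] = (expected, fingerprint)
        return ok


def demo():
    """Check a file twice, then modify it and check once more."""
    cache = VerifiedFiles()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.whl")
        payload = b"x" * 1024
        with open(path, "wb") as f:
            f.write(payload)
        expected = hashlib.sha256(payload).hexdigest()
        first = cache.check(path, expected)
        second = cache.check(path, expected)
        passes_after_two = cache.hash_passes
        with open(path, "ab") as f:
            f.write(b"tampered")  # size changes, so the fingerprint changes
        third = cache.check(path, expected)
        return first, second, passes_after_two, third, cache.hash_passes
```

The size and mtime fingerprint is a design choice of this sketch: it guards against reusing a stale result if the file is replaced between the two checks, at the cost of one `os.stat` call. A simpler variant would store only a flag on the object that already tracks the downloaded file.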

Expected behavior

RequirementPreparer hashes each file at most once.

pip version

24.0

Python version

3.11

OS

Windows 10

How to Reproduce

Construct a RequirementPreparer with a download_dir, then call prepare_linked_requirement() for a link that is already available as a wheel in that download_dir.

Output

No response
