file_known_and_unchanged is wasteful and should be converted to inodes
#909 opened on Apr 15, 2016
Description
Moved from #908. I guess another change is to turn `cache.file_known_and_unchanged` into just `inode_known_and_unchanged`. Who cares what the actual file name is, since it's always stored in the archive anyway, I believe? An additional benefit: the huge "files_cache" file in .cache might become smaller, as it no longer needs to hold path names.
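A minimal sketch of what an inode-keyed files cache could look like. Everything here is hypothetical: `InodeFilesCache`, `CacheEntry`, the stored fields, and the use of size plus `st_mtime_ns` as the change check are illustrative assumptions, not borg's actual cache code. For simplicity it keys on `st_ino` alone, which is only valid within a single filesystem.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """What the (hypothetical) cache remembers about one inode."""
    size: int
    mtime_ns: int
    chunk_ids: tuple


class InodeFilesCache:
    """Hypothetical files cache keyed by inode number; stores no path names."""

    def __init__(self):
        self._entries = {}

    def remember(self, st, chunk_ids):
        # Only the metadata needed for change detection is stored.
        self._entries[st.st_ino] = CacheEntry(st.st_size, st.st_mtime_ns, tuple(chunk_ids))

    def inode_known_and_unchanged(self, st):
        """Return cached chunk ids if st's inode is known and unchanged, else None."""
        entry = self._entries.get(st.st_ino)
        if entry is None or entry.size != st.st_size or entry.mtime_ns != st.st_mtime_ns:
            return None
        return entry.chunk_ids


# Usage: a rename keeps the inode, so the cache still hits; an append does not.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "a.txt")
    with open(old_path, "wb") as f:
        f.write(b"hello")
    cache = InodeFilesCache()
    cache.remember(os.stat(old_path), ["chunk-1"])

    new_path = os.path.join(tmp, "b.txt")
    os.rename(old_path, new_path)
    after_rename = cache.inode_known_and_unchanged(os.stat(new_path))

    with open(new_path, "ab") as f:
        f.write(b" world")
    after_append = cache.inode_known_and_unchanged(os.stat(new_path))
```

Because no path is stored, a renamed file is still recognized as unchanged, which is the benefit the issue is after.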
This poses some difficulties for filesystems without inode numbers, such as vfat, and on Windows, as mentioned by @ThomasWaldmann:
On Windows (assuming it has no inodes; I don't know much about NTFS), some other unique identifier might be used, I guess. For filesystems that have no symlinks (like vfat), the file path might be unique enough. There are still some races with content changing underneath, but at least nothing security-critical (and hopefully nobody resurrects umsdos). It's tempting to use file size + creation date (+ modification date?) as a unique ID on vfat to handle renames more efficiently there, but that might not be safe.

On NFS there's a "file handle" that's unique (no idea whether it's easily visible from userspace). FUSE and friends are harder, I guess. NTFS has symlinks and hardlinks, so it must have unique file IDs of some sort, similar to inode numbers.

These strategies could be chosen by filesystem type, which is available from statfs(2). The check could run once each time we cross a mountpoint boundary (assuming crossing is enabled), or alternatively separately for every path.

It should be noted that inode numbers are only unique within a single filesystem, so the key in the table should really be some combination of the actual inode number and a filesystem ID (similar to how NFS creates file handles). The filesystem ID might be anything from the mount path (unsafe) to the underlying block device ID (unsafe) to the fs UUID (not bulletproof either), among other options.
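A rough sketch of choosing the cache key per filesystem type and pairing inode numbers with a filesystem ID. Assumptions: `file_identity` and `PATH_KEYED_FS_TYPES` are hypothetical names, the caller is assumed to already know the filesystem type name (e.g. from statfs(2) or /proc/mounts), the set of path-keyed types is illustrative only, and `st_dev` stands in for the filesystem ID even though, as the issue notes, a device ID is not fully safe.

```python
import os

# Hypothetical, illustrative set of filesystem type names treated as having
# no stable inode numbers. Not an exhaustive or authoritative list.
PATH_KEYED_FS_TYPES = frozenset({"vfat", "msdos", "exfat"})


def file_identity(path, st, fs_type):
    """Return a hashable files-cache key for `path`, chosen by filesystem type."""
    if fs_type in PATH_KEYED_FS_TYPES:
        # No usable inode numbers: the absolute path has to serve as the id.
        return ("path", os.path.abspath(path))
    # Inode numbers are only unique within one filesystem, so pair them with
    # a filesystem id. st_dev (a device id) is only a stand-in here and is
    # not a fully safe filesystem id.
    return ("inode", st.st_dev, st.st_ino)


st = os.stat(".")
vfat_key = file_identity(".", st, "vfat")
ext4_key = file_identity(".", st, "ext4")
```

Tagging the key with its kind ("path" or "inode") keeps the two strategies from ever colliding in the same table.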
Using the file path as a generic fallback for the inode number, for cases where inode numbers are ignored for other reasons (e.g. a command-line switch), is also possible, though it should be marked as "potentially suboptimal".
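One way the "potentially suboptimal" marking might be carried along is to return it together with the key. This is a hypothetical sketch: `FileKey`, `choose_file_key` and the `ignore_inode` parameter are illustrative names standing in for whatever command-line switch would trigger the fallback, and `st_dev` is again only a stand-in filesystem ID.

```python
import os
from typing import NamedTuple


class FileKey(NamedTuple):
    key: tuple
    potentially_suboptimal: bool  # True when falling back to the path


def choose_file_key(path, st, ignore_inode=False):
    """Pick a files-cache key; `ignore_inode` is a hypothetical user switch."""
    if ignore_inode:
        # Path fallback: renamed or moved files will look like new files.
        return FileKey(("path", os.path.abspath(path)), True)
    return FileKey(("inode", st.st_dev, st.st_ino), False)


st = os.stat(".")
default_key = choose_file_key(".", st)
fallback_key = choose_file_key(".", st, ignore_inode=True)
if fallback_key.potentially_suboptimal:
    print("note: files cache keyed by path (potentially suboptimal)")
```

Keeping the flag next to the key lets the caller warn once, or record in the cache which strategy produced an entry.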