ipfs/kubo

feature: gzip multi-member dependent chunker / importer, warc, tar

Open

#3604, created on January 17, 2017

Labels: batch import, help wanted, kind/enhancement

Description

Version information:

go-ipfs version: 0.4.4

Type: Feature, Enhancement

Priority: P4

Area: Tools, Importer

Description:

As with WARC files, gzip files support multiple members, which makes it possible to stitch large files together from smaller ones by mere concatenation.
This makes it possible to compress the metadata and each record separately, concatenate them into a single file, and then fetch and decompress individual parts, including via HTTP Range requests.
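To illustrate the multi-member property described above, here is a minimal sketch using Go's standard `compress/gzip` package. The helper names `gzipMember` and `gunzip` are made up for this example. Slicing the concatenated file at the start of the second member stands in for a partial fetch such as an HTTP Range request.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipMember (hypothetical helper) compresses data into one standalone gzip member.
func gzipMember(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip (hypothetical helper) decompresses every member found in b and
// returns the joined output. gzip.Reader reads concatenated members by default.
func gunzip(b []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	meta, err := gzipMember([]byte("meta\n"))
	if err != nil {
		panic(err)
	}
	rec, err := gzipMember([]byte("record 1\n"))
	if err != nil {
		panic(err)
	}

	// Plain concatenation of two members is itself a valid gzip file.
	whole := append(append([]byte{}, meta...), rec...)

	all, err := gunzip(whole)
	if err != nil {
		panic(err)
	}
	// Decompress only the bytes of the second member, as a ranged fetch would.
	part, err := gunzip(whole[len(meta):])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %q\n", all, part) // "meta\nrecord 1\n" "record 1\n"
}
```

The second call only works because the slice starts exactly at a member boundary, which is why the boundary offsets matter for chunking.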

By having the static chunker also split at gzip member boundaries, one can easily construct .tar.gz files, or .tar files of .gz files, and all sorts of derived data sets, without duplicating data.

There are two ways to approach this:
a) The chunker works as usual, but additionally splits a block at each member boundary. The result is 1:1 with the usual chunking, except that each block containing a member boundary is replaced by two smaller blocks split at that boundary.
b) The chunker works as usual, but when it encounters a gzip member boundary it cuts the current block short and starts the new member in its own 256k data block. This shifts the following blocks and hence duplicates data, so it is probably not the way to do it.

This should work for all gzip files, tar files, and more.

Related: https://tools.ietf.org/html/rfc1952
