Make the Document binary serialization format more compact.
#903 opened on October 5, 2020
Description
The DocStore builds blocks of contiguous documents serialized in a simple ad hoc binary format. These blocks are then compressed.
The format goes:
- field: u32
- type tag: u8
- value: specific to the type; for instance, a u64 simply takes 8 bytes.
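For illustration, the layout above could be sketched as follows for a single u64 value. This is a minimal sketch, not the actual DocStore code: the function name `write_u64_field`, the tag constant `TAG_U64`, and the little-endian byte order are all assumptions.

```rust
/// Hypothetical type tag for u64 values (assumed, not the real constant).
const TAG_U64: u8 = 1;

/// Appends one field value using the fixed-width layout:
/// field (u32, 4 bytes), type tag (u8, 1 byte), value (u64, 8 bytes).
/// Little-endian byte order is an assumption of this sketch.
fn write_u64_field(field: u32, value: u64, out: &mut Vec<u8>) {
    out.extend_from_slice(&field.to_le_bytes());
    out.push(TAG_U64);
    out.extend_from_slice(&value.to_le_bytes());
}
```

With this layout every u64 field value costs 13 bytes before compression, no matter how small the field id or the value is.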
@ppodolsky noticed that despite the compression, it might be possible to shave off a few percent of storage by changing the encoding. This might be especially useful when running
Field ids, unsigned ints, and dates could use variable-byte encoding. Signed ints could use zigzag encoding.
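As a rough sketch of the two encodings mentioned above: the variable-byte functions below follow the generic LEB128 convention (7 payload bits per byte, high bit set on every byte except the last). That convention is an assumption and may differ from the existing VInt implementation. All function names are hypothetical.

```rust
/// Variable-byte (LEB128-style) encoding: small values take fewer bytes.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        // Emit the low 7 bits with the continuation bit set.
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes a varint, returning the value and the number of bytes read.
/// Returns None if the input ends before a terminating byte is found
/// or if no terminating byte appears within 10 bytes.
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Zigzag maps signed ints to unsigned ones so that values of small
/// magnitude stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(value: i64) -> u64 {
    ((value << 1) ^ (value >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(value: u64) -> i64 {
    ((value >> 1) as i64) ^ -((value & 1) as i64)
}
```

A signed int would then be written as `write_varint(zigzag_encode(v), &mut buf)`, so that small negative values also fit in one or two bytes instead of always taking the maximum length.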
This issue is open to new contributors. Variable int encoding is available (search VInt). Zigzag is not implemented yet.
Please add a benchmark to ensure there is no catastrophic change in CPU usage when using LZ4, and please try to quantify the size impact on a pathological schema (100 uint fields, for instance).
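As a starting point for the size question, here is a back-of-the-envelope sketch that compares the uncompressed size of one document with 100 uint fields under both layouts. It assumes LEB128-style varints, field ids equal to the position in the slice, and a one-byte type tag in both cases. All names are hypothetical. It says nothing about the size after LZ4, which still has to be measured.

```rust
/// Number of bytes a LEB128-style varint needs for `value`.
fn varint_len(value: u64) -> usize {
    // Significant bits (at least 1), divided by 7 and rounded up.
    let bits = 64 - value.leading_zeros().min(63) as usize;
    (bits + 6) / 7
}

/// Uncompressed size with the fixed-width layout:
/// u32 field + u8 tag + 8-byte value per field.
fn fixed_doc_len(values: &[u64]) -> usize {
    values.len() * (4 + 1 + 8)
}

/// Uncompressed size if field id and value are both varint-encoded.
/// The field id is assumed to be the index in `values`.
fn varint_doc_len(values: &[u64]) -> usize {
    values
        .iter()
        .enumerate()
        .map(|(field, &value)| varint_len(field as u64) + 1 + varint_len(value))
        .sum()
}
```

For 100 fields holding the values 0 to 99, this gives 1300 bytes for the fixed-width layout and 300 bytes for the varint layout before compression. With every value set to `u64::MAX`, the varint layout takes 1200 bytes, so the gain depends heavily on the value distribution.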