Labels: help wanted, sig/execution, type/enhancement
Description
Enhancement
We currently support spilling for unparallel HashAgg with a naive approach: when memory usage exceeds the quota, all unprocessed data read from the child executor is spilled.
There is a reasonable optimization: hash-partition the data while spilling it.
This has several advantages:
- Correctness. Hash partitioning guarantees that all rows with the same key are spilled to the same partition and are processed together in the same round.
- Less memory usage. Processing one partition at a time means less data per round, and therefore less memory.
- Reduced IO. With the current algorithm, when memory usage exceeds the quota again, data that was already spilled in the previous round may be spilled again. Spilling a partition's data is always better than spilling the full data.
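
To make the idea concrete, here is a minimal sketch of hash partitioning during a spill. It is not the TiDB implementation: the `row` type, the function names, the FNV hash, and the use of in-memory slices in place of on-disk spill files are all assumptions made for illustration.

```go
package main

import "hash/fnv"

// row is a hypothetical stand-in for one input row:
// a group-by key and a value to be summed.
type row struct {
	key string
	val int64
}

// partitionOf maps a group-by key to one of n spill partitions.
// FNV-1a is an arbitrary choice here; any stable hash would do.
func partitionOf(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// spillPartitioned distributes the unprocessed rows into n partitions.
// Slices stand in for the spill files a real executor would write.
// Rows with the same key always land in the same partition.
func spillPartitioned(rows []row, n int) [][]row {
	parts := make([][]row, n)
	for _, r := range rows {
		p := partitionOf(r.key, n)
		parts[p] = append(parts[p], r)
	}
	return parts
}

// aggregatePartition computes SUM(val) GROUP BY key for one partition.
// Only this partition's groups need to be held in memory at a time.
func aggregatePartition(part []row) map[string]int64 {
	sums := make(map[string]int64)
	for _, r := range part {
		sums[r.key] += r.val
	}
	return sums
}
```

Because every key belongs to exactly one partition, each partition can be aggregated independently and its result emitted as final. If one partition alone still exceeds the quota, only that partition would need to be spilled again rather than the full data.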
Reference
- Design doc for spilling HashAgg: https://github.com/pingcap/tidb/blob/master/docs/design/2021-06-23-spilled-unparallel-hashagg.md
- Implementation: https://github.com/pingcap/tidb/blob/master/executor/aggregate.go#L896