pingcap/tidb

Support partition spilling for HashAgg

Open

#26915 opened on Aug 5, 2021

help wanted, sig/execution, type/enhancement

Description

Enhancement

Currently we support spilling for non-parallel HashAgg with a naive approach: when memory usage exceeds the quota, all unprocessed data read from the child executor is spilled.

There is a reasonable optimization: hash-partition the data while spilling it.

This has several advantages:

  • Correctness. Hash partitioning guarantees that all rows with the same key are spilled to the same partition and are processed together.
  • Less memory usage. Processing one partition at a time means handling less data at once, which uses less memory.
  • Reduced IO. The current spilling algorithm may re-spill data that was already spilled in the previous round when memory usage exceeds the quota again. Spilling partitioned data is always better than spilling the full data.
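
A minimal sketch of the idea, for illustration only and not TiDB's actual implementation: the `row` type, `partitionOf`, `spillPartitioned`, and `sumByKey` are all hypothetical names, in-memory slices stand in for on-disk spill files, and FNV-1a is just one possible hash function. It shows that once rows are routed by the hash of their group key, each partition can be aggregated on its own with a small hash table.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// row is a hypothetical (group key, value) pair standing in for a chunk row.
type row struct {
	key string
	val int64
}

// partitionOf maps a group key to one of n spill partitions (n must be > 0).
// Equal keys always hash to the same partition.
func partitionOf(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

// spillPartitioned distributes unprocessed rows into n partitions.
// Assumption: each slice represents one spill file on disk.
func spillPartitioned(rows []row, n int) [][]row {
	parts := make([][]row, n)
	for _, r := range rows {
		p := partitionOf(r.key, n)
		parts[p] = append(parts[p], r)
	}
	return parts
}

// sumByKey computes SUM(val) GROUP BY key, one partition at a time.
// Only the groups of the current partition are held in the working map.
func sumByKey(parts [][]row) map[string]int64 {
	result := make(map[string]int64)
	for _, part := range parts {
		groups := make(map[string]int64)
		for _, r := range part {
			groups[r.key] += r.val
		}
		// No other partition contains these keys, so the sums are final.
		for k, v := range groups {
			result[k] = v
		}
	}
	return result
}

func main() {
	rows := []row{{"a", 1}, {"b", 2}, {"a", 3}, {"c", 4}, {"b", 5}}
	parts := spillPartitioned(rows, 4)
	fmt.Println(sumByKey(parts))
}
```

If a single partition still exceeds the quota, the same routine could in principle be applied to that partition again with a different hash seed, so only that partition is re-spilled rather than the full data.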
