pingcap/tidb

Support spilling for distinct aggregate function

Open

#27092 opened on Aug 11, 2021

help wanted, sig/execution, type/enhancement

Description

Enhancement

Although we now support spilling for non-parallel HashAgg, the memory usage of distinct aggregation is still uncontrolled. The reason is that we need an IntSet/FloatSet/StringSet to check whether a value has already been seen, and that set may take up a huge amount of memory because it grows with the number of distinct values.

We need a spilling strategy for distinct functions, so that the memory usage of SQL statements that include a distinct function can be controlled.
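One possible shape for such a strategy is sketched below. It is not TiDB code: `distinctCounter`, `numPartitions`, and the byte-based `quota` are hypothetical names chosen for illustration, and the approach (hash-partition the values into temporary files once the in-memory set exceeds its quota, then deduplicate each partition on its own) is an assumption about what could work, not the design this issue settles on.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"hash/fnv"
	"io"
	"os"
)

const numPartitions = 4 // hypothetical fan-out for spilled data

// distinctCounter is an illustrative COUNT(DISTINCT) helper, not a TiDB type.
type distinctCounter struct {
	quota   int64               // max bytes of keys kept in memory
	used    int64               // bytes of keys currently in the set
	set     map[string]struct{} // in-memory distinct set (pre-spill)
	files   []*os.File          // one temp file per hash partition
	writers []*bufio.Writer
}

func newDistinctCounter(quota int64) *distinctCounter {
	return &distinctCounter{quota: quota, set: make(map[string]struct{})}
}

func (d *distinctCounter) spilled() bool { return d.files != nil }

// writeToPartition appends a length-prefixed value to the file chosen by its hash.
func (d *distinctCounter) writeToPartition(v string) error {
	h := fnv.New32a()
	h.Write([]byte(v))
	w := d.writers[h.Sum32()%numPartitions]
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(v)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err := w.WriteString(v)
	return err
}

// spill moves the whole in-memory set to disk and releases it.
func (d *distinctCounter) spill() error {
	for i := 0; i < numPartitions; i++ {
		f, err := os.CreateTemp("", "distinct-spill-*")
		if err != nil {
			return err
		}
		d.files = append(d.files, f)
		d.writers = append(d.writers, bufio.NewWriter(f))
	}
	for v := range d.set {
		if err := d.writeToPartition(v); err != nil {
			return err
		}
	}
	d.set, d.used = make(map[string]struct{}), 0
	return nil
}

// Add records one input value; after a spill, values go straight to disk.
func (d *distinctCounter) Add(v string) error {
	if d.spilled() {
		return d.writeToPartition(v)
	}
	if _, ok := d.set[v]; ok {
		return nil
	}
	d.set[v] = struct{}{}
	d.used += int64(len(v))
	if d.used > d.quota {
		return d.spill()
	}
	return nil
}

// Count returns the number of distinct values. Equal values always hash to the
// same partition, so per-partition distinct counts can simply be summed.
// It consumes the spill files, so call it once.
func (d *distinctCounter) Count() (int, error) {
	if !d.spilled() {
		return len(d.set), nil
	}
	total := 0
	for i, f := range d.files {
		if err := d.writers[i].Flush(); err != nil {
			return 0, err
		}
		if _, err := f.Seek(0, io.SeekStart); err != nil {
			return 0, err
		}
		seen := make(map[string]struct{}) // only one partition in memory at a time
		r := bufio.NewReader(f)
		for {
			n, err := binary.ReadUvarint(r)
			if err == io.EOF {
				break
			}
			if err != nil {
				return 0, err
			}
			buf := make([]byte, n)
			if _, err := io.ReadFull(r, buf); err != nil {
				return 0, err
			}
			seen[string(buf)] = struct{}{}
		}
		total += len(seen)
		f.Close()
		os.Remove(f.Name())
	}
	return total, nil
}
```

The sketch only bounds memory if each partition fits within the quota; a real implementation would likely need to repartition oversized partitions recursively, handle one set per group key, and integrate with the executor's memory tracker rather than counting key bytes by hand.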

Fortunately, only a few aggregate functions need to support distinct, so we only need to consider the following distinct functions: Count, Sum, Avg, GroupConcat.

TiDB also seems to support distinct for STDDEV_POP, STDDEV_SAMP, etc., but MySQL doesn't support that syntax.
