pingcap/tidb

executor: return the result immediately when combining LIMIT row_count with DISTINCT

Open

#15,284 建立於 2020年3月11日

在 GitHub 查看
 (13 留言) (0 反應) (0 負責人)Go (40,090 star) (6,186 fork)batch import
challenge-programfeature/acceptedhelp wantedsig/executiontype/feature-requesttype/performance

描述

All issues are open and welcomed to contributors. You can join #sig-exec on tidb community slack to discuss and get help from someone.

Description

Is your feature request related to a problem? Please describe:

mysql> create table t(a int);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values (1), (2), (1), (3), (4), (5), (6);
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> set @@tidb_hashagg_partial_concurrency=1;
Query OK, 0 rows affected (0.00 sec)

mysql> set @@tidb_hashagg_final_concurrency=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select distinct a from t limit 3;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)

Describe the feature you'd like:

The Hash Aggregation executor currently needs to scan the whole table. But actually it can break after the first 4 rows([1, 2, 1, 3]) scanned.

If we,

  1. treat requiredRows correctly in hashAgg executor
  2. return the result of function firstrow(The aggregation function to handle disctinct) immediately in hashAgg
  3. refine the cost in Planner w/ this situation

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Mysql may also have this optimazation, see https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html

When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.

Difficulty

  • Medium

Score

  • 900

Mentor(s)

  • @wshwsh12

Recommended Skills

  • SQL Optimization
  • Golang Profiling
  • Code Refactoring

貢獻者指南