pingcap/tidb
在 GitHub 查看executor: return the result immediately when combining LIMIT row_count with DISTINCT
Open
#15,284 建立於 2020年3月11日
challenge-programfeature/acceptedhelp wantedsig/executiontype/feature-requesttype/performance
描述
All issues are open and welcomed to contributors. You can join #sig-exec on tidb community slack to discuss and get help from someone.
Description
Is your feature request related to a problem? Please describe:
mysql> create table t(a int);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t values (1), (2), (1), (3), (4), (5), (6);
Query OK, 7 rows affected (0.00 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> set @@tidb_hashagg_partial_concurrency=1;
Query OK, 0 rows affected (0.00 sec)
mysql> set @@tidb_hashagg_final_concurrency=1;
Query OK, 0 rows affected (0.00 sec)
mysql> select distinct a from t limit 3;
+------+
| a |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)
Describe the feature you'd like:
The Hash Aggregation executor currently needs to scan the whole table. But actually it can break after the first 4 rows([1, 2, 1, 3]) scanned.
If we,
- treat
requiredRowscorrectly inhashAggexecutor - return the result of function
firstrow(The aggregation function to handledisctinct) immediately inhashAgg - refine the cost in Planner w/ this situation
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
Mysql may also have this optimazation, see https://dev.mysql.com/doc/refman/8.0/en/distinct-optimization.html
When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.
Difficulty
- Medium
Score
- 900
Mentor(s)
- @wshwsh12
Recommended Skills
- SQL Optimization
- Golang Profiling
- Code Refactoring