Consider using a disk-based hash table for hash join to avoid OOM
#11607 opened on Aug 5, 2019
Description
Feature Request
Is your feature request related to a problem? Please describe:
Consider using a disk-based hash table for hash join to avoid OOM.
HashJoinExecutor uses a hash table that maps join keys to inner table rows.
TiDB's hash join is implemented with innerResult and mvmap.MVMap. The innerResult stores all the rows of the inner table, and the mvmap.MVMap stores the map of (join key, inner table row pointer). Together, these two structures give us a map from join keys to inner table rows.
When the inner table is particularly large, the innerResult takes up a lot of memory; when the join key is particularly large, mvmap.MVMap also takes up a lot of memory. In either case the query can run out of memory (OOM).
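The two-structure layout described above can be illustrated with a minimal Go sketch. This is not TiDB's code: Row, hashTable, put, and probe are hypothetical names, a plain slice stands in for innerResult, and a built-in map of row positions stands in for mvmap.MVMap.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for an inner-table row.
type Row struct {
	Key   string
	Value string
}

// hashTable mimics the two-structure layout: rows holds every inner row,
// and index maps a join key to the positions of the matching rows.
type hashTable struct {
	rows  []Row            // plays the role of innerResult (assumption)
	index map[string][]int // plays the role of mvmap.MVMap (assumption)
}

func newHashTable() *hashTable {
	return &hashTable{index: make(map[string][]int)}
}

// put stores the row and records its position under its join key.
func (h *hashTable) put(r Row) {
	h.rows = append(h.rows, r)
	h.index[r.Key] = append(h.index[r.Key], len(h.rows)-1)
}

// probe returns every inner row whose join key equals key.
func (h *hashTable) probe(key string) []Row {
	var out []Row
	for _, i := range h.index[key] {
		out = append(out, h.rows[i])
	}
	return out
}

func main() {
	h := newHashTable()
	h.put(Row{Key: "a", Value: "inner-1"})
	h.put(Row{Key: "b", Value: "inner-2"})
	h.put(Row{Key: "a", Value: "inner-3"})
	for _, r := range h.probe("a") {
		fmt.Println(r.Key, r.Value)
	}
}
```

Both fields grow with the input: rows grows with the size of the inner table, and index grows with the number and size of the join keys, which is where the memory pressure comes from.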
Describe the feature you'd like:
- We already have a config mem-quota-query, which sets the memory quota for a query in bytes.
- Introduce a new config oom-use-tmp-storage, default is true. Set to true to enable use of temporary disk for some executors (in this issue, it is hash join) when mem-quota-query is exceeded.
- Show disk usage of an executor in explain analyze
- Show disk usage of a query in SELECT * FROM information_schema.processlist;
- Consider disk usage in cost model.
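As a rough illustration of the proposed behavior, here is a minimal Go sketch of a row container that moves its rows to a temporary file once a byte quota is exceeded. It is a sketch under stated assumptions, not the actual design: rowContainer and its methods are hypothetical, the quota and useTmpStorage fields only imitate what mem-quota-query and oom-use-tmp-storage are meant to control, and rows are stored as raw bytes without any real encoding.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errQuotaExceeded is returned when the quota is hit and spilling is disabled.
var errQuotaExceeded = errors.New("memory quota exceeded")

// rowContainer is a hypothetical spillable replacement for innerResult.
type rowContainer struct {
	quota         int64    // imitates mem-quota-query, in bytes (assumption)
	useTmpStorage bool     // imitates oom-use-tmp-storage (assumption)
	memRows       [][]byte // rows held in memory before spilling
	memBytes      int64    // bytes currently held in memRows
	file          *os.File // temporary file, nil until the first spill
	offsets       []int64  // start offset of each spilled row
	lengths       []int    // length of each spilled row
	fileSize      int64    // bytes written to the file, i.e. disk usage
}

// add keeps the row in memory while under quota, otherwise spills to disk.
func (c *rowContainer) add(row []byte) error {
	if c.file == nil && c.memBytes+int64(len(row)) > c.quota {
		if !c.useTmpStorage {
			return errQuotaExceeded
		}
		if err := c.spill(); err != nil {
			return err
		}
	}
	if c.file != nil {
		return c.appendToDisk(row)
	}
	c.memRows = append(c.memRows, row)
	c.memBytes += int64(len(row))
	return nil
}

// spill creates the temporary file and moves all in-memory rows into it.
func (c *rowContainer) spill() error {
	f, err := os.CreateTemp("", "hashjoin-spill-*")
	if err != nil {
		return err
	}
	c.file = f
	for _, r := range c.memRows {
		if err := c.appendToDisk(r); err != nil {
			return err
		}
	}
	c.memRows, c.memBytes = nil, 0
	return nil
}

// appendToDisk writes the row at the end of the file and records its location.
func (c *rowContainer) appendToDisk(row []byte) error {
	n, err := c.file.WriteAt(row, c.fileSize)
	if err != nil {
		return err
	}
	c.offsets = append(c.offsets, c.fileSize)
	c.lengths = append(c.lengths, n)
	c.fileSize += int64(n)
	return nil
}

// get returns row i from memory or, after a spill, from the temporary file.
func (c *rowContainer) get(i int) ([]byte, error) {
	if c.file == nil {
		return c.memRows[i], nil
	}
	buf := make([]byte, c.lengths[i])
	_, err := c.file.ReadAt(buf, c.offsets[i])
	return buf, err
}

// close removes the temporary file, if one was created.
func (c *rowContainer) close() error {
	if c.file == nil {
		return nil
	}
	name := c.file.Name()
	c.file.Close()
	return os.Remove(name)
}

func main() {
	c := &rowContainer{quota: 8, useTmpStorage: true}
	defer c.close()
	for _, r := range []string{"row1", "row2", "row3"} {
		if err := c.add([]byte(r)); err != nil {
			fmt.Println("add failed:", err)
			return
		}
	}
	row, _ := c.get(2)
	fmt.Println(string(row), "disk bytes:", c.fileSize)
}
```

In this sketch the fileSize counter is the kind of per-executor number that could be surfaced in explain analyze or the processlist, and the extra file reads in get are the kind of cost a cost model would need to account for once a join spills.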
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
tasks:
- The improvement of mvmap.MVMap
  - hash join #11832
  - index join
  - performance and code clean #11937
- Disk-based innerResult
  - hash join
    - utilities: #12116
    - implement disk-based hash join: #12067
  - index join
- cost model, explain analyze, and disk usage control
  - change cost model of a hash join if it will be spilled #13246
  - show disk usage information in explain analyze #12625
Some tiny issues
- [For new contributor] Show disk usage of a query in SELECT * FROM information_schema.processlist; #13931
- [For new contributor] Show disk usage of a query in slow query and statement summary #16883
- add metrics for disk usage of a query #17263
- [For new contributor] change the default value of mem-quota-query #12937
- [For new contributor] temporary storage usage limitation of all queries. #13983
- [For new contributor] Define temporary storage in config file. #13982
- [help wanted] multiple instances of tidb-server may use the same temporary directory #13981