yugabyte/yugabyte-db

[YSQL] Investigate point-lookup path for fully-bound hash PK IN queries

Open

#32,026 opened on Jun 4, 2026

View on GitHub
 (0 comments) (0 reactions) (0 assignees)C (8,229 stars) (1,003 forks)batch import
area/ysqlgood first issuekind/enhancementpriority/medium

Description

Jira Link: DB-21716

Created by opus 4.8

Problem

For a query whose IN predicates fully determine a hash primary key, e.g.

CREATE TABLE foo (h1 int, h2 int, PRIMARY KEY ((h1, h2) HASH));
SELECT * FROM foo WHERE h1 IN (0,1) AND h2 IN (0,1);

every target row is a complete, known primary key — the four combinations (0,0),(0,1),(1,0),(1,1). This is effectively a batch of point lookups. Instead, the storage layer executes it through HybridScanChoices, the general option-iteration engine, rather than the direct ybctid/SeekTuple point-read path. This issue is to investigate whether the fully-bound case can take the cheaper point-lookup path, and whether it is worth doing.

User impact

The query does the same RocksDB seeks either way, so there is no extra I/O, but HybridScanChoices adds per-target CPU bookkeeping (SkipTargetsUpTo, IncrementScanTargetAtOptionList, ValidateHashGroup, CurrentTargetMatchesKey, group-end/option-group handling) to do what is fundamentally N point seeks.

Root cause / current behavior

The path is chosen in PgDmlRead::Exec (src/yb/yql/pggate/pg_dml_read.cc):

if (doc_op_ && !ybctid_provider() && IsAllKeyColumnsBound()) {
    SubstituteKeyBindsWithYbctids();   // ybctid batch -> DocDB SeekTuple
} else {
    ProcessEmptyKeyBinds();            // -> hash permutations -> HybridScanChoices
}

IsAllKeyColumnsBound() returns false for any IN on a hash column (i < num_hash_key_columns is an unsupported-IN early-out). The reason is that the ybctid builder, BuildYbctidsFromKeyBinds(), only handles "all hash columns fixed to single values + at most one range column with IN"; it does not enumerate a cross-product of multiple hash IN-lists.

So hash-IN falls through to ProcessEmptyKeyBinds -> PopulateNextHashPermutationOps, which enumerates the permutations and rewrites them into a single pushed-down condition (yb_hash_code, h1, h2) IN ((hc,0,0),(hc,0,1),...). On the DocDB side DocPgsqlScanSpec::InitOptions turns that tuple-IN into options(), and ScanChoices::Create (src/yb/docdb/scan_choices.cc) constructs a HybridScanChoices because doc_spec.options() is non-null.

Notably, the information needed to take the point-lookup path is already available at rewrite time: PgDocReadOp::BindExprsToBatch computes the full hash key per permutation (EncodePgsqlHash + DecodeMultiColumnHashValue, src/yb/yql/pggate/pg_doc_op.cc). At that exact point the full doc key is in hand — it is just emitted as a tuple-IN condition rather than as a ybctid.

Contributor guide