-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: partition table query optimize #1594
base: main
Are you sure you want to change the base?
Conversation
table_scan_ctx: unresolved.table_scan_ctx.clone(), | ||
table_scan_ctx: if let Some(ref predicates) = unresolved.predicates { | ||
let mut ctx = unresolved.table_scan_ctx.clone(); | ||
ctx.predicate = Arc::new(predicates[idx].clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments why we need to overwrite old predicate.
}, | ||
provider::TableScanBuilder, | ||
remote::model::TableIdentifier, | ||
table::ReadRequest, | ||
}; | ||
|
||
use super::partitioned_predicates; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our codebase should prefer absolute import over super import.
&partitions, | ||
&mut partitioned_key_indices, | ||
) | ||
.map_err(|e| DataFusionError::Internal(format!("err:{e}")))?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map_err(|e| DataFusionError::Internal(format!("err:{e}")))?, | |
.map_err(|e| DataFusionError::Internal(format!("partition predicates failed, err:{e}")))?, |
let df_partition_rule = DfPartitionRuleAdapter::new(partition_info, &schema).unwrap(); | ||
|
||
let exprs = vec![ | ||
Expr::BinaryExpr(BinaryExpr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFusion provider some helper functions to define expr:
}, | ||
BuildPartitionRule, PartitionInfo, Result, | ||
}; | ||
|
||
mod extractor; | ||
|
||
pub type PartitionedFilterKeyIndex = HashMap<usize, HashMap<usize, BTreeSet<usize>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments explaining the type meaning of those usize.
let mut partitions = BTreeSet::new(); | ||
// Retrieve all the key DatumView instances along with their corresponding | ||
// indices related to their positions in the predicate inlist. Since DatumView |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
horaedb/src/common_types/src/datum.rs
Line 1313 in e065ba7
impl<'a> std::hash::Hash for DatumView<'a> { |
DatumView already impl Hash
Rationale
#1441
Detailed Changes
TLDR
The performance issue with inlist queries is due to the extra overhead from bloom-filter-like directory lookups when scanning each SST file for rows. The solution is to create a separate predicate for each partition, containing only the keys relevant to that partition. Since the current partition filter only supports BinaryExpr(Column, operator, Literal) and non-negated InList expressions, this solution will address only those specific cases.
Changes
e.g.
conditions:
and with two partitions
"cc", "dd")
partition expectations:
yield two predicates
p0: col1 = '33' and col2 in ("aa", "bb", "cc");
p1: col1 = '33' and col2 in ("dd")
Other issues discovered
When the inlist key args length is less than three, Expr will be refactored to nested BinaryExpr which bypasses the FilterExtractor.
e.g.
SQL: select * from table where col1 in ("aa", "bb") and col2 in (1,2,3,4,5...1000)
Since ("aa", "bb") has fewer than three elements, the col1 key filter is not included in partition computation, which interrupts the partitioning process in the get_candidate_partition_keys_groups function, as contains_empty_filter is set to true.
Test Plan