-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28600: Iceberg: Check that table/partition requires compaction b… #5529
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -458,6 +458,52 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath) | |
t -> ((PositionDeletesScanTask) t).file())); | ||
} | ||
|
||
public static float getUncompactedRatio(Table table, String partitionPath, long fileSizeInBytesThreshold) { | ||
long uncompactedFilesCount = getDataFileCount(table, partitionPath, fileSizeInBytesThreshold, true); | ||
long compactedFilesCount = getDataFileCount(table, partitionPath, fileSizeInBytesThreshold, false); | ||
|
||
if (uncompactedFilesCount == 0) { | ||
return 0; | ||
} else if (compactedFilesCount == 0) { | ||
return 1; | ||
} else { | ||
return uncompactedFilesCount * 1.0f / (uncompactedFilesCount + compactedFilesCount); | ||
} | ||
} | ||
|
||
private static long getDataFileCount(Table table, String partitionPath, long fileSizeInBytesThreshold, | ||
boolean isLess) { | ||
CloseableIterable<FileScanTask> fileScanTasks = | ||
table.newScan().useSnapshot(table.currentSnapshot().snapshotId()).ignoreResiduals().planFiles(); | ||
CloseableIterable<FileScanTask> filteredFileScanTasks = | ||
CloseableIterable.filter(fileScanTasks, t -> { | ||
DataFile file = t.asFileScanTask().file(); | ||
return (!table.spec().isPartitioned() || | ||
partitionPath == null && file.specId() != table.spec().specId() || | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please use better separation between operands to make it readable/understandable and reliable, since now the precedence can cause issues. Like: ((a==b && c!=2) && (a!=b && c=2) && (...) ) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Previously I used braces in very similar case, and @deniskuzZ asked to remove them: #5389 (comment) The precedence won't cause issues because in Java the && operand has higher precedence over ||, so it will be evaluated deterministically. |
||
partitionPath != null && | ||
table.specs().get(file.specId()).partitionToPath(file.partition()).equals(partitionPath)) && | ||
(isLess ? file.fileSizeInBytes() < fileSizeInBytesThreshold : | ||
file.fileSizeInBytes() >= fileSizeInBytesThreshold); | ||
}); | ||
return Lists.newArrayList(filteredFileScanTasks).size(); | ||
} | ||
|
||
public static long countDeleteRecords(Table table, String partitionPath) { | ||
Table deletesTable = | ||
MetadataTableUtils.createMetadataTableInstance(table, MetadataTableType.POSITION_DELETES); | ||
CloseableIterable<ScanTask> deletesScanTasks = deletesTable.newBatchScan().planFiles(); | ||
CloseableIterable<ScanTask> filteredDeletesScanTasks = | ||
CloseableIterable.filter(deletesScanTasks, t -> { | ||
DeleteFile file = ((PositionDeletesScanTask) t).file(); | ||
return !table.spec().isPartitioned() || | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please use better separation between operands to make it readable/understandable and reliable, since now the precedence can cause issues. Like: ((a==b && c!=2) && (a!=b && c=2) && (...) ) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar to previous comment, previously I used braces in very similar case, and @deniskuzZ asked to remove them: #5389 (comment) The precedence won't cause issues because in Java the && operand has higher precedence over ||, so it will be evaluated deterministically. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Highly disagree. Please check this:
The result for the first is true, for the second it is false. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There seems to be no contradiction, your test confirms that && has higher precedence than || as it gets evaluated first. First case:
Second case:
|
||
partitionPath == null && file.specId() != table.spec().specId() || | ||
partitionPath != null && | ||
table.specs().get(file.specId()).partitionToPath(file.partition()).equals(partitionPath); | ||
}); | ||
return Lists.newArrayList(CloseableIterable.transform(filteredDeletesScanTasks, | ||
t -> ((PositionDeletesScanTask) t).file().recordCount())).stream().mapToLong(Long::longValue).sum(); | ||
} | ||
|
||
public static Expression generateExpressionFromPartitionSpec(Table table, Map<String, String> partitionSpec) | ||
throws SemanticException { | ||
Map<String, PartitionField> partitionFieldMap = getPartitionFields(table).stream() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the fileSizeInBytesThreshold is the comparator if a file is uncompacted or compacted?
Isn't there a better approach to make this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A file is decided to be compacted or uncompacted based on comparing its actual size with the threshold defined in Conf depending on compaction type:
fileSizeInBytesThreshold
gets the value from these configs depending on compaction type.