
[HIVE-28585]: Ensure SerDes is case insensitive for data type-'string' to align with HQL and SQL #5515

Open: wants to merge 1 commit into master

shivjha30 (Contributor) commented Oct 22, 2024

What changes were proposed in this pull request?
This pull request makes the SerDe API case-insensitive for the data type "string". Specifically, it changes the lookup in the typeNameToTypeEntry map so that the type name (the map key) is converted to lowercase before the lookup is performed.
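The lookup change can be illustrated with a minimal, self-contained sketch. It assumes a registry keyed by lowercase type names, as the description of the typeNameToTypeEntry map implies. The names `TypeLookupSketch`, `TypeEntry`, `strictLookup`, and `caseInsensitiveLookup` are hypothetical and are not Hive's actual classes or methods. The sketch only contrasts an exact-match lookup with one that lowercases the key first.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TypeLookupSketch {

    // Hypothetical stand-in for a Hive type entry; only the type name is kept.
    record TypeEntry(String typeName) {}

    // Hypothetical registry keyed by lowercase names, the way HQL spells types.
    private static final Map<String, TypeEntry> TYPE_NAME_TO_TYPE_ENTRY = new HashMap<>();

    static {
        for (String name : new String[] {"string", "int", "bigint", "double", "boolean"}) {
            TYPE_NAME_TO_TYPE_ENTRY.put(name, new TypeEntry(name));
        }
    }

    // Exact-match lookup: "STRING" misses because every key is lowercase.
    static TypeEntry strictLookup(String typeName) {
        return TYPE_NAME_TO_TYPE_ENTRY.get(typeName);
    }

    // Case-insensitive lookup: normalize the key before accessing the map.
    // Locale.ROOT avoids locale-sensitive case mapping (e.g. Turkish dotted/dotless i).
    static TypeEntry caseInsensitiveLookup(String typeName) {
        if (typeName == null) {
            return null;
        }
        return TYPE_NAME_TO_TYPE_ENTRY.get(typeName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"string", "STRING", "String"}) {
            System.out.println(name + " -> strict=" + strictLookup(name)
                    + ", caseInsensitive=" + caseInsensitiveLookup(name));
        }
    }
}
```

With the strict lookup, only the lowercase spelling resolves. With the normalized lookup, "string", "STRING", and "String" all resolve to the same entry. An alternative would be a `TreeMap` built with `String.CASE_INSENSITIVE_ORDER`. Normalizing at lookup time, however, leaves the existing map and its contents unchanged.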

Why are the changes needed?
The changes address an inconsistency between Hive and other connectors, such as Trino/Presto, that use the SerDe API. Hive treats the data type "STRING" case-insensitively, but the current SerDe implementation does not. Because of this discrepancy, SQL and HQL queries can behave differently, which can cause data handling problems and integration issues. Making the SerDe API case-insensitive keeps behavior consistent across the APIs and connectors that interface with Hive.
This issue was observed in OpenCSVSerde.

Does this PR introduce any user-facing change?
NO

shivjha30 changed the title from "[wip][HIVE-28585]: Ensure SerDes is case insensitive for data type-'string…" to "[WIP][HIVE-28585]: Ensure SerDes is case insensitive for data type-'string' to align with HQL and SQL" on Oct 22, 2024
shivjha30 force-pushed the string branch 2 times, most recently from 1238f9b to 9260ab9, on October 22, 2024 10:02

github-actions bot commented Oct 23, 2024

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (1)

Coulmn

Previously acknowledged words that are now absent aarry bytecode HIVEFETCHOUTPUTSERDE timestamplocal yyyy
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:shivjha30/hive.git repository
on the string branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://api.github.com/repos/apache/hive/issues/comments/2430898672" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text

If items relate to a ...

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

  • binary file.

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

shivjha30 changed the title from "[WIP][HIVE-28585]: Ensure SerDes is case insensitive for data type-'string' to align with HQL and SQL" to "[HIVE-28585]: Ensure SerDes is case insensitive for data type-'string' to align with HQL and SQL" on Oct 24, 2024

sonarcloud bot commented Oct 29, 2024

shivjha30 (Contributor Author) commented:

@zhangbutao Could you please review?

shivjha30 (Contributor Author) commented:

@ayushtkn Could you review the PR?
