-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vendor charset-normalizer instead of chardet #12638
Conversation
b871b34
to
44ef926
Compare
097707f
to
2532ac6
Compare
A little late to the conversation on this, we may consider removing chardet/charset-normalizer entirely from the |
@nateprewitt I think that would be great. Generally how would you suggest I go about verifying that charset-normalizer is definitely not useful for pip? |
Both I'd like to get some feedback from other |
Hmm, yeah we don’t really need charset detection anywhere. If reqeusts supports the library not being present, that’s an easy win (assuming requests is willing to support it). I’d say it’s worthwhile even if pip needs to do some patching; we already need to patch requests anyway. |
I put up a quick PR for what this might look like in Requests (psf/requests#6702). I'll wait to see if Ian comes back with any concerns. If he's bought in, we can look at porting it here. I don't see this as being particularly difficult for us to support though. |
Description
The
requests
library which pip vendors supports eitherchardet
orcharset-normalizer
. Currently pip vendorschardet
, howevercharset-normalizer
has better import time thanchardet
, which improves pip's startup time. So this PR switches to it.As for the difference between the libraries, i.e. which is better, I didn't really try to ascertain that, because I'm mostly assuming that the charset-detection feature of
requests
is not really utilized by pip so it doesn't really matter. However givenrequests
supportscharset-normalizer
I assume it's fit for purpose.The author of
charset-normalizer
agreed to pip vendoring: jawah/charset_normalizer#456Refs #4768 (comment)
Note on
charset-normalizer
wheelpip requires pure-python wheels, but
charset-normalizer
offers bothany
and platform-specific wheel with some optimizations. The current latest version ofvendoring
, 1.2.0, installs the platform-specific wheel. This was changed in pradyunsg/vendoring@a8706ae, but this commit is not released yet.In order not to rely on an unreleased version of
vendoring
, I let it install the platform-specific wheel but remove the.so
files in thevendoring
config inpyproject.toml
. If a new version ofvendoring
is released these lines can be removed.See: pradyunsg/vendoring#62
How I created the PR
requests.patch
to a local checkout of requests 2.31.0 (the version currently vendored by pip)charset-normalizer
requests.patch
vendoring sync . -v
usingvendoring
version 1.2.0.Therefore to verify the huge diff in this PR, you can:
requests.patch
andpyproject.toml
vendoring sync . -v
and see there is no diff