Support linkcheck_ignore in link redirection
#11233
Comments
Sounds very reasonable to me. When a user expects all links to a domain (or path) to be ignored, linkcheck should also ignore the redirections pointing to that domain.
I tried to understand why Wiley URLs have this problem but was not successful, so I asked for help upstream: psf/requests#6471. Meanwhile, having the option to ignore specific redirect sites of DOI links would be a good idea.

Here is what I have tried, if someone is interested. If I, for instance, visit `https://doi.org/10.1002/jccs.200600142` with my browser, everything is fine. But both requests and Sphinx fail:

```
python -c "import requests; print(requests.head('https://doi.org/10.1002/jccs.200600142', allow_redirects=True)); import sphinx.util.requests; print(sphinx.util.requests.head('https://doi.org/10.1002/jccs.200600142', allow_redirects=True))"
<Response [403]>
<Response [403]>
```

I also tried accepting cookies and changing the user-agent, which did not help either:

```python
import requests

# The session keeps cookies between the requests of the redirect chain.
with requests.Session() as s:
    print(s.get('https://doi.org/10.1002/jccs.200600142', allow_redirects=True,
                headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}))
```
Not sure what problem Wiley URLs have? They always reply with a 403. The issue here is that developers can instruct linkcheck to ignore URLs matching a regexp pattern, but a URL is only ignored if it appears in the documentation, not if it is reached through a redirect. So, with:

```python
# conf.py
linkcheck_ignore = [
    "https://onlinelibrary.wiley.com",  # 403 Client Error: Forbidden for url
]
```

```rst
.. doc.rst

.. this link is ignored by linkcheck, it matches the pattern from linkcheck_ignore

`direct link <https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597>`_

.. doi.org redirects to Wiley, but linkcheck does not consult linkcheck_ignore along the redirection chain, so the following shows up as a broken link

`indirect link <https://doi.org/10.1002/jemt.20597>`_
```
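To see why the indirect link slips through, it helps to look at how the patterns are applied. The sketch below assumes that, as in Sphinx's linkcheck builder as I understand it, each `linkcheck_ignore` entry is compiled as a regular expression and tested with `re.match` (anchored at the start of the URL); the helper name `is_ignored` is hypothetical. Only the URL written in the document is tested, so the DOI link never matches, even though the URL it redirects to does.

```python
import re
from collections.abc import Iterable


def is_ignored(url: str, patterns: Iterable[str]) -> bool:
    """Return True if url matches any linkcheck_ignore-style pattern.

    Hypothetical helper: patterns are regular expressions matched at the
    start of the URL (re.match), so a bare domain also covers its paths.
    """
    return any(re.match(pattern, url) for pattern in patterns)


linkcheck_ignore = [
    "https://onlinelibrary.wiley.com",  # 403 Client Error: Forbidden for url
]

direct = "https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597"
indirect = "https://doi.org/10.1002/jemt.20597"  # redirects to `direct`

print(is_ignored(direct, linkcheck_ignore))    # True: skipped by linkcheck
print(is_ignored(indirect, linkcheck_ignore))  # False: checked, then fails with 403
```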
I'm interested in implementing this feature, but there's a detail about the existing implementation that I think is important to consider first: The Sphinx …

My intuition for a feature like this is that we'd ideally want to ignore redirections as soon as they suggest navigating through an ignored path; that is, we'd follow the initial hyperlink, and if it tells us to go to a known-ignorable URL, we'd stop immediately and return the …

To do so, however, we'd probably want to adjust Sphinx's …

We could perhaps apply the naive solution, and ignore URLs after …
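The "stop as soon as a redirect points at an ignored URL" idea can be sketched by following redirects manually instead of letting the HTTP library follow them. In this sketch, `fetch` is a hypothetical callable that performs a single request without following redirects and returns the status code and the `Location` header (if any); the function name `check_with_redirects`, its result strings, and the hop limit are illustrative assumptions, not Sphinx's API.

```python
import re
from collections.abc import Callable, Iterable
from typing import Optional
from urllib.parse import urljoin

# Hypothetical interface: one request WITHOUT following redirects, returning
# (status code, Location header or None). A real version might wrap
# requests.head(url, allow_redirects=False).
Fetch = Callable[[str], tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def check_with_redirects(url: str, patterns: Iterable[str], fetch: Fetch,
                         max_hops: int = 10) -> tuple[str, str]:
    """Follow redirects one hop at a time, stopping at the first ignored URL.

    Returns (status, last_url) where status is one of "ignored", "working",
    "redirected" or "broken".
    """
    compiled = [re.compile(p) for p in patterns]

    def ignored(u: str) -> bool:
        return any(rex.match(u) for rex in compiled)

    if ignored(url):
        return "ignored", url
    current = url
    for _ in range(max_hops):
        code, location = fetch(current)
        if code in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            current = urljoin(current, location)
            if ignored(current):
                # Stop here: the ignored URL is never requested.
                return "ignored", current
            continue
        if code < 400:
            return ("working" if current == url else "redirected"), current
        return "broken", current
    return "broken", current  # gave up: too many redirects


# Offline demo with a fake fetcher; the status codes are illustrative.
responses = {
    "https://doi.org/10.1002/jemt.20597":
        (302, "https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597"),
    "https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597": (403, None),
}
print(check_with_redirects("https://doi.org/10.1002/jemt.20597",
                           ["https://onlinelibrary.wiley.com"],
                           lambda u: responses[u]))
# ('ignored', 'https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597')
```

Following redirects by hand costs a little more code, but it means no request is ever sent to the ignored host, which matters when that host answers automated clients with a 403.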
Is your feature request related to a problem? Please describe.
Specifying a domain in `linkcheck_ignore` works well for links containing this domain, but it doesn't work for links that redirect to the domain to be ignored. For example, the following configuration:

…

works perfectly for links like https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597 but not for https://doi.org/10.1002/jemt.20597, which redirects to https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597.
Describe the solution you'd like
The `linkcheck_ignore` configuration parameter should also apply to redirected links.

Additional context
See for example hyperspy/hyperspy#3108. This typically happens for DOI links, which are by design permanent URLs and redirect to URLs that can change. In such cases the DOI link should be used in preference to the URL it redirects to; however, `linkcheck_ignore` is not applied to that redirect target.
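A simpler way to get the requested behaviour, compared with stopping as soon as a redirect points at an ignored URL, is to let the HTTP client follow redirects as usual and consult `linkcheck_ignore` once more for the final URL it ended up at. The sketch below is a hypothetical post-processing step (`classify` and its arguments are illustrative names, not Sphinx internals): it takes the URL written in the document, the final URL after redirects (for example `response.url` in requests), and the final status code.

```python
import re
from collections.abc import Iterable


def classify(original: str, final: str, status: int,
             patterns: Iterable[str]) -> str:
    """Classify a checked link, applying ignore patterns to both ends.

    Hypothetical helper: `final` is the URL reached after following
    redirects; `status` is the status code of that final response.
    """
    compiled = [re.compile(p) for p in patterns]
    if any(rex.match(original) for rex in compiled):
        return "ignored"
    if final != original and any(rex.match(final) for rex in compiled):
        # The redirect target is ignored, so don't report the link as broken.
        return "ignored"
    if status >= 400:
        return "broken"
    return "redirected" if final != original else "working"


patterns = ["https://onlinelibrary.wiley.com"]
print(classify("https://doi.org/10.1002/jemt.20597",
               "https://onlinelibrary.wiley.com/doi/10.1002/jemt.20597",
               403, patterns))  # ignored
```

The trade-off is that the ignored server is still contacted; this approach only hides the resulting error instead of avoiding the request altogether.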