-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RateLimiter: Use Cloudflare Turnstile to detect bots #4079
base: dev
Are you sure you want to change the base?
Conversation
# Verify API URL. Default is https://challenges.cloudflare.com/turnstile/v0/siteverify | ||
# verifyUrl: https://challenges.cloudflare.com/turnstile/v0/siteverify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels silly to write the URL twice here and again as the default in the code, when honestly I'm not sure it should be configurable in the first place (if the URL changes, there's likely some logic within that will also change). Remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth making this configurable. I could imagine region-specific URLs or even something like an open source "work-alike" service that implements the same API. Probably not likely, but it doesn't hurt to have an easy configuration point just in case, so if the situation arises, we don't have to do any work to react to it. That being said, if you don't want to include the URL twice in the configuration, you could say "The example below shows the default value" (or something like that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That make sense...but if I'm planning for a work-alike service, would it also share the JS https://challenges.cloudflare.com/turnstile/v0/api.js loaded in the template? I'm thinking I should make that configurable as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. I would always avoid hard-coded URLs when possible.
$policyId = $rateLimiterManager->checkPolicyUsesTurnstile($event); | ||
$context = base64_encode(json_encode([ | ||
'policyId' => $policyId, | ||
'destination' => $event->getRequest()->getUri()->getPath(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just display the challenge right now. I'm redirecting to a /Turnstile/Challenge
URL so the requested page is not accessible to Turnstile's javascript code, so it can't be sent to Cloudflare as part of the signal. (This also assumes that referrer is blocked.) Combining these params and base64-encoding them is an attempt at further obfuscation in case it uses the query parameters as signals as well. Of course it's still decode-able, but it seemed worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth putting some of this reasoning into a comment in the code for future reference... and/or making a separate redirectToTurnstile
method so it's easy to override/customize the behavior if desired.
$priorTurnstileResult = $this->checkPriorTurnstileResult($policyId, $this->clientIp); | ||
if (!$priorTurnstileResult && !$turnstileLimit->isAccepted()) { | ||
$result['allow'] = false; | ||
$result['message'] = $this->getTooManyRequestsResponseMessage($event, $result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any need for a different message when Turnstile is the cause of the rate limit failure, but someone may disagree.
protected function getTurnstileCacheKey($policyId, $clientIp) | ||
{ | ||
$key = $policyId . '--' . $clientIp; | ||
$key = str_replace('.', '-', $key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The (temporary) cache used now complained about the dots in the IP address. This is possibly moot once we have a real cache, depending what its rules are for keys.
$cacheManager = $this->getService(\VuFind\Cache\Manager::class); | ||
$cache = $cacheManager->getCache('object', $storageOptions['namespace']); | ||
$cache->getOptions()->setTtl($storageOptions['turnstileTtl'] ?? 60 * 60 * 24); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So createTurnstileCache
is for the cache that records whether you have already taken a Turnstile challenge, and what the result was. Using \VuFind\CacheManager is a problem; I definitely don't want to use a file-based cache for the same performance reason that it's not used for the rest of the RateLimiter. I could use the createCache
method above to create (yet) another in-memory cache instance for this purpose, but at minimum I'd have to mess with it a little to pass in the appropriate TTL, and there might be a better approach.
<input name="policyId" type="hidden" value="<?=$this->escapeHtml($policyId)?>"> | ||
<input name="destination" type="hidden" value="<?=$this->escapeHtml($destination)?>"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm escaping these two variables (and $siteKey above) even though they never should come from user input, but they flowed through URL params on the way here, so it seemed the safe thing to do. If someone actually does mess with the params intentionally
- a fake $siteKey would in theory let the attacker supply their own key, but in Turnstile configuration their key would not be set to work with this domain, so the challenge should not return a valid token.
- a fake $policyId could let them substitute a more lenient policy, I guess, that they happen to know is also configured on the same server? That one I guess is a risk
- a fake $destination ... would just take them to a fake destination after success, but that destination is still subject to the rate limiter, so no point there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to include a salted hash of the key parameters to validate that no tampering has occurred? Or does this flow in a way that makes that impossible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would welcome some insights on that, not sure how to make it work. I guess the salt would be a random value secret in the config file, that the institution would have to create?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a Security/HMACkey setting in config.ini that might be suitable for salting purposes. (It also looks like it might be valuable to randomize this as part of the InstallController, but that's a separate project).
@@ -130,3 +143,17 @@ Policies: | |||
policy: token_bucket | |||
limit: 2000 | |||
rate: { interval: '10 minutes', amount: 400 } | |||
|
|||
# Cloudflare Turnstile is a cloud-based "CAPTCHA alternative" to detect bots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did I not use VuFind's existing Captcha logic? Mostly because I really see this an extension of RateLimiter and want to take advantage of all the existing config logic there (ipRanges, loggedIn, filters, etc.), as well as the way Bootstrapper prevents the page from loading until it's satisfied -- none of that is in Captcha. That said there may be a way to combine the two existing services, if someone thinks I should look in that direction. We discussed this briefly in Slack and I did reference config.ini captchas below.
{ | ||
$context = json_decode(base64_decode($this->params()->fromQuery('context')), true); | ||
|
||
$yamlReader = $this->getService(\VuFind\Config\YamlReader::class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of the "thin controller" principle, I wonder if we'd be better off having a Turnstile-specific service that has the configuration injected by a factory... then we could get all this factory and business logic out of the controller, and make the controller itself a lot simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I thought of loading the config in a TurnstileControllerFactory, didn't do that just as it didn't seem the pattern (I see only one controller factory). I also considered doing a Turnstile service at one stage but didn't see the advantage, but I didn't get that we were trying to keep the controllers themselves to a minimum -- I guess that does make sense in an optimal MVC context. I'll iterate on this, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of technical debt in our controllers, so the patterns you see are definitely not the patterns we'd necessarily like to have in the long run. I just saw an opportunity to do this in a more sustainable way and figured it might be worth doing since we're in a green field anyway. ;-)
Thanks, @maccabeelevine. I haven't given this a really close review yet since it's still a draft, but I responded to some of your comments and left a couple thoughts of my own. Please let me know if/when you'd like a deeper and more formal review. |
This extends the RateLimiterManager module to use a free cloud-hosted "CAPTCHA alternative" bot-detection service, Cloudflare Turnstile.
https://developers.cloudflare.com/turnstile/
The idea is that a RateLimiterManager policy can have a second RateLimiter with a lower quota, which when it first runs out, challenges the user to prove they are not a bot.
The challenges are non-intrusive (i.e. no CAPTCHA; either no interaction at all, or just checking a box) and the page linked above claims it is WCAG 2.1 AA compliant.
Regarding user privacy, Cloudflare claims
and I've taken some extra steps to obfuscate the page that the user requested.
TODO