-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RateLimiter: Use Cloudflare Turnstile to detect bots #4079
base: dev
Are you sure you want to change the base?
Changes from 1 commit
9dbf50e
529f12d
3390d6b
eafe450
14f595a
ecd4a2f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,6 +40,12 @@ Storage: | |
#redis_version : 6 | ||
#redis_standalone : true | ||
|
||
# Namespace for Turnstile result cache (default is Turnstile) | ||
# turnstileNamespace: Turnstile | ||
|
||
# Time-to-live (seconds) for Turnstile result cache (default is 86400, i.e. 1 day) | ||
# turnstileTtl: 86400 | ||
|
||
# Policies define the actual rate limiting settings. The request is checked against | ||
# the list of policies, and the first matching policy is used. | ||
# Keys under "Policies" are used as rate limiter IDs. | ||
|
@@ -82,6 +88,13 @@ Storage: | |
# limit: 500 | ||
# rate: { interval: '10 minutes', amount: 100 } | ||
# | ||
# turnstileRateLimiterSettings Defined like rateLimiterSettings, but when the limit is | ||
# reached, a Turnstile challenge is used instead of simply | ||
# returning a 429. The challenge result is cached. | ||
# Failing the challenge results in standard 429 behavior. | ||
# Passing the challenge bypasses this check, but note the | ||
# rateLimiterSettings quota is still separately applied. | ||
# | ||
# addHeaders Whether to add X-RateLimit-* headers (default is false) | ||
# | ||
# reportOnly If set to true, will not enforce the policy even if the main | ||
|
@@ -130,3 +143,17 @@ Policies: | |
policy: token_bucket | ||
limit: 2000 | ||
rate: { interval: '10 minutes', amount: 400 } | ||
|
||
# Cloudflare Turnstile is a cloud-based "CAPTCHA alternative" to detect bots. | ||
# https://developers.cloudflare.com/turnstile/ | ||
# These are the global settings for Turnstile. See also the turnstile config above: | ||
# - within each policy, to enable turnstile for that policy | ||
# - storage settings for the result cache | ||
# Turnstile: | ||
# These two keys are required. See also values they can be set to for testing purposes: | ||
# https://developers.cloudflare.com/turnstile/troubleshooting/testing/ | ||
# siteKey: 0x1234567890 | ||
# secretKey: 0x1234567890 | ||
|
||
# Verify API URL. Default is https://challenges.cloudflare.com/turnstile/v0/siteverify | ||
# verifyUrl: https://challenges.cloudflare.com/turnstile/v0/siteverify | ||
Comment on lines
+160
to
+161
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It feels silly to write the URL twice here and again as the default in the code, when honestly I'm not sure it should be configurable in the first place (if the URL changes, there's likely some logic within that will also change). Remove? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's worth making this configurable. I could imagine region-specific URLs or even something like an open source "work-alike" service that implements the same API. Probably not likely, but it doesn't hurt to have an easy configuration point just in case, so if the situation arises, we don't have to do any work to react to it. That being said, if you don't want to include the URL twice in the configuration, you could say "The example below shows the default value" (or something like that). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That make sense...but if I'm planning for a work-alike service, would it also share the JS https://challenges.cloudflare.com/turnstile/v0/api.js loaded in the template? I'm thinking I should make that configurable as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, definitely. I would always avoid hard-coded URLs when possible. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -394,8 +394,21 @@ protected function initRateLimiter(): void | |
$result = $rateLimiterManager->check($event); | ||
if (!$result['allow']) { | ||
$response = $event->getResponse(); | ||
$response->setStatusCode(429); | ||
$response->setContent($result['message']); | ||
if ($result['performTurnstileChallenge'] ?? false) { | ||
$response->setStatusCode(307); | ||
$policyId = $rateLimiterManager->checkPolicyUsesTurnstile($event); | ||
$context = base64_encode(json_encode([ | ||
'policyId' => $policyId, | ||
'destination' => $event->getRequest()->getUri()->getPath(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could just display the challenge right now. I'm redirecting to a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be worth putting some of this reasoning into a comment in the code for future reference... and/or making a separate |
||
])); | ||
$response->getHeaders()->addHeaderLine( | ||
'Location', | ||
'/vufind/Turnstile/Challenge?context=' . $context | ||
maccabeelevine marked this conversation as resolved.
Show resolved
Hide resolved
|
||
); | ||
} else { | ||
$response->setStatusCode(429); | ||
$response->setContent($result['message']); | ||
} | ||
$event->stopPropagation(true); | ||
return $response; | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
<?php | ||
|
||
/** | ||
* Turnstile Controller | ||
* | ||
* PHP version 8 | ||
* | ||
* Copyright (C) Villanova University 2024. | ||
* | ||
* This program is free software; you can redistribute it and/or modify | ||
* it under the terms of the GNU General Public License version 2, | ||
* as published by the Free Software Foundation. | ||
* | ||
* This program is distributed in the hope that it will be useful, | ||
* but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
* GNU General Public License for more details. | ||
* | ||
* You should have received a copy of the GNU General Public License | ||
* along with this program; if not, write to the Free Software | ||
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA | ||
* | ||
* @category VuFind | ||
* @package Controller | ||
* @author Maccabee Levine <[email protected]> | ||
* @license http://opensource.org/licenses/gpl-2.0.php GNU General Public License | ||
* @link https://vufind.org Main Page | ||
*/ | ||
|
||
namespace VuFind\Controller; | ||
|
||
use Laminas\Log\LoggerAwareInterface; | ||
use VuFind\Log\LoggerAwareTrait; | ||
use VuFindHttp\HttpServiceAwareInterface; | ||
use VuFindHttp\HttpServiceAwareTrait; | ||
|
||
/** | ||
* Controller Cloudflare Turnstile access checks. | ||
* | ||
* @category VuFind | ||
* @package Controller | ||
* @author Maccabee Levine <[email protected]> | ||
* @license http://opensource.org/licenses/gpl-2.0.php GNU General Public License | ||
* @link https://vufind.org Main Page | ||
*/ | ||
class TurnstileController extends AbstractBase implements | ||
HttpServiceAwareInterface, | ||
LoggerAwareInterface | ||
{ | ||
use HttpServiceAwareTrait; | ||
use LoggerAwareTrait; | ||
|
||
/** | ||
* Present the Turnstile challenge to the user | ||
* | ||
* @return mixed | ||
*/ | ||
public function challengeAction() | ||
{ | ||
$context = json_decode(base64_decode($this->params()->fromQuery('context')), true); | ||
|
||
$yamlReader = $this->getService(\VuFind\Config\YamlReader::class); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the interest of the "thin controller" principle, I wonder if we'd be better off having a Turnstile-specific service that has the configuration injected by a factory... then we could get all this factory and business logic out of the controller, and make the controller itself a lot simpler. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I thought of loading the config in a TurnstileControllerFactory, didn't do that just as it didn't seem the pattern (I see only one controller factory). I also considered doing a Turnstile service at one stage but didn't see the advantage, but I didn't get that we were trying to keep the controllers themselves to a minimum -- I guess that does make sense in an optimal MVC context. I'll iterate on this, thanks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's a lot of technical debt in our controllers, so the patterns you see are definitely not the patterns we'd necessarily like to have in the long run. I just saw an opportunity to do this in a more sustainable way and figured it might be worth doing since we're in a green field anyway. ;-) |
||
$config = $yamlReader->get('RateLimiter.yaml'); | ||
$context['siteKey'] = $config['Turnstile']['siteKey']; | ||
|
||
$this->layout()->searchbox = false; | ||
return $this->createViewModel($context); | ||
} | ||
|
||
/** | ||
* Verify the Turnstile widget result against the Turnstile backend | ||
* | ||
* @return mixed | ||
*/ | ||
public function verifyAction() | ||
{ | ||
$token = $this->params()->fromPost('token'); | ||
$policyId = $this->params()->fromPost('policyId'); | ||
$destination = $this->params()->fromPost('destination'); | ||
|
||
// Call the Turnstile verify API to validate the token | ||
$yamlReader = $this->getService(\VuFind\Config\YamlReader::class); | ||
$config = $yamlReader->get('RateLimiter.yaml'); | ||
$secretKey = $config['Turnstile']['secretKey']; | ||
$url = $config['Turnstile']['verifyUrl'] ?? | ||
'https://challenges.cloudflare.com/turnstile/v0/siteverify'; | ||
$body = [ | ||
'secret' => $secretKey, | ||
'response' => $token, | ||
]; | ||
$response = $this->httpService->post( | ||
$url, | ||
json_encode($body), | ||
'application/json' | ||
); | ||
|
||
if ($response->isOk()) { | ||
$responseData = json_decode($response->getBody(), true); | ||
$success = $responseData['success']; | ||
} else { | ||
// Unexpected error. Treat as a positive result, since it's not the user's fault. | ||
$this->logWarning('Verification process failed, allowing traffic: ' | ||
. $response->getStatusCode() . $response->getBody()); | ||
$success = true; | ||
} | ||
|
||
// Save the Turnstile result for future requests | ||
$rateLimiterManager = $this->getService(\VuFind\RateLimiter\RateLimiterManager::class); | ||
$rateLimiterManager->setTurnstileResult( | ||
$policyId, | ||
$this->event->getRequest()->getServer('REMOTE_ADDR'), | ||
$success | ||
); | ||
|
||
// Either way, return a http redirect to the referrer page. | ||
return $this->redirect()->toUrl($destination); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,6 +30,7 @@ | |
namespace VuFind\RateLimiter; | ||
|
||
use Closure; | ||
use Laminas\Cache\Storage\StorageInterface; | ||
use Laminas\EventManager\EventInterface; | ||
use Laminas\Log\LoggerAwareInterface; | ||
use Laminas\Mvc\MvcEvent; | ||
|
@@ -72,17 +73,19 @@ class RateLimiterManager implements LoggerAwareInterface, TranslatorAwareInterfa | |
/** | ||
* Constructor | ||
* | ||
* @param array $config Rate limiter configuration | ||
* @param string $clientIp Client's IP address | ||
* @param ?int $userId User ID or null if not logged in | ||
* @param Closure $rateLimiterFactoryCallback Rate limiter factory callback | ||
* @param IpAddressUtils $ipUtils IP address utilities | ||
* @param array $config Rate limiter configuration | ||
* @param string $clientIp Client's IP address | ||
* @param ?int $userId User ID or null if not logged in | ||
* @param Closure $rateLimiterFactoryCallback Rate limiter factory callback | ||
* @param StorageInterface $turnstileCache A cache for Turnstile results | ||
* @param IpAddressUtils $ipUtils IP address utilities | ||
*/ | ||
public function __construct( | ||
protected array $config, | ||
protected string $clientIp, | ||
protected ?int $userId, | ||
protected Closure $rateLimiterFactoryCallback, | ||
protected StorageInterface $turnstileCache, | ||
protected IpAddressUtils $ipUtils | ||
) { | ||
$this->clientLogDetails = "ip:$clientIp"; | ||
|
@@ -146,6 +149,16 @@ public function check(EventInterface $event): array | |
// We have a policy matching the route, so check rate limiter: | ||
$limiter = ($this->rateLimiterFactoryCallback)($this->config, $policyId, $this->clientIp, $this->userId); | ||
$limit = $limiter->consume(1); | ||
if ($this->config['Policies'][$policyId]['turnstileRateLimiterSettings'] ?? false) { | ||
$turnstileLimiter = ($this->rateLimiterFactoryCallback)( | ||
$this->config, | ||
$policyId, | ||
$this->clientIp, | ||
$this->userId, | ||
'turnstileRateLimiterSettings' | ||
); | ||
$turnstileLimit = $turnstileLimiter->consume(1); | ||
} | ||
$result = [ | ||
'allow' => true, | ||
'requestsRemaining' => $limit->getRemainingTokens(), | ||
|
@@ -171,6 +184,15 @@ public function check(EventInterface $event): array | |
] | ||
); | ||
} | ||
if (isset($turnstileLimit)) { | ||
$priorTurnstileResult = $this->checkPriorTurnstileResult($policyId, $this->clientIp); | ||
if (!$priorTurnstileResult && !$turnstileLimit->isAccepted()) { | ||
$result['allow'] = false; | ||
$result['message'] = $this->getTooManyRequestsResponseMessage($event, $result); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think there's any need for a different message when Turnstile is the cause of the rate limit failure, but someone may disagree. |
||
$result['performTurnstileChallenge'] = ($priorTurnstileResult === null); | ||
return $result; | ||
} | ||
} | ||
if ($limit->isAccepted()) { | ||
return $result; | ||
} | ||
|
@@ -199,6 +221,9 @@ public function check(EventInterface $event): array | |
*/ | ||
protected function getPolicyIdForEvent(MvcEvent $event): ?string | ||
{ | ||
if ($event->getRouteMatch()->getParams()['controller'] == 'Turnstile') { | ||
return null; | ||
} | ||
$isCrawler = null; | ||
foreach ($this->config['Policies'] ?? [] as $name => $settings) { | ||
if (null !== ($loggedIn = $settings['loggedIn'] ?? null)) { | ||
|
@@ -318,4 +343,68 @@ protected function isCrawlerRequest(MvcEvent $event): bool | |
$crawlerDetect = new \Jaybizzle\CrawlerDetect\CrawlerDetect(); | ||
return $crawlerDetect->isCrawler($agent); | ||
} | ||
|
||
/** | ||
* Check whether the RateLimiter policy for this event uses Turnstile. | ||
* | ||
* @param MvcEvent $event Request event | ||
* | ||
* @return bool | ||
*/ | ||
public function checkPolicyUsesTurnstile($event) | ||
{ | ||
$policyId = $this->getPolicyIdForEvent($event); | ||
if ($this->config['Policies'][$policyId]['turnstileRateLimiterSettings'] ?? false) { | ||
return $policyId; | ||
} | ||
return false; | ||
} | ||
|
||
/** | ||
* Check for a prior, cached result from Turnstile under this client IP and policy. | ||
* | ||
* @param string $policyId The policy ID | ||
* @param string $clientIp The client IP | ||
* | ||
* @return ?bool Null if there is no prior result, or if Turnstile is disabled; | ||
* otherwise a boolean representing the Turnstile result. | ||
*/ | ||
protected function checkPriorTurnstileResult($policyId, $clientIp) | ||
{ | ||
if (!($this->config['Policies'][$policyId]['turnstileRateLimiterSettings'] ?? false)) { | ||
return null; | ||
} | ||
$cacheKey = $this->getTurnstileCacheKey($policyId, $clientIp); | ||
return $this->turnstileCache->getItem($cacheKey); | ||
} | ||
|
||
/** | ||
* Store a Turnstile result for this client IP and policy. | ||
* | ||
* @param string $policyId The policy ID | ||
* @param string $clientIp The client IP | ||
* @param bool $success The result to store | ||
* | ||
* @return void | ||
*/ | ||
public function setTurnstileResult($policyId, $clientIp, $success) | ||
{ | ||
$cacheKey = $this->getTurnstileCacheKey($policyId, $clientIp); | ||
$this->turnstileCache->setItem($cacheKey, $success); | ||
} | ||
|
||
/** | ||
* Generate a key for the Turnstile cache. | ||
* | ||
* @param string $policyId The policy ID | ||
* @param string $clientIp The client IP | ||
* | ||
* @return string The cache key | ||
*/ | ||
protected function getTurnstileCacheKey($policyId, $clientIp) | ||
{ | ||
$key = $policyId . '--' . $clientIp; | ||
$key = str_replace('.', '-', $key); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The (temporary) cache used now complained about the dots in the IP address. This is possibly moot once we have a real cache, depending what its rules are for keys. |
||
return $key; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did I not use VuFind's existing Captcha logic? Mostly because I really see this an extension of RateLimiter and want to take advantage of all the existing config logic there (ipRanges, loggedIn, filters, etc.), as well as the way Bootstrapper prevents the page from loading until it's satisfied -- none of that is in Captcha. That said there may be a way to combine the two existing services, if someone thinks I should look in that direction. We discussed this briefly in Slack and I did reference config.ini captchas below.