Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RateLimiter: Use Cloudflare Turnstile to detect bots #4079

Draft
wants to merge 6 commits into
base: dev
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions config/vufind/RateLimiter.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,12 @@ Storage:
#redis_version : 6
#redis_standalone : true

# Namespace for Turnstile result cache (default is Turnstile)
# turnstileNamespace: Turnstile

# Time-to-live (seconds) for Turnstile result cache (default is 86400, i.e. 1 day)
# turnstileTtl: 86400

# Policies define the actual rate limiting settings. The request is checked against
# the list of policies, and the first matching policy is used.
# Keys under "Policies" are used as rate limiter IDs.
Expand Down Expand Up @@ -82,6 +88,13 @@ Storage:
# limit: 500
# rate: { interval: '10 minutes', amount: 100 }
#
# turnstileRateLimiterSettings Defined like rateLimiterSettings, but when the limit is
# reached, a Turnstile challenge is used instead of simply
# returning a 429. The challenge result is cached.
# Failing the challenge results in standard 429 behavior.
# Passing the challenge bypasses this check, but note the
# rateLimiterSettings quota is still separately applied.
#
# addHeaders Whether to add X-RateLimit-* headers (default is false)
#
# reportOnly If set to true, will not enforce the policy even if the main
Expand Down Expand Up @@ -130,3 +143,17 @@ Policies:
policy: token_bucket
limit: 2000
rate: { interval: '10 minutes', amount: 400 }

# Cloudflare Turnstile is a cloud-based "CAPTCHA alternative" to detect bots.
Copy link
Member Author

@maccabeelevine maccabeelevine Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did I not use VuFind's existing Captcha logic? Mostly because I really see this an extension of RateLimiter and want to take advantage of all the existing config logic there (ipRanges, loggedIn, filters, etc.), as well as the way Bootstrapper prevents the page from loading until it's satisfied -- none of that is in Captcha. That said there may be a way to combine the two existing services, if someone thinks I should look in that direction. We discussed this briefly in Slack and I did reference config.ini captchas below.

# https://developers.cloudflare.com/turnstile/
# These are the global settings for Turnstile. See also the turnstile config above:
# - within each policy, to enable turnstile for that policy
# - storage settings for the result cache
# Turnstile:
# These two keys are required. See also values they can be set to for testing purposes:
# https://developers.cloudflare.com/turnstile/troubleshooting/testing/
# siteKey: 0x1234567890
# secretKey: 0x1234567890

# Verify API URL. Default is https://challenges.cloudflare.com/turnstile/v0/siteverify
# verifyUrl: https://challenges.cloudflare.com/turnstile/v0/siteverify
Comment on lines +160 to +161
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels silly to write the URL twice here and again as the default in the code, when honestly I'm not sure it should be configurable in the first place (if the URL changes, there's likely some logic within that will also change). Remove?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth making this configurable. I could imagine region-specific URLs or even something like an open source "work-alike" service that implements the same API. Probably not likely, but it doesn't hurt to have an easy configuration point just in case, so if the situation arises, we don't have to do any work to react to it. That being said, if you don't want to include the URL twice in the configuration, you could say "The example below shows the default value" (or something like that).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense...but if I'm planning for a work-alike service, would it also share the JS https://challenges.cloudflare.com/turnstile/v0/api.js loaded in the template? I'm thinking I should make that configurable as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. I would always avoid hard-coded URLs when possible.

4 changes: 4 additions & 0 deletions module/VuFind/config/module.config.php
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,7 @@
'VuFind\Controller\SummonController' => 'VuFind\Controller\AbstractBaseFactory',
'VuFind\Controller\SummonrecordController' => 'VuFind\Controller\AbstractBaseFactory',
'VuFind\Controller\TagController' => 'VuFind\Controller\AbstractBaseFactory',
'VuFind\Controller\TurnstileController' => 'VuFind\Controller\AbstractBaseFactory',
'VuFind\Controller\UpgradeController' => 'VuFind\Controller\UpgradeControllerFactory',
'VuFind\Controller\WebController' => 'VuFind\Controller\AbstractBaseFactory',
'VuFind\Controller\WorldcatController' => 'VuFind\Controller\AbstractBaseFactory',
Expand Down Expand Up @@ -346,6 +347,8 @@
'summonrecord' => 'VuFind\Controller\SummonrecordController',
'Tag' => 'VuFind\Controller\TagController',
'tag' => 'VuFind\Controller\TagController',
'Turnstile' => 'VuFind\Controller\TurnstileController',
'turnstile' => 'VuFind\Controller\TurnstileController',
'Upgrade' => 'VuFind\Controller\UpgradeController',
'upgrade' => 'VuFind\Controller\UpgradeController',
'Web' => 'VuFind\Controller\WebController',
Expand Down Expand Up @@ -826,6 +829,7 @@
'Search2/Versions', 'SimulatedSSO/Login',
'Summon/Advanced', 'Summon/FacetList', 'Summon/Home', 'Summon/Search',
'Tag/Home',
'Turnstile/Challenge', 'Turnstile/Verify',
'Upgrade/ConfirmDeprecatedColumns',
'Upgrade/FixAnonymousTags', 'Upgrade/FixDuplicateTags',
'Upgrade/FixConfig', 'Upgrade/FixDatabase', 'Upgrade/FixMetadata',
Expand Down
17 changes: 15 additions & 2 deletions module/VuFind/src/VuFind/Bootstrapper.php
Original file line number Diff line number Diff line change
Expand Up @@ -394,8 +394,21 @@ protected function initRateLimiter(): void
$result = $rateLimiterManager->check($event);
if (!$result['allow']) {
$response = $event->getResponse();
$response->setStatusCode(429);
$response->setContent($result['message']);
if ($result['performTurnstileChallenge'] ?? false) {
$response->setStatusCode(307);
$policyId = $rateLimiterManager->checkPolicyUsesTurnstile($event);
$context = base64_encode(json_encode([
'policyId' => $policyId,
'destination' => $event->getRequest()->getUri()->getPath(),
Copy link
Member Author

@maccabeelevine maccabeelevine Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just display the challenge right now. I'm redirecting to a /Turnstile/Challenge URL so the requested page is not accessible to Turnstile's javascript code, so it can't be sent to Cloudflare as part of the signal. (This also assumes that referrer is blocked.) Combining these params and base64-encoding them is an attempt at further obfuscation in case it uses the query parameters as signals as well. Of course it's still decode-able, but it seemed worthwhile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth putting some of this reasoning into a comment in the code for future reference... and/or making a separate redirectToTurnstile method so it's easy to override/customize the behavior if desired.

]));
$response->getHeaders()->addHeaderLine(
'Location',
'/vufind/Turnstile/Challenge?context=' . $context
maccabeelevine marked this conversation as resolved.
Show resolved Hide resolved
);
} else {
$response->setStatusCode(429);
$response->setContent($result['message']);
}
$event->stopPropagation(true);
return $response;
}
Expand Down
118 changes: 118 additions & 0 deletions module/VuFind/src/VuFind/Controller/TurnstileController.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
<?php

/**
* Turnstile Controller
*
* PHP version 8
*
* Copyright (C) Villanova University 2024.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*
* @category VuFind
* @package Controller
* @author Maccabee Levine <[email protected]>
* @license http://opensource.org/licenses/gpl-2.0.php GNU General Public License
* @link https://vufind.org Main Page
*/

namespace VuFind\Controller;

use Laminas\Log\LoggerAwareInterface;
use VuFind\Log\LoggerAwareTrait;
use VuFindHttp\HttpServiceAwareInterface;
use VuFindHttp\HttpServiceAwareTrait;

/**
* Controller Cloudflare Turnstile access checks.
*
* @category VuFind
* @package Controller
* @author Maccabee Levine <[email protected]>
* @license http://opensource.org/licenses/gpl-2.0.php GNU General Public License
* @link https://vufind.org Main Page
*/
class TurnstileController extends AbstractBase implements
HttpServiceAwareInterface,
LoggerAwareInterface
{
use HttpServiceAwareTrait;
use LoggerAwareTrait;

/**
* Present the Turnstile challenge to the user
*
* @return mixed
*/
public function challengeAction()
{
$context = json_decode(base64_decode($this->params()->fromQuery('context')), true);

$yamlReader = $this->getService(\VuFind\Config\YamlReader::class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of the "thin controller" principle, I wonder if we'd be better off having a Turnstile-specific service that has the configuration injected by a factory... then we could get all this factory and business logic out of the controller, and make the controller itself a lot simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I thought of loading the config in a TurnstileControllerFactory, didn't do that just as it didn't seem the pattern (I see only one controller factory). I also considered doing a Turnstile service at one stage but didn't see the advantage, but I didn't get that we were trying to keep the controllers themselves to a minimum -- I guess that does make sense in an optimal MVC context. I'll iterate on this, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of technical debt in our controllers, so the patterns you see are definitely not the patterns we'd necessarily like to have in the long run. I just saw an opportunity to do this in a more sustainable way and figured it might be worth doing since we're in a green field anyway. ;-)

$config = $yamlReader->get('RateLimiter.yaml');
$context['siteKey'] = $config['Turnstile']['siteKey'];

$this->layout()->searchbox = false;
return $this->createViewModel($context);
}

/**
* Verify the Turnstile widget result against the Turnstile backend
*
* @return mixed
*/
public function verifyAction()
{
$token = $this->params()->fromPost('token');
$policyId = $this->params()->fromPost('policyId');
$destination = $this->params()->fromPost('destination');

// Call the Turnstile verify API to validate the token
$yamlReader = $this->getService(\VuFind\Config\YamlReader::class);
$config = $yamlReader->get('RateLimiter.yaml');
$secretKey = $config['Turnstile']['secretKey'];
$url = $config['Turnstile']['verifyUrl'] ??
'https://challenges.cloudflare.com/turnstile/v0/siteverify';
$body = [
'secret' => $secretKey,
'response' => $token,
];
$response = $this->httpService->post(
$url,
json_encode($body),
'application/json'
);

if ($response->isOk()) {
$responseData = json_decode($response->getBody(), true);
$success = $responseData['success'];
} else {
// Unexpected error. Treat as a positive result, since it's not the user's fault.
$this->logWarning('Verification process failed, allowing traffic: '
. $response->getStatusCode() . $response->getBody());
$success = true;
}

// Save the Turnstile result for future requests
$rateLimiterManager = $this->getService(\VuFind\RateLimiter\RateLimiterManager::class);
$rateLimiterManager->setTurnstileResult(
$policyId,
$this->event->getRequest()->getServer('REMOTE_ADDR'),
$success
);

// Either way, return a http redirect to the referrer page.
return $this->redirect()->toUrl($destination);
}
}
99 changes: 94 additions & 5 deletions module/VuFind/src/VuFind/RateLimiter/RateLimiterManager.php
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
namespace VuFind\RateLimiter;

use Closure;
use Laminas\Cache\Storage\StorageInterface;
use Laminas\EventManager\EventInterface;
use Laminas\Log\LoggerAwareInterface;
use Laminas\Mvc\MvcEvent;
Expand Down Expand Up @@ -72,17 +73,19 @@ class RateLimiterManager implements LoggerAwareInterface, TranslatorAwareInterfa
/**
* Constructor
*
* @param array $config Rate limiter configuration
* @param string $clientIp Client's IP address
* @param ?int $userId User ID or null if not logged in
* @param Closure $rateLimiterFactoryCallback Rate limiter factory callback
* @param IpAddressUtils $ipUtils IP address utilities
* @param array $config Rate limiter configuration
* @param string $clientIp Client's IP address
* @param ?int $userId User ID or null if not logged in
* @param Closure $rateLimiterFactoryCallback Rate limiter factory callback
* @param StorageInterface $turnstileCache A cache for Turnstile results
* @param IpAddressUtils $ipUtils IP address utilities
*/
public function __construct(
protected array $config,
protected string $clientIp,
protected ?int $userId,
protected Closure $rateLimiterFactoryCallback,
protected StorageInterface $turnstileCache,
protected IpAddressUtils $ipUtils
) {
$this->clientLogDetails = "ip:$clientIp";
Expand Down Expand Up @@ -146,6 +149,16 @@ public function check(EventInterface $event): array
// We have a policy matching the route, so check rate limiter:
$limiter = ($this->rateLimiterFactoryCallback)($this->config, $policyId, $this->clientIp, $this->userId);
$limit = $limiter->consume(1);
if ($this->config['Policies'][$policyId]['turnstileRateLimiterSettings'] ?? false) {
$turnstileLimiter = ($this->rateLimiterFactoryCallback)(
$this->config,
$policyId,
$this->clientIp,
$this->userId,
'turnstileRateLimiterSettings'
);
$turnstileLimit = $turnstileLimiter->consume(1);
}
$result = [
'allow' => true,
'requestsRemaining' => $limit->getRemainingTokens(),
Expand All @@ -171,6 +184,15 @@ public function check(EventInterface $event): array
]
);
}
if (isset($turnstileLimit)) {
$priorTurnstileResult = $this->checkPriorTurnstileResult($policyId, $this->clientIp);
if (!$priorTurnstileResult && !$turnstileLimit->isAccepted()) {
$result['allow'] = false;
$result['message'] = $this->getTooManyRequestsResponseMessage($event, $result);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any need for a different message when Turnstile is the cause of the rate limit failure, but someone may disagree.

$result['performTurnstileChallenge'] = ($priorTurnstileResult === null);
return $result;
}
}
if ($limit->isAccepted()) {
return $result;
}
Expand Down Expand Up @@ -199,6 +221,9 @@ public function check(EventInterface $event): array
*/
protected function getPolicyIdForEvent(MvcEvent $event): ?string
{
if ($event->getRouteMatch()->getParams()['controller'] == 'Turnstile') {
return null;
}
$isCrawler = null;
foreach ($this->config['Policies'] ?? [] as $name => $settings) {
if (null !== ($loggedIn = $settings['loggedIn'] ?? null)) {
Expand Down Expand Up @@ -318,4 +343,68 @@ protected function isCrawlerRequest(MvcEvent $event): bool
$crawlerDetect = new \Jaybizzle\CrawlerDetect\CrawlerDetect();
return $crawlerDetect->isCrawler($agent);
}

/**
* Check whether the RateLimiter policy for this event uses Turnstile.
*
* @param MvcEvent $event Request event
*
* @return bool
*/
public function checkPolicyUsesTurnstile($event)
{
$policyId = $this->getPolicyIdForEvent($event);
if ($this->config['Policies'][$policyId]['turnstileRateLimiterSettings'] ?? false) {
return $policyId;
}
return false;
}

/**
* Check for a prior, cached result from Turnstile under this client IP and policy.
*
* @param string $policyId The policy ID
* @param string $clientIp The client IP
*
* @return ?bool Null if there is no prior result, or if Turnstile is disabled;
* otherwise a boolean representing the Turnstile result.
*/
protected function checkPriorTurnstileResult($policyId, $clientIp)
{
if (!($this->config['Policies'][$policyId]['turnstileRateLimiterSettings'] ?? false)) {
return null;
}
$cacheKey = $this->getTurnstileCacheKey($policyId, $clientIp);
return $this->turnstileCache->getItem($cacheKey);
}

/**
* Store a Turnstile result for this client IP and policy.
*
* @param string $policyId The policy ID
* @param string $clientIp The client IP
* @param bool $success The result to store
*
* @return void
*/
public function setTurnstileResult($policyId, $clientIp, $success)
{
$cacheKey = $this->getTurnstileCacheKey($policyId, $clientIp);
$this->turnstileCache->setItem($cacheKey, $success);
}

/**
* Generate a key for the Turnstile cache.
*
* @param string $policyId The policy ID
* @param string $clientIp The client IP
*
* @return string The cache key
*/
protected function getTurnstileCacheKey($policyId, $clientIp)
{
$key = $policyId . '--' . $clientIp;
$key = str_replace('.', '-', $key);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (temporary) cache used now complained about the dots in the IP address. This is possibly moot once we have a real cache, depending what its rules are for keys.

return $key;
}
}
Loading
Loading