-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update locking and caching for OpenIdConnectCachingSecurityTokenProvider #3118
base: master
Are you sure you want to change the base?
Conversation
99cfb8d
to
7652290
Compare
SummarySummary
CoverageMicrosoft.Identity.Web - 63.6%
Microsoft.Identity.Web.Certificate - 41.4%
Microsoft.Identity.Web.Certificateless - 40.1%
Microsoft.Identity.Web.Diagnostics - 10.2%
Microsoft.Identity.Web.DownstreamApi - 14.5%
Microsoft.Identity.Web.MicrosoftGraph - 42%
Microsoft.Identity.Web.Test.Common - 69.3%
Microsoft.Identity.Web.TokenAcquisition - 53.2%
Microsoft.Identity.Web.TokenCache - 80.8%
|
SummarySummary
CoverageMicrosoft.Identity.Web - 63.6%
Microsoft.Identity.Web.Certificate - 41.4%
Microsoft.Identity.Web.Certificateless - 40.1%
Microsoft.Identity.Web.Diagnostics - 10.2%
Microsoft.Identity.Web.DownstreamApi - 14.5%
Microsoft.Identity.Web.MicrosoftGraph - 42%
Microsoft.Identity.Web.Test.Common - 69.3%
Microsoft.Identity.Web.TokenAcquisition - 53.2%
Microsoft.Identity.Web.TokenCache - 80.8%
|
{ | ||
_synclock.EnterWriteLock(); | ||
try | ||
_syncLock.EnterReadLock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't ConfigurationManagerBase already control the threadings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for reading/writing the fields in this class. This is to ensure they aren't changed in the middle of reading leading to unexpected behavior. The ConfigurationManager
handles all the threading for making the actual metadata refresh call as needed internally, but that won't help here unless we change this to not cache the issuer and keys and instead just always call into the config manager (e.g., public string? Issuer => _configManager.GetConfigurationAsync().Result.Issuer
), but it's my understanding that that is the issue being reported in the first place.
SummarySummary
CoverageMicrosoft.Identity.Web - 63.6%
Microsoft.Identity.Web.Certificate - 41.4%
Microsoft.Identity.Web.Certificateless - 40.1%
Microsoft.Identity.Web.Diagnostics - 10.2%
Microsoft.Identity.Web.DownstreamApi - 14.5%
Microsoft.Identity.Web.MicrosoftGraph - 42%
Microsoft.Identity.Web.Test.Common - 69.3%
Microsoft.Identity.Web.TokenAcquisition - 53.2%
Microsoft.Identity.Web.TokenCache - 80.8%
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: I don't have much experience with multi-threading programming, locking, or synchronization.
First, thanks a lot for looking at this quickly after the issue was reported, it's much appreciated.
Overall, I think this version will improve the situation, since it heavily reduces the number of times we'll enter in a write lock.
I still have some concerns / questions.
Read lock
This is to ensure they aren't changed in the middle of reading leading to unexpected behavior.
I was thinking about this; is there a possibility that a thread reads the value of a variable while it's being written to by another thread, leading to a corrupt value being assigned?
I was under the impression that both variable assignment and read are atomic operations in .NET.
If they are, I'd question the need for a read lock altogether.
Caching
I see this is managing a cache of its own through syncAfter
to minimize the number of times we're calling _configManager.GetConfigurationAsync
.
It feels to me that we're trying to recreate what ConfigurationManager
already implements, since most calls to GetConfigurationAsync
will return cached data.
Kind of related, @msbw2 also mentioned (emphasis mine):
[...] won't help here unless we change this to not cache the issuer and keys and instead just always call into the config manager (e.g., public string? Issuer => _configManager.GetConfigurationAsync().Result.Issuer), but it's my understanding that that is the issue being reported in the first place.
I don't think that's what @alex-set and myself reported. The root issue is that under sufficient sustained concurrency, the original code leads to deadlock-style situations because of excessive locking and unbound synchronization, in the sense that threads will wait forever to acquire a read or write lock, without a timeout.
Calling GetConfigurationAsync
on a threadpool thread every time is not ideal, I agree. However I'm not entirely sold on building our own cache on top of ConfigurationManager
to avoid it.
Proposal / Request for review
We had to push a hand-rolled fix for this, and landed on a solution that has the following characteristics:
- Lightweight write lock through
Interlocked.Exchange
. - If a thread can't acquire a write lock, it will not wait, and will return the current issuer and signing keys.
- No read lock.
The rationale behind this is that:
- My understanding is that for a given tenant, the issuer never changes.
- New signing keys are advertised way before they're being actually used to sign new tokens.
Here's the implementation (let me know if you'd like me to open a PR for a more practical discussion):
Click to expand
public class OpenIdConnectCachingSecurityTokenProvider : IIssuerSecurityKeyProvider
{
private int _lock = 0;
public ConfigurationManager<OpenIdConnectConfiguration> _configManager;
private string _issuer;
private IEnumerable<SecurityKey> _keys;
public OpenIdConnectCachingSecurityTokenProvider(string metadataEndpoint)
{
_configManager = new ConfigurationManager<OpenIdConnectConfiguration>(metadataEndpoint, new OpenIdConnectConfigurationRetriever());
RetrieveMetadata();
}
/// <summary>
/// Gets the issuer the credentials are for.
/// </summary>
/// <value>
/// The issuer the credentials are for.
/// </value>
public string Issuer
{
get
{
RetrieveMetadata();
return _issuer;
}
}
/// <summary>
/// Gets all known security keys.
/// </summary>
/// <value>
/// All known security keys.
/// </value>
public IEnumerable<SecurityKey> SecurityKeys
{
get
{
RetrieveMetadata();
return _keys;
}
}
private void RetrieveMetadata()
{
// Try to acquire the lock
//
// Interlocked.Exchange returns the original value of _lock before it was swapped.
// If it's 0, it means it went from 0 to 1, so we did acquire the lock.
// If it's 1, then the lock was already acquired by another thread.
//
// See the example in the Exchange(Int32, Int32) overload: https://learn.microsoft.com/en-us/dotnet/api/system.threading.interlocked.exchange?view=netframework-4.7.2
if (Interlocked.Exchange(ref _lock, 1) == 0)
{
OpenIdConnectConfiguration config = Task.Run(_configManager.GetConfigurationAsync).Result;
_issuer = config.Issuer;
_keys = config.SigningKeys;
// Release the lock
Interlocked.Exchange(ref _lock, 0);
}
}
}
It's still early days, but we've been running this in production for a few days now, it's been OK under concurrency levels that brought the application down before.
I'm super interested in your opinion, feel free to point flaws / gaps in this implementation.
Likewise.
Looking into this it looks like reading and writing for both
Yes basically.
I see what you mean. Tying this back to the question of atomicity, I think we need some lock.
Sure. My change was mainly in the spirit of the intent behind the original implementation, but I do agree with what you're stating. It makes sense that the
The rationales make sense. I'm just unsure on the atomicity of reads and writes since the StackOverflow post seems to suggest it's not guaranteed in this case hence a read and write lock is necessary. |
Never mind. All of these are reference types, and the references should fit into a native word, so the locks aren't necessary. Your implementation is what I would do to remove the locks, so you can create a PR with your changes. Thanks @keegan-caruso for pointing out the reference types and sizes. |
Thanks a lot for verifying this, it's much appreciated. I opened a PR at #3124. |
To be clear, I think the biggest problem here is that ConfigurationManager is an async class and the IIssuerSecurityKeyProvider interface to which the OpenIdConnectCachingSecurityTokenProvider has to adhere is not async. If the interface had async get methods, this would be a non-issue. Under the current, larger design, no matter how you get this class to work, it will violate someone's rules for task concurrency. It's probably not realistic to address those issues in the short term, so I'll go back to talking about this specific class. I agree with everything @mderriey stated. Though one thing that gives me hesitation about the implementation he suggests is the use of Task.Run in RetrieveMetadata. It would mean that every call to either of the properties will put work on a separate thread even though the called async method will very rarely ever do any async work since it usually just returns the cached config. However, the passive locking on calling the task ensures that it will never enter the thread starvation issue from which I'm still traumatized. Here's what I used and it's been working fine for us:
One thing that's important to note is that the ConfigManager.GetConfigurationAsync method will only ever do async task work during the initial config retrieval. All other calls to the method will return the current config, though if it's time to refresh the config, it will start that on a separate thread, but it will still return the existing config rather than waiting for the new one. This means that GetConfigurationAsync will always returns a completed task after the first call to it completes. So when the class is constructed, we can just preserve the initial retrieval task. When the config is needed later, if the initial task is still running, we know that we have to wait for that to complete. If that task has already completed, we can just call GetConfigurationAsync again and know that it will return a completed task from which can just return the result. The risk here is that it depends on the internal functionality of the ConfigurationManager and I don't think there's a guarantee that it wouldn't change in future versions. If it did change and the call to GetConfigurationAsync returned a task that is still running when we expect it to return a completed task, calling Result on it could theoretically cause a deadlock. However, I would argue that this class is likely only accessed from deep on async-heavy library call stacks, and it's highly unlikely that it could be called while attached to a synchronization context that could cause a deadlock. If you want to stick with the Task.Run-always version instead of something like what I'm doing here, I have no problem with that. Like I said, it fixes the major bug that started the whole conversation. My remaining concerns are trivial by comparison. |
This makes sense to me.
The only thing I don't like here is the reference to the initial task that is kept for the entire lifetime of the class. Once |
One thing that finally occurred to me about the proposed implementation is that it calls RetrieveMetadata from the constructor which will block until the initial configuration loads. I think that's probably fine, and as a helpful side effect, it also makes the ConfigManager much easier to work with because the call to GetConfigurationAsync will always return a completed task without any async blocking. It handles keeping the configuration up to date separately, though it always returns the most recently cached version. I think this could be an incredibly simple class. It shouldn't ever need to cache config requests, because that's exactly what the ConfigurationManager is already doing. It doesn't need to be concerned with concurrency for the reasons I already mentioned. With caching and concurrency no longer factors, there's no reason to worry about locking either. Obviously, it doesn’t need to maintain a reference to the initial retrieval task either, because that completes in the class constructor. Here's what I have running in our moderately high-volume production environment:
There still may be a valid question about whether this relies too much on the internal functionality of a separate library. Specifically, what if a future version of the ConfigurationManager’s GetConfigurationAsync method could occasionally block to do async work after the first retrieval? In that instance, calling Result on the task could potentially deadlock if a certain type of synchronization context were in effect. Here’s a potential way to safeguard against that without adding much overhead or complexity.
I know it looks a little hacky. Keep in mind that with the current implementation of ConfigurationManager, the returned task is always completed, so the second half of this method is effectively unreachable. Even if the GetConfigurationAsync implementation changed, I think it would be safe to assume it would still almost always return a completed task with the current configuration. As an added precaution, we can check if the returned task is still running in which case we would just want to restart the call from a separate thread using Task.Run, from which we can safely collect the configuration result. The important part is that we only call Task.Run when it’s truly necessary. Sorry if I’m still laboring over some unimportant details. Hopefully this is helpful in some way, but I can drop it now that I’ve said my piece. |
From my perspective, the more people chiming in the better, and I appreciate you thinking about potential ways to make this simpler. I like your thinking, especially the perspective of having this class not use any locks or synchronization. I can't speak about the dependency on the internals of It's probably not important for the team here, but I'm in a position where we don't use Microsoft.Identity.Web.OWIN because reasons, and we vendored in the original implementation in our project. If your proposal gets in, I'm not certain we're in a position to use it as-is because the version of Thanks again. |
I didn't even consider the functionality of an older version of ConfigManager. I just looked at the history and you're totally right; the functionality on which I was relying is a newer development. With that in mind, I think we have to use locking and caching to safely avoid thread pool starvation. In fact, the only thing I would really change about your original proposal is the use of Task.Run in RetrieveMetadata (which also means additional considerations for exception handling). This is what I have running now:
The call to GetConfigurationAsync still usually returns a completed task even with older versions (I think the default update interval is 12 hours). When it doesn't return a completed task, we can just schedule the continuation on a thread pool task and return the cached config without waiting for the result. That also means that if the async continuation has an exception, we have to preserve it and throw it on the next call to one of the getters. I admit that this adds some non-trivial logic just to avoid Task.Run which may be overkill when passive locking is used to prevent overlapping calls. My only argument is that Task.Run is still really expensive to use in most calls to a property getter. I won't be offended if I don't persuade anyone. |
@alex-set Which of the two PRs do you think is closer to your proposal? You could leave comments to try to move it in your direction. Or, you could create your own PR with your suggestion. |
Update locking and caching for
OpenIdConnectCachingSecurityTokenProvider
. Fixes #3078.