Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScenarioDataStore broken in version 0.6.0 - race condition. #204

Closed
jensakejohansson opened this issue Feb 7, 2024 · 19 comments
Closed

Comments

@jensakejohansson
Copy link

@sriv When adding values to data store and then retrieving the value Gauge returns null intermittently. Seems to be race condition.

Related to previous PR to handle async methods: #199

Example to recreate problem can be found here: https://github.com/jensakejohansson/gauge-async

When I execute the scenario it seems to fail in about 2/3 of executions. It's just 2 steps. One that adds a value to the store and the next step tries to retrieve the stored value again.

@sriv
Copy link
Member

sriv commented Feb 7, 2024

thanks for reporting, I will investigate. the async change was big and I was surprised that nothing broke. The randomness explains why the tests didnt catch it.

@jensakejohansson
Copy link
Author

I think you should consider rolling back the release 0.6.0 if possible. People who runs test executions in a pipeline might just install the latest plugin version by default and then suddenly their tests starts failing due to a not so clear reason. I just got a mail about that... 😬

@sriv
Copy link
Member

sriv commented Feb 7, 2024

Done, 0.6.0 is marked as a pre-release and the release metadata has been reverted from gauge-repository.

Apologies for the mess.

@sriv
Copy link
Member

sriv commented Feb 8, 2024

I did some investigation. Thanks to your sample project I could replicate the issue. I believe it is because the datastores use a ThreadLocal<ConcurrentDictionary> to hold the data, and threadlocal+async do not go well together.

I will try changing the Gauge.CSharp.Lib project to use something else instead of a ThreadLocal<ConcurrentDictionary> and check.

@sriv
Copy link
Member

sriv commented Feb 13, 2024

I did some analysis - and my suspicion was right. Gauge.CSharp.Lib provides the types for DataStores which internally uses a ThreadLocal<ConcurrentDictionary>

The context here is that the initial releases of gauge supported parallel execution by bringing up as many runner processes as the number of streams. This had some limitation in terms of the footprint used and more importantly not very container friendly.

Subsequently, gauge supported multithreaded parallel execution for languages that have good multi threading support. However there was a bit of backward compatibility to be honoured. In a multi process execution, each stream had its own datastore instantiated. In a multi threaded environment, there was one datastore shared across streams causing contention and race conditions. As a result, to mimic the multi process datastore setup, it was decided to make the datastores threadlocal.

In the async world, the threadlocal does not help because there is no guarantee that tasks will run on the same thread because of the design of async. The reasonable replacement for this would be AsyncLocal which ensures isolation between the async runcontexts. But this may not help here too much.

So the choice is - break backward compatibility and have single datastores per process or not have the async support.

I am investigating if there is a way to do both, but I am not very hopeful given the fundamental difference between async paradigm and a sticky thread execution.

I see 2 options for this:

  1. Introduce a separate datastore API for Async
  2. Ditch the threadlocal and let the runtime control the async read/writes.

Happy to hear suggestions.

/cc @getgauge/core (if anyone has any ideas on this)

@jensakejohansson
Copy link
Author

Ouch! I see the problem.

Is it a correct understanding that option 2 above would solve the async problem and parallelization could be still be done, but less efficient by spawning processes instead of threads - potentially breaking some sort of backward compatibility in Gauge?

If so, then to me personally that would be the best option since async support (and a clean/simple to use API) is more important to us than efficient parallelization. We primarily run end-2-end user interface based test scenarios in Gauge, seldom in parallel at all, and if so, only at a very limited scale where some extra overhead would be totally acceptable. The lack of async support however is a major thing since many 3rd-party libraries/tools expose async APIs.

However, this is our specific use case, who am I to say what is best for all users?...

@mpekurny
Copy link
Contributor

mpekurny commented Apr 3, 2024

I would agree with above. I think option 2, having async work would be beneficial.

@jensakejohansson
Copy link
Author

jensakejohansson commented Jun 29, 2024

@sriv I have to come back to this issue again. Would you have the possibility to implement a solution for this so async is supported, even if it means compromises and it's not an 'optimal' solution? We really need a way forward here and any help would be much appreciated!

Best regards,

@sriv
Copy link
Member

sriv commented Jul 2, 2024

apologies @jensakejohansson - work has kept me very busy. I will try to see if I can spend some time over the weekend. Appreciate your patience.

@Necobee
Copy link

Necobee commented Aug 2, 2024

Hello @sriv - I came across this issue and I noticed that there have been no updates since Jul 2. Do you know if it is still in the plans to implement option#2?

@jensakejohansson
Copy link
Author

@sriv Sorry for nagging, but I guess you haven't have the possibility to address this?

It's hard for me to argue for using Gauge in new projects if it cannot handle async libraries. We'd like provide a solution ourselves, but I think it will be difficult due to our limited knowledge of all the technical details and history / potential backward compatibilities etc. related to this.

However, if you feel you are unable to address this I guess we will give it try and do our best (when we find time). If so, any guidance of what changes needs to be done and other pieces of advice are welcome.

Best regards,

@mpekurny
Copy link
Contributor

mpekurny commented Sep 6, 2024

@sriv I looked over the PR you added to the library, and I have some concerns about the solution proposed. While it might "work", I think it would have a very negative impact on running gauge multi-threaded. I am not sure if this is a problem, maybe most people do not run Gauge multi-threaded.

I have been working on a different solution to the async/await problem which would work similar to how the .NET gauge solution works now, but is not backwards compatible between the library and plugin (meaning you would have to upgrade both at the same time); however, it would just work without any new settings or changes to client's tests.

@sriv
Copy link
Member

sriv commented Sep 6, 2024

@mpekurny There were issues reported with multithreading and I had to rollback the release. So no surprises there :).

I think I am ok to let go of backward compatibility in this particular case for mainly the reason that no one has spoken strongly about it in this scenario and I would imagine that the refactoring to accomodate the new changes would not be drastic.

Would you be open to raise a PR for the components?

@jensakejohansson - I was thinking about a way to handle the backward and raised a PR for Lib. But I am interested in @mpekurny's idea.

@mpekurny
Copy link
Contributor

mpekurny commented Sep 6, 2024

@sriv @jensakejohansson @chadlwilson

Would you be open to raise a PR for the components?

Yes, when it's ready. With other work commitments, I am probably another week or so before I would be ready to raise PRs for the changes (it would require changes in both the plugin and library). If you could hold off merging in your changes until then?

@mpekurny
Copy link
Contributor

@sriv @jensakejohansson

I am doing a little bit more testing but I am just about ready to push up the code and create PRs. I expect to have PRs sometime Monday. Thanks for your patience.

Marty

@jensakejohansson
Copy link
Author

@mpekurny Great! I'm looking forward to it! 👍

@mpekurny
Copy link
Contributor

@jensakejohansson have you tried out the new async update? While it seems that there might be a new issue described here, #241, can we close this issue?

@jensakejohansson
Copy link
Author

jensakejohansson commented Oct 22, 2024

Yes, I've tried it and found no problems that I believe is related to the update, so close this. Thanks for your great work!

I've seen #241 too, but at the moment I'm seing that when running on a machine with Gauge 1.6.6 / dotnet 0.5.7. Meaning, it's not realiable in some circumstances, but unrelated to this update in my experience.

@chadlwilson
Copy link
Contributor

OK, will close this for now, as I am getting confused :-) I suppose getgauge/gauge-csharp-lib#44 isn't required now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants