
Polling for lvm2 objects might deliver results out of order #966

Closed
mvollmer opened this issue Apr 4, 2022 · 4 comments · Fixed by #967

@mvollmer
Contributor

mvollmer commented Apr 4, 2022

We started noticing a failure in the Cockpit CI tests for lvm2. UDisks2 would fail with

Error waiting for logical volume object for 'lvol0': Timed out waiting for object

The root cause was that multiple vgs_task_func instances were running concurrently and delivering their results out of order.

The details are like this:

The test creates a VG named "TEST1", does some operations with it, deletes it, creates a new VG named "vgroup0", and then creates a logical volume named "lvol0" in it. (All very quickly.) UDisks2 would quite often fail at this point with the timeout error mentioned above.

UDisks2 would follow along by running vgs_task_func and lvs_task_func, each instance in a new thread.

This sequence happened (a sketch of a guard against exactly this race follows the list):

  • Instance A of vgs_task_func sees that there are no volume groups after TEST1 has been deleted.
  • Instance B sees that vgroup0 exists.
  • Instance C sees that vgroup0 exists.
  • The results of B are delivered; object O1 is created for vgroup0.
  • A method is called on O1 to create "lvol0".
  • The method starts waiting for a "lvol0" object to show up on O1.
  • The results of A are delivered; O1 is destroyed.
  • The results of C are delivered; O2 is created for vgroup0.
  • lvs_task_func runs, and an object for "lvol0" is eventually created and attached to O2.
  • The method times out because O1 never sees an object for "lvol0".
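
For illustration, here is a minimal, self-contained sketch of a guard against exactly this kind of stale delivery (all names are made up; this is not the udisks code): each poll records a monotonically increasing serial when it starts, and its result is applied only if no newer poll has delivered in the meantime, so the late snapshot from instance A would be dropped instead of destroying O1.

```c
#include <glib.h>

static GMutex poll_lock;
static guint64 poll_serial;          /* bumped each time a poll starts    */
static guint64 last_applied_serial;  /* serial of the last applied result */

typedef struct {
  guint64 serial;      /* serial captured when this poll started */
  GPtrArray *vgs;      /* hypothetical snapshot of volume groups */
} PollResult;

/* Called when a poll starts: capture the current serial. */
static PollResult *
poll_start (void)
{
  PollResult *res = g_new0 (PollResult, 1);
  g_mutex_lock (&poll_lock);
  res->serial = ++poll_serial;
  g_mutex_unlock (&poll_lock);
  return res;   /* a worker thread then fills res->vgs */
}

/* Called when a poll's result comes back, possibly out of order. */
static void
poll_deliver (PollResult *res)
{
  g_mutex_lock (&poll_lock);
  if (res->serial > last_applied_serial)
    {
      last_applied_serial = res->serial;
      /* ... update the D-Bus object tree from res->vgs ... */
    }
  /* else: a newer poll already delivered; drop this stale snapshot */
  g_mutex_unlock (&poll_lock);
}
```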
@tbzatek
Copy link
Member

tbzatek commented Apr 4, 2022

Correct, I was also seeing similar issues when doing simple load tests. My old attempt to improve things was #814, but it was more or less just a workaround and I was not fully satisfied with the result. I have some more ideas for a proper solution, going down to the udisks modular design; ultimately, time and prioritization are my only problems now.

FYI the current code is more or less unchanged storaged legacy of yours :-)

@mvollmer
Contributor Author

mvollmer commented Apr 5, 2022

> My old attempt to improve things was #814, but it was more or less just a workaround and I was not fully satisfied with the result.

I'll try to produce a simple PR that restores correctness without any attempt at optimization.
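
Roughly what I have in mind (just a sketch, not necessarily what the final PR will look like): funnel all update jobs through an exclusive single-threaded GThreadPool, so polls run one at a time and deliver strictly in submission order.

```c
#include <glib.h>

typedef struct {
  gchar *reason;   /* purely illustrative payload; must be non-NULL */
} UpdateJob;

static GThreadPool *update_pool;

static void
update_func (gpointer task_data, gpointer user_data)
{
  UpdateJob *job = task_data;
  /* With max_threads = 1 and exclusive = TRUE below, jobs run strictly
   * one at a time, in the order they were pushed, so results can never
   * be delivered out of order.  Run vgs/lvs and deliver the result here. */
  g_free (job->reason);
  g_free (job);
}

static void
updates_init (void)
{
  update_pool = g_thread_pool_new (update_func, NULL,
                                   1 /* max_threads */, TRUE /* exclusive */,
                                   NULL);
}

static void
updates_trigger (const gchar *reason)
{
  UpdateJob *job = g_new0 (UpdateJob, 1);
  job->reason = g_strdup (reason);
  g_thread_pool_push (update_pool, job, NULL);
}
```

This trades the parallelism of the current code for correctness, which matches the "no attempt at optimization" goal above.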

> FYI the current code is more or less unchanged storaged legacy of yours :-)

Heh! :-) I am pretty sure it was much simpler back then, without so much asynchronicity. It was also probably much slower.

@mvollmer
Contributor Author

mvollmer commented Apr 5, 2022

> FYI the current code is more or less unchanged storaged legacy of yours :-)
>
> Heh! :-) I am pretty sure it was much simpler back then, without so much asynchronicity. It was also probably much slower.

Oh, no, I just had a look and you are right! We did spawn a process instead of a thread, but otherwise it was all the same. So I am grumpy at myself!

mvollmer added commits to mvollmer/udisks that referenced this issue Apr 5, 2022
@tbzatek
Member

tbzatek commented Apr 5, 2022

So there are more optimizations that I tried to address in #814: if an update is already running, avoid spawning a new one, wait for its result, and then compare the timestamps of the last uevent received for the particular object. This prevents running multiple lvs/vgs processes at the same time when we know we'd throw out the results anyway because a new uevent arrived in the meantime. With a particularly large setup (1000+ LVs) and a burst of uevents, I've seen updates still slowly finishing after 20 minutes - that's how many updates had been spawned. So this was a kind of rate limiting to ensure reasonable responsiveness.
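
The coalescing part of that idea can be sketched like this (a simplified illustration, not the actual #814 code): when an update is already running, remember that another one is wanted instead of spawning a new thread; a whole burst of uevents then costs at most one queued re-run.

```c
#include <glib.h>

static GMutex state_lock;
static gboolean update_running;
static gboolean update_pending;

/* Assumed to spawn the actual vgs/lvs poll in a worker thread which
 * calls update_finished() once its result has been delivered. */
static void run_update (void);

static void
request_update (void)   /* called for every incoming uevent */
{
  g_mutex_lock (&state_lock);
  if (update_running)
    update_pending = TRUE;   /* coalesce: remember, don't spawn */
  else
    {
      update_running = TRUE;
      run_update ();
    }
  g_mutex_unlock (&state_lock);
}

static void
update_finished (void)
{
  g_mutex_lock (&state_lock);
  if (update_pending)
    {
      update_pending = FALSE;
      run_update ();   /* one re-run covers the whole burst */
    }
  else
    update_running = FALSE;
  g_mutex_unlock (&state_lock);
}
```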

All of this was just a workaround anyway. The logic tells me that it's not always necessary to update the whole system picture; it is enough to look up the potentially related devices and update only those. That was my proposal for a proper solution, unimplemented at the moment.
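
As a sketch of that proposal (purely illustrative wiring; DM_VG_NAME is set by lvm2's udev rules and bd_lvm_lvs() is the libblockdev call, but everything else here is made up): derive the affected VG from the uevent and refresh only that VG instead of the whole system picture.

```c
#include <gudev/gudev.h>
#include <blockdev/lvm.h>

static void
handle_uevent (GUdevDevice *device)
{
  /* lvm2's udev rules export DM_VG_NAME on LV device-mapper nodes */
  const gchar *vg_name = g_udev_device_get_property (device, "DM_VG_NAME");
  GError *error = NULL;
  BDLVMLVdata **lvs;

  if (vg_name == NULL)
    return;   /* not an LVM device, nothing to refresh */

  /* refresh just this VG's LVs instead of the whole system picture */
  lvs = bd_lvm_lvs (vg_name, &error);
  if (lvs == NULL)
    {
      g_warning ("lvs for %s failed: %s", vg_name,
                 error ? error->message : "unknown error");
      g_clear_error (&error);
      return;
    }

  /* ... update only the objects belonging to vg_name ... */

  for (guint i = 0; lvs[i] != NULL; i++)
    bd_lvm_lvdata_free (lvs[i]);
  g_free (lvs);
}
```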

Now, with tons of work still pending on the nvme support and libblockdev-3.0 API adaptations, the udisks-2.10 release is still far away. If there's a desire to get the lvm2 module in better shape anytime soon, we might merge parts of #814 in the meantime. The uevent processing rewrite would be major work, and big chunks of the code would change anyway.
