WIP: LVM2 module uevent processing optimization #814
base: master
Supposed to be used for heavy-load environment preparation or for running load tests in general. Should not be included in regular test runs.
This creates 1000 LVM snapshots over a persistent image file attached as loop device. Splitting the test case into "setup" and "teardown" parts allows debugging on such heavy environment.
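The setup half of that load test can be sketched as follows. This is an illustrative model only, not the actual test code from this PR: the helper name, file path, VG/LV names and sizes are all assumptions, and the commands are built as a list (dry-run) rather than executed.

```python
# Hypothetical sketch of the load-test "setup" phase: build the command
# sequence that would create COUNT snapshots over a persistent image file
# attached as a loop device. All names and sizes are illustrative.

def snapshot_setup_commands(image="/var/tmp/udisks_loadtest.img",
                            vg="loadtestvg", lv="loadtestlv", count=1000):
    cmds = [
        ["truncate", "-s", "10G", image],        # persistent backing file
        ["losetup", "--find", "--show", image],  # attach as loop device
        ["vgcreate", vg, "/dev/loopX"],          # loop name is a placeholder
        ["lvcreate", "-L", "4M", "-n", lv, vg],  # origin LV
    ]
    # one snapshot per iteration; "teardown" would walk these in reverse
    cmds += [["lvcreate", "-s", "-L", "1M", "-n", f"snap{i}", f"{vg}/{lv}"]
             for i in range(count)]
    return cmds
```

Keeping setup and teardown as separate test cases, as the commit describes, means the 1000-snapshot environment survives between runs for debugging.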
Just like the primary coldplug run upon daemon startup, this secondary coldplug signals modules to perform a secondary initialization over existing drive and block objects. Upon activating a module it is desirable to attach extra module interfaces to existing objects, perform additional probing and expose all available information synchronously, so that upon returning from the org.freedesktop.UDisks2.Manager.EnableModule() method call all exported objects and their properties are up-to-date.
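A toy model of why the secondary coldplug must be synchronous might look like this (the real code is C inside the udisks daemon; the class and function names here are illustrative, not the udisks API):

```python
# Minimal model: EnableModule() must not return until every existing
# object has been probed and its extra interfaces attached, so callers
# immediately see up-to-date properties.

class FakeModule:
    def __init__(self):
        self.interfaces = {}  # object path -> extra interface data

    def coldplug_object(self, path, obj):
        # additional probing over an already-existing block/drive object
        self.interfaces[path] = {"probed": True, "size": obj.get("size")}

def enable_module(module, existing_objects):
    """Synchronous secondary coldplug: iterate all existing objects
    before returning."""
    for path, obj in existing_objects.items():
        module.coldplug_object(path, obj)
    return True  # only now does the EnableModule() call return
```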
It is now possible to distinguish the coldplug phase, so remove the dirty workaround. This changes the timing of object property validity and the availability of additional objects, at the expense of a longer coldplug phase.
Since GUdevDevice doesn't carry a timestamp of the reported uevent, let's use system monotonic time and bind a timestamp to the uevent as soon as possible.
This bets on the natural atomicity of 64-bit value access; explicit synchronization may need to be added around it (TODO).
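The idea of binding the timestamp at receipt can be sketched as below. This is a Python illustration of the scheme, not the C code; in the daemon the value would be a raw 64-bit integer, which is where the atomicity concern comes from.

```python
import time

# Sketch: stamp the uevent with monotonic time at object creation,
# i.e. as early in processing as possible. Field names are illustrative.

class Uevent:
    def __init__(self, action, device):
        self.action = action
        self.device = device
        # bound as soon as the event is received; monotonic, so later
        # events always compare as newer regardless of wall-clock jumps
        self.timestamp_ns = time.monotonic_ns()
```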
Updates should be done synchronously while coldplugging, and asynchronous updates should be rate-limited. This is done by comparing timestamps of (filtered) uevents, limiting the update job to a single running instance and optionally queueing one more update once the previous one has finished. We rely on uevents being generated for any change to lvm2 block devices; otherwise this wouldn't work.
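The rate-limiting scheme described above can be modelled roughly like this (an illustrative Python sketch of the logic, not the C implementation; class and method names are assumptions):

```python
class UpdateRateLimiter:
    """At most one update job runs at a time; uevents arriving while it
    runs only set a 'queued' flag, and a single coalesced follow-up
    update is requested when the job finishes. An update is skipped
    entirely when no uevent is newer than the last completed update."""

    def __init__(self):
        self.last_uevent_ts = 0     # timestamp of newest filtered uevent
        self.last_update_ts = 0     # state covered by last finished update
        self.update_started_ts = 0  # state snapshot the running job covers
        self.running = False
        self.queued = False
        self.updates_run = 0

    def on_uevent(self, ts):
        self.last_uevent_ts = ts
        self.request_update()

    def request_update(self):
        if self.running:
            self.queued = True      # coalesce: remember at most one rerun
            return
        if self.last_uevent_ts <= self.last_update_ts:
            return                  # nothing new since the last update
        self.running = True
        self.update_started_ts = self.last_uevent_ts
        self.updates_run += 1       # the actual LVM update would run here

    def update_finished(self):
        self.running = False
        self.last_update_ts = self.update_started_ts
        if self.queued:
            self.queued = False
            self.request_update()   # rerun only if newer uevents arrived
```

Note the reliance the commit states: if a change to an lvm2 block device produced no uevent, `last_uevent_ts` would never advance and the update would be skipped.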
For consistency we would like to run updates synchronously; however, with a large number of LVs in the system this can take a considerable amount of time, blocking the serialized daemon uevent processing. As a workaround this change introduces an arbitrary threshold that switches updates back to asynchronous, as was the historical behaviour.
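The threshold decision itself is trivial; a sketch follows. The value 100 comes from the PR discussion below and is explicitly arbitrary; whether the threshold also applies during coldplug is an assumption in this sketch.

```python
SCALABLE_MODE_THRESHOLD = 100  # arbitrary; no performance numbers behind it

def should_update_sync(coldplug, lv_count):
    """Sync updates keep D-Bus state consistent on return but block the
    serialized uevent queue; above the threshold, fall back to async."""
    if coldplug:
        return True  # assumption: coldplug stays synchronous regardless
    return lv_count <= SCALABLE_MODE_THRESHOLD
```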
There's a known bug somewhere between lvm2 and udev that keeps /dev entries around while the block devices are actually gone. This causes problems on subsequent test runs, though nothing else seems to break.
This is a major change and will need follow-up work to make things right. It essentially reorders the moment the uevent is propagated to modules and may affect the availability of information on exposed D-Bus objects. This is all about dependencies: if a module gathers information from existing block objects, it needs to know whether that block object has already finished processing the particular uevent or not. This commit defines that order, and it appears sensible to guarantee that the "core" daemon objects and their interfaces are updated by the time udisks_module_new_object() is called. That happens as the last step of the uevent processing chain, which includes block-object-specific module callbacks for exposing extra D-Bus interfaces (run prior to udisks_module_new_object()). However, this may not suit all cases, and it would perhaps be wise to provide pre- and post-udisks_module_new_object() callbacks for finer granularity and a precise definition of the moment of uevent processing. For the moment, let's run udisks_module_new_object() last in the chain and make modules aware of that.
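The ordering guarantee the commit establishes can be shown as a toy pipeline (the stage names loosely mirror the C API; this is a model, not the daemon code):

```python
# Toy model of the uevent processing chain after this commit:
# 1. core daemon objects/interfaces are updated,
# 2. block-object-specific module callbacks attach extra D-Bus interfaces,
# 3. udisks_module_new_object() runs last, so modules called there can
#    rely on the core objects having finished processing the uevent.

def process_uevent(trace):
    trace.append("core_update")
    trace.append("module_block_callbacks")
    trace.append("udisks_module_new_object")  # guaranteed last in the chain
    return trace
```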
@vojtechtrefny, @vpodzime: please let me know what you think of the 'scalable mode' for the moment. The arbitrary threshold of 100 LVs in the system is just a wild shot, with no real performance numbers behind it. The thing is that running all updates synchronously as part of uevent processing brings benefits to scripts and the included tests. However, it also delays processing of the sequential uevent queue, so any sync updates should be treated with caution. Also, the loadtest LVM2 tests are perhaps a bit hacky but proved useful for creating a heavy environment for debugging.
Jenkins, test this please.
This might trigger some critical warnings from the daemon as the VG/LV might not exist. Mostly harmless, since until now the return code of open() wasn't actually checked. Subject to further fixes in the future.
This is a set of improvements to LVM2 module uevent processing. The primary goal was to rate-limit the number of lvm commands spawned (via the libblockdev lvm-command based plugin) to prevent flooding the sequential udisks daemon uevent queue. Uevent timestamp checking has been introduced, and requests for LVM updates are omitted when another update is running or when no new uevent has been received since the last update.
However, there are a couple of conceptual problems that are out of scope of this pull request. The major one is the big global PVs/VGs/LVs update done in the udisks lvm2 module. The whole system picture is updated whenever an uevent is received on a block device carrying a DM/LVM signature. The big update then touches multiple D-Bus objects, updating object paths, cross-references between objects, etc.
Skipping calls to this global update may, however, uncover unexpected issues that are not obvious at first sight. I've changed the order of uevent processing to run module callbacks last. That's just a workaround; a proper solution should be implemented (hence marking this PR as WIP).
The proper way would be to introduce pre- and post-hooks in the block device uevent processing chain and split the current big update into smaller chunks reacting to particular block devices only. I.e. attaching extra D-Bus interfaces to existing block objects should be separated out and the proper module API used, meta module objects should be handled in the current big update routine, and cross-object references should be updated in a post-hook. This is a non-trivial amount of work, plus subsequent debugging and verification.
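The proposed pre-/post-hook shape could look roughly like this. The registration API and hook names here are hypothetical, not existing udisks API; it only illustrates the intended split into per-device work with cross-reference updates deferred to a post-hook.

```python
# Hypothetical sketch of the proposed hook structure around the
# per-device update. Nothing here is real udisks API.

class UeventChain:
    def __init__(self):
        self.pre_hooks, self.post_hooks = [], []

    def add_hooks(self, pre=None, post=None):
        if pre:
            self.pre_hooks.append(pre)
        if post:
            self.post_hooks.append(post)

    def process(self, device, core_update):
        for hook in self.pre_hooks:    # e.g. attach extra D-Bus interfaces
            hook(device)
        core_update(device)            # smaller, per-device update
        for hook in self.post_hooks:   # e.g. refresh cross-object references
            hook(device)
```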
Related: #811
Also, a simple load test that creates and destroys 1000 LVs/snapshots is introduced to produce load and prepare a "heavy" testing environment.