
Occasional failure setting up lvmcluster storage #14

Open
gibmat opened this issue Sep 10, 2024 · 1 comment
gibmat (Contributor) commented Sep 10, 2024

Occasionally in the "Add storage pools" task (maybe 1/20 runs) I see the following failure:

TASK [Add storage pools] *************************************************************************************************************************
skipping: [server03] => (item={'key': 'local', 'value': {'driver': 'zfs', 'local_config': {'source': '/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk3'}, 'description': 'Local storage pool'}})                                                                                                     
skipping: [server02] => (item={'key': 'local', 'value': {'driver': 'zfs', 'local_config': {'source': '/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk3'}, 'description': 'Local storage pool'}})                                                                                                     
skipping: [server04] => (item={'key': 'local', 'value': {'driver': 'zfs', 'local_config': {'source': '/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk3'}, 'description': 'Local storage pool'}})                                                                                                     
skipping: [server05] => (item={'key': 'local', 'value': {'driver': 'zfs', 'local_config': {'source': '/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk3'}, 'description': 'Local storage pool'}})                                                                                                     
skipping: [server03] => (item={'key': 'remote', 'value': {'driver': 'ceph', 'local_config': {'source': 'incus_baremetal'}, 'description': 'Distributed storage pool (cluster-wide)'}})                                                                                                              
skipping: [server02] => (item={'key': 'remote', 'value': {'driver': 'ceph', 'local_config': {'source': 'incus_baremetal'}, 'description': 'Distributed storage pool (cluster-wide)'}})                                                                                                              
skipping: [server04] => (item={'key': 'remote', 'value': {'driver': 'ceph', 'local_config': {'source': 'incus_baremetal'}, 'description': 'Distributed storage pool (cluster-wide)'}})                                                                                                              
skipping: [server05] => (item={'key': 'remote', 'value': {'driver': 'ceph', 'local_config': {'source': 'incus_baremetal'}, 'description': 'Distributed storage pool (cluster-wide)'}})                                                                                                              
skipping: [server03] => (item={'key': 'shared', 'value': {'driver': 'lvmcluster', 'local_config': {'source': 'vg0'}, 'default': True, 'description': 'Shared storage pool (cluster-wide)'}})                                                                                                        
skipping: [server03]
skipping: [server02] => (item={'key': 'shared', 'value': {'driver': 'lvmcluster', 'local_config': {'source': 'vg0'}, 'default': True, 'description': 'Shared storage pool (cluster-wide)'}})                                                                                                        
skipping: [server02]
skipping: [server04] => (item={'key': 'shared', 'value': {'driver': 'lvmcluster', 'local_config': {'source': 'vg0'}, 'default': True, 'description': 'Shared storage pool (cluster-wide)'}})                                                                                                        
skipping: [server04]
skipping: [server05] => (item={'key': 'shared', 'value': {'driver': 'lvmcluster', 'local_config': {'source': 'vg0'}, 'default': True, 'description': 'Shared storage pool (cluster-wide)'}})                                                                                                        
skipping: [server05]
changed: [server01] => (item={'key': 'local', 'value': {'driver': 'zfs', 'local_config': {'source': '/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk3'}, 'description': 'Local storage pool'}})                                                                                                      
changed: [server01] => (item={'key': 'remote', 'value': {'driver': 'ceph', 'local_config': {'source': 'incus_baremetal'}, 'description': 'Distributed storage pool (cluster-wide)'}})                                                                                                               
failed: [server01] (item={'key': 'shared', 'value': {'driver': 'lvmcluster', 'local_config': {'source': 'vg0'}, 'default': True, 'description': 'Shared storage pool (cluster-wide)'}}) => {"ansible_loop_var": "item", "changed": true, "cmd": "incus storage create shared lvmcluster source=vg0", "delta": "0:00:11.175597", "end": "2024-09-10 20:03:49.568029", "item": {"key": "shared", "value": {"default": true, "description": "Shared storage pool (cluster-wide)", "driver": "lvmcluster", "local_config": {"source": "vg0"}}}, "msg": "non-zero return code", "rc": 1, "start": "2024-09-10 20:03:38.392432", "stderr": "Error: Failed to run: vgchange --addtag incus_pool vg0: exit status 5 (VG vg0 lock failed: error -221)", "stderr_lines": ["Error: Failed to run: vgchange --addtag incus_pool vg0: exit status 5 (VG vg0 lock failed: error -221)"], "stdout": "", "stdout_lines": []}
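Not from this thread, but one possible mitigation while the root cause is investigated would be to let Ansible retry the pool creation, since the lock failure is transient (~1 in 20 runs). A hedged sketch — the task shape and the `storage_pools` variable are assumptions, not the repository's actual playbook:

```yaml
# Hypothetical sketch: retry each "incus storage create" a few times,
# since the sanlock lease failure only happens intermittently.
- name: Add storage pools
  ansible.builtin.command:
    cmd: "incus storage create {{ item.key }} {{ item.value.driver }} source={{ item.value.local_config.source }}"
  loop: "{{ storage_pools | dict2items }}"
  register: pool_create
  retries: 3
  delay: 10
  until: pool_create.rc == 0
```

With `retries`/`until` on a looped task, Ansible re-runs each failing item up to three times with a 10-second pause, which would paper over a lease that only needs a few extra seconds to settle.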
stgraber (Member) commented

This is particularly weird because the --addtag is already a TryRunCommand so it will have tried a bunch of times before failing.

It'd probably be useful to look at vgs, lvmlockctl and sanlock output following a failure to see what's going on.
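For reference, the inspection stgraber suggests could be gathered with something like the following best-effort script. This is a sketch, not from the thread itself; vg0 is the VG from the log above, and every command is guarded so a stopped daemon or missing tool doesn't abort the collection:

```shell
#!/bin/sh
# Best-effort diagnostics for a sanlock-backed shared VG (vg0 from the log).
# Each command is guarded with "|| true" so collection continues even if a
# daemon is down or a tool is absent on this host.
vgs -o +vg_lock_type,vg_lock_args vg0 || true   # confirm the VG lock type is sanlock
lvmlockctl --info || true                       # lvmlockd's view of lockspaces and held locks
sanlock client status || true                   # sanlock leases; look for lease/IO errors
journalctl -u lvmlockd -u sanlock -n 100 --no-pager || true  # recent daemon logs
collected=yes
echo "diagnostics collected"
```

Running this right after a failed `incus storage create` should show whether the sanlock lockspace for vg0 is healthy or whether the lease itself is erroring (which would explain the -221 from vgchange).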
