Migration of such VMs will be rejected by libvirt, so it doesn't make
sense to start the migration machinery for them. It's also good to
avoid resuming such VMs during migration preparation if possible;
resuming may fail and prevent the VM from resuming later, or it may
reveal unhandled races or corner cases.
Migration of such VMs is already prevented in Engine, but let's add an
additional check to Vdsm to handle races, similarly to the
_not_migrating API guard.
This patch doesn't handle the case when the VM gets paused and resumed
while the migration process is already running. That will be addressed
in another patch.
Change-Id: Iec4e343f6c3cf39f36b339987d27c9c32b40c0a4
Bug-Url: https://bugzilla.redhat.com/2010478
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
This patch adds a Timer class to the concurrent module.
The class is based on and behaves pretty much the same
as the threading.Timer class, except that the thread
which carries out the target function is created with
concurrent.thread instead of the regular
threading.Thread. This makes the Timer class available
for use in vdsm, while still ensuring every thread is
created with concurrent.thread.
Unlike threading.Timer, this Timer doesn't inherit from the
threading.Thread class; instead it keeps the thread object as an
attribute.
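A minimal sketch of the idea, not the actual vdsm code, assuming
concurrent.thread(target, name=...) returns a thread object that has
not been started yet:

    import threading

    from vdsm.common import concurrent  # assumed module path


    class Timer:
        """Run a function after a given delay, like threading.Timer."""

        def __init__(self, interval, function, args=(), kwargs=None):
            self._interval = interval
            self._function = function
            self._args = args
            self._kwargs = kwargs or {}
            self._finished = threading.Event()
            # Keep the thread as an attribute instead of inheriting
            # from threading.Thread.
            self._thread = concurrent.thread(self._run, name="timer")

        def start(self):
            self._thread.start()

        def cancel(self):
            # Prevent the function from running if it did not run yet.
            self._finished.set()

        def _run(self):
            if not self._finished.wait(self._interval):
                self._function(*self._args, **self._kwargs)
            self._finished.set()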
Change-Id: I28f7f0a7f254088129964bc7d30e5fae846eb3fb
Signed-off-by: Filip Januska <fjanuska@redhat.com>
When using a block based scratch disk, keep the scratch disk info in
the Drive object during the backup.
Currently we have only the index, which can be used to register block
threshold events. We need to modify Engine to also send the scratch
disk domain, image, and volume ids so we can also extend them.
Change-Id: I7df8312386da2bc628efc7bf2fba6669c59f81d0
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Convert storageserver_test to pytest. Also remove __future__ imports,
which are not needed any more.
Change-Id: Ibf26c7a6af8e0ce047fdf9b51e6a1b25537bce69
Signed-off-by: Vojtěch Juránek <vjuranek@redhat.com>
Instead of using the parent class for connecting prepared connections,
return the child class from the prepare connection function and use it
for connecting. This finally delegates the connection to the sub-class;
after this change we can override the connect function in a sub-class
and implement parallel connection for that sub-class.
Mixing different types of connections is not supported, so we don't
have to care about this case.
Change-Id: Iee4d531496a3b1754db42941d8e0c9364d64dbdf
Bug-Url: https://bugzilla.redhat.com/1787192
Signed-off-by: Vojtěch Juránek <vjuranek@redhat.com>
Create a parent class for all storage server connections and move the
bulk connect and disconnect functionality there. This will allow child
classes to override these methods and implement a different way to
create multiple connections, e.g. creating them in parallel.
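An illustrative sketch of the intended structure, with hypothetical
class and method names (the real classes live in storageServer.py):

    from concurrent.futures import ThreadPoolExecutor


    class Connection:
        """Base class for storage server connections."""

        def connect(self):
            raise NotImplementedError

        @classmethod
        def connect_all(cls, connections):
            # Default bulk connect: one connection after another.
            return [con.connect() for con in connections]


    class NFSConnection(Connection):

        def connect(self):
            # The real implementation mounts the NFS export here.
            return 0

        @classmethod
        def connect_all(cls, connections):
            # A child class may override the bulk connect, for example
            # to connect in parallel.
            with ThreadPoolExecutor(max_workers=10) as executor:
                return list(executor.map(lambda con: con.connect(), connections))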
Change-Id: I11a2d9fee45b4251973cb6b8703c0d83b3b7f441
Bug-Url: https://bugzilla.redhat.com/1787192
Signed-off-by: Vojtěch Juránek <vjuranek@redhat.com>
Fix wrong test added in commit ceb07387f2
storage: move connectStorageOverIser() function into IscsiConnection
This is probably fixed in a pending patch. GitHub CI runs only the top
patch of the series, but we merge the earlier patches.
Change-Id: Ied480b4d808044d4cf0fb42c4adeda02469a5a44
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Encapsulating connectStorageOverIser() into the IscsiConnection object
allows us to simplify this method a little bit and also unifies the
connection of all storage servers. Now, all storage servers are
connected by calling the connect() method on the respective object,
without any need to call a pre-connect method specific to a given
storage server.
To check if initiatorName was passed in as a connection parameter from
the engine (which shouldn't be possible anyway as it's not covered in
IscsiConnectionParameters [1]) we need to create the iface object of
the connection. However, the implementation of iscsi.IscsiInterface
seems to be buggy and throws KeyError when initiatorName is not set,
so we need to check for KeyError when accessing it. This deserves a
fix and a whole module revision, but that is a different task than
BZ #1787192 and can be done later.
[1] https://github.com/oVirt/vdsm/blob/v4.50.0/lib/vdsm/api/vdsm-api.yml#L379
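A rough illustration of the workaround; the helper name and the
attribute access are hypothetical:

    def _has_initiator_name(iface):
        try:
            return iface.initiatorName is not None
        except KeyError:
            # iscsi.IscsiInterface raises KeyError when initiatorName
            # was never set; treat that as "not provided".
            return False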
Change-Id: I1549393cc75faf2d195d62c13f89491f07301d14
Bug-Url: https://bugzilla.redhat.com/1787192
Signed-off-by: Vojtěch Juránek <vjuranek@redhat.com>
When adding more CPUs to VMs with a CPU policy, Engine has to dedicate
the new CPUs to match the policy. For this, the API has to be extended
to allow passing the CPU sets for new VMs. Engine will decide whether
or not it can pass the argument based on the cluster version. For older
cluster versions this argument does not have to be specified even on
new VDSM, because VMs with a CPU policy cannot be used there.
To make the API more flexible, we expect that Engine passes the
configuration for all CPUs and not just the new ones. This potentially
allows Engine to also relocate the already assigned CPUs of the VM to
optimize the use of resources on the host. When decreasing the number
of CPUs there is nothing special to do for VMs with dedicated CPUs.
In all cases the shared CPU pool has to be updated and VMs without
dedicated CPUs reconfigured.
Change-Id: I3f05f4ae71101513df0c34f17245fdad290f9e20
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1782077
When migrating the VM we need to remove any CPU pinning that was
defined by VDSM. The reason is that the pinned CPUs may serve a
different purpose on the destination host (they may be dedicated to
another VM) or may not be present on the destination at all.
For VMs with no policy the pinning is simply dropped and will be filled
in on the destination. For VMs with manual CPU pinning or VMs that use
NUMA auto-pinning this affects only vCPUs that don't have any pinning
defined by the user and are using the shared CPU pool. Such
configuration is also removed and will be filled in again on the
destination.
For VMs with a policy we expect Engine to pass a new pinning
configuration to the source VDSM. The information is passed in the
"cpusets" parameter in the form of a list. Each item of the list
corresponds to a vCPU and contains a string with a cpuset definition.
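For illustration only; the actual values depend on what Engine computes
for the destination host:

    # One item per vCPU; each item is a cpuset string.
    cpusets = [
        "2",      # vCPU 0 pinned to pCPU 2
        "3",      # vCPU 1 pinned to pCPU 3
        "8-11",   # vCPU 2 may run on pCPUs 8-11
        "8-11",   # vCPU 3 may run on pCPUs 8-11
    ]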
Change-Id: I3de2d50e8ab26a8728beb662339fdbecb8aacf74
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1782077
A new function for running commands, run_command(), will replace
LVMCache.cmd().
The new function raises on failure (rc != 0), so there is no need to
check the rc all over the place in tests.
We also need to use pytest.raises now when a command failure is
expected.
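A sketch of the new test pattern; the import paths and the exact
run_command() signature are assumptions:

    import pytest

    from vdsm.storage import lvm


    def test_failing_command_raises():
        # run_command() raises on rc != 0, so there is no need to
        # inspect rc, out and err in the test.
        with pytest.raises(lvm.LVMCommandError):
            lvm.run_command(["vgs", "no-such-vg"])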
Signed-off-by: Roman Bednar <rbednar@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1536880
Change-Id: I4540270b32ecef5126f776baf9615e1b6c9ede4d
Modify the changeVGTags() flow to use the new run_command(), which
raises LVMCommandError.
The exceptions raised in this flow now inherit from LVMCommandError and
provide more details for better debugging.
Replace VolumeGroupReplaceTagError with ValueError where appropriate,
the same as was already done in changeLVsTags() here:
https://gerrit.ovirt.org/c/vdsm/+/116780/4/lib/vdsm/storage/lvm.py#1836
Signed-off-by: Roman Bednar <rbednar@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1536880
Change-Id: I536196c037f5cbe6565f32fd6c65b1ad51c614ce
lvmetad was removed in RHEL 8, but we could not remove the code
disabling it while we still supported Fedora 30. Remove the now
useless code.
Change-Id: I6d94e3621e5791abd45d537db86c9afd7cf76309
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Before VM CPU stats are available, Vdsm reports zero initial values
for them. ovirt-hosted-engine-ha relies on those stats when handling
the Engine VM. The initial fake VM stats may confuse Engine VM
monitoring and induce undesirable actions such as restarting the VM on
another host without a good reason.
There is no good way to distinguish the initial fake CPU stats from
real CPU stats on the Engine side. Let's add a new flag, cpuActual,
distinguishing the two cases. It is set to true when all the CPU
stats are based on actual measured values.
It would be better to simply omit the initial fake CPU stats. But we
must keep them for compatibility with Engine 4.2, which expects their
presence.
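An illustrative excerpt of the reported VM stats; only cpuActual is the
new field, the other keys follow the existing layout approximately:

    {
        "cpuUser": "0.00",
        "cpuSys": "0.00",
        "cpuUsage": "0",
        "cpuActual": False,  # initial fake values, not measured yet
        ...
    }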
Change-Id: I5adb1b01653b0029a30949ecb89219fde794dfd8
Bug-Url: https://bugzilla.redhat.com/2026263
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Creating a loop device with the --sector-size option may fail randomly
if the device has dirty pages from previous usage. This was fixed in
losetup from util-linux 2.37.1, but this version is not available in
Centos Stream 8.
Fixed by adding a retry loop, similar to the retry loop used internally
in losetup.
Here is an example of the random failure, fixed on the first retry:
[userstorage] WARNING Attempt 1/20 failed: losetup: /dev/loop5: set
logical block size failed: Resource temporarily unavailable
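A sketch of the retry approach; the helper name, options, and retry
counts are illustrative, not the actual code:

    import subprocess
    import time


    def create_loop_device(backing_file, sector_size, retries=20, delay=1.0):
        for attempt in range(1, retries + 1):
            try:
                return subprocess.run(
                    ["losetup", "--find", "--show",
                     "--sector-size", str(sector_size), backing_file],
                    check=True, capture_output=True, text=True,
                ).stdout.strip()
            except subprocess.CalledProcessError:
                if attempt == retries:
                    raise
                # losetup may fail while the kernel still flushes dirty
                # pages from previous device usage; wait and retry.
                time.sleep(delay)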
Change-Id: I285dedd09abd89e62152b887a3b05807c627041e
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Create and then remove a bond to load the bonding module.
Since we are already running inside a container we cannot simply use
"modprobe bonding". Fortunately, creating a bond through iproute2 loads
the module for us.
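The equivalent steps, sketched with subprocess; the bond name is
arbitrary and the real change lives in the CI setup:

    import subprocess

    # Creating a bond via iproute2 loads the bonding module as a side
    # effect; the bond itself is not needed, so remove it right away.
    subprocess.run(["ip", "link", "add", "bond-ci", "type", "bond"], check=True)
    subprocess.run(["ip", "link", "del", "bond-ci"], check=True)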
Change-Id: I0d894b6914a692fbb9ddda35543388926f9e66f4
Signed-off-by: Ales Musil <amusil@redhat.com>
IPv6 is disabled in docker containers on GH actions.
Enable IPv6 and remove skips for working tests.
Change-Id: I6cdc6a5b66e233a26723252e4cdd054415da4c02
Signed-off-by: Ales Musil <amusil@redhat.com>
Modify the removeLVs() flow to use the new run_command(), which raises
LVMCommandError.
The exceptions raised in this flow now inherit from LVMCommandError and
provide more details for better debugging.
The original error (CannotRemoveLogicalVolume) is used as a wrapper for
other errors, so we cannot change it to inherit from LVMCommandError,
which is what we need in removeLVs(). In this case we can add a new
exception for this flow - LogicalVolumeRemoveError.
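A simplified sketch of the resulting error handling, with stand-in
definitions rather than the actual vdsm code; run_command is passed in
only to keep the sketch self-contained:

    class LVMCommandError(Exception):
        """Stand-in for the real error carrying rc, out and err."""


    class LogicalVolumeRemoveError(LVMCommandError):
        """Flow specific error raised when removing LVs fails."""


    def removeLVs(vg_name, lv_names, run_command):
        cmd = ["lvremove", "-f"] + [
            "{}/{}".format(vg_name, lv) for lv in lv_names]
        try:
            run_command(cmd)
        except LVMCommandError as e:
            # Wrap the command failure in the flow specific error.
            raise LogicalVolumeRemoveError(str(e)) from e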
Signed-off-by: Roman Bednar <rbednar@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1536880
Change-Id: Ibca33d32dbdddc5cfff914a85807f5c688670592
The HSM module is huge and contains various functions related only to
some other module, e.g. helpers for connecting to storage servers.
Move these functions partially into the storageServer module, and those
related only to iSCSI connections into the iscsi module.
Change-Id: Ie139f3d0d99fc01ab51d8fa95f9ce6637beba326
Bug-Url: https://bugzilla.redhat.com/1787192
Signed-off-by: Vojtěch Juránek <vjuranek@redhat.com>
When the device is a multipath device, the udev link points to the
actual path (/dev/sda) during early boot. The link is updated to point
to the multipath device only later. This creates a race during early
boot when lvm can grab the device before multipath.
Since lvm2-2.03.14-1.el8.x86_64 (Centos Stream 8) oVirt node boot breaks
when using "stable" udev links. According to David Teigland:
Using a link with the PVID may have sort of worked in the past, but
it probably should not have. I'd call it accidental, it's depending
on a quirk of udev processing, not anything in lvm itself.
The change in lvm may be reverted, but I think we should get rid of the
udev "stable" links anyway.
This change reverts commit db13e4bc58
lvmfilter: Use /dev/disk/by-id/lvm-pv-uuid devlinks for pv naming
but it is not possible to simply revert the commit since additional
code was added based on that commit. Instead we changed the behavior:
1. When computing a filter, use the device names reported by lvm.
This fixes the attached bug. When lvm reports /dev/mapper/xxx this is
the device that will be used in the filter.
2. When analyzing an existing filter, recommend replacing a filter
including "stable" udev links with a filter including the device names.
The original bug[1] will be solved in 4.5 by using the new lvm devices
feature[2].
[1] https://bugzilla.redhat.com/1635614
[2] https://bugzilla.redhat.com/2012830
Change-Id: I538e4a078dfba2ba28408f6e2178ca5082ed808b
Bug-Url: https://bugzilla.redhat.com/2016173
Bug-Url: https://bugzilla.redhat.com/2026370
Related-to: https://bugzilla.redhat.com/2026640
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We used to call qemu-ga directly with the guest-get-disks command to
get the names and serial numbers of all disks in the VM. We can now use
the API provided by libvirt (since 7.3.0).
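A rough sketch of the libvirt-based approach; the exact keys in the
returned mapping should be checked against the libvirt documentation
for virDomainGetGuestInfo():

    import libvirt


    def guest_disks(dom):
        # Returns a flat mapping such as {"disk.count": 2,
        # "disk.0.name": "/dev/vda", "disk.0.serial": "...", ...}
        return dom.guestInfo(libvirt.VIR_DOMAIN_GUEST_INFO_DISKS, 0)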
Change-Id: I938c049af930682c37db60956b16330424bb546e
Bug-Url: https://bugzilla.redhat.com/1919857
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
NUMA code is extended to build a list of CPUs on a core for each core
available. This is then used to build a list of all CPUs in a shared
pool -- i.e. CPUs available to all VMs without any specific CPU policy.
The list is built in a way that:
- CPUs of VMs with manual CPU pinning or NUMA auto-pinning policy are
included in the shared pool
- CPUs of VMs with dedicated policy are excluded from the shared pool
- CPUs of VMs with isolate-threads or siblings policy are excluded from
the shared pool, and all their siblings as well, so that whole cores
are removed from the shared pool and left exclusive to the particular
VM
The updates need to be exclusive and cannot run concurrently, otherwise
the assigned CPU sets may be wrong. Two racing VM.destroy calls are
problematic because we could fail to increase the shared pool, so the
configured CPU set would be smaller than it could be for some VMs.
Racing VM.destroy and VM.create is much more problematic as it can lead
to situations where a dedicated CPU would be used by other VMs.
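A simplified sketch of the computation described above; the data
structures are illustrative, not the actual vdsm internals:

    def shared_pool(all_cpus, core_siblings, vms):
        """
        all_cpus: set of all online pCPU ids.
        core_siblings: mapping of each pCPU id to the set of CPUs on
            the same core (including itself).
        vms: iterable of (policy, pinned_cpus) pairs.
        """
        pool = set(all_cpus)
        for policy, pinned in vms:
            if policy in ("none", "pin"):
                # Manual pinning / NUMA auto-pinning stays in the pool.
                continue
            elif policy == "dedicated":
                pool -= pinned
            elif policy in ("siblings", "isolate-threads"):
                # Remove whole cores, leaving them exclusive to the VM.
                for cpu in pinned:
                    pool -= core_siblings[cpu]
        return pool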
Change-Id: Ife3797cda4419ecd153a136dea9fe35663f07f18
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1782077
The _numa() call is cached (memoized) based on the arguments used when
calling the function. The _numa() function optionally took libvirt
capabilities as an argument. This argument, however, was never used
except in tests, meaning that _numa() was normally called without
arguments, in which case libvirt capabilities were evaluated only the
first time the function was called (by the _numa() function itself).
A recent patch changed how we treat _numa() in the Host.getCapabilities
API call. We fetch fresh libvirt capabilities and pass them as an
argument to _numa(). This causes two problems. First, it now causes a
small leak because the cache is allowed to grow indefinitely. Secondly,
the results of the re-evaluated _numa() call are not available to the
rest of the VDSM code that calls _numa() without arguments (e.g. in
sampling.py for every sample).
The first problem could be easily solved by changing the memoizing
decorator to functools.lru_cache(maxsize=N). But to also solve the
second problem, the caching cannot be done based on arguments. We need
something special, because we don't want to fetch the libvirt
capabilities every time we need to call _numa(), and we also don't want
to re-evaluate _numa() on every call.
This patch removes the @cache.memoized decorator and creates caching in
numa.py. There is a separate update call that fetches fresh libvirt
capabilities. The capabilities are examined only when they change.
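A minimal sketch of the caching approach; the real code in numa.py is
more involved, and _parse_topology() is only a placeholder:

    _last_capabilities = None
    _topology = None


    def _parse_topology(capabilities_xml):
        # Placeholder for the real parsing of the capabilities XML.
        return capabilities_xml


    def update(capabilities_xml):
        # Called with freshly fetched libvirt capabilities, e.g. from
        # Host.getCapabilities; re-parse only when they changed.
        global _last_capabilities, _topology
        if capabilities_xml != _last_capabilities:
            _last_capabilities = capabilities_xml
            _topology = _parse_topology(capabilities_xml)


    def topology():
        # Callers without fresh capabilities (e.g. sampling) get the
        # last cached result without any libvirt call.
        return _topology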
Change-Id: I9f7bc5596fb3b3c25dcf585e3d3a07a3a1a86a9e
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
This is the first patch of the CPU policies series. We should have the
CPU policy in the VM metadata. If it is not there, it has to be
detected and stored there so that we know it on recovery. Otherwise,
when we start managing the vCPUs and add pinning, it may be falsely
mistaken for manual CPU pinning. For the same reason we also store
which vCPUs are manually pinned. Later, the vCPUs without pinning will
be using pCPUs from the shared pool, and we need to remember which
vCPUs were pinned by the user and which have pinning defined by VDSM.
The defined policy names are:
- none: no policy defined, CPUs from the shared pool will be used
- pin: manual CPU pinning or NUMA auto-pinning policy
- dedicated: each vCPU is pinned to a single pCPU that cannot be used
by any other VM
- siblings: like dedicated, but the physical cores used by the VM are
blocked from use by other VMs
- isolate-threads: like siblings, but only one vCPU can be assigned to
each physical core
The related feature page for the policies is:
https://ovirt.org/develop/release-management/features/virt/dedicated-cpu.html
Change-Id: I7bc7bad82a20d47d06135c82fa58572a2327badd
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1782077
The element value does not have to correspond to the real vCPU count.
It is the maximum number of vCPUs a VM can have, but the actual count
may be lower to make CPU hot-plugging possible. The actual count is
specified in the (optional) attribute "current". Only if the attribute
is not present is the element value the real count.
The original implementation returned None in case there was no <vcpu>
element present. But a missing <vcpu> suggests a broken domain XML, so
now we raise an exception instead.
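A sketch of the parsing rule described above:

    import xml.etree.ElementTree as etree


    def vcpu_count(dom_xml):
        vcpu = etree.fromstring(dom_xml).find("./vcpu")
        if vcpu is None:
            # Missing <vcpu> suggests a broken domain XML.
            raise RuntimeError("Domain XML has no <vcpu> element")
        # "current" is the actual vCPU count; the element value is only
        # the maximum, used when "current" is not present.
        return int(vcpu.get("current", vcpu.text))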
Change-Id: I67835dc2b05cb11cb7c0fcb6d0ce1c802f968c28
Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Read-only mode has not been useful since RHEL 8. Remove tests for
read-only mode and changes to read-only mode during tests.
Change-Id: I368cbc3285cdcfb2965ea411528b107c53946262
Bug-Url: https://bugzilla.redhat.com/2025527
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This option is deprecated and useless since RHEL 8. We could not remove
it in the past since we had to support older versions of lvm on Fedora.
It looks like this option became harmful in lvm2-2.03.14-1.el8.x86_64,
which converts locking_type=4 to --readonly. We see this failure in the
vdsm tests:
WARNING: locking_type (4) is deprecated, using --sysinit --readonly.
Operation prohibited while --readonly is set.
Can\'t get lock for b4512b9d-84dc-43ba-865d-32c4a1cd148a.
and the lvm command fails.
Converting locking_type=4 to --readonly does not look correct, so this
is likely a regression in lvm. But we should not use locking_type in
vdsm. This option is used only in tests using read only mode, which is
never used in the real application.
As a quick fix to unbreak the tests, remove the locking_type
configuration. We need to remove the tests for read only mode and remove
the entire read only mode feature later.
Change-Id: Ia9af81756c07c26517805633af5d90d523e60fe7
Bug-Url: https://bugzilla.redhat.com/2025527
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Replace the call to blockInfo() with DriveMonitor.get_block_stats(),
which returns block info for all volumes. To extract the block stats
for the drive's active volume we need the index of the drive.
We get block stats only if we have drives that should be extended, or
when we pre-extend a drive when starting replication.
getExtendInfo() was modified to amend block info from libvirt with
information from the replica in case the drive is not chunked but
replicating to a chunked replica. This method should be removed once we
start using libvirt block stats for the replica drive.
Change-Id: I5600bedf886be993233df67dcad39078e7c920c8
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Since we stopped calling getExtendInfo() during live merge, we don't
need to mock blockInfo().
Change-Id: Ia9f78d37b8eb4673a9bf0f7ef09fa4ae70a15a18
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When extending the base drive before live merge we used
Vm.getExtendInfo() to get the current block info. This is wrong for
several reasons:
- The function returns block info for the drive's active volume,
instead of the base volume for the merge, which is never the active
volume.
- The function replaces Drive.blockinfo with the new block info, which
is an unwanted side effect when we try to extend the base volume.
- It calls libvirt for no reason.
Since we need the volume capacity, add the capacity to the job.extend
dict when starting a merge, so we don't need to get it from the storage
API on each call to _start_extend().
Change-Id: Ic4b6dcccab4336e72a000b511d0e10c003eac9c5
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When extending volumes, we use libvirt.virDomain.blockInfo() to get
the volume allocation. This API is easy to use, but it does not work
for backup scratch disks, volumes in the backing chain, or the
blockCopy destination volume.
We want to replace usage of libvirt.virDomain.blockInfo() with
libvirt.virConnect.domainListGetStats(), which works for all block
nodes, including backup scratch disks [1] (since RHEL 8.6).
Vdsm already collects block stats for sampling purposes, but we cannot
use this code:
- We need the allocation info at the time of the call, and sampling
collects values only every 15 seconds.
- Sampling does not collect info for the backing chain (and should not)
but we must collect info for the backing chain.
- Sampling collects all stats while we need only block stats.
- Sampling collects stats for all VMs while we need only a single VM.
- Sampling skips non-responsive VMs, while we don't skip blockInfo()
calls.
- Drive monitoring cannot depend on sampling, a subsystem with different
requirements (best effort) and maintained by different teams.
Using libvirt.virConnect.domainListGetStats() is tricky, since it
requires a libvirt.virDomain object as a parameter, and this object is
wrapped by the VM._dom object. Since the VM owns the _dom object, it is
natural to provide a method to get block stats in the VM object:
VM.get_block_stats().
Another problem with libvirt.virConnect.domainListGetStats() is the
unhelpful return value, a flat mapping of "block.N.KEY" to VALUE for
all block nodes:
{
    ...
    "block.0.fl.times": 0,
    "block.1.name": "sda",
    "block.1.path": "/rhev/.../44d498a1-54a5-4371-8eda-02d839d7c840",
    "block.1.backingIndex": 2,
    "block.1.rd.reqs": 13448,
    "block.1.rd.bytes": 415614976,
    "block.1.rd.times": 9940902315,
    "block.1.wr.reqs": 4909,
    "block.1.wr.bytes": 82999296,
    "block.1.wr.times": 47469574949,
    "block.1.fl.reqs": 683,
    "block.1.fl.times": 4204366339,
    "block.1.allocation": 216006656,
    "block.1.capacity": 6442450944,
    "block.1.physical": 7113539584,
    "block.1.threshold": 6576668672,
    "block.2.name": "sda",
    ...
}
There is a lot of unneeded information for the monitoring context, and
no easy way to extract the single value we actually need. A new method,
DriveMonitor.get_block_stats(), was added, extracting this info in a
usable form:
{
    2: drivemonitor.BlockInfo(
        index=2,
        name='sda',
        path='/rhev/.../44d498a1-54a5-4371-8eda-02d839d7c840',
        allocation=216006656,
        capacity=6442450944,
        physical=7113539584,
        threshold=6576668672,
    ),
    ...
}
We will use this in extend flows to fetch block info for all volumes
when we try to extend drives.
[1] https://bugzilla.redhat.com/2017928
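A sketch of how the flat mapping can be converted into the form above;
the real BlockInfo and conversion live in drivemonitor.py, and the
field handling here is simplified:

    from collections import namedtuple

    BlockInfo = namedtuple(
        "BlockInfo",
        "index name path allocation capacity physical threshold")


    def block_stats_by_index(bulk_stats):
        result = {}
        for i in range(bulk_stats.get("block.count", 0)):
            # Use backingIndex when reported, otherwise the position.
            index = bulk_stats.get(f"block.{i}.backingIndex", i)
            result[index] = BlockInfo(
                index=index,
                name=bulk_stats.get(f"block.{i}.name"),
                path=bulk_stats.get(f"block.{i}.path"),
                allocation=bulk_stats.get(f"block.{i}.allocation", 0),
                capacity=bulk_stats.get(f"block.{i}.capacity", 0),
                physical=bulk_stats.get(f"block.{i}.physical", 0),
                threshold=bulk_stats.get(f"block.{i}.threshold", 0),
            )
        return result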
Change-Id: I8cdaaaf56c9f1e078809e4400a86219fc8086c41
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Some code needs access to the underlying libvirt.virDomain object.
Make a public accessor so we don't access private attributes.
The underlying virDomain is needed when calling
libvirt.virConnect.domainListGetStats(), which requires a list of
virDomain objects. It should not be used for anything else.
Change-Id: Iff39fb25d218d73d3ee8493a5c052a55d3270013
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
FakeVm was using FakeLogger, which hides all the logs emitted via the
vm logger. The fake logger class should be used only for testing
logging; when running tests we want to see real logs when a test fails.
Change-Id: I69fbb822fd1e33a4a53bf5060afe860459c199de
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When we start a backup, we need to parse the libvirt backup xml to get
the index of the backup scratch disk. This requires that we have a
backup xml for every test running backup.start_backup(). We have 22
invocations, so setting this manually is not the way to go.
FakeDomainAdapter now generates the backup xml from the backup_xml
argument to backupBegin(). Since we always have a backup xml, we can
verify that the backup was started correctly by comparing the backup
xml instead of keeping and comparing the input xml.
A similar change is needed for verifying the checkpoint xml.
Change-Id: I64a5202b8f1ef3cad0092c6949c0a32c65fb9f10
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When generating the backup input xml, use transientdisk.disk_path().
The cleanest way to do this is to use temporary variables and include
them in the xml using an f-string (introduced in Python 3.6).
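A sketch of the pattern; the element names, the import path, and the
transientdisk.disk_path() arguments are illustrative:

    from vdsm.storage import transientdisk  # assumed import path

    scratch_disk = transientdisk.disk_path("backup-owner", "scratch-sda")

    backup_xml = f"""
        <domainbackup mode='pull'>
            <disks>
                <disk name='sda' type='file'>
                    <scratch file='{scratch_disk}'/>
                </disk>
            </disks>
        </domainbackup>
        """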
Change-Id: Id7e15e59be1c9208fab5e00c5c82d6afce6cf544
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
DOMAIN_ID was used only once, when generating drives info.
VOLUME_ID was used only once, to create drives with the same volume id,
which is an invalid configuration.
We now create a new uuid instead of using a global.
Change-Id: I3171ae0e3f125ff361b458e621a1780c9b89d4ed
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Pass the fake vm to all the helper functions that need to access the
drive list. This will make it possible to create different drive
configurations, for example block based drives that need to be extended
during backup.
Now that the fake vm is available in the helper functions, we can use
its id attribute instead of duplicating it.
Change-Id: I5c67319b0f27a0ae74c586e3b90dff565e843d85
Bug-Url: https://bugzilla.redhat.com/1913387
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We have been using the vm_kill_paused_time config option to specify
the time after which we can kill VMs with the "kill" resume behavior
that are paused due to an I/O error. If a user modifies the sanlock
timeout settings, the option must be adjusted accordingly.
The sanlock I/O timeout is now configurable using the sanlock.io_timeout
option. The VM killing timeout can be computed directly from it by
multiplying it by 8, and should no longer be taken from a different
option. This patch removes the vm_kill_paused_time option and computes
the corresponding value from io_timeout.
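The relation described above, as a sketch; the config accessor follows
the usual vdsm pattern and should be treated as an assumption:

    from vdsm.config import config


    def vm_kill_paused_timeout():
        # sanlock gives up the lease after roughly 8 * io_timeout
        # seconds, so only then is it safe to kill paused VMs with the
        # "kill" resume behavior.
        return 8 * config.getint("sanlock", "io_timeout")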
Change-Id: Icaa097008544c280da0f6122f0bb378cc14b873c
Bug-Url: https://bugzilla.redhat.com/2010205
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
The DHCP monitoring was recently switched to use netlink monitoring,
which makes the whole process simpler. However, there was a major
oversight. The DHCP monitor runs in an unprivileged context (vdsmd),
which brings two issues:
1) We cannot set up the source route rules, because that needs to be
done in the root context (supervdsmd).
2) The pool of monitored items is operated by supervdsmd, and vdsmd
couldn't see the available items. That resulted in skipping every
valid opportunity to notify Engine about a new IP and to create the
source route rules.
To fix that, the dhcp monitor is kept in vdsmd, but the monitoring
check and removal are delegated to supervdsmd. The same goes for the
setup of the source route rules.
Change-Id: I0481d5badfe2929a112fb47a945cbe7395341a71
Signed-off-by: Ales Musil <amusil@redhat.com>
During vdsm shutdown, we must keep storage alive, since VMs or image
transfers may still use the storage domain. We had a special check for
the host id, keeping it alive during shutdown, but we were missing a
similar check for teardown.
In the past StorageDomain.teardown() was not effective, but during 4.4
we fixed it several times, and now it really tears down the storage
domain: shutting down vdsm deactivates entire volume groups and removes
the device mapper devices for the logical volumes.
When logical volumes are used, we see these errors during shutdown:
2021-11-14 14:00:03,911+0200 INFO (monitor/313e6d7) [storage.blocksd]
Tearing down domain 313e6d78-80f7-41ab-883b-d1bddf77a5da (blockSD:996)
2021-11-14 14:00:03,911+0200 DEBUG (monitor/313e6d7) [common.commands]
/usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/sbin/lvm vgchange
--config 'devices ... --available n 313e6d78-80f7-41ab-883b-d1bddf77a5da
(cwd None) (commands:154)
2021-11-14 14:00:09,114+0200 DEBUG (monitor/313e6d7) [common.commands]
FAILED: <err> = b' Logical volume 313e6d78-80f7-41ab-883b-d1bddf77a5da/ids
in use.\n Can\'t deactivate volume group "313e6d78-80f7-41ab-883b-d1bddf77a5da"
with 1 open logical volume(s)\n'; <rc> = 5 (commands:186)
If we have logical volumes in use, tearing down the storage domain will
leave them active, so running VMs and active image transfers are safe.
However, failed LVM commands are retried several times, which slows
down the shutdown process, and shutting down is likely to time out.
I think this may be related to the hosted engine local maintenance
issue.
Change-Id: Ic2a2d219868d869eb946047f6cdafeffc17704fb
Bug-Url: https://bugzilla.redhat.com/2023344
Related-to: https://bugzilla.redhat.com/1986732
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
It seems that OST is broken - it merged a patch without CI+1, and now
CI fails on master.
Change-Id: I7f50936e51fbf49ff942007f615b957041e63533
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
During live merge we call syncVolumeChain multiple times to make sure
the actual/libvirt chain is synced to the current/vdsm chain.
When the pivot starts, the new requested chain is passed to
syncVolumeChain, which compares it to the current vdsm chain. This way
we can tell which volume is being removed.
If the volume being removed is a leaf/active layer, it is flagged as
ILLEGAL in vdsm to prevent usage.
Then the libvirt block job abort is started, and if it fails the old
code never recovered the volume from the ILLEGAL state, so manual
intervention was required.
This patch adds a helper to switch the leaf volume back to LEGAL and
calls the helper if libvirt fails to abort the block job.
Signed-off-by: Roman Bednar <rbednar@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1949475
Change-Id: Ia57c26529cf60d381d3df143a4f7195948ce1cea
Sync volume chain needs to be able to recover the top volume legality.
This sync function can be used after libvirt fails the pivot, to make
sure the top volume in the vdsm chain is not left in the ILLEGAL state.
Signed-off-by: Roman Bednar <rbednar@redhat.com>
Bug-Url: https://bugzilla.redhat.com/1949475
Change-Id: I3b3fd83fa0fd9fa90ac9b330f4454b2916eee4c8