Friday, October 6, 2017

Cinder volume stuck in detaching state

Since our migration to Newton, some volumes cannot be detached from their server:
[root@controller ~]# openstack volume list --all
+--------------------------------------+--------------+-----------+------+---------------------------------------------------------------+
| ID                                   | Display Name | Status    | Size | Attached to                                                   |
+--------------------------------------+--------------+-----------+------+---------------------------------------------------------------+
...
| 60304d1e-aa57-11e7-9c40-b3ff0b0a5974 | V_NAME       | detaching |  100 | Attached to 23b19384-aa57-11e7-88a7-03b3c5fe3969 on /dev/vdg  |
...
The error logged on the hypervisor (in the nova-compute log) is quite explicit:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 75, in wrapped
    function_name, call_dict, binary)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 66, in wrapped
    return f(self, context, *args, **kw)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 216, in decorated_function
    kwargs['instance'], e, sys.exc_info())
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 204, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4856, in detach_volume
    attachment_id=attachment_id)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4786, in _detach_volume
    connection_info = jsonutils.loads(bdm.connection_info)
  File "/usr/lib/python2.7/site-packages/oslo_serialization/jsonutils.py", line 241, in loads
    return json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/encodeutils.py", line 39, in safe_decode
    raise TypeError("%s can't be decoded" % type(text))
TypeError: <type 'NoneType'> can't be decoded
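The last frames give the root cause: jsonutils.loads is handed bdm.connection_info, which is None (a NULL column in the database), and safe_decode refuses anything that is neither text nor bytes. The stdlib-only sketch below re-creates that failure path. safe_decode_sketch and loads_sketch are hypothetical, heavily simplified stand-ins for the oslo functions: they keep only the type check and error message visible in the traceback, not the rest of the real implementation.

```python
import json


def safe_decode_sketch(text, incoming="utf-8"):
    """Hypothetical, simplified stand-in for oslo_utils.encodeutils.safe_decode."""
    if not isinstance(text, (str, bytes)):
        # Same check and message as the last frame of the traceback.
        raise TypeError("%s can't be decoded" % type(text))
    if isinstance(text, bytes):
        return text.decode(incoming)
    return text


def loads_sketch(s):
    """Hypothetical, simplified stand-in for oslo_serialization.jsonutils.loads."""
    return json.loads(safe_decode_sketch(s))


# A healthy block_device_mapping row: connection_info holds a JSON document.
print(loads_sketch('{"driver_volume_type": "iscsi"}')["driver_volume_type"])

# A broken row: connection_info is NULL in the database, i.e. None in Python.
try:
    loads_sketch(None)
except TypeError as exc:
    print("TypeError:", exc)
```

On Python 3 the type is printed as <class 'NoneType'> rather than the Python 2 <type 'NoneType'> seen in the log; the failure is the same.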



After replaying the Python code executed by _detach_volume, we found that the block device mapping, and thus its connection information, is fetched with the following query:
SELECT block_device_mapping.created_at AS block_device_mapping_created_at,
  block_device_mapping.updated_at AS block_device_mapping_updated_at,
  block_device_mapping.deleted_at AS block_device_mapping_deleted_at,
  block_device_mapping.deleted AS block_device_mapping_deleted,
  block_device_mapping.id AS block_device_mapping_id,
  block_device_mapping.instance_uuid AS block_device_mapping_instance_uuid,
  block_device_mapping.source_type AS block_device_mapping_source_type,
  block_device_mapping.destination_type AS block_device_mapping_destination_type,
  block_device_mapping.guest_format AS block_device_mapping_guest_format,
  block_device_mapping.device_type AS block_device_mapping_device_type,
  block_device_mapping.disk_bus AS block_device_mapping_disk_bus,
  block_device_mapping.boot_index AS block_device_mapping_boot_index,
  block_device_mapping.device_name AS block_device_mapping_device_name,
  block_device_mapping.delete_on_termination AS block_device_mapping_delete_on_termination,
  block_device_mapping.snapshot_id AS block_device_mapping_snapshot_id,
  block_device_mapping.volume_id AS block_device_mapping_volume_id,
  block_device_mapping.volume_size AS block_device_mapping_volume_size,
  block_device_mapping.image_id AS block_device_mapping_image_id,
  block_device_mapping.no_device AS block_device_mapping_no_device,
  block_device_mapping.connection_info AS block_device_mapping_connection_info,
  block_device_mapping.tag AS block_device_mapping_tag
  FROM block_device_mapping WHERE block_device_mapping.deleted = 0
  AND block_device_mapping.volume_id = '60304d1e-aa57-11e7-9c40-b3ff0b0a5974'
  AND block_device_mapping.instance_uuid = '23b19384-aa57-11e7-88a7-03b3c5fe3969'
  LIMIT 1 OFFSET 0


The problem is that, in our case, removing the 'LIMIT 1 OFFSET 0' clause returns several rows:
SELECT device_name, connection_info, block_device_mapping.updated_at
  FROM block_device_mapping
  WHERE block_device_mapping.deleted = 0
  AND block_device_mapping.volume_id = '60304d1e-aa57-11e7-9c40-b3ff0b0a5974'
  AND block_device_mapping.instance_uuid = '23b19384-aa57-11e7-88a7-03b3c5fe3969'
...
| /dev/vdd    | NULL |
| /dev/vdf    | NULL |
| /dev/vdg    | {"driver_volume_type": "iscsi", "connector": {"platform": "x86_64", "host": "node2.example.com", "do_local_attach": false, "ip": "192.168.1.32", "os_type": "linux2", "multipath": false, "initiator": "iqn.1994-05.com.redhat:b0aced88ee0"}, "serial": "60304d1e-aa57-11e7-9c40-b3ff0b0a5974", "data": {"access_mode": "rw", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2010-10.org.openstack:volume-60304d1e-aa57-11e7-9c40-b3ff0b0a5974", "target_portal": "192.168.1.20:3260", "volume_id": "60304d1e-aa57-11e7-9c40-b3ff0b0a5974", "target_lun": 0, "device_path": "/dev/disk/by-path/ip-192.168.1.20:3260-iscsi-iqn.2010-10.org.openstack:volume-60304d1e-aa57-11e7-9c40-b3ff0b0a5974-lun-0", "auth_password": "MiDgLw0xY6gmARjL", "auth_username": "8uQrIvTyAu5XvPonVWo5", "auth_method": "CHAP"}} |
3 rows in set (0,00 sec)
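The effect of these duplicate rows on a LIMIT 1 lookup without ORDER BY can be reproduced with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the in-memory table keeps just the columns used in this post (not Nova's real block_device_mapping schema), the JSON is shortened, and SQLite stands in for the production database. SQL does not guarantee which matching row comes back first; in this SQLite example it is simply the first row inserted.

```python
import sqlite3

VOLUME_ID = "60304d1e-aa57-11e7-9c40-b3ff0b0a5974"
INSTANCE_UUID = "23b19384-aa57-11e7-88a7-03b3c5fe3969"

conn = sqlite3.connect(":memory:")
# Reduced schema (assumption): only the columns this post relies on.
conn.execute(
    "CREATE TABLE block_device_mapping ("
    " id INTEGER PRIMARY KEY, instance_uuid TEXT, volume_id TEXT,"
    " device_name TEXT, connection_info TEXT, deleted INTEGER DEFAULT 0)"
)
# Same three rows as the output above; the JSON document is shortened.
conn.executemany(
    "INSERT INTO block_device_mapping"
    " (instance_uuid, volume_id, device_name, connection_info)"
    " VALUES (?, ?, ?, ?)",
    [
        (INSTANCE_UUID, VOLUME_ID, "/dev/vdd", None),
        (INSTANCE_UUID, VOLUME_ID, "/dev/vdf", None),
        (INSTANCE_UUID, VOLUME_ID, "/dev/vdg", '{"driver_volume_type": "iscsi"}'),
    ],
)

LOOKUP = (
    "SELECT device_name, connection_info FROM block_device_mapping"
    " WHERE deleted = 0 AND volume_id = ? AND instance_uuid = ?"
)
params = (VOLUME_ID, INSTANCE_UUID)

all_rows = conn.execute(LOOKUP, params).fetchall()
# Same filter plus the LIMIT 1 OFFSET 0 clause that ends the Nova query above.
first = conn.execute(LOOKUP + " LIMIT 1 OFFSET 0", params).fetchone()

print(len(all_rows))  # 3: the stale rows still match the filter
print(first)          # ('/dev/vdd', None): an unusable row is picked
```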


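Since the query has no ORDER BY clause, 'LIMIT 1 OFFSET 0' keeps whichever matching row the database returns first. Here, as the traceback shows, that is one of the stale rows whose connection_info is NULL, so it cannot be used. To fix the issue, we removed the broken entries: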
DELETE FROM block_device_mapping
  WHERE volume_id = '60304d1e-aa57-11e7-9c40-b3ff0b0a5974'
  AND block_device_mapping.instance_uuid = '23b19384-aa57-11e7-88a7-03b3c5fe3969'
  AND block_device_mapping.deleted = 0
  AND block_device_mapping.connection_info IS NULL
Query OK, 2 rows affected (0,01 sec)
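The IS NULL test in the DELETE matters: in SQL, connection_info = NULL evaluates to NULL (unknown) rather than true, even for rows whose value is NULL, so a filter written that way deletes nothing. The sketch below checks this on a reduced in-memory SQLite table (hypothetical schema, shortened JSON), counting the rows each variant would select before removing the stale ones.

```python
import sqlite3

VOLUME_ID = "60304d1e-aa57-11e7-9c40-b3ff0b0a5974"
INSTANCE_UUID = "23b19384-aa57-11e7-88a7-03b3c5fe3969"

conn = sqlite3.connect(":memory:")
# Reduced schema (assumption), not Nova's real table definition.
conn.execute(
    "CREATE TABLE block_device_mapping ("
    " id INTEGER PRIMARY KEY, instance_uuid TEXT, volume_id TEXT,"
    " device_name TEXT, connection_info TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO block_device_mapping"
    " (instance_uuid, volume_id, device_name, connection_info)"
    " VALUES (?, ?, ?, ?)",
    [
        (INSTANCE_UUID, VOLUME_ID, "/dev/vdd", None),
        (INSTANCE_UUID, VOLUME_ID, "/dev/vdf", None),
        (INSTANCE_UUID, VOLUME_ID, "/dev/vdg", '{"driver_volume_type": "iscsi"}'),
    ],
)
params = (VOLUME_ID, INSTANCE_UUID)


def count(predicate):
    """Count non-deleted rows of this volume/instance pair matching predicate."""
    sql = ("SELECT COUNT(*) FROM block_device_mapping WHERE deleted = 0"
           " AND volume_id = ? AND instance_uuid = ? AND " + predicate)
    return conn.execute(sql, params).fetchone()[0]


print(count("connection_info = NULL"))   # 0: "= NULL" is never true
print(count("connection_info IS NULL"))  # 2: the two stale rows

# The corrected cleanup: only the stale rows are removed.
cur = conn.execute(
    "DELETE FROM block_device_mapping WHERE deleted = 0"
    " AND volume_id = ? AND instance_uuid = ? AND connection_info IS NULL",
    params,
)
print(cur.rowcount)  # 2
remaining = conn.execute("SELECT device_name FROM block_device_mapping").fetchall()
print(remaining)     # [('/dev/vdg',)]
```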


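Once the query has completed, reset the volume's state with the cinder reset-state command (for a volume that is still attached to its server, the target state would typically be in-use, for example cinder reset-state --state in-use 60304d1e-aa57-11e7-9c40-b3ff0b0a5974). The volume can then be detached successfully.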
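Since several volumes were affected after the migration, it may be worth looking for other (volume_id, instance_uuid) pairs with more than one non-deleted row before users hit the same error. A sketch of such a check follows, run against a hypothetical reduced SQLite table with made-up data. The volume_id IS NOT NULL filter skips block devices that are not volumes (an image-backed root disk, for instance), which would otherwise group together. The same GROUP BY ... HAVING query should be portable to the real table, but that is an assumption to verify on your own database first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema (assumption); the rows below are made up for illustration.
conn.execute(
    "CREATE TABLE block_device_mapping ("
    " id INTEGER PRIMARY KEY, instance_uuid TEXT, volume_id TEXT,"
    " device_name TEXT, connection_info TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO block_device_mapping"
    " (instance_uuid, volume_id, device_name, connection_info, deleted)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("inst-a", "vol-1", "/dev/vdb", None, 0),  # stale duplicate
        ("inst-a", "vol-1", "/dev/vdc", "{}", 0),  # valid attachment
        ("inst-a", None, "/dev/vda", None, 0),     # non-volume disk
        ("inst-a", None, "/dev/vdd", None, 0),     # another non-volume disk
        ("inst-b", "vol-2", "/dev/vdb", "{}", 0),  # healthy attachment
        ("inst-b", "vol-3", "/dev/vdc", None, 6),  # soft-deleted (non-zero), ignored
    ],
)

# Pairs that a LIMIT 1 lookup would resolve ambiguously.
DUPLICATES = """
SELECT volume_id, instance_uuid, COUNT(*) AS n
  FROM block_device_mapping
 WHERE deleted = 0 AND volume_id IS NOT NULL
 GROUP BY volume_id, instance_uuid
HAVING COUNT(*) > 1
"""

for volume_id, instance_uuid, n in conn.execute(DUPLICATES):
    print(volume_id, instance_uuid, n)  # vol-1 inst-a 2
```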
