Load Balancing VIO clients
potatoman:
Hi,
Looking for best practices and information on how to calculate the amount of Mem/CPU to allocate for your VIO.
I have recently load balanced the VIO clients by running chpath -l hdiskX -p vscsiX -a priority=1/2. Do I need to reboot the client for this to take effect? Also, if I have 2 VIO servers serving a client and I reboot number 1, would number 2 keep the connections active after number 1 is back up, or would it load balance on the fly?
I am a little new to VIO, please be gentle.
Michael:
My apologies for taking so long to reply - "Show unread posts" failed me.
Anyway, chpath should take effect immediately, without a reboot of the client, and paths should restore automatically.
But, I'll need to check that - I do not know all the parameters (attributes) from memory. I am going to try and look on my IVM based systems - only one VIOS - so I may not be able to verify MPIO settings real soon.
If you have a setup with 8Gb Fibre Channel cards and a SAN switch that supports NPIV, you could also use dual VIOS, NPIV and native PCM drivers in the client, getting dynamic load balancing. The client thinks it has two (or more) HBAs installed.
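For the chpath part of the question, here is a minimal dry-run sketch of spreading disks over two client adapters. It only prints the chpath commands, so nothing changes until you run them. The function name plan_priorities and the device names (vscsi0, vscsi1, hdisk0, ...) are my own examples, not something taken from the original poster's system.

```shell
# Dry-run sketch: print chpath commands that alternate the preferred path.
# Odd-numbered disks prefer the first adapter, even-numbered disks the second.
# Usage: plan_priorities ADAPTER_A ADAPTER_B DISK...
plan_priorities() {
    a=$1 b=$2
    shift 2
    n=0
    for disk in "$@"; do
        n=$((n + 1))
        if [ $((n % 2)) -eq 1 ]; then
            first=$a second=$b
        else
            first=$b second=$a
        fi
        # priority=1 is the preferred path, priority=2 the standby path
        echo "chpath -l $disk -p $first -a priority=1"
        echo "chpath -l $disk -p $second -a priority=2"
    done
}
```

Something like plan_priorities vscsi0 vscsi1 hdisk0 hdisk1 prints four chpath lines you can review and then paste or pipe to sh. Afterwards, lspath -l hdisk0 shows the path states and lspath -AHE -l hdisk0 -p vscsi0 shows the priority of that one path.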
Michael:
Doing a little research...
Basically, there are two classes of objects we are interested in: adapter and disk
adapter, subclass vdevice
disk, subclass vscsi
# lsdev -PH -c disk -s vscsi
class type  subclass description
disk  vdisk vscsi    Virtual SCSI Disk Drive
# lsdev -PH -c adapter -s vdevice
class   type       subclass description
adapter IBM,l-lan  vdevice  Virtual I/O Ethernet Adapter (l-lan)
adapter IBM,v-scsi vdevice  Virtual SCSI Client Adapter
adapter hvterm1    vdevice  LPAR Virtual Serial Adapter
We have at least one of each, so let's look at vscsi0 and hdisk0 for attribute names we could modify:
# lsattr -El vscsi0
vscsi_err_recov delayed_fail N/A                       True
vscsi_path_to   0            Virtual SCSI Path Timeout True
# lsattr -El hdisk0
PCM             PCM/friend/vscsi                 Path Control Module        False
algorithm       fail_over                        Algorithm                  True
hcheck_cmd      test_unit_rdy                    Health Check Command       True
hcheck_interval 0                                Health Check Interval      True
hcheck_mode     nonactive                        Health Check Mode          True
max_transfer    0x40000                          Maximum TRANSFER Size      True
pvid            00c39b9daec18c210000000000000000 Physical volume identifier False
queue_depth     3                                Queue DEPTH                True
reserve_policy  no_reserve                       Reserve Policy             True
Both variables for the vscsi adapter look interesting - more on that below; for hdisk0 I am interested in algorithm, hcheck_cmd, hcheck_interval, and hcheck_mode. reserve_policy is not important for load balancing - but can be important for availability.
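The last column of the lsattr output says whether an attribute can be changed (True) or not (False). As a small sketch, assuming attribute values contain no spaces and are never empty, this filter keeps only the changeable ones. The function name settable_attrs is made up for this example.

```shell
# Print "name value" for every attribute whose last column is True,
# i.e. the attributes lsattr reports as user-settable.
# Reads `lsattr -El <device>` output on stdin.
settable_attrs() {
    awk '$NF == "True" { print $1, $2 }'
}
```

Typical use would be lsattr -El hdisk0 | settable_attrs. For a single attribute, lsattr -El hdisk0 -a algorithm -F value prints just the current value.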
Using odmget I can get the default values for these attributes:
# for i in algorithm hcheck_cmd hcheck_interval hcheck_mode reserve_policy vscsi_err_recov vscsi_path_to
do
    clear
    echo "==== $i ===="
    odmget -q "attribute=$i" PdAt
    read x
done
## This command generates a lot of output, as there are different unique types that each have their own range of values, so I am editing the output down to the values I am interested in...
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "algorithm"
deflt = "fail_over"
values = "fail_over"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 3
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "hcheck_cmd"
deflt = "test_unit_rdy"
values = "test_unit_rdy, inquiry"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 12
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "hcheck_interval"
deflt = "0"
values = "0-3600,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 7
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "hcheck_mode"
deflt = "nonactive"
values = "enabled,failed,nonactive"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 6
PdAt:
uniquetype = "disk/vscsi/vdisk"
attribute = "reserve_policy"
deflt = "no_reserve"
values = "no_reserve, single_path"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 16
LESSON learned: Looking at the output above, I could drop reserve_policy, as it really concerns something else, and use the following command instead to get all the default attributes and possible settings for PCM/friend/vscsi - the uniquetype I am interested in for my hdisks!
# odmget -q uniquetype=PCM/friend/vscsi PdAt
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "dvc_support"
deflt = ""
values = "disk/vscsi/vdisk"
width = ""
type = "R"
generic = ""
rep = "sl"
nls_index = 2
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "algorithm"
deflt = "fail_over"
values = "fail_over"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 3
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "link_meth"
deflt = ""
values = ""
width = ""
type = "R"
generic = ""
rep = ""
nls_index = 0
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "hcheck_mode"
deflt = "nonactive"
values = "enabled,failed,nonactive"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 6
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "hcheck_cmd"
deflt = "test_unit_rdy"
values = "test_unit_rdy, inquiry"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 12
PdAt:
uniquetype = "PCM/friend/vscsi"
attribute = "hcheck_interval"
deflt = "0"
values = "0-3600,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 7
And just to show another command syntax (using AND and LIKE constructs in odmget -q):
# for i in vscsi_err_recov vscsi_path_to
do
    echo "==== $i ===="
    odmget -q "attribute=$i AND uniquetype like adapter/vdevice/*" PdAt
done
==== vscsi_err_recov ====
PdAt:
uniquetype = "adapter/vdevice/IBM,v-scsi"
attribute = "vscsi_err_recov"
deflt = "delayed_fail"
values = "delayed_fail, fast_fail"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 0
==== vscsi_path_to ====
PdAt:
uniquetype = "adapter/vdevice/IBM,v-scsi"
attribute = "vscsi_path_to"
deflt = "0"
values = "0-3600,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 2
Michael:
In the previous post I researched the names I am interested in. So what do these variables mean?
For the hdisk:
algorithm: only has one legal value, so fail_over is always the default behavior - if enabled
hcheck_mode: "Health Check Mode": values: "enabled,failed,nonactive", default: "nonactive"
hcheck_cmd: "Health Check Command": values: "test_unit_rdy, inquiry", default: "test_unit_rdy"
hcheck_interval: "Health Check Interval" (seconds): values: "0-3600,1", default: "0"
With hcheck_interval set to 0, health checking is disabled, so the partition is not going to attempt to fail_over at a disk level. With a positive value, the client executes the hcheck_cmd every hcheck_interval seconds to determine disk status.
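As a sketch of how one might turn health checking on, the helper below checks a value against the ODM range "0-3600,1" (0 to 3600 in steps of 1) and prints a chdev command rather than running it. The name hcheck_plan and the 60 second example are assumptions of mine, not a recommendation from this thread.

```shell
# hcheck_plan DISK SECONDS
# Prints a chdev command if SECONDS is a whole number in 0..3600;
# otherwise writes a message to stderr and returns 1.
hcheck_plan() {
    disk=$1 secs=$2
    case $secs in
        ''|*[!0-9]*) echo "not a number: $secs" >&2; return 1 ;;
    esac
    if [ "$secs" -gt 3600 ]; then
        echo "out of range (0-3600): $secs" >&2
        return 1
    fi
    # -P only updates the ODM; the change applies at the next reboot
    echo "chdev -l $disk -a hcheck_interval=$secs -P"
}
```

For example, hcheck_plan hdisk0 60 prints the command for a 60 second interval. I use -P here because a disk that is in use (rootvg, for instance) normally cannot be changed on the fly.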
For load balancing I am most interested in the adapter used (as this determines path).
The variables are:
vscsi_err_recov: "VSCSI Error Recovery algorithm": values: "delayed_fail, fast_fail", default: "delayed_fail"
vscsi_path_to: "Virtual SCSI Path Timeout" (per the lsattr output): values: "0-3600,1", default: "0"
The vscsi_err_recov parameter has a function similar to the fc_err_recov attribute of the Fibre Channel SCSI interface fscsiX:
# lsattr -El fscsi0
attach       al           How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0x1          Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True
In short, with vscsi_err_recov set to fast_fail, the VIO client adapter will send a FAST_FAIL datagram to the VIO server and fail the I/O immediately rather than after a delay. This may help to improve MPIO failover.
The vscsi_path_to attribute functions much like hcheck_interval. A value of 0 disables it, while a positive value allows the virtual client adapter driver to determine the health or status of the VIO Server, to improve and expedite path failover processing.
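To make that concrete, here is a dry-run sketch that prints a chdev command per client adapter. The values used (vscsi_path_to=30 and fast_fail) are ones I have seen suggested for dual-VIOS setups; treat them as assumptions and check them against your own environment. The function name plan_vscsi_tuning is made up.

```shell
# Dry run: print one chdev command for each client adapter given.
# The values (30 seconds, fast_fail) are illustrative assumptions.
plan_vscsi_tuning() {
    for adapter in "$@"; do
        # -P defers the change to the next reboot of the client
        echo "chdev -l $adapter -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P"
    done
}
```

Running plan_vscsi_tuning vscsi0 vscsi1 prints two lines to review. After applying them and rebooting the client, lsattr -El vscsi0 should show the new values.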
potatoman:
Thanx.