Dusan Baljevic
a) LVM uses two methods to recover mirror consistency:
MWC (Mirror Write Cache)
MCR (Mirror Consistency Record)
MCR and MWC are methods of keeping mirrors in sync and tracking writes to disk.
MCR is kept on the disks, in Volume Group Restricted Area (VGRA).
MWC is kept in core memory.
MWC and MCR run permanently, with the MWC in memory communicating with the MCR on disk.
This can affect performance, but it is used because it allows quick recovery from a crash.
b) The purpose of the mirror write consistency cache (MWC) is to provide a list of possibly out of sync mirrored areas. When a volume group is activated, the LVM copies all areas with an entry in the MWC from one of the good copies to all the other copies. This ensures that the mirrors are consistent, but makes no claims about the quality of the data.
c) On each write request to a mirrored logical volume that requests MWC, the LVM checks to see if there is already an entry for the data area in the current MWC. If so, it just sends the write to the underlying device driver. If there isn't an entry, it gets one and then waits for the now updated MWC to be written to disk.
So, each write to one of these logical volumes will potentially introduce one extra serial disk access. Whether or not this occurs depends on how random the accesses are.
The more random the accesses, the higher the probability of missing the MWC!
d) Getting an MWC entry can involve waiting for one to be available. If all the MWC entries are currently being used by I/O in progress, a given request might have to wait in a queue of requests until an entry becomes available.
Notice that the MWC entry is never freed on disk when a request returns to the LVM; it is merely marked as available to be used by another outgoing request.
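The hit-or-miss behavior described in c) and d) can be illustrated with a toy model. This is only a sketch: the function name count_mwc_misses, the cache size, and the region numbers are hypothetical, and the oldest-entry-first reuse is an assumption made for illustration, not LVM's actual policy. The model also ignores the queueing that happens when all entries are busy.

```shell
# Toy model of MWC lookups: count how many writes would need the extra
# on-disk cache update. Reusing the oldest entry first is an assumption
# for illustration only; it is not LVM's real replacement policy.
count_mwc_misses() {
    size=$1; shift      # number of cache entries (hypothetical value)
    cache=""            # region numbers, oldest first, each followed by a space
    used=0
    misses=0
    for region in "$@"; do
        case " $cache" in
            *" $region "*)
                ;;                              # hit: write goes straight to the driver
            *)
                misses=$((misses + 1))          # miss: updated cache must reach disk first
                if [ "$used" -ge "$size" ]; then
                    cache=${cache#* }           # cache full: reuse the oldest entry
                else
                    used=$((used + 1))
                fi
                cache="$cache$region "
                ;;
        esac
    done
    echo "$misses"
}

count_mwc_misses 2 1 1 1 2 2 2    # clustered writes to two regions
count_mwc_misses 2 1 2 3 1 2 3    # scattered writes over three regions
```

With a two-entry cache, the clustered sequence misses only on the first write to each region, while the scattered sequence misses on every write, which is the effect of random access described above.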
e) Whether or not you use the MWC will depend on which aspect of system performance is more important to your environment:
run-time or
recovery-time
You can disable MWC to improve run-time performance, at the cost that the entire data space will be resynchronized after a crash. This may be done when a database does its own transaction logging.
f) You can disable both MCR and MWC only if the application can maintain mirror consistency itself (for example, a database)! Mirrors will not be resynchronized by LVM after a crash.
MWC disabled gives better I/O performance.
If MCR is also disabled, the mirrors will not be synchronized at reboot. It is up to you to decide whether you want these features in use or not.
With MCR enabled (that is the default), the LVM will not keep run-time records of modified extents as MWC does, but in the event of a crash (followed by reboot and re-activation), the LVM will copy all extents from one non-stale copy of the mirror to all other mirrored copies of that extent. This is similar to the synchronization strategy used by DataPair/UX. The "good" copy of the data is chosen arbitrarily from the non-stale extents, as there is no record kept as to which disk has the most recent copy of the data. So if a mirrored write is in progress during a crash, it is possible that old data could be copied over new data during the mirrored recovery at activation time.
If this behavior is unacceptable, MWC should be chosen. MCR without MWC is a good fit, for example, where a database will re-write all incomplete transactions after a crash but relies on the file system as its underlying structure: the consistent mirrors will allow fsck to cleanly fix the file system, after which the database can update any of its out-of-date data files.
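The three resulting modes can be summarized as a mapping from the lvchange(1M) "-M" and "-c" values to the "Consistency Recovery" label shown by lvdisplay(1M). The helper below is a hypothetical sketch, not part of LVM; it assumes that the "-c" value only matters when "-M n" is in effect, so check the man page on your release before relying on it.

```shell
# Map lvchange-style flag values to the "Consistency Recovery" label.
# consistency_label is a hypothetical helper written for illustration.
consistency_label() {
    mwc=$1    # value given to -M (y or n)
    mcr=$2    # value given to -c (y or n)
    if [ "$mwc" = "y" ]; then
        echo "MWC"      # assumption: -c is not consulted when -M y
    elif [ "$mcr" = "y" ]; then
        echo "NOMWC"    # MWC off, MCR on: full resync after a crash
    else
        echo "NONE"     # both off: LVM does no resync after a crash
    fi
}

consistency_label y y
consistency_label n y
consistency_label n n
```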
g) If both mirror copies are active, I/O is redirected to another mirror if one is busy, which improves performance. This should balance the I/O cost of MWC. The cost of disabling MWC and MCR is a slower recovery after a crash.
h) In HP-UX 11.31, the MWC is larger in size than in previous releases. This leads to a better logical volume I/O performance by allowing more concurrent writes. MWC has also been enhanced to support large I/O sizes.
i) Logical volumes belonging to shared volume groups (those activated with "vgchange -a s") of LVM version 1.0 and 2.0 must have the consistency recovery set to NOMWC or NONE.
Versions 1.0 and 2.0 do not support MWC for logical volumes belonging to shared volume groups. This might have changed with some patches, but I have not checked this yet.
With the September 2008 release of HP-UX 11i v3, LVM supports MWC for logical volumes belonging to LVM version 2.1 shared volume groups. This ensures faster recovery following a system crash.
j) Note that one cannot change MWC on an active logical volume. Here is an example for the primary paging device (swap):
Problem: While attempting to disable the "Mirror Write Cache" and "Mirror Consistency" for primary swap (/dev/vg00/lvol2), which was mirrored, the following error message is shown:
The command used to modify logical volumes, /sbin/lvchange, has failed.
The stderr output from the command is shown below. The logical volume has not been
modified.
lvchange: Could not change MirrorWriteCache while Logical Volume is opened or being synchronized.
Solution: Since primary swap is activated when the system boots, even in single user mode, the only way to successfully use lvchange on the primary swap logical volume is from LVM maintenance mode.
To boot into LVM maintenance mode, reboot the machine and interrupt the boot sequence.
> hpux -lm (PA-RISC)
Or
> boot -lm (IA64)
This will boot the machine into LVM maintenance mode. Use lvchange(1M) with the "-M" and "-c" options to modify the mirror write cache and consistency settings.
# lvchange -M n -c n /dev/vg00/lvol2
k) A quick check of the system's lvol configurations will show if this parameter is misconfigured. Assuming we are interested in vg00:
# lvdisplay /dev/vg00/lvol* | more
Look (or grep) for the lines which describe each lvol's "Consistency Recovery":
Consistency Recovery MWC
Consistency Recovery NOMWC
Consistency Recovery NONE
If the "Consistency Recovery" is set to NONE for anything other than a swap device (or a raw database volume as stated above), it will need to be changed. Note that if the lvol is not currently mirrored, this is not an issue, and can safely be ignored until the customer wants to mirror that lvol.
It doesn't hurt to change the parameter early, and doing so avoids trouble later if the issue has been forgotten by the time the lvol is mirrored.
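The check in k) can be scripted. The following is a minimal sketch that assumes the lvdisplay output contains lines beginning with "LV Name" and "Consistency Recovery", each with the value as the third whitespace-separated field; the function name find_none_recovery is hypothetical.

```shell
# List logical volumes whose "Consistency Recovery" is NONE.
# Reads lvdisplay-style text on standard input.
# find_none_recovery is a hypothetical helper, not an LVM command.
find_none_recovery() {
    awk '
        /^LV Name/              { lv = $3 }                       # remember current lvol
        /^Consistency Recovery/ { if ($3 == "NONE") print lv }    # report it if NONE
    '
}

# Intended use on a live system (not executed here):
#   lvdisplay /dev/vg00/lvol* | find_none_recovery
```

Each reported lvol still needs a manual decision, since NONE is acceptable for swap or a raw database volume.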
l) If we need to change the MWC for a logical volume that is already mirrored, the process is a little bit more complex.
After determining which mirrored logical volumes need to have their consistency recovery changed, the steps to take are: reduce the mirror to only one good copy (non-mirrored), change the consistency recovery parameter, then recreate the mirroring configuration.
The simplest way to reduce a mirroring configuration to one without mirroring is to use "lvreduce -m 0" to simply eliminate the mirror copies. Then use lvchange(1M) to turn on consistency recovery, followed by lvextend(1M) to re-add the mirrors. This reduction will minimize downtime, as it can safely be done while the system is fully operational, but it has two drawbacks:
· It allows the user less control over which copy of the mirror will remain, and it may require more reconstruction to recreate any specialized mirroring configuration such as striped extents.
· Although the logical volume can remain in use during the operation, it would be best to avoid using the logical volume until integrity checks can be made on the data.
Another way of getting to a non-mirrored state is to split off the mirrored copies using lvsplit(1M).
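The lvreduce, lvchange, lvextend sequence described above can be sketched as a dry run that only prints the commands, so that they can be reviewed before anything is changed. The helper name plan_recovery_change and the volume /dev/vg01/lvol4 are hypothetical examples.

```shell
# Print (do not run) the commands for changing consistency recovery on a
# mirrored logical volume: drop the mirrors, change the setting, re-mirror.
# plan_recovery_change is a hypothetical helper; review its output first.
plan_recovery_change() {
    lv=$1        # logical volume path (hypothetical example below)
    mwc=$2       # new value for lvchange -M (y or n)
    mcr=$3       # new value for lvchange -c (y or n)
    copies=$4    # number of mirror copies to restore
    echo "lvreduce -m 0 $lv"
    echo "lvchange -M $mwc -c $mcr $lv"
    echo "lvextend -m $copies $lv"
}

plan_recovery_change /dev/vg01/lvol4 n y 1
```

Printing rather than executing is a deliberate choice here: the reduction step discards mirror copies, so the operator should confirm the target volume before running anything.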
m) If importing a volume group from a previous release of HP-UX, there will be a full resynchronization because the format of the MWC changed at HP-UX 11i v3. If the volume group contains mirrored logical volumes using MWC, LVM converts the MWC at import time. It also performs a complete resynchronization of all mirrored logical volumes, which can take substantial time.
n) Now, let's list some typical rules for MWC:
· Disable MWC and set MCR to "none" for the database logical volume because the database logging mechanism already provides consistency recovery.
· Disable MWC and MCR on mirrored logical volumes where the data is not needed after a crash, such as a paging device (swap space) or other raw scratch data.
· Logical volumes containing database data or file systems with few or infrequently written large files (greater than 256K) must not use the MWC when runtime performance is more important than crash recovery time.
· Use fast disks for the most intensive applications if they use mirrored logical
volumes.
· Ensure that all physical volumes for mirrored logical volumes are active
because MWC and other I/O will be redirected to another mirror if one is busy -
so it improves performance.
· Spread the data space across as many physical volumes as possible.
· The number of volume groups is directly related to the MWC. Since there is only
one MWC per volume group, disk space that is used for many small random write requests must be kept in distinct volume groups if possible when the MWC is used.
· If possible, ensure that physical volumes in volume groups that contain mirrored logical volumes reside on different controllers. For example, in a system with several disk devices on each card and several cards on each bus converter, create volume groups so that all disks on one bus converter are in one group and all the disks on the other are in another group (one way is via physical volume groups). This configuration ensures that all mirrors are created with devices accessed through different I/O paths.
· Since mirroring is typically used for root volume group only (these days all
other data is on SAN), it is strongly recommended not to allow any third-party
applications or software to run in it. I go to such an extreme that I even force
customers to use their own areas for temporary files:
1. Set TMPDIR variable to point to some other non-boot-volume.
I always encourage application admins to use their own areas for
temporary files.
Some applications look at TMPDIR environment variable.
Others look at two other variables, so try setting TEMP and TMP as well as
TMPDIR.
2. Mount /tmp file system with "tmplog" option in /etc/fstab.
/tmp is DESIGNED for temporary files, so it should not be abused for
other purposes.
In "tmplog" mode, the intent log is almost always delayed.
This improves performance, but recent changes may disappear if the
system crashes.
3. Have /tmp cleaned up at boot time (not really a performance issue,
but useful for maintenance, especially if the number of temporary files keeps growing).
By default I always enable it in /etc/rc.config.d/clean_tmps:
CLEAR_TMP=1
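The advice in item 1 above can be sketched as a small helper for an application's startup script. The function name use_app_tmp and the directory /app/tmp are hypothetical; the helper refuses to change anything unless the directory exists and is writable.

```shell
# Point the common temporary-directory variables at an application-owned
# area instead of the root volume group.
# use_app_tmp and /app/tmp are hypothetical names for illustration.
use_app_tmp() {
    dir=$1
    # Leave the environment untouched if the directory is unusable.
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        return 1
    fi
    TMPDIR=$dir
    TMP=$dir
    TEMP=$dir
    export TMPDIR TMP TEMP
}

use_app_tmp /app/tmp || echo "temporary area /app/tmp is not usable" >&2
```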
A final comment concerns multi-threaded mirror synchronization in LVM on HP-UX.
Option 1
lvsync(1M) recognizes the following option:
-T Perform mirror synchronization of logical volumes
within a volume group using multiple parallel threads.
Logical volumes belonging to different volume groups
will be synchronized serially. It is possible that
logical volumes start and/or complete their
synchronization in a different order than specified on
the command line.
The maximum number of threads used can be controlled
using the PTHREAD_THREADS_MAX system tunable.
NOTE: This option has no effect if the volume group is
activated in shared mode.
For example, you can extend the logical volumes without synchronizing them and then synchronize them with parallel threads:
# lvextend -m 1 -s /dev/vgapp/lvol1
# lvextend -m 1 -s /dev/vgapp/lvol2
# lvextend -m 1 -s /dev/vgapp/lvol3
# lvsync -T /dev/vgapp/lvol1 /dev/vgapp/lvol2 /dev/vgapp/lvol3
Option 2
Check the fragmentation of the file system that resides on the logical volumes you need to mirror. For example:
# fsadm -F vxfs -DEde -t 600 /mydata
… and take action if necessary.
Another piece of advice is to do it on weekends, when user activity decreases.
Note the following on HP-UX 11.31:
# getconf PTHREAD_THREADS_MAX
3002
# kctune -v max_thread_proc
Tunable             max_thread_proc
Description         Maximum number of threads in each process
Module              pm_proc
Current Value       3002
Value at Next Boot  3002
Value at Last Boot  3002
Default Value       256
Constraints         max_thread_proc >= 64
                    max_thread_proc <= nkthread
Can Change          Immediately or at Next Boot