Are your mirrored file systems consistent?

It started with a question “Several years ago you told us about checking your MQ disks are consistent,  can you provide us with a link to any documentation please?”.

I’ll explain why this is important and what you need to do to ensure you have data integrity and you do not lose data integrity when you go to a backup site.

With some applications that write to multiple files, the order that data is actually written to the disk does not matter.  For example when you print data, it often stays in a buffer, and is written out when the buffer is full.

A transaction manager

With programs that handle transactions (a transaction manager) it is critical that writes to disk are done in the order they are issued.  If the writes are not in the correct order then if there if the system crashes and tries to restore the transaction the recovery may be missing key data  (“it has taken the money from your account..  it cannot see who should get the money?”) and so data integrity is lost.

With local disks, the sequence is

  • Write to file1,
  • Wait for confirmation that the IO has completed
  • Write to file2,
  • Wait for confirmation that the IO has completed

Consider the case where file1 and file 2 are on different file systems.  For example file1 could be transaction log, file2 could be queue data.  (Picture file system1 on slow disks, and file system 2 on fast disks – so IO for file 2 is faster than IO to file 1).

With mirrored disks with synchronous replication, the sequence is

  • Write to file1 local copy; send data to remote site,  write to file1, send back OK when completed
  • Wait for confirmation that both IOs have completed
  • Write to file2 local copy,send data to remote site,  write to file2, send back OK when completed
  • Wait for confirmation that both IOs have completed

With synchronous replication the two locations need to be within 10s of kilometers.  The response time of the file write depends on the distance.

With Asynchronous replication the two locations can be 100s of kilometers apart.

In this case the sequence is

  • Write to file1 local copy; send data to remote site,  write to file1, send back OK when completed
  • Wait for confirmation that the local IO has completed.
  • Write to file2 local copy,send data to remote site,  write to file2, send back OK when completed
  • Wait for confirmation that the local IO has completed.

The disk subsystems manages the responses coming back from the remote end.

For capacity reasons there are usually multiple paths between the two sites.  It is possible that the data for file 2 gets there before the data for file1.  If the writes are done in the wrong order, this could be bad news.

Consistency group

The architecture of the mirroring systems have the concept of a consistency group.   You define one or more consistency groups.  You put file systems into a consistence group.  For any files in the consistency group the write order will be honoured.  So in the case above, if the two files are in the same consistency group, it will wait, write the data to file 1, then write to file 2.  This gives a solution with data integrity.

The lurking problem.

Someone needs to define the file systems to each consistency group.   The storage manager may have said

  • “all file systems are part of one consistency group”.
  • “”production data is in one consistency group, test data is in another consistency group”
  • “I’ll guess, and hope people tell me their requirements”

How will I know if I have a problem?

The sure fire way of finding out if you have a problem is to lose a site ( for example a power outage).  For 99 times out of a 100 it may be fine, and then one time in a hundred, you find you cannot restart your systems on the other site.  This is clearly the wrong time to find out.

Check with your storage administrator and give them information about the file systems that need to be part of the same consistency group.

Practice your fail over – perhaps weekly – at least monthly.

One thought on “Are your mirrored file systems consistent?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s