IBM Blog 2013 September

Backing up MQ Pagesets – what do I need to do?

Sep 14 2013

Why do I need to back up my page sets?

You need to back up your MQ page sets regularly so that the queue manager can recover after the loss of a page set. DASD is very reliable these days, but you could still lose a data set through external factors such as floods, or through a person deleting the data set.

How do I back up my page sets?

The preferred way of backing up a page set is to use the ADRDSSU utility (which is part of DFSMSdss) or an equivalent.
You can use VSAM REPRO, but this is much slower than ADRDSSU; see the timings below.
You must take a logical backup of the data set, rather than using a backup of a whole DASD volume, when the page set has multiple extents. This ensures the first page of the page set is backed up first; this page contains critical information about where in the logs to start recovery.

How often do I need to back up my page sets?

Backing up at least daily is a typical interval. Test systems which you can easily recreate do not need to be backed up. For production systems, you need to weigh the chance of losing a page set against the time the queue manager will be unavailable while the page set is restored and the queue manager is restarted to recover it.

Do I need to shut my queue manager down while backing up the page set?

You can back up page sets while the queue manager is active, a so-called fuzzy backup. So no, you do not need to shut your queue manager down to back up, but you do need the logs from up to three checkpoints before the backup was taken.

The impact of a page set failure is likely to be an I/O error when the queue manager attempts to read or write the page set. This is very likely to cause a queue manager outage, and the pressure is then on for a restart.

How long will it take to recover a page set?

This depends on many things:

  1. A decision needs to be made to restore the page set from a backup. This may mean a phone call or a meeting to discuss it.
  2. You need to restore the data set from the backup. The backup could have been migrated to tape, so time will be spent recalling the data set to disk. The duration of the restore depends on the size of the data set, and on whether the source is on disk or tape.
  3. The queue manager is restarted.
    1. The first page of each page set contains the point in the log from which to start processing (known as the restart RBA).
    2. The queue manager starts at the earliest restart RBA of all the page sets and reads forward until the last log entry. Any records read from the log for a page set are reapplied to it. Even if a page set has not been updated, the log is still read looking for records.
    3. If the restart RBA is not in the active logs, then archive logs need to be used. Archive logs may have been migrated to tape, so you should manually recall these from HSM so that they are available online before the queue manager needs them. If you do not recall them in advance, the queue manager will wait until they have been recalled.
    4. Once the logs have been read forward, the queue manager reads the logs backward to roll back any incomplete units of work.
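The effect of the earliest restart RBA on the amount of log to be read can be sketched in a few lines of Python. The page set names and RBA values here are made-up illustrations, not output from any real queue manager:

```python
# Hypothetical restart RBAs (byte offsets into the log), one per page set.
# All values are invented for illustration.
restart_rba = {
    "PAGESET.P0": 5_000_000,
    "PAGESET.P1": 4_200_000,
    "PAGESET.P4": 1_000_000,  # restored from an old backup, so far behind
}
end_of_log_rba = 6_000_000

# The queue manager starts reading at the earliest restart RBA of all
# the page sets, so one page set restored from an old backup drags the
# log scan back for every page set.
start_rba = min(restart_rba.values())
bytes_to_read = end_of_log_rba - start_rba

print(start_rba)      # 1000000
print(bytes_to_read)  # 5000000
```

This is why a more recent backup means a shorter restart: the restart RBA in the restored page set is closer to the end of the log.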

The longest parts may be the meeting, and the reading of the logs forward.

I need to recover more than one page set

Restore all of the page sets and then restart the queue manager.

Can I backup all of my page sets in one step?

Yes, and this may be easy to do for small page sets, but if you have large page sets you should use one backup per page set. If you back up all of your page sets to one large file, it will take longer to recall that large file from tape compared with having one backup data set per page set. Recovering one page set from a backup of many page sets may also take longer, as more data may need to be read. You can use multiple job steps, so the backups are done sequentially, or multiple jobs, so the backups are done in parallel.

Sample JCL

You can back up using ADRDSSU with either COPY or DUMP.

  1. Using COPY you can exploit FlashCopy or SnapShot within the DASD subsystem to copy data sets within the same DASD subsystem (it saves a copy of the pointers to the data, rather than the data itself, so it is fast).
  2. Using DUMP you can copy to tape or to a different DASD subsystem, and use Generation Data Groups to manage your backups.

Using ADRDSSU DUMP command

You should consider using a Generation Data Group (GDG) for your backups. This allows you to easily manage your backups. For example:

//PAICEGDG JOB 1,MSGCLASS=H
//STEP1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE GDG -
(NAME(SCENDATA.BACKUP.MQPA.PAGESET.P4) -
LIMIT(4) -
NOEMPTY)

Sample JCL to dump a page set

//ARDSSU EXEC PGM=ADRDSSU,REGION=6M
//SYSPRINT DD SYSOUT=H
//DD2 DD DSN=SCENDATA.BACKUP.MQPA.PAGESET.P4(+1),
// DISP=(NEW,CATLG),
// SPACE=(CYL,(1000,100),RLSE)
//SYSIN DD *
DUMP -
OUTDDNAME(DD2) -
COMPRESS -
SHARE -
SPHERE -
CANCELERROR -
TOL(ENQF) -
DATASET(INCLUDE(SCENDATA.MQPA.PAGESET.P4))
/*
The TOL(ENQF) allows the data set to be backed up while it is in use. To use this you need read access to the RACF profile
STGADMIN.ADR.DUMP.TOLERATE.ENQF CL(FACILITY).

Ensure you know where your JCL is to delete and recreate the page set. It is dangerous to have one job with delete statements for all of your page sets, followed by defines for all of your page sets, as you may accidentally submit it and lose all of your page sets.
It is better to have members for each page set: one member to delete it, another to define it, and another to restore it. It is good practice to have a PDS of this JCL for each queue manager.

JCL to restore the page set from a dump

//ARDSSU EXEC PGM=ADRDSSU,REGION=6M
//SYSPRINT DD SYSOUT=H
//DD2 DD DSN=SCENDATA.BACKUP.MQPA.PAGESET.P4(0),
// DISP=SHR
//SYSIN DD *
RESTORE -
IMPORT -
INDDNAME(DD2) -
CANCELERROR -
DATASET(INCLUDE(**)) -
SPHERE -
SHARE -
REPLACE -
CATALOG

The queue manager must be shut down to restore the page set. The (0) in the data set name means use the latest generation.

On my system it took about 18 seconds to back up a page set with a size of 3084 cylinders (555,120 pages).

Using ADRDSSU COPY command

This can exploit the FlashCopy or SnapShot facilities within the DASD subsystem and so can be very fast.

The JCL below copies the SCENDATA.MQPA.PAGESET.** data sets to SCENDATA.MQPA.BACKUP1.**, so SCENDATA.MQPA.PAGESET.P4 gets copied to SCENDATA.MQPA.BACKUP1.P4. Before this is done, it copies SCENDATA.MQPA.BACKUP1.** to SCENDATA.MQPA.BACKUP2.**, so SCENDATA.MQPA.BACKUP1.P4 gets copied to SCENDATA.MQPA.BACKUP2.P4. This is done to keep a copy of the previous backup.

//PAICEUM2 JOB MSGCLASS=H,NOTIFY=PAICE,COND=(4,GT)
//* WE COPY PAGESET TO BACKUP1, BUT TO ENSURE WE HAVE AT LEAST
//* ONE BACKUP WE COPY BACKUP1 TO BACKUP2 BEFORE WE COPY THE PAGESET
//* STEP 1 COPY BACKUP1 TO BACKUP2 FOR EVERY PAGE SET
//* STEP 2 COPY PAGESET TO BACKUP1 FOR EVERY PAGE SET
//* IF STEP1 GETS RETURN CODE > 4 THEN THE JOB STOPS AND DOES NOT
//* DO STEP2
//STEP1 EXEC PGM=ADRDSSU,REGION=6M
//SYSPRINT DD SYSOUT=H
//SYSIN DD *
COPY -
DATASET(INCLUDE(SCENDATA.MQPA.BACKUP1.*)) -
RENAMEU(SCENDATA.MQPA.BACKUP1.**,SCENDATA.MQPA.BACKUP2.**) -
REPUNC -
FASTREPLICATION(PREFERRED) -
SPHERE -
CANCELERROR -
TOL(ENQF)
//STEP2 EXEC PGM=ADRDSSU,REGION=6M
//SYSPRINT DD SYSOUT=H
//SYSIN DD *
COPY -
DATASET(INCLUDE(SCENDATA.MQPA.PAGESET.*)) -
RENAMEU(SCENDATA.MQPA.PAGESET.**,SCENDATA.MQPA.BACKUP1.**) -
FASTREPLICATION(PREFERRED) -
SPHERE -
REPUNC -
CANCELERROR -
TOL(ENQF)
/*
//

Using FlashCopy the copy took under 2 seconds. When FlashCopy could not be used, it took about 27 seconds to copy the same data set as above (where the DUMP above took 18 seconds).

To copy the page set back, use JCL like:

//PAICEUM2 JOB MSGCLASS=H,NOTIFY=PAICE,COND=(4,GT)
//STEPREST EXEC PGM=ADRDSSU,REGION=6M
//SYSPRINT DD SYSOUT=H
//SYSIN DD *
COPY -
DATASET(INCLUDE(SCENDATA.MQPA.BACKUP1.P4)) -
RENAMEU(SCENDATA.MQPA.BACKUP1.**,SCENDATA.MQPA.PAGESET.**) -
FASTREPLICATION(PREFERRED) -
SPHERE -
REPUNC -
CANCELERROR

/*
//
The queue manager must be shut down before restoring.

How long does the queue manager take to restart?

I restored the page set and restarted the queue manager. The restart read the active logs at about 180MB a second. A value of 60MB a second is typical at many customers. Work out how many active logs will need to be read, then estimate the duration as n * log size / data rate.

Reading from archive logs on disk, the rate was about 55MB a second. It may be lower on your system. Work out how many archive logs will need to be read, then estimate the duration as n * log size / data rate + n * (tape mount time or HSM recall time).
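The two formulas above can be worked through with example numbers. Everything here except the 60MB/s and 55MB/s data rates from the text is an assumed value (log size, log counts, recall time), not a measurement:

```python
# Rough restart-time estimate using the formulas above.
# Only the data rates come from the text; the rest are assumptions.
log_size_mb = 1000          # assumed size of one log data set, in MB

# Active logs: duration = n * log size / data rate
active_logs = 3             # assumed number of active logs to read
active_rate_mb_s = 60       # the "typical at many customers" figure
active_secs = active_logs * log_size_mb / active_rate_mb_s

# Archive logs: duration = n * log size / data rate
#                          + n * (tape mount or HSM recall time)
archive_logs = 2            # assumed number of archive logs to read
archive_rate_mb_s = 55      # the archive-on-disk rate from the text
recall_secs = 120           # assumed HSM recall time per archive log
archive_secs = (archive_logs * log_size_mb / archive_rate_mb_s
                + archive_logs * recall_secs)

print(round(active_secs))   # 50
print(round(archive_secs))  # 276
```

Note how the recall time dominates the archive-log estimate; this is why recalling archive logs from HSM before restarting the queue manager pays off.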

These blog posts are from when I worked at IBM and are copyright © IBM 2013.