Whoops I didn’t see that error message
Oct 13 2013
I was visiting a customer to do an MQ health check, looked at their MQ job log, and saw some messages saying that a log shunt had failed, along with some other configuration messages. They hadn't spotted the messages because checking for them was not automated, and MQ produced so many messages.
I took some REXX execs that I had written and converted them to edit macros. These delete all of the boring messages and keep the ones which should be checked. When there are multiple messages of the same type, the duplicates are excluded.
Editing the message log, either in SDSF or as a file copy of the log, reduced a file of 78000 records down to
04.52.52 CSQI064E MQ01 Cannot get information from DB2. TOPIC;COPY objects not refreshed;
04.52.52 CSQM056E MQ01 CSQMDURR MQOPEN failed for queue;SYSTEM.DURABLE.SUBSCRIBER.QUEUE, MQRC=2085;
06.34.57 CSQP020E MQ01 CSQP1RSW Buffer pool 3 is too small
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 62 Line(s) not Displayed
05.16.57 CSQ3201E MQ01 ABNORMAL EOT IN PROGRESS FOR;USER=PAICE1A CONNECTION-ID=RRSBATCH THREAD-XREF= JOBNAME=PAICE1A1;ASID=0
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 261 Line(s) not Displayed
04.52.52 CSQ5020E MQ01 CSQ5LIST SQL error, table;CSQ.OBJ_B_TOPIC not defined in DB2;
00.21.33 CSQ9016E MQ01 ‘ DISPLAY’ command request not authorized
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 11 Line(s) not Displayed
00.21.33 CSQ9023E MQ01 CSQ9SCND ‘DISPLAY QMGR’ ABNORMAL COMPLETION
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 17 Line(s) not Displayed
04.11.53 IEC501A M 1C5D,PRIVAT,SL,COMP,MQ01MSTR,MQ01MSTR,HLQ1.MQM.MQ01.A1.E00403.T0410083.A0034619
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 111 Line(s) not Displayed
04.11.53 IEC502E K 1C5D,BB1234,SL,MQ01MSTR,MQ01MSTR,HLQ1.MQM.MQ01.A1.E00403.T0410083.A0046619
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 111 Line(s) not Displayed
04.10.08 IEF233D M 1C5D,PRIVAT,SL,MQ01MSTR,MQ01MSTR,;HLQ1.MQM.MQ01.A1.E00403.T0410083.B0046619,;OR RESPOND TO IEF455D MESSAG
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 77 Line(s) not Displayed
04.14.02 IEF234E K 1C5D,AA1234,PVT,MQ01MSTR,MQ01MSTR
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 77 Line(s) not Displayed
This works for the chinit as well as the queue manager logs.
The edit macros are in XMIT format here. You need to FTP them to z/OS in binary.
From Ubuntu I use FTP
BIN
Site cyl 1 pri=1 sec=1 recfm=fb lrecl=80 blksize=3200
put mqlog.xmit.bin mqlog.xmit
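If you would rather script the transfer than type the FTP commands, the same three steps can be sketched in Python with the standard ftplib module. This is only a sketch: the function name upload_xmit is made up for illustration, the SITE parameters are copied as-is from the commands above, and the host name and credentials in the usage comment are placeholders.

```python
from ftplib import FTP

# Allocation attributes copied from the SITE command shown in the post.
SITE_PARMS = "cyl 1 pri=1 sec=1 recfm=fb lrecl=80 blksize=3200"


def upload_xmit(ftp, local_path, remote_name, site_parms=SITE_PARMS):
    """Send an XMIT file to z/OS over an FTP session that is already logged in.

    ftp is an ftplib.FTP object (or anything with the same sendcmd and
    storbinary methods). upload_xmit itself is a hypothetical helper.
    """
    # Equivalent of the "Site ..." line: set the data set attributes.
    ftp.sendcmd("SITE " + site_parms)
    # storbinary switches the session to binary (TYPE I) before sending,
    # which covers the "BIN" step.
    with open(local_path, "rb") as source:
        ftp.storbinary("STOR " + remote_name, source)


# Usage (placeholder host, user ID and password):
#
#   with FTP("zos.example.com") as ftp:
#       ftp.login("userid", "password")
#       upload_xmit(ftp, "mqlog.xmit.bin", "mqlog.xmit")
```

Passing the session in as a parameter keeps the login details out of the helper and makes it easy to try out against a stub object.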
Then from TSO issue TSO RECEIVE INDSN(mqlog.xmit)
This will create a PDS with 3 members in it. Either copy the three members into a data set in your ISPF concatenation, or add the data set to your REXX or SYSEXEC concatenation.
You can use the TSO command ISRDDN to list the data sets allocated. Copy the 3 members to a data set with DDNAME REXX.
If you do not have an existing data set you can use, you can use the following command from SDSF
tso altlib act appl(clist) da('paice.clist')
If you use split screen you will need to issue the command in each session.
There are 3 macros you need to run:
MQLOG1. This takes the multi-line messages and rebuilds each of them into one message, truncating if necessary.
MQLOG2. This goes through the file and checks each message to see if it is an information message which can be ignored (such as the output from a display command) or an error message which can be ignored (for example, you gave a command invalid syntax), and leaves the rest.
MQLOG3. This sorts the file into message ID sequence. It then goes through the file and, if the message number is the same as the previous one, excludes the line.
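For anyone who wants to try the same idea away from ISPF, here is a rough Python sketch of the three passes. It is not the edit macros themselves. It assumes each message starts with an hh.mm.ss time stamp followed by the message ID, that continuation lines have no time stamp, and (a simplification) that every message whose ID ends in I can be ignored. The function names and the max_len limit are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: "hh.mm.ss MSGID text"; continuation lines lack the time stamp.
TIMESTAMP = re.compile(r"^\d\d\.\d\d\.\d\d ")
MSG_ID = re.compile(r"^\d\d\.\d\d\.\d\d (\S+)")


def join_continuations(lines, max_len=200):
    """Pass 1: fold continuation lines into their message, truncating if needed."""
    joined = []
    for line in lines:
        line = line.rstrip()
        if TIMESTAMP.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] = (joined[-1] + ";" + line.strip())[:max_len]
    return joined


def drop_ignorable(messages, ignorable_ids=frozenset()):
    """Pass 2: drop information messages and any IDs the caller lists."""
    kept = []
    for msg in messages:
        found = MSG_ID.match(msg)
        if not found:
            continue  # no recognisable message ID on this line
        msg_id = found.group(1)
        if msg_id.endswith("I") or msg_id in ignorable_ids:
            continue
        kept.append(msg)
    return kept


def first_of_each(messages):
    """Pass 3: sort by message ID, keep the first of each, count the rest."""
    ordered = sorted(messages, key=lambda m: MSG_ID.match(m).group(1))
    counts = Counter(MSG_ID.match(m).group(1) for m in ordered)
    seen = set()
    result = []
    for msg in ordered:
        msg_id = MSG_ID.match(msg).group(1)
        if msg_id not in seen:
            seen.add(msg_id)
            result.append((msg, counts[msg_id] - 1))
    return result
```

Each entry returned by first_of_each is the first message with that ID plus the number of further instances that were hidden, which plays the role of the "Line(s) not Displayed" markers in the edit session.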
The file now has one instance of each error message, and perhaps a line saying how many instances of this message are excluded. You can use standard ISPF edit commands such as DELETE ALL X to remove the duplicates or the s999 line command to show the duplicates, or you can create a new file from the contents of the file.
When you exit from the file it will prompt you to save your changes or cancel.
I can see that other functions, such as displaying the rate of logging, could easily be done using macros. Would anyone find this sort of thing useful?
Disaster recovery and Active-Active for MQ
Oct 4 2013
Why am I interested in Active Active and disaster recovery?
There may be times when you want to move MQ on z/OS work from one site to another site and it is too far to be in the same sysplex. For example your data centres are in two different cities. Should I use disaster recovery or an Active-Active environment?
- You want to have two sites which are too far apart for CFs to be shared between them
- Data can be mirrored, so that DASD changes on one site get copied to a different site
- DB2 changes can get mirrored so two databases are kept in sync within a couple of seconds – perhaps minutes. This can be done with the IBM product QREP.
Disaster recovery site.
- The systems at the DR site are not doing any useful work. They may be quiesced.
- Recovery at the DR site may need the DASD to be reconfigured and the z/OS images IPLed.
- In effect the same QSG is used – it has the same name, the data is on the same logical disk, but on a different physical disk.
- For MQ, the logs and pages are on disks which have been mirrored.
- When the queue manager starts, the CF structures are in an undefined state. They need to be recovered from an MQ backup of the CF structure. MQ will then process the active and perhaps archive logs (from all queue managers in the QSG), to recover persistent messages to the point of failure in the QSG.
- There will be a DNS change so channels get routed to the DR system.
- Clustering is not needed.
- It may take an hour or more to get this environment up and running.
Active Active
Both sites are doing production work, but one site may be running personal banking, and the other site running corporate banking. Read only activities such as inquiring an account balance, can be done on either system.
The CICS systems can be started, but are doing no work
There is a QSG on each system.
We use clustering from queue managers outside of this environment to route to the appropriate system, e.g. using cluster ranking. By using CLWLRANK messages are routed to the primary site. If the channel to the primary site is stopped, then messages are automatically routed to the backup site.
MQ outside the environment
There are two scenarios
1) Message sequence important,
2) Message sequence is not important.
Message sequence is not important
Steps to switch
- Stop the cluster channel to the active queue manager(A), messages flow to the other system(B)
- On qmgr A run a batch job which gets from specified queues and puts to the same queue name@B. The mover then moves these messages.
Message sequence is important
Steps to switch
- Stop both cluster channels
- On qmgr A run a batch job which gets from specified queues and puts to the same queue name@B. The mover then moves these messages.
- Once all the messages have been moved, start the channel to B
It should take tens of minutes to do the switch.
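To see why the second procedure stops both cluster channels, here is a toy Python model of the two switches. It involves no MQ at all: lists stand in for the queues, and the function names are made up for illustration. It assumes that, while both channels are stopped, new messages wait at the sending queue managers until the channel to B is started.

```python
from collections import deque


def unordered_switch(backlog_on_a, new_messages):
    """Only the channel to A is stopped.

    New messages flow straight to B, so they arrive there before the
    batch job has moved the backlog across from A.
    """
    return list(new_messages) + list(backlog_on_a)


def ordered_switch(backlog_on_a, new_messages):
    """Both channels are stopped until the backlog has been moved."""
    queue_on_b = deque()
    backlog = deque(backlog_on_a)
    held_at_senders = deque(new_messages)
    # The batch job drains A into B in first-in, first-out order.
    while backlog:
        queue_on_b.append(backlog.popleft())
    # Only now is the channel to B started, releasing the held messages.
    while held_at_senders:
        queue_on_b.append(held_at_senders.popleft())
    return list(queue_on_b)
```

With a backlog of messages 1 and 2 on A and new messages 3 and 4, unordered_switch delivers 3, 4, 1, 2 to B, while ordered_switch delivers 1, 2, 3, 4.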
Using the same channel name on different queue managers
It is not good practice to use non-clustered channels with the same name on different queue managers, and to use the RESET CHANNEL command to start the channel to a different queue manager, as there is a risk of losing a message or getting duplicate messages.
Clients
Somewhere the DNS will be changed to route traffic to B
Stop the client channels on A, so that they end cleanly and you do not end up with any in-doubt UOWs on the channel.
The client reconnects and is now connected to B.
Complications
You have clients doing Personal Banking, or corporate banking.
We suggest each business application has a unique port so we can switch each of these independently, e.g. 9.20.4.6(2000) is for corporate banking, and 9.20.4.6(2001) is for personal banking. You may want to move personal banking to system B and corporate banking to system C. To move corporate banking you just change the DNS for 9.20.4.6(2000), and personal banking is not affected.
Work running within the environment
The section above described moving messages from one QSG to another by having an application get the messages and put them to a QREMOTE, so they get sent to the other system.
While this is happening, you may not want the applications processing the messages.
There are a couple of approaches
Simple case
- You need to stop applications putting messages to the queues. You should stop receiver type channels.
- Set the application queues to get-disabled, so the CICS transactions processing messages should end normally.
- Once the transactions have stopped, enable gets again, and run the program to move the messages to the QREMOTE, so they can be moved to the remote system.
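The simple case boils down to three batches of MQSC commands. Here is a small Python sketch that only builds the command text for each phase; it does not issue anything. It assumes the application queues are local queues that can be altered with ALTER QLOCAL, and the function name and the channel and queue names in the usage example are made up.

```python
def simple_switch_commands(receiver_channels, app_queues):
    """Build the MQSC commands for each phase of the simple case.

    Returns a list of (phase description, [commands]) in the order the
    phases should be run. Nothing is sent to a queue manager.
    """
    return [
        ("stop messages arriving",
         [f"STOP CHANNEL({name})" for name in receiver_channels]),
        ("let the transactions end",
         [f"ALTER QLOCAL({name}) GET(DISABLED)" for name in app_queues]),
        ("let the mover program get the messages",
         [f"ALTER QLOCAL({name}) GET(ENABLED)" for name in app_queues]),
    ]


# Usage with hypothetical object names:
#
#   for phase, commands in simple_switch_commands(["TO.QSGA"], ["PAYMENTS.IN"]):
#       print(phase)
#       for command in commands:
#           print("  " + command)
```

Between the second and third phase you still have to check for yourself that the transactions really have stopped; the sketch does not cover that.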
More robust solution
- You disable puts for application queues, and the applications should then end cleanly
- Your applications may not be well written, or may have recovery built into them, so if the queue is disabled for gets, it does not end, but retries after a short period.
- In this case disabling the queue for gets and reenabling it will not work. You can use two definitions for the queue.
- Define a QALIAS for the application queue pointing to the base queue. Although the base queue has gets disabled, messages can still be got from the alias queue. So the program which gets messages and puts them to the remote queue should use this alias queue.
Are my definitions in sync?
You may have made configuration changes to your queue managers. You need to ensure these are copied to the other QSG.
You can use CSQUTIL with MAKEREP to unload definitions on system A. This generates statements that can be run on system B to replace QMGR B's definitions with those from system A.
However someone may have made changes on system B and not on system A, and these would be overwritten.
On system B you can use DISPLAY Q(*) ALTDATE ALTTIME where(ALTDATE,GT,'2013-09-24') to see if any changes were made after a particular date, and DISPLAY Q(*) ALTTIME where(ALTDATE,EQ,'2013-09-23') to see any changes which were made on the given date. The alter time should be close to the time you last replaced the objects.
If you identify some differences you need to resolve them before or after the switch, depending on the priority.
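If you capture the output of the DISPLAY command, the check can be automated. The Python sketch below assumes you have already parsed the output into (name, ALTDATE, ALTTIME) tuples, with the date as yyyy-mm-dd and the time as hh.mm.ss. The function name and the sample queue names are made up, and the parsing of the command output is left out.

```python
from datetime import datetime


def altered_since(objects, last_replace):
    """Return the objects altered after the definitions were last replaced.

    objects is an iterable of (name, altdate, alttime) tuples, where
    altdate looks like '2013-09-24' and alttime looks like '14.05.33'.
    last_replace is a datetime for the last time the objects were replaced.
    """
    changed = []
    for name, altdate, alttime in objects:
        altered = datetime.strptime(f"{altdate} {alttime}", "%Y-%m-%d %H.%M.%S")
        if altered > last_replace:
            changed.append((name, altered))
    # Oldest change first, so the list reads in time order.
    return sorted(changed, key=lambda item: item[1])


# Usage with made-up data:
#
#   queues = [("PAYMENTS.IN", "2013-09-23", "10.15.00"),
#             ("ORDERS.IN", "2013-09-25", "08.30.12")]
#   altered_since(queues, datetime(2013, 9, 23, 10, 30, 0))
```

Anything it returns was changed on system B after the last replace and would be overwritten by the next one.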
These blog posts are from when I worked at IBM and are copyright © IBM 2013.