I've got nothing in my events queue!

I was looking to test out my Python code and was checking the PCF messages being produced, or in my case, not being produced. I did some playing around.

There is a DIS QMGR EVENT command (not EVENTS). This gave me

QMNAME(QMA) AUTHOREV(DISABLED)
CHLEV(DISABLED) CMDEV(DISABLED)
CONFIGEV(ENABLED) INHIBTEV(DISABLED)
LOCALEV(DISABLED) LOGGEREV(DISABLED)
PERFMEV(ENABLED) REMOTEEV(DISABLED)
SSLEV(DISABLED) STRSTPEV(ENABLED)

So the reason I was not getting many events is that they were mostly turned off.

I used runmqsc and exploited the command-complete facility.
I typed alter qmgr chl and pressed tab, and it completed it to ALTER QMGR CHLEV(ENABLED). I then entered AUTHO and pressed tab, and so on.
Then, when the command was complete, I pressed enter, only for it to complain that LOGGEREV is only valid when using linear logging.

The whole command was

ALTER QMGR CHLEV(ENABLED) AUTHOREV(ENABLED) CONFIGEV(ENABLED) INHIBTEV(ENABLED) LOCALEV(ENABLED) PERFMEV(ENABLED) REMOTEEV(ENABLED) SSLEV(ENABLED) STRSTPEV(ENABLED)
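The same command can be generated rather than typed. The following is a minimal sketch, assuming the DIS QMGR EVENT output has been captured as text in the KEYWORD(VALUE) layout shown above. The function names and the skip list are hypothetical. Note that, unlike the command above, it also enables CMDEV unless you add it to the skip list.

```python
import re

# Attributes to leave alone. LOGGEREV is rejected unless the queue manager
# uses linear logging. This skip list is an assumption for illustration.
SKIP = {"LOGGEREV"}


def parse_attributes(display_output):
    """Return {keyword: value} for every KEYWORD(VALUE) pair in the text."""
    return dict(re.findall(r"(\w+)\(([^)]*)\)", display_output))


def build_alter_command(display_output, skip=SKIP):
    """Build an ALTER QMGR command enabling every event that is DISABLED."""
    attrs = parse_attributes(display_output)
    to_enable = [name for name, value in attrs.items()
                 if name.endswith("EV")
                 and value == "DISABLED"
                 and name not in skip]
    if not to_enable:
        return None  # nothing to change
    return "ALTER QMGR " + " ".join(f"{name}(ENABLED)" for name in to_enable)


# The output shown earlier in this post
sample = """QMNAME(QMA) AUTHOREV(DISABLED)
CHLEV(DISABLED) CMDEV(DISABLED)
CONFIGEV(ENABLED) INHIBTEV(DISABLED)
LOCALEV(DISABLED) LOGGEREV(DISABLED)
PERFMEV(ENABLED) REMOTEEV(DISABLED)
SSLEV(DISABLED) STRSTPEV(ENABLED)
"""
print(build_alter_command(sample))
```

Attributes that are already ENABLED are left out, so the generated command is shorter than the hand-typed one but should have the same effect.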

What do change events look like?

I was using my Python programs for processing MQ PCF data, and was testing them out with different sorts of MQ data. I found the documentation for change events in the IBM knowledge centre is a bit sparse, so here is some information on it.

I did

  • define QL(DELETEME)
  • alter ql(DELETEME) descr('change comment')
  • delete QL(DELETEME)

For define and delete there is one event message created. For alter there are two messages created: one contains the before data, the other has the after data.

The MD looks like

"MQMD":{
"StrucId":"MD ",
"Version":1,
"Report":"NONE",
"MsgType":"DATAGRAM",
"Expiry":-1,
"Feedback":"NONE",
"Encoding":546,
"CodedCharSetId":1208,
"Format":"MQEVENT ",
"Priority":0,
"Persistence":"NOT_PERSISTENT",
"MsgId":"0x414d5120514d4120202020202020202020…",
"CorrelId":"0x414d5120514d4120202020202020202020…",
"BackoutCount":0,
"ReplyToQ":" ",
"ReplyToQMgr":"QMA",
"UserIdentifier":" ",
"AccountingToken":"0x0000…",
"ApplIdentityData":" ",
"PutApplType":"QMGR",
"PutApplName":"QMA ",
"PutDate":"20190217",
"PutTime":"11141373",
"ApplOriginData":" ",
"GroupId":"0x000000...",
"MsgSeqNumber":1,
"Offset":0,
"MsgFlags":0,
"OriginalLength":-1
},

When a change event occurs, the MD for the “after” message is effectively the same, but note that the MsgId is different and the CorrelId is the same. So you will need to clear the MsgId and keep the CorrelId when getting the second message.

For the define command the PCF header was

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":1,
"Control":"LAST",
"CompCode":0,
"Reason":2367, "CONFIG_CREATE_OBJECT"
"ParameterCount":58
},

For the delete command the PCF header was

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":1,
"Control":"LAST",
"CompCode":0,
"Reason":2369, "CONFIG_DELETE_OBJECT"
"ParameterCount":58
}

For the alter/change command the PCF headers were

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":1,
"Control":"NOT_LAST",
"CompCode":0,
"Reason":2368, "CONFIG_CHANGE_OBJECT"
"ParameterCount":58
},

and

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":2,
"Control":"LAST",
"CompCode":0,
"Reason":2368, "CONFIG_CHANGE_OBJECT"
"ParameterCount":58
},

The differences are the MsgSeqNumber value and Control being NOT_LAST or LAST.
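If the event messages have already been read into Python, the before and after messages can be matched up in memory instead of with a second get. This is a minimal sketch, assuming each message is a dict shaped like the JSON above, with "MQMD", "Header" and "Data" keys. The function name and the shape of the returned records are hypothetical.

```python
from collections import defaultdict


def group_config_events(messages):
    """Group event messages by CorrelId, ordered by the PCF MsgSeqNumber.

    A group is "complete" when its final message has Control set to LAST.
    """
    groups = defaultdict(list)
    for msg in messages:
        groups[msg["MQMD"]["CorrelId"]].append(msg)

    result = []
    for correl_id, msgs in groups.items():
        msgs.sort(key=lambda m: m["Header"]["MsgSeqNumber"])
        result.append({
            "CorrelId": correl_id,
            "complete": msgs[-1]["Header"]["Control"] == "LAST",
            "messages": msgs,
        })
    return result
```

A define or delete event gives a group of one message. A change event gives a group of two, with the before data first and the after data second.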

The data is common to all requests. You need to compare the fields in both the records for the change event, to see which are different. With the data in a Python dict, this was a trivial exercise.

"Data":{
"EVENT_USER_ID":"colinpaice",
"EVENT_ORIGIN":"CONSOLE",
"EVENT_Q_MGR":"QMA",
"OBJECT_TYPE":"Q",
"Q_NAME":"DELETEME",
"Q_DESC":"",

"Q_TYPE":"LOCAL"
}

After the change

"Data":{
"EVENT_USER_ID":"colinpaice",
"EVENT_ORIGIN":"CONSOLE",
"EVENT_Q_MGR":"QMA",
"OBJECT_TYPE":"Q",
"Q_NAME":"DELETEME",
"Q_DESC":"change comment",

"ALTERATION_DATE":"2019-02-17",
"ALTERATION_TIME":"11.14.33",

"Q_TYPE":"LOCAL"
}

When I printed out the changed data, the first time ALTERATION_DATE and ALTERATION_TIME were both displayed. The second time, only ALTERATION_TIME was displayed. I thought this was a bug in my program. After checking my program, it was obvious: if I change the queue 10 minutes later, the ALTERATION_DATE does not change. So remember to report both of these, and not just the changed values.
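The comparison can be sketched as below. This is not the program described above, just a minimal illustration, assuming the before and after "Data" sections are plain dicts. The function name is hypothetical, and the "before" time in the example is made up.

```python
# Fields to report even when their value did not change
ALWAYS_REPORT = ("ALTERATION_DATE", "ALTERATION_TIME")


def changed_fields(before, after, always_report=ALWAYS_REPORT):
    """Return {field: (old, new)} for fields that differ, plus the
    always_report fields whether or not they differ."""
    report = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new or key in always_report:
            report[key] = (old, new)
    return report


before = {"Q_NAME": "DELETEME", "Q_DESC": "",
          "ALTERATION_DATE": "2019-02-17",
          "ALTERATION_TIME": "11.04.33"}   # hypothetical earlier time
after = {"Q_NAME": "DELETEME", "Q_DESC": "change comment",
         "ALTERATION_DATE": "2019-02-17",
         "ALTERATION_TIME": "11.14.33"}

for field, (old, new) in changed_fields(before, after).items():
    print(field, old, "/", new)
```

Q_NAME is not reported because it is unchanged, while ALTERATION_DATE is reported even though both values are the same.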

Baby Python scripts doing powerful work with MQ

I found PyMqi is an interface from Python to MQ. This is really powerful, and I am extending it to be even more amazing!

In this blog post, I give examples of what you can do.

  • Issue PCF commands and get responses back in words rather than internal codes (so CHANNEL_NAME instead of 3501)
  • Save the output of DISPLAY commands into files
  • Use these files to compare definitions and highlight differences
  • Check these files conform to corporate standards
  • Print out data from the command event queue, the stats event queue, etc.


I can use some Python code to display information via PCF

import pymqi

# connect to MQ
queue_manager = "QMA"  # queue manager name (QMA in my case)
qmgr = pymqi.connect(queue_manager, "QMACLIENT", "127.0.0.1(1414)")
# I want to inquire on all SYSTEM.* channels
prefix = b"SYSTEM.*"
# This is the PCF request
args = {pymqi.CMQCFC.MQCACH_CHANNEL_NAME: prefix}
pcf = pymqi.PCFExecute(qmgr)
# go execute it
response = pcf.MQCMD_INQUIRE_CHANNEL(args)

This is pretty impressive as a C program would take over 1000 lines to do the same!

This comes back with data like

  • 3501: b'SYSTEM.AUTO.RECEIVER',
  • 1511: 3,
  • 2027: b'2018-08-16 ',
  • 2028: b'13.32.15',
  • 1502: 50

which is cryptic even for experts, because you need to know that 3501 is the identifier of the “CHANNEL_NAME” parameter.

I have some python code which converts this to..

  • 'CHANNEL_NAME': 'SYSTEM.AUTO.RECEIVER',
  • 'CHANNEL_TYPE': 'RECEIVER',
  • 'ALTERATION_DATE': '2018-08-16',
  • 'ALTERATION_TIME': '13.32.15',
  • 'BATCH_SIZE': 50

for which you only need a kindergarten level of MQ knowledge to understand it. It converts 3501 to CHANNEL_NAME, and 3 into RECEIVER.
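The idea behind the conversion can be sketched with two lookup tables. This is not the conversion code mentioned above, only a minimal illustration that covers the five parameters in the example. The table and function names are hypothetical, and a real version would presumably build the tables from the MQ constant definitions rather than typing them in.

```python
# Parameter identifier -> name (only the values shown in the example above)
PARAMETER_NAMES = {
    3501: "CHANNEL_NAME",
    1511: "CHANNEL_TYPE",
    2027: "ALTERATION_DATE",
    2028: "ALTERATION_TIME",
    1502: "BATCH_SIZE",
}

# For some parameters the integer value is itself a code with a name
VALUE_NAMES = {
    "CHANNEL_TYPE": {3: "RECEIVER"},
}


def to_words(response):
    """Convert one PCF response dict from codes into readable names."""
    readable = {}
    for code, value in response.items():
        name = PARAMETER_NAMES.get(code, code)  # keep unknown codes as-is
        if isinstance(value, bytes):
            # strings come back as padded bytes
            value = value.decode("ascii", errors="replace").strip()
        elif isinstance(value, int):
            value = VALUE_NAMES.get(name, {}).get(value, value)
        readable[name] = value
    return readable


response = {3501: b'SYSTEM.AUTO.RECEIVER', 1511: 3,
            2027: b'2018-08-16 ', 2028: b'13.32.15', 1502: 50}
print(to_words(response))
```

Unknown identifiers and unknown values are passed through unchanged, so nothing is lost when the tables are incomplete.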

With a few lines of Python I can write this data out so each queue is a file on disk in YAML format.

A YAML file for a queue looks like

  • Q_NAME: TEMP
  • Q_TYPE: LOCAL
  • ACCOUNTING_Q: Q_MGR
  • ALTERATION_DATE: '2019-02-03'
  • ALTERATION_TIME: 18.15.52
  • BACKOUT_REQ_Q_NAME: ''

Now it gets exciting! (really)

Now that it is in YAML, I can write small Python scripts to do clever things. For example

Compare queue definitions

from ruamel.yaml import YAML
import sys

yaml = YAML()
q1 = sys.argv[1]                    # get the name of the first queue file
ignore = ["ALTERATION_DATE", "ALTERATION_TIME",
          "CREATION_DATE", "CREATION_TIME"]
with open(q1, 'r') as in1:          # open the first queue file
    data1 = yaml.load(in1)          # and read the contents in
for q2 in sys.argv[2:]:             # for all of the other passed in filenames
    with open(q2, 'r') as in2:      # open the file
        data2 = yaml.load(in2)      # read it in
    for e in data1:                 # for each parameter in file 1
        if e in ignore:             # some parameters we want to ignore
            continue
        x1 = data1[e]               # get the value from file 1
        x2 = data2.get(e)           # get the value from the other file (None if missing)
        if x1 != x2:                # if the parameters are different
            print(q1, q2, ":", e, x1, "/", x2)  # print out the file names, keyword and values

From this it prints out the differences

  • queues/CP0000.yml queues/CP0001.yml : Q_NAME CP0000 / CP0001
  • queues/CP0000.yml queues/CP0001.yml : OPEN_INPUT_COUNT 1 / 0
  • queues/CP0000.yml queues/CP0001.yml : MONITORING_Q Q_MGR / HIGH
  • queues/CP0000.yml queues/CP0001.yml : OPEN_OUTPUT_COUNT 1 / 0
  • queues/CP0000.yml queues/CP0002.yml : Q_NAME CP0000 / CP0002
  • queues/CP0000.yml queues/CP0002.yml : OPEN_INPUT_COUNT 1 / 0
  • queues/CP0000.yml queues/CP0002.yml : OPEN_OUTPUT_COUNT 1 / 0

I thought this was pretty impressive for 20 lines of code.

And another script, for checking standards

from ruamel.yaml import YAML
import sys

yaml = YAML()
q1 = sys.argv[1]                    # get the queue file name
# define the variables to check
lessthan = {"MAX_Q_DEPTH": 100}
ne = {"INHIBIT_PUT": "PUT_ALLOWED", "INHIBIT_GET": "GET_ALLOWED"}
with open(q1, 'r') as in1:          # open the queue file
    data = yaml.load(in1)           # and read the contents in
# For each element in the lessthan dictionary (MAX_Q_DEPTH), check it against
# the data read from the file.
# If the value in the file is not less than the limit (100),
# print out the name of the queue file and the values.
for i in lessthan:                  # just MAX_Q_DEPTH in this case
    if data[i] >= lessthan[i]:
        print(q1, i, data[i], "Field in error. It should be <", lessthan[i])
# if the values are not equal to the required values
for i in ne:                        # INHIBIT_PUT and INHIBIT_GET
    if data[i] != ne[i]:
        print(q1, i, data[i], "field is not equal to", ne[i])

the output is

queues/CP0000.yml
MAX_Q_DEPTH 5000 Field in error. It should be < 100

Display command events

difference Q_NAME CP0000 CP0000 ALTERATION_DATE 2019-02-07 2019-02-11

difference Q_NAME CP0000 CP0000 ALTERATION_TIME 20.48.24 21.29.23

difference Q_NAME CP0000 CP0000 MAX_Q_DEPTH 4000 2000

With my journey so far – Python seems to be a clear winner in providing the infrastructure for managing queue managers.

The lows (and occasional high) of managing MQ centrally.

While I was still at IBM, and since I retired from IBM, I have been curious how people manage MQ in their enterprise systems.

  1. How do you deploy a change to a queue to 1000 queue managers, safely, accurately, by an authorised person, and by the way one queue manager was down when you tried to make the change?
  2. Are these “identical” systems identical, or has someone gone in and made an emergency change on one system and left one parameter different?
  3. We have all of these naming standards – do we follow them? Did we specify encryption on all external channels?

At the bottom of this blog (more like a long essay) I show some very short Python scripts which

  • compare queue definitions and show you the differences between them.
  • check when queue attributes do not meet “corporate standards”
  • print data from the change-events queue, so you can see what people altered.
  • I also have scripts which display PCF data from events, stats etc. I need to clean them up, then I’ll publish them.

I think Python scripting will make systems management so much easier.

Strategic tools do not seem to deliver.

There seem to be many “strategic tools” to help you. These include Chef, Puppet, Ansible, and Salt, which are meant to help you deploy to your enterprise.

There are a lot of comparison documents on the web. Some observations, in no particular order:

  • Chef and Puppet have an agent on each machine and seem complex to initially set up
  • Ansible does not use agents; it uses SSH commands to access each machine
  • Some tools expect deployers to understand and configure in Ruby (so moving the complexity from knowing MQ to Ruby), others use YAML – a simple format.

This seems to be a reasonable comparison.

Stepping back from using these tools I did some work to investigate how I would build a deployment system from standard tools. I have not done it yet, but I thought I would document the journey so far.

Some systems management requirements

What I expect to be able to do in an enterprise MQ environment.

  • I have a team of MQ administrators. All have read only access to all queue managers. Some can only update test, some can update test and production.
  • I want to be able to easily add and remove people from role based groups, and not wait a month for someone to press a button to give them authority.
  • I want to save a copy of the object before, and after a change – for audit trail and Disaster Recovery.
  • The process needs to handle the case when a change does not work because the queue manager is down, or the object is in use.
  • I want to be able to deploy a new MQ server – and have all of the objects created according to a template for that application.
  • I want to check and enforce standards, e.g. names and values (do you really need a max queue depth of 999 999 999, and why is curdepth 999 999?).
  • I want to be able to process the event data and stats data produced by MQ and put them in SPLUNK or other tool.
  • There are MQ objects within the queue manager, and other objects such as CCDT tables for clients, and keystores for TLS keys. I need to get these to wherever they are used.
  • I want to report statistics on MQ in my enterprise tool, so I need to get the statistics data from each machine to the central reporting tool.
  • I want Test to look like Production (and use the same processes) so we avoid the problem of not testing what was deployed.

Areas I looked at

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

  • This may be fine for a test environment, but once deployed I still want to be able to change object attributes on a subset of the queue managers. I don’t think Docker solves this problem (it is hard to tell from the documentation).
  • I could not see how to set up the Client Channel Definition Tables (CCDT) so my client applications can connect to the appropriate production queue manager.
  • If I define my queues using clustering, when I add a new queue manager, the objects will be added to the repository cache. When I remove a queue manager from the cluster, and delete the container, the objects live on in the cache for many days. This does not feel clean.
  • I wondered if this was the right environment (virtualised) for running my production performance critical workload on. I could not easily find any reports on this.
  • How do I manage licenses for these machines, and make sure we have enough licenses and are not illegally using the same license for all machines?

Using RUNMQSC

At first, running runmqsc locally seemed to be the answer to many of the problems.

I could use secure FTP to get my configuration files down to the machine, log on to the machine, pipe the file into runmqsc, capture the output and ftp the file back to my central machine.

  • Having all the MQ administrators with a userid on each machine can be done using LDAP groups etc., so that works OK.
  • To use a userid and password you specify runmqsc -u user_id qm. This then prompts for the password. If you pipe your commands in, then you have to put your password as the first line of the piped input. This is not good practice, and I could not see a way of doing it without putting the password in the file in front of the definitions. (Perhaps a Linux guru can tell me.)
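One possible way to keep the password out of the definitions file is to put the two together in memory just before they are piped in. The following is a minimal sketch, assuming runmqsc reads the password as the first line of its piped input as described above. The function names are hypothetical, and this one-shot approach has not been checked against the subprocess problems described later in this post.

```python
import subprocess


def build_runmqsc_input(password, mqsc_commands):
    """Return the text to pipe in: the password first, then the commands."""
    return "\n".join([password, *mqsc_commands]) + "\n"


def run_mqsc(qmgr, user_id, password, mqsc_commands):
    """Run runmqsc once with all input supplied up front; return its output."""
    payload = build_runmqsc_input(password, mqsc_commands)
    result = subprocess.run(
        ["runmqsc", "-u", user_id, qmgr],
        input=payload,          # written to stdin, which is then closed
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

The password could come from getpass.getpass() at run time and the commands from the definitions file, so the password never has to be stored alongside the definitions.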

Having to ftp the files to and from the machine was not very elegant, so I tried using runmqsc as a client (the -c option). At first this seemed to work, then I tried making it secure, and using an SSL channel. I could only get this to work when it used a channel with the same name as the queue manager name. (So to use queue manager QMB I needed an SSL channel called QMB. The documentation says you cannot use the MQSERVER environment variable to set up an SSL channel.) On my queue manager, the QMB channel was already in use. I redefined my channel and got this to work.

As you may expect, I fell over the CHLAUTH rules, but with help from some conference charts written by Morag, I got the CHLAUTH rules defined, so that I could allow people with the correct certificate to use the channel. I could then give the channel a userid with the correct authority for change or read access.

I had a little ponder on this, and thought that a more secure way would be to use SSL AND have a userid and password. If someone copied my keystore they would still need the password to connect to MQ, and so I use two factor authentication.

This is an OK solution, but it does not go far enough. It is tough trying to parse the output from runmqsc (which is why PCF was invented).

Using mqscx

Someone told me about mqscx from MQGem Software. If runmqsc is the kindergarten version, mqscx is the adult one. It does so much more and is well worth a look.

See here for a video and here for the web site.

How does it do against my list of requirements?

  • I can enter userid and password on the command line and also pipe a file in ✔
  • One column output (if required) so I can write the output to a file, and it is then very easy to parse ✔
  • I can use ssl channels ✔
  • I can use my Client Channel Definition Table (CCDT) ✔

It also has colour, multi column, better use of screen area (you can display much more on your screen) and its own scripting language.

You can get a try before you buy license.

I moved onto Python and runmqsc…

so I could try to do useful things with it.

Using runmqsc under python does not work very well.

I have found Python is a very good tool for systems management – see below for what I have done with it.

  • I tried using Python “subprocess” so I could write data down the stdin pipe, into runmqsc and capture the output from the data written to stdout. This did not work. I think the runmqsc output is written to stdout, but not flushed, so the program waiting for the data does not get it, and you get a deadlock.
  • I tried using Python “pexpect”, but this did not work as I could send one command to stdin, but then stdin was closed, and I could not send more data.
  • Another challenge was parsing the output of runmqsc. After a couple of hours I managed to create a regular expression which parsed most of the output, but there were a few edge cases which needed more work, and I gave up on this.
  • PCF on its own is difficult to use.
  • I came across PyMqi – MQ for Python. This was great, I could issue PCF commands, and get responses back – and I can process event queues and statistics queues!

From this I think using PyMqi is great!  My next blog post will describe some of the amazing things you can do in Python with only a few lines of code!

How do I change SSLCIPH on a channel?

Regular readers of my blog know that most of the topics I write on appear simple, but have hidden depths. This topic is no exception.

The simple answer is

  • For the client ALTER CHL(xxxx) CHLTYPE(CLNTCONN) SSLCIPH(new value)
  • For the svrconn
    • ALTER CHL(xxxx) CHLTYPE(SVRCONN) SSLCIPH(new value)
    • REFRESH SECURITY

The complexity occurs when you have many clients trying to use the channel, and you cannot change them all at the same time (imagine trying to change 1000 of them, when half of them are not under your control). For the clients that have not changed, you will get the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel ‘…’.

in the /qmgrs/xxxx/errors/AMQERR01.LOG

For this problem the CCDT is your friend. See my blog post here.

I have a client channel CHANNEL(C1) CHLTYPE(CLNTCONN)

On my CCDT queue manager I created another channel the same as the one I want to update.

DEF CHANNEL(C2) CHLTYPE(CLNTCONN) LIKE(C1)

On my server queue manager I used

DEF CHANNEL(C2) CHLTYPE(SVRCONN) LIKE(C1)

SET CHLAUTH(C2) TYPE(BLOCKUSER)
USERLIST(….)

REFRESH SECURITY

When I ran my sample connect program, it connected using C1 as before.

On the MQ Server, I changed the SSLCIPH to the new value for C1.

When I ran my sample connect program it connected using channel(C2). In the AMQERR01.LOG I had the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel 'C1'

So the changed channel did not connect, but the second channel with the old cipher spec worked successfully. (The use of the backup channel was transparent to the application.)

I then altered the client channel definition, CHANNEL(C1) CHLTYPE(CLNTCONN), so SSLCIPH had the correct, matching value. When my sample program was run, it connected using channel C1 as expected.

Once I have changed all my channels, and get no errors in the error log:

  • I can change the CHLAUTH(C2) BLOCKUSER(*) and either set warning, or give no warning and no access
  • Remove C2 from the CCDT queue manager, so applications no longer get this in their CCDT
  • Finally delete the channel C2 on the server.
  • Go down the pub to celebrate a successful upgrade!
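Checking that the error log is clean can be scripted. The following is a minimal sketch, assuming the log contains the AMQ9631E text quoted above with the channel name in straight single quotes. Real log entries may be laid out differently, and the function name is hypothetical.

```python
import re
from collections import Counter

# AMQ9631E, then the first quoted channel name after it. DOTALL lets the
# match continue over a line break if the message text has been wrapped.
MISMATCH = re.compile(r"AMQ9631E:.*?channel\s+'([^']+)'", re.DOTALL)


def channels_with_cipher_mismatch(log_text):
    """Count AMQ9631E messages per channel name found in the log text."""
    return Counter(MISMATCH.findall(log_text))


# Typical use (the path is elided here, as in the text above):
# with open(".../errors/AMQERR01.LOG") as log:
#     print(channels_with_cipher_mismatch(log.read()))
```

An empty result for the channel being changed suggests that all of its clients have picked up the new cipher spec.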


Should I do in-place or side by side migration of MQ mid-range?

With mid-range MQ there are a couple of migration options:

  • Upgrade the queue manager in place. If there are problems, restore from backup, and sort out the problems this restore may cause. You may want to do this if you have just the one queue manager.
  • Upgrade the queue manager in place – if there are problems, leave it down until any problems can be resolved. This assumes that you are a good enterprise user and have other queue managers available to process the work.
  • Create another queue manager, “next to it” (“side by side”) on the same operating system image. A better description might be “adding a new queue manager to our environment on an existing box, and deleting an old one at a later date” rather than “side by side migration”. You may already have a document to do this.

What do you need to do for in-place migration?

  • Back up your queue manager; see a discussion here
  • Shut down the queue manager, letting all work end cleanly
  • Either (see here)
    • Delete the previous version of MQ, and install the new version, or better..
    • Use Multi-install – so you have old and new versions available at the same time
  • Switch to the new version (of the multi-install)
  • Restart the queue manager
  • Let work flow
  • Make a note of any changes you make to the configuration, for example alter qlocal…, in case you need to restore from a backup and re-apply the changes.

If you need to back out the migration and restore from the backup

You need to

  • Make sure there are no threads in doubt
  • Make sure all transmission queues are empty (so you do not overwrite messages when you restore from a backup)
  • Offload messages from application queues – if you are lucky there will be no messages. Do not offload messages from the system queues.
  • Shut down MQ
  • Reset the MQ installation to the older version
  • Restore from your backup; see here
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • Load messages back onto the application queues
  • Reapply any changes, such as alter QL…

In the situation where you have a problem, personally I think it would be easier to leave the queue manager down, rather than trying to restore it from a backup. You may want to offload any application messages first. Of course this is much easier if you have configured multiple queue managers, and leaving one queue manager shut down should not cause problems. Until any problems are fixed you cannot proceed with migrating other queue managers, and you may have the risk of lower availability because there is one server less.

What you need to do for side by side migration.

“Side by side” migration requires a new queue manager to be created, and work moved to the new queue manager.

  • If this is a cluster repository, you need to move it to another queue manager if only temporarily (otherwise you will get a new repository)
  • You need a new queue manager name
  • You need a new port number
  • Create the queue manager
  • You may want to alter qmgr SCHINIT (MANUAL) during the configuration so that you do not get client applications trying to connect to your new queue manager before you are ready.
  • You need to back up all application object definitions, chlauths etc. and reapply them to the new queue manager. Do not copy and restore the channels.
  • Apply these application objects to the new queue manager
  • List the channels on the old system
  • Create new channels. For example, a cluster receiver channel will need a new name, and a CONNAME with the updated port
  • You should be able to reuse any sender channels unchanged
  • If you are using CCDT
    • Define new client SVRCONN names (as a CCDT needs unique channel names)
    • On the queue manager which creates the CCDT, create new CLNTCONN channels. The queue manager needs unique names
    • Send the updated CCDT to applications which use this queue manager, so they can use the new queue manager. Note: From IBM MQ Version 9.0, the CCDT can be hosted in a central location that is accessible through a URI, removing the need to individually update the CCDT for each deployed client. See here
    • If you are using clustered queues, then cluster queues will be propagated automatically to the repository and to interested queue managers
    • If you are not using clustering, you will need to create sender/receiver channels, and create the same on the queue managers they attach to
  • Update automation to take notice of the queue managers
  • Change monitoring to include this queue manager
  • Change your backup procedures to back up the new queue manager files
  • Change your configuration and deployment tools, so changes to the old queue manager are copied to the new queue manager as well.
  • Configure all applications that use bindings mode, to add the new queue manager to the options. Restart these applications so they pick up the new configuration
  • When you are ready use START CHINIT
  • Alter the original queue manager to be qmgr SCHINIT (MANUAL), so when you restart the queue manager it does not start the chinit, and so channels will not process workload.
    • Note there is a strmqm -ns option. The doc says… This prevents any of the following processes from starting automatically when the queue manager starts:
    • The channel initiator
    • The command server
    • Listeners
    • Services
    • This parameter also runs the queue manager as if the CONNAUTH attribute is blank, regardless of its current value. This allows unauthenticated access to the queue manager for locally bound applications; client applications cannot connect because there are no listeners. Administrative changes must be made by using runmqsc because the command server is not running.
    • But you may not want to run unauthenticated.
  • Stop the original queue manager, after a short time, all applications should disconnect, and reconnect to the new queue manager.
  • Shut down the old queue manager, and restart it. With SCHINIT (MANUAL) it should get no channels running. Stop any listeners. If you have problems you can issue START CHINIT and START LSTR. After a day shut down the queue manager and leave it down – in case of emergency you can just restart it.
  • After you have run successfully for a period you can delete the old queue manager.
  • Remove it from any clusters before deleting it. The cluster repository will remember the queue manager and queues for a long period, then eventually delete them.
  • Make the latest version of MQ the primary installation, and delete the old version
  • Update the documentation
  • Update your procedures – eg configuration automation

As I said at the beginning – an in-place migration looks much easier to do.

Should I use backup and restore of my mid-range queue manager?

In several places the MQ Knowledge centre mentions backing up your queue manager, for example in case of problems when migrating.

I could not find an emoji showing a worried wizard, so let me explain my concerns so you can make an informed decision about using it.

Firstly some obvious statements

  • You take a backup so you can restore it at a later date to the same state
  • When you do a restore in-place you overwrite what was there before
  • The result of a restore should be the same as when you did the backup

See, really obvious, but you need to think through the consequences of these.

Creating duplicate messages

Imagine there is a message on the queue saying “transfer 1 million pounds to Colin Paice”. This gets backed up. The message gets processed, and I am rich!

You restore from the backup – and this message reappears – so unless the applications are smart and can detect a duplicate message – I will get even richer!

Losing messages

An application queue was empty when it was backed up. A message is put to the queue: “Colin Paice pays you 1 million pounds”. Before this message gets processed the system is restored, resetting the queue to when it was backed up, so the message disappears and you do not get your money.

Status information gets out of step

The queue manager holds information in queues. For example each channel has information about the sequence number of the message flow. If this gets out of sync, then you have to manually resync them.

If you restore the SYSTEM.CHANNEL.SYNCQ from last week – it will have the values from last week. If you restore this data, the channels will fail to start because the sequence numbers do not match, and you need to use the reset channel command.

If you really want to do backup and restore…

Before you back up..

  • If this is a full repository, “just” move it to another queue manager.
  • Stop receiver channels, so work stops flowing into the queue manager
  • Set all application input queues to put disabled, to stop applications from putting to the queues.
  • Let the applications drain all application queues (and send the replies back)
  • Make sure all queues, such as Dead Letter Queue, and Event Queues have been processed and the queues are empty.
  • Make sure all transmission queues are empty, including SYSTEM.CLUSTER.TRANSMIT.QUEUE and any Split Cluster Transmit queues.
  • Shut down the queue manager, letting all work end cleanly.
  • Back up the queue manager files.
  • Make a record of every configuration change you make, such as alter qlocal.

If you need to restore.. you need to empty the queue manager before you overwrite it.

  • Make sure there are no threads in doubt.
  • Make sure all transmission queues are empty
  • Have applications process all of the application messages, or offload the messages
  • Shut down MQ
  • Restore from your backup.
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • You may need to refresh cluster, so that you get the latest definitions sent to the machine, and the information on local objects is propagated back to the full repository.
  • If you offloaded application messages, restore them
  • Reapply any changes, such as alter QL…
  • You need to be careful about applications connecting to your queue manager before it is ready to do work. You might want to use strmqm -ns to start in restricted mode.

It is dangerous if you restore the queue manager in a different place

You need to be careful if you restore a queue manager to a different place.

If you restore it, and start it, then channels are likely to start, and messages flow. For example it will contact the full repository, and send information about the objects in the newly restored queue manager. The full repository will get confused as you have two queue managers with the same name, and same unique name sending information. It is difficult to resolve this once it has occurred. People have been known to do this when testing their disaster recovery procedures.