The lows (and occasional high) of managing MQ centrally.

While I was still at IBM, and since I retired, I have been curious about how people manage MQ in their enterprise systems.

  1. How do you deploy a change to a queue to 1000 queue managers safely, accurately, and by an authorised person – and, by the way, one queue manager was down when you tried to make the change?
  2. Are these identical systems really identical – or has someone gone in, made an emergency change on one system, and left one parameter different?
  3. We have all of these naming standards – do we follow them? Did we specify encryption on all external channels?

At the bottom of this blog (more like a long essay) I show some very short Python scripts which

  • compare queue definitions and show you the differences between them.
  • check when queue attributes do not meet “corporate standards”.
  • print data from the change-events queue, so you can see what people altered.
  • I also have scripts which display PCF data from events, stats etc. I need to clean them up, then I’ll publish them.

I think Python scripting will make systems management so much easier.
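To give a flavour of what I mean, here is a minimal sketch (not one of the scripts mentioned above) which compares the attributes of one queue on two queue managers using pymqi’s PCF support. The queue manager names, channels, conname values and queue name are made up – substitute your own.

import pymqi
from pymqi import CMQC

def get_queue_attrs(qmgr_name, channel, conn_info, queue_name):
    # Return a dict of attribute-id -> value for one queue
    qmgr = pymqi.connect(qmgr_name, channel, conn_info)
    try:
        pcf = pymqi.PCFExecute(qmgr)
        # MQCMD_INQUIRE_Q returns a list of dicts, one per matching queue
        responses = pcf.MQCMD_INQUIRE_Q({CMQC.MQCA_Q_NAME: queue_name})
        return responses[0] if responses else {}
    finally:
        qmgr.disconnect()

qma_attrs = get_queue_attrs('QMA', 'QMACLIENT', 'linuxa(1414)', b'APP.REQUEST')
qmb_attrs = get_queue_attrs('QMB', 'QMBCLIENT', 'linuxb(1414)', b'APP.REQUEST')

# Report attributes whose values differ between the two queue managers
for key in sorted(set(qma_attrs) | set(qmb_attrs)):
    if qma_attrs.get(key) != qmb_attrs.get(key):
        print(key, qma_attrs.get(key), qmb_attrs.get(key))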

Strategic tools do not seem to deliver.

There seem to be many “strategic tools” to help you. These include Chef, Puppet, Ansible, and Salt, which are meant to help you deploy to your enterprise.

There are a lot of comparison documents on the web – some observations, in no particular order:

  • Chef and Puppet have an agent on each machine and seem complex to set up initially.
  • Ansible does not use agents – it uses SSH commands to access each machine.
  • Some tools expect deployers to understand and configure in Ruby (so moving the complexity from knowing MQ to knowing Ruby), others use YAML – a simple format.

This seems to be a reasonable comparison.

Stepping back from using these tools I did some work to investigate how I would build a deployment system from standard tools. I have not done it yet, but I thought I would document the journey so far.

Some systems management requirements

What I expect to be able to do in an enterprise MQ environment.

  • I have a team of MQ administrators. All have read only access to all queue managers. Some can only update test, some can update test and production.
  • I want to be able to easily add and remove people from role based groups, and not wait a month for someone to press a button to give them authority.
  • I want to save a copy of the object before, and after a change – for audit trail and Disaster Recovery.
  • The process needs to handle the case when a change does not work because the queue manager is down, or the object is in use.
  • I want to be able to deploy a new MQ server – and have all of the objects created according to a template for that application.
  • I want to check and enforce standards, e.g. names and values (do you really need a max queue depth of 999 999 999, and why is curdepth 999 999?).
  • I want to be able to process the event data and stats data produced by MQ and put them in SPLUNK or other tool.
  • There are MQ objects within the queue manager, and other objects such as CCDTs for clients, keystores and TLS keys. I need to get these to wherever they are used.
  • I want to report statistics on MQ in my enterprise tool – so I need to get the statistics data from each machine to the central reporting tool.
  • I want Test to look like Production (and use the same processes) so we avoid the problem of not testing what was deployed.

Areas I looked at

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

  • This may be fine for a test environment, but once deployed I still want to be able to change object attributes on a subset of the queue managers. I don’t think Docker solves this problem (it is hard to tell from the documentation).
  • I could not see how to set up the Client Channel Definition Tables (CCDT) so my client applications can connect to the appropriate production queue manager.
  • If I define my queues using clustering, when I add a new queue manager, the objects will be added to the repository cache. When I remove a queue manager from the cluster, and delete the container, the objects live on in the cache for many days. This does not feel clean.
  • I wondered if this was the right environment (virtualised) for running my performance-critical production workload. I could not easily find any reports on this.
  • How do I manage licenses for these machines, and make sure we have enough licenses and are not illegally using the same license on all machines?

Using RUNMQSC

At first, running runmqsc locally seemed to be the answer to many of the problems.

I could use secure FTP to get my configuration files down to the machine, logon to the machine, pipe the file into runmqsc, capture the output and ftp the file back to my central machine.

  • Having all the MQ administrators with a userid on each machine can be done using LDAP groups etc., so that works OK.
  • To use a userid and password you specify runmqsc -u user_id qm. This then prompts for the password. If you pipe your commands in, then you have to put your password as the first line of the piped input. This is not good practice, and I could not see a way of doing it without putting the password in the file in front of the definitions. (Perhaps a Linux guru can tell me.)

Having to FTP the files to and from the machine was not very elegant, so I tried using runmqsc as a client (the -c option). At first this seemed to work; then I tried making it secure and using an SSL channel. I could only get this to work when it used a channel with the same name as the queue manager name. (So to use queue manager QMB I needed an SSL channel called QMB. The documentation says you cannot use the MQSERVER environment variable to set up an SSL channel.) On my queue manager, the channel QMB was already in use. I redefined my channel and got this to work.

As you may expect, I fell over the CHLAUTH rules, but with help from some conference charts written by Morag, I got the CHLAUTH rules defined, so that I could allow people with the correct certificate to use the channel. I could then give the channel a userid with the correct authority for change or read access.

I had a little ponder on this, and thought that a more secure way would be to use SSL AND have a userid and password. If someone copied my keystore they would still need the password to connect to MQ, and so I get two-factor authentication.

This is an OK solution, but it does not go far enough. It is tough trying to parse the output from runmqsc (which is why PCF was invented).

Using mqscx

Someone told me about mqscx from mqgem software. If runmqsc is the kindergarten version, mqscx is the adult one. It does so much more and is well worth a look.

See here for a video and here for the web site.

How does it do against my list of requirements?

  • I can enter userid and password on the command line and also pipe a file in ✔
  • One column output (if required) so I can write the output to a file, and it is then very easy to parse ✔
  • I can use SSL channels ✔
  • I can use my Client Channel Definition Table (CCDT) ✔

It also has colour, multi-column output, better use of the screen area (you can display much more on your screen) and its own scripting language.

You can get a try before you buy license.

I moved on to Python and runmqsc…

so I could try to do useful things with it.

Using runmqsc under Python does not work very well.

I have found Python is a very good tool for systems management – see below for what I have done with it.

  • I tried using Python “subprocess” so I could write data down the stdin pipe into runmqsc, and capture the output from the data written to stdout. This did not work. I think the runmqsc output is written to stdout, but not flushed, so the program waiting for the data does not get it, and you get a deadlock.
  • I tried using Python “pexpect”, but this did not work as I could send one command to stdin, but then stdin was closed, and I could not send more data.
  • Another challenge was parsing the output of runmqsc. After a couple of hours I managed to create a regular expression which parsed most of the output, but there were a few edge cases which needed more work, and I gave up on this.
  • PCF on its own is difficult to use.
  • I came across PyMqi – MQ for Python. This was great, I could issue PCF commands, and get responses back – and I can process event queues and statistics queues!

From this I think using PyMqi is great!  My next blog post will describe some of the amazing things you can do in Python with only a few lines of code!
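As a taster, here is a hedged sketch of a “corporate standards” check – it lists local queues whose MAXDEPTH is above a limit. The limit, queue manager name, channel and conname are invented for the example.

import pymqi
from pymqi import CMQC

MAX_ALLOWED_DEPTH = 100000   # the "corporate standard" - adjust to taste

qmgr = pymqi.connect('QMA', 'QMACLIENT', 'linuxa(1414)')
pcf = pymqi.PCFExecute(qmgr)

# Inquire on every local queue; each response is a dict of attribute-id -> value
for q in pcf.MQCMD_INQUIRE_Q({CMQC.MQCA_Q_NAME: b'*',
                              CMQC.MQIA_Q_TYPE: CMQC.MQQT_LOCAL}):
    name = q[CMQC.MQCA_Q_NAME].strip().decode()   # char attributes come back as padded bytes
    maxdepth = q[CMQC.MQIA_MAX_Q_DEPTH]
    curdepth = q[CMQC.MQIA_CURRENT_Q_DEPTH]
    if maxdepth > MAX_ALLOWED_DEPTH:
        print(name, 'MAXDEPTH', maxdepth, 'exceeds standard, CURDEPTH', curdepth)

qmgr.disconnect()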

How do I change SSLCIPH on a channel?

Regular readers of my blog know that most of the topics I write on appear simple, but have hidden depth; this topic is no exception.

The simple answer is

  • For the client ALTER CHL(xxxx) CHLTYPE(CLNTCONN) SSLCIPH(new value)
  • For the svrconn
    • ALTER CHL(xxxx) CHLTYPE(SVRCONN) SSLCIPH(new value)
    • REFRESH SECURITY
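If you are applying the same change to many queue managers from a central point, the alter can also be issued as a PCF command, for example from Python. This is only a sketch under my assumptions – the channel name, cipher spec and connection details are placeholders, and the pymqi constant names are the ones I believe are correct, so check them against your pymqi level.

import pymqi
from pymqi import CMQCFC, CMQXC

qmgr = pymqi.connect('QMA', 'ADMIN.SVRCONN', 'linuxa(1414)')
pcf = pymqi.PCFExecute(qmgr)

# ALTER CHANNEL(C1) CHLTYPE(SVRCONN) SSLCIPH(...) as a PCF command
pcf.MQCMD_CHANGE_CHANNEL({
    CMQCFC.MQCACH_CHANNEL_NAME: b'C1',
    CMQCFC.MQIACH_CHANNEL_TYPE: CMQXC.MQCHT_SVRCONN,
    CMQCFC.MQCACH_SSL_CIPHER_SPEC: b'ECDHE_RSA_AES_128_CBC_SHA256',
})

# REFRESH SECURITY TYPE(SSL)
pcf.MQCMD_REFRESH_SECURITY({CMQCFC.MQIACF_SECURITY_TYPE: CMQCFC.MQSECTYPE_SSL})

qmgr.disconnect()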

The complexity occurs when you have many clients trying to use the channel, and you cannot change them all at the same time (imagine trying to change 1000 of them – when half of them are not under your control). For the clients that have not changed, you will get the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel ‘…’.

in the /qmgrs/xxxx/errors/AMQERR01.LOG

For this problem the CCDT is your friend. See my blog post here.

I have a client channel CHANNEL(C1) CHLTYPE(CLNTCONN)

On my CCDT queue manager I created another channel the same as the one I want to update.

DEF CHANNEL(C2) CHLTYPE(CLNTCONN) LIKE(C1)

On my server queue manager I used

DEF CHANNEL(C2) CHLTYPE(SVRCONN) LIKE(C1)

DEFINE CHLAUTH(C2) TYPE(BLOCKUSER)
USERLIST(….)

REFRESH SECURITY

When I ran my sample connect program, it connected using C1 as before.

On the MQ Server, I changed the SSLCIPH to the new value for C1.

When I ran my sample connect program it connected using channel(C2). In the AMQERR01.LOG I had the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel ‘C1’.

So the changed channel did not connect, but the second channel with the old cipher spec worked successfully. (The use of the backup channel was transparent to the application.)

I then changed DEF CHANNEL(C1) CHLTYPE(CLNTCONN) so SSLCIPH had the correct, matching value. When my sample program was run, it connected using channel C1 as expected.

Once I have changed all my channels, and get no errors in the error log:

  • I can change the CHLAUTH(C2) BLOCKUSER(*) rule to either give a warning, or give no warning and no access.
  • Remove C2 from the CCDT queue manager, so applications no longer get this channel in their CCDT.
  • Finally, delete the channel C2 on the server.
  • Go down the pub to celebrate a successful upgrade!


Should I do in-place or side by side migration of MQ mid-range?

With mid-range MQ there are a couple of migration options:

  • Upgrade the queue manager in place – if there are problems, restore from backup, and sort out the problems this restore may cause. You may want to do this if you have just the one queue manager.
  • Upgrade the queue manager in place – if there are problems, leave it down until any problems can be resolved. This assumes that you are a good enterprise user and have other queue managers available to process the work.
  • Create another queue manager, “next to it” (“side by side”) on the same operating system image. A better description might be “adding a new queue manager to our environment on an existing box, and deleting an old one at a later date” rather than “side by side migration”. You may already have a document to do this.

What do you need to do for in-place migration?

  • Back up your queue manager – see a discussion here
  • Shut down the queue manager, letting all work end cleanly
  • Either (see here)
    • Delete the previous version of MQ, and install the new version, or better..
    • Use Multi-install – so you have old and new versions available at the same time
  • Switch to the new version (of the multi-install)
  • Restart the queue manager
  • Let work flow
  • Make a note of any changes you make to the configuration – for example alter qlocal… – in case you need to restore from a backup and re-apply the changes.

If you need to back out the migration and restore from the backup

You need to

  • Make sure there are no threads in doubt
  • Make sure all transmission queues are empty (so you do not overwrite messages when you restore from a backup)
  • Offload messages from application queues – if you are lucky there will be no messages. Do not offload messages from the system queues.
  • Shut down MQ
  • Reset the MQ installation to the older version
  • Restore from your backup see here
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • Load messages back onto the application queues
  • Reapply any changes, such as alter QL…

In the situation where you have a problem, personally I think it would be easier to leave the queue manager down, rather than trying to restore it from a backup. You may want to offload any application messages first. Of course this is much easier if you have configured multiple queue managers, so leaving one queue manager shut down should not cause problems. Until any problems are fixed you cannot proceed with migrating other queue managers, and you may have the risk of lower availability because there is one less server.

What you need to do for side by side migration.

“Side by side” migration requires a new queue manager to be created, and work moved to the new queue manager.

  • If this is a cluster repository, you need to move it to another queue manager if only temporarily (otherwise you will get a new repository)
  • You need a new queue manager name
  • You need a new port number
  • Create the queue manager
  • You may want to alter qmgr SCHINIT (MANUAL) during the configuration so that you do not get client applications trying to connect to your new queue manager before you are ready.
  • You need to back up all application object definitions, chlauths etc. Do not copy and restore the channels.
  • Apply these application objects to the new queue manager.
  • List the channels on the old system
  • Create new channels – for example the cluster receiver, whose CONNAME will need the updated port, and a new name
  • You should be able to reuse any sender channels unchanged
  • If you are using CCDT
    • Define new client SVRCONN names (as a CCDT needs unique channel names)
    • On the queue manager which creates the CCDT, create new CLNTCONN channels with unique names
    • Send the updated CCDT to applications which use this queue manager, so they can use the new queue manager. Note: From IBM MQ Version 9.0, the CCDT can be hosted in a central location that is accessible through a URI, removing the need to individually update the CCDT for each deployed client. See here
    • If you are using clustered queues, then cluster queues will be propagated automatically to the repository and to interested queue managers
    • If you are not using clustering, you will need to create sender/receiver channels, and create the same on the queue managers they attach to
  • Update automation to take notice of the new queue manager
  • Change monitoring to include this queue manager
  • Change your backup procedures to back up the new queue manager files
  • Change your configuration and deployment tools, so changes to the old queue manager are copied to the new queue manager as well.
  • Configure all applications that use bindings mode, to add the new queue manager to the options. Restart these applications so they pick up the new configuration
  • When you are ready use START CHINIT
  • Alter the original queue manager to SCHINIT(MANUAL), so when you restart the queue manager it does not start the channel initiator, and so channels will not start and take workload.
    • Note there is a strmqm -ns option. The doc says… This prevents any of the following processes from starting automatically when the queue manager starts:
    • The channel initiator
    • The command server
    • Listeners
    • Services
    • This parameter also runs the queue manager as if the CONNAUTH attribute is blank, regardless of its current value. This allows unauthenticated access to the queue manager for locally bound applications; client applications cannot connect because there are no listeners. Administrative changes must be made by using runmqsc because the command server is not running.
    • But you may not want to run unauthenticated.
  • Stop the original queue manager; after a short time all applications should disconnect, and reconnect to the new queue manager.
  • Shut down the old queue manager, and restart it. With SCHINIT(MANUAL) it should have no channels running. Stop any listeners. If you have problems you can issue START CHINIT and START LSTR. After a day shut down the queue manager and leave it down – in case of emergency you can just restart it.
  • After you have run successfully for a period you can delete the old queue manager.
  • Remove it from any clusters before deleting it. The cluster repository will remember the queue manager and queues for a long period, then eventually delete them.
  • Make the latest version of MQ the primary installation, and delete the old version
  • Update the documentation
  • Update your procedures – eg configuration automation

As I said at the beginning – an in-place migration looks much easier to do.

Should I use backup and restore of my mid-range queue manager?

In several places the MQ Knowledge Centre mentions backing up your queue manager, for example in case of problems when migrating.

I could not find an emoji showing a worried wizard, so let me explain my concerns so you can make an informed decision about using it.

Firstly some obvious statements

  • You take a backup so you can restore it at a later date to the same state
  • When you do a restore in-place you overwrite what was there before
  • The result of a restore should be the same as when you did the backup

See – really obvious – but you need to think through the consequences of these.

Creating duplicate messages

Imagine there is a message on the queue saying “transfer 1 million pounds to Colin Paice”. This gets backed up. The message gets processed, and I am rich!

You restore from the backup – and this message reappears – so unless the applications are smart and can detect a duplicate message – I will get even richer!

Losing messages

An application queue was empty when it was backed up. A message is put to the queue: “Colin Paice pays you 1 million pounds”. Before this message gets processed the system is restored – resetting the queue to when it was backed up – so the message disappears and you do not get your money.

Status information gets out of step

The queue manager holds information in queues. For example each channel has information about the sequence number of the message flow. If this gets out of sync, then you have to manually resync them.

If you restore the SYSTEM.CHANNEL.SYNCQ from last week – it will have the values from last week. If you restore this data, the channels will fail to start because the sequence numbers do not match, and you need to use the reset channel command.

If you really want to do backup and restore…

Before you back up..

  • If this is a full repository, “just” move it to another queue manager.
  • Stop receiver channels, so work stops flowing into the queue manager
  • Set all application input queues to put disabled, to stop applications from putting to the queues.
  • Let the applications drain all application queues (and send the replies back)
  • Make sure all queues, such as Dead Letter Queue, and Event Queues have been processed and the queues are empty.
  • Make sure all transmission queues are empty, including SYSTEM.CLUSTER.TRANSMIT.QUEUE and any Split Cluster Transmit queues.
  • Shut down the queue manager, letting all work end cleanly.
  • Backup the queue manager files
  • Make a record of every configuration change you make, such as alter qlocal.
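A hedged pymqi sketch of the “put disabled” step – the APP.* naming convention and the connection details are mine, not a standard:

import pymqi
from pymqi import CMQC

qmgr = pymqi.connect('QMA', 'ADMIN.SVRCONN', 'linuxa(1414)')
pcf = pymqi.PCFExecute(qmgr)

# Find the application queues (APP.* is just my naming convention)
for q in pcf.MQCMD_INQUIRE_Q({CMQC.MQCA_Q_NAME: b'APP.*',
                              CMQC.MQIA_Q_TYPE: CMQC.MQQT_LOCAL}):
    name = q[CMQC.MQCA_Q_NAME].strip()
    # ALTER QLOCAL(...) PUT(DISABLED) as a PCF command
    pcf.MQCMD_CHANGE_Q({CMQC.MQCA_Q_NAME: name,
                        CMQC.MQIA_Q_TYPE: CMQC.MQQT_LOCAL,
                        CMQC.MQIA_INHIBIT_PUT: CMQC.MQQA_PUT_INHIBITED})
    print('put disabled', name.decode())

qmgr.disconnect()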

If you need to restore.. you need to empty the queue manager before you overwrite it.

  • Make sure there are no threads in doubt.
  • Make sure all transmission queues are empty
  • Have applications process all of the application messages, or offload the messages
  • Shut down MQ
  • Restore from your backup.
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • You may need to refresh the cluster, so that you get the latest definitions sent to the machine, and the information on local objects is propagated back to the full repository.
  • If you offloaded application messages, restore them
  • Reapply any changes, such as alter QL…
  • You need to be careful about applications connecting to your queue manager before it is ready to do work. You might want to use strmqm -ns to start in restricted mode.

It is dangerous if you restore the queue manager in a different place

You need to be careful if you restore a queue manager to a different place.

If you restore it, and start it, then channels are likely to start, and messages flow. For example it will contact the full repository, and send information about the objects in the newly restored queue manager. The full repository will get confused as you have two queue managers with the same name, and same unique name sending information. It is difficult to resolve this once it has occurred. People have been known to do this when testing their disaster recovery procedures – and thus causing a major problem!

One more point…

As wpkf pointed out.

You can start the QM without starting the channels. strmqm -ns prevents any of the following processes from starting automatically when the queue manager starts: the channel initiator, the command server, listeners, and services. All connection authentication configuration is suppressed.

Are your client connections not configured for optimum high availability?

I would expect the answer for most people is – no, they are not configured for optimum high availability.

In researching my previous blog post on which queue manager to connect to, I found that the default options for CLNTWGHT and AFFINITY may not be the best. They were set up to provide consistency from a previous release. The documentation was missing the words “once you have migrated, then consider changing these options”. As they are hard to understand, I expect most people have not changed the options.

The defaults are

  • CLNTWGHT(0)
  • AFFINITY(PREFERRED)

I did some testing and found some bits were good, predictable, and gave me high availability; other bits did not.

My recommendations for high availability and consistency are the complete opposite of the defaults:

  • use CLNTWGHT values > 0, with a value which would give you the appropriate load balancing
  • use AFFINITY(NONE)

There are several combinations of settings:

  • all clients use AFFINITY(NONE) – this was reliable
    • CLNTWGHT > 0 was reliable, and gave good load balancing
    • a mixture of CLNTWGHT 0 and > 0 was reliable, but did not give good load balancing
  • all clients use AFFINITY(PREFERRED) – this was consistent, but did not behave as I read the documentation
  • a mixture of clients with AFFINITY PREFERRED and NONE – this gave me weird, inconsistent behavior.

So as I said above my recommendations for high availability are

  • use CLNTWGHT values > 0, and with a value which would give you the appropriate load balancing.
  • use AFFINITY(NONE).

My set up

I had three queue managers set up on my machine QMA,QMB,QMC.
I used channels
QMACLIENT for queue manager QMA,
QMBCLIENT for queue manager QMB,
QMCCLIENT for queue manager QMC.

The channels all had QMNAME(GROUPX)

A CCDT was used

A batch C program does (MQCONN to QMNAME *GROUPX, MQINQ for queue manager name, MQDISC) repeatedly.
After 100 iterations it prints out how many times each queue manager was used.
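For what it is worth, here is a rough Python equivalent of that C program. It assumes the MQ client libraries are installed, MQCONN goes out as a client (for example a client-only install or MQ_CONNECT_TYPE=CLIENT), and MQCHLLIB/MQCHLTAB point at the CCDT; the paths are the ones from my machine.

import os
from collections import Counter
import pymqi
from pymqi import CMQC

# Point the client at the CCDT
os.environ['MQCHLLIB'] = '/home/colinpaice/mq'
os.environ['MQCHLTAB'] = 'AMQCLCHL.TAB'

counts = Counter()
for _ in range(100):
    qmgr = pymqi.QueueManager(None)      # create the object without connecting
    qmgr.connect('*GROUPX')              # MQCONN - picks a channel from the CCDT
    name = qmgr.inquire(CMQC.MQCA_Q_MGR_NAME).strip()
    counts[name] += 1
    qmgr.disconnect()

for qm_name, n in counts.items():
    print(qm_name, n)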

AFFINITY(NONE) and clntwght > 0 for all channels

  • QMACLIENT CLNTWGHT(50), chosen 50 % on average
  • QMBCLIENT CLNTWGHT(20), chosen 20 % on average
  • QMCCLIENT CLNTWGHT(30), chosen 30 % on average.

On average the number of times a queue manager was used, was the same as channel_weight/sum(weights).
For QMACLIENT this was 50 /(50+20+30) = 50 / 100 = 50%. This matches the chosen 50% of the time as seen above.
I shut down queue manager QMC, and reran the test and got

  • QMACLIENT CLNTWGHT(50), chosen 71 % on average
  • QMBCLIENT CLNTWGHT(20), chosen 28 % on average
  • QMCCLIENT CLNTWGHT(30), not selected.
    For QMACLIENT the weighting is 50/ (50 + 20) = 71%. So this works as expected.

AFFINITY(NONE) for all channels, and clntwght >= 0

The documentation in the Knowledge Centre says any channels with CLNTWGHT(0) are considered first, and they are processed in alphabetical order. If none of these channels is available, then a channel is selected as in the CLNTWGHT(>0) case above.

  • QMACLIENT CLNTWGHT(50) not chosen
  • QMBCLIENT CLNTWGHT(0) chosen 100% of the time
  • QMCCLIENT CLNTWGHT(30) not chosen

This shows that the CLNTWGHT(0) was the only one selected.
When CLNTWGHT for QMACLIENT was set to 0, (so both QMACLIENT and QMBCLIENT had CLNTWGHT(0) ), all the connections went to QMA – as expected, because of the alphabetical order.

If QMA was shut down, all the connections went to QMB. Again expected behavior.

With

  • QMACLIENT CLNTWGHT(0)
  • QMBCLIENT CLNTWGHT(20)
  • QMCCLIENT CLNTWGHT(30)

and QMA shut down, the connections were in the ratio of 20:30 as expected.

Summary: If you want all connections (from all machines) to go the same queue manager, then you can do this by setting CLNTWGHT to 0.

I do not think this is a good idea, and suggest setting all CLNTWGHT values > 0 to give workload balancing.

Using AFFINITY(PREFERRED)

The documentation for AFFINITY(PREFERRED) is not clear.
For AFFINITY(NONE) it takes the list of channels with CLNTWGHT(0), sorts the list by channel name, and then goes through this list until it can successfully connect. If this fails, then it picks a channel at random, biased by the CLNTWGHT values.

My interpretation of how PREFERRED works is

  • it builds a list of CLNTWGHT(0) sorted alphabetically,
  • then creates another list of the other channels selected at random with a bias of the CLNTWGHT and keeps that list for the duration of the program (or until the CCDT is changed).
  • Any threads within the process will use the same list.
  • For an application doing MQCONN, MQDISC and MQCONN it will access the same list.
  • With the client channels defined above, different machines or different application instances may get a different ordering of the channels with CLNTWGHT > 0.

For example on different machines, or different application instances the lists may be:

  • QMACLIENT, QMBCLIENT, QMCCLIENT
  • QMBCLIENT, QMACLIENT, QMCCLIENT
  • QMACLIENT, QMBCLIENT, QMCCLIENT (same as the first one)
  • QMCCLIENT, QMACLIENT, QMBCLIENT

I’ll ignore the CLNTWGHT(0) as these would be at the front of the list in alphabetical order.

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(PREFERRED)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(PREFERRED)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(PREFERRED)

According to the documentation, if I run my program I would expect 100% of the connections to one queue manager. This is what happened.

If I ran the job many times, I would expect the queue managers to be selected according to the CLNTWGHT.

I ran my program 10 times, in different terminal windows, and each time QMC got 100% of the connections. This was not what I expected!

I changed the QMBCLIENT CLNTWGHT from 20 to 10 and reran my program, and now all of my connections went to QMA!

With QMBCLIENT CLNTWGHT 18, all the connections went to QMA; with QMBCLIENT CLNTWGHT 19, all the connections went to QMC.

This was totally unexpected behavior and not consistent with the documentation.

I would not use AFFINITY(PREFERRED) because it is unreliable and unpredictable. If you want to connect to the same queue manager specify the channel name in the MQCD and use mqcno.Options = MQCNO_USE_CD_SELECTION.

Having a mix of AFFINITY PREFERRED and NONE

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(PREFERRED)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(NONE)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(NONE)

All of the connections went to QMA.

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(NONE)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(PREFERRED)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(NONE)

there was a spread of connections as if the PREFERRED was ignored.

When I tried to validate the results – I got different results. (It may be something to do with the first or last object altered or defined).

Summary: with a mix of AFFINITY values NONE and PREFERRED it is hard to predict what will happen, so this situation should be avoided.

How do I know which queue manager to connect to?

Question: How difficult can it be to decide which queue manager to connect to?

Answer: For the easy case, it is easy; for the hard case, it is hard.
I would not be surprised to find that in many applications the MQCONN(x) calls are not coded properly!


That is a typical question and answer from me – but let me go into more detail so you understand what I am talking about.
If you have only one queue manager then it is easy to know which queue manager to connect to – it is the one and only queue manager.
If you have more than one – it gets more complex. If your application has just committed a funds transfer request and the connection is broken:

  • you may just decide to connect to any available queue manager, and ignore a possibly partial funds transfer request
  • or you might wait for a period trying to connect to the same queue manager, and then give up, and connect to another, and later worry about any orphaned message on the queue.

You now see why the easy scenario is easy, and for the hard one, you need to do some hard thinking and some programming to get the optimum response.

There is an additional complexity that when you connect to the same instance – it may be a highly available queue manager, and it may have restarted somewhere else. For the purposes of this blog post I’ll ignore this, and treat it as the same logical queue manager.

I had lots of help from Morag who helped me understand this topic, and gave me the sample code.

You have only one queue manager.

This is easy, you issue an MQCONN for the queue manager. If the connect is not successful, the program waits for a while and then retries. See – I said it was easy.
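A minimal sketch of that retry loop in Python with pymqi (the queue manager, channel and conname are placeholders):

import time
import pymqi

def connect_with_retry(qmgr_name, channel, conn_info, retries=10, delay=5):
    # Keep trying MQCONN until it works, or we run out of retries
    for attempt in range(retries):
        try:
            return pymqi.connect(qmgr_name, channel, conn_info)
        except pymqi.MQMIError as e:
            print('connect attempt', attempt + 1, 'failed, reason', e.reason)
            time.sleep(delay)
    raise RuntimeError('could not connect to ' + qmgr_name)

qmgr = connect_with_retry('QMA', 'QMACLIENT', 'linuxa(1414)')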

You have more than one queue manager, and getting a reply back is not important.

For example, you are using non persistent messages.

Your application can decide which queue manager it tries to connect to, or you can exploit queue manager groups in the CCDT.

On queue manager QMA you can define client channels for it and also for queue manager QMB:

DEF CHL(QMA) CHLTYPE(CLNTCONN) QMNAME(GROUPX) 
CONNAME(LINUXA) CLNTWGHT(50)…
DEF CHL(QMB) CHLTYPE(CLNTCONN) QMNAME(GROUPX)
CONNAME(LINUXB) CLNTWGHT(50)…

DEF CHL(QMA) CHLTYPE(SVRCONN) ...

On Unix these are automatically put into the /var/mqm/qmgrs/../@ipcc/AMQCLCHL.TAB file. This is a binary file, and can be FTPed to the client machines that need it.

You can use the environment variables MQCHLLIB to specify the directory where the table is located, and MQCHLTAB to specify the file name of the table (it defaults to AMQCLCHL.TAB). See here for more information.

FTP the files in binary to your client machine, for example into ~/mq/.
I did
export MQCHLLIB=/home/colinpaice/mq
export MQCHLTAB=AMQCLCHL.TAB

I then used the command
set | grep MQ
to make sure those variables are set, and did not have MQSERVER set.

Sample MQCONN code (from Morag)….

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
char * QMName = "*GROUPX";
MQCONN(QMName,
&hConn,
&CompCode,
&Reason);
// and MQ will pick one of the two entries in the CCDT.

The application connected with queue manager name *GROUPX. Under the covers the MQ code found the channel connections with QMNAME of GROUPX and picked one to use. The “*” says do not check the name of the queue manager when you actually do the connect. If you omit the “*” you will get return code MQRC_Q_MGR_NAME_ERROR 2058 (080A in hex) because “GROUPX” did not match the queue manager name of “QMA” or “QMB”. I stopped QMA, and reconnected the application, and it connected to QMB as expected.

Common user error: When I tried connecting with queue manager name QMA, this failed with MQRC_Q_MGR_NAME_ERROR because there were no channel definitions with QMNAME value QMA. This was obvious once I had taken a trace, looked at the trace, had a cup of tea and a biscuit, and remembered I had fallen over this before. So this may be the first thing to check if you get this return code.

Using channels defined with the same QMNAME, if your connection breaks, you reconnect with the same queue manager name “*GROUPX” and you connect to a queue manager if there is one available. You can specify extra options to bias which one gets selected. See CLNTWGHT and AFFINITY. See the bottom of this blog entry.

You can use MQINQ to get back the name of the queue manager you are actually connected to (so you can put it in your error messages).

//   Open the queue manager object to find out its name
od.ObjectType = MQOT_Q_MGR;      // open the queue manager object
MQOPEN(Hcon,                     // connection handle
       &od,                      // object descriptor for queue
       MQOO_INQUIRE +            // open it for inquire
       MQOO_FAIL_IF_QUIESCING,   // but not if MQM stopping
       &Hobj,                    // returned object handle
       &OpenCode,                // MQOPEN completion code
       &Reason);                 // reason code
// report reason, if any
if (Reason != MQRC_NONE)
{
  printf("MQOPEN of qm object rc %d\n", Reason);
  .....
}
// Now do the actual INQ
Selector = MQCA_Q_MGR_NAME;
MQINQ(Hcon,                      // connection handle
      Hobj,                      // object handle for q manager
      1,                         // inquire only one selector
      &Selector,                 // the selector to inquire
      0,                         // no integer attributes are needed
      NULL,                      // so no integer buffer
      MQ_Q_MGR_NAME_LENGTH,      // inquiring a q manager name
      ActiveQMName,              // the buffer for the name
      &CompCode,                 // MQINQ completion code
      &Reason);                  // reason code

printf("Queue manager in use %s\n", ActiveQMName);

You have more than one queue manager, and getting a reply back >is< important

Your application should have some logic to handle the case when your queue manager is running normally, there is a problem in the back end, and so you do not get your reply message within the expected time. Typical logic for when the MQGET times out is:

  • Produce an event saying “response not received”, to alert automation that there may be a problem somewhere in the back end
  • Produce an event saying “there is a piece of work that needs special processing – to manually redo or undo – update number…..”.
    • At a later time a program can get the orphaned message and resolve it.
    • You do not want an end user getting a message “The status of the funds transfer request to Colin Paice is … unknown” because the reply message is sitting unprocessed on the queue.
    • Note: putting a message to a queue may not be possible as the application may not be connected to a queue manager.

When deciding to connect to any available queue manager, or connect to a specific queue manager, there are two key options in mqcno.Options field:

  • MQCNO_CD_FOR_OUTPUT_ONLY. This means, do not use any data in the passed in MQCD <as the field description says – use it for output only>, but pick a valid and available channel from the CCDT, and return the details.
  • MQCNO_USE_CD_SELECTION. This means, use the information in the MQCD to connect to the queue manager

Sample code (from Morag) showing MQCONNX

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
MQCNO cno = {MQCNO_DEFAULT};
MQCD cd = {MQCD_CLIENT_CONN_DEFAULT};
char * QMName = "*GROUPX";
cno.Version = MQCNO_VERSION_2;
cno.ClientConnPtr = &cd;
// Main connection - choose freely from the CCDT
cno.Options = MQCNO_CD_FOR_OUTPUT_ONLY;
MQCONNX(QMName,
&cno,
&hConn,
&CompCode,
&Reason);
: :
// Oops, I really need to go back to the same connection to continue.

MQDISC(...); // without this you get queue manager name error

// Using same MQCNO as earlier, it already has MQCD pointer set.

cno.Options = MQCNO_USE_CD_SELECTION;
MQCONNX(QMName,
&cno,
&hConn,
&CompCode,
&Reason);

Let me dig into a typical scenario to show the complexity

  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  • MQCONN to any queue manager
  • MQPUT1 of a persistent message within syncpoint
  • set mqcno.Options = MQCNO_USE_CD_SELECTION, as you now want the application to connect to the same queue manager if there is a problem
  • MQCMIT. After this you want to connect to the specific queue manager you were using
  • MQGET with WAIT
  • MQCMIT
  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY. Because the application is not in a business unit of work it can connect to any queue manager.

The tricky bit is in the MQGET with WAIT. If your queue manager needs to be restarted you need to know how long this is likely to take. It may be 5 seconds, it may be 1 minute depending on the amount of work that needs to be recovered. (So make sure you know what this time is.)

Let’s say it typically takes 5 seconds between failure of the queue manager the application is connected to, and restart complete. You need some logic like

mqget with wait..
problem....
failure_time = time_now()
waitfor = 5 seconds
mqcno.Options = MQCNO_USE_CD_SELECTION
loop:
MQCONN to specific queue manager
If this worked goto MQCONN_worked_OK
try_time = time_now()
If try_time - failure_time > waitfor + 1 second goto problem;
sleep 1 second
go to loop:
MQCONN_worked_OK:
MQOPEN the reply to queue
Reissue the MQGET with wait

problem:
report problem to automation
report special_processing_needed .... msgid...
mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
Go to start of program and connect to any queue manager

If you thought the discussion above was complex, it gets worse!

I had a long think about where to put the set mqcno.Options = MQCNO_USE_CD_SELECTION.  My first thoughts were to put it after the first MQCMIT, but this may be wrong.

With the logic

MQCONN
MQPUT1
MQCMIT

If the MQCMIT fails, it could have failed on the way to the queue manager, so the commit request did not actually get to the queue manager and the work was rolled back; or the commit could have worked, but the response did not get back to your application.

The application should reconnect to the same queue manager and issue the MQGET WAIT. If the message arrives then the commit worked; if the MQGET times out, treat this as the MQGET WAIT timed out case (see above), and produce alerts. This is why I decided to put the set mqcno.Options = MQCNO_USE_CD_SELECTION before the commit. You could just as easily have had logic which checks the return code of the MQCMIT and then sets it.

A bit more detail on what is going on.

You can treat the MQCD as a black box object which you do not change, nor need to look into. I found it useful to see inside it (so I could report problems with the channel name etc.). The example below shows the fields displayed as a problem is introduced.

Before MQCONNX
set MQCNO_CD_FOR_OUTPUT_ONLY
pMQCD->ChannelName is '' - this is ignored
pMQCD->QMgrName is '' - this is ignored
QMName is '*GROUPX'. -this is needed
==MQCONNX had return code 0 0
pMQCD->ChannelName is 'QMBCLIENT'.
pMQCD->QMgrName is '';
QMName is '*GROUPX'.
MQINQ QMGR queue manager name gave QMB Sleep
during the sleep endmqm -i QMB and strmqm QMB
After sleep
MQOPEN of queue manager object ended with reason code MQ_CONNECTION_BROKEN = 2009.
Issue MQDISC, this ended with reason code 2009

set MQCNO_USE_CD_SELECTION
pMQCD->ChannelName is 'QMBCLIENT' - this is needed
pMQCD->QMgrName is ''
QMName is '*GROUPX'.
MQCONNX return code 0 0
MQINQ queue manager name is QMB

Anything else on clients?

It is good practice to periodically have the clients disconnect and reconnect to do work load balancing. For example

You have two queue managers QMA and QMB. On Monday morning between 0800 and 1000 QMA is shut down for essential maintenance. All the clients connect to QMB. QMA is restarted at 1000 – but does no work, because all the clients are connected to QMB. If your clients disconnect and reconnect then over time some will connect to QMA.

It is a good idea to have a spread of times before they disconnect, so if 100 clients connected at 0900, they disconnect and reconnect between 10pm and 3am to avoid all 100 disconnecting and reconnecting at the same time.

To get the spread of connections to the various queue managers, you need to use CLNTWGHT with a non zero value. If you omit CLNTWGHT, or specify a value of 0, then the channel chosen is the first alphabetically, in my case they would all go to QMA, and not to QMB.
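A sketch of the periodic disconnect/reconnect idea – do_some_work is a placeholder for your application logic, the 4 to 8 hour spread is just an example, and the same client/CCDT assumptions as the earlier sketch apply:

import random
import time
import pymqi
from pymqi import CMQC

def do_some_work(qmgr):
    pass   # placeholder for the real application logic

while True:
    qmgr = pymqi.QueueManager(None)
    qmgr.connect('*GROUPX')             # pick a queue manager from the CCDT
    print('connected to', qmgr.inquire(CMQC.MQCA_Q_MGR_NAME).strip())

    # Work for a few hours, then reconnect; the random spread stops every
    # client reconnecting at the same moment
    deadline = time.time() + random.uniform(4 * 3600, 8 * 3600)
    while time.time() < deadline:
        do_some_work(qmgr)
        time.sleep(1)

    qmgr.disconnect()                   # the next connect may pick a different queue manager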

I feel there is enough material on this for another blog post.

Migrating your queue manager in an enterprise.

Migrating an isolated queue manager is not too difficult and some of it is covered in the MQ Knowledge Center.

Since my first blog post on this topic, I’ve had some comments asking for more detailed steps… so I’ve added a section at the bottom.

Below are some other things you need to think about when working in an enterprise when you have test and production, multiple queue managers per application (for HA or load balancing) and multiple applications. I am focusing on midrange MQ, and not z/OS though many of the topics are relevant to all platforms.

Consider the scenario where you have 4 queue managers
QMA and QMB supporting MOBILE and STATUS applications,
QMX and QMY supporting PAYROLL application.

You want a low risk approach, so you might decide to upgrade the STATUS application first. However, this application uses QMA and QMB, which are also used by the business critical application MOBILE, so this would be a high risk change.
It would be safer to first migrate application PAYROLL on QMX and QMY.

Looking at QMX and QMY.
You could migrate both queue managers the same weekend – this would be least work, but has a risk that you do not have a good fall back plan if it does not work as expected.
You could migrate QMX this weekend, and QMY next weekend if there were no problems found.
If QMX has problems you can continue using QMY while you resolve them. But if QMX has problems, and QMY then has problems or is shut down, you have an availability issue, so you may want to define a new environment with QMZ (and the web server etc. – so not a trivial task).

As well as production QMX and QMY you have test systems: you need to plan to migrate and test these pre-production systems before considering migrating production. While the test and production levels of MQ are different, you may want to freeze application changes, and factor this into the plan.

If you have a machine with one MQ level of code, and multiple queue managers on it, you cannot just migrate one queue manager, as you delete the MQ executables and install the new version. You can use multiple installed levels of MQ – but you may have to migrate to this before exploiting it. See Multiple Installations.

Clustering. Remember to migrate your full repositories first – you might want to consider creating dedicated queue managers for your repositories if this is a problem.

License: You will need licenses for the versions of MQ you use. The MQ command dspmqver gives you information about your existing installation. Some licenses entitle you to support from IBM, others are for development or trial use.

There are three stages to migrating applications.

  • Run the applications with no changes on the upgraded system. These should run successfully, but MQ may do more checks, for example some data is meant to be on a 4 byte boundary. MQ now polices this.
  • Recompile the applications to use the newer MQ libraries. Some application MQ control blocks may be larger, and this may uncover application coding problems. For example uninitialised storage.
  • Exploitation of new function. Do this once you have successfully migrated the existing queue managers.

Testing: You need to test the normal working application, plus error paths, such as messages being put to a Dead Letter Queue, and making sure this process works.

Update your infrastructure harness: You need to review what new messages your automation processes, and what actions to take.
You need to decide what additional statistics etc. to use, and what reports you want to produce for capacity planning, health review and day to day running.

You have to worry about applications coming in to your queue manager. For example, what levels of MQ are they using? They may need to be rebuilt with the newer libraries. The client code may need to be upgraded. You can use DIS CHS(…) RVERSION to display the level of MQ client code. Of course your challenge will be to get people outside of your organization to update their code – especially when they say they have lost the source to the program.

MQ is rarely used in isolation. You may need to upgrade web servers to a newer level which support the new level of MQ.

You may need to upgrade the hardware and operating system.

Going down to the next level of detail.

Exits

You need to check that any exits you have can support new functions and different levels of control blocks. For example there are shared connections, and the MQMD can change size from release to release.

If you cannot have one exit that supports all levels of MQ, you’ll have to manage how you deploy the exit matching the queue manager level.

TLS and SSL setup

  • You need to review the TLS and SSL support. Newer levels of MQ remove support for weaker levels of TLS.
  • You need to review the end user certificates to make sure they are using supported levels of encryption.
  • You need to review the cipherspecs used by SSL channels, and upgrade them before you migrate the queue manager. (You could migrate to a newer version and see which channels fail to start, then fix them, but this is not so good).
  • As part of this cipherspec review you may wish to upgrade to strong cipher specs which use less CPU, or can be offloaded on z/OS.
  • You may have a problem sharing keystores; make sure you include the keystore files in your backups. See APAR IT16295.

Building your applications

  • In some environments application developers compile programs on their own machines; in other environments, there is a process to generate applications on a central build machine. You will need to change the build environment to have the newer version header files, and change the build process to be able to use them.
  • You will need to set up a build environment so you can use the MQ V9 header files for just the application being migrated.
  • You may need to change your deploy tool so that the program compiled at MQ V9 is only deployed to TESTQMA (at MQ V9), and not to TESTQMB (still at MQ V7).
  • You need to change your deploy tool for test, pre-production and production.

Using the Client Channel Definition Table (CCDT)

  • Older clients must continue to use existing CCDT
  • Newer clients are able to understand older CCDTs.
  • For an application to use a newer version CCDT, you must update the MQ client.
  • So you need to be careful about moving the CCDT file around

System management applications

You may have home grown applications that are used to manage MQ. These need to be changed to support new object types (such as chlauth records and topics) and new fields on objects. You cannot rely on a field of interest being the 5th in the list, as it was in MQ V5.

MQ Console (MQWEB)

If you are using the MQ Console server to provide a web browser or REST API to a queue manager, you may need to do extra work for this.

You need an instance of the MQ Console to support MQ V9.0 and a different instance to support MQ V9.1.

If you have multiple queue managers on a box, and plan to use MQ Multiple Installation to migrate one queue manager at a time, then you will need to consider the following

  • The box has QMA and QMB on it at MQ V9.0
  • The box uses MQCONSOLE-XX with port 9090
  • Install MQ 9.1 on the same box.
  • Migrate QMA to 9.1
  • Create an MQCONSOLE-YY at MQ 9.1 with port 9191
  • Change your web browser URL and REST api apps to use port 9191
  • Wait for a week
  • Migrate QMB to 9.1
  • Migrate MQCONSOLE-XX to 9.1
  • Web browser URL and REST API url can continue using port 9090
  • Shutdown MQCONSOLE-YY
  • Undo the changes that made your web browser URL and REST API apps use port 9191, and go back to port 9090

“The rest of the stuff”

I remember seeing a poster of a child sitting on a potty with a caption saying “no job is complete until the paper work is complete”.

Someone said that doing the actual migration of a queue manager took 1 hour. Doing the paper work – planning, change management, talking to users etc. – took two weeks per queue manager.

And yes, you do need to update your documentation!

Education

You need to talk to the teams around your organization. This is mainly applications – but other teams as well (e.g. monitoring, networking).

  • Tell them what changes you will be making, the time scales etc..
  • There will be an application freeze during the migration.
  • The application teams will need to test their applications, and may need to make changes to them.
  • The application teams will get these new events/alerts which they need to handle.
  • You may learn about how they use MQ, and how this will affect your migration plans. (We used this unsupported program for which we have no source and no one knows how it works – which is critical to our business).
  • You may get a free trip to an exotic location to talk to the application teams (or you may get told to go to some hell hole)
  • You need to talk to people outside of your organization. The hard bit may be finding out who they are.

Security

  • You need to protect any new libraries.
  • MQ may have new facilities such as topics which you need to develop and implement a security policy for. In V9 MQ midrange now publishes statistics to a topic.
  • Your tools for processing MQ security events may need to be enhanced to handle new resource types or new events.

New messages and events

You need to review all new events or messages, and add automation to process them. You need to decide who gets notified, and what actions to take.

You need to review changed messages or alerts in case you are relying on “constant” information in a particular place in the message, which has been changed.

Backups

People often dump the configuration of their queue managers every day, so they can use runmqsc to recreate the queues etc. You need to back up all objects, including topics and chlauth records, and check you can recreate them in a queue manager.

Back up your MQ libraries for queue managers and clients – or be able to redeploy them from your systems management software.
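One way to do the daily configuration dump is to drive dmpmqcfg from a small script and keep dated copies. This is only a sketch – the queue manager names are made up:

import datetime
import subprocess

# dmpmqcfg -m <qmgr> -a writes MQSC definitions for every object, including
# topics and chlauth records, to stdout; keep a dated copy per queue manager
for qmgr in ['QMA', 'QMB']:                  # your queue manager names here
    result = subprocess.run(['dmpmqcfg', '-m', qmgr, '-a'],
                            capture_output=True, text=True, check=True)
    stamp = datetime.date.today().isoformat()
    with open(qmgr + '.' + stamp + '.mqsc', 'w') as f:
        f.write(result.stdout)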

Performing the migration

This is documented in the Knowledge Centre. One path for migration involves deleting the old level of MQ and installing the new level of MQ. If you need to go back to the old level, you need to have a copy of the old level of MQ (base + CSD level) that you were running!

Carefully check the documentation for the hops.

The Migration paths documentation says

  • You can migrate from V8.0 or later direct to the latest version.
  • To migrate from V7.0.1, you must first migrate to V8.0.
  • To migrate from V7.1 or V7.5, you must first migrate to V8.0 or V9.0.
  • You might have an extra step to go to MQ V9.1

I found some really old doc saying

“If you are still on MQ version 5.3, you should plan a 2 step migration: first migrate to MQ V7.0.1, then migrate to 7.1 or 7.5”. This could be a challenge, as you cannot get the MQ 7.0.1 or the MQ 7.1 product any more. One of the reasons for this two stage approach is that the layout of files changed, so you have to restart at MQ 7.0.1 to make these file system changes.

Finally…

If I have missed anything or got something wrong, please let me know and I’ll update the list

Checklist for implementation

Different stages

  • Pre-reqs
  • Education for team doing migration
  • Investigate – until you have done the investigation you cannot plan the work. For example how many exits are used, and how many need to be changed.
  • Plan. The first time you do something you may be slow. Successive times should be faster as you should know what you are doing.
  • Implement/Migrate
  • Exploit new features.

Before you start

  • People doing the work need access to systems
  • Need to draw up a schedule (but you may need to do the investigation work before you know how much work there is to do)
  • Appoint a team leader.
  • Determine what skills you need, eg TLS, application design, build
  • Which people do admin – which people handle code eg review exit programs
  • Reporting and status
  • Communication with other teams – we will be migrating in.. and you will be asked to do some work..
  • Extract configuration to common disk, so people do not need to access each queue manager.
  • External customers – provide one list of changes for them if possible. This is better than giving them multiple lists of changes, and will help them understand the size of their work.

Education

  • Ensure every one has basic knowledge
    • MQ commands
    • Unix commands
    • TLS and security (and stop using SSL)
    • Manage remote MQ from one site using remote runmqsc command or logon to each machine
    • Efficient way of processing data
      • Use grep on a file to find something, pipe it… sort it; do not find things by hand
  • How the project will be tracked

Areas for migration

  • TLS parameters and using stronger encryption
  • TLS certificate strength
  • Exits
  • Applications
  • Queue manager
  • Clients using the queue manager. A client may be able to connect to many queue managers.

Investigate SSL/TLS

  • Which TLS parameters are being used
  • Which ones are not supported in newer versions of MQ?
    • SSLCIPH
    • Need to worry about both ends of the connection
  • Identify “right” TLS parameters to use
    • eg Strong encryption which can be offloaded on z/OS.
  • Will these cost more CPU? Is this a problem?
  • If TLS is not being used – document this

Implement TLS

  • Need a plan to change any cipher specs which are out of support.
  • May need to make multiple changes across multiple queue managers at “same time” – coordinate different ends
  • Can be done before MQ migration.
  • Can be done AFTER MQ migration if you set a flag.
    • May make implementation easier
    • Still need a plan to change any which are out of support.
    • Still may need to make multiple changes across multiple queue managers at “same time”

Investigate certificates

  • Investigate if certificates are using weak encryption
    • Which certificates need to be changed? May need RACF/Security team to help report userids that need to change
  • Plan to roll out updated certificates
    • Include checking external Business partners
  • Investigate any other changes in MQ configuration
  • Check changes to your TLS keystore in APAR IT16295.

How to check a certificate

  • /opt/mqm/java/jre64/jre/bin/ikeycmd -cert -details -db key.kdb -label …
  • A password is required to access the source key database.
  • Please enter a password:
  • Label: CLIENT
  • Key Size: 2048
  • Serial Number: ….
  • Issued by: CN=colinpaiceCA, O=Stromness Software Solutions, ST=Orkney, C=GB ← Check this is still valid
  • Subject: CN=colinpaice, C=GB
  • Valid: From: Thursday, 17 January 2019 18:22:45 o’clock GMT To: Sunday, 31 May 2020 19:22:45 o’clock BST ← Check the ‘to’ date
  • Signature Algorithm: SHA256withRSA (1.2.840.113549.1.1.11) ← I think this needs to be SHA256withRSA
  • Trust Status: enabled

Implement certificate change

  • This can be done at any time before migrating a queue manager

Investigate exits

  • Find which exits are being used (there is a Python sketch after this list)
    • DIS CHL(*)… grep for EXIT
    • dis qmgr… grep for exit
  • Queue manager and clients
  • /var/mqm/qmgrs/QMNAME/qm.ini, channel definition (grep for EXIT)
  • Check exits are at the correct level on all queue managers and clients (change date, size).
  • May need emails to business partner.
  • Do exits need to be converted from 31 bit to 64 bit?
  • Locate exit source
  • Review source
  • Control blocks may be bigger
  • May have to support new functions, eg shared connections
  • Is function still needed?
  • Document exit usage
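The Python sketch mentioned above lists the exits named on channel definitions via PCF. The attribute ids are the ones I believe pymqi exposes, and the connection details are placeholders; note the message/send/receive exit attributes can hold a list of names.

import pymqi
from pymqi import CMQCFC

qmgr = pymqi.connect('QMA', 'ADMIN.SVRCONN', 'linuxa(1414)')
pcf = pymqi.PCFExecute(qmgr)

# The channel attributes that name an exit
exit_attrs = (CMQCFC.MQCACH_SEC_EXIT_NAME, CMQCFC.MQCACH_MSG_EXIT_NAME,
              CMQCFC.MQCACH_SEND_EXIT_NAME, CMQCFC.MQCACH_RCV_EXIT_NAME)

for chl in pcf.MQCMD_INQUIRE_CHANNEL({CMQCFC.MQCACH_CHANNEL_NAME: b'*'}):
    name = chl[CMQCFC.MQCACH_CHANNEL_NAME].strip().decode()
    for attr in exit_attrs:
        value = chl.get(attr, b'').strip()
        if value:
            print(name, '->', value.decode())

qmgr.disconnect()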

Implement exit changes

  • Recompile all exits and deploy to all platforms before you do any migration work – check no problems
  • Change and test exits
  • Need to change build tools to allow builds with new levels of header files etc, and roll out to selected queue managers
  • Should work on old and new releases
  • May need a V9.1 MQ to test exits on before migration
  • Can they be deployed before MQ migration? Or do you have requirements for specific levels of exits?
  • Create documentation for exits

Investigate applications

  • External business partners as well as internal
  • Need to get named contact for each application
  • Check level of MQ client code
  • Check TLS options
  • Identify where connection info is stored (the CCDT, AMQCLCHL.TAB)
  • What co-req products need to be updated
    • Web servers
  • Is there a test suite which includes error paths etc.?
  • Identify build and deploy tools
  • Need capability to compile application using newer MQ header files, and deploy to one MQ

Implement application changes

  • Need to have change freeze during migration
  • Build project plan
  • Duration for testing
  • Which systems to be used for testing
  • Create process to update MQ client code
  • Make sure there is process to roll out changes in future
  • Need to allow buffer

Application recompile

  • Recompile programs using existing libraries and jar files – to make sure everything works before you migrate
  • Deploy and test
  • Change deploy process to use new versions of libraries
  • Recompile using newer versions of libraries
  • Deploy and test
    • Any problems found need to be validated at previous levels, or have conditional statements around them
  • Once all queue managers upgraded
    • Comment out code for compiling with previous libraries
    • To prevent accidents
    • In case of problems in production (before migration) needing a fix.

Investigate queue managers

  • Does the hardware need to be upgraded?
  • Are there any coreqs – eg multi instance or HA environments?
  • Any co-reqs eg upgrade web server, database?
  • Does the Operating System need to be upgraded?
    • For example MQ now 64 bit. Early versions were 31 bit
    • Newer versions of Java
  • Identify which applications run on this queue manager
    • Need plan for each application
  • Identify pre-reqs
    • TLS
    • Exits

Plan how you are going to update the Client Channel Definition Table

  • If you migrate a queue manager, then its CCDT will be migrated to the newer level.
  • Clients cannot use a CCDT from a higher level queue manager.
  • If you migrate your clients to the latest level you will have no problems with the CCDT
  • If you migrate the CCDT owner queue manager first, you need to be careful about copying the CCDT to other machines, to prevent a mismatch.

Plan queue managers

  • Plan software and hardware upgrades
  • Identify order of queue manager migration
    • Test, pre-prod, production
    • Full repositories then partial
      • Consider setting up new QM just for full repository?
    • Do one server, test applications, do other servers
    • Need to worry about multi instance and HA queue managers. These need to be coordinated and done at the same time.
  • Check license for MQ
  • May need to migrate queue manager multiple times
    • from MQ V5.3 to V7.x
    • from 7.x to V9.0
    • from 9.0 to 9.1
  • Clients first/later

Automation

  • Need to set up automation for new messages and new events

Backups etc

  • Make sure you have backed up your queue managers (and other files, such as build configuration files) before you make any changes.

Do the migration

  • Follow the MQ knowledge centre.

Flashes of insight into the MQ SMF data

To misquote Raymond Smullyan’s quote in What Is the Name of This Book?

  • If Professor A says it is obvious then a child of 10 can instantly see it
  • If Professor B says it is obvious then if you spend 10 minutes on it you will see it
  • If Professor C says it is obvious then you might spend the rest of your life and not see it
  • If Professor D says it is obvious – then it is false

I hope the “one liners” below are obvious if you spend 10 minutes on them.

I was sent a report of key fields (opened queue name, number of MQOPENs, MQGETs, MQPUTs etc.) and a list of questions.

Why do I have records for queues which are not on my queue manager?

If the application put a message to a reply-to queue, then typically the MQOPEN uses the remote queue name and remote queue manager name. This maps to another queue, such as the SYSTEM.CLUSTER.TRANSMIT.QUEUE, or a remote queue name.

The name in the OPENNAME in the SMF record will be the remote queue name.  If you look at the BASENAME as well, you will see the real queue it uses.  So the queue in the OPENNAME may not exist on the system – but the BASENAME will exist.

I have puts but no gets for a queue

  1. This could be caused by using a shared queue.  Messages were put on one queue manager and got from another queue manager in the QSG.   The SMF data from the other QMGR should have the gets.
  2. The IMS bridge got the messages and sent them to IMS over the OTMA interface.  This code does not record any SMF data.

The same applies to gets with no puts.

I have 1 SMF record for this queue, but 1000 records for that queue

  • If you have a long running task then the SMF data is produced at a regular interval – typically half an hour. This will produce one record. If you break out the data into a set of queue records you will have few records (typically more than 1 – one for the putting task (channel), and one for the getting server task).
  • If you have short lived transactions, such as CICS or IMS MPP transactions, then you will get an SMF record when the transaction ends. If you break out the data into a set of queue records you will have many records: one for each transaction, and one for each putting/getting task (channel).