What the docs don’t tell you about using the Report option in your MQMD

You can set the MQMD.Report option of a message to MQRO_ACTIVITY, and get a trace of the activity sent to your reply to queue.  This is great, but not very usable.

For my MQPUT to a cluster queue I got these activities:

  • Sending Message Channel Agent getting the message
  • Sending Message Channel Agent sending it over channel CL.QMC
  • Receiving Message Channel Agent receiving the data from channel CL.QMC
  • Receiving Message Channel Agent putting the message to the queue on the remote system

Note, I did not get a response saying the back end application had actually processed the message.

So by setting one bit I got this extra information and can do things like calculate times and write the name of the channel used etc – great,  go out and celebrate!

The problem is that these activity trace messages come back to the reply to queue you specified in your request.  You do not want to have to process these messages in your application, as it will be an ugly program to process these PCF messages.

The best thing your application can do is put these messages somewhere else, by using  logic like

If MQMD.Format = ‘MQHEPCF’ then use MQPUT1 to put the message to “MY.ACTIVITY.TRACE.QUEUE”, and get the next message.
The activity trace replies should get to the queue before the back-end application’s reply, which should prevent activity trace messages being lost.
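
The routing logic above can be sketched in Python. This is only a sketch and does not talk to MQ: the Message class and the put_elsewhere callback are hypothetical stand-ins for whatever your MQ client library gives you, and the only values taken from this post are the format name and the queue name.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Values taken from the post: the format to test for and the side queue.
ACTIVITY_FORMAT = "MQHEPCF"
SIDE_QUEUE = "MY.ACTIVITY.TRACE.QUEUE"


@dataclass
class Message:
    """Hypothetical stand-in for a message read from the reply-to queue."""
    format: str  # contents of MQMD.Format, possibly blank padded
    body: bytes


def is_activity_report(msg: Message) -> bool:
    # MQMD.Format is blank padded, so strip before comparing.
    return msg.format.strip() == ACTIVITY_FORMAT


def application_replies(
    messages: Iterable[Message],
    put_elsewhere: Callable[[str, Message], None],
) -> Iterator[Message]:
    """Yield only the real replies; divert activity reports to SIDE_QUEUE."""
    for msg in messages:
        if is_activity_report(msg):
            # In a real program this would be the MQPUT1 to the side queue.
            put_elsewhere(SIDE_QUEUE, msg)
            continue  # go and get the next message
        yield msg
```

The application then loops over application_replies(...) and never sees the PCF messages.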

You can then use the IBM supplied  dspmqrte program, for example
dspmqrte  -i 0 -q MY.ACTIVITY.TRACE.QUEUE -m QMA -w1
to display the messages.   The Knowledge Centre is missing some documentation, for example that you specify -i 0 to get all the messages.  The KC says you have to specify the MSGID of the messages – which you will not know.

I tried to use the amqsact sample (which is meant to display activity trace messages), but it does not seem to recognize these activity trace messages as they are in MQHEPCF format.

I took a copy of the amqsevt sample which processes PCF and made a few changes to support MQHEPCF format.  I’m still working on how to make this available.

It is not a good idea to have this MQRO_ACTIVITY  set for all applications because of the extra overhead of the additional messages it introduces.  You could do it every 1000 messages, or if the milliseconds of the current time is .000  (so you would expect this on average once in 1000 messages).  You can then capture the data, and plot real time graphs of your MQ network, and where the delays are.
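
The two sampling ideas above could look something like this in Python. It is a sketch under my own assumptions: the function names are made up, and the caller is expected to set MQRO_ACTIVITY in MQMD.Report only when one of these functions returns True.

```python
import time
from typing import Optional

SAMPLE_EVERY = 1000  # "every 1000 messages", as suggested in the text


def sample_by_count(messages_sent: int) -> bool:
    """True for the 1000th, 2000th, ... message sent by this application."""
    return messages_sent > 0 and messages_sent % SAMPLE_EVERY == 0


def sample_by_clock(now_ns: Optional[int] = None) -> bool:
    """True when the millisecond part of the current time is 000.

    now_ns is the time in nanoseconds since the epoch; it is a parameter
    only so that the function can be tested with a fixed value.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    milliseconds = (now_ns // 1_000_000) % 1000
    return milliseconds == 0
```

The counter version gives an exact 1 in 1000 rate per application instance; the clock version needs no state, which is handy if many short-lived programs put the messages.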
Good hunting (for delays and bottlenecks)





Can I modify the samples without the lawyers coming to get me?


The documentation says


This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:

Portions of this code are derived from IBM Corp. Sample Programs.

© Copyright IBM Corp. 1993, 2019. All rights reserved.


Note – that this does not mean you can copy the code and put it up on GitHub – as GitHub is for open source, and the MQ samples are not open source.   Thanks to Matt Leming for this advice.

Everyone should be made to suffer the chaos of their organization.

How difficult is it to provide information which is
• clear
• provided at the right time
• accurate
• useful
From my recent experience this seems to be very difficult. Perhaps your organization is better – give it a try and see.

I went on holiday, involving two flights, and one of these was cancelled. I was amazed at how badly the airline handled this. It looked as if  this was the first time that they ever had a cancellation, and had no processes in place to handle this.   Talking to other people affected by this cancellation, the lack of clear process when there is a problem, seems very common.

We flew from Aberdeen to Manchester, and had a connecting flight from Manchester to Southampton. While we were in the air, the airline cancelled the second hop. We landed at Manchester and tried to get into the departures via the transit corridor to be told the flight was cancelled. I then got a text saying “your flight has been cancelled. Click on this link”.
The link took me to a page saying “re-book your flight”. There were no more flights that day, so it looked like we would have to stay over night at the airport.
Was the message clear ? No – it should have given us more information, told us that there was a coach being organised, where to collect your bags from, etc.
Was the message useful – no.

We went back through to arrivals and heard an announcement that people on the cancelled Southampton flight should go to the information desk.
We got to the desk and were told that a coach was provided, and our bags would be put on the coach.
We were taken to the coach, and before we got on, we checked to see if our bags were on the coach,  to be told we should have collected them first (and asked in an aggressive way,  why we had not collected them).
We had to go back to the terminal and we missed the coach. This is where the troubles really started.
We went to the airline’s service desk to be told that we had to go to the baggage handling agent’s office, next to the information desk.
• This was not accurate – there is no agent’s office.
• The information desk gave us the agent’s phone number. We rang it to be told that they handled missing bags, they could not help us because the plane had been cancelled, and so the bags were not lost. The bags were in the baggage transit area – and we should talk to the agent. The information the service desk told us was neither useful nor correct.

When we mentioned that we had not heard any message about collecting bags, it turned out the message had been announced while we were in the air. So this message was not timely.
Eventually someone from the information desk took pity on us, went and found our bags – two hours later.

The airline had booked a hotel for us, and booked a taxi to take us there (“Out of the terminal, cross the road and use XYZ cabs”). The instructions for the taxi were totally wrong, we had to go to the taxi office close to the information office, to arrange a cab, and so it went on. I felt that the airline staff should have known this.

As the airline frequently cancels planes, I would have expected to go to the service desk, or the information desk, and to be given a piece of paper saying:

  • The service desk will try to book you on a later flight. If this is unsuccessful then…
  • If you were on a previous flight,  collect your bags using the following process….
  •  The service desk will arrange a hotel and a taxi to take you to and from the hotel.
    •   Once you have your bags, go to the xxxx office close to the information office on the ground floor. They will organise a car for you.
  • You can claim compensation, see this web site…

Maybe because I have been in a support role, I have high expectations.

How should an enterprise handle communications?

Make sure people have the right information.

When you have a major incident you need a well tested process. Once the incident has occurred, people will be dialing in to the incident hot line. You do not want to repeat the problem (One on’t cross beams gone owt askew on treddle)  twenty times – once for each new attendee. Having a blog post which people update with status is a good way of managing it.

You need evidence – not opinions.

It is easy to say “it must be an MQ problem because there are a lot of messages on the queue”. This usually indicates an application problem (the applications are not getting the messages fast enough), not an MQ problem. State the evidence “There are many messages on queue XYZ on queue manager ABC”. Then you can say “This can be caused by…”.   If you have no evidence, then you only have guesswork. You need to collect evidence.  I remember going out to work on a critsit on Message Broker, the problem description was “the problem is somewhere in there….”.   We turned on monitors and this showed that a SQL statement took over 10 seconds.  The table held temporary data for the duration of a transaction, and the data was never deleted.  Instead of an expected maximum of 100 rows, it had about 1 million rows.  Once we had the evidence – the root problem was obvious.

Solve the first problem first.

If there are messages accumulating on a queue at the front of the process, and also on a queue at the end of the process, they may be connected. Fix the first problem, and the second one may go away.

Be clear, and practice any actions.

With the problem with our bags, the airline’s agent should have contacted the baggage handling agent, rather than us trying to talk to both ends. Decisions like switching workload, or going to a backup site need to be made by the right people. The MQ team cannot make this decision on their own.
At one site, the whole team practiced walking through problem scenarios in great detail. “We need to reconfigure the LPAR configuration; who has the userid and password for the hardware console?”. When a problem occurred, the person with this information was on a toilet and coffee break, and had left her phone on her desk. I was told that a man rushed out of the control room, burst into the ladies toilet calling out the lady’s name! As a result they changed their process (you must always carry the duty phone – even in the toilets).

Make sure the messages for the end users are clear, and tell them what to do.
I had a message “you cannot do that operation at this time”. Did this mean – wait for 5 minutes because the server was down, or you cannot do the operation once you had checked in?

If you think the system will be available in 15 minutes, do not say “please retry”, but be more helpful “please try again in 30 minutes, we will update our twitter entry here… with status”.

I heard that the new CEO of a hire car company reviewed their web site and looked at every page they expected the customer to use (including the complaints and claiming compensation).  This resulted in major simplification of the pages, and providing much more useful information to their customers.
To test out your “error process as seen by an end user”, have a senior manager’s aged parents try out your processes and give you feedback on the chaos or smoothness of the experience!

Note.  One on’t cross beams gone owt askew on treddle is from the Monty Python sketch, The Spanish Inquisition.

Beware unzipping a windows install file

I hit an interesting little problem trying to download and use the developer version of MQ V9.1 for windows.

I downloaded the zip file on my Linux box, and extracted it onto my memory stick.
I rebooted into Windows 10, and tried to install MQ from the memory stick.  I got various errors including error code 0x80004005, and “corrupt file” when trying to install Explorer.cab.  If you unzip it on Windows, and install it from the memory stick, it works!

Strange eh!

What data is there to help you manage your systems?

There is a lot of information provided by MQ to help you manage your systems, but some of it is not well documented.

I’ll list the sources I know, and when they might be needed, but before that I’ll approach it from the “what do I want to do”.

What do I want to do?

I want to…

  • know when significant events happen in my system, such as channel start and stop, and security events. AMQERRnn.LOG
  • know when “events” happen, such as a queue filling up, or security exceptions.  Event Queues
  • be able to specify thresholds, such as when the current depth is > 10 messages, and the oldest message is older than 5 seconds, then do something. Display commands
  • be able to draw graphs of basic metrics, such as number of messages put per hour/per day so I can do capacity planning, and look for potential capacity problems. Statistics
  • identify which queues are being used, display queue activity, number of puts, and size of puts etc.  Statistics, display object status
  • identify which queues (objects) are not being used, so they can be deleted.  Absence of records in Statistics.  Issue DIS QSTATUS every day and see if a message has been put to the queue.  Creating events for when a message is put to the queue.  Note an object may only be used once a year – so you need to monitor it all year.
  • identify which applications are putting to and getting from queues. Accounting
  • see what MQI verbs are being used, so we can educate developers on the corporate naming standards, and API usage.  Activity trace
  • display the topology.  Display commands, Trace route, Activity trace
  • trace where messages are going, so we can draw charts of the flow of message requests and their responses, and display the topology of what is actually being used.  Trace route
  • measure round trip times of messages – so I know if there are delays in the end to end picture. Trace route
  • understand the impact of a problem “here” by seeing what flows through “here”.  What’s my topology
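
As an illustration of the “issue DIS QSTATUS every day” idea in the list above, here is a small Python sketch. It assumes you have already turned each day’s display output into a dictionary of queue name to last put date; that snapshot format, and the function name, are my own invention and not something MQ provides.

```python
from typing import Dict, Iterable, List

# One snapshot per day: queue name -> last put date as text,
# or an empty string when no put has been recorded.
Snapshot = Dict[str, str]


def unused_queues(snapshots: Iterable[Snapshot]) -> List[str]:
    """Return the queues that never showed a last put date in any snapshot."""
    seen = set()
    used = set()
    for snapshot in snapshots:
        for queue, last_put_date in snapshot.items():
            seen.add(queue)
            if last_put_date.strip():
                used.add(queue)
    return sorted(seen - used)
```

Remember the warning above: a queue that looks unused after a month of snapshots may still be used once a year.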

What sources are there?


The AMQERRnn.LOG files contain information about events in the queue manager, such as channel start and channel stop. These files are in /var/mqm/qmgrs/QMA/errors/… and can be read using an editor or browser.  People often feed these into tools like Splunk, and then you can filter and do queries to monitor for messages that have not been seen before.

Event queues.

MQ messages are put on queues like SYSTEM.ADMIN.QMGR.EVENT.

There is a sample amqsevt which can be used to print the message in text format or json format – or you can write your own program.

Creating events.

You can configure MQ to produce events when conditions occur. For important queues you can set a high threshold, and MQ produces an event when this limit is exceeded. You can use this

  • to see if messages are accumulating in a queue
  • to see if a queue is being used – set the queue high threshold to be 1, and you will get an event if a message is put to a queue


If you turn on statistics you get information on the number of puts and gets for the system, and the number of puts, gets, etc. for each queue. This information is put to the queue SYSTEM.ADMIN.STATISTICS.QUEUE.

The information is summarized by queue.

You can use

  1. the sample amqsevt to process these messages, you can have output in json format for input into other tools.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.


If you turn on accounting you get information about what an application is doing: the number of puts and gets for the system, and the number of puts, gets, etc. for each queue. This information is put to the queue SYSTEM.ADMIN.ACCOUNTING.QUEUE. It is similar to the information provided by statistics, but it also tells you which application used the objects.

You can use

  1. the sample amqsevt to process these messages.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.

You can use display commands.

You can use commands, or use the MQINQ API, to display information about objects. You can issue commands using runmqsc, or from an application by putting command requests in PCF format to a queue and getting the data back in PCF format. Your program has to decode the PCF data.
You can display multiple fields and have logic to take action if values are out of the usual range.  For example, periodically display the curdepth and the age of the oldest message on the queue, and then do processing based on these values. Tivoli uses this technique to create situations if specified conditions are met. You can easily write your own programs to do this, for example using python scripts and pymqi.
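
The “take action if values are out of the usual range” logic might look like the sketch below. It deliberately leaves out the MQ part: the QueueStatus class is a hypothetical container that you would fill from whatever returns the current depth and oldest message age (runmqsc output, PCF, or pymqi), and the default limits are just the example figures used earlier (more than 10 messages, older than 5 seconds).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QueueStatus:
    """Hypothetical container for the values returned by a display command."""
    name: str
    curdepth: int        # current number of messages on the queue
    oldest_msg_age: int  # age of the oldest message, in seconds


def needs_attention(status: QueueStatus,
                    max_depth: int = 10,
                    max_age_seconds: int = 5) -> bool:
    """True when the queue is both deep and holding old messages."""
    return (status.curdepth > max_depth
            and status.oldest_msg_age > max_age_seconds)


def queues_needing_attention(statuses: Iterable[QueueStatus]) -> List[str]:
    """Names of the queues that breach both thresholds."""
    return [s.name for s in statuses if needs_attention(s)]
```

Requiring both conditions avoids raising an alert for a deep queue that is being drained quickly, or for a single old message on an otherwise quiet queue.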

What’s my topology?

You can use the DIS CHANNEL … CONNAME command to show where a channel connects to and use this to draw up a picture of your configuration.

You can use the DIS QCLUSTER and DIS CLUSQMGR to show information about your clusters, and where cluster queues are, and use this information to draw up a picture of your configuration

You can use the traceroute to dynamically see the routes between nodes, and understand the proportion of messages going to different destinations – at that moment in time.

Displaying object status

You can use display commands to show information such as the last time a message was put to a queue, or got from a queue, or sent over a channel.

Application trace

The application trace shows you the MQ API calls, the parameters, and return codes. This data goes to the SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE queue.

You can use this to check the API options being used, for example

  • Message persistence is correct for the application pattern (inquiry messages are non-persistent)
  • The correct message expiry is specified (non-persistent messages have an expiry time)
  • The correct options are specified
  • Applications are using MQ GET with wait rather than polling a queue
  • The correct syncpoint options are being used.
  • Which queue is really used. You open one queue name but this could be an alias. You get the queue it maps to.

There is an overhead to collecting this, so you do not want to run this for extended periods of time.

Running it for just a minute or two may give you enough information. You can turn this on for an individual program.

You can use amqsevt to process the queue.

Trace route.

You can send a message “to a queue” and get back the processes involved in getting to the queue. For example, use dspmqrte to “put” a message to a cluster queue, and you will see the sending channel get the message and send it, then the receiver channel at the remote end receive the message and “put” it to the queue. One of the data fields is the operation time, so you can see where the delays were in the processing (for example it took seconds to be sent over a channel).

By default the message is not put to the queue, but there is an option to put it to the queue for the application to process, but there is no documentation to tell you how you process this message. The dspmqrte command effectively shows you the hops between queues. It is up to you to build up the true end to end path, and manage the responses yourself.

The provided program dspmqrte is simplistic; it shows you the path to the queue, and the channels used on the way to the queue.

The data is not pure PCF, and the sample amqsevt does not format it. I have modified it to handle this.

Where’s my network bottleneck? Try using traceroute

On Midrange MQ there is a capability called trace route which allows you to see the path to a queue, and get information about the hops to the queue, for example how long the message was on the transmission queue before being sent.

If you have a problem in your MQ network, you can use dspmqrte in real time and see if there are any delays between end points.

In the blog post below, I’ll show an example of the information you can get, including where the time was spent during processing, and what you can use to automatically process the replies.

What is a trace route message?

A special (trace route) message is sent to the specified queue (perhaps in a different queue manager), and tasks that process it en route send information to a collection queue.

You can use the IBM supplied dspmqrte (display mq route) command. This sends the message and processes the responses.

For example, on QMA, dspmqrte put a message to a cluster queue CSERVER on QMC. During the processing of the trace route message, several messages were sent to the collection queue. Key data from the messages is displayed below.

Message 1 – processing done by dspmqrte

ApplName: dspmqrte
ActivityDesc: IBM MQ Display Route Application 

Operation: OperationType: Put
RemoteQName: CSERVER 
RemoteQMgrName: QMC 

We can see from this information that the application dspmqrte Put a message to the queue CSERVER which is queue CSERVER on QMC, and it goes via SYSTEM.CLUSTER.TRANSMIT.QUEUE

Message 2 – processing done by the sending channel

ApplName: amqrmppa
ActivityDesc: Sending Message Channel Agent

Operation: OperationType: Get
QMgrName: QMA 

Operation: OperationType: Send
QMgrName: QMA
RemoteQMgrName: QMC
ChannelName: CL.QMC
ChannelType: ClusSdr

We can see from this that the channel did two things (two operations) with the data

  1. Operation 1: ClusSdr channel CL.QMC, did a Get from SYSTEM.CLUSTER.TRANSMIT.QUEUE.
  2. Operation 2: Sent the message over the ClusSdr channel CL.QMC to queue manager QMC.

Message 3 – processing done by receiving channel

ApplName: amqrmppa
ActivityDesc: Receiving Message Channel Agent 

Operation: OperationType: Receive
QMgrName: QMC 
RemoteQMgrName: QMA 
ChannelName: CL.QMC
ChannelType: ClusRcvr

Operation: OperationType: Put
QMgrName: QMC 
ResolvedQName: CSERVER

We can see from this that the channel did two things with the data

  1. the ClusRcvr channel CL.QMC received a message from QMA,
  2. The channel put the message to CSERVER on this queue manager.

End to end path

There is a sequence

  1. dspmqrte put a message to queue CSERVER
    • this message was put on the SYSTEM.CLUSTER.TRANSMIT.QUEUE queue
  2. Cluster Sender Channel CL.QMC got the message and sent it over the network
  3. Cluster Receiver Channel CL.QMC received the message from the network and put it to the CSERVER queue.
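
Building that sequence from the returned messages can be automated. The Python sketch below assumes each activity message has already been decoded into a dictionary; the field names are the ones shown in the output above, but the dictionary layout (an "Operations" list inside each activity) is my own assumption and is not the exact format produced by any IBM tool.

```python
from typing import Dict, List

# The three messages shown above, written out by hand in the assumed layout.
EXAMPLE_ACTIVITIES = [
    {"ApplName": "dspmqrte",
     "ActivityDesc": "IBM MQ Display Route Application",
     "Operations": [
         {"OperationType": "Put", "RemoteQName": "CSERVER",
          "RemoteQMgrName": "QMC"}]},
    {"ApplName": "amqrmppa",
     "ActivityDesc": "Sending Message Channel Agent",
     "Operations": [
         {"OperationType": "Get", "QMgrName": "QMA"},
         {"OperationType": "Send", "QMgrName": "QMA", "RemoteQMgrName": "QMC",
          "ChannelName": "CL.QMC", "ChannelType": "ClusSdr"}]},
    {"ApplName": "amqrmppa",
     "ActivityDesc": "Receiving Message Channel Agent",
     "Operations": [
         {"OperationType": "Receive", "QMgrName": "QMC",
          "RemoteQMgrName": "QMA", "ChannelName": "CL.QMC",
          "ChannelType": "ClusRcvr"},
         {"OperationType": "Put", "QMgrName": "QMC",
          "ResolvedQName": "CSERVER"}]},
]


def describe_operation(activity_desc: str, op: Dict[str, str]) -> str:
    """One line of text for one operation within an activity."""
    parts = [f"{activity_desc}: {op.get('OperationType', '?')}"]
    if "ChannelName" in op:
        parts.append(f"channel {op['ChannelName']}")
    queue = op.get("ResolvedQName") or op.get("RemoteQName")
    if queue:
        parts.append(f"queue {queue}")
    if "QMgrName" in op:
        parts.append(f"on {op['QMgrName']}")
    return ", ".join(parts)


def end_to_end_path(activities: List[dict]) -> List[str]:
    """Flatten the activities, in the order given, into a list of steps."""
    steps = []
    for activity in activities:
        desc = activity.get("ActivityDesc", "?")
        for op in activity.get("Operations", []):
            steps.append(describe_operation(desc, op))
    return steps
```

Calling end_to_end_path(EXAMPLE_ACTIVITIES) gives five steps, one per operation. The sketch trusts the order of the input; real replies can arrive in any order, so you would need to sort them first.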

There is an option to trace the return route which sometimes worked, but not consistently.

From the queue names used, and the resolved queue names, you can check the names of the queues being used. If you are using QALIAS, QREMOTE,  clustered queues, clustered QALIAS, or clustered QREMOTE you can see the true names of the objects being used, and draw a topology chart of what is actually being used (rather than what you think is being used)

Extending your applications to support trace route.

There is an option to pass the trace route message to the application processing the queue. I will write another blog post about doing this – it took me several days to get it to work. This allowed me to return data saying “colinsProg got the message from CSERVER and passed it on to NEXTHOP”. I could then build up a true picture of my application

Using the data in the messages

The returned messages have a lot of information, including OperationTime. The time is a character string with format HHMMSShh, where hh is hundredths of a second.

With my example message above

  1. dspmqrte put a message to queue CSERVER at 06:10:10.00
  2. Cluster Sender Channel CL.QMC got the message at 06:10:11.44, 1.44 seconds later
  3. Cluster Sender Channel CL.QMC sent it over the network at 06:10:11.44, no time delta
  4. Cluster Receiver Channel CL.QMC received the message from the network, 06:10:11.44, no time delta
  5. Cluster Receiver Channel CL.QMC put it to the CSERVER queue. 06:10:11.44, no time delta

We can see from this that there was a slight delay (1.44 seconds) before the channel got the message. The rest of the processing was very fast. If I had a problem in my MQ network, I would look at why the sending end of the channel was slow to process the message.
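
The time deltas above are easy to calculate once OperationTime is converted to a number. This Python sketch assumes the HHMMSShh format described above; the function names are mine, and the midnight handling assumes consecutive operations are less than a day apart.

```python
from typing import List

HUNDREDTHS_PER_DAY = 24 * 60 * 60 * 100


def operation_time_to_hundredths(value: str) -> int:
    """Convert an 'HHMMSShh' string into hundredths of a second since midnight."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected 8 digits in HHMMSShh format, got {value!r}")
    hours, minutes, seconds, hundredths = (
        int(value[i:i + 2]) for i in (0, 2, 4, 6))
    return ((hours * 60 + minutes) * 60 + seconds) * 100 + hundredths


def deltas_in_seconds(operation_times: List[str]) -> List[float]:
    """Seconds between each operation and the one before it."""
    ticks = [operation_time_to_hundredths(t) for t in operation_times]
    return [((later - earlier) % HUNDREDTHS_PER_DAY) / 100
            for earlier, later in zip(ticks, ticks[1:])]
```

For the five operations above, deltas_in_seconds(["06101000", "06101144", "06101144", "06101144", "06101144"]) returns [1.44, 0.0, 0.0, 0.0].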

Problems with traceroute

I had a few problems using trace route.

The messages which flow are not true PCF messages, and so the IBM sample amqsevt (which processes PCF messages) does not recognise them. As this is sample code I was able to change it to get it to work. I’ll send the changes to IBM and hope they incorporate them in the product.  I output the messages in json and then used python to process them.

If dspmqrte thinks it has already seen a reply for the request it can throw it away and not display it.   I had this problem when instrumenting my applications to provide the trace route information.

It would be good if dspmqrte displayed the time delta from the start of the request. I had to take the output and post process it to report where the delays are.

The time OperationTime is in hundredths of a second. This may be OK for most people, because if you are looking for delays in your processing, a tenth of a second may be granular enough. I added the high resolution time (epoch time) to the data provided by my applications.

If my backend application was not active there was an operation of “OperationType:Discard” and Feedback: NotDelivered. This may be because the number of handles opened for input was zero. It was a surprise. I expected to get a response saying “message expired”

My non trivial application design is to send a message to a queue CSERVER which passes a request to the backend application (on QMZ) which sends a response  back to the originator. Dspmqrte does not support this. You can set up dspmqrte to display the route between QMA and QMC. You can use a client connected dspmqrte to send a message from QMC and QMZ and then build the end to end picture yourself.

I have made some progress in instrumenting my applications to do this, but I need more time, as the documentation is unclear, wrong in places, and missing bits. I’ll send my doc comments to IBM.

When is the best time to learn man over board drill – when it is calm. When is the best time to practice man over board – when it is rough.

If you have two totally different concepts, but there is a similarity between them, you can get insight by comparing them.

At first glance there is little in common between “man over board” when at sea, and enterprise computers, but there is, and we can get insight about testing and preparation.

When is the best time to learn man over board drill.

While you are learning, you want a nice calm day, so any mistakes you make do no damage.  You need to practice it until you can recover the dummy most times.

When is the best time to practice man overboard drill?

You are more likely to go overboard when the seas are rough than when the weather is calm.  You need to practice in this scenario.  When the weather is rough it is hard to see the person in the water – a person’s head is 1 ft high – but the waves can be 6 feet from peak to trough.  It is harder to position the boat.  So the best time to practice man over board drill is when the weather is rough.  Do not try it in a gale, as you are likely to damage the boat, or have someone really fall over board.
When my father was in the navy, he told me about “exercises” where the ships would be under attack from planes (this was before missiles) and there was a submarine or two trying to attack you. In one exercise, the sea was a bit rough and the captain of the ship sat back and let his senior officer (First Lieutenant) run the ship.  Things were going well until the captain arranged for a “man over board” to happen.

The FL now had to decide to stop the ship (become a sitting duck and so be destroyed) and pick up the man over board; to leave the man in the water to die; or take people away from defending the ship and launch a boat/helicopter to rescue the man.   This was a complex situation, which suddenly became more complex, but they trained for this and had tested procedures and people knew what to do.

How does this apply to enterprise systems?

You need to decide on your “man overboard” scenarios.  For example this server is shut down, that network has problems.   You need to practice resolving the problems, capturing information to help you identify the real cause of the problem, and the steps needed to recover.   Once you have an automated or fully documented procedure, you  test it out in production – this is the “testing man over board when the sea is rough”.   This is where you find out holes in your processes, and find that production is configured differently to test, etc.

The “man over board when your ship is being attacked and the decision to save the man or save the ship” scenario.  You often get multiple problems and you need to decide on the priority of the actions.  “Messages building up on a queue” could be caused by a network problem.  It is more important to fix the network than “fix mq”.  You should go through scenarios to help decide what to do.  It is better to create action plans in advance, and document them, rather than try to come up with a plan during an emergency.  You want to avoid “if we do this then that will happen… ahhh… not a good idea”.

Think things through

I did a sailing course in the Mediterranean,  where the sea was warm, and people were swimming in the sea.   We had spent the morning doing Man Over Board, where we had to retrieve a buoy with a pole sticking out the top.  This was easy to retrieve, you lean over the side and just pick it up.  Well done,  tick the box, you passed.  We anchored up, and were having lunch with a nice cold glass or two of wine, when I asked, “so how do you get someone back into the boat” (they do not teach you this).  I then “accidentally” fell over board.

  • They threw me a rope – but I said my hands were too cold – I could not grip it.
  • They then made a loop in the end and threw it to me – the loop was too small to go over my head and life jacket.
  • They then made a big loop which I got over my head and they dragged me to the yacht, but were unable to lift me out because I was too heavy and my clothes were full of water.
  • The tutor then suggested using some of the lifting equipment from the boat, so they tied the rope over the end of the boom, and used a winch to winch me up – which worked, but they scraped me up the side of the boat, so I had a bleeding arm.

I said afterwards (as they wiped my blood off the deck) look at the problems you had when we were at anchor.   Think what it would be like in a 6 ft sea!

Think how you will recover after your outage.

For example there may have been persistent messages on the queue manager when it went down.  The application retried and was successful because the traffic went to an alternative queue manager.   You now have possibly duplicate requests, or orphaned replies (because the getting application reconnected to a different queue manager).

This server went down, and all of the traffic went to that server.    Now this server has come back – how do you get traffic to balance over the queue managers?

You have a huge backlog of messages – what should you do – just purge them or let them be processed.  (This is where you realise that using message expiry on inquiry messages would be a good technique to use)

You need to think things through, these exercises are tedious and take a lot of time.   But you have no time in a crisis!