Sorting out the MQ application trace knotted spaghetti.

You can turn on report = MQRO_ACTIVITY to get activity traces sent to a queue.   This shows the hops and activity of your message.
You can create your own trace route messages to be sent to a remote queue, and get back the hops to get to the queue, or you can use the dspmqrte command to do this for you.

Which ever way you do it, the result is a collection of messages in your specified reply queue.  The problem is how do you untangle the messages.  It is not easy with for a single message.  If you are getting these activity messages every 10 seconds from multiple transactions, you quickly  get knotted spaghetti!   To entangle the spaghetti even more, you could have a central site processing these data from many queue managers, so you get data from multiple messages, and multiple queue managers.

You can get a message for part of your application or transaction.  For example,

  • a message with information about the first 10 MQ verbs your program uses.
  • a message for the sender channel with the MQGET and the send for the local queue manager, and the remote queue manager will send a message with the channel’s receive and MQPUT.

The easy bit – messages for activity on your queue manager.

The event message has a header section.  This has information including

QueueManager: ‘QMA’
Host Name: ‘colinpaice’
SeqNumber: 0
ApplicationName: ‘amqsact’
ApplicationPid: 28683
UserId: ‘colinpaice’

From  QueueManager: ‘QMA‘ and Host Name: ‘colinpaice‘, you know which machine and queue manager you are on.

From ApplicationPid: 28683 SeqNumber: 0, you can see the records for this applications Process ID, and the sequence number.   This happens to be for a program ApplicationName: ‘amqsact’ and UserId: ‘colinpaice’.  I dont know when the sequence number wraps.  If the application ends, and the same process is reused,  I would expect the sequence number to be reset to 0.

You may have many threads running in a process , such as for a web server.  For each MQ operation  there is information for example

MQI Operation: 0
Operation Id: MQXF_PUT
ApplicationTid: 81
OperationDate: ‘2019-05-25′
OperationTime: ’14:28:18’
High Res Time: 1558790898843979
QMgr Operation Duration: 114

We can see that this is for Task ID 81.

So to tie up all of the activity for a program, you have to select the records with the same ApplicationPid, and check the SeqNumber to make sure you are not missing records.  Then you can locate the record with the same TID.
You also need to remember that a thread behaviour can be complex ( like adding meat balls to the spaghetti).  Because of thread pooling, an application may finish with the thread, and the thread can be reused.  If a thread is not being used, it can be deleted, so you will get MQBACK and MQDISC occurring after a period of time.

It is similar for channels

For a sending channel you get the following fields.

QueueManager: ‘QMA’
Host Name: ‘colinpaice’
SeqNumber: 1723
ApplicationName: ‘runmqchl’
Application Type: MQAT_QMGR
ApplicationPid: 5157
ConnName: ‘127.0.0.1(1416)’
Channel Type: MQCHT_CLUSSDR

MQI Operation: 0
Operation Id: MQXF_GET
ApplicationTid: 1

For a  receiving channel you get

QueueManager: ‘QMA’
Host Name: ‘colinpaice’

SeqNumber: 1746
ApplicationName: ‘amqrmppa’
ApplicationPid: 4509
UserId: ‘colinpaice’

Channel Name: ‘CL.QMA’
ConnName: ‘127.0.0.1’
Channel Type: MQCHT_CLUSRCVR

MQI Operation: 0
Operation Id: MQXF_OPEN
ApplicationTid: 5
….
MQI Operation: 1
Operation Id: MQXF_PUT
ApplicationTid: 5
….

As there can be many receiver channels with the same name for example an Receiver MCA channel, you should be able to use the CONNAME IP address to identify the channel being used.

They may have the same or different ApplicationPid.

It might be easier just search all of the channels for the messages with matching msgid and correlid!

 

 

 

 

 

 

What data is there to help you manage your systems?

There is a lot of information provided by MQ to help you manage your systems, some of it is not well documented.

I’ll list the sources I know, and when they might be needed, but before that I’ll approach it from the “what do I want to do”.

What do I want to do?

I want to…

  • know when significant events happen in my system, such as channel start and stop, and security events. AMQERRnn.LOG
  • know when “events” happen, such as a queue filling up security exceptions.  Event Queues
  • be able to specify thresholds, such as when the current depth is > 10 messages, and the age of the oldest message is older than 5 seconds then do something. Display commands
  • be able draw graphs of basic metrics, such as number of messages put per hour/per day so I can do capacity planning, and look for potential capacity problems. Statistics
  • identify which queues are being used, display queue activity, number of puts, and size of puts etc  Statisticsdisplay object status
  • identify which queues (objects) are not being used, so they can be deleted.  Absence of records in Statistics.  Issue DIS QSTATUS every day and see if a message has been put to the queue.  Creating events for when a message is put to the queue.  Note an object may only be used once a year – so you need to monitor it all year.
  • identify which applications are putting to and getting from queues. Accounting
  • see what MQI verbs are being used, so we can educate developers on the corporate naming standards, and API usage.  Activity trace
  • display the topology.  Display commandsTrace routeActivity trace
  • trace where messages are going, so we can draw charts of the flow of message requests and their responses, and display the topology of what is actually being used.  Trace route
  • measure round trip times of messages – so I know if there are delays in the end to end picture. Trace route
  • Understand the impact of a problem “here” by seeing what flow through “here”.  What’s my topology

What sources are there?

AMQERRnn.LOG.

These contain information about events in the queue manager, such as channel start and channel stop. These files are in /var/mqm/qmgrs/QMA/errors/… and can be read using an editor or browser.  People often feed these into tools like SPLUNK, and then you can filter and do queries to monitor for messages that have not been seen before.

Event queues.

MQ messages are put on queues like SYSTEM.ADMIN.QMGR.EVENT.

There is a sample amqsevt which can be used to print the message in text format or json format – or you can write your own program.

Creating events.

You can configure MQ to produce events when conditions occur. For important queues you can set a high threshold, and MQ produces an event when this limit is exceeded. You can use this

  • to see if messages are accumulating in a queue
  • to see if a queue is being used – set the queue high threshold to be 1, and you will get an event if a message is put to a queue

Statistics

If you turn on statistics you information on the number of puts, gets for the system, and the number of puts and gets etc to a queue. This information is put to a queue SYSTEM.ADMIN.STATISTICS.QUEUE

The information is summarized by queue.

You can use

  1. the sample amqsevt to process these messages, you can have output in json format for input into other tools.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.

Accounting

If you turn on accounting you information about what an application is doing. The the number of puts, gets for the system, and the number of puts and gets etc to a queue. This information is put to a queue SYSTEM.ADMIN.ACCOUNTING.QUEUE. The information is similar to the information provided by statistics, but it provided information about which application used the objects.

You can use

  1. the sample amqsevt to process these messages.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.

You can use display commands.

You can use commands, or use the MQINQ API to display information about object. You can issue commands using runmqsc or from an application by putting command requests in PCF format to a queue, and getting the data back in PCF format. Your program has to decode the PCF data.
You can display multiple fields and have logic  to take action if values are out of the usual range.   For example
periodically display the curdepth, and the age of the oldest message on the queue and then do processing based on these value. Tivoli uses this technique to creates situations if specified conditions are met. You can easily write your own programs to do this, for example using python scripts and pymqi.

What’s my topology?

You can use the DIS CHANNEL … CONNAME command to show where a channel connects to and use this to draw up a picture of your configuration.

You can use the DIS QCLUSTER and DIS CLUSQMGR to show information about your clusters, and where cluster queues are, and use this information to draw up a picture of your configuration

You can use the traceroute to dynamically see the routes between nodes, and understand the proportion of messages going to different destinations – at that moment in time.

Displaying object status

You can use display commands to show information such as the last time a message was put to a queue, or got from a queue, or sent over a channel.

Application trace

The application trace shows you the MQ API calls, the parameters, and return codes. This data goes to the SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE queue.

You can use this to check the API options being used, for example

  • Messages persistence is correct for the application pattern (inquiry is non persistence)
  • The correct message expiry is specified (non persistent has time value)
  • The correct options are specified
  • Applications are using MQ GET with wait rather than polling a queue
  • The correct syncpoint options are being used.
  • Which queue is really used. You open one queue name but this could be an alias. You get the queue it maps to.

There is an overhead to collecting this, so you do not want to run this for extended periods of time.

Running it for just a minute or two may give you enough information. You can turn this on for an individual program.

You can use amqsevt to process the queue.

Trace route.

You can send a message “to a queue” and get back the processes involved in getting to the queue.. For example use the dspmqrte to “put” a message to a cluster queue, and you will see the sending channel get the message and send it, then the receiver channel at the remote end receive the message and “putting” it to the queue. One of the data fields is the operation time, so you can see where the delays were in the processing (for example it took seconds to be sent over a channel).  See here

By default the message is not put to the queue, but there is an option to put it to the queue for the application to process, but there is no documentation to tell you how you process this message. The dspmqrte command effectively shows you the hops between queues. It is up to you to build up the true end to end path, and manage the responses yourself.

The provided programs dspmqrte are simplistic and show you the path to the queue, and the channels used on the queue.

The data is not pure PCF, and the sample amqsevt does not format it. I have modified it to handle this.