Where’s my network bottleneck? Try using traceroute

On Midrange MQ there is a capability called trace route, which lets you see the path a message takes to a queue and get information about each hop, for example how long the message was on the transmission queue before being sent.

If you have a problem in your MQ network, you can use dspmqrte in real time to see if there are any delays between end points.

In the blog post below, I’ll show an example of the information you can get, including where the time was spent during processing, and what you can use to automatically process the replies.

What is a trace route message?

A special (trace route) message is sent to the specified queue (perhaps on a different queue manager), and the tasks that process it en route send information to a collection queue.

You can use the IBM supplied dspmqrte (display mq route) command. This sends the message and processes the responses.

For example, on QMA, dspmqrte put a message to a cluster queue CSERVER on QMC. During the processing of the trace route message, several messages were sent to the collection queue. Key data from the messages is displayed below.

Message 1 – processing done by dspmqrte

ApplName: dspmqrte
ActivityDesc: IBM MQ Display Route Application 

Operation: OperationType: Put
QMgrName: QMA QName: CSERVER 
ResolvedQName: SYSTEM.CLUSTER.TRANSMIT.QUEUE
RemoteQName: CSERVER 
RemoteQMgrName: QMC 

We can see from this information that the application dspmqrte did a Put to the queue CSERVER, which resolved to queue CSERVER on QMC, and that the message went via SYSTEM.CLUSTER.TRANSMIT.QUEUE.

Message 2 – processing done by the sending channel

ApplName: amqrmppa
ActivityDesc: Sending Message Channel Agent

Operation: OperationType: Get
QMgrName: QMA 
QName: SYSTEM.CLUSTER.TRANSMIT.QUEUE 
ResolvedQName: SYSTEM.CLUSTER.TRANSMIT.QUEUE 

Operation: OperationType: Send
QMgrName: QMA
RemoteQMgrName: QMC
ChannelName: CL.QMC
ChannelType: ClusSdr
XmitQName: SYSTEM.CLUSTER.TRANSMIT.QUEUE

We can see from this that the channel did two things (two operations) with the data:

  1. Operation 1: the ClusSdr channel CL.QMC did a Get from SYSTEM.CLUSTER.TRANSMIT.QUEUE.
  2. Operation 2: the ClusSdr channel CL.QMC sent the message to queue manager QMC.

Message 3 – processing done by receiving channel

ApplName: amqrmppa
ActivityDesc: Receiving Message Channel Agent 

Operation: OperationType: Receive
QMgrName: QMC 
RemoteQMgrName: QMA 
ChannelName: CL.QMC
ChannelType: ClusRcvr

Operation: OperationType: Put
QMgrName: QMC 
QName: CSERVER
ResolvedQName: CSERVER

We can see from this that the channel did two things with the data:

  1. The ClusRcvr channel CL.QMC received a message from QMA.
  2. The channel put the message to CSERVER on this queue manager (QMC).

End to end path

The sequence is:

  1. dspmqrte put a message to queue CSERVER
    • this message was put on the SYSTEM.CLUSTER.TRANSMIT.QUEUE queue
  2. Cluster Sender Channel CL.QMC got the message and sent it over the network
  3. Cluster Receiver Channel CL.QMC received the message from the network and put it to the CSERVER queue.

There is an option to trace the return route. It sometimes worked for me, but not consistently.

From the queue names used, and the resolved queue names, you can check the names of the queues being used. If you are using QALIAS, QREMOTE, clustered queues, clustered QALIAS, or clustered QREMOTE, you can see the true names of the objects being used, and draw a topology chart of what is actually being used (rather than what you think is being used).
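As a minimal sketch of how the queue names and resolved queue names can be turned into a hop list, the Python below walks the three messages shown above. The dictionary layout (an "Operations" list inside each activity) and the function name describe_hops are assumptions made for illustration. They are not the format produced by dspmqrte or amqsevt, so the field access would need adapting to whatever your own tooling emits.

```python
from typing import Dict, List

# Hypothetical layout: one dict per activity message, using the field
# names shown in the messages above. The nesting is an assumption.
activities = [
    {"ApplName": "dspmqrte", "Operations": [
        {"OperationType": "Put", "QMgrName": "QMA", "QName": "CSERVER",
         "ResolvedQName": "SYSTEM.CLUSTER.TRANSMIT.QUEUE",
         "RemoteQName": "CSERVER", "RemoteQMgrName": "QMC"}]},
    {"ApplName": "amqrmppa", "Operations": [
        {"OperationType": "Get", "QMgrName": "QMA",
         "QName": "SYSTEM.CLUSTER.TRANSMIT.QUEUE",
         "ResolvedQName": "SYSTEM.CLUSTER.TRANSMIT.QUEUE"},
        {"OperationType": "Send", "QMgrName": "QMA", "RemoteQMgrName": "QMC",
         "ChannelName": "CL.QMC", "ChannelType": "ClusSdr",
         "XmitQName": "SYSTEM.CLUSTER.TRANSMIT.QUEUE"}]},
    {"ApplName": "amqrmppa", "Operations": [
        {"OperationType": "Receive", "QMgrName": "QMC", "RemoteQMgrName": "QMA",
         "ChannelName": "CL.QMC", "ChannelType": "ClusRcvr"},
        {"OperationType": "Put", "QMgrName": "QMC", "QName": "CSERVER",
         "ResolvedQName": "CSERVER"}]},
]


def describe_hops(activities: List[Dict]) -> List[str]:
    """Return one line per operation, in the order the operations were reported."""
    hops = []
    for activity in activities:
        appl = activity["ApplName"]
        for op in activity["Operations"]:
            kind = op["OperationType"]
            qmgr = op["QMgrName"]
            if kind in ("Put", "Get"):
                name = op["QName"]
                resolved = op.get("ResolvedQName", name)
                # Show the resolved name only when it differs from the name used.
                target = name if resolved == name else f"{name} (resolved to {resolved})"
                hops.append(f"{appl} {kind} {target} on {qmgr}")
            elif kind == "Send":
                hops.append(f"{op['ChannelName']} Send {qmgr} -> {op['RemoteQMgrName']}")
            elif kind == "Receive":
                hops.append(f"{op['ChannelName']} Receive {op['RemoteQMgrName']} -> {qmgr}")
            else:
                hops.append(f"{appl} {kind} on {qmgr}")
    return hops


for line in describe_hops(activities):
    print(line)
```

Each printed line is one edge or node visit, which is enough raw material to draw the topology chart by hand or feed into a graphing tool.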

Extending your applications to support trace route

There is an option to pass the trace route message to the application processing the queue. I will write another blog post about doing this, as it took me several days to get it to work. This allowed me to return data saying “colinsProg got the message from CSERVER and passed it on to NEXTHOP”. I could then build up a true picture of my application.

Using the data in the messages

The returned messages have a lot of information, including OperationTime. The time is a character string with format HHMMSShh, where hh is hundredths of a second.

With my example message above

  1. dspmqrte put a message to queue CSERVER at 06:10:10.00
  2. Cluster Sender Channel CL.QMC got the message at 06:10:11.44, 1.44 seconds later
  3. Cluster Sender Channel CL.QMC sent it over the network at 06:10:11.44, no time delta
  4. Cluster Receiver Channel CL.QMC received the message from the network at 06:10:11.44, no time delta
  5. Cluster Receiver Channel CL.QMC put it to the CSERVER queue at 06:10:11.44, no time delta

We can see from this that there was a slight delay (1.44 seconds) before the channel got the message. The rest of the processing was very fast. If I had a problem in my MQ network, I would look at why the sending end of the channel was slow to process the message.
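The arithmetic above can be sketched in Python. This is a minimal example, assuming the OperationTime values have already been extracted as 8-character HHMMSShh strings. The function names and the step labels are made up for illustration, and the wrap-around at midnight is handled by assuming that no step takes longer than a day.

```python
from typing import List, Tuple

HUNDREDTHS_PER_DAY = 24 * 60 * 60 * 100


def operation_time_to_hundredths(value: str) -> int:
    """Convert an HHMMSShh string to hundredths of a second since midnight."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected 8 digits in HHMMSShh format, got {value!r}")
    hours, minutes, seconds, hundredths = (int(value[i:i + 2]) for i in range(0, 8, 2))
    return ((hours * 60 + minutes) * 60 + seconds) * 100 + hundredths


def deltas_in_seconds(steps: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return (label, seconds since the previous step) for each step."""
    result = []
    previous = None
    for label, op_time in steps:
        current = operation_time_to_hundredths(op_time)
        if previous is None:
            delta = 0
        else:
            # The modulo copes with a trace that crosses midnight.
            delta = (current - previous) % HUNDREDTHS_PER_DAY
        result.append((label, delta / 100))
        previous = current
    return result


# The five operations from the example, with their OperationTime values.
steps = [
    ("dspmqrte Put", "06101000"),
    ("ClusSdr Get", "06101144"),
    ("ClusSdr Send", "06101144"),
    ("ClusRcvr Receive", "06101144"),
    ("ClusRcvr Put", "06101144"),
]

for label, delta in deltas_in_seconds(steps):
    print(f"{label:18} +{delta:.2f}s")
```

The only non-zero delta is the 1.44 seconds between the Put by dspmqrte and the Get by the sending channel, which matches the list above.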

Problems with traceroute

I had a few problems using trace route.

The messages which flow are not true PCF messages, and so the IBM sample amqsevt (which processes PCF messages) does not recognise them. As this is sample code I was able to change it to get it to work. I’ll send the changes to IBM and hope they incorporate them in the product. I output the messages in JSON and then used Python to process them.

If dspmqrte thinks it has already seen a reply for the request, it can throw the reply away and not display it. I had this problem when instrumenting my applications to provide the trace route information.

It would be good if dspmqrte displayed the time delta from the start of the request. I had to take the output and post process it to report where the delays are.

OperationTime is in hundredths of a second. This may be OK for most people: if you are looking for delays in your processing, a tenth of a second may be granular enough. I added the high resolution time (epoch time) to the data provided by my applications.
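As a rough illustration of carrying a high resolution time alongside the HHMMSShh value, the sketch below derives both from a single clock reading. The field name EpochNanoseconds and the function name are hypothetical, and formatting the string in UTC is an assumption. This is not how MQ itself populates OperationTime.

```python
import time
from datetime import datetime, timezone
from typing import Dict, Optional


def timestamps(epoch_ns: Optional[int] = None) -> Dict[str, object]:
    """Return an HHMMSShh string and the epoch time in nanoseconds."""
    if epoch_ns is None:
        epoch_ns = time.time_ns()  # one clock reading feeds both values
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    hundredths = nanos // 10_000_000  # truncate, do not round
    return {
        "OperationTime": f"{moment:%H%M%S}{hundredths:02d}",
        "EpochNanoseconds": epoch_ns,  # hypothetical extra field
    }


print(timestamps())
```

With both values in the reply, the coarse field stays comparable with the channel operations, and the fine-grained one can be used to order events that share the same hundredth of a second.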

If my backend application was not active there was an operation of “OperationType:Discard” and Feedback: NotDelivered. This may be because the number of handles opened for input was zero. It was a surprise. I expected to get a response saying “message expired”

My non-trivial application design is to send a message to a queue CSERVER, which passes a request to the backend application (on QMZ), which sends a response back to the originator. dspmqrte does not support this. You can set up dspmqrte to display the route between QMA and QMC. You can use a client connected dspmqrte to send a message from QMC to QMZ, and then build the end to end picture yourself.

I have made some progress in instrumenting my applications to do this, but I need more time, as the documentation is unclear, wrong in places, and missing bits. I’ll send my doc comments to IBM.

2 thoughts on “Where’s my network bottleneck? Try using traceroute”

  1. Hi Colin, trace route messages are available on z/OS too. However the dspmqrte tool is only available on distributed, but can connect to MQ for z/OS using client mode.

    Regards, Matt.
