Showing what the C client is doing and how long it took.

I was asked to get involved in a dispute between the application team, and the MQ team about a slow C application running as a client. The application team felt it was MQ, the MQ team felt it was somewhere else. We had lots of feelings but no facts. I took a client trace, post processed it, and could point the finger of blame. I then spent more time digging into the trace and could see where the time is being spent.

Collecting a trace

We used strmqtrc -e -d 0 to turn on the trace. -d 0 says collect 0 length data buffers.
We ran the application and used endmqtrc -e to stop the trace
As we were on Linux we had to use the dspmqtrc *.TRC to format the trace. You may have lots of file on /var/mqm/trace. The trace of the client will have the userid associated with the application. Files with userid mqm come from the queue manager.

If you look at the trace file there is so much data is it temping to give up.
At the top of the file is a head and includes a line “Program Name :- …” so you can check this is the correct file.

The trace data looks like
20:41:03.511815 17291.5      :      { MQGET
20:41:03.511819 17291.5      :       -{ reqGetConn
20:41:03.511822 17291.5 RSESS:000007 -} reqGetConn rc=OK FunctionTime=3
20:41:03.511826 17291.5 RSESS:000007{ MQGET

20:41:03.511815 is a time stamp
17291.5 is processid.threadid
the 17291 is the same as the file name AMQ17291.0.FMT
{, -{,–{ shows the nesting level of the function so { is the top level, -{ is the next level deeper etc

Using the command
grep -e ” {” -e ” }” *291*.FMT|less
extracts just high level MQ API calls with the output from different threads intermixed

20:41:03.637143 17291.8 : } MQPUT rc=OK FunctionTime=874
20:41:03.637146 17291.8 : { MQGET
20:41:03.641930 17291.6 : } MQGET rc=OK FunctionTime=4784
20:41:03.641949 17291.6 : { MQPUT

If you look in the trace file near the entry times you will see the MQ control blocks, such as the MQMD, and MQOD so you can identify which queues are being used etc..

grep -e ” {” -e ” }” *291*.FMT| grep 17291.8 |less
shows you the high level data in the trace file for just the thread 8.

(1) 20:41:06.069828 17291.8 : { MQGET
(2) 20:41:06.105633 17291.8 : } MQGET rc=OK FunctionTime=6784
(3) 20:41:06.105656 17291.8 : { MQPUT
(4) 20:41:06.139679 17291.8 : } MQPUT rc=OK FunctionTime=482
(5) 20:41:06.953967 17291.8 : { MQGET

Note. The FunctionTime is often mis calculated and should be ignored.

The first MQGET took (2) – (1) 20:41:06.105633 – 20:41:06.069828 = 0.035806 seconds.
After the get to the following MQPUT it took (3) – (2) 20:41:06.105656 – 20:41:06.105633 = 0.000023 seconds.
The MQPUT took (4)-(3) 20:41:06.139679 – 20:41:06.105656 = 0.034023 seconds
After the MQPUT to the following MQGET (5) – (4) it took 20:41:06.953967 – 20:41:06.139679 = 0.814288 seconds

In summary the MQ requests took about 35 milliseconds, between the get and the put was 0.023 milliseconds, but between the put and the following get it took 814 milliseconds. This shows that most of the time was outside of MQ.

This was enough evidence to show the applications people to look else where.

Script to process the trace file and pull out the times in send and waiting for data

It is hard work extract the time stamps and calculating the time differences, so I wrote a simple Python script to do this for me.

This script gave me

(1) 20:41:04.400034 17291.6 : { MQPUT 0.000017
(2) 20:41:04.404946 17291.6 : } MQPUT 0.004912 sendTime 0.000960 waitTIme 0.003281
(3) 20:41:04.404949 17291.6 : { MQGET 0.000003
(4) 20:41:04.416523 17291.6 : } MQGET 0.011574 sendTime 0.000050 waitTIme 0.011167

(2) shows that the MQPUT took 0.004912 seconds, of which 0.000960 seconds was spent doing TCP send requests, and 0.003281 seconds was spent waiting for the request to be executed, and the response sent back. The message was large, so there were multiple send requests.
(3) shows the time between the end of the previous MQPUT, and the MQGET request was 0.000003 seconds.
(4) show the time of the MQGET . Overall it took 0.011574 seconds, of which 0.000050 was doing the send (there was only one send request), and 0.011167 seconds was spent waiting for the request to be executed, and the response sent back.

Digging deeper to see what is going on under the covers

Having a performance background I thought I would dig a bit deeper into this.

The overall picture I have is of getting to the airport. Do I take a taxi or take the bus?
It is usually better for everyone if people take the bus, because it is more efficient overall, and can reduce the amount of traffic. The journey may be a little longer because the bus has to stop to set down and pick up people. If you need to minimise the journey time, I would take a taxi. It may mean I get their faster than the bus – but it may not, if the taxi is queued behind the bus and cannot overtake.

The basic flow of a channel is
1. The application issues an MQ request
2. 1 or more TCP send requests pass the MQ verb and any data (such as message payload for an MQPUT). No more than 64 KB is sent per send request, so big messages will require multiple sends.
3. The last request includes the “end of data” flag.
4. The request is processed
5. The response and any data is sent back, a thread does one or more TCP receives to get the response and any data (for example the message payload for an MQGET request)
6. When all the data has been received, the request returns back to the application

For many years the application sent the data and blocked waiting for the response. If you had an application server, each thread had its own connection. This could lead to a very large number of connections. This configuration is called sharecnv(0)

With shared conversations introduced in MQ V7 there is one TCP connection and threads within a process can use it. The number of threads that can share it is defined with sharecnv(n).
The logic is very similar
1. The application issues an MQ request
2. The application does lock:TCP send:unlock. Multiple threads can send data, so you may get sends for multiple threads intermingled; send for thread 1, send for thread 2, send for thread 1, send for thread 3 etc..
3. The last request for the application request says “end of data”
4. The application thread is suspended.
5. There is a “reply thread” which issues the TCP receive requests.
6. When all the data has been returned for an application, the application thread is woken up.
This has the advantage that there may be only one connection, for multiple threads in the process. This can be important if you have a very large number of threads using MQ.

With sharecnv(0) the channel was blocked during the MQ request.
There are two (or more) threads, one for sending, and a reply thread, means that asynchronous put, and asynchronous get can be supported.

How does the performance compare?

Sending data….

The TCP send request just passes the data into TCP buffers, so is quick. The TCP protocol says that after a number of bytes have been sent, TCP will wait till it has had confirmation that the earlier bytes have been received at the remote end. So looking at the TCP level you may have send(fast) send(fast) send(fast) send(slow – waiting for the ack from the far end), send (fast) send(slow)

This is the same for sharecnv(0) and sharecnv(> 0).

If you had sharecnv(0), so no sharing of the connection, the channel would do multiple sends, then perhaps do a database update. You may have other threads in the process also doing multiple sends and then perhaps a database update. If you did a packet trace of the network link you might see data for connection 1, data for connection 2, data for connection 1, data for connection 3 – just as for the sharecnv>0 case, but these requests may be done in parallel – depending on the configuration

So for sharecnv(0) and sharecnv(>0) you should expect similar send data rates.

Receiving data

The channels with sharecnv(0) have a dedicated connection. With sharecnv(>1) there is one reply connection shared by the users. With a low to medium throughput this should give the similar throughput as multiple channels with sharecnv(0). Because of the way the low level TCP protocol works, I can see that sometimes a channel defined with sharecnv(>1) may be faster!

With high throughput data may be delayed getting onto the shared connection, whereas with the sharecnv(0) channels the data can be flowing over connections in parallel.

What value should I use in sharecnv(n)

If you want to use the new asynchronous functions, n must be greater than 0.
If you want the maximum possible throughput and minimum response time – at the cost of a larger number of connections then use n = 1.
If your throughput requirements are lower, and the applications can tolerate a small increase in response time then a larger n value will mean you need fewer connections – and so your queue manager can support more application instances.

Showing what the C client is doing and how long it took.

Collecting a trace

Script to process the trace file and pull out the times in send and waiting for data

Digging deeper to see what is going on under the covers

How does the performance compare?

Sending data….

Receiving data

What value should I use in sharecnv(n)

Published by Colin Paice

One thought on “Showing what the C client is doing and how long it took.”

Leave a comment Cancel reply

Collecting a trace

Script to process the trace file and pull out the times in send and waiting for data

Digging deeper to see what is going on under the covers

How does the performance compare?

Sending data….

Receiving data

What value should I use in sharecnv(n)

Share this:

Related

Published by Colin Paice

One thought on “Showing what the C client is doing and how long it took.”

Leave a comment Cancel reply