Like many topics I look at, this one turned out to be much harder than I expected. I thought about it as I walked the long way to the shops, up a steep hill and down the other side (and came back the short way along the flat). I wondered if it was because I ignore the small problems (like one queue manager and two queues), and always look at the bigger scale problems, for example hundreds of queue managers and hundreds of queues.
You have to be careful processing and interpreting the data, as it is easy to give an inaccurate picture; see “Be careful” at the bottom of this post.
Here are some of the things I found out about the data.
Enabling accounting trace
- You need to ALTER QMGR ACCTQ(ON) ACCTMQI(ON) before starting your applications (and channels). Enabling it after a program has started running does not capture the data for existing tasks. You may want to enable accounting, then shut down and restart the queue manager, to ensure that all channels are capturing data.
- The interval ACCTINT is a minimum time. The accounting records are produced after this time has elapsed and there has been some MQ activity. If there is no MQ activity, no accounting record is produced, so be careful calculating rates, e.g. messages processed per second.
- Altering ACCTINT does not take effect until after the current accounting record has been produced. For example, if you have ACCTINT(1800) and you change it to ACCTINT(60), the ACCTINT(1800) interval has to expire and produce its records before ACCTINT(60) becomes operative.
- You can use the queue attribute ACCTQ(OFF) to disable the accounting information for an individual queue.
- The MQI accounting has channel and connection information, the queue accounting does not. This means you cannot tell the end point from just the queue data.
- The MQI accounting and queue accounting have a common field, “connectionId”. It looks like this is made up of the first 12 characters of the queue manager name and a unique large number (perhaps a time stamp). If you have many machines with similar long queue manager names, you may want to use a combined field of machineName.connectionId to make this truly unique.
- I had an application using a dynamic reply-to queue. I ran this application 1000 times, so 1000 dynamic queues were used. When the applications put to the server, and it replied to the dynamic queues, the server had a queue record for each dynamic queue. There were up to 100 queue sections in each queue record, giving 11 accounting queue messages for the server (1000 dynamic queues, plus one server input queue). These were produced at the end of the accounting interval, and they all had the same header information: connectionId, start time etc. You do not know in advance how many queue records there will be.
- Compare this to using a clustered queue on a remote queue manager: the server’s queue accounting record on the remote system had just two queues, the server input queue and the SYSTEM.CLUSTER.TRANSMIT.QUEUE.
- The cluster receiver channel on the originator’s queue manager had a queue entry for each dynamic queue.
- In all my testing the MQI record was produced before the queue accounting record for a program. If you want to merge the MQI and queue records, save information from the MQI record in a table, keyed by connectionId. When the queue records come along, use the same connectionId key to retrieve the connection information and MQI data.
- You can remove the MQI data from your table once a queue record arrives with fewer than 100 queues, as this is the last record of the series.
- If the queue record has exactly 100 queues, you cannot tell whether it is in the middle of a series or the last of the series. To prevent a storage leak, you may want to store the time within the table and have a timer periodically delete entries that are more than a few seconds old – or just restart the program once a day.
- The header part of the data has a “sequenceNumber” field. This is usually incremented with every set of records.
- On the SYSTEM.ADMIN.ACCOUNTING.QUEUE, messages for different program instances can be interleaved, for example client1 MQI, client2 MQI, client1 Queue, client3 MQI, client2 Queue, client1 Queue, client3 Queue.
- You do not get the queue name as used by the application; the record has the queue name as used by the queue manager (which may be the same as that which the application used). For example, if your program uses a QALIAS that resolves to a clustered queue, the queue record will have the remote queue name used, SYSTEM.CLUSTER.TRANSMIT.QUEUE, not what the application used.
- You can use the activity trace to get a profile of which queues an application uses, and feed this into your processing.
- You do not get information about topic or subscription names.
- You may want to convert connectionName from 127.0.0.1 format to a domain name.
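One way to do this conversion is a reverse DNS lookup. This is a minimal sketch (the function name is mine); it assumes connectionName may carry a port suffix such as 127.0.0.1(1414), and falls back to the raw IP address when no name can be found:

```python
import socket

def to_hostname(connection_name: str) -> str:
    """Convert an accounting connectionName such as '127.0.0.1(1414)'
    to a domain name, falling back to the raw IP address if the
    reverse lookup fails (e.g. no PTR record exists)."""
    # Strip any port suffix, e.g. '127.0.0.1(1414)' -> '127.0.0.1'
    ip = connection_name.split("(")[0]
    try:
        host, _aliases, _addresses = socket.gethostbyaddr(ip)
        return host
    except OSError:
        return ip
```

You may want to cache the results, as reverse lookups can be slow and the same addresses appear repeatedly in the accounting data.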
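The merge-by-connectionId processing described earlier can be sketched as below. The field names ("connectionId", "queues") are simplified assumptions, not the exact JSON keys the accounting messages produce; the eviction timer handles the case where a series ends on exactly 100 queues:

```python
import time

# MQI records awaiting their queue records, keyed by connectionId.
# Each entry remembers when it was stored, so stale entries can be evicted.
mqi_cache = {}
EVICT_AFTER_SECONDS = 30

def on_mqi_record(record):
    """Save the connection information from an MQI accounting record."""
    mqi_cache[record["connectionId"]] = (time.time(), record)

def on_queue_record(record):
    """Merge a queue accounting record with its saved MQI record.
    A record with fewer than 100 queue sections is the last of its
    series, so the cache entry can then be discarded."""
    conn_id = record["connectionId"]
    entry = mqi_cache.get(conn_id)
    merged = {**(entry[1] if entry else {}), **record}
    if entry and len(record["queues"]) < 100:
        del mqi_cache[conn_id]  # series complete
    return merged

def evict_stale():
    """Drop entries whose final queue record never arrived (the series
    ended on exactly 100 queues), to prevent a storage leak."""
    cutoff = time.time() - EVICT_AFTER_SECONDS
    for conn_id in [k for k, (t, _) in mqi_cache.items() if t < cutoff]:
        del mqi_cache[conn_id]
```

Calling evict_stale() from a periodic timer (or simply restarting the program daily, as suggested above) keeps the cache bounded.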
Using the data in your enterprise
You need to capture and reduce the data into a usable form.
From a central machine you can use the amqsevt sample over a client channel for each queue manager and output the data in JSON format.
I used a python script to process this data. For example:
- Output the data into a file named yymmdd.hh.machine.queueManager.json. You can then use a program like jq to take the JSON data for a particular day (or hour of day) and merge the output from all your queue managers into one stream for reporting.
- You get a new file every day (or every hour) for each queue manager, which allows you to archive or delete old files.
- Depending on what you want to do with the data, you need to select different fields. You may not be able to summarise the data by queue name, as you may find that all applications are using clustered queues, and so the queue is reported as SYSTEM.CLUSTER.TRANSMIT.QUEUE. You might consider:
- ConnectionName – the IP address where the client came from
- Channel Name
- The queue manager the application connected to
- The queue manager group (see below)
- The program name
- Record start time, and record end time
- The interval of the record – for a client application, this may be 1 second or less. For a long running server or channel this will be the accounting interval
- Number of puts that worked
- Number of gets that worked
- Number of gets that failed
- Queue name used for puts
- Queue name used for gets
- You can now display interesting things like:
- Over a day, the average number of queue managers used by an application. Was it just one (a single point of failure), mostly one (an imbalance), or spread across multiple queue managers (good)?
- Whether an application processed a few messages and ended, and repeated this frequently. You can review such an application and get it to stay connected for longer, avoiding the expensive MQCONN and MQDISC calls and so saving CPU.
- Did an application stay connected to one queue manager all day? It is good practice to disconnect and reconnect, perhaps hourly, to spread the connections across the available queue managers.
- You could charge departments for usage based on userid or program name, charging on the number or size of messages processed and on connections (remembering that MQCONN and MQDISC are very expensive).
- You may want to group queue managers, so that FrontEnd1, FrontEnd2 and FrontEnd3 are grouped as FrontEndALL. This provides summary information.
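The file naming and queue manager grouping above can be implemented with two small helpers. These are my own sketches; in particular the grouping rule (replace a trailing number with ALL) is just one possible convention, and will need adjusting to your naming standards:

```python
import re
from datetime import datetime

def accounting_file_name(when: datetime, machine: str, qmgr: str) -> str:
    """Build a file name of the form yymmdd.hh.machine.queueManager.json,
    so data can be selected per day or per hour, and old files archived."""
    return f"{when:%y%m%d.%H}.{machine}.{qmgr}.json"

def qmgr_group(qmgr: str) -> str:
    """Map FrontEnd1, FrontEnd2, ... to FrontEndALL. Names without a
    trailing number are returned unchanged."""
    return re.sub(r"\d+$", "ALL", qmgr)
```

Note that the simple regex also maps a name like QM1 to QMALL, so a lookup table of explicit group names may be safer than a pattern.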
The time interval of an accounting record can vary. For example, suppose you are collecting data at hh:15 and hh:45, and your business hours are 0900 to 1700. If there is no traffic after 1700, and the next MQ request is at 0900 the next day, the accounting record will cover 1645 today to 0901 tomorrow.
- If you are calculating rates, e.g. messages per second, the rates will be wrong, as the time interval is too long.
- The start times of an interval may vary from day to day. Yesterday it may have been 0910 to 0940; today it is 0929 to 0959. This makes it hard to compare like for like.
- You could try to partition the data into hourly buckets (0900 to 1000, 1000 to 1100, etc.), so a record covering 0950 to 1020 has a third of its numbers put in the 0900-1000 bucket and the other two thirds in the 1000-1100 bucket. But a record from 1645 today to 0900 tomorrow will then have its data spread across the night, giving a false picture.
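The proportional split into hourly buckets can be sketched as below (my own helper, not part of MQ). It works reasonably for short records, but demonstrates the problem described above: a long overnight record has its count smeared across hours with no real traffic:

```python
from datetime import datetime, timedelta

def split_into_hours(start: datetime, end: datetime, count: float) -> dict:
    """Apportion a record's count across hourly buckets in proportion
    to how much of the interval falls in each hour. Beware: for a long
    overnight interval this spreads the data across idle hours and
    gives a false picture."""
    total_seconds = (end - start).total_seconds()
    buckets = {}
    cursor = start
    while cursor < end:
        hour = cursor.replace(minute=0, second=0, microsecond=0)
        slice_end = min(end, hour + timedelta(hours=1))
        seconds_in_hour = (slice_end - cursor).total_seconds()
        buckets[hour] = count * seconds_in_hour / total_seconds
        cursor = slice_end
    return buckets
```

For example, a record from 0950 to 1020 with 30 messages is split as 10 messages in the 0900 bucket and 20 in the 1000 bucket.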
You may want to set your accounting interval to 30 minutes and restart your servers at 0830, so they are recording on the hour. At the end of the day, shut down and restart your servers to capture all of the data in the correct time interval bucket.