My mid-range accounting is missing data

Like many of the questions I get asked, it started off as a simple question.

"Over a day, what people and machines are using this queue?"  Simple, I said.  Just use the accounting data.

Problem 1

I had a program which puts a message to a clustered queue on a remote queue manager.  The message flows to the server and a reply is sent back.  I thought I would check that it worked.  It didn't.

The accounting data has my program, but it recorded a message put to the SYSTEM.CLUSTER.TRANSMIT.QUEUE, and a message got from my reply queue.  So I cannot tell which remote queue was used, but I can tell how many bytes were processed.

Problem 2

My application doing just a publish did not show up in the accounting data.

I used the samples amqspub and amqssub to publish and subscribe respectively.

The accounting data had a record for amqssub, getting from a queue, but there was no record for the publishing application.

As Morag kindly pointed out, you can tell how many publishes you did from the MQI accounting data, but not to which topic.

I've started looking at the activity trace, but this looks like a heavyweight solution to the simple question.

Problem 3

Someone else pointed out that if you want to know the channel name and IP address of the application using a queue, you have to do some extra work:

  • Collect the MQI accounting
  • Process queue accounting and MQI accounting, and use the common field connectionId to merge the data

Using amqsevt to display the accounting data in JSON format, you get:

"eventData" : {
  "queueMgrName" : "QMC",
  "startDate" : "2020-04-17",
  "startTime" : "09.43.23",
  "endDate" : "2020-04-17",
  "endTime" : "10.18.58",
  "commandLevel" : 913,
  "connectionId" : "414D5143514D432020202020202020201361995E0A24A923",
  "sequenceNumber" : 1,
  "applName" : "server",
  "processId" : 29103,
  "threadId" : 11,
  "userIdentifier" : "colinpaice",
  "connDate" : "2020-04-17",
  "connTime" : "09.12.16",
  "connectionName" : "127.0.0.1",
  "channelName" : "COLIN",

The last four fields (connDate, connTime, connectionName and channelName) are only in the MQI accounting information.

It looks like the MQI accounting record is produced before the Queue accounting record.

You will need to save the connectionId key with the connection specific information.  When the queue accounting record is processed, use the connectionId to look up the information and append it to the queue record.  If many queues are being used, you may get multiple queue accounting records – so you cannot simply delete the connection key and connection specific information.
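As a rough sketch in Python (assuming the messages have already been parsed into dictionaries, for example from the amqsevt JSON output, that the field names match the example above, and that the MQI accounting record for a connection arrives before its queue accounting records), the merge could look like this:

    def merge_accounting(records):
        # Append the connection details from each MQI accounting record to the
        # queue accounting records that share the same connectionId.
        connection_info = {}
        for record in records:
            conn_id = record.get("connectionId")
            if "channelName" in record:
                # An MQI accounting record: save the connection-specific fields.
                connection_info[conn_id] = {
                    "channelName": record["channelName"],
                    "connectionName": record["connectionName"],
                    "connDate": record["connDate"],
                    "connTime": record["connTime"],
                }
            elif conn_id in connection_info:
                # A queue accounting record: append the saved information.
                # Do not delete the saved entry - there may be more queue
                # accounting records for the same connection.
                record.update(connection_info[conn_id])
        return records

Using the presence of channelName to spot the MQI accounting record is just a convenience for the sketch; you could equally test the record type.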

 

Some gotchas when using stats and accounting with MQ midrange

I’ve been writing some Python code to process the messages on SYSTEM.ADMIN.STATISTICS.QUEUE and SYSTEM.ADMIN.ACCOUNTING.QUEUE queues and to create tabular data in a .csv file. I fell over some problems, and I thought I would pass on my experiences.

I used /opt/mqm/samp/bin/amqsevt -m QMA -b -q SYSTEM.ADMIN.STATISTICS.QUEUE -o json -w 1 | jq . > statfile.txt

to read the messages as JSON, format them more readably with jq, and write the output to the file statfile.txt.
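The file then contains a stream of JSON documents, one per message, rather than a single JSON array, so a plain json.load will not read it. One way round this (a sketch, assuming the statfile.txt produced above) is to decode the documents one at a time:

    import json

    def read_json_stream(filename):
        # Read a file containing concatenated JSON documents and return them as a list.
        decoder = json.JSONDecoder()
        documents = []
        with open(filename) as f:
            text = f.read()
        pos = 0
        while pos < len(text):
            while pos < len(text) and text[pos].isspace():
                pos += 1                  # skip whitespace between documents
            if pos >= len(text):
                break
            document, pos = decoder.raw_decode(text, pos)
            documents.append(document)
        return documents

    messages = read_json_stream("statfile.txt")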

I used the MQSA Python scripts, which will soon be going up on the MQTools GitHub repository, to actually produce the .csv files.

From here you can see the content of the messages.

Gotchas..

Not all fields are reported.

In the documentation, some fields are marked as "Always" present; the others are marked "When available". So fields like QName are always present, but PutCount is "When available". This means you cannot just go along the fields in the records and put the first field in column 1, the second field in column 2, and so on. You have to put the "qname" data in the "qname" column, and you may have to skip columns if you do not have the fields. You also need to set up the column headers with all the possible fields. I used Python's csv.DictWriter to write the data.
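For example, a minimal sketch using csv.DictWriter (the column names and records here are just illustrative, not the full set of fields):

    import csv

    # List every column that could appear, not just the fields in the first record.
    fieldnames = ["queueMgrName", "startDate", "startTime", "qname",
                  "putCount", "getCount"]                     # ...and the rest

    records = [
        {"queueMgrName": "QMA", "qname": "Q1", "putCount": 14},
        {"queueMgrName": "QMA", "qname": "Q2", "getCount": 3},   # no putCount
    ]

    with open("queue_accounting.csv", "w", newline="") as f:
        # restval fills in the columns for any "When available" fields that are
        # missing from a particular record.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)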

Many “fields” have multiple values.

For example, "puts" has the number of non-persistent puts and the number of persistent puts, for example puts[10,4]. I found it easier to add these together and provide one field, puts[14]. Other fields, such as "set", have eight values, for queues, namelists and so on. Again, I added these up to provide one value.
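A short way of collapsing such a field (record here is a hypothetical parsed message):

    # Collapse list-valued fields such as puts[10,4] into a single number.
    record = {"puts": [10, 4], "set": [1, 0, 0, 0, 0, 0, 0, 2]}

    for name in ("puts", "set"):
        if name in record and isinstance(record[name], list):
            record[name] = sum(record[name])     # puts[10,4] becomes puts 14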

Be careful when combining fields.

Finding the maximum message size put, PutMaxBytes, is easy – it is the maximum of the non-persistent and persistent values. PutMinBytes is harder. If there were no persistent messages put, then the persistent value in PutMinBytes will be zero! You need to factor this into your calculations; you need something like:

    # Assuming PutCount and PutMinBytes are the two-element lists from the record,
    # or None if the fields were not present.
    if PutCount is not None and PutMinBytes is not None:
        if PutCount[0] == 0:                 # we did no non-persistent puts
            PutMinBytes = PutMinBytes[1]     # use the persistent value
        elif PutCount[1] == 0:               # we did no persistent puts
            PutMinBytes = PutMinBytes[0]
        else:                                # puts of both P and NP messages, look at both
            PutMinBytes = min(PutMinBytes[0], PutMinBytes[1])
    else:
        PutMinBytes = 0                      # the fields were missing, or we did no puts

Similarly, merging TimeOnQAvg values needs care. If the fields are present, and the total number of gets > 0:

    TimeOnQAvg = (GetCount[0] * TimeOnQAvg[0]
                  + GetCount[1] * TimeOnQAvg[1]) / (GetCount[0] + GetCount[1])

Date and time format.

The default format is "startDate": "2019-02-27" and "startTime": "17.03.41". I changed these to "startDate": "2019/02/27" and "startTime": "17:03:41".
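A trivial sketch of that substitution on the raw strings:

    start_date = "2019-02-27".replace("-", "/")     # gives "2019/02/27"
    start_time = "17.03.41".replace(".", ":")       # gives "17:03:41"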

Add up the put information

At a high level, you just want to know the total number of messages put, and the total number of bytes put – and not whether it came from a put, put1, or a topic put. To do this you need to do calculations like:

    # data is the parsed accounting record, with the list-valued fields already
    # collapsed to single numbers (see above).
    puts_total = 0
    for name in ("puts", "put1s", "topicPuts", "topicPut1s"):
        if name in data:                    # the field may not be present
            puts_total += data[name]
    data["putsTotal"] = puts_total

    # Do the same for the bytes fields.
    put_bytes = 0
    for name in ("putBytes", "topicPutBytes"):
        if name in data:
            put_bytes += data[name]
    data["putsBytesTotal"] = put_bytes

Some tools like a date-time field.

I combined the above fields into one field. I effectively used startDate||'/'||startTime and passed this into the date-time code with the format string "%Y-%m-%d/%H.%M.%S" to parse it and produce a date-time object.
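In Python this might look like the following, using the values from the examples above:

    from datetime import datetime

    record = {"startDate": "2019-02-27", "startTime": "17.03.41"}

    # Combine the date and time fields and parse them into one datetime object.
    start_datetime = datetime.strptime(record["startDate"] + "/" + record["startTime"],
                                       "%Y-%m-%d/%H.%M.%S")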

Statistics messages are not produced at well-defined times.

They are produced at an interval specified by the qmgr attribute STATINT. Mine was set to 600 (seconds). This produced records at 17:04:06, 17:14:06, 17:24:06. I stopped the queue manager and restarted it, and the next records came out at 17:32:37, 17:42:37, 17:52:37, and the times do not line up.

You need to think about how to use this data. You may want to produce a chart showing usage over a day. The vertical axis is the number of puts, the horizontal axis is hours. What do you do with a record which was from 19:45 to 20:05, produced at 20:05?

You could just record it in the 20:00 to 21:00 bucket. This makes the peak look as if it was at 20:00 to 21:00 rather than 19:00 to 20:00, so your graphs give the wrong message.

You could distribute the messages according to the time spent in each hour, so from 19:45 to 20:00 is 15 minutes, and from 20:00 to 20:05 is 5 minutes. You split the data in a ratio of 15:5 between the 19:00-20:00 bucket and the 20:00-21:00 bucket. This is more accurate, but still misleading. If you stop trading at 19:59:59, splitting the data across the buckets will show usage past 20:00, which may not be true.
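A rough sketch of the proportional split (assuming the record start and end times are already datetime objects, and the value being spread is a simple count):

    from datetime import datetime, timedelta

    def split_by_hour(start, end, count):
        # Spread count across hourly buckets in proportion to the time the
        # interval [start, end) spends in each hour.  Returns {bucket_start: share}.
        total_seconds = (end - start).total_seconds()
        buckets = {}
        bucket = start.replace(minute=0, second=0, microsecond=0)
        while bucket < end:
            next_bucket = bucket + timedelta(hours=1)
            overlap = (min(end, next_bucket) - max(start, bucket)).total_seconds()
            if overlap > 0:
                buckets[bucket] = count * overlap / total_seconds
            bucket = next_bucket
        return buckets

    # A record from 19:45 to 20:05 splits its count in the ratio 15:5.
    shares = split_by_hour(datetime(2020, 4, 17, 19, 45),
                           datetime(2020, 4, 17, 20, 5), 100)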

You have to calculate the duration of the record.

Each record has a section called eventData with Start Date, Start Time, and End Date and Time, so you can calculate the duration yourself. As I needed date-time objects for another reason, I found it easiest to create StartDateTime and EndDateTime objects and then say duration = EndDateTime - StartDateTime, rather than calculating EndTime - StartTime and adding 24 hours if the result is < 0.
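Carrying on the earlier sketch, the duration is then just a subtraction of two datetime objects (the values are from the JSON example above):

    from datetime import datetime

    fmt = "%Y-%m-%d/%H.%M.%S"
    start = datetime.strptime("2020-04-17/09.43.23", fmt)
    end = datetime.strptime("2020-04-17/10.18.58", fmt)

    # datetime subtraction copes with records that span midnight.
    duration = end - start          # a timedelta of 0:35:35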

Epoch needs to be changed.

You get data like

"eventCreation": {
  "timeStamp": "2019-02-26T12:07:44Z",
  "epoch": 1551182864
},

Epoch is the number of seconds since 1/1/1970. You can format it, for example on Ubuntu date --date @1551182864 gave Tue 26 Feb 12:07:44 GMT 2019. Unfortunately you cannot use this directly as a time stamp in some spreadsheets, because they use a different day 0, so using "format as date" on the epoch integer gives you the wrong answer. In LibreOffice the calculation is epoch/86400 + 25569, then format the cell as date time.
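A one-line sketch of that conversion (25569 is the serial number of 1970-01-01 when day 0 is 1899-12-30, the LibreOffice default):

    epoch = 1551182864

    # Convert Unix epoch seconds into a LibreOffice/Excel style serial date number.
    serial_date = epoch / 86400 + 25569      # about 43522.5, i.e. 2019-02-26 12:07:44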

Formatting the data in a spreadsheet.

When you import the data into a spreadsheet, you have to specify the formatting for some columns. To avoid doing this by hand every time, you may want to write a macro to do it for you.

Using tools like Elasticsearch and Kibana.

Documents in Elasticsearch require a document type and a unique id.

The document type could be MQ.Queue_Accounting, MQ.MQI_Accounting, MQ.Queue_Statistics, or MQ.MQI_Statistics.

The unique id could be

  • qmgr,eventCreation_timeStamp for qmgr MQI statistics
  • qmgr,eventCreation_timeStamp.queueName for queue statistics
  • qmgr,eventCreation_timeStamp.processId.threadID for qmgr MQI accounting
  • qmgr,eventCreation_timeStamp.queueName.processId.threadID for queue accounting.
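A small sketch of building one of these ids for a queue accounting record (the field names and values are just illustrative, assume the nested eventCreation timestamp has been flattened to eventCreation_timeStamp, and I used "." as the separator throughout):

    def queue_accounting_id(record):
        # Join the identifying fields into a single unique document id.
        return ".".join(str(record[k]) for k in
                        ("queueMgrName", "eventCreation_timeStamp",
                         "queueName", "processId", "threadId"))

    record = {"queueMgrName": "QMC",
              "eventCreation_timeStamp": "2019-02-26T12:07:44Z",
              "queueName": "MYQUEUE",
              "processId": 29103,
              "threadId": 11}

    doc_id = queue_accounting_id(record)   # "QMC.2019-02-26T12:07:44Z.MYQUEUE.29103.11"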