Understanding queue high events and time since reset

I was testing out my Java program for processing PCF messages and hit a “problem” with queue high events.

I had set max depth of the queue to 1.

I put two messages out of syncpoint, and got the following events on the SYSTEM.ADMIN.PERFM.EVENT queue.

"PCFHeader":{
  "Type" :MQCFT_EVENT
  ,"ParameterCount" :6
  ,"Command" :MQCMD_PERFM_EVENT
  ,"Control" :MQCFC_LAST
  ,"Reason" :Queue_Depth_High
  ,"MsgSeq" :1
  ,"Version" :1
  }
  ,"QMGR_Name" :QMA 
  ,"Base_Object_Name/mqca_Base_Q_Name" :CP0000 
  ,"Time_Since_Reset" :53
  ,"High_Q_Depth" :1
  ,"Msg_Enq_Count" :1
  ,"Msg_Deq_Count" :0

and

"PCFHeader":{
  "Type" :MQCFT_EVENT
  ,"ParameterCount" :6
  ,"Command" :MQCMD_PERFM_EVENT
  ,"Control" :MQCFC_LAST
  ,"Reason" :Queue_Full
  ,"MsgSeq" :1
  ,"Version" :1
 }
 ,"QMGR_Name" :QMA 
 ,"Base_Object_Name/mqca_Base_Q_Name" :CP0000   
 ,"Time_Since_Reset" :0
 ,"High_Q_Depth" :1
 ,"Msg_Enq_Count" :0
 ,"Msg_Deq_Count" :0

 

I think this is telling me

  1. The Queue high event occurred 53 seconds after some other event
  2. One message was put, causing the queue high event
  3. The queue full event occurred 0 seconds after some event.  The documentation says “Time, in seconds, since the statistics were last reset”
  4. A message was put to the queue, unsuccessfully, as Enq_Count is zero.

I think the documentation is not clear.  I did not issue a reset statistics command.   The time since reset varies between 0 and over 6000 seconds.   My qmgr Stats interval was 10 seconds.

I think the time_since_reset is the time since the previous event, or something else, so it looks pretty useless.  When I issued RESET QMGR TYPE(STATISTICS) it made no difference to the time since reset.

This time value is MQIA_TIME_SINCE_RESET in the PCF.

 

I noticed that the event messages are non persistent, so make sure you process the messages before you shut down your queue manager!

Processing Activity Trace PCF messages with Java

This post covers some of the things I found while extending my Java program for processing PCF to cover activity trace.

Operation date has an inconsistent format

The field with MQCACF_OPERATION_DATE has format SimpleDateFormat(“yyyy-MM-dd”), but is a 12 byte field (instead of the 10 you might expect), and is padded with nulls instead of blanks.  You need to do date = …getStringParameterValue(MQConstants.MQCACF_OPERATION_DATE).substring(0, 10) to capture the key data.
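A minimal sketch of tidying this field up (the class and method names here are my own, for illustration only):

```java
public class OperationDate {
    // Trim the null-padded 12-byte MQCACF_OPERATION_DATE value down to
    // the 10 characters of the yyyy-MM-dd date.
    static String tidyDate(String raw) {
        return raw.substring(0, 10);
    }

    public static void main(String[] args) {
        // a 12 byte field: 10 characters of date plus two null pad bytes
        String raw = "2019-12-10" + "\0\0";
        System.out.println(tidyDate(raw)); // prints 2019-12-10
    }
}
```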

 

MQIACF_APPL_FUNCTION_TYPE.

This maps to fields MQFUN_TYPE_* with values like MQFUN_TYPE_UNKNOWN, MQFUN_TYPE_USERDEF, MQFUN_TYPE_JVM.   I got Unknown, and do not know what this means.  I think they are connected to the iSeries.

Using the high resolution timer, with microsecond precision.

Puts and gets have a high precision time value. The field MQIAMO64_HIGHRES_TIME gives the time of day with microsecond precision.

To process these values I used

long value = (Long) p.getValue();     // microseconds since epoch
long milliseconds = value / 1000;
long microseconds = value % 1000000;  // microseconds within the second
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(milliseconds);
Date dateTimeVal = cal.getTime();
SimpleDateFormat sdfFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ENGLISH);
System.out.println("High res timer:" + sdfFormat.format(dateTimeVal)
                                     + "."
                                     + String.format("%06d", microseconds)  // add microseconds
                                     + "Z");

You can use the microseconds value when doing calculations such as the duration between operations.

You get the high resolution timer when doing MQGET, MQPUT, and MQPUT1; for the other calls, such as MQOPEN, MQCLOSE, and MQCMIT, you only get the time from the “Operation_Time” field, for example 18:09:34.
It is not easy to find where your application is spending its time.

For example

MQ Verb  OperationTime  OperationDuration (microseconds)  Hi res timer
MQOPEN   18:09:34       56
MQPUT    18:09:34       66                                18:09:34.747837
MQCMIT   18:09:34       12
MQGET    18:09:34       60                                18:09:34.754299
MQCLOSE  18:09:34       54

This application was doing just MQ verbs. The time between the start of the put and the start of the get was 6462 microseconds.  The put took 66 microseconds, and the commit took 12 microseconds, so there were 6384 microseconds unaccounted for.  This reported duration is at the queue manager level.  There will be elapsed time from issuing the request to getting into the queue manager, and back out again.

I put a microsecond timer in my client program to time the MQ requests.   After the MQOPEN, the first MQGET took 2800 microseconds, the second MQGET took 500 microseconds.

There is a big difference between the time as reported by the queue manager and the time as seen by the application:

  • Work has to be done from the API request before the network request can be issued.
  • The request has to flow over the network.
  • The queue manager has code to catch the network request and issue it to the queue manager.
  • There is the duration of the request, as seen by the accounting trace.
  • The reply has to be transformed into a network reply.
  • The reply has to flow over the network.
  • The reply has to be caught and returned to the application.

If the application was using local bindings it would be quicker, as there would be no network component.

The timer code was

long start = System.nanoTime();
queue.get(msg, gmo);
long duration_in_microseconds = (System.nanoTime() - start) / 1000;

Note: the machine may not be able to provide a timer accurate at the nanosecond level.

Embedded structures

Requests for get have embedded structures for the MQMD (type MQBACF_MQMD_STRUCT – 7028) and the GMO (MQBACF_MQGMO_STRUCT – 7027).
Requests for put have an embedded structure for the PMO (MQBACF_MQPMO_STRUCT – 7029).

The parameter types are the values for MQ CallBack, and have been reused.

The data is a byte string (MQCFBS).  I could not find how to easily extract the fields from these structures and display them without having to construct each field myself.  The integer fields can be in big endian or little endian.  The version field for me was “01000000” or “02000000”, which is little endian.  If it was big endian the value would have been “00000001”.

I had to write some methods to chomp my way along the string to produce the fields.  I could not find a way of passing the string to a  method and get the fields back.

I created a method for longs, strings, bytes, and byte.  I found it easiest to use getStringValue for the structure.  This gives you the data as a hexadecimal string.  You need to do additional processing, for example convert from hex to a String, or convert the string to an integer using base 16.  Byte arrays could be used unchanged.
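As an illustration of the kind of helper I mean, this sketch reads a 4-byte integer out of a hexadecimal string, honouring either endianness.  The names are my own, not part of the PCF classes:

```java
public class HexChomp {
    // Read a 4-byte integer from a hex string (2 hex characters per byte),
    // starting at the given byte offset, in little or big endian.
    static int getInt(String hex, int byteOffset, boolean littleEndian) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int b = Integer.parseInt(
                hex.substring((byteOffset + i) * 2, (byteOffset + i) * 2 + 2), 16);
            if (littleEndian) value |= b << (8 * i);    // low byte first
            else              value = (value << 8) | b; // high byte first
        }
        return value;
    }

    public static void main(String[] args) {
        // The version field "02000000" is 2 in little endian
        System.out.println(getInt("02000000", 0, true));  // 2
        System.out.println(getInt("00000002", 0, false)); // 2
    }
}
```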

More moans about the management of Monitoring in MQ

I was writing some Java code to process the various PCF messages produced by MQ.  I got the code working and handling most of the different sorts of message types, statistics, accounting, change events etc in a couple of days.  I then thought I’d spend the hour before breakfast handling the monitoring records on midrange MQ.  A couple of days (!) later, I got it to process the monitoring messages into useful data.

I had already written about the challenges of this data in a blog post, Using the monitoring data provided via publish in MQ midrange, but I had not actually tried to write a program to process the data.  In the post below I have included some of the lessons I learned.

Summary of the Monitoring available

  1. Most records are published every 10 seconds, but I have had examples where records were produced with an interval ranging from 0.008 seconds to 1700 seconds.
  2. You can subscribe to different topics, for example statistics on the MQ APIs used, statistics on logging, and statistics on queue usage.
  3. The information is not like other MQ PCF data, where each field has a named identifier such as MQConstants.MQCAMO_START_DATE.
  4. There is meta data which describes the data.  You get a field with unit information, and a description.  Good theory; it does not work well in practice.
  5. If you subscribe to the meta information you get information about the data fields.  This is not very helpful if you are running MQ in an enterprise where there is centralized monitoring and reporting, for example into Splunk or ElasticSearch, and where data is sent to a remote site.
  6. Data fields are identified by three fields: class, type and record.  Class could be DISK; type could be Log; and record is a number, for example 12, meaning “Log write, data size”.

Subscribing to information.

You can write code to create a subscription when your program is running and so get notified of the data and meta data only while your program is running; or you can manually create a subscription, so you always have the data (until your queue fills up).

DEFINE SUB(COLINDISKLOG)
TOPICSTR('$SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/Log')
DEST(COLINMON) USERDATA(COLINDISK)

This will get the data for queue manager QMA, DISK, Log section.  Note the case of the topic string.

You cannot use a generic for the queue manager name, so you need a unique subscription for each queue manager.  (I cannot see why the queue manager name is needed on midrange, perhaps someone can tell me.)  The queue manager name is available in the MQMD.ReplyToQMgr field.

You can ask that the meta information is sent to a queue (whenever the subscription is made) for example

DEF SUB(COLINDISKLOGMETA)
TOPICSTR('$SYS/MQ/INFO/QMGR/QMA/Monitor/METADATA/DISK/Log')
DEST(COLINMON) USERDATA(COLINDISK)

The queue can be a remote queue with the target queue on your monitoring environment.

MQRFH2 and message properties

With the data, there is message property data about the message, for example the topic, and any user data.  I could not get the supplied Java methods to skip the RFH2 data.  This was using the MQHeader class.

If I used gmo.options = …  MQConstants.MQGMO_PROPERTIES_IN_HANDLE, the data was returned as properties, and not as an RFH2 header, so the PCFMessage class worked with the message.

For one of my monitoring records, the properties were

,"Properties":{
"mqps.Sud" :COLINDISK
,"mqps.Top" :$SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/Log
}

Where mqps.Sud is MQSubUserData (from my subscription) and mqps.Top is MQTopicString.

Identifying the records and data

You may know from the queue name, that you are getting monitoring data.

You can also tell from the PCF header values, Monitoring data has Type :MQCFT_STATISTICS and Command :MQCMD_NONE.  Normal statistics data has  Type :MQCFT_STATISTICS and Command :MQCMD_STATISTICS_Q, so you can tell them apart.

The data and the meta data both come in as monitoring records.  You can tell them apart from the topic in the message property, or from the content of the messages.
If the message has PCF groups then it contains meta data records – and should be processed to extract the identifiers, or skipped.

I found that you have to process the message twice to extract different fields.
Pass 1 – extract

  1. Class
  2. Type
  3. Interval
  4. Count number of groups
  5. Display queue name if present

If the number of groups = 0 then

Pass 2 extract

  1. field type
  2. value

You need the class, type, and field type to identify what the value represents.

The comments in the sample C program which formats the records imply the records may not always be in the same sequence, so in theory (but unlikely) the class and type records may be at the end of the message instead of at the front.  Doing two passes means you can extract the identifiers before you need to use them.

Identifying what the data represents.

The meta data gives the interpretation of the value.  You can process the messages containing meta data, and dynamically build a map of value -> description.  In my “enterprise” this did not work for me, so I created a static map of values.  I created a key: 256 * 256 * Class + 256 * Type + field type, and created a hashmap of (key, description).
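A sketch of that static map, using the same key calculation (the class names are mine, and the units parameter from my add method is omitted for brevity):

```java
import java.util.HashMap;
import java.util.Map;

public class MonitorMap {
    static Map<Integer, String> descriptions = new HashMap<>();

    // pack class, type and field type into a single key
    static int key(int monClass, int monType, int fieldType) {
        return 256 * 256 * monClass + 256 * monType + fieldType;
    }

    static void add(int monClass, int monType, int fieldType, String description) {
        descriptions.put(key(monClass, monType, fieldType), description);
    }

    public static void main(String[] args) {
        add(1, 0, 4, "TraceBytesInUseMiB");   // DISK/SystemSummary, record 4
        System.out.println(descriptions.get(key(1, 0, 4)));
    }
}
```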

Part of the meta data is a “units” field

Units are

  1. “Unit”, for example the maximum number of connections in the interval
  2.  “Delta”, for example the number of puts in the interval (total number of puts at the end of the interval – number of puts at the start of the interval).  These “delta”s are often interesting as a rate, for example the number of puts/interval – display the rate of puts  as float with 2 decimal places.
  3.  “Hundredths”, for example CPU load over 1 minute.  To use this I converted it to float, divided it by 100, and printed it with two decimal places.
  4. “KB”, I don’t think this is used
  5.  “Percent”, for example file system full percentage.  It reported 4016 for one value – you have to divide this by 100 to get the value (40.16).  I converted it to float, divided it by 100, and printed it with two decimal places.
  6. “Microseconds”, for example the time to write a log record
  7. “MB”, for example RAM size of the machine (in MB).   This matched the value from the linux free -lm command
  8. “GB”, I do not think this is used.
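The “Hundredths” and “Percent” cases above end up doing the same divide-by-100 and two-decimal-place formatting; a sketch (names mine):

```java
public class Units {
    // "Hundredths" and "Percent" values both need dividing by 100
    static String scaleBy100(long raw) {
        return String.format(java.util.Locale.ENGLISH, "%.2f", raw / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(scaleBy100(4016)); // file system 40.16 percent full
    }
}
```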

I do not expect the units will change for a record, because it would make the centralized processing of the records very difficult.

For my processing I changed the descriptions. For example I changed “Log – bytes occupied by extents waiting to be archived” to “LogWaitingArchiveMB“.   This is a better description for displaying in charts and reports, and includes the units.

My static definitions were like

//$SYS/MQ/INFO/QMGR/…Monitor/CPU/SystemSummary

// add(Class, Type, Field Type, Units, Description)

add(0,0,0,"Percent","UserCPU%");
add(0,0,1,"Percent","SystemCPU%");
add(0,0,2,"Hundredths","CPULoad1Min");
add(0,0,3,"Hundredths","CPULoad5Min");
add(0,0,4,"Hundredths","CPULoad15Min");
add(0,0,5,"Percent","RAMFree%");
add(0,0,6,"MB","RAMTotalMiB");

// $SYS/MQ/INFO/QMGR/…/Monitor/CPU/QMgrSummary

add(0,1,0,"Percent","QMUserCPU%");
add(0,1,1,"Percent","QMSystemCPU%");
add(0,1,2,"MB","QMRAMMiB");

// $SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/SystemSummary

//add(1,0,0,…); missing
//add(1,0,1,…); missing
//add(1,0,2,…); missing
//add(1,0,3,…); missing
add(1,0,4,"MB","TraceBytesInUseMiB");
….

// Below are the different topics you can subscribe on

// $SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/QMgrSummary
// $SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/Log
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/CONNDISC
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/OPENCLOSE
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/INQSET
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/PUT
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/GET
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/SYNCPOINT
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/SUBSCRIBE
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/PUBLISH
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/OPENCLOSE
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/INQSET
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/PUT
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/GET

Monitoring interval

The monitoring interval field (MQIAMO64_MONITOR_INTERVAL) is in microseconds.  I converted this to float, divided the value by 1000000 to convert it to seconds, and printed it with 6 figures after the decimal point ( String.format("%.6f", f); ).

If this value was under 1 second (the smallest I saw was 0.008 seconds) I rounded it to 1.0 seconds.
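A sketch of that conversion (using double rather than float, and method names of my own):

```java
public class MonitorInterval {
    // MQIAMO64_MONITOR_INTERVAL is in microseconds; convert to seconds,
    // rounding anything under one second up to 1.0
    static double intervalSeconds(long microseconds) {
        double f = microseconds / 1000000.0;
        return (f < 1.0) ? 1.0 : f;
    }

    public static void main(String[] args) {
        System.out.println(String.format(java.util.Locale.ENGLISH, "%.6f",
                intervalSeconds(10000123L)));       // a 10 second interval
        System.out.println(intervalSeconds(8000L)); // 0.008s rounds up to 1.0
    }
}
```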

Making the data more understandable.

To be strictly accurate the units should be MiB not MB, as 1 MB is 1,000,000 bytes and 1 MiB (mebibyte) is 1,048,576 bytes.  (And if you see 1 mb… there are 8000 mb (millibits) to the byte – so if you get 50 mb/second broadband, I would complain.)

You may want to convert “Log file system – bytes max” = 19549782016 bytes to LogFSBytesMiBMax = 18644.125 MiB, or even LogFSBytesGiBMax = 18.21 GiB, though it may be better to keep everything in MiB for consistency.
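A sketch of the conversion, using the “Log file system – bytes max” value above (class and method names are mine):

```java
public class ByteUnits {
    static final double MIB = 1024.0 * 1024.0;   // 1,048,576 bytes

    static double toMiB(long bytes) { return bytes / MIB; }
    static double toGiB(long bytes) { return bytes / (MIB * 1024.0); }

    public static void main(String[] args) {
        long logFSBytesMax = 19549782016L;
        System.out.println(toMiB(logFSBytesMax) + " MiB");  // 18644.125 MiB
        System.out.println(String.format(java.util.Locale.ENGLISH,
                "%.2f GiB", toGiB(logFSBytesMax)));
    }
}
```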

Processing PCF with Java

I started writing a C program to process PCF data, to do a better job than the provided samples.  For example create a date-time object for tools like ElasticSearch and Splunk, and interpret numeric values as strings.   I found I was writing more and more code to fill in some of the holes in the samples.    I decided to use Java, as there are classes called PCFMessage, and code to allow me to iterate over the PCF elements.
I struggled with processing Monitoring messages, but I’ll create a separate post for the misery of this.

I also struggled with the accounting trace, and have blogged the specific information here.

I’ll give the key bits of knowledge, and let any interested reader fill in the blanks – for example adding try…catch statements around the code.

MQ provides a PCF Agent…

But it does not help for getting PCF messages.  It allows you to send a PCF request, and it returns the result; it does not help when you just want to get PCF messages from a queue.

Connect to MQ

mq = new MQQueueManager(data.qm, parms);

Access the queue

queue = mq.accessQueue(data.queue,MQConstants.MQOO_BROWSE  | MQConstants.MQOO_FAIL_IF_QUIESCING);

Get the message from MQ

MQGetMessageOptions gmo = new MQGetMessageOptions();

gmo.waitInterval = data.waitTime;
gmo.options = MQConstants.MQGMO_WAIT
+ MQConstants.MQGMO_PROPERTIES_IN_HANDLE
// + MQConstants.MQGMO_BROWSE_NEXT
;
msg = new MQMessage();
queue.get(msg, gmo);

Display any message properties

The MQ PCF code does not handle messages with both an MQRFH2 header and PCF data.

Use gmo.options = … + MQConstants.MQGMO_PROPERTIES_IN_HANDLE to convert the RFH2 to message properties.

Process the properties.

Enumeration<?> props = msg.getPropertyNames("%");
if (props != null) {
  while (props.hasMoreElements()) {
    String propName = (String) props.nextElement();
    Object propObject = msg.getObjectProperty(propName);
    System.out.println(propName + ":" + propObject.toString());
  }
}

There is an MQHeaderList class, and an iterator, but these did not seem very reliable.  Even after I deleted the MQRFH2 header, it still existed, and the PCF classes did not work.

Make it a PCF’able message

PCFMessage message = new PCFMessage(msg);

Display all of the elements in the PCF message

Enumeration<?> e = message.getParameters();
while (e.hasMoreElements()) {
   PCFParameter p = (PCFParameter) e.nextElement();
   if (p.getType() == MQConstants.MQCFT_GROUP) doGroup((MQCFGR) p);
   else {
     int parm = p.getParameter();        // number
     String tp = p.getParameterName();   // string
     String value = p.getStringValue();  // value
     System.out.println(tp + ":" + value);
   }
}

Or you can get the specific fields from the message, for example message.getStringParameterValue(MQConstants.MQCAMO_START_DATE);

Process a group of elements

void doGroup(MQCFGR sgroup)
{
  Enumeration<?> f = sgroup.getParameters();
  while (f.hasMoreElements()) {
     PCFParameter q = (PCFParameter) f.nextElement();
     if (q.getType() == MQConstants.MQCFT_GROUP) doGroup((MQCFGR) q);
     else System.out.println(q.getParameterName() + ":" + q.getStringValue());
  }
}

And use similar logic for processing the fields as before.

Processing lists

Many elements have a list of items; a list of integers, or a list of 64 bit integers (longs).  The array has element 0 for Non Persistent and element 1 for Persistent.

I found it easiest to convert integers to longs, so I had to write fewer methods.

long[] longs = {0,0};

if (q.getType() == MQConstants.MQCFT_INTEGER64_LIST)
{ 
  MQCFIL64 i64l = ( MQCFIL64) q;
  longs = i64l.getValues(); 
}
else if (q.getType() == MQConstants.MQCFT_INTEGER_LIST)
{ 
  MQCFIL il = ( MQCFIL) q;
  int[] ii = il.getValues();
  longs = new long[ii.length];  // should be of size 2
  for (int iil = 0;iil < ii.length;iil++)
    longs[iil]= ii[iil];
}

Providing summary data

With fields like the number of puts, the data is an array of two values, Non Persistent, and Persistent.

It makes later processing much easier if you create

System.out.println("Puts_Total:" + (longs[0] + longs[1]));
System.out.println("Puts_NP:" + longs[0]);
System.out.println("Puts_P:"  + longs[1]);

There are verbs that produce data for Non Persistent and Persistent messages (BROWSE, GET, PUT, PUT1) and different categories: number of requests, the maximum message size, the minimum message size, and the total bytes put or got.

MQIAMO_xxxs, MQIAMO_xxx_MAX_BYTES , MQIAMO_xxx_MIN_BYTES and MQIAMO64_xxx_BYTES

For MAX values the calculation is easy

REAL_GET_MAX_BYTES = Math.max(longs[0], longs[1]);

For the minimum, the calculation is more tricky: you need to consider whether any messages were put with that persistence.  For example

If you have PUT_MIN_BYTES = [50,0] and PUTS[1,0] then the minimum size is 50.
If you have PUT_MIN_BYTES = [50,0] and PUTS[1,1] then the minimum size is 0.

public long P_NP_MIN(int[] values, int[] count) {
  long min;
  if (count[0] > 0 && count[1] > 0)
  {
    if (values[1] < values[0]) min = values[1];
    else min = values[0];
  }
  else if (count[0] == 0)
     min = values[1];
  else // count[0] > 0 and count[1] == 0
     min = values[0];
  return min;
}

then use it

long REAL_GET_MIN = P_NP_MIN(message.getIntListParameterValue(MQConstants.MQIAMO_GET_MIN_BYTES),
                             message.getIntListParameterValue(MQConstants.MQIAMO_GETS)
                            );

Convert date and time into DateTime;

In some records, such as statistics and accounting, you have a start date and time, and an end date and time.
It is useful to calculate the duration, and provide a dateTime value for tools like Splunk and ElasticSearch.
For example, fields MQConstants.MQCAMO_START_DATE and MQConstants.MQCAMO_START_TIME.

Some times have “.” as the separator, and sometimes it is “:”.

long doDateTime(PCFMessage m,
               String label,
               int date,  // eg MQConstants.MQCAMO_START_DATE
               int time)  // eg MQConstants.MQCAMO_START_TIME
  {
  Calendar cal = Calendar.getInstance();
  SimpleDateFormat sdfParse = new SimpleDateFormat("yyyy-MM-ddHH.mm.ss", Locale.ENGLISH);
  SimpleDateFormat sdfFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
  long timeMS = 0;
  try {
     // get the values from the message
     String startDate = m.getStringParameterValue(date);
     String startTime = m.getStringParameterValue(time);
     startTime = startTime.replace(":", ".");  // in case some times have ":" in the value
     String dateTime = startDate + startTime;  // make the dateTime value
     cal.setTime(sdfParse.parse(dateTime));    // parse the combined value
     Date dateTimeVal = cal.getTime();
     System.out.println(label + ":" + sdfFormat.format(dateTimeVal)); // print formatted time
     timeMS = dateTimeVal.getTime(); // number of milliseconds
   } catch (PCFException e) {
     e.printStackTrace();
   } catch (ParseException e) {
     e.printStackTrace();
   }
   return timeMS;
}

You need to substring the first 10 characters of the ActivityTrace OperationDate, as it is padded with nulls.

Some times are in format HH.mm.ss, and so the SimpleDateFormat has “.” in the time parsing statement.  The ActivityTrace OperationTime has HH:mm:ss, so the doDateTime routine uses .replace(“:”, “.”).

Calculating the duration of statistics and accounting records.

You can then do

long startTime =
doDateTime(m,"StartDateTime",MQConstants.MQCAMO_START_DATE,
                             MQConstants.MQCAMO_START_TIME);
long endTime =
doDateTime(m,"EndDateTime",MQConstants.MQCAMO_END_DATE, 
                           MQConstants.MQCAMO_END_TIME);
long durationInMilliseconds = endTime-startTime;
if (durationInMilliseconds == 0) durationInMilliseconds = 1;

And use this to calculate rates, such as messages put per second, as part of the output of the program.
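For example, a rate calculation from the duration (a sketch, with names of my own):

```java
public class Rates {
    // messages per second from a count and a duration in milliseconds
    static double perSecond(long count, long durationInMilliseconds) {
        if (durationInMilliseconds == 0) durationInMilliseconds = 1;
        return count * 1000.0 / durationInMilliseconds;
    }

    public static void main(String[] args) {
        // 1200 puts in a 60 second (60000 ms) accounting interval
        System.out.println(perSecond(1200, 60000) + " puts/second"); // 20.0
    }
}
```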

Doing calculations on the data

If you are iterating over the parameters, you need to be careful about data from multiple parameters.

For example within a queue accounting record you get fields

MQIAMO_OPENS
MQCAMO_OPEN_DATE
MQCAMO_OPEN_TIME
MQIAMO_CLOSES
MQCAMO_CLOSE_DATE
MQCAMO_CLOSE_TIME
MQIAMO_PUTS
MQIAMO_PUTS_FAILED
MQIAMO_PUT1S
MQIAMO_PUT1S_FAILED
MQIAMO64_PUT_BYTES
MQIAMO_PUT_MIN_BYTES
MQIAMO_PUT_MAX_BYTES

So if you want to display the number of puts failed as a percentage of all puts you need logic like

switch (p.getParameter()) {
  case MQConstants.MQIAMO_PUTS:
    puts = (int[]) p.getValue();
    break;

  case MQConstants.MQIAMO_PUTS_FAILED:
    int putsFailedPC = 100 * (Integer) p.getValue() / (puts[0] + puts[1]);
    break;
}

The values in a PCF message could in theory be in any order, though I expect a defect would be raised if the order changed.
To be bullet proof – you could iterate over the elements, and store them in a hashmap, then check the fields you want to use exist, then extract the values and use them.
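A sketch of that bullet-proof approach with a plain HashMap (the parameter numbers here are placeholders for illustration, not the real MQIAMO_* values):

```java
import java.util.HashMap;
import java.util.Map;

public class TwoPass {
    // placeholder parameter numbers, for illustration only
    static final int MQIAMO_PUTS = 1;
    static final int MQIAMO_PUTS_FAILED = 2;

    // only do the calculation once both fields are known to be present
    static int failedPercent(Map<Integer, Object> values) {
        if (!values.containsKey(MQIAMO_PUTS)
                || !values.containsKey(MQIAMO_PUTS_FAILED))
            return -1;  // a field was missing
        int[] puts = (int[]) values.get(MQIAMO_PUTS);
        int failed = (Integer) values.get(MQIAMO_PUTS_FAILED);
        return 100 * failed / (puts[0] + puts[1]);
    }

    public static void main(String[] args) {
        // pass 1: store every parameter by its number
        Map<Integer, Object> values = new HashMap<>();
        values.put(MQIAMO_PUTS, new int[] {3, 1});  // [non persistent, persistent]
        values.put(MQIAMO_PUTS_FAILED, 1);
        // pass 2: use the stored values
        System.out.println(failedPercent(values) + "%"); // 25%
    }
}
```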

Special cases…

I hit some “special cases” – what I would call bugs, in some of the processing.

Many strings have trailing nulls instead of trailing blanks (as documented), so you should trim nulls off every string value.  Here is some code from Gwydion:

String trimTrailingNulls(String s) {
  int stringLength = s.length();
  int i;
  for (i = stringLength; i > 0; i--) {
     if (s.charAt(i - 1) != (char) 0) {
       break;
     }
  }
  if (i != stringLength) {
    return s.substring(0, i);
  }
  return s;
}

Of course in C, printing a string with a trailing null is not a problem.

 

When I’ve got my code tested, and handled all of the strange data I’ll put my program up on github.

Midrange accounting and stats are sometimes pretty useless

I was looking into a problem between two queue managers, and I thought I would use the stats and accounting to see what was going on.   These were pretty well useless.  This post should be subtitled, “why you need connection pooling in your web server,  for all the wrong reasons“.

As I walked along this morning I pictured a man in a big chair stroking a white cat, saying “For you Mr Bond, you will not know which applications used MQ, and you will not know what they did.  For you, your struggle to understand what is going on is over”.

Which queues were used?

The first problem was that the stats and accounting only display the mapped queue name, not the opened name.

I had two cluster queues QC1 and QC2.  I opened them and put a few messages to each queue.  These two queues map down to the SYSTEM.CLUSTER.TRANSMIT.QUEUE – or whatever has been configured for the cluster.

In the accounting data, I had “Queue_Name” :SYSTEM.CLUSTER.TRANSMIT.QUEUE  opened twice.  There was no entry for QC1 nor for QC2.

It works the same way if you have a QALIAS.  If you have queue QA pointing to QL, then when you open QA, the accounting record is for QL.

This applies to the statistics records as well.

Which transaction was it?

I had the transaction running as an MDB in my web server.

For my application logic in the onMessage method,  which processed the passed message and put the reply, I had

,”Appl_Name” :CF3Name  – this is the connection pool name used for putting the message
,”Process_Id” :9987
,”Thread_Id” :42
,”User_Identifier” :colinpaice
“Queue_Name” :SYSTEM.CLUSTER.TRANSMIT.QUEUE

For the listener side of the application,  I had

,”Appl_Name” :weblogic.Server
,”Process_Id” :9987
,”Thread_Id” :43
,”User_Identifier” :colinpaice
“Queue_Name” :JMSQIN

Where JMSQIN is the name of the queue the MDB was listening on.

They are both for the same process id (9987) but they are on different threads.

I could only find out what was running from the name of the input queue.

I could not easily tie the two lots of accounting data together.  Fortunately I had a dedicated connection pool for the MDB, so I could find it that way.  If I had different MDBs using the same connection pool, I would not be able to tell which MDB did which requests.

In some ways this is expected.  If the listener is using a connection pool, there could be different MDBs serially using the same thread in the connection pool, so it cannot give the name of the MDB, as the MDB name will change.

So to help you understand what is happening in your MQ applications in a web server, you should use dedicated connection pools, so you can identify which application issued the requests.  Connection pooling also gives you performance benefits.

 

I have an uncommitted message on the queue – but no one has the queue open. Which application is causing the problem?

The above question was raised by an MQ administrator who is not an MQ programmer; the second question was – is this a bug in MQ?

To application programmers the answers may appear to be obvious, but there are some subtleties.

  1. If you use MQPUT1 to put a message within syncpoint, the queue is open for a very short time.
  2. If you use MQOPEN, MQPUT to put the message within syncpoint, do some work, and then MQCLOSE.

In either case you should then do a commit to make the message visible.

If the application does not do a commit, but returns, and the thread ends, the queue manager will do a commit or backout (depending on the platform) as part of end of task clean up, so you should not get an uncommitted unit of work.

So why is there an uncommitted message after the application “ends”.

The transaction was running on a web server, and the connections were being reused, and so in concept the threads run until the web server is shut down.

If the application is running as a Message Driven Bean, there is logic which does

  1. The message listener waits for a message and gets it.
  2. The message is passed to the onMessage method of the application
  3. The application puts  a message and returns.
  4. Being an MDB there is an MQ Commit which commits both the get and the update.

In this case the message will be committed, and you do not get an uncommitted message.

Hmmm , the plot thickens…

It looks like there was a Java application which was not an MDB.  The queue name had “BACKOUT.QUEUE” as part of the name.

I expect there was an MQGET and there was a problem with the message, or there was a problem doing an MQPUT, and, in either case, the message was written to the backout queue.  I am guessing that this is in error handling code which very rarely gets used, so it was unlikely to have been tested properly.

The application needs to put in an explicit commit statement.

What commands are there to help me?

On my test system I paused my application after the close, and before the commit.

DIS QSTATUS(COLINMON) gave me

dis qstatus(COLINMON) ALL 

AMQ8450I: Display queue status details.
QUEUE(COLINMON) TYPE(QUEUE) CURDEPTH(1048) IPPROCS(0)  OPPROCS(0) UNCOM(1)

 

So no instances with the queue open, but one uncommitted message.

Use the dspmqtrn command

dspmqtrn -i -m QMA
There are no matching prepared or heuristically completed transactions.

dspmqtrn -e -m QMA
There are no matching prepared or heuristically completed transactions.

These commands display the indoubt status.  You only get indoubt transactions when there are two resource managers involved, for example MQ and DB2.

dspmqtrn -q -a -m QMA> aa

-a means display all transactions

-q means display the first 100 objects used by each transaction

In the aa file (I searched for COLINMON) was

TranNum(0,24589)
TRANSTATE(ACTIVE)
UOWLOGDA( ) UOWLOGTI( )
UOWSTDA(2019-12-10) UOWSTTI(08.48.37)
UOWLOG( )
EXTURID(XA_FORMATID[] XA_GTRID[] XA_BQUAL[])
CONN(5F51EF5D045CDE24)
PID(9380) TID(8)
APPLTAG(jpcf/myprog)
APPLDESC(IBM MQ Channel)
CHANNEL(QMACLIENT)
CONNAME(127.0.0.1)
QMURID(0.24589)
USERID(colinpaice)
OBJECT(COLINMON )

 

The QMURID(0.24589) and TranNum(0,24589) refer to the same value, the position in the log.

Using the CONN information

dis CONN(5F51EF5D045CDE24) all 

AMQ8276I: Display Connection details.
CONN(5F51EF5D045CDE24) 
EXTCONN(414D5143514D41202020202020202020)
TYPE(CONN) 
PID(9380) TID(8) 
APPLDESC(IBM MQ Channel) APPLTAG(jpcf/myprog)
APPLTYPE(USER) ASTATE(NONE)
CHANNEL(QMACLIENT) CLIENTID( )
CONNAME(127.0.0.1) 
CONNOPTS(MQCNO_SHARED_BINDING,MQCNO_GENERATE_CONN_TAG)
USERID(colinpaice) UOWLOG( )
UOWSTDA(2019-12-10) UOWSTTI(08.48.37)
UOWLOGDA( ) UOWLOGTI( )
URTYPE(QMGR) 
EXTURID(XA_FORMATID[] XA_GTRID[] XA_BQUAL[])
QMURID(0.24589) UOWSTATE(ACTIVE)
CONNTAG(MQCT5F51EF5D045CDE24QMA_2018-08-16_13.32.14jpcf/myprog)

you could also have issued

DIS CONN(*) all WHERE(QMURID,EQ,0.24589)

 

When the web server was used, the APPLTAG type info was “WebLogic” instead of jpcf/myprog.  All I could tell was which web server had been running the transaction.

UOWSTDA(2019-12-10) UOWSTTI(08.48.37) tells the start date and time of the unit of work.
UOWLOGDA( ) UOWLOGTI( ) tell the start date and time the connection first wrote to the log.  As there is no date and time, there was no write to the log, so in this case the message was non persistent.

 

What a lot of information from a seemingly obvious question!   Thanks to Gwydion for help with some of the trickier bits of this.