More moans about the management of Monitoring in MQ

I was writing some Java code to process the various PCF messages produced by MQ.  I got the code working, handling most of the different message types (statistics, accounting, change events, etc.), in a couple of days.  I then thought I’d spend the hour before breakfast handling the monitoring records on midrange MQ.  A couple of days (!) later, I got it to process the monitoring messages into useful data.

I had already written about the challenges of this data in a blog post, Using the monitoring data provided via publish in MQ midrange, but I had not actually tried to write a program to process the data.  Below I have included some of the lessons I learned.

Summary of the Monitoring available

  1. Most records are published every 10 seconds, but I have had examples where records were produced with an interval ranging from 0.008 seconds to 1700 seconds.
  2. You can subscribe to different topics, for example statistics on the MQ APIs used, statistics on logging, and statistics on queue usage.
  3. The information is not like other MQ PCF data, where each field has a named constant such as MQConstants.MQCAMO_START_DATE.
  4. There is meta data which describes the data.  You get a field with unit information, and a description.  Good in theory, but it does not work well in practice.
  5. If you subscribe to the meta information you get information about the data fields.  This is not very helpful if you are running MQ in an enterprise where there is centralized monitoring and reporting, for example into Splunk, spreadsheets on ElasticSearch, and where data is sent to a remote site.
  6. Data fields are identified by three fields: class, type and record.  Class could be DISK; type could be Log; and record is a number, for example 12, meaning “Log write, data size”.

Subscribing to information.

You can write code to create a subscription when your program is running and so get notified of the data and meta data only while your program is running; or you can manually create a subscription, so you always have the data (until your queue fills up).

DEFINE SUB(COLINDISKLOG) +
TOPICSTR('$SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/Log') +
DEST(COLINMON)  USERDATA(COLINDISK)

This gets the data for queue manager QMA, the DISK class, Log type.  Note the case of the topic string; it has to match exactly.

You cannot use a generic for the queue manager name, so you need a unique subscription for each queue manager.  (I cannot see why the queue manager name is needed on midrange; perhaps someone can tell me.)  The queue manager name is available in the MQMD.ReplyToQMgr field.
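Because each queue manager needs its own subscription, one option is to generate the MQSC rather than type it. This is only a sketch: the class SubscriptionGenerator and its method names are invented for illustration, QMB is a made-up second queue manager, and the subscription name, destination queue and user data are simply the ones from my example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds one DEFINE SUB command per queue manager,
// because the topic string cannot contain a generic queue manager name.
final class SubscriptionGenerator {

    // Builds the monitoring topic string. The case of each part matters.
    static String topic(String qmgr, String monClass, String monType) {
        return "$SYS/MQ/INFO/QMGR/" + qmgr + "/Monitor/" + monClass + "/" + monType;
    }

    // Builds the MQSC text; '+' continues a command onto the next line.
    static String defineSub(String subName, String qmgr, String monClass,
                            String monType, String destQueue, String userData) {
        return "DEFINE SUB(" + subName + ") +\n"
             + "TOPICSTR('" + topic(qmgr, monClass, monType) + "') +\n"
             + "DEST(" + destQueue + ") USERDATA(" + userData + ")";
    }

    // One command per queue manager; each has to be run on its own queue manager.
    static List<String> forQueueManagers(List<String> qmgrs) {
        List<String> commands = new ArrayList<>();
        for (String qmgr : qmgrs) {
            commands.add(defineSub("COLINDISKLOG", qmgr, "DISK", "Log",
                                   "COLINMON", "COLINDISK"));
        }
        return commands;
    }

    public static void main(String[] args) {
        // QMB is an invented second queue manager name.
        for (String cmd : forQueueManagers(Arrays.asList("QMA", "QMB"))) {
            System.out.println(cmd);
            System.out.println();
        }
    }
}
```

The output can be piped into runmqsc on each queue manager in turn.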

You can ask that the meta information is sent to a queue (whenever the subscription is made) for example

DEF SUB(COLINDISKLOGMETA) +
TOPICSTR('$SYS/MQ/INFO/QMGR/QMA/Monitor/METADATA/DISK/Log') +
DEST(COLINMON)  USERDATA(COLINDISK)

The queue can be a remote queue with the target queue on your monitoring environment.

MQRFH2 and message properties

With the data, there is message property data about the message, for example the topic, and any user data.  I could not get the supplied Java methods to skip the RFH2 data.  This was using the MQheader class.

If I used gmo.options = …  MQConstants.MQGMO_PROPERTIES_IN_HANDLE, the data was returned as properties, and not as an RFH2 header, so the PCFMessage class worked with the message.

For one of my monitoring records, the properties were

,"Properties":{
"mqps.Sud" :COLINDISK
,"mqps.Top" :$SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/Log
}

Where mqps.Sud is MQSubUserData (from my subscription) and mqps.Top is MQTopicString.

Identifying the records and data

You may know from the queue name that you are getting monitoring data.

You can also tell from the PCF header values.  Monitoring data has Type :MQCFT_STATISTICS and Command :MQCMD_NONE.  Normal statistics data has Type :MQCFT_STATISTICS and Command :MQCMD_STATISTICS_Q, so you can tell them apart.

The data and the meta data both come in as monitoring records.  You can tell which is which from the topic in the message property, or from the content of the message.
If the message has PCF groups then it contains meta data records, and should be processed to extract the identifiers, or skipped.
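Telling data from meta data using the topic can be sketched as below. The class MonitorTopic is invented for illustration; it assumes the topic layouts shown in this post (…/Monitor/class/type, …/Monitor/METADATA/class/type and …/Monitor/STATQ/queuename/type), and it allows for queue names that contain a "/".

```java
import java.util.Arrays;

// Hypothetical helper: splits the topic string from the mqps.Top property.
final class MonitorTopic {
    final String qmgr;
    final boolean metaData;   // true when the topic contains METADATA
    final String monClass;    // for example DISK, STATMQI, STATQ
    final String queueName;   // only present for STATQ data topics, else null
    final String monType;     // for example Log, PUT

    private MonitorTopic(String qmgr, boolean metaData, String monClass,
                         String queueName, String monType) {
        this.qmgr = qmgr;
        this.metaData = metaData;
        this.monClass = monClass;
        this.queueName = queueName;
        this.monType = monType;
    }

    static MonitorTopic parse(String topic) {
        final String prefix = "$SYS/MQ/INFO/QMGR/";
        final String marker = "/Monitor/";
        int m = topic.startsWith(prefix) ? topic.indexOf(marker, prefix.length()) : -1;
        if (m < 0) {
            throw new IllegalArgumentException("Not a monitoring topic: " + topic);
        }
        String qmgr = topic.substring(prefix.length(), m);
        String[] rest = topic.substring(m + marker.length()).split("/");
        int i = 0;
        boolean meta = rest.length > 0 && rest[0].equals("METADATA");
        if (meta) {
            i = 1;
        }
        if (rest.length < i + 2) {
            throw new IllegalArgumentException("Class and type expected: " + topic);
        }
        String monClass = rest[i];
        String monType = rest[rest.length - 1];
        // Anything between the class and the type is the queue name.
        String queue = null;
        if (rest.length > i + 2) {
            queue = String.join("/", Arrays.copyOfRange(rest, i + 1, rest.length - 1));
        }
        return new MonitorTopic(qmgr, meta, monClass, queue, monType);
    }

    public static void main(String[] args) {
        MonitorTopic t = parse("$SYS/MQ/INFO/QMGR/QMA/Monitor/METADATA/DISK/Log");
        System.out.println(t.qmgr + " " + t.metaData + " " + t.monClass + " " + t.monType);
    }
}
```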

I found that you have to process the message twice to extract different fields.

Pass 1 – extract

  1. Class
  2. Type
  3. Interval
  4. Count number of groups
  5. Display queue name if present

If the number of groups is 0 (so this is data, not meta data), then

Pass 2 – extract

  1. field type
  2. value

You need the class, type, and field type to identify what the value represents.

The comments in the sample C program which formats the records imply that the records may not always be in the same sequence, so in theory (but unlikely) the class and type records may be at the end of the message instead of at the front.  Doing two passes means you can extract these values before you need to use them.
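The two passes can be sketched without any MQ classes, as below. Everything here is a stand-in: Param is an invented substitute for a numeric PCF parameter, and ID_CLASS, ID_TYPE and ID_INTERVAL are placeholder numbers, not the real values of MQConstants.MQIAMO_MONITOR_CLASS, MQIAMO_MONITOR_TYPE and MQIAMO64_MONITOR_INTERVAL. String parameters such as the queue name are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoPassDecoder {
    // Placeholder ids, NOT the real MQ constant values.
    static final int ID_CLASS = 9001;
    static final int ID_TYPE = 9002;
    static final int ID_INTERVAL = 9003;

    // Hypothetical stand-in for one PCF parameter, or for a PCF group.
    static final class Param {
        final int id;
        final long value;
        final boolean group;
        private Param(int id, long value, boolean group) {
            this.id = id;
            this.value = value;
            this.group = group;
        }
        static Param value(int id, long value) { return new Param(id, value, false); }
        static Param group() { return new Param(0, 0L, true); }
    }

    static final class Decoded {
        int monClass = -1;
        int monType = -1;
        long intervalMicros = -1;
        int groups = 0;
        // field type -> raw value, in the order the fields arrived
        final Map<Integer, Long> values = new LinkedHashMap<>();
    }

    static Decoded decode(List<Param> params) {
        Decoded d = new Decoded();
        // Pass 1: class, type, interval and the number of groups,
        // wherever they happen to be in the message.
        for (Param p : params) {
            if (p.group) d.groups++;
            else if (p.id == ID_CLASS) d.monClass = (int) p.value;
            else if (p.id == ID_TYPE) d.monType = (int) p.value;
            else if (p.id == ID_INTERVAL) d.intervalMicros = p.value;
        }
        if (d.groups > 0) {
            return d; // meta data message: no data values to extract
        }
        // Pass 2: every remaining parameter is (field type, value).
        for (Param p : params) {
            if (p.id == ID_CLASS || p.id == ID_TYPE || p.id == ID_INTERVAL) continue;
            d.values.put(p.id, p.value);
        }
        return d;
    }
}
```

With this shape it does not matter whether the class and type arrive first or last; the values are only interpreted after pass 1 has finished.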

Identifying what the data represents.

The meta data gives the interpretation of each value.  You can process the messages containing meta data, and dynamically build a map of value to description.  In my “enterprise” this did not work for me, so I created a static map of values.  I created a key, 256 * 256 * Class + 256 * Type + field type, and created a HashMap of (key, description).

Part of the meta data is a “units” field.

The units are
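A minimal sketch of such a static map is below. The class FieldDictionary and its Field holder are invented names; the key formula is the one described above, and it assumes class, type and field type are each below 256.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical static dictionary keyed on (class, type, field type).
final class FieldDictionary {

    static final class Field {
        final String units;
        final String description;
        Field(String units, String description) {
            this.units = units;
            this.description = description;
        }
    }

    private final Map<Integer, Field> fields = new HashMap<>();

    // 256 * 256 * class + 256 * type + field type; each part must be 0..255
    // or two different triples could produce the same key.
    static int key(int monClass, int monType, int fieldType) {
        if ((monClass | monType | fieldType) < 0
                || monClass > 255 || monType > 255 || fieldType > 255) {
            throw new IllegalArgumentException("Each part must be 0..255");
        }
        return 256 * 256 * monClass + 256 * monType + fieldType;
    }

    void add(int monClass, int monType, int fieldType, String units, String description) {
        fields.put(key(monClass, monType, fieldType), new Field(units, description));
    }

    // Returns null when the field is not in the dictionary.
    Field lookup(int monClass, int monType, int fieldType) {
        return fields.get(key(monClass, monType, fieldType));
    }

    public static void main(String[] args) {
        FieldDictionary dict = new FieldDictionary();
        dict.add(0, 0, 2, "Hundredths", "CPULoad1Min");
        Field f = dict.lookup(0, 0, 2);
        System.out.println(f.description + " (" + f.units + ")");
    }
}
```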

  1. “Unit”, for example the maximum number of connections in the interval.
  2. “Delta”, for example the number of puts in the interval (the total number of puts at the end of the interval minus the number at the start).  These deltas are often interesting as a rate, for example the number of puts divided by the interval; I display the rate of puts as a float with two decimal places.
  3. “Hundredths”, for example CPU load over 1 minute.  To use this I converted it to float, divided by 100, and printed it with two decimal places.
  4. “KB”, I don’t think this is used.
  5. “Percent”, for example file system full percentage.  This is also in hundredths: it reported 4016 for one value, which means 40.16%.  I converted it to float, divided it by 100, and printed it with two decimal places.
  6. “Microseconds”, for example the time to write a log record.
  7. “MB”, for example RAM size of the machine (in MB).  This matched the value from the Linux free -lm command.
  8. “GB”, I do not think this is used.
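The conversions in the list above can be sketched as one small method. UnitFormatter is an invented name, and treating a “Delta” as a per-second rate (delta divided by the interval in seconds) is my assumption about the most useful form.

```java
import java.util.Locale;

// Hypothetical formatter: turns a raw monitoring value into display text.
final class UnitFormatter {

    static String format(String units, long raw, double intervalSeconds) {
        switch (units) {
            case "Hundredths":
            case "Percent":
                // Both arrive multiplied by 100, so 4016 means 40.16.
                return String.format(Locale.ROOT, "%.2f", raw / 100.0);
            case "Delta":
                // Show the change over the interval as a rate per second.
                if (intervalSeconds <= 0) {
                    throw new IllegalArgumentException("Interval must be positive");
                }
                return String.format(Locale.ROOT, "%.2f", raw / intervalSeconds);
            default:
                // Unit, Microseconds, MB and so on are printed unchanged.
                return Long.toString(raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("Percent", 4016, 10.0));
        System.out.println(format("Delta", 25, 10.0));
    }
}
```

Locale.ROOT is used so the decimal separator is always a full stop, whatever the locale of the machine doing the formatting.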

I do not expect the units will change for a record, because it would make the centralized processing of the records very difficult.

For my processing I changed the descriptions.  For example, I changed “Log – bytes occupied by extents waiting to be archived” to “LogWaitingArchiveMB”.  This is a better description for displaying in charts and reports, and includes the units.

My static definitions were like

//$SYS/MQ/INFO/QMGR/…Monitor/CPU/SystemSummary

// add(Class, Type, Field Type, Units, Description)

add(0,0,0,"Percent","UserCPU%");
add(0,0,1,"Percent","SystemCPU%");
add(0,0,2,"Hundredths","CPULoad1Min");
add(0,0,3,"Hundredths","CPULoad5Min");
add(0,0,4,"Hundredths","CPULoad15Min");
add(0,0,5,"Percent","RAMFree%");
add(0,0,6,"MB","RAMTotalMiB");

// $SYS/MQ/INFO/QMGR/…/Monitor/CPU/QMgrSummary

add(0,1,0,"Percent","QMUserCPU%");
add(0,1,1,"Percent","QMSystemCPU%");
add(0,1,2,"MB","QMRAMMiB");

// $SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/SystemSummary

//add(1,0,0,…); missing
//add(1,0,1,…); missing
//add(1,0,2,…); missing
//add(1,0,3,…); missing
add(1,0,4,"MB","TraceBytesInUseMiB");
….

// Below are the different topics you can subscribe on

// $SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/QMgrSummary
// $SYS/MQ/INFO/QMGR/QMA/Monitor/DISK/Log
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/CONNDISC
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/OPENCLOSE
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/INQSET
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/PUT
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/GET
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/SYNCPOINT
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/SUBSCRIBE
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATMQI/PUBLISH
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/OPENCLOSE
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/INQSET
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/PUT
// $SYS/MQ/INFO/QMGR/QMA/Monitor/STATQ/queuename/GET

Monitoring interval

The monitoring interval field (MQIAMO64_MONITOR_INTERVAL) is in microseconds.  I converted this to float, divided the value by 1000000 to convert it to seconds, and printed it with 6 figures after the decimal point ( String.format("%.06f", f); ).

If this value was under 1 second (the smallest I saw was 0.008 seconds) I rounded it to 1.0 seconds.
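As a sketch, the interval handling looks like the code below. MonitorInterval is an invented name, and I have used double rather than float here so that long intervals keep their microsecond precision.

```java
import java.util.Locale;

// Hypothetical helper for the MQIAMO64_MONITOR_INTERVAL value (microseconds).
final class MonitorInterval {

    static double toSeconds(long micros) {
        return micros / 1_000_000.0;
    }

    // Intervals under one second are treated as one second, so a very
    // short interval does not produce a wildly inflated rate.
    static double toSecondsAtLeastOne(long micros) {
        return Math.max(1.0, toSeconds(micros));
    }

    // Six figures after the decimal point, one per microsecond digit.
    static String formatSeconds(long micros) {
        return String.format(Locale.ROOT, "%.6f", toSeconds(micros));
    }

    public static void main(String[] args) {
        System.out.println(formatSeconds(8_000L));        // the 0.008 second case
        System.out.println(toSecondsAtLeastOne(8_000L));
    }
}
```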

Making the data more understandable.

To be strictly accurate, the units should be MiB not MB, as 1 MB is 1,000,000 bytes and 1 MiB (mebibyte) is 1,048,576 bytes.  (And if you see 1 mb, that is a millibit: you get 8000 mb to the byte, so if you get 50 mb/second broadband, I would complain.)

You may want to convert “Log file system – bytes max” = 19549782016 bytes to LogFSBytesMiBMax = 18644.125 MiB or even LogFSBytesGiBMax = 18.21 GiB, though it may be better to keep everything in MiB for consistency.
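The byte conversion itself is simple arithmetic; a sketch, with ByteUnits as an invented name:

```java
import java.util.Locale;

// Hypothetical helper: converts a byte count to binary units.
final class ByteUnits {
    private static final double MIB = 1024.0 * 1024.0;          // 1,048,576 bytes
    private static final double GIB = 1024.0 * 1024.0 * 1024.0;

    static double toMiB(long bytes) {
        return bytes / MIB;
    }

    static double toGiB(long bytes) {
        return bytes / GIB;
    }

    public static void main(String[] args) {
        long logFsBytesMax = 19549782016L; // the value quoted above
        System.out.println(String.format(Locale.ROOT, "%.3f MiB", toMiB(logFsBytesMax)));
        System.out.println(String.format(Locale.ROOT, "%.2f GiB", toGiB(logFsBytesMax)));
    }
}
```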
