mqweb – displaying the secret statistics

Yes, mqweb does provide statistics; through the standard JMX interface provided as part of the base Liberty function.  I expect most people do not know they are available.   The data gets less useful over time, for example you get the “average time” since the mqweb started, rather than the last minute.  See here on how to extract useful information from the data, and show useful averages.

You need in mqwebuser.xml .

<featureManager>
<feature>monitor-1.0</feature>
</featureManager>

and in jvm.options

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=true
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

These options should be suitable in your test environment.  You will want to change them for production.

You need the port number (9010 in the above example) when you extract jmx data.

How do you display the data?

For a quick sniff, (no good for extracting data and plotting charts)  you can use jconsole.  Use remote connection localhost:9010 .  it does not display all of the data.

I found jmxquery very useful.  I updated the github version to fix a bug which caused a loop.  See here.

The query is

java -jar JMXQuery.jar -url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi -q ‘WebSphere:*’

You get data on

  • WebSphere:type=JvmStats
  • WebSphere:type=ThreadPoolStats,name=Default Executor
  • WebSphere:type=ServletStats,name=com.ibm.mq.console.javax.ws.rs.core.Application/AppName (String) = com.ibm.mq.console
  • WebSphere:type=ServletStats,name=com.ibm.mq.rest.javax.ws.rs.core.Application/AppName (String) = com.ibm.mq.rest

To get ‘console’  and ‘ResponseTime’ data I used

java -jar JMXQuery.jar -url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi 
-q 'WebSphere:*' 
-count 20 -every 60 
|grep --line-buffered console
|grep --line-buffered ResponseTime
|python3 mqweb.py
  • Where
  • java -jar JMXQuery.jar – invoke the program
  • -url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi – with this url and the above port number from the jvm.options
  • -q ‘WebSphere:*’ – give me only data for this components
  • -count 20 -every 60 – my extensions giving a record every 60 seconds, and doing 20 of them
  • |grep –line-buffered console – only pull out the console records, ( ignore the ‘rest’ records).  The –line-buffered tells grep to flush it immediately
  • |grep –line-buffered ResponseTime – only interested in this detailed level
  • |python3 mqweb.py – pass it into the python program.  This calculates the delta between records and prints out the count and mean value for that interval

If you are collecting data in real time from a stream, you need to ensure any processing is unbufferred.  Often the default behavior is to accumulate the data in a big buffer, and write the buffer when it is full.  Check any filters you use, for example grep –line-buffered.

One line of output from the JMXQuery program for the console activity is

WebSphere:type=ServletStats,name=com.ibm.mq.console.javax.ws.rs.core.Application/ResponseTimeDetails/count (Long) = 5

JvmStats

This comes under WebSphere:type=JvmStats/  See here.

  • FreeMemory (Long) = 13350536
  • ProcessCPU (Double) = 0.658040027605245
  • UsedMemory (Long) = 75131256
  • GcTime (Long) = 539
  • UpTime (Long) = 1048277
  • GcCount (Long) = 118
  • Heap (Long) = 88473600
  • FreeMemory (Long) = 13350536
  • ProcessCPU (Double) = 0.658040027605245
  • UsedMemory (Long) = 75131256
  • GcTime (Long) = 539
  • UpTime (Long) = 1048277
  • GcCount (Long) = 118

ThreadPoolStats

This comes under WebSphere:type=ThreadPoolStats,name=Default Executor/

  • PoolSize (Integer) = 8
  • ActiveThreads (Integer) = 2
  • PoolName (String) = Default Executor

See here.

com.ibm.mq.console and  com.ibm.mq.rest

The data is similar between them.  One has name=com.ibm.mq.console.javax.ws.rs.core.Application, the other has name=com.ibm.mq.rest.javax.ws.rs.core.Application

The data (in italics) with my comments in plain font are

  • AppName (String) = com.ibm.mq.console
  • RequestCountDetails/currentValue (Long) = 116
  • RequestCountDetails/description (String) = This shows number of requests to a servlet
  • RequestCountDetails/unit (String) = ns – this looks like a bug as it is a count not nanoseconds
  • RequestCount (Long) = 116
  • ServletName (String) = javax.ws.rs.core.Application
  • ResponseTimeDetails/count (Long) = 116
  • ResponseTimeDetails/description (String) = Average Response Time for servlet
  • ResponseTimeDetails/maximumValue (Long) = 3060146565 – in nanoseconds ( see below for the unit)
  • ResponseTimeDetails/mean (Double) = 8.796846855172414E7 – in nanoseconds
  • ResponseTimeDetails/minimumValue (Long) = 793871 – in nanoseconds
  • ResponseTimeDetails/standardDeviation (Double) = 4.198572684166255E8
  • ResponseTimeDetails/total (Double) = 1.0204342352E10 – in nanoseconds – used in calculations
  • ResponseTimeDetails/unit (String) = ns  – this is the units.  ns is nanoseconds
  • ResponseTimeDetails/variance (Double) = 1.64064538075292096E17 –  used in calculations
  • Description (String) = Report Servlet Stats for specified Servlet and application.
  • ResponseTime (Double) = 8.796846855172414E7 – same as the ResponseTimeDetails

So we can see that there were

  • 116 console requests since the mqweb server was started
  • the units are ns (nano seconds)
  • the console requests taking an average of 8.796846855172414E7  nanoseconds, 0.0879 seconds with
  • a standard deviation of  4.136787844033974E7 – nanoseconds = 0.04198 seconds
  • the maximum value was 3 060 146 565 ns = 3.060 seconds
  • the minimum time was  793 871 ns or 0.000793 seconds

Some other data, showing how it changed over time

Data Values Later values Much later values
Number of requests 82 3590 22920
Average (seconds) 0.158 0.0108 0.0099
Standard deviation (seconds) 0.487 0.034 0.035
Maximum (seconds) 2.3 2.3 2.3
Minimum(seconds) 0.001 0.0001 0.0001

Notes:

  • There is data only once a request has been processed, so if you have not run a rest request, there will be no JMX data for rest activity.
  • These values are from start of the mqweb server. I did not see them reset, so you could have a data for a whole week or more.
  • The maximum was from the first requests to run.  I expect this includes the “warm up” costs,  of loading the code and JITing it.
  • The average values are from the start, so will be impacted by peaks and troughs.

For the each mqconsole window, there are two console counts every 10 seconds.  Any charts are refreshed every 10 seconds, so  I think this is a “I am still here, please send me any data you have for me”.

Data for rest

I started my mqweb server, and ran a python program which opened a connected and got three messages.

  • Maximum time 0.3486
  • Minimum time 0.0026
  • Calculate the other one 0.0028

Because the first request takes a long time, you can adjust for this in your calculations to get a truer mean.

For example

I reran the script and processed 100 messages.  The average time of these was 0.003 seconds.

  • Maximum 0.3486
  • Mean 0.00637
  • Count 103
  • Total 0.656

The calculations are

  • Mean * count =  0.656 (which matches Total as expected)
  • Subtract the maximum, first time value 0.656 – 0.349 = 0.307
  • Calculate the improved mean value ignoring the first value,  0.307 /(103 -1) = 0.003

So the adjusted mean time is 0.003 seconds – compared to the 0.006 which the JMX stats report.

 

Getting useful information out of JMX data

The data coming from Liberty WebServer through the JMX interface  provides some data, but it is not very useful, and it may become inaccurate over time.

I’ll cover

  1. Getting a useful mean value
  2. Getting a more accurate long term mean
  3. Data gets more inaccurate over time
  4. Standard deviation (this may only be of interest to a few people)

For example from JMX, the reported  mean time for mqconsole transactions  was 9.9 milliseconds – this is for all requests since the mqweb server was started.   Over the previous minute the average measured time, for a 10 second period was 7, or 8 milliseconds, so well below the 9.9 reported.

This is because the mean time includes any initial start up time.   The maximum transaction time, at the start of the run, was over 2 seconds.   This will bias the mean.

You can process the data to extract some useful information, and I show below how to get out useful response time values.

You get the following data (and example values) from mqweb through the JMX interface.

ResponseTimeDetails/count (Long) = 20
ResponseTimeDetails/description (String) = Average Response Time for servlet
ResponseTimeDetails/maximumValue (Long) = 3060146565 
– in nanoseconds (see below for the unit)
ResponseTimeDetails/mean (Double) = 4.336789965E8
– in nanoseconds
ResponseTimeDetails/minimumValue (Long) = 2474556
– in nanoseconds
ResponseTimeDetails/standardDeviation (Double) = 9.089057964078983E8
– in nanoseconds
ResponseTimeDetails/total (Double) = 8.67357993E9
– used in calculations
ResponseTimeDetails/unit (String) = ns
– the unit ns = nanoseconds
ResponseTimeDetails/variance (Double) = 8.319076762335653
– used in calculations

Getting a useful mean value

To produce these numbers, the count of the response times and the sum of the transaction response times are accumulated within the Liberty Server.  To calculate the mean value you calculate sum/count.   This gives you the overall mean time.  If you obtain the data periodically you can manipulate the data to provide some useful statistics.

Let the count and sum at time T1 be Count1, and Sum1, and at time T2 Count2, and Sum2.
You can now calculate (Sum2- Sum1)/(Count2 – Count1) to get the average for that period.  For example the reported mean was 0.016 ms, but the calculated value gave 0.008 ms.  You can also calculate (Count2 – Count1)/(T2-T1) to give a rate of requests per second.   These are much more useful than the raw data.  I suggest collecting the data every minute.

Getting a more accurate long term mean

The first rest request and console request take a long time because the java code has to be loaded in etc.  In one test the duration of the first request was 50 times the duration of the other requests.  A better “mean” value is to ignore the duration of the first request.

The improved mean is (JMX mean * JMX count  – JMX Maximum value) /(JMX Count-1), or JMXMean – (JMXMaximum/JMXCount) .

Data gets more inaccurate over time

The total time is stored as a floating point double.  As you add small numbers to big numbers, the small numbers may be ignored.  Let me try to explain.

Consider a float which has precision of 3, so you could have 1.23 E2 = 1230.  Add 1 to this, and you get 1231 which is 1.23 E2 with a precision of 3 – the number we started with.

The total time is in nanoseconds so 1 second is stored as 1.0 E9.  With 100 of these a second, and 1 hour( 3600 seconds) for 100 hours is 360,000,000, or 3.6 E8 seconds.  * 1.0 E9 nano seconds. = 3.6E17 nano seconds.   The precision  of most float numbers is 16, so with this 3.6 E17 we have lost the odd nanosecond.    I do not think this is a big enough problem to worry about – unless you are running for years without restarting the server.

The variance uses the time**2 value.  So with the maximum time above 599482097 nano seconds. Time **2 is 3.593787846×10¹⁷ and you are already losing data.  I expect the variance will not be used very often, so I think this can be ignored.

If the times were in microseconds instead of nano seconds, this would not be a problem.

Getting a useful standard deviation (this may only be of interest to a few people)

The standard deviation gives a measure of the spread of the data, a small standard deviation means the data is in a narrow band, a larger standard deviation means it is spread over a wider band.  Often 95% of the values are within plus or minus 3 * standard deviations from the mean, so anything outside this range would considered an outlier, or unusual statistic.

I set up some data, a mixture of  10  values 9, 10, 11,  the standard deviation was 0.73.    I changed one value to 20, and the standard deviation changed to 3.3, indicating a wide spread of values.

With a mixture of 100 values 9,10,11, the standard deviation was 0.71.   I changed one value to 20, and the standard deviation changed to 1.2, so a much smaller value, most of the data was still around 10 – just one outside the range.

With a lot of data, the standard deviation converges on a value, and “unusual” numbers make little difference to the value.  I think that the standard deviation over an extended period is not that useful, especially if you get periodic variations such as busy time, and overnight.

You calculate the standard deviation as the square root of the variance.   The variance is (Sum of (values**2) – (mean ** 2)) /number of measurements.

With data

ResponseTimeDetails/count (Long) = 203
ResponseTimeDetails/mean (Double) = 6420785.187192118 nanoseconds
ResponseTimeDetails/variance (Double) = 1.7113341125320868E15 – used in calculations

Variance  = 1.7113341125320868E15 =  ( (Sum of (values**2) – (6420785.187192118 ** 2)) / 203

So (Sum of (values**2)) =   3.474420513264337e+17

You can now do the sums as with the mean, above:

At time T1, the ssquares1 is the sum of (values**2)   at time T2, the ssquares2 is the sum of (values**2).

You can now calculate ssquares2 – ssquares2, and used that to estimate the variance, and so the standard deviation of the data in the range T1 to T2, I’ll leave the details to the end user.

For the advanced user,  you can use the mean for the interval – rather than the long term mean.  Good luck.

 

What are the JMS connection factories on webLogic doing?

As part of my long running activity to find out what is causing 1 million MQCONNects a day from an Oracle  webLogic web server, I have found out how to monitor what is going on inside the webLogic instance.

Most web servers support a Java Management eXtension (JMX) interface.  You can use gui tools like jconsole to do an ad-hoc display of the management beans – but these are   not practical for long term monitoring.

Ive listed the data from the JMX query, what the data means, and also documented how I got the data.

There are three components to an MDB

  1. A listening task waits to be notified about messages arriving on the queue.   As far as I can tell there is one thread doing this work.  If you need more I think you need to create a second MDB.
  2. Multiple MDB threads which get woken up by the listener task, get a message and invoke the MDB application OnMessage() method with the message.  The data for this part has Type=EJBPoolRuntime.
  3. The application part which processes the message – typically it connects to MQ and puts a reply, and disconnects.  This application specified which connection pool to be used using ConnectionFactory cf = (ConnectionFactory)ctx.lookup(“CF3”);  The data for this part has Type=ConnectorConnectionPoolRuntime. 

 

The multiple MDB threads.

This data came from the JMX record with com.bea:ServerRuntime=AdminServer2,MessageDrivenEJBRuntime=WMQ_IVT_MDB,Name=WMQ_IVT_MDB,ApplicationRuntime=MDB3,Type=EJBPoolRuntime,EJBComponentRuntime=MDB3

Where

  1.  ServerRuntime=AdminServer2. This is where the application runs
  2. MessageDrivenEJBRuntime=WMQ_IVT_MDB comes from  display-name WMQ_IVT_MDB in the ejb-jar.xml file.,Name=WMQ_IVT_MDB,
  3. ApplicationRuntime=MDB3, This is the name of the deployed .jar file.
  4. Type=EJBPoolRuntime, this is the type for the MDB threads
  5. EJBComponentRuntime=MDB3, this is the name of the deployed .jar file.

From experiementing with MDBs and adjusting parameters my picture of how the EJB thread pool works is as follows

  1. There is a general free pool for threads.
  2. There is a pool for the EJB.
  3. When a message arrives the listener thread gets a thread from the EJB pool, and executes the OnMessage() method on the thread.
  4. If there are not available threads,
    1. it waits for a free thread.
    2. if, over a period of seconds, most of the requests have had to wait for a thread, then start a new thread
      1. it gets a thread from the general pool, if none available it creates a new thread.
      2. it executes the MDB application ejbCreate() method on the thread just obtained.
      3. It executes the OnMessage() method on the thread.
      4. Puts the thread into the EJB pool as a free thread.
    3. In the ejbCreate() method, I had it write to the job log.  I could see threads started, after
  5. Periodically the EJB pool is cleaned up,  for threads which have been idle for a specific time interval
    1. execute ejbRemove() on the thread
    2. return the thread to the general free pool
  6. Periodically the general pool is cleaned up
    1. Threads which have not been used for a while are disconnected from the queue manager, and deleted.

The data came from the JMX with the defintion

com.bea:ServerRuntime=AdminServer2, Name=WMQ_IVT_MDB, ApplicationRuntime=MDB3, Type=MessageDrivenEJBRuntime, EJBComponentRuntime=MDB3

 

See here for the documentation.

  • AccessTotalCount = 28180,   There were this many messages processed, and  so this many OnMessage() methods executed
  • BeansInUseCount  = 2.  Deprecated
  • BeansInUseCurrentCount  = 1  One thread is currently active.
  • DestroyedTotalCount   = 0.   
  • IdleBeansCount  = 4 There are 4 free thread available in the EJB pool
  • MissTotalCount  = 19.  For 19 (out of the 28180) requests, they failed to get a free thread, and so a new thread was obtained.
  • Name  = WMQ_IVT_MDB
  • PooledBeansCurrentCount  = 3.   The numer of free threads inthe pool.
  • TimeoutTotalCount  = 0.  Number of requests timed out waiting for a thread from the free pool.
  • Type  = EJBPoolRuntime.   You can use Type=EJBPoolRuntime in the JMX query string.
  • WaiterCurrentCount  = 0. The number of requests waiting for a thread in the pool.   This value should always be zero.
  • WaiterTotalCount = 0. The total number of threads that have had to wait in the pool.  This value should always be zero, if it is non zero you need to tune your connection pool.

The data for the connection factory.

See here for some documentation.

  • ActiveConnectionsCurrentCount (Inte
  • ger) = 9.  How many connections were in use when the JMX request was issued.
  • ActiveConnectionsHighCount (Integer) = 10. The highest number of connections in use since the connection pool was started or reset
  • AverageActiveUsage (Integer) = 0. This value is always zero.
  • CapacityIncrement (Integer) = 1.   This is the value the connection pool is configured with
  • CloseCount (Long) = 5682.    How many connection.close() requests have been issued.   This will be less than or equal to ConnectionsMatchedTotalCount below.
  • ConnectionFactoryClassName (String) = com.ibm.mq.connector.outbound.ConnectionFactoryImpl.   This is the name of the connection class being used
  • ConnectionFactoryName (Null).  I could not find where to change this, or if it has any value
  • ConnectionIdleProfileCount (Integer) = 0.   This was always zero.
  • ConnectionIdleProfiles (Null).   This was always zero.
  • ConnectionLeakProfileCount (Integer) = 0.   This was always zero.  I believe this is to do with connections not being returned to a pool, perhaps connection.close() was not issued.
  • ConnectionLeakProfiles (Null).  This was always zero. I believe this is to do with connections not being returned to a pool, perhaps connection.close() was not issued.
  • ConnectionProfilingEnabled (Boolean) = false.   I could not see how to change this, or what value it adds
  • Connections (ObjectName[]) = [Ljavax.management.ObjectName;@299a06ac]  This will be an internal object within Java.
  • ConnectionsCreatedTotalCount (Integer) = 21. This is the count of instances started.   You could get connection started, connection freed, connection started, connection freed.   This would could  2 connections started.
  • ConnectionsDestroyedByErrorTotalCount (Integer) = 0.
  • ConnectionsDestroyedByShrinkingTotalCount (Integer) = 11.  After a time period, idle-timeout-seconds in weblogic-ejb-jar.xml, connections that have not processed messages for this time period are released back to the general pool.
  • ConnectionsDestroyedTotalCount (Integer) = 11.  This seems to be the same as the previous item
  • ConnectionsMatchedTotalCount (Integer) = 5670.    This many requests for a connection with matching userid etc were got from the pool
  •  ConnectionsRejectedTotalCount (Integer) = 0.  I dont know what this is
  • ConnectorEisType (String) = Java Message Service
  • CurrentCapacity (Long) = 10.   This is the size of the pool
  • EISResourceId (String) = type=<eis>, application=colin, module=colin, eis=Java
  • Message Service, destinationId=CF2
  • FreeConnectionsCurrentCount (Integer) = 1.   There is currently one free connection in the pool
  • FreeConnectionsHighCount (Integer) = 10.  This is the highest number of free connections in the pool – when there was no requests for the connection pool.
  • FreePoolSizeHighWaterMark (Long) = 10.  I dont know the difference between this and the previous item.
  • FreePoolSizeLowWaterMark (Long) = 0.   This is the lowest number of free connections – it is the gap between in use and the maximum pool size.
  • HealthState (Null).    I do not think connection pools for MDBs support this
  • HighestNumWaiters (Long) = 0.   This value was always zero.  Even when I reduced the size of the connection pool, and specified 50 threads could wait, this value was still 0
  • InitialCapacity (Integer) = 1.   This is the specified value.   This is used  to set a lower limit of connections in the pool.  Setting the initial capacity to 5 means there will always be at least 5 connections in the pool, and the pool will not be shrunk below this value
  • JNDIName (String) = CF2.  This is what is specified in the connection pool definition.   This label is used in the application ConnectionFactory cf = (ConnectionFactory)ctx.lookup(“CF2”);
  • Key (String) = CF2.  This seems to the be same as the JNDI name.
  • LastShrinkTime (Long) = 1565334529469.  This is the number of milliseconds POSIX time (since Jan 1st 1970).  To format it use  time_t now = t/1000; strftime(buff, 20, “%H:%M:%S”, localtime(&now));
  • LogFileName (Null).  You can specify logging for this adapter.  Deployments -> resource adapter -> configuration -> Outbound Connection Pools, javax.jsm.ConnectionFactory, select the connection pool,  Logging tab.  When this was active I got nothing logged in it.
  • LogFileStreamOpened (Boolean) = false
  • LoggingEnabled (Boolean) = true
  • LogRuntime (Null)
  • ManagedConnectionFactoryClassName (String) = com.ibm.mq.connector.outbound.ManagedConnectionFactoryImpl.  This is the name of the resouce adapter class.
  • MaxCapacity (Integer) = 20.   This is the maximum size of the pool.
  • MaxIdleTime (Integer) = 0.   This was always zero for me.
  • MCFClassName (String) = com.ibm.mq.connector.outbound.ManagedConnectionFactoryImpl.   This is the ManagedConnectionFactoryClassName. Same as above.
  • Name (String) = CF2.  Another connection pool identifier
  • NumberDetectedIdle (Integer) = 0.   This was always zero.
  • NumberDetectedLeaks (Integer) = 0.  This was always zero. I believe this is to do with connections not being returned to a pool, perhaps connection.close() was not issued.
  • NumUnavailableCurrentCount (Integer) = 9.   This feels like the number threads waiting while the connection pool creates a connection in the connection pool.
  • NumUnavailableHighCount (Integer) = 10.   This feels like the number threads waiting while the connection pool creates a connection in the connection pool.
  • NumWaiters (Long) = 0.   This was always zero – even through I restricited the number of connections in the pool.  I got a java exception, unable to get a connection, rather than have the thread wait.
  • NumWaitersCurrentCount (Integer) = 0.  See above.
  • Parent (ObjectName) = com.bea:ServerRuntime=AdminServer2,Name=colin, ApplicationRuntime=colin, Type=ConnectorComponentRuntime
  • PoolName (String) = CF2.   Yet another field with the name of the connection pool/
  • PoolSizeHighWaterMark (Long) = 10.   This is the highest number of connections used in the pool
  • PoolSizeLowWaterMark (Long) = 0.       This is the lowest number of connecitons used in pool
  • ProxyOn (Boolean) = false.    I could not find what this means
  • RecycledTotal (Integer) = 0.    This was always zero for me.
  • ResourceAdapterLinkRefName (Null).   You can specify this field name in your MDB definition.
  • RuntimeTransactionSupport (String) = NoTransaction.  This defines if your connections are part of a unit of work or not. NoTransaction means out of syncpoint.    You may want to specify. XATransaction.
  • ShrinkCountDownTime (Integer) = 340.  This is how long before a scan of the pool to remove any threads in the pool which have done no work for the specified interval.
  • ShrinkingEnabled (Boolean) = true
  • ShrinkPeriodMinutes (Integer) = 15.  You can specify the interval on the resource adapter definition  (Note: you specify it in seconds – it gets reported in minutes)
  • State (String) = Running.    This connection pool is started.
  • Testable (Boolean) = false.   You can defined some resources as “testable”
  • TransactionSupport (String) = NoTransaction.   This is the same as RuntimeTransactionSupport above.
  • Type (String) = ConnectorConnectionPoolRuntime.   This defines the object.   You can use the Type=ConnectorConnectionPoolRuntime  in the JMX query.

From this I can see there is a pool called CF2 which has maximum of 10 connections.  The maximum connections used was 10, the lowest was 0.

There were Connections Matched Total Count  = 5670 requests for a connection from the pool.

The pool has shrunk more than once as it has Connections Destroyed By Shrinking Total Count  = 11 connections.

Using this data, you can now plot usage over time and see if you need to increase ( or decrease) the size of the pool, or the parameters to tune when the pool is shrunk.

I do not know enough about JMX to tell if the “high” and “low” value are reset on each query, or if you can use JMX to reset them periodically. These high and low value may have little value, if they are since the webLogic instance started (6 months ago).

The data fields are mentioned here.

How I got the data

There is a python package called JMXQuery which has a .jar file which allows you to query information in a JMX server.  The output is in json format so you can use your favourite tools (python) to quickly convert this to other format, such as .csv .

The command I used was

java  -jar “/usr/local/lib/python3.6/dist-packages/jmxquery/JMXQuery-0.1.8.jar” -url service:jmx:rmi:///jndi/rmi://127.0.0.1:8091/jmxrmi -json -u webLogic
-p passw0rd -q “com.bea:ServerRuntime=AdminServer2,Name=CF2,ApplicationRuntime=colin, Type=ConnectorConnectionPoolRuntime,ConnectorComponentRuntime=colin”

which breaks down as follows

  • java – invoke java
  • -jar “/usr/local/lib/python3.6/dist-packages/jmxquery/JMXQuery-0.1.8.jar” – this jar file
  • -url service:jmx:rmi:///jndi/rmi://127.0.0.1:8091/jmxrmi – this is the url of my webLogic server
  • -json – output it in json format
  • -u webLogic -p passw0rd -q  – userid and password
  • “com.bea:ServerRuntime=AdminServer2,Name=CF2,ApplicationRuntime=colin, Type=ConnectorConnectionPoolRuntime,ConnectorComponentRuntime=colin”
    • com.bea is the bean type
    • The admin server was called AdminServer2
    • The connection factory was CF2
    • The resource adapter is installed under “Deployments”  as colin
    • Type=ConnectorConnectionPoolRuntime is the type of bean

I then used |jq . |grep -v mBean > bb to convert the one line json to one field per line, dropped the mBean value, and put it to a file.  The output was like

[
 {
   "attribute": "Connections",
   "attributeType": "ObjectName[]",
   "value": "[Ljavax.management.ObjectName;@299a06ac"
 },
 {
   "attribute": "FreeConnectionsCurrentCount",
   "attributeType": "Integer",
   "value": 4
},
etc

You can put generics in for example

java  -jar “/usr/local/lib/python3.6/dist-packages/jmxquery/JMXQuery-0.1.8.jar” -url service:jmx:rmi:///jndi/rmi://127.0.0.1:8091/jmxrmi -json -u readonly -p read0nly -q “com.bea:ApplicationRuntime=*,ConnectorComponentRuntime=*,Name=”CP*”,ServerRuntime=*,Type=ConnectorConnectionPoolRuntime” e=*,Type=ConnectorConnectionPoolRuntime” > aa

 

Where this uses a userid set up as a monitor id, “*” has been specified for many values, and only give objects beginning with “CP”.  Note blanks have meaning. “,,,=*, Name=…” looks for an object with blank,N,a,m,e,

Data when not using a resource adapter

The above information was for a resource adapter.  When an EJB 2 MDB is deployed (non resource adapter)

from –q  “com.bea:ApplicationRuntime=MDB3,EJBComponentRuntime=MDB3,
MessageDrivenEJBRuntime=WMQ_IVT_MDB_JMSQ1,
Name=WMQ_IVT_MDB_JMSQ1,ServerRuntime=AdminServer2,Type=EJBPoolRuntime”

Where the weblogic-jar-xml has

<weblogic-ejb-jar> <ejb-name>WMQ_IVT_MDB</ejb-name>
<destination-jndi-name>JMSQ1</destination-jndi-name>

The data was

BeansInUseCount (Integer) = 0
PooledBeansCurrentCount (Integer) = 10
IdleBeansCount (Integer) = 10
BeansInUseCurrentCount (Integer) = 0
DestroyedTotalCount (Long) = 0
WaiterCurrentCount (Integer) = 0
Name (String) = WMQ_IVT_MDB_JMSQ1
MissTotalCount (Long) = 7
AccessTotalCount (Long) = 28
Type (String) = EJBPoolRuntime
TimeoutTotalCount (Long) = 0
WaiterTotalCount (Long) = 0

With

-q “com.bea:ApplicationRuntime=MDB3,EJBComponentRuntime=MDB3, MessageDrivenEJBRuntime=WMQ_IVT_MDB_JMSQ1,Name=WMQ_IVT_MDB_JMSQ1,
ServerRuntime=AdminServer2,Type=EJBTransactionRuntime

the output was

TransactionsRolledBackTotalCount (Long) = 0
TransactionsTimedOutTotalCount (Long) = 0
Name (String) = WMQ_IVT_MDB_JMSQ1
Type (String) = EJBTransactionRuntime
TransactionsCommittedTotalCount (Long) = 0

What data is there to help you manage your systems?

There is a lot of information provided by MQ to help you manage your systems, some of it is not well documented.

I’ll list the sources I know, and when they might be needed, but before that I’ll approach it from the “what do I want to do”.

What do I want to do?

I want to…

  • know when significant events happen in my system, such as channel start and stop, and security events. AMQERRnn.LOG
  • know when “events” happen, such as a queue filling up security exceptions.  Event Queues
  • be able to specify thresholds, such as when the current depth is > 10 messages, and the age of the oldest message is older than 5 seconds then do something. Display commands
  • be able draw graphs of basic metrics, such as number of messages put per hour/per day so I can do capacity planning, and look for potential capacity problems. Statistics
  • identify which queues are being used, display queue activity, number of puts, and size of puts etc  Statisticsdisplay object status
  • identify which queues (objects) are not being used, so they can be deleted.  Absence of records in Statistics.  Issue DIS QSTATUS every day and see if a message has been put to the queue.  Creating events for when a message is put to the queue.  Note an object may only be used once a year – so you need to monitor it all year.
  • identify which applications are putting to and getting from queues. Accounting
  • see what MQI verbs are being used, so we can educate developers on the corporate naming standards, and API usage.  Activity trace
  • display the topology.  Display commandsTrace routeActivity trace
  • trace where messages are going, so we can draw charts of the flow of message requests and their responses, and display the topology of what is actually being used.  Trace route
  • measure round trip times of messages – so I know if there are delays in the end to end picture. Trace route
  • Understand the impact of a problem “here” by seeing what flow through “here”.  What’s my topology

What sources are there?

AMQERRnn.LOG.

These contain information about events in the queue manager, such as channel start and channel stop. These files are in /var/mqm/qmgrs/QMA/errors/… and can be read using an editor or browser.  People often feed these into tools like SPLUNK, and then you can filter and do queries to monitor for messages that have not been seen before.

Event queues.

MQ messages are put on queues like SYSTEM.ADMIN.QMGR.EVENT.

There is a sample amqsevt which can be used to print the message in text format or json format – or you can write your own program.

Creating events.

You can configure MQ to produce events when conditions occur. For important queues you can set a high threshold, and MQ produces an event when this limit is exceeded. You can use this

  • to see if messages are accumulating in a queue
  • to see if a queue is being used – set the queue high threshold to be 1, and you will get an event if a message is put to a queue

Statistics

If you turn on statistics you information on the number of puts, gets for the system, and the number of puts and gets etc to a queue. This information is put to a queue SYSTEM.ADMIN.STATISTICS.QUEUE

The information is summarized by queue.

You can use

  1. the sample amqsevt to process these messages, you can have output in json format for input into other tools.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.

Accounting

If you turn on accounting you information about what an application is doing. The the number of puts, gets for the system, and the number of puts and gets etc to a queue. This information is put to a queue SYSTEM.ADMIN.ACCOUNTING.QUEUE. The information is similar to the information provided by statistics, but it provided information about which application used the objects.

You can use

  1. the sample amqsevt to process these messages.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.

You can use display commands.

You can use commands, or use the MQINQ API to display information about object. You can issue commands using runmqsc or from an application by putting command requests in PCF format to a queue, and getting the data back in PCF format. Your program has to decode the PCF data.
You can display multiple fields and have logic  to take action if values are out of the usual range.   For example
periodically display the curdepth, and the age of the oldest message on the queue and then do processing based on these value. Tivoli uses this technique to creates situations if specified conditions are met. You can easily write your own programs to do this, for example using python scripts and pymqi.

What’s my topology?

You can use the DIS CHANNEL … CONNAME command to show where a channel connects to and use this to draw up a picture of your configuration.

You can use the DIS QCLUSTER and DIS CLUSQMGR to show information about your clusters, and where cluster queues are, and use this information to draw up a picture of your configuration

You can use the traceroute to dynamically see the routes between nodes, and understand the proportion of messages going to different destinations – at that moment in time.

Displaying object status

You can use display commands to show information such as the last time a message was put to a queue, or got from a queue, or sent over a channel.

Application trace

The application trace shows you the MQ API calls, the parameters, and return codes. This data goes to the SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE queue.

You can use this to check the API options being used, for example

  • Messages persistence is correct for the application pattern (inquiry is non persistence)
  • The correct message expiry is specified (non persistent has time value)
  • The correct options are specified
  • Applications are using MQ GET with wait rather than polling a queue
  • The correct syncpoint options are being used.
  • Which queue is really used. You open one queue name but this could be an alias. You get the queue it maps to.

There is an overhead to collecting this, so you do not want to run this for extended periods of time.

Running it for just a minute or two may give you enough information. You can turn this on for an individual program.

You can use amqsevt to process the queue.

Trace route.

You can send a message “to a queue” and get back the processes involved in getting to the queue.. For example use the dspmqrte to “put” a message to a cluster queue, and you will see the sending channel get the message and send it, then the receiver channel at the remote end receive the message and “putting” it to the queue. One of the data fields is the operation time, so you can see where the delays were in the processing (for example it took seconds to be sent over a channel).  See here

By default the message is not put to the queue, but there is an option to put it to the queue for the application to process, but there is no documentation to tell you how you process this message. The dspmqrte command effectively shows you the hops between queues. It is up to you to build up the true end to end path, and manage the responses yourself.

The provided programs dspmqrte are simplistic and show you the path to the queue, and the channels used on the queue.

The data is not pure PCF, and the sample amqsevt does not format it. I have modified it to handle this.