Getting useful information out of JMX data

The data coming from Liberty WebServer through the JMX interface  provides some data, but it is not very useful, and it may become inaccurate over time.

I’ll cover

  1. Getting a useful mean value
  2. Getting a more accurate long term mean
  3. Data gets more inaccurate over time
  4. Standard deviation (this may only be of interest to a few people)

For example from JMX, the reported  mean time for mqconsole transactions  was 9.9 milliseconds – this is for all requests since the mqweb server was started.   Over the previous minute the average measured time, for a 10 second period was 7, or 8 milliseconds, so well below the 9.9 reported.

This is because the mean time includes any initial start up time.   The maximum transaction time, at the start of the run, was over 2 seconds.   This will bias the mean.

You can process the data to extract some useful information, and I show below how to get out useful response time values.

You get the following data (and example values) from mqweb through the JMX interface.

ResponseTimeDetails/count (Long) = 20
ResponseTimeDetails/description (String) = Average Response Time for servlet
ResponseTimeDetails/maximumValue (Long) = 3060146565 
– in nanoseconds (see below for the unit)
ResponseTimeDetails/mean (Double) = 4.336789965E8
– in nanoseconds
ResponseTimeDetails/minimumValue (Long) = 2474556
– in nanoseconds
ResponseTimeDetails/standardDeviation (Double) = 9.089057964078983E8
– in nanoseconds
ResponseTimeDetails/total (Double) = 8.67357993E9
– used in calculations
ResponseTimeDetails/unit (String) = ns
– the unit ns = nanoseconds
ResponseTimeDetails/variance (Double) = 8.319076762335653
– used in calculations

Getting a useful mean value

To produce these numbers, the count of the response times and the sum of the transaction response times are accumulated within the Liberty Server.  To calculate the mean value you calculate sum/count.   This gives you the overall mean time.  If you obtain the data periodically you can manipulate the data to provide some useful statistics.

Let the count and sum at time T1 be Count1, and Sum1, and at time T2 Count2, and Sum2.
You can now calculate (Sum2- Sum1)/(Count2 – Count1) to get the average for that period.  For example the reported mean was 0.016 ms, but the calculated value gave 0.008 ms.  You can also calculate (Count2 – Count1)/(T2-T1) to give a rate of requests per second.   These are much more useful than the raw data.  I suggest collecting the data every minute.

Getting a more accurate long term mean

The first rest request and console request take a long time because the java code has to be loaded in etc.  In one test the duration of the first request was 50 times the duration of the other requests.  A better “mean” value is to ignore the duration of the first request.

The improved mean is (JMX mean * JMX count  – JMX Maximum value) /(JMX Count-1), or JMXMean – (JMXMaximum/JMXCount) .

Data gets more inaccurate over time

The total time is stored as a floating point double.  As you add small numbers to big numbers, the small numbers may be ignored.  Let me try to explain.

Consider a float which has precision of 3, so you could have 1.23 E2 = 1230.  Add 1 to this, and you get 1231 which is 1.23 E2 with a precision of 3 – the number we started with.

The total time is in nanoseconds so 1 second is stored as 1.0 E9.  With 100 of these a second, and 1 hour( 3600 seconds) for 100 hours is 360,000,000, or 3.6 E8 seconds.  * 1.0 E9 nano seconds. = 3.6E17 nano seconds.   The precision  of most float numbers is 16, so with this 3.6 E17 we have lost the odd nanosecond.    I do not think this is a big enough problem to worry about – unless you are running for years without restarting the server.

The variance uses the time**2 value.  So with the maximum time above 599482097 nano seconds. Time **2 is 3.593787846×10¹⁷ and you are already losing data.  I expect the variance will not be used very often, so I think this can be ignored.

If the times were in microseconds instead of nano seconds, this would not be a problem.

Getting a useful standard deviation (this may only be of interest to a few people)

The standard deviation gives a measure of the spread of the data, a small standard deviation means the data is in a narrow band, a larger standard deviation means it is spread over a wider band.  Often 95% of the values are within plus or minus 3 * standard deviations from the mean, so anything outside this range would considered an outlier, or unusual statistic.

I set up some data, a mixture of  10  values 9, 10, 11,  the standard deviation was 0.73.    I changed one value to 20, and the standard deviation changed to 3.3, indicating a wide spread of values.

With a mixture of 100 values 9,10,11, the standard deviation was 0.71.   I changed one value to 20, and the standard deviation changed to 1.2, so a much smaller value, most of the data was still around 10 – just one outside the range.

With a lot of data, the standard deviation converges on a value, and “unusual” numbers make little difference to the value.  I think that the standard deviation over an extended period is not that useful, especially if you get periodic variations such as busy time, and overnight.

You calculate the standard deviation as the square root of the variance.   The variance is (Sum of (values**2) – (mean ** 2)) /number of measurements.

With data

ResponseTimeDetails/count (Long) = 203
ResponseTimeDetails/mean (Double) = 6420785.187192118 nanoseconds
ResponseTimeDetails/variance (Double) = 1.7113341125320868E15 – used in calculations

Variance  = 1.7113341125320868E15 =  ( (Sum of (values**2) – (6420785.187192118 ** 2)) / 203

So (Sum of (values**2)) =   3.474420513264337e+17

You can now do the sums as with the mean, above:

At time T1, the ssquares1 is the sum of (values**2)   at time T2, the ssquares2 is the sum of (values**2).

You can now calculate ssquares2 – ssquares2, and used that to estimate the variance, and so the standard deviation of the data in the range T1 to T2, I’ll leave the details to the end user.

For the advanced user,  you can use the mean for the interval – rather than the long term mean.  Good luck.

 

mqweb error messages and symptoms of TLS setup problems

I deliberately caused TLS set up errors, and noted the symptoms.  Ive recorded them below; the article is not meant to be read, but indexed by search engines.

There are three sections

  1. Problems with server certificates
  2. Problems with the client certificate
  3. Chrome messages, and possible causes of the problems.

The mqweb messages.log reported problems that the mqweb server saw.   For me this was in file /var/mqm/web/installations/Installation1/servers/mqweb/logs/messages.log

Problems with the server certificate

Problem: mqwebuser.xml serverKeyAlias name not in the keystore

Message log:

  • E CWPKI0024E: The certificate alias mqweb specified by the property com.ibm.ssl.keyStoreServerAlias is not found in KeyStore /home/colinpaice/ssl/ssl2/mqweb.p12.
  • I FFDC1015I: An FFDC Incident has been created: “com.ibm.wsspi.channelfw.exception.ChannelException: java.lang.IllegalArgumentException: CWPKI0024E: The certificate alias mqweb specified by the property com.ibm.ssl.keyStoreServerAlias is not found in KeyStore /home/colinpaice/ssl/ssl2/mqweb.p12. com.ibm.ws.channel.ssl.internal.SSLConnectionLink 238″ at ffdc_….

curl:

* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* curl (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:9443
* stopped the pause stream!
* Closing connection 0

chrome:

This site can’t be reached.  ERR_CONNECTION_CLOSED

Problem:  The host certificate is self signed and not in the client keystore

Message log:

Nothing.

curl:

* TLSv1.2 (OUT), TLS alert, Server hello (2):
* SSL certificate problem: self signed certificate
* stopped the pause stream!
* Closing connection 0
curl: (60) SSL certificate problem: self signed certificate

Chrome: in browser

NET::ERR_CERT_AUTHORITY_INVALID

Click on the Not Secure in the url, to display the certificate which was sent down.

Chrome log:

ERROR:cert_verify_proc_nss.cc(1011)] CERT_PKIXVerifyCert for localhost failed err=-8179

From here  -8179 is Peer’s certificate issuer is not recognized.

Problem: curl: The host certificate is self signed and you use the –insecure option

curl

* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: C=GB; O=aaaa; CN=testuser
* start date: Jan 20 17:39:37 2020 GMT
* expire date: Feb 19 17:39:37 2020 GMT
* issuer: C=GB; O=aaaa; CN=testuser
* SSL certificate verify result: self signed certificate (18), continuing anyway.

Problem: Chrome:  The host certificate is self signed and is not trusted

Chrome browser

This site can’t be reached
localhost unexpectedly closed the connection.
ERR_CONNECTION_CLOSED

Debugging

  • I could find nothing that told me what certificate was being used.  The Chrome network trace just gave “net_error = -100 (ERR_CONNECTION_CLOSED)“.
  • Use certutil -L $sql  to list the contents of your browsers keystore.   The certificate needs “P,…” permissions.
  •  Or use the chrome url chrome://settings/certificates  and display “your certificates”. Pick the likely one, if it says “UNTRUSTED” then this may be the problem.   View the certificate, and check it, for example under details, there may be a comment describing its use.
  •  Defined the server certificate as trusted using certutil -M $sql -n name -t “P,,” 
  • Restart the web browser.

Problem: The  CA signer server certificate had the wrong subjectAltName

curl:

* subjectAltName does not match 127.0.0.1
* SSL: no alternative certificate subject name matches target host name ‘127.0.0.1’

Chrome:

NET::ERR_CERT_COMMON_NAME_INVALID
From the “Not Secure” in front of the URL, display the certificate, and check the extenstions, especially Certificate Subject Alternative Names.

Chrome log:

ERROR:ssl_client_socket_impl.cc(935)] handshake failed; returned -1, SSL error code 1, net_error -200
From here -200 is  CERT_INVALID

Problem: The mqweb server certifcate has expired

curl:

* TLSv1.2 (OUT), TLS alert, Server hello (2):
* SSL certificate problem: certificate has expired
curl: (60) SSL certificate problem: certificate has expired

chrome:

while Chrome running:   web page reports Lost communication with the server.  Could not establish communication with the server. Check your network connections and refresh your browser

restart browser, get “Your connection is not private NET::ERR_CERT_DATE_INVALID”

message.log.  Chrome session was working, then server certificate expired

  • E CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLException: Received fatal alert: certificate_unknown

Problem: The mqweb server certificate is missing extendedKeyUsage = serverAuth

curl:

* SSL certificate problem: unsupported certificate purpose
curl: (60) SSL certificate problem: unsupported certificate purpose

Chrome:

Your connection is not private
Attackers might be trying to steal your information from localhost (for example, passwords, messages or credit cards).
NET::ERR_CERT_INVALID

Chrome log:

CERT_PKIXVerifyCert for localhost failed err=-8101
From here  -8101 is Certificate type not approved for application.

ERROR:ssl_client_socket_impl.cc(935)] handshake failed; returned -1, SSL error code 1, net_error -207
From here -207 is CERT_INVALID

Problems with the client certificate

Problem: The client certificate is self signed and not in the server’s trust store

curl:

* TLSv1.2 (OUT), TLS handshake, Finished (20):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:9443

Chrome:

ERR_CONNECTION_CLOSED

Messages in messages.log:

  • I FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted” at ffdc_20.01.30_08.29.27.0.log
  •  E CWPKI0022E: SSL HANDSHAKE FAILURE: A signer with SubjectDN CN=testuser, O=aaaa, C=GB was sent from the target host. The signer might need to be added to local trust store /home/colinpaice/ssl/ssl2/trust.jks, located in SSL configuration alias defaultSSLConfig. The extended error message from the SSL handshake exception is: PKIX path building failed: java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target
  •  I FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertificateException: unable to find valid certification path to requested target com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted” at ffdc_20.01.30_08.29.27.1.log
  • E CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLHandshakeException: null cert chain

Problem: Invalid cn=, the cn value is not a valid userid.

curl message

{“error”: [{

  • “action”: “Provide credentials using a client certificate, LTPA security token, or username and password via HTTP basic authentication header. On z/OS, if the mqweb server has been configured for SAF authentication, check the messages.log file for messages indicating that SAF authentication is not available. Start the Liberty angel process if it is not already running. You might need to restart the mqweb server for any changes to take effect.”,
  • “completionCode”: 0,
  •  “explanation”: “The REST API request cannot be completed because credentials were omitted from the request. On z/OS, if the mqweb server has been configured for SAF authentication, this can be caused by the Liberty angel process not being active.”,
  • “message”: “MQWB0104E: The REST API request to ‘https://127.0.0.1:9443/ibmmq/rest/v1/login ‘ is not authenticated.”,
  • “msgId”: “MQWB0104E”,
  • “reasonCode”: 0,
  • “type”: “rest”

chrome:

It gives you a window to enter userid and password.   This looks like a bug as I have <webAppSecurity allowFailOverToBasicAuth=”false”/>.  It takes the userid and password.

Messages in  messages.log:

R com.ibm.websphere.security.CertificateMapFailedException
and 100 lines of stack trace

The certificate causing the problems, nor the userid is listed – so pretty useless.

Problem: Client certificate missing “extendedKeyUsage = clientAuth”  during signing.

curl message

* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
curl session hangs…
* Operation timed out after 300506 milliseconds with 0 out of 0 bytes received

Chrome

ERR_CONNECTION_CLOSED

message in messages.log:

  • E CWPKI0022E: SSL HANDSHAKE FAILURE: A signer with SubjectDN CN=colinpaice, O=cpwebuser, C=GB was sent from the target host. The signer might need to be added to local trust store /home/colinpaice/ssl/ssl2/trust.jks, located in SSL configuration alias defaultSSLConfig. The extended error message from the SSL handshake exception is: Extended key usage does not permit use for TLS client authentication
  •  I FFDC1015I: An FFDC Incident has been created: “java.lang.NullPointerException com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted” at ffdc_20.01.28_17.11.10.1.log

ffdc in /var/mqm/web/installations/Installation1/servers/mqweb/logs/messages.log/ffdc

Exception = java.lang.NullPointerException
Source = com.ibm.ws.ssl.core.WSX509TrustManager
probeid = checkClientTrusted
Stack Dump = java.lang.NullPointerException
at com.ibm.ws.ssl.core.WSX509TrustManager.checkClientTrusted(WSX509TrustManager.java:202)

Problem: Client certificate missing “keyUsage = digitalSignature”  during signing.

curl message

* TLSv1.2 (OUT), TLS handshake, Finished (20):
* Operation timed out after 300509 milliseconds with 0 out of 0 bytes received

message in messages.log

  • E CWPKI0022E: SSL HANDSHAKE FAILURE: A signer with SubjectDN CN=colinpaice, O=cpwebuser, C=GB was sent from the target host. The signer might need to be added to local trust store /home/colinpaice/ssl/ssl2/trust.jks, located in SSL configuration alias defaultSSLConfig. The extended error message from the SSL handshake exception is: KeyUsage does not allow digital signatures
  • FFDC1015I: An FFDC Incident has been created: “java.lang.NullPointerException com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted”
  • E CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLHandshakeException: null cert chain

ffdc in /var/mqm/web/installations/Installation1/servers/mqweb/logs/messages.log/ffdc

Exception = java.lang.NullPointerException
Source = com.ibm.ws.ssl.core.WSX509TrustManager
probeid = checkClientTrusted
Stack Dump = java.lang.NullPointerException
at com.ibm.ws.ssl.core.WSX509TrustManager.checkClientTrusted(WSX509TrustManager.java:202)

Chrome:

  • If there is one or more certificates in the keystore, the list of valid certificates does not include the problem one.
  • If there is only the problem certificate in the keystore, you get
    This site can’t be reached.
    localhost unexpectedly closed the connection.
    ERR_CONNECTION_CLOSED

CA Signed client certificate has expired

curl:

* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:9443
* stopped the pause stream!
* Closing connection 0

Chrome:

This site can’t be reached
localhost unexpectedly closed the connection.
ERR_CONNECTION_CLOSED

message in messages.log:

for curl.

  • I FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertPathValidatorException: The certificate expired at Thu Jan 30 16:46:00 GMT 2020; internal cause is:
    java.security.cert.CertificateExpiredException: NotAfter: Thu Jan 30 16:46:00 GMT 2020 com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted” at ffdc_20.01.30_17.16.11.0.log
  • E CWPKI0022E: SSL HANDSHAKE FAILURE: A signer with SubjectDN CN=colinpaice, O=cpwebuser, C=GB was sent from the target host. The signer might need to be added to local trust store /home/colinpaice/ssl/ssl2/trust.jks, located in SSL configuration alias defaultSSLConfig. The extended error message from the SSL handshake exception is: PKIX path validation failed: java.security.cert.CertPathValidatorException: The certificate expired at Thu Jan 30 16:46:00 GMT 2020; internal cause is:
    java.security.cert.CertificateExpiredException: NotAfter: Thu Jan 30 16:46:00 GMT 2020
  •  I FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertificateException: The certificate expired at Thu Jan 30 16:46:00 GMT 2020 com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted” at ffdc_20.01.30_17.16.11.1.log

for chrome:

  • I FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertificateException: The cer
    tificate expired at Thu Jan 30 16:46:00 GMT 2020 com.ibm.ws.ssl.core.WSX509TrustManager checkClientTrusted” at ffdc_20.01.30_17.16.11.1.log
  • E CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLHandshakeException: null cert chain

Bad requests

HTTP request was issued – it should have been HTTPS

curl:

curl:(52) Empty reply from server

messages.log:

E CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

 

Chrome errors

Chrome has more stricter checks than curl.  These are from Chrome browser.

NET::ERR_CONNECTION_CLOSED

  • mqwebuser.xml serverKeyAlias name not in the keystore
  • The host certificate is self signed and is not trusted
  • The client certificate is self signed and not in the server’s trust store
  • Client certificate missing “extendedKeyUsage = clientAuth”  during signing.
  • CA Signed client certificate has expired
  • Client certificate missing “keyUsage = digitalSignature”  during signing.

NET::ERR_CERT_COMMON_NAME_INVALID

  • missing x509 extensions in the server certificate
  • invalid subjectAltName in x509 extensions, for example IP:127.0.0.11  instead of IP:127.0.0.1

NET::ERR_CERT_INVALID

  • missing extendedKeyUsage = serverAuth in x509 extensions

NET::ERR_CERT_AUTHORITY_INVALID

  • Certificate is not peer.  Need certutil -M $sql -n $name -t “P,,” to change the certificate to be a trusted peer
  • Server’s self signed not found in the browser keystore.

NET::ERR_CERT_DATE_INVALID

  • The mqweb server certifcate has expired

mqweb – performance notes

  • I found facilities in Liberty which can improve the performance of your mqweb server by 1% – ish, by using http/2 protocol and ALPN
  • Ive documented where time is spent in the mq rest exchange.

Use of http/2 and ALPN to improve performance.

According to Wikipedia, Application-Layer Protocol Negotiation (ALPN) is a Transport Layer Security (TLS) extension that allows the application layer to negotiate which protocol should be performed over a secure connection in a manner that avoids additional round trips and which is independent of the application-layer protocols. It is needed by secure HTTP/2 connections, which improves the compression of web pages and reduces their latency compared to HTTP/1.x.

mqweb configuration.

This is a liberty web browser configuration, see this page.

For example

 <httpEndpoint id="defaultHttpEndpoint"
   host="${httpHost}" 
   httpPort="${httpPort}"
   httpsPort="${httpsPort}"
   protocolVersion="http/2"
   >
   <httpOptions removeServerHeader="false"/>

</httpEndpoint>

Client configuration

Most web  browsers support this with no additional configuration needed.

With curl you specify ––http2.

With curl, ALPN is enabled by default (as long as curl is built with the ALPN support).

With the curl ––verbose option on a curl request,  you get

  • * ALPN, offering h2 – this tells you that curl has the support for http2.
  • * ALPN, offering http/1.1

and one of

  • * ALPN, server did not agree to a protocol
  • * ALPN, server accepted to use h2

The “* ALPN, server accepted to use h2” says that mqweb is configured for http2.

With pycurl you specify

 c.setopt(pycurl.SSL_ENABLE_ALPN,1)
 c.setopt(pycurl.HTTP_VERSION,pycurl.CURL_HTTP_VERSION_2_0)

Performance test

I did a quick performance test of a pycurl program getting a 1024 byte message (1024 * the character ‘x’) using TLS certificates.

HTTP support Amount of “application data” sent Total data sent.
http/1.1 2414 7151
http/2 2320 7097

So a slight reduction in the number of bytes send when using http/2.

The time to get 10 messages was 55 ms with http/2, and 77ms with http/1.1,  though there was significant variation in repeated measurements, so I would not rely on these measurements.

Where is the time being spent?

cURL and pycurl can report the times from the underlying libcurl package.  See TIMES here.

The times (from the start of the request) are

  • Name lookup
  • Connect
  • Application connect
  • Pre transfer
  • Start transfer
  • Total time

Total time- Start transfer = duration of application data transfer.

Connect duration = Connect Time – Name lookup Time etc.

For a pycurl request getting two messages from a queue the durations were

Duration in microseconds First message Second messages
Name_lookup 4265 32
Connect 53 3
APP Connect 18985 0
Pre Transfer 31 42
Start Transfer 12644 11036
Transfer of application data 264 235

Most of the time is spent setting up the connection, if the same connection can be reused, then the second and successive requests are much faster.

In round numbers, the first message took 50 ms, successive messages took between 10 and 15 ms.

mqweb charts look like “work in progress”

One of the widgets you can add to a dashboard page is a chart.   This can subscribe to the published monitoring data, and displays it as a line chart.

The topics are described in my post  What data is available with the Published Monitoring data, and include

  • Platform central processing units
      • CPU performance – platform wide  …
    • CPU performance – running queue manager …
  • Platform persistent data stores
    • Disk usage – platform wide …
    • Disk usage – running queue managers …
    • Disk usage – queue manager recovery logs …
  • API usage statistics
    • MQCONNs and MQDISCs …
    • etc
  • API per-queue usage statistics
    • MQOPEN and MQCLOSE …
    • etc

The data is published every 10 seconds or so, and the charts are refreshed around every 10 seconds.

These charts seem to be a work in progress or demonstration of technology.

After you’ve added the widget you need to click in the wheel to configure the chart.

You can select the data you want to display from drop downs

  • Select top left, for the “resource class” (major category), top right for the “resource type” (minor category) second row left for the “resource element” (detail), second row right for any object.
  • For those topics that need an object, such as a queue name, you must give  a specific object, such as CP9999, not as CP999*.
  • If you change what is being displayed, you have to select all of the data, for example,  I was collecting API usage statistics, get count per queue, and wanted to collect API usage statistics (on all queues).  I found I was changing the major category, clicking save, and getting the wrong data displayed.
    • Click on the cog
    • Select API resource class: “API usage statistics
    • Resource type: defaults to “MQCONN and MQDISC“, so select “MQGET
    • The Resource element defaults to “Interval total destructive get- count”.  I want this, so select Save.

View finder=select time range

From the cog, you can select or hide the “view finder”.  I would have called “view finder”  “select time range”.  If you “show” view finder, instead of one chart 6 cm high, the main chart is squashed into 5 cm, and there is a 1 cm squashed version below it.  It took me a while to find what this view finder is for.  If you click+hold on the “view finder” graph, and drag left or right,  the mini chart becomes a grey box with a slider.  Drag, right or left,  allows you to select the range which is displayed in the big chart.

  • To reset the window click somewhere else in the view finder chart.
  • As the small chart displays the whole time period available to you, you can drag the slider to an interesting area to allow you to drill into it.
  • You can click and drag the time range bar/gap in the view finder, so you get the same time duration, but at an earlier or later time.
  • As more data is added to the right hand side of the chart,  the time slide moves to the left over time
  • I could not find how to display just the last 5 minutes worth of data, as the window moved.
  • With the <variable name=”ltpaExpiration” value=”30″/> configuration, you get logged off after 30 minutes.  With certificates you get logged on again, but the time interval is reset, and you lose the historical data.  In this case you get no more than 30 minutes worth of data displayed.

Have data for more than one queue manager on a chart

  • If you have more than one queue manager active, you can display data from more than one queue manager.
  • A queue manager has to be active to be able to add it to a chart.   You can then stop the queue manager, and the chart will remember your selection.
  • You can select a colour for each queue manager, so for example have QMA in red, and QMB in blue, on the same chart.
  • On a chart, if you click on the circle in front of a queue manager, the circle changes from solid to empty. Click again and it goes solid.   When circle is solid the data is displayed on the chart, when it is empty, the data for that queue manager is not displayed.
    • The displayed time range may change.
  • If you move your cursor over the chart, a grey line will appear, and jump to data points, so you can see the data at that point.  It only jumped to data for one colour.
  • The number at the top of the y axis is the maximum value  displayed.

Some descriptions could be clearer

  • Some of the descriptions could be clearer, for example “interval total destructive get – count” is clear,  but “Failed MQGET – count”, is presumable for the interval as well.  I think these come from the published data.   (I found it easier to create better descriptions when I processed the published monitoring data)

The data for multiple queue managers may not be synchronised

  • For example “Platform CPU,  CPU performance – platform wide” when I had two queue managers, gave me two lines, which tend to follow each other.   The data is collected at different times (for example 10:00:03, and 10:00:07), so the data is different at each point.
  • The answer is easy – for system wide metrics just select one queue manager.

You can rename charts

  • If you hover over the title, you can click on the pencil to rename the widget.  If you clear the name it resets it to the default.
  • I changed it to “COLINs LAPTOP CPU”, then, later,  changed the chart to disk space usage.  The title “COLINs LAPTOP CPU ” was no longer relevant, so I clicked on the pencil icon, and cleared the title, and got the default chart description back.

Refresh window

  • If you refresh the window, all the charts are reset and historical data may be lost.
  • If a queue manager was stopped and restarted, refreshing the window will cause the subscription for all of the charts in the window to be reissued.

Some “hmm interesting” observations.

  • Some of the y axis data was strange.   When starting to collect some data, I had number of gets 2050, when the number of gets in the last hour was about 5.   This is from the published data.  Since data was published for the queue (over 1 week ago) there were 2050 gets from the queue since them.   The published data reported this value, and reset it.
  • Because I had <variable name=”ltpaExpiration” value=”30″/>, after 30 minutes the windows gets logged off.  Because I was using digital certificates it automatically logs on again.
  • I stopped one queue manager and restarted it.  In some charts, the data for that queue manager stopped being displayed. On other charts it was displayed successfully.
    • You need to go to the settings cog, and click save.   This reissues the subscription.
  • If you click on a the box with an arrow in it (“browse data”), by the cog, you can display the data for that chart.  Select a queue manager from the pull down list.   If you type “a” you can select all of the data – you cannot copy it to the clip board.
    • If you click on the column heading ( Timestamp or Data) you can sort ascending or descending.
  • If you select API stats for a specific queue, it does not display which queue is being displayed.
  • Sometimes data is missing.  I could see a line was missing some data.   Using the “browse data” box, I could see one queue manager had data from 13:14,  another had data from 13:15.   Both queue managers were running while I had my lunch.
  • The chart “MQ trace file system – bytes in use” reported 16KB of data – when I had over 160KB of *.trc data.  If if was for the file system, then it is a very small file system.  I do not understand this metric.

mqweb what’s the difference between the message API and the admin API?

At first glance it looks like the answer is in the question.  You can use

  • the messaging REST API put and get messages
  • the admin REST API to administer queue manager objects

In a couple of places the IBM documentation says you can use the messaging API to administer your objects, which is true at the general sense, but not the specific sense.  Until I hit a problem I thought there was one “messaging REST API” with different flavors of syntax.

Security

The admin API authorisation is managed through <security-role name=”MQWebAdmin”> and <security-role name=”MQWebAdminRO”> sections in the mqwebuser.xml file.

The messaging API authorisation is managed through <security-role name=”MQWebUser”> sections.

Access to resources is done using the Alternate Userid.  I can see in the activity trace that the userid is colinpaice(the id mqweb is running under), but the open of a queue was done with alternate userid testuser.  When I tried to browse messages on a queue, I got a message saying my userid did not have the correct authority. I used setmqaut, and mqsc command refresh security(*) to resolve it.

Cost of the admin interface

The admin interface has a request like

https://127.0.0.1:9443/ibmmq/rest/v1/admin/qmgr/QMA/queue/CP0000?attributes=*

which returns all of the attributes of the queue CP0000.  From the activity trace we can see

  • MQCONN + MQDISC
  • MQOPEN, MQINQ, MQCLOSE of the manager object – twice
  • MQOPEN, MQPUT, MQCLOSE to the SYSTEM.ADMIN.COMMAND.QUEUE
  • MQOPEN, MQGET, MQCLOSE to the SYSTEM.REST.REPLY.QUEUE
  • MQCMIT
  • MQBACK – the JMQI code always does this to be sure that there is no outstanding unit of work,

The most expensive request is the MQCONNect.

Using the admin interface is fine for administration because changes to objects are usually done infrequently.   If you are considering the admin interface to monitor objects, for example plot queue depths over time, the mq rest API may not be the best solution.

Cost of the messaging interface

The messaging API interface uses connection pooling.   When the application does an MQDISC, the connection is returned to a pool, and can be reused if the same userid does an MQCONN.  If the connection is not used for a period, it can be removed from the pool and an MQDISC done to release the connection.    This should eliminate frequent MQCONN and MQDISCs.

From the activity trace we see

MQOPEN, MQGET,MQGET,MQCLOSE of the queue, and no MQCONN.

There will be an MQCONN, is there is no connection available for that userid in the pool, but this should be infrequent.

Python and mq REST api

I found cURL a good way of using the mq REST API, but I wanted to do more.  cURL depends on a package called libcurl, which can be used by other languages.

Python seemed the next obvious place to look.

As I have found out, using digital certificates for authentication is hard to set up, and using signed certificates is even harder.  As I had done the hard work of setting up the certificates before I tried curl and Python, the curl and Python experience was pretty easy.

I looked at using the Python “request” package.   This allows you to specify most of the parameters that libcurl needs, except it does not allow you to specify the password for the user’s keystore.

I then looked at the Python package pycurl package.    This is a slightly lower level API, but got it working in an hour or so.
My whole program is below.

During the testing I got various errors, such as “77”.  These are documented here. 

The messages were clear, for example

CURLE_SSL_CACERT_BADFILE (77) Problem with reading the SSL CA cert (path? access rights?).

Which was enough to tell me where to look.

All the things you can do with curl, you can do with pycurl.

 

# program - based on code in http://pycurl.io/docs/latest/quickstart.html
import sys
import pycurl

from io import BytesIO

# header_function take from http://pycurl.io/docs/latest/quickstart.html
headers = {}
def header_function(header_line):
# HTTP standard specifies that headers are encoded in iso-8859-1.

header_line = header_line.decode('iso-8859-1')

# Header lines include the first status line (HTTP/1.x ...).
# We are going to ignore all lines that don't have a colon in them.
# This will botch headers that are split on multiple lines...
if ':' not in header_line:
  return

# Break the header line into header name and value.
name, value = header_line.split(':', 1)
print("header",name,value)

home = "/home/colinpaice/ssl/ssl2/"
ca=home+"cacert.pem"
cert=home+"testuser.pem"
key=home+"testuser.key.pem"
cookie=home+"cookie.jar.txt"
url="https://127.0.0.1:9443/ibmmq/rest/v1/admin/qmgr/QMA/queue/CP0000?attributes=type"
buffer = BytesIO()
c = pycurl.Curl()
print("C=",c)
try:
  # see option names here https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
  # PycURL option names are derived from libcurl
  # option names by removing the CURLOPT_ prefix. 
  c.setopt(c.URL, url) 
  c.setopt(c.WRITEDATA, buffer) 
  c.setopt(pycurl.CAINFO, ca) 
  c.setopt(pycurl.CAPATH, "") 
  c.setopt(pycurl.SSLKEY, key) 
  c.setopt(pycurl.SSLCERT, cert) 
  c.setopt(pycurl.SSL_ENABLE_ALPN,1)
  c.setopt(pycurl.HTTP_VERSION,pycurl.CURL_HTTP_VERSION_2_0)
  c.setopt(pycurl.COOKIE,cookie) 
  c.setopt(pycurl.COOKIEJAR,cookie) 
  c.setopt(pycurl.SSLKEYPASSWD , "password") 
  c.setopt(c.HEADERFUNCTION, header_function)  
# c.setopt(c.VERBOSE, True)
  c.perform() 
  c.close()
except Exception as e: 
  print("exception :",e ) 
finally: 
  print("done") 
body = buffer.getvalue() # Body is a byte string. 
# We have to know the encoding in order to print it to a text file 
# such as standard output. 
print(body.decode('iso-8859-1'))

 

Getting mqweb into production

You’ve got mqweb working,  you can now do administration using the REST API, or use a web browser in your sandbox environment to manage a queue manager.  You now want to get it ready for production – so where do you start?

I’ll document some of the things you need to do.  But to set the the scene, consider your environment

  • Production and test
  • Two major applications, accounts and payroll
  • You have multiple machines for each application, providing high availability and scalability
  • Teams of people
    • The MQ administration team who can do anything
    • The MQ RO administration team who can change the test systems, but have read only access to production
    • The applications teams who can change their test environment, but only have read only access to production
  • You will use signed certificates (because this is production)  and not use passwords.
  • People will get the same dashboard,  to make training and use easier.
  • You want to be able to quickly tell if a dashboard is for production or test, and accounts and payroll
  • You want to script deployment, so you deployment to production can be done with no manual involvement.
  • You want a secure, available solution.

The areas you need to consider are

  • the mqwebuser.xml file
  • the keystore for the mqweb certificate
  • the trust store for the authorisation certificates
  • the dashboard for each user
  • each user’s certificate store with their private keys
  • Displaying the statistics on the mq console and REST requests.

Setting up security

It is better to give access using groups rather than by individual ids.

  • If some one joins or leaves a team, you have to update one group, rather than many configuration files.
    • This is easier to do, and is easier to audit
  • The control is in the right place.  For example the manager of the accounts team should mange the accounts group.  The MQ team should not be doing userid administration on the accounts group.

You will need groups for

  • MQ Systems Administrators who can administer production and test machines
  • MQ Systems  RO Administrators,  who can administer test machines, and have read access to production machines.
  • Payroll – the applications manager may want more granular groups.
  • Accounts  – the applications manager may want more granular groups.

You will need to set up the groups on each machine (you may well have this already).

Queue security

REST users need get and put access to SYSTEM.REST.REPLY.QUEUE.

For example

setmqaut -m QMA -n SYSTEM.REST.REPLY.QUEUE -t q -g test +get +put

then runmqsc refresh security

Set up the mqwebuser.xml file

The same file can be used for the different machines for “Accounts – production”, and a similar file for “Accounts – test” etc.

You may want to use “include”  files, so have one file imbedded in more than one mqwebuser.xml file.

Do not use the setmqweb command.   This will update the copy on the machine, and it will be out of sync with the master copy in your repository.

Define roles

The production environment for payroll may have

 <security-role name="MQWebAdmin">
   <group name="MQSA"/>
</security-role>

<security-role name="MQWebAdminRO">
  <group name="MQSARO"/>
  <group name="PAYROLL"/>
</security-role>

The test environment for payroll may have

<security-role name="MQWebAdmin">
   <group name="MQSA"/>
   <group name="MQSARO"/>
   <group name="PAYROLL"/>
</security-role>

<security-role name="MQWebAdminRO">
  <!-- none -all admin users can change test-->
</security-role>

Define http settings

By default mqweb is set up for localhost only.  You will need to have

  • <variable name=”httpHost” value=”hostname” />

where hostname specifies the IP address, domain name server (DNS) host name with domain name suffix, or the DNS host name of the server where IBM MQ is installed. Use an asterisk, *, to specify all available network interfaces.

You may need to change the port value from

  • <variable name=”httpsPort” value=”9443″/>

Define the keystore in mqwebuser.xml

Decide on the names and location of the key stores

  • <keyStore id=”defaultKeyStore” location=”/home/mq/payrollproductionkeystore.p12” type=”pkcs12″ password=”{aes}AMsUYgpOjy+rxR7f/7wnAfw1gZNBdpx8RpxfwjeIG8Wj”/>
  • <keyStore id=”defaultTrustStore” location=”/home/mq/payrollproductiontruststore.jks” type=”JKS” password=”{aes}AJOmiC8YKMFZwHlfJrI2//f2Keb/nGc7E7/ojSj37I/5″/>

Encrypt the keystore passwords  using the /opt/mqm/web/bin/securityUtility command. See here.

Ensure the deployment process gives the files the appropriate access.  The key store includes the private key, so needs to be protected.  The trust store should only have information in the “public” domain, such as certificates and no private keys, so could be universally read.

Set up the keystores

The keystore has the certificate and private key which identifies the server.  The certificate needs the subjectAltName specified which has a list of valid url names and IP addresses.
You need to decide if you want one certificate per server, and so have several certificates

subjectAltName = DNS:payroll1, IP:10.4.6.1

or several systems in the list, and have one certificate

subjectAltName = DNS:payroll1, DNS:payroll2, IP:10.4.6.1,  IP:10.9.5.4

You may want to create the keystore on your build environment, and securely deploy it to the run time machines, or send the .p12 file across and import it.  I think creating the keystore and deploying it is more secure.

If you change the keystore you have to restart mqweb to pickup changes.

Set up the trust store.

The trust store is used to validate certificates sent from the client for authentication.  In an ideal work, this will have just one CA certificate.  You may have more than one CA.  If you have self signed certificates this creates a management problem.

You may be able to use the same trust store for all your environments.   The access control is done by the security-roles in the mqwebuser.xml, not by the trust store.

The cn from the certificate is used as the userid. So both

cn=colinpaice,ou=test,o=sss and cn=colinpaice,ou=prod,o=sss are valid, and would extract userid colinpaice.

If the trust store is changed, the mqweb server needs to be restarted.

End user certificates

Each user will need a certificate to be able to access the mqweb server.  This needs to be signed by your CA, and needs to be set to trusted.  You should have this set up already.

If you have more than one valid certificate in the browser store, you will be prompted to pick one.   This is used until the browser is restarted.

You can configure mqweb to log off users after a period.   If you are using certificates, the browser will automatically log you on again!

Dashboard

The dashboard is the layout of the mqweb window, the tabs in the window, and the widgets on the tabs.

You will generally want users to have the template you define, and not have to create their own. So the Payroll team use the payroll dashboard, and the MQ admin team use the MQADMIN dashboard.

Create a dashboard and use export to create a json file.   You can store in your configuration repository.   You can change queue manager names as you deploy it for example change QMPayroll1 to QMPayroll2.

On the MQ machines these files are stored in the  /var/mqm/web/installations/Installation1/angular.persistence directory.

You can put your templates for that machine in this directory, and use symbolic links for a userid to their dashboard. For example

ln -s common.json colinpaice.json

If the dashboard.json is made read only, then people will not be allowed to change it online.

 

Is this dashboard for production or test?

I could not find a way of customise the colours of a page, so you cannot easily tell which is production and which is test etc.

I need a secure available solution.

You can use userids and passwords, or certificates to provide authentication.

You need to protect access to MQ objects

You need to protect the files used by mqweb, especially the key store, and the mqwebuser.xml

If you update the mqwebsuser.xml file, it will pickup up changes a short while later (seconds rather than minute).

If you change the keystore or trust store you need to restart mqweb to pick up the changes.   This should take about 10s of  seconds.

Deploy scripts

All of the configuration can be done with scripts.  For example extract your mqwebuser.xml file, make machine specific changes and deploy it.

You can create the keystores in your secure build environment and deploy them.

House keeping

  • You should check /var/mqm/web/installations/Installation1/servers/mqweb/logs/ffdc daily for any files, and raise a PMR with PMR if you get any exceptions.
  • Check /var/mqm/web/installations/Installation1/servers/mqweb/ daily.  I was getting large (700MB) dumps in this directory, which caused my machine to go short on disk space.
  • Display the server certificate expiry date (any any CA certificates) and put a date in your diary to check (and renew) them.
  • Your enterprise should have a process for renewing personal certificates

Someone joins the department

  • Connect them to the appropriate group on all machines
  • Give them a symbolic link to the appropriate dashboard file, in /var/mqm/web/installations/Installation1/angular.persistence

Collect statistics on the MQ console and REST requests, and the JVM

See these posts