How do I use the mid-range MQ accounting data?

Like many topics I look at, this topic seems much harder than I expected.  I thought about this as I walked the long way to the shops, up a steep hill and down the other side (and came back the short way along the flat).  I wondered if it was because I ignore the small problems (like one queue manager and two queues), and always look at the bigger scale problems, for example with hundreds of queue managers and hundreds of queues.

You have to be careful processing and interpreting the data, as it is easy to give an inaccurate picture; see “Be careful” at the bottom of this post.

Here are some of the things I found out about the data.

Enabling accounting trace

  • You need to alter qmgr acctq(ON) acctmqi(ON) before starting your applications (and channels). Enabling it after a program has started running does not capture the data for existing tasks. You may want to enable accounting trace, then shut down and restart the queue manager to ensure that all channels are capturing data.
  • The interval ACCTINT is a minimum time. The accounting records are produced after this time has passed, and only when there is some MQ activity. If there is no MQ activity, no accounting record is produced, so be careful calculating rates, for example messages processed per second.
  • Altering ACCTINT does not take effect until after the current accounting record has been produced. For example, if you have ACCTINT(1800) and you change it to ACCTINT(60), the current 1800-second interval has to expire and produce the records before ACCTINT(60) becomes operative.
  • You can use the queue attribute ACCTQ(OFF) to disable the queue accounting information for a queue.

The data

  • The MQI accounting has channel and connection information, the queue accounting does not.  This means you cannot tell the end point from just the queue data.
  • The MQI accounting and queue accounting records have a common field, the “connectionId”. It looks like this is made up of the first 12 characters of the queue manager name and a unique large number (perhaps a time stamp). If you are using many machines with similar long queue manager names, you may want to use a combined field of machineName.connectionId to make this truly unique.
  • I had an application using a dynamic reply-to queue. I ran this application 1000 times, so 1000 dynamic queues were used. When the application put to the server and used the dynamic reply queue, the server had a queue entry for each dynamic queue. There were up to 100 queue sections in each queue record, and 11 queue accounting messages for the server (1000 dynamic queues and one server input queue). These were produced at the end of the accounting interval, and they all had the same header information (connectionId, start time, etc.). You do not know in advance how many queue records there will be.
  • Compare this to using a clustered queue on a remote queue manager: the server's queue accounting record on the remote system had just two queues, the server input queue and the SYSTEM.CLUSTER.TRANSMIT.QUEUE.
    • The cluster receiver channel on the originator’s queue manager had a queue entry for each dynamic queue.
  • In all my testing the MQI record was produced before the queue accounting record for a program. If you want to merge the MQI and the queue records, save information from the MQI record in a table keyed by the connectionId. When the queue records come along, use the same connectionId key to get the connection information and MQI data.
    • You can delete the MQI entry from your table when a queue record has fewer than 100 queues, as this must be the last record of the series.
    • If the queue record has exactly 100 queues, you do not know whether it is in the middle of a series or the last of the series. To prevent a storage leak, you may want to store the time in the table and have a timer periodically delete these entries after a few seconds, or just restart the program once a day.
  • The header part of the data has a “sequenceNumber” field.    This  is usually incremented with every set of records.
  • On the SYSTEM.ADMIN.ACCOUNTING.QUEUE, messages for different program instances can be interleaved, for example client1 MQI, client2 MQI, client1 Queue, client3 MQI, client2 Queue, client1 Queue, client3 Queue.
  • You do not get information about the queue name as used by the application; the record has the queue name as used by the queue manager (which may be the same as the one the application used). For example, if your program uses a QALIAS that resolves to a clustered queue, the queue record will have the name of the queue actually used, SYSTEM.CLUSTER.TRANSMIT.QUEUE, not the name the application used.
    • You can use the activity trace to get a profile of which queues an application uses, and feed this into your processing.
  • You do not get information about topic or subscription names.
  • You may want to convert the connectionName from IP address format (such as 127.0.0.1) to a domain name.
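A minimal Python sketch of the table approach described above follows. It assumes each record has already been reduced to a simple dict with hypothetical keys machine, connectionId, kind ("MQI" or "Queue") and queues; the real amqsevt JSON is laid out differently, so you would need a small mapping step first. It keys the table on machine.connectionId, treats a queue record with fewer than 100 queues as the end of a series, and expires entries whose last record had exactly 100 queues.

```python
import math
import time

QUEUES_PER_RECORD = 100  # the most queue sections I saw in one queue record


def records_expected(queue_count):
    """Queue accounting messages for one connection, e.g. 1001 queues -> 11."""
    return math.ceil(queue_count / QUEUES_PER_RECORD)


class AccountingMerger:
    """Join MQI and queue accounting records on machine.connectionId.

    Records are hypothetical simplified dicts with keys 'machine',
    'connectionId', 'kind' ('MQI' or 'Queue') and, for queue records,
    'queues'. This is not the amqsevt JSON layout.
    """

    def __init__(self, max_age_seconds=10.0, clock=time.monotonic):
        self.pending = {}  # key -> (time last seen, MQI record)
        self.max_age = max_age_seconds
        self.clock = clock

    @staticmethod
    def key(record):
        # connectionId alone can clash across machines with similar qmgr names
        return f"{record['machine']}.{record['connectionId']}"

    def add(self, record):
        """Store an MQI record, or return (mqi, queue) pairs for a queue record."""
        k = self.key(record)
        if record["kind"] == "MQI":
            self.pending[k] = (self.clock(), record)
            return []
        stored = self.pending.get(k)
        mqi = stored[1] if stored else None
        merged = [(mqi, queue) for queue in record["queues"]]
        if len(record["queues"]) < QUEUES_PER_RECORD:
            self.pending.pop(k, None)  # fewer than 100: last record of the series
        elif stored:
            self.pending[k] = (self.clock(), mqi)  # exactly 100: more may follow
        return merged

    def expire(self):
        """Drop entries not seen for max_age seconds; return how many were dropped."""
        now = self.clock()
        stale = [k for k, (seen, _) in self.pending.items() if now - seen > self.max_age]
        for k in stale:
            del self.pending[k]
        return len(stale)
```

The clock is passed in so the expiry logic can be tested without waiting; in a long-running program you would call expire() from a periodic timer.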

Using the data in your enterprise

You need to capture and reduce the data into a usable form.

From a central machine you can use the amqsevt sample over a client channel for each queue manager and output the data in json format.
I used a python script to process this data. For example:

  • Output the data into a file named in the format yymmdd.hh.machine.queueManager.json. You can then use a program like jq to take the json data for a particular day (or hour of day) and merge the output from all your queue managers into one stream for reporting.
    • You get a new file every day (or every hour) for each queue manager, which allows you to archive or delete old files.
  • Depending on what you want to do with the data, you need to select different fields. You may not be able to summarise the data by queue name, as you may find that all applications are using clustered queues, and so everything is reported as SYSTEM.CLUSTER.TRANSMIT.QUEUE. You might consider:
    • ConnectionName – the IP address where the client came from
    • Channel Name
    • Userid
    • The queue manager where the application connected to
    • The queue manager group (see below)
    • The program name
    • Record start time, and record end time
    • The interval of the record – for a client application, this may be 1 second or less.  For a long running server or channel this will be the accounting interval
    • Number of puts that worked
    • Number of gets that worked
    • Number of gets that failed
    • Queue name used for puts
    • Queue name used for gets
  • You can now display interesting things like
    • Over a day, the average number of queue managers used by an application. Was this just one (a single point of failure), mostly one (an imbalance), or spread across multiple queue managers (good)?
    • Whether an application processed a few messages and ended, and repeated this frequently. If so, you can review the application and get it to stay connected for longer, avoiding expensive MQCONN and MQDISC calls, and so saving CPU.
    • Did an application stay connected to one queue manager all day? It is good practice to disconnect and reconnect, perhaps hourly, to spread the connections across the available queue managers.
    • You could charge departments for usage based on userid, or program name, and charge on the number or size of messages processed and connections (remembering that an MQCONN and MQDISC are very expensive).
    • You may want to group queue managers  so the queue managers FrontEnd1, FrontEnd2, FrontEnd3 would be grouped as FrontEndALL.   This provides summary information.
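The sketch below shows two of the steps above in Python: building the yymmdd.hh.machine.queueManager.json file name, and summarising by queue manager group, program and userid. The record keys (qmgr, program, userid, puts, gets, failedGets) and the rule that strips trailing digits to form a group name are assumptions for illustration; in practice you would probably use an explicit lookup table for groups.

```python
import re
from collections import defaultdict


def output_file_name(when, machine, qmgr):
    """yymmdd.hh.machine.queueManager.json for a datetime 'when'."""
    return f"{when:%y%m%d.%H}.{machine}.{qmgr}.json"


def qmgr_group(qmgr):
    """Hypothetical rule: FrontEnd1, FrontEnd2, FrontEnd3 -> FrontEndALL."""
    return re.sub(r"\d+$", "", qmgr) + "ALL"


def summarise(records):
    """Totals per (queue manager group, program, userid)."""
    totals = defaultdict(lambda: {"puts": 0, "gets": 0, "failedGets": 0, "qmgrs": set()})
    for r in records:
        t = totals[(qmgr_group(r["qmgr"]), r["program"], r["userid"])]
        t["puts"] += r["puts"]
        t["gets"] += r["gets"]
        t["failedGets"] += r["failedGets"]
        t["qmgrs"].add(r["qmgr"])  # which queue managers this application used
    return dict(totals)
```

Collecting the set of queue managers per application makes it easy to report whether an application used one queue manager (a single point of failure) or was spread across several.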

Be careful

The time interval of an accounting record can vary. For example, suppose you are collecting data at hh:15 and hh:45, and your business hours are 0900 to 1700. If there is no traffic after 1700, and the next MQ request is at 0900 the next day, the accounting record will cover 1645 today to 0901 tomorrow.

  • If you are calculating rates, for example messages per second, the rates will be wrong, as the time interval is too long.
  • The start times of an interval may vary from day to day.  Yesterday it may have been 0910 to 0940, today it is 0929 to 0959.  It makes it hard to compare like for like.
  • If you try to partition the data into hourly buckets (0900 to 1000, 1000 to 1100, etc.), a record from 0950 to 1020 would put a third of its numbers in the 0900 to 1000 bucket and the other two thirds in the 1000 to 1100 bucket. A record from 1645 today to 0901 tomorrow would spread its data across the night, and so give a false picture.
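Here is a Python sketch of the proportional split described in the last bullet, with a guard added: a record longer than a chosen limit (45 minutes here, which is my assumption) is not spread at all, and the caller has to decide what to do with it.

```python
from datetime import timedelta


def split_into_hours(start, end, count, max_interval=timedelta(minutes=45)):
    """Spread count over hourly buckets in proportion to the time in each hour.

    Returns None if the record is longer than max_interval, because
    spreading, say, 1645 to 0901 across the night gives a false picture.
    """
    if end - start > max_interval:
        return None
    total = (end - start).total_seconds()
    buckets = {}
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # Seconds of this record that fall inside [hour, next_hour).
        overlap = (min(end, next_hour) - max(start, hour)).total_seconds()
        if overlap > 0:
            buckets[hour] = count * overlap / total
        hour = next_hour
    return buckets
```

For the 0950 to 1020 example, a count of 90 gives 30 in the 0900 bucket and 60 in the 1000 bucket, while a 1645 to 0901 record returns None rather than being spread across the night.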

You may want to set your accounting interval to 30 minutes, and restart your servers at 0830, so they are recording on the hour and half hour. At the end of the day, shut down and restart your servers to capture all of the data in the correct time interval bucket.

When is mid-range accounting information produced?

I was using the mid-range accounting information to draw graphs of usage, and I found I was missing some data.

There is a “Collect Accounting” time for every queue every ACCTINT seconds (default 1800 seconds = 30 minutes). After this time, any MQ activity causes the accounting record to be produced. This does not mean you get records every half hour as you do on z/OS; it means you get records with a minimum interval of 30 minutes for long running tasks.

Setup

I had a server which got from its input queue and put a reply message to the reply-to-queue.

Once a minute, an application started, put messages to this server, got the replies, and ended.

When are the records produced?

Accounting data is produced (if collecting is enabled) when:

  • an MQDISC is issued, either explicitly or implicitly
  • for long running tasks, the accounting record(s) seem to be produced when the current time is past the “Collect Accounting” time and there has been some MQ activity. For example, there were accounting records for a server at the following times:
    • The queue manager was started at 12:35:51, and the server started soon afterwards
    • 12:36:04 to 13:06:33.   An application put a message to the server queue and got the response back.   This is 27 seconds after the half hour
    • 13:06:33 to 13:36:42. The application had been putting messages to the server and getting the responses back. This is 6 seconds after the half hour.
    • 13:36:42 to 14:29:48: this interval is 53 minutes. The server did no work from 1400 to 14:29:48 (as I was having my lunch). At 14:29:48 a message arrived, and the accounting record was written for the server.
    • 14:29:48 to 15:00:27 during this time messages were being processed, the interval is just over the 30 minutes.
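To spot these long intervals in your own data, you can compute the length of each record and flag any that are much longer than ACCTINT. This Python sketch uses the server times listed above and a limit of 1.5 × ACCTINT (45 minutes); the limit is my assumption. It parses times without dates, so it does not handle an interval that crosses midnight; real records carry full time stamps.

```python
from datetime import datetime

ACCTINT = 1800  # seconds; the default accounting interval

# Record start and end times for the server, as listed above.
SERVER_INTERVALS = [
    ("12:36:04", "13:06:33"),
    ("13:06:33", "13:36:42"),
    ("13:36:42", "14:29:48"),
    ("14:29:48", "15:00:27"),
]


def interval_report(intervals, limit=1.5 * ACCTINT, fmt="%H:%M:%S"):
    """Return (start, end, seconds, too_long) for each record."""
    report = []
    for start, end in intervals:
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        seconds = int(delta.total_seconds())
        report.append((start, end, seconds, seconds > limit))
    return report


for start, end, seconds, too_long in interval_report(SERVER_INTERVALS):
    print(f"{start} to {end}: {seconds // 60} min {seconds % 60} s", "LONG" if too_long else "")
```

With these times the third interval comes out as 3186 seconds (53 minutes 6 seconds) and is the only one flagged.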

What does this mean?

  • If you want accounting data with an interval “on the half hour”, you need to start your queue manager “just before the half hour”.
  • Data may not be in the time period you expect. If an accounting record is produced at 1645, the data collected between 1645 and 17:14 may not appear until the first message is processed the next day. The record has an interval from 16:45 to 09:00:01 the next day. You may not want to display averages if the interval is longer than 45 minutes.
  • You may want to stop and restart the servers every night to have the accounting data in the correct day.

 

mqweb – how to get a chrome browser trace

How to get a chrome trace

See Troubleshooting Chrome network issues  and the description here on how to collect a trace.

  • Open a tab with the chrome://net-export/ url.
  • Click start logging to disk
  • Select a file location
  • In another tab select the mqweb url
  • Click on the “stop” button in the chrome://net-export/ tab
  • If you select show file – it opens the json file.   This has all the information you need to process the file, but it is much easier to use the provided tools
  • The filename is given, for example “FILE: /home/colinpaice/Downloads/chrome-net-export-log.json”
  • Click on the “The log file can be loaded using the netlog_viewer.” link. This gets you to a page which says:
    “This app loads NetLog files generated by Chromium’s chrome://net-export. Log data is processed and visualized entirely on the client side (your browser). Data is never uploaded to a remote endpoint.”
  • Select  https://netlog-viewer.appspot.com/ to invoke the formatter.
  • Drag your netlog file, or use “choose file”
  • Select events, and this displays all of the traffic
  • In the search bar at the top, enter your port (9443), or “error”
  • You get a list like
    NONE HOST_RESOLVER_IMPL_REQUEST
    1083 URL_REQUEST https://127.0.0.1:9443/ibmmq/console/
    1084 DISK_CACHE_ENTRY
    1085 HTTP_STREAM_JOB_CONTROLLER https://127.0.0.1:9443/
    1086 HTTP_STREAM_JOB https://127.0.0.1:9443/
  • If the background  is pale green – it is good.  If it is pink (pale red) there was a problem.
  • Click on a line and it displays trace information in a window.  For example the first URL_REQUEST gave
    • t= 8 [st= 8]        HTTP_STREAM_JOB_CONTROLLER_BOUND
                          --> source_dependency = 1089 (HTTP_STREAM_JOB_CONTROLLER)
      t=65 [st=65]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                          --> source_dependency = 1090 (HTTP_STREAM_JOB)
      t=65 [st=65]     -HTTP_STREAM_REQUEST
      t=65 [st=65]      URL_REQUEST_DELEGATE_SSL_CERTIFICATE_ERROR  [dt=1]
      t=66 [st=66]      CANCELLED
                        --> net_error = -200 (ERR_CERT_COMMON_NAME_INVALID)
      t=66 [st=66]   -URL_REQUEST_START_JOB
                      --> net_error = -200 (ERR_CERT_COMMON_NAME_INVALID)
      t=66 [st=66]    URL_REQUEST_DELEGATE_RESPONSE_STARTED  [dt=0]
      t=66 [st=66] -REQUEST_ALIVE
      
    • SSL_CONNECT_JOB gave me
      1087: SSL_CONNECT_JOB
      ssl/127.0.0.1:9443
      Start Time: 2020-01-29 08:41:25.699
      t= 1 [st= 0] +CONNECT_JOB  [dt=64]
      t= 1 [st= 0]    SOCKET_POOL_CONNECT_JOB_CREATED
                      --> backup_job = false
                      --> group_id = "ssl/127.0.0.1:9443"
      t= 1 [st= 0]   +SSL_CONNECT_JOB_CONNECT  [dt=64]
      t= 1 [st= 0]     +TRANSPORT_CONNECT_JOB_CONNECT  [dt=0]
      t= 1 [st= 0]        HOST_RESOLVER_IMPL_REQUEST  [dt=0]
                          --> address_family = 0
                          --> allow_cached_response = true
                          --> host = "127.0.0.1:9443"
                          --> is_speculative = false
      t= 1 [st= 0]        CONNECT_JOB_SET_SOCKET
      t= 1 [st= 0]     -TRANSPORT_CONNECT_JOB_CONNECT
      t=65 [st=64]      CONNECT_JOB_SET_SOCKET
      t=65 [st=64]   -SSL_CONNECT_JOB_CONNECT
                      --> net_error = -200 (ERR_CERT_COMMON_NAME_INVALID)
      t=65 [st=64] -CONNECT_JOB
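If you have a large netlog file, a short script can list the events that carry a net_error. This Python sketch assumes the exported JSON has a top-level constants object (with logEventTypes and netError tables mapping names to numbers) and an events list whose entries have type, source.id and optional params; check this against your own file, as the format is Chrome-internal and may change.

```python
import json
from collections import defaultdict


def find_net_errors(netlog):
    """Map source id -> [(event name, error name)] for every event with a net_error."""
    constants = netlog.get("constants", {})
    # The constants tables map names to numbers; invert them to decode events.
    event_names = {num: name for name, num in constants.get("logEventTypes", {}).items()}
    error_names = {num: name for name, num in constants.get("netError", {}).items()}
    errors = defaultdict(list)
    for event in netlog.get("events", []):
        code = event.get("params", {}).get("net_error")
        if code is None:
            continue
        name = event_names.get(event.get("type"), str(event.get("type")))
        errors[event["source"]["id"]].append((name, error_names.get(code, str(code))))
    return dict(errors)


def print_net_errors(path):
    """Load a chrome://net-export file and print one line per failing event."""
    with open(path) as f:
        netlog = json.load(f)
    for source_id, items in find_net_errors(netlog).items():
        for name, error in items:
            print(source_id, name, error)
```

Call print_net_errors("/home/colinpaice/Downloads/chrome-net-export-log.json"); entries such as ERR_CERT_COMMON_NAME_INVALID should then stand out without scrolling through the viewer.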
      

Understanding Chromium trace and performance data

I found this link very useful to explain the developer information, such as trace, performance etc.

mqweb – performance notes

  • I found facilities in Liberty which can improve the performance of your mqweb server by roughly 1%, by using the http/2 protocol and ALPN.
  • I've documented where time is spent in the MQ REST exchange.

Use of http/2 and ALPN to improve performance.

According to Wikipedia, Application-Layer Protocol Negotiation (ALPN) is a Transport Layer Security (TLS) extension that allows the application layer to negotiate which protocol should be performed over a secure connection in a manner that avoids additional round trips and which is independent of the application-layer protocols. It is needed by secure HTTP/2 connections, which improves the compression of web pages and reduces their latency compared to HTTP/1.x.

mqweb configuration.

This is a Liberty web server configuration; see this page.

For example

 <httpEndpoint id="defaultHttpEndpoint"
   host="${httpHost}" 
   httpPort="${httpPort}"
   httpsPort="${httpsPort}"
   protocolVersion="http/2"
   >
   <httpOptions removeServerHeader="false"/>

</httpEndpoint>

Client configuration

Most web  browsers support this with no additional configuration needed.

With curl you specify --http2.

With curl, ALPN is enabled by default (as long as curl is built with the ALPN support).

With the curl --verbose option on a curl request, you get

  • * ALPN, offering h2 – this tells you that curl has the support for http2.
  • * ALPN, offering http/1.1

and one of

  • * ALPN, server did not agree to a protocol
  • * ALPN, server accepted to use h2

The “* ALPN, server accepted to use h2” says that mqweb is configured for http2.

With pycurl you specify

 c.setopt(pycurl.SSL_ENABLE_ALPN,1)
 c.setopt(pycurl.HTTP_VERSION,pycurl.CURL_HTTP_VERSION_2_0)

Performance test

I did a quick performance test of a pycurl program getting a 1024 byte message (1024 * the character ‘x’) using TLS certificates.

HTTP version   Application data sent (bytes)   Total data sent (bytes)
http/1.1       2414                            7151
http/2         2320                            7097

So there is a slight reduction in the number of bytes sent when using http/2.

The time to get 10 messages was 55 ms with http/2, and 77ms with http/1.1,  though there was significant variation in repeated measurements, so I would not rely on these measurements.

Where is the time being spent?

cURL and pycurl can report the times from the underlying libcurl package.  See TIMES here.

The times (from the start of the request) are

  • Name lookup
  • Connect
  • Application connect
  • Pre transfer
  • Start transfer
  • Total time

Total time- Start transfer = duration of application data transfer.

Connect duration = Connect Time – Name lookup Time etc.
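libcurl reports each of these times as seconds from the start of the request, so each phase duration is the difference from the previous time. A small Python helper makes the subtraction explicit. One assumption of mine: when there is no TLS handshake (for example on a reused connection) APPCONNECT_TIME is reported as 0, so the helper treats the application connect as ending when the TCP connect ended, which keeps the later durations positive.

```python
# Phases in the order libcurl measures them, with the label used in the table.
PHASES = [
    ("namelookup", "Name lookup"),
    ("connect", "Connect"),
    ("appconnect", "App connect"),
    ("pretransfer", "Pre transfer"),
    ("starttransfer", "Start transfer"),
    ("total", "Transfer of application data"),
]


def phase_durations(times):
    """Convert cumulative times (seconds from request start) to durations in microseconds."""
    t = dict(times)
    if not t["appconnect"]:
        # No TLS handshake (e.g. a reused connection): assume app connect took no time.
        t["appconnect"] = t["connect"]
    durations = {}
    previous = 0.0
    for key, label in PHASES:
        durations[label] = round((t[key] - previous) * 1_000_000)
        previous = t[key]
    return durations


def times_from_curl(c):
    """Collect the cumulative times from a pycurl.Curl handle after c.perform()."""
    import pycurl  # imported here so phase_durations works without pycurl installed

    return {
        "namelookup": c.getinfo(pycurl.NAMELOOKUP_TIME),
        "connect": c.getinfo(pycurl.CONNECT_TIME),
        "appconnect": c.getinfo(pycurl.APPCONNECT_TIME),
        "pretransfer": c.getinfo(pycurl.PRETRANSFER_TIME),
        "starttransfer": c.getinfo(pycurl.STARTTRANSFER_TIME),
        "total": c.getinfo(pycurl.TOTAL_TIME),
    }
```

For example, cumulative times of 0.004265, 0.004318, 0.023303, 0.023334, 0.035978 and 0.036242 seconds give 4265, 53, 18985, 31, 12644 and 264 microseconds, which are the first-message durations in the table below.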

For a pycurl request getting two messages from a queue the durations were

Duration (microseconds)        First message   Second message
Name lookup                    4265            32
Connect                        53              3
App connect                    18985           0
Pre transfer                   31              42
Start transfer                 12644           11036
Transfer of application data   264             235

Most of the time is spent setting up the connection. If the same connection can be reused, the second and subsequent requests are much faster.

In round numbers, the first message took 50 ms, successive messages took between 10 and 15 ms.

Rant: I find the IBM Knowledge center on the web runs like a dog with a wooden leg

While playing with the mqweb stuff, I found I was searching for materials on mqweb in the IBM knowledge center.    I got fed up with it being so slow, so I’ve spent some time looking into it.  The slowness may be due to “performance code”  within the page which measures how slowly it goes.  We had a basset hound who had one of its front legs in plaster, and the display of the web pages reminds me of how it used to run.

It is so bad, I see the picture stuttering as it builds up.

  • I see the blue header
  • then “Do you want to” which finally ends up at the bottom of the screen.
  • table of contents on the left hand side
  • the page with the content on it appears
  • finally the banner saying “free trial.   Try RESTful APIs to and from your IBM Z mainframe”.

This banner is annoying – I cannot  get rid of it.  It takes up 2cm out of the 15 cm space in my browser – that’s 13% of the real estate!  I keep being asked to give comments on the web site… I do, but I think any comments are being ignored.

I compared the IBM site with the BBC, which has lots of coloured image files,  using the “lighthouse” capability within the Chrome browser.

Site                                First meaningful paint (s)   Time to interactive (s)
IBM 9.1 KC page q132130_.htm        0.6                          5.8
BBC news page with lots of images   0.3                          1.5

Wow, 5.8 seconds – even worse than I thought!

With my broadband, I get download speed of about 53 Mb/Second and upload about 17 Mb/Second.  Ping took about 30 ms to both IBM and to the BBC.  We are on an island, north of Scotland, so I think our response time is typical.

How did I get this data?

In Google Chrome, press Ctrl+Shift+I, select the Audits tab, type your URL at the top, and press Enter.

Select “desktop”, Performance, No Throttling.

Click on “Run Audits”.  It runs for a few seconds and stops.

There is a lot of good information.

If you click on the “View Trace” button, you get a summary chart at the bottom.

  •   93 ms Loading
  • 3419 ms Scripting
  •  321 ms Rendering
  •   31 ms Painting
  •  885 ms System

So most of the time is spent scripting!

What sites are used?

I took the trace file, extracted the records with “url” and counted the occurrences.

  • 7357 1.www.s81c.com – an IBM site
  • 5347 http://www.ibm.com
  •  240 tags.tiqcdn.com – Tealium enterprise tag management and marketing software
  •   42 consent.trustarc.com – TrustArc Cookie Consent Manager
  •   34 9j ?
  •   25 consent.truste.com – TrustArc Cookie Consent Manager
  •   13 consent-st.trustarc.com – TrustArc Cookie Consent Manager
  •   12 js.logentries.com – Live Log Management and Analytics
  •    7 mapvip.podc.sl.edst.ibm.com
  •    3 www-api.ibm.com
  •    3 idaas.iam.ibm.com
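One way to do this counting in Python, without relying on the exact trace layout, is to walk the whole JSON document, pick up every string stored under a key called url, and count the host names. This is a sketch, not necessarily how the numbers above were produced, and it assumes the trace was saved as JSON.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def iter_urls(node):
    """Yield every string value stored under a key called 'url', at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)


def count_hosts(trace):
    """Count URL occurrences per host name, most common first."""
    return Counter(urlsplit(u).hostname or u for u in iter_urls(trace)).most_common()


def count_hosts_in_file(path):
    """Load a saved trace file and return the host counts."""
    with open(path) as f:
        return count_hosts(json.load(f))
```

URLs without a host name (data: URLs, for example) are counted under the full string, so they still show up in the list.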

And there was me thinking that the knowledge center was like a fat pipe squirting down the data, when in fact it sends data down a drop at a time. It also tells other sites what you are looking at.

You can use the “source” tab, and explore all the files which were downloaded.  For example  there is the >V9.1.0  jpg file, along with .js and .css files used in formatting.

What are the most used JavaScript files?

There seem to be a couple of hot JavaScript files, taking over 2 seconds (on http://www.s81c.com, file js/www.js…). The text inside the files begins with IBMPerformance… I think that a hot function within this is the time function, so maybe this code is timing everything it does, and so slowing it down.

What helps me?

This link explains how to understand the trace and performance data from Chrome.

 

mqweb what’s the difference between the message API and the admin API?

At first glance it looks like the answer is in the question.  You can use

  • the messaging REST API to put and get messages
  • the admin REST API to administer queue manager objects

In a couple of places the IBM documentation says you can use the messaging API to administer your objects, which is true in the general sense, but not in the specific sense. Until I hit a problem, I thought there was one “messaging REST API” with different flavors of syntax.

Security

The admin API authorisation is managed through <security-role name="MQWebAdmin"> and <security-role name="MQWebAdminRO"> sections in the mqwebuser.xml file.

The messaging API authorisation is managed through <security-role name="MQWebUser"> sections.

Access to resources is done using the alternate userid. I can see in the activity trace that the userid is colinpaice (the id mqweb is running under), but the open of a queue was done with the alternate userid testuser. When I tried to browse messages on a queue, I got a message saying my userid did not have the correct authority. I used setmqaut and the MQSC command REFRESH SECURITY(*) to resolve it.

Cost of the admin interface

The admin interface has a request like

https://127.0.0.1:9443/ibmmq/rest/v1/admin/qmgr/QMA/queue/CP0000?attributes=*

which returns all of the attributes of the queue CP0000.  From the activity trace we can see

  • MQCONN + MQDISC
  • MQOPEN, MQINQ, MQCLOSE of the queue manager object – twice
  • MQOPEN, MQPUT, MQCLOSE to the SYSTEM.ADMIN.COMMAND.QUEUE
  • MQOPEN, MQGET, MQCLOSE to the SYSTEM.REST.REPLY.QUEUE
  • MQCMIT
  • MQBACK – the JMQI code always does this to be sure that there is no outstanding unit of work

The most expensive request is the MQCONN.

Using the admin interface is fine for administration, because changes to objects are usually done infrequently. If you are considering using the admin interface to monitor objects, for example to plot queue depths over time, the MQ REST API may not be the best solution.

Cost of the messaging interface

The messaging API interface uses connection pooling.   When the application does an MQDISC, the connection is returned to a pool, and can be reused if the same userid does an MQCONN.  If the connection is not used for a period, it can be removed from the pool and an MQDISC done to release the connection.    This should eliminate frequent MQCONN and MQDISCs.

From the activity trace we see

MQOPEN, MQGET, MQGET, MQCLOSE of the queue, and no MQCONN.

There will be an MQCONN if there is no connection available for that userid in the pool, but this should be infrequent.
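To make the pooling behaviour concrete, here is a small conceptual model in Python. It is not mqweb's implementation: connect and disconnect are caller-supplied stand-ins for MQCONN and MQDISC, and the idle timeout value is an assumption. It shows why repeated requests from the same userid need only one real MQCONN, and when the real MQDISC happens.

```python
import time


class ConnectionPool:
    """Conceptual per-userid pool: reuse idle connections, disconnect after idle_timeout."""

    def __init__(self, connect, disconnect, idle_timeout=300.0, clock=time.monotonic):
        self.connect = connect          # stands in for MQCONN
        self.disconnect = disconnect    # stands in for MQDISC
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.idle = {}  # userid -> list of (time returned, connection)

    def acquire(self, userid):
        """Reuse an idle connection for this userid, else do a (costly) connect."""
        free = self.idle.get(userid)
        if free:
            return free.pop()[1]
        return self.connect(userid)

    def release(self, userid, conn):
        """The application's MQDISC: the connection goes back to the pool."""
        self.idle.setdefault(userid, []).append((self.clock(), conn))

    def reap(self):
        """Really disconnect connections that have been idle too long."""
        now = self.clock()
        for free in self.idle.values():
            keep = []
            for returned, conn in free:
                if now - returned > self.idle_timeout:
                    self.disconnect(conn)
                else:
                    keep.append((returned, conn))
            free[:] = keep
```

Because connections are pooled per userid, two different users never share a connection, which matches the way the alternate userid is used for resource checks.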

Getting MQConsole (browser interface to administer MQ via mqweb) working

It was a new year, and as I sat in my basement cave while the gale force winds blew around the house, I thought I would try to use the new MQWeb and MQConsole, and see how they stand up to “the Paice treatment”. MQWeb allows you to administer MQ from a web browser, or through a REST interface (for example using cURL or Python). This technology has been around for a few years now, and I know it is being enhanced every few months through the continuous delivery channel.

The installation and getting started reminded me of an old car belonging to my father.  The car was not easy to get started (it had a starting handle!), but once it was started it worked pretty well.

Getting it up and running in a test sandbox took about 1 hour. It took me about two weeks to get mqweb set up properly using digital certificates, and to document how I did it. Being security related, there must be a team which tries to make it as hard as possible to diagnose problems so as not to provide useful information to a hacker. It also took a while to work out how to use mqweb in an enterprise where you have multiple machines and have to support many users. It also feels a bit buggy and some of it was frustrating, but as it is being continuously improved, I am sure it will get better.

I've written some blog posts.

I had MQ 9.1.3 running on my laptop running Ubuntu 18.04.

Getting it installed and up and running.

Initially I followed the  9.1 instructions here.   After lots of clicking and guessing I got to this page which gave me some instructions (but they were not very helpful). There are various mistakes on the page such as var/mqm/web should be /var/mqm/web.  I ignored the instructions and simply used sudo apt install /home/colinpaice/…/ibmmq-web_9.1.3.0_amd64.deb to install it.

The configuration file /opt/mqm/web/mq/samp/configuration/basic_registry.xml has predefined userids and the configuration is suitable to have an initial look at the MQWEB.

I used

cp /opt/mqm/web/mq/samp/configuration/basic_registry.xml /var/mqm/web/installations/Installation1/servers/mqweb/mqwebuser.xml

to copy the configuration file.

Starting and stopping the mqweb

The strmqweb command failed for me. This was strange, as commands like strmqm work. This is because there is a symbolic link /usr/bin/strmqm which points to /opt/mqm/bin/strmqm, but no link for the mqweb commands.

See here  which explains there is a /usr/bin/strmqm → /opt/mqm/bin/strmqm , but not for the mqweb stuff. I think this is an IBM Whoops.

I created these myself using

sudo ln -s /opt/mqm/bin/dspmqweb /usr/bin/dspmqweb
sudo ln -s /opt/mqm/bin/endmqweb /usr/bin/endmqweb
sudo ln -s /opt/mqm/bin/setmqweb /usr/bin/setmqweb
sudo ln -s /opt/mqm/bin/strmqweb /usr/bin/strmqweb

The configuration file is deep down a directory tree.

I created  a symbolic link to the file using

ln -s /var/mqm/web/installations/Installation1/servers/mqweb/mqwebuser.xml web.xml

so I can do  gedit ~/web.xml

and if you forget where the file really is, use ls -l web.xml

I used the strmqweb command to start the mqweb server.

I used dspmqweb and got

MQWB1124I: Server ‘mqweb’ is running.

MQWB1123E: The status of the mqweb server applications cannot be determined.  A request was made to read the status of the deployed mqweb server applications, however the data appears corrupt. This may indicate that there is already an mqweb server started on this system, probably related to another IBM MQ instance.

The MQWB1123E message only happened occasionally – I think it is a timing problem and can be ignored.

I stopped the mqweb instance using endmqweb

Log files

There is a file /var/mqm/web/installations/Installation1/servers/mqweb/logs/console.log which has audit-type statements in it.

There is a file /var/mqm/web/installations/Installation1/servers/mqweb/logs/messages.log which has more messages (including time stamps).   This file is more useful.

I defined a symbolic link to this file, to make debugging easier.

ln -s /var/mqm/web/installations/Installation1/servers/mqweb/logs/messages.log messages.log

When the strmqweb command is issued,

  • it deletes the previous console.log
  • it renames messages.log to a file with a time stamp in the file name
  • it deletes any other message log files.

After starting and stopping the web server several times the only files I had were

  • messages_20.01.05_15.18.50.0.log
  • messages.log
  • console.log

You may want to put the strmqweb command in a shell script which saves away any message and console files.
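A minimal Python sketch of such a wrapper follows (a shell script would do the same job). It copies every *.log file from the mqweb logs directory to a backup directory with a time stamp prefix, then runs strmqweb. The backup location ~/mqweb-logs and the function names are hypothetical, and it assumes your userid can read the log files and that strmqweb is on the PATH (for example through the symbolic links above).

```python
import shutil
import subprocess
import time
from pathlib import Path

# Log directory from the mqweb installation described above.
LOG_DIR = Path("/var/mqm/web/installations/Installation1/servers/mqweb/logs")


def save_logs(log_dir, backup_dir, stamp=None):
    """Copy every *.log file (console.log, messages*.log) to backup_dir with a time stamp prefix."""
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for log in sorted(Path(log_dir).glob("*.log")):
        target = backup_dir / f"{stamp}_{log.name}"
        shutil.copy2(log, target)  # copy, keeping the original modification time
        saved.append(target)
    return saved


def start_mqweb(backup_dir=Path.home() / "mqweb-logs"):
    """Save the old logs, then run strmqweb (which deletes or renames them)."""
    save_logs(LOG_DIR, backup_dir)
    subprocess.run(["strmqweb"], check=True)  # assumes strmqweb is on the PATH
```

Call start_mqweb() instead of strmqweb; the backups accumulate, so you will want to tidy the backup directory now and again.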

The command dspmqweb gives output like

MQWB1124I: Server 'mqweb' is running.
URLS:
  https://localhost:9443/ibmmq/rest/v1/
  https://localhost:9443/ibmmq/console/

This tells you which URL you need to use.

Note: port 9443 is the default port for WebSphere Liberty Profile.  If it is in use you will have to configure a different port.

First logon

I logged on using the Firefox browser with the address https://localhost:9443/ibmmq/console/. Make sure you use https: rather than http:. If you use http:, the logon fails with the message “The connection was reset”.

Using https:… gave me a big error screen and

Warning: Potential Security Risk Ahead 
localhost:9443 uses an invalid security certificate.
The certificate is not trusted because it is self-signed.
Error code: MOZILLA_PKIX_ERROR_SELF_SIGNED_CERT

While  you are exploring the mqconsole, you can accept this.  To fix it properly is a big piece of work.  See my other blog posts.

I signed on using userid mqadmin and password mqadmin and it showed the queue managers.

Select the row of an active queue manager. The table header changes to give options.  Select properties to display the queue manager properties.

The queue manager attributes do not refresh in real time

You have to go back to the queue manager table and re-display the data.   This is not a big problem as the attributes do not typically change frequently.  I noticed this when I changed an accounting parameter, and the attribute page did not show the change.

Adding widgets to the dashboard.

There are two ways of adding widgets for MQ objects.

  1. From the list of queue managers, select a queue manager, then on the title line click on the “…” (more actions) button and select “Add new dashboard tab”. This creates a dashboard with widgets for all of the MQ objects defined: queues, client connections, MCA connections, listeners etc. You can select and delete the widgets you do not need.
  2. Click on the “Add widget” button.

It may be quicker and easier to use the first option to add all widgets and delete the widgets you do not need.

Create more tabs

At the top of the browser window, next to the tab, click on the “+”. This defines a new dashboard; use the Add widget button to select the widgets you want to define.

Each userid has their own dashboard (tab layout and widgets).

See the next topic if you want people to have the same dashboard.

Export the dashboard for enterprise deploy or backup

At the top of the screen is an icon with three vertical dots for dashboard settings. You can export the dashboard and widgets to a JSON file.

  • You can change the queue manager names and import it on another queue manager. This is useful to enterprise users who have to support many queue managers in a similar environment. As it is a JSON file, you can process it to change queue manager names. I could not find a way of importing it except from a web page, which makes it challenging to deploy automatically.
  • You can have another user import it, so they get the same dashboard.  If it changes, they have to import it manually.
  • You may want to export your dashboard every week and back it up.
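Because the export is JSON, changing queue manager names can be scripted. The layout of the exported file is not described here, so this Python sketch does not assume any field names: it walks every value and replaces strings that exactly match an old queue manager name. Exact matching is a deliberate choice, so that a title such as “Queues on QMA” or a queue called QMA.INPUT is left alone; QMA and QMB are hypothetical names. You still have to import the result through the web page.

```python
import json


def rename_qmgrs(node, mapping):
    """Return a copy of the exported JSON with exact-match queue manager names replaced."""
    if isinstance(node, dict):
        return {key: rename_qmgrs(value, mapping) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_qmgrs(value, mapping) for value in node]
    if isinstance(node, str):
        return mapping.get(node, node)
    return node  # numbers, booleans and null are unchanged


def rename_file(src, dst, mapping):
    """Read an exported dashboard, rename queue managers, write the result."""
    with open(src) as f:
        dashboard = json.load(f)
    with open(dst, "w") as f:
        json.dump(rename_qmgrs(dashboard, mapping), f, indent=2)
```

For example, rename_file("dashboard.json", "dashboard-QMB.json", {"QMA": "QMB"}) produces a copy for another queue manager, and the same script could be run once per queue manager in a similar environment.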

Using the widgets

I clicked on the “Queue on …”  widget.

I clicked on the “Queue depth” column for queues, and it quickly sorted the queues by depth.

I could see I had a total of 33 non-system queues. By clicking on the settings wheel, I could select “show system objects”.

If you select the settings wheel, you can select a different queue manager.  By changing this you could have one tab showing queues on different queue managers on the machine, and another tab showing channels on different queue managers on the machine.  You could also have a tab per queue manager, and have queues and channels for one queue manager on that tab.

I could refresh a widget by using the refresh icon.

There is a search box at the top of each widget. It searches for the value in any column. So typing in 003 gave me queue CP00003 and DEEPQ with depth 1000003.
At the bottom of the widget it said Total: 90 Filtered:2

If you select a row, the search box changes and gives you a list of actions.

  • Delete queue
  • Properties
  • Put message
  • Browse message
  • More actions → Manage authority records
  • 1 item selected
  • Deselect

You can select all the objects in a widget by typing “a”, or deselect them using “shift a”. Note: it selects all items, not just the filtered items. For example, I typed “a” and the header line said “33 objects selected”, while at the bottom of the widget it said total 33 filtered 8. So be careful if you were thinking of doing bulk changes on all objects.

I was unable to select more than one object, using the cursor keys.

It was easy to delete widgets by selecting the X icon.

You can move the widgets around by grabbing the title line and dragging it.

If you hover on the title line of a widget, a pencil icon appears which allows you to rename the widget.

You can control how many widgets are displayed per line by clicking on the down arrow in the tab (at the top of the page) and selecting how many columns to use. This is not very smart.

  • I selected 5 column layout.
  • It did not reflow the widgets automatically.  Each line had 2 widgets and lots of space to the right.  I could drag a widget to the top line.  If I then went to 2 column layout, and back to 5 column layout – I got back to two widgets per line
  • If you select an item, the search box becomes a list of icons. With a narrow widget, you only get as many icons as fit in the space; for example, you do not get the “…” (more actions) icon.
  • The formatting within a table is not very smart. I had a truncated queue name SYSTEM.ADMIN.CH and lots of space for the queue depth. I think the data is displayed in a table and the columns are the same width, and not changeable.

It may be better to have no more than 2 or 3 widgets per line.

Using operating system security

The basic mqweb configuration file uses a hard-coded userid mqadmin with password mqadmin. This is not very secure.

You can use the operating system userids and passwords by using a different configuration file.

I used

  • cp /opt/mqm/web/mq/samp/configuration/local_os_registry.xml  /var/mqm/web/installations/Installation1/servers/mqweb/mqwebuser.xml
  • chmod o+w /var/mqm/web/installations/Installation1/servers/mqweb/mqwebuser.xml
    • to give me update access to the file.

I changed my file to have

<enterpriseApplication id="com.ibm.mq.console">
  <application-bnd>
    <security-role name="MQWebAdmin">
      <user name="colinpaice" realm="defaultRealm"/>
    </security-role>
    <security-role name="MQWebAdminRO">
      <group name="test"/>
    </security-role>
   </application-bnd>
</enterpriseApplication>

Notes.

  • The realm="defaultRealm" is to do with Java Enterprise Edition security. Just specify it.
  • Each security-role name section must be unique. I specified <security-role name="MQWebAdminRO">… twice. Only the last one was used; I was hoping it would be cumulative.
  • You can specify multiple <user …> or <group… > lines.

See here  and here  for pointers to the IBM documentation.

Managing mqwebuser.xml

You can include files into the mqwebuser.xml file using the XML

<include optional="true" location="pathname/filename"/>
or
<include optional="true" location="url"/>

You can put groups of definitions in one file and have them included.

For example, in the file payroll.xml have:

<group name="mqsysprog"/>
<group name="payroll"/>

In each of the configuration files for the payroll queue managers, have:

<security-role name="MQWebAdmin">
  <include optional="true" location="payroll.xml"/>
</security-role>
<security-role name="MQWebAdminRO">
  <group name="test"/>
</security-role>

How do I check what role I have?

At the top right of your browser window is a porthole with a circle in it. Click on this, and then click on “about”. It gave me

Principal:colinpaice - Administrator (Password Authentication)
A different userid gave
Principal:testuser - Read-Only Administrator (Password Authentication)

Can I have the logon time out?

Yes, you set a time out value using the ltpaExpiration value. See here.

Use dspmqweb properties -a|grep ltpaEx  and note the ltpaExpiration value.

Use setmqweb properties -k ltpaExpiration -v time to set the time in minutes.

Note:

  • After you have been logged on for this time period, your session is cancelled and you have to log on again. This happens whether the session is busy or idle.
  • The setmqweb command updates the mqwebuser.xml file on disk. If you were editing the file you will need to reload the file from disk and reapply the changes.
  • The above setmqweb command added <variable name="ltpaExpiration" value="10"/> to the mqwebuser.xml file. You could update the file yourself and avoid this concurrent update problem.

There is one timeout value for all users, so if you have a screen displaying charts from mqweb, this will also time out.

If you are using certificates to provide authentication

  • your session will be dropped, and automatically reconnected.
  • you cannot logoff – you have to drop the browser tab
  • in the top right of your page the icon will be a black circle with a white “i” in it. If you are not using certificates, this will be a porthole with a circle in it.

Stackoverflow: What throughput can a standalone Java program achieve?

There was a question on the MQ section on StackOverflow

I have a standalone multi threaded java application which listen messages from IBM MQ.
Current system take around 500ms for processing of 1 message after it read from queue and till it commit.
I want to know how many messages I can consume

  • Concurrently:
  • Max number of messages can be processed? or throttle limit

A good meaty performance question I thought.  Let me break this into pieces.

Current system take around 500ms for processing of 1 message after it read from queue and till it commit.

Processing one message and committing should take about 10 milliseconds or less (say 30 ms for a two-phase commit). There is clearly something else going on. Fix this first. Possible causes include:

  1. A long database call.   This could be due to database locking, or a badly designed statement, for example a query which needs to access thousands or millions of rows.
  2. A request to a server far far away
  3. A file system with the speed of writing an illuminated letter to parchment

How many messages I can consume: Concurrently:

Take the worst case of using persistent messages, which require log IO during commit.

For one thread, processing multiple messages before doing a commit means the thread can do more work. Consider a get taking 1 millisecond and a commit taking 10 ms: this is one message processed every 11 ms. If you did 50 gets (taking 50 ms) and a commit (taking 10 ms), this is 50 messages in 50 + 10 = 60 ms, which equates to one message every 1.2 milliseconds, almost 10 times faster. This is how channels can send messages efficiently. There is a “sweet spot” of messages per commit which gives you the maximum data processed per second. This depends on the message size, logging rates and other factors. For a 100MB message it is one message per commit. For 10KB messages, it may be 1000 messages per commit.
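The arithmetic above can be written as a small model. It is a back-of-envelope sketch, assuming a fixed cost per get and per commit (the 1 ms and 10 ms figures from the text); real costs vary with message size, log I/O and contention. The class and method names are illustrative:

```java
/** Back-of-envelope model of batching several gets into one commit. */
class BatchCommit {
    // Assumed timings from the text: 1 ms per MQGET, 10 ms per commit (persistent messages, log I/O).
    static final double GET_MS = 1.0;
    static final double COMMIT_MS = 10.0;

    /** Average elapsed milliseconds per message when batchSize gets share one commit. */
    static double msPerMessage(int batchSize, double getMs, double commitMs) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        return (batchSize * getMs + commitMs) / batchSize;
    }

    /** Messages per second for one thread, ignoring any contention. */
    static double messagesPerSecond(int batchSize, double getMs, double commitMs) {
        return 1000.0 / msPerMessage(batchSize, getMs, commitMs);
    }

    public static void main(String[] args) {
        for (int batch : new int[] {1, 10, 50, 1000}) {
            System.out.printf("%4d per commit: %.2f ms/message, %.0f messages/s%n",
                    batch, msPerMessage(batch, GET_MS, COMMIT_MS),
                    messagesPerSecond(batch, GET_MS, COMMIT_MS));
        }
    }
}
```

The per-message cost falls from 11 ms towards the 1 ms get cost as the batch grows, so the gains flatten out, and other factors (message size, log rates, locks) decide where the sweet spot is.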

This may be selfish

This is clearly a great improvement, but possibly selfish. If the application logic is a get followed by a database insert, followed by a commit, then doing 50 gets, 50 inserts and a commit will work much faster. The downside is that the database requests keep their locks until the commit. These locks may prevent other applications from accessing data, whether through locks on the recently inserted records, page locks, or index locks. So overall MQ throughput goes up, but the business transaction suffers. You need to understand the database and find the optimum number of requests per commit for your business transaction.

How long before the data is visible?

Rather than have one thread process 1000 messages per commit (taking 1010 ms), you may want to have multiple threads processing 10 messages per commit (taking 20 ms each). This means that the data in the database (or the replies etc) is visible earlier. This may be important to your business transaction if you have to worry about response time.
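The 1010 ms and 20 ms figures are the length of one unit of work: the gets plus the commit. A minimal sketch, assuming the same illustrative 1 ms per get and 10 ms per commit and ignoring the database work itself; it is also roughly how long locks taken for the first message in the batch are held. The names are illustrative:

```java
/** Illustrative model: how long a unit of work stays uncommitted for a given batch size. */
class CommitLatency {

    /** Elapsed ms from the first get of a batch until its commit completes (data becomes visible). */
    static double unitOfWorkMs(int batchSize, double getMs, double commitMs) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        return batchSize * getMs + commitMs;
    }

    public static void main(String[] args) {
        // Assumed timings from the text: 1 ms per get, 10 ms per commit.
        for (int batch : new int[] {1000, 10}) {
            System.out.printf("%4d messages per commit: first message visible after up to %.0f ms%n",
                    batch, unitOfWorkMs(batch, 1.0, 10.0));
        }
    }
}
```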

Parallel threads

  1. Using more threads should improve throughput, unless this is delayed by external factors – such as database locks.
  2. One customer found one thread was optimum because there were no database delays.

How many messages I can consume: Max number of messages can be processed? or throttle limit

There are papers written on this, but here is a one-minute overview.

As fast as the queue manager can process data

  1. The rate at which MQ can write its logs
  2. Keep queue data in memory (buffer pools on z/OS, the queue buffer on midrange), which means keeping few messages on the queue.

Threads

  1. Having parallel threads gives you better throughput than one thread.  You get overlapped writing to the log, the units of work are shorter in duration, you can get parallel IO.
  2. You may be limited by the network.   Having multiple threads from an application means the network can be better utilized.  One thread can be receiving data down the wire, while another thread is waiting in commit.
  3. You may be limited by where your programs run – eg short of CPU, or slow IO (for your System.out.println statements)

Application design

  1. You may get delays due to serialization if all threads are using the same queue.
  2. Remove the debug printf or System.out.println statements.
  3. Using a queue per business application is better than all applications sharing the same queue
  4. Using one reply-to queue per web server may be better than a shared reply-to queue, especially if you use Apache Camel.
  5. Use get first if possible.  Avoid scans of the queues.

 

The short answer….

You should be able to get thousands of 1KB messages a second through your Java application when using multiple threads.

 

How do I get a client to disconnect?

I had a question from a customer who asked how they can reduce the number of client connections in use.  They had tried setting a disconnect interval (DISCINT) on the channel, but the connections were like weeds – you kill them off, and they grow back again.

DISCINT is “the length of time after which a channel closes down, if no message arrives during that period”.  This sounds perfect for most people.   The application is in an MQGET, and if no messages arrive, the channel can be disconnected, and the application gets connection broken.   The application can then decide to disconnect or reconnect.
If the application is not in an MQGET, then it will get notified of the broken connection next time it tries to use MQ.

Independent applications

Many applications are well written in that when they get Connection Broken, they just reconnect, so DISCINT has no effect on reducing the number of connections. This may be good for availability but not for resource usage. It may be good to have 1000 application instances running during the day, but perhaps not overnight when there is no work to do. I've seen instances where the applications do an MQGET every minute, and with 1000 instances this can use a lot of CPU while doing no useful work. In this case you want unused application instances to stop, and be restarted when needed.

You cannot use triggering with client connections (unless you have a very smart trigger monitor to produce an event which says start a client program over there).

Use automation to periodically check the queue depth and the number of input handles. If there is a high queue depth, or a low number of handles (eg 2), then start more application instances across your back-end servers. Your applications can then disconnect if they have not received a message within, say, 10 minutes. This should keep the right number of application instances active.
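A minimal sketch of the decision the automation has to make, assuming it already has the queue's current depth and number of input handles (for example the CURDEPTH and IPPROCS values returned by DISPLAY QLOCAL in runmqsc). The thresholds, the cap, and the rule of adding one instance per check are hypothetical choices, not an MQ feature:

```java
/** Hypothetical scaling rule for client applications, which MQ triggering cannot start. */
class ScalingRule {

    /**
     * Returns how many extra application instances to start.
     *
     * @param curDepth       current queue depth (CURDEPTH)
     * @param inputHandles   handles open for input (IPPROCS)
     * @param depthThreshold depth above which a backlog is assumed to be building
     * @param minHandles     minimum number of instances that should always be getting
     * @param maxInstances   upper limit on instances, to protect the back-end servers
     */
    static int instancesToStart(int curDepth, int inputHandles,
                                int depthThreshold, int minHandles, int maxInstances) {
        int wanted = inputHandles;
        if (inputHandles < minHandles) {
            wanted = minHandles;                          // too few instances serving the queue
        }
        if (curDepth > depthThreshold) {
            wanted = Math.max(wanted, inputHandles + 1);  // backlog: add one more instance per check
        }
        wanted = Math.min(wanted, maxInstances);          // never go above the cap
        return Math.max(0, wanted - inputHandles);
    }

    public static void main(String[] args) {
        System.out.println(instancesToStart(0, 2, 100, 2, 10));   // quiet queue, enough instances
        System.out.println(instancesToStart(500, 2, 100, 2, 10)); // backlog building up
        System.out.println(instancesToStart(0, 0, 100, 2, 10));   // nobody is getting
    }
}
```

Scaling down is left to the applications themselves, by disconnecting when they have been idle for a while.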

An administrator should be able to set up this automation, but getting the application to disconnect could be a challenge, as this requires the application developer to change the code!
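With JMS, MessageConsumer.receive(long timeout) returns null when no message arrives within the timeout, which gives a natural place to stop. A minimal sketch of the application loop, assuming a hypothetical MessageSource interface that stands in for the JMS consumer (so the example runs without the MQ jars) and a 10 minute idle limit as in the text:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of an application loop that ends (and disconnects) after a period with no messages. */
class IdleDisconnect {

    /** Hypothetical abstraction over a consumer; receive returns null if the timeout expires. */
    interface MessageSource extends AutoCloseable {
        String receive(long timeoutMillis);

        @Override
        void close(); // release the connection, so the channel instance goes away
    }

    /** Processes messages until none arrives within idleLimitMillis, then closes the source. */
    static int runUntilIdle(MessageSource source, long idleLimitMillis, Consumer<String> handler) {
        int processed = 0;
        try (source) {
            String message;
            while ((message = source.receive(idleLimitMillis)) != null) {
                handler.accept(message);
                processed++;
            }
        }
        return processed;
    }

    /** In-memory stand-in for testing: "times out" immediately once its messages are used up. */
    static class ListSource implements MessageSource {
        private final Deque<String> messages;
        boolean closed = false;

        ListSource(List<String> messages) {
            this.messages = new ArrayDeque<>(messages);
        }

        @Override
        public String receive(long timeoutMillis) {
            return messages.poll(); // null when empty, like a receive that timed out
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        ListSource source = new ListSource(List.of("m1", "m2", "m3"));
        int n = runUntilIdle(source, 10 * 60 * 1000L, m -> System.out.println("processed " + m));
        System.out.println(n + " messages, closed=" + source.closed);
    }
}
```

After the loop ends the program should exit rather than reconnect; the automation starts a new instance when there is work again.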

Running under a web server

If your applications are running under a web server, you may have mis-configured connection pools. You can specify the initial size of the pool, and this many connections are made. As more connections are needed, more can be added to the pool until the pool maximum is reached. You should specify a time out value, so that the pool is periodically cleaned up and unused connections are removed, until the pool is back to its initial size. You should review the initial size of the pools (is it too large?) and the time out value.
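To see why the initial size and the time out matter, here is a toy model of such a pool. It is not how any particular web server implements pooling; the class ToyPool and the names initialSize, maxSize and idleTimeoutMs are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy connection pool: grows on demand up to maxSize, shrinks back to initialSize when idle. */
class ToyPool {
    private final int initialSize;
    private final int maxSize;
    private final long idleTimeoutMs;
    // One entry per free connection: the time it became free. Newest at the head, oldest at the tail.
    private final Deque<Long> freeSince = new ArrayDeque<>();
    private int inUse = 0;
    int created = 0;
    int destroyed = 0;

    ToyPool(int initialSize, int maxSize, long idleTimeoutMs, long nowMs) {
        this.initialSize = initialSize;
        this.maxSize = maxSize;
        this.idleTimeoutMs = idleTimeoutMs;
        for (int i = 0; i < initialSize; i++) {   // the initial connections are made up front
            freeSince.push(nowMs);
            created++;
        }
    }

    /** Returns true if a connection was obtained, false if the caller would have to wait. */
    boolean acquire() {
        if (!freeSince.isEmpty()) {
            freeSince.pop();                      // reuse the most recently freed connection
        } else if (size() < maxSize) {
            created++;                            // grow the pool with a new connection
        } else {
            return false;                         // pool is at its maximum
        }
        inUse++;
        return true;
    }

    /** Returns a connection to the pool. */
    void release(long nowMs) {
        if (inUse == 0) {
            throw new IllegalStateException("nothing to release");
        }
        inUse--;
        freeSince.push(nowMs);
    }

    /** Periodic clean-up: drop connections idle for at least idleTimeoutMs, but keep initialSize. */
    void cleanUp(long nowMs) {
        while (size() > initialSize && !freeSince.isEmpty()
                && nowMs - freeSince.peekLast() >= idleTimeoutMs) {
            freeSince.removeLast();               // discard the longest-idle connection
            destroyed++;
        }
    }

    int size() {
        return inUse + freeSince.size();
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2, 5, 60_000, 0);
        for (int i = 0; i < 5; i++) pool.acquire();     // a burst needs 5 connections
        for (int i = 0; i < 5; i++) pool.release(1_000);
        pool.cleanUp(30_000);                           // not idle long enough yet
        System.out.println("after 30s: " + pool.size() + " connections");
        pool.cleanUp(61_000);                           // idle for 60s: shrink back to initial size
        System.out.println("after 61s: " + pool.size() + " connections, destroyed " + pool.destroyed);
    }
}
```

If the initial size is too large, the clean-up never takes the pool below it; if the time out is too short, connections are dropped and re-created as the workload fluctuates.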

This should just be an administrative change.

Good luck, you may be successful in reducing the number of client connections, but do not set your hopes too high.

WebSphere Liberty connectionPool statistics

This blog post explains how to get and understand statistics from WebSphere Liberty on connectionPool usage.

In your MDB application you can have code like

 InitialContext ctx = new InitialContext();
ConnectionFactory cf = (ConnectionFactory)ctx.lookup("CF3");

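This looks up the connection factory defined by CF3. The lookup itself does not connect to the queue manager; a connection is obtained (from the pool, or with a new MQCONN) when the application creates a connection from the factory, for example with cf.createConnection().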

In WebSphere Liberty you define connection information in server.xml. For example

<jmsConnectionFactory jndiName="CF3" id="CF3ID">
  <connectionManager maxPoolSize="2" connectionTimeout="7s"/> 
  <properties.wmqJms 
   queueManager="QMA"
   transportType="BINDINGS"
   applicationName="Hello"/>
</jmsConnectionFactory>

The maxPoolSize gives the maximum number of connections available in this pool.

If server.xml has

<featureManager>
   <feature>monitor-1.0</feature>
</featureManager>

then you can get statistics on connectionPools using the JMX interface.

In ./usr/servers/test/jvm.options I had

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

This defines the JMX port as 9010, so I can get information through this port.

Looking at the output

There is documentation here on the connectionPool statistics.

You can use jconsole to get the JMX data, but this is not very usable, so I used jmxquery, which is part of a Python package. I installed it using pip install jmxquery.

I used the command

java -jar jmxquery.jar -url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi -u admin -p admin -q 'WebSphere:*' > outputfile

-q 'WebSphere:*' means give all records belonging to the WebSphere component. If you say -q '*:*' you get statistics for all components; see the bottom of the blog post. Example output is given below.

This command wrote all of the output to file outputfile.  I then used grep to extract the relevant records.

grep WebSphere:type=ConnectionPoolStats,name outputfile
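If you would rather not install jmxquery, the JDK's own JMX client API (javax.management.remote) can produce similar output. A minimal sketch, assuming the unauthenticated port 9010 from the jvm.options above; JmxDump is a hypothetical name, and the output format only approximates jmxquery's (it omits the type in brackets):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Dumps the attributes of the MBeans matching a pattern, one "name/attribute = value" per line. */
class JmxDump {

    static List<String> dump(MBeanServerConnection conn, String pattern) throws Exception {
        List<String> lines = new ArrayList<>();
        Set<ObjectName> names = new TreeSet<>(conn.queryNames(new ObjectName(pattern), null));
        for (ObjectName name : names) {
            for (MBeanAttributeInfo info : conn.getMBeanInfo(name).getAttributes()) {
                if (!info.isReadable()) {
                    continue;
                }
                Object value;
                try {
                    value = conn.getAttribute(name, info.getName());
                } catch (Exception e) {
                    continue; // some attributes are unsupported or fail; skip them
                }
                if (value instanceof CompositeData) {
                    // e.g. WaitTimeDetails: expand into WaitTimeDetails/count, WaitTimeDetails/mean, ...
                    CompositeData cd = (CompositeData) value;
                    for (String key : new TreeSet<>(cd.getCompositeType().keySet())) {
                        lines.add(name + "/" + info.getName() + "/" + key + " = " + cd.get(key));
                    }
                } else {
                    lines.add(name + "/" + info.getName() + " = " + value);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi";
        String pattern = args.length > 1 ? args[1] : "WebSphere:type=ConnectionPoolStats,*";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            for (String line : dump(connector.getMBeanServerConnection(), pattern)) {
                System.out.println(line);
            }
        }
    }
}
```

For example java JmxDump service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi 'WebSphere:type=ConnectionPoolStats,*'. If you enable JMX authentication, pass the userid and password in the environment map given to JMXConnectorFactory.connect, using the JMXConnector.CREDENTIALS key.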

If you change a parameter in server.xml for the jmsConnectionFactory, the pool is deleted, recreated, and the JMX data is reset. If the pool has been reset, or has not been used, statistics for that pool are not available. On the first use of the pool, the pool is created and JMX statistics become available.

The JMX data for connectionPools

The data was like

WebSphere:type=ConnectionPoolStats,name=CF3/CreateCount (Long) = 2

The detailed records for WebSphere:type=ConnectionPoolStats,name=CF3 are

  • CreateCount (Long) = 2   this is the number of connections created,
  • DestroyCount (Long) = 0 this is the number of connections released because the pool was purged,
  • WaitTime (Double) = 76.36986301369863  there were insufficient connections. For those requests that had to wait, this is the average wait time before a connection became available,
  • InUseTime (Double) = 18.905405405405407 the average time a connection was in use,
  • WaitTimeDetails/count (Long) = 98 the number of requests that had to wait,
  • WaitTimeDetails/description (String) = Missing,
  • WaitTimeDetails/maximumValue (Long) = 110  the maximum wait time in milliseconds,
  • WaitTimeDetails/mean (Double) = 78.13265306122449 the average wait time,
  • WaitTimeDetails/minimumValue (Long) = 16 the minimum wait time,
  • WaitTimeDetails/standardDeviation (Double) = 16.474205982730254 the standard deviation,
  • WaitTimeDetails/total (Double) = 7657.0 in milliseconds.  7657/(number of waits 98) = average 78.13 (above),
  • WaitTimeDetails/unit (String) = UNKNOWN looks like a bug – this should be milliseconds,
  • WaitTimeDetails/variance (Double) = 271.82426517365184 ,
  • ManagedConnectionCount (Long) = 2  The total number of managed connections in the free, shared, and unshared pools,
  • ConnectionHandleCount (Long) = 0  this is the current handles in use,
  • FreeConnectionCount (Long) = 2  this is the number of connections in the pool, but not in use,
  • InUseTimeDetails/count (Long) = 101 – the number of requests for a connection from the pool,
  • InUseTimeDetails/description (String) = Missing,
  • InUseTimeDetails/maximumValue (Long) = 53 the maximum time the connection was in use in milliseconds,
  • InUseTimeDetails/mean (Double) = 18.099009900990097  the average time the connections were in use in milliseconds,
  • InUseTimeDetails/minimumValue (Long) = 10  the minimum time the connections were in use in milliseconds,
  • InUseTimeDetails/standardDeviation (Double) = 5.63923216261808,
  • InUseTimeDetails/total (Double) = 1828.0  in milliseconds.   This value(1828)/(number of connections used 101) gives the mean value 18.09 above,
  • InUseTimeDetails/unit (String) = UNKNOWN.

Note that the order of the records can vary; for example, CreateCount can be first, or nearly last.

After a time interval, unused connections can be released. When there is sufficient workload to need more connections, they are created as needed. If the CreateCount increases significantly during the day, you may either have an irregular workload, or you need to increase the maxIdleTime value on the connectionManager, to smooth out the connect/disconnect. (The connectionTimeout value is how long a request waits for a free connection when the pool is at its maximum size.)

Having WaitTimeDetails/count=0 is good. If this number is large in comparison to InUseTimeDetails/count (the number of requests for a connection), then the pool is too small. In the data above, 98 of the 101 requests had to wait.
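These checks can be automated on the grep output. A sketch, assuming the line format shown above (name/attribute (Type) = value); the class and method names are hypothetical. It collects the numeric statistics for one pool, reports the fraction of connection requests that had to wait, and checks that total/count matches the reported mean:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses ConnectionPoolStats lines and derives a few health indicators for one pool. */
class PoolStatsCheck {
    // Matches e.g. "WebSphere:type=ConnectionPoolStats,name=CF3/WaitTimeDetails/count (Long) = 98".
    // The lazy name group lets pool names such as jms/CF3 contain a slash.
    private static final Pattern LINE = Pattern.compile(
            "^WebSphere:type=ConnectionPoolStats,name=(.+?)/(\\w+(?:Details/\\w+)?) \\((\\w+)\\) = (.*)$");

    /** Numeric statistics for one pool, keyed by e.g. "CreateCount" or "WaitTimeDetails/count". */
    static Map<String, Double> parse(List<String> lines, String pool) {
        Map<String, Double> stats = new HashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches() || !m.group(1).equals(pool)) {
                continue;
            }
            String type = m.group(3);
            if (type.equals("Long") || type.equals("Double")) {   // skip String values such as "Missing"
                stats.put(m.group(2), Double.parseDouble(m.group(4).trim()));
            }
        }
        return stats;
    }

    /** Fraction of connection requests that had to wait for a free connection. */
    static double waitFraction(Map<String, Double> s) {
        double requests = s.getOrDefault("InUseTimeDetails/count", 0.0);
        return requests == 0 ? 0.0 : s.getOrDefault("WaitTimeDetails/count", 0.0) / requests;
    }

    /** True if prefix/total divided by prefix/count agrees with prefix/mean. */
    static boolean meanIsConsistent(Map<String, Double> s, String prefix) {
        Double count = s.get(prefix + "/count");
        Double total = s.get(prefix + "/total");
        Double mean = s.get(prefix + "/mean");
        if (count == null || total == null || mean == null || count == 0) {
            return false;
        }
        return Math.abs(total / count - mean) < 1e-6;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        String pool = args.length > 1 ? args[1] : "CF3";
        Map<String, Double> stats = parse(lines, pool);
        System.out.printf("%s: %.0f%% of requests waited, wait mean consistent=%b, in-use mean consistent=%b%n",
                pool, 100 * waitFraction(stats),
                meanIsConsistent(stats, "WaitTimeDetails"), meanIsConsistent(stats, "InUseTimeDetails"));
    }
}
```

With the figures above it would report that 97% of requests waited (98 of 101), which suggests maxPoolSize="2" is too small for this workload.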

Other data you can get from JMX

  • IBM MQ:type=CommonServices
  • java.lang:type=ClassLoading
  • java.lang:type=Compilation
  • java.lang:type=GarbageCollector
  • java.lang:type=Memory
  • java.lang:type=Threading
  • osgi.core:
  • JMImplementation:type=MBeanServerDelegate
  • java.util.logging:type=Logging
  • java.nio:type=BufferPool,name=direct
  • java.lang:type=MemoryManager
  • java.lang:type=MemoryPool,name=Code Cache
  • java.lang:type=OperatingSystem
  • java.lang:type=Runtime
  • WebSphere:feature=apiDiscovery,name=APIDiscovery
  • WebSphere:feature=kernel,name=ServerInfo
  • WebSphere:type=JvmStats
  • WebSphere:type=ThreadPoolStats
  • WebSphere:type=ConnectionPoolStats ( as described above)
  • WebSphere:service=com.ibm.websphere.application.ApplicationMBean,name=CCP