NETTIME does not just mean net time

I saw a post which mentioned NETTIME and where people assume it is the network time.   It is more subtle than that.

If NETTIME is small then dont worry.   If NETTIME is large it can mean

  • Slow network times
  • Problems processing messages at the remote end

Consider a well behaved channel where there are 2 messages on the transmission queue

Sender end Receiver end
  • MQGET first message from the queue
  • TCP Send message in  buffer 1
  • MQGET second message from the queue
  • TCP Send message in buffer 2
  • MQGET – no message found
  • Do end of batch processing
    • TCP Send “End of batch” in buffer 3
    • Start timer
    • Wait
    • buffer arrives, wake up application
    • Stop timer. Interval is “Sender wait time”
  • Extract “receiver processing time” from reply buffer
  • Calculate NETTIME = “sender wait time” – “receiver processing time”
  • buffer 1 arrives, wake up Receiver channel application
  • buffer 2 arrives
  • TCP receive buffer 1 from network
  • MQPUT message 1 to the queue
  • buffer 3 arrives
  • TCP receive buffer 2 from network
  • MQPUT message 2  to the queue
  • TCP receive buffer 3 from the network – get  “end of batch” flag
    • Start timer
    • Issue commit
    • Stop timer
    • Send “OK + time interval back to Sender

In this case the NETTIME is the total time less the time at the receiver end.  So NETTIME is the network time.

In round numbers

  • it takes 2 millisecond from sending the data to it being received
  • get + send takes 0 ms ( the duration of these calls is measured in microseconds)
  • receive (when there is data) + MQPUT and put works, takes 0 ms
  • commit takes 10 ms
  • it takes 1 ms between sending the response and it arriving.
  • “10 ms” is sent in the response message

This is a simplified picture with details omitted.

The sender channel sees 13 ms  between the last send and getting the response.  (13 ms – 10 m)s is 3 ms – the time spent in the network.

 

Now introduce a queue full situation at the receiver end

Sender end Receiver end
  • MQGET first message from the queue
  • TCP Send message in buffer 1
  • MQGET second message from the queue
  • TCP Send message in buffer 2
  • MQGET – no message found
  • Do end of batch processing
    • TCP Send “End of batch” in buffer 3
    • Start timer
    • Wait
    • buffer arrives, wake up application
    • Stop timer. Interval is “Sender wait time”
  • Extract “receiver processing time” from reply buffer
  • Calculate NETTIME = “sender wait time” – “receiver processing time”
  • buffer 1 arrives, wake up Receiver channel application
  • buffer 2 arrives
  • TCP receive buffer 1 from network
  • MQPUT message 1 to the queue – it gets queue full, it pauses
  • buffer 3 arrives.  All of the data is in the buffers in TCP at this end.
  • after 500 ms the MQPUT succeeds.
  • TCP receive buffer 2 from network
  • MQPUT message 2 to the queue
  • TCP receive buffer 3 from the network – get “end of batch” flag
    • Start timer
    • Issue commit
    • Stop timer
    • Send “OK + time interval back to Sender

In round numbers

  • it takes 2 millisecond from sending the data to it being received
  • get + send takes 0 ms ( it is in microseconds)
  • receive (when there is data) takes 0 ms
  • the pause and retry took 500 ms
  • the second receive and MQPUT takes 0 ms
  • commit takes 10 ms
  • it takes 1 ms between sending the response and it arriving.
  • “10 ms” is sent ( as before) in the response message (the time between the channel code seeing the “end of batch” flag and the end of its processing
  • Buffer 3 with the “end of batch” flag was sitting in the TCP buffers for 500 ms

The sender channel sees 513 ms  between the last send and getting the response.  513 ms – 10 ms is 503  ms – and reports this as ” the time spent in the network” when in fact the network time was 3 ms, and 500 ms was spent wait to put the message.

Regardless of the root cause of the problem, a large nettime should be investigated:

  • do a TCP ping to do a quick check of the network
  • check the error logs at the receiver end
  • check events etc to see if the queues are filling up at the receiver end

The ccdt file was exactly where I specified it.

I had been successfully using a json ccdt file to use runmqsc as a client.  Because of a “.” it didn’t work when I used it elsewhere.

I had a directory of useful shell scripts.  In this directory I had my mqclient.ini with

CHANNELS:
MQReconnectTimeout=30
ReconDelay=(1000,200)(2000,200)(4000,1000)
ChannelDefinitionDirectory=.
ChannelDefinitionFile=ccdt.json

I had another directory with some shell scripts for some other applications.  I used

export MQCLNTCF=/home/colinpaice/… to point to the mqclient.ini file.

When I used runmqsc -c QMA I got

AMQ8118E: IBM MQ queue manager does not exist.

In /var/mqm/errors/AMQERR01.LOG was the error message AMQ9518E: File ‘./ccdt.json’ not found.

This was because I had specified ChannelDefinitionDirectory=. instead of a fully qualified name.  I had wrongly assumed that “.” meant the same directory as the mqclient.ini file.

The moral of this user error is to

  • use a fully qualified directory instead of “.”
  • or omit this parameter, and put the ccdt.json in the default location /var/mqm – but you may not have write access to this directory.
  • and go to the bottom of the class.

Do not panic so much if you are still running MQ V8

I had a question from a customer about being unable to use the REST API because they were still on MQ V8.

The announced end of support date for V8 was the end of April 2020. This gives them no time to upgrade test system, pre-production, and production.

Do not panic if you haven’t  upgraded yet – because the page says

This gives you a bit of time to get to MQ 9.1 – dont delay, raise a change request today.

Using Activity Trace to show a picture of which queues and queue managers your application used.

I used the midrange MQ activity trace to show what my simple application, putting a request to a cluster server queue and getting the reply, was doing.  As a proof of concept (200 lines of Python), I  produced the following

This output is a .png format.   You can create it as an HTML image, and have the nodes and links as clickable html links.

Ive ignored any SYSTEM.* queues, so the SYSTEM.CLUSTER.TRANSMIT.QUEUE does not appear.

The red arrows show the “high level” flow between queue managers at the “architectural”, hand waving level.

  • The application oemput on QMA did a put to a clustered queue CSERVER, there is an instance of the queue on QMB and QMC.   There is a red line from QMA.oemput to the queue CSERVER on QMB and QMC
  • The server programs, server running on QMB and QMC put the reply message to queue CP0000 on queue manager A

The blue arrows show puts to the application specified queue name – even though this may map to the S.C.T.Q.  There are two blue lines from QMA.oemput because one message went to QMC.CSERVER, and another went to QMB.CSERVER

The yellow lines show the path the message took between queue managers.  The message was put by QMA.oemput to queue CSERVER; under the covers this was put to the SCTQ.  From the accounting trace record this shows the remote queue manager and queue name:  the the yellow line links them.

The black line is getting from the local queue

The green line is the get from the underlying queue.  So if I had a reply queue called CP0000, with a QAlias of QAREPLY. If the application does a get from QAREPLY,  There would be a black line to CP0000, and a green line to QAREPLY

How did I get this?

I used the midrange activity trace.

On QMA I had in mqat.ini

applicationTrace:
ApplClass=USER # Application type
ApplName=oemp* # Application name (may be wildcarded)
Trace=ON # Activity trace switch for application
ActivityInterval=30 # Time interval between trace messages
ActivityCount=10 # Number of operations between trace msgs
TraceLevel=MEDIUM  # Amount of data traced for each operation
TraceMessageData=0 # Amount of message data traced

I turned on the activity trace using the runmqsc command

ALTER QMGR ACTVTRC(ON)

I ran some work load, and turned the trace off few seconds later.

I processed the trace data into a json file using

/opt/mqm/samp/bin/amqsevt -m QMA -q SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE -w 1 -o json > aa.json

I captured the trace on QMB, then on QMC, so I had three files aa.json, bb.json, cc.json.  Although I captured these at different times, I could have collected them all at the same time.

jq is a “sed” like processor for processing json data.   I used it to process these json files and produce one output file which the Python json support can handle.

jq . --slurp aa.json bb.json cc.json  > all.json

The two small python files are zipped here. AT.

I used ATJson.py python script to process the all.json file and extract out key data in the following format:

server,COLIN,127.0.0.1,QMC,Put1,CP0000,SYSTEM.CLUSTER.TRANSMIT.QUEUE,QMC,CP0000,QMA, 400

  • server, the name of the application program
  • COLIN, the channel name, or “Local”
  • 127.0.0.1, the IP address, or “Local”
  • QMC, on this queue manager
  • Put1, the verb
  • CP0000, the name of the object used by the application
  • SYSTEM.CLUSTER.TRANSMIT.QUEUE, the queue actually used, under the covers
  • QMC, which queue manager is the SCTQ on
  • CP0000, the destination (remote) queue name
  • QMA, the destination queue manager
  • 400 the number of times this was used, so 400 puts to this queue.

I had another python program Process.py which took this table and used python graphviz to draw the graph of the contents.  This produces a file with DOT  (graph descriptor language)parameters, and used one of the many programs to draw the chart.

This shows you what can be done, it is not a turn-key solution, but I am willing to spend a bit of time making it easier to use, so you can automate it.  If so please send me your Activity Trace data, and I’ll see what I can do.

My mid-range accounting is missing data

Like many of the questions I get asked, it started off as a simple question.

Over a day, what people and machines are using this queue?”  Simple I said.  Just use the accounting data.

Problem 1

I had a program which puts a message to a clustered queue on a remote queue manager.  The message flows to the server and a reply is sent back.  I thought I would check it works.  No it didn’t.

The accounting data has my program, but it recorded a message put to the SYSTEM.CLUSTER.TRANSMIT.QUEUE, and a message got from my reply queue, so I cannot tell what queue was used, but I can tell how many bytes were processed

Problem 2

My application doing just a publish did not show up in the accounting data.

I used the samples amqspub, and amqssub to publish, and subscribe respectively.

The accounting data had a record for the amqssub, getting from a queue, but there was no record for the publishing application.

As Morag kindly pointed out, you can tell how many publishes you did from the MQI accounting data, but not to which topic.

Ive started looking at the activity trace, but this looks like a heavyweight solution to the simple question.

Problem 3

Someone else pointed out that if you want to know the channel name, and IP address of the application using a queue, you have to do some extra work.

  • Collect the MQI accounting
  • Process queue accounting and MQI accounting, and use the common field connectionId to merge the data

Using amqsevt to display the accounting data in json format you get

“eventData” : {
“queueMgrName” : “QMC”,
“startDate” : “2020-04-17”,
“startTime” : “09.43.23”,
“endDate” : “2020-04-17”,
“endTime” : “10.18.58”,
“commandLevel” : 913,
“connectionId” : “414D5143514D432020202020202020201361995E0A24A923”,
“sequenceNumber” : 1,
“applName” : “server”,
“processId” : 29103,
“threadId” : 11,
“userIdentifier” : “colinpaice”,
   “connDate” : “2020-04-17”,
   “connTime” : “09.12.16”,
   “connectionName” : “127.0.0.1”,
   “channelName” : “COLIN”,

Where the bold text is for the data only in the MQI accounting information.

It looks like the MQI accounting record is produced before the Queue accounting record.

You will need to save the connectionId key with the connection specific information.  When the queue accounting record is processed, use the connectionId to look up the information and append it to the queue record.  If many queues are being used, you may get multiple queue accounting records – so you cannot simply delete the connection key and connection specific information.

 

mqweb – what’s the difference between MQWebAdmin* and MQWebUser roles?

The MQWebAdmin, MQWebAdminRO and MQWebUser roles can all be used for the admin REST API. That’s why MQWebUser is not called MQWebMessaging – it’s not just for messaging. The difference between them is the user ID that’s checked by the qmgr for operations performed via the REST API.
  • Operations performed by users in the MQWebAdmin* roles take place under the context of the mqweb server user ID.
  • Operation performed by users in the MQWebUser role take place under the context of the user logged into the REST API.

Thanks to Gwydion for this enlightenment.

mqweb – who did what to what, when, and how long did it take?

You can provide an audit trail of the http requests coming into your server.  This is described in the base liberty document, and works in mqweb.

Within the httpEndoint tag you can add

<httpEndpoint host=”${httpHost}” httpPort=”${httpPort}” httpsPort=”${httpsPort}” id=”defaultHttpEndpoint”>
<httpOptions removeServerHeader=”false”/>
<accessLogging enabled=”true” filePath=”${server.output.dir}/logs/http_access.log”
logFormat=’a:%a A:%A D:%D h:%h HeaderHost:%{Host}i HeaderOrigin:%{Origin}i m:%m R:%{R}W t:%{t}W u:%u U:%U X:%{X}W r:”%r” s:%s’
maxFileSize=20

maxFiles=0
/>
</httpEndpoint>

See here for information on the accessLogging,  here for the syntax of the <accessLogging…> and here for the logFormat format options.

From the logFormat page

%a
Remote IP address
%A
Local IP address
%b
Response size in bytes excluding headers
%B
Response size in bytes excluding headers. 0 is printed instead of – if no value is found.
%{CookieName}C
The request cookie specified within the brackets, or if the brackets are not included, prints all of the request cookies.
%D
The elapsed time of the request – millisecond accuracy, microsecond precision
%h
Remote host
%{HeaderName}i
HeaderName header value from the request
%m
Request method
%{HeaderName}o
HeaderName header value from the response
%q
Output the query string with any password escaped
%r
First line of the request
%{R}W
Service time of the request from the moment the request is received until the first set of bytes of the response is sent – millisecond accuracy, microsecond precision
%s
Status code of the response
%t
NCSA format of the start time of the request
%(t)W
The current time when the message to the access log is queued to be logged in normal NCSA format
%u
Remote user according to the WebSphere Application Server specific $WSRU header
%U
URL Path, not including the query string
%{X}W
Cross Component Tracing (XCT) Context ID

So

logFormat=’a:%a A:%A D:%D h:%h HeaderHost:%{Host}i HeaderOrigin:%{Origin}i m:%m R:%{R}W t:%{t}W u:%u U:%U  r:”%r” s:%s’

gave me

  • a:127.0.0.1 remote IP address
  • A:127.0.0.1 local host
  • D:225960 duration in millisecond
  • h:127.0.0.1 remote host
  • HeaderHost:127.0.0.1:9443 the Host header
  • HeaderOrigin:- the Origin header (missing in this case)
  • m:GET request method
  • R:186930 Service time of the request from the moment the request is received until the first set of bytes of the response is sent
  • t:[27/Feb/2020:17:04:27 +0000] NCSA format of the start time of the request
  • u:colinpaice remote user
  • U:/ibmmq/console url path
  • r:”GET /ibmmq/console HTTP/1.1″ First line of the request
  • s:302 Status code of the response

There are other log formatting options available, I picked those I thought were most useful.

Note when using the MQConsole from a browser, the interface is chatty. I had 20 request to refresh one window.

Other ways of formatting the data

  • I separated each field with a ‘,’ and could read it into a spread sheet.
  • You could configure your log format string to produce the output in JSON format, to make it easier to post process.

What does mqRestCorsMaxAgeInSeconds in mqweb mean?

Ive blogged about CORS, and how this allows you to list sites that are permitted to use scripts to send request to the mqweb server.

I struggled with understanding what value mqRestCorsMaxAgeInSeconds has, as it did not behave as expected (this was due to a bug).

If you have a CORS transaction there is an OPTIONS request, followed by the actual DELETE/GET/POST requests.
The OPTIONS request checks that the request is coming from an authorised list of URLs, and that the parameters are valid.  The OPTIONS information can be cached.

If the check is successful then the real request can be issued.  If the requests occur in a short time, then you can have OPTIONS, DELETE, DELETE, DELETE, using the cached values.  If there is a “long” time interval between the requests you may get OPTIONS, DELETE, gap, OPTIONS, DELETE.

The OPTIONS information can be cached for up to the minimum of the mqweb mqRestCorsMaxAgeInSeconds  time, and the browser time.

For Chrome the maximum time interval is 600 seconds.  If no time is specified in the OPTIONS response, then 5 seconds is used.

There is a bug in the Liberty base code which sends down the header Access-Control-Allow-Max-Age: …, when the browser is expecting Access-Control-Max-Age.   Because of this, the default time of 5 seconds is used in Chrome.

This should not have a major impact.  For those applications using scripts to send multiple REST API request, there will be more frequent OPTIONS requests – every 5 seconds instead of up to 600 seconds.  These extra flowes are invisible to the scripts and the end user.

What value should I use?

Chrome has a maximum of 600 seconds, with a default of 5 seconds.

Firefox has a maximum of 24 hours (86400 seconds).

Setting it to 600 seconds sounds reasonable to me.

Making changes to mqRestCorsMaxAgeInSeconds

If you change mqRestCorsMaxAgeInSeconds you have to restart the mqweb server.

I do not get caching!

When researching this topic I found every GET request had an OPTIONS request, rather than the OPTIONS, GET, GET.   A quick search on the internet showed me the Chrome option ( F1 -> Settings and preferences) “Disable Cache ( while DevTOOLS is open)” was enabled. I deselected this, and I got the caching.

How do I tell if my puts and gets are in or out of syncpoint?

This question came up from someone who is new to MQ.  it is good to find people people who are new to MQ.

As usual there is an easy answer, but some cases you need to think about.

There is no command you can use to show this information.  You can find it it out using the accounting information.

Typically an application does

  • MQCONN
  • MQOPEN queue
  • MQPUT|MQGET
  • Possibly  a commit
  • MQCLOSE
  • MQDISC

If the accounting data shows that there was a COMMIT request, and the commit took about 1 millisecond, then there is a good chance that the put or get was in syncpoint.

If gets more interesting if you have

  • MQPUT of messages out of syncpoint (for example and audit trail – we got to this point.  Out of syncpoint so it exists even after a rollback)
  • MQPUT within syncpoint
  • MQCOMMIT

If the accounting information shows one commit, you cannot tell if it was one or two messages within syncpoint.

If the message is persistent and out of syncpoint, then under the covers it does “MQPUT|MQGET – and commit this requests”.  On z/OS a put or get typically take 10 microseconds, a log I/O takes about  1 ms, so if you see an MQPUT or MQGET with a long response time (1 millisecond), then it might have been a put or get of a message outside of syncpoint.
On z/OS you can also get the logging time for a put or get, so if you see the logging time for a put or get, this is a persistent message out of syncpoint.

A non persistent message does not write to the log on disk, all updates are kept in memory, so you cannot tell from the duration of the request if it was in or out of syncpoint.    Best practice is that non persistent messages are usually processed out of syncpoint.

If you see persistent messages are put or got, and the commit time is microseconds,  then the commit did not do disk IO, so there was no work to commit, and so the put or get was out of syncpoint.

And lastly, you may see no commit request in the accounting data.   This is bad practice, as it is assuming that the MQDISC will cause a commit or roll back, so it would happen invisibly.  An application should explicitly issue a commit if messages are within syncpoint.

As I said – pretty simple – but with some complexities.

 

 

 

Error connecting to JMX endpoint: Unsupported protocol: rest

I got this error message when I was trying to use JMX into the WLP web server, when using the restConnector-2.0 interface in Liberty.

The documentation was not that helpful.  Oneproblem was I suffered from the curse of cut and paste, and used a Windows environment variable %JAVA_HOME%  instead of using the Linux $JAVA_HOME.  Another problem was caused documentation saying add the jar to the class path, and then the class path was ignored.

True documentation.

When you use the  -jar option, the JAR file is the source of all user classes, and other user class path settings are ignored.  See here.

Unhelpful documentation

Configuring secure JMX connection to Liberty   says

jconsole -J-Djava.class.path=...;%WLP_HOME%/clients/restConnector.jar

This was for Windows – on Linux it would be $WLP_HOME – except on my system $WLP_HOME was not set.

jconsole

Some of the jar files are in $JAVA_HOME, you can use the environment variable, or specify it yourself.  Note %JAVA_HOME% is windows, so be careful when you use cut and paste.

  • jconsole -J-Djava.class.path=$JAVA_HOME/lib/jconsole.jar:$JAVA_HOME/lib/tools.jar:/opt/mqm/web/clients/restConnector.jar
  • jconsole -J-Djava.class.path=/usr/lib/jvm/java-8-openjdk-amd64/lib/jconsole.jar:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/opt/mqm/web/clients/restConnector.jar

If the window displays “Secure connection failed” restart jconsole and use the -debug option.   For me this gave  “java.io.IOException jmx.remote.credentials not provided. Set to a String[2] {user,password}”.  I entered my userid and password, and this connected successfully.

I could not connect using my .jar file.

I was using JMXQuery to extract the data.  I was getting the Error connecting to JMX endpoint: Unsupported protocol: rest message.
Adding the jar file to my class path did not help, as the class path is ignored when using java the -jar parameter.

How to fix it

There are two ways of fixing this.

  1. Put the required jar file in the extensions path, not the class path
  2. Use the java -classpath…  instead of specifying java -jar

1. You need to have the jar for in the extensions path, not in the class path.

See How classes are found  in the Java documentation.  It says

  • Bootstrap classes – Classes that comprise the Java platform, including the classes in rt.jar and several other important jar files.
  • Extension classes – Classes that use the Java Extension mechanism. These are bundled as .jar files located in the extensions directory.
  • User classes – Classes defined by developers and third parties that do not take advantage of the extension mechanism. You identify the location of these classes using the -classpath option on the command line (the preferred method) or by using the CLASSPATH environment variable.

There is a system property java.ext.dirs  which gives the location of the Extension classes. On my system this was

/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext:/usr/java/packages/lib/ext

  • The /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext: is java dependent. You should not put your files in this directory
  • /usr/java/packages/lib/ext This is for “user” extensions.

The directory /usr/java/packages/lib/ext did not exist on my system, so I had to do, create it, copy the web connection jar file to it, and grant permissions on it.

  • sudo mkdir -p /usr/java/packages/lib/ext
  • sudo cp /opt/mqm/web/clients/restConnector.jar /usr/java/packages/lib/ext/
  • sudo chmod 444 /usr/java/packages/lib/ext/restConnector.jar

2. Use a -classpath – not a -jar

In a .jar file there is a META-INF/MANIFEST.MF file which includes information on the entry point.

Manifest-Version: 1.0
Main-Class: src.Client

Instead of using the -jar option to point to the jar, you can use the -classpath to point to the jar and explicitly specify the entry point.  For example

java -cp ./Client.jar:/opt/mqm/web/clients/restConnector.jar  
src/Client
service:jmx:rest://localhost:9443/IBMJMXConnectorREST
  • java -cp ./Client.jar:/opt/mqm/web/clients/restConnector.jar –  use the classpath option, and specify the needed jar files.  Client.jar is my program.  /opt/mqm/web/clients/restConnector.jar  is the Liberty provided jar.
  • src/Client – the “entry point” class to use
  • service:jmx:rest://localhost:9443/IBMJMXConnectorREST – the url to use.
  • note the absence of a -jar option.

In the manifest of the Client.jar file it had Main-Class: src.Client.   This is for the the src/Client.java source file.   This ties up with the src/Client.class as seen when you use the command jar -tvf Client.jar .