What data is there to help you manage your systems?

MQ provides a lot of information to help you manage your systems, but some of it is not well documented.

I’ll list the sources I know about, and when they might be needed, but before that I’ll approach it from the angle of “what do I want to do?”.

What do I want to do?

I want to…

  • know when significant events happen in my system, such as channel start and stop, and security events. AMQERRnn.LOG
  • know when “events” happen, such as a queue filling up, or security exceptions.  Event Queues
  • be able to specify thresholds, such as when the current depth is > 10 messages, and the age of the oldest message is older than 5 seconds then do something. Display commands
  • be able draw graphs of basic metrics, such as number of messages put per hour/per day so I can do capacity planning, and look for potential capacity problems. Statistics
  • identify which queues are being used, display queue activity, number of puts, size of puts etc.  Statistics, Display object status
  • identify which queues (objects) are not being used, so they can be deleted.  Absence of records in Statistics; issue DIS QSTATUS every day and see if a message has been put to the queue; or create events for when a message is put to the queue.  Note an object may only be used once a year – so you need to monitor it all year.
  • identify which applications are putting to and getting from queues. Accounting
  • see what MQI verbs are being used, so we can educate developers on the corporate naming standards, and API usage.  Activity trace
  • display the topology.  Display commands, Trace route, Activity trace
  • trace where messages are going, so we can draw charts of the flow of message requests and their responses, and display the topology of what is actually being used.  Trace route
  • measure round trip times of messages – so I know if there are delays in the end to end picture. Trace route
  • Understand the impact of a problem “here” by seeing what flows through “here”.  What’s my topology

What sources are there?

AMQERRnn.LOG.

These contain information about events in the queue manager, such as channel start and channel stop. These files are in /var/mqm/qmgrs/QMA/errors/… and can be read using an editor or browser.  People often feed these into tools like SPLUNK, and then you can filter and do queries to monitor for messages that have not been seen before.
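As a sketch of the kind of post-processing you can do yourself, the Python below splits an error log into records and pulls out the message identifiers. It assumes each record ends with a separator line of dashes, which matches the logs I have seen – check your own log layout before relying on it.

```python
import re

# Message identifiers look like AMQ9002I: "AMQ", four digits, severity letter.
MSGID = re.compile(r"\bAMQ\d{4}[IWES]\b")

def split_records(text):
    """Split error-log text into records; assumes a dashed separator line
    ends each record (an assumption - verify against your own log)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() and set(line.strip()) == {"-"}:  # separator line
            if current:
                records.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records

def message_ids(record):
    """Extract MQ message identifiers such as AMQ9002I from a record."""
    return MSGID.findall(record)
```

You could feed the resulting records into a tool such as SPLUNK, or keep a set of previously seen identifiers and alert on new ones.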

Event queues.

MQ messages are put on queues like SYSTEM.ADMIN.QMGR.EVENT.

There is a sample, amqsevt, which can be used to print the messages in text format or json format – or you can write your own program.

Creating events.

You can configure MQ to produce events when conditions occur. For important queues you can set a high threshold, and MQ produces an event when this limit is exceeded. You can use this

  • to see if messages are accumulating in a queue
  • to see if a queue is being used – set the queue high threshold to be 1, and you will get an event if a message is put to a queue

Statistics

If you turn on statistics you get information on the number of puts and gets for the system, and the number of puts and gets etc. for each queue. This information is put to the queue SYSTEM.ADMIN.STATISTICS.QUEUE.

The information is summarized by queue.

You can use

  1. the sample amqsevt to process these messages; you can have the output in json format for input into other tools.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

One problem with the data going to a queue, is that a program processing the queue may get and delete the message on the queue, so other applications cannot use it. Some programs have a browse option.

Later versions of MQ use a publish subscribe model, so you subscribe to a topic, and get the data you want sent to your queue.
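If you have amqsevt writing the statistics messages as json (one document per line), a small script can summarise puts and gets per queue – for example for the capacity graphs mentioned above. The field names “queueName”, “puts” and “gets” below are hypothetical; inspect the json your amqsevt actually produces and adjust.

```python
import json
from collections import defaultdict

def summarise(lines):
    """Total puts/gets per queue from json statistics records, one per line.
    The field names used here are assumptions - adjust to the real output."""
    totals = defaultdict(lambda: {"puts": 0, "gets": 0})
    for line in lines:
        rec = json.loads(line)
        q = rec.get("queueName")
        if q is None:          # skip records that are not per-queue
            continue
        totals[q]["puts"] += rec.get("puts", 0)
        totals[q]["gets"] += rec.get("gets", 0)
    return dict(totals)
```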

Accounting

If you turn on accounting you get information about what an application is doing: the number of puts and gets for the system, and the number of puts and gets etc. for each queue. This information is put to the queue SYSTEM.ADMIN.ACCOUNTING.QUEUE. The information is similar to that provided by statistics, but it shows which application used the objects.

You can use

  1. the sample amqsevt to process these messages.
  2. Systems management products like Tivoli can take these messages and store the output in a database to allow SQL queries
  3. Write your own program

As with statistics, a program processing the queue may get and delete the messages, so other applications cannot use them (some programs have a browse option); and later versions of MQ use a publish subscribe model, so you can subscribe to a topic and have the data you want sent to your own queue.

You can use display commands.

You can use commands, or use the MQINQ API, to display information about objects. You can issue commands using runmqsc, or from an application by putting command requests in PCF format to a queue and getting the data back in PCF format. Your program has to decode the PCF data.
You can display multiple fields and have logic to take action if values are out of the usual range.  For example,
periodically display the curdepth and the age of the oldest message on the queue, and then do processing based on these values. Tivoli uses this technique to create situations if specified conditions are met. You can easily write your own programs to do this, for example using python scripts and pymqi.
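The threshold logic above can be sketched in a few lines of Python. The two input values would come from DIS QSTATUS (CURDEPTH and MSGAGE), obtained by parsing runmqsc output or via pymqi – how you fetch them is up to you.

```python
def should_alert(curdepth, oldest_age_seconds,
                 depth_limit=10, age_limit_seconds=5):
    """Alert when the current depth exceeds the limit AND the oldest
    message has been waiting longer than the age limit - the example
    thresholds from the text above."""
    return curdepth > depth_limit and oldest_age_seconds > age_limit_seconds
```

A cron job could run this every minute and send a notification when it returns True.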

What’s my topology?

You can use the DIS CHANNEL … CONNAME command to show where a channel connects to and use this to draw up a picture of your configuration.

You can use the DIS QCLUSTER and DIS CLUSQMGR to show information about your clusters, and where cluster queues are, and use this information to draw up a picture of your configuration

You can use the traceroute to dynamically see the routes between nodes, and understand the proportion of messages going to different destinations – at that moment in time.

Displaying object status

You can use display commands to show information such as the last time a message was put to a queue, or got from a queue, or sent over a channel.

Application trace

The application trace shows you the MQ API calls, the parameters, and return codes. This data goes to the SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE queue.

You can use this to check the API options being used, for example

  • Message persistence is correct for the application pattern (for example, inquiry messages are non persistent)
  • The correct message expiry is specified (for example, non persistent messages have an expiry value)
  • The correct options are specified
  • Applications are using MQ GET with wait rather than polling a queue
  • The correct syncpoint options are being used.
  • Which queue is really used. You open one queue name, but this could be an alias; the trace shows the queue it maps to.

There is an overhead to collecting this, so you do not want to run this for extended periods of time.

Running it for just a minute or two may give you enough information. You can turn this on for an individual program.

You can use amqsevt to process the queue.
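As a sketch, if amqsevt writes each activity trace message as one json document per line, you can count the MQI operations used – for example to spot applications polling with repeated gets. The “OperationType” field name matches the trace route output shown later in this post; the “operations” key is an assumption – verify both against your own json.

```python
import json
from collections import Counter

def count_operations(lines):
    """Count operation types across activity-trace json records.
    Field names "operations" and "OperationType" are assumptions."""
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        for op in rec.get("operations", []):
            counts[op.get("OperationType", "Unknown")] += 1
    return counts
```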

Trace route.

You can send a message “to a queue” and get back a record of the processes involved in getting it to the queue. For example, use dspmqrte to “put” a message to a cluster queue, and you will see the sending channel get the message and send it, then the receiver channel at the remote end receive the message and “put” it to the queue. One of the data fields is the operation time, so you can see where the delays were in the processing (for example, it took seconds to be sent over a channel).  See here

By default the message is not put to the queue, but there is an option to put it to the queue for the application to process; however there is no documentation to tell you how to process this message. The dspmqrte command effectively shows you the hops between queues. It is up to you to build up the true end to end path, and manage the responses yourself.

The provided program dspmqrte is simplistic: it shows you the path to the queue, and the channels used along the way.

The data is not pure PCF, and the sample amqsevt does not format it. I have modified it to handle this.

Where’s my network bottleneck? Try using traceroute

On midrange MQ there is a capability called trace route which allows you to see the path to a queue, and get information about the hops to the queue – for example, how long the message was on the transmission queue before being sent.

If you have a problem in your MQ network, you can use dspmqrte in real time and see if there are any delays between end points.

Below, I’ll show an example of the information you can get, including where the time was spent during processing, and what you can use to automatically process the replies.

What is a trace route message?

A special (trace route) message is sent to the specified queue (perhaps on a different queue manager), and the tasks that process it en route send information to a collection queue.

You can use the IBM supplied dspmqrte (display mq route) command. This sends the message and processes the responses.

For example, on QMA, dspmqrte put a message to a cluster queue CSERVER on QMC. During the processing of the trace route message, several messages were sent to the collection queue. Key data from the messages is displayed below.

Message 1 – processing done by dspmqrte

ApplName: dspmqrte
ActivityDesc: IBM MQ Display Route Application 

Operation: OperationType: Put
QMgrName: QMA QName: CSERVER 
ResolvedQName: SYSTEM.CLUSTER.TRANSMIT.QUEUE
RemoteQName: CSERVER 
RemoteQMgrName: QMC 

We can see from this information that the application dspmqrte put a message to the queue CSERVER, which resolves to queue CSERVER on QMC, and that it went via SYSTEM.CLUSTER.TRANSMIT.QUEUE.

Message 2 – processing done by the sending channel

ApplName: amqrmppa
ActivityDesc: Sending Message Channel Agent

Operation: OperationType: Get
QMgrName: QMA 
QName: SYSTEM.CLUSTER.TRANSMIT.QUEUE 
ResolvedQName: SYSTEM.CLUSTER.TRANSMIT.QUEUE 

Operation: OperationType: Send
QMgrName: QMA
RemoteQMgrName: QMC
ChannelName: CL.QMC
ChannelType: ClusSdr
XmitQName: SYSTEM.CLUSTER.TRANSMIT.QUEUE

We can see from this that the channel did two things (two operations) with the data

  1. Operation 1: ClusSdr channel CL.QMC, did a Get from SYSTEM.CLUSTER.TRANSMIT.QUEUE.
  2. Operation 2: Sent the message over the ClusSdr channel CL.QMC to queue manager QMC.

Message 3 – processing done by receiving channel

ApplName: amqrmppa
ActivityDesc: Receiving Message Channel Agent 

Operation: OperationType: Receive
QMgrName: QMC 
RemoteQMgrName: QMA 
ChannelName: CL.QMC
ChannelType: ClusRcvr

Operation: OperationType: Put
QMgrName: QMC 
QName: CSERVER
ResolvedQName: CSERVER

We can see from this that the channel did two things with the data

  1. the ClusRcvr channel CL.QMC received a message from QMA,
  2. The channel put the message to CSERVER on this queue manager.

End to end path

There is a sequence

  1. dspmqrte put a message to queue CSERVER
    • this message was put on the SYSTEM.CLUSTER.TRANSMIT.QUEUE queue
  2. Cluster Sender Channel CL.QMC got the message and sent it over the network
  3. Cluster Receiver Channel CL.QMC received the message from the network and put it to the CSERVER queue.

There is an option to trace the return route; this sometimes worked, but not consistently.

From the queue names used, and the resolved queue names, you can check the names of the queues being used. If you are using QALIAS, QREMOTE, clustered queues, clustered QALIAS, or clustered QREMOTE, you can see the true names of the objects being used, and draw a topology chart of what is actually being used (rather than what you think is being used).
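A minimal sketch of building such a chart: feed the (queue opened, queue resolved to) pairs from the trace route messages into a set of graph edges, which you can then draw with any graphing tool. The tuple layout here is my own; you would extract the fields from the formatted messages.

```python
def topology_edges(records):
    """records: iterable of (qmgr, opened_name, resolved_name) tuples,
    as reported by trace route. Returns sorted edges where the opened
    name resolved to something different (alias, remote, cluster...)."""
    edges = set()
    for qmgr, opened, resolved in records:
        if opened != resolved:
            edges.add((f"{qmgr}/{opened}", f"{qmgr}/{resolved}"))
    return sorted(edges)
```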

Extending your applications to support trace route.

There is an option to pass the trace route message to the application processing the queue. I will write another blog post about doing this – it took me several days to get it to work. This allowed me to return data saying “colinsProg got the message from CSERVER and passed it on to NEXTHOP”. I could then build up a true picture of my application.

Using the data in the messages

The returned messages have a lot of information, including OperationTime. The time is a character string with format HHMMSShh, where hh is hundredths of a second.
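A small Python helper can turn OperationTime values into seconds since midnight, which makes the deltas in the list below easy to compute (this assumes both times fall on the same day):

```python
def parse_operation_time(t):
    """Convert an OperationTime string HHMMSShh (hh = hundredths of a
    second) into seconds since midnight, as a float."""
    hh, mm, ss, frac = int(t[0:2]), int(t[2:4]), int(t[4:6]), int(t[6:8])
    return hh * 3600 + mm * 60 + ss + frac / 100.0

def delta(t1, t2):
    """Seconds elapsed between two OperationTime values (same day)."""
    return parse_operation_time(t2) - parse_operation_time(t1)
```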

With my example message above

  1. dspmqrte put a message to queue CSERVER at 06:10:10.00
  2. Cluster Sender Channel CL.QMC got the message at 06:10:11.44 – 1.44 seconds later
  3. Cluster Sender Channel CL.QMC sent it over the network at 06:10:11.44 – no time delta
  4. Cluster Receiver Channel CL.QMC received the message from the network at 06:10:11.44 – no time delta
  5. Cluster Receiver Channel CL.QMC put it to the CSERVER queue at 06:10:11.44 – no time delta

We can see from this that there was a slight delay (1.44 seconds) before the channel got the message. The rest of the processing was very fast. If I had a problem in my MQ network, I would look at why the sending end of the channel was slow to process the message.

Problems with traceroute

I had a few problems using trace route.

The messages which flow are not true PCF messages, and so the IBM sample amqsevt (which processes PCF messages) does not recognise them. As this is sample code I was able to change it to get it to work. I’ll send the changes to IBM and hope they incorporate them in the product.  I output the messages in json and then used python to process them.

If dspmqrte thinks it has already seen a reply for the request it can throw it away and not display it.   I had this problem when instrumenting my applications to provide the trace route information.

It would be good if dspmqrte displayed the time delta from the start of the request. I had to take the output and post process it to report where the delays are.

The OperationTime is in hundredths of a second. This may be OK for most people: if you are looking for delays in your processing, a tenth of a second may be granular enough. I added a high resolution (epoch) time to the data provided by my applications.

If my backend application was not active, there was an operation of “OperationType: Discard” with Feedback: NotDelivered. This may be because the number of handles opened for input was zero. It was a surprise – I expected to get a response saying “message expired”.

My non trivial application design is to send a message to a queue CSERVER, which passes a request to the backend application (on QMZ), which sends a response back to the originator. Dspmqrte does not support this. You can set up dspmqrte to display the route between QMA and QMC, and use a client connected dspmqrte to send a message between QMC and QMZ, and then build the end to end picture yourself.

I have made some progress in instrumenting my applications to do this, but I need more time, as the documentation is unclear, wrong in places, and missing bits. I’ll send my doc comments to IBM.

When is the best time to learn man over board drill? When it is calm. When is the best time to practice man over board drill? When it is rough.

If you have two totally different concepts, but there is a similarity between them, you can get insight by comparing them.

At first glance there is little in common between “man over board” when at sea, and enterprise computers, but there is, and we can get insight about testing and preparation.

When is the best time to learn man over board drill.

While you are learning, you want a nice calm day, so that any mistakes you make do no damage.  You need to practice it until you can recover the dummy most times.

When is the best time to practice man overboard drill?

You are more likely to go overboard when the seas are rough than when the weather is calm, so you need to practice in this scenario.  When the weather is rough it is hard to see the person in the water – a person’s head is 1 ft high, but the waves can be 6 feet from peak to trough – and it is harder to position the boat.  So the best time to practice man over board drill is when the weather is rough.  Do not try it in a gale, as you are likely to damage the boat, or have someone really fall over board.
When my father was in the navy, he told me about “exercises” where the ships would be under attack from planes (this was before missiles), and there was a submarine or two trying to attack them. In one exercise, when the sea was a bit rough, the captain of the ship sat back and let his senior officer (the First Lieutenant) run the ship.  Things were going well until the captain arranged for a “man over board” to happen.

The FL now had to decide: stop the ship (become a sitting duck, and so be destroyed) and pick up the man overboard; leave the man in the water to die; or take people away from defending the ship and launch a boat/helicopter to rescue the man.  This was a complex situation which suddenly became more complex, but they had trained for this, had tested procedures, and people knew what to do.

How does this apply to enterprise systems?

You need to decide on your “man overboard” scenarios.  For example: this server is shut down, that network has problems.  You need to practice resolving the problems, capturing information to help you identify the real cause of the problem, and the steps needed to recover.  Once you have an automated or fully documented procedure, you test it out in production – this is the “testing man over board when the sea is rough”.  This is where you find the holes in your processes, and find that production is configured differently to test, etc.

Then there is the “man over board when your ship is being attacked – do you save the man or save the ship” scenario.  You often get multiple problems, and you need to decide on the priority of the actions.  “Messages building up on a queue” could be caused by a network problem; it is more important to fix the network than to “fix MQ”.  You should go through scenarios to help decide what to do.  It is better to create action plans in advance, and document them, rather than try to come up with a plan during an emergency.  You want to avoid “if we do this then that will happen… ahhh – not a good idea”.

Think things through

I did a sailing course in the Mediterranean, where the sea was warm, and people were swimming in the sea.  We had spent the morning doing Man Over Board, where we had to retrieve a buoy with a pole sticking out the top.  This was easy to retrieve: you lean over the side and just pick it up.  Well done, tick the box, you passed.  We anchored up, and were having lunch with a nice cold glass or two of wine, when I asked “so how do you get someone back into the boat?” (they do not teach you this).  I then “accidentally” fell over board.

  • They threw me a rope – but I said my hands were too cold – I could not grip it.
  • They then made a loop in the end and threw it to me – the loop was too small to go over my head and life jacket.
  • They then made a big loop which I got over my head, and they dragged me to the yacht, but were unable to lift me out because I was too heavy and my clothes were full of water.
  • The tutor then suggested using some of the lifting equipment from the boat, so they tied the rope over the end of the boom, and used a winch to winch me up – which worked, but they scraped me up the side of the boat, so I had a bleeding arm.

I said afterwards (as they wiped my blood off the deck): look at the problems you had when we were at anchor.  Think what it would be like in a 6 ft sea!

Think how you will recover after your outage.

For example, there may have been persistent messages on the queue manager when it went down.  The application retried and was successful, because the traffic went to an alternative queue manager.  You now have possibly duplicate requests, or orphaned replies (because the getting application reconnected to a different queue manager).

One server went down, and all of its traffic went to another server.  Now the first server has come back – how do you get traffic to balance over the queue managers?

You have a huge backlog of messages – what should you do: just purge them, or let them be processed?  (This is where you realise that using message expiry on inquiry messages would be a good technique.)

You need to think things through.  These exercises are tedious and take a lot of time – but you have no time in a crisis!


Getting round the problems with amqsevt.

There is a great sample program, amqsevt(a), for printing out data from queues in PCF format – for example event queues, and statistics and accounting. I use this to output the data in json format and then process it in python scripts or other tools which handle json.
I’ve noticed a couple of tiny problems with it, which are easy to get round.  I spotted these when trying to parse a file with the json data in it.

  • The output is sometimes like {..} {..}, which is not strictly valid json.  It should be [{…},{..}], but that is hard to create when streaming the data.  I got json.decoder.JSONDecodeError: Extra data:….
  • It sometimes writes hex strings as very large unquoted decimal numbers, for example
    • “msgId” :      “414D5120514D412020202020202020202908BA5CF4C9AB23” is OK
    •  “correlId” :   0000000000000000000000000000000000000000000000000 and the json parser complains.  I had json.decoder.JSONDecodeError: Expecting ‘,’ delimiter pointing to the middle of the line with the problem.

I fixed these by passing the json through the great utility jq which cleans this up.

For example

/opt/mqm/samp/bin/amqsevt -m QMA -q SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE -o json | jq -c . | python myprog.py

and use python code like

import json
import sys

for line in sys.stdin:
    j = json.loads(line)
    # process j ...

jq also cleans up the “correlId” :  000000000000000000000000000000000000000000000  to
“correlId” :   0
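If you cannot use jq, the {..} {..} problem can also be handled in pure Python with json.JSONDecoder.raw_decode, which pulls one document at a time out of a stream. A sketch (note this only handles the concatenation problem, not the unquoted numbers with leading zeros):

```python
import json

def iter_json_objects(text):
    """Yield each json document from a stream of concatenated
    {..} {..} documents, as produced by amqsevt -o json."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # skip whitespace between documents
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, end = decoder.raw_decode(text, pos)
        yield obj
        pos = end
```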

Build the sample

To build amqsevta.c, I used the make file below (note that the gcc recipe line must start with a tab character):

cparms = -Wno-write-strings -g
clibs = -I. -I../inc -I'/usr/include' -I'/opt/mqm/inc'
rpath = -rpath=/opt/mqm/lib64 -Wl,-rpath=/usr/lib64
lparms = -L /opt/mqm/lib64 -Wl,$(rpath) -lmqic_r -lpthread

% : %.c
	gcc -m64 $(cparms) $(clibs) $< -o $@ $(lparms)

and the command make -f xxxx amqsevta


Should we share everything or share nothing?

We have a spectrum: from giving every application its own queue manager, spread across Linux images, to having one big server with a few queue managers servicing all the applications.

There are good points and bad points across the range from share nothing to share everything.

What do you need to consider when deciding how to share resources within the environment?

You may want to provide isolation

  • For critical applications, so other applications cannot impact it (keep trouble out)
  • To protect applications from a “misbehaving” application and to minimise the impact, (keep trouble in).
  • For regulatory reasons
  • For capacity reasons
    • Disk IO response time and throughput.
    • Amount of RAM needed
    • Amount of virtual storage
    • Number of TCP ports
    • Number of MQ connections
    • Number of file connections in the Operating System
  • For security. It is often easier to deny people access to an image, than to put all the controls in place within the image.
  • You have more granularity at shutting down an image – fewer applications are impacted.
  • Restart time may be shorter.

If your requirements don’t fit into the above, you should consider sharing resources.

The advantages of sharing

  • Fewer environments to manage.
    • Provisioning can be done with automation, but a large number of small images can be hard to manage.
    • Monitoring is easier – you do not have so many systems to look at. (How big a screen do you need to have every system showing up on it ?)
    • You have to do changes and upgrades less frequently
  • Removing images tends to leave information behind – for example information about a deleted clustered queue manager stays in a full repository for many days.
  • The operating system may be able to manage work better than a VM hypervisor “helping”.
  • Fewer events and situations to manage.
  • By having more work the costs can be reduced. For example a channel with 10 messages in a single batch uses much less resources than 10 batches of one message.

What else?

You may need to provide multiple queue managers for availability and resilience, but rather than provide two queue manager for applicationA, and two queue managers for applicationB, you may be able to have two queue managers shared, with each queue manager supporting applicationA and applicationB.

You can provide isolation within a queue manager by

  • Having specific application queue names, for example the queues names start with the application prefix. You can then define a security profile (authrec) based on that prefix.
  • You can use split cluster transmit queue (SCTQ) so clusters do not share the SYSTEM.CLUSTER.TRANSMIT.QUEUE – but have their own queue and their own channels.

You may think that by providing multiple instances you are providing isolation. There can be interaction with others at all levels – queue manager, operating system, hypervisors, disk subsystem, network controllers – you just do not see them.

I had to work on a problem where MQ in a virtualized distributed environment saw very long disk I/O response times. We could see this from an MQ trace, and from the Linux performance data. The customer’s virtualization people said that on average the response time was OK, so there was no issue. The people in charge of the Storage Area Network said they could not see any problems. The customer solved this performance problem by making all messages non persistent – which solved the performance problem, but may have introduced other data problems! As my father used to say: the more moving parts, the more parts that can go wrong.

Why do I have no authority?

We tried setting up dmpmqcfg for a general user, and had various security problems. This blog post gives you information on how to set up security, and where to find more information.
At the bottom we give all of the security commands we needed.

While doing research on this, I wrote some other blog posts on security

What is dmpmqcfg?

This program dumps the mq configuration – object definitions, security etc – so they can be restored, or used as a master copy to see what has changed.
The documentation for dmpmqcfg is pretty good. It tells you what authorizations you need, and with these the command worked.

Although we got the command to work, we had to do additional configuration. The documentation says “The user must …” have (+dsp) authority for every object that is requested; few objects were dumped until we fixed this, and then we got all of the objects dumped.
To illustrate how to solve problems, we did not completely follow the instructions.

Actually using dmpmqcfg

The testuser user issued command dmpmqcfg -a and got
AMQ8135E: Not authorised.
The error log had

10/04/19 09:19:44 – Process(10654.36) User(colinpaice) Program(amqzlaa0)
Host(colinpaice) Installation(Installation1)
VRMF(9.1.2.0) QMgr(QMA)
Time(2019-04-10T08:19:44.500Z)
CommentInsert1(testuser)
CommentInsert2(QMA [qmgr])
CommentInsert3(connect)
AMQ8077W: Entity ‘testuser’ has insufficient authority to access object QMA [qmgr].
EXPLANATION:
The specified entity is not authorized to access the required object. The following requested permissions are unauthorized: connect
ACTION:
Ensure that the correct level of authority has been set for this entity against the required object, or ensure that the entity is a member of a privileged group.

This was very clear and easy to follow.

If you have ALTER QMGR AUTHOREV(ENABLED), you will get events generated for security violations. You can use the following to process the authorization events:
/opt/mqm/samp/bin/amqsevt -m QMA -o json -w 1 -q SYSTEM.ADMIN.QMGR.EVENT
but the AMQERR01.LOG is easier to read and has the correct actions.

We fixed the connection problem by giving connect authority
setmqaut -m QMA -t qmgr -g test +connect

We retried and got
AMQ9505E: Program unable to open object SYSTEM.DEFAULT.MODEL.QUEUE
The error log gave

10/04/19 09:32:23 – Process(10654.41) User(colinpaice) Program(amqzlaa0)
Host(colinpaice) Installation(Installation1)
VRMF(9.1.2.0) QMgr(QMA)
Time(2019-04-10T08:32:23.050Z)
CommentInsert1(testuser)
CommentInsert2(SYSTEM.DEFAULT.MODEL.QUEUE [1003])
AMQ8245W: Entity ‘testuser’ has insufficient authority to display object
SYSTEM.DEFAULT.MODEL.QUEUE [1003].

EXPLANATION:
The specified entity is not authorized to display the required object. The following requested permissions are unauthorized: dsp
ACTION:

Ensure that the correct level of authority has been set for this entity against the required object, or ensure that the entity is a member of a privileged group.

Again a very clear message.

We used the command
setmqaut -n SYSTEM.DEFAULT.MODEL.QUEUE -m QMA -t queue -g testuser +dsp
and the dmpmqcfg worked!
To be able to use a model queue, you need +dsp authority.
What commands did we need? – Thanks to Tushar Shukla for this list

setmqaut -m QMA -t qmgr -g test +connect +inq +dsp
setmqaut -m QMA -n "**" -t queue -g test +dsp +inq
setmqaut -m QMA -n "**" -t topic -g test +dsp +inq
setmqaut -m QMA -n "**" -t channel -g test +dsp
setmqaut -m QMA -n "**" -t process -g test +dsp +inq
setmqaut -m QMA -n "**" -t namelist -g test +dsp +inq
setmqaut -m QMA -n "**" -t authinfo -g test +dsp +inq
setmqaut -m QMA -n "**" -t clntconn -g test +dsp
setmqaut -m QMA -n "**" -t listener -g test +dsp
setmqaut -m QMA -n "**" -t service -g test +dsp
setmqaut -m QMA -n "**" -t comminfo -g test +dsp
setmqaut -m QMA -n "SYSTEM.DEFAULT.MODEL.QUEUE" -t queue -g test +dsp +get +put
setmqaut -m QMA -n SYSTEM.ADMIN.COMMAND.QUEUE -t queue -g test +dsp +inq +put

or runmqsc commands like
set authrec profile('**') objtype(authinfo) authadd(dsp) group('test')

Why -n “**” ? See here.

Lots of error messages in AMQERR01.LOG.

When setting this up, we got lots of messages in the error log
AMQ8245W: Entity ‘testuser’ has insufficient authority to display object oooo [objtype]

So you should set up authorities and determine what you want the userid to be able to dump before trying the dmpmqcfg command.


What is profile self in display authrec?

You can give authority to connect using

setmqaut -m QMA -t qmgr -g test +connect

The definitions have to hang off a profile name. For the queue manager itself, there is an internally used profile of “self”.

PROFILE(self) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(QMGR) AUTHLIST(CONNECT,DSP)
PROFILE(@class) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(QMGR) AUTHLIST(NONE)

Why is there a @class entry? See here.

If you remove authority from a group or userid, the entry is left, but with access (NONE).

dis authrec objtype(qmgr)

PROFILE(self) ENTITY(testuser) ENTTYPE(GROUP) OBJTYPE(QMGR) AUTHLIST(NONE)

What is @class in authrec in midrange?

Before a user or group can be given access to a specific profile and object type, it needs to have a profile called “@class” in the object type.

This “@class” profile is used for authorising the creation of objects of the specified object type.

The commands

set authrec profile('ZZ*') objtype(namelist) group('test') authadd(INQ)

dis authrec objtype(namelist) group('test')

gave two profiles, one for the class and one for the specific resource.

PROFILE(@class) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(NONE)
PROFILE(ZZ*) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(INQ)

So we can see that group test is authorised to inquire with the profile ZZ* for NAMELIST.

But because of PROFILE(@class) OBJTYPE(NAMELIST) AUTHLIST(NONE), group test is not authorised to create a namelist.

If you want to control deletion of the name list, you specify

set authrec profile('ZZ*') objtype(namelist) group('test') authadd(DLT)

and the display now gives

PROFILE(ZZ*) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(DLT,INQ)

To display the entities which have been given any authority to an object type use

dis authrec profile('@class') objtype(namelist)

PROFILE(@class) ENTITY(colinpaice) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(CRT)
PROFILE(@class) ENTITY(mqm) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(CRT)
PROFILE(@class) ENTITY(testuser) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(NONE)
PROFILE(@class) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(NONE)

This shows that ids in groups colinpaice and mqm can create namelists. Userids solely in group test or testuser cannot. Userid colinpaice, in groups mqm and test, is authorised to create namelists: being in at least one group which is allowed to create a resource means the userid is allowed to create it.
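The rule in the paragraph above can be sketched in Python. This is a hypothetical model for illustration only, not MQ code: a userid may create an object type if any one of its groups has CRT in the @class profile for that type.

```python
# Hypothetical model of the rule above: a userid may create an object
# type if ANY of its groups holds CRT on the @class profile.

# @class authority lists per group, as in the display above
class_authlist = {
    "colinpaice": {"CRT"},
    "mqm": {"CRT"},
    "testuser": set(),   # AUTHLIST(NONE)
    "test": set(),       # AUTHLIST(NONE)
}

def can_create(groups):
    """True if at least one of the userid's groups has CRT."""
    return any("CRT" in class_authlist.get(g, set()) for g in groups)

# colinpaice is in groups mqm and test: allowed
print(can_create(["mqm", "test"]))   # True
# a userid solely in group test: not allowed
print(can_create(["test"]))          # False
```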

Can I clean up the entries?

After using my queue manager for a while I found there were entries like

PROFILE(@class) ENTITY(…) ENTTYPE(PRINCIPAL) OBJTYPE(QMGR) AUTHLIST(NONE)

which existed even though the principal or group had been deleted from MQ.

You cannot delete these entries.

These display authority commands are difficult to use!

I was asked to explain how the midrange security commands work. At first glance it looked pretty easy, but then I tried to use them, and got very confused when it did not work as I expected.

Some of the complexity comes from userids which are also groups, definitions which look like generics but are not generic, definitions which seem backwards, and documentation gaps. Read on to see how I was baffled.

I was talking to Morag about this, and she said that MQGem do education on authorization – see here.


I was running on Ubuntu (18.04)

Queue manager running userid or group based authorization.

A queue manager can be set up to use user-based or group-based authorization. See here and here. The default is group-based.

A linux userid has a private group with the same name as the userid

I set up a userid testuser which had effectively no authority to do anything. This has a (private) group with the same name as the userid. See here.

The command id testuser gives

uid=1002(testuser) gid=1004(testuser) groups=1004(testuser)

I used

sudo groupadd test 
sudo adduser testuser test

to add testuser to the group called test. I will use group test in the rest of the discussion.

There are two ways of displaying and setting MQ authorisation information

There are two ways of using the MQ security commands

  1. Runmqsc and display/delete/set AUTHREC. You can use runmqsc as a client from a remote machine. This can be used for an MQ appliance.
  2. setmqaut and dspmqaut shell commands. You need access to the shell environment to issue these commands. This cannot be used for an MQ appliance.

The documentation for the two has similar content, but the runmqsc set authrec documentation is slightly better.

For example see here.

  • runmqsc set authrec explains DSP: Display the attributes of the specified object using the appropriate command set. But it is not clear what a command set is. I think it means PCF or MQSC.
  • setmqaut shows DSP – but does not explain what DSP provides

The syntax of the commands is similar, but different, and this caught me out for a while. For example I used

setmqaut -m QMA -t qmgr -p testuser +inq -dsp

but with runmqsc I had to specify principal('testuser') in quotes – because, as with all runmqsc fields, unquoted strings get converted to upper case!

Creating and using profiles

I created profiles

set AUTHREC PROFILE(COLIN_1) OBJTYPE(QUEUE) group('test') AUTHADD(GET) 
set AUTHREC PROFILE(COLIN_2) OBJTYPE(QUEUE) group('test') AUTHADD(SET)
set AUTHREC PROFILE(COLIN_3*) OBJTYPE(QUEUE) group('test') AUTHADD(INQ)
set AUTHREC PROFILE(COLIN_*) OBJTYPE(QUEUE) group('test') AUTHADD(GET,PUT)

When security checks are done, if there is a choice of records for a queue, the most specific definition is used. See here. I could not find anything in MQ which told me which actual profile was used – even though MQ knows this information!

If 'testuser', in group test, wants to access some queues, the userid can open

COLIN_1  for get
COLIN_2 for set
COLIN_3 for inquire
COLIN_33 for inquire. The COLIN_3* is more specific than COLIN_*
COLIN_11 for put + get. This is from the COLIN_* definition.
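The "most specific profile wins" rule can be modelled with a short Python sketch. This is my own illustration: MQ's real matching logic is internal, and its generic characters (such as '**') are richer than the shell-style matching used here; specificity is approximated as the length of the literal prefix before any '*'.

```python
# Illustrative model of "most specific profile wins" for the
# profiles defined in the text.  Not MQ's actual algorithm.
import fnmatch

profiles = {
    "COLIN_1": {"GET"},
    "COLIN_2": {"SET"},
    "COLIN_3*": {"INQ"},
    "COLIN_*": {"GET", "PUT"},
}

def best_profile(queue):
    """Return the most specific matching profile name, or None."""
    matches = [p for p in profiles if fnmatch.fnmatchcase(queue, p)]
    # longest literal prefix before any '*' = most specific
    return max(matches, key=lambda p: len(p.split("*")[0]), default=None)

print(best_profile("COLIN_33"))  # COLIN_3* beats COLIN_*
print(best_profile("COLIN_1"))   # the exact name beats COLIN_*
```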

Displaying a profile

If you issue DIS QUEUE(COLIN*), the * acts as a generic character and says display any queues beginning with COLIN.

The DIS AUTHREC is different. DIS AUTHREC PROFILE(COLIN*) does not say show all profiles beginning with COLIN; it says show me the actual profile COLIN*, just the same as if it were COLIN@.

Below are some display commands and their responses

DIS AUTHREC PROFILE(COLIN*) returned no object, as explained above.

DIS AUTHREC PROFILE(COLIN_*) this is saying give me the specific profile COLIN_* and it returned

PROFILE(COLIN_*) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(QUEUE) AUTHLIST(GET)

DIS AUTHREC PROFILE(COLIN_1) returned two entries (as both of them could potentially apply)

PROFILE(COLIN_1) … AUTHLIST(GET)
PROFILE(COLIN_*) … AUTHLIST(GET,PUT)

DIS AUTHREC PROFILE(COLIN_*) GROUP(test) returned

AMQ8459I: Not found. This is because test was converted to upper case TEST.

DIS AUTHREC PROFILE(COLIN_*) GROUP('test') returned

PROFILE(COLIN_*) … AUTHLIST(GET)

DIS AUTHREC PROFILE(COLIN_*) group('mqm') returned

AMQ8459I: Not found.

To summarize,

If you issue a display command for a generic looking profile – it will display profiles with the specific name including the ‘*’.

If you display a specific looking name, it will act like a generic, and display all the records which apply to the specific name.

So you can see why I was confused – but it gets more complex.

MATCH(MEMBERSHIP)

There is a parameter MATCH, with a default of PROFILE, which returns the profiles matching the specified name and principal, e.g. GROUP(TEST); this is what happened above.

There is also MATCH(MEMBERSHIP). This looks up the list of the userid's groups and displays the authrecs which apply to the userid through those groups.

dis authrec profile(COLIN_2) objtype(queue) principal('testuser') match(membership) returned

PROFILE(COLIN_2) ENTITY(test) ENTTYPE(GROUP) … AUTHLIST(SET)

This is because userid testuser is in group test.

You can also specify MATCH(EXACT). This returns only the specified profile for the specified principal.

DIS AUTHREC PROFILE(COLIN_1) match(PROFILE) returned

PROFILE(COLIN_1) ENTITY(test)…
PROFILE(COLIN_*) ENTITY(test)…

DIS AUTHREC PROFILE(COLIN_1) match(EXACT) returned

PROFILE(COLIN_1) ENTITY(test)..

How do I find the list of profiles if I cannot use a generic search argument?

To list all queue auth records applicable for group test use

DIS AUTHREC objtype(QUEUE) group('test')

To list all available auth records applicable for group test use

DIS AUTHREC group('test')

To list all auth records, for all objects for all users and groups use

DIS AUTHREC, which you may want to capture in a file using

echo "DIS AUTHREC" | runmqsc QMA > authrec.txt

Good things about the CCDT in JSON, and my sorry journey.

I thought that a Client Channel Definition Table (CCDT) in JSON would be very good for enterprise customers, as they can now do change management, and parameterize key values.

As usual with any new stuff, I tried to get it to work, and tried the silly user errors that people often make, to see how it holds up.

This is a long blog post, so I’ve created some sections

The good stuff – one liners

Having a human (and machine) readable file has many advantages

  • You can use change control on it
  • You can add your own name:value fields such as "comment":["Changed by Colin","Date-March 2019"]; these are ignored by MQ
  • You can use commands like sed to take the .json file, change the contents and create a new json file – for example change the host name and port for production
  • If you have the same channel defined in several queue managers you can have one definition and provide an array of hostnames and their ports.
  • You can use runmqsc -n to display the MQ contents of the .json file.

I've written a short python script which checks the syntax of the .json file, see below.

When deploying it for the first time, you should introduce errors to the .json file, to ensure you are actually using the ccdt.json and not alternative ways of defining channels.

My journey first steps – happiness

I ran on Ubuntu 18.04.

I used runmqsc to display my existing CLNTCONN channel definitions. I also used the provided sample with the complete list of ccdt channel attributes.

I used gedit to edit the file as this has highlighting support for json (other editors and eclipse also have this support).
Creating the file was a bit slow, instead of a simple list of entries like

"description": "colins channel",
"host": "localhost",
"port": 1416,

you have some values nested within structures, for example

"connection":  {  "host": "localhost", "port": 1416  },
“description”: “colins channel”,

I had to keep referring to the documentation to look at the structure. I also just used the above sample, "filled in the blanks", and removed the unused sections.

You have to convert terms such as QMNAME to queueManager (remembering to get the spelling and upper/lower case correct).

I eventually completed my ccdt.json , and updated my mqclient.ini to include the definition.

CHANNELS:
ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)
MQReconnectTimeout=30
ReconDelay=(1000,200)(2000,200)(4000,1000)
ChannelDefinitionDirectory=.
ChannelDefinitionFile=ccdt.json

I tried running my programs – and they all worked – first time. Hurrah! I went for a beer to celebrate this unusual event.

Second steps, depression

A bit later, I changed the definitions, restarted my programs – and the changes made no difference. I got depressed and went for a cup of tea.

About an hour later, I discovered that I still had ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416) in my mqclient.ini file! I commented out this ServerConnectionParms statement.

Then I found I had the environment variable MQCHLLIB set, so I used unset MQCHLLIB

The IBM Knowledge Centre says If the location is specified both in the client configuration file and by using environment variables, the environment variables take priority.

I tried my program again. This time I got messages

AMQ9695E: JSON file format error for './ccdt.json'.

And my program gave

MQCONNX to *GROUP cc 2 rc 2058 MQRC_Q_MGR_NAME_ERROR

This was a major step forward as it proved I was finally using the ccdt.json file. This is why I recommend you introduce a few errors in your .json file the first time you use it.

I searched the KC for AMQ9695E and got a hit on a page, but searching within the page the string was not found; AMQ9695 without the E was found! (As a psychic programmer you are meant to know to drop the last letter off the message number.)

The explanation from the KC was: Parsing of JSON file <insert_3> failed. The file was expected to contain an attribute named <insert_4> but this was not found or was defined with an unexpected type. The parser returned an error of <insert_5> which may be useful in determining any invalid formatting.

This was not very helpful: what are insert_3, insert_4, and insert_5, and where are insert_1 and insert_2?

I went for another cup of tea. Half way through eating a chocolate ginger biscuit I had inspiration, there might be information in the error logs.

I used tail -n100 /var/mqm/errors/*01*|less to look in the MQ errors log. This had a full error description and explanation (Hurrah!).

EXPLANATION:
Parsing of JSON file './ccdt.json' failed. The file was expected to contain an attribute named 'channel' but this was not found or was defined with an unexpected type. The parser returned an error of 'Required 'name' attribute is missing' which may be useful in determining any invalid formatting.
ACTION: Check that the contents of the file use the correct JSON schema.

I checked my file – it had "channel" with an array of two elements (definitions) (tick), and I had "type": "clientConnection", which is valid (tick).

By now I was getting bored with it, so I wrote some python code to take my ccdt.json and compare it with the IBM sample one. This told me I had defined "Name", instead of "name".

Where the documentation said 'Required 'name' attribute is missing', it did not mean the generic name:value, it meant the specific field called "name" was missing. So once I understood the problem, the error message made sense!

I fixed that, and a couple of other typos.

Displaying what MQ thinks is in the ccdt.json – Using runmqsc -n.

I had to use unset MQCHLLIB for this to work (as above). Then I ran runmqsc -n and this gave me.

5724-H72 (C) Copyright IBM Corp. 1994, 2019.
Starting local MQSC for 'AMQCLCHL.TAB'.

(Ignore the 'AMQCLCHL.TAB', which is confusing and not true – it would be nicer if it said Starting local MQSC for 'ccdt.json'.)

dis chl(*)
1 : dis chl(*)
AMQ9696E: JSON attribute '[1] COLIN: sharingConversations' has an invalid value or an unexpected type.
AMQ9555E: File format error.

The explanation said Ensure string values use double quotes and ensure numeric and boolean values are unquoted.

What does the error message mean?
I had two entries in the file for channel with "name": "COLIN".
For JSON attribute '[1] COLIN: sharingConversations' this means

  • The data with key "name": "COLIN", with the attribute sharingConversations, has a problem.
  • [1] COLIN means the second entry for COLIN (the counting is 0 based).

I checked my file and I found I had specified "sharingConversations": "30" with quotes when it should just be 30 (no quotes).

I fixed these, and next time my application worked using these definitions. It was time for another cup of tea and a second chocolate ginger biscuit to celebrate.

If you have specified

"timestamps": { "altered": "2018-12-04T15:37:22.000Z" }

This will display in the ALTDATE field. ALTDATE(2018-12-04) ALTTIME(15.37.22). If you do not specify this field you will get ALTDATE(1970-01-01) ALTTIME(01.00.00).
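The mapping from the JSON timestamp to the two runmqsc fields can be shown with a little Python. This is an illustration of the formatting only; I am assuming no time-zone conversion is applied to the value shown.

```python
# Sketch of how the ccdt "altered" timestamp maps onto the
# ALTDATE/ALTTIME fields shown by runmqsc (formatting only).
from datetime import datetime

altered = "2018-12-04T15:37:22.000Z"
ts = datetime.strptime(altered, "%Y-%m-%dT%H:%M:%S.%fZ")

altdate = ts.strftime("%Y-%m-%d")  # ALTDATE(2018-12-04)
alttime = ts.strftime("%H.%M.%S")  # ALTTIME(15.37.22)
print(altdate, alttime)
```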

Putting your own fields in the ccdt.json file

I added some comment data to the file. For example

{
  "comment": {"Createdby": "Colin Paice", "Ondate": "March 2019"},
  "channel": […]
}

The queue manager ignores this, so you can add your own data to the file.
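You can see why the extra fields are harmless with a tiny Python sketch: a reader that fetches only the keys it knows about (as the queue manager does with "channel") never touches the extra data.

```python
# A ccdt.json with a user-added "comment" object: code that reads
# only the "channel" key never sees the extra data.
import json

ccdt_text = """
{
  "comment": {"Createdby": "Colin Paice", "Ondate": "March 2019"},
  "channel": [ {"name": "COLIN", "type": "clientConnection"} ]
}
"""
ccdt = json.loads(ccdt_text)
names = [ch["name"] for ch in ccdt.get("channel", [])]
print(names)  # the "comment" object is simply ignored
```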

Checking the syntax of the file.

There are tools which can check the syntax of your .json file. I used some web based tools to create a schema from the IBM sample. I then used a validator to check the syntax of my ccdt.json. Overall, I thought it was not worth the effort, as I could not run it from the command line, and the output was not that useful.

I have created some python which takes a ccdt.json, and makes sure all the fields are also in the IBM sample.json file, and that the type of the values are the same. For example with

  • "SharingConversations": 30 it reports "SharingConversations" not found in … sharingConversations …, so you can spot the spelling mistakes, and
  • "sharingConversations": "30" it reports types do not match sharingConversations in….
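Here is a simplified sketch of what the checker does. It is illustrative only (the real script compares against the full IBM sample.json): unknown keys catch spelling or case mistakes, and type comparison catches quoted numbers.

```python
# Simplified version of the ccdt checker: compare each key in your
# ccdt against a sample, reporting unknown keys and type mismatches.
def check(ccdt, sample, path=""):
    errors = []
    for key, value in ccdt.items():
        if key not in sample:
            errors.append(f"{path}{key}: not found in sample")
        elif type(value) is not type(sample[key]):
            errors.append(f"{path}{key}: types do not match")
        elif isinstance(value, dict):
            errors.extend(check(value, sample[key], path + key + "."))
    return errors

# a tiny stand-in for the IBM sample, just these two attributes
sample = {"name": "X", "sharingConversations": 10}
print(check({"Name": "COLIN"}, sample))               # wrong case
print(check({"sharingConversations": "30"}, sample))  # quoted number
```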

You can install it using

git clone https://github.com/colinpaicemq/MQccdt.json/

then
cd MQccdt.json/MQccdt
and use it
python3 ccdt.py --ccdt …path_to_your_ccdt
or
python3 ccdt.py --ccdt …path_to_your_ccdt --schema …path_to_your_full_ccdt.json

One definition, multiple connections – No Initial connection balancing

You can have one definition with multiple host names.

{ "channel":
  [ {"name": "COLIN",
     "clientConnection":
     {
      "connection":
       [{"host": "localhost", "port": 1414},
        {"host": "localhost", "port": 1416}
       ],
      "queueManager": "GROUP"
     },
     "type": "clientConnection"
    }
  ]
}
With this, my program always connected to the last entry (port 1416) if it was active; if it was not active it chose port 1414. I did not get connections balanced across the available channels.

Multiple definitions, one connection each – No Initial connection balancing

I had a channel [{"name": "COLIN", … }, {"name": "COLIN", … }] and "connectionManagement": { "clientWeight": 90 } on both.
It always connected to the second queue manager.

If I changed the second to have {“clientWeight”: 89} it always connected to the first queue manager.

So it looks like some of the parameters for doing the initial connection balancing are not working.

Tailoring the definitions

I used the shell command

sed 's/localhost/remotehost/g' ccdt.json | sed 's/1414/2424/g'

to change localhost to remotehost, and port 1414 to 2424.