Getting round the problems with amqsevt.

There is a great sample program, amqsevt (source amqsevta.c), for printing out data from queues in PCF format, for example event queues, and statistics and accounting data. I use it to output the data in json format and then process it in python scripts or other tools which handle json.
I've noticed a couple of tiny problems with it, which are easy to get round. I spotted these when trying to parse a file containing the json data.

  • The output is sometimes like {..} {..}, which is not strictly valid json. It should be [{..},{..}], but this is hard to create when streaming the data. I got json.decoder.JSONDecodeError: Extra data:….
  • It sometimes reports hex strings as if they were very large decimal numbers, for example:
    • “msgId” :      “414D5120514D412020202020202020202908BA5CF4C9AB23” is OK
    • “correlId” :   0000000000000000000000000000000000000000000000000 is not in quotes, and the json parser complains. I had json.decoder.JSONDecodeError: Expecting ‘,’ delimiter pointing to the middle of the line with the problem.

I fixed these by passing the json through the great utility jq, which cleans this up.

For example

/opt/mqm/samp/bin/amqsevt -m QMA -q SYSTEM.ADMIN.TRACE.ACTIVITY.QUEUE -o json|jq -c .  |python myprog.py
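and use python code like
import json
import sys
for x in range(…):               # process up to … events
    line = sys.stdin.readline()
    if not line:                 # end of input
        break
    j = json.loads(line)         # jq -c writes one json object per line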

jq also cleans up the “correlId” :  000000000000000000000000000000000000000000000  to
“correlId” :   0
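If jq is not available, both problems can be handled in python instead. The sketch below is not part of amqsevt. It uses json.JSONDecoder.raw_decode to split a {..} {..} stream into separate objects. It also wraps any bare digits-only value that starts with 0, such as the correlId above, in quotes before parsing. Unlike jq, which turns the zeros into 0, this keeps the original digits as a string. The regular expression is a simplification and could also match text inside a string value.

```python
import json
import re

# Simplified pre-processing: a bare value such as  "correlId" : 000000...
# is not valid json (leading zeros), so wrap any digits-only value that
# starts with 0 and has at least two digits in double quotes.
_BARE_ZERO_VALUE = re.compile(r'(:\s*)(0\d+)(?=\s*[,}\]])')


def iter_objects(text):
    """Yield each top-level json object from a stream like {..} {..}."""
    text = _BARE_ZERO_VALUE.sub(r'\1"\2"', text)
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip the whitespace between concatenated objects
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

To use it, read the whole of standard input, for example for event in iter_objects(sys.stdin.read()). Because this reads everything first, it suits a saved file better than a long-running stream from amqsevt. For streaming, jq -c as above is the simpler choice.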

Build the sample

To build amqsevta.c, I used the makefile

cparms = -Wno-write-strings -g 
clibs = -I. -I../inc -I'/usr/include' -I'/opt/mqm/inc'
rpath = -rpath=/opt/mqm/lib64 -Wl,-rpath=/usr/lib64 
lparms = -L /opt/mqm/lib64 -Wl,$(rpath) -lmqic_r -lpthread 
% : %.c
	gcc -m64 $(cparms) $(clibs) $< -o $@ $(lparms)

and the command make -f xxxx amqsevta. (The gcc line in the makefile must be indented with a tab character, not spaces.)

 

Should we share everything or share nothing?

We have a spectrum ranging from giving every application its own queue manager, spread across Linux images, to having a big server with a few queue managers servicing all the applications.

There are good points and bad points across the range from share nothing to share all.

What do you need to think about when considering sharing resources within the environment?

You may want to provide isolation:

  • For critical applications, so other applications cannot impact it (keep trouble out)
  • To protect applications from a “misbehaving” application and to minimise the impact, (keep trouble in).
  • For regulatory reasons
  • For capacity reasons
    • Disk IO response time and throughput.
    • Amount of RAM needed
    • Amount of virtual storage
    • Number of TCP ports
    • Number of MQ connections
    • Number of open files in the operating system
  • For security. It is often easier to deny people access to an image, than to put all the controls in place within the image.
  • You have finer granularity when shutting down an image – fewer applications are impacted.
  • Restart time may be shorter.

If your requirements don’t fit into the above, you should consider sharing resources.

The advantages of sharing

  • Fewer environments to manage.
    • Provisioning can be done with automation, but the large number of small images can be hard to manage.
    • Monitoring is easier – you do not have so many systems to look at. (How big a screen do you need to have every system showing up on it?)
    • You have to do changes and upgrades less frequently.
  • Removing images tends to leave information behind – for example information about a deleted clustered queue manager stays in a full repository for many days.
  • The operating system may be able to manage work better than a VM hypervisor “helping”.
  • Fewer events and situations to manage.
  • With more work on each system, costs can be reduced. For example, a channel sending 10 messages in a single batch uses much less resource than 10 batches of one message each.

What else?

You may need to provide multiple queue managers for availability and resilience, but rather than provide two queue managers for applicationA and two queue managers for applicationB, you may be able to have two shared queue managers, with each queue manager supporting both applicationA and applicationB.

You can provide isolation within a queue manager by

  • Having specific application queue names, for example queue names that start with the application prefix. You can then define a security profile (authrec) based on that prefix.
  • You can use split cluster transmit queue (SCTQ) so clusters do not share the SYSTEM.CLUSTER.TRANSMIT.QUEUE – but have their own queue and their own channels.
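To illustrate the prefix approach, the sketch below generates runmqsc SET AUTHREC commands that give each application's group access only to queues starting with its own prefix. The application names, group names, the PREFIX.** naming convention and the chosen authorities are all hypothetical. The group names are quoted so runmqsc does not convert them to upper case.

```python
# Hypothetical mapping of application queue prefix -> group that owns it
APPLICATIONS = {"APPA": "appa", "APPB": "appb"}

# Hypothetical set of queue authorities to grant to each application
QUEUE_AUTHORITIES = ("BROWSE", "GET", "INQ", "PUT")


def authrec_commands(applications, authorities=QUEUE_AUTHORITIES):
    """Return MQSC commands restricting each group to its own queue prefix."""
    commands = []
    for prefix, group in sorted(applications.items()):
        commands.append(
            f"SET AUTHREC PROFILE('{prefix}.**') OBJTYPE(QUEUE) "
            f"GROUP('{group}') AUTHADD({','.join(authorities)})"
        )
    return commands
```

The output can be written to a file and passed to runmqsc, so the same automation that provisions a new application also creates its security profile.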

You may think that by providing multiple instances you are providing isolation. However, there can be interaction with others at all levels: queue manager, operating system, hypervisor, disk subsystem and network controllers. You just do not see it.

I had to work on a problem where MQ in a virtualized distributed environment saw very long disk I/O response times. We could see this from an MQ trace, and from the Linux performance data. The customer’s virtualization people said that on average the response time was OK, so there was no issue. The people in charge of the Storage Area Network said they could not see any problems. The customer solved this performance problem by making all messages non-persistent – which solved the performance problem – but may have introduced other data problems! As my father used to say, the more moving parts, the more parts that can go wrong.

Why do I have no authority?

We tried setting up dmpmqcfg for a general user, and had various security problems. This blog gives you information on how to set up security, and where to find more information.
At the bottom we give all of the security commands we needed.

While doing research on this, I wrote some other blog posts on security

What is dmpmqcfg?

This program dumps the MQ configuration (object definitions, security, etc.), so it can be restored, or used as a master copy to see what has changed.
The documentation for dmpmqcfg is pretty good. It tells you what authorizations you need, and with these the command worked.

Although we got the command to work, we had to do additional configuration. The documentation says "The user must … , and (+dsp) authority for every object that is requested,…", so at first only a few objects were dumped. Once we fixed this, all of the objects were dumped.
To illustrate how to solve problems, we did not follow the instructions completely.

Actually using dmpmqcfg

The user testuser issued the command dmpmqcfg -a and got
AMQ8135E: Not authorised.
The error log had

10/04/19 09:19:44 – Process(10654.36) User(colinpaice) Program(amqzlaa0)
Host(colinpaice) Installation(Installation1)
VRMF(9.1.2.0) QMgr(QMA)
Time(2019-04-10T08:19:44.500Z)
CommentInsert1(testuser)
CommentInsert2(QMA [qmgr])
CommentInsert3(connect)
AMQ8077W: Entity ‘testuser’ has insufficient authority to access object QMA [qmgr].
EXPLANATION:
The specified entity is not authorized to access the required object. The following requested permissions are unauthorized: connect
ACTION:
Ensure that the correct level of authority has been set for this entity against the required object, or ensure that the entity is a member of a privileged group.

This was very clear and easy to follow.

If you have ALTER QMGR AUTHOREV(ENABLED), you will get events generated for security violations. You can use the following to process the authorization events:
/opt/mqm/samp/bin/amqsevt -m QMA -o json -w 1 -q SYSTEM.ADMIN.QMGR.EVENT
but AMQERR01.LOG is easier to read and has the correct actions.

We fixed the connection problem by giving connect authority
setmqaut -m QMA -t qmgr -g test +connect

We retried and got
AMQ9505E: Program unable to open object SYSTEM.DEFAULT.MODEL.QUEUE
The error log gave

10/04/19 09:32:23 – Process(10654.41) User(colinpaice) Program(amqzlaa0)
Host(colinpaice) Installation(Installation1)
VRMF(9.1.2.0) QMgr(QMA)
Time(2019-04-10T08:32:23.050Z)
CommentInsert1(testuser)
CommentInsert2(SYSTEM.DEFAULT.MODEL.QUEUE [1003])
AMQ8245W: Entity ‘testuser’ has insufficient authority to display object
SYSTEM.DEFAULT.MODEL.QUEUE [1003].

EXPLANATION:
The specified entity is not authorized to display the required object. The following requested permissions are unauthorized: dsp
ACTION:

Ensure that the correct level of authority has been set for this entity against the required object, or ensure that the entity is a member of a privileged group.

Again a very clear message.

We used the command
setmqaut -n SYSTEM.DEFAULT.MODEL.QUEUE -m QMA -t queue -g testuser +dsp
and the dmpmqcfg worked!
To be able to use a model queue, you need +dsp authority on it.
What commands did we need? – Thanks to Tushar Shukla for this list

setmqaut -m QMA -t qmgr -g test +connect +inq +dsp
setmqaut -m QMA -n "**" -t queue -g test +dsp +inq
setmqaut -m QMA -n "**" -t topic -g test +dsp +inq
setmqaut -m QMA -n "**" -t channel -g test +dsp
setmqaut -m QMA -n "**" -t process -g test +dsp +inq
setmqaut -m QMA -n "**" -t namelist -g test +dsp +inq
setmqaut -m QMA -n "**" -t authinfo -g test +dsp +inq
setmqaut -m QMA -n "**" -t clntconn -g test +dsp
setmqaut -m QMA -n "**" -t listener -g test +dsp
setmqaut -m QMA -n "**" -t service -g test +dsp
setmqaut -m QMA -n "**" -t comminfo -g test +dsp
setmqaut -m QMA -n "SYSTEM.DEFAULT.MODEL.QUEUE" -t queue -g test +dsp +get +put
setmqaut -m QMA -n SYSTEM.ADMIN.COMMAND.QUEUE -t queue -g test +dsp +inq +put

or runmqsc commands like
set authrec profile('**') objtype(authinfo) authadd(dsp) group('test')

Why -n "**"? See here.

Lots of error messages in AMQERR01.LOG.

When setting this up, we got lots of messages in the error log:
AMQ8245W: Entity ‘testuser’ has insufficient authority to display object oooo [objtype]

So you should set up authorities and determine what you want the userid to be able to dump before trying the dmpmqcfg command.
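To see which display authorities are missing before granting them, you can summarise the AMQ8245W messages in the error log. This is a rough sketch. It assumes the message text has the form shown above, and allows it to wrap onto a second line as it does in the log extract earlier. It also assumes the queue manager error log is at the usual Linux location, /var/mqm/qmgrs/QMA/errors/AMQERR01.LOG.

```python
import re
from collections import Counter

# Matches e.g.
#   AMQ8245W: Entity 'testuser' has insufficient authority to display object
#   SYSTEM.DEFAULT.MODEL.QUEUE [1003].
# \s+ is used between words because the message can wrap across lines.
# Both straight and typographic quotes are accepted around the entity.
_AMQ8245W = re.compile(
    r"AMQ8245W:\s+Entity\s+['\u2018\u2019]([^'\u2018\u2019]+)['\u2018\u2019]"
    r"\s+has\s+insufficient\s+authority\s+to\s+display\s+object\s+"
    r"(\S+)\s+\[([^\]]+)\]"
)


def missing_display_authority(log_text):
    """Count (entity, object, object type) triples reported by AMQ8245W."""
    return Counter(_AMQ8245W.findall(log_text))


def summarise(path):
    """Print the most frequently reported missing authorities first."""
    with open(path, encoding="utf-8", errors="replace") as log:
        counts = missing_display_authority(log.read())
    for (entity, obj, objtype), n in counts.most_common():
        print(f"{n:5d}  {entity:12s} {objtype:10s} {obj}")
```

For example, summarise("/var/mqm/qmgrs/QMA/errors/AMQERR01.LOG") lists each entity and object once, with a count, which is easier to turn into setmqaut commands than scrolling through the log.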


What is profile self in display authrec?

You can give authority to connect using

setmqaut -m QMA -t qmgr -g test +connect

The definitions have to hang off a profile name. For the queue manager, the internally used profile name is “self”.

PROFILE(self) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(QMGR) AUTHLIST(CONNECT,DSP)
PROFILE(@class) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(QMGR) AUTHLIST(NONE)

Why is there a @class entry? See here.

If you remove authority from a group or userid, the entry is left, but with access (NONE).

dis authrec objtype(qmgr)

PROFILE(self) ENTITY(testuser) ENTTYPE(GROUP) OBJTYPE(QMGR) AUTHLIST(NONE)

What is @class in authrec in midrange?

Before a user or group can be given access to a specific profile and object type, it needs to have a profile called “@class” in the object type.

This “@class” profile is used for authorising the creation of objects of the specified object type.

The commands

set authrec profile('ZZ*') objtype(namelist) group('test') authadd(INQ)

dis authrec objtype(namelist) group('test')

gave two profiles: one for the class, and one for the specific resource.

PROFILE(@class) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(NONE)
PROFILE(ZZ*) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(INQ)

So we can see that group test is authorised to inquire with the profile ZZ* for NAMELIST.

But because of PROFILE(@class) OBJTYPE(NAMELIST) AUTHLIST(NONE), group test is not authorised to create a namelist.

If you want to allow group test to delete the namelists covered by the profile, you specify

set authrec profile('ZZ*') objtype(namelist) group('test') authadd(DLT)

and the display now gives

PROFILE(ZZ*) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(DLT,INQ)

To display everyone who has been given any authority to an object type, use

dis authrec profile('@class') objtype(namelist)

PROFILE(@class) ENTITY(colinpaice) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(CRT)
PROFILE(@class) ENTITY(mqm) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(CRT)
PROFILE(@class) ENTITY(testuser) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(NONE)
PROFILE(@class) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(NAMELIST) AUTHLIST(NONE)

This shows that userids in the groups colinpaice and mqm can create namelists; userids solely in groups test or testuser cannot. Userid colinpaice, in groups mqm and test, is authorised to create namelists: being in at least one group which is allowed to create a resource means the userid is allowed to create it.
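The rule in the last sentence, where with group-based authorization a userid gets the union of what its groups allow, can be sketched as follows. This is a simplified model, not MQ's code. It ignores profile selection and user-based authorization. The data is copied from the @class display above, and colinpaice is assumed to also be in its private group colinpaice.

```python
# @class AUTHLIST per group for OBJTYPE(NAMELIST), from the display above
CLASS_NAMELIST = {
    "colinpaice": {"CRT"},
    "mqm": {"CRT"},
    "testuser": {"NONE"},
    "test": {"NONE"},
}


def effective_authority(user_groups, authlist_by_group):
    """Union of the authorities granted to any of the user's groups."""
    granted = set()
    for group in user_groups:
        granted |= authlist_by_group.get(group, set())
    granted.discard("NONE")  # NONE means "no authority", not an authority
    return granted
```

With this, effective_authority({"colinpaice", "mqm", "test"}, CLASS_NAMELIST) gives {"CRT"}, so colinpaice can create namelists. The same call for testuser's groups, {"testuser", "test"}, gives an empty set.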

Can I clean up the entries?

After using my queue manager for a while I found there were entries like

PROFILE(@class) ENTITY(…) ENTTYPE(PRINCIPAL) OBJTYPE(QMGR) AUTHLIST(NONE)

which existed even though the principal or group had been deleted from MQ.

You cannot delete these entries.

These display authority commands are difficult to use!

I was asked to explain how the midrange security commands work. At first glance it looked pretty easy, but then I tried to use them, and got very confused when it did not work as I expected.

Some of the complexity comes from userids which are also groups, definitions which look like generics but are not generic, some of the definitions seem backwards, and some more documentation is needed! Read on to see how I was baffled and felt very confused.

I was talking to Morag on this, and she said that MQGEM do education on authorization – see here.

 

I was running on Ubuntu (18.04)

Queue manager running userid or group based authorization.

A queue manager can be set up to use user-based or group-based authorization. See here and here. The default is group-based.

A linux userid has a private group with the same name as the userid

I set up a userid testuser which had effectively no authority to do anything. This has a (private) group with the same name as the userid. See here.

The command id testuser gives

uid=1002(testuser) gid=1004(testuser) groups=1004(testuser)

I used

sudo groupadd test 
sudo adduser testuser test

to add testuser to the group called test, and will use group test in the rest of the discussions.

There are two ways of displaying and setting MQ authorisation information

There are two ways of using the MQ security commands:

  1. runmqsc with the DISPLAY/DELETE/SET AUTHREC commands. You can use runmqsc as a client from a remote machine, so this can be used for an MQ appliance.
  2. The setmqaut and dspmqaut shell commands. You need access to the shell environment to issue these commands, so they cannot be used for an MQ appliance.

The documentation has similar content for both, but the documentation for the runmqsc SET AUTHREC command is slightly better.

For example see here.

  • runmqsc set authrec explains DSP: "Display the attributes of the specified object using the appropriate command set". But it is not clear what a command set is. I think it means PCF or MQSC.
  • setmqaut shows DSP, but does not explain what DSP provides.

The syntax of the commands is similar, but different, and this caught me out for a while. For example I used

setmqaut -m QMA -t qmgr -p testuser +inq -dsp

but with runmqsc I had to specify principal('testuser') in quotes because, as with all runmqsc fields, a value that is not quoted is converted to upper case!

Creating and using profiles

I created profiles

set AUTHREC PROFILE(COLIN_1) OBJTYPE(QUEUE) group('test') AUTHADD(GET) 
set AUTHREC PROFILE(COLIN_2) OBJTYPE(QUEUE) group('test') AUTHADD(SET)
set AUTHREC PROFILE(COLIN_3*) OBJTYPE(QUEUE) group('test') AUTHADD(INQ)
set AUTHREC PROFILE(COLIN_*) OBJTYPE(QUEUE) group('test') AUTHADD(GET,PUT)

When security checks are done and there is a choice of records for a queue, the most specific one is used. See here. I could not find anything in MQ which told me which profile was actually used, even though MQ knows this information!

If testuser, in group test, wants to access some queues, the userid can open:

COLIN_1  for get
COLIN_2 for set
COLIN_3 for inquire
COLIN_33 for inquire. The COLIN_3* is more specific than COLIN_*
COLIN_11 for put + get. This is from the COLIN_* definition.
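The selection above can be modelled with a short sketch, but it is a simplification and not MQ's actual algorithm. It treats * as zero or more characters within a dot-separated qualifier, ? as one character, and ** as anything. It then ranks the matching profiles by their number of non-wildcard characters, so an exact name beats COLIN_3*, which beats COLIN_*. MQ's real rules for choosing the most specific profile are in the documentation and may differ in edge cases.

```python
import re


def profile_to_regex(profile):
    """Translate a generic profile into a regex (simplified model)."""
    parts = []
    i = 0
    while i < len(profile):
        if profile.startswith("**", i):
            parts.append(".*")        # assumed: any number of qualifiers
            i += 2
        elif profile[i] == "*":
            parts.append("[^.]*")     # zero or more characters within one qualifier
            i += 1
        elif profile[i] == "?":
            parts.append("[^.]")      # exactly one character
            i += 1
        else:
            parts.append(re.escape(profile[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def literal_count(profile):
    """Number of non-wildcard characters: a crude measure of specificity."""
    return sum(1 for ch in profile if ch not in "*?")


def most_specific(name, profiles):
    """Return the matching profile with the most literal characters, or None."""
    matches = [p for p in profiles if profile_to_regex(p).match(name)]
    if not matches:
        return None
    return max(matches, key=literal_count)


PROFILES = ["COLIN_1", "COLIN_2", "COLIN_3*", "COLIN_*"]
```

With the profiles defined earlier, most_specific("COLIN_33", PROFILES) returns "COLIN_3*", and most_specific("COLIN_11", PROFILES) returns "COLIN_*". COLIN_1 is not generic, so it only matches the queue called COLIN_1.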

Displaying a profile

If you issue DIS QUEUE(COLIN*), the * acts as a generic character and says display any queues beginning with COLIN.

The DIS AUTHREC is different. DIS AUTHREC PROFILE(COLIN*) does not say show all profiles beginning with COLIN, it says show me the actual profile COLIN* . Just the same as if it was COLIN@.

Below are some display commands and their responses

DIS AUTHREC PROFILE(COLIN*) returned no object, as explained above.

DIS AUTHREC PROFILE(COLIN_*) this is saying give me the specific profile COLIN_* and it returned

PROFILE(COLIN_*) ENTITY(test) ENTTYPE(GROUP) OBJTYPE(QUEUE) AUTHLIST(GET)

DIS AUTHREC PROFILE(COLIN_1) returned two entries (as both of them could potentially apply)

PROFILE(COLIN_1) … AUTHLIST(GET)
PROFILE(COLIN_*) … AUTHLIST(GET,PUT)

DIS AUTHREC PROFILE(COLIN_*) GROUP(test) returned

AMQ8459I: Not found. This is because the unquoted test was converted to upper case TEST.

DIS AUTHREC PROFILE(COLIN_*) GROUP('test') returned

PROFILE(COLIN_*) … AUTHLIST(GET)

DIS AUTHREC PROFILE(COLIN_*) group('mqm') returned

AMQ8459I: Not found.

To summarize,

If you issue a display command for a generic looking profile – it will display profiles with the specific name including the ‘*’.

If you display a specific looking name, it will act like a generic, and display all the records which apply to the specific name.

So you can see why I was confused – but it gets more complex.

MATCH(MEMBERSHIP)

There is a parameter MATCH, with default PROFILE, which says return the profiles and the value of the principal, for example GROUP(TEST); this is what happened above.

There is also MATCH(MEMBERSHIP). This looks into the definitions and gets the list of the userid’s groups and displays the authrecs for the specified userid.

dis authrec profile(COLIN_2) objtype(queue) principal('testuser') match(membership) returned

PROFILE(COLIN_2) ENTITY(test) ENTTYPE(GROUP) … AUTHLIST(SET)

This is because userid testuser is in group test.

You can also specify MATCH(EXACT). This returns only the specified profile and the specified principal.

DIS AUTHREC PROFILE(COLIN_1) match(PROFILE) returned

PROFILE(COLIN_1) ENTITY(test)…
PROFILE(COLIN_*) ENTITY(test)…

DIS AUTHREC PROFILE(COLIN_1) match(EXACT) returned

PROFILE(COLIN_1) ENTITY(test)..

How do I find the list of profiles if I cannot use a generic search argument?

To list all queue auth records applicable for group test use

DIS AUTHREC objtype(QUEUE) group('test')

To list all available auth records applicable for group test use

DIS AUTHREC group('test')

To list all auth records, for all objects for all users and groups use

DIS AUTHREC which you may want to capture in a file using

echo "DIS AUTHREC" | runmqsc QMA > authrec.txt
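Once the output is in authrec.txt, you can filter it with a script instead of writing more display commands. This is a minimal sketch. It assumes each record is a set of upper-case KEYWORD(value) pairs starting with PROFILE(...), as in the displays above. runmqsc may spread a record over several lines, which does not matter to this parser.

```python
import re

# Upper-case KEYWORD(value) pairs, e.g. PROFILE(COLIN_*) or AUTHLIST(GET,PUT)
_ATTRIBUTE = re.compile(r"\b([A-Z]+)\(([^)]*)\)")


def parse_authrecs(text):
    """Turn DIS AUTHREC output into a list of dicts, one per record."""
    records = []
    current = None
    for key, value in _ATTRIBUTE.findall(text):
        if key == "PROFILE":              # each record starts with PROFILE(...)
            current = {"PROFILE": value}
            records.append(current)
        elif current is not None:
            current[key] = value
    result = []
    for record in records:
        if "ENTITY" not in record:        # e.g. an echoed DIS AUTHREC PROFILE(...) command
            continue
        authlist = record.get("AUTHLIST", "")
        record["AUTHLIST"] = [a.strip() for a in authlist.split(",") if a.strip()]
        result.append(record)
    return result
```

For example, [r for r in parse_authrecs(open("authrec.txt").read()) if r["ENTITY"] == "test"] gives all the records for group test, across all object types and profiles.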

Good things about the CCDT in json, and my sorry journey.

I thought that a Client Channel Definition Table (CCDT) in json would be very good for enterprise customers, as they can now do change management, and parameterize key values.

As usual with any new stuff, I tried to get it to work, and tried the silly user errors that people often make, to see how it holds up.

This is a long blog post, so I’ve created some sections

The good stuff – one liners

Having a human (and machine) readable file has many advantages

  • You can use change control on it
  • You can add your own name:value fields such as "comment": ["Changed by Colin", "Date-March 2019"]; these are ignored by MQ
  • You can use commands like sed to take the .json file, change the contents and create a new json file – for example change the host name and port for production
  • If you have the same channel defined in several queue managers you can have one definition and provide an array of hostnames and their ports.
  • You can use runmqsc -n to display the MQ contents of the .json file.

I've written a short python script which checks the syntax of the .json file; see below.

When deploying it for the first time, you should introduce errors to the .json file, to ensure you are actually using the ccdt.json and not alternative ways of defining channels.

My journey first steps – happiness

I ran on Ubuntu 18.04.

I used runmqsc to display my existing CLNTCONN channel definitions. I also used the provided sample with the complete list of ccdt channel attributes.

I used gedit to edit the file as this has highlighting support for json (other editors and eclipse also have this support).
Creating the file was a bit slow: instead of a simple list of entries like

"description": "colins channel",
"host": "localhost",
"port": 1416,

you have some values nested within structures, for example

"connection":  {  "host": "localhost", "port": 1416  },
"description": "colins channel",

I had to keep referring to the documentation to look at the structure. I also just used the above sample, “filled in the blanks”, and removed the unused sections.

You have to convert from MQSC names such as QMNAME to json names such as queueManager (remembering to get the spelling and the upper case/lower case correct).

I eventually completed my ccdt.json, and updated my mqclient.ini to include the definition:

CHANNELS:
   ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)
   MQReconnectTimeout=30
   ReconDelay=(1000,200)(2000,200)(4000,1000)
   ChannelDefinitionDirectory=.
   ChannelDefinitionFile=ccdt.json

I tried running my programs – and they all worked – first time. Hurrah! I went for a beer to celebrate this unusual event.

Second steps, depression

A bit later, I changed the definitions, restarted my programs – and the changes made no difference. I got depressed and went for a cup of tea.

About an hour later, I discovered that I still had ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416) in my mqclient.ini file! I commented out this ServerConnectionParms statement.

Then I found I had the environment variable MQCHLLIB set, so I used unset MQCHLLIB.

The IBM Knowledge Centre says "If the location is specified both in the client configuration file and by using environment variables, the environment variables take priority."

I tried my program again. This time I got messages

AMQ9695E: JSON file format error for ‘./ccdt.json’.

And my program gave

MQCONNX to *GROUP cc 2 rc 2058 MQRC_Q_MGR_NAME_ERROR

This was a major step forward as it proved I was finally using the ccdt.json file. This is why I recommend you introduce a few errors in your .json file the first time you use it.

When I searched the KC for AMQ9695E, I got a hit on the page, but when I searched within the page it was not found; AMQ9695 without the E was found! (As a psychic programmer you are meant to know to drop the last letter off the message number.)

The explanation in the KC was "Parsing of JSON file <insert_3> failed. The file was expected to contain an attribute named <insert_4> but this was not found or was defined with an unexpected type. The parser returned an error of <insert_5> which may be useful in determining any invalid formatting."

This was not very helpful: what are insert_3, insert_4 and insert_5, and where are insert_1 and insert_2?

I went for another cup of tea. Half way through eating a chocolate ginger biscuit I had inspiration, there might be information in the error logs.

I used tail -n100 /var/mqm/errors/*01*|less to look in the MQ errors log. This had a full error description and explanation (Hurrah!).

EXPLANATION:
parsing of JSON file ‘./ccdt.json‘ failed. The file was expected to contain an attribute named ‘channel‘ but this was not found or was defined with an unexpected type. The parser returned an error of ‘Required ‘name’ attribute is missing’ which may be useful in determining any invalid formatting.
ACTION: Check that the contents of the file use the correct JSON schema.

I checked my file: it had “channel” with an array of two elements (definitions) (tick), and I had “type”: “clientConnection”, which is valid (tick).

By now I was getting bored with it, so I wrote some python code to take my ccdt.json and compare it with the IBM sample one. This told me I had defined “Name”, instead of “name”.

Where the error message said ‘Required ‘name’ attribute is missing’, it did not mean a generic name:value pair; it meant that the specific field called “name” was missing. So once I understood the problem, the error message made sense!

I fixed that, and a couple of other typos.

Displaying what MQ thinks is in the ccdt.json – Using runmqsc -n.

I had to use unset MQCHLLIB for this to work (as above). Then I ran runmqsc -n and this gave me:

5724-H72 (C) Copyright IBM Corp. 1994, 2019.
Starting local MQSC for ‘AMQCLCHL.TAB’.

(Ignore the ‘AMQCLCHL.TAB’ which is confusing and not true – it would be nicer if it were to say Starting local MQSC for ‘ccdt.json’)

dis chl(*)
1 : dis chl(*)
AMQ9696E: JSON attribute ‘[1] COLIN: sharingConversations’ has an invalid value or an unexpected type.
AMQ9555E: File format error.

The explanation said Ensure string values use double quotes and ensure numeric and boolean values are unquoted.

What does the error message mean?
I had two entries in the file for channel with “name”: “COLIN”.
For JSON attribute ‘[1] COLIN: sharingConversations’ this means

  • The data with key “name”: “COLIN”, with the attribute sharingConversations has a problem.
  • [1] COLIN means the second entry for COLIN, (the counting is 0 based).

I checked my file and I found I had specified “sharingConversations”: “30” with quotes when it should just be 30 (no quotes).

I fixed these, and next time my application worked using these definitions. It was time for another cup of tea and a second chocolate ginger biscuit to celebrate.

If you have specified

"timestamps": { "altered": "2018-12-04T15:37:22.000Z" }

This will display in the ALTDATE field. ALTDATE(2018-12-04) ALTTIME(15.37.22). If you do not specify this field you will get ALTDATE(1970-01-01) ALTTIME(01.00.00).

Putting your own fields in the ccdt.json file

I added comment data to the file. For example

{
"comment":
{"Createdby": "Colin Paice", "Ondate": "March 2019"},
"channel": […]
}

the queue manager ignores this; so you can add your own data into the file.

Checking the syntax of the file.

There are tools which can check the syntax of your .json file. I used some web based tools to create a schema from the IBM sample. I then used a validator to check the syntax of my ccdt.json. Overall, I thought it was not worth the effort, as I could not run it from the command line, and the output was not that useful.

I have created some python which takes a ccdt.json, and makes sure all the fields are also in the IBM sample.json file, and that the types of the values are the same. For example with

  • “SharingConversations”: 30 it reports “SharingConversations” not found in … sharingConversations …, so you can spot the spelling mistakes, and
  • “sharingConversations”: “30” it reports types do not match sharingConversations in….
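The idea behind such a checker can be sketched in a few lines. This is not the code in the repository below, just a minimal illustration. It walks your ccdt.json and a reference file in parallel, and reports keys that are not in the reference or whose json types differ. For arrays, it compares every element with the first element of the reference array, which assumes that the first entry is representative.

```python
import json


def check(user, ref, path="", problems=None):
    """Compare `user` against `ref`; return a list of problem descriptions."""
    if problems is None:
        problems = []
    if isinstance(user, dict) and isinstance(ref, dict):
        for key, value in user.items():
            where = f"{path}.{key}" if path else key
            if key not in ref:
                problems.append(f"{where} not found in reference")
            elif type(value) is not type(ref[key]):
                problems.append(f"types do not match for {where}")
            else:
                check(value, ref[key], where, problems)
    elif isinstance(user, list) and isinstance(ref, list) and ref:
        # Simplification: every element is compared with the first reference element
        for i, item in enumerate(user):
            check(item, ref[0], f"{path}[{i}]", problems)
    return problems


def check_files(ccdt_path, reference_path):
    """Load both files and print any differences found by check()."""
    with open(ccdt_path) as f:
        mine = json.load(f)
    with open(reference_path) as f:
        reference = json.load(f)
    for problem in check(mine, reference):
        print(problem)
```

Because type() distinguishes str from int, "sharingConversations": "30" is reported, and "Name" is reported as not found. Your own fields, such as the comment above, are also reported, which is a useful reminder that MQ will ignore them.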

You can install it using

git clone https://github.com/colinpaicemq/MQccdt.json/

then
cd MQccdt.json/MQccdt
and use it
python3 ccdt.py --ccdt …path_to_ your_ccdt
or
python3 ccdt.py --ccdt …path_to_ your_ccdt -schema …path_to_ your_full_ccdt.json

One definition, multiple connections – No Initial connection balancing

You can have one definition with multiple host names.

{ "channel":
  [ { "name": "COLIN",
      "clientConnection":
      {
        "connection":
        [ {"host": "localhost", "port": 1414},
          {"host": "localhost", "port": 1416}
        ],
        "queueManager": "GROUP"
      },
      "type": "clientConnection"
    }
  ]
}

With this, my program always connected to the last entry (port 1416) if it was active; if it was not active, it chose port 1414. I did not get connections balanced across the available channels.

Multiple definitions, one connection each – No Initial connection balancing

I had "channel": [{"name": "COLIN", …}, {"name": "COLIN", …}] and "connectionManagement": {"clientWeight": 90} on both.
It always connected to the second queue manager.

If I changed the second to have {“clientWeight”: 89} it always connected to the first queue manager.

So it looks like some of the parameters for doing the initial connection balancing are not working.

Tailoring the definitions

I used the shell command

sed 's/localhost/remotehost/g' ccdt.json | sed 's/1414/2424/g'

to change localhost to remotehost, and port 1414 to 2424.
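sed works, but it replaces the text everywhere, so 1414 inside a value such as 14145, or inside a comment, would also be changed. The sketch below is a more targeted alternative. It assumes the CCDT layout shown earlier (channel, then clientConnection, then connection) and changes only host and port values. The host and port mappings, and the output file name in the usage note, are illustrative.

```python
import copy
import json


def retarget(ccdt, host_map, port_map):
    """Return a copy of a CCDT dict with host names and ports substituted."""
    updated = copy.deepcopy(ccdt)
    for channel in updated.get("channel", []):
        conn = channel.get("clientConnection", {}).get("connection", [])
        # connection is normally an array, but accept a single object too
        entries = [conn] if isinstance(conn, dict) else conn
        for entry in entries:
            if "host" in entry:
                entry["host"] = host_map.get(entry["host"], entry["host"])
            if "port" in entry:
                entry["port"] = port_map.get(entry["port"], entry["port"])
    return updated


def retarget_file(in_path, out_path, host_map, port_map):
    """Read a ccdt.json, substitute hosts and ports, and write a new file."""
    with open(in_path) as f:
        ccdt = json.load(f)
    with open(out_path, "w") as f:
        json.dump(retarget(ccdt, host_map, port_map), f, indent=2)
```

For example, retarget_file("ccdt.json", "ccdt_prod.json", {"localhost": "remotehost"}, {1414: 2424}) makes the same change as the sed command, but only in host and port fields.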

Do not do unnatural things with clustering.

I’ll cover an interesting clustering scenario, and discuss how it could be improved, but first I’d like to mention my grandfather’s axe. I still have it. My father replaced the head, and I replaced the handle – but it is still my grandfather’s axe.

I was looking at a customer’s configuration, and was told “this is the original architecture”. Except they replaced this part with a cluster, and they restructured those applications to be in a different cluster, but it is still their original configuration, and of course the picture is ten times the size from when they started with MQ.



The simplified picture has a blue cluster and a yellow cluster, and the full repository acts for both clusters.

An application attached to QMA sent messages to QMB using a clustered queue remote defined in the full repository (FR). This mapped to a clustered queue in QMB. So for each MQPUT, the message flowed to the full repository, was put to the clustered queue, and was then sent to QMB where it was processed.

This is not efficient as you get double puts and gets, and more opportunities for breakages. Yes, it is using clustering, but it is not a natural use of clustering.

It would make much more sense to put QMA and QMB in the same cluster and save a lot of CPU. This would also avoid a mess when trying to sort it out.

We had a discussion about the architecture and whether we could change it. The original architect retired 10 years ago, and the chart (singular) describing the architecture and the ideas behind it was lost when a laptop was returned and the hard drive was reformatted.

Quick summary of channels used in clustering

In a cluster there are three types of cluster channel:

  1. The cluster receiver – this is defined for a queue manager to provide a template for other queue managers to connect to it.
  2. The cluster sender – which connects to the full repository. You do not need to connect to all the full repositories as the definitions for the other full repositories will flow down.
  3. Automatically defined channels between two queue managers. For queue manager QMA to create a channel to QMB, it uses the cluster receiver channel defined on QMB and sent to the full repository.

Are there any advantages in having the existing configuration?

I cannot think of a very good reason for this. I can think of reasons why this strange configuration might be valid, but they still feel wrong!

  1. Before clustering, some people had bad experiences of connecting a queue manager to all other queue managers, and the nightmare of managing these connections. Clustering solved the definitional problem: you only have to define two channels per queue manager, not hundreds or thousands. When clustering is used, channels between queue managers are created dynamically and started as needed. You may get hundreds of channels started, but you do not have to define them. With the overlapping clusters in the picture, you limit the number of channels being started, and force a “hub and spoke” model rather than the direct links you get with clustering. With a good automation package, you should be able to automate the management of the channels, collect performance data, etc.
  2. Number of connections. If you have a large MQ estate, for example 100 queue managers at the back end, you may have more than 100 cluster channels active. This should not be a problem; you may just have to configure your queue managers to handle more connections. (If there were 10,000 connections we would have a different discussion.)
  3. Capacity. QMA and QMB may not have the capacity to store a large number of messages, so using the full repository, with space for deep queues, may be a solution. (But remember a good queue is an almost empty queue.)
  4. Security. By having a channel exit on the full repository, you can check the data and authorization. If the control data is on the full repository system, it may be hard to put the exits on the other queue managers. I think you should review the architecture, and look at caching security data on the queue manager machines.
  5. Message logging. This could be duplicating a message, or updating a database with message content. It feels as if the architecture is wrong. I think a better architecture would be to do two puts in the original application, or an MQPUT and a remote DB2 insert, but this could affect performance.

How do we fix this?

In principle you just move QMB into the blue cluster, and just remove the QREMOTE definitions from the full repository.

The word that jumps out at me is “just”.

You can change the channel and queue on QMB to use a namelist of both clusters. That is easy; it is the next steps that could cause a hiccup.

With asynchronous processing, events can happen at different times. You define a queue over here, and delete a queue over there, and on a queue manager far, far away these operations get done in the reverse order.

Suppose the clustered remote queue on the full repository is called SERVER_on_FR, and it points to the queue SERVER_on_QMB, a clustered queue on QMB.
The application attached to QMA does an MQOPEN of SERVER_on_FR and, due to the magic of clustering, it all works as expected: a message arrives on the SERVER_on_QMB queue.

If you define a clustered QR(SERVER_on_FR) on QMB, pointing to SERVER_on_QMB, there will now be two queues called SERVER_on_FR in the cluster. Both queues may be used, depending on the configuration.

You cannot just delete the QR definition SERVER_on_FR on the FR, as there may be messages on cluster transmit queues heading for this queue, and some queue managers may not have seen the updates about the new queue definition. Receiver channels on FR may try putting to the queue, only to find it gone. (If you get confused, as I did, try reading this section again.)

You need to alter the queue on FR to set CLUSTER(' ') (and CLUSNL(' ') if it uses a namelist), that is, remove it from all clusters. Over time (minutes to days) this change propagates to all queue managers, so they stop using it, and the messages on the cluster transmit queues should all have been processed.

After a suitable interval you can then delete the QR from the FR system.
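The order of these steps matters more than the commands themselves. As a minimal sketch, assuming hypothetical names that are not from the original setup (the second cluster called RED, a namelist BOTH.CLUSTERS, and QMB's cluster receiver channel TO.QMB), this Python function lists the MQSC commands in the order described above, so they can be reviewed before being fed to runmqsc on each queue manager.

```python
# Sketch: the order of MQSC changes for moving SERVER_on_FR off the full repository.
# Hypothetical names (assumptions): cluster RED, namelist BOTH.CLUSTERS, channel TO.QMB.


def migration_plan(clusters=("BLUE", "RED"), namelist="BOTH.CLUSTERS",
                   clusrcvr="TO.QMB", queue="SERVER_on_QMB"):
    """Return (queue manager, MQSC command) pairs in the order they should run."""
    names = ", ".join(clusters)
    return [
        # Step 1, on QMB: put the cluster receiver channel and the queue into both clusters.
        ("QMB", f"DEFINE NAMELIST({namelist}) NAMES({names})"),
        ("QMB", f"ALTER CHANNEL({clusrcvr}) CHLTYPE(CLUSRCVR) CLUSTER(' ') CLUSNL({namelist})"),
        ("QMB", f"ALTER QLOCAL({queue}) CLUSTER(' ') CLUSNL({namelist})"),
        # A clustered QR with the old name on QMB, so existing applications keep working.
        ("QMB", f"DEFINE QREMOTE(SERVER_on_FR) RNAME({queue}) RQMNAME(QMB) CLUSNL({namelist})"),
        # Step 2, on FR: remove the old QR from every cluster, but do not delete it yet,
        # because messages already in flight may still be put to it.
        ("FR", "ALTER QREMOTE(SERVER_on_FR) CLUSTER(' ') CLUSNL(' ')"),
        # Step 3, on FR, only after a suitable interval (minutes to days).
        ("FR", "DELETE QREMOTE(SERVER_on_FR)"),
    ]
```

Printing `migration_plan()` gives a checklist; the important property is that the ALTER on FR comes well before the DELETE, with a pause between them.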

Your troubles are not over, as you now have a queue called “SERVER_on_FR” on queue managers other than FR. On QMA you could create a QR called “SERVER_on_FR” which points to SERVER_on_QMB, or (better) change the application to use queue SERVER_on_QMB, or, even better, just use the queue name SERVER! But there is a good chance you’ve lost the source for this application.

If you scale this up to an enterprise, you can see what a mess this becomes.

As a result of doing unnatural things with clustering, you have extra puts and gets, indirect channels, and a mess of queue names – it is much easier to “Keep It Simple Stupid”, and let clustering do what it was designed to do.

Uniform clustering in 9.1.2 gets a tick – and a caution from me.

In MQ 9.1.2 there is a new function called Uniform Clustering, which I thought looked interesting (given my background in performance and real customer usage of MQ).

I've had a play with it, and written up what I have found.

What is it?

When Uniform Clustering is active and it detects an imbalance in the number of conversations across queue managers, it can send a request to a connected application to request disconnect and reconnect. This happens under the covers, and it means you do not need to write code to handle this.

MQ has supported client reconnect for a few years. In V8.0 you can stop a channel, or use endmqm -r to get the channels to automagically disconnect and reconnect to a different queue manager with no application code.

I would call it conversation balancing, with a side effect of workload balancing. It helps solve the problem where one server is getting most of the work and other servers are underutilized.

By having the connections for an application spread across all of the available queue managers, it should spread the workload across the available queue managers, but the workload balancing depends on the spread of work on each connection.

The documentation originally talked about application balancing, which I think was confusing: it does not balance applications, it balances where the applications connect to.

A good client has the following characteristics:

  1. It connects for a long time, and avoids frequent short lived connections.
  2. It periodically disconnects and reconnects, so over time the connections are spread across all servers.
  3. More instances can be started if needed to service the queues. These instances can be spread around the available servers.
  4. Instances can shut down if there is no work for them, for example when an MQGET waits for 10 minutes and no message arrives.

The Uniform Clustering helps automate the periodic disconnect and reconnect (situation 2 above).
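The characteristics above can be sketched as a client main loop. This is a minimal Python sketch, not MQ code: connect, get_wait, process and disconnect are hypothetical callables standing in for MQCONN, MQGET with a wait interval, the business logic and MQDISC, and the one-hour reconnect period is an assumed value. With Uniform Clustering, the planned reconnect (characteristic 2) is the part the queue manager would drive for you.

```python
import time

IDLE_LIMIT = 600        # characteristic 4: stop after 10 minutes with no message
RECONNECT_EVERY = 3600  # characteristic 2: assumed period between planned reconnects


def run_client(connect, get_wait, process, disconnect, clock=time.monotonic,
               idle_limit=IDLE_LIMIT, reconnect_every=RECONNECT_EVERY):
    """Long-lived client loop; returns the number of planned reconnects made."""
    conn = connect()                        # characteristic 1: connect once, stay connected
    connected_at = clock()
    reconnects = 0
    try:
        while True:
            msg = get_wait(conn, idle_limit)  # like MQGET with a wait interval
            if msg is None:                   # nothing arrived within idle_limit
                return reconnects             # characteristic 4: shut this instance down
            process(msg)
            if clock() - connected_at >= reconnect_every:
                # characteristic 2: disconnect and reconnect so that, over time,
                # instances spread across all the available queue managers
                disconnect(conn)
                conn = None
                conn = connect()
                connected_at = clock()
                reconnects += 1
    finally:
        if conn is not None:
            disconnect(conn)
```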

The IBM documentation says it simplifies administration and setup. I cannot see how this helps, as you have to define the queues and channels anyway, and they do not need to be clustered.

The IBM documentation says Uniform Clustering moves reconnection logic from the application to the queue manager. This is true, but production ready applications need to have additional logic in them to support this (see below).

You should not just turn on Uniform Clustering; you need to review your applications to check they can run in this environment. If you just turn it on, it may appear to work, but the problems may be subtle, show up at a later date, and make troubleshooting harder.

How does it work?

Once the queue managers have been set up, they monitor the number of instances of applications connected to the queue manager. If you have two queue managers and have 20 instances of serverprog connected to QMA, and 0 instances connected to QMC, then over time some of the connections to QMA will be told to disconnect and reconnect, some may reconnect to QMA, and some may reconnect to QMC. Over time the number of conversations should balance out across the available queue managers.
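To get a feel for how the numbers converge, here is a toy Python model. It is an illustration under my own assumptions, not IBM's algorithm: each round, the busiest queue manager asks half of the gap between it and the quietest queue manager to reconnect, and each reconnecting instance lands wherever a supplied chooser sends it, which may be back on the same queue manager.

```python
from itertools import cycle


def rebalance_round(counts, choose):
    """One round over {qmgr: conversations}; returns how many were asked to move."""
    busiest = max(counts, key=counts.get)
    quietest = min(counts, key=counts.get)
    asked = (counts[busiest] - counts[quietest]) // 2   # ask half the gap to move
    for _ in range(asked):
        counts[busiest] -= 1        # instance is told to disconnect
        counts[choose()] += 1       # and reconnects wherever it lands
    return asked


def balance(counts, choose, max_rounds=50):
    """Run rounds until no two queue managers differ by more than one."""
    history = []
    for _ in range(max_rounds):
        if max(counts.values()) - min(counts.values()) <= 1:
            break
        history.append(rebalance_round(counts, choose))
    return history


if __name__ == "__main__":
    counts = {"QMA": 20, "QMC": 0}
    # In this toy run, reconnecting instances alternate between QMA and QMC.
    print(balance(counts, cycle(["QMA", "QMC"]).__next__), counts)
    # prints [10, 5, 3, 1, 1] {'QMA': 10, 'QMC': 10}
```

Because some instances reconnect to the queue manager they just left, it takes several rounds, with fewer moves each round, which is the same shape as the behaviour described above.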

Below are some charts showing how this balancing works. I had a number of “server” programs connected as clients. When they started, all sessions connected to QMA. They did not process any messages. From the reports produced by my MQCB program, I could see when application instances were asked to disconnect and reconnect.

The chart below shows the rate of reconnecting for 20 servers connecting as clients to 2 queue managers – doing no work. After 300 seconds there were 10 connections to each queue manager.

The chart below shows the rate of reconnecting for 80 servers connecting as clients to 2 queue managers – doing no work. After 468 seconds there were 40 connections to each queue manager.

We can see that balancing requests are sent out every minute or two. The number of conversations moved depends on how unbalanced the configuration is. The time before the connections were balanced varied from run to run, but the above charts are typical.

Which applications get balanced?

I had two applications connected to my queue managers. DIS CONN(*) APPLTAG shows you the names of the programs running.

My client programs had APPLTAG(myclient), my server programs had APPLTAG(serverprog).

The uniform clustering will balance myclient programs as a group, and serverprog programs as a group.

You may have many client programs, for example hundreds of sessions in a web server, and only a few server programs processing the requests from the clients, so they may get balanced at different rates.
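Because each APPLTAG is balanced as its own group, it can help to look at the spread per group rather than per queue manager. A minimal Python sketch, assuming you have already collected (queue manager, APPLTAG) pairs, for example from DIS CONN(*) APPLTAG on each queue manager; the collection step is not shown and the function names are hypothetical.

```python
from collections import defaultdict


def conversations_by_tag(connections, qmgrs):
    """Count conversations per APPLTAG per queue manager.

    connections: iterable of (qmgr, appltag) pairs; every qmgr must be in qmgrs.
    """
    table = defaultdict(lambda: dict.fromkeys(qmgrs, 0))
    for qmgr, tag in connections:
        table[tag][qmgr] += 1
    return dict(table)


def imbalance(table):
    """Gap between the busiest and quietest queue manager, per APPLTAG."""
    return {tag: max(c.values()) - min(c.values()) for tag, c in table.items()}
```

A group such as myclient can be well spread while serverprog is still lopsided, which is why the two may settle at different rates.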

This looks like a really useful capability, but you need to be careful.

The MQ reconnection code will reopen the queues you were using; this is transparent to the application.

A thread may get a request to disconnect and reconnect, while the application is processing an MQ request, waiting for a message, or doing other work. For some application patterns this may not matter, for others you may need to take action.

Where’s my reply?

For a server application which does MQGET, MQPUT, MQCOMMIT, if the reconnect request happens, the work can get backed out and another application instance can process it. Great, no problems.

A client application does (MQPUT to the server queue, MQCOMMIT), then (MQGET with wait on the reply-to queue, MQCOMMIT). The reconnection request can happen during the MQGET wait. The MQPUT specified a reply-to queue and a reply-to queue manager. If the application gets a reconnect request, it may be connected to a different queue manager, so will not be able to get the reply message (as the message is on the original queue manager).

This problem is due to the reconnection support, and has been around for a long time, so most people will have a process in place to handle it. Uniform Clustering does not change the problem, but the reconnects now happen without you knowing.
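The stranded reply can be shown with a toy in-memory model. This Python sketch is purely illustrative: ToyQmgr is a hypothetical stand-in for a queue manager (a dictionary of queues), not an MQ API, and for simplicity the server runs on the same queue manager that received the request.

```python
from collections import defaultdict


class ToyQmgr:
    """In-memory stand-in for a queue manager: queue name -> list of messages."""

    def __init__(self, name):
        self.name = name
        self.queues = defaultdict(list)


def request_reply(qmgrs, put_on, get_on):
    """Client puts on put_on, the server replies, the client then gets on get_on."""
    # Client MQPUT: the request names the reply-to queue and reply-to queue manager.
    request = {"body": "ping", "reply_to_q": "REPLY", "reply_to_qmgr": put_on}
    qmgrs[put_on].queues["SERVER"].append(request)
    # Server: get the request and put the reply to the reply-to queue manager.
    req = qmgrs[put_on].queues["SERVER"].pop(0)
    qmgrs[req["reply_to_qmgr"]].queues[req["reply_to_q"]].append({"body": "pong"})
    # Client MQGET, possibly after being reconnected to another queue manager.
    reply_q = qmgrs[get_on].queues["REPLY"]
    return reply_q.pop(0) if reply_q else None
```

With put_on and get_on the same, the reply arrives; if the client was moved between the MQPUT and the MQGET, it gets nothing and the reply is left orphaned on the original queue manager.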

Reporting the wrong queue manager.

Good applications report problems with enough information to identify the problems. For example queue manager name, queue and unexpected return code. If you did MQINQ to find the queue manager name at startup, and if your application instance has been reconnected, the queue manager name may now be wrong.

  1. You can use MQCB to capture and report these queue manager changes, so the reconnects and new queue manager name are written to the application log.
  2. You could issue MQINQ for the queue manager name when you report a problem, but the connection may have moved by the time you report the problem.
  3. You also need to handle which queue manager the MQPUT was done on, as this could be different to where the MQGET completed. This might just be a matter of saving the queue manager name in a MQPUT_QM variable every time you do an MQPUT. You need to do this when tracking down missing messages – you need to know which system the MQPUT was done on.
  4. You could keep the time of the MQPUT, report “Reply not received, MQPUT was put at 12:33:44” and then review the application log (1 above) to see what it was connected to at that time.
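Points 1, 3 and 4 can be combined into a small bookkeeping helper. This Python sketch is an assumption of how it might look, not MQ code: ConnectionTracker and its methods are hypothetical, on_reconnected is meant to be called from whatever reconnect event handler you register (for example with MQCB), and times are shown in UTC so the output does not depend on the local time zone.

```python
import time


class ConnectionTracker:
    """Remember which queue manager each MQPUT was done on, and when."""

    def __init__(self, qmgr, log=print, clock=time.time):
        self.current_qmgr = qmgr
        self.log = log
        self.clock = clock
        self.puts = {}  # request key -> (queue manager, time of MQPUT)

    def on_reconnected(self, new_qmgr):
        """Call from the reconnect event handler; writes the change to the log."""
        if new_qmgr != self.current_qmgr:
            self.log(f"{self._ts(self.clock())} reconnected from "
                     f"{self.current_qmgr} to {new_qmgr}")
        self.current_qmgr = new_qmgr

    def record_put(self, key):
        """The MQPUT_QM idea: save the queue manager and time for this request."""
        self.puts[key] = (self.current_qmgr, self.clock())

    def reply_missing(self, key):
        qmgr, when = self.puts.pop(key)
        self.log(f"Reply not received, MQPUT was put at {self._ts(when)} on {qmgr}, "
                 f"now connected to {self.current_qmgr}")

    @staticmethod
    def _ts(t):
        return time.strftime("%H:%M:%S", time.gmtime(t))  # UTC, HH:MM:SS
```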

What gets balanced

Conversations get balanced. So if you have a channel with 4 shared conversations, (DIS CHS gives CURSHRCNV(4)), you might end up with a channel to QMA with one conversation, a channel to QMB with two conversations and a channel to QMC with one conversation. Some channels may have only one conversation per channel instance.
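Because a channel instance connects to exactly one queue manager, spreading conversations also increases the number of channel instances. A small Python calculation, assuming conversations pack as tightly as SHARECNV allows (in practice they may not, so treat the result as a lower bound; the function name is hypothetical):

```python
import math


def min_channel_instances(conversations_per_qmgr, sharecnv):
    """Lower bound on channel instances once conversations are spread out.

    A channel instance connects to one queue manager and carries at most
    SHARECNV conversations; SHARECNV(0) still means one conversation each.
    """
    per_instance = max(sharecnv, 1)
    return sum(math.ceil(n / per_instance) for n in conversations_per_qmgr.values())
```

For the example above, four conversations on one instance to QMA need at least one instance, but spread as 1, 2 and 1 across QMA, QMB and QMC they need at least three.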

Are there any new commands?

I could not find any new commands.

Can I turn off this automatic rebalancing?

To put your queue manager in and out of maintenance mode, see the section on maintenance mode below.

This is a “challenge” with reconnection, not with Uniform Cluster support. If you change the qm.ini file and remove the

TuningParameters:
   UniformClusterName=MYCLUSTER

statements, this just means the applications connected to this queue manager will not get told to rebalance. You will still get applications trying to connect to the queue manager.
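When checking which queue managers still take part in rebalancing, it can help to look for the setting in each qm.ini (on Linux this is typically /var/mqm/qmgrs/<qmgr>/qm.ini). A minimal Python sketch, assuming the usual qm.ini layout of a stanza line ending in a colon followed by indented key=value lines, and assuming lines starting with # or ; are comments; the function name is hypothetical.

```python
def uniform_cluster_name(qm_ini_text):
    """Return UniformClusterName from the TuningParameters stanza, or None."""
    stanza = None
    for raw in qm_ini_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                          # blank line or (assumed) comment
        if line.endswith(":") and "=" not in line:
            stanza = line[:-1].strip()        # new stanza, e.g. "TuningParameters:"
        elif stanza == "TuningParameters" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "UniformClusterName":
                return value
    return None
```

A queue manager where this returns None will not tell its applications to rebalance, but as noted above, applications can still connect to it.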

How do I put a queue manager in and out of maintenance mode when using client reconnect?

You want to do some maintenance on one of your queue managers: stop work coming in to the queue manager, and restart work when the maintenance has finished, without causing operational problems.

Applications using reconnection support can reconnect to an available queue manager. To stop applications connecting to a particular queue manager, you need to stop the channel(s) with STOP CHL(…) STATUS(STOPPED). An application using the channel will get notified, or reconnected. An application trying to connect will fail, and go somewhere else.

If you have two channels, one for the web server clients and a second for the server application on the queue manager, I don't think it matters which one you stop first.

  1. If you stop the client channel first, then messages already sent will go to the server application, be processed and put on the reply queue. The client will not get the reply, as it has been switched to another queue manager.
  2. If you stop the server applications first, then the messages will accumulate on the server queue, until the server applications reconnect to the queue manager and process the queue.

In either case you can have orphaned messages on the reply-to queue. You need a process to resolve these, or, for non-persistent messages, set a message expiry time.

Once you have done your maintenance work, use START CHL(…) for the server channel, wait for a server to connect to the queue manager and then use START CHL(…) for the client channel. It may take minutes for a server application to connect to the queue manager.

Do it in this order because you want the server to be running before client applications put to the server queue; otherwise you will have to handle timeouts in the application.
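The stop and restart ordering can be scripted. This Python sketch is an assumption, not a tested runbook: run_mqsc is a hypothetical callable that sends one MQSC command to the queue manager (for example via runmqsc), server_connected is a hypothetical check (for example based on DIS CHS for the server channel), and the channel names are placeholders.

```python
import time


def stop_for_maintenance(run_mqsc, client_chl, server_chl):
    """Stop both channels; either order works, here the client channel goes first."""
    for chl in (client_chl, server_chl):
        run_mqsc(f"STOP CHANNEL({chl}) STATUS(STOPPED)")


def restart_after_maintenance(run_mqsc, server_connected, client_chl, server_chl,
                              timeout=600, poll=10, sleep=time.sleep):
    """Start the server channel, wait for a server to connect, then start the client channel."""
    run_mqsc(f"START CHANNEL({server_chl})")
    waited = 0
    while not server_connected():
        if waited >= timeout:
            # Leave the client channel stopped: no server is ready to process work.
            raise TimeoutError(f"no server connected on {server_chl} after {waited}s")
        sleep(poll)
        waited += poll
    run_mqsc(f"START CHANNEL({client_chl})")
```

Raising instead of starting the client channel anyway keeps clients away until a server is there to process their requests.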