Some gotchas when using stats and accounting with MQ midrange

I’ve been writing some Python code to process the messages on the SYSTEM.ADMIN.STATISTICS.QUEUE and SYSTEM.ADMIN.ACCOUNTING.QUEUE queues and to create tabular data in a .csv file. I fell over some problems, and I thought I would pass on my experiences.

I used /opt/mqm/samp/bin/amqsevt -m QMA -b -q SYSTEM.ADMIN.STATISTICS.QUEUE -o json -w 1 |jq . > statfile.txt

to read the messages as JSON, use jq to format them better, and write the output to the file statfile.txt.

I used the MQSA Python scripts, which will soon be going up on the MQTools GitHub repository, to actually produce the .csv files.

From here you can see the content of the messages.

Gotchas..

Not all fields are reported.

In the documentation, some fields are marked as “Always” present and others are marked “When available”. So fields like QName are always present, but PutCount is “When available”. This means you cannot just go along the fields in the records and put the first field in column 1, the second field in column 2, and so on. You have to put the “qname” data in the “qname” column, and you may have to skip columns if the fields are not present. You also need to set up the column headers with all the possible fields. I used Python csv.DictWriter to write the data.
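A minimal sketch of the approach – the field names and values here are just illustrative, not the full set:

import csv

# every column heading we might ever see - in practice this list is much longer
fieldnames = ["qname", "putCount", "getCount", "putMaxBytes"]

with open("stats.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")  # missing fields become blank cells
    writer.writeheader()
    # "when available" fields may simply be absent from a record
    writer.writerow({"qname": "APP.QUEUE1", "putCount": 14})
    writer.writerow({"qname": "APP.QUEUE2", "putCount": 3, "getCount": 3})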

Many “fields” have multiple values.

For example, “puts” has the number of non-persistent puts and the number of persistent puts, for example puts[10,4]. I found it easier to add these together and provide one field, puts[14]. Other fields, such as set, have 8 values – one each for queues, namelists, and so on. Again, I added these up to provide one value.
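A sketch of collapsing the list-valued fields into single totals (the field names are assumptions based on the JSON above):

# a hypothetical record with list-valued fields as they appear in the JSON
data = {"puts": [10, 4], "set": [1, 0, 0, 0, 2, 0, 0, 0]}

# collapse each list-valued field into a single total
for name in ("puts", "set"):
    if name in data and isinstance(data[name], list):
        data[name] = sum(data[name])

# data is now {'puts': 14, 'set': 3}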

Be careful when combining fields.

Finding the maximum message size put, PutMaxBytes, is easy – it is the maximum of the non-persistent and persistent values. PutMinBytes is harder. If there were no persistent messages put, then the persistent value in PutMinBytes will be zero! You need to factor this into your calculations. You need something like:

# a sketch using the field names as they appear in the JSON; index 0 is non-persistent, index 1 is persistent
if "putCount" in data and "putMinBytes" in data:
    put_count = data["putCount"]
    put_min = data["putMinBytes"]
    if put_count[0] == 0:                 # we did no non-persistent puts
        put_min_bytes = put_min[1]        # use the persistent value
    elif put_count[1] == 0:               # we did no persistent puts
        put_min_bytes = put_min[0]
    else:                                 # there were puts of persistent and non-persistent messages, so look at both
        put_min_bytes = min(put_min[0], put_min[1])
else:
    put_min_bytes = 0                     # these fields were missing, or we did no puts

Similarly merging TimeOnQAvg values needs care.

If the fields are present, and the total number of gets > 0:

TimeOnQAvg = (GetCount[0] * TimeOnQAvg[0] + GetCount[1] * TimeOnQAvg[1]) / (GetCount[0] + GetCount[1])
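In Python this might look like the following sketch (the field names are assumed to match the JSON):

get_count = data.get("getCount", [0, 0])
time_on_q = data.get("timeOnQAvg", [0, 0])
total_gets = get_count[0] + get_count[1]
if total_gets > 0:
    time_on_q_avg = (get_count[0] * time_on_q[0] +
                     get_count[1] * time_on_q[1]) / total_gets
else:
    time_on_q_avg = 0   # no gets, so there is no meaningful average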

Date and time format.

The default format is “startDate”: “2019-02-27” and “startTime”: “17.03.41”. I changed these to “startDate”: “2019/02/27” and “startTime”: “17:03:41”.
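This is a one-line change per field, for example:

start_date = data["startDate"].replace("-", "/")   # "2019-02-27" -> "2019/02/27"
start_time = data["startTime"].replace(".", ":")   # "17.03.41"  -> "17:03:41"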

Add up the put information

At a high level, you just want to know the total number of messages put and the total number of bytes put – not whether they came from a put, put1, or a topic put. To do this you need calculations like:

puts_total = 0
put_names = ["puts", "put1s", "topicPuts", "topicPut1s"]
for name in put_names:
    if name in data:                     # the field is only present "when available"
        puts_total += data[name]
data["putsTotal"] = puts_total

puts_bytes_total = 0
for name in ["putBytes", "topicPutBytes"]:
    if name in data:
        puts_bytes_total += data[name]
data["putsBytesTotal"] = puts_bytes_total

Some tools like a date-time field.

I combined the above fields into one field. I effectively used startDate||’/’||startTime and passed this into the date-time code with the format string “%Y-%m-%d/%H.%M.%S” to parse it and produce a date-time object.
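For example, with the standard datetime module:

from datetime import datetime

# combine the two fields and parse them into one date-time object
start_date_time = datetime.strptime("2019-02-27" + "/" + "17.03.41", "%Y-%m-%d/%H.%M.%S")
# datetime.datetime(2019, 2, 27, 17, 3, 41)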

Statistics messages are not produced at well defined times.

They are produced at an interval specified by the qmgr attribute STATINT. Mine was set to 600 (seconds). This produced records at 17:04:06, 17:14:06, 17:24:06. I stopped the queue manager and restarted it, and the next records came out at 17:32:37, 17:42:37, 17:52:37 – the times do not line up.

You need to think about how to use this data. You may want to produce a chart showing usage over a day. The vertical axis is the number of puts, the horizontal axis is hours. What do you do with a record which covers 19:45 to 20:05, produced at 20:05?

You could just record it in the 20:00 to 21:00 bucket. This makes the peak look as if it was at 20:00 to 21:00 – not 19:00 to 20:00 – so your graphs give the wrong message.

You could distribute the messages according to the time spent in each hour, so 19:45 to 20:00 is 15 minutes, and 20:00 to 20:05 is 5 minutes. You split the data in a ratio of 15:5 between the 19:00-20:00 bucket and the 20:00-21:00 bucket. This is more accurate, but still misleading: if you stop trading at 19:59:59, splitting the data across the buckets will show usage past 20:00 which may not be true.
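A minimal sketch of the proportional split, assuming you already have the start and end as datetime objects and one count to distribute:

from datetime import datetime, timedelta

start = datetime(2019, 2, 27, 19, 45)
end = datetime(2019, 2, 27, 20, 5)
puts = 100
duration = (end - start).total_seconds()

# walk through the interval, giving each hour bucket its share of the count
buckets = {}
t = start
while t < end:
    next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    slice_end = min(next_hour, end)
    fraction = (slice_end - t).total_seconds() / duration
    buckets[t.hour] = buckets.get(t.hour, 0) + puts * fraction
    t = slice_end

# buckets is {19: 75.0, 20: 25.0}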

You have to calculate the duration of the record.

Each record has a section called eventData with Start Date, Start Time, End Date and End Time. You can calculate the duration yourself. As I needed date-time objects for another reason, I found it easiest to create StartDateTime and EndDateTime objects and then say duration = EndDateTime – StartDateTime, rather than calculating EndTime – StartTime and adding 24 hours if the result is < 0.
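For example (the end field names here are an assumption, mirroring the startDate/startTime names above):

from datetime import datetime

fmt = "%Y-%m-%d/%H.%M.%S"
start_dt = datetime.strptime(event_data["startDate"] + "/" + event_data["startTime"], fmt)
end_dt = datetime.strptime(event_data["endDate"] + "/" + event_data["endTime"], fmt)
duration = (end_dt - start_dt).total_seconds()   # crosses midnight correctly, no 24-hour fix-up needed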

Epoch needs to be changed.

You get data like

“eventCreation”: {
“timeStamp”: “2019-02-26T12:07:44Z”,
“epoch”: 1551182864
},

Epoch is the number of seconds since 1/1/1970. You can format it; for example on Ubuntu, date --date @1551182864 gave Tue 26 Feb 12:07:44 GMT 2019. Unfortunately you cannot use this as a time stamp in some spreadsheets, because they use a different day 0, so using “format as date” on the epoch integer gives you the wrong answer. In LibreOffice the calculation is epoch/86400+25569, then format as date-time.
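For example:

from datetime import datetime, timezone

epoch = 1551182864
print(datetime.fromtimestamp(epoch, tz=timezone.utc))   # 2019-02-26 12:07:44+00:00

# value to put in a LibreOffice cell, then "format as date time"
libreoffice_value = epoch / 86400 + 25569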

Formatting the data in a spread sheet.

When you import the data into a spreadsheet, you have to specify the formatting for some columns. To automate this, you may want to write a macro to do it for you.

Using tools like Elasticsearch and Kibana.

Documents in Elasticsearch require a document type and a unique id.

The document type could be MQ.Queue_Accounting, MQ.MQI_Accounting, MQ.Queue_Statistics, or MQ.MQI_Statistics.

The unique id could be

  • qmgr,eventCreation_timeStamp for qmgr MQI statistics
  • qmgr,eventCreation_timeStamp.queueName for queue statistics
  • qmgr,eventCreation_timeStamp.processId.threadID for qmgr MQI accounting
  • qmgr,eventCreation_timeStamp.queueName.processId.threadID for queue accounting.
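A hedged sketch of building, say, the queue statistics id – the field names are illustrative, apart from eventCreation/timeStamp which appears in the messages:

def make_queue_stats_id(rec):
    # qmgr.timestamp.queueName, for example "QMA.2019-02-26T12:07:44Z.APP.QUEUE1"
    return ".".join([rec["queueManager"],
                     rec["eventCreation"]["timeStamp"],
                     rec["queueName"]])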

You can now easily display MQ PCF data using Python

For many years I have been frustrated with displaying data provided by MQ midrange in PCF format, as there was nothing to help you use it. Yes, you could display it, but it was a bit like saying: here is a bucket of bits, now build your own program to process the data – good luck.

I had started to write some C code to process it, but then I retired from IBM, and since then I have found that Python is a brilliant systems management language.

I have put some Python code up on GitHub which does the following.

You can create a PCF command, such as INQUIRE QUEUES. The code can then parse the output into “English”, and you can then do things with it. By “English” I mean:

Q_TYPE: LOCAL rather than the value 1
DEF_BIND: BIND_ON_OPEN rather than the value 16384

There are examples of what you can do with it.

get_pcf.py: you specify the connection details and the queue, and it returns the data in JSON format which you can pipe into another command.

queues.py issues the INQUIRE QUEUE command and outputs the data in JSON format. queues2.py then writes it into files, one file per queue, one line per attribute. This is great because:

  • you do not have to worry about trying to parse the output from runmqsc
  • you can use standard tools on the file
  • you can do more…

There is a diff.py sample: you give it a list of files and it tells you the differences in the queue definitions (while ignoring attributes like alteration date), for example:

CP0000.yml CP0001.yml : Q_NAME CP0000 / CP0001
CP0000.yml CP0001.yml : Q_DESC Main queue / None
CP0000.yml CP0001.yml : MAX_Q_DEPTH 2000 / 5000
CP0000.yml CP0001.yml : Q_DEPTH_HIGH_EVENT ENABLED / DISABLED

There is standards.py, which allows you to check that the attributes in the .yml files meet your corporate standards!

There are events.py and events2.py, so you can now process the events produced by the define, delete and alter commands, and see who made the change, what the change was, and when it was made.

I am working on making the stats and accounting usable, so you can create a .csv file with useful data in it, or pass it into Kibana and other tools. So watch this space.

I would welcome any comments or feedback. I’ve had one already: when using Eclipse and Python, it supports text completion – so if you type in MQ.INQ… it gives you a list of options to pick from.

These tools build on top of the excellent pymqi package, which provides the MQ API for Python programs. You use pymqi to put and get messages, then use the mqtools package to process the data.
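For example, a minimal pymqi sketch (the connection details are just illustrative) which gets one message from the statistics queue, ready to be passed to the parsing code:

import pymqi

qmgr = pymqi.connect("QMA", "QMACLIENT", "127.0.0.1(1414)")
queue = pymqi.Queue(qmgr, "SYSTEM.ADMIN.STATISTICS.QUEUE")
message = queue.get()        # the raw PCF message
queue.close()
qmgr.disconnect()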

How to get hold of it…

The README has instructions on how to download it. If there is enough interest I’ll package it up so pip install can find it.

I’ve got nothing in my events queue!

I was looking to test out my Python code and was checking the PCF messages produced – or in my case, not being produced. I did some playing around.

There is a DIS QMGR EVENT command (not EVENTS); this gave me:

QMNAME(QMA) AUTHOREV(DISABLED)
CHLEV(DISABLED) CMDEV(DISABLED)
CONFIGEV(ENABLED) INHIBTEV(DISABLED)
LOCALEV(DISABLED) LOGGEREV(DISABLED)
PERFMEV(ENABLED) REMOTEEV(DISABLED)
SSLEV(DISABLED) STRSTPEV(ENABLED)

So the reason I was not getting many events is that they are mostly turned off.

I used runmqsc and exploited the command-completion facility.
I typed alter qmgr chl and pressed tab for it to complete to ALTER QMGR CHLEV(ENABLED). I then entered AUTHO and pressed tab, and so on.
Then, when the command was complete, I pressed enter – only for it to complain that LOGGEREV is only valid when using linear logging.

The whole command was

ALTER QMGR CHLEV(ENABLED) AUTHOREV(ENABLED) CONFIGEV(ENABLED) INHIBTEV(ENABLED) LOCALEV(ENABLED) PERFMEV(ENABLED) REMOTEEV(ENABLED) SSLEV(ENABLED) STRSTPEV(ENABLED)

What do change events look like?

I was using my Python programs for processing MQ PCF data, and was testing them with different sorts of MQ data. I found the documentation for change events in the IBM Knowledge Center is a bit sparse, so here is some information on it.

I did

  • define QL(DELETEME)
  • alter ql(DELETEME) descr('change comment')
  • delete QL(DELETEME)

For define and delete, one event message is created. For alter, two messages are created: one contains the before data, the other the after data.

The MD looks like

"MQMD":{
"StrucId":"MD ",
"Version":1,
"Report":"NONE",
"MsgType":"DATAGRAM",
"Expiry":-1,
"Feedback":"NONE",
"Encoding":546,
"CodedCharSetId":1208,
"Format":"MQEVENT ",
"Priority":0,
"Persistence":"NOT_PERSISTENT",
"MsgId":"0x414d5120514d4120202020202020202020…",
"CorrelId":"0x414d5120514d4120202020202020202020…",
"BackoutCount":0,
"ReplyToQ":" ",
"ReplyToQMgr":"QMA",
"UserIdentifier":" ",
"AccountingToken":"0x0000…",
"ApplIdentityData":" ",
"PutApplType":"QMGR",
"PutApplName":"QMA ",
"PutDate":"20190217",
"PutTime":"11141373",
"ApplOriginData":" ",
"GroupId":"0x000000...",
"MsgSeqNumber":1,
"Offset":0,
"MsgFlags":0,
"OriginalLength":-1
},

When a change event occurs, the MD for the “after” message is effectively the same, but note the MsgId is different and the CorrelId is the same, so you will need to clear the MsgId and keep the CorrelId when getting the second message.
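In pymqi this might look like the following sketch, where queue and first_md are assumed to come from the get of the first message:

import pymqi
from pymqi import CMQC

md = pymqi.MD()
md.CorrelId = first_md.CorrelId      # keep the CorrelId from the "before" message
md.MsgId = CMQC.MQMI_NONE            # clear the MsgId so only the CorrelId is matched
after_message = queue.get(None, md)  # gets the "after" message of the pair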

For the create command the PCF header was

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":1,
"Control":"LAST",
"CompCode":0,
"Reason":2367, "CONFIG_CREATE_OBJECT"
"ParameterCount":58,
},

For the delete command the PCF header was

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":1,
"Control":"LAST",
"CompCode":0,
"Reason":2369, "CONFIG_DELETE_OBJECT
"ParameterCount":58

}

For the alter/change command the PCF headers were

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":1,
"Control":"NOT_LAST",
"CompCode":0,
"Reason":2368,
"ParameterCount":58
},

and

"Header":{
"Type":"EVENT",
"StrucLength":36,
"Version":2,
"Command":"CONFIG_EVENT",
"MsgSeqNumber":2,
"Control":"LAST",
"CompCode":0,
"Reason":2368, "CONFIG_CHANGE_OBJECT"
"ParameterCount":58
},

The differences are the MsgSeqNumber value and Control being NOT_LAST or LAST.

The data is common to all requests. You need to compare the fields in both records for the change event to see which are different. With the data in a Python dict, this was a trivial exercise.

"Data":{
"EVENT_USER_ID":"colinpaice",
"EVENT_ORIGIN":"CONSOLE",
"EVENT_Q_MGR":"QMA",
"OBJECT_TYPE":"Q",
"Q_NAME":"DELETEME",
"Q_DESC":"",

"Q_TYPE":"LOCAL"
}

after the change

"Data":{
"EVENT_USER_ID":"colinpaice",
"EVENT_ORIGIN":"CONSOLE",
"EVENT_Q_MGR":"QMA",
"OBJECT_TYPE":"Q",
"Q_NAME":"DELETEME",
"Q_DESC":"change comment",

"ALTERATION_DATE":"2019-02-17",
"ALTERATION_TIME":"11.14.33",

"Q_TYPE":"LOCAL"
}
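For example, if before and after are the two Data sections loaded as Python dicts:

changed = {key: (before[key], after[key])
           for key in before
           if key in after and before[key] != after[key]}
# e.g. {'Q_DESC': ('', 'change comment')}

added = {key: after[key] for key in after if key not in before}
# e.g. the ALTERATION_DATE and ALTERATION_TIME fields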

When I printed out the changed data, the first time ALTERATION_DATE and ALTERATION_TIME were both displayed. The second time, only ALTERATION_TIME was displayed. I thought this was a bug in my program. After checking my program, it was obvious: if I change the queue 10 minutes later, the ALTERATION_DATE does not change – so remember to report both of these fields, not just the changed values.

Baby Python scripts doing powerful work with MQ

I found PyMqi, an interface from Python to MQ. This is really powerful, and I’m extending it to be even more amazing!

In this blog post, I give examples of what you can do.

  • Issue PCF commands and get responses back in words rather than internal codes ( so CHANNEL_NAME instead of 3501)
  • Saving the output of DISPLAY commands into files
  • Using these files to compare definitions and highlight differences.
  • Check these files conform to corporate standards.
  • Print data from the command event queue, the statistics queue, etc.


I can use some Python code to display information via PCF:

import pymqi

queue_manager = "QMA"
# connect to MQ
qmgr = pymqi.connect(queue_manager, "QMACLIENT", "127.0.0.1(1414)")
# I want to inquire on all SYSTEM.* channels
prefix = b"SYSTEM.*"
# build the PCF request
args = {pymqi.CMQCFC.MQCACH_CHANNEL_NAME: prefix}
pcf = pymqi.PCFExecute(qmgr)
# go execute it
response = pcf.MQCMD_INQUIRE_CHANNEL(args)

This is pretty impressive as a C program would take over 1000 lines to do the same!

This comes back with data like

3501: b'SYSTEM.AUTO.RECEIVER',
1511: 3,
2027: b'2018-08-16 ',
2028: b'13.32.15',
1502: 50

which is cryptic even for experts, because you need to know that 3501 is the parameter identifier for CHANNEL_NAME.

I have some Python code which converts this to:

'CHANNEL_NAME': 'SYSTEM.AUTO.RECEIVER',
'CHANNEL_TYPE': 'RECEIVER',
'ALTERATION_DATE': '2018-08-16',
'ALTERATION_TIME': '13.32.15',
'BATCH_SIZE': 50

for which you only need a kindergarten level of MQ knowledge to understand it. It converts 3501 to CHANNEL_NAME, and 3 to RECEIVER.
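The mqtools code does this properly; a rough sketch of the general idea (not necessarily how the package does it) is to invert the constant names which pymqi already ships:

import pymqi

# build a lookup from parameter id to name, e.g. 3501 -> 'MQCACH_CHANNEL_NAME'
# (a simplification: only channel character attributes, ignoring the *_FIRST/*_LAST markers)
lookup = {value: name
          for name, value in vars(pymqi.CMQCFC).items()
          if name.startswith("MQCACH_")
          and not name.endswith(("_FIRST", "_LAST"))
          and isinstance(value, int)}

print(lookup[pymqi.CMQCFC.MQCACH_CHANNEL_NAME])   # MQCACH_CHANNEL_NAME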

With a few lines of Python I can write this data out so each queue is a file on disk in YAML format.
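For example, a sketch with ruamel.yaml, where queue_data is assumed to be one queue's attributes as a dict of name to value:

from ruamel.yaml import YAML

yaml = YAML()
file_name = queue_data["Q_NAME"] + ".yml"    # one file per queue
with open(file_name, "w") as f:
    yaml.dump(queue_data, f)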

A yaml file for a queue looks like

Q_NAME: TEMP
Q_TYPE: LOCAL
ACCOUNTING_Q: Q_MGR
ALTERATION_DATE: '2019-02-03'
ALTERATION_TIME: 18.15.52
BACKOUT_REQ_Q_NAME: ''

Now it gets exciting! (really)

Now it is in YAML, I can write small Python scripts to do clever things. For example

Compare queue definitions

from ruamel.yaml import YAML
import sys

yaml = YAML()
q1 = sys.argv[1]                       # get the first queue file name
ignore = ["ALTERATION_DATE", "ALTERATION_TIME",
          "CREATION_DATE", "CREATION_TIME"]
in1 = open(q1, 'r')                    # open the first queue file
data1 = yaml.load(in1)                 # and read the contents in
for i in range(2, len(sys.argv)):      # for all of the passed-in file names
    q2 = sys.argv[i]                   # get the name of the file
    in2 = open(q2, 'r')                # open the file
    data2 = yaml.load(in2)             # read it in
    for e in data1:                    # for each parameter in file 1
        x1 = data1[e]                  # get the value from file 1
        x2 = data2[e]                  # get the value from the other file
        if not e in ignore:            # some parameters we want to ignore
            if x1 != x2:               # if the parameters are different
                print(q1, q2, ":", e, x1, "/", x2)   # print the file names, keyword and values

From this it prints out the differences

queues/CP0000.yml queues/CP0001.yml : Q_NAME CP0000 / CP0001
queues/CP0000.yml queues/CP0001.yml : OPEN_INPUT_COUNT 1 / 0
queues/CP0000.yml queues/CP0001.yml : MONITORING_Q Q_MGR / HIGH
queues/CP0000.yml queues/CP0001.yml : OPEN_OUTPUT_COUNT 1 / 0
queues/CP0000.yml queues/CP0002.yml : Q_NAME CP0000 / CP0002
queues/CP0000.yml queues/CP0002.yml : OPEN_INPUT_COUNT 1 / 0
queues/CP0000.yml queues/CP0002.yml : OPEN_OUTPUT_COUNT 1 / 0

I thought this was pretty impressive for 20 lines of code.

And another script – for checking standards:

from ruamel.yaml import YAML
import sys

yaml = YAML()
q1 = sys.argv[1]                       # get the queue file name
# define the attributes to check
lessthan = {"MAX_Q_DEPTH": 100}
ne = {"INHIBIT_PUT": "PUT_ALLOWED", "INHIBIT_GET": "GET_ALLOWED"}
in1 = open(q1, 'r')                    # open the queue file
data = yaml.load(in1)                  # and read the contents in
# for each element in the lessthan dictionary (MAX_Q_DEPTH), check the
# data read from the file.
# if the value in the file is not less than the limit (100)
# print out the name of the queue and the values
for i in lessthan:                     # just MAX_Q_DEPTH in this case
    if data[i] >= lessthan[i]:
        print(q1, i, data[i], "Field in error. It should be <", lessthan[i])
# check the values which must be equal
for i in ne:                           # INHIBIT_PUT and INHIBIT_GET
    if data[i] != ne[i]:
        print(q1, i, data[i], "field is not equal to", ne[i])

the output is

queues/CP0000.yml
MAX_Q_DEPTH 5000 Field in error. It should be < 100

Display command events

difference Q_NAME CP0000 CP0000 ALTERATION_DATE 2019-02-07 2019-02-11

difference Q_NAME CP0000 CP0000 ALTERATION_TIME 20.48.24 21.29.23

difference Q_NAME CP0000 CP0000 MAX_Q_DEPTH 4000 2000

With my journey so far – Python seems to be a clear winner in providing the infrastructure for managing queue managers.

The lows (and occasional high) of managing MQ centrally.

While I was still at IBM, and since I retired from IBM, I have been curious about how people manage MQ in their enterprise systems.

  1. How do you deploy a change to a queue to 1000 queue managers, safely, accurately, by an authorised person, and by the way one queue manager was down when you tried to make the change?
  2. Are these identical systems identical – or has someone gone in and made an emergency change on one system and left one parameter different?
  3. We have all of these naming standards – do we follow them? Did we specify encryption on all external channels?

At the bottom of this blog (more like a long essay) I show some very short Python scripts which

  • compare queue definitions and show you the differences between them.
  • check when queue attributes do not meet “corporate standards”
  • print data from the change-events queue, so you can see what people altered.
  • I also have scripts which display PCF data from events, stats, etc. I need to clean them up, then I’ll publish them.

I think Python scripting will make systems management so much easier.

Strategic tools do not seem to deliver.

There seem to be many “strategic tools” to help you. These include Chef, Puppet, Ansible, and Salt, which are meant to help you deploy to your enterprise.

There are a lot of comparison documents on the web – some observations, in no particular order:

  • Chef and Puppet have an agent on each machine and seem complex to initially set up
  • Ansible does not use agents – it uses SSH commands to access each machine
  • Some tools expect deployers to understand and configure in Ruby (so moving the complexity from knowing MQ to Ruby), others use YAML – a simple format.

This seems to be a reasonable comparison.

Stepping back from using these tools I did some work to investigate how I would build a deployment system from standard tools. I have not done it yet, but I thought I would document the journey so far.

Some systems management requirements

What I expect to be able to do in an enterprise MQ environment.

  • I have a team of MQ administrators. All have read only access to all queue managers. Some can only update test, some can update test and production.
  • I want to be able to easily add and remove people from role based groups, and not wait a month for someone to press a button to give them authority.
  • I want to save a copy of the object before, and after a change – for audit trail and Disaster Recovery.
  • The process needs to handle the case when a change does not work because, the queue manager is down, or the object is in use.
  • I want to be able to deploy a new MQ server – and have all of the objects created according to a template for that application.
  • I want to check and enforce standards, eg names and values (do you really need a max queue depth of 999 999 999, and why is curdepth 999 999?).
  • I want to be able to process the event data and stats data produced by MQ and put them in SPLUNK or other tool.
  • There are MQ objects within the queue manager, and other objects such as CCDT tables for clients and keystores for TLS keys. I need to get these to wherever they are used.
  • I want to report statistics on MQ in my enterprise tool – so I need to get the statistics data from each machine to the central reporting tool
  • I want Test to look like Production (and use the same processes) so we avoid the problem of not testing what was deployed.

Areas I looked at

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

  • This may be fine for a test environment, but once deployed I still want to be able to change object attributes on a subset of the queue managers. I don’t think Docker solves this problem (it is hard to tell from the documentation).
  • I could not see how to set up the Client Channel Definition Tables (CCDT) so my client applications can connect to the appropriate production queue manager.
  • If I define my queues using clustering, when I add a new queue manager, the objects will be added to the repository cache. When I remove a queue manager from the cluster and delete the container, the objects live on in the cache for many days. This does not feel clean.
  • I wondered if this was the right environment (virtualised) for running my production performance critical workload on. I could not easily find any reports on this.
  • How do I manage licences for these machines, and make sure we have enough licences and that we are not illegally using the same licence for all machines?

Using RUNMQSC

At first, running runmqsc locally seemed to be the answer to many of the problems.

I could use secure FTP to get my configuration files down to the machine, log on to the machine, pipe the file into runmqsc, capture the output, and FTP the file back to my central machine.

  • Having all the MQ administrators with a userid on each machine can be done using LDAP groups etc. so that works ok
  • To use a userid and password you specify runmqsc -u user_id qm. This then prompts for the password. If you pipe your commands in, then you have to put your password as the first line of the piped input. This is not good practice, and I could not see a way of doing it without putting the password in the file in front of the definitions. (Perhaps a Linux guru can tell me.)

Having to FTP the files to and from the machine was not very elegant, so I tried using runmqsc as a client (the -c option). At first this seemed to work; then I tried making it secure and using an SSL channel. I could only get this to work when it used a channel with the same name as the queue manager name (so to use queue manager QMB I needed an SSL channel called QMB; the documentation says you cannot use the MQSERVER environment variable to set up an SSL channel). On my queue manager, the QMB channel was already in use. I redefined my channel and got this to work.

As you may expect, I fell over the CHLAUTH rules, but with help from some conference charts written by Morag, I got the CHLAUTH rules defined, so that I could allow people with the correct certificate to use the channel. I could then give the channel a userid with the correct authority for change or read access.

I had a little ponder on this, and thought that a more secure way would be to use SSL AND have a userid and password. If someone copied my keystore they would still need the password to connect to MQ, so I would have two-factor authentication.

This is an OK solution, but it does not go far enough. It is tough trying to parse the output from runmqsc (which is why PCF was invented).

Using mqscx

Someone told me about mqscx from MQGem Software. If runmqsc is the kindergarten version, mqscx is the adult one. It does so much more and is well worth a look.

See here for a video and here for the web site.

How does it do against my list of requirements?

  • I can enter userid and password on the command line and also pipe a file in ✔
  • One column output ( if required) so I can write the output to a file, and it is then very easy to parse ✔
  • I can use ssl channels ✔
  • I can use my Client Channel Definition Table (CCDT) ✔

It also has colour, multi-column output, better use of the screen area (you can display much more on your screen) and its own scripting language.

You can get a try before you buy license.

I moved on to Python and runmqsc…

so I could try to do useful things with it.

Using runmqsc under Python does not work very well.

I have found Python is a very good tool for systems management – see below for what I have done with it.

  • I tried using the Python “subprocess” module so I could write data down the stdin pipe into runmqsc and capture the output written to stdout. This did not work. I think the runmqsc output is written to stdout but not flushed, so the program waiting for the data does not get it, and you get a deadlock.
  • I tried using Python “pexpect”, but this did not work either: I could send one command to stdin, but then stdin was closed, and I could not send more data.
  • Another challenge was parsing the output of runmqsc. After a couple of hours I managed to create a regular expression which parsed most of the output, but there were a few edge cases which needed more work, and I gave up on this.
  • PCF on its own is difficult to use.
  • I came across PyMqi – MQ for Python. This was great, I could issue PCF commands, and get responses back – and I can process event queues and statistics queues!

From this I think using PyMqi is great!  My next blog post will describe some of the amazing things you can do in Python with only a few lines of code!

How to become a wizard?

This question came up in conversation recently. I was not entirely sure of the context – was it because:

  • I’m old (well, retired from full-time work), have grey hair (not much hair) and a grey beard
  • I’ve worked with MQ since it started
  • I know quite a lot about quite a lot (but the opposite is also true – there is so much I do not know about so many things in MQ)
  • I can make things disappear – that pint of beer – see, after half an hour it has gone!
  • I’m not afraid of going into strange places (such as going from z/OS into Linux)

If you want to be a wizard, here are some thoughts on how to get there.

GUIs are good in some situations

For example

  • One-off requests
  • for low skilled people
  • people with lots of time

My approach is

  • first time – use the GUI to understand the process
  • second time – use the GUI to understand the input
  • third time – automate it – perhaps set up a shell script with the majority of the parameters already filled in

Be brave – go and fight dragons

An easy task is to find the SSL CIPHER specs being used in a queue manager. You use runmqsc, issue dis chl(*) where(SSLCIPH,NE,''), and use your pen and paper to write down what is being used. Easy – but slow.

The dragon task is to do this for 100 queue managers, and you have half an hour to do it! How does a dragon hunter do this on Linux?

echo "dis chl(*) sslciph" | runmqsc -c QMA | tee -a QMA.FILE

  • echo "dis chl(*) sslciph" is the command to run
  • | passes this to runmqsc
  • the -c in runmqsc means use a client connection to go to the remote box
  • QMA is the queue manager name (and the channel name to get there)
  • | tee -a passes the output to the terminal and appends it to a file called QMA.FILE

The output from this is a file QMA.FILE on your local machine with the output of the command in it. Put the echo… command in a file, repeat it for every queue manager, and run the file.

The second bit of magic is the command

grep CIPH Q*.FILE |sort -k2,2 |uniq -c -f1

  • grep CIPH Q*.FILE looks for the string CIPH in the files Q*.FILE and displays the file name and the line of data. For example:
QMA.FILE:   SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256) 
QMA.FILE: SSLCIPH( )
QMB.FILE: SSLCIPH( )
QMB.FILE: SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256)
  • |sort -k2,2 says sort on the second field only, eg SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256)
  • |uniq -c -f1 displays the count of unique values – skipping the first field (the file name)
  • the output is

20 QMA.FILE:SSL_CIPHER_SPEC:
4 QMA.FILE:SSL_CIPHER_SPEC: TLS_RSA_WITH_AES_128_CBC_SHA256
1 QMB.FILE:SSL_CIPHER_SPEC: TLS_RSA_WITH_AES_256_GCM_SHA384

  • So there is the list of cipher spec being used and the count of them – easy !
  • To finish killing the dragon, find which queue managers are using the GCM spec:
  • grep TLS_RSA_WITH_AES_256_GCM_SHA384 *.FILE shows which files have that cipher spec.

If you want to become a person with good technical skills, these are the sorts of skills you need to develop

  • learn the command line interface, and learn to automate
  • explore different areas, such as shell shortcuts, grep, awk, uniq
  • if the commands do no damage – do not be afraid of trying something.

Good luck!

How do I change SSLCIPH on a channel?

Regular readers of my blog know that most of the topics I write on appear simple but have hidden depth; this topic is no exception.

The simple answer is

  • For the client ALTER CHL(xxxx) CHLTYPE(CLNTCONN) SSLCIPH(new value)
  • For the svrconn
    • ALTER CHL(xxxx) CHLTYPE(SVRCONN) SSLCIPH(new value)
    • REFRESH SECURITY

The complexity occurs when you have many clients trying to use the channel, and you cannot change them all at the same time (imagine trying to change 1000 of them – when half of them are not under your control). For the clients that have not changed, you will get the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel ‘…’.

in the /qmgrs/xxxx/errors/AMQERR01.LOG

For this problem the CCDT is your friend. See my blog post here.

I have a client channel CHANNEL(C1) CHLTYPE(CLNTCONN)

On my CCDT queue manager I created another channel the same as the one I want to update.

DEF CHANNEL(C2) CHLTYPE(CLNTCONN) LIKE(C1)

On my server queue manager I used

DEF CHANNEL(C2) CHLTYPE(SVRCONN) LIKE(C1)

DEFINE CHLAUTH(C2) TYPE(BLOCKUSER)
USERLIST(….)

REFRESH SECURITY

When I ran my sample connect program, it connected using C1 as before.

On the MQ Server, I changed the SSLCIPH to the new value for C1.

When I ran my sample connect program it connected using channel(C2). In the AMQERR01.LOG I had the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel 'C1'.

So the changed channel did not connect, but the second channel with the old cipher spec worked successfully. (The use of the backup channel was transparent to the application.)

I then changed DEF CHANNEL(C1) CHLTYPE(CLNTCONN) so SSLCIPH had the correct, matching value. When my sample program was run, it connected using channel C1 as expected.

Once I have changed all my channels and get no errors in the error log:

  • I can change the CHLAUTH(C2) BLOCKUSER(*) and either set warning, or give no warning and no access
  • Remove C2 from the CCDT queue manager, so applications no longer get this in their CCDT
  • Finally delete the channel C2 on the server.
  • Go down the pub to celebrate a successful upgrade!


Should I do in-place or side by side migration of MQ mid-range?

With mid-range MQ there are a couple of migration options:

  • Upgrade the queue manager in place – if there are problems, restore from backup, and sort out the problems this restore may cause. You may want to do this if you have just the one queue manager.
  • Upgrade the queue manager in place – if there are problems, leave it down until any problems can be resolved. This assumes that you are a good enterprise user and have other queue managers available to process the work.
  • Create another queue manager, “next to it” (“side by side”) on the same operating system image. A better description might be “adding a new queue manager to our environment on an existing box, and deleting an old one at a later date” rather than “side by side migration”. You may already have a document to do this.

What you need to do for in-place migration.

  • Back up your queue manager – see a discussion here
  • Shut down the queue manager, letting all work end cleanly
  • Either (see here)
    • Delete the previous version of MQ, and install the new version, or better..
    • Use Multi-install – so you have old and new versions available at the same time
  • Switch to the new version (of the multi-install)
  • Restart the queue manager
  • Let work flow
  • Make a note of any changes you make to the configuration – for example alter qlocal… – in case you need to restore from a backup and re-apply the changes.

If you need to back out the migration and restore from the backup

You need to

  • Make sure there are no threads in doubt
  • Make sure all transmission queues are empty (so you do not overwrite messages when you restore from a backup)
  • Offload messages from application queues – if you are lucky there will be no messages. Do not offload messages from the system queues.
  • Shut down MQ
  • Reset the MQ installation to the older version
  • Restore from your backup see here
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • Load messages back onto the application queues
  • Reapply any changes, such as alter QL…

In the situation where you have a problem, I personally think it would be easier to leave the queue manager down rather than trying to restore it from a backup. You may want to offload any application messages first. Of course, this is much easier if you have configured multiple queue managers, so leaving one queue manager shut down should not cause problems. Until any problems are fixed you cannot proceed with migrating other queue managers, and you may have the risk of lower availability because there is one server fewer.

What you need to do for side by side migration.

“Side by side” migration requires a new queue manager to be created, and work moved to the new queue manager

  • If this is a cluster repository, you need to move it to another queue manager if only temporarily (otherwise you will get a new repository)
  • You need a new queue manager name
  • You need a new port number
  • Create the queue manager
  • You may want to alter qmgr SCHINIT (MANUAL) during the configuration so that you do not get client applications trying to connect to your new queue manager before you are ready.
  • You need to backup all application object definitions, chlauths etc and reapply them to the new queue manager. Do not copy and restore the channels
  • Apply these application objects to the new queue manager
  • List the channels on the old system
  • Create new channels – for example a cluster receiver, whose CONNAME will need the updated port, and a new name
  • You should be able to reuse any sender channels unchanged
  • If you are using CCDT
    • Define new client SVRCONN names (as a CCDT needs unique channel names)
    • On the queue manager which creates the CCDT, create new CLNTCONN channels. The queue manager needs unique channel names
    • Send the updated CCDT to applications which use this queue manager, so they can use the new queue manager. Note: from IBM MQ Version 9.0, the CCDT can be hosted in a central location that is accessible through a URI, removing the need to individually update the CCDT for each deployed client. See here
    • If you are using clustered queues, then cluster queues will be propagated automatically to the repository and to interested queue managers
    • If you are not using clustering, you will need to create sender/receiver channels, and create the same on the queue managers they attach to
  • Update automation to take notice of the queue managers
  • Change monitoring to include this queue manager
  • Change your backup procedures to back up the new queue manager files
  • Change your configuration and deployment tools, so changes to the old queue manager are copied to the new queue manager as well.
  • Configure all applications that use bindings mode, to add the new queue manager to the options. Restart these applications so they pick up the new configuration
  • When you are ready use START CHINIT
  • Alter the original queue manager to SCHINIT(MANUAL), so when you restart the queue manager it does not start the channel initiator, and so channels will not start and take workload.
    • Note there is a strmqm -ns option. The doc says… This prevents any of the following processes from starting automatically when the queue manager starts:
    • The channel initiator
    • The command server
    • Listeners
    • Services
    • This parameter also runs the queue manager as if the CONNAUTH attribute is blank, regardless of its current value. This allows unauthenticated access to the queue manager for locally bound applications; client applications cannot connect because there are no listeners. Administrative changes must be made by using runmqsc because the command server is not running.
    • But you may not want to run unauthenticated.
  • Stop the original queue manager; after a short time, all applications should disconnect and reconnect to the new queue manager.
  • Shut down the old queue manager and restart it. With SCHINIT(MANUAL) it should start no channels. Stop any listeners. If you have problems, you can issue START CHINIT and START LSTR. After a day, shut down the queue manager and leave it down – in case of emergency you can just restart it.
  • After you have run successfully for a period you can delete the old queue manager.
  • Remove it from any clusters before deleting it. The cluster repository will remember the queue manager and queues for a long period, then eventually delete them.
  • Make the latest version of MQ the primary installation, and delete the old version
  • Update the documentation
  • Update your procedures – eg configuration automation

As I said at the beginning, an in-place migration looks much easier to do.

Should I use backup and restore of my mid-range queue manager?

In several places the MQ Knowledge Center mentions backing up your queue manager, for example in case of problems when migrating.

I could not find an emoji showing a worried wizard, so let me explain my concerns so you can make an informed decision about using it.

Firstly some obvious statements

  • You take a backup so you can restore it at a later date to the same state
  • When you do a restore in-place you overwrite what was there before
  • The result of a restore should be the same as when you did the backup

See, really obvious – but you need to think through the consequences of these.

Creating duplicate messages

Imagine there is a message on the queue saying “transfer 1 million pounds to Colin Paice”. This gets backed up. The message gets processed, and I am rich!

You restore from the backup – and this message reappears – so unless the applications are smart and can detect a duplicate message – I will get even richer!

Losing messages

An application queue was empty when it was backed up. A message is put to the queue: “Colin Paice pays you 1 million pounds”. Before this message gets processed, the system is restored – resetting the queue to when it was backed up – so the message disappears and you do not get your money.

Status information gets out of step

The queue manager holds information in queues. For example each channel has information about the sequence number of the message flow. If this gets out of sync, then you have to manually resync them.

If you restore the SYSTEM.CHANNEL.SYNCQ from last week, it will have the values from last week. The channels will then fail to start because the sequence numbers do not match, and you will need to use the RESET CHANNEL command.

If you really want to do backup and restore…

Before you back up..

  • If this is a full repository, “just” move it to another queue manager.
  • Stop receiver channels, so work stops flowing into the queue manager
  • Set all application input queues to put disabled, to stop applications from putting to the queues.
  • Let the applications drain all application queues (and send the replies back)
  • Make sure all queues, such as Dead Letter Queue, and Event Queues have been processed and the queues are empty.
  • Make sure all transmission queues are empty, including SYSTEM.CLUSTER.TRANSMIT.QUEUE and any Split Cluster Transmit queues.
  • Shut down the queue manager, letting all work end cleanly.
  • Backup the queue manager files
  • Make a record of every configuration change you make, such as alter qlocal.

If you need to restore.. you need to empty the queue manager before you overwrite it.

  • Make sure there are no threads in doubt.
  • Make sure all transmission queues are empty
  • Have applications process all of the application messages, or offload the messages
  • Shut down MQ
  • Restore from your backup.
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • You may need to refresh cluster, so that you get the latest definitions sent the machine, and the information on local objects is propagated back to the full repository.
  • If you offloaded application messages, restore them
  • Reapply any changes, such as alter QL…
  • You need to be careful about applications connecting to your queue manager before it is ready to do work. You might want to use strmqm -ns to start in restricted mode.

It is dangerous if you restore the queue manager in a different place

You need to be careful if you restore a queue manager to a different place.

If you restore it and start it, then channels are likely to start, and messages flow. For example, it will contact the full repository and send information about the objects in the newly restored queue manager. The full repository will get confused, as you have two queue managers with the same name and the same unique queue manager identifier sending information. It is difficult to resolve this once it has occurred. People have been known to do this when testing their disaster recovery procedures – and thus cause a major problem!

One more point…

As wpkf pointed out:

You can start the queue manager without starting the channels. strmqm -ns prevents any of the following processes from starting automatically when the queue manager starts: the channel initiator, the command server, listeners, and services. All connection authentication configuration is suppressed.