I've always wanted a sample MQ server, and a buggy C program

I will be educating some MQ administrators about programming MQ.

For this I needed a simple server, so they could put a message to a server and get a reply. Unfortunately MQ does not provide such a useful little program. The MQ samples have a program to put messages, get messages, and a complex scenario involving triggering, but not a nice simple server.

I've created one in my MQTools GitHub repository.

I have also put up the source of a program which contains MQ programming errors, such as trying to put to a queue which was opened without the MQOO_OUTPUT option, and “I got your message, but where is mine?”. The aim is that the non-programmers have to change one line of code to fix it. If anyone has any suggestions for other common problems, please let me know and I’ll see if I can incorporate them.

How do I print the reason string for a reason code in my C program?

Easy:

#include <cmqc.h>      /* MQCONN, MQHCONN, MQLONG */
#include <cmqstrc.h>   /* MQRC_STR */
MQCONN(argv[1], &hConn, &mqcc, &mqrc);
printf("MQ conn to %s cc %i rc %i %s\n", argv[1], mqcc, mqrc, MQRC_STR(mqrc));

cmqstrc.h contains code like this:

char *MQRC_STR (MQLONG v) 
{
char *c;
switch (v)
{
case 0: c = "MQRC_NONE"; break;
case 2001: c = "MQRC_ALIAS_BASE_Q_TYPE_ERROR"; break;
case 2002: c = "MQRC_ALREADY_CONNECTED"; break;
case 2003: c = "MQRC_BACKED_OUT"; break;

...

You could write an mqstrerror, or mqerror function and include it at linkage time instead of compile time.
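A minimal sketch of that idea follows. The function name mq_reason_str and the table layout are hypothetical, and the table holds only the four codes quoted above. A real version would take an MQLONG from cmqc.h and list every MQRC_* value; plain int is used here so the sketch stands alone. Compiled once into its own object file, it can be linked into each program instead of pulling the whole switch statement in through the header.

```c
#include <stddef.h>

/* Hypothetical helper: map an MQ reason code to its name.
 * Only the codes quoted from cmqstrc.h above are listed. */
struct reason_entry {
    int code;
    const char *name;
};

static const struct reason_entry reason_table[] = {
    {    0, "MQRC_NONE" },
    { 2001, "MQRC_ALIAS_BASE_Q_TYPE_ERROR" },
    { 2002, "MQRC_ALREADY_CONNECTED" },
    { 2003, "MQRC_BACKED_OUT" },
};

/* Return the name for a reason code, or a fixed string if it is not in the table. */
const char *mq_reason_str(int reason)
{
    size_t n = sizeof reason_table / sizeof reason_table[0];
    for (size_t i = 0; i < n; i++) {
        if (reason_table[i].code == reason)
            return reason_table[i].name;
    }
    return "unknown reason code";
}
```

The printf above would then pass mq_reason_str(mqrc) in place of MQRC_STR(mqrc).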

Which terminal am I in?

I use Ubuntu as my main workstation, and found I was getting confused with so many terminal windows open. I found the following helped me manage them.

From a terminal Ctrl-Alt-T gives you a new tab with a new terminal.

I can quickly move between them using Ctrl-PgDn and Ctrl-PgUp, just like in a browser.

I can rearrange them using Ctrl-Alt-PgDn and Ctrl-Alt-PgUp, so my commonly used tabs are adjacent.

I can colour the terminal sessions. Select a terminal, right click, preferences. You can now create more profiles, such as “Blue” or “QMPROD” and select the colour of the font and background. I can then, from my terminal, right click, profiles, and select “Blue”. As long as I remember what I use each colour for – it is a great help.

Big disasters from little problems grow

I was talking to someone about how major accidents occur, and found it very interesting. I told a friend about it, and he said that he had found the same when he was writing critical software for an aircraft control system.

How major accidents occur.

The first guy told me about an incident which started off with the smallest problem.

  • On a ship, the light at the top of the stairs stopped working, and it was not reported (even though many people saw the light was not working).
  • Someone carrying some oil spilled some at the top of the steps but, because the light was not working, failed to spot this.
  • The ship was coming into harbour.
  • Someone else came along to go down the ladder, slipped on the oil, bumped all the way down the ladder (10 steps) on his coccyx (the bone at the bottom of his spine) and cracked it. Ouch!
  • There were not many crew on the boat, and everyone rushed to help, including a first aider who should have been helping the ship to dock.
  • The ship crunched into the harbour wall. No major damage was done to the ship, but there was an embarrassing front page photo in the local newspaper.

This small incident led to a chain of incidents of increasing severity.

Critical software for an aircraft control system.

My software developer friend said he was working on software which controlled an airplane. The tests were about 99% successful, but there was just one test which consistently failed (in the simulator). The symptoms were that the software would sometimes freeze for about 1 second and then recover. A lot can happen in 1 second.

They tracked it down to one line of code. In a nested set of ‘if’ statements, an ‘else’ statement was attached to the wrong ‘if’ statement due to bad indentation! This meant a field was not initialized and had garbage in it. This in turn caused a loop to be iterated over 2 million times, instead of twice.
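The kind of mistake described can be sketched in a few lines of C. This is an illustration only, not my friend's code: the function names, the GARBAGE stand-in for uninitialised memory, and the override logic are all made up, and override is assumed to be positive whenever have_override is set.

```c
#define GARBAGE 2000000   /* stand-in for whatever was in uninitialised memory */

/* Buggy: the indentation suggests the else pairs with the outer if,
 * but C pairs it with the nearest unmatched if, which is the inner one. */
int loop_count_buggy(int have_override, int override)
{
    int count = GARBAGE;
    if (have_override)
        if (override > 0)
            count = override;
    else
        count = 2;            /* never reached when have_override is 0 */
    return count;
}

/* Fixed: braces make the else belong to the outer if, as intended. */
int loop_count_fixed(int have_override, int override)
{
    int count = GARBAGE;
    if (have_override) {
        if (override > 0)
            count = override;
    } else {
        count = 2;            /* the default of two iterations */
    }
    return count;
}
```

With no override, the buggy version returns the garbage value and the fixed version returns 2. Compilers can warn about this pattern; gcc with -Wall suggests explicit braces to avoid an ambiguous else.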

Once they had found the problem, they then used static analysis tools (like lint) and found they had lots of “little problems”. Who would have thought the number of spaces in a line would cause a problem? Fortunately there were no real “disasters” from these little problems, and they fixed all of the little problems.

I thought it interesting how the same sort of process problems occur in totally different fields, and how important it is to fix these small niggles.

This reminds me of the time when adding a comment to a program caused it to fail to compile. Adding one more line made the file bigger, so it no longer fitted in memory and an intermediate file was used. There was a bug in the code which handled the intermediate file. This was a case where “I just added a comment” actually did cause problems.

I attended a talk by one of the US astronauts who flew on the space shuttle. He said they made a point of regularly visiting the software development teams so the teams got to know the people whose lives depended on their code. If there was a problem in the software these nice people (who brought coffee and doughnuts) might die. This tended to focus the minds of the developers.

Using the monitoring data provided via publish in MQ midrange.

In V9, MQ provided monitoring data, available through a publish/subscribe programming model. This solved the problem that the MQ Statistics and Accounting information is written to a queue, so only one consumer can use the data.

You can get information on the MQ CPU usage, log data written, as well as MQ API statistics.

A sample is provided (amqsruaa) to subscribe to and print the data, but this is limited and not suitable for an enterprise environment. See /opt/mqm/samp/amqsruaa.c for the source program, and bin/amqsrua and bin/amqsruac for the bindings mode and client mode executables.

I tried to use this new method in my mini enterprise, and found it very hard to use, and I think some of the data is of questionable value.

Overall, I found

  1. The documentation is missing or incomplete.
  2. The architecture is poor; it is hard to use in a typical customer environment.
  3. The implementation is poor; it does not follow PCF standards and has the same id for different data types.
  4. Some of the data provided is not explained, and some data is not that useful.

I’ve written several pages on the Monitoring data in MQ midrange, I was going to blog it all – but I did not think there would be a big audience for it.

“Make not working” due to order of link statements

I had a simple makefile for an MQ program but it did not work, and I could not find any hints on how to get it to work.

cparms = -Wno-write-strings
clibs = -I. -I../inc -I'/usr/include' -I'/opt/mqm/inc'
lparms = -L /opt/mqm/lib64 -Wl,-rpath=/opt/mqm/lib64 -Wl,-rpath=/usr/lib64 -lmqm
% : %.c
	gcc -m64 $(cparms) $(clibs) $(lparms) $< -o $@

make mqcmd gave me

... undefined reference to ‘MQCONN’
... undefined reference to ‘MQOPEN’
... undefined reference to ‘MQPUT’
... undefined reference to ‘MQCLOSE’
... undefined reference to ‘MQDISC’
collect2: error: ld returned 1 exit status
makefile:5: recipe for target ‘mqcmd’ failed

I moved $(lparms), which contains the -lmqm, to the end of the gcc command, after the source file:

cparms = -Wno-write-strings
clibs = -I. -I../inc -I'/usr/include' -I'/opt/mqm/inc'
lparms = -L /opt/mqm/lib64 -Wl,-rpath=/opt/mqm/lib64 -Wl,-rpath=/usr/lib64 -lmqm
% : %.c
	gcc -m64 $(cparms) $(clibs) $< -o $@ $(lparms)


And it worked! I later found an entry in a blog post saying the -l... directives are supposed to go after the objects that reference those symbols.

The IBM Knowledge Center is not very helpful. Under Building 64-bit applications, it has definitions for

  • C client application, 64-bit, non-threaded
  • C server application, 64-bit, non-threaded

My problem is that I am writing a program which is a client as in client – server, running in bindings mode, which does a request reply to a server.

I think where the documentation says “C server” it means “C bindings mode”.

I'm not getting workload balancing with MQ! Of course not.

I had a question: “We have an intelligent workload balancer in front of our two queue managers. Sometimes most of the work goes to queue manager A, sometimes to queue manager B, sometimes it is balanced. What can we do so we get workload balancing?” The tough love answer is that MQ does not do workload balancing.

Clients

An intelligent router can route requests to a server depending on how busy the server is. This is good for requests that can run anywhere. For example, request_1 can execute over here, and request_2 from the same user can execute over there, because no state information is held on the server.

With MQ, the “request” is the MQCONN, and this can be routed to a server depending on how busy a server is. All other MQ requests have to go to the same server as the MQCONN executed on. The router does not get involved in these other MQ requests.

If at the start of the day Server A was doing no work and Server B was busy, then the MQCONNs will be routed to Server A. Half an hour later the applications are putting messages to queues on Server A, even though this server is now overloaded and Server B is idle. Each application stays connected to Server A until it disconnects (perhaps a week later).

What can you do? To get around this, you can have the clients disconnect if they do no work for a time, perhaps 15 minutes. If they are active, have them disconnect and reconnect periodically, anywhere from once an hour to a couple of times a day.
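The decision a client has to make can be sketched as a small C function. The names and the two thresholds are hypothetical (15 idle minutes and one hour connected, taken from the figures above), and the MQ calls themselves are left to the caller.

```c
#include <time.h>

/* Hypothetical thresholds based on the figures in the text. */
#define IDLE_LIMIT_SECS     (15 * 60)   /* disconnect after 15 idle minutes */
#define MAX_CONNECTED_SECS  (60 * 60)   /* reconnect after an hour connected */

enum conn_action { KEEP_CONNECTION, DISCONNECT_IDLE, RECONNECT };

/* Decide what to do with the connection, given when it was made
 * and when it last did any MQ work. */
enum conn_action next_action(time_t now, time_t connected_at, time_t last_used)
{
    if (difftime(now, last_used) >= IDLE_LIMIT_SECS)
        return DISCONNECT_IDLE;     /* idle too long: free the connection */
    if (difftime(now, connected_at) >= MAX_CONNECTED_SECS)
        return RECONNECT;           /* busy, but let the router choose again */
    return KEEP_CONNECTION;
}
```

An application would call this between units of work, when nothing is uncommitted. On DISCONNECT_IDLE it would issue MQDISC and connect again when it next has work; on RECONNECT it would issue MQDISC followed by MQCONN, so the router gets another chance to pick the less busy server.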

There are limits to how many connections a system can support. There are limits in the operating system, and limits with MQ. Having clients disconnect when they have been idle for a time frees up resources and keeps you away from these limits.

Clustering.

You may say “We have workload balancing with clustering – we use the CLWLWGHT channel attribute”. This is workload routing not workload balancing. You cannot influence which system the message gets sent to depending on how busy the remote server is (and so balance the work). You can do “two for QMA, one for QMB, two for QMA, one for QMB etc”, even though Server A is overloaded.
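The difference can be seen in a small C sketch of fixed-ratio routing. This is an illustration of the “two for QMA, one for QMB” pattern, not MQ's actual cluster workload algorithm; the structure and function names are made up, and the weights are assumed to be positive.

```c
/* Hypothetical routing target. The load field is deliberately never
 * consulted: that is the point of the illustration. */
struct target {
    const char *name;
    int weight;   /* relative share of messages */
    int load;     /* how busy the server is */
};

/* Return the index of the target for message number msg_no (0-based).
 * Weights 2 and 1 give the repeating pattern 0, 0, 1, 0, 0, 1, ... */
int pick_target(const struct target *t, int count, int msg_no)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += t[i].weight;

    int slot = msg_no % total;
    for (int i = 0; i < count; i++) {
        if (slot < t[i].weight)
            return i;
        slot -= t[i].weight;
    }
    return count - 1;   /* not reached when all weights are positive */
}
```

However large the load value for QMA, it still receives two messages out of every three. A workload balancer would have to look at the load; routing by weight does not.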

This is why MQ does not do workload balancing!

My cluster channel definition worked – why?

This question came up during some education on clustering.

The problem arose because she had typed in the wrong name for the channel, yet clustering worked – she could send a message to a remote clustered queue. Most people who have created sender/receiver channel pairs have experienced the frustration of the channel not starting because the name was not quite the same – Ohs and zeros, different punctuation marks etc. So it was a surprise that when she made a mistake it worked!

Let me take a simple example. We have three queue managers in a cluster:

  • QM_REPOS is the full repository. It has a CLUSRCVR Channel called TO_REPOS, and a conname of REPOS(1414).
  • A queue manager called QMA. This has
    • a cluster sender channel called TO_REPOS, and a conname of REPOS(1414) which matches QM_REPOS
    • a cluster receiver channel called TO_QMA, conname(SYSA(1414))
  • A queue manager QMB. This has
    • a cluster queue called TEST
    • a cluster sender channel called TO_REPOS, and a conname of REPOS(1414) which matches QM_REPOS
    • a cluster receiver channel called BAD_SPELLING conname(SYSB(1414))

When QMB joined the cluster it sent information about the queue manager to the cluster repository. The repository then stores the data in the form of a logical table

QNAME=TEST, use_channel_name = BAD_SPELLING conname(SYSB(1414))

When QMA wants to use the TEST queue, it finds it does not have the information, and so asks the repository, which sends down all the data for QNAME=TEST. QMA then caches this data locally. QMA then picks an entry (from the list of one entry), and dynamically creates a cluster sender channel called BAD_SPELLING with conname(SYSB(1414)). Because BAD_SPELLING is the same as what QMB sent to the repository it matches – and so works! Pretty amazing eh!
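The lookup can be modelled in a few lines of C. This is only a sketch of the logical table described above, with made-up structure and function names; it is not how the repository is really implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of what the repository holds for a clustered queue. */
struct cluster_entry {
    const char *qname;
    const char *channel;   /* the CLUSRCVR name advertised by the owning queue manager */
    const char *conname;
};

static const struct cluster_entry repository[] = {
    { "TEST", "BAD_SPELLING", "SYSB(1414)" },
};

/* Return the entry for a queue name, or NULL if the repository has none. */
const struct cluster_entry *find_queue(const char *qname)
{
    size_t n = sizeof repository / sizeof repository[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(repository[i].qname, qname) == 0)
            return &repository[i];
    }
    return NULL;
}
```

QMA never types the channel name. It copies whatever name and conname QMB advertised, so the two ends match by construction, however the name was spelled.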

How can I tell what I've got?

This excellent article by David Ware of IBM talks about cluster channels. I've copied some of it below.

Using the DISPLAY CLUSQMGR command. Each CLUSQMGR entry represents how the local queue manager sees the other queue managers in each of the clusters it is a member of. You get a separate entry per cluster irrespective of whether namelists have been used on the cluster channels or not. The entry contains two particularly useful attributes for this discussion, QMTYPE and DEFTYPE.

QMTYPE simply shows if the other queue manager is a full repository (‘REPOS’) for the cluster or a partial repository (‘NORMAL’).

DEFTYPE shows you how the relationship between the queue managers has been established, based on what cluster channels have been defined. DEFTYPE has a number of rather cryptic values, CLUSSDR, CLUSSDRA, CLUSSDRB and CLUSRCVR. I’ll summarize them here:

DEFTYPE values

CLUSRCVR: This is the entry for the local queue manager in each cluster it has a cluster receiver channel defined.

Any CLUSSDR* value means this entry represents a remote queue manager in a cluster. The different values however help you understand how the local queue manager came to know about it:

CLUSSDRA: This is a remote cluster queue manager that the local queue manager has no manually defined cluster sender channel for it, but has been told about it by someone else, either by the remote queue manager itself (typically because this queue manager is a full repository) or because a full repository has told this queue manager about it as it needs to communicate with it for some reason.

CLUSSDRB: This means the local queue manager has a manually defined cluster sender channel which has been used to establish contact with the target queue manager and that queue manager has accepted it from the point of view of the cluster. The target could be a full or a partial repository, although as I’ve already said you really only want it to be a full repository at the other end.

CLUSSDR: This means the local queue manager has manually defined a cluster sender channel to the remote queue manager but the initial cluster handshake between them has not yet completed. This may be because the channel has never started, perhaps because the target is not running or the configuration is incorrect. It could also mean the channel has successfully started but the target queue manager did not like the cluster information provided, for example a cluster name was set in the cluster sender channel definition that does not match the target’s cluster membership. Once the handshake has been performed, the DEFTYPE should change to CLUSSDRB, so in a healthy system CLUSSDR should only be a transitory state.


Many MQ admins only do half a job

I think I can do many jobs very quickly, such as putting the rubbish out. My wife says I am terrible at doing jobs because I am not a completer-finisher. The task is “put the bag of rubbish out, and put a new bag in the bin”. I typically only do the first part and go and do something else.

As an MQ administrator it is easy to only do half a task. For example, define a sender channel. “It is so easy, I could do it in my sleep”, or so you think. There are the following steps:

  1. Type in the define channel command
  2. Display the channel and check it to make sure it is as you expected. For example, sometimes data has to be in quotes: MCAUSER('mqm'). MCAUSER(mqm) without quotes is converted to upper case MQM, which is a different userid.
  3. You need to define a transmission queue. Many people stop here.
  4. You need to start the channel and make sure it starts successfully, and you haven’t specified the character O rather than the number 0 and other typing problems.
  5. Set up the triggering rule, so a message on the XMITQ causes the channel to start.
  6. Cause a message to be put to the xmit queue, and test to make sure the channel is triggered.
  7. Set up events such as queue high, so that if there are messages over a certain age, or the queue depth is over 100, an alert is produced for automation.
  8. Update the monitor tools to include this channel.
  9. Update your reports, so you collect and report on messages per hour or per day on this channel.
  10. Update your documentation.

I expect most MQ Admins would get 2 out of 10 – could do better!

This is why checklists are very useful.