How do I put a queue manager in and out of maintenance mode when using client reconnect?

You want to do some maintenance on one of your queue managers: stop work coming into the queue manager, then restart work when the maintenance has finished, without causing operational problems.

Applications using reconnection support can reconnect to an available queue manager. To stop applications connecting to a particular queue manager, stop the channel(s) with STOP CHL(…) STATUS(STOPPED). An application using the channel will be notified, or reconnected elsewhere. An application trying to connect will fail and go somewhere else.
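The quiesce and restart sequence can be issued through runmqsc. A sketch, where CLIENT.CHL and SERVER.CHL are hypothetical channel names standing in for your own:

```
* Quiesce: stop both channels; reconnecting clients move elsewhere
STOP CHANNEL(CLIENT.CHL) STATUS(STOPPED)
STOP CHANNEL(SERVER.CHL) STATUS(STOPPED)

* After maintenance: start the server channel first, then the client channel
START CHANNEL(SERVER.CHL)
START CHANNEL(CLIENT.CHL)
```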

If you have two channels, one for the web server clients and a second for the server application on the queue manager, I don't think it matters which one you stop first.

  1. If you stop the client channel first, the message will go to the server application, be processed, and put on the reply queue. The client will not get the reply, as it has been switched to another queue manager.
  2. If you stop the server channel first, then the messages will accumulate on the server queue until the server applications reconnect to the queue manager and process the queue.

In either case you can have orphaned messages on the reply-to queue. You need a process to resolve these, or for non-persistent messages, set a message expiry time.

Once you have done your maintenance work, use START CHL(…) for the server channel, wait for a server to connect to the queue manager and then use START CHL(…) for the client channel. It may take minutes for a server application to connect to the queue manager.

Do it in this order because you want the server to be running before client applications put to the server queue; otherwise you will have to handle time-out situations in the application.

Some secrets of shared conversations and other dark corners of MQ

I was looking into how to balance the number of server threads processing messages, and discovered I knew nothing about shared conversations and related topics. Of course I could draw them on a white board and wave my hands around, but I could not actually describe how they work.

Firstly, some things I expect everyone knows (except me).

  1. You can define a shared connection handle. This can be used in different threads, but only serially. See "Shared (thread independent) connections with MQCONNX".
  2. A thread can only connect to MQ once using a non-shared connection; otherwise you get MQRC_ALREADY_CONNECTED: "A thread can have no more than one nonshared handle."
  3. A non-shared connection cannot be shared between threads. I got MQRC_HCONN_ERROR: "The handle is a nonshared handle that is being used by a thread that did not create the handle."

Multi threaded program

I set up a program which did

do I = 1 to number of threads;
pthread_create – use subroutine
end

The subroutine did

MQCONNX
MQCB (set up MQCB to get queue manager change events such as reconnect)
MQOPEN…

Each thread needed its own MQCONN, and its own MQCB to capture queue manager events such as disconnect requests and reconnected events.

DIS CHSTATUS shows conversations spread across channels

My CLNTCONN channel was defined with SHARECNV(10). I started my program and specified 15 threads. DIS CHS(COLIN) gave me two channel instances:

 AMQ8417I: Display Channel Status details.
CHANNEL(COLIN) CHLTYPE(SVRCONN)
CONNAME(127.0.0.1) CURRENT
STATUS(RUNNING) SUBSTATE(RECEIVE)
CURSHCNV(5)

AMQ8417I: Display Channel Status details.
CHANNEL(COLIN) CHLTYPE(SVRCONN)
CONNAME(127.0.0.1) CURRENT
STATUS(RUNNING) SUBSTATE(RECEIVE)
CURSHCNV(10)

One channel instance had CURrent SHared CoNVersations (CURSHCNV) of 5, the other had 10; 5 + 10 = 15, the number of threads I had running in my program. With 25 threads, I had three channel instances and a total CURSHCNV of 25.

When my program was running, the CONNS value from DIS QMSTATUS increased by 25, the number of threads I had running.

Morag wrote a post on MaxChannels vs DIS QMSTATUS CONNS.

Things that didn’t work

I tried to issue one MQCONN and share the connection between the threads. This did not work: it gave me MQRC_HCONN_ERROR, because a non-shared handle cannot be used by a thread that did not create it.

This rule (a non-shared handle cannot be used by another thread) is not entirely true.

I use an MQCB to get notified about queue manager events: you issue MQCB and pass the hConn. In my MQCB callback routine, I could issue MQINQ using the same hConn. So I did have the same hConn being used by different threads, but one of them is a special thread that runs the callback.

I tried to use Async Consume, where you use MQCB to register a message handler program that processes each message when it arrives. You do MQCONN, and then the hConn is used by the asynchronous process; it cannot be used by other MQ API requests or by a second Async Get. In my main program I tried to issue 15 MQCONNs and use one hConn for each Async Get. I got MQRC_ALREADY_CONNECTED: "A thread can have no more than one nonshared handle."

I solved this using the same technique as above:

do I = 1 to number of threads; 
pthread_create – use subroutine
end
subroutine: use Async Consume.
MQCONN
MQCB for queue manager events
MQCB for Async Consume

I had an email exchange with Morag (thank you) who said

You can have one MQCONN and 15 async getters if you want, if you use the shared handle connection option. (cno.Options … + MQCNO_HANDLE_SHARE_BLOCK)

Only one Async Callback function (and thus one message and application logic) can be processed at a time. One connection equals one channel (or conversation over a channel if you are sharing them – i.e. SHARECNV > 1).
Equally you can have 15 MQCONNs and associate each MQCB with a different hConn.
It all depends what sort of concurrency you want in your application. Do you want parallel processing because your workload is heavy, or do you just want to monitor and process 15 different, lightly used queues in the simplest way possible?
If an hConn is currently in use by one callback call, another will not be invoked until the first callback completes.

So if you have an Async consumer for queue1, and an Async consumer for queue2, and a message arrives on each queue, it will work as follows:

  • Async code for queue1 is invoked with the message, it does a database update, and an MQPUT1 to the reply-to queue. This application returns.
  • Only after the previous code has returned can the Async code for queue2 be invoked; it does a database update and an MQPUT1 to the reply-to queue, and returns.

It is not worth having more than one Async consumer per queue on the same hConn, as you will not get parallel processing. You will get

  • Wait until the previous consumer finishes, run Async consumer 1 for the queue … return;
  • Wait until the previous consumer finishes, run Async consumer 2 for the same queue … return;

You might just as well have one Async consumer per queue.

As Morag said: "It all depends what sort of concurrency you want in your application. Do you want parallel processing because your workload is heavy, or do you just want to monitor and process 15 different, lightly used queues in the simplest way possible?"

With one application and 15 Async consumers set up, DIS CHS(..) gave me CURSHCNV(1).

What does SHARECNV on a svrconn channel do?

On QMA, I changed SHARECNV(10) to SHARECNV(0). When QMA was the only queue manager running, I got

rc 2012 (07dc) MQRC_ENVIRONMENT_ERROR.

The reason is: "An MQ client application that has been configured to use automatic reconnection attempted to connect using a channel defined with SHARECNV(0)."

When I had both QMA and QMC running, there was a delay of a couple of seconds during which the threads connected to QMA, got back MQRC_ENVIRONMENT_ERROR, tried to connect to QMC, and succeeded. There were no error messages in /var/mqm/errors/AMQERR01.LOG to tell me there was a problem on QMA.

On QMA, I changed SHARECNV(10) to SHARECNV(1). When QMA was the only queue manager running, I got 10 channel instances of COLIN running, each with CURSHCNV(1), as expected.

I changed the svrconn channel and specified SHARECNV(30), and used 30 threads. I got 3 channel instances each with 10 connections. This was a surprise to me.

This page says: "If the CLNTCONN SHARECNV value does not match the SVRCONN SHARECNV value, the lower of the two values is used."

I was using the CCDT in JSON, and added sharingConversations to it.

"connectionManagement":
{
    "sharingConversations": 30
},
"name": "COLIN",
"clientConnection": ...

When I restarted my application and specified 30 threads, I had one channel started with DIS CHS… giving CURSHCNV(30).

The Knowledge Center says: "Use SHARECNV(1). Use this setting whenever possible. It eliminates contention to use the receiving thread, and your client applications can take advantage of new features." So although you can make SHARECNV large, a value of 10 or 1 may be best. It is a balance between having more connections, which use more resources, and the impact of sharing a channel on channel throughput.

Uniform Clusters and shared conversations.

I started up 8 threads, and had one channel with 8 conversations on it. I had an MQCB to report when the conversation balancing occurred: that is when a conversation got disconnected and reconnected.

At start up all conversations connected to QMA. Over time, some conversations moved to QMC.

Eventually, I had

  • one channel instance to QMA with CURSHCNV(4) and
  • one channel instance to QMC with CURSHCNV(4)

So even with shared conversations you get balancing across channels.

How to start more servers on midrange

I came upon this question when looking into the new Uniform Clustering support in V9.1.2.

5 years ago, a common pattern was to have one machine containing a front-end web server, MQ, and back-end servers (connected in bindings mode) processing the requests and going to a remote database. For this to do more work, you increase the number of servers, and perhaps add more CPUs to the machine.

These days you have MQ in its own (virtual) machine, the front-end web server in its own (virtual) machine connected to MQ over a client interface, and the server application in its own (virtual) machine, also connected to MQ over a client interface and going to a remote database.

To scale this, you add more MQ machines, or more server machines. In my view this solves some administration problems but introduces others; that is not today's discussion.

Given this modern configuration, how do you start enough servers to manage the workload?

Consider the scenario where you have MACHINEMQ with the queue manager on it, and MACHINEA and MACHINEB with the server applications on them.

Having “smarts in the application”

  1. You want enough servers running, but not too many. (Too many can flood the downstream processes, for example cause contention in a database. Using MQ as a throttle can sometimes improve overall throughput).
  2. If a server thread is not doing any work, shut it down.
  3. If there is a backlog, start more instances of the server threads.

In the server application you might have logic like

MQINQ curdepth, ipprocs

if (curdepth > X and ipprocs < Y)   /* ipprocs = handles with the queue open for input */
{
    do_something
}

if (MQGET with wait timed out and ipprocs > 2)
    return and free up the session

For CICS on z/OS, it was easy; do_something was “EXEC CICS START TRAN…”

When running on Unix the “do_something” is a bit harder.

My first thoughts were…

It is not easy to create new processes to run more work.

  1. You can use spawn to do this – not very easy or elegant.
  2. I next thought the application instances could create a trigger message and so a trigger monitor could run and start more processes. This means
    1. Unless you are really clever, the trigger monitor starts a process on its local machine. So running a trigger monitor on MACHINEA, would create more processes on MACHINEA.
    2. This means you need a trigger monitor on MACHINEA and MACHINEB.
    3. If you put a trigger message, the message may always go to MACHINEA, always go to MACHINEB, or go to either. This may not help if one machine is overloaded and gets all of the trigger messages.
  3. I thought you could have one process and lots of threads. I played with this, and found out enough to write another blog post. It was difficult to increase the number of threads dynamically. I found it easiest to pass in a value for the number of threads to the application, and not try to dynamically change the number of threads.
  4. The best “do_something” was to produce an event or alert and have automation start the applications. Automation should have access to other information, so you can have rules such as “Pick MACHINEA or MACHINEB which has the lowest CPU usage over the last 5 minutes – and start the application there”

And to make it more complex.

Today’s scenario is to have multiple queue manager machines, for availability and scalability, so now you have to worry about which queue manager to connect to, as well as processing the messages on the queue.
MQ 9.1.2 introduced Uniform Clustering which balances the number of client channel connections across queue manager servers, and can, under the covers, tell an application to connect to a different queue manager.

This should make the balancing simpler. Assuming the queue managers are doing equal amounts of work, you should get workload balancing.

Notes on setting up your server.

You need to be careful to define your CCDT with CLNTWGHT. If CLNTWGHT is 0, the first available queue manager in the list is used, so all your connections would go to that queue manager. By making all the CLNTWGHT values greater than 0, you can bias which queue manager gets selected.
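In a JSON CCDT the weight is the clientWeight attribute under connectionManagement. A sketch with two channels of the same name, one per queue manager; the host names and weights are illustrative, not from the original setup:

```json
{
  "channel": [
    {
      "name": "COLIN",
      "type": "clientConnection",
      "clientConnection": {
        "connection": [ { "host": "hosta", "port": 1414 } ],
        "queueManager": "QMA"
      },
      "connectionManagement": { "clientWeight": 50, "affinity": "none" }
    },
    {
      "name": "COLIN",
      "type": "clientConnection",
      "clientConnection": {
        "connection": [ { "host": "hostb", "port": 1414 } ],
        "queueManager": "QMC"
      },
      "connectionManagement": { "clientWeight": 50, "affinity": "none" }
    }
  ]
}
```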

Thanks to Morag for her help in developing this article.

The display connection command doesn’t, and stop connection doesn’t.

Or to put it another way, it is hard to stop a resilient client connection that uses reconnect.

I was playing around with MQ reconnection to see how resilient it was, and wanted to stop one instance of my server program.

I tried to use the display connections command to identify one of my server programs so I could stop it. I failed to do either of these. If I am missing something – please let me know.

I used

echo "dis conn(*) where(appltag,eq,'serverCB') all" | runmqsc QMA | less

and got several entries like the one below – the only significant difference was the TID value.

AMQ8276I: Display Connection details.
CONN(21029D5C02AA7121) 
EXTCONN(414D5143514D41202020202020202020)
TYPE(CONN) 
PID(10907) TID(36) 
APPLDESC(IBM MQ Channel) APPLTAG(serverCB)
APPLTYPE(USER) ASTATE(STARTED)
CHANNEL(COLIN) CLIENTID( )
CONNAME(127.0.0.1) 
CONNOPTS(MQCNO_HANDLE_SHARE_BLOCK,MQCNO_SHARED_BINDING,
         MQCNO_RECONNECT)
USERID(colinpaice) UOWLOG( )
UOWSTDA(2019-03-29) UOWSTTI(08.10.58)
UOWLOGDA( ) UOWLOGTI( )
URTYPE(QMGR) 
EXTURID(XA_FORMATID[] XA_GTRID[] XA_BQUAL[])
QMURID(0.12295) UOWSTATE(ACTIVE)

The Knowledge Center says

PID: Number specifying the process identifier of the application that is connected to the queue manager.

TID: Number specifying the thread identifier within the application process that has opened the specified queue.

I used the ps -ef command to show the process with PID 10907; it gave

mqm 10907 10876 0 Mar28 ? 00:00:00 /opt/mqm/bin/amqrmppa -m QMA

amqrmppa is for Process Pooling. This acts as a proxy for your client program.  This is clearly not my application.

One instance of amqrmppa can handle many connections. It creates threads to run the work on behalf of the client application. The TID is the thread number within the process.

If you have lots of clients you can have multiple instances of amqrmppa running.

So the following may be better definitions:

PID: Number specifying the process identifier of the application that is connected to the queue manager. For local bindings this is the process id. For clients this is the process id of the proxy service amqrmppa.

TID: Number specifying the thread identifier within the application process that has opened the specified queue. For clients this is the thread identifier within an amqrmppa instance.

On my Ubuntu system, my bindings mode application had TID(1).

Even with this additional information, I was unable to tie up the MQ connections with my program instances. I had 3 instances of serverCB running on the same machine.

If you had multiple machines, they would have different CONNAMEs, so you could identify which connection belongs to which machine; but you cannot tell which connection is which within a machine.

I want to stop an instance.

There is a good technote How to identify MQ client connections and stop them.

This says to stop a connection, use the MQ command STOP CONN(21029D5C02AA7121). My first problem is I don't know which connection is the one I want to stop.

But if this is a client using reconnect, the program will reconnect!

The technote suggests using the MQ command  STOP CHL(…) status(inactive).

The applications using this channel (all of them) will get MQRC_CONNECTION_QUIESCING. If they stop and are restarted, they will be able to reconnect to this queue manager (or any other available queue manager). This may be OK, but you may not want to stop all of the applications using this channel definition.

If you use STOP CHL(…) STATUS(STOPPED), the applications using this channel will get MQRC_CONNECTION_QUIESCING. If they stop and are restarted, they will not be able to reconnect to this queue manager until the channel is started again; but they can connect to other queue managers which are active and where the channel is available.

These clients with reconnection are certainly resilient!

If I am missing something important – please tell me!


Where are the man pages for MQ?

I installed the man pages for MQ 9.1.1, but they were not installed in the standard place. man runmqsc said No manual entry.

I had to use

man -M /opt/mqm/man/ runmqsc


The commands you can specify are

addmqinf altauth altchl altcomm altlis
altnl altpro altqa altql altqm
altqmgr altqr altserv altsub alttopic
amqmfsck clearql clearstr crtmqcvx crtmqenv
crtmqinst crtmqm defauth defchl defcomm
deflis defnl defpro defqa defql
defqm defqr defserv defsub deftopic
delauth delcomm deletchl deletelis deleteqa
deleteql deleteqm deleteqr deleteserv deletpro
delnl delrec delsub deltopic dischauth
dischl dischs disclqm discomm disconn
disenauth dislis dislisst disnl dispbsub
dispro disq disqmgr disqmsta disqstat
dissbsta disserv dissub dissvstat distopic
distpstat dltmqinst dltmqm dmpmqaut dmpmqlog
dspauth dspmq dspmqaut dspmqcsv dspmqfls
dspmqinf dspmqinst dspmqrte dspmqtrc dspmqtrn
dspmqver dsprec dspserv endmqcsv endmqlsr
endmqm endmqtrc mqrc pingchl pingqmgr
purgechl rcdmqimg rcrmqobj refclus refqmgr
refsecy resetchl resolchl restclus restqmgr
resuqmgr rmvmqinf rsvmqtrn runmqchi runmqchl
runmqdlq runmqlsr runmqsc runmqtmc runmqtrm
setchaut setlog setmqaut setmqenv setmqinst
setmqm setmqprd setrec stachi stachl
stalsr startserv stopchl stopcon stoplsr
stopserv strmqcfg strmqcsv strmqm strmqtrc
suspqmgr

The verbs are

mqback mqbegin mqbufmh mqcb mqcbfunc mqclose
mqcmit mqconn mqconnx mqcrtmh mqctl mqdisc
mqdltmh mqdltmp mqget mqinq mqinqmp mqmhbuf
mqopen mqput1 mqput mqset mqsetmp mqstat
mqsub mqsubrq

I've always wanted a sample MQ server, and a buggy C program

I will be educating some MQ administrators about programming MQ.

For this I needed a simple server, so they could put a message to a server and get a reply. Unfortunately MQ does not provide such a useful little program. The MQ samples have a program to put messages, get messages, and a complex scenario involving triggering, but not a nice simple server.

I've created one in my MQTools github.

I have also put up a program source which has MQ programming errors, such as trying to put to a queue which was not opened with the MQOO_OUTPUT option ("I got your message, but where is mine?"). The aim is that the non-programmers have to change one line of code to fix it. If anyone has any suggestions for other common problems, please let me know and I'll see if I can incorporate them.

How do I print the reason string for a reason code in my C program?

Easy:

#include <cmqstrc.h>
MQCONN(argv[1], &hConn, &mqcc, &mqrc);
printf("MQ conn to %s cc %i rc %i %s\n", argv[1], mqcc, mqrc, MQRC_STR(mqrc));

cmqstrc.h has code

char *MQRC_STR (MQLONG v) 
{
char *c;
switch (v)
{
case 0: c = "MQRC_NONE"; break;
case 2001: c = "MQRC_ALIAS_BASE_Q_TYPE_ERROR"; break;
case 2002: c = "MQRC_ALREADY_CONNECTED"; break;
case 2003: c = "MQRC_BACKED_OUT"; break;

...

You could write an mqstrerror, or mqerror, function and resolve it at link time instead of compile time.

Using the monitoring data provided via publish in MQ midrange.

In V9, MQ provided monitoring data available through a publish/subscribe programming model. This solved the problem of the MQ statistics and accounting information being written to a queue, where only one consumer could use the data.

You can get information on the MQ CPU usage, log data written, as well as MQ API statistics.

A sample (amqsruaa) is provided to subscribe to and print the data, but it is limited and not suitable for an enterprise environment. See /opt/mqm/samp/amqsruaa.c for the source program, and bin/amqsrua and bin/amqsruac for the bindings-mode and client-mode executables.

I tried to use this new method in my mini enterprise, and found it very hard to use, and I think some of the data is of questionable value.

Overall, I found

  1. The documentation is missing or incomplete.
  2. The architecture is poor: it is hard to use in a typical customer environment.
  3. The implementation is poor: it does not follow PCF standards, and it uses the same ID for different data types.
  4. Some of the data provided is not explained, and some of it is not that useful.

I’ve written several pages on the monitoring data in MQ midrange. I was going to blog it all, but I did not think there would be a big audience for it.

“Make not working” due to order of link statements

I had a simple makefile for an MQ program, but it did not work, and I could not find any hints on how to get it to work.

cparms = -Wno-write-strings
clibs = -I. -I../inc -I'/usr/include' -I'/opt/mqm/inc'
lparms = -L /opt/mqm/lib64 -Wl,-rpath=/opt/mqm/lib64 -Wl,-rpath=/usr/lib64 -lmqm
% : %.c
	gcc -m64 $(cparms) $(clibs) $(lparms) $< -o $@

make mqcmd gave me

... undefined reference to `MQCONN'
... undefined reference to `MQOPEN'
... undefined reference to `MQPUT'
... undefined reference to `MQCLOSE'
... undefined reference to `MQDISC'
collect2: error: ld returned 1 exit status
makefile:5: recipe for target 'mqcmd' failed

I moved the -lmqm to the end of the line

cparms = -Wno-write-strings
clibs = -I. -I../inc -I'/usr/include' -I'/opt/mqm/inc'
lparms = -L /opt/mqm/lib64 -Wl,-rpath=/opt/mqm/lib64 -Wl,-rpath=/usr/lib64 -lmqm
% : %.c
	gcc -m64 $(cparms) $(clibs) $< -o $@ $(lparms)


And it worked! I later found a blog entry saying that -l... directives are supposed to go after the objects that reference those symbols.

The IBM knowledge center is not very helpful. Under Building 64 bit applications, it has definitions for

  • C client application, 64-bit, non-threaded
  • C server application, 64-bit, non-threaded

My problem is that I am writing a program which is a client as in client – server, running in bindings mode, which does a request reply to a server.

I think where the documentation says “C server” it means “C bindings mode”.

I'm not getting workload balancing with MQ! Of course not.

I had a question: "We have an intelligent workload balancer in front of our two queue managers. Sometimes most of the work goes to queue manager A, sometimes to queue manager B, sometimes it is balanced. What can we do to get workload balancing?" The tough-love answer is that MQ does not do workload balancing.

Clients

An intelligent router can route requests to a server depending on how busy the server is. This is good for requests that can run anywhere: for example, request_1 can execute over here, and request_2 from the same user can execute over there, because no state information is held on the server.

With MQ, the “request” is the MQCONN, and this can be routed to a server depending on how busy a server is. All other MQ requests have to go to the same server as the MQCONN executed on. The router does not get involved in these other MQ requests.

If at the start of day Server A was doing no work and Server B was busy, then the MQCONNs will be routed to Server A. Half an hour later the applications are putting messages to queues on Server A, even though this server is now overloaded and Server B is idle. The application stays connected to Server A until it disconnects (perhaps a week later).

What can you do? To get around this, you can have the clients disconnect if they have done no work for a time, perhaps 15 minutes. Or, if they are always active, have them disconnect and reconnect periodically, perhaps once an hour to a couple of times a day.

There are limits to how many connections a system can support. There are limits in the operating system, and limits with MQ. Having clients disconnect when they have been idle for a time, frees up resources and keeps you away from these limits.

Clustering.

You may say "We have workload balancing with clustering – we use the CLWLWGHT channel attribute". This is workload routing, not workload balancing. You cannot influence which system a message gets sent to depending on how busy the remote server is (and so balance the work). You can only do "two for QMA, one for QMB, two for QMA, one for QMB, etc.", even though QMA is overloaded.

This is why MQ does not do workload balancing!