How do I know which queue manager to connect to?

Question: How difficult can it be to decide which queue manager to connect to?

Answer: For the easy case it is easy; for the hard case it is hard.
I would not be surprised to find that in many applications the MQCONN(x) calls are not coded properly!


That is a typical question and answer from me – but let me go into more detail so you understand what I am talking about.
If you have only one queue manager then it is easy to know which queue manager to connect to – it is the one and only queue manager.
If you have more than one, it gets more complex. Suppose your application has just committed a funds transfer request and the connection is broken:

  • you may just decide to connect to any available queue manager, and ignore a possibly partial funds transfer request
  • or you might wait for a period trying to connect to the same queue manager, and then give up, and connect to another, and later worry about any orphaned message on the queue.

You now see why the easy scenario is easy, and for the hard one, you need to do some hard thinking and some programming to get the optimum response.

There is an additional complexity that when you connect to the same instance – it may be a highly available queue manager, and it may have restarted somewhere else. For the purposes of this blog post I’ll ignore this, and treat it as the same logical queue manager.

I had lots of help from Morag who helped me understand this topic, and gave me the sample code.

You have only one queue manager.

This is easy, you issue an MQCONN for the queue manager. If the connect is not successful, the program waits for a while and then retries. See – I said it was easy.

You have more than one queue manager, and getting a reply back is not important.

For example, you are using non persistent messages.

Your application can decide which queue manager it tries to connect to, or you can exploit queue manager groups in the CCDT.

On queue manager QMA you can define client channels for it and also for queue manager QMB

DEF CHL(QMA) CHLTYPE(CLNTCONN) QMNAME(GROUPX) +
    CONNAME(LINUXA) CLNTWGHT(50)…
DEF CHL(QMB) CHLTYPE(CLNTCONN) QMNAME(GROUPX) +
    CONNAME(LINUXB) CLNTWGHT(50)…

DEF CHL(QMA) CHLTYPE(SVRCONN) ...

On Unix these are automatically put into the /var/mqm/qmgrs/../@ipcc/AMQCLCHL.TAB file. This is a binary file, and can be FTPed to the client machines that need it.

You can use the environment variables MQCHLLIB to specify the directory where the table is located, and MQCHLTAB to specify the file name of the table (it defaults to AMQCLCHL.TAB). See the IBM MQ documentation for these environment variables for more information.

FTP the files in binary to your client machine, for example into ~/mq/.
I did
export MQCHLLIB=/home/colinpaice/mq
export MQCHLTAB=AMQCLCHL.TAB

I then used the command
set | grep MQ
to make sure those variables were set, and that MQSERVER was not set.

Sample MQCONN code (from Morag)….

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
char * QMName = "*GROUPX";   // the leading "*" accepts any queue manager in the group
MQCONN(QMName,
       &hConn,
       &CompCode,
       &Reason);
// and MQ will pick one of the two entries in the CCDT.

The application connected with queue manager name *GROUPX. Under the covers the MQ code found the channel connections with QMNAME of GROUPX and picked one to use. The “*” says do not check the name of the queue manager when you actually do the connect. If you omit the “*” you will get return code MQRC_Q_MGR_NAME_ERROR 2058 (080A in hex) because “GROUPX” did not match the queue manager name of “QMA” or “QMB”. I stopped QMA, and reconnected the application, and it connected to QMB as expected.

Common user error: When I tried connecting with queue manager name QMA, this failed with MQRC_Q_MGR_NAME_ERROR because there were no channel definitions with QMNAME value QMA. This was obvious once I had taken a trace, looked at the trace, had a cup of tea and a biscuit, and remembered I had fallen over this before. So this may be the first thing to check if you get this return code.

Using channels defined with the same QMNAME, if your connection breaks, you reconnect with the same queue manager name “*GROUPX” and you connect to a queue manager if there is one available. You can specify extra options to bias which one gets selected. See CLNTWGHT and AFFINITY. See the bottom of this blog entry.

You can use MQINQ to get back the name of the queue manager you are actually connected to (so you can put it in your error messages).

//   Open the queue manager object to find out its name
od.ObjectType = MQOT_Q_MGR;       // open the queue manager object
MQOPEN(Hcon,                      // connection handle
       &od,                       // object descriptor for queue manager
       MQOO_INQUIRE +             // open it for inquire
       MQOO_FAIL_IF_QUIESCING,    // but not if MQM stopping
       &Hobj,                     // returned object handle
       &OpenCode,                 // MQOPEN completion code
       &Reason);                  // reason code
// report reason, if any
if (Reason != MQRC_NONE)
{
  printf("MQOPEN of qm object rc %d\n", Reason);
  .....
}
// Now do the actual INQ
Selector = MQCA_Q_MGR_NAME;
MQINQ(Hcon,                       // connection handle
      Hobj,                       // object handle for q manager
      1,                          // inquire only one selector
      &Selector,                  // the selector to inquire
      0,                          // no integer attributes are needed
      NULL,                       // so no integer buffer
      MQ_Q_MGR_NAME_LENGTH,       // inquiring a q manager name
      ActiveQMName,               // the buffer for the name
      &CompCode,                  // MQINQ completion code
      &Reason);                   // reason code

// The returned name is blank padded and not null terminated, so bound the print
printf("Queue manager in use %.*s\n", MQ_Q_MGR_NAME_LENGTH, ActiveQMName);
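MQINQ returns character attributes as fixed-length, blank-padded fields rather than C strings, so it can be handy to convert the name before using it in error messages. The helper below is a minimal sketch of that conversion; the function name mq_field_to_cstring is my own invention and is not part of the MQ API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an MQ API): copy a fixed-length, blank-padded
 * field into dst as a null-terminated C string with trailing blanks removed.
 * dst must have room for at least len + 1 bytes.
 * Returns the length of the trimmed string. */
size_t mq_field_to_cstring(const char *field, size_t len, char *dst)
{
    assert(field != NULL && dst != NULL);

    /* Stop early if the field happens to contain a null character. */
    size_t n = 0;
    while (n < len && field[n] != '\0') {
        n++;
    }
    /* Drop the trailing blank padding. */
    while (n > 0 && field[n - 1] == ' ') {
        n--;
    }
    memcpy(dst, field, n);
    dst[n] = '\0';
    return n;
}
```

With the MQINQ sample above you would call it with ActiveQMName and MQ_Q_MGR_NAME_LENGTH, passing a destination buffer of MQ_Q_MGR_NAME_LENGTH + 1 bytes.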

You have more than one queue manager, and getting a reply back >is< important

Your application should have some logic to handle the case when your queue manager is running normally, there is a problem in the back end, and so you do not get your reply message within the expected time. Typical logic for when the MQGET times out is:

  • Produce an event saying “response not received”, to alert automation that there may be a problem somewhere in the back end
  • Produce an event saying “there is a piece of work that needs special processing – to manually redo or undo – update number…..”.
    • At a later time a program can get the orphaned message and resolve it.
    • You do not want an end user getting a message “The status of the funds transfer request to Colin Paice is … unknown” because the reply message is sitting unprocessed on the queue.
    • Note: putting a message to a queue may not be possible as the application may not be connected to a queue manager.

When deciding to connect to any available queue manager, or connect to a specific queue manager, there are two key options in the mqcno.Options field:

  • MQCNO_CD_FOR_OUTPUT_ONLY. This means, do not use any data in the passed in MQCD (as the field description says, use it for output only), but pick a valid and available channel from the CCDT, and return the details in the MQCD.
  • MQCNO_USE_CD_SELECTION. This means, use the information in the MQCD to connect to the queue manager.

Sample code (from Morag) showing MQCONNX

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
MQCNO cno = {MQCNO_DEFAULT};
MQCD cd = {MQCD_CLIENT_CONN_DEFAULT};
char * QMName = "*GROUPX";
cno.Version = MQCNO_VERSION_2;
cno.ClientConnPtr = &cd;
// Main connection - choose freely from the CCDT
cno.Options = MQCNO_CD_FOR_OUTPUT_ONLY;
MQCONNX(QMName,
        &cno,
        &hConn,
        &CompCode,
        &Reason);
: :
// Oops, I really need to go back to the same connection to continue.

MQDISC(...); // without this you get queue manager name error

// Using same MQCNO as earlier, it already has MQCD pointer set.

cno.Options = MQCNO_USE_CD_SELECTION;
MQCONNX(QMName,
        &cno,
        &hConn,
        &CompCode,
        &Reason);

Let me dig into a typical scenario to show the complexity:

  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  • MQCONNX to any queue manager
  • MQPUT1 of a persistent message within syncpoint
  • set mqcno.Options = MQCNO_USE_CD_SELECTION, as you now want the application to connect to the same queue manager if there is a problem
  • MQCMIT. After this you want to connect to the specific queue manager you were using
  • MQGET with WAIT
  • MQCMIT
  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY. Because the application is not in a business unit of work it can connect to any queue manager.

The tricky bit is in the MQGET with WAIT. If your queue manager needs to be restarted you need to know how long this is likely to take. It may be 5 seconds, it may be 1 minute depending on the amount of work that needs to be recovered. (So make sure you know what this time is.)

Let’s say it typically takes 5 seconds between failure of the queue manager the application is connected to, and restart complete. You need some logic like:

mqget with wait..
problem....
failure_time = time_now()
waitfor = 5 seconds
mqcno.Options = MQCNO_USE_CD_SELECTION
loop:
  MQCONNX to specific queue manager
  If this worked goto MQCONN_worked_OK
  try_time = time_now()
  If try_time - failure_time > waitfor + 1 second goto problem
  sleep 1 second
  goto loop
MQCONN_worked_OK:
  MQOPEN the reply to queue
  Reissue the MQGET with wait

problem:
  report problem to automation
  report special_processing_needed .... msgid...
  mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  Go to start of program and connect to any queue manager
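The retry loop in the pseudocode can be sketched in C as below. To keep it runnable without a queue manager, the connection attempt is passed in as a callback. The names try_connect_fn, reconnect_same_qmgr, fake_qmgr and fake_connect are all hypothetical and exist only for this illustration. It assumes a POSIX system for sleep().

```c
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback: returns 1 if the connection attempt worked, else 0. */
typedef int (*try_connect_fn)(void *ctx);

/* Retry try_connect until it works, or until max_wait_seconds have passed
 * since the first attempt. Returns 1 when connected, 0 when the caller
 * should give up and report the problem. */
int reconnect_same_qmgr(try_connect_fn try_connect, void *ctx,
                        double max_wait_seconds,
                        unsigned retry_interval_seconds)
{
    assert(try_connect != NULL);
    time_t failure_time = time(NULL);

    for (;;) {
        if (try_connect(ctx)) {
            return 1;                       /* MQCONN_worked_OK */
        }
        if (difftime(time(NULL), failure_time) >= max_wait_seconds) {
            return 0;                       /* problem */
        }
        sleep(retry_interval_seconds);
    }
}

/* Stand-in for a queue manager, only so the sketch can be exercised:
 * attempts fail until the attempt count reaches succeed_on. */
struct fake_qmgr { int attempts; int succeed_on; };

int fake_connect(void *ctx)
{
    struct fake_qmgr *q = ctx;
    q->attempts++;
    return q->attempts >= q->succeed_on;
}
```

In a real client the callback would wrap the MQCONNX call with mqcno.Options set to MQCNO_USE_CD_SELECTION, and return 1 when the completion code is MQCC_OK. When the function returns 0 the application would raise its alerts, switch back to MQCNO_CD_FOR_OUTPUT_ONLY, and connect to any queue manager.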

If you thought the discussion above was complex, it gets worse!

I had a long think about where to put the set mqcno.Options = MQCNO_USE_CD_SELECTION.  My first thoughts were to put it after the first MQCMIT, but this may be wrong.

With the logic

MQCONN
MQPUT1
MQCMIT

If the MQCMIT fails, it could have failed on the way to the queue manager, so the commit request did not actually get to the queue manager and the work was rolled back; or the commit could have worked, but the response did not get to your application.

The application should reconnect to the same queue manager and issue the MQGET WAIT. If the message arrives then the commit worked; if the MQGET times out, treat this as the MQGET WAIT timed out case (see above), and produce alerts. This is why I decided to put the set mqcno.Options = MQCNO_USE_CD_SELECTION before the commit. You could just as easily have logic which checks the return code of the MQCMIT and then sets it.

A bit more detail on what is going on.

You can treat the MQCD as a black box object which you do not change, nor need to look into. I found it useful to see inside it (so I could report problems with the channel name etc). The example below shows the fields displayed as a problem is introduced.

Before MQCONNX
set MQCNO_CD_FOR_OUTPUT_ONLY
pMQCD->ChannelName is '' - this is ignored
pMQCD->QMgrName is '' - this is ignored
QMName is '*GROUPX'. -this is needed
==MQCONNX had return code 0 0
pMQCD->ChannelName is 'QMBCLIENT'.
pMQCD->QMgrName is '';
QMName is '*GROUPX'.
MQINQ QMGR queue manager name gave QMB
Sleep
during the sleep endmqm -i QMB and strmqm QMB
After sleep
MQOPEN of queue manager object ended with reason code MQRC_CONNECTION_BROKEN = 2009.
Issue MQDISC, this ended with reason code 2009

set MQCNO_USE_CD_SELECTION
pMQCD->ChannelName is 'QMBCLIENT' - this is needed
pMQCD->QMgrName is ''
QMName is '*GROUPX'.
MQCONNX return code 0 0
MQINQ queue manager name is QMB

Anything else on clients?

It is good practice to periodically have the clients disconnect and reconnect to do work load balancing. For example

You have two queue managers QMA and QMB. On Monday morning between 0800 and 1000 QMA is shut down for essential maintenance. All the clients connect to QMB. QMA is restarted at 1000 – but does no work, because all the clients are connected to QMB. If your clients disconnect and reconnect then over time some will connect to QMA.

It is a good idea to have a spread of times before they disconnect, so if 100 clients connected at 0900, they disconnect and reconnect between 10pm and 3am to avoid all 100 disconnecting and reconnecting at the same time.
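One simple way to get that spread is for each client to pick a random delay within a window when it connects. The sketch below shows the idea; pick_reconnect_delay is a hypothetical name, and the window values in the usage note are just the example figures from above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: pick a delay in seconds within
 * [min_seconds, max_seconds], inclusive.
 * rand() % span has a slight bias and cannot cover a span larger than
 * RAND_MAX, which is acceptable for spreading reconnects but not for
 * anything needing a good random distribution. */
long pick_reconnect_delay(long min_seconds, long max_seconds)
{
    assert(min_seconds >= 0 && max_seconds >= min_seconds);
    long span = max_seconds - min_seconds + 1;
    return min_seconds + (long)(rand() % span);
}
```

A client that connected at 0900 and wants to reconnect between 10pm and 3am would ask for a delay between 13 and 18 hours, that is pick_reconnect_delay(46800, 64800). Each client needs to seed the generator differently (for example srand() with a value derived from the time and the process id), otherwise they may all pick the same delay and reconnect together anyway.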

To get the spread of connections to the various queue managers, you need to use CLNTWGHT with a nonzero value. If you omit CLNTWGHT, or specify a value of 0, then the channel chosen is the first alphabetically; in my case they would all go to QMA, and not to QMB.
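To give a feel for how weights bias a choice, here is a generic weighted-selection sketch. This is not the MQ client's actual algorithm, which I have not looked at; it is just the textbook technique, and pick_weighted is a hypothetical name. The caller supplies the random number so the behaviour is easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Generic weighted selection (an illustration, not MQ's implementation).
 * roll must be in the range [0, sum of weights).
 * Returns the index of the chosen entry, or -1 if roll is out of range
 * (which includes the case where every weight is zero). */
int pick_weighted(const int *weights, size_t count, int roll)
{
    assert(weights != NULL);
    if (roll < 0) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        if (roll < weights[i]) {
            return (int)i;      /* roll landed in this entry's share */
        }
        roll -= weights[i];     /* skip past this entry's share */
    }
    return -1;
}
```

With two channels both weighted 50, rolls 0 to 49 pick the first and rolls 50 to 99 pick the second, so over many connections each gets about half. An entry with weight 0 is never picked by this scheme, which is why weight 0 has to be handled as a separate case.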

I feel there is enough material on this for another blog post.