I struggled for a while to understand what the reconnection support that the queue manager provides for clients gave me. As this was extended in 9.1.2 with Uniform Clusters, I thought I had better spend some time understanding it and document what I learned.
Overall, the MQ reconnection support simplifies an application by making the reconnection after a failure transparent to the application. The downside is that you now have to write some subtle code to handle the side effects of the transparent reconnection.
This blog post grew so large that I had to split it up. The topics are:
- The ups and downs of MQ Reconnect – the basics
- The ups and downs of MQ Reconnect – what do I need to do to get it to work?
- The ups and downs of MQ Reconnect – the classic MQ scenarios
- The ups and downs of MQ Reconnect – how can I tell if automatic reconnect has happened?
- The ups and downs of MQ Reconnect – little problems
- The ups and downs of MQ Reconnect – frustrating problems.
Basic business problem
Consider the following application scenario.
There is an application in a web server. You use your web browser to connect to the application server. This runs an application which connects to MQ as a client. There is an interaction with MQ and the application sends a response back to the end user.
The client-connected application is connected to a queue manager. If the queue manager is shut down, we want the application to connect to another queue manager and continue working as quickly as possible.
Reconnecting to another queue manager
Your main queue manager is QMA using port 1414; your alternate queue manager is QMC using port 1416.
In your mqclient.ini you have the definition
ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)
which gives two connections with the same channel name and the same IP address, but with different ports.
Your application connects to QMA.
Shut down QMA. The application fails to connect to QMC, because the queue manager name does not match.
You can fix this by using a blank queue manager name. I am not comfortable with this.
You can also fix this by specifying QMNAME(GROUPX) on your client channel definitions.
Your application has to connect using QMNAME *GROUPX instead of QMA.
You need a CCDT which contains all of the needed channel definitions.
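The name matching described above can be sketched with a small model. This is only an illustration of the rule, not the MQ client code: the function names and the ENDPOINTS table are made up for the example, and the model ignores the CCDT lookup that a real *GROUPX connection depends on.

```python
# Simplified model (an assumption, not MQ's implementation) of whether the
# queue manager name given on MQCONN lets the client use the queue manager
# that answers on the channel.


def name_accepted(requested, actual):
    """Return True if a client asking for `requested` may use `actual`."""
    requested = requested.strip()
    if requested == "" or requested.startswith("*"):
        # A blank name or a *GROUP name does not insist on one queue manager.
        return True
    return requested == actual


# Hypothetical table matching the two connections in the mqclient.ini line.
ENDPOINTS = {"127.0.0.1(1414)": "QMA", "127.0.0.1(1416)": "QMC"}


def usable_endpoints(requested):
    """List the addresses whose queue manager the client would accept."""
    return [addr for addr, qmgr in ENDPOINTS.items()
            if name_accepted(requested, qmgr)]


for name in ("QMA", "", "*GROUPX"):
    print(repr(name), "->", usable_endpoints(name))
```

With the name QMA only the first address is usable, which is why the application cannot move to QMC; with a blank name or *GROUPX both addresses are candidates.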
If you are not using reconnection support
If you shut down QMA, the application gets MQRC_CONNECTION_BROKEN. The application can go back to the top of the program and reissue MQCONN, MQOPEN, etc. This time the application connects to QMC.
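A minimal sketch of that hand-written recovery logic is shown here. It does not use a real MQ client library: FakeQmgr, MQError, connect and run_with_retry are hypothetical stand-ins invented for the example, and only the reason codes 2009 and 2059 are real MQ values.

```python
MQRC_CONNECTION_BROKEN = 2009    # real MQ reason code
MQRC_Q_MGR_NOT_AVAILABLE = 2059  # real MQ reason code


class MQError(Exception):
    """Hypothetical exception carrying an MQ reason code."""
    def __init__(self, reason):
        super().__init__(f"MQ reason code {reason}")
        self.reason = reason


class FakeQmgr:
    """Hypothetical in-memory stand-in for a queue manager."""
    def __init__(self, name, stop_after_puts=None):
        self.name = name
        self.running = True
        self.stop_after_puts = stop_after_puts  # simulate a shutdown
        self.puts = []

    def put(self, queue, message):
        if (self.stop_after_puts is not None
                and len(self.puts) >= self.stop_after_puts):
            self.running = False
        if not self.running:
            raise MQError(MQRC_CONNECTION_BROKEN)
        self.puts.append((queue, message))


def connect(qmgrs):
    """Stand-in for MQCONN *GROUPX: use the first running queue manager."""
    for qmgr in qmgrs:
        if qmgr.running:
            return qmgr
    raise MQError(MQRC_Q_MGR_NOT_AVAILABLE)


def run_with_retry(qmgrs, messages, max_attempts=3):
    """Put all messages; on a broken connection, start again from the top."""
    for _ in range(max_attempts):
        qmgr = connect(qmgrs)  # top of the program: MQCONN, MQOPEN
        try:
            for message in messages:
                qmgr.put("SERVER_QUEUE", message)
            return qmgr.name
        except MQError as err:
            if err.reason != MQRC_CONNECTION_BROKEN:
                raise
            # Connection broken: loop round and reconnect.
    raise MQError(MQRC_CONNECTION_BROKEN)


qma = FakeQmgr("QMA", stop_after_puts=1)  # QMA stops after the first put
qmc = FakeQmgr("QMC")
connected_to = run_with_retry([qma, qmc], ["request 1", "request 2"])
print(connected_to, len(qma.puts), len(qmc.puts))
```

In this model the first request is put twice, once to QMA before the failure and once to QMC after the restart, so even the hand-written approach can leave work behind on the old queue manager.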
The MQ reconnection support can make this transparent to the application, so you do not need to code this recovery logic yourself.
The IBM MQ documentation has a big section on Automatic Client Reconnection.
How MQ reconnect works
The reconnection is driven when
- the queue manager ends abnormally
- endmqm -r is used
- some network problems occur
The example below shows what happens when an application puts two messages to a queue and the queue manager is shut down between the two puts.
There are two queue managers, QMA and QMC. Each has a remote queue called SERVER_QUEUE, and each has a queue called MYREPLY.
Normal behavior
- MQCONN *GROUPX
- MQOPEN SERVER_QUEUE
- MQOPEN MYREPLY
- MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
- MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
- MQGET with wait from MYREPLY
- MQGET with wait from MYREPLY
- MQCLOSE SERVER_QUEUE
- MQCLOSE MYREPLY
- MQDISC
If the queue manager was shut down after the first put, the second MQPUT call gets MQRC_CONNECTION_BROKEN, and the logic starts from the top and connects to a different queue manager. The first MQPUT is reissued. The MQGETs work because the replies are sent to this queue manager, but there is also a reply on QMA from the original MQPUT which may need to be handled.
Now with the reconnection scenario
- MQCONN *GROUPX
- MQOPEN SERVER_QUEUE
- MQOPEN MYREPLY
- MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
- Queue manager QMA is shut down using endmqm -r QMA, which tells clients to reconnect.
- Connection to QMA ended
- Connection to QMC started.
- All MQ work is now done on QMC.
- The application does not get any error codes saying a reconnect to a different queue manager has happened.
- MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue. This is put to the queue on QMC.
- MQGET with wait from MYREPLY. This is done on QMC.
- MQGET with wait from MYREPLY. This is done on QMC.
- This gets no message found.
- MQCLOSE SERVER_QUEUE
- MQCLOSE MYREPLY
- MQDISC
- The application gets return code 2033 (MQRC_NO_MSG_AVAILABLE) from the second MQGET.
This is because the first MQPUT specified a reply-to queue of MYREPLY at QMA, so the reply from the server is sent there.
The second put specified a reply to queue of MYREPLY at QMC.
The first MQGET on QMC gets the reply to this message.
We now have a message destined for MYREPLY at QMA and we are missing a message on QMC.
The application did not have to worry about the reconnection logic, but it has the same sort of problems as the original application with messages being in the wrong place.
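The reconnection scenario can be replayed with a toy model to see where each reply ends up. Everything here is a hypothetical stand-in (FakeQmgr, ReconnectingClient and the immediate echo-style reply are assumptions made for the sketch); only the reason code 2033 is a real MQ value.

```python
MQRC_NO_MSG_AVAILABLE = 2033  # real MQ reason code


class FakeQmgr:
    """Hypothetical in-memory queue manager with a local echo server."""
    def __init__(self, name):
        self.name = name
        self.running = True
        self.reply_queue = []  # stands in for MYREPLY on this queue manager

    def put_request(self, text):
        # The reply goes to MYREPLY on the queue manager that took the request.
        self.reply_queue.append(f"reply to {text} via {self.name}")


class ReconnectingClient:
    """Stand-in for a connection with automatic reconnect enabled."""
    def __init__(self, qmgrs):
        self.qmgrs = qmgrs

    def _current(self):
        # Silently use the first running queue manager; no error is surfaced.
        return next(q for q in self.qmgrs if q.running)

    def put(self, text):
        self._current().put_request(text)

    def get(self):
        queue = self._current().reply_queue
        if not queue:
            return None, MQRC_NO_MSG_AVAILABLE
        return queue.pop(0), 0


qma, qmc = FakeQmgr("QMA"), FakeQmgr("QMC")
client = ReconnectingClient([qma, qmc])

client.put("request 1")    # goes to QMA
qma.running = False        # endmqm -r QMA; the client moves to QMC
client.put("request 2")    # goes to QMC, with no error reported
first_get = client.get()   # reply to request 2
second_get = client.get()  # nothing left on QMC

print(first_get, second_get, qma.reply_queue)
```

The reply to the first request stays on QMA's MYREPLY, and the second get on QMC comes back empty with 2033, which is the situation the application has to be written to cope with.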