The display connection command doesn’t identify the connection, and stop connection doesn’t stop it.

Or to put it another way, it is hard to stop a resilient client connection using reconnect.

I was playing around with MQ reconnection, to see how resilient it was, and wanted to stop one instance of my server program.

I tried to use the display connections command to identify one of my server programs so I could stop it. I failed to do either of these. If I am missing something – please let me know.

I used

echo "dis conn(*) where(appltag eq 'serverCB') all" | runmqsc QMA | less

and got several entries like the one below – the only significant difference was the TID value.

AMQ8276I: Display Connection details.
CONN(21029D5C02AA7121) 
EXTCONN(414D5143514D41202020202020202020)
TYPE(CONN) 
PID(10907) TID(36) 
APPLDESC(IBM MQ Channel) APPLTAG(serverCB)
APPLTYPE(USER) ASTATE(STARTED)
CHANNEL(COLIN) CLIENTID( )
CONNAME(127.0.0.1) 
CONNOPTS(MQCNO_HANDLE_SHARE_BLOCK,MQCNO_SHARED_BINDING,
         MQCNO_RECONNECT)
USERID(colinpaice) UOWLOG( )
UOWSTDA(2019-03-29) UOWSTTI(08.10.58)
UOWLOGDA( ) UOWLOGTI( )
URTYPE(QMGR) 
EXTURID(XA_FORMATID[] XA_GTRID[] XA_BQUAL[])
QMURID(0.12295) UOWSTATE(ACTIVE)

The Knowledge Center says

PID: Number specifying the process identifier of the application that is connected to the queue manager.

TID: Number specifying the thread identifier within the application process that has opened the specified queue.

I used the ps -ef command to show the process with PID 10907; it gave

mqm 10907 10876 0 Mar28 ? 00:00:00 /opt/mqm/bin/amqrmppa -m QMA

amqrmppa is for Process Pooling. This acts as a proxy for your client program.  This is clearly not my application.

One instance of amqrmppa can handle many connections. It creates threads to run the work on behalf of the client application. The TID is the thread number within the process.

If you have lots of clients you can have multiple instances of amqrmppa running.

So the following may be better definitions:

PID: Number specifying the process identifier of the application that is connected to the queue manager. For local bindings this is the process ID of the application. For clients it is the process ID of the proxy process amqrmppa.

TID: Number specifying the thread identifier within the application process that has opened the specified queue. For clients this is the thread identifier within an amqrmppa instance.

On my Ubuntu system, my bindings mode application had TID(1).

Even with this additional information, I was unable to tie up the MQ connections with my program instances. I had 3 instances of serverCB running on the same machine.

If you had multiple machines, they would have a different CONNAME, so you can identify which connection is for which machine – but you cannot tell which connection belongs to which instance within a machine.

I want to stop an instance.

There is a good technote How to identify MQ client connections and stop them.

This says that to stop a connection you use the MQ command STOP CONN(21029D5C02AA7121). My first problem is that I don't know which connection is the one I want to stop.

But if this is a client using reconnect, the program will reconnect!

The technote suggests using the MQ command STOP CHL(…) STATUS(INACTIVE).

The applications using this channel (all of them) will get MQRC_CONNECTION_QUIESCING. If they stop and are restarted, they will be able to reconnect to this queue manager (or any other available queue manager). This may be OK, but you may not want to stop all of the applications using this channel definition.

If you use STOP CHL(…) STATUS(STOPPED), the applications using this channel will get MQRC_CONNECTION_QUIESCING. If they stop and are restarted, they will not be able to reconnect to this queue manager until the channel is started again. But they can connect to other queue managers which are active and where the channel is available.

These clients with reconnection are certainly resilient!

If I am missing something important – please tell me!

The ups and downs of MQ Reconnect – the basics

I struggled for a while to understand what the queue manager's reconnection support for clients gave me. As this has now been extended in 9.1.2 with Uniform Clusters, I thought I had better spend some time understanding it, and document what I learned.

Overall, the MQ reconnection support simplifies an application by making reconnection after a failure transparent to the application. The downside is that you now have to write some subtle code to handle the side effects of that transparent reconnection.

This blog post grew so large that I had to split it up. The topics are:

 

Basic business problem

Consider the following application scenario.

There is an application in a web server. You use your web browser to connect to the application server. This runs an application which connects to MQ as a client. There is an interaction with MQ and the application sends a response back to the end user.

The client-connected application is connected to a queue manager. If that queue manager is shut down, we want the application to connect to another queue manager and continue working as quickly as possible.

Reconnecting to another queue manager

Your main queue manager is QMA using port 1414; your alternate queue manager is QMC using port 1416.
In your mqclient.ini you have the definition

ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)

which gives two connections with the same channel name but different network addresses (here, the same IP address with different ports).

Your application connects to QMA.
If you shut down QMA, the application fails to connect to QMC, because the queue manager name does not match.

You can fix this by using a blank queue manager name. I am not comfortable with this.

You can also fix this by specifying QMNAME(GROUPX) on your client channel definitions.

Your application has to connect using QMNAME *GROUPX instead of QMA.

You need a CCDT which contains all of the needed channel definitions.
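As a sketch, the matching client-channel definitions might look like this in MQSC. The channel name COLIN and the addresses come from the example above; treat this as an illustration and check the syntax against your level of MQ:

```
* For queue manager QMA
DEFINE CHANNEL(COLIN) CHLTYPE(CLNTCONN) +
       CONNAME('127.0.0.1(1414)') QMNAME(GROUPX)
* For queue manager QMC
DEFINE CHANNEL(COLIN) CHLTYPE(CLNTCONN) +
       CONNAME('127.0.0.1(1416)') QMNAME(GROUPX)
```

The CCDT is built from CLNTCONN definitions like these, and the QMNAME(GROUPX) value is what the application's *GROUPX connect name matches.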

If you are not using reconnection support

If you shut down QMA, the application gets MQRC_CONNECTION_BROKEN. The application can go back to the top of the program and reissue MQCONN, MQOPEN, and so on. This time the application connects to QMC.

The MQ reconnection support can make this transparent to the application, so you do not need to code this recovery logic yourself.

There is a big section on Automatic Client Reconnection here.

How MQ reconnect works

The reconnection is driven when

  • the queue manager ends abnormally
  • endmqm -r is used
  • certain network problems occur

The example below shows what happens when an application puts two messages to a queue, and the queue manager is shut down during the transaction.

There are two queue managers, QMA and QMC. Each has a remote queue called SERVER_QUEUE, and each has a local queue called MYREPLY.

Normal behavior

  • MQCONN *GROUPX
  • MQOPEN SERVER_QUEUE
  • MQOPEN MYREPLY
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
  • MQGET with wait from MYREPLY
  • MQGET with wait from MYREPLY
  • MQCLOSE SERVER_QUEUE
  • MQCLOSE MYREPLY
  • MQDISC

If this queue manager was shut down after the first put, the second MQPUT call gets MQRC_CONNECTION_BROKEN, and the logic starts from the top and connects to a different queue manager. The first MQPUT is reissued. The MQGETs work because the replies are sent to this queue manager – but there is also a reply on QMA from the original MQPUT which may need to be handled.

Now with the reconnection scenario

  • MQCONN *GROUPX
  • MQOPEN SERVER_QUEUE
  • MQOPEN MYREPLY
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
    • Queue manager QMA shut down specifying endmqm -r QMA to tell clients to reconnect.
    • Connection to QMA ended
    • Connection to QMC started.
    • All MQ work is now done on QMC.
    • The application does not get any error codes saying a reconnect to a different queue manager has happened.
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue. This is put to the queue on QMC.
  • MQGET with wait from MYREPLY. This is done on QMC
  • MQGET with wait from MYREPLY This is done on QMC
    • This gets no message found.
  • MQCLOSE SERVER_QUEUE
  • MQCLOSE MYREPLY
  • MQDISC
  • The application gets reason code 2033 (MQRC_NO_MSG_AVAILABLE) from the second MQGET

Because the first MQPUT specified a reply-to queue of MYREPLY at QMA, the reply from the server is sent there.

The second put specified a reply-to queue of MYREPLY at QMC.

The first MQGET on QMC gets the reply to this message.

We now have a message destined for MYREPLY at QMA and we are missing a message on QMC.

The application did not have to worry about the reconnection logic. It has the same sort of problems as the original application about messages being in the wrong place.
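The flow above can be modelled with a toy sketch – plain Python standing in for the MQ calls, with each queue just a list. The names and the two-queue-manager setup mirror the example; none of this is real MQ code:

```python
# Toy model: each (queue manager, queue) pair holds a list of messages.
queues = {("QMA", "MYREPLY"): [], ("QMC", "MYREPLY"): []}

def server_reply(reply_to_qmgr, reply_to_q, body):
    # The server always sends its reply to the reply-to queue
    # named in the request message.
    queues[(reply_to_qmgr, reply_to_q)].append(body)

def mqget(qmgr, q):
    msgs = queues[(qmgr, q)]
    return msgs.pop(0) if msgs else None   # None models reason code 2033

# First MQPUT while connected to QMA: reply-to queue is MYREPLY at QMA.
server_reply("QMA", "MYREPLY", "reply 1")

# endmqm -r QMA: transparent reconnect, the client is now on QMC.
connected = "QMC"

# Second MQPUT: reply-to queue is now MYREPLY at QMC.
server_reply("QMC", "MYREPLY", "reply 2")

first_get = mqget(connected, "MYREPLY")    # gets the second reply
second_get = mqget(connected, "MYREPLY")   # nothing there: 2033

orphaned = queues[("QMA", "MYREPLY")]      # the first reply is stranded on QMA
```

Running this shows the shape of the problem: the second get finds nothing, while one reply sits unclaimed on the queue manager that was shut down.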

The ups and downs of MQ Reconnect – what do I need to do to get it to work?

This is one topic in a series of blog posts.

What you need to do is documented here.

It says you need one of:

  • MQCONNX with MQCNO Options set to MQCNO_RECONNECT or MQCNO_RECONNECT_Q_MGR.
  • DefRecon=YES|QMGR in the CHANNELS stanza of mqclient.ini
  • In JMS set the CLIENTRECONNECTOPTIONS property of the connection factory.
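For the mqclient.ini route, a minimal sketch of the stanza might look like this (DefRecon goes in the CHANNELS stanza; check the accepted values against your level of MQ):

```
CHANNELS:
   DefRecon=YES
```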

What queues can be used?

The client remembers which queues were used, and as part of the reconnection will open the queues on the application's behalf.

Permanent queues. You have to have the same queues defined on all systems.

Dynamic queues. For example, an application opened SYSTEM.DEFAULT.MODEL.QUEUE, which created a queue called AMQ.5C991A0D23B6310, and used it. When the reconnect occurs, the MQ client code opens SYSTEM.DEFAULT.MODEL.QUEUE on the new queue manager and asks for the name AMQ.5C991A0D23B6310 to be used.

When the application is reconnected, it can continue using the queue.

Note: if you do MQOPEN of SYSTEM.DEFAULT.MODEL.QUEUE and get back AMQ.5C991A0D23B6310, then try to open AMQ.5C991A0D23B6310 directly – for example to use MQINQ and MQSET, or MQOO_INPUT – this will cause problems.

The ups and downs of MQ Reconnect – the classic MQ scenarios

This is one topic in a series of blog posts.

Scenario 1 – server application with non-persistent messages

Consider a server application using a client connection and non-persistent messages. It does

  • MQGET out of syncpoint
  • MQPUT out of syncpoint.

If the queue manager is shut down, the application can transparently connect to an available queue manager. The worst case is that the get was done on QMA and the put is done on QMC.

This is not a problem.

Scenario 2 – server application with persistent messages

The application does

  • MQGET in syncpoint
  • MQPUT in syncpoint
  • MQCOMMIT

When the queue manager was ended, the sequence was

  • MQGET in syncpoint
    • endmqm -r QMA, and the application connects to QMC
  • MQPUT in syncpoint
  • MQCOMMIT
    • this gives MQRC_BACKED_OUT, and the original message is still available on QMA.

Going round the loop again, this time all of the requests are processed on QMC.

This is OK.

Scenario 3 – client application with non-persistent messages

The application does

  • MQPUT to the server queue, non-persistent, out of syncpoint
  • MQGET with wait, out of syncpoint

If we had

  • MQPUT to the server queue, non-persistent, out of syncpoint
    • endmqm -r QMA. This switches to QMC
  • MQGET with wait, out of syncpoint

This MQGET is done from queue manager QMC. As the reply-to queue was REPLY at QMA, the get will time out.

This is OK, because:

  1. The message is non-persistent, so you are expecting that it may be lost.
  2. You may have set expiry, so the message expires after a short period.
  3. Your default behaviour may be to reissue the request, or just to tell the end user “sorry…”.

Scenario 4 – client application with persistent messages

This scenario is similar to the previous scenario – but this time you do not want to lose the message.

The application does

  • MQPUT of a persistent message, within syncpoint, specifying the reply-to queue
  • MQCOMMIT
  • MQGET with wait
  • If a message is retrieved
    • MQCOMMIT

This single business transaction has two units of work (MQPUT, COMMIT) and (MQGET, COMMIT).

If we had

  • MQPUT of a persistent message, within syncpoint, specifying the reply-to queue
  • MQCOMMIT
  • MQGET with wait
    • endmqm -r QMA. This switches to QMC
  • MQGET returns with 2033, no message found

This is similar to the existing situation when the back-end server was slow and you did not get your reply in time. You should already have a process for when the reply arrives after your application has gone away. Typically you have a process which handles these orphaned messages (e.g. over 10 minutes old) and you have some logic to fix the problem – such as updating a database to indicate "special case".

Tell your end users “possible problem – please check”, or retry the original request and hope it works. The back end application needs to be prepared to handle possible duplicate requests.

In this case you do want the message (because it was persistent).

You have a couple of ways of handling this.

  1. Report this as an exception, and use your internal processes to clean up the orphaned messages. Tell the end users “sorry…”.
  2. Do not use automatic reconnect – instead reconnect to the original queue manager, and wait for a period for the reply. If you cannot connect within x seconds, pick another queue manager to connect to. Use your existing orphaned-message process for clean-up. Reconnecting to the original queue manager may take a few seconds longer, but may reduce the number of orphaned messages to process.

This scenario requires more planning and additional code to implement.
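The orphaned-message process in option 1 can be sketched as follows. This is a hypothetical illustration in Python: the message shape (a put timestamp plus a body) and the 10-minute threshold are assumptions, and a real sweeper would take the put time from the MQMD:

```python
import time

ORPHAN_AGE_SECONDS = 600  # treat replies older than 10 minutes as orphaned

def sweep_orphans(messages, now=None):
    """Split a reply queue's messages into (fresh, orphaned).

    Each message is a (put_timestamp, body) tuple.
    """
    now = time.time() if now is None else now
    fresh, orphaned = [], []
    for put_time, body in messages:
        target = orphaned if now - put_time > ORPHAN_AGE_SECONDS else fresh
        target.append((put_time, body))
    return fresh, orphaned

# Example: one 11-minute-old reply and one recent reply.
now = 1_000_000
msgs = [(now - 660, "old reply"), (now - 30, "recent reply")]
fresh, orphaned = sweep_orphans(msgs, now=now)
# For each orphaned reply you would then run the fix-up logic -
# for example update a database row to say "special case".
```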

The ups and downs of MQ Reconnect – how can I tell if automatic reconnect has happened?

This is one topic in a series of blog posts.

The MQ reconnection support makes any reconnection transparent to the application code.

You can use the MQ call-back support, MQCB, to specify a routine which gets control asynchronously when events happen. See the sample code amqsphac.c described here.

The specified call-back function gets control whenever there is an event. You can then take actions such as printing out information, for example:

switch(...)
case MQRC_RECONNECTED:
    printf("%sEVENT : Connection Reconnected\n",TimeStamp);
    break;

You could extend this to set a flag in global storage, or write an event message to aid problem determination.

Running an application and issuing the endmqm -r QMA command, the exit produced

14:43:25 : EVENT : Connection Reconnecting (Delay: 1055ms)
14:43:26 : EVENT : Connection Reconnected

This showed that it connected to another queue manager after a short interval.

With only one queue manager active, issuing endmqm -r QMA gave the messages

14:44:32 : EVENT : Connection Reconnecting (Delay: 1021ms)
14:44:34 : EVENT : Connection Reconnecting (Delay: 2122ms)
14:44:36 : EVENT : Connection Reconnecting (Delay: 4572ms)
14:44:40 : EVENT : Connection Reconnecting (Delay: 4819ms)
14:44:45 : EVENT : Connection Reconnecting (Delay: 4609ms)
14:44:50 : EVENT : Connection Reconnecting (Delay: 4262ms)
14:44:54 : EVENT : Connection Reconnecting (Delay: 4151ms)
14:44:58 : EVENT : Connection Reconnecting (Delay: 4035ms)
14:45:02 : EVENT : Connection Reconnecting (Delay: 4616ms)
14:45:02 : EVENT : Reconnection failed
14:45:02 : EVENT : Connection Broken

In the mqclient.ini file I had

CHANNELS:
   ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)
   MQReconnectTimeout=30
   ReconDelay=(1000,200)(2000,200)(4000,1000)

The time between first detecting the problem (14:44:32) and "Reconnection failed" (14:45:02) was the 30 seconds specified in MQReconnectTimeout=30.

The reconnections were tried after the times given in ReconDelay=(1000,200)(2000,200)(4000,1000).

For (1000,200), this says try connecting after 1000 ms plus a random time between 0 and 200 ms. The first observed interval was 1021 ms.
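That schedule can be sketched as follows – plain Python standing in for the client's logic. The "repeat the last pair until MQReconnectTimeout is used up" behaviour is inferred from the delays observed above, so treat this as an illustration rather than the exact algorithm:

```python
import random

def reconnect_delays(pairs, timeout_ms, rand=random.uniform):
    """Generate reconnect delays from a ReconDelay-style list of
    (base_ms, random_range_ms) pairs, repeating the last pair,
    stopping when the total would exceed MQReconnectTimeout."""
    delays, total, attempt = [], 0, 0
    while True:
        base, spread = pairs[min(attempt, len(pairs) - 1)]
        delay = base + rand(0, spread)
        if total + delay > timeout_ms:
            break
        delays.append(delay)
        total += delay
        attempt += 1
    return delays

# ReconDelay=(1000,200)(2000,200)(4000,1000), MQReconnectTimeout=30 seconds
delays = reconnect_delays([(1000, 200), (2000, 200), (4000, 1000)], 30_000)
```

With the random part removed, the model gives delays of 1000, 2000, then 4000 ms repeated – matching the shape of the timestamps in the log above.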

See here for more information.


The ups and downs of MQ Reconnect – little problems

This is one topic in a series of blog posts.

Your error messages contain the wrong queue manager name

You have an application which does

  • MQCONNX
  • MQINQ queue manager – get queue manager name
  • MQOPEN MYREPLY
  • MQOPEN SERVER
  • MQPUT SERVER
  • MQGET MYREPLY
  • MQDISC

You are a professional application programmer, so you used MQINQ to extract the queue manager name, and produce error messages like

MQGET from MYREPLY queue on QMA reason code: MQRC_NO_MSG_AVAILABLE

If you reconnected to a different queue manager before the MQGET, your message will be wrong. You will have people looking for a problem on QMA – when the problem was actually on QMC.

In your MQCB exit you can have logic

case MQRC_RECONNECTED:
    printf("%sEVENT : Connection Reconnected\n",TimeStamp);
    MQINQQMNAME(hConn,QMNAME);
    break;

where MQINQQMNAME does

MQOPEN of the queue manager object,
MQINQ with Selectors[0]=MQCA_Q_MGR_NAME;

and stores the queue manager name in global storage.

Your error message can now use this global storage and report

MQGET from MYREPLY queue on QMC reason code: MQRC_NO_MSG_AVAILABLE

Your error messages still contain the wrong queue manager name

Using the same scenario as above, you look into why there was no reply. The MQADMIN team look at the status of the server queue, and say “the last time a message was put to the server queue was 4 weeks ago. It must be an application bug”.

You need some logic like

MQPUT(...)
PutQMNAME = QMNAME; // from the global storage above 
                    //  this might be QMA

Now, if you get no reply message, your error message can be like

MQGET from MYREPLY queue on QMC reason code: MQRC_NO_MSG_AVAILABLE. Message was put to SERVER queue on QMA at 14:40:23.
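The bookkeeping can be sketched like this – plain Python standing in for the C exit, with an ordinary dictionary in place of the global storage, and stub functions where the MQ calls would go:

```python
# Global storage, refreshed by the MQCB-style reconnect callback.
state = {"qmname": "QMA"}

def inquire_qmgr_name():
    # Stands in for MQOPEN of the queue manager object plus MQINQ
    # with MQCA_Q_MGR_NAME; here it just returns the new name.
    return "QMC"

def on_event(reason):
    # Models the MQCB exit: on MQRC_RECONNECTED, re-inquire the
    # queue manager name and refresh the global copy.
    if reason == "MQRC_RECONNECTED":
        state["qmname"] = inquire_qmgr_name()

put_qmname = state["qmname"]          # remember where the MQPUT went (QMA)
on_event("MQRC_RECONNECTED")          # endmqm -r QMA; client moves to QMC
get_qmname = state["qmname"]          # the MQGET now runs against QMC

message = (f"MQGET from MYREPLY queue on {get_qmname} "
           f"reason code: MQRC_NO_MSG_AVAILABLE. "
           f"Message was put to SERVER queue on {put_qmname}.")
```

The point is simply that the put-time and get-time queue manager names are captured separately, so the error message names both.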

Any MQSETs are not repeated.

If you issue an MQSET, and then reconnect to another queue manager, the MQSET is not repeated.

Best practice is not to use MQSET.

The attributes you can change are for triggering, inhibit get, and inhibit put, all of which would be better done using automation.

There are limitations as to what you can and cannot do.

See here.

For example

  • getting a message under cursor, or in a group.
  • using a message context
  • using a security context.

The ups and downs of MQ Reconnect – frustrating problems

This is one topic in a series of blog posts.


If there are problems during MQ reconnection, the queue manager may not report them.

When I had a SERVER queue defined on both QMA and QMC, the reconnection worked OK.
If I deleted the queue from QMC, I got the following (using the MQCB exit to print the status)

18:05:37 : EVENT : Connection Reconnecting (Delay: 1135ms)
18:05:38 : EVENT : Reconnection failed
18:05:38 : EVENT : Connection Broken
MQGET get cc 2 rc 2548 MQRC_RECONNECT_FAILED

In /var/mqm/qmgrs/QMC/errors/AMQERR01.LOG

I had

26/03/19 18:05:17 - Process(1617.4) User(colinpaice) Program(amqrmp)
Host(colinpaice) Installation(Installation1)
VRMF(9.1.2.0) QMgr(QMC)
Time(2019-03-26T18:05:17.489Z)
RemoteHost(127.0.0.1)
CommentInsert1(localhost (127.0.0.1))
CommentInsert2(TCP/IP)
CommentInsert3(COLIN)
AMQ9209E: Connection to host 'localhost (127.0.0.1)' for channel 'COLIN'
closed.
AMQ9999E: Channel 'COLIN' to host '127.0.0.1' ended abnormally.

This was not much use in telling me where the problem was, and there was nothing else to help me find the cause.

I took an internal trace, formatted it, and looked for a likely problem.  I could use the time stamp to narrow down the range of records.

The trace had MQI:MQOPEN HConn=0140000F HObj=00000000 rc=00000825 ObjType=00000001 ObjName=SERVER

0825 (hexadecimal) is 2085, MQRC_UNKNOWN_OBJECT_NAME.

I found it helpful to have the application explicitly connect to the queue manager. In this case, I got

MQ connx to QMC cc 0 rc 0 MQRC_NONE
QMNAME is QMC
Return code from MQOPEN to SERVER is 
cc 2 rc 2085,MQRC_UNKNOWN_OBJECT_NAME

From this I could see what the problem was.

Best practice.

If you are going to use MQ reconnection you need to review your application.

  1. Print out the queue manager name at start up.
  2. Use MQCB to provide a routine which prints out when reconnection occurs, and which queue manager is currently being used.
  3. You should then have a trail of which queue manager was being used at which time.
  4. Try connecting the application to every queue manager, and make sure it works successfully.  If you do not do this it will be hard to tell why a connection failed.