The display connection command doesn’t, and stop connection doesn’t.

Or to put it another way, it is hard to stop a resilient client connection that uses reconnect.

I was playing around with MQ reconnection to see how resilient it was, and wanted to stop one instance of my server program.

I tried to use the display connections command to identify one of my server programs so I could stop it. I failed to do either of these. If I am missing something – please let me know.

I used

echo "dis conn(*) where(appltag eq 'serverCB') all" | runmqsc QMA | less

and got several entries like the one below – the only significant difference was the TID value.

AMQ8276I: Display Connection details.
CONN(21029D5C02AA7121) 
EXTCONN(414D5143514D41202020202020202020)
TYPE(CONN) 
PID(10907) TID(36) 
APPLDESC(IBM MQ Channel) APPLTAG(serverCB)
APPLTYPE(USER) ASTATE(STARTED)
CHANNEL(COLIN) CLIENTID( )
CONNAME(127.0.0.1) 
CONNOPTS(MQCNO_HANDLE_SHARE_BLOCK,MQCNO_SHARED_BINDING,
         MQCNO_RECONNECT)
USERID(colinpaice) UOWLOG( )
UOWSTDA(2019-03-29) UOWSTTI(08.10.58)
UOWLOGDA( ) UOWLOGTI( )
URTYPE(QMGR) 
EXTURID(XA_FORMATID[] XA_GTRID[] XA_BQUAL[])
QMURID(0.12295) UOWSTATE(ACTIVE)

The Knowledge Center says

PID: Number specifying the process identifier of the application that is connected to the queue manager.

TID: Number specifying the thread identifier within the application process that has opened the specified queue.

I used the ps -ef command to show the process with PID 10907, which gave

mqm 10907 10876 0 Mar28 ? 00:00:00 /opt/mqm/bin/amqrmppa -m QMA

amqrmppa is for Process Pooling. This acts as a proxy for your client program. This is clearly not my application.

One instance of amqrmppa can handle many connections. It creates threads to run the work on behalf of the client application. The TID is the thread number within the process.

If you have lots of clients you can have multiple instances of amqrmppa running.

So the following may be better definitions:

PID: Number specifying the process identifier of the application that is connected to the queue manager. For local bindings this is the process id. For clients this is the process id of the proxy service amqrmppa.

TID: Number specifying the thread identifier within the application process that has opened the specified queue. For clients this is the thread identifier within an amqrmppa instance.

On my Ubuntu system, my bindings mode application had TID(1).

Even with this additional information, I was unable to tie up the MQ connections with my program instances. I had 3 instances of serverCB running on the same machine.

If you had multiple machines, they would have a different CONNAME, so you can identify which connection is for which machine – but you cannot tell which connection within a machine belongs to which application instance.

I want to stop an instance.

There is a good technote How to identify MQ client connections and stop them.

This says to stop a connection, use the MQ command STOP CONN(21029D5C02AA7121). My first problem is that I don’t know which connection is the one I want to stop.

But if this is a client using reconnect, the program will reconnect!

The technote suggests using the MQ command  STOP CHL(…) status(inactive).

The applications using this channel (all of them) will get MQRC_CONNECTION_QUIESCING. If they stop, and are restarted, they will be able to reconnect to this queue manager (or any other available queue manager). This may be OK. (But you may not want to stop all of the applications using this channel definition.)

If you use STOP CHL(…) status(stopped), the applications using this channel will get MQRC_CONNECTION_QUIESCING. If they stop and are restarted, they will not be able to reconnect to this queue manager until the channel is started again, but they can connect to other active queue managers where the channel is available.

These clients with reconnection are certainly resilient!

If I am missing something important – please tell me!

The ups and downs of MQ Reconnect – the basics

I struggled for a while to understand what the queue manager’s reconnection support for clients gave me. As this has now been extended in 9.1.2 with Uniform Clusters, I thought I had better spend some time to understand it, and document what I learned.

Overall the MQ reconnection support simplifies an application by making the reconnection after failure transparent to the application. The downside is that you now have to write some subtle code to handle the side effects of the transparent reconnection.

This blog post grew so large, I had to split it up. The topics are

  • the basics
  • what do I need to do to get it to work?
  • the classic MQ scenarios
  • how can I tell if automatic reconnect has happened?
  • little problems
  • frustrating problems

Basic business problem

Consider the following application scenario.

There is an application in a web server. You use your web browser to connect to the application server, which runs an application that connects to MQ as a client. There is an interaction with MQ, and the application sends a response back to the end user.

The client-connected application is connected to a queue manager. When that queue manager is shut down, we want the application to connect to another queue manager and continue working as quickly as possible.

Reconnecting to another queue manager

Your main queue manager is QMA using port 1414; your alternate queue manager is QMC using port 1416.
In your mqclient.ini you have the definition

ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)

which gives two connections, with the same channel name, but with different IP addresses.

Your application connects to QMA.
If you shut down QMA, the application fails to connect to QMC, because the queue manager name does not match.

You can fix this by using a blank queue manager name. I am not comfortable with this.

You can also fix this by specifying QMNAME(GROUPX) on your client channel definitions.

Your application then has to connect using the queue manager name *GROUPX instead of QMA.

You need a CCDT which contains all of the needed channel definitions.
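As an illustration, a minimal C client connecting via the group name might look like this (a sketch under the setup above, not a complete program):

#include <cmqc.h>
#include <stdio.h>

int main(void)
{
  MQHCONN  hConn;
  MQLONG   cc, rc;
  MQCHAR48 qmName = "*GROUPX"; /* * means: any queue manager whose  */
                               /* channel has QMNAME(GROUPX)        */

  MQCONN(qmName, &hConn, &cc, &rc);
  if (cc == MQCC_FAILED)
  {
    printf("MQCONN failed, reason %d\n", (int)rc);
    return 1;
  }
  /* ... MQOPEN, MQPUT, MQGET as usual ... */
  MQDISC(&hConn, &cc, &rc);
  return 0;
}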

If you are not using reconnection support

If you shut down QMA, the application gets MQRC_CONNECTION_BROKEN. The application can go back to the top of the program and reissue MQCONN, MQOPEN, etc. This time the application connects to QMC.
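A sketch of that hand-coded recovery logic in C (the retry limit and queue name are illustrative, not from any sample):

#include <cmqc.h>
#include <string.h>

#define MAX_ATTEMPTS 5          /* illustrative retry limit          */

int main(void)
{
  MQHCONN  hConn;
  MQHOBJ   hObj;
  MQLONG   cc, rc;
  MQCHAR48 qmName = "*GROUPX";

  for (int i = 0; i < MAX_ATTEMPTS; i++)
  {
    MQOD od = {MQOD_DEFAULT};

    MQCONN(qmName, &hConn, &cc, &rc);     /* may now pick QMC        */
    if (cc == MQCC_FAILED) continue;      /* retry from the top      */

    strncpy(od.ObjectName, "SERVER_QUEUE", MQ_Q_NAME_LENGTH);
    MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_FAIL_IF_QUIESCING,
           &hObj, &cc, &rc);
    if (rc == MQRC_CONNECTION_BROKEN) continue;  /* start over       */

    /* ... MQPUT, MQGET etc, with the same check after each call ... */
    break;                                /* work completed          */
  }
  return 0;
}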

The MQ reconnection support can make this transparent to the application, so you do not need to code this recovery logic yourself.

There is a big section on Automatic Client Reconnection here.

How MQ reconnect works

The reconnection is driven when

  • the queue manager ends abnormally
  • endmqm -r is used
  • certain network problems occur

The example below shows what happens when an application puts two messages to a queue, and the queue manager is shut down part way through.

There are two queue managers, QMA and QMC. Each has a remote queue called SERVER_QUEUE, and each has a queue called MYREPLY.

Normal behavior

  • MQCONN *GROUPX
  • MQOPEN SERVER_QUEUE
  • MQOPEN MYREPLY
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
  • MQGET with wait from MYREPLY
  • MQGET with wait from MYREPLY
  • MQCLOSE SERVER_QUEUE
  • MQCLOSE MYREPLY
  • MQDISC

If the queue manager was shut down after the first put, the second MQPUT call gets MQRC_CONNECTION_BROKEN, and the logic starts from the top and connects to a different queue manager. The first MQPUT is reissued. The MQGETs work because the replies are sent to this queue manager – but there is also a reply on QMA from the original MQPUT which may need to be handled.

Now with the reconnection scenario

  • MQCONN *GROUPX
  • MQOPEN SERVER_QUEUE
  • MQOPEN MYREPLY
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue
    • Queue manager QMA is shut down using endmqm -r QMA, which tells clients to reconnect.
    • Connection to QMA ended
    • Connection to QMC started.
    • All MQ work is now done on QMC.
    • The application does not get any error codes saying a reconnect to a different queue manager has happened.
  • MQPUT to SERVER_QUEUE out of syncpoint specifying MYREPLY as the Reply To Queue. This is put to the queue on QMC.
  • MQGET with wait from MYREPLY. This is done on QMC
  • MQGET with wait from MYREPLY This is done on QMC
    • This gets no message found.
  • MQCLOSE SERVER_QUEUE
  • MQCLOSE MYREPLY
  • MQDISC
  • The application gets a return code 2033 (no message available) from the second MQGET

Because the first MQPUT specified a reply-to queue of MYREPLY at QMA, the reply from the server will be sent there.

The second put specified a reply to queue of MYREPLY at QMC.

The first MQGET on QMC gets the reply to this message.

We now have a message destined for MYREPLY at QMA and we are missing a message on QMC.

The application did not have to worry about the reconnection logic, but it has the same sort of problems as the original application with messages being in the wrong place.

The ups and downs of MQ Reconnect – what do I need to do to get it to work?

This is one topic in a series of blog posts.

What you need to do is documented here.

It says you need one of the following (a sketch of the first option follows the list):

  • MQCONNX with MQCNO Options set to MQCNO_RECONNECT or MQCNO_RECONNECT_Q_MGR.
  • DefRecon=YES|QMGR in the CHANNELS stanza of mqclient.ini
  • In JMS set the CLIENTRECONNECTOPTIONS property of the connection factory.
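As a sketch, the MQCONNX option looks like this (the *GROUPX name is from the earlier setup; error handling trimmed):

#include <cmqc.h>
#include <stdio.h>

int main(void)
{
  MQHCONN  hConn;
  MQCNO    cno = {MQCNO_DEFAULT};
  MQLONG   cc, rc;
  MQCHAR48 qmName = "*GROUPX";

  /* MQCNO_RECONNECT allows reconnection to any suitable queue       */
  /* manager; MQCNO_RECONNECT_Q_MGR restricts it to the same one.    */
  cno.Options |= MQCNO_RECONNECT;

  MQCONNX(qmName, &cno, &hConn, &cc, &rc);
  if (cc == MQCC_FAILED)
    printf("MQCONNX failed, reason %d\n", (int)rc);
  return 0;
}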

What queues can be used?

The client remembers which queues were used, and as part of the reconnection will open the queues on the application’s behalf.

Permanent queues. You have to have the same queues defined on all systems.

Dynamic queues. For example, an application used SYSTEM.DEFAULT.MODEL.QUEUE; this created a queue AMQ.5C991A0D23B6310, which the application used. When the reconnect occurs, the MQ client code opens SYSTEM.DEFAULT.MODEL.QUEUE on the new queue manager and asks for the dynamic queue to be given the same name, AMQ.5C991A0D23B6310.

When the application is reconnected, it can continue using the queue.

Note.
If you do an MQOPEN of SYSTEM.DEFAULT.MODEL.QUEUE and get back AMQ.5C991A0D23B6310, and then try to open AMQ.5C991A0D23B6310 directly – for example to use MQINQ and MQSET, or MQOO_INPUT – this will cause problems.
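For reference, the generated AMQ.* name comes back in the object descriptor on the MQOPEN of the model queue, so keep using the handle from that open rather than opening the AMQ.* name again. A minimal sketch (assuming hConn is already connected; the function name is mine):

#include <cmqc.h>
#include <stdio.h>
#include <string.h>

/* Open a dynamic queue via the model queue.  After MQOPEN,          */
/* od.ObjectName holds the generated AMQ.* name.                     */
MQHOBJ openDynamicQueue(MQHCONN hConn)
{
  MQOD   od   = {MQOD_DEFAULT};
  MQHOBJ hObj = MQHO_UNUSABLE_HOBJ;
  MQLONG cc, rc;

  strncpy(od.ObjectName, "SYSTEM.DEFAULT.MODEL.QUEUE",
          MQ_Q_NAME_LENGTH);
  strncpy(od.DynamicQName, "AMQ.*", MQ_Q_NAME_LENGTH);

  MQOPEN(hConn, &od, MQOO_INPUT_AS_Q_DEF | MQOO_FAIL_IF_QUIESCING,
         &hObj, &cc, &rc);
  if (cc != MQCC_FAILED)
    printf("Dynamic queue name: %.48s\n", od.ObjectName);
  return hObj;
}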

The ups and downs of MQ Reconnect – the classic MQ scenarios

This is one topic in a series of blog posts.

Scenario 1 – server application with non-persistent messages

Consider a server application using a client connection and non-persistent messages. It does

  • MQGET out of syncpoint
  • MQPUT out of syncpoint.

If the queue manager is shut down, the application can transparently connect to an available queue manager. The worst situation is that you get the message from QMA, and put the reply to QMC.

This is not a problem.

Scenario 2 – server application with persistent messages

The application does

  • MQGET in syncpoint
  • MQPUT in syncpoint
  • MQCOMMIT

When the queue manager was ended, the application did

  • MQGET in syncpoint
    • endmqm -r QMA, and the application connects to QMC
  • MQPUT in syncpoint
  • MQCOMMIT
    • gave MQRC_BACKED_OUT, and the original message is still available on QMA.

Going round the loop again, all of the requests are this time processed on QMC.

This is OK.
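A sketch of this scenario in C (the connection and queue handles are assumed to be already set up; the wait interval and buffer size are illustrative):

#include <cmqc.h>

/* Scenario 2: get and put inside one unit of work, then commit.     */
/* If the commit reports MQRC_BACKED_OUT after a reconnect, the      */
/* original message remains on the old queue manager; just go round  */
/* the loop again on the new one.                                    */
void serverLoop(MQHCONN hConn, MQHOBJ hIn, MQHOBJ hOut)
{
  MQGMO  gmo = {MQGMO_DEFAULT};
  MQPMO  pmo = {MQPMO_DEFAULT};
  MQBYTE buffer[4096];
  MQLONG msgLen, cc, rc;

  gmo.Options      = MQGMO_WAIT | MQGMO_SYNCPOINT;
  gmo.WaitInterval = 30 * 1000;            /* 30 seconds             */
  pmo.Options      = MQPMO_SYNCPOINT;

  for (;;)
  {
    MQMD md = {MQMD_DEFAULT};              /* fresh MsgId each time  */

    MQGET(hConn, hIn, &md, &gmo, sizeof(buffer), buffer,
          &msgLen, &cc, &rc);
    if (cc == MQCC_FAILED) break;          /* e.g. 2033 no message   */

    MQPUT(hConn, hOut, &md, &pmo, msgLen, buffer, &cc, &rc);

    MQCMIT(hConn, &cc, &rc);               /* MQRC_BACKED_OUT means  */
                                           /* go round again         */
  }
}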

Scenario 3 – Client application with non persistent messages

The application does

  • MQPUT of a non-persistent message to the server queue, out of syncpoint
  • MQGET with wait, out of syncpoint.

If we had

  • MQPUT of a non-persistent message to the server queue, out of syncpoint
    • endmqm -r QMA. This switches to QMC
  • MQGET with wait, out of syncpoint.

The MQGET is done on queue manager QMC. As the reply-to queue was REPLY at QMA, the get will time out.

This is OK because:

  1. The message is non-persistent, so you are expecting that it may be lost.
  2. You have set expiry, so the message will expire after a short period.
  3. Your default behaviour may be to reissue the request, or just to tell the end user “sorry… ”

Scenario 4 – Client application with persistent messages

This scenario is similar to the previous scenario – but this time you do not want to lose the message.

The application does

  • MQPUT of a persistent message, within syncpoint, specifying the reply-to queue
  • MQCOMMIT
  • MQGET with wait
  • If a message is retrieved
    • MQCOMMIT

This single business transaction has two units of work (MQPUT, COMMIT) and (MQGET, COMMIT).
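In outline (an illustrative C fragment; matching the reply by correlation id follows the usual convention that the server copies the request MsgId into the reply CorrelId):

#include <cmqc.h>
#include <string.h>

/* Scenario 4: the request and the reply are separate units of work. */
void requestReply(MQHCONN hConn, MQHOBJ hServer, MQHOBJ hReply,
                  MQMD *md, MQBYTE *buffer, MQLONG msgLen,
                  MQLONG bufLen)
{
  MQPMO  pmo = {MQPMO_DEFAULT};
  MQGMO  gmo = {MQGMO_DEFAULT};
  MQMD   replyMd = {MQMD_DEFAULT};
  MQLONG replyLen, cc, rc;

  /* Unit of work 1: put the request and commit it                   */
  pmo.Options = MQPMO_SYNCPOINT;
  MQPUT(hConn, hServer, md, &pmo, msgLen, buffer, &cc, &rc);
  MQCMIT(hConn, &cc, &rc);

  /* Unit of work 2: get the reply and commit it                     */
  memcpy(replyMd.CorrelId, md->MsgId, MQ_CORREL_ID_LENGTH);
  gmo.Options      = MQGMO_WAIT | MQGMO_SYNCPOINT;
  gmo.WaitInterval = 10 * 1000;                 /* illustrative      */
  MQGET(hConn, hReply, &replyMd, &gmo, bufLen, buffer,
        &replyLen, &cc, &rc);
  if (rc == MQRC_NO_MSG_AVAILABLE)
  {
    /* 2033: the reply may be orphaned on the original queue manager */
    /* - report it, and let the orphaned-message process clean up    */
  }
  else
  {
    MQCMIT(hConn, &cc, &rc);
  }
}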

If we had

  • MQPUT of persistent message, within syncpoint specifying the reply to queue
  • MQCOMMIT
  • MQGET with wait.
    • endmqm -r QMA. This switches to QMC
  • MQGET returns with 2033. No message found.

This is similar to the existing situation where the back-end server was slow and you did not get your reply in time. You should already have a process for when the reply arrives after your application has gone away. Typically you have a process which handles these orphaned messages (e.g. over 10 minutes old), and some logic to fix the problem – such as updating a database to indicate “special case”.

Tell your end users “possible problem – please check”, or retry the original request and hope it works. The back end application needs to be prepared to handle possible duplicate requests.

In this case you do want the message (because it was persistent).

You have a couple of ways of handling this.

  1. Report this as an exception, and use your internal processes to clean up the orphaned messages. Tell the end users “sorry..”
  2. Do not use automatic reconnect; instead reconnect to the original queue manager, and wait for a period for the reply. If you cannot connect within x seconds, pick another queue manager to connect to. Use your existing orphaned-message process for clean-up. Reconnecting to the original queue manager may take a few seconds longer, but may reduce the number of orphaned messages to process.

This scenario requires more planning and additional code to implement.

The ups and downs of MQ Reconnect – how can I tell if automatic reconnect has happened?

This is one topic in a series of blog posts.

The MQ reconnection support makes any reconnection transparent to the application code.

You can use MQCB to register a callback routine which gets control asynchronously when events happen. See the sample code amqsphac.c, described here.

The specified callback function gets control whenever there is an event. You can then take actions, such as printing out information, for example

switch(…)
{
  case MQRC_RECONNECTED:
    printf("%sEVENT : Connection Reconnected\n",TimeStamp);
    break;
  …
}

You could extend this to set a flag in global storage, or write an event message to aid problem determination.
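The registration itself is a single MQCB call. A sketch along the lines of the amqsphac.c sample (the global reconnectCount is my illustration of the flag idea, not the sample’s actual code):

#include <cmqc.h>
#include <stdio.h>

static int reconnectCount = 0;      /* global flag/counter           */

/* Called asynchronously for connection events                       */
void EventHandler(MQHCONN hConn, MQMD *pMsgDesc, MQGMO *pGetMsgOpts,
                  MQBYTE *pBuffer, MQCBC *pContext)
{
  switch (pContext->Reason)
  {
    case MQRC_RECONNECTED:
      reconnectCount++;
      printf("EVENT : Connection Reconnected\n");
      break;
    default:
      printf("EVENT : Reason %d\n", (int)pContext->Reason);
      break;
  }
}

/* Register the event handler, typically just after MQCONNX          */
void registerEventHandler(MQHCONN hConn)
{
  MQCBD  cbd = {MQCBD_DEFAULT};
  MQLONG cc, rc;

  cbd.CallbackType     = MQCBT_EVENT_HANDLER;
  cbd.CallbackFunction = (MQPTR)EventHandler;

  MQCB(hConn, MQOP_REGISTER, &cbd, MQHO_UNUSABLE_HOBJ,
       NULL, NULL, &cc, &rc);
  if (cc == MQCC_FAILED)
    printf("MQCB failed, reason %d\n", (int)rc);
}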

Running an application and issuing the endmqm -r QMA command, the exit produced

14:43:25 : EVENT : Connection Reconnecting (Delay: 1055ms)
14:43:26 : EVENT : Connection Reconnected

This showed that it connected to another queue manager after a short interval.

With only one queue manager active, and issuing endmqm -r QMA

the messages were

14:44:32 : EVENT : Connection Reconnecting (Delay: 1021ms)
14:44:34 : EVENT : Connection Reconnecting (Delay: 2122ms)
14:44:36 : EVENT : Connection Reconnecting (Delay: 4572ms)
14:44:40 : EVENT : Connection Reconnecting (Delay: 4819ms)
14:44:45 : EVENT : Connection Reconnecting (Delay: 4609ms)
14:44:50 : EVENT : Connection Reconnecting (Delay: 4262ms)
14:44:54 : EVENT : Connection Reconnecting (Delay: 4151ms)
14:44:58 : EVENT : Connection Reconnecting (Delay: 4035ms)
14:45:02 : EVENT : Connection Reconnecting (Delay: 4616ms)
14:45:02 : EVENT : Reconnection failed
14:45:02 : EVENT : Connection Broken

In the mqclient.ini file I had

CHANNELS:
   ServerConnectionParms=COLIN/TCP/127.0.0.1(1414),127.0.0.1(1416)
   MQReconnectTimeout=30
   ReconDelay=(1000,200)(2000,200)(4000,1000)

The time between first detecting a problem (14:44:32) and “Reconnection failed” (14:45:02) was the 30 seconds specified in MQReconnectTimeout=30.

The reconnection attempts were spaced according to ReconDelay=(1000,200)(2000,200)(4000,1000).

For (1000,200) this says try connecting after 1000 ms plus a random time of between 0 and 200 ms; the first observed interval was 1021 ms. Once the list is exhausted, the last pair continues to be used, which is why the later delays are all a little over 4000 ms.

See here for more information.

The ups and downs of MQ Reconnect – little problems

This is one topic in a series of blog posts.

Your error messages contain the wrong queue manager name

You have an application which does

  • MQCONNX
  • MQINQ on the queue manager – to get the queue manager name
  • MQOPEN MYREPLY
  • MQOPEN SERVER
  • MQPUT SERVER
  • MQGET MYREPLY
  • MQDISC

You are a professional application programmer, so you used MQINQ to extract the queue manager name, and produce error messages like:

MQGET from MYREPLY queue on QMA reason code: MQRC_NO_MSG_AVAILABLE

If you reconnected to a different queue manager before the MQGET, your message will be wrong. You will have people looking for a problem on QMA – when the problem was actually on QMC.

In your MQCB exit you can have logic

case MQRC_RECONNECTED:

    printf("%sEVENT : Connection Reconnected\n",TimeStamp);

    MQINQQMNAME(hConn,QMNAME);

    break;

Where the MQINQQMNAME does

MQOPEN on the queue manager object,
MQINQ Selectors[0]=MQCA_Q_MGR_NAME;

and stores the queue manager name in global storage.
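A sketch of such a routine (the names follow the text above; error handling trimmed):

#include <cmqc.h>

MQCHAR48 QMNAME;                    /* global: current QM name       */

void MQINQQMNAME(MQHCONN hConn, PMQCHAR qmName)
{
  MQOD   od = {MQOD_DEFAULT};
  MQHOBJ hObj;
  MQLONG cc, rc;
  MQLONG selectors[1] = {MQCA_Q_MGR_NAME};

  od.ObjectType = MQOT_Q_MGR;       /* the queue manager object      */
  MQOPEN(hConn, &od, MQOO_INQUIRE | MQOO_FAIL_IF_QUIESCING,
         &hObj, &cc, &rc);
  if (cc == MQCC_FAILED) return;

  MQINQ(hConn, hObj,
        1, selectors,               /* one character attribute       */
        0, NULL,                    /* no integer attributes         */
        MQ_Q_MGR_NAME_LENGTH, qmName,
        &cc, &rc);
  MQCLOSE(hConn, &hObj, MQCO_NONE, &cc, &rc);
}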

Your error message can now use this global storage and report

MQGET from MYREPLY queue on QMC reason code: MQRC_NO_MSG_AVAILABLE

Your error messages still contain the wrong queue manager name

Using the same scenario as above, you look into why there was no reply. The MQADMIN team look at the status of the server queue, and say “the last time a message was put to the server queue was 4 weeks ago. It must be an application bug”.

You need some logic like

MQPUT(...)
PutQMNAME = QMNAME; // from the global storage above
                    //  this might be QMA
PutTime = TimeStamp; // illustrative: also capture when the put was
                     // done, for use in the error message below

Now if you get no reply message, your error message can be like

MQGET from MYREPLY queue on QMC reason code: MQRC_NO_MSG_AVAILABLE. Message was put to SERVER queue on QMA at 14:40:23.

Any MQSETs are not repeated.

If you issue an MQSET, and then reconnect to another queue manager, the MQSET is not repeated.

Best practice is not to use MQSET.

The attributes you can change are for triggering, inhibit get, and inhibit put, all of which would be better done using automation.

There are limitations as to what you can and cannot do.

See here.

For example

  • getting a message under cursor, or in a group.
  • using a message context
  • using a security context.

The ups and downs of MQ Reconnect – frustrating problems

This is one topic in a series of blog posts.

If there are problems during MQ reconnection, the queue manager may not report them.

When I had a SERVER queue defined on both QMA and QMC, the reconnection worked OK. I then deleted the queue from QMC.

I got the following (using the MQCB exit to print the status)

18:05:37 : EVENT : Connection Reconnecting (Delay: 1135ms)
18:05:38 : EVENT : Reconnection failed
18:05:38 : EVENT : Connection Broken
MQGET get cc 2 rc 2548 MQRC_RECONNECT_FAILED

In /var/mqm/qmgrs/QMC/errors/AMQERR01.LOG

I had

26/03/19 18:05:17 - Process(1617.4) User(colinpaice) Program(amqrmp)
Host(colinpaice) Installation(Installation1)
VRMF(9.1.2.0) QMgr(QMC)
Time(2019-03-26T18:05:17.489Z)
RemoteHost(127.0.0.1)
CommentInsert1(localhost (127.0.0.1))
CommentInsert2(TCP/IP)
CommentInsert3(COLIN)
AMQ9209E: Connection to host 'localhost (127.0.0.1)' for channel 'COLIN'
closed.
AMQ9999E: Channel 'COLIN' to host '127.0.0.1' ended abnormally.

This was not much use in telling me where the problem was, and there was nothing else to help me find out what the problem was.

I took an internal trace, formatted it, and looked for a likely problem. I could use the time stamp to narrow down the range of records.

The trace had

MQI:MQOPEN HConn=0140000F HObj=00000000 rc=00000825 ObjType=00000001 ObjName=SERVER

The rc is in hexadecimal: 0x825 is 2085, MQRC_UNKNOWN_OBJECT_NAME.

I found it helpful to have the application explicitly connect to the queue manager by name. In this case, I got

MQ connx to QMC cc 0 rc 0 MQRC_NONE
QMNAME is QMC
Return code from MQOPEN to SERVER is 
cc 2 rc 2085,MQRC_UNKNOWN_OBJECT_NAME

From this I could see what the problem was.
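The test application was essentially just a connect by name and an open, printing the reason codes; a sketch:

#include <cmqc.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
  MQHCONN  hConn;
  MQHOBJ   hObj;
  MQOD     od = {MQOD_DEFAULT};
  MQLONG   cc, rc;
  MQCHAR48 qmName = "QMC";          /* connect by name, not group    */

  MQCONN(qmName, &hConn, &cc, &rc);
  printf("MQ connx to %s cc %d rc %d\n", qmName, (int)cc, (int)rc);
  if (cc == MQCC_FAILED) return 1;

  strncpy(od.ObjectName, "SERVER", MQ_Q_NAME_LENGTH);
  MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_FAIL_IF_QUIESCING,
         &hObj, &cc, &rc);
  printf("Return code from MQOPEN of SERVER cc %d rc %d\n",
         (int)cc, (int)rc);         /* 2085 = UNKNOWN_OBJECT_NAME    */

  MQDISC(&hConn, &cc, &rc);
  return 0;
}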

Best practice.

If you are going to use MQ reconnection you need to review your application.

  1. Print out the queue manager name at start up.
  2. Use MQCB to provide a routine which prints out when reconnection occurs, and which queue manager is currently being used.
  3. You should now have a trail of which queue manager was being used at which time.
  4. Try connecting the application to every queue manager, and make sure it works successfully. If you do not do this, it will be hard to tell why a connection failed.

I think the IBM instructions for installing MQ on Ubuntu get 6/10

I went through the documentation for migrating mid-range MQ up to the latest level (9.1.1) for MQ on Ubuntu, and for installing fix packs, so I could be ready to install MQ 9.1.2 which is now out.

It felt like the documentation had not been properly tested, nor tested in a typical enterprise environment, so I have written up some instructions to help you.

I’ve written this up here, and tried to cover the scenarios most people will have to go through. For example:

  • Before installing 9.1.2 you’ll need to delete 9.1.1.
  • You cannot have a multi-version install with Ubuntu.
  • Being a cautious person, I use sudo to issue the commands that need root, rather than switching to root for the duration of the install.
  • When you install a new v.r.m it puts the files in the MQServer directory; when you install a fix pack, it puts the files in the same directory as your .gz file, along with all of the rubbish you have accumulated over the years.
  • Rename your MQServer directory to MQServer911 – to make it clear what it is for, and so that when you install 9.1.2 it does not overwrite the contents.
  • It is good to clean up after yourself.

Where are the man pages for MQ?

I installed the man pages for MQ 9.1.1, but they were not installed in the standard place. man runmqsc said No manual entry.

I had to use

man -M /opt/mqm/man/ runmqsc


The commands you can specify are

addmqinf   altauth    altchl     altcomm    altlis
altnl      altpro     altqa      altql      altqm
altqmgr    altqr      altserv    altsub     alttopic
amqmfsck   clearql    clearstr   crtmqcvx   crtmqenv
crtmqinst  crtmqm     defauth    defchl     defcomm
deflis     defnl      defpro     defqa      defql
defqm      defqr      defserv    defsub     deftopic
delauth    delcomm    deletchl   deletelis  deleteqa
deleteql   deleteqm   deleteqr   deleteserv deletpro
delnl      delrec     delsub     deltopic   dischauth
dischl     dischs     disclqm    discomm    disconn
disenauth  dislis     dislisst   disnl      dispbsub
dispro     disq       disqmgr    disqmsta   disqstat
dissbsta   disserv    dissub     dissvstat  distopic
distpstat  dltmqinst  dltmqm     dmpmqaut   dmpmqlog
dspauth    dspmq      dspmqaut   dspmqcsv   dspmqfls
dspmqinf   dspmqinst  dspmqrte   dspmqtrc   dspmqtrn
dspmqver   dsprec     dspserv    endmqcsv   endmqlsr
endmqm     endmqtrc   mqrc       pingchl    pingqmgr
purgechl   rcdmqimg   rcrmqobj   refclus    refqmgr
refsecy    resetchl   resolchl   restclus   restqmgr
resuqmgr   rmvmqinf   rsvmqtrn   runmqchi   runmqchl
runmqdlq   runmqlsr   runmqsc    runmqtmc   runmqtrm
setchaut   setlog     setmqaut   setmqenv   setmqinst
setmqm     setmqprd   setrec     stachi     stachl
stalsr     startserv  stopchl    stopcon    stoplsr
stopserv   strmqcfg   strmqcsv   strmqm     strmqtrc
suspqmgr

The verbs are

mqback   mqbegin  mqbufmh  mqcb     mqcbfunc  mqclose
mqcmit   mqconn   mqconnx  mqcrtmh  mqctl     mqdisc
mqdltmh  mqdltmp  mqget    mqinq    mqinqmp   mqmhbuf
mqopen   mqput1   mqput    mqset    mqsetmp   mqstat
mqsub    mqsubrq

For toilets read server…

I had a weekend away, and my experience of toilets at an airport gave me insight into servers used in computing. In the blog post below – where I say toilet – think server.

When I first joined IBM, 40 years ago we had “opinion surveys” where you could raise issues, and management would ignore them. At one feedback session, someone said we need bigger capacity toilets. There were comments like “We know you are full of ****, do you need bigger bowls?”. He meant that we needed more cubicles, because on a Friday afternoon, when some people came back from the pub, they would sit on the toilet and go to sleep. This was my first insight into the multiple meanings of the term capacity.

Later when I was just starting in performance, they replaced our mainframe machine which had one 60 MIPS CPU, with the newest machine with 6 CPUs each at 10 MIPS. The accountants saw this as the same sized computer. To us, a single CPU-bound transaction took 6 times longer – but you could now do 6 transactions in parallel, so overall the throughput was comparable. Like toilets, most of the time was spent doing I/O.

For my weekend away, I spent time at an airport in Scotland. The departures side has a central section, and a wing on either side. Our departure gate was in one wing. This had one toilet for men, and another for women. The men’s toilet had about 10 cubicles, and so if you needed one you normally did not have to wait very long. If one cubicle was out of service, this did not have a major impact on throughput. Unfortunately these toilets were closed for refurbishment. The closest toilets were back in the central area, but with only two cubicles, and you often had to wait. This showed there was insufficient capacity. If you could not wait, you had to walk further to find the next toilets. These had more capacity, but sometimes you still had to wait. For the ladies’ toilet, the queue was out of the door, and along the corridor – so a real lack of capacity – which shows lack of planning (or a male architect).

By the time I had been to the toilet and walked back to my gate, they were closing the flight!

What insight did I learn?

  • If you have several big servers, and one is shut down, you need enough capacity on the other servers to cope.
  • If you shut down one big server, the time spent per transaction will increase, as there is an increased waiting time for the servers.
  • Depending on the location of the servers, you may have extra delay getting to the servers.
  • Routing work to small servers may not help, as individual servers may get overloaded: this small server is overloaded, but those two servers are not busy – and you cannot route work to them.
  • A larger server can handle peaks in workload better than multiple smaller ones.
  • Some architects are not good at designing systems for availability and capacity. You need to know the duration of your typical transaction and plan for that. Some transactions may take longer than others. In Copenhagen airport, the toilets are unisex, which helps solve the availability and capacity problems!