How do I get a client to disconnect?

I had a question from a customer who asked how they can reduce the number of client connections in use.  They had tried setting a disconnect interval (DISCINT) on the channel, but the connections were like weeds – you kill them off, and they grow back again.

DISCINT is “the length of time after which a channel closes down, if no message arrives during that period”.  This sounds perfect for most people.  The application is in an MQGET, and if no messages arrive, the channel can be disconnected, and the application gets connection broken.  The application can then decide to disconnect or reconnect.
If the application is not in an MQGET, it will be notified of the broken connection the next time it tries to use MQ.
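For example, setting a ten-minute disconnect interval on a server-connection channel looks like this in MQSC (the channel name APP.SVRCONN is an assumption; DISCINT is specified in seconds):

```
ALTER CHANNEL(APP.SVRCONN) CHLTYPE(SVRCONN) DISCINT(600)
```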

Independent applications

Many applications are well written, in that when they get Connection Broken they just reconnect again, so DISCINT has no effect on reducing the number of connections. This may be good for availability, but not for resource usage.  It may be good to have 1000 application instances running during the day, but perhaps not overnight when there is no work to do.  I've seen instances where the applications do an MQGET every minute; with 1000 instances this can use a lot of CPU while doing no useful work.  In this case you want unused application instances to stop, and be restarted when needed.

You cannot use triggering with client connections (unless you have a very smart trigger monitor to produce an event which says start a client program over there).

Use automation to periodically check the queue depth and the number of input handles. If there is a high queue depth, or a low number of handles (eg 2), then start more application instances across your back-end servers.  Your applications can then disconnect if they have not received a message within, say, 10 minutes.  This should keep the right number of application instances active.
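The checks described above map onto MQSC commands such as the following (the queue name APP.REQUEST is an assumption):

```
DISPLAY QUEUE(APP.REQUEST) CURDEPTH IPPROCS
DISPLAY QSTATUS(APP.REQUEST) TYPE(HANDLE) APPLTAG CHANNEL
```

Your automation can react to a high CURDEPTH, or an IPPROCS value below a threshold, by starting more instances.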

An administrator should be able to get this automation set up, but getting the applications to disconnect could be a challenge, as this requires the application developer to change the code!

Running under a web server

If your applications are running under a web server you may have mis-configured connection pools.  You can specify the initial size of the pool, and this many connections are made.  As more connections are needed, more can be added to the pool until the pool maximum is reached. You should specify a time-out value so that periodically the pool gets cleaned up and unused connections are removed, until the pool is back to the initial size.  You should review the initial size of the pools (are they too large?) and the time-out value.
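In WebSphere Liberty, for example, these pool settings live on the connectionManager element; the values below are illustrative only:

```xml
<jmsConnectionFactory jndiName="CF3" id="CF3ID">
  <connectionManager minPoolSize="5"
                     maxPoolSize="50"
                     maxIdleTime="10m"
                     reapTime="3m"/>
</jmsConnectionFactory>
```

maxIdleTime and reapTime together control how often idle connections are discarded, shrinking the pool back towards minPoolSize.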

This should just be an administrative change.

Good luck, you may be successful in reducing the number of client connections, but do not set your hopes too high.

WebSphere Liberty connectionPool statistics

This blog post explains how to get and understand statistics from WebSphere Liberty on connectionPool usage.

In your MDB application you can have code like

 InitialContext ctx = new InitialContext();
ConnectionFactory cf = (ConnectionFactory)ctx.lookup("CF3");

This says look up the connection factory defined by CF3 and issue MQCONN for this connection.

In WebSphere Liberty you define connection information in server.xml.  For example

<jmsConnectionFactory jndiName="CF3" id="CF3ID">
  <connectionManager maxPoolSize="2" connectionTimeout="7s"/> 
  <properties.wmqJms 
   queueManager="QMA"
   transportType="BINDINGS"
   applicationName="Hello"/>
</jmsConnectionFactory>

The maxPoolSize gives the maximum number of connections available in this pool.

If server.xml has

<featureManager>
   <feature>monitor-1.0</feature>
</featureManager>

then you can extract statistics on connectionPools using the JMX interface.

In ./usr/servers/test/jvm.options I had

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

This defines the JMX port as 9010, so I can get information through this port.
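You can also read JMX data programmatically.  The sketch below (class and method names are mine, for illustration) queries the in-process platform MBeanServer so it runs standalone; to reach the Liberty server you would instead obtain an MBeanServerConnection via JMXConnectorFactory and the service URL above:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Query MBeans the same way a remote JMX client would, using the
// in-process platform MBeanServer so the sketch runs without Liberty.
public class JmxQuerySketch {

    // Read one attribute, e.g. the current heap usage.
    public static long heapUsed() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData heap = (CompositeData) mbs.getAttribute(memory, "HeapMemoryUsage");
        return (Long) heap.get("used");
    }

    // Wildcard query, the equivalent of jmxquery's -q 'WebSphere:*'.
    public static Set<ObjectName> query(String pattern) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        return mbs.queryNames(new ObjectName(pattern), null);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Heap used: " + heapUsed());
        System.out.println("java.lang MBeans: " + query("java.lang:*").size());
    }
}
```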

Looking at the output

There is documentation here on the connectionPool statistics.

You can use jconsole to get the JMX data, but this is not very usable, so I used jmxquery, which is part of a python package.  I installed it using pip install jmxquery.

I used the command

java -jar jmxquery.jar -url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi -u admin -p admin -q 'WebSphere:*' > outputfile

-q 'WebSphere:*' means give all records belonging to the WebSphere component.  If you say -q '*:*' you get statistics for all components; see the bottom of this blog post.  Example output is given below.

This command wrote all of the output to file outputfile.  I then used grep to extract the relevant records.

grep WebSphere:type=ConnectionPoolStats,name outputfile

If you change a parameter in server.xml for the jmsConnectionFactory, the pool is deleted and recreated, and the JMX data is reset.  If the pool has been reset, or has never been used, statistics for that pool are not available.  The pool is created on its first use, and JMX statistics become available then.

The JMX data for connectionPools

The data was like

WebSphere:type=ConnectionPoolStats,name=CF3/CreateCount (Long) = 2

The detailed records for WebSphere:type=ConnectionPoolStats,name=CF3 are

  • CreateCount (Long) = 2  the number of connections created,
  • DestroyCount (Long) = 0  the number of connections released because the pool was purged,
  • WaitTime (Double) = 76.36986301369863  for those requests that had to wait for a connection to become available, the average wait time,
  • InUseTime (Double) = 18.905405405405407  the average time a connection was in use,
  • WaitTimeDetails/count (Long) = 98  the number of requests that had to wait,
  • WaitTimeDetails/description (String) = Missing,
  • WaitTimeDetails/maximumValue (Long) = 110  the maximum wait time in milliseconds,
  • WaitTimeDetails/mean (Double) = 78.13265306122449  the average wait time,
  • WaitTimeDetails/minimumValue (Long) = 16  the minimum wait time,
  • WaitTimeDetails/standardDeviation (Double) = 16.474205982730254  the standard deviation,
  • WaitTimeDetails/total (Double) = 7657.0  total wait time in milliseconds: 7657/(number of waits, 98) = the average of 78.13 above,
  • WaitTimeDetails/unit (String) = UNKNOWN  this looks like a bug – it should be milliseconds,
  • WaitTimeDetails/variance (Double) = 271.82426517365184,
  • ManagedConnectionCount (Long) = 2  the total number of managed connections in the free, shared, and unshared pools,
  • ConnectionHandleCount (Long) = 0  the number of handles currently in use,
  • FreeConnectionCount (Long) = 2  the number of connections in the pool but not in use,
  • InUseTimeDetails/count (Long) = 101  the number of requests for a connection (ctx.lookup("CF3")),
  • InUseTimeDetails/description (String) = Missing,
  • InUseTimeDetails/maximumValue (Long) = 53  the maximum time a connection was in use, in milliseconds,
  • InUseTimeDetails/mean (Double) = 18.099009900990097  the average time connections were in use, in milliseconds,
  • InUseTimeDetails/minimumValue (Long) = 10  the minimum time connections were in use, in milliseconds,
  • InUseTimeDetails/standardDeviation (Double) = 5.63923216261808,
  • InUseTimeDetails/total (Double) = 1828.0  total in-use time in milliseconds: 1828/(number of connections used, 101) gives the mean of 18.09 above,
  • InUseTimeDetails/unit (String) = UNKNOWN.

Note: the order of the fields can vary; for example CreateCount can be first, or nearly last.

After a time interval, aged connections can be released.  When there is enough workload to need more connections, they will be created as needed.  If CreateCount increases significantly during the day, you may have an irregular workload, or you may need to increase your connectionTimeout value to smooth out the connects and disconnects.

Having WaitTimeDetails/count = 0 is good.  If this number is large in comparison to InUseTimeDetails/count, the pool is too small.

Other data you can get from JMX

  • IBM MQ:type=CommonServices
  • java.lang:type=ClassLoading
  • java.lang:type=Compilation
  • java.lang:type=GarbageCollector
  • java.lang:type=Memory
  • java.lang:type=Threading
  • osgi.core:
  • JMImplementation:type=MBeanServerDelegate
  • java.util.logging:type=Logging
  • java.nio:type=BufferPool,name=direct
  • java.lang:type=MemoryManager
  • java.lang:type=MemoryPool,name=Code Cache
  • java.lang:type=OperatingSystem
  • java.lang:type=Runtime
  • WebSphere:feature=apiDiscovery,name=APIDiscovery
  • WebSphere:feature=kernel,name=ServerInfo
  • WebSphere:type=JvmStats
  • WebSphere:type=ThreadPoolStats
  • WebSphere:type=ConnectionPoolStats ( as described above)
  • WebSphere:service=com.ibm.websphere.application.ApplicationMBean,name=CCP

How do I make my MDB transactional?

I found from the application trace that my MDB was doing MQGET and MQCMIT in the listener, and MQOPEN, MQPUT, MQCLOSE – but no MQCMIT – in my application.  Digging into this, I found that the MQPUT was NO_SYNCPOINT, which was a surprise to me!

My application had session = connection.createSession(true, 1); // true = transactional. So I expected it to work.

The ejb-jar.xml had

<enterprise-beans>
  <message-driven>
    <transaction-type>Container</transaction-type>
    ...
  </message-driven>
</enterprise-beans>
<assembly-descriptor>
  <container-transaction>
    <trans-attribute>NotSupported</trans-attribute>
  </container-transaction>
</assembly-descriptor>

I changed NotSupported to Required and it worked.
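For reference, a working container-transaction section looks like this (the ejb-name here is illustrative and must match your MDB's name):

```xml
<assembly-descriptor>
  <container-transaction>
    <method>
      <ejb-name>WMQ_IVT_MDB</ejb-name>
      <method-name>onMessage</method-name>
    </method>
    <trans-attribute>Required</trans-attribute>
  </container-transaction>
</assembly-descriptor>
```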

 

The application trace for the Listener part of the MDB gave me

Operation      CompCode MQRC HObj (ObjName) 
MQXF_XASTART            0000 -
MQXF_GET       MQCC_OK  0000    2 (JMSQ2 )
MQXF_XAEND              0000 -
MQXF_XAPREPARE          0000 -
MQXF_XACOMMIT           0000 -

The trace for the application part of the MDB gave me

Operation       CompCode  MQRC  HObj (ObjName)
MQXF_XASTART              0000     -
MQXF_OPEN       MQCC_OK   0000     2 (CP0000)
MQXF_PUT        MQCC_OK   0000     2 (CP0000)
MQXF_CLOSE      MQCC_OK   0000     2 (CP0000)
MQXF_XAEND                0000     -
MQXF_XAPREPARE            0000     -
MQXF_XACOMMIT             0000     -

and the put options had _SYNCPOINT.

I had read documentation saying that you need XAConnectionFactory instead of ConnectionFactory.  I could not get this to work, but found it was not needed for JMS; it may be needed for JDBC.

On WebLogic, why isn't my MDB scaling past 10 instances?

This is another tale of one step back,  two steps sideways.  I was trying to understand why the JMX data on the MDBs was not as I expected, and why I was not getting tasks waiting.  I am still working on that little problem, but in passing I found I could not get my MDBs to scale.  I have rewritten parts of this post multiple times, as I understand more of it.  I believe the concepts are correct, but the implementation may be different to what I have described.

There are three parts to an MDB.

  1. A thread gets a message from the queue.
  2. The message is passed to the onMessage() method of the application.
  3. The application uses a connection factory to get a connection to the queue manager and to send the reply.

Expanding this to provide more details.

Thread pools are used to reuse the MQ connection, as the MQCONN and MQDISC are expensive operations.  By using a thread pool, the repeated MQCONN and MQDISC can be avoided.

There is a specific pool for the application, and when threads are released from this pool they are put into a general pool.  Periodically, threads can be removed from the general pool by issuing MQDISC and then deleting the thread.

Get the message from the queue

The thread has two modes of operation: async consume, or plain old-fashioned MQGET.

If the channel has SHARECNV(0) there is a listener thread which browses the queue, and waits a short period (for example 5 seconds) for a message.  The wait is short so that the thread can take action if required (for example, stop running).  This means that if there is no traffic there is an empty MQGET every 5 seconds, which can be expensive.

If the channel has SHARECNV(>0) then async consume is used.  Internally there is one thread which browses the queue, and multiple threads which can get the message.
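SHARECNV is an attribute of the server-connection channel; for example (the channel name is an assumption):

```
ALTER CHANNEL(APP.SVRCONN) CHLTYPE(SVRCONN) SHARECNV(10)
```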

The maximum number of threads which can get messages is defined by the maxPoolDepth activation-config-property in ejb-jar.xml.

These threads are in a pool called EJBPoolRuntime.  Each MDB has a thread pool of this name, but in the JMX data you can identify the pool because it has a descriptor like MessageDrivenEJBRuntime=WMQ_IVT_MDB, Name=WMQ_IVT_MDB, ApplicationRuntime=MDB3, Type=EJBPoolRuntime, EJBComponentRuntime=MDB3/… where my MDB was called MDB3.

The parameters are defined in the ejb-jar.xml file.   The definitions are documented here.  The example below shows how to get from a queue called JMSQ2, and there will be no more than 37 threads able to get a message.

<ejb-jar>
  <enterprise-beans>
    <message-driven>
      <activation-config>
        <activation-config-property>
          <activation-config-property-name>maxPoolDepth</activation-config-property-name>
          <activation-config-property-value>37</activation-config-property-value>
        </activation-config-property>
        <activation-config-property>
          <activation-config-property-name>destination</activation-config-property-name>
          <activation-config-property-value>JMSQ2</activation-config-property-value>
        </activation-config-property>
      </activation-config>
    </message-driven>
  </enterprise-beans>
</ejb-jar>

Note: I did get messages like the following, which I ignored (as I think they are produced in error):

    • <Warning> <EJB> <BEA-015073> <Message-Driven Bean WMQ_IVT_MDB(Application: MDB3, EJBComponent: MDB3.jar) is configured with unknown activation-config-property name maxPoolDepth>
    • <Warning> <EJB> <BEA-015073> <Message-Driven Bean WMQ_IVT_MDB(Application: MDB3, EJBComponent: MDB3.jar) is configured with unknown activation-config-property name destination>

The default value of maxPoolDepth is 10 – this explains why I only had  10 threads getting messages from the queue.

Passing the message to the application for processing.

Once a message is available, it is passed to the onMessage() method of the application. There is some WebLogic-specific code which seems to add little value. The concepts are:

  1. There is an array of handles/beans of size max-beans-in-free-pool.
  2. When the first message is processed, "initial-beans-in-free-pool" beans are created and stored in the array; this invokes the ejbCreate() method of the application.
  3. When a message arrives, find a free element in this array:
    1. If the slot has a bean, use it,
    2. else allocate a bean and store it in the slot.  This allocation invokes the ejbCreate() method of the application.  On my laptop it took about a second to allocate a new bean, which means there is a lag when responding to a spike in workload.
    3. Call the onMessage() method of the application.
  4. If all of the slots are in use, then wait.
  5. On return from onMessage(), flag the entry as free.
  6. Every idle-timeout-seconds, scan the array and free beans to bring the current size back to initial-beans-in-free-pool.  As part of this, the ejbRemove() method of the application is invoked.
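The free-pool behaviour described above can be sketched as follows.  This is my reading of the concepts, not WebLogic's actual implementation, and the class is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the MDB free-pool logic: beans are created lazily up
// to a maximum, reused when free, and periodically shrunk back down.
public class BeanPoolSketch {
    private final int maxBeans;
    private final List<String> freeBeans = new ArrayList<>();
    private int created = 0;

    public BeanPoolSketch(int initialBeans, int maxBeans) {
        this.maxBeans = maxBeans;
        for (int i = 0; i < initialBeans; i++) {
            freeBeans.add("bean-" + created++);   // ejbCreate() would run here
        }
    }

    /** Returns a bean, creating one if the pool is empty, or null if at the max. */
    public String acquire() {
        if (freeBeans.isEmpty()) {
            if (created >= maxBeans) return null; // all slots busy: caller must wait
            freeBeans.add("bean-" + created++);   // lag: ejbCreate() is slow
        }
        return freeBeans.remove(freeBeans.size() - 1);
    }

    /** Called after onMessage() returns: flag the bean as free again. */
    public void release(String bean) {
        freeBeans.add(bean);
    }

    /** Periodic shrink back to the initial size; ejbRemove() would run here. */
    public void shrinkTo(int initialBeans) {
        while (freeBeans.size() > initialBeans) {
            freeBeans.remove(freeBeans.size() - 1);
            created--;
        }
    }

    public int createdCount() { return created; }
}
```

A real pool would block in acquire() instead of returning null, and run shrinkTo() on a timer.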

The definitions are documented here.

<weblogic-ejb-jar>
  <weblogic-enterprise-bean>
    <pool>
      <max-beans-in-free-pool>47</max-beans-in-free-pool>
      <initial-beans-in-free-pool>17</initial-beans-in-free-pool>
      <idle-timeout-seconds>60</idle-timeout-seconds>
    </pool>
  </weblogic-enterprise-bean>
</weblogic-ejb-jar>

I could find no benefit in using this pool.

The default max-beans-in-free-pool is 1000 which feels large enough.  You should make the initial-beans-in-free-pool the same or larger than the number of threads getting messages, see maxPoolDepth above.

If this value is too small, then periodically the pool will be purged down to the initial-beans-in-free-pool and then beans will be allocated as needed.  You will get a periodic drop in throughput.

Note the term max-beans-in-free-pool is not entirely accurate: the maximum number of beans for the pool is (beans currently in the pool) + (active beans).  The term max-beans-in-free-pool is only accurate when there are no beans in use.

In the JMX statistics data there is information on this pool.  The data name is like com.bea:ServerRuntime=AdminServer2, MessageDrivenEJBRuntime=WMQ_IVT_MDB, Name=WMQ_IVT_MDB, ApplicationRuntime=MDB3, Type=EJBPoolRuntime, where WMQ_IVT_MDB is the display name of the MDB, and MDB3 is the name of the jar file.  This allows you to identify the pool for each MDB.

Get a connection and send the reply – the application connectionFactory pool.

The application typically needs to issue an MQCONN, MQOPEN of the reply-to queue, put the message, and issue MQDISC before returning.  The MQCONN and MQDISC are expensive, so a pool is used to save the queue manager connection handle between calls.  The connections are saved in a pool.

In the MDB java application there is code like ConnectionFactory cf = (ConnectionFactory)ctx.lookup("CF3");

Where the connectionFactory CF3 is defined in the resource Adapter configuration.

The connectionFactory cf can then be used when putting messages.

The logic is like

  • If there is a free connection in the connectionFactory pool then use it,
  • else there is no free connection in the connectionFactory pool:
    • if the number of connections in the connectionFactory pool is at the maximum value, then throw an exception,
    • else create a new connection (doing an MQCONN etc),
  • when the java program issues connection.close(), the connection is returned to the connectionFactory pool.

It looks like the queue handle is not cached, so there is an MQOPEN… MQCLOSE of the reply queue for every request.

You configure the connectionFactory resource pool from: Home, Deployments, click on your resource adapter, click on the Configuration tab, click on the + in front of the javax.jms, ConnectionFactory, click on the connectionFactory name, click on the Connection Pool tab, specify the parameters and click on the save button.
Note: You have to stop and restart the server or redeploy the application to pick up changes!

This pool size needs to have enough capacity to handle the case when all input threads are busy with an MQGET.

JMX provides statistics with a description like com.bea: ServerRuntime=AdminServer2, Name=CF3, ApplicationRuntime=colinra, Type=ConnectorConnectionPoolRuntime, ConnectorComponentRuntime=colinra_colin where CF3 is the name of the connection pool defined to the resource adapter, colinra is the name I gave to the resource adapter when I installed it, colin.rar is the name of the resource adapter file.

Changing userids

The application connectionFactory pool can be used by different MDBs.  You need to make sure this pool has enough capacity for all the MDBs using it.

If the pool is used by MDBs running with different userids, then when a connection is obtained, if it was last used for a different userid it has to issue MQDISC and MQCONN with the current userid.  This defeats the purpose of having a connection pool.

To prevent this, you should have a separate connection pool for each group of MDBs running with the same userid.

Getting a connection from the general pool may have the same problem, so you should ensure your pools have a maximum limit suitable for expected peak processing, and an initial size suitable for your normal processing.  This should reduce the need to switch userids.

Cleaning up the connectionFactory

When the connectionFactory is configured, you can specify

  • Shrink Frequency Seconds:
  • Shrink Enabled: true|false

These parameters effectively say: after the "Shrink Frequency Seconds" period, if the number of connections in the connectionFactory pool is larger than the initial pool size, end connections (doing an MQDISC) to reduce the number back to the initial pool size.  If the initial pool size is badly chosen you may get 20 connections ending, so there are 20 MQDISCs, and because of the load, 20 connections are immediately created to handle the workload.  During this period there will be insufficient connections to handle the workload, so you will get a blip in the throughput.

If you have one connectionFactory pool being used by a high importance MDB and by a low importance MDB, it could be that the high importance MDB is impacted by this “release/acquire”, and the low priority MDB is not affected.  Consider isolating the connectionFactory pools and specify the appropriate initial pool size.

To find out what was going on I used

  • DIS QSTATUS(inputqueue) to see the number of open handles.  This is the listener count (1) + the current number of threads doing MQGETs, so with maxPoolDepth = 19 this value was up to 20.
  • I changed my MDB application to display the instance number when it was deployed.
       import java.sql.Timestamp;
       import java.text.SimpleDateFormat;
       import java.util.concurrent.atomic.AtomicInteger;
       ...
       private final static AtomicInteger count = new AtomicInteger(0);
       private int instance;       // instance number of this bean
       private int messageCount;   // messages processed by this bean
       ...
       public void ejbCreate() {
         SimpleDateFormat sdftime = new SimpleDateFormat("HH:mm:ss.SSS");
         Timestamp now = new Timestamp(System.currentTimeMillis());
         instance = count.addAndGet(1);
         System.out.println(sdftime.format(now) + ":" + this.getClass().getSimpleName()
                            + ":EJBCreate:" + instance);
       }
       public void ejbRemove()
       {
         System.out.println(this.getClass().getSimpleName() + ":EJBRemove:" + instance
                            + " messages processed " + messageCount);
         count.decrementAndGet();
       }

This gave me a message which told me when the instance was created, so I could see when it was started.   I could then see more instances created as the workload increased.

07:16:50.520:IVTMDB:EJBCreate:0

  • By using a client connection, I could specify the appltag for the connection pool and so see the number of MQCONNs from the application connectionFactory.

What happens if I get the numbers wrong?

  1. If the input queue is slow to process messages, or the depth is often too high, you may have a problem.
  2. If ejb-jar.xml maxPoolDepth is too small, this will limit the number of messages you can process concurrently.
  3. The weblogic max-beans-in-free-pool is too small. If all the beans in the pool (array) are busy, consider making the pool bigger.  Requests queue in the listeners, waiting for a free MDB instance.  However, the JMX data has fields with names like "Wait count"; in my tests these were always zero, so I think these fields are of no value.
  4. The number of connections in the connectionFactory is too small.  If the number of requests exceeded the pool size the MDB instance got an exception.  MQJCA1011: Failed to allocate a JMS connection.  You need to change the resource adapter definition Max Capacity for the connectionFactory pool size.
  5. If you find you have many MQDISC and MQCONNs happening at the same instance, consider increasing the initial size of the connectionFactory pool.
  6. Make the initial values suitable for your average workload.  This will prevent  the periodic destroy and recreate of the connections and beans.

 

You may want to have more than one weblogic server for availability and scalability.

You could also deploy the same application with a different MDB name, so if you want to stop and restart an MDB, you have another MDB processing messages.

Are all your JMS messages persistent?

While debugging my application to see why it was so slow, I found from the MQ activity trace that my replies were all persistent.

The first problem was that, by default, all JMS messages are persistent, so I used

int deliveryMode = message.getJMSDeliveryMode();

to get the persistence of the input message,

and used the obvious code to set the JMSDeliveryMode,

TextMessage response = session.createTextMessage("my reply");
response.setJMSDeliveryMode(deliveryMode);

to set it the same as the input message.  I reran my test and the reply was still persistent.

Eventually I found you need

producer = session.createProducer(dest);
producer.setDeliveryMode(deliveryMode);

And this worked!  It is all explained here.

How do I check?

You can either check your code (bearing in mind that this may be hidden by the productivity tools you use, such as Spring or Camel), or turn on activity trace for a couple of seconds to check.

What do I need to make my business applications resilient?

In the same way that a three-legged stool needs three strong legs, a business transaction has three legs.  The business transaction needs
  1. An architecture and infrastructure which can provide the resilience,
  2. Applications which are well designed, well coded and robust,
  3. Operations that can detect problems and automatically take action to remedy them.
If any one is weak, the whole business transaction is not resilient.
From the infrastructure perspective, whether you need MQ shared queue, MQ midrange or the appliance comes down to the requirements of the business and the management of risk.
For your business application you need to understand the impact to your business if the application was not available for
  • 1 second
  • 1 minute
  • 1 hour
The cost could be to your reputation, from the rules and regulations of your industry, and financial.  For example an outage may cost you a million dollars a minute in fines and compensation.  Your reputation could suffer if many people report problems on Twitter when your service is not available.

Overview of availability options.

  1. Queue sharing groups on z/OS give the highest level of availability, with the highest upfront cost (preventing an outage might be worth that cost, and more and more businesses are using QSGs now)
  2. The data replication features in the appliance and replicated data queue managers (RDQM) are the best ways to achieve high availability of queue managers on distributed. See RDQM for HA, and RDQM for Disaster Recovery.
  3. Multi-instance queue managers, where you have an active and a standby queue manager, and clusters can be useful too.
The applications need to be written to be reliable and resilient, so as to:
  1. Not cause an outage, and use MQ (and other software in the stack) as efficiently as possible.  Many "outages" are caused by badly written applications.
  2. Deal well with a problem if one occurs.
  3. Make it easy to diagnose any problems that occur
You need to automate your operations so errors are quickly picked up and actioned.
What availability do your business applications need?
You need to be able to handle planned outages.  These may occur once a week.  You stop work going down one route, so it flows via a different route.  Once "all the pipes are empty" you can shut down.  This should be transparent to the applications.
You need to be able to handle unplanned outages where messages may be in flight in the queue manager and network.  These may occur once a year.  If there is a problem, messages in flight could be stuck on a queue manager until the queue manager is restarted.  Once a problem is detected, new messages should be able to flow via an alternative route.  In this case a few seconds, or minutes worth of messages could be unavailable.
You can use clustering to automatically route traffic over available channels while a problem in one queue manager is being resolved.
Do you have a requirement for serialized transactions where the order of execution must be maintained?  For example trading stocks and shares.  The price of the second request depends on the trade of the first request.   If so, this means you can only have one back end server, no parallelism, and one route to the back end.  This does not provide a robust solution.
How smart are your applications?
If your application gets no reply within 1 second, the application could try resending the request; it may take a different route through the network, and succeed.  For inquiry transactions, a duplicate request should have little impact.  For update requests, the applications need logic to handle a possible duplicate request: detect that the request has already been processed, and send a negative response back.
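The duplicate-detection logic for update requests might look like this sketch (a hypothetical class; an in-memory set stands in for what would really be a database table or expiring cache, keyed on something like the message id or a business correlation id):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: the responder remembers request ids it has already processed,
// so a resent request can be recognised and answered with a negative response.
public class DuplicateDetector {
    private final Set<String> processedIds = new HashSet<>();

    /** Returns true if this request id is new, false if it is a duplicate resend. */
    public boolean process(String requestId) {
        return processedIds.add(requestId);
    }
}
```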
The business application may need a program to clear up possible unprocessed, duplicate,  responses and take compensating action.
Having smart applications which are resilient means the infrastructure does not need to be so smart.
Operational maturity
For the best reliability and availability you need a mature operations environment.
The infrastructure is usually reliable.  “outages” usually occur because of human intervention,  a change, or a bad application.  For example an application can continually try a failing connection, and fill up the MQ error logs.
Examples of operational maturity include
  1. Do not make a change to two critical systems at once, have a day between changes.
  2. Make sure every change has a back-out procedure which has been tested.
  3. You monitor the systems, so you can quickly tell if there is abnormal behavior.
It can take several minutes to detect a problem, shut down, and restart a queue manager (perhaps in a different place).
If you have 100 linux servers to support, it takes a lot of work to make changes on all of these servers (from making a configuration change to applying fixes).  It may be less work on z/OS.
You need to make sure that the infrastructure has sufficient capacity, and a queue manager is not short of CPU, nor has long disk response time.
Below are several configurations and considerations:
Shared queue across multiple machines, across sites
A message in a Queue Sharing Group can be processed by any queue manager in a QSG,  providing high availability.
Good for business transactions where
  1. You cannot have messages "paused" for minutes while a server is restarted.
  2. You can tolerate a "pause" of a few seconds if one QM in the QSG goes down and the channel restarts to a different queue manager in the QSG.
  3. Your applications are not smart.
  4. There is a need for serialized message processing.
  5. The cost of an outage would cover the cost of running z/OS.
Multiple mid-range machines configured across multiple machines across sites (RDQM).  Use of MQ appliance
For business transactions where
  1. Messages can be spread across any of the servers to provide scalability and availability.
  2. If you have a requirement for short response time, you need smart applications which can retry sending the message and handle duplicate requests and responses.
  3. If you can tolerate waiting for in-flight message whilst a queue manager is restarted, the applications do not need to be so smart.

These mid-range systems can take a minute or so to restart after an outage.

RDQM queue managers are generally better than Multi Instance queue managers. See the performance report here.

Single server
This is a single point of failure, and not suitable for production work.
Your enterprise may have combinations of the above patterns.

You need to consider each business application and evaluate the risk.

For example

  • My applications are not smart. They are running on mid-range with 2 servers.  If I had an unplanned outage which lasted for 5 minutes, then with my typical message volumes I could have 6000 requests stuck until the queue manager was restarted.  My management would not be happy with this.
  • If I had an outage on these two servers…  ahh that would be a problem.  I need more servers.

Many thanks to Gwydion of IBM for his comments and suggestions.

Configuring your WebSphere Liberty MDB properly

I found the documentation on how to use and monitor an MDB in a WebSphere Liberty web server environment was not very good.  Some of the documentation is wrong, and some is missing.

I'll document "how I found it worked"; in another post I'll document what the Liberty statistics mean, and how they connect to the configuration.

 

The application

The application is a simple Message Driven Bean.  When this is deployed you specify the queue manager, and which queue the listener task should get messages from.

There are many “moving parts” that need to have matching configuration.  I’ll try to show which bits must match up.

The application deployment

  1. The java IVTMDB.java program has
    1. onMessage(Message message){..} This method is given the message to process.
    2. ConnectionFactory cf = (ConnectionFactory)ctx.lookup(“CF3”); Where CF3 is defined below
  2. Within the WMQ_IVT_MDB.jar
    1. META-INF/ejb-jar.xml has
      1. <ejb-name>WMQ_IVT_MDB_EJBNAME</ejb-name>.  This name is used in the Liberty server.xml file.
      2. <ejb-class>ejbs.IVTMDB</ejb-class>. With a ‘.’ in the name.  Within the jar file is ejbs/IVTMDB.class.   This is the java program that gets executed.  If you specify “ejbs/IVTMDB” you get a java exception IllegalName: ejbs/IVTMDB.
      3. <method><ejb-name>WMQ_IVT_MDB_EJBNAME</ejb-name> <method-name>onMessage</method-name></method>  This identifies the method within the java program which gets the message; the ejb-name must match the <ejb-name> above.  The program has public void onMessage(Message message).
    2. META-INF/MANIFEST.MF This is usually generated automatically
    3. ejbs/IVTMDB.class the actual class file to be used.  This is what was described in the <ejb-class> above.
    4. Other files which may add configuration information for specific web servers.
  3. Within the CCP.ear file
    1. The WMQ_IVT_MDB.jar file described above
    2. META-INF/MANIFEST.MF.   This gets created if one does not exist.
  4. The .ear file is copied to ~/wlp/usr/servers/test/dropins/
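
Putting these names together, the parts of META-INF/ejb-jar.xml described above look roughly like this (a sketch using the names from this example; other required descriptor elements are omitted, and the trans-attribute value is illustrative):

```xml
<ejb-jar>
  <enterprise-beans>
    <message-driven>
      <!-- Matches the last part of the jmsActivationSpec id in server.xml -->
      <ejb-name>WMQ_IVT_MDB_EJBNAME</ejb-name>
      <!-- Dotted class name; the jar contains ejbs/IVTMDB.class -->
      <ejb-class>ejbs.IVTMDB</ejb-class>
    </message-driven>
  </enterprise-beans>
  <assembly-descriptor>
    <container-transaction>
      <method>
        <!-- Must match the <ejb-name> above -->
        <ejb-name>WMQ_IVT_MDB_EJBNAME</ejb-name>
        <!-- The method which is given each message -->
        <method-name>onMessage</method-name>
      </method>
      <trans-attribute>Required</trans-attribute>
    </container-transaction>
  </assembly-descriptor>
</ejb-jar>
```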

The server.xml file for the Liberty instance has

<jmsActivationSpec id="CCP/WMQ_IVT_MDB/WMQ_IVT_MDB_EJBNAME">
  <properties.wmqJms
    destinationRef="AAAA"
    transportType="CLIENT"
    channel="SYSTEM.DEF.SVRCONN"
    hostName="127.0.0.1" 
    port="1414"
    clientID="MDBClientID" 
    applicationName="CCPMDB"
    maxPoolDepth="50"
    poolTimeout="5000ms" 
    queueManager="QMA"/>
  <authData id="auth1" user="colinpaice" password="ret1red"/>
</jmsActivationSpec>
<jmsQueue id="AAAA" jndiName="IVTQueue">
  <properties.wmqJms baseQueueName="IVTQueue"/>
</jmsQueue>

<jmsConnectionFactory jndiName="CF3" id="CF3ID">
  <connectionManager maxPoolSize="6" connectionTimeout="7s"/> 
  <properties.wmqJms queueManager="QMA"
      transportType="BINDINGS"
      applicationName="Hello"/>
</jmsConnectionFactory>

 

Where

  • <jmsActivationSpec> defines the application to the web server.  See  here  for the definition of the content of the jmsActivationSpec.
    • id is composed of
      • CCP is the name of the .ear file
      • WMQ_IVT_MDB is the name of the .jar file
      •  WMQ_IVT_MDB_EJBNAME is the name in the <ejb-name> within the ejb-jar.xml file.
    • The destinationRef=”AAAA” connects the jmsActivationSpec to the queue name IVTQueue, see jmsQueue below.
    • transportType, channel, hostName, port define how the program connects to the queue manager.  The other choice is transportType=”BINDINGS”.
    • clientID: I could not see where this is used.
    • applicationName is only used when transportType=CLIENT.  If you use runmqsc to display the connection, it will have this name if a client connection is used.
    • maxPoolDepth is the number of instances of your program. If you use runmqsc DIS QSTATUS() the number of IPPROCS can be up to maxPoolDepth+1.
    • poolTimeout  see below.
    • queueManager is used when transportType=”BINDINGS”.
    • <authdata…> is the userid to be used.
  • </jmsActivationSpec> is the end of the definition.
  • <jmsQueue..> defines a queue.
    • id=…  matches the jmsActivationSpec destinationRef= entry above.
    • jndiName the specified value can be used in an application to look up the queue name.
  • </jmsQueue> defines the end of the queue definition
  • <jmsConnectionFactory.. > defines how the program connects to the queue manager
    • jndiName=”CF3″.  The application issued ConnectionFactory cf = (ConnectionFactory)ctx.lookup(“CF3”) which does a jndi lookup of CF3
    • <connectionManager>defines the  connection properties
      • maxPoolSize=”6″  This means that at most 6 (onMessage) application instances can get a connection.  If there are 10 instances running, 6 can get a connection and run, and 4 will have to wait.
      • connectionTimeout=”7s”  This appears to mean the pool can be shrunk when a connection has not been used for 7 seconds.  This allows unused connections to be freed up.

 

 

How do I configure the numbers?

With the definition <jmsActivationSpec … <properties.wmqJms  maxPoolDepth=”50″… then up to 50 threads can have the queue open, and be getting messages.  Each listener which has got a message will pass the message to the onMessage() method of your application.  Typically the application connects to the queue manager and puts a reply back to the originator.

This means that the connection pool used by the application (CF3 in my case) needs at least as many connections (maxPoolSize) as the listeners’ jmsActivationSpec.maxPoolDepth.  The application will wait if there are no connections available.  Liberty provides some basic statistics on the number of connections used, and the number of requests that had to wait.
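
For example, to give every listener a connection for its reply, the connection factory pool can be sized to at least the activation spec’s maxPoolDepth (50 here).  A sketch based on the definitions earlier in the post:

```xml
<!-- Size the application connection pool to match the 50 MDB listeners,
     so an onMessage() thread never waits for a connection to put its reply -->
<jmsConnectionFactory jndiName="CF3" id="CF3ID">
  <connectionManager maxPoolSize="50" connectionTimeout="7s"/>
  <properties.wmqJms queueManager="QMA"
      transportType="BINDINGS"
      applicationName="Hello"/>
</jmsConnectionFactory>
```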

If you have more than one application using the connection pool, then you need to size the pool for all the potential applications.

I could not find any Liberty statistics as to the number of instances with the input queue open, so you will need to issue the runmqsc DIS QSTATUS(..) and display the number of IPPROCS.

You can change the server.xml configuration to change the connection properties (such as making maxPoolDepth larger).  This causes all existing instances to stop and restart, which, in a busy system, can cause a short blip in your throughput.

When connections are not used for a period, they can be freed.  See Using JMS connection pooling with WebSphere Application Server and WebSphere MQ, Part 1 and Part 2.

Unused connections move from the connection pool to an mqjms holding pool.  Periodically this pool is purged.  After running a workload, I could see from the application trace that some MQDISCs were done more than 3 minutes afterwards.

Tuning the inbound “connection pool”.

For the jmsActivationSpec there is no connection pool as such.  There is an internal inbound connection pool for all MDB listeners.  The maxPoolDepth limits how many connections can be used by the listeners.  Every 300 seconds a task wakes up and checks all the “inbound” connections.  If a connection has not been used for the poolTimeout duration, it is released.

If you specify a poolTimeout of 1 second, then a connection could be released after anything from 1 to 301 seconds.  This behaviour means that when the task wakes up, you may have many connections released (MQDISC) at once.  You may want to set the poolTimeout to 300 seconds so that some connections are released when the task runs, and the remainder are released the next time the task runs, spreading the load.

If the poolTimeout is too small you may get a lot of MQCONN and MQDISC activity.  By using a longer poolTimeout you may avoid this behaviour, so the listeners connect at the start of the day, stay connected most of the day, and disconnect at the end of the day.
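
With that 300-second sweep in mind, a poolTimeout set to match the sweep interval might look like this (an illustrative fragment; the other properties are as in the earlier server.xml):

```xml
<!-- A connection idle for at least poolTimeout is disconnected by the
     sweep task, which runs every 300 seconds.  With poolTimeout="300s",
     idle connections are released over two sweeps rather than all at once. -->
<jmsActivationSpec id="CCP/WMQ_IVT_MDB/WMQ_IVT_MDB_EJBNAME">
  <properties.wmqJms
      queueManager="QMA"
      maxPoolDepth="50"
      poolTimeout="300s"/>
</jmsActivationSpec>
```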

You can use maxPoolDepth to throttle the work being processed.  If the number is too small, work will be delayed.  If the number is too large, you may get a spike in activity.  If you use DIS QSTATUS(‘queuename’) you will see the number of threads with the queue open for input and the current depth.  Vary the maxPoolDepth till you get the best balance.