I came upon this question when looking into the new Uniform Clustering support in V9.1.2.
Five years ago, a common pattern was a single machine containing a front-end web server, MQ, and back-end servers (connecting in bindings mode) that processed the requests and accessed a remote database. To handle more work, you increased the number of servers, and perhaps added more CPUs to the machine.
These days MQ runs in its own (virtual) machine, the front-end web server runs in its own (virtual) machine connected to MQ over a client interface, and the server application runs in its own (virtual) machine, also connected to MQ over a client interface and accessing a remote database.
To scale this, you add more MQ machines, or more server machines. In my view this solves some administration problems but introduces others; that, however, is not today's discussion.
Given this modern configuration, how do you start enough servers to manage the workload?
Consider the scenario where you have MACHINEMQ with the queue manager on it, and MACHINEA and MACHINEB with the server applications on them.
Having “smarts in the application”
- You want enough servers running, but not too many. (Too many can flood the downstream processes, for example cause contention in a database. Using MQ as a throttle can sometimes improve overall throughput).
- If a server thread is not doing any work, then shut it down
- If there is a backlog then start more instances of the server threads.
In the server application you might have logic like
MQINQ curdepth, ipprocs.
If (curdepth > X and the number of processes with the queue open for input (IPPROCS) < Y) then do_something to start another instance.
If the get-with-wait timed out and IPPROCS > 2, then return and free up the session.
For CICS on z/OS, it was easy; do_something was “EXEC CICS START TRAN…”
When running on Unix, the “do_something” is a bit harder.
My first thoughts were…
It is not easy to create new processes to run more work.
- You can use spawn to do this, but it is not very easy or elegant.
- I next thought the application instances could create a trigger message, so a trigger monitor could run and start more processes. This has problems:
- Unless you are really clever, the trigger monitor starts a process on its local machine. So running a trigger monitor on MACHINEA would create more processes on MACHINEA.
- This means you need a trigger monitor on MACHINEA and MACHINEB.
- When you put a trigger message, it may always go to MACHINEA, always go to MACHINEB, or go to either. This does not help if one machine is overloaded and receives all of the trigger messages.
- I thought you could have one process and lots of threads. I played with this, and found out enough to write another blog post. It was difficult to increase the number of threads dynamically. I found it easiest to pass in a value for the number of threads to the application, and not try to dynamically change the number of threads.
- The best “do_something” was to produce an event or alert and have automation start the applications. Automation should have access to other information, so you can have rules such as “Pick MACHINEA or MACHINEB which has the lowest CPU usage over the last 5 minutes – and start the application there”
And to make it more complex.
Today’s scenario is to have multiple queue manager machines, for availability and scalability, so now you have to worry about which queue manager to connect to, as well as processing the messages on the queue.
MQ 9.1.2 introduced Uniform Clustering which balances the number of client channel connections across queue manager servers, and can, under the covers, tell an application to connect to a different queue manager.
This should make the balancing simpler. Assuming the queue managers are doing equal amounts of work, you should get workload balancing.
Notes on setting up your server.
You need to be careful to define your CCDT with CLNTWGHT. If CLNTWGHT is 0, the first available queue manager in the list is always used, so all your connects would go to that queue manager. By making every CLNTWGHT greater than 0, connections are spread across the queue managers, weighted by the value, so you can bias which queue manager gets selected.
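As an illustration (the channel names, queue manager names, and connection names here are made up), the client-connection channel definitions behind the CCDT might look like this in MQSC, with a non-zero CLNTWGHT on each:

```
* Client-connection channels for the CCDT. CLNTWGHT > 0 on each
* channel so connections are spread across the queue managers;
* AFFINITY(NONE) so repeated connects do not stick to one entry.
DEFINE CHANNEL(TO.QM1) CHLTYPE(CLNTCONN) QMNAME(QM1) +
       CONNAME('machinemq1(1414)') CLNTWGHT(50) AFFINITY(NONE)
DEFINE CHANNEL(TO.QM2) CHLTYPE(CLNTCONN) QMNAME(QM2) +
       CONNAME('machinemq2(1414)') CLNTWGHT(50) AFFINITY(NONE)
```

Equal weights give an even spread; a larger weight on one channel biases connections towards that queue manager.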
Thanks to Morag for her help in developing this article.