Do not do unnatural things with clustering.

I’ll cover an interesting clustering scenario, and discuss how it could be improved, but first I’d like to mention my grandfather’s axe. I still have it. My father replaced the head, and I replaced the handle – but it is still my grandfather’s axe.

I was looking at a customer’s configuration, and was told “this is the original architecture”. Except they replaced this part with a cluster, and they restructured those applications to be in a different cluster, but it is still their original configuration, and of course the picture is ten times the size from when they started with MQ.


The simplified picture has a blue cluster and a yellow cluster, and the full repository acts for both clusters.

An application attached to QMA sent a message to QMB, using a clustered remote queue (QREMOTE) defined in the full repository (FR). This mapped to a clustered queue on QMB. So for the MQPUT, the message flowed to the full repository, was put to a clustered queue, and was then sent on to QMB, where it was processed.

This is not efficient as you get double puts and gets, and more opportunities for breakages. Yes, it is using clustering, but it is not a natural use of clustering.

It would make much more sense to put QMA and QMB in the same cluster and save a lot of CPU. This would also avoid a mess when trying to sort it out.
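A rough way to see the cost is to count queue operations along each route. The sketch below is a simplified model, not a measurement: it assumes every queue manager on the route costs one put (to a transmission queue, or to the target queue at the end) and one get (by a sender channel, or by the server application at the end). The function name `count_operations` is made up for illustration.

```python
def count_operations(route):
    """Count puts and gets for a message travelling along `route`.

    Simplified model (an assumption, not MQ internals): on each queue
    manager the message is put once and got once.
    """
    hops = len(route) - 1      # channel hops between queue managers
    puts = len(route)          # one put per queue manager on the route
    gets = len(route)          # one get per queue manager on the route
    return {"channel_hops": hops, "puts": puts, "gets": gets}


via_full_repository = count_operations(["QMA", "FR", "QMB"])
direct = count_operations(["QMA", "QMB"])

print("via FR:", via_full_repository)
print("direct:", direct)
```

In this model the route through the full repository doubles the channel hops and adds one extra put and get for every message.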

We had a discussion about the architecture and whether we could change it. The original architect retired 10 years ago, and the chart (singular) describing the architecture and the ideas behind it was lost when a laptop was returned and the hard drive was reformatted.

Quick summary of channels used in clustering

In a cluster there are three types of cluster channels:

  1. The cluster receiver – this is defined for a queue manager to provide a template for other queue managers to connect to it.
  2. The cluster sender – which connects to the full repository. You do not need to connect to all the full repositories as the definitions for the other full repositories will flow down.
  3. Automatically defined channels between two queue managers. For queue manager QMA to create a channel to QMB, it uses the cluster receiver channel defined on QMB and sent to the full repository.
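The third item can be pictured with a toy model. This is only a sketch of the idea that QMA derives its channel to QMB from the template QMB published; the classes, the channel name `TO.QMB` and the address are hypothetical, and none of this is MQ code.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterReceiver:
    """Template a queue manager publishes so others can reach it."""
    channel_name: str
    conname: str  # host(port) that other queue managers should connect to


@dataclass
class FullRepository:
    """Holds the cluster receiver definitions published to it."""
    receivers: dict = field(default_factory=dict)

    def publish(self, qmgr_name, receiver):
        self.receivers[qmgr_name] = receiver

    def lookup(self, qmgr_name):
        return self.receivers[qmgr_name]


def auto_define_sender(repository, target_qmgr):
    """Build an automatically defined sender from the target's template."""
    template = repository.lookup(target_qmgr)
    return {"channel": template.channel_name, "conname": template.conname}


fr = FullRepository()
# Hypothetical channel name and address, for illustration only.
fr.publish("QMB", ClusterReceiver("TO.QMB", "qmb.example.com(1414)"))

# QMA never defined a channel to QMB; it derives one from QMB's template.
sender_on_qma = auto_define_sender(fr, "QMB")
print(sender_on_qma)
```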

Are there any advantages in having the existing configuration?

I cannot think of a very good reason for this. I can think of reasons for which this strange configuration is valid, but they still feel wrong!

  1. Before clustering, some people had bad experiences of connecting a queue manager to all other queue managers, and the nightmare of managing these connections. Clustering solved the definitional problem. You have only to define two channels per queue manager, not hundreds or thousands. When clustering is used, channels between queue managers will be created dynamically and started as needed. You may get hundreds of channels started, but you do not have to define them. With the overlapping clusters in the picture, you limit the number of channels being started, and force a “hub and spoke” rather than the direct link you get with clustering. With a good automation package, you should be able to automate the management of the channels, and collect performance data etc.
  2. Number of connections. If you have a large MQ estate, for example 100 queue managers at the back end, you may have more than 100 cluster channels active. This should not be a problem; you may just have to configure your queue managers to handle more connections. (If there were 10,000 connections we would have a different discussion.)
  3. Capacity. QMA and QMB may not have the capacity to store a large number of messages, so using the full repository with space for deep queues may be a solution. (But remember a good queue is an almost empty queue.)
  4. Security. By having a channel exit on the full repository, you can check the data and authorization. If the control data is on the full repository system, it may be hard to put the exits on the other queue managers. I think you should review the architecture, and look at caching security data on the queue manager machines.
  5. Message logging. This could be duplicating a message, or updating a database with message content. It feels like the architecture is wrong. I think a better architecture would be to do two puts in the original application, or an MQPUT and a remote DB2 insert, but this could affect performance.
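To put rough numbers on the first two items, here is a small back-of-the-envelope sketch. The formulas are assumptions for illustration: a hand-built full mesh needs one sender and one receiver definition per pair of queue managers in each direction, clustering needs the two definitions per queue manager mentioned above, and the "active channel" figures are upper bounds for one-way traffic from front-end to back-end queue managers.

```python
def manual_definitions(n):
    """Channel definitions for a hand-built full mesh of n queue managers.

    Assumes each queue manager needs one sender and one receiver
    definition for every other queue manager.
    """
    return n * 2 * (n - 1)


def cluster_definitions(n):
    """Definitions with clustering: two per queue manager."""
    return 2 * n


def max_active_direct(n_front, n_back):
    """Upper bound on active sender channels with direct cluster links."""
    return n_front * n_back


def max_active_hub(n_front, n_back):
    """Upper bound when everything is forced through one hub."""
    return n_front + n_back


print("manual definitions for 100 queue managers:", manual_definitions(100))
print("cluster definitions for 100 queue managers:", cluster_definitions(100))
print("active, direct, 10 front / 100 back:", max_active_direct(10, 100))
print("active, via hub, 10 front / 100 back:", max_active_hub(10, 100))
```

The hub reduces the number of running channels, but clustering has already removed the definition burden, which was the painful part.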

How do we fix this?

In principle you just move QMB into the blue cluster, and just remove the QREMOTE definitions from the full repository.

The word that jumps out at me is “just”.

You can change the channel and queue on QMB to use a namelist of both clusters. That is easy; it is the next steps that could cause a hiccup.

With asynchronous processing, events can happen at different times. You define a queue over here, and delete a queue from over there, and on a queue manager far, far away these operations get done in the reverse order.

Suppose the clustered remote queue on the full repository is called SERVER_on_FR, and it points to the queue SERVER_on_QMB, a clustered queue on QMB.
The application attached to QMA does an MQOPEN of SERVER_on_FR, and due to the magic of clustering it all works as expected: a message arrives on the SERVER_on_QMB queue.

If you define a clustered QR(SERVER_on_FR) on QMB, pointing to SERVER_on_QMB, there will now be two queues called SERVER_on_FR in the cluster. Both queues may be used, depending on the configuration.

You cannot just delete the QR definition SERVER_on_FR on the FR, as there may be messages on cluster transmit queues heading for this queue, and some queue managers may not have seen the updates about the new queue definition. Receiver channels on FR may try putting to the queue, only to find it gone. (If you get confused, as I did, try reading the section again.)
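The ordering problem can be shown with a toy model of what one distant queue manager believes about where SERVER_on_FR lives. This is only a sketch of the reasoning, not of how MQ distributes repository updates; the update tuples and the `replay` helper are invented for the example.

```python
from itertools import permutations

# The two changes made by the administrator, as seen by a distant queue manager.
ADD_ON_QMB = ("add", "QMB")       # new SERVER_on_FR definition on QMB
REMOVE_ON_FR = ("remove", "FR")   # old SERVER_on_FR definition deleted on FR


def replay(updates, initial_hosts=("FR",)):
    """Apply updates in order; return the known hosts after each step."""
    hosts = set(initial_hosts)
    history = []
    for action, qmgr in updates:
        if action == "add":
            hosts.add(qmgr)
        else:
            hosts.discard(qmgr)
        history.append(frozenset(hosts))
    return history


# For each arrival order, was there a moment with no known destination?
gaps = {}
for order in permutations([ADD_ON_QMB, REMOVE_ON_FR]):
    history = replay(order)
    gaps[order] = any(len(hosts) == 0 for hosts in history)
    print(order, "-> destination unknown at some point:", gaps[order])
```

If the delete arrives first, there is a window in which that queue manager knows of no instance of the queue at all, which is why the old definition has to be withdrawn gently rather than deleted.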

You need to alter the queue on FR to make it cluster(), that is, remove it from all clusters. Over time (minutes to days) this will propagate to all queue managers, and so queue managers will stop using it. Messages on the cluster transmit queues should by then all have been processed.

After a suitable interval you can then delete the QR from the FR system.
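Putting the steps in order, the change might look something like the plan generated below. Treat it as a sketch under assumptions: the cluster names `BLUE` and `YELLOW`, the namelist `BOTH.CLUSTERS` and the channel name `TO.QMB` are invented, putting the new QR in the blue cluster only is a guess at what this configuration needs, and the MQSC text should be checked against your own definitions before anything is run.

```python
def migration_plan(blue="BLUE", yellow="YELLOW",
                   namelist="BOTH.CLUSTERS", clusrcvr="TO.QMB"):
    """Return the steps as (where, action) pairs, in the order to run them.

    Cluster, namelist and channel names are assumed for illustration.
    Mixed-case object names are quoted because MQSC folds unquoted
    names to upper case.
    """
    return [
        ("QMB", f"DEFINE NAMELIST({namelist}) NAMES({blue}, {yellow})"),
        ("QMB", f"ALTER CHANNEL({clusrcvr}) CHLTYPE(CLUSRCVR) "
                f"CLUSTER(' ') CLUSNL({namelist})"),
        ("QMB", f"ALTER QLOCAL('SERVER_on_QMB') CLUSTER(' ') CLUSNL({namelist})"),
        ("QMB", f"DEFINE QREMOTE('SERVER_on_FR') RNAME('SERVER_on_QMB') "
                f"RQMNAME(QMB) CLUSTER({blue})"),
        # Withdraw the old definition from all clusters before deleting it.
        ("FR", "ALTER QREMOTE('SERVER_on_FR') CLUSTER(' ') CLUSNL(' ')"),
        ("operator", "wait until the change has reached every queue manager "
                     "and the cluster transmit queues have drained"),
        ("FR", "DELETE QREMOTE('SERVER_on_FR')"),
    ]


plan = migration_plan()
for number, (where, action) in enumerate(plan, start=1):
    print(f"{number}. [{where}] {action}")
```

The point of generating the list rather than typing commands ad hoc is that the order is the fragile part: the new definition goes in first, the old one is withdrawn next, and the delete comes last.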

Your troubles are not over, as now you have a queue called “SERVER_on_FR” on queue managers other than FR. On QMA you could create a QR called “SERVER_on_FR” which points to SERVER_on_QMB, or (better) change the application to use queue SERVER_on_QMB, or even better just use the queue name SERVER! But there is a good chance you’ve lost the source for this application.

If you now scale this up to an enterprise you see what a mess this now is.

As a result of doing unnatural things with clustering, you have extra puts and gets, indirect channels, and a mess of queue names – it is much easier to “Keep It Simple Stupid”, and let clustering do what it was designed to do.

2 thoughts on “Do not do unnatural things with clustering.”

  1. Hi Colin. There is a case for this MQ cluster setup, and we use it. QMA and QMB (and other qmgrs in the clusters) are in different network domains in our internal organization. The servers are physically separated and firewalled, for security and application containment reasons. The FR is on a dedicated server that has appropriate firewall rules to allow MQ channels to start to/from QMA and QMB. The FR has qalias objects that allow messages to cross over between the clusters, without the source knowing the destination qmgr or ultimate destination queue name. The FR doesn’t need to do the cross over itself, it could be done by another qmgr on the server that is in both clusters. Cheers, Glenn
