Are your client connections not configured for optimum high availability?

I would expect the answer for most people is – no, they are not configured for optimum high availability.

In researching my previous blog post on which queue manager to connect to, I found that the default options for CLNTWGHT and AFFINITY may not be the best. They were set up to provide consistency with a previous release. The documentation was missing the words “once you have migrated then consider changing these options”. As they are hard to understand, I expect most people have not changed the options.

The defaults are

  • CLNTWGHT(0)
  • AFFINITY(PREFERRED)

I did some testing and found some bits were good, predictable, and gave me high availability; other bits did not.

My recommendations for high availability and consistency are the complete opposite of the defaults:

  • use CLNTWGHT values > 0, with a value which would give you the appropriate load balancing
  • use AFFINITY(NONE)

There are several combinations of settings:

  • all clients use AFFINITY(NONE) – was reliable
    • CLNTWGHT > 0 for all channels: this was reliable, and gave good load balancing
    • CLNTWGHT >= 0 (some channels with 0): this was reliable, but did not give good load balancing
  • all clients use AFFINITY(PREFERRED) – was consistent, but did not behave as I read the documentation
  • a mixture of clients with AFFINITY PREFERRED and NONE. This gave me weird, inconsistent behavior.

So as I said above my recommendations for high availability are

  • use CLNTWGHT values > 0, and with a value which would give you the appropriate load balancing.
  • use AFFINITY(NONE).

My set up

I had three queue managers set up on my machine: QMA, QMB, QMC.
I used channels
QMACLIENT for queue manager QMA,
QMBCLIENT for queue manager QMB,
QMCCLIENT for queue manager QMC.

The channels all had QMNAME(GROUPX)

A CCDT was used

A batch C program does (MQCONN to QMNAME *GROUPX, MQINQ for queue manager name, MQDISC) repeatedly.
After 100 iterations it prints out how many times each queue manager was used.

AFFINITY(NONE) and CLNTWGHT > 0 for all channels

  • QMACLIENT CLNTWGHT(50), chosen 50 % on average
  • QMBCLIENT CLNTWGHT(20), chosen 20 % on average
  • QMCCLIENT CLNTWGHT(30), chosen 30 % on average.

On average, the proportion of times a queue manager was used was the same as channel_weight/sum(weights).
For QMACLIENT this was 50 / (50 + 20 + 30) = 50 / 100 = 50%. This matches the 50% seen above.
I shut down queue manager QMC, and reran the test and got

  • QMACLIENT CLNTWGHT(50), chosen 71 % on average
  • QMBCLIENT CLNTWGHT(20), chosen 28 % on average
  • QMCCLIENT CLNTWGHT(30), not selected.

For QMACLIENT the weighting is 50 / (50 + 20) = 71%. So this works as expected.
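
The arithmetic above can be written as a small helper. This is only a sketch of the calculation, not anything MQ provides; the function name expected_share_percent and the availability flags are made up for illustration.

```c
#include <stddef.h>

/* Expected share (in percent) of connections for channel `index`,
 * given the CLNTWGHT of every channel and a flag saying whether the
 * queue manager behind each channel is available.
 * Returns 0.0 if that channel is unavailable or nothing is available. */
double expected_share_percent(const int *weights, const int *available,
                              size_t count, size_t index)
{
    long total = 0;
    size_t i;

    /* Only channels whose queue manager is up take part in the sum. */
    for (i = 0; i < count; i++) {
        if (available[i]) {
            total += weights[i];
        }
    }
    if (total == 0 || !available[index]) {
        return 0.0;
    }
    return 100.0 * weights[index] / (double)total;
}
```

With weights {50, 20, 30} and all three queue managers up, QMACLIENT gets 50%. Marking QMC as unavailable gives 50 / 70, about 71%, for QMACLIENT and 20 / 70, about 28%, for QMBCLIENT.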

AFFINITY(NONE) for all channels and CLNTWGHT >= 0

The documentation in the knowledge centre says any channels with CLNTWGHT=0 are considered first, and they are processed in alphabetical order. If none of these channels is available then the channel is selected as in the CLNTWGHT(>0) case above.

  • QMACLIENT CLNTWGHT(50), not chosen
  • QMBCLIENT CLNTWGHT(0), chosen 100 % of the time
  • QMCCLIENT CLNTWGHT(30), not chosen

This shows that the CLNTWGHT(0) was the only one selected.
When CLNTWGHT for QMACLIENT was set to 0, (so both QMACLIENT and QMBCLIENT had CLNTWGHT(0) ), all the connections went to QMA – as expected, because of the alphabetical order.

If QMA was shut down, all the connections went to QMB. Again expected behavior.

With

  • QMACLIENT CLNTWGHT(0)
  • QMBCLIENT CLNTWGHT(20)
  • QMCCLIENT CLNTWGHT(30)

and QMA shut down, the connections were in the ratio of 20:30 as expected.

Summary: If you want all connections (from all machines) to go to the same queue manager, then you can do this by setting CLNTWGHT to 0.

I do not think this is a good idea, and suggest setting all CLNTWGHT values > 0 to give workload balancing.

Using AFFINITY(PREFERRED)

The documentation for AFFINITY(PREFERRED) is not clear.
For AFFINITY(NONE) it takes the list of channels with CLNTWGHT(0), sorts the list by channel name, and then goes through this list till it can successfully connect. If this fails, then it picks a channel at random, weighted by the CLNTWGHT values.
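
The AFFINITY(NONE) selection just described can be modelled in a few lines. This is a sketch of the behavior as observed in these tests, not MQ's actual implementation; channel_t, pick_channel and the draw parameter (standing in for the random number) are names invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* channel name, for example "QMACLIENT" */
    int weight;         /* CLNTWGHT value */
    int available;      /* 1 if the queue manager can be reached */
} channel_t;

/* Model of channel selection with AFFINITY(NONE).
 * `draw` stands in for a random number; it is reduced modulo the
 * sum of the usable weights. Returns the index of the chosen
 * channel, or -1 if no channel is usable. */
int pick_channel(const channel_t *ch, size_t count, unsigned long draw)
{
    int first_zero = -1;
    unsigned long total = 0;
    size_t i;

    /* Step 1: available CLNTWGHT(0) channels win, lowest name first. */
    for (i = 0; i < count; i++) {
        if (ch[i].available && ch[i].weight == 0 &&
            (first_zero < 0 ||
             strcmp(ch[i].name, ch[first_zero].name) < 0)) {
            first_zero = (int)i;
        }
    }
    if (first_zero >= 0) {
        return first_zero;
    }

    /* Step 2: weighted choice among available channels with weight > 0. */
    for (i = 0; i < count; i++) {
        if (ch[i].available && ch[i].weight > 0) {
            total += (unsigned long)ch[i].weight;
        }
    }
    if (total == 0) {
        return -1;
    }
    draw %= total;
    for (i = 0; i < count; i++) {
        if (ch[i].available && ch[i].weight > 0) {
            if (draw < (unsigned long)ch[i].weight) {
                return (int)i;
            }
            draw -= (unsigned long)ch[i].weight;
        }
    }
    return -1;
}
```

Feeding it uniformly distributed draws should reproduce the ratios seen in the tests: weights 50, 20 and 30 split the range 0 to 99 into 50, 20 and 30 values respectively.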

My interpretation of how PREFERRED works is

  • it builds a list of CLNTWGHT(0) sorted alphabetically,
  • then creates another list of the other channels selected at random with a bias of the CLNTWGHT and keeps that list for the duration of the program (or until the CCDT is changed).
  • Any threads within the process will use the same list.
  • For an application doing MQCONN, MQDISC and MQCONN it will access the same list.
  • With the client channels defined above, different machines or different application instances may each get a different list when CLNTWGHT > 0.

For example on different machines, or different application instances the lists may be:

  • QMACLIENT, QMBCLIENT, QMCCLIENT
  • QMBCLIENT, QMACLIENT, QMCCLIENT
  • QMACLIENT, QMBCLIENT, QMCCLIENT (same as the first one)
  • QMCCLIENT, QMACLIENT, QMBCLIENT

I’ll ignore the CLNTWGHT(0) as these would be at the front of the list in alphabetical order.

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(PREFERRED)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(PREFERRED)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(PREFERRED)

According to the documentation, if I run my program I would expect 100% of the connections to one queue manager. This is what happened.

If I ran the job many times, I would expect the queue managers to be selected according to the CLNTWGHT.

I ran my program 10 times, in different terminal windows, and each time QMC got 100% of the connections. This was not what was expected!

I changed the QMBCLIENT CLNTWGHT from 20 to 10 and reran my program, and now all of my connections went to QMA!

With the QMBCLIENT CLNTWGHT 18 all the connections went to QMA, with QMBCLIENT CLNTWGHT 19 all the connections went to QMC.

This was totally unexpected behavior and not consistent with the documentation.

I would not use AFFINITY(PREFERRED) because it is unreliable and unpredictable. If you want to connect to the same queue manager specify the channel name in the MQCD and use mqcno.Options = MQCNO_USE_CD_SELECTION.

Having a mix of AFFINITY PREFERRED and NONE

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(PREFERRED)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(NONE)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(NONE)

All of the connections went to QMA.

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(NONE)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(PREFERRED)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(NONE)

there was a spread of connections as if the PREFERRED was ignored.

When I tried to validate the results – I got different results. (It may be something to do with the first or last object altered or defined).

Summary: With a mix of AFFINITY values NONE and PREFERRED, it is hard to predict what will happen, so this situation should be avoided.

How do I know which queue manager to connect to?

Question: How difficult can it be to decide which queue manager to connect to?

Answer: For the easy, it is easy, for the hard it is hard.
I would not be surprised to find that in many applications the MQCONN(x) are not coded properly!


That is a typical question and answer from me – but let me go into more detail so you understand what I am talking about.
If you have only one queue manager then it is easy to know which queue manager to connect to – it is the one and only queue manager.
If you have more than one – it gets more complex. If your application has just committed a funds transfer request and the connection is broken:

  • you may just decide to connect to any available queue manager, and ignore a possibly partial funds transfer request
  • or you might wait for a period trying to connect to the same queue manager, and then give up, and connect to another, and later worry about any orphaned message on the queue.

You now see why the easy scenario is easy, and for the hard one, you need to do some hard thinking and some programming to get the optimum response.

There is an additional complexity that when you connect to the same instance – it may be a highly available queue manager, and it may have restarted somewhere else. For the purposes of this blog post I’ll ignore this, and treat it as the same logical queue manager.

I had lots of help from Morag who helped me understand this topic, and gave me the sample code.

You have only one queue manager.

This is easy, you issue an MQCONN for the queue manager. If the connect is not successful, the program waits for a while and then retries. See – I said it was easy.

You have more than one queue manager, and getting a reply back is not important.

For example, you are using non persistent messages.

Your application can decide which queue manager it tries to connect to, or you can exploit queue manager groups in the CCDT.

On queue manager QMA you can define client channels for it and also for queue manager QMB

DEF CHL(QMA) CHLTYPE(CLNTCONN) QMNAME(GROUPX) 
CONNAME(LINUXA) CLNTWGHT(50)…
DEF CHL(QMB) CHLTYPE(CLNTCONN) QMNAME(GROUPX)
CONNAME(LINUXB) CLNTWGHT(50)…

DEF CHL(QMA) CHLTYPE(SVRCONN) ...

On Unix these are automatically put into the /var/mqm/qmgrs/../@ipcc/AMQCLCHL.TAB file. This is a binary file, and can be FTPed to the client machines that need it.

You can use the environment variable MQCHLLIB to specify the directory where the table is located, and MQCHLTAB to specify the file name of the table (it defaults to AMQCLCHL.TAB).

FTP the files in binary to your client machine, for example into ~/mq/.
I did
export MQCHLLIB=/home/colinpaice/mq
export MQCHLTAB=AMQCLCHL.TAB

I then used the command
set | grep MQ
to make sure those variables are set, and did not have MQSERVER set.
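
The same check can be done from inside the application at start-up. The following is a minimal sketch, assuming you want to refuse to start unless MQCHLLIB and MQCHLTAB are set and MQSERVER is not; the function names are invented for this example and are not part of MQ.

```c
#include <stdlib.h>

/* Returns 1 when the values look right for a CCDT lookup:
 * MQCHLLIB and MQCHLTAB are set and non-empty, and MQSERVER is
 * not set at all. Pass NULL for a variable that is not set. */
int ccdt_env_ok(const char *chllib, const char *chltab,
                const char *mqserver)
{
    if (chllib == NULL || chllib[0] == '\0') return 0;
    if (chltab == NULL || chltab[0] == '\0') return 0;
    if (mqserver != NULL) return 0;
    return 1;
}

/* Convenience wrapper which reads the real environment. */
int ccdt_env_ok_now(void)
{
    return ccdt_env_ok(getenv("MQCHLLIB"),
                       getenv("MQCHLTAB"),
                       getenv("MQSERVER"));
}
```

Calling ccdt_env_ok_now() before the MQCONN gives a clearer message than waiting for the connect to fail. Strictly, MQCHLTAB has a default, so requiring it is stricter than MQ itself; it simply mirrors the manual check above.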

Sample MQCONN code (from Morag)….

#include <cmqc.h>          // MQ C definitions

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
char   *QMName = "*GROUPX";

MQCONN(QMName,
       &hConn,
       &CompCode,
       &Reason);
// and MQ will pick one of the two entries in the CCDT.

The application connected with queue manager name *GROUPX . Under the covers the MQ code found the channel connections with QMNAME of GROUPX and picked one to use. The “*” says do not check the name of the queue manager when you actually do the connect. If you omit the “*” you will get return code MQRC_Q_MGR_NAME_ERROR 2058 (080A in hex) because “GROUPX” did not match the queue manager name of “QMA” or “QMB”. I stopped QMA, and reconnected the application, and it connected to QMB as expected.

Common user error: When I tried connecting with queue manager name QMA, this failed with MQRC_Q_MGR_NAME_ERROR because there were no channel definitions with QMNAME value QMA. This was obvious once I had taken a trace, looked at the trace, had a cup of tea and a biscuit, and remembered I had fallen over this before. So this may be the first thing to check if you get this return code.

Using channels defined with the same QMNAME, if your connection breaks, you reconnect with the same queue manager name “*GROUPX” and you connect to a queue manager if there is one available. You can specify extra options to bias which one gets selected. See CLNTWGHT and AFFINITY. See the bottom of this blog entry.

You can use MQINQ to get back the name of the queue manager you are actually connected to (so you can put it in your error messages).

// Declarations (not shown in the original fragment; Hcon comes from the MQCONN)
MQOD   od = {MQOD_DEFAULT};            // object descriptor
MQHOBJ Hobj;                           // object handle
MQLONG OpenCode, CompCode, Reason;
MQLONG Selector;
char   ActiveQMName[MQ_Q_MGR_NAME_LENGTH + 1] = ""; // +1 keeps a trailing null for printf

//   Open the queue manager object to find out its name
od.ObjectType = MQOT_Q_MGR;            // open the queue manager object
MQOPEN(Hcon,                           // connection handle
       &od,                            // object descriptor for queue manager
       MQOO_INQUIRE +                  // open it for inquire
       MQOO_FAIL_IF_QUIESCING,         // but not if MQM stopping
       &Hobj,                          // returned object handle
       &OpenCode,                      // MQOPEN completion code
       &Reason);                       // reason code
// report reason, if any
if (Reason != MQRC_NONE)
{
  printf("MQOPEN of qm object rc %d\n", Reason);
  .....
}
// Now do the actual INQ
Selector = MQCA_Q_MGR_NAME;
MQINQ(Hcon,                            // connection handle
      Hobj,                            // object handle for q manager
      1,                               // inquire only one selector
      &Selector,                       // the selector to inquire
      0,                               // no integer attributes are needed
      NULL,                            // so no integer buffer
      MQ_Q_MGR_NAME_LENGTH,            // inquiring a q manager name
      ActiveQMName,                    // the buffer for the name
      &CompCode,                       // MQINQ completion code
      &Reason);                        // reason code

printf("Queue manager in use %s\n", ActiveQMName);

You have more than one queue manager, and getting a reply back >is< important

Your application should have some logic to handle the case when your queue manager is running normally, there is a problem in the back end, and so you do not get your reply message within the expected time. Typical logic for when the MQGET times out is:

  • Produce an event saying “response not received”, to alert automation that there may be a problem somewhere in the back end
  • Produce an event saying “there is a piece of work that needs special processing – to manually redo or undo – update number…..”.
    • At a later time a program can get the orphaned message and resolve it.
    • You do not want an end user getting a message “The status of the funds transfer request to Colin Paice is … unknown” because the reply message is sitting unprocessed on the queue.
    • Note: putting a message to a queue may not be possible as the application may not be connected to a queue manager.

When deciding to connect to any available queue manager, or connect to a specific queue manager, there are two key options in mqcno.Options field:

  • MQCNO_CD_FOR_OUTPUT_ONLY. This means, do not use any data in the passed in MQCD <as the field description says – use it for output only>, but pick a valid and available channel from the CCDT, and return the details.
  • MQCNO_USE_CD_SELECTION. This means, use the information in the MQCD to connect to the queue manager.

Sample code (from Morag) showing MQCONNX

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
MQCNO   cno = {MQCNO_DEFAULT};
MQCD    cd = {MQCD_CLIENT_CONN_DEFAULT};
char   *QMName = "*GROUPX";

cno.Version = MQCNO_VERSION_2;
cno.ClientConnPtr = &cd;

// Main connection - choose freely from the CCDT
cno.Options = MQCNO_CD_FOR_OUTPUT_ONLY;
MQCONNX(QMName,
        &cno,
        &hConn,
        &CompCode,
        &Reason);
: :
// Oops, I really need to go back to the same connection to continue.

MQDISC(...); // without this you get queue manager name error

// Using same MQCNO as earlier, it already has MQCD pointer set.

cno.Options = MQCNO_USE_CD_SELECTION;
MQCONNX(QMName,
        &cno,
        &hConn,
        &CompCode,
        &Reason);

Let me dig into a typical scenario to show the complexity

  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  • MQCONN to any queue manager
  • MQPUT1 of a persistent message within syncpoint
  • set mqcno.Options = MQCNO_USE_CD_SELECTION, as you now want the application to connect to the same queue manager if there is a problem
  • MQCMIT. After this you want to connect to the specific queue manager you were using
  • MQGET with WAIT
  • MQCMIT
  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY. Because the application is not in a business unit of work it can connect to any queue manager.

The tricky bit is in the MQGET with WAIT. If your queue manager needs to be restarted you need to know how long this is likely to take. It may be 5 seconds, it may be 1 minute depending on the amount of work that needs to be recovered. (So make sure you know what this time is.)

Let’s say it typically takes 5 seconds between failure of the queue manager the application is connected to, and restart complete. You need some logic like:

mqget with wait..
problem....
failure_time = time_now()
waitfor = 5 seconds
mqcno.Options = MQCNO_USE_CD_SELECTION
loop:
  MQCONN to specific queue manager
  If this worked goto MQCONN_worked_OK
  try_time = time_now()
  If try_time - failure_time > waitfor + 1 second goto problem
  sleep 1 second
  goto loop
MQCONN_worked_OK:
  MQOPEN the reply to queue
  Reissue the MQGET with wait

problem:
  report problem to automation
  report special_processing_needed .... msgid...
  mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  Go to start of program and connect to any queue manager
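
The retry loop in the pseudocode above can be sketched in C without any MQ calls, by passing in the connect attempt as a callback. Everything here is illustrative: reconnect_same_qmgr, connect_fn, pause_fn, fake_qmgr, fake_connect and no_pause are invented names, and elapsed time is approximated by counting the one second pauses rather than reading a clock. In a real program the callback would issue the MQCONNX with MQCNO_USE_CD_SELECTION.

```c
typedef int (*connect_fn)(void *ctx);         /* 1 = connected, 0 = failed */
typedef void (*pause_fn)(unsigned seconds);   /* for example, wraps sleep() */

/* Keep trying to reconnect to the same queue manager for roughly
 * wait_seconds plus one second of grace, pausing one second between
 * attempts. Returns 1 once connected, 0 when it is time to give up
 * and go down the "problem" path. */
int reconnect_same_qmgr(connect_fn try_connect, void *ctx,
                        unsigned wait_seconds, pause_fn pause)
{
    unsigned elapsed = 0;   /* seconds spent pausing so far */

    for (;;) {
        if (try_connect(ctx)) {
            return 1;
        }
        if (elapsed > wait_seconds + 1) {
            return 0;
        }
        pause(1);
        elapsed += 1;
    }
}

/* Stand-ins for trying the loop without a queue manager:
 * the fake connect succeeds on call number succeed_on (0 = never). */
typedef struct { int calls; int succeed_on; } fake_qmgr;

int fake_connect(void *ctx)
{
    fake_qmgr *q = (fake_qmgr *)ctx;
    q->calls++;
    return q->succeed_on > 0 && q->calls >= q->succeed_on;
}

void no_pause(unsigned seconds) { (void)seconds; }
```

Passing the pause as a callback keeps the loop testable; production code would pass a function that really sleeps, and on a 0 return would report the problem and reconnect with MQCNO_CD_FOR_OUTPUT_ONLY.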

If you thought the discussion above was complex, it gets worse!

I had a long think about where to put the set mqcno.Options = MQCNO_USE_CD_SELECTION.  My first thoughts were to put it after the first MQCMIT, but this may be wrong.

With the logic

MQCONN
MQPUT1
MQCMIT

If the MQCMIT fails, it could have failed on the way to the queue manager, so the commit request did not actually get to the queue manager and the work was rolled back; or the commit could have worked, but the response did not get to your application.

The application should reconnect to the same queue manager and issue the MQGET WAIT. If the message arrives then the commit worked. If the MQGET times out, treat this as the MQGET WAIT timed out case (see above), and produce alerts. This is why I decided to put the set mqcno.Options = MQCNO_USE_CD_SELECTION before the commit. You could just as easily have had logic which checks the return code of the MQCMIT and then sets it.

A bit more detail on what is going on.

You can treat the MQCD as a black box object which you do not change, nor need to look into. I found it useful to see inside it (so I could report problems with the channel name etc). The example below shows the fields displayed as a problem is introduced.

Before MQCONNX
set MQCNO_CD_FOR_OUTPUT_ONLY
pMQCD->ChannelName is '' - this is ignored
pMQCD->QMgrName is '' - this is ignored
QMName is '*GROUPX'. -this is needed
==MQCONNX had return code 0 0
pMQCD->ChannelName is 'QMBCLIENT'.
pMQCD->QMgrName is '';
QMName is '*GROUPX'.
MQINQ QMGR queue manager name gave QMB
Sleep
during the sleep endmqm -i QMB and strmqm QMB
After sleep
MQOPEN of queue manager object ended with reason code MQRC_CONNECTION_BROKEN = 2009.
Issue MQDISC, this ended with reason code 2009

set MQCNO_USE_CD_SELECTION
pMQCD->ChannelName is 'QMBCLIENT' - this is needed
pMQCD->QMgrName is ''
QMName is '*GROUPX'.
MQCONNX return code 0 0
MQINQ queue manager name is QMB

Anything else on clients?

It is good practice to periodically have the clients disconnect and reconnect to do work load balancing. For example:

You have two queue managers QMA and QMB. On Monday morning between 0800 and 1000 QMA is shut down for essential maintenance. All the clients connect to QMB. QMA is restarted at 1000 – but does no work, because all the clients are connected to QMB. If your clients disconnect and reconnect then over time some will connect to QMA.

It is a good idea to have a spread of times before they disconnect, so if 100 clients connected at 0900, they disconnect and reconnect between 2200 and 0300, to avoid all 100 disconnecting and reconnecting at the same time.
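
One way to get such a spread is for each client to pick its own connection lifetime at connect time. A minimal sketch, assuming the caller supplies a random number; connection_lifetime is a made-up helper, not an MQ function.

```c
/* Pick how long this connection should live, in seconds.
 * The result is at least min_seconds and less than max_seconds.
 * `draw` is any random number, for example the result of rand().
 * If the window is empty, min_seconds is returned. */
unsigned connection_lifetime(unsigned min_seconds, unsigned max_seconds,
                             unsigned draw)
{
    if (max_seconds <= min_seconds) {
        return min_seconds;
    }
    return min_seconds + draw % (max_seconds - min_seconds);
}
```

For the example above, a client connecting at 0900 would use a window of 13 to 18 hours, that is connection_lifetime(46800, 64800, rand()), and disconnect once that many seconds have passed and it is between units of work.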

To get the spread of connections to the various queue managers, you need to use CLNTWGHT with a non zero value. If you omit CLNTWGHT, or specify a value of 0, then the channel chosen is the first alphabetically, in my case they would all go to QMA, and not to QMB.

I feel there is enough material on this for another blog post.

Not for humans but for search engines

MQRC_HOST_NOT_AVAILABLE 2538 (09EA) (RC2538)

  • You have specified a channel in MQCONNX and this is not in the CCDT. For example, if you have a channel called QMACLIENT and use “QM” or “QM*”, both will give MQRC_HOST_NOT_AVAILABLE.
  • You had a network problem, for example the application gets MQRC_CONNECTION_BROKEN. If the next MQ verb the application issues is MQCONN or MQCONNX this will fail with MQRC_HOST_NOT_AVAILABLE. You need to issue MQDISC, or retry the MQCONN(X) a second time.
  • You specified a connection address like 127.0.0.1:1414 when it was expecting 127.0.0.1(1414).
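
If connection addresses come from configuration in host:port form, they can be rewritten before being used as a connection name. A minimal sketch, assuming an IPv4 address or host name (anything with more than one colon, such as an IPv6 address, is passed through untouched); to_conname is an invented helper.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite "host:port" as "host(port)".
 * Input without exactly one colon, or which already contains '(',
 * is copied unchanged.
 * Returns 0 on success, -1 if the output buffer is too small. */
int to_conname(const char *address, char *out, size_t out_size)
{
    const char *colon = strrchr(address, ':');
    int written;

    if (colon == NULL || colon == address || colon[1] == '\0' ||
        strchr(address, ':') != colon || strchr(address, '(') != NULL) {
        written = snprintf(out, out_size, "%s", address);
    } else {
        written = snprintf(out, out_size, "%.*s(%s)",
                           (int)(colon - address), address, colon + 1);
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

So 127.0.0.1:1414 becomes 127.0.0.1(1414), while 127.0.0.1(1414) and LINUXA are left alone.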

MQRC_UNKNOWN_OBJECT_QMGR: 2086 (0826) (RC2086) with a client application

This can be caused when using a client connection and specifying a queue manager name of the format “*name” (for availability) . The application takes this queue manager name, and uses it in the MQOD.
If the first character of the Queue Manager Name is “*” then MQINQ should be used to retrieve the actual queue manager name, or do not use the “*name”.

MQRC_NOT_AUTHORIZED: 2035 (07F3) (RC2035) with MQCONNX

Trying to use MQCONNX to connect to a queue manager. The info from the Knowledge centre and the AMQ message say a blank userid or password was given. I also found the following can cause the same return code

  • mqcno.SecurityParmsPtr = 0;
  • csp.CSPPasswordLength = 0;
  • csp.CSPUserIdLength = 0;
  • csp.CSPPasswordPtr= 0;
  • csp.CSPUserIdPtr = 0;
  • csp.AuthenticationType != MQCSP_AUTH_USER_ID_AND_PWD;

MQRC_ENVIRONMENT_ERROR: 2012 (07DC) (RC2012) with MQCONNX

Trying to use MQCONNX with MQCNO_RECONNECT_Q_MGR or MQCNO_RECONNECT:

  • Not using threaded application. My C program was built with -lmqic instead of -lmqic_r -lpthread
  • SHRCONV = 0 on the channel definitions

MQRC_Q_MGR_NAME_ERROR: 2058 (080A) (RC2058)

  • export MQCHLLIB not pointing to correct location
  • export MQCHLTAB pointing to the wrong name, or not set and AMQCLCHL.TAB not found in the location pointed to by MQCHLLIB
  • remember to update your .profile so this does not happen again
  • you are using a CCDT and passed in a QMNAME of XXXX, for all channels with QMNAME XXXX none could connect to the queue manager in the conname.
  • You think you were using a mqclient.ini file … but are now in a different directory
  • You are using the correct mqclient.ini file. It has a ChannelDefinitionFile=… file. This CCDT file is missing entries for the queue manager. Use the runmqsc command DIS CHL(*) WHERE(CHLTYPE EQ SVRCONN) to display the valid channels on the server.
  • You tried to connect with the queue manager name, and need to connect to the QM group name.
  • You forgot the * in front of the queue manager name when using groups.

MQRC_KEY_REPOSITORY_ERROR: 2381 (094D) (RC2381)

  • MQSSLKEYR not set to the keystore path and file name
  • you specified …/key.kdb instead of …/key (without the .kdb)
  • remember to update your .profile so this does not happen again
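
If the key repository path is built by a program, the suffix can be removed before the value is used for MQSSLKEYR. A minimal sketch; strip_kdb_suffix is an invented helper, and the path in the note below is only an example.

```c
#include <string.h>

/* Remove a trailing ".kdb" from a key repository path, in place.
 * Returns 1 if the suffix was removed, 0 if the path was left alone. */
int strip_kdb_suffix(char *path)
{
    size_t len = strlen(path);

    if (len > 4 && strcmp(path + len - 4, ".kdb") == 0) {
        path[len - 4] = '\0';
        return 1;
    }
    return 0;
}
```

For example, a buffer holding /home/colinpaice/mq/key.kdb is changed to /home/colinpaice/mq/key, and a path that already has no suffix is not touched.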

 

MQRC_OPTIONS_ERROR:2046 (07FE) (RC2046)

During MQCONNX: mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY + MQCNO_USE_CD_SELECTION;

Solved it using

  • mqcno.Options = MQCNO_USE_CD_SELECTION
  • or
  • mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  • but not both

MQRC_CD_ERROR: 2277 (08E5) (RC2277)

I received a message in /var/mqm/errors/*.LOG saying

AMQ9498E: The MQCD structure supplied was not valid.

EXPLANATION: The value of the ‘ChannelName’ field has the value ‘0’. This value is invalid for the operation requested.

This is only partially true. If you specify mqcno.Options=MQCNO_CD_FOR_OUTPUT_ONLY, this returns the name of the channel to you. In this case specifying a blank channel name is valid. If this options value is not specified, then a channel name is required.

AMQ9202E: Remote host not available, retry later.

EXPLANATION:
The attempt to allocate a conversation using TCP/IP to host '' for channel
QMZZZ was not successful. However the error may be a transitory one and it may be possible to successfully allocate a TCP/IP conversation later.

This is not strictly accurate.

In my MQCONNX I specified a channel name of QMZZZ which did not exist in the Client Channel Definition Table (CCDT).

  • Check the channel name in ClientConn.ChannelName
  • Specify mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY so it ignores what is in the channel, and picks one from the entries in the CCDT.

AMQ9498E: The MQCD structure supplied was not valid.

EXPLANATION:
The value of the ‘ChannelName’ field has the value ‘0’. This value is invalid for the operation requested.
ACTION:
Change the parameter and retry the operation.

  • I got this when I specified a blank (not ‘0’ ) in the ChannelName field. If I specified mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY I did not get this error message, as the specified channelname value is ignored. I fixed the problem by changing the MQCNO, not the MQCD

PCF: MQRCCF_MSG_LENGTH_ERROR: 3016 (0BC8) (RC3016)

I got this when using PCF and got my lengths mixed up, for example StrucLength was longer than the structure.

PCF: MQRCCF_CFST_PARM_ID_ERROR: 3015 (0BC7) (RC3015)

I got this when I issued INQUIRE_Q and passed in a channel name.

PCF: MQRC_UNEXPECTED_ERROR: 2195 (0893) (RC2195)

I also got back section MQIACF_ERROR_IDENTIFIER (1013) with a value of 2031619. I can’t find what this means.
My problem was I had specified an optional section – but not a required one.

PCF:MQRCCF_CFST_PARM_ID_ERROR 3015 (0BC7) RC3015

I got this when using MQCMD_INQUIRE_Q, and I had specified MQCACF_Q_NAMES instead of MQCACF_Q_NAME ( no ‘s’).

MQWEB on z/OS

SRVE0279E: Error occured while processing global listeners for the application com.ibm.mq.rest:
java.lang.NoClassDefFoundError: com.ibm.mq.mft.rest.v1.resource.MFTCommonResource (initialization failure)

SRVE0279E: Error occured while processing global listeners for the application com.ibm.mq.console: java.lang.NoClassDefFoundError: com.ibm.mq.ui.api.ras.RasDescriptor (initialization failure)

SRVE0321E: The [SecurityFilter] filter did not load during start up.
SRVE0321E: The [JSONFilter] filter did not load during start up.
SRVE0321E : The [MQConsoleSecurityFilter] filter did not load during start up.

I got this because the MQ JMS libraries had not been installed. I had /colin3/mq923/web, but was missing /colin3/mq923/java.

Liberty

CWPKI0024E: The certificate alias BPECC specified by the property com.ibm.ssl.keyStoreServerAlias is not found in KeyStore ://IZUSVR/KEY

The RACF command RACDCERT LISTRING(KEY ) ID(IZUSVR) <check the case>

gives

Certificate Label Name           Cert Owner   USAGE    DEFAULT
-------------------------------- ------------ -------- -------
BPECC                            ID(START1)   PERSONAL YES

So it is in the key store.

You need to check there is a profile for the keyring and, as the requester needs access to the private key, that the requester has update access to it.

The userid issuing the command may not have access to the keyring. The private key was needed, so the userid needs update access to the keyring.

RLIST rdatalib START1.KEY.LST authuser
RDEFINE RDATALIB IZUSVR.KEY.LST UACC(NONE) 
PERMIT IZUSVR.KEY.LST CLASS(RDATALIB) ID(IZUSVR) ACCESS(UPDATE)
SETROPTS RACLIST(RDATALIB) REFRESH 
SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh

Note: The SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh is not strictly needed, but it is worth doing in case there were updates to the certificates and the refresh command was not done.

Other options

  • The certificate was not in the keyring
  • It was NOTRUST
  • It had expired
  • The CA for the certificate was not in the keyring

CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLHandshakeException: no cipher suites in common

This can be caused by

  • the requester not having access to the private key in the keyring.
  • no valid certificate in the ring.

z/OSMF

[ERROR   ] CWPKI0022E: SSL HANDSHAKE FAILURE:  … PKIX path building failed: com.ibm.security.cert.IBMCertPathBuilderException: unable to find valid certification path to requested target.

With message
The signer might need to be added to local trust store … , located in SSL configuration alias izuSSLConfig.  The extended error message from the SSL handshake exception is: PKIX path building failed: com.ibm.security.cert.IBMCertPathBuilderException: unable to find valid certification path to requested target.

Action: A client has sent a certificate and Liberty is trying to validate it

  1. The certificate from the client  is self signed and not in the keyring (or trust keyring if this is used)
  2. The CA or intermediate CAs are not in the keyring
  3. The CAs are in the keyring, but not trusted
  4. There are CAs with the same name, but not the same content in the keyring. Check dates and other attributes

It may be that the Server’s certificate is being used to validate, so check the certificate being used by z/OSMF or Liberty.

Firefox is getting Error code: SEC_ERROR_UNKNOWN_ISSUER

Check your certificates.   You need the CA and any intermediate CAs in the “Authorities” section of certificates.  They may need to be trusted.

They are not automatically imported when you import a certificate.

IZUG476E: The HTTP request to the secondary z/OSMF instance “S0W1” failed with error type “HttpConnectionFailed” and response code “0”

I got this when trying to submit a job in the workflow topic. You should get some FFDCs generated.

I had

  • java.net.UnknownHostException: s0w1.dal-ebis.ihost.com 
  • WorkflowException: IZUWF9999E: The request cannot be completed because an error occurred.  The following error data is returned: “IZUG476E:The HTTP request to the secondary z/OSMF instance “S0W1” failed with error type “HttpConnectionFailed” and response code “0” .”

Ping s0w1.dal-ebis.ihost.com and nslookup s0w1.dal-ebis.ihost.com did not return any data.

I edited /etc/hosts

10.1.1.2 S0W1.CANLAB.IBM.COM S0W1 
10.1.1.2 s0w1.dal-ebis.ihost.com

and tso ping s0w1.dal-ebis.ihost.com worked.

I had to restart z/OSMF for it to pick up the change.

Server reports Certificate errors – certificate_unknown

  • unable to find valid certification path to requested target
  • Rethrowing javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
  • certificate_unknown

This was caused because the trust store at the client end did not have the CA certificate for the certificate sent from the server. It may have had it, but it may have expired.

You may also get sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target because the trust store did not have the CA certificate, or the certificate was not valid – for example not trusted, or expired.

java.security.cert.CertificateException: PKIXCertPathBuilderImpl could not build a valid CertPath.

Check in the trace and ffdc.  I got errors

FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl could not build a valid CertPath.; internal cause is:
java.security.cert.CertPathValidatorException: The certificate issued by CN=TEMP4Certification Authorit2, OU=TEST, O=TEMP is not trusted; internal cause is: java.security.cert.CertPathValidatorException: Certificate chaining error
com.ibm.ws.ssl.core.WSX509TrustManager checkServerTrusted” 

CWPKI0022E: SSL HANDSHAKE FAILURE: A signer with SubjectDN (the certificate used by the server) was sent from the target host. The signer might need to be added to local trust store safkeyring://my/TRUST, located in SSL configuration alias defaultSSLSettings.

The extended error message from the SSL handshake exception is: PKIX path building failed: java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl  could not build a valid CertPath.; internal cause is: java.security.cert.CertPathValidatorException: The certificate issued
by (my ca)  is not trusted; internal cause is:  java.security.cert.CertPathValidatorException: Certificate chaining error

 IZUWF9999E: The request cannot be completed because an error occurred. The following error data is returned:  “java.security.cert.CertificateException: PKIXCertPathBuilderImpl could not build a valid CertPath.”

Action: Add the CA for the server’s certificate to the trust store.   I had to restart z/OSMF to pick it up

CWPKI0033E: The keystore located at safkeyringhybrid://START1/KEY did not load because of the following error: Invalid keystore format

Change

location="safkeyringhybrid://USERID/Keyring" to location="safkeyring://USERID/Keyring"

BPXF024I

You get this message if the syslogd program is not running.

BPXP015I HFS PROGRAM /usr/lpp/zosmf/lib/libIzuCommandJni.so IS NOT MARKED PROGRAM CONTROLLED.   BPXP014I ENVIRONMENT MUST BE CONTROLLED FOR DAEMON (BPX.DAEMON) PROCESSING.

I had the wrong SAF_PREFIX(‘IZUDFLT‘) in USER.Z24A.PARMLIB(IZUPRMCP).   IZUDFLT was correct.

I had other problems like invalid password when I logged onto the web browser.

Fix the problem and regenerate.

IZUG807E  An error occurred while attempting to load a required program library. Error: “require is not defined”

With an FFDC saying SRVE0190E: File not found: /IzuUICommon/1_5/zosmf/util/ui/resources/common.css

Action: close the browser and restart it

RACF certificates

IRRD109I The certificate cannot be added. Profile…. is already defined.

Action: use RACDCERT LIST ID(…) to list all the certificates belonging to a user, and search for the CN value. Due to a mistake, a certificate had been created using the label LABEL00000006.

I then used RACDCERT ID(START1) DELETE(LABEL('LABEL00000006')) to delete it.

IRRD140I The filter value does not begin with a valid prefix.

Ensure you are using upper case, so use

IDNFILTER('CN=SSCA256.OU=CA.O=DOC.C=GB')

instead of

IDNFILTER('cn=SSCA256.ou=CA.o=DOC.c=GB')

TLS trace

java.security.cert.CertPathValidatorException: Could not determine revocation status

This is displayed when a self signed certificate is processed. It could be a self signed certificate, or the top of the hierarchy of a chain of signers.

Java java.security.NoSuchAlgorithmException: TLSv1.3 SSLContext not available

z/OS does not support TLS v1.3 yet, so this exception is thrown. Support was announced in April 2020.

CWWKS1100A: Authentication did not succeed for user ID COLIN. An invalid user ID or password was specified.

Also check the stderr log

[ERROR ] CWWKS2907E: SAF Service IRRSIA00_CREATE did not succeed because user COLIN has insufficient authority to access APPL-ID IZUDFLT.

SAF return code 0x00000008. RACF return code 0x00000008. RACF reason code 0x00000020.

CONNECT user_id GROUP(group_id)

or


Permit IZUDFLT class(APPL) id(userid) Access(read)
setropts raclist(Appl) refresh

Java shared classes

JVMSHRC659E An error has occurred while opening shared memory
JVMSHRC336E Port layer error code = -459502
JVMSHRC337E Platform error message: shmctl : EDC5111I Permission denied.
JVMSHRC028E Permission Denied
JVMSHRC626I The stats of the shared cache cannot be obtained since a valid shared cache does not exist.
JVMJ9VM015W Initialization error for library j9shr29(11): JVMJ9VM009E J9VMDllMain failed

The userid issuing the command does not have access to the resource.

The documentation says the shared class cache is created with ONLY USER read/write access by default unless the groupAccess command-line suboption is used, in which case the access is read/write for user and groups.

Note: Users with super user authority gid=0(SYS1) can issue the command with no additional authority.

To find the group, list the directories containing the cache. For example, if /var/zosmf/data/logs/.classCache/ was specified, use ls -ltr /var/zosmf/data/logs/.classCache/javasharedresources.

For me it had owner IZUSVR group IZUADMIN.

I used the RACF command connect COLIN group(IZUADMIN) to connect the userid to the group. Even then the command failed, because groupAccess had not been defined on the -Xshareclasses… parameter. I had to delete the cache so it was recreated the next time the JVM started. Then the command java -Xshareclasses:cacheDir=/var/zosmf/data/logs/.classCache,name=liberty-IZUSVR,verbose,printStats worked.

JVMSHRC023E   Cache does not exist

I had

-Xshareclasses:cacheDir=/javasc,name=izusvr1cache,printStats

I had to remove the printStats

IKJ56251I USER NOT AUTHORIZED FOR SUBMIT YOUR TSO ADMINISTRATOR MUST AUTHORIZE USE OF THIS COMMAND

You need to give the userid access to the TSOAUTH resource

//TSO3 EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
PERMIT SUBMIT CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
PERMIT CONSOLE CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
PERMIT JCL CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
PERMIT PARMLIB CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
SETROPTS RACLIST(TSOAUTH)


IKJ56702I INVALID GROUP, PKIGRP3

I got this with

DELGROUP PKIGRP3

The message is totally wrong.

I could not delete the group because it had users connected to it. When I removed the userids it worked OK.

ICSF

IEC143I 213-85, … RC=X’00000008′,RSN=X’0000271C’

You may need to refresh the in memory copy of the PKDS.

MQ applications

IEW2456E SYMBOL CSQB1CON UNRESOLVED.
IEW2456E SYMBOL CSQB1DSC UNRESOLVED.

I was using cc to compile in Unix Services, and had the binder option dll. The compiler did not have this option, and so I got this message.

I used

cc -c -o c.o -Wc,SO,LIST(lst),SHOWINC,SSCOM,DLL,LSEARCH('COLIN.MQ924.SCSQC370') -I //'COLIN.MQ924.SCSQC370' c.c
cc -o mqsamp -V -Wl,LIST,MAP,INFO,DYNAM=DLL,AMODE=31 //'COLIN.MQ924.SCSQDEFS.OBJ(CSQBMQ1)' c.o

Note I had to create the COLIN.MQ924.SCSQDEFS.OBJ, when using the xlc compiler.

IOEZ00312I Dynamic growth of aggregate ZFS.USERS in progress,
IOEZ00329I Attempting to extend ZFS.USERS by a secondary extent.
IEF196I IEC070I 104-204,OMVS,OMVS,SYS00022,0A9E,C4USS2,ZFS.USERS,
IEF196I IEC070I ZFS.USERS.DATA,CATALOG.Z24C.MASTER
IEC070I 104-204,OMVS,OMVS,SYS00022,0A9E,C4USS2,ZFS.USERS, 588
IEC070I ZFS.USERS.DATA,CATALOG.Z24C.MASTER
IOEZ00445E Error extending ZFS.USERS. DFSMS return code = 104, PDF code = 204.

Message IEC070I 104-204 means the data set would exceed 4 GB if extended.

z/OS

CCN0629(U) DD:SYSLIN has invalid attributes.
CCN0703(I) An error was encountered in a call to fopen() while processing DD:SYSLIN.

We got these when compiling a C program and using SYSLIN.
The problem is that procedures such as EDCCBG have

//SYSLIN .. DCB=(RECFM=FB,LRECL=80,BLKSIZE=3200)

when the data set has a block size of more than 3200 (e.g. 27920). The compiler sees and reports the mismatch.

Unix services

FSUM7332 syntax error: got Word, expecting )

I was trying to use a Python virtual environment and used the command

. env/bin/activate

The problem was the code page of the file.

I needed

export _BPXK_AUTOCVT=ON

I put this in my .profile file.

openssl

I got the following using x3270.

Use export SSL_VERBOSE_ERRORS="1" to get more info

Error: SSL: Private key file load ("...") failed:
error:0909006C:PEM routines:get_name:no start line

Using

openssl s_client -connect 10.1.1.2:2023 -cert … -certform PEM

gave more info

unable to load client certificate private key file
error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: ANY PRIVATE KEY

I needed certificate and key, for example

x3270 -port 2023 -trace -tracefile x3270.trace -certfile ~/ssl/ssl2/colinpaice.pem -keyfile /home/colinpaice/ssl/ssl2/colinpaice.key.pem 10.1.1.2

z/OS

IEE535I … INVALID PARAMETER

I had

TRACE CT,WTRSTART=CTWTR
IEE535I TRACE INVALID PARAMETER

TRACE CT,WTRSTART=CTWTR,WRAP
ITT038I … WERE SUCCESSFULLY EXECUTED.

The first command was copied from a document. It had a trailing character which looks like a blank but is not one (x41). Remove it and the command works.

Try pasting the command into an ISPF edit session and using HEX ON to display the command.
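
If ISPF is not to hand, the same check can be sketched in a few lines of Python. This is only an illustration: it assumes the command has been captured as raw EBCDIC bytes, and the function name find_suspect_bytes and the default list of suspect values are my own choices rather than part of any z/OS tool. It reports the offset of every byte matching a suspect value, such as the x41 described above.

```python
def find_suspect_bytes(raw, suspects=(0x41,)):
    """Return (offset, hex value) for each suspect byte in an EBCDIC command.

    0x41 is the byte that caused IEE535I above; an ordinary EBCDIC blank is 0x40.
    The default suspect list is an assumption, extend it as needed.
    """
    return [(offset, hex(byte)) for offset, byte in enumerate(raw) if byte in suspects]


if __name__ == "__main__":
    # Build a sample command in EBCDIC (code page 037) with a trailing x41.
    command = "TRACE CT,WTRSTART=CTWTR".encode("cp037") + b"\x41"
    print(command.hex(" "))             # hex dump, similar in spirit to HEX ON
    print(find_suspect_bytes(command))  # offsets of the unexpected bytes
```

Working on the raw bytes avoids any code page conversion quietly turning the odd character into a normal blank.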

BPX1SND rv -1 rc EOPNOTSUPP(1112) rs 1977578120 (0x75df7288) EDC8112I Operation not supported on socket.

I got this trying to issue bpx1snd() when there was data in the receive buffer. I used bpx1rcv to read the data, and the problem went away.

I peeked at the data before getting it, so I knew the length of the data to get, and so avoided waiting for data.

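char buf[4000];
int lbuff = sizeof(buf);  // size of the receive buffer
int alet = 0;             // 0 = buffer is in the primary address space
int flags = MSG_PEEK;     // look at the data without removing it
int rv, rc, rs;           // return value, return code, reason code
                          // sd is the connected socket, set up earlier
BPX1RCV( &sd,   // socket descriptor
        &lbuff,
        &buf,
        &alet,
        &flags,
        &rv, // -1 or number of bytes
        &rc,
        &rs);

printf("BPX1RCV Peek bytes %d data... \n",rv   );

if (rv > 0)     // only read if the peek found some data
{
  lbuff = rv;   // the number of bytes in the buffer
  flags = 0;    // this time remove the data from the socket
  BPX1RCV( &sd,   // socket descriptor
          &lbuff,
          &buf,
          &alet,
          &flags,
          &rv, // -1 or number of bytes
          &rc,
          &rs);

  printf("BPX1RCV bytes %d data... \n",rv   );
}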

MQWEB gets killed off after starting.

Gwydion kindly helped me with the following (his words).

strmqweb is a shell script that sets some environment variables and then calls the server start command to start Liberty. But when you run it from a remote shell command, such as the Ansible shell module, the JVM child process is killed when the shell terminates, which kills the server. The solution is to use nohup when running strmqweb, as described (half way down the page) here: https://ansibledaily.com/execute-detached-process-with-ansible.

Migrating your queue manager in an enterprise.

Migrating an isolated queue manager is not too difficult and some of it is covered in the MQ Knowledge Center.

Since my first blog post on this topic, I’ve had some comments asking for more detailed steps, so I’ve added a section at the bottom.

Below are some other things you need to think about when working in an enterprise when you have test and production, multiple queue managers per application (for HA or load balancing) and multiple applications. I am focusing on midrange MQ, and not z/OS though many of the topics are relevant to all platforms.

Consider the scenario where you have 4 queue managers
QMA and QMB supporting MOBILE and STATUS applications,
QMX and QMY supporting PAYROLL application.

You want a low risk approach, so you decide to upgrade the STATUS application first. This application uses QMA and QMB which are also used by business critical application MOBILE. This would be a high risk change.
It would be safer to first migrate application PAYROLL on QMX and QMY.

Looking at QMX and QMY.
You could migrate both queue managers the same weekend – this would be least work, but has a risk that you do not have a good fall back plan if it does not work as expected.
You could migrate QMX this weekend, and QMY next weekend if there were no problems found.
If QMX has problems you can continue using QMY while you resolve them. But while QMX is out of action, if QMY also has problems or is shut down, you have an availability issue, so you may want to define a new environment with QMZ (and the web server etc, so not a trivial task).

As well as production QMX and QMY you have test systems: You need to plan to migrate and test these pre-production systems before considering migrating production. While the test and production levels of MQ are different, you may want to freeze making application changes, and factor this in the plan.

If you have a machine with one MQ level of code, and multiple queue managers on it, you cannot just migrate one queue manager, as you delete the MQ executables and install the new version. You can use multiple installed levels of MQ – but you may have to migrate to this before exploiting it. See Multiple Installations.

Clustering. Remember to migrate your full repositories first – you might want to consider creating dedicated queue managers for your repositories if this is a problem.

License: You will need licenses for the versions of MQ you use. The MQ command dspmqver gives you information about your existing installation. Some licenses entitle you to support from IBM, others are for development or trial use.

There are three stages to migrating applications.

  • Run the applications with no changes on the upgraded system. These should run successfully, but MQ may do more checks, for example some data is meant to be on a 4 byte boundary. MQ now polices this.
  • Recompile the applications to use the newer MQ libraries. Some application MQ control blocks may be larger, and this may uncover application coding problems. For example uninitialised storage.
  • Exploitation of new function. Do this once you have successfully migrated the existing queue managers.

Testing: You need to test the normal working application, plus error paths, such as messages being put to a Dead Letter Queue, and making sure this process works.

Update your infrastructure harness: You need to review what new messages your automation processes, and what actions to take.
You need to decide what additional statistics etc to use and what reports you want to produce for capacity planning, health review and day to day running.

You have to worry about applications coming in to your queue manager, for example what levels of MQ they are using. They may need to be rebuilt with the newer libraries. The client code may need to be upgraded. You can use the DIS CHS(…) RVERSION to display the level of MQ client code. Of course your challenge will be to get people outside of your organization to update their code, especially when they say they have lost the source to the program.
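
To turn the RVERSION values into a list of clients to chase, something like the following sketch could be used. It assumes RVERSION is reported as eight digits in the form VVRRMMFF (version, release, maintenance, fix), which is how I understand the DISPLAY CHSTATUS documentation to describe it. The function names, the sample channel names and the minimum level are all made up for illustration.

```python
def parse_rversion(rversion):
    """Split an RVERSION value of the assumed form VVRRMMFF into four integers."""
    if len(rversion) != 8 or not rversion.isdigit():
        raise ValueError(f"unexpected RVERSION value: {rversion!r}")
    return tuple(int(rversion[i:i + 2]) for i in range(0, 8, 2))


def clients_to_upgrade(channels, minimum=(9, 1, 0, 0)):
    """Return the labels whose partner is below the minimum level.

    channels maps a label (for example a channel name) to its RVERSION string.
    The default minimum level is an illustrative assumption.
    """
    return sorted(name for name, rversion in channels.items()
                  if parse_rversion(rversion) < minimum)


if __name__ == "__main__":
    sample = {"APP1.SVRCONN": "07050004", "APP2.SVRCONN": "09010000"}  # made-up data
    print(clients_to_upgrade(sample))
```

Comparing tuples rather than strings keeps the check correct if you only care about version and release, for example minimum=(9, 0).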

MQ is rarely used in isolation. You may need to upgrade web servers to a newer level which support the new level of MQ.

You may need to upgrade the hardware and operating system.

Going down to the next level of detail.

Exits

You need to check any exits you have can support new functions and different levels of control blocks. For example there are shared connections, and the MQMD can change size from release to release.

If you cannot have one exit that supports all levels of MQ, you’ll have to manage how you deploy the exit matching the queue manager level.

TLS and SSL setup

  • You need to review the TLS and SSL support. Newer levels of MQ remove support for weaker levels of TLS.
  • You need to review the end user certificates to make sure they are using supported levels of encryption.
  • You need to review the cipherspecs used by SSL channels, and upgrade them before you migrate the queue manager. (You could migrate to a newer version and see which channels fail to start, then fix them, but this is not so good).
  • As part of this cipherspec review you may wish to upgrade to strong cipher specs which use less CPU, or can be offloaded on z/OS.
  • You may have a problem sharing keystores. Make sure you include the keystore files in your backups. See APAR IT16295.

Building your applications

  • In some environments application developers compile programs on their own machines; in other environments, there is a process to generate applications on a central build machine. You will need to change the build environment to have the newer version header files, and change the build process to be able to use them.
  • You will need to set up a build environment so you can use the MQ V9 header files for just the application being migrated.
  • You may need to change your deploy tool so that the program compiled at MQ V9 is only deployed to TESTQMA (at MQ V9) and not to TESTQMB (still at MQ V7).
  • You need to change your deploy tool for test, pre-production and production.

Using the Client Channel Definition Table (CCDT)

  • Older clients must continue to use existing CCDT
  • Newer clients are able to understand older CCDTs.
  • For an application to use a newer version CCDT, you must update the MQ client.
  • So you need to be careful about moving the CCDT file around

System management applications

You may have home grown applications that are used to manage MQ. These need to be changed to support new object types (such as chlauth records and topics) and new fields on objects. You cannot rely on a field of interest being the 5th in the list as it was in MQ V5.
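
One way to avoid depending on position is to pick attributes out by keyword. The sketch below assumes the tool is reading the KEYWORD(value) text printed by runmqsc DISPLAY commands. The function name and the sample output are invented for illustration, and values which themselves contain brackets (such as some descriptions) would need a more careful parser.

```python
import re

# Matches KEYWORD(value) pairs as printed by runmqsc DISPLAY commands.
ATTRIBUTE = re.compile(r"([A-Z0-9]+)\(([^)]*)\)")


def parse_attributes(display_output):
    """Return a dict of attribute name to value, whatever order they print in."""
    return {name: value.strip() for name, value in ATTRIBUTE.findall(display_output)}


if __name__ == "__main__":
    # Illustrative output only; real output has many more attributes.
    sample = """
       QUEUE(PAYROLL.REQUEST)                  TYPE(QLOCAL)
       CURDEPTH(0)                             MAXDEPTH(5000)
    """
    attrs = parse_attributes(sample)
    print(attrs["MAXDEPTH"])  # looked up by name, not by position
```

If a later release adds attributes or prints them in a different order, the lookup by name still works.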

MQ Console (MQWEB)

If you are using the MQ Console server to provide a web browser or REST API to a queue manager, you may need to do extra work for this.

You need an instance of MQ Console to support MQ V9.0 and a different instance to support MQ 9.1.

If you have multiple queue managers on a box, and plan to use MQ Multiple Installation to migrate one queue manager at a time, then you will need to consider the following

  • The box has QMA and QMB on it at MQ V9.0
  • These use MQCONSOLE-XX with port 9090
  • Install MQ 9.1 on the same box.
  • Migrate QMA to 9.1
  • Create an MQCONSOLE-YY at MQ 9.1 with port 9191
  • Change your web browser URL and REST api apps to use port 9191
  • Wait for a week
  • Migrate QMB to 9.1
  • Migrate MQCONSOLE-XX to 9.1
  • Web browser URL and REST API url can continue using port 9090
  • Shutdown MQCONSOLE-YY
  • Undo any changes to change your web browser URL and REST api apps to use port 9191 and go back to port 9090

“The rest of the stuff”

I remember seeing a poster of a child sitting on a potty with a caption saying “no job is complete until the paper work is complete”.

Someone said that doing the actual migration of a queue manager took 1 hour. Doing the paper work (planning, change management, talking to users etc) took two weeks per queue manager.

And yes, you do need to update your documentation!

Education

You need to talk to the teams around your organization. This is mainly applications, but other teams as well (eg monitoring, networking).

  • Tell them what changes you will be making, the time scales etc.
  • There will be an application freeze during the migration.
  • The application teams will need to test their applications, and may need to make changes to them.
  • The application teams will get these new events/alerts which they need to handle.
  • You may learn about how they use MQ, and how this will affect your migration plans. (We used this unsupported program for which we have no source and no one knows how it works – which is critical to our business).
  • You may get a free trip to an exotic location to talk to the application teams (or you may get told to go to some hell hole)
  • You need to talk to people outside of your organization. The hard bit may be finding out who they are.

Security

  • You need to protect any new libraries.
  • MQ may have new facilities such as topics which you need to develop and implement a security policy for. In V9 MQ midrange now publishes statistics to a topic.
  • Your tools for processing MQ security events, may need to be enhanced to handle new resource types or new events.

New messages and events

You need to review all new events or messages, and add automation to process them. You need to decide who gets notified, and what actions to take.

You need to review changed messages or alerts in case you are relying on “constant” information in a particular place in the message, which has been changed.

Backups

People often dump the configuration of their queue managers every day, so they can use runmqsc to recreate the queues etc. You need to back up all objects including topics and chlauth records, and check you can recreate them in a queue manager.

Back up your MQ libraries for queue manager and clients, or be able to redeploy them from your systems management software.

Performing the migration

This is documented in the Knowledge Centre. One path for migration involves deleting the old level of MQ and installing the new level of MQ. If you need to go back to the old level, you need to have a copy of the old level of MQ (the base plus the CSD level you were running on)!

Carefully check the documentation for the hops.

The Migration paths documentation says

  • You can migrate from V8.0 or later direct to the latest version.
  • To migrate from V7.0.1, you must first migrate to V8.0.
  • To migrate from V7.1 or V7.5, you must first migrate to V8.0 or V9.0.
  • You might have an extra step to go to MQ V9.1
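
The rules above can be written down as a small lookup, which is handy when you have a long list of queue managers to plan. This is a sketch which encodes only the bullets quoted here; the function name is my own, "latest" stands for whatever the current version is, and it does not model the possible extra step to MQ V9.1.

```python
def migration_hops(current):
    """Return the ordered list of levels to migrate through.

    Encodes only the rules quoted above. Any other level raises ValueError,
    so that the documentation gets checked rather than guessed at.
    """
    parts = tuple(int(p) for p in current.split("."))
    parts += (0,) * (3 - len(parts))  # pad "8" or "7.1" to three numbers
    if parts >= (8, 0):
        return ["latest"]
    if parts[:3] == (7, 0, 1):
        return ["8.0", "latest"]
    if parts[:2] in ((7, 1), (7, 5)):
        return ["8.0 or 9.0", "latest"]
    raise ValueError(f"no rule quoted for V{current}; check the Migration paths documentation")


if __name__ == "__main__":
    for level in ("9.0", "7.5", "7.0.1"):
        print(level, "->", migration_hops(level))
```

Raising an error for levels with no quoted rule (such as V5.3) is deliberate, because those need a human to read the old documentation.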

I found some really old doc saying

“If you are still on MQ version 5.3, you should plan a 2 step migration: first migrate to MQ v7.0.1 then migrate to 7.1 or 7.5”. This could be a challenge as you can not get the MQ 7.0.1 or the MQ 7.1 product. One of the reasons for this two stage approach is that the layout of files changed, so you have to restart at MQ 7.0.1 to make these file system changes.

Finally…

If I have missed anything or got something wrong, please let me know and I’ll update the list.

Checklist for implementation

Different stages

  • Pre-reqs
  • Education for team doing migration
  • Investigate – until you have done the investigation you cannot plan the work. For example how many exits are used, and how many need to be changed.
  • Plan. The first time you do something you may be slow. Successive times should be faster as you should know what you are doing.
  • Implement/Migrate
  • Exploit new features.

Before you start

  • People doing the work need access to systems
  • Need to draw up a schedule (but you may need to do the investigation work before you know how much work there is to do)
  • Appoint a team leader.
  • Determine what skills you need, eg TLS, application design, build
  • Which people do admin – which people handle code eg review exit programs
  • Reporting and status
  • Communication with other teams – we will be migrating in.. and you will be asked to do some work..
  • Extract configuration to common disk, so people do not need to access each queue manager.
  • External customers – provide one list of changes for them if possible. This is better than giving them multiple lists of changes, and will help them understand the size of their work.

Education

  • Ensure everyone has basic knowledge
    • MQ commands
    • Unix commands
    • TLS and security (and stop using SSL)
    • Manage remote MQ from one site using remote runmqsc command or logon to each machine
    • Efficient way of processing data
      • Use GREP on a file to find something, pipe it … sort it, do not find things by hand
  • How the project will be tracked

Areas for migration

  • TLS parameters and using stronger encryption
  • TLS certificate strength
  • Exits
  • Applications
  • Queue manager
  • Clients using the queue manager. A client may be able to connect to many queue managers.

Investigate SSL/TLS

  • Which TLS parameters are being used
  • Which ones are not supported in newer versions of MQ?
    • SSLCIPH
    • Need to worry about both ends of the connection
  • Identify “right” TLS parameters to use
    • eg Strong encryption which can be offloaded on z/OS.
  • Will these cost more CPU? Is this a problem?
  • If TLS not being used – document this

Implement TLS

  • Need a plan to change any cipher specs which are out of support.
  • May need to make multiple changes across multiple queue managers at “same time” – coordinate different ends
  • Can be done before MQ migration.
  • Can be done AFTER MQ migration if you set a flag.
    • May make implementation easier
    • Still need a plan to change any which are out of support.
    • Still may need to make multiple changes across multiple queue managers at “same time”

Investigate certificates

  • Investigate if certificates are using weak encryption
    • Which certificates need to be changed? May need RACF/Security team to help report userids that need to change
  • Plan to roll out updated certificates
    • Include checking external Business partners
  • Investigate any other changes in MQ configuration
  • Check changes to your TLS keystore in APAR IT16295.

How to check a certificate

  • /opt/mqm/java/jre64/jre/bin/ikeycmd -cert -details -db key.kdb -label …
  • A password is required to access the source key database.
  • Please enter a password:
  • Label: CLIENT
  • Key Size: 2048
  • Serial Number: ….
  • Issued by: CN=colinpaiceCA, O=Stromness Software Solutions, ST=Orkney, C=GB (check this is still valid)
  • Subject: CN=colinpaice, C=GB
  • Valid: From: Thursday, 17 January 2019 18:22:45 o’clock GMT To: Sunday, 31 May 2020 19:22:45 o’clock BST (check the ‘to’ date)
  • Signature Algorithm: SHA256withRSA (1.2.840.113549.1.1.11) (I think this needs to be SHA256withRSA)
  • Trust Status: enabled
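
If you have many certificates to check, the same three checks (key size, ‘to’ date, signature algorithm) could be scripted. The sketch below assumes the details text looks like the output above. The function name, the 90 day warning period, the 2048 bit minimum and the test for SHA1 or MD5 are my own illustrative choices, not IBM MQ requirements.

```python
import re
from datetime import date, datetime


def check_certificate(details, today, warn_days=90, min_key_size=2048):
    """Return a list of warnings for the text printed by the -cert -details command.

    warn_days, min_key_size and the SHA1/MD5 test are illustrative assumptions.
    """
    warnings = []

    size = re.search(r"Key Size:\s*(\d+)", details)
    if size and int(size.group(1)) < min_key_size:
        warnings.append(f"key size {size.group(1)} is below {min_key_size}")

    algorithm = re.search(r"Signature Algorithm:\s*(\S+)", details)
    if algorithm and re.match(r"(SHA1|MD5)", algorithm.group(1), re.IGNORECASE):
        warnings.append(f"weak signature algorithm {algorithm.group(1)}")

    # Pick out the day, month and year which follow "To:" on the Valid line.
    expiry = re.search(r"To:\s*\w+,\s*(\d{1,2} \w+ \d{4})", details)
    if expiry:
        expires = datetime.strptime(expiry.group(1), "%d %B %Y").date()
        if (expires - today).days < warn_days:
            warnings.append(f"certificate expires on {expires.isoformat()}")

    return warnings


if __name__ == "__main__":
    sample = ("Key Size: 2048\n"
              "Valid: From: Thursday, 17 January 2019 18:22:45 o'clock GMT "
              "To: Sunday, 31 May 2020 19:22:45 o'clock BST\n"
              "Signature Algorithm: SHA256withRSA (1.2.840.113549.1.1.11)\n")
    print(check_certificate(sample, date(2020, 5, 1)))
```

The date parsing relies on English month names, so it would need adjusting if the command output is in another language.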

Implement certificate change

  • This can be done at any time before migrating a queue manager

Investigate exits

  • Find which exits are being used
    • DIS CHL(*)… grep for EXIT
    • dis qmgr grep for exit
  • Queue manager and clients
  • /var/mqm/qmgrs/QMNAME/qm.ini, channel definition (grep for EXIT)
  • Check exits are at the correct level on all queue managers and clients (change date, size).
  • May need emails to business partner.
  • Do exits need to be converted from 31 bit to 64 bit?
  • Locate exit source
  • Review source
  • Control blocks may be bigger
  • May have to support new functions, eg shared connections
  • Is function still needed?
  • Document exit usage
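
Finding which exits are in use can be scripted once you have the DIS CHL(*) and DIS QMGR output in a file. This sketch assumes the exits show up as attributes whose names end in EXIT, such as SCYEXIT or MSGEXIT, with a blank value when no exit is set. The function name, the sample output and the exit path are made up for illustration.

```python
import re

# Attributes whose names end in EXIT; the value may itself contain one pair
# of brackets, as in library(function).
EXIT_ATTRIBUTE = re.compile(r"([A-Z]+EXIT)\(((?:[^()]|\([^()]*\))*)\)")


def exits_in_use(display_output):
    """Return (attribute, value) pairs for exit attributes that are not blank."""
    return [(name, value.strip())
            for name, value in EXIT_ATTRIBUTE.findall(display_output)
            if value.strip()]


if __name__ == "__main__":
    # Illustrative output only; the exit name is hypothetical.
    sample = """
       CHANNEL(QMACLIENT)                      CHLTYPE(SVRCONN)
       MSGEXIT( )                              SCYEXIT(/var/mqm/exits64/chk(ChlExit))
       SENDEXIT( )
    """
    print(exits_in_use(sample))
```

The list it produces is a starting point for the “locate exit source” and “document exit usage” steps.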

Implement exit changes

  • Recompile all exits and deploy to all platforms before you do any migration work – check no problems
  • Change and test exits
  • Need to change build tools to allow builds with new levels of header files etc, and roll out to selected queue managers
  • Should work on old and new releases
  • May need a V9.1 MQ to test exits on before migration
  • Can be deployed before MQ Migration ? Or do you have requirements for specific levels of exits.
  • Create documentation for exits

Investigate applications

  • External business partners as well as internal
  • Need to get named contact for each application
  • Check level of MQ client code
  • Check TLS options
  • Identify where connection info is stored (AMQCHLTAB)
  • What co-req products need to be updated
    • Web servers
  • Is there test suite which includes error paths etc.
  • Identify build and deploy tools
  • Need capability to compile application using newer MQ header files, and deploy to one MQ

Implement application changes

  • Need to have change freeze during migration
  • Build project plan
  • Duration for testing
  • Which systems to be used for testing
  • Create process to update MQ client code
  • Make sure there is process to roll out changes in future
  • Need to allow buffer

Application recompile

  • Recompile programs using existing libraries and jar files, to make sure everything works before you migrate
  • Deploy and test
  • Change deploy process to use new versions of libraries
  • Recompile using newer versions of libraries
  • Deploy and test
    • Any problems found need to be validated at previous levels, or have conditional statements around it
  • Once all queue managers upgraded
    • Comment out code for compiling with previous libraries
    • To prevent accidents
    • In case of problems in production (before migration) needing a fix.

Investigate queue managers

  • Does the hardware need to be upgraded?
  • Are there any coreqs – eg multi instance or HA environments?
  • Any co-reqs eg upgrade web server, database?
  • Does the Operating System need to be upgraded
    • For example MQ now 64 bit. Early versions were 31 bit
    • Newer versions of Java
  • Identify which applications run on this queue manager
    • Need plan for each application
  • Identify pre-reqs
    • TLS
    • Exits

Plan how you are going to update the Client Channel Definition Table

  • If you migrate a queue manager, then its CCDT will be migrated to the newer level.
  • Clients cannot use a CCDT from a higher level queue manager.
  • If you migrate your clients to the latest level you will have no problems with the CCDT
  • If you migrate the CCDT owner queue manager first, you need to be careful about copying the CCDT to other machines, to prevent a mismatch.
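
The rule in these bullets boils down to “the client must be at the same level as the CCDT, or later”. As a planning aid you could check your client inventory against the level of the CCDT you intend to distribute. This is a sketch: the function names, the client names and the use of (version, release) tuples are my own assumptions.

```python
def client_can_use_ccdt(client_level, ccdt_level):
    """True if the client is at the same level as the CCDT, or later.

    Levels are (version, release) tuples such as (9, 1).
    """
    return client_level >= ccdt_level


def clients_at_risk(clients, ccdt_level):
    """Return the names of clients that could not read a CCDT at ccdt_level."""
    return sorted(name for name, level in clients.items()
                  if not client_can_use_ccdt(level, ccdt_level))


if __name__ == "__main__":
    inventory = {"webserver1": (7, 5), "batch1": (9, 1)}  # made-up inventory
    print(clients_at_risk(inventory, ccdt_level=(9, 1)))
```

Any client it lists needs either upgrading first, or to keep using a copy of the older CCDT.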

Plan queue managers

  • Plan software and hardware upgrades
  • Identify order of queue manager migration
    • Test, pre-prod, production
    • Full repositories then partial
      • Consider setting up new QM just for full repository?
    • Do one server, test applications, do other servers
    • Need to worry about multi instance and HA queue managers. These need to be coordinated and done at the same time.
  • Check license for MQ
  • May need to migrate queue manager multiple times
    • from MQ V5.3 to V7.x
    • from 7.x to V9.0
    • from 9.0 to 9.1
  • Clients first/later

Automation

  • Need to set up automation for new messages and new events

Backups etc

  • Make sure you have backed up your queue managers (and other things such as build configuration files) before you make any changes.

Do the migration

  • Follow the MQ knowledge centre.