Baby Python scripts doing powerful work with MQ

I found PyMqi is an interface from Python to MQ. This is really powerful, and I’m am extending it to be even more amazing!

In this blog post, I give examples of what you can do.

  • Issue PCF commands and get responses back in words rather than internal codes ( so CHANNEL_NAME instead of 3501)
  • Saving the output of DISPLAY commands into files
  • Using these files to compare definitions and highlight differences.
  • Check these files conform to corporate standards.
  • Print out from the command event queue, and the stats event queue etc


I can use some python code to display information via PCF

  • # connect to MQ
  • qmgr = pymqi.connect( queue_manager,”QMACLIENT”,”127.0.0.1(1414)”)
  • # I want to inquire on all SYSTEM.* channels
  • prefix = b”SYSTEM.*”
  • # This PCF request
  • args = {pymqi.CMQCFC.MQCACH_CHANNEL_NAME: prefix}
  • pcf = pymqi.PCFExecute(qmgr)
  • # go execute it
  • response = pcf.MQCMD_INQUIRE_CHANNEL(args)

This is pretty impressive as a C program would take over 1000 lines to do the same!

This comes back with data like

  • 3501: b’SYSTEM.AUTO.RECEIVER’,
  • 1511: 3,
  • 2027: b’2018-08-16 ‘,
  • 2028: b’13.32.15′,
  • 1502: 50

which is cryptic even for experts because you need to know 3501 is the value of the type of data for “CHANNEL_NAME”.

I have some python code which converts this to..

  • ‘CHANNEL_NAME’: ‘SYSTEM.AUTO.RECEIVER’,
  • ‘CHANNEL_TYPE’: ‘RECEIVER’,
  • ‘ALTERATION_DATE’: ‘2018-08-16’,
  • ‘ALTERATION_TIME’: ‘13.32.15’
  • ‘BATCH_SIZE’: 50

for which you only need a kinder garden level of MQ knowledge to understand it. It converts 3501 to CHANNEL_NAME, and 3 into RECEIVER

With a few lines of python I can write this data out so each queue is a file on disk in YAML format.

A yaml file for a queue looks like

  • Q_NAME: TEMP
  • Q_TYPE: LOCAL
  • ACCOUNTING_Q: Q_MGR
  • ALTERATION_DATE: ‘2019-02-03’
  • ALTERATION_TIME: 18.15.52
  • BACKOUT_REQ_Q_NAME: ”

Now it gets exciting! (really)

Now it is in YAML, I can write small Python scripts to do clever things. For example

Compare queue definitions

  • from ruamel.yaml import YAML
  • import sys
  • yaml=YAML()
  • q1 = sys.argv[1] # get the first queue name
  • ignore = [“ALTERATION_DATE”,”ALTERATION_TIME”,
  • “CREATION_DATE”,”CREATION_TIME”]
  • in1 = open(q1, ‘r’) # open the first queue
  • data1 = yaml.load(in1) # and read the contents in
  • for i in range(2,len(sys.argv)): # for all of the passed in filenames
  • q2=sys.argv[i] # get the name of the file
  • in2 = open(q2, ‘r’) # open the file
  • data2 = yaml.load(in2) # read it in
  • for e in data1: # for each parameter in file 1
  • x1 = data1[e] # get the value from file 1
  • x2 = data2[e] # get the value from the other file
  • if not e in ignore: # some parameters we want to ignore
  • if x1 != x2: # if the parameters are different
  • print(q1,q2,”:”,e,x1,”/”,x2) # print out the queuenames, keywork and values

From this it prints out the differences

  • queues/CP0000.yml queues/CP0001.yml : Q_NAME CP0000 / CP0001
  • queues/CP0000.yml queues/CP0001.yml : OPEN_INPUT_COUNT 1 / 0
  • queues/CP0000.yml queues/CP0001.yml : MONITORING_Q Q_MGR / HIGH
  • queues/CP0000.yml queues/CP0001.yml : OPEN_OUTPUT_COUNT 1 / 0
  • queues/CP0000.yml queues/CP0002.yml : Q_NAME CP0000 / CP0002
  • queues/CP0000.yml queues/CP0002.yml : OPEN_INPUT_COUNT 1 / 0
  • queues/CP0000.yml queues/CP0002.yml : OPEN_OUTPUT_COUNT 1 / 0

I thought pretty impressive for 20 lines of code.

and another script -for checking standards

  • from ruamel.yaml import YAML
  • import sys
  • yaml=YAML()
  • q1 = sys.argv[1] # get the queue name
  • # define the variables to check
  • lessthan = {“MAX_Q_DEPTH”:100}
  • ne = {“INHIBIT_PUT”:”PUT_ALLOWED”,”INHIBIT_GET”: “GET_ALLOWED”}
  • in1 = open(q1, ‘r’) # open the first queue
  • data = yaml.load(in1) # and read the contents in
  • # for each element in the LessThan dictionary (MAX_QDEPTH), check with the
  • # data read from the file.
  • # if the data in the file is “lessthan” the value (100)
  • # print print out the name of the queue and the values
  • for i in lessthan: # just MAX_Q_DEPTH in this case
  • if data1[i] < lessthant[i] : print(q1,i,data[i],”Field in error. It should be less than <“,lessthan[i])
  • # if the values are not equal
  • for i in ne: # INHIBUT_PUT and #INHIBIT_GET
  • if data[i] != ne[i] : print(q1,i,data[i],”field is not equal to “,lt[i])

the output is

queues/CP0000.yml
MAX_Q_DEPTH 5000 Field in error. It should be < 100

Display command events

difference Q_NAME CP0000 CP0000 ALTERATION_DATE 2019-02-07 2019-02-11

difference Q_NAME CP0000 CP0000 ALTERATION_TIME 20.48.24 21.29.23

difference Q_NAME CP0000 CP0000 MAX_Q_DEPTH 4000 2000

With my journey so far – Python seems to be a clear winner in providing the infrastructure for managing queue managers.

The lows (and occasional high) of managing MQ centrally.

While I was still at IBM, and since I retired from IBM I have been curious how people managed MQ in their enterprise systems.

  1. How do you deploy a change to a queue to 1000 queue managers, safely, accurately, by an authorised person, and by the way one queue manager was down when you tried to make the change?
  2. Are theses identical systems identical – or has someone gone in and made an emergency change on one system and left one parameter different?
  3. We have all of these naming standards – do we follow them? Did we specify encryption on all external channels?

At the bottom of this blog (more like a long essay) I show some very short Python scripts which

  • compare queue definitions and show you the differences between them.
  • check when queue attributes do not meet “corporate standards”
  • printing of data from the change-events queue, so you can see what people altered.
  • I also have scripts which display PCF data from events, stats etc. I need to clean them up, then I’ll publish them.

I think Python scripting will make systems management so much easier.

Strategic tools do not seem to deliver.

There seem to be many “strategic tools” to help you. These include Chef, Puppet, Ansible, and Salt which are meant to help you deploy to your enterprise

There is a lot of comparison documents on the web – some observations in no particular order

  • Chef and Puppet have an agent on each machine and seem complex to initially set up
  • Ansible does not use agents – it uses SSH command to access each machine
  • Some tools expect deployers to understand and configure in Ruby (so moving the complexity from knowing MQ to Ruby), others use YAML – a simple format.

This seems to be a reasonable comparison.

Stepping back from using these tools I did some work to investigate how I would build a deployment system from standard tools. I have not done it yet, but I thought I would document the journey so far.

Some systems management requirements

What I expect to be able to do in an enterprise MQ environment.

  • I have a team of MQ administrators. All have read only access to all queue managers. Some can only update test, some can update test and production.
  • I want to be able to easily add and remove people from role based groups, and not wait a month for someone to press a button to give them authority.
  • I want to save a copy of the object before, and after a change – for audit trail and Disaster Recovery.
  • The process needs to handle the case when a change does not work because, the queue manager is down, or the object is in use.
  • I want to be able to deploy a new MQ server – and have all of the objects created according to a template for that application.
  • I want to check enforce standards eg names, and values (do you really need a max queue depth of 999 999 999, and why is curdepth 999 999?).
  • I want to be able to process the event data and stats data produced by MQ and put them in SPLUNK or other tool.
  • There are MQ object within the queue manager, and other objects such as CCDT tables for clients, and keystores TLS keys. I need to get these to wherever they are used.
  • I want to report statistics on MQ in my enterprise tool – so I need to get the statistics data from each machine to the central reporting tool
  • I want Test to look like Production (and use the same processes) so we avoid the problem of not testing what was deployed.

Areas I looked at

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

  • This may be fine for a test environment, but once deployed I still want to be able to change object attributes on a subset of the queue managers. I don’t think Docker solves this problem (it is hard to tell from the documentation).
  • I could not see how to set up the Client Channel Definition Tables (CCDT) so my client applications can connect to the appropriate production queue manager.
  • If I define my queues using clustering, when I add a new queue manager, the objects will be added to the repository cache. When I remove a queue manager from the cluster, and delete the container, the object live on in the cache for many days. This does not feel clean.
  • I wondered if this was the right environment (virtualised) for running my production performance critical workload on. I could not easily find any reports on this.
  • How do I manage licenses for these machines and make sure we have enough licenses, and we are not illegally using the same licence for all machines.

Using RUNMQSC

At first running runmqsc locally seemed to be answer to many of the problems.

I could use secure FTP to get my configuration files down to the machine, logon to the machine, pipe the file into runmqsc, capture the output and ftp the file back to my central machine.

  • Having all the MQ administrators with a userid on each machine can be done using LDAP groups etc. so that works ok
  • To use a userid and password you specify runmqsc -u user_id qm. This then prompts for the password. If you are pipe your commands in, then you have to put your password as the first line of the piped input. This is not good practice, and I could not see a way of doing it without putting the password in the file in front of the definitions. (Perhaps a Linux guru can tell me)

Having to ftp the files to and from the machine was not very elegant, so I tried using runmqsc as a client (the -c option). At first this seemed to work, then I tried making it secure, and use an SSL channel. I could only get this to work when it used a channel with the same name as the queue manager name. (So to use queue manager QMB I needed an SSL channel called QMB. The documentation says you cannot use MQSERVER environment variable to set up an SSL channel). On my queue manager QMB channel was already in use. I redefined my channel and got this to work.

As you may expect, I fell over the CHLAUTH rules, but with help from some conference charts written by Morag, I got the CHLAUTH rules defined, so that I could allow people with the correct certificate to use the channel. I could then give the channel a userid with the correct authority for change or read access.

I had a little ponder on this, and thought that a more secure way would be to use SSL AND have a userid and password. If someone copied my keystore they would still need the password to connect to MQ, and so I use two factor authentication.

This is an OK is solution, but it does not go far enough. It is tough trying to parse the output from runmqsc (which is why PCF was invented).

Using mqscx

Someone told me about mqscx from mqgem software. If runmqsc is the kinder garden version , mqscx is the adult one. It does so much more and is well worth a look.

See here for a video and here for the web site.

How does it do against my list of requirements?

  • I can enter userid and password on the command line and also pipe a file in ✔
  • One column output ( if required) so I can write the output to a file, and it is then very easy to parse ✔
  • I can use ssl channels ✔
  • I can use my Client Channel Definition Table (CCDT) ✔

It also has colour, multi column, better use of screen area ( you can display much more on your screen) and its own scripting language.

You can get a try before you buy license.

I moved onto Python and runmqsc…

so I could try to do useful things with it.

Using runmqsc under python does not work very well.

I have found Python is a very good tool for systems management – see below for what I have done with it.

  • I tried using Python “subprocess” so I could write data down the stdin pipe, into runmqsc and capture the output from the data writen to stdout. This did not work. I think the runmqsc output is written to stdout, but not flushed, so the program waiting for the data does not get it, and you get a deadlock.
  • I tried using Python “pexpect”, but this did not work as I could send one command to stdin, but then stdin was closed, and I could not send more data.
  • Another challenge was parsing the output of runmqsc. After a couple of hours I managed to create a regular expression which parsed most of the output, but there were a few edge cases which needed more work, and I gave up on this.
  • PCF on its own is difficult to use.
  • I came across PyMqi – MQ for Python. This was great, I could issue PCF commands, and get responses back – and I can process event queues and statistics queues!

From this I think using PyMqi is great!  My next blog post will describe some of the amazing things you can do in Python with only a few lines of code!

How to become a wizard?

This question came up in conversation recently, I was not entirely sure of the context, was it

  • Im old ( well, retired from full time work), have grey hair (not much hair) and a grey beard
  • Ive worked with MQ since it started
  • I know quite a lot about quite a lot (but the opposite is true, there is so much I do not know about so many things in MQ)
  • I can make things disappear – that pint of beer – see after half an hour it has gone!
  • Im not afraid of going into strange places ( such as going from z/OS into linux)

If you want to be a wizard, here are some thoughts on how to get there.

GUIs are good in some situations

For example

  • One-of requests
  • for low skilled people
  • people with lots of time

My approach is

  • first time – use the GUI to understand the process
  • second time – use the GUI to understand the input
  • third time – automate it – perhaps set up a shell script with the majority of the parameters already filled in

Be brave – go and fight dragons

An easy task is to find the SSL CIPHER specs being used in a queue manager. You use runmqsc and issue dis chl(*) where(SSLCIPH,NE,”) and use your pen and paper to write down what is being used. Easy – but slow.

The dragon task – is to do this for 100 queue managers, and you have half an hour to do it! How does a dragon hunter do this on Linux?

echo “dis chl(*) sslciph” |runmqsc -c QMA | tee -a QMA.FILE

  • echo “dis chl(*) sslciph” is the command to run
  • | passes this to runmqsc
  • the -c in runmqsc means use a client to go to the remote box
  • QMA is the queue manager name (and the channel name to get there)
  • | tee passes the output to the terminal and put the output in a file called QMA.FILE

The output from this is a file QMA.FILE on your local machine with the output of the command in it. Put the echo…. command in a file, and repeat it for every queue manager, and run the file

The second bit of magic is the command

grep CIPH Q*.FILE |sort -k2,2 |uniq -c -f1

  • grep CIPH Q*.FILE this looks for the string CIPH in the files *.FILE and displays the file name and the line of data. For example
QMA.FILE:   SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256) 
QMA.FILE: SSLCIPH( )
QMB.FILE: SSLCIPH( )
QMB.FILE: SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256)
  • |sort -k2,2 says sort on the second field to the second field eg SSLCIPH(TLS_RSA_WITH_AES_128_CBC_SHA256)
  • |uniq -c -f1 display the count of unique values – skipping the first field (skipping the file name)
  • the output is

20 QMA.FILE:SSL_CIPHER_SPEC:
4 QMA.FILE:SSL_CIPHER_SPEC: TLS_RSA_WITH_AES_128_CBC_SHA256
1 QMB.FILE:SSL_CIPHER_SPEC: TLS_RSA_WITH_AES_256_GCM_SHA384

  • So there is the list of cipher spec being used and the count of them – easy !
  • To finish killing the dragon find which queue managers are using the GCM spec
  • grep TLS_RSA_WITH_AES_256_GCM_SHA384 *.FILE to show which files have that cipher spec.

If you want to become a person with good technical skills, these are the sorts of skills you need to develop

  • learn the command line interface, and learn to automate
  • explore different areas, such as shell short cuts, grep, awk, uniq
  • if the command do no damage – do not be afraid of trying something.

Good luck!

How do I change SSLCIPH on a channel?

Regular readers of my blog know that most of the topics I write on appear simple, but have hidden depth, this topic is no exception.

The simple answer is

  • For the client ALTER CHL(xxxx) CHLTYPE(CLNTCONN) SSLCIPH(new value)
  • For the svrconn
    • ALTER CHL(xxxx) CHLTYPE(SVRCONN) SSLCIPH(new value)
    • REFRESH SECURITY

The complexity occurs when you have many clients trying to use to the channel, and you cannot change them all at the same time (imagine trying to change 1000 of them – when half of them are not under your control). For the clients that have not changed, you will get message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel ‘…’.

in the /qmgrs/xxxx/errors/AMQERR01.LOG

For this problem the CCDT is your friend. See my blog post here.

I have a client channel CHANNEL(C1) CHLTYPE(CLNTCONN)

On my CCDT queue manager I created another channel the same as the one I want to update.

DEF CHANNEL(C2) CHLTYPE(CLNTCONN) LIKE(C1)

On my server queue manager I used

DEF CHANNEL(C2) CHLTYPE(SVRCONN) LIKE(C1)

DEFINE CHLAUTH(C2) TYPE(BLOCKUSER)
USERLIST(….)

REFRESH SECURITY

When I ran my sample connect program, it connected using C1 as before.

On the MQ Server, I changed the SSLCIPH to the new value for C1.

When I ran my sample connect program it connected using channel(C2). In the AMQERR01.LOG I had the message

AMQ9631E: The CipherSpec negotiated during the SSL handshake does not match the required CipherSpec for channel ‘C1′

So the changed channel did not connect, but the second channel with the old cipher spec worked succesfully. (The use of the backup channel was transparent to the application)

I then changed DEF CHANNEL(C1) CHLTYPE(CLNTCONN) so SSLCIPH had the correct, matching value. When my sample program was run, it connected using channel C1 as expected.

Once I have changed all my channels, and get no errors in the error log.

  • I can change the CHLAUTH(C2) BLOCKUSER(*) and either set warning, or give no warning and no access
  • Remove C2 from the CCDT queue manager, so applications no longer get this in their CCDT
  • Finally delete the channel C2 on the server.
  • Go down the pub to celebrate a successful upgrade!


Should I do in-place or side by side migration of MQ mid-range?

With mid-range MQ there are a couple of migration options:

  • Upgrade the queue manager in place – if there are problems, restore from backup, and sort out the problems this restore may cause. You may want to do this is you have just the one queue manager.
  • Upgrade the queue manager in place – if there are problems, leave it down until any problems can be resolved. This assumes that you are a good enterprise user and have other queue managers available to process the work.
  • Create another queue manager, “next to it” (“side by side”) on the same operating system image. A better description might be “adding a new queue manager to our environment on an existing box, and deleting an old one at a later date” rather than “side by side migration”. You may already have a document to do this.

What do you need to do for in-place migration.

  • Backup your queue manager see a discussion here
  • Shut down the queue manager, letting all work end cleanly
  • Either (see here)
    • Delete the previous version of MQ, and install the new version, or better..
    • Use Multi-install – so you have old and new versions available at the same time
  • Switch to the new version (of the multi-install)
  • Restart the queue manager
  • Let work flow
  • Make note of any changes you make to the configuration – for example alter qlocal… in case you need to restore from a backout, and re-apply the changes.

If you need to backout the migration and restore from the backout

You need to

  • Make sure there are no threads in doubt
  • Make sure all transmission queues are empty (so you do not overwrite messages when you restore from a backup)
  • Make sure all transmission queues are empty ( so you do not overwrite messages when you restore from a backup)
  • Offload messages from application queues – if you are lucky there will be no messages. Do not offload messages from the system queues.
  • Shut down MQ
  • Reset the MQ installation to the older version
  • Restore from your backup see here
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • Load messages back onto the application queues
  • Reapply any changes, such as alter QL…

In the situation where you have a problem, personally I think it would be easier to leave the queue manager down, rather than trying to restore it from a backup. You may want to offload any application messages first. Of course this is much easier if you have configured multiple queue managers, and leaving one queue manager shut down should not cause problems. Until any problems are fixed you cannot proceed with migrating other queue managers, and you may have the risk of lower availability because there is one server less.

What you need to do for side by side migration.

“Side by side” migration requires a new queue manager to be created, and work moved to the new queue manager

  • If this is a cluster repository, you need to move it to another queue manager if only temporarily (otherwise you will get a new repository)
  • You need a new queue manager name
  • You need a new port number
  • Create the queue manager
  • You may want to alter qmgr SCHINIT (MANUAL) during the configuration so that you do not get client applications trying to connect to your new queue manager before you are ready.
  • You need to backup all application object definitions, chlauths etc and reapply them to the new queue manager. Do not copy and restore the channels
  • Apply these application objects to the new queue manager
  • List the channels on the old system
  • Create new channels – for example cluster receiver, with CONNNAME will need the updated port, and a new name
  • You should be able to reuse any sender channels unchanged
  • If you are using CCDT
    • Define new client SVRCONN names (as a CCDT needs unique channel names)
    • On the the queue manager which creates the CCDT, create new Client CLNTCONN channels. The queue manager needs unique names
    • Send the updated CCDT to applications which use this queue managers, so they can use the new queue manager. Note: From IBM MQ Version 9.0, the CCDT can be hosted in a central location that is accessible through a URI, removing the need to individually update the CCDT for each deployed client. See here
    • If you are using clustered queues, then cluster queues will be propagated automatically to the repository and to interested queue managers
    • If you are not using clustering, you will need to create sender/receiver channels, and create the same on the queue managers they attach to
  • Update automation to take notice of the queue managers
  • Change monitoring to include this queue manager
  • Change your backup procedures to back up the new queue manager files
  • Change your configuration and deployment tools, so changes to the old queue manager are copied to the new queue manager as well.
  • Configure all applications that use bindings mode, to add the new queue manager to the options. Restart these applications so they pick up the new configuration
  • When you are ready use START CHINIT
  • Alter the original queue manager to be qmgr SCHINIT (MANUAL), so when you restart the queue manager it does not start the chinit, and so channels will not workload.
    • Note there is a strmqm -ns option. The doc says… This prevents any of the following processes from starting automatically when the queue manager starts:
    • The channel initiator
    • The command server
    • Listeners
    • Services
    • This parameter also runs the queue manager as if the CONNAUTH attribute is blank, regardless of its current value. This allows unauthenticated access to the queue manager for locally bound applications; client applications cannot connect because there are no listeners. Administrative changes must be made by using runmqsc because the command server is not running.
    • But you may not want to run unauthenticated.
  • Stop the original queue manager, after a short time, all applications should disconnect, and reconnect to the new queue manager.
  • Shut down the old queue manager, and restart it. With SCHINIT (MANUAL) it should get no channels running. Stop any listeners. If you have problems you can issue START CHINIT and START LSTR. After a day shut down the queue manager and leave it down – in case of emergency you can just restart it.
  • After you have run successfully for a period you can delete the old queue manager.
  • Remove it from any clusters before deleting it. The cluster repository will remember the queue manager and queues for a long period, then eventually delete them.
  • Make the latest version of MQ the primary installation, and delete the old version
  • Update the documentation
  • Update your procedures – eg configuration automation

As I said at the beginning – an in-place migration looks much easier to do.

Should I use backup and restore of my mid-range queue manager?

In several places the MQ Knowledge centre mentions backing up your queue manager, for example if case of problems when migrating.

I could not find an emojo showing a worried wizard, so let me explain my concerns so you can make an informed decision about using it.

Firstly some obvious statements

  • You take a backup so you can restore it at a later date to the same state
  • When you do a restore in-place you overwrite what was there before
  • The result of a restore should be the same as when you did the backup

See really obvious – but you need to think through the consequences of these.

Creating duplicate messages

Imaging there is a message on the queue saying “transfer 1 million pounds to Colin Paice”. This gets backed up. The messages gets processed, and I am rich!

You restore from the backup – and this message reappears – so unless the applications are smart and can detect a duplicate message – I will get even richer!

Losing messages

An application queues was empty when it was backed up. A message is put to the queue “Colin Paice pays you 1 million pounds”. Before this message gets processed the system is restored – resetting the queue to when it was backed up – so the message disappears and you do not get your money.

Status information gets out of step

The queue manager holds information in queues. For example each channel has information about the sequence number of the message flow. If this gets out of sync, then you have to manually resync them.

If you restore the SYSTEM.CHANNEL.SYNCQ from last week – it will have the values from last week. If you restore this data, the channels will fail to start because the sequence numbers do not match, and you need to use the reset channel command.

If you really want to do backup and restore…

Before you back up..

  • If this is a full repository, “just” move it to another queue manager.
  • Stop receiver channels, so work stops flowing into the queue manager
  • Set all application input queues to put disabled, to stop applications from putting to the queues.
  • Let the applications drain all application queues (and send the replies back)
  • Make sure all queues, such as Dead Letter Queue, and Event Queues have been processed and the queues are empty.
  • Make sure all transmission queues are empty, including SYSTEM.CLUSTER.TRANSMIT.QUEUE and any Split Cluster Transmit queues.
  • Shut down the queue manager, letting all work end cleanly.
  • Backup the queue manager files
  • Make a record of every configuration change you make, such as alter qlocal.

If you need to restore.. you need to empty the queue manager before you overwrite it.

  • Make sure there are no threads in doubt.
  • Make sure all transmission queues are empty
  • Have applications process all of the application messages, or offload the messages
  • Shut down MQ
  • Restore from your backup.
  • Any MCA channels which have been used may have the wrong sequence numbers, and will need to be reset
  • You may need to refresh cluster, so that you get the latest definitions sent the machine, and the information on local objects is propagated back to the full repository.
  • If you offloaded application messages, restore them
  • Reapply any changes, such as alter QL…
  • You need to be careful about applications connecting to your queue manager it is ready to do work. You might want to use strtmqm -ns to start in restricted mode.

It is dangerous if you restore the queue manager in a different place

You need to be careful if you restore a queue manager to a different place.

If you restore it, and start it, then channels are likely to start, and messages flow. For example it will contact the full repository, and send information about the objects in the newly restored queue manager. The full repository will get confused as you have two queue managers with the same name, and same unique name sending information. It is difficult to resolve this once it has occurred. People have been known to do this when testing their disaster recovery procedures – and thus causing a major problem!

One more point…

As wpkf pointed out.

You can start the QM without starting the channel. strmqm  -ns prevents any of the following processes from starting automatically when the queue manager starts:  the channel initiator, the command server, listeners,  and services.    All connection authentication configuration is suppressed. 

Should I use multi-installation when migrating my mid-range queue manager?

The short answer is yes; the long answer is yes.

Multi-installation – where you have MQ V9.0.0.0, MQ 9.0.0.1 and MQ 9.1.0.0 on the same box – I would have called this multi-version or multi-level support. This is not to be confused with multi- instance queue managers are where you have primary and backup queue managers running on the same level of code.

Without multi-installation

When you migrate to a new level you have to shut down your queue managers, delete your old level of MQ libraries, install the new level of MQ ones and restart the queue managers. If you want to go back you have to reverse the process. This assumes you have the installation materials for the previous level, and all of the fixes you may have applied to it. All of this takes time (and you may spend a long time looking for the CD because your corporate network rules do not allow you to download big files). During this time your queue managers are down.

If you have more than one queue manager on the image, they will all use the new level of code when they restart.

With multi-installation

You can install new levels in parallel, run a command and a queue manager will use the new libraries “at the flick of a switch”. So the actual migration is shut down MQ, issue the command, restart MQ. If you have more than one queue managers, you can migrate one, then next day migrate the others.

Be careful…

The MQ documentation says

For example, if you want to upgrade IBM MQ Version 9.0.0.0 to IBM MQ Version 9.0.0, Fix Pack 1, you can install a second copy of IBM MQ Version 9.0.0.0, apply the maintenance bring it to IBM MQ Version 9.0.0, Fix Pack 1, and then move the queue managers across to the new installation.

You still have the original installation, so it is a simple matter to move the queue managers back if you encounter any problems.

I think this is carefully worded. 9.0.0.0 to 9.0.0.1 and back to 9.0.0.0 may work as the documentation says. I do not feel confident that you can go from MQV8 to MQ V9.0, or MQ V9.1, run for a week, and then be able to go back. I would love to be proved wrong.

Are your client connections not configured for optimum high availability?.

I would expect the answer for most people is – no, they are not configured for optimum high availability.

In researching my previous blog post on which queue manager to connect to, I found that the the default options for CLNTWGHT and AFFINITY may not be the best. They were set up to provide consistency from a previous release. The documentation was missing words “once you have migrated then consider changing these options”. As they are hard to understand, I expect most people have not changed the options.

The defaults are

  • CLNTWGHT(0)
  • AFFINITY(PREFERRED)

I did some testing and found some bits were good, predicable, and gave me High Availability other bits did not.

My recommendations for high availability and consistency are the complete opposite of the defaults:

  • use CLNTWGHT values > 0, with a value which would give you the appropriate load balancing
  • use AFFINITY(NONE)

There are several combination of settings

  • all clients use AFFINITY(NONE) – was reliable
    • CLNTWGHT > 0 this was reliable, and gave good load balancing
    • CLNTWGHT being >= 0 was reliable and did not give good load balancing
  • all clients use AFFINITY(PREFERRED) – was consistent, and not behave as I read the documentation
  • a mixture of clients with AFFINITY PREFERRED and NONE. This gave me weird, inconsistent behavior.

So as I said above my recommendations for high availability are

  • use CLNTWGHT values > 0, and with a value which would give you the appropriate load balancing.
  • use AFFINITY(NONE).

My set up

I had three queue managers set up on my machine QMA,QMB,QMC.
I used channels
QMACLIENT for queue manager QMA,
QMBCLIENT for queue manager QMB,
QMCCLIENT for queue manager QMC.

The channels all had QMNAME(GROUPX)

A CCDT was used

A batch C program does (MQCONN to QMNAME *GROUPX, MQINQ for queue manager name, MQDISC) repeatedly.
After 100 iterations it prints out how many times each queue manager was used.

AFFINITY(NONE) and clntwght > 0 for all channels

  • QMACLIENT CLNTWGHT(50), chosen 50 % on average
  • QMBCLIENT CLNTWGHT(20), chosen 20 % on average
  • QMCCLIENT CLNYWGHT(30), chosen 30 % on average.

On average the number of times a queue manager was used, was the same as channel_weight/sum(weights).
For QMACLIENT this was 50 /(50+20+30) = 50 / 100 = 50%. This matches the chosen 50% of the time as seen above.
I shut down queue manager QMC, and reran the test and got

  • QMACLIENT CLNTWGHT(50), chosen 71 % on average
  • QMBCLIENT CLNTWGHT(20), chosen 28 % on average
  • QMCCLIENT CLNYWGHT(30) not selected.
    For QMACLIENT the weighting is 50/ (50 + 20) = 71%. So this works as expected.

AFFINITY(NONE) for all queue manager and clntwght >= 0

The documentation in the knowledge centre says any channels with CLNTWGHT=0 are considered first, and they are processed in alphabetical order. If none of these channel is available then the channel is select as in the CLNTWGHT(>0) case above.

  • QMACLIENT CLNTWGHT(50) not chosen
  • QMBCLIENT CLNTWGHT(0) % times chosen 100%
  • QMCCLIENT CLNYWGHT(30) not chosen

This shows that the CLNTWGHT(0) was the only one selected.
When CLNTWGHT for QMACLIENT was set to 0, (so both QMACLIENT and QMBCLIENT had CLNTWGHT(0) ), all the connections went to QMA – as expected, because of the alphabetical order.

If QMA was shut down, all the connections went to QMB. Again expected behavior.

With

  • QMACLIENT CLNTWGHT(0)
  • QMBCLIENT CLNTWGHT(20)
  • QMCCLIENT CLNYWGHT(30)

and QMA shut down, the connections were in the ratio of 20:30 as expected.

Summary: If you want all connections (from all machines) to go the same queue manager, then you can do this by setting CLNTWGHT to 0.

I do not think this is a good idea, and suggest that all CLNTWGHT values > 0 to give workload balancing.

Using AFFINITY(PREFERRED)

The documentation for AFFINITY(PREFERRED) is not clear.
For AFFINITY(NONE) it takes the list of clients with CLNTWGHT(0), sorts the list by channel name, and then goes through this list till it can successfully connect. If this fails, then it picks a channel at random depending on the clntwghts.

My interpretation of how PREFERRED works is

  • it builds a list of CLNTWGHT(0) sorted alphabetically,
  • then creates another list of the other channels selected at random with a bias of the CLNTWGHT and keeps that list for the duration of the program (or until the CCDT is changed).
  • Any threads within the process will use the same list.
  • For an application doing MQCONN, MQDISC and MQCONN it will access the same list.
  • With the client channels defined above, for different machines, or different applications instances you may get a list when CLNTWGHT >0 .

For example on different machines, or different application instances the lists may be:

  • QMACLIENT, QMBCLIENT, QMCCLIENT
  • QMBCLIENT, QMACLIENT, QMCCLIENT
  • QMACLIENT, QMBCLIENT, QMCCLIENT (same as the first one)
  • QMCCLIENT,QMACLIENT,QMCCLIENT

I’ll ignore the CLNTWGHT(0) as these would be at the front of the list in alphabetical order.

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(PREFERRED)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(PREFERRED)
  • QMCCLIENT CLNYWGHT(30) AFFINITY(PREFERRED)

According to the documentation, if I run my program I would expect 100% of the connections to one queue manager. This is what happened.

If I ran the job many times, I would expect the queue managers to be selected according to the CLNTWGHT.

I ran my program 10 times and in different terminal windows., and each time QMC got 100% of the connections. This was not what was expected!

I changed the QMBCLIENT CLNTWGHT from 20 to 10 and reran my program, and now all of my connections went to QMA!

With the QMBCLIENT CLNTWGHT 18 all the connections went to QMA, with QMBCLIENT CLNTWGHT 19 all the connections went to QMC.

This was totally unexpected behavior and not consistent with the documentation.

I would not use AFFINITY(PREFERRED) because it is unreliable and unpredictable. If you want to connect to the same queue manager specify the channel name in the MQCD and use mqcno.Options = MQCNO_USE_CD_SELECTION.

Having a mix of AFFINITY PREFERRED and NONE

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(PREFERRED)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(NONE)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(NONE)

All of the connections went to QMA.

With

  • QMACLIENT CLNTWGHT(50) AFFINITY(NONE)
  • QMBCLIENT CLNTWGHT(20) AFFINITY(PREFERRED)
  • QMCCLIENT CLNTWGHT(30) AFFINITY(NONE)

there was a spread of connections as if the PREFERRED was ignored.

When I tried to validate the results – I got different results. (It may be something to do with the first or last object altered or defined).

Summary: Having a mix of AFFINITY with values NONE and PREFERRED, it is hard to be able to predict what will happen, so this situation should be avoided.

How do I know which queue manager to connect to ?

Question: How difficult can it be to decide which queue manager to connect to?

Answer: For the easy, it is easy, for the hard it is hard.
I would not be surprised to find that in many applications the MQCONN(x) are not coded properly!


That is a typical question and answer from me – but let me go into more detail so you understand what I am talking about.
If you have only one queue manager then it is easy to know which queue manager to connect to – it is the one and only queue manager.
If you have more than one – it gets more complex. If your application has just committed a funds transfer request and the connection is broken

  • you may just decide to connect to any available queue manager, and ignore a possibly partial funds transfer request
  • or you might wait for a period trying to connect to the same queue manager, and then give up, and connect to another, and later worry about any orphaned message on the queue.

You now see why the easy scenario is easy, and for the hard one, you need to do some hard thinking and some programming to get the optimum response.

There is an additional complexity that when you connect to the same instance – it may be a highly available queue manager, and it may have restarted somewhere else. For the purposes of this blog post I’ll ignore this, and treat it as the same logical queue manager.

I had lots of help from Morag who helped me understand this topic, and gave me the sample code.

You have only one queue manager.

This is easy, you issue an MQCONN for the queue manager. If the connect is not successful, the program waits for a while and then retries. See – I said it was easy.

You have more than one queue manager, and getting a reply back is not important.

For example, you are using non persistent messages.

Your application can decide which queue manager it tries to connect to, or you can exploit queue manager groups in the CCDT.

On queue manager QMA you can define a client channels for it and also for queue manager QMB

DEF CHL(QMA) CHLTYPE(CLNTCONN) QMNAME(GROUPX) 
CONNAME(LINUXA) CLNTWGHT(50)…
DEF CHL(QMB) CHLTYPE(CLNTCONN) QMNAME(GROUPX)
CONNAME(LINUXB) CLNTWGHT(50)…

DEF CHL(QMA) CHLTYPE(SVRCONN) ...

On Unix these are automatically put into the /var/mqm/qmgrs/../@ipcc/AMQCLCHL.TAB file. This is a binary file, and can be FTPed to the client machines that need it.

You can use the environment variables MQCHLLIB to specify the directory where the table is located, and MQCHLTAB to specify the file name of the table (it defaults to AMQCLCHL). See here for more information.

FTP the files in binary to your client machine, for example into ~/mq/.
I did
export MQCHLLIB=/home/colinpaice/mq
export MQCHLTAB=AMQCLCHL.TAB

I then used the command
SET |grep MQ
to make sure those variables are set, and did not have MQSERVER set.

Sample MQCONN code (from Morag)….

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
char * QMName = "*GROUPX";
MQCONN(QMName,
&hConn,
&CompCode,
&Reason);
// and MQ will pick one of the two entries in the CCDT.

The application connected with queue manager name *GROUPX . Under the covers the MQ code found the channel connections with QMNAME of GROUPX and picked one to use. The “*” says do not check the name of the queue manager when you actually do the connect. If you omit the “*” you will get return code MQRC_Q_MGR_NAME_ERROR 2058 (080A in hex) because “GROUPX” did not match the queue manager name of “QMA” or “QMB”. I stopped QMA, and reconnected the application, and it connected to QMB as expected.

Common user error:When I tried connecting with queue manager name QMA, this failed with MQRC_Q_MGR_NAME_ERROR because there were no channel definitions with QMNAME value QMA. This was obvious once I had taken a trace, looked at the trace, and had a cup of tea and a biscuit, and remembering I had fallen over this before. So this may be the first thing to check if you get this return code.

Using channels defined with the same QMNAME, if your connection breaks, you reconnect with the same queue manager name “*GROUPX” and you connect to a queue manager if there is one available. You can specify extra options to bias which one gets selected. See CLNTWGHT and AFFINITY. See the bottom of this blog entry.

You can use MQINQ to get back the name of the queue manager you are actually connected to (so you can put it in your error messages).

//   Open the queue manager object to find out its name 
od.ObjectType = MQOT_Q_MGR; // open the queue manager object
MQOPEN(Hcon, // connection handle
  &od, // object descriptor for queue
  MQOO_INQUIRE + // open it for inquire          
  MQOO_FAIL_IF_QUIESCING, // but not if MQM stopping      
  &Hobj, // returned object handle
  &OpenCode, // MQOPEN completion code
  &Reason); // reason code
// report reason, if any
if (Reason != MQRC_NONE)
{
printf("MQOPEN of qm object rc %d\n", Reason);
.....
}
// Now do the actual INQ
Selector = MQCA_Q_MGR_NAME;
MQINQ(Hcon, // connection handle
  Hobj, // object handle for q manager
  1, // inquire only one selector
&Selector, // the selector to inquire
0, // no integer attributes are needed
NULL, // so no integer buffer
  MQ_Q_MGR_NAME_LENGTH, // inquiring a q manager name
ActiveQMName, // the buffer for the name
&CompCode, // MQINQ completion code
&Reason); // reason code

printf("Queue manager in use %s\n",ActiveQMName);

You have more than one queue manager, and getting a reply back >is< important

Your application should have some logic to handle the case when your queue manager is running normally, there is a problem in the back end, and so you do not get your reply message within the expected time. Typical logic for when the MQGET times out is:

  • Produce an event saying “response not received”, to alert automation that there may be a problem somewhere in the back end
  • Produce an event saying “there is a piece of work that needs special processing – to manually redo or undo – update number…..”.
    • At a later time a program can get the orphaned message and resolve it.
    • You do not want an end user getting a message “The status of the funds transfer request to Colin Paice is … unknown” because the reply message is sitting unprocessed on the queue.
    • Note: putting a message to a queue may not be possible as the application may not be connected to a queue manager.

When deciding to connect to any available queue manager, or connect to a specific queue manager, there are two key options in mqcno.Options field:

  • MQCNO_CD_FOR_OUTPUT_ONLY. This means, do not use any data in the passed in MQCD <as the field description says – use it for output only>, but pick a valid and available channel from the CCDT, and return the details.
  • MQCNO_USE_CD_SELECTION. This means, use the information in the MQCD to connect to the queue manager

Sample code (from Morag) showing MQCONNX

MQLONG  CompCode, Reason;
MQHCONN hConn = MQHC_UNUSABLE_HCONN;
MQCNO cno = {MQCNO_DEFAULT};
MQCD cd = {MQCD_CLIENT_CONN_DEFAULT};
char * QMName = "*GROUPX";
cno.Version = MQCNO_VERSION_2;
cno.ClientConnPtr = &cd;
// Main connection - choose freely from the CCDT
cno.Options = MQCNO_CD_FOR_OUTPUT_ONLY;
MQCONNX(QMName,
&cno,
&hConn,
&CompCode,
&Reason);
: :
// Oops, I really need to go back to the same connection to continue.

MQDISC(...); // without this you get queue manager name error

// Using same MQCNO as earlier, it already has MQCD pointer set.

cno.Options = MQCNO_USE_CD_SELECTION;
MQCONNX(QMName,
&cno,
&hConn,
&CompCode,
&Reason);

Let me dig into a typical scenario to show the complexity

  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  • MQCONN to any queue manager
  • MQPUT1 of a persistent message within syncpoint
  • set mqcno.Options = MQCNO_USE_CD_SELECTION, as you now want the application to connect to the same queue manager if there is a problem
  • MQCMIT. After this you want to connect to the specific queue manager you were using
  • MQGET with WAIT
  • MQCMIT
  • set mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY. Because the application is not in a business unit of work it can connect to any queue manager.

The tricky bit is in the MQGET with WAIT. If your queue manager needs to be restarted you need to know how long this is likely to take. It may be 5 seconds, it may be 1 minute depending on the amount of work that needs to be recovered. (So make sure you know what this time is.)

Let’s say it typically takes 5 seconds between failure of the queue manager the application is connected to, and restart complete. You need some logic like

mqget with wait..
problem....
failure_time = time_now()
waitfor = 5 seconds
mqcno.Options = MQCNO_USE_CD_SELECTION
loop:
MQCONN to specific queue manager
If this worked goto MQCONN_worked_OK
try_time = time_now()
If try_time - failure_time > waitfor + 1 second goto problem;
sleep 1 second
go to loop:
MQCONN_worked_OK:
MQOPEN the reply to queue
Reissue the MQGET with wait

problem:
report problem to automation
report special_processing_needed .... msgid...
mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
Go to start of program and connect to any queue manager

If you thought the discussion above was complex, it gets worse!

I had a long think about where to put the set mqcno.Options = MQCNO_USE_CD_SELECTION.  My first thoughts were to put it after the first MQCMIT, but this may be wrong.

With the logic

MQCONN
MQPUT1
MQCMIT

If the MQCMIT fails, it could have failed going to the the queue manager so the commit request did not actually get to the queue manager, and the work was rolled back, or the commit could have worked, but the response did not get to your application.

The application should reconnect to the same queue manager, issue the MQGET WAIT. If the message arrives then the commit worked, if the MQGET times out, treat this as the MQGET WAIT timed out case (see above), and produce alerts. This is why I decided to put the set mqcno.Options = MQCNO_USE_CD_SELECTION before the commit. You could just as easily had logic which checks the return code of the MQCMIT and then set it.

A bit more detail on what is going on.

You can treat the MQCD as a black box object which you do not change, nor need to look into. I found it useful to see inside it. (So I could report problems with the channel name etc). The example below shows the fields displayed as an problem is introduced.

Before MQCONNX
set MQCNO_CD_FOR_OUTPUT_ONLY
pMQCD->ChannelName is '' - this is ignored
pMQCD->QMgrName is '' - this is ignored
QMName is '*GROUPX'. -this is needed
==MQCONNX had return code 0 0
pMQCD->ChannelName is 'QMBCLIENT'.
pMQCD->QMgrName is '';
QMName is '*GROUPX'.
MQINQ QMGR queue manager name gave QMB Sleep
during the sleep endmqm -i QMB and strmqm QMB
After sleep
MQOPEN of queue manager object ended with reason code MQ_CONNECTION_BROKEN = 2009.
Issue MQDISC, this ended with reason code 2009

set MQCNO_USE_CD_SELECTION
pMQCD->ChannelName is 'QMBCLIENT' - this is needed
pMQCD->QMgrName is ''
QMName is '*GROUPX'.
MQCONNX return code 0 0
MQINQ queue manager name is QMB

Anything else on clients?

It is good practice to periodically have the clients disconnect and reconnect to do work load balancing. For example

You have two queue managers QMA and QMB. On Monday morning between 0800 and 1000 QMA is shut down for essential maintenance. All the clients connect to QMB. QMA is restarted at 1000 – but does no work, because all the clients are all connected to QMB. If your clients disconnect and reconnect then over time some will connect to QMA.

It is a good idea to have a spread of times before they disconnect, so if 100 clients connected at 0900, they disconnect and reconnect between 10pm and 3am to avoid all 100 disconnecting and reconnecting at the same time.

To get the spread of connections to the various queue managers, you need to use CLNTWGHT with a non zero value. If you omit CLNTWGHT, or specify a value of 0, then the channel chosen is the first alphabetically, in my case they would all go to QMA, and not to QMB.

I feel there is enough material on this for another blog post.

Not for humans but for search engine

MQRC_EPH_ERROR 2420 (0974) (RC2420)

  • You have specified a channel in MQCONNX and this is not in the CCDT, so if you have a channel called QMACLIENT, and use use “QM” or “QM*” both will give MQRC_HOST_NOT_AVAILABLE.
  • You had a network problem, for example the application gets MQRC_CONNECTION_BROKEN. If the next MQ verb the application issues is MQCONN or MQCONNX this will fail with MQRC_HOST_NOT_AVAILABLE. You need to issue MQDISC, or retry the MQCONN(X) a second time.
  • You specified a connection address like 127.0.0.1:1414 when it was expecting 127.0.0.1(1414).

MQRC_UNKNOWN_OBJECT_QMGR: 2086 (0826) (RC2086) with a client application

This can be caused when using a client connection and specifying a queue manager name of the format “*name” (for availability) . The application takes this queue manager name, and uses it in the MQOD.
If the first character of the Queue Manager Name is “*” then MQINQ should be used to retrieve the actual queue manager name, or do not use the “*name”.

MQRC_NOT_AUTHORIZED: 2035 (07F3) (RC2035) with MQCONNX

Trying to use MQCONNX to connect to a queue manger. The info from the Knowledge centre and the AMQ message say a blank userid or password was given. I also found the following can cause the same return code

  • mqcno.SecurityParmsPtr = 0;
  • csp.CSPPasswordLength = 0;
  • sp.CSPUserIdLength = 0;
  • csp.CSPPasswordPtr= 0;
  • csp.CSPUserIdPtr = 0;
  • csp.AuthenticationType != MQCSP_AUTH_USER_ID_AND_PWD;

MQRC_ENVIRONMENT_ERROR: 2012 (07DC) (RC2012) with MQCONNX

Trying to use MQCONNX with MQCNO_RECONNECT_Q_MGR or MQCNO_RECONNECT;

  • Not using threaded application. My C program was built with -lmqic instead of -lmqic_r -lpthread
  • SHRCONV = 0 on the channel definitions

MQRC_Q_MGR_NAME_ERROR: 2058 (080A) (RC2058)

  • export MQCHLLIB not pointing to correct location
  • export MQCHLTAB pointing to the wrong name, or not set and AMQCLCHL.TAB not found in the location pointed to by MQCHLLIB
  • remember to update your .profile so this does not happen again
  • you are using a CCDT and passed in a QMNAME of XXXX, for all channels with QMNAME XXXX none could connect to the queue manager in the conname.
  • You think you were using a mqclient.ini file … but are now in a different directory
  • You are using the correct mqclient.ini file.  It has a ChannelDefinitionFile=… file.   This ccdt file is missing entries for the queue manager.  use the runmqsc command DIS CHL(*) where chltype(eq,svrconn) to display the valid channels on the server.
  • You tried to connect with the queue manager name, and need to connect to the QM group name.
  • You forgot the * in front of the queue manager name when using groups.

MQRC_KEY_REPOSITORY_ERROR: 2381 (094D) (RC2381)

  • MQSSLKEYR not set to the keystore path and file name
  • you specified …/key.kdb instead of /key without the .kdb
  • remember to update your .profile so this does not happen again

 

MQRC_OPTIONS_ERROR:2046 (07FE) (RC2046)

During MQCONNX: mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY + MQCNO_USE_CD_SELECTION;

Solved it using

  • mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY + MQCNO_USE_CD_SELECTION
  • or
  • mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY
  • but not both

MQRC_CD_ERROR2277 (08E5) (RC2277)

I received message in the /var/mqm/error/*.LOG saying

AMQ9498E: The MQCD structure supplied was not valid.

EXPLANATION: The value of the ‘ChannelName’ field has the value ‘0’. This value is invalid for the operation requested.

This is only partially true. If you specify mqcno.Options=MQCNO_CD_FOR_OUTPUT_ONLY, this returns the name of the channel to you. In this case specifying a blank channel name is valid. If this options value is not specified, then a channel name is required.

AMQ9202E: Remote host not available, retry later.

EXPLANATION:
The attempt to allocate a conversation using TCP/IP to host ” for channel
QMZZZ was not successful. However the error may be a transitory one and it may be possible to successfully allocate a TCP/IP conversation later.

This is not strictly accurate.

In my MQCONNX I specified a channel name of QMZZZ which did not exist in the Client Channel Definition Table (CCDT).

  • Check the channel name in ClientConn.ChannelName
  • Specify mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY so it ignores what is in the channel, and picks one from the entries in the CCDT.

AMQ9498E: The MQCD structure supplied was not valid.

EXPLANATION:
The value of the ‘ChannelName’ field has the value ‘0’. This value is invalid for the operation requested.
ACTION:
Change the parameter and retry the operation.

  • I got this when I specified a blank (not ‘0’ ) in the ChannelName field. If I specified mqcno.Options = MQCNO_CD_FOR_OUTPUT_ONLY I did not get this error message, as the specified channelname value is ignored. I fixed the problem by changing the MQCNO, not the MQCD

PCF: MQRCCF_MSG_LENGTH_ERROR: 3016 (0BC8) (RC3016)

I got this when using PCF and got my lengths mixed up, for example StrucLength was longer than the structure.

PCF: MQRCCF_CFST_PARM_ID_ERROR: 3015 (0BC7) (RC3015)

I got this when I issued INQUIRE_Q and passed in a channel name PCF:MQRC_UNEXPECTED_ERROR 2195 (0893) RC2195

I also got back section MQIACF_ERROR_IDENTIFIER (1013) with a value of 2031619. I cant find what this means.
My problem was I had specified an optional section – but not a required one.

PCF:MQRCCF_CFST_PARM_ID_ERROR 3015 (0BC7) RC3015

I got this when using MQCMD_INQUIRE_Q, and I had specified MQCACF_Q_NAMES instead of MQCACF_Q_NAME ( no ‘s’).

MQWEB on z/OS

SRVE0279E: Error occured while processing global listeners for the application com.ibm.mq.rest:
java.lang.NoClassDefFoundError: com.ibm.mq.mft.rest.v1.resource.MFTCommonResource (initialization failure)

SRVE0279E: Error occured while processing global listeners for the application com.ibm.mq.console: java.lang.NoClassDefFoundError: com.ibm.mq.ui.api.ras.RasDescriptor (initialization failure)

SRVE0321E: The [SecurityFilter] filter did not load during start up.
SRVE0321E: The [JSONFilter] filter did not load during start up.
SRVE0321E : The [MQConsoleSecurityFilter] filter did not load during start up.

I got this because the MQ JMS libraries had not been installed. I had /colin3/mq923/web, but was missing/colin3/mq923/java .

Liberty

CWPKI0024E: The certificate alias BPECC specified by the property com.ibm.ssl.keyStoreServerAlias is not found in KeyStore ://IZUSVR/KEY

The RACF command RACDCERT LISTRING(KEY ) ID(IZUSVR) <check the case>

gives

Certificate Label Name Cert Owner USAGE DEFAULT
-------------------------------- ------------ -------- -------
BPECC ID(START1) PERSONAL YES

So it is in the key store.

You need to check there is profile for the keyring, and as the requester needs access to the private key, has update access to it.

The userid issuing the command may not have access to the keyring. The private key was needed, so needs update access to the keyring.

RLIST rdatalib START1.KEY.LST authuser
RDEFINE RDATALIB IZUSVR.KEY.LST UACC(NONE) 
PERMIT IZUSVR.KEY.LST CLASS(RDATALIB) ID(IZUSVR) ACCESS(UPDATE)
SETROPTS RACLIST(RDATALIB) REFRESH 
SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh

Note: The SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh is not strictly needed but it is worth doing it in case there were updates to the certificates and the refresh command was not done.

Other options

  • The certificate was not in the keyring
  • It was NOTRUST
  • It had expired
  • The CA for the certificate was not in the keyring,
  • The userid did not have update access to the keyring when there are private certificates from other userids. See here

CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLHandshakeException: no cipher suites in common

This can be caused by

  • the requester not having access to the private key in the keyring.
  • no valid certificate in the ring.

CWWKB0117W: The IZUANG1 angel process is not available. No authorized
services will be loaded. The reason code is 4,104.
CWWKB0115I: This server is not authorized to load module bbgzsafm.
No authorized services will be loaded.

You need to define profiles and give the userid access to them

RDEF SERVER BBG.ANGEL UACC(NONE)                                    
RDEF SERVER BBG.ANGEL.ANGEL UACC(NONE)
RDEF SERVER BBG.AUTHMOD.BBGZSAFM UACC(NONE)
RDEF SERVER BBG.AUTHMOD.BBGZSCFM UACC(NONE)
RDEF SERVER BBG.AUTHMOD.BBGZSAFM.ZOSWLM UACC(NONE)
RDEF SERVER BBG.AUTHMOD.BBGZSAFM.TXRRS UACC(NONE)
RDEF SERVER BBG.AUTHMOD.BBGZSAFM.PRODMGR UACC(NONE)

PERMIT BBG.AUTHMOD.BBGZSAFM.SAFCRED CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.LOCALCOM CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.PRODMGR CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.SAFCRED CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.TXRRRS CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.TXRRS CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.WOLA CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.ZOSAIO CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.ZOSDUMP CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSAFM.ZOSWLM CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSCFM CLASS(SERVER) -
ID(START1) ACCESS(READ)
PERMIT BBG.AUTHMOD.BBGZSCFM.WOLA CLASS(SERVER) -
ID(START1) ACCESS(READ)
SETROPTS RACLIST(SERVER) REFRESH

Z/OSMF

ERROR   ] CWPKI0022E: SSL HANDSHAKE FAILURE:  … PKIX path building failed: com.ibm.security.cert.IBMCertPathBuilderException: unable to find valid certification path to requested target.

With message
The signer might need to be added to local trust store … , located in SSL configuration alias izuSSLConfig.  The extended error message from the SSL handshake exception is: PKIX path building failed: com.ibm.security.cert.IBMCertPathBuilderException: unable to find valid certification path to requested target.

Action: A client has sent a certificate and Liberty is trying to validate it

  1. The certificate from the client  is self signed and not in the keyring (or trust keyring if this is used)
  2. The CA or intermeditate CAs are not in the keyring
  3. The CA’s are in the keyring, but not trusted
  4. There are CAs with the same name, but not the same content in the keyring. Check dates and other attributes

It may be that the Server’s certificate is being used to validate, so check the certificate being used by z/OSMF or Liberty.

Firefox is getting Error code: SEC_ERROR_UNKNOWN_ISSUER

Check your certificates.   You need the CA and any intermediate CAs in the “Authorities” section of certificates.  They may need to be trusted.

They are not automatically imported when you import a certificate.

IZUG476E: The HTTP request to the secondary z/OSMF instance “S0W1” failed with error type “HttpConnectionFailed” and response code “0”

I got this when trying to submit a job in the workflow topic.   You should get some ffdcs generated.

I had

  • java.net.UnknownHostException: s0w1.dal-ebis.ihost.com 
  • WorkflowException: IZUWF9999E: The request cannot be completed because an error occurred.  The following error data is returned: “IZUG476E:The HTTP request to the secondary z/OSMF instance “S0W1” failed with error type “HttpConnectionFailed” and response code “0” .”

Ping s0w1.dal-ebis.ihost.com and nslookup s0w1.dal-ebis.ihost.com did not return any data.

I edited /etc/hosts/

10.1.1.2 S0W1.CANLAB.IBM.COM S0W1 
10.1.1.2 s0w1.dal-ebis.ihost.com

and tso ping s0w1.dal-ebis.ihost.com worked.

I had to restart z/OSMF for it to pick up the change.

Server reports Certificate errors – certificate_unknown

  • unable to find valid certification path to requested target
  • Rethrowing javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
  • certificate_unknown

This was caused by the trust store at the client end did not have the CA certificate for the certificate sent from the server.  It may have had it, but it may have expired.

You may also get sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target because the trust store did not have the CA certificate, or the certificate was not valid – for example not trusted, or expired.

java.security.cert.CertificateException: PKIXCertPathBuilderImpl could not build a valid CertPath.

Check in the trace and ffdc.  I got errors

FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl could not build a valid CertPath.; internal cause is:
java.security.cert.CertPathValidatorException: The certificate issued by CN=TEMP4Certification Authorit2, OU=TEST, O=TEMP is not trusted; internal cause is: java.security.cert.CertPathValidatorException: Certificate chaining error
com.ibm.ws.ssl.core.WSX509TrustManager checkServerTrusted” 

CWPKI0022E: SSL HANDSHAKE FAILURE: A signer with SubjectDN (the cerificate used by the server)  was sent from the target host. The signer might need to be added to local trust store safkeyring://my/TRUST, located in SSL configuration alias defaultSSLSettings.

The extended error message from the SSL handshake exception is: PKIX path building failed: java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl  could not build a valid CertPath.; internal cause is: java.security.cert.CertPathValidatorException: The certificate issued
by (my ca)  is not trusted; internal cause is:  java.security.cert.CertPathValidatorException: Certificate chaining error

 IZUWF9999E: The request cannot be completed because an error occurred. The following error data is returned:  “java.security.cert.CertificateException: PKIXCertPathBuilderImpl could not build a valid CertPath.”

Action: Add the CA for the server’s certificate to the trust store.   I had to restart z/OSMF to pick it up

CWPKI0033E: The keystore located at safkeyringhybrid://START1/KEY did not load because of the following error: Invalid keystore format

Change

location=”safkeyringhybrid://USERID/Keyring to location=”safkeyring://USERID/Keyring to

BPXF024I

You get this message if the syslogd program is not running.

BPXP015I HFS PROGRAM /usr/lpp/zosmf/lib/libIzuCommandJni.so IS NOT MARKED PROGRAM CONTROLLED.   BPXP014I ENVIRONMENT MUST BE CONTROLLED FOR DAEMON (BPX.DAEMON) PROCESSING.

Use the command extattr /usr/lpp/zosmf/lib/libIzuCommandJni.so to check the Program Controlled attribute is set. Use the extattr +p…. to set it if required.

I had the wrong SAF_PREFIX(‘IZUDFLT‘) in USER.Z24A.PARMLIB(IZUPRMCP).   IZUDFLT was correct.

I had other problems like invalid password when I logged onto the web browser.

Fix the problem and regenerate.

IZUG807E  An error occurred while attempting to load a required program library. Error: “require is not defined”

With an FFDC saying SRVE0190E: File not found: /IzuUICommon/1_5/zosmf/util/ui/resources/common.css

Action: close the browser and restart it

BPXO042I with D OMVS,PFS

I was expecing D OMVS,PFS or D OMVS,P to give me BPXO068I and a list of Physical File Systems.

it gives BPXO042I when the command failed.

This was due to having an HFS definition in my z/OS 3.1 system. HFS is not supported on 3.1 . I removed the definition and it worked.

FSUM2378 The start of the session was not recorded. The slot (in /etc/utmpx) for this terminal could not be updated, or a new slot for the terminal could not be created.

Function = pututxline(), terminal name = ‘/dev/ttyp0001’, program name = ‘/bin/fomtlinc’, errno = 141 (X’0000008D’), reason code =
5620060, message = ‘EDC5141I Read-only file system.’

I got this when the /etc/utmpx file was not access readwrite. My /etc was mounted read only, and the permissions of the file were 744. When I made permissions 777 it worked.

With IBMUSER with id(0) this worked with no problems.

RACF certificates

IRRSDL00 R_datalib RC 8 RS 44 (0x2c)

I got this when the job userid did not have update access to the keyring for accessing private certificate information. Eg RDATALIB profile START1.CCPKeyring.IZUDFLT.LST, and z/OSMF userid IZUSVR. The profile may not exist.

IRRD103I An error was encountered processing the specified input data set.

I got this when using RACDCERT CHECKCERT(‘COLIN.CARSA.PEM’).

The error was caused by having the file open read write. If I exited from the file, the command worked.

I also got this when using RACDCERT ADD(dsn) and dsn was not variable blocked.

IRRD104I The input data set does not contain a valid certificate.

The certificate did not have a subject DN in it.

EZD1287I TTLS Error RC: 435 Initial Handshake

435 Certification authority is unknown.

I got this having replaced the CA certificate. Deleting a certificate removes it from any keyring. When you recreate the CA, you need to add it to every keyring it was in. Before deleting a certificate it is worth listing it to see where it is used. I added it to my keyring and it worked!

IRRD109I The certificate cannot be added. Profile…. is already defined.

Action use RACDCERT LIST ID(…) to list all the certificate belonging to a user. Search for the CN value Due to a mistake, a certificate had been created using the label LABEL00000006.

I then used RACDCERT ID(START1) DELETE(LABEL(‘LABEL00000006’)) to delete it

IRRD140I The filter value does not begin with a valid prefix.

Ensure you are using upper case sod

IDNFILTER(‘CN=SSCA256.OU=CA.O=DOC.C=GB’)

instead of

IDNFILTER(‘cn=SSCA256.ou=CA.o=DOC.c=GB’)

IRRD147I EXPORT in PKCS12 format requires a certificate with an associated non-ICSF private key. The request is not processed.

It looks like the private key data is stored in ICSF, and so not accessible to the RACF export command.

TLS trace

java.security.cert.CertPathValidatorException: Could not determine revocation status

This is displayed when a self signed certificate is processed. It could be a self signed certificate, or the top of the hierarchy of a chain of signers.

Java java.security.NoSuchAlgorithmException: TLSv1.3 SSLContext not available

z/OS does not support TLS v1.3 yet, and this is thrown. It was announced in April 2020.

CWWKS4000E: A configuration exception has occurred. The
requested TokenService instance of type Ltpa2 could not be
found.

I found I could no longer authenticate to z/OSMF and there were CWWKS4000E messages in the z/OSMF logs. In my /global/zosmf/data/logs/zosmfServer/logs/message…. I had near the top of the file

CWWKS4106E: LTPA configuration error. Unable to create or read LTPA key file:
/global/zosmf/configuration/servers/zosmfServer/resources/security/ltpa.keys

I renamed /global/zosmf/configuration/servers/zosmfServer/resources/securityldpa.keys to keys.saved, and restarted z/OSMF.

On restart it recreated the file, and I could logon successfully.

CWWKS1100A: Authentication did not succeed for user ID COLIN. An invalid user ID or password was specified.

Also check the stderr log

[ERROR ] CWWKS2907E: SAF Service IRRSIA00_CREATE did not succeed because user COLIN has insufficient authority to access APPL-ID IZUDFLT.

SAF return code 0x00000008. RACF return code 0x00000008. RACF reason code 0x00000020.

CONNECT user_id GROUP(group_id)

or


Permit IZUDFLT class(APPL) id(userid) Access(read)
setropts raclist(Appl) refresh

IKJ56251I USER NOT AUTHORIZED FOR SUBMIT YOUR TSO ADMINISTRATOR MUST AUTHORIZE USE OF THIS COMMAND

You need to give the userid access to the TSOAUTH resource

//TSO3 EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
PERMIT CONSOLE CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
PERMIT JCL CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
PERMIT PARMLIB CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)
SETROPTS RACLIST(TSOAUTH)

I’ve been told the following is no longer needed (but used to be needed)

PERMIT SUBMIT CLASS(TSOAUTH) ID(COLIN) ACCESS(READ)


IKJ56702I INVALID GROUP, PKIGRP3

I got this with

DELGROUP PKIGRP3

The message is totally wrong.

I could not delete the group because it had users connected to it. When I removed the userids it worked OK.

ICSF

IEC143I 213-85, … RC=X’00000008′,RSN=X’0000271C’

You may need to refresh the in memory copy of the PKDS.

IEC614I … RC 192, DIAGNOSTIC INFORMATION IS (040343C9)

You need to know to look up the last 4 digits 43c9 in DFSMS Diagnostic aids. The code means SMS-managed volumes specified for non-SMS request.

You can use the operator command D SMS,VOL(USER00), to list one volid.  If this is SMS managed, it gives the storage group name. If it is not SMS managed it gives: IGD005I COMMAND REJECTED VOLUME …… IS NOT AN SMS MANAGED DASD VOLUME .

Note: The command d sms,sg(All),listvol lists all volumes defined to SMS – even though they may not exit on the z/OS IMAGE.

MQ applications

IEW2456E SYMBOL CSQB1CON UNRESOLVED.
IEW2456E SYMBOL CSQB1DSC UNRESOLVED.

Was using cc to compile in Unix Services, and had Binder option dll. The compiler did not have this option, and so gave this message.

I used

cc -c -o c.o -Wc,SO,LIST(lst),SHOWINC,SSCOM,DLL,LSEARCH(‘COLIN.MQ924.SCSQC370′) -I //’COLIN.MQ924.SCSQC370’ c.c
cc -o mqsamp -V -Wl,LIST,MAP,INFO,DYNAM=DLL,AMODE=31 //’COLIN.MQ924.SCSQDEFS.OBJ(CSQBMQ1)’ c.o

Note I had to create the COLIN.MQ924.SCSQDEFS.OBJ, when using the xlc compiler.

IOEZ00312I Dynamic growth of aggregate ZFS.USERS in progress,
IOEZ00329I Attempting to extend ZFS.USERS by a secondary extent.
IEF196I IEC070I 104-204,OMVS,OMVS,SYS00022,0A9E,C4USS2,ZFS.USERS,
IEF196I IEC070I ZFS.USERS.DATA,CATALOG.Z24C.MASTER
IEC070I 104-204,OMVS,OMVS,SYS00022,0A9E,C4USS2,ZFS.USERS, 588
IEC070I ZFS.USERS.DATA,CATALOG.Z24C.MASTER
IOEZ00445E Error extending ZFS.USERS. DFSMS return code = 104, PDF code = 204.

MSG IEC070I 104-204 data set would exceed 4 gig if extended.

z/OS

CCN0629(U) DD:SYSLIN has invalid attributes.
CCN0703(I) An error was encountered in a call to fopen() while processing DD:SYSLIN.

We got these when compiling a C program, and using SYSLIN.
The problem is that the procedures such as EDCCBG, have

//SYSLIN .. DCB=(RECFM=FB,LRECL=80,BLKSIZE=3200)

when the data set had a blksize of more than 3200 (eg 27920). It sees and reports the mismatch.

Unix services

FSUM7332 syntax error: got Word, expecting )

I was trying to use a Python virtual environment and used the command

. env/bin/activate

The problem was the code page of the file.

I needed

export _BPXK_AUTOCVT=ON

I put this in my .profile file/

openssl

I got the following using x3270.

Use export SSL_VERBOSE_ERRORS=”1″ to get more info

Error: SSL: Private key file load ("...") failed:
error:0909006C:PEM routines:get_name:no start line

Using

openssl s_client -connect 10.1.1.2:2023 -cert … -certform PEM

gave more info

unable to load client certificate private key file
error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: ANY PRIVATE KEY

I needed certificate and key, for example

x3270 -port 2023 -trace -tracefile x3270.trace -certfile ~/ssl/ssl2/colinpaice.pemkeyfile /home/colinpaice/ssl/ssl2/colinpaice.key.pem 10.1.1.2

Z/OS

IEE535I … INVALID PARAMETER

I had

TRACE CT,WTRSTART=CTWTR
IEE535I TRACE INVALID PARAMETER

TRACE CT,WTRSTART=CTWTR,WRAP
ITT038I … WERE SUCCESSFULLY EXECUTED.

The first command was copied from a document. It had a trailing non blank space (x41). Remove it and the command works.

Try pasting the command into an ISPF edit session and using hex on to display the command.

EDC5164I SAF/RACF error. errno2 rs 199754829 0be8044d 0x0b8044d

I got this when I was trying to authentic using pthread_security_applid_np, and the certificate sent up did not have a subject DN.

The lack of subject DN was caused by commonName = supplied being missing in openssl when doing openssl ca … -policy signing_policy.

BPX1SOC TTLS_INIT_CONNECTION rv -1 rc ECONNRESET(1121) rs 2007593789 (0x77a9733d) 77a9733d EDC8121I Connection reset

The bpxmtext 77a9733d gives

TCPIP
JrTtlsHandshakeFailed: AT-TLS was unable to successfully negotiate a secure
TCP connection with the remote end.
Action: Review message EZD1286I for more information about the error.

On syslog was

EZD1287I TTLS Error RC: 403 Initial Handshake

Where 403 is The required certificate was not received from the communication partner.

The Wireshark output had a Certificate flow from the client to the server. This had no certificate in it.

The reason for this was,

  • the client had an RSA certificate
  • the Signature Hash Algorithms sent from the server did not include RSA.

The client was thus unable to send a certificate matching the SHA.

If I specified RSA only signature pairs, I could only use an RSA certificate. An Elliptic Curve certificate (ECDSA) had the same message and error code.

BPX1BND rv -1 rc EADDRINUSE(1115) rs 1951167047 (0x744c7247) EDC8115I Address already in use.

Because a program may not know that the “FIN” (end of conversation) has got to the other end, a socket enters a TIMEWAIT state. The IBM documentation says

If the server cannot wait for one to four minutes, you can use the setsockopt() call in the server to specify SO_REUSEADDR before it issues the bind() call. In that case, the server will be able to bind its socket to the same port number it was using before, even if the TIMEWAIT period has not elapsed. However, the TCP protocol layer still prevents it from establishing a connection to the same partner socket address. As clients normally initiate connections and clients use ephemeral port numbers, the likelihood of this is low.

BPX1SND rv -1 rc EOPNOTSUPP(1112) rs 1977578120 (0x75df7288) EDC8112I Operation not supported on socket.

I got this trying to issue bpx1snd() when there was data in the receive buffer. I used bpx1rcv to read the data, and the problem went away.

I peeked at the data before getting it, so I knew the length of the data to get, and so avoided waiting for data.

char buf[4000];
int lbuff = sizeof(buf); 
int alet = 0; 
int flags = MSG_PEEK; 
BPX1RCV( &sd,   // socket desciptor 
        &lbuff, 
        &buf, 
        &alet, 
        &flags, 
        &rv, // -1 or number of bytes 
        &rc, 
        &rs); 
 
printf("BPX1RCV Peek bytes %d data... \n",rv   ); 

lbuff = rv; // the number of bytes in the buffer 
flags = 0       ; 
BPX1RCV( &sd,   // socket descriptor 
        &lbuff, 
        &buf, 
        &alet, 
        &flags, 
        &rv, // -1 or number of bytes 
        &rc, 
        &rs); 

printf("BPX1RCV bytes %d data... \n",rv   ); 

BPXF135E RETURN CODE 00000079, REASON CODE 055B005C

I got this using the command

MOUNT FILESYSTEM(‘COLIN.ZFS2’) TYPE(ZFS) MOUNTPOINT(‘/u/ibmuser/temp’ )

code 79 is invalid. The 005b005c means already in use. Either

  • COLIN.ZFS2 is already mounted
  • there is something else mounted on /u/ibmuser/temp

You can use the D OMVS,F command to display the file system and where they are mounted.

BPXF135E RETURN CODE 00000081, REASON CODE 053B006C

May because the file system is mounted READ and it needs to be RDWR.

BPXMTEXT 053B006C -> JRFileNotThere: The requested file does not exist.

Problem 1

I had MOUNTPOINT(‘/u/ibmuser/test’ ) (which did not exit) not the correct MOUNTPOINT(‘/u/ibmuser/temp’ )

Problem 2

I was trying to mount it at /my. I had to go into Unix and issue mkdir /my only then could I mount the file system.

BPXF137E RETURN CODE 00000079, REASON CODE 0588002E.

THE UNMOUNT FAILED FOR FILE SYSTEM …

002E is JRFilesysNotThere. Check the file system is mounted

BPXF137E RETURN CODE 00000072, REASON CODE 058800AA

BPXF137E RETURN CODE 00000072 (the resource is busy) , REASON CODE 058800AA JRFsParentFs The file system has file systems mounted on it.

I was trying to unmount a ZFS file systemm, and got the above messages. It means you cannot unmount it, because you have other file systems attached to it. On the z/OS console it had

BPXF271I FILE SYSTEM ZFS.USERS                             
FAILED TO UNMOUNT BECAUSE IT CONTAINS MOUNTPOINT DIRECTORIES FOR  
ONE OR MORE OTHER FILE SYSTEMS WHICH MUST BE UNMOUNTED FIRST,     
INCLUDING FILE SYSTEM COLIN.ZFS2                                  

I used

unmount filesystem('COLIN.ZFS2') Immediate

and got message on the console

IOEZ00048I Detaching aggregate COLIN.ZFS2

RACF

ICH409I 500-002 ABEND

RACF abended S 500 00000002. I had problems with the RACF database. I carefully reallocated it using

//IBMCOPYR  JOB    1,MSGCLASS=H 
//STEP EXEC PGM=IEFBR14
//SYSUT1 DD SPACE=(CYL,(10,10),RLSE),
// DCB=(LRECL=4096,RECFM=F),DISP=(MOD,DELETE),
// DSN=SYS1.COLIN.RACFDB.Z31B
//STEP EXEC PGM=IRRUT200 PARM=ACTIVATE
//SYSRACF DD DSN=COLIN.RACFDB.NEW,DISP=SHR
//SYSUT1 DD SPACE=(CYL,(70),,CONTIG),
// DCB=(DSORG=PS),DISP=(NEW,CATLG),
// DSN=SYS1.COLIN.RACFDB.Z31B
/*
//SYSUT2 DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//SYSIN DD *
INDEX
MAP
END
/*

and it worked.

ICH15004I BACKUP DATASET CAN NOT BE SWITCHED; dsname IGNORED

#rvary list gave me

ACTIVE USE  NUM VOLUME   DATASET                    
------ --- --- ------ -------
YES PRIM 1 B3CFG1 SYS1.RACFDS
YES BACK 1 B3USR1 SYS1.COLIN.RACFDB.Z31B

I used rvary switch,dataset(SYS1.COLIN.RACFDB.Z31B) and got the above message.

I should have just used rvary switch.

ICH21053I Unexpected return code=00000004 and reason code=00000000 from IBM MFA while processing user …

I stopped and restarted the AZF server AZF#in00 and the problem went away.

ICH408I USER(…) GROUP(…) NAME(ADCDA )NOT AUTHORIZED TO ADMINISTER DIGITAL CERTIFICATES OR CERTIFICATE REQUESTS. READ DENIED

and

IKYI002I SAF Service IRRSPX00 Returned SAF RC = 8 RACF RC = 8 RACF RSN = 8 Request denied, not authorized.

This can be caused a user not having access, or by the wrong userid being used:

User not having access

The user issuing the request was not authorised to IRR.RPKISERV.PKIADMIN CLASS(FACILITY).

Note what the message says

  • READ DENIED
  • UPDATE DENIED

Use

tso rlist facility irr.RPKISERV.PKIADmin auth

and connect the userid ( if required) to a group or give the required access with

PERMIT IRR.RPKISERV.PKIADMIN CLASS(FACILITY)
ID(ADCDA ) ACCESS(read )

setropts raclist(FACILITY) refresh

The wrong userid being used

In Http I had

<files qc2.rexx> 
  AuthName          SAFSurrogateUser 
  AuthType          Basic 
  AuthBasicProvider saf 
  Require           valid-user 
  SAFRunAs          PKISERV 
</files> 

This worked fine.

I created a new rexx exec, without a defintion, and this caused the error messages because it ran with the WEBSRV userid, which did not have access, and so failed.I changed the SAFRunAs to %%CLIENT%% and it worked.

FSUM7351 not found

echo: /usr/lpp/ihsa_zos/bin/apachectl 88: FSUM7351 not found

At line 88 in the file, the command “echo” was not found. Check the path and libpath and check that /bin:/usr/sbin: are both specified

EZD0860I Stack INET is not available : errno 1011 (EDC8011I A name of a PFS was specified that either is not configured or is not a Sockets PFS.) errnojr 0x11B3005A

Programs like TCPIP ipsec could not find the default IP name.

For example /etc/resolv.conf was missing TCPIPJOBNAME

nameserver 127.0.0.1 
TCPIPJOBNAME TCPIP

See Configuring TCPIP.DATA, Configuration statements in TCPIP.DATA, and TCPIPJOBNAME.

EZY2642E Unknown keyword:

I got this with FTP

EZY2642E Unknown keyword: PASSIVEDATAPORTS(8000,8100)

There needs to be a blank between PASSIVEDATAPORTS and the values (8000,8100)

PASSIVEDATAPORTS (8000,8100)

IKJ56529I SYMBOLIC PARMS IN VALUE LIST IGNORED
IKJ56529I COMMAND PROCEDURE HAS NO PROC STMT

In my TSO rexx program I had

/* REXX */ 

address tso
if userid = "" then userid = SYSVAR("SYSUID")
say "userid="userid"."
x = outtrap("var.")
"TSO RACDCERT LISTRING(TN3270) ID(PAICE)"

and got

IKJ56529I SYMBOLIC PARMS IN VALUE LIST IGNORED – RACDCERT LISTRING(TN3270) ID(START1 )+
IKJ56529I COMMAND PROCEDURE HAS NO PROC STMT

The problem was the TSO in front of the command. In effect the command was

address tso TSO RACDCERT LISTRING(TN3270) ID(PAICE)

and the TSO Command processor was unable to parse the statement.

Removing the TSO in “TSO RACDCERT LISTRING(TN3270) ID(PAICE)” solved the problem

DFDSS messages

ADR374E (001)-OPNCL(14), UNABLE TO OPEN DDNAME TARGET, 14

The target data set had a RACF profile which meant it would be encrypted. For example had

DFP INFORMATION                                  
---------------
RESOWNER= NONE
DATAKEY= COLINBATCHAES

Action: Use a different data set name.

ADR412E DATA SET …IN CATALOG … ON VOLUME … FAILED SERIALIZATION

You need TOL(ENQF)

DUMP  - 
DATASET(INCLUDE(USER.Z25D.PROCLIB -
USER.Z25D.PARMLIB -
USER.Z25D.CLIST )-
) -
TOL(ENQF) -
OUTDDNAME(TARGET) -
COMPRESS

ADR380E DATA SET … NOT PROCESSED, 31

Code 31 means it did not know where to put it. Use

//S1  EXEC PGM=ADRDSSU,REGION=0M PARM='TYPRUN=NORUN'               
//TARGET DD DSN=COLIN.BACKUP.CSF,DISP=SHR
//SYSPRINT DD SYSOUT=*
//DASD2 DD UNIT=3390,VOL=(PRIVATE,SER=D5CFG1),DISP=OLD
//SYSIN DD *
RESTORE -
DATASET(INCLUDE(CSF.**) ) -
REPLACE -
OUTDDNAME(DASD2 ) -
INDDNAME(TARGET)
/*
DATASET(INCLUDE(CSF.*) ) - REPLACE - OUTDDNAME(DASD2 ) - INDDNAME(TARGET) /

ADR380E DATA SET … NOT PROCESSED, 18

Code 18 means you need replace

//S1  EXEC PGM=ADRDSSU,REGION=0M PARM='TYPRUN=NORUN'           
//TARGET DD DSN=COLIN.BACKUP.CSF,DISP=SHR
//SYSPRINT DD SYSOUT=*
//DASD2 DD UNIT=3390,VOL=(PRIVATE,SER=D5CFG1),DISP=OLD
//SYSIN DD *
RESTORE -
DATASET(INCLUDE(CSF.**) ) -
OUTDDNAME(DASD2 ) -
REPLACE -
INDDNAME(TARGET)
/*

AZF2606E Failed to listen on loopback address (port:…. rc:112, rsn:0x112b00b6)

rc:112 means EAGAIN – resource temporarily unavailable.

112b and 00b6

I got this when TCPIP was down, and so a connect to a socket failed.

TSO

IKJ56251I USER NOT AUTHORIZED FOR SUBMIT

The userid needs access to JCL

permit JCL     class(TSOAUTH)id(COLIN) access(REAd) 
permit CONSOLE class(TSOAUTH)id(COLIN) access(REAd)
setropts raclist(TSOAUTH) refresh
setropts raclist(ACCTNUM) refresh

Binder

IEW2469E 9907 THE ATTRIBUTES OF A REFERENCE TO … FROM SECTION … DO NOT MATCH THE ATTRIBUTES OF THE TARGET SYMBOL. REASON 2

Message IEW2469E reason 2 is The xplink attributes of the reference and target do not match.

I was compiling this from a 64 bit C program (so is XPLINK). I needed a

#pragma linkage(IRR… ,OS)

in my program to say the program is a stub/ assembler program.

VSAM

IDC3009I ** VSAM CATALOG RETURN CODE IS 80 – REASON CODE IS IGG0CLAT-4

DEFINE PATH -
 (NAME( COLIN.ISM400.UTIL.ZFS ) -
  PATHENTRY( ISM400.UTIL.ZFS ))


IDC3022I INVALID RELATED OBJECT
IDC3009I ** VSAM CATALOG RETURN CODE IS 80 – REASON CODE IS IGG0CLAT-4
IDC3003I FUNCTION TERMINATED. CONDITION CODE IS 12

it needs

   DEFINE PATH  - 
(NAME( COLIN.ISM400.UTIL.ZFS ) -
PATHENTRY( ISM400.UTIL.ZFS )) -
CATALOG(USERCAT.Z25D.PRODS)

Abend 0C4 in CELQLIB

I got SYSTEM COMPLETION CODE=0C4 REASON CODE=00000010 while my program was starting up.

This was caused by having a 64 bit C program linkedited with

// BPARM='SIZE=(900K,124K),RENT,LIST,RMODE=ANY,AMODE=31' 

instead of AMODE=64.

AWSEMI307I Warning! Disabled Wait CPU 0 = 00020000 00000000 00000000 00000088

I got this WAIT088 reIPLing a system after a migration. It is described here. It means I did not have a LOADxx member corresponding to the IPL parm with xx.

This is not to be confused with System Abend code 088 (in the same manual) The auxiliary storage manager (ASM) detected a paging I/O error when attempting to read from or write to storage-class memory (SCM). Which is an Abend code, not a wait code.

C compiler

ERROR CCN3166 file:line Definition of function … requires parentheses

I had code

#include <findkey.h> 
#include <keytype.h>

I had a definition typedef union keyTYPE…. in keytype.h but I used it in findkey before it was defined.

Solution:

Move the definition before use, or add #include <keytype.h> at the start of findkey.h

ERROR CCN3277 COLIN.ICSF.C.HELPERS(KEYTEST):31 Syntax error: possible missing
ERROR CCN3045 COLIN.ICSF.C.HELPERS(KEYTEST):32 Undeclared identifier rule.

At list 31 in COLIN.ICSF.C.HELPERS(KEYTEST) I had

char8 rule[2] = {"AES ","KEY-LEN "};

but char8 was not defined.

In my program I then defined typedef char char8[8]; and it worked. The clue was in the second message – not the first.

I put the following in my code

#ifndef  char8 
#error char8 not defined
#endif

and the compilation produced

ERROR CCN3205 COLIN.ICSF.C(DELETE):36    char8 not defined          

Abend S206-c0

I got the system 20c abend rc c0. The documentation says A parameter was not addressable or was in the wrong storage key.

I got this compiling a 64 bit C program, which was not XPLINK. I changed EDCCB to EDCQCB, and it worked.

EINVAL 0x0717014A

I was getting 0717014A when using shmatt. Eventually I changed my program to be 64 bit and it worked. ( I compiled it with EDCQCB <Compile, bind, and run a 64-bit C program>). It may be the original shared memory was defined in 64 bit mode.

TCP/IP

EZZ8342I gethostbyname(ABCD: Unknown host)

The names server had not been set up properly.

F RESOLVER,display

gives information like

F RESOLVER,DISPLAY                                                 
EZZ9298I RESOLVERSETUP - ADCD.Z31B.TCPPARMS(GBLRESOL)
EZZ9298I DEFAULTTCPIPDATA - ADCD.Z31B.TCPPARMS(GBLTDATA)
EZZ9298I GLOBALTCPIPDATA - ADCD.Z31B.TCPPARMS(GBLTDATA)
EZZ9298I DEFAULTIPNODES - ADCD.Z31B.TCPPARMS(ZPDTIPN1)
EZZ9298I GLOBALIPNODES - ADCD.Z31B.TCPPARMS(ZPDTIPN1)
EZZ9304I COMMONSEARCH
EZZ9304I CACHE
EZZ9298I CACHESIZE - 200M
EZZ9298I MAXTTL - 2147483647
EZZ9298I MAXNEGTTL - 2147483647
EZZ9304I NOCACHEREORDER
EZZ9298I UNRESPONSIVETHRESHOLD - 25
EZZ9293I DISPLAY COMMAND PROCESSED

The configuration file is RESOLVERSETUP – ADCD.Z31B.TCPPARMS(GBLRESOL).

The definitions of for the local name server are in DEFAULTIPNODES and GLOBALIPNODES.

You can either change one of these files, and use the command

F RESOLVER,refresh

to pick up the change. I did not want to change “production” so

  • I copied ADCD.Z31B.TCPPARMS(GBLRESOL) to USER.Z31B.TCPPARMS(GBLRESOL).
  • Created USER.Z31B.TCPPARMS(ZPDTIPN1) from ADCD.Z31B.TCPPARMS(ZPDTIPN1) , and added in my changes
  • Changed USER.Z31B.TCPPARMS(GBLRESOL) to have DEFAULTIPNODES – USER.Z31B.TCPPARMS(ZPDTIPN1)
  • Made the change active F RESOLVER,refresh,setup=’user.Z31B.TCPPARMS(GBLRESOL)’

When it worked, I copied my change from USER.Z31B.TCPPARMS(ZPDTIPN1) to ADCD.Z31B.TCPPARMS(ZPDTIPN1), and used F RESOLVER,refresh,setup=’user.Z31B.TCPPARMS(GBLRESOL)’ to go back to the system definitions.

IEF450I GPMSERVE GPMSERVE – ABEND=S0C4 U0000 REASON=00000011

After I got this message I started RMF, and used F RMF,START III and it worked successfully.

ERB944I Report is not available, reason code 3.

Using the RMF III option ZFSSUM I got the message, which has

3 Backlevel data or no data from the zFS interface.

I could not find what the problem was. I did wonder if it was because I had ZFS running within the OMVS address space (for performance), and so RMF could not find the ZFS job.

REXX on TSO

I got RC(-2168) from issuing a RACF command.

On the z/OS console I had

IEA705I ERROR DURING GETMAIN SYS CODE = 878-10 IBMKEYR STEPNAME 40      
IEA705I 00FC6000 008B9368 008B9368 00015610 00003000

Abend 878-10 is There is not enough virtual private area storage 

I changed my JCL to add a region size

//IBMKEYR JOB 1,MSGCLASS=H 
//STEPNAME EXEC PGM=IKJEFT01,PARM='LRING START1',REGION=0M

ZD&T ZPDT OPRMSG no console

Problem: I tried to start a new ZD&T system, but the console did not go into a full screen mode, but was like a line printer on the Linux terminal. You could issue commands using OPRMSG ‘d a,l’

Solution: I did not have 3270port 3270 in the devmap, so no 3270s were defined. I added it

[system]
processors 3
memory 8192m
system_name VS01
# ipl DE27 DE28NVM
3270port 3270 # port number for TN3270 connections

[manager]

The 3270 port matched the x3270 definition

x3270 -model 5  mstcon@localhost:3270  &