Should I Use MQ workflow in z/OSMF or not? Not.

It was a bit windy up here in Orkney, so instead of going out for a blustery walk, I thought I would have a look at the MQ Workflow in z/OSMF; in theory it makes it easier to deploy a queue manager on z/OS. It took a couple of days to get z/OSMF working, a few hours to find there were no instructions on how to use the MQ workflow in z/OSMF, and after a few false starts, I gave up because there were so many problem! I hate giving up on a challenge, but I could not find solutions to the problems I experienced.

I’ve written some instructions below on how I used z/OSMF and the MQ workflow. They may be inaccurate or missing information as I was unable to successfully deploy a queue manager.

I think that the workflow technique could be useful, but not with the current implementation. You could use the MQ stuff as an example of how do do things.

I’ve documented below some of the holes I found.

Overall this approach seems broken.

I came to this subject knowing nothing about it.  May be I should have gone on a two week course to find out how to use it.  Even if I had had education I think it is broken. It may be that better documentation is needed.

For example

  • Rather than simplifying the process, the project owner may have to learn about creating and maintaining the workflows and the auxiliary files the flows use – for example loops in the JCL files.
  • I have an MQ fix in a library, so I have a QFIX library in my steplib.   The generated JCL has just the three SCSQLOAD,SCSQAUTH, SCSQANLE in STEPLIB. I cannot see how to get the QFIX library included. This sort of change should be trivial.
  • If I had upgraded from CSQ913 to CSQ920, I do not see how the workflows get refreshed to point to the new libraries.
  • There are bits missing – setting up the STARTED profile for the queue manager, and specifying the userid.

Documentation and instructions to tell you how to use it.

The MQ documentation points to the program directory which just list the contents of the ../zosmf directory. Using your psychic powers you are meant to know that you have to go the z/OSMF web console and use the WorkFlows icon. There is a book z/OSMF Programming Guide which tells you how to create a workflow – but not how to use one!

The z/OSMF documentation (which you can access from z/OSMF) is available here. It is a bit like “here is all we know about how to use it”, rather than “baby steps for the new user”. There is a lot to read, and it looks pretty comprehensive. My eyes glazed over after a few minutes reading and so it was hard to see how to get started.

Before you start

You need access to the files in the ZFS.

  • For example  /usr/lpp/mqm/V9R1M1/zosmf/provision.xml this is used without change.   This has all of the instructions that the system needs to create the workflow.
  • Copy /usr/lpp/mqm/V9R1M1/zosmf/workflow_variables.properties  to your directory.  You need to edit it and fill in all of the variables between <…>.   There are a lot of parameters to configure!

You need a working z/OSMF where you can logon, use TSO etc.

Using z/OSMF to use the work flow

When I logged on to z/OSMF the first time, I had a blank  web page. I played around with the things at the bottom left of the screen, and then I got a lot of large icons.  These look a bit of a mess – rather than autoflow them to fill the screen, you either need to scroll sideways or make your screen very wide (18 inches wide).  These icons do not seem to be in any logical order.  It starts with “Workflows”; “SDSF” is a distance away from “SDSF settings”.  I could not see how to make small icons, or to sort them.

I selected “Classic” from the option by my  userid – This was much more usable, it had a compact list of actions, grouped sensibly,  down the side of the screen,

Using “workflow” to define MQ queue managers.

Baby steps instructions to get you started are below.

  • Click on the Workflows icon.
  • From the actions pull down, select Create Workflow.
  • In the workflow definition file enter /usr/lpp/mqm/V9R1M1/zosmf/provision.xml  or what workflow file you plan to use.
  • In the Workflow variable input file, enter the name of your edited workflow_variables.properties file.  Once you have selected this the system copies the content. To use a different file, or update the file,  you have to create a new Workflow.  If you update it, any workflows based on it do not get updated.
  • From the System pull down, select the system this is for.
  • Click ‘next’.   This will validate the variables it will be using.  Variables it does not use, do not get checked.
    • It complained that CSQ_ARC_LOG_PFX = <MQ.ARC.LOG.PFX> had not been filled in
    • I changed it to CSQ_ARC_LOG_PFX = MQ.ARCLOG – and it still complained
    • It would only allow CSQ_ARC_LOG_PFX = MQ, pity as my standards were MQDATA.ARCHIVE.queue_manager etc.
  • Once the input files have been validated it displays a window “Create Workflow”.  Tick the box “Assign all steps to owner userid”.  You can (re-)assign them to people later.
  • Click Finish.
  • It displays “Procedure to provision a MQ for zOS Queue manager 0 Workflow_o” and lists all of the steps – all 22 of them.
  • You are meant to do the steps in order.   The first step has State “Ready”, the rest are “Not ready”.
  • I found the interface unreliable.  For example
    • Right click on the Title of the first item.  Choose “Assignment And Ownership”.  All of the items are greyed out and cannot be selected.
    • If you click the tick box in the first column, click on “Actions” pull down above the workflow steps.  Select “Assignment And Ownership”.  You can now  assign the item to someone else.
      • If you select this, you get the “Add Assignees” panel.  By default it lists groups.  If you go to the “Actions” pull down, you can add a SAF userid or group.
      • Select the userids or groups and use the “Add >” to assign the task to them.
  • With the list of tasks, you can pick “select all”, and assign them;  go to actions pull down, select Assignment And Ownership, and add the userids or groups.
  • Once you are back at the workflow you have to accept the task(s).   Select those you are interested in, and pick Actions -> Accept. 
  • Single left click on the Title – “Specify Queue Manager Criteria”.  It displays a tabbed pane with tabs
    • General – a description of the task
    • Details – it says who the task has been assigned to  and its status.
    • Notes – you can add items to this
    • Perform – this actually does the task.
  • Click on the “Perform” tab.  If this is greyed out, it may not have been assigned to you, or you may not have accepted it.
    • It gives a choice of environments, eg TEST, Production.  Select one
    • The command prefix eg !ZCT1.
    • The SSID.
    • Click “Next”.  It gives you Review Instructions; click Finish.
  • You get back to the list of tasks.
    • Task 1 is now marked as Complete.
    • Task 2 is now ‘ready’.
    • Left single click task 2 “Validate the Software Service Instance Name Length”.
    • The Dependencies tab now has an item “Task 1 complete”.  This is all looking good.
  • Note: Instead of going into each task, sometimes you can use the Action -> perform to go straight there – but usually not.
  • Click on Perform
    • Click next
    • It displays some JCL which you can change, click Next. 
    • It displays “review JCL”.
      • It displays Maximum Record Length 1024.  This failed for me – I changed it to 80, and it still used 1024!
    • Click Next..  When I clicked Finish, I got “The request cannot be completed because an error occurred. The following error data is returned: “IZUG476E:The HTTP request to the secondary z/OSMF instance “S0W1” failed with error type “HttpConnectionFailed” and response code “0” .”  The customising book mentions this, and you get it if you use AT-TLS – which I was not using.  It may be caused by not having the IP address in my Linux /etc/hosts file.  Later, I added the address to the /etc/hosts file on my laptop, restarted z/OSMF and it worked.
    • I unticked “Submit JCL”, and ticked “Save JCL”.  I saved it in a Unix file, and clicked finish.  It does not save the location so you have to type it in every time (or keep it in your clipboard), so not very usable.
    • Although I had specified  Maximum Record Length of 80, it still had 1024.  I submitted the job and it complained with “IEB311I CONFLICTING DCB PARAMETERS”.   I edited to change 1024 to 80 in the SPACE and DCB, and submitted it.  The JCL then worked.
    • When the rexx ran, it failed with …The workflow Software Service Instance Name (${_workflow-softwareServiceInstanceName})….  The substitution had not been done.  I dont know why – but it means you cannot do any work with it.
    • When I tried another step, this also had no customisation done, so I gave up.
    • Sometimes when I ran this the “Finish” button stayed greyed out, so I was unable to complete the step.  After I shut down z/OSMF and restarted it, it usually fixed it.
  • I looked at the job “Define MQ Queue Manager Security Permissions” – this job creates a profile to disable MQ security – it did not define the security permissions for normal use.
  • I tried the step to Dynamically allocate a port for the MQ chin.  I got the same IZUG476E error as before.   I fixed my IP address, and got another error. It received status 404 from the REST request.   In the /var/zosmfcp/data/logs/zosmfServer/logs/messages.log  I had  SRVE0190E: File not found: /resource-mgmt/rest/1.0/rdp/network/port/actions/obtain.  For more information on getting a port see here.

Many things did not work, so I gave up.

Put messages to a queue workflow. 

I tried this, and had a little (very little) more success.

As before I did

  • Workflows
  • Actions pull down
  • Create workflow.   I used
    • /usr/lpp/mqm/V9R1M1/zosmf/putQueue.xml
    • and the same variable input file
  • Lots of clicks – including  Assign all steps to owner userid
  • Click Finish.   This produced a workflow with one step!
  • Left click on the step.  Go to Perform.  This lists
    • Subsystem ID
    • Queue name
    • Message_data
    • Number of messages.
  • Click Next.
  • Click Next, this shows the job card
  • Click Next,  this shows the job.
  • Click Next.  It has “Submit JCL” ticked.  Click Finish.   This time it managed to submit the JCL successfully!
  • After several seconds it displays the “Status” page, and after some more seconds, it displays the job output.
  • There is a tabbed panel with tabs for JESMSGLG, JESJCL, JESYSMSG,SYSPRINT.
  • I had a JCL error – I had made a mistake in the  MQ libraries High level qualifier.
  • I updated my workflow_variables.properties file, but I could not find say of telling the workflow to use the update variable properties file.  To solve this I had to
    • Go back to the Workflows screen where it lists the workflows I have created. 
    • Delete the workflow instance.
    • Create a new workflow instance, which picks up the changed file
    • I then deployed it and “performed” the change, and the JCL worked.
    • I would have preferred a quick edit to some JCL and resubmit the job, rather than the relatively long process I had to follow.
  • If this had happened during the deploy a queue manager workflow this would have been really difficult.   There is no “Undo Step”, so I think you would have had to create  the De-provision workflow – which would have failed because many of the steps would not have been done, or you delete the provision workflow, fix the script and redo all the steps (or skip them).

If this all worked, would I use it?

There were too many clicks for my liking.  It feels like trying to simplify things has made it more complex.  There are many more things that could go wrong – and many did go wrong, and it was hard to fix.  Some problems I could not fix.  I think these work flows are provided as an example to the customer.  Good luck with it!

A practical guide to getting z/OSMF working.

Every product using Liberty seems to have a different way of configuring the product.  As first I thought specifing parameters to z/OSMF was great, as you do it by a SYS1.PARMLIB member.  Then I found you have other files to configure; then I found that I could not reuse my existing definitions from z/OS Connect, and MQWEB.  Then I found it erases any local copy of the server.xml file, and links to the one shipped as part of the server.xml file from the configuration files each time.   Later I used this to my advantage.  Once again I seemed to be struggling against the product to do simple things.  Having gone through the pain, and learnt how to configure z/OSMF, its configuration is OK.

You specify some parameters in SYS1.PARMLIB(IZUPRMxx) concatenation.  See here for the syntax and the list of parameters. In mid 2020 there was an APAR PH24088  which allowed you to change change these parameters dynamically, using a command such as:

SETIZU ILUNIT=SYSDA

Before you start.

I had many problems with certificates before I could successfully logon to z/OSMF.

I initially found it hard to understand where to specify configuration options, as I was expecting to specify them in the server.xml file. See z/OSMF configuration options mapping for the list of options you can specify.

If you change the options you have to restart the server.   Other systems that use Liberty have a refresh option which tells the system to reread the server.xml file.   z/OSMF stores the variables in variable strings in the bootstrap.options file, and I could not find a refresh command which refreshed the data.   (There is a refresh command which does not refresh.)  See z/OSMF commands.

 Define the userid

I used a userid with a home directory /var/zosmfcp/data/home/izusvr.   I had to issue

mkdir /var/zosmfcp
chown izusvr /var/zosmfcp

mkdir -p /var/zosmfcp/data/home/izusvr
chown izusvr /var/zosmfcp/data/home/izusvr

touch /var/zosmfcp/configuration/local_override.cfg 
chmod g+r /var/zosmfcp/configuration/local_override.cfg
chown :IZUADMIN /var/zosmfcp/configuration/local_override.cfg

Getting the digital certificate right.

I had problems using the provided certificate definitions.

  1. The CN did not match what I expected.
  2. There was no ALTNAME specified.  For example ALTNAME((IP(10.1.1.2))  (where 10.1.1.2 was the external IP address of my z/OS). The browser complained because it was not acceptable.   An ALTNAME must match the IP address or host name the data came from.  Without a valid ALTNAME you can get into a logon loop.  Using Chrome I got
    1. “Your connection is not private”. Common name invalid.
    2. Click on Advanced and “proceed to..  (unsafe)”
    3. Enter userid and password.
    4. I had the display with all of the icons.  Click on the Pul-up and  switch to “classic interface”
    5. I got “Your connection is not private”. Common name invalid, and round the loop again.
  3. The keystore is also used as the trust store, so it needs the Certificate Authority’s certificates.  Other products using Liberty use a separate trust store.  (The keystore contains the certificate the server uses to identify itself.  The trust store contains the certificates (such as Certificate Authority certificates) to validates certificates sent from clients to the server).   With z/OSMF there is no definition for the trust store.   To make the keystore work as a trust store, the keystore needs:
    1.  the CA for the server (z/OSMF talks to itself over TLS) this means the each end of the conversation within the server, needs it to validate the server’s certificate.
    2. the CA for any certificates in any browsers being used.
    3. I had to use the following statements to convert my keystore to add the trust store entries.
      RACDCERT ID(IZUSVR) CONNECT(CERTAUTH -
      LABEL('MVS-CA') RING(KEY) )

      RACDCERT ID(IZUSVR) CONNECT(CERTAUTH -
      LABEL('Linux-CA2') -
      RING(KEY ) USAGE(CERTAUTH))

Reusing my existing keyring

Eventually I got this to work.  I had to…

    1. Connect the CA of the z/OS server into the keyring.
    2. Update /var/zosmfcp/configuration/local_override.cfg for ring //START1/KEY2
    3. KEYRING_NAME=KEY2
      KEYRING_OWNER_USERID=START1
      KEYRING_TYPE=JCERACFKS
      KEYRING_PASSWORD=password

The z/OSMF started task userid requests CONTROL access to the keyring. 

It requests CONTROL access (RACF reports this!), but UPDATE  access seems to work See RACF: Sharing one certificate among multiple servers.

 With only READ access I got.

CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired. Exception is javax.net.ssl.SSLException: Received fatal alert: certificate_unknown

If it does not have UPDATE access, then z/OSMF cannot see the private certificate.

Use the correct keystore type. 

My RACF KEYRING keystore only worked when I had a keystore type of JCERACFKS.  I specified it in /var/zosmf/configuration/local_override.cfg

KEYSTORE_TYPE=JCERACFKS 

Before starting the server the first time

If you specify TRACE=’Y’, either in the procedure or as part of the start command, it traces the creating of the configuration file, and turns on the debug script.  TRACE=’X’ gives a BASH trace as well.

It looks like the name of the server is hard coded internally as zosmfServer, and the value in the JCL PARMS= is ignored.

Once it has started

If you do not get the logon screen, you may have to wait.  Running z/OS under my Ubuntu Laptop is normally fine for editing etc.  it takes about 10 minutes to start z/OSMF.  If you get problems, with incomplete data displayed, or messages saying resources not found, wait and see if it gets resolved.  Sometimes I had to close and restart my browser.

Useful to know…

  1. Some, but not all, error messages are in the STDERR output from the started task.
  2. The logs and trace are in /var/zosmfcp/data/logs/zosmfServer/logs/.  Other products using Liberty have /var/zosmfcp/data/logs/zosmfServer/logs/ so all the files are under /var/zosmfcp/zosmfServer
  3. The configuration is in /var/zosmfcp/configuration and /var/zosmfcp/configuration/servers/zosmfServer.   This is a different directory tree layout from other Liberty components.
  4. If you want to add one JVM option, edit the local_override.cfg and add   JVM_OPTIONS=-Doption=value.  See here if you want to add more options.  I used JVM_OPTIONS=-Djavax.net.debug=ssl:handshake to give me the trace of the TLS handshake.
  5. If you have problems with certificate not being found on z/OS, you might issue the SETROPTS  command to be 100% sure that what is defined to RACF is active in RACF.  Use SETROPTS RACLIST(DIGTCERT,DIGTRING,RDATALIB) refresh.  
  6. Check keyring contents using racdcert listring(KEY2) id(START1)
  7. Check certificate status using RACDCERT LIST(LABEL(‘IZUZZZZ2’ )) ID(START1) and check it has status:trust and the dates are valid.
  8. If your browser is not working as expected – restart it.

Tailoring your web browser environment.

Some requests, for example workflow, use a REST request to perform an action.  Although I had specified a HOSTNAME of 10.1.1.2, z/OSMF used an internal address of S0W1.DAL-EBIS.IHOST.COM When the rest request was issued from my browser – it could not find the back end z/OS.  I had to add this site to my Linux /etc/hosts

10.1.1.2   S0W1.DAL-EBIS.IHOST.COM 

Even after I had resolved the TCPIP problem on z/OS which caused this strange name to be used, z/OSMF continued to use it. When I recreated the environment the problem was resolved. I could not see this value in any of the configuration files.

Getting round the configuration problems

I intially found it hard so specify additional configuration options, to override what was provided by z/OSMF.  For example reuse what I had from other Liberty servers.

I changed the server.xml file to include

<include optional="false" location="${server.config.dir}/colin.xml"/> 

This allowed me to put things in the colin.xml file.  Im sure this solution is not recommended by IBM, but it is a practical solution.  This file may be read only to you.

You should not need this solution if you can use the z/OSMF configuration options mapping.