How do I use Linux to manage my corporate certificates?

Having used z/OS as my corporate Certificate Authority, I thought I would use Linux to be a corporate CA, and manage z/OS certificates.  For more information on Certificate Authorities, and the signing of certificates, see here.

Setting up your Corporate CA on Linux

At the top of the CA certificate hierarchy is a self signed certificate.

Create the CA self signed certificate

openssl req -x509 -config openssl-ca.cnf -newkey rsa:4096 -days 3000 -nodes -subj "/C=GB/O=SSS/OU=CA/CN=SSCA8" -out cacert.pem -keyout cacert.key.pem -outform PEM -addext basicConstraints="critical,CA:TRUE, pathlen:0" -addext keyUsage="keyCertSign, digitalSignature"

This creates a certificate with

  1. -x509 says make it self signed – so my enterprise master CA
  2. using a 4096 bit RSA key
  3. a subject "/C=GB/O=SSS/OU=CA/CN=SSCA8"
  4. valid for 3000 days
  5. no encryption (-nodes) of the output private key file
  6. the subject (the "who") and public key are stored in cacert.pem
  7. the private key is stored in cacert.key.pem
  8. using format PEM
  9. extra extensions (-addext): basicConstraints CA:TRUE, and keyUsage for certificate signing

Display it

openssl x509 -in cacert.pem -text -noout|less
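In that output it is worth checking that the extensions requested with -addext actually appear.  A quick sketch (just a grep over the same display command) to pull out the CA and key usage fields:

# show just the basicConstraints and keyUsage extensions of the CA certificate
openssl x509 -in cacert.pem -noout -text | grep -A1 -E "Basic Constraints|Key Usage"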

Create a personal  certificate on Linux and sign it.

Create a personal certificate on Linux and get it signed by the CA created above.

I set up a shell script to do the work

name="adcdd"
subj='-subj "/C=GB/O=cpwebuser/CN=adcdd" '
#Passwords are stored in a file called password.file
passwords="-passin file:password.file -passout file:password.file"

#clean out the old files
rm $name.key.pem

rm $name.csr
rm $name.pem

CA="cacert"

# generate a private key using an elliptic curve of type secp256r1
openssl ecparam -name secp256r1 -genkey -noout -out $name.key.pem

#create a certificate signing request (CSR)
openssl req -new -key $name.key.pem -out $name.csr -outform PEM $subj $passwords

#sign it – or send it off to be signed. Get the $name.pem back from the request.  
openssl ca -config openssl-ca-user.cnf -md sha256 -out $name.pem -cert cacert.pem -keyfile cacert.key.pem -policy signing_policy -extensions signing_mqweb -infiles $name.csr

 

#Get the  *.pem file back, if required,  and merge the files to form the .p12 file
openssl pkcs12 -export -inkey $name.key.pem -in $name.pem -out $name.p12 -CAfile $CA.pem -chain -name $name $passwords
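Once the script has run, it is worth checking that the new certificate really does chain back to the CA, and looking at what ended up in the .p12 file.  A sketch using the file names from the script above (the .p12 password comes from the same password.file):

# check the signed certificate verifies against the CA certificate
openssl verify -CAfile cacert.pem adcdd.pem

# list the certificates bundled into the .p12 file
openssl pkcs12 -info -nokeys -in adcdd.p12 -passin file:password.file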

I stored common information in configuration files, such as openssl-ca-user.cnf.  This has sections called signing_policy and signing_mqweb; the signing_mqweb section has

[ signing_mqweb ]

subjectAltName = DNS:localhost, IP:127.0.0.1
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
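Note that the openssl ca command also needs the database and serial files named in the CA_default section of the configuration (the full file is shown later in this post).  If they do not exist the signing step fails, so a one-off setup sketch, using the file names from my openssl-ca-user.cnf:

# create an empty certificate database and a starting serial number for openssl ca
touch index.txt
echo 01 > serial.txt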

Use Linux to be the Certificate Authority for my z/OS RACDCERT certificates.

Create a certificate on z/OS.

//IBMRACF JOB 1,MSGCLASS=H 
//S1 EXEC PGM=IKJEFT01,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
 
RACDCERT ID(START1) DELETE(LABEL('MYCERTL'))
/* create the certificate - note it is not signed
RACDCERT ID(START1) GENCERT -
SUBJECTSDN(CN('MYCERTL') -
O('SSS') -
OU('SSS')) -
ALTNAME(IP(10.1.1.2) -
DOMAIN('WWW.ME2.COM') )-
SIZE(4096) -
RSA -
WITHLABEL('MYCERTL')

/* convert this to a certificate request and output it
RACDCERT GENREQ (LABEL('MYCERTL')) ID(START1) -
DSN('IBMUSER.CERT.MYCERTL.CSR')
/*

The first line of the data set is -----BEGIN NEW CERTIFICATE REQUEST----- which is what I expect for a Certificate Signing Request.

FTP this down to the Linux machine as mycertl.csr and use the openssl ca command.  This uses the cacert.*.pem files created above.
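If you use the Linux command line ftp client, the download session looks like the sketch below; the host name is made up, and the transfer is done in text (ascii) mode so the request is translated from EBCDIC to ASCII.

ftp my.zos.host
ascii
get 'IBMUSER.CERT.MYCERTL.CSR' mycertl.csr
quit

Then sign the request with the openssl ca command: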

openssl ca -config openssl-ca-user.cnf -md sha256 -out mycertl.pem -notext -cert cacert.pem -keyfile cacert.key.pem -policy signing_policy -extensions signing_mqweb -infiles mycertl.csr

Note: My openssl-ca-user.cnf  is given below.

I carefully checked the details displayed, and replied y to both questions.

This produced

Using configuration from openssl-ca-user.cnf
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
organizationName :PRINTABLE:'SSS'
organizationalUnitName:PRINTABLE:'SSS'
commonName :PRINTABLE:'MYCERTL'
Certificate is to be certified until Oct 14 15:28:45 2023 GMT (1000 days)
Sign the certificate? [y/n]:y

1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated

If the option -notext is not specified, the output file contains the readable interpretation of the certificate.  Specify -notext to get the output file where the first line is -----BEGIN CERTIFICATE-----.
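Before uploading the signed certificate, you can check it on Linux.  A sketch, using the same file names as the signing step:

# check the new certificate verifies against the CA that signed it
openssl verify -CAfile cacert.pem mycertl.pem

# display the subject, issuer and extensions
openssl x509 -in mycertl.pem -text -noout | less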

Upload this output file to z/OS (for example "put mycertl.pem 'IBMUSER.CERT.MYCERTL.PEM'").

Check the contents before you add it to the RACF keystore

RACDCERT CHECKCERT('IBMUSER.CERT.MYCERTL.PEM')

Add the certificate to the keystore

//IBMRACF JOB 1,MSGCLASS=H 
//S1 EXEC PGM=IKJEFT01,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT ADD('IBMUSER.CERT.MYCERTL.PEM') -
ID(START1) WITHLABEL('MYCERTL')
/*

The command racdcert list(label('MYCERTL')) id(start1) shows the certificate has NOTRUST, so it will not be visible on any keyring. You need

RACDCERT ID(START1) ALTER(LABEL('MYCERTL')) TRUST

and you will need to connect it to any keyrings.

Upload the CA certificate into the RACF database.

You will also need to upload the CA certificate (which carries its public key) to the RACF database, and connect it as CERTAUTH usage to any keyring that uses a certificate signed by the Linux CA.

FTP the certificate file, cacert.pem (created above), to z/OS as text.  Once you have FTPed the file, check the first line is "-----BEGIN CERTIFICATE-----".
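A sketch of the upload with the Linux ftp client (the host name is made up; the data set name matches the RACDCERT ADD below):

ftp my.zos.host
ascii
put cacert.pem 'IBMUSER.CA256.PEM'
quit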

Add it to RACF

You can add the certificate as owned by a userid (rather than as CERTAUTH).  It then needs to be connected to the keyring with USAGE(CERTAUTH).

//IBMRACF JOB 1,MSGCLASS=H 
//S1 EXEC PGM=IKJEFT01,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
/* delete it if needed
RACDCERT DELETE (LABEL('Linux-CA256')) ID(START1)

RACDCERT id(start1) ADD('IBMUSER.CA256.PEM') -
WITHLABEL('Linux-CA256') TRUST

RACDCERT CONNECT(id(start1) LABEL('Linux-CA256') -
RING(TRUST) USAGE(CERTAUTH)) ID(START1)

RACDCERT CONNECT(id(start1) LABEL('Linux-CA256') -
RING(DANRING) USAGE(CERTAUTH)) ID(START1)

RACDCERT LISTRING(TRUST ) ID(START1)

racdcert list (label('Linux-CA256')) id(start1)

SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh

 

My openssl-ca-user.cnf file.

HOME = .
RANDFILE = $ENV::HOME/.rnd

####################################################################
[ ca ]
default_ca = CA_default # The default ca section

[ CA_default ]
default_days = 1000 # How long to certify for
default_crl_days = 30 # How long before next CRL

default_md = sha256 # Use public key default MD
preserve = no # Keep passed DN ordering

x509_extensions = ca_extensions # The extensions to add to the cert

email_in_dn = no # Don’t concat the email in the DN
copy_extensions = copy # Required to copy SANs from CSR to cert

base_dir = .
certificate = $base_dir/cacert.pem # The CA certificate
private_key = $base_dir/cakey.pem # The CA private key
new_certs_dir = $base_dir # Location for new certs after signing
database = $base_dir/index.txt # Database index file
serial = $base_dir/serial.txt # The current serial number

unique_subject = no # Set to ‘no’ to allow creation of
# several certificates with same subject.

[ signing_policy ]
countryName = optional
stateOrProvinceName = optional
localityName = optional
organizationName = optional
organizationalUnitName = optional
commonName = supplied
emailAddress = optional

##########

[ signing_mqweb ]

subjectAltName = DNS:localhost, IP:127.0.0.1
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth

 

How to back up only key data sets on z/OS

I’ve been backing up my datasets on z/OS, and wondered what the best way of doing it was.

I wanted to back up data sets containing data I wanted to keep, but did not want to back up other data sets which could easily be recreated, such as IPCS dump data sets, the output of compiles, or the SMF records.

DFDSS has a backup and restore program which is very powerful.  With it you can

  • Process data sets under a High Level Qualifier – include or exclude data sets.
  • Back up only changed data sets
  • Back up individual files in a ZFS or USS – but this is limited; you have to explicitly specify the files you want to back up, and you cannot back up a directory

You cannot back up individual members of a PDS(E); you have to back up the whole PDS(E).  If you need to restore a member, restore the backup with a different HLQ and select the members from that.

What should I use?

I tend to use XMIT and DFDSS – the Storage Management component on z/OS. This tends to be used by the data managers as it can back up groups of data sets, volumes, etc.

Backing up using XMIT.

This has the advantage that the output file is a card image, which is a portable format.

I have a job

//MYLIBS1 JCLLIB ORDER=USER.Z24A.PROCLIB 
// SET TODAY='D201224'
//S1 EXEC PROC=BACKUP,P=USER.Z24A.PARMLIB,DD=&TODAY.
//S2 EXEC PROC=BACKUP,P=USER.Z24A.PROCLIB,DD=&TODAY.

Where

  • P is the name of the dataset
  • TODAY – where I set today’s date.

The backup procedure has

//BACKUP PROC P='USER.Z24A.PROCLIB',DD='UNKNOWN' 
//S1 EXEC PGM=IKJEFT01,REGION=0M,
// PARM='XMIT A.A DSN(''&P'') OUTDSN(''BACKUP.&P..&DD'')'
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
// PEND

The command that gets generated when P is USER.Z24A.PROCLIB and DD=D201224 is

XMIT A.A DSN('USER.Z24A.PROCLIB') OUTDSN('BACKUP.USER.Z24A.PROCLIB.D201224') 

This makes it easy to find the backups for a file, and for a particular date.

To restore a file you use the command TSO RECEIVE INDSN('BACKUP.USER.Z24A.PROCLIB.D201224').

Using DFDSS to backup

This is a powerful program, and it is worth taking baby steps to understand it.

The basic job is

//IBMDFDSS JOB 1,MSGCLASS=H 
//S1 EXEC PGM=ADRDSSU,REGION=0M,PARM='TYPRUN=NORUN'
//TARGET DD DSN=COLIN.BACKUP.DFDSS,DISP=(MOD,CATLG),
// SPACE=(CYL,(50,50),RLSE)
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DUMP -
DATASET(INCLUDE(COLIN.JCL,COLIN.WLM,COLIN.C) -
BY(DSCHA,EQ,YES)) -
OUTDDNAME(TARGET) -
COMPRESS
/*
  • For the syntax of the dump data set command see here.
  • This dumps the specified data sets, COLIN.JCL, COLIN.WLM, and COLIN.C, and puts them in one file via the TARGET DD, which is defined as a data set (COLIN.BACKUP.DFDSS).
  • This does not actually do the backup because it has TYPRUN=NORUN.
  • You can specify many filter criteria, in the BY(…) such as last reference, size, etc.  See here.
  • The BY(DSCHA,EQ,YES) says dump data sets only if they have the “changed flag” set.  The changed flag is set when a data set has been opened for output.  Using ADRDSSU with the RESET option resets the changed flag.  This allows you to back up only data sets which have changed – see below.
  • It compresses the files as it backs up the files.

I did have

DATASET(INCLUDE(COLIN.**) - 
EXCLUDE(COLIN.O.**,COLIN.SMP*.**,COLIN.DDIR ) -
BY(DSCHA,EQ,YES)) -

Which says back up all data sets with the High Level Qualifier COLIN.**, but exclude the listed files.  I ran this using TYPRUN=NORUN, and it listed 100+ data sets.  Whoops, so I changed it to explicitly include the files I wanted to back up.  Once I had determined the files I wanted to back up, I removed the TYPRUN=NORUN and backed up the data sets.

Using DFDSS to restore

You can restore from the DFDSS backups using a job like

//S1 EXEC PGM=ADRDSSU,REGION=0M,PARM='TYPRUN=NORUN' 
//TARGET DD DSN=COLIN.BACKUP.DFDSS,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
RESTORE -
DATASET(INCLUDE(COLIN.C) ) -
RENAME(COLINN) -
INDDNAME(TARGET)
/*

This says restore the data sets specified in the INCLUDE, renaming the HLQ to COLINN, reading from the data set referenced by //TARGET.

Initially I specified PARM='TYPRUN=NORUN' so it did not actually try to restore the files.  It reported

THE INPUT DUMP DATA SET BEING PROCESSED IS IN LOGICAL DATA SET FORMAT AND WAS CREATED BY Z/OS DFSMSDSS 
VERSION 2 RELEASE 4 MODIFICATION LEVEL 0 ON 2020.359 17:16:44
DATA SET COLINN.C WAS SELECTED
PROCESSING BYPASSED DUE TO NORUN OPTION
THE FOLLOWING DATA SETS WERE SUCCESSFULLY PROCESSED
COLIN.C

From the time stamp 2020.359 17:16:44 we can see I was using the expected backup.

Once you are happy you have the right backup, and list of data sets, you can remove the PARM='TYPRUN=NORUN' to restore the data.

If you have backed up COLIN.JCL and SUE.JCL, and try to rename on restore (so you do not overwrite existing files), it would fail because it would create COLINN.JCL and then try to create COLINN.JCL from the other file!  To get round this, use INCLUDE(COLIN.**) RENAME(COLINN) and INCLUDE(SUE.**) RENAME(SUEN).

 

What’s in the backup?

You can use the following to list the contents  (with TYPRUN=NORUN)

RESTORE - 
DATASET(INCLUDE(**) ) -
INDDNAME(TARGET)

Note that because this job does not have REPLACE, it will not overwrite any files.

Using advanced backup facilities.

Each data set has a changed flag associated with it.  If this bit is on, the data set has been changed.  You can display this in the data set section of ISMF (field 24 – CHG IND), or if you have access to the DCOLLECT output, it is in one of the flags.

If you use

DUMP - 
DATASET(INCLUDE(COLIN.JCL,COLIN.WLM,COLIN.C) -
BY(DSCHA,EQ,YES)) -
RESET -
OUTDDNAME(TARGET) -
COMPRESS

it will back up the data sets and reset the changed flag.  In my case it backed up the 3 data sets I had specified.

When I reran the same job, it backed up NO data sets, giving me the message

ADR383W (001)-DTDSC(01), DATA SET COLIN.JCL NOT SELECTED, 01.
Where 01 means "The fully qualified data set name did not pass the INCLUDE, EXCLUDE, and/or BY filtering criteria".

This is because I had specified BY(DSCHA,EQ,YES), which says filter by data sets with the change flag (DSCHA) on.  The first DUMP request RESET the flag, so the second DUMP job skipped the data sets.

You can exploit this by backing up all data sets once a week, and just the changed data sets during the week.

You might want to keep the output of the dump job in a member of a PDS, so you can search for your data set name to find the date of a backup which included the file.

How many backups should I keep?

This depends on whether you are backing up all files, or just changed files.  You can use a GDG (Generation Data Group – see here), where each backup is a new generation of the data set.  If you specify 3 generations, then when you create the 4th copy, it deletes copy 1, etc.

How can I replicate the RACF definitions for MQ on z/OS?

If you are the very careful person who makes all updates to RACF only through batch jobs, then this is easy – take the old jobs, and change the queue manager name and rerun them.

For the other 99.99% of us,  read on…

Even if you have been careful to keep track of any changes to security definitions,  someone else may have made a change either using the native TSO commands, or via the ISPF panels. 
You can list the RACF database, but there is no easy way of listing the RACF database in command format, to allow you to do a global rename, and submit the commands.

I have found two ways of extracting the RACF definitions.

  1. Using an unloaded copy of the RACF database
  2. Using RACF commands to extract and recreate the requests

Using an unloaded copy of the RACF database

I discovered dbsync on a RACF tools repository which does most of the hard work.   You can run a RACF utility to unload the RACF database into a flat file (omitting sensitive information like passwords etc).  Dbsync is a rexx program which takes two copies of an unloaded database, and generates the RACF commands for the differences. I simply used my existing unloaded file and a null file, and got out the commands to create all of the entries.

The steps are

  1. Unload the RACF database
  2. Get dbsync into your z/OS system
  3. Run DBsync
  4. Edit the files, and remove all lines which are not relevant
  5. Run the output to create/modify the definitions

Unload the database

//IBMUSUN JOB 1,MSGCLASS=H 
//* use the TSO RVARY command to display databases
//UNLOAD EXEC PGM=IRRDBU00,PARM=NOLOCKINPUT
//SYSPRINT DD SYSOUT=*
//INDD1 DD DISP=SHR,DSN=SYS1.RACFDS
//OUTDD DD DISP=(MOD,CATLG),DSN=COLIN.RACF.UNLOAD,
// SPACE=(CYL,(1,1)),DCB=(LRECL=4096,RECFM=VB,BLKSIZE=13030)

Of course this assumes you have the authority to create this file.  If not, ask a friendly sysprog to run the command, then edit the output to delete all records which do not have MQ in them.

Run dbsync

I had to make the following changes

  1. Dataset 1 was the dataset I created above
  2. Dataset 2 was a dummy

Modify the sort step to output to a temporary output file

//COLINRA JOB 1,MSGCLASS=H 
//* ftp://public.dhe.ibm.com/eserver/zseries/zos/racf/dbsync/
//SORT1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=COLIN.RACF.UNLOAD
//SORTOUT DD DISP=(NEW,PASS),DSN=&TEMP1,SPACE=(CYL,(1,1))
//SYSIN DD *
SORT FIELDS=(5,2,CH,A,7,1,AQ,A,8,549,CH,A)
ALTSEQ CODE=(F080,F181,F282,F383,F484,F585,F686,F787,F888,F989,
C191,C292,C393,C494,C595,C696,C797,C898,C999,
D1A1,D2A2,D3A3,D4A4,D5A5,D6A6,D7A7,D8A8,D9A9,
E2B2,E3B3,E4B4,E5B5,E6B6,E7B7,E8B8,E9B9)
OPTION VLSHRT,DYNALLOC=(SYSDA,3)
/*

Delete the sort of the other data set – as I was using a dummy file

Run dbsync

I changed the lines shown below; the template JCL had

//OUTSCD1 DD DSN=your.dsname.for.outscd1,
// DISP=(NEW,CATLG),

so I changed

  • your.dsname.for to COLIN.RACF
  • NEW,CATLG to MOD,CATLG
  • Upper cased the changed lines using the ucc…ucc ISPF edit line command.
//DBSYNC EXEC PGM=IKJEFT01,REGION=5000K,DYNAMNBR=50,PARM='%DBSYNC' 
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD DUMMY
//SYSEXEC DD DISP=SHR,DSN=COLIN.DBSYNC.REXX
//OPTIONS DD *
/* your options here
//INDD1 DD DISP=SHR,DSN=*.SORT1.SORTOUT
//INDD2 DD DUMMY
//OUTADD1 DD DSN=COLIN.RACF.ADDFILE1,
// DISP=(MOD,CATLG),
// UNIT=SYSDA,SPACE=(CYL,(25,25),RLSE),
// DCB=(RECFM=VB,LRECL=255,BLKSIZE=6400)
etc

The output was rexx commands in a file, such as

"rdefine MQCMDS CSQ9.** owner(IBMUSER ) uacc(CONTROL )
    audit(failures(READ )) level(00)"
"permit CSQ9.** class(MQCMDS) reset"
"rdefine MQQUEUE CSQ9.** owner(IBMUSER ) uacc(NONE )
     audit(failures(READ )) level(00) warning notify(IBMUSER )"
"permit CSQ9.** class(MQQUEUE) reset"
"rdefine MQCONN CSQ9.BATCH owner(IBMUSER ) uacc(CONTROL )
    audit(failures(READ )) level(00)"
"permit CSQ9.BATCH class(MQCONN) reset"
"rdefine MQCONN CSQ9.CHIN owner(IBMUSER ) uacc(READ )
    audit(failures(READ )) level(00)"
"permit CSQ9.BATCH class(MQCONN) id(IBMUSER ) access(ALTER )"
"permit CSQ9.BATCH class(MQCONN) id(START1 ) access(UPDATE )"
"permit CSQ9.CHIN class(MQCONN) id(IBMUSER ) access(ALTER )"

You edit and run the Rexx exec to issue the commands.

Easy – it took me less than half an hour from start to finish.

Using RACF commands to extract and recreate the requests

I found that most people do not have access to an unloaded RACF database.  My normal userid does not have the authority to create the unloaded copy. 

I put an exec up on GitHub.  It issues a display command for each class in MQCMDS MXCMDS MQQUEUE MXQUEUE MXTOPIC MQADMIN MXADMIN MQCONN, formats it as an RDEFINE command, and then issues the PERMIT commands to give people access to it.  It writes the output into the file being edited.

Use ISPF to edit a member where you want the output.

Make sure the rexx exec is in the SYSPROC or SYSEXEC concatenation, for example use ISRDDN to check.

Syntax

genclass <queuemanagername>

The output is like

 /* class:MXCMDS profile:MQPA class not found 
/* class:MXQUEUE profile:MQPA profile not found
/* class:MXTOPIC profile:MQPA profile not found
/* class:MXADMIN profile:MQPA profile not found
RDEFINE MQCONN -
MQPA.CICS -
- /* Create date 07/17/20
OWNER(ADCDA) -
- /* Last reference Date 07/17/20
- /* Last changed date 07/17/20
- /* Alter count 0
- /* Control count 0
- /* Update count 0
- /* Read count 0
UACC(NONE) -
LEVEL(0) -
- /* Global audit NONE
/* Permit MQPA.CICS CLASS(MQCONN ) RESET
Permit MQPA.CICS CLASS(MQCONN ) ID(ADCDA ) ACCESS(ALTER )
Permit MQPA.CICS CLASS(MQCONN ) ID(START1 ) ACCESS(READ )
/* class:MQCONN profile:MQPA.CICS profile not found

It includes a Permit… RESET if you want to remove all access

Here’s another nice mess I’ve gotten into!

Or “How to clean up the master catalog when you have filled it up with junk”. Looking at my z/OS system, I was reminded of my grandfather's garage/workshop, where the tools were carefully hung up on walls and the chisels were carefully stored in a cupboard to keep them sharp.  He had boxes of screws, different sizes and types in different boxes.  My father had a shed with a big box of tools.  In the box were his chisels, hammers, saws etc.  He had a big jar of “Screws – miscellaneous – to be sorted”.  The z/OS master catalog should be like my grandfather's garage, but I had made it like my father's shed.

Well, what a mess I found!   This blog post describes some of the things I had to do to clean it up and follow best practices.

In days of old, well before PCs were invented, all data sets were cataloged in the master catalog.  Once you got 10s of thousands of data sets on z/OS, the time to search the catalog for a dataset name increased, and the catalogs got larger and harder to manage.  They solved this about 40 years ago by developing the User Catalog  – a catalog for User entries.
Instead of 10,000 entries for my COLIN.* data sets, there should be an Alias COLIN in the Master Catalog which points to another catalog, which could be just for me, or shared by other users. This means that even if I have 1 million datasets in the user catalog, the access time for system datasets is not affected.  What I expected to see in the master catalog is the system datasets, and aliases for the userids.  I had over 400 entries for COLIN.* datasets, 500 BACKUP.COLIN.* datasets, 2,000 MQ.ARCHIVE.* datasets etc.  What a mess!

Steps to solve this.

Prevention is better than cure.

You can use the RACF option PROTECTALL.  This says a userid needs a RACF profile before it can create a data set.  This means each userid (and group) needs a profile like 'COLIN.*', and the userid needs to be given access to this profile.  Once you have done this for all userids, you can use the RACF command SETROPTS PROTECTALL(WARNING) to enable this support.  This will allow users to create data sets when there is no profile, but produces a warning message on the operator console – so you can fix it.  An authorised person can use SETROPTS NOPROTECTALL to turn this off.  Once you have this running with no warnings you can use the command SETROPTS PROTECTALL to make it live – without warnings – and you will live happily ever after, or at least till the next problem.

Action:

  1. Whenever you create a userid you need to create the RACF dataset profile for the userid.
  2. You also need to set up an ALIAS for the new userid to point to a User Catalog.

How bad is the problem?

You can use IDCAMS to print the contents of a catalog

//S1 EXEC PGM=IDCAMS 
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
LISTCAT CATALOG(CATALOG.Z24A.MASTER) NAME
/*

This has output like

NONVSAM ------- BACKUP.USER.Z24A.VTAMLST.D201210 
NONVSAM ------- BACKUP.USER.Z24A.VTAMLST.D201222
NONVSAM ------- BACKUP.USER.Z24A.VTAMLST.D201224
ALIAS --------- BAQ300

This says there are datasets BACKUP… which should not be in the catalog.
There is an Alias BAQ300 which points to a user catalog.   This is what I expect.

The IDCAMS command

LISTCAT ALIAS CATALOG(CATALOG.Z24A.MASTER) ALL

list all of the aliases in the catalog, for example

ALIAS --------- BAQ300 
... 
ASSOCIATIONS
USERCAT--USERCAT.Z24A.PRODS

This shows that for the high level qualifier BAQ300, go and look in the user catalog USERCAT.Z24A.PRODS.

Moving the entries out of the Master Catalog

The steps to move the COLIN.* entries out of the Master Catalog are

  1. Create a User Catalog
  2. Create an ALIAS COLIN2 which points to this User Catalog. 
  3. Rename COLIN…. to COLIN2….
  4. Create an ALIAS COLIN for all new data sets.
  5. Rename COLIN2… to COLIN…
  6. Delete the ALIAS COLIN2.

Create a user catalog

Use IDCAMS to create a user catalog

 DEFINE USERCATALOG - 
( NAME('A4USR1.ICFCAT') -
MEGABYTES(15 15) -
VOLUME(A4USR1) -
ICFCATALOG -
FREESPACE(10 10) -
STRNO(3 ) ) -
DATA( CONTROLINTERVALSIZE(4096) -
BUFND(4) ) -
INDEX(BUFNI(4) )

To list what is in a user catalog

Use a similar IDCAMS command to list the master catalog 

LISTCAT ALIAS CATALOG(A4USR1.ICFCAT) ALL

Create an alias for COLIN2

 DEFINE ALIAS (NAME(COLIN2) RELATE('A4USR1.ICFCAT') ) 

Get the COLIN.* entries from the Master Catalog into the User Catalog

This was a bit of a challenge as I could not see how to do a global  rename.

You can rename non-VSAM data sets either using ISPF 3.4 or using the TSO RENAME command in batch.

The problem occurs with the VSAM data sets.   When I tried to use the IDCAMS rename, I got an error code IGG0CLE6-122 which says I tried to do a rename, which would cause a change of catalog.

The only way I found of doing it was to copy the datasets to a new High Level Qualifier, and delete the originals.   Fortunately DFDSS has a utility which can do this for you.

//S1 EXEC PGM=ADRDSSU,REGION=0M,PARM='TYPRUN=NORUN'
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
COPY -
DATASET(INCLUDE(COLIN.** )) -
DELETE -
RENUNC(COLIN2)
/*

Most of the data sets were “renamed” to COLIN2… but I had a ZFS which was in use, and some dataset aliases.  I used

  •  the TSO command unmount filesystem('COLIN.ZCONNECT.BETA.ZFS')
  • the  IDCAMS command DELETE COLIN.SCEERUN ALIAS for each of the aliases.

and reran the copy job.   This time it renamed the ZFS.  The renaming steps are

  • Check there are no datasets with the HLQ COLIN.
  • Define an alias for COLIN in the master catalog to point to a user catalog.
  • Rerun the copy job to copy from COLIN2 back to COLIN.
  • Mount the file system.
  • Redefine the alias to data sets (eg COLIN.SCEERUN).
  • Delete the alias for COLIN2.

To be super efficient, and like my grandfather, I could have upgraded the SMS ACS routines to force data sets to have the “correct” storage class, data class, or management class.  The job output showed  “Data set COLIN2.STOQ.CPY has been allocated with newname  COLIN.STOQ.CPY using STORCLAS SCBASE,  no DATACLAS, and no MGMTCLAS“.  These classes were OK for me, but may not be for a multi-user z/OS system.

One last thing: don't forget to add the new user catalog to your list of data sets to back up.

What should I monitor for MQ on z/OS – logging statistics

For the monitoring of MQ on z/OS, there are a couple of key metrics you need to keep an eye on for the logging component, as you sit watching the monitoring screen.

I’ll explain how MQ logging works, and then give some examples of what I think would be key metrics.

Quick overview of MQ logging

  1. MQ logging has a big (sequential) buffer for logging data, which wraps.
  2. Application does an MQPUT of a persistent message.
  3. The queue manager updates lots of values (eg queue depth, last put time) as well as moving the message data into the queue manager address space.  This data is written to log buffers. A 4KB page can hold data from many applications.
  4. An application does an MQCOMMIT.  MQ takes the log buffers up to and including the current buffer and writes them to the current active log data set.  Meanwhile other applications can write to other log buffers.
  5. The I/O finishes and the log buffers just written can be reused.
  6. MQ can write up to 128 pages in an I/O. If there are more than 128 buffers to write there will be more than 1 I/O.
  7. If application 1 commits, the IO starts,  and then application 2 commits. The I/O for the commit in application 2 has to wait for the first set of disk writes to finish, before the next write can occur.
  8. Eventually the active log data set fills up.  MQ copies this active log to an archive data set.  This archive can be on disk or tape.   This archive data set may never be used again in normal operation.  It may be needed for recovery of transactions or after a failure.   The Active log which has just been copied can now be reused.

What is interesting?

Displaying how much data is logged per second.

Today       XXXXXXXXXXXXXXXXXXXX
Last week XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Yesterday XXXXXXXXXX          
      0                     100MB/Sec    200 MB/Sec

This shows that the logging rate today is lower than last week.   This could be caused by

  1. Today is just quieter than last week
  2. There is a problem and there are fewer requests coming into MQ.   This could be caused by problems in another component, or a problem in MQ.    When using persistent messages the longest part of a transaction is the commit and waiting for the log disk I/O.  If this I/O is slower it can affect the overall system throughput.
  3. You can get the MQ log IO response times from the MQ log data.

Displaying MQ log I/O response time

You can break down where time is spent in doing I/O into the following areas

  1. Scheduling the I/O – getting the request into the I/O processor on the CPU
  2. Sending the request down to the Disk controller (eg 3990)
  3. Transferring data
  4. The I/O completes, and send an interrupt to z/OS, z/OS has to catch this interrupt and wake up the requester.

 Plotting the I/O time does not give an entirely accurate picture, as the time to transfer the data depends on the amount of data to transfer.  On a well run system there should be enough capacity so the other times are constant.    (I was involved in a critical customer situation where the MQ logging performance “died” every Sunday morning.   They did backups, which overloaded the I/O system).

In the MQ log statistics you can calculate the average I/O time.  There are two sets of data for each log

  1. The number of requests, and the sum of the times, for requests to write 1 page.  This should be pretty constant, as the data is for when only one 4KB page was transferred.
  2. The number of requests, and the sum of the times, for requests to write more than 1 page.  The average I/O time will depend on the amount of data transferred.
  • When the system is lightly loaded, there will be many requests to write just one page. 
  • When big messages are being processed (over 4 KB) you will see multiple pages per I/O.
  • If an application processes many messages before it commits you will get many pages per I/O.   This is typical of a channel with a high achieved batch size.
  • When the system is busy you may find that most of the I/O write more than one page, because many requests to write a small amount of data fills up more than one page.

I think displaying the average I/O times would be useful.  I haven't tried this in a customer environment (as I don't have a customer environment to use).  So if the data looks like

Today         XXXXXXXXXXXXXXXXXXXXXXXX
Last week     XXXXXXXXXXXXXXXXXXXXXXXXXXXXX  
One hour ago XXXXXXXXXXXXXXXXXXX
time in ms 0 1 2 3

it gives you a picture of the I/O response time.

  • The dark green is for I/O with just one page, the size of the bar should be constant.
  • The light green is for I/O with more than one page, the size of the bar will change slightly with load.  If it changes significantly then this indicates a problem somewhere.

Of course you could just display the total I/O response time = (total duration of I/Os) / (total number of I/Os), but you lose the extra information about the writing of 1 page.

Reading from logs

If an application using persistent messages decides to roll back:

  • MQ reads the log buffers for the transaction’s data and undoes any changes.
  • It may be the data is old and not in the log buffers, so the data is read from the active log data sets.
  • It may be that the request is really old (for example half an hour or more), in which case MQ reads from the archive logs (perhaps on tape).

Looking at the application doing a roll back, and having to read from the log.

  • Reading from buffers is OK.  A large number indicates an application problem or a DB2 deadlock type problem.  You should investigate why there is so much rollback activity.
  • Reading from Active logs… this should be infrequent.  It usually indicates an application coding issue where the transaction took too long before committing, perhaps due to a database deadlock, or bad application design (where there is a major delay before the commit).
  • Reading from Archive logs… really bad news…..  This should never happen.

Displaying reads from LOGS

Today         XXXXXXXXXXXXXXXXXXXXXXXX
Last week     X
One hour ago  XXXXX
rate          0        10    20     40

Where green is “read from buffer”, orange is “read from active log”, and red is “read from archive log”.  Today looks like a bad day.

Do we still need the wine maker’s nose, the mechanics ear and the performance analyst’s glasses?

When I left university, one of my friends went into the wine industry.  We met up a few years later and he said that his nose was more useful than his PhD in Chemistry.  Although they had moved towards gas chromatography (which gives you a profile of all of the chemicals in the wine), this was good at telling you if there were bad chemicals in the brew, but not if it would be a good vintage; for that they needed the human nose.

My father would tune his motorbike by listening to it.  He said the bike would tell you when you had tuned it just right, and got it “in the sweet spot”.  These days you plug the computer in and the computer tells you what to do.  A friend of mine had an expensive part replaced, because the computer said so.  A week later he took the car back to the garage because the computer “knew” there was a problem with the same expensive part, and said it should be replaced.    This time the more experienced mechanic cleaned a sensor and solved the problem.  Computers do not always know best.

When I first started in the performance role, the RMF performance reports were bewildering.   These reports were lots of numbers in a small font (so you needed your glasses).  Worse than that, they had several reports on the same page, and to a novice there was a blur of numbers.  Someone then helped me with comments like, you can ignore all the data, except for this number  3 inches in and 4 inches down.  That should be less than 95%.  On this other page – check this column is zeros, and so on.  As you gain more experience in performance, you get to know the “smell” of the data.  It just needs a quick sniff test to check things are OK.  If not, then it takes more time to dig into the data.

There are many tools for processing the SMF data and printing out reports full of numbers, but they add little value.  “The disconnect time is 140 microseconds” – is this good, or is it bad; is it better than a disconnect time of 100?  If the tools were smart enough to say “The disconnect time is 140 microseconds. This value should typically be zero” then this gives you useful information instead of just data.

If you think that they could control the Starship Enterprise from one operations desk, they clearly did not have all of the raw data displayed.  It must have been smart enough to report “The impulse engines are running hot: colour red, suggest you reduce power”, because that is what Scottie the engineer kept saying.

If there were smart reports of the problems rather than just displaying data, it would reduce the skill needed to interpret reports, and the need for the performance analyst’s glasses.  Producing these smart reports is difficult and needs experience to know what is useful, and what is just confusing.

Sometimes it feels like the statistics produced have not been thought through.  One example I recently experienced: there is a counter of the number of reads+writes that went to disk rather than cache.  For reads, there should ideally be no reads from disk.  For writes, it may be good to write directly to disk, and not flood the cache.  Instead of one number for reads and one number for writes, there is one number for both.  So if I had 10 disk reads, 10 disk writes and 10 disk accesses – is this good or bad?  I don't know.  This is not a head-banging problem, as you usually have only reads or only writes – but not both.  I just had to use my nose: 10 million would be a problem, just 10 – not a problem, and I'll still need my glasses.

 

What data set is my C program using?

I wanted to know what data set my C program was using.  There is a facility BPXWDYN: a text interface to dynamic allocation and dynamic output designed for REXX users, but callable from C.

This is not very well documented, so here is my little C program based on the sample IBM provided.

The documentation says use RTARG dsname = {45,"rtdsn"}; but this is for alloc.  With "info" it gives the error message IKJ56231I TEXT UNIT X'0056' CONTAINS INVALID KEY, which basically means rtdsn is not valid.  I had to use RTARG dsname = {45,"INRTDSN"};

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main(int argc, char * argv[]) {
  // BPXWDYN is fetched at run time and called with OS linkage
  typedef int EXTF();
  #pragma linkage(EXTF,OS)
  EXTF *bpxwdyn=(EXTF *)fetch("BPXWDY2 ");
  int i,j,rc;
  // return-argument layout: halfword length followed by the buffer
  typedef struct s_rtarg {
    short len;
    char str[260];
  } RTARG;
  // request information about the DD named APF1
  char *info ="info DD(APF1) ";

  RTARG dsname = {45,"INRTDSN"}; // not rtdsn as the doc says
  RTARG ddname = {9,"INRTDDN"};  // not rtddn as the doc says
  RTARG volser = {7,"INRTVOL"};
  RTARG msg    = {3,"MSG "};
  RTARG m[4]   = {258,"msg.1",258,"msg.2",258,"msg.3",258,"msg.4"};

  rc=bpxwdyn(info,&dsname,&ddname,&volser,
             &msg,&m[0],&m[1],&m[2],&m[3]);
  if (rc!=0) printf("bpxwdyn rc=%X %i\n",rc,rc);

  // print whatever information came back
  if (*ddname.str) printf("ddname=%s\n",ddname.str);
  if (*dsname.str) printf("dsname=%s\n",dsname.str);
  if (*volser.str) printf("volser=%s\n",volser.str);
  for (i=0,j=atoi(msg.str);i<j && i<4;i++)
    printf("%s\n",m[i].str);

  return 0;
}
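To build it I compiled the program under z/OS UNIX System Services.  A sketch, assuming the source is saved as dsinfo.c (my choice of name); remember that the DD being queried (APF1 above) has to be allocated to the job or TSO session that runs the program, so running it straight from the USS shell, where APF1 is not allocated, will just give a non-zero return code.

# compile under z/OS UNIX System Services with the XL C compiler
c89 -o dsinfo dsinfo.c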

 

How hard is it to delete lots of data sets? – Easy!

I was configuring a product and had some problems, so I needed to clean up.  I had hundreds of VSAM clusters.  I started using ISPF 3.4 and using the delete line command, but you have extra typing to do for VSAM files, so I gave up.

I had a faint memory of using a MASK to delete things, and a quick search gave me

//S1 EXEC PGM=IDCAMS 
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE COLIN.O.RTE.RK* MASK
/*

Which deleted all of the matching data sets.

Wasn’t this easy!

How do I do things with a subset of PDS members matching a pattern?

There are some clever things you can do on a subset of members of a PDS.

If you use ISPF 1 (Browse) or ISPF 2 (Edit) you can specify a data set name of

  • ‘COLIN.AAA.PROCLIB(%%%%%%00)’ and it displays only the members ending in 00.
  • ‘COLIN.AAA.PROCLIB(*AH*)’ to display all member with an AH in the name.
  • ‘COLIN.AAA.PROCLIB’  for all of the members.

If you use ISPF 3.4 I haven't found a way of doing the same.

Acting on a subset.

If you have a list of members, for example in ISPF 1, 2, or 3.4, you can issue a primary command

sel *99 e 

which says select all those members ending in 99, and use the command "e" in front.  Similarly, sel %%%%%%00 b.

Sorting the list

You can sort the list by many fields: name, size, last changed.  For example "Sort Name".

I have “Tab to point-and-shoot fields” enabled.   I can tab to column headers, and press enter.   The rows are sorted by this column.

I often use “sort changed” to find the ones I changed recently, and “sort id” to see who else has been changing the members.

Srchfor

I use “srchfor ” or “srchfor value” to look for the members containing a string (or two).

When this command has completed, tab to "prompt" and press enter, or enter "sort prompt" to sort the members with hits to the top of the list.

Refresh

If the member list has changed, you can use “refresh” to refresh it.

 

 

How do I compare the directories of two PDS(E)s?

I wanted to compare  two directories to find the differences.   I could see that the number of members was different, but it was hard to see what was missing.

I browsed the web, and found that this was a commonly asked question, and often the solution was to write some Rexx and use the ISPF LM* functions.  I felt this was the wrong way.  

I had used SuperC to compare members of different files – could it tell me the same information about the member lists?  Yes!

SuperC has different compare types

  1. File –  Compares source data sets for differences, but does not show what
    the differences are.
  2. Line – Compares source data sets for line differences.  It is record-oriented and points out inserted or deleted lines.
  3. Word – Compares source data sets for word differences.  If two data sets contain the same words in the same order, SuperC considers them to be identical, even if those words are not
    on the same lines.
  4. Byte – Compares source data sets for byte differences.  This compare type is most useful for comparing  machine readable data.

Example output of the File comparison type.

NEW: COLIN.ZZZ.PROCLIB  OLD: HLQ.Y.ABCNPARU                                                                                     
MEMBER SUMMARY LISTING (FILE COMPARE)                                                                                     
DIFF SAME MEMBERS   N-BYTES O-BYTES N-LINES O-LINES  HASH1 HASH2 
                                                                                     
 **       ABC11111   171120  173200    2139    2165  78D5C 1113D
      **  ABC9999       640     640       8       8  AB58A AB58A 
     

We can see

  • ABC11111 is different because of the “**” in the DIFF column, and the hash codes at the right are different
  • ABC9999 is the same in each because the “**” is in the SAME column, and the hash value is the same

You also get a summary of differences

   10   TOTAL MEMBER(S) PROCESSED AS A PDS 
    1   TOTAL MEMBER(S) PROCESSED HAD CHANGES 
    9   TOTAL MEMBER(S) PROCESSED HAD NO CHANGES 
    9   TOTAL NEW FILE MEMBER(S) NOT PAIRED 
  179   TOTAL OLD FILE MEMBER(S) NOT PAIRED 

List of members not in both

MEMBER SUMMARY LISTING (FILE COMPARE)                                 
NON-PAIRED NEW FILE MEMBERS | NON-PAIRED OLD FILE MEMBERS               
     ABC$$$$$               |       ZAA$$$$ 
     ABCSCLRR               |       ZYZAPST5 
                            |       ZYZAPST6
  • Member ABC$$$$$ and one other are in the “new” PDS, but not in the “old” PDS.
  • Member ZAA$$$$ and 2 others are in “old” PDS, but not in the “new” PDS.

Like most things – easy – once you know how to do it!

Using Line mode

When I used line mode I got output like

                                                  N-LN# O-LN# 
I - SYSNAME                      &SYSNAME.        00004 00003
D - SYSNAME                      S0W1 

For one member, the  “new-file” at line 4 was similar to the line in the “old-file” at line 3.

To get from the old file to the new file, delete the line with S0W1 in it and insert the line with &SYSNAME.