Of course – JCL subroutines is the answer

I was processing RACF SMF records to report how clients were logging into MQ. This as a multi step job, and with each report I added, the JCL got more and more messy.
The requirements were simple

  • JCL to copy the dump the SMF data sets to a temporary data set
  • Run a tool against this data set to product the reports
    • There were reports for logon and logoff, and pass tickets, and access to profiles and…
  • I wanted it to be easy to use – and the JCL to fit on one screen!

All of this was easy except my JCL file got bigger with every report I wanted, and I spent a lot of time scrolling up and down, and changing the wrong file!

The solution was to use JCL subroutines – or INCLUDE JCL.

Examples

JCL to process the SMF data sets

You do not need to know what the JCL does – but you need to know it was in COLIN.JCL(RACFSMF)

//* DUMP THE SMF DATASETS 
// SET SMFPDS=SYS1.S0W1.MAN1
// SET SMFSDS=SYS1.S0W1.MAN3
//*
//SMFDUMP EXEC PGM=IFASMFDP,REGION=0M
//DUMPINA DD DSN=&SMFPDS,DISP=SHR,AMP=('BUFSP=65536')
//DUMPINB DD DSN=&SMFSDS,DISP=SHR,AMP=('BUFSP=65536')
//DUMPOUT DD DISP=(NEW,PASS),DSN=&RMF,SPACE=(CYL,(1,1))
//OUTDD DD DISP=(NEW,PASS),DSN=&OUTDD,
// SPACE=(CYL,(1,1)),DCB=(RECFM=VB,LRECL=12288)
//ADUPRINT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
//SYSIN DD *
INDD(DUMPINA,OPTIONS(DUMP))
INDD(DUMPINB,OPTIONS(DUMP))
OUTDD(DUMPOUT, TYPE(30,80,81,83))
START(1040)
END(2359)
DATE(2025229,2025360)
ABEND(NORETRY)
USER2(IRRADU00)
USER3(IRRADU86)

/*

Use it

//IBMJOBI  JOB 1,MSGCLASS=H RESTART=PRINT 
// JCLLIB ORDER=COLIN.JCL
// INCLUDE MEMBER=RACFSMF

//S1 EXEC PGM=ICETOOL,REGION=1024K
//DFSMSG DD SYSOUT=*
//TOOLMSG DD SYSOUT=*
//IN DD DISP=(SHR,PASS,DELETE),DSN=*.SMFDUMP.OUTDD
//JOBI DD DSN=&&TEMPJOBI,DISP=(NEW,PASS),SPACE=(CYL,(1,1))
//PJOBI DD SYSOUT=*
//TOOLIN DD *
COPY FROM(IN) TO(JOBI) USING(JOBI)
DISPLAY FROM(JOBI) LIST(PJOBI) -
...
//JOBICNTL DD *
INCLUDE COND=(5,8,CH,EQ,C'JOBINIT ')
//

The clever bits are the JCLLIB which gives the JCL library, and the INCLUDE MEMBER=RACFSMF which copies in the JCL.

To use the JOBI content, I needed to specify JOBI, PJOBI and JOBICTL, and similarly for each data component. 10 components meant 30 data sets – all with similar content and names, this lead to a mess of JCL.

Going further, I could use a template with the same data set names, (TEMP, PRINT etc) and just change the content.

I coverted the above JCL to create a member ICETOOL

//S1      EXEC  PGM=ICETOOL,REGION=1024K 
//DFSMSG DD SYSOUT=*
//TOOLMSG DD SYSOUT=*
//IN DD DISP=(SHR,PASS,DELETE),DSN=*.SMFDUMP.OUTDD
//TEMP DD DSN=&&TEMP3,DISP=(NEW,PASS),SPACE=(CYL,(1,1))
//PRINT DD SYSOUT=*

and use it with

//IBMJOBI  JOB 1,MSGCLASS=H RESTART=PRINT 
// JCLLIB ORDER=COLIN.JCL
// INCLUDE MEMBER=RACFSMF
//* INCLUDE MEMBER=PRINT
// INCLUDE MEMBER=ICETOOL

//TOOLIN DD *
COPY FROM(IN) TO(TEMP) USING(TEMP)
DISPLAY FROM(TEMP) LIST(PRINT) -

...
//TEMPCNTL DD *
INCLUDE COND=(5,8,CH,EQ,C'JOBINIT ')
//

Where I just had to change the data in italics – and not the boiler plate.

For each RACF record type, I had a different JCL member, based on the above file.
To select SMF records with a date and time range, I just edited member RACFSMF, and submitted the jobs, and they all used it.

This was easy to do and it let me focus on the problem – rather than on the JCL.

Why did my certificate mapping go wrong?

I had a working mapping for a Linux generated certificate to a z/OS userid. And then it wasn’t working. It took me 2 days before I had enlightenment. Although I had undone all of the changes I had made – well all but one.

I had defined

//IBMRACF  JOB 1,MSGCLASS=H 
//S1 EXEC PGM=IKJEFT01,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT DELMAP(LABEL('colinpaice'))ID(IBMUSER)
RACDCERT MAP ID(IBMUSER) -
WITHLABEL('colinpaice') -
SDNFILTER('CN=colinpaice.O=cpwebuser.C=GB')
SETROPTS RACLIST(DIGTNMAP, DIGTCRIT) REFRESH
racdcert listMAP id(IBMUSER)
/*

Which says it the certificate with Subject: C = GB, O = cpwebuser, CN = colinpaice come in, then it maps to IBMUSER. Yes, the terms are in a different order, and there are “.” instead of “.” but it worked.

I started working with JSON Web Tokens (JWT), and it stopped working. The userid was coming out as IZUSVR – which is the userid of z/OSMF. I struggled with traces, and wrote my own little program to map the certificate to a userid – but still it was IZUSVR.

The enlightenment.

With JWT they are signed by a private key, and the public key is used to check the signature (that is check the checksum of the data is valid). To do this, the keyring needs the certificate in the keyring.
I was lazy and used the same certificate to sign the JWT, as I used to do certificate logon to z/OSMF.

To put the certificate in the keyring you need to import the certificate. I copied the certificate from Linux, using cut and paste and imported it

I used

//IBMRACF2 JOB 1,MSGCLASS=H 
//S1 EXEC PGM=IKJEFT01,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT CHECKCERT('COLIN.COLIN.PAICE.PEM')
RACDCERT DELETE (LABEL('COLINPAICE')) ID(IZUSVR)
RACDCERT ADD('COLIN.COLIN.PAICE.PEM') -
ID(IZUSVR) WITHLABEL('COLINPAICE') TRUST


RACDCERT ID(IZUSVR) CONNECT(RING(CCPKeyring.IZUDFLT) -
USAGE(CERTAUTH) -
LABEL('COLINPAICE') -
id(IZUSVR))

SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh
/*

This imports the certificate and associates it with the specified userid, ID(IZUSVR).
Now, when the certificate arrives as part of the certificate logon to z/OSMF, it checks to see if it is in the RACF data base – yes it is – under userid IZUSVR. It does not use the RACDCERT MAP option.

I reran this job with userid ADCDB – and the JWT had ADCDB in the definition.

To make it more complex, the Liberty Web Server within z/OSMF caches some information, and this complicated the diagnosis. In the evening it worked – next morning after IPL – it didn’t!

Lesson learned

Use one certificate for certificate logon, and another certificate for JWT.

MQWEB and passtickets

The RACF PassTicket is a (one-time-only/short duration) password that is generated by a requesting product or function. It is an alternative to the RACF password.
You create a passticket specifying the userid and the application, and a one off password is generated. You can specify a validity period.

By default the passticket has replay protection – in that once used, the passticket cannot be used again, and so prevent replay. You can allow a passticket to be used more than once either by specifying APPLDATA(‘NO REPLAY PROTECTION’) for basic pass tickets, or REPLAY(YES) for enhanced pass tickets.

The server can use the function __login__applid() (or similar function) to run a thread as the specified userid. You pass the userid, password (pass ticket) and the application to use.

The MQWeb server is code running on top of Liberty Web server.

For my MQWeb server, running as started task CSQ9WEB, it was configured so my mqweb/mqwebuser.xml configuration file had <safCredentials profilePrefix=”MQWEB“…./>

I created a passticket for my userid COLIN, and application MQWEB, and I was able to logon to the the MQWEB server using userid COLIN and with the pass ticket as my password.

Creating and using pass tickets on z/OS.

As part of looking into secure way of logging on to z/OS, I looked into pass tickets (because Zowe can generate a pass ticket to connect to other sub-systems). I set up the simplest (and oldest) pass ticket configuration. The best practice is to use enhanced pass tickets, and store values encrypted. With enhanced pass tickets you can specify the validity period of the pass ticket – it defaults to 10 minutes. I wanted the easiest way, so I used the older technique.

With thanks to Philippe Richard for his many comments, I’ve incorporated them in the post.

I had the usual struggles with getting the C program to work, but overall it was quite easy.

I successfully used the RACF callable function IRRSPK00 R_ticketserv (IRRSPK00).

You pass a userid and an application name and the service returns a temporary, time limited, password for that userid and application.

The application name depends on what system you are logging on to. I submitted a job from TSO on system with SYSID S0W1, and the application name was TSOS0W1. You cannot use a pass ticket for TSO on a CICS system, because the application names will not match.

When you are under TSO and enter submit the application is still TSO, so use TSOS0W1.

If, for instance, you try to submit a job through the internal reader, then it will use application MVSS0W1.

For example:

//INTRDRS1 EXEC PGM=IEBGENER 
//SYSUT1 DD DSN=PASSTIKT.ENHC.JCL(REFRESH),
// DISP=SHR
//SYSUT2 DD SYSOUT=(,INTRDR)
//*
//SYSPRINT DD SYSOUT=*
//SYSIN DD DUMMY

and in member REFRESH, you have a job with userid=racf admin user, password= where you substitute the passticket, like:

//SYSADMX JOB 30000000,’MVS JOB CARD ‘,MSGLEVEL=(1,1), 
// CLASS=A,MSGCLASS=Q,NOTIFY=&SYSUID,TIME=1440,REGION=0M,
// USER=SYSADM,PASSWORD=PSEG7TXM,
// JOBRC=MAXRC
//IEFPROC EXEC PGM=IKJEFT01,REGION=4M,DYNAMNBR=10
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
SETROPTS RACLIST(PTKTDATA) REFRESH

It will use MVSS0W1 as the application ID.

Andrew Mattingly has written a very well detailed blog on passtickets which is well worth a read.

It described with ample details the algorithm and the various techniques to generate pass-tickets.

Security definitions

The security definitions are in two parts

  • The profile for using a pass ticket, for example, who can use a ticket for logging on to TSO,
  • The profile for which userids can create a pass ticket.

Who can use a pass ticket with which application

You can limit who can use the application, for example

  • a profile just TSOS0W1,
  • or members of group SYS1 profile TSOS0W1.SYS1,
  • or a userid COLIN in group SYS1, profile TSOS0W1.SYS1.COLIN

Example definitions for TSOS0W1 profile

RDEFINE PTKTDATA TSOS0W1  SSIGNON(KEYMASKED(7E4304D681920260)) - 
APPLDATA('NO REPLAY PROTECTION')

SETROPTS RACLIST(FACILITY,PTKTDATA) REFRESH

The server, TSO in this case, can use the function __login__applid() to run the thread as the specified userid. You pass the userid, password (pass ticket) and the application to use (TSOS0W1).

Who can define which pass tickets?

You have to define a RACF profile for the application name, and a profile for userids than can generate a pass ticket for that application.

RDEFINE PTKTDATA   IRRPTAUTH.TSOS0W1.*  UACC(NONE)
PERMIT IRRPTAUTH.TSOS0W1.* CLASS(PTKTDATA) ID(COLIN) ACCESS(UPDATE)
PERMIT IRRPTAUTH.TSOS0W1.* CLASS(PTKTDATA) ID(IBMUSER)ACCESS(UPDATE)
SETROPTS RACLIST(PTKTDATA) REFRESH

The above statements define a profile for defining pass ticket with the TSOS0W1 application.

Userids COLIN and IBMUSER can define pass tickets for this application.

What can you use to generate a pass ticket?

My application code

See C calling a function setting the high order bit on, and passing parameters for a discussion about calling the callable service, and passing the parameters.

 //   Code to generate a pass ticket 
#pragma linkage(IRRSPK00 ,OS)
#pragma runopts(POSIX(ON))
/*Include standard libraries */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <iconv.h>

int main( int argc, char *argv??(??))
{
if (argc != 3)
{
printf("Syntax is %s userid applid\n",argv[0]);
return 12 ;
}
if (strlen(argv[1]) > 8)
{
printf("length of userid must be <= 8\n");
return 12;
}
if (strlen(argv[2]) > 8)
{
printf("length of applid must be <= 8\n");
return 12;
}

char work_area[1024];
int Option_word = 0;
int rc;
long SAF_RC,RACF_RC,RACF_RS;
SAF_RC=0 ;
long ALET = 0;
short Function_code= 3;
struct {
short length;
char value[8];
} appl;
struct {
short length;
char value[8];
} userid;
struct {
short length;
char value[20];
} ticket;
ticket.length=20;
char * u= argv[1] ;
strncpy(&userid.value[0],u,8);
userid.length =strlen(u);
char * pAppl = argv[2];
strncpy(&appl.value[0],pAppl,8);
appl.length =strlen(pAppl);


int Ticket_options = 1;
int * pTO = & Ticket_options;

rc=IRRSPK00(
&work_area,
&ALET , &SAF_RC,
&ALET , &RACF_RC,
&ALET , &RACF_RC,
&ALET , &RACF_RS,
&ALET ,&Function_code,
&Option_word,
&ticket, // length followed by area
&pTO,
&userid,
&appl
);

printf("return code SAF %d RACF %d RS %d \n",
SAF_RC,RACF_RC,RACF_RS );
if (SAF_RC == 0)
{
int l = ticket.length;
printf("Pass ticket:%*.*s\n",l,l,ticket.value);
}
return SAF_RC;

}

The compile JCL was

//IBMPASST   JOB 1,MSGCLASS=H,COND=(4,LE) 
//S1 JCLLIB ORDER=CBC.SCCNPRC
// SET LOADLIB=COLIN.LOAD
//DOCLG EXEC PROC=EDCCB,INFILE='COLIN.C.SOURCE(TICKET)',
// CPARM='OPTF(DD:COPTS)'
//COMPILE.ASMLIB DD DISP=SHR,DSN=SYS1.MACLIB
//COMPILE.COPTS DD *
LIST,SOURCE
aggregate(offsethex) xref
SEARCH(//'ADCD.C.H',//'SYS1.SIEAHDR.H')
TEST
ASM
RENT ILP32 LO
OE
NOMARGINS EXPMAC SHOWINC XREF
LANGLVL(EXTENDED) sscom dll
DEFINE(_ALL_SOURCE)
DEBUG
/*
//BIND.SYSLMOD DD DISP=SHR,DSN=&LOADLIB.
//*IND.SYSLIB DD DISP=SHR,DSN=&LIBPRFX..SCEELKED
//*IND.OBJLIB DD DISP=SHR,DSN=COLIN.OBJLIB
//BIND.CSS DD DISP=SHR,DSN=SYS1.CSSLIB
//BIND.SYSIN DD *
INCLUDE CSS(IRRSPK00)
NAME TICKET(R)
/*
//START1 EXEC PGM=TICKET,REGION=0M,PARM='ADCDB TSOS0W1'
//STEPLIB DD DISP=SHR,DSN=&LOADLIB
//SYSERR DD SYSOUT=*,DCB=(LRECL=200)
//SYSERROR DD SYSOUT=*,DCB=(LRECL=200)
//SYSOUT DD SYSOUT=*,DCB=(LRECL=200)
//SYSPRINT DD SYSOUT=*,DCB=(LRECL=200)
//CEEDUMP DD SYSOUT=*,DCB=(LRECL=200)
/&

Problems

I could not get R_GenSec (IRRSGS00 or IRRSGS64): Generic security API interface RACF callable services to work because of the 31 bit program, and the service expecting 64 bit addresses.

This blog post has code which uses R_GenSec in 64 bit C.

Mapping a certificate to a userid and so avoid needing a password is good – but…

You can use the RACDCERT MAP command to map a certificate to a userid, and so avoid the need for specifying a password. Under the covers code uses the pthread_security_np and pass a certificate, or a userid and password, and if validated, the thread becomes that userid, just the same as if the userid was logged on.

Is this secure?

If you store a userid and password on your laptop, even though the data may be “protected” someone who has access to your machine may be able to copy the file and so impersonate you.

With a public certificate and private key, if someone can access your machine, they may be able to copy these files and so impersonate you.

You can get dongles which you plug into your laptop on which you can store protected data. In order to use the data, you need the physical device.

You need to protect the RACF command

Because the RACFCERT command has the power to be dangerous, you need to protect it.

You do not want someone to specify their certificate maps to a powerful userid, such as SYS1. The documentation says

To issue the RACDCERT MAP command, you must have the SPECIAL attribute or sufficient authority to the IRR.DIGTCERT.MAP resource in the FACILITY class for your intended purpose.

For a general user to create a mapping associated with their own user ID they need READ access to IRR.DIGTCERT.MAP.

For a general user to create a mapping associated with another user ID or MULTIID, they need need UPDATE access to IRR.DIGTCERT.MAP.

What’s the best way to set this up?

I think that as part of your process for setting up userids, the process should create the mapping for the certificate to a userid. This way you do not have people creating the mapping. If a mapping already exists, you cannot create another mapping.

You may want an automated process which checks the approval, and issues the commands, and so you do not have humans with the authority to issue the commands.

Of course you’ll have a break-glass all powerful userid in case of emergencies.

But….


Even though the password had expired, I could logon using the certificate. If I revoked the userid the logon failed.

I used certificate logon from z/OSMF and issued console commands. The starts a TSO address space, and z/OSMF passes the commands and responses to the tso address space.

Once a TSO address space has been started, there are no more checks to see if the userid is still valid.

If you want to inactivate the userid, you’ll need to revoke it, and then cancel all the TSO address spaces running on behalf of the userid. Walking someone off site is not good enough. There may be scripts which are automated, and will logon with no human intervention.
TSO address spaces may be configured to be cancelled if there is no activity. If the TSO address space is kept busy, (for example by sending it requests) it may never be forced off.

Giving a started task userid a password should be a sackable offence.

Last week I was going through some product documentation, and I got to the part where it said “Now change the started task userid, and give it a password”. I made a note to raise a defect on this, because this is a no-no.
This week someone asked on IBM-MAIN, the impact of giving a started task userid a password. There were many well informed comments covering things I didn’t know about, so I thought I’d make a blog post of of the comments.

Best practices dictate that passwords only be provided when a userid will be used by a specific person and that person needs a password for that userid. All other userids should never have a password.

A userid can be revoked because of too many invalid password attempts, or revoked because of inactivity. If a userid is revoked it cannot logon. If the userid of a started task is revoked, the started task may start, but it may be restricted as to what it can do – because it cannot logon.

You can alter a userid to have NOPASSWORD (and have no PHRASE). This means started tasks can start, but the userid cannot be used to logon to the system. This is known as making the userid PROTECTED.

Often started task userids have special capabilities, such as running as different userids, being able to set security options, or modify system storage. This means you do not want your Help Desk staff from adding or resetting passwords for protected userid. Changes to these protected userids should be done from a secure, limited access userid.

If you think through the impact of using started tasks. It may be better for all routine production jobs to be run as started tasks. This has the advantage that there are no passwords involved, and you can use automation to issue the start command based on a timer.

You might have a CICS started task userid for all CICS regions or just for a subset of regions. You might have one started task userid for all started tasks, or a started task userid for each logical instance, eg CICS, MQ, DB2, Zowe, TCP/IP etc.

System jobs

System jobs should be run as started tasks, and the started tasks should be protected

Personal jobs

You can submit jobs from your userid, (and not specify a password) and the job will run under your userid.

You can put USER=name,PASSWORD=… on a job card, and if these validate the job will run with the specified userid. This is not a good idea, as the password may be visible in the dataset.

Departmental userids

You can put USER=name and omit the password, and use surrogate checking. The documentation says

You can allow the use of surrogate users. A surrogate user is a RACF-defined user who has been authorized to submit jobs on behalf of another user (the execution user) without specifying the execution user’s password. Jobs submitted by a surrogate user run with the identity of the execution user. For example, if user JOE submits a job with the following JOB statement, JOE is the surrogate user and TOM is the execution user:

JOE can submit userid containing

//jobname JOB 'accounting-information',USER=TOM

You set up a security profile (see the documentation) to control which userid can specify a userid on the JOB USER= statement.

All access checks are done with TOM’s user ID.

The TOM userid can be a protected userid – without a password, if surrogates are used.

To set up surrogates, defined the profile, and give a group access to the profile, rather than give userids access. You are likely to have a group already defined. Administration, such as when someone leaves the department, is much easier, as you just need to remove the persons userid from the group.

Thanks to

Robert S. Hansel, Seymour J Metz, Steve Beaver, Jack Zukt, Mike Schwab, Jon Perryman for their comments.

Jump up and down: Do not give userids access to resources!

I am doing some work connecting subsystems together. The documentation for all the products involved, describe giving a userid permission to access to a resource.

This is not best practice. Like many things it will be obvious once you understand it. You need to look at the bigger pictures.
The documentation says for userids who want to use “this”, give them access to the profile “…”. What’s wrong with that?

  • If you have a department of 1000 people, giving them all access to the resource will be tedious
  • There are likely to be several resources people need access to, so connecting these 1000 userids to multiple resources will be even more tedious.
  • Someone joins your department – so you need to connect their userid to the long list of groups.
  • Someone leaves your department – you cannot trivially ask what resources can this userid access – you have to look at the access list for each resource and remove the userid from the group.

You most probably have groups set up already. Rather than give the userid access, give the group access to the resource.

If someone joins your group – you connect their userid to the groups – and they have access. If someone leaves your department, remove them from the groups – and they no longer have access to the resources.

You my want to support a new product… That’s easy – give the group(s) access to the resources – and the people will auto-magically get access.

As I said it is obvious once you see it.

Do not give userids access to resources – give groups access, and connect the userid to the group. I’ll go and raise documentation comments on the products’ documentation.

Zowe: Planning – certificates

A private key is used for encryption and should be kept private. If someone has access to the private key they can impersonate you.

A public key is the opposite of the private key. It is used for decrypting data encrypted with the private key. The public key can (and should) be generally available.

Certificates are used in TLS for authentication and encryption, and can be used for identification. They include a public key.

A Certificate Authority(CA) certificate is used to validate other certificates. It involves doing a checksum of a certificate, and encrypting it with the CA private key – a process known as signing the certificate

To validate a certificate, the recipient needs a copy of the CA which was used to sign the original certificate. It can the decrypt the encrypted checksum, and compare it with the certificates checksum.

If you create a Certificate Authority certificate you will need to distribute this to all machines that might communicate with your server(possibly thousands of machines), and installed into the keystore on those machines. This can be a big task. Most system have one site wide CA certificate which is distributed to all machines. You might have a second CA to limit access to a system. This CA is used to sign the server certificates.

Creating certificates

As part of creating a certificate you create the private/public keys. There are different algorithms for these keys. Some are stronger than others (it takes more time and CPU to break them) Keys with elliptic curve algorithms are generally stronger than using RSA techniques, and there other techniques resistant to Quantum Computing.

You have to use a server certificate with RSA and keysize 2048.

I found that authentication with JWT (Java Web Tokens) only worked with RSA keys and not Elliptic Curves. This is because of encryption with JWT.

Key stores

You should use keyrings rather than .pem files, as they are more secure. .pem files can be copied, and anyone with authority can copy them. You have to give explicit permission to be allowed to access a keyring. Certificate and the private keys within a keyring can be stored in cryptographic hardware, and the private keys are never exposed in clear text.

Many systems uses a keystore for storing the private key used by the server, and a trust store for the Certificate Authority keys needed to validate any client certificate sent to the server. This can, but is not recommended, be the same as the keystore.

The keystore will need the server key. You can specify which key should be used.

  • the Certificate Authority keys needed to validate the server key.
  • the server userid needs access to the keyring. If the private key belongs to the server’s userid, then the server’s userid needs read access to the keyring. If the private key belongs to a different userid, the server’s userid needs update access to the keyring. See here for more information.

The trust store needs all the CA certificates which may be needed to verify a client certificate.

The client machine will have one or more certificates. A copy of the CA used to create these needs to be installed in the trust store on the server.

I understand that if you add a certificate to the trust store, you need to restart Zowe for it to be picked up, so try to get a list of all the CAs you will need before you start.

Validating certificates

The trust store needs the certificates to validate any client certificates sent to the server. This will usually just be Certificate Authority certificates.

Zowe works with z/OSMF. They communicate with certificates. The z/OSMF trust store keyring needs the CA of the Zowe server certificate, and the Zowe trust store keyring needs the CA of the z/OSMF server key. If they are not set up properly you can get messages “LoadBalancer does not contain an instance for the service zaas”.

If you have a site wide CA, which is the same for both of the server keys, then you do not need to do any more work, as both trust store keyrings will already have the CA. If Zowe and z/OSMF have different CA certificates, the CA certificates need to be connected to the other keyrings.

Subject Alternative Name

The Subject Alternative Name within a certificate provides the IP addresses, or IP Names of the server. A client can check this address with the IP address of the session, and terminate the session if they do not match. This is considered best practice. This check can be disabled in Zowe by using verifyCertificates NOSTRICT in the zowe.yaml file.

RACF allows one name or address when the certificate is defined, for example ALTNAME(IP(127.0.0.1))

On my z/OS system the command TSO NETSTAT HOME gives three addresses 127.0.0.1 and 10.1.1.2 and 10.1.2.6.

You can configure your sysplex, so all systems in the sysplex have the same IP address, and traffic gets routed internally to the correct system. Without this, if you start a server on a different LPAR it will have a different IP address, and so the validation will fail.

On my system, the zOSMF certificate did not have an ALTNAME specified, and so failed the Zowe checks. I had to set the Zowe option verifyCertificates NOSTRICT for it to work, until I fixed the certificate.

If the z/OSMF certificate has an ALTNAME(IP…) specified, use the IP address value when you configure zOSMF for example

zOSMF: 
host: 127.0.0.1
port: 10443
applId: IZUDFLT

Mapping certificates

If you are using client certificate to authenticate rather than a userid and password, then you’ll need to map certificates to userids, for example with the RACDCERT MAP command. You can specify which CA the certificates were signed with, and fields from the subject Distinguished Name. The question of which is better: using userid and password, or client certificate to logon has no easy answer

  • It is easy for a hacker to get a password (lost handbag, yellow sticky stuck to the screen, a couple of pints down the pub) It is easy to change a password once is is compromised.
  • It is harder for a hacker to get a certificate – but it is harder to change and re-issue a certificate. You have to get the updated certificate down to the client’s machine. It could be stolen if a hacker has access to the machine.
  • Using Biometric data to logon is the ultimate limit in this area. Hackers could steal it – but there is no way of changing it if it is compromised!

Decisions

You need to decide

  • Are you going to have a CA just for Zowe? or reuse the site CA.
    • If you are going to have a Zowe specific CA – how are you going to distribute it to all the client machines.
    • You’ll need to ensure Zowe and z/OSMF have the other’s CA certificates in their trust store.
  • Are you going to use Subject Alternative Name ALTNAME(IP(10.1.1.2))
    • What value of verifyCertificates STRICT|NOSTRICT are you going to specify in the zowe.yaml file.
  • Are you going to authenticate using certificates. You will need to set up mapping from certificate to userid. RACDCERT MAP
  • If you will be using JWT, and so need an RSA key in the server certificate

Why can’t java use my key ring?

I had a problem with z/OSMF. I configured it to use an exiting keyring, but it consistently refused to use it. I had messages like

[WARNING ] CWPKI0809W: There is a failure loading the defaultKeyStore keystore. If an SSL configuration references the defaultKeyStore keystore, then the SSL configuration will fail to initialize.

This blog post covers how I debugged this situation.

What seemed strange was this only occurred when an Elliptic Curve certificate was being used – and not an RSA certificate.

Even more curiouser was the documentation mentioned access to the <ringOwner>.<ringName>.LST resource in the RDATALIB class. See here. I didn’t have this defined and yet RSA certificates would work! So curiouser and curiouser (or for the people who like correct grammar, curiouser and more curiouser).

All applications needing access to certificates and private keys use the R_datalib callable service.

The bottom line

  • z/OSMF has userid IZUSVR
  • I had a keyring and used two certificates
    • An RSA certificate, CCPKeyring.IZUDFLT, belonging to userid IZUSVR – based on the sample JCL provided by z/OSMF
    • An existing Elliptic Curve certificate NISTEC224 belonging to userid COLIN. This works else where.
  • Without <ringOwner>.<ringName>.LST defined the class(RDATALIB) the RSA certificate worked
  • Without <ringOwner>.<ringName>.LST defined the class(RDATALIB) the Elliptic Curve certificate failed
  • Once I found the problem I defined <ringOwner>.<ringName>.LST in class(RDATALIB), and gave the userid IZUSVR Update access to it – and the Elliptic curve worked
  • The reasons (being wise after the event)
    • R_datalib checks access on one profile in the RDATALIB class first – <ringowner>.<ringname>.LST. If there is none, it will fall back to check on two profiles in the FACILITY class – IRR.DIGTCERT.LISTRING and IRR.DIGTCERT.GENCERT. If the certificate is not owned by the accessing ID (except CERTAUTH or SITE), RDATALIB class has to be used for private key access.
    • This is true for the RSA certificate, used the IRRDIGTCERT.LISTRING class(FACILITY) and had access. So this worked.
    • For the Elliptic Curve, the caller’s userid (IZUSVR) is not the associated with the certificate (COLIN) so this fails, and the logic drops through to the RDATALIB checking.
    • The caller’s user ID has READ or UPDATE authority to the ..LST resource in the RDATALIB class. READ access enables retrieving one’s own private key, UPDATE access enables retrieving other’s. The ring did not exist, and so this access was not given.

How did I debug this? – Using Java trace

Adding configuration to z/OSMF

I copied /global/zosmf/configuration/local_override.cfg to /global/zosmf/configuration/local_override.colin

I edited/global/zosmf/configuration/local_override.cfg and changes the JVM options line to

JVM_OPTIONS=”-Xoptionsfile=’/global/zosmf/configuration/local_override.colin'”

I edited the local_override.colin, deleted all but the JVM options line, then split the line at \n so it looks like

-Dcom.ibm.ws.classloading.tcclLockWaitTimeMillis=300000
-Xscmx150M
-Xquickstart

Add debug information to the configuraton file

I added

-Djava.security.auth.debug=pkcs11keystore
-Dlog.level=Error

The output

[err] Jan 17, 2025 8:18:52 AM com.ibm.crypto.ibmjcehybrid.provider.HybridRACFKeyStore engineLoad 
TRACE: Loading keyring CCPKeyring.IZUDFLT as a JCECCARACFKS type keystore.
...
[err] Jan 17, 2025 8:19:02 AM com.ibm.crypto.hdwrCCA.provider.RACFInputStream getEntry
FINER: The private key of NISTEC224 is not available or no authority to access the private key
[err] Jan 17, 2025 8:19:02 AM com.ibm.crypto.ibmjcehybrid.provider.HybridRACFKeyStore engineLoad
TRACE: Error loading and storing certificates and key material from underlying JCECCARACFKS keyring CCPKeyring.IZUDFLT
java.io.IOException: The private key of NISTEC224 is not available or no authority to access the private key . This can be expected if the IBMJCECCA is not setup correctly or
ICSF is down. Will now attempt to load the keyring as a JCERACFKS keyring.

Which is not a very helpful message.

How did I debug this? – Using RACF trace

R_datalib is the callable service to ALL the exploiters which need access to a RACF keyring (certificates and private keys). It is r_datalib or its alias irrsdl00 with callable type number 41.

Enable the RACF trace

#SET TRACE(CALLABLE(TYPE(41))JOBNAME(IZU*))

Start GTF

S GTF.GTF,M=GTFRACF

This reported

IEF403I GTF - STARTED - TIME=08.17.03                                  
IEF188I PROBLEM PROGRAM ATTRIBUTES ASSIGNED
AHL121I TRACE OPTION INPUT INDICATED FROM MEMBER GTFRACF OF PDS
USER.Z24C.PROCLIB
TRACE=USRP
USR=(F44)
END
AHL103I TRACE OPTIONS SELECTED --USR=(F44)
AHL906I THE OUTPUT BLOCK SIZE OF 27998 WILL BE USED FOR OUTPUT 702
DATA SETS:
SYS1.TRACE

I started z/OSMF until it failed.

Stop GTF

p GTF 
AHL006I GTF ACKNOWLEDGES STOP COMMAND
AHL904I THE FOLLOWING TRACE DATASETS CONTAIN TRACE DATA :
SYS1.TRACE

Use IPCS to look at the dump, using command GTF USR(ALL). Go to the bottom of the output, use the command report view. This gives an ISPF edit session.

  • x all
  • f ‘RACF Reason code:’ all
    • You are interested in the non zero codes. “Label” each line of interest using the line prefix command .a, .b etc.
  • reset
  • loc .a
    • This will position you by the labelled line. Look up the RACF return and reason codes here. I had Reason Code 2c, which is decimal 44. Look for the keyring, or other information. I do not know which data tells you which sub operation r_datalib was doing, but for me it had the keyring name “CCPKeyring.IZUDFLT “. The description in the reason code documentation does not cover the situation of not having update access to the keyring, so I’ve raised a doc comment on it.

Using and debugging RACF CLASS(APPL) and pthread_security_np

I’ve blogged I want to be someone else – or using pthread_security_np, how a server can be passed a userid or password to logon to a server to do some work. For example I was logging on from a web server to the RMF GPMSERVE for displaying RMF reports in a a web browser.

I had followed the documentation, but all my userids had access, when none of them should have done. This blog post covers the steps I used to dig into this.

The RACF calls used, have a flag set “Suppress any RACF Messages”, which makes it harder to diagnose problems.

With most RACF calls you can define a profile

ADDSD 'COLIN.PROT.DATASET' UACC(NONE) WARNING

where the WARNING option says “If the userid does not have access – give the userid access, but write a message on the console”. This allows you to find when you need to grant access. Once the profile is established, you can use the ALTSD … NOWARNING. Userids requesting access, but which are not permitted will now fail.

Defining a profile with CLASS(APPL), the warning has no effect. I had to use NOTIFY(COLIN) to get notified of problems.

Tracing the requests

To trace all of the RACF calls made by my job COLINNT I issued

#set trace(callable(all),racroute(all),jobname(COLINNT))      

This gave me many hundreds of calls.

Once I had been through the whole process, I found the trace for the pthread_security_np etc calls was just

set trace(callable(type(38)),jobname(COLINNT))

Collect a trace

See Collecting and understanding a RACF GTF trace output.

Looking at the trace output

There is a trace record “before” (PRE) matching “after” (POST). Many parameters are the same (as you might expect).

The output of these traces is verbose, with unnecessary information and with extra blanks lines. For example for one parameter

   Area length:                  00000008 

Area value:
D6C6C6E2 C5E30000 | OFFSET.. |

Area length: 00000004

Area value:
00000000 | .... |

Area length: 00000008

In the description below, I’ve squashed this down to a single line

area length  8  OFFSET0000 length 4 00000000 

It looks like the field OFFSET0000 has the offset in hex from somewhere. I didn’t find this information useful.

A compressed trace record is given below. For the interpretation of the fields see the following trace record.

RTRACE
OMVSPRE
Service number 00000026
Parameters
area length x6c
area length 8 OFFSET0000 length 4 00000000
area length 8 OFFSET0004 length 4 00000000
area length 8 OFFSET0008 length 4 00000000
area length 8 OFFSET000C length 4 00000000
area length 8 OFFSET0010 length 4 40404040
area length 8 OFFSET0014 length 4 00000000
area length 8 OFFSET0018 length 4 40404040
area length 8 OFFSET001C length 1 01
area length 8 OFFSET0020 length 4 C4800000
area length 8 OFFSET0024 length 6 05c1c2c3c4C2 = .ABCDB
area length 8 OFFSET0028 length 4 00000000
area length 8 OFFSET002C length 9 08C7D7D4 E2C5D9E5 C5 = .GPMSERVE
Internal information
area length 8 OFFSET0034 length x37 = "Server Userid=IBMUSER created a full ACEE"
area length 8 OFFSET0048 length 4 00000000
area length 8 OFFSET0050 length 9 08000000 00000000 00
area length 8 OFFSET0054 length 1 00
area length 8 OFFSET0018 length 4 40404040
Internal data
area length xc0 ACEE
area length x50 userid information
area length x90 ACEX
area length x50 USP

hex dump of record

The RACF commands documentation says for a callable service, 0x00000026 is function number for IRRSIA00. This page gives the name of the function name IRRSIA00 and the description z/OS kernel on behalf of servers that use pthread_security_np servers or __login, or MVS servers that do not use z/OS UNIX services.

The parameters are for the initACEE (IRRSIA00) call.

The matching “after” record, OMVSPOST was (with the parameters of the the IRRSIA00 call format, parameters) are

RTRACE
OMVSPOST
Service number: 00000026
RACF Return code: 00000008
RACF Reason code: 00000020

Parameters
area length 8 OFFSET0000 length 4 00000000 >work_area<
area length 8 OFFSET0004 length 4 00000000 >ALET<
area length 8 OFFSET0008 length 4 00000008 >SAF_return_code<
area length 8 OFFSET000C length 4 00000000 >ALET<
area length 8 OFFSET0010 length 4 00000008 >RACF_return_code<
area length 8 OFFSET0014 length 4 00000000 >ALET<
area length 8 OFFSET0018 length 4 00000020 >RACF_reason_code<
area length 8 OFFSET001C length 1 01 >Function_code 1 = Create an ACEE<
area length 8 OFFSET0020 length 4 C4800000 >Attributes - see below<
area length 8 OFFSET0024 length 6 05c1c2c3c4C2 >Userid .ABCDB<
area length 8 OFFSET0028 length 4 00000000 >ACEE_pointer<
area length 8 OFFSET002C length 9 08C7D7D4 E2C5D9E5 C5 >APPLID = .GPMSERVE<
Internal information
area length 8 OFFSET0034 length x37 = "Server Userid=IBMUSER created a full ACEE"
area length 8 OFFSET0048 length 4 00000000
area length 8 OFFSET0050 length 9 08000000 00000000 00
area length 8 OFFSET0054 length 1 00
area length 8 OFFSET0018 length 4 40404040
Internal data
area length xc0 ACEE
area length x50 userid information
area length x90 ACEX
area length x50 USP

There the attributes C480000 mean

  • X80000000 – Create the ACEE
  • X40000000 – Createthe USP for the userid
  • X04000000 – Suppress any RACF Messages
  • X00800000 – Return an OUSP in the output area

Note the flag:X04000000 – Suppress any RACF Messages.

The return code

area length  8  OFFSET0008 length 4 00000008 >SAF_return_code<
...
area length 8 OFFSET0010 length 4 00000008 >RACF_return_code<
area length 8 OFFSET0014 length 4 00000000 >ALET<
area length 8 OFFSET0018 length 4 00000020 >RACF_reason_code<

8,8,32 means The user does not have appropriate RACF access to either the SECLABEL, SERVAUTH profile, or APPL specified in the parmlist.

For other RACF services the trace entries follow a similar format.