RACF: Processing audit records

RACF can write information to SMF about which userid logged on, which resources it accessed, and so on. You can use this data to check there are no unexpected accesses, and that any violations are actioned.

The data tends to be “this userid had access to that resource”. It does not contain numeric values, such as response time.

Overview of SMF data

SMF data is a standard across z/OS. Each product has an SMF record type, and record subtypes are used to provide granularity within a product’s records. It is common for an SMF record to have sections within it. There may be 0 or more sections, and the sections can be of varying length. An SMF formatting program needs to build and report useful information from these sections.
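To illustrate what "0 or more sections of varying length" means for a formatting program, here is a minimal Python sketch. The layout it assumes (a one byte section type, a one byte length, then the data) is a simplification chosen for the example and is not taken from any SMF record mapping. The name parse_sections is made up.

```python
def parse_sections(buf):
    """Split a buffer into (section_type, data) pairs.

    Assumed layout, for illustration only: each section is a one byte
    type, a one byte data length, then that many bytes of data.
    """
    sections = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated section header")
        section_type = buf[pos]
        section_length = buf[pos + 1]
        start = pos + 2
        end = start + section_length
        if end > len(buf):
            raise ValueError("truncated section data")
        sections.append((section_type, buf[start:end]))
        pos = end
    return sections


# Two sections: type 1 with three bytes of data, type 2 with no data.
example = bytes([1, 3]) + b"abc" + bytes([2, 0])
for section_type, data in parse_sections(example):
    print(section_type, len(data), data)
```

A real formatter would then interpret each section according to its type, using the offsets and bit meanings from the product's documentation.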

There are many tools or products to process SMF records. Individual products may produce tools for formatting records, and there are external tools available to process the records.

Layout of RACF SMF records

The layout of the RACF SMF records is described in the publications, under Record type 80: RACF processing record. This gives the field names, their offsets, and how to interpret the data (what each bit means). This information is sufficient for someone to write a formatting program.

RACF also provides a formatter. The formatter (IRRADU00, with IRRADU86) runs as an exit of the SMF dump program IFASMFDP, and expands the data. For example, the SMF data has a bit saying a userid has the SPECIAL attribute. The formatter expands this and creates a column “SPECIAL” with the value YES or NO. This makes it easy to filter and display records, because you do not need to map bits to their meaning; the exit has done it for you. The layout of the expanded records is described in the RACF documentation.

What tools format the records?

A common (free) tool for processing the records that RACF produces is ICETOOL, a utility that comes with DFSORT. (The IBM sort modules all begin with ICE, so calling it ICETOOL was natural.)

With ICETOOL you can say include rows where…., display and format these fields, count the occurrences of this field, and add page titles. You can quickly generate tabular reports.

The output file of the RACF exit has different format records mixed up. You need to filter by record type and display the subset of records you need.

JCL to extract the RACF SMF record and convert to the expanded format

//* DUMP THE SMF DATASETS 
// SET SMFPDS=SYS1.S0W1.MAN1 
// SET SMFSDS=SYS1.S0W1.MAN2
//* 
//SMFDUMP  EXEC PGM=IFASMFDP,REGION=0M 
//DUMPINA  DD   DSN=&SMFPDS,DISP=SHR,AMP=('BUFSP=65536') 
//DUMPINB  DD   DSN=&SMFSDS,DISP=SHR,AMP=('BUFSP=65536') 
//DUMPOUT  DD   DISP=(NEW,PASS),DSN=&RMF,SPACE=(CYL,(1,1)) 
//OUTDD    DD   DISP=(NEW,PASS),DSN=&OUTDD, 
//             SPACE=(CYL,(1,1)),DCB=(RECFM=VB,LRECL=12288)
//ADUPRINT DD   SYSOUT=* 
//*XMLFORM  DD   DSN=COLIN.XMLFORM,DISP=(MOD,CATLG), 
//*             SPACE=(CYL,(1,1)),DCB=(RECFM=VB,LRECL=12288) 
//SYSPRINT DD   SYSOUT=* 
//SYSIN  DD * 
  INDD(DUMPINA,OPTIONS(DUMP)) 
  INDD(DUMPINB,OPTIONS(DUMP)) 
  OUTDD(DUMPOUT, TYPE(30,80,81,83)) 
  START(0000) 
  END(2359) 
  DATE(2025230,2025360) 
  ABEND(NORETRY) 
  USER2(IRRADU00) 
  USER3(IRRADU86) 
/*

The RACF exits produce several files.

//OUTDD the expanded records are written to this dataset
//ADUPRINT contains information on how many of each record type the exit processed
//XMLFORM you can have it write data in XML format – for post processing

JCL to process the expanded records

The JCL below invokes the ICETOOL processing.

//S1      EXEC  PGM=ICETOOL,REGION=0M 
//DFSMSG    DD  SYSOUT=* 
//TOOLMSG   DD  SYSOUT=* 
//IN     DD DISP=(SHR,PASS,DELETE),DSN=*.SMFDUMP.OUTDD 
//TEMP      DD  DSN=&&TEMP3,DISP=(NEW,PASS),SPACE=(CYL,(1,1)) 
//PRINT     DD  SYSOUT=*

Where

//IN refers to the //OUTDD statement in the earlier step
//TEMP is an intermediate dataset. The sort program writes filtered records to this data set.
//PRINT is where the formatted output goes

ICETOOL Processing

The whole job is

//IBMJOBI  JOB 1,MSGCLASS=H RESTART=PRINT 
//         JCLLIB ORDER=COLIN.RACF.ICETOOL 
// INCLUDE MEMBER=RACFSMF 
// INCLUDE MEMBER=PRINT 
// INCLUDE MEMBER=ICETOOL 
//TOOLIN    DD  * 
 COPY    FROM(IN)  TO(TEMP) USING(TEMP) 
 DISPLAY FROM(TEMP) LIST(PRINT) - 
         BLANK - 
         ON(5,8,CH)   HEADER('EVENT') - 
         ON(63,8,CH)  HEADER('USER ID') - 
         ON(14,8,CH)  HEADER('RESULT') - 
         ON(23,8,CH)  HEADER('TIME') - 
         ON(175,8,CH) HEADER('TERMINAL') - 
         ON(184,8,CH) HEADER('JOBNAME') - 
         ON(286,8,CH) HEADER('APPL   ') 
//TEMPCNTL DD * 
   INCLUDE COND=(5,8,CH,EQ,C'JOBINIT ') 
   OPTION  VLSHRT 
//

You have to be careful about the offsets. Each record has a 4 byte length field on the front, so every column in the JCL is 4 higher than the column in the documentation for the expanded records. A field which the documentation shows at column 1 for length 8 is specified in the JCL as column 5 for length 8. In the documentation the userid is at column 59 for length 8; in the JCL it is ON(63,8,CH).
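The arithmetic is simple, but easy to get wrong when adding many fields. A small Python sketch of the conversion follows. The helper names icetool_column and on_clause are made up for this example.

```python
RDW_LENGTH = 4  # the 4 byte length field on the front of each record


def icetool_column(doc_column):
    """Convert a column from the expanded record layout to an ICETOOL column."""
    return doc_column + RDW_LENGTH


def on_clause(doc_column, length, header):
    """Build an ON(...) HEADER(...) clause for a character field."""
    return f"ON({icetool_column(doc_column)},{length},CH) HEADER('{header}')"


print(on_clause(1, 8, "EVENT"))
print(on_clause(59, 8, "USER ID"))
```

With the documented columns 1 and 59 this gives ON(5,8,CH) and ON(63,8,CH), matching the JCL above.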

The processing is:

The COPY statement copies the data from the dataset in //IN to the dataset in //TEMP, using the sort instructions in //TEMPCNTL. To locate the DDname, ICETOOL takes the name in USING(TEMP) and puts CNTL on the end.
The sort instructions say include only those records where column 5 for length 8 of the record is the string 'JOBINIT ' (so column 1 for length 8 in the mapping description).
The DISPLAY statement copies records from the //TEMP dataset to the //PRINT DDname.
Each ON() selects data from the record, giving the start column, length and format. For each field, it uses the specified column heading.
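To make the two steps concrete, here is a rough Python model of what the COPY with INCLUDE and the DISPLAY do. It is only a sketch of the logic, not of how ICETOOL works internally. The records are made-up text strings in which "RDW." stands in for the 4 byte length field, and the function names are invented for the example.

```python
def field(record, column, length):
    """Return the field at a 1-based column, counted the way ICETOOL counts
    (the 4 byte length prefix occupies columns 1 to 4)."""
    return record[column - 1:column - 1 + length]


def include(records, column, length, value):
    """Keep records whose field equals value, like INCLUDE COND=(..,CH,EQ,..).
    A record too short to hold the field never matches, which is similar
    in spirit to OPTION VLSHRT."""
    wanted = value.ljust(length)
    return [r for r in records if field(r, column, length) == wanted]


def display(records, fields):
    """fields is a list of (column, length, header), like ON(..) HEADER(..)."""
    lines = ["   ".join(header.ljust(length) for _, length, header in fields)]
    for r in records:
        lines.append("   ".join(field(r, c, n).ljust(n) for c, n, _ in fields))
    return lines


# Made-up records: event at columns 5-12, result at columns 14-21.
records = [
    "RDW." + "JOBINIT " + " " + "SUCCESS ",
    "RDW." + "ACCESS  " + " " + "SUCCESS ",
    "RDW.",  # a short record
]
temp = include(records, 5, 8, "JOBINIT")
for line in display(temp, [(5, 8, "EVENT"), (14, 8, "RESULT")]):
    print(line)
```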

The output

The output in //PRINT is

EVENT      USER ID    RESULT     TIME       TERMINAL   JOBNAME    APPL    
--------   --------   --------   --------   --------   --------   --------
JOBINIT    START1     SUCCESS    09:54:11              SMFCLEAR           
JOBINIT    START1     TERM       09:54:20              SMFCLEAR           
JOBINIT    START1     TERM       09:55:13              CSQ9WEB            
JOBINIT    IBMUSER    SUCCESS    10:12:14   LCL702     IBMUSER            
JOBINIT    IBMUSER    SUCCESS    10:14:18              IBMJOBI            
JOBINIT    IBMUSER    TERM       10:14:18              IBMJOBI            
JOBINIT    IBMUSER    SUCCESS    10:21:39              IBMACCES           
JOBINIT    IBMUSER    TERM       10:21:40              IBMACCES           
JOBINIT    IBMUSER    SUCCESS    10:22:10              IBMACCES           
JOBINIT    IBMUSER    TERM       10:22:11              IBMACCES           
JOBINIT    IBMUSER    SUCCESS    10:23:01              IBMPASST           
JOBINIT    IBMUSER    TERM       10:23:05              IBMPASST

Extending this

Knowing the format of the RACF expanded record, you can add more fields to the reports.

You can filter which records you want. For example all records for userid START1. You can link filters with AND and OR statements.
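As a sketch of linking filters, the Python below builds an INCLUDE statement which selects JOBINIT records for userid START1. It uses the documented columns (1 for the event, 59 for the userid) and adds 4 for the length field on the front of each record. The helper names are made up, and only character "equals" tests are covered.

```python
RDW_LENGTH = 4  # the 4 byte length field on the front of each record


def cond(doc_column, length, value):
    """One character comparison, with the value padded to the field length."""
    column = doc_column + RDW_LENGTH
    return f"{column},{length},CH,EQ,C'{value.ljust(length)}'"


def include_statement(*parts):
    """Join comparisons and the AND / OR keywords into one statement."""
    return " INCLUDE COND=(" + ",".join(parts) + ")"


statement = include_statement(
    cond(1, 8, "JOBINIT"), "AND", cond(59, 8, "START1")
)
print(statement)
```

This prints ` INCLUDE COND=(5,8,CH,EQ,C'JOBINIT ',AND,63,8,CH,EQ,C'START1  ')`, which can go in the //TEMPCNTL data. A longer condition would need to be continued over several lines, which this sketch does not attempt.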

Of course – JCL subroutines is the answer

I was processing RACF SMF records to report how clients were logging into MQ. This was a multi-step job, and with each report I added, the JCL got more and more messy.
The requirements were simple

JCL to dump the SMF data sets to a temporary data set
Run a tool against this data set to produce the reports
- There were reports for logon and logoff, and pass tickets, and access to profiles and…
I wanted it to be easy to use, and the JCL to fit on one screen!

All of this was easy except my JCL file got bigger with every report I wanted, and I spent a lot of time scrolling up and down, and changing the wrong file!

The solution was to use JCL subroutines – or INCLUDE JCL.

Examples

JCL to process the SMF data sets

You do not need to know what the JCL does – but you need to know it was in COLIN.JCL(RACFSMF)

//* DUMP THE SMF DATASETS 
// SET SMFPDS=SYS1.S0W1.MAN1 
// SET SMFSDS=SYS1.S0W1.MAN3 
//* 
//SMFDUMP  EXEC PGM=IFASMFDP,REGION=0M 
//DUMPINA  DD   DSN=&SMFPDS,DISP=SHR,AMP=('BUFSP=65536') 
//DUMPINB  DD   DSN=&SMFSDS,DISP=SHR,AMP=('BUFSP=65536') 
//DUMPOUT  DD   DISP=(NEW,PASS),DSN=&RMF,SPACE=(CYL,(1,1)) 
//OUTDD    DD   DISP=(NEW,PASS),DSN=&OUTDD, 
//             SPACE=(CYL,(1,1)),DCB=(RECFM=VB,LRECL=12288) 
//ADUPRINT DD   SYSOUT=* 
//SYSPRINT DD   SYSOUT=* 
//SYSIN  DD * 
   INDD(DUMPINA,OPTIONS(DUMP)) 
   INDD(DUMPINB,OPTIONS(DUMP)) 
   OUTDD(DUMPOUT, TYPE(30,80,81,83)) 
   START(1040) 
   END(2359) 
   DATE(2025229,2025360)
   ABEND(NORETRY) 
   USER2(IRRADU00) 
   USER3(IRRADU86) 
/*

Use it

//IBMJOBI  JOB 1,MSGCLASS=H RESTART=PRINT 
//         JCLLIB ORDER=COLIN.JCL 
// INCLUDE MEMBER=RACFSMF 
//S1      EXEC  PGM=ICETOOL,REGION=1024K 
//DFSMSG    DD  SYSOUT=* 
//TOOLMSG   DD  SYSOUT=* 
//IN     DD DISP=(SHR,PASS,DELETE),DSN=*.SMFDUMP.OUTDD 
//JOBI      DD  DSN=&&TEMPJOBI,DISP=(NEW,PASS),SPACE=(CYL,(1,1)) 
//PJOBI     DD  SYSOUT=* 
//TOOLIN    DD  * 
 COPY    FROM(IN)      TO(JOBI) USING(JOBI) 
 DISPLAY FROM(JOBI) LIST(PJOBI) - 
 ...
//JOBICNTL DD * 
   INCLUDE COND=(5,8,CH,EQ,C'JOBINIT ') 
//

The clever bits are the JCLLIB which gives the JCL library, and the INCLUDE MEMBER=RACFSMF which copies in the JCL.

To use the JOBI content, I needed to specify JOBI, PJOBI and JOBICNTL, and similarly for each data component. 10 components meant 30 data sets, all with similar content and names. This led to a mess of JCL.
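The naming pattern can be written down as a tiny Python sketch. The P prefix for the print DD is just the convention used in the job above. The CNTL suffix is what ICETOOL adds to the 4 character name given in USING(). The function name dd_names is made up.

```python
def dd_names(prefix):
    """Return the three DD names needed for one report component."""
    if len(prefix) != 4:
        raise ValueError("USING() takes a 4 character name")
    return {
        "data": prefix,              # filtered records, TO(prefix)
        "print": "P" + prefix,       # report output, LIST(Pprefix)
        "control": prefix + "CNTL",  # sort statements, USING(prefix)
    }


print(dd_names("JOBI"))
```

With 10 components this gives 30 DD statements, which is why reusing one set of names (TEMP, PRINT, TEMPCNTL) in a template is attractive.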

Going further, I could use a template with the same data set names, (TEMP, PRINT etc) and just change the content.

I converted the above JCL to create a member ICETOOL

//S1      EXEC  PGM=ICETOOL,REGION=1024K 
//DFSMSG    DD  SYSOUT=* 
//TOOLMSG   DD  SYSOUT=* 
//IN     DD DISP=(SHR,PASS,DELETE),DSN=*.SMFDUMP.OUTDD 
//TEMP      DD  DSN=&&TEMP3,DISP=(NEW,PASS),SPACE=(CYL,(1,1)) 
//PRINT     DD  SYSOUT=*

and use it with

//IBMJOBI  JOB 1,MSGCLASS=H RESTART=PRINT 
//         JCLLIB ORDER=COLIN.JCL 
// INCLUDE MEMBER=RACFSMF 
//* INCLUDE MEMBER=PRINT
// INCLUDE MEMBER=ICETOOL 
//TOOLIN    DD  * 
 COPY    FROM(IN)  TO(TEMP) USING(TEMP) 
 DISPLAY FROM(TEMP) LIST(PRINT) - 
  ...
//TEMPCNTL DD * 
   INCLUDE COND=(5,8,CH,EQ,C'JOBINIT ') 
//

I just had to change the report-specific data (the TOOLIN and TEMPCNTL statements), and not the boilerplate.

For each RACF record type, I had a different JCL member, based on the above file.
To select SMF records with a date and time range, I just edited member RACFSMF, and submitted the jobs, and they all used it.

This was easy to do and it let me focus on the problem – rather than on the JCL.

Why did my certificate mapping go wrong?

I had a working mapping from a Linux-generated certificate to a z/OS userid, and then it stopped working. It took me 2 days before I had enlightenment. I thought I had undone all of the changes I had made. Well, all but one.

I had defined

//IBMRACF  JOB 1,MSGCLASS=H 
//S1  EXEC PGM=IKJEFT01,REGION=0M 
//SYSPRINT DD SYSOUT=* 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN DD * 
RACDCERT DELMAP(LABEL('colinpaice'))ID(IBMUSER) 
RACDCERT MAP ID(IBMUSER)  - 
   WITHLABEL('colinpaice') - 
   SDNFILTER('CN=colinpaice.O=cpwebuser.C=GB') 
SETROPTS RACLIST(DIGTNMAP, DIGTCRIT) REFRESH 
racdcert listMAP id(IBMUSER) 
/*

This says that if the certificate with Subject: C = GB, O = cpwebuser, CN = colinpaice comes in, then it maps to IBMUSER. Yes, the terms are in a different order, and there are “.” instead of “,” but it worked.
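The reordering can be done mechanically. The Python sketch below converts a subject as displayed on Linux into the form used in SDNFILTER above. It is a minimal sketch which assumes the values contain no commas or periods, and the function name sdn_filter is made up.

```python
def sdn_filter(subject):
    """Convert 'C = GB, O = cpwebuser, CN = colinpaice' into
    'CN=colinpaice.O=cpwebuser.C=GB'."""
    terms = []
    for part in subject.split(","):
        key, _, value = part.partition("=")
        terms.append(f"{key.strip()}={value.strip()}")
    # Reverse the order of the terms and join them with periods.
    return ".".join(reversed(terms))


print(sdn_filter("C = GB, O = cpwebuser, CN = colinpaice"))
```

This prints CN=colinpaice.O=cpwebuser.C=GB, the value used in the RACDCERT MAP command.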

I started working with JSON Web Tokens (JWT), and it stopped working. The userid was coming out as IZUSVR – which is the userid of z/OSMF. I struggled with traces, and wrote my own little program to map the certificate to a userid – but still it was IZUSVR.

The enlightenment.

JWTs are signed with a private key, and the public key is used to check the signature (that is, check that the checksum of the data is valid). To do this, the certificate needs to be in the keyring.
I was lazy and used the same certificate to sign the JWT as I used for certificate logon to z/OSMF.

To put the certificate in the keyring you need to import the certificate. I copied the certificate from Linux, using cut and paste, and imported it.

I used

//IBMRACF2 JOB 1,MSGCLASS=H 
//S1  EXEC PGM=IKJEFT01,REGION=0M 
//SYSPRINT DD SYSOUT=* 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN DD * 
RACDCERT CHECKCERT('COLIN.COLIN.PAICE.PEM') 
RACDCERT DELETE  (LABEL('COLINPAICE')) ID(IZUSVR) 
RACDCERT ADD('COLIN.COLIN.PAICE.PEM')  - 
   ID(IZUSVR)  WITHLABEL('COLINPAICE') TRUST 
                                                             
                                                             
RACDCERT ID(IZUSVR) CONNECT(RING(CCPKeyring.IZUDFLT)  - 
                            USAGE(CERTAUTH)  - 
                            LABEL('COLINPAICE') - 
                              id(IZUSVR)) 
                                                             
SETROPTS RACLIST(DIGTCERT,DIGTRING ) refresh 
/*

This imports the certificate and associates it with the specified userid, ID(IZUSVR).
Now, when the certificate arrives as part of the certificate logon to z/OSMF, it checks to see if it is in the RACF data base – yes it is – under userid IZUSVR. It does not use the RACDCERT MAP option.
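The behaviour I saw can be modelled with a few lines of Python. This is a simplified model of the observed order of checks, not RACF's actual implementation. The dictionaries and the function name are made up for the example.

```python
def userid_for_certificate(cert_label, subject_filter, installed, mappings):
    """Return the userid a certificate resolves to.

    installed: certificates added with RACDCERT ADD, label -> owning userid
    mappings:  filters defined with RACDCERT MAP, filter -> userid
    """
    # A certificate found in the RACF database wins: its owner is used.
    if cert_label in installed:
        return installed[cert_label]
    # Only otherwise is the RACDCERT MAP filter consulted.
    return mappings.get(subject_filter)


mappings = {"CN=colinpaice.O=cpwebuser.C=GB": "IBMUSER"}

# Before the import: the mapping is used.
print(userid_for_certificate("COLINPAICE", "CN=colinpaice.O=cpwebuser.C=GB",
                             {}, mappings))
# After RACDCERT ADD ... ID(IZUSVR): the owner of the certificate is used.
print(userid_for_certificate("COLINPAICE", "CN=colinpaice.O=cpwebuser.C=GB",
                             {"COLINPAICE": "IZUSVR"}, mappings))
```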

I reran this job with userid ADCDB – and the JWT had ADCDB in the definition.

To make it more complex, the Liberty Web Server within z/OSMF caches some information, and this complicated the diagnosis. In the evening it worked – next morning after IPL – it didn’t!

Lesson learned

Use one certificate for certificate logon, and another certificate for JWT.

MQWEB and passtickets

The RACF PassTicket is a (one-time-only/short duration) password that is generated by a requesting product or function. It is an alternative to the RACF password.
You create a passticket specifying the userid and the application, and a one off password is generated. You can specify a validity period.

By default a passticket has replay protection: once it has been used, it cannot be used again. You can allow a passticket to be used more than once either by specifying APPLDATA('NO REPLAY PROTECTION') for basic pass tickets, or REPLAY(YES) for enhanced pass tickets.
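The idea of replay protection can be shown with a small Python model. It only remembers which tickets have been used. It does not check that a ticket is genuine or still within its validity period, and it is not how RACF implements the check. The class name is made up.

```python
class PassTicketChecker:
    """Toy model of replay protection for pass tickets."""

    def __init__(self, replay_protection=True):
        self.replay_protection = replay_protection
        self.used = set()

    def accept(self, userid, applid, ticket):
        """Return True if the ticket may be used for this userid and application."""
        key = (userid, applid, ticket)
        if self.replay_protection and key in self.used:
            return False  # already used once, so reject the replay
        self.used.add(key)
        return True


checker = PassTicketChecker()
print(checker.accept("COLIN", "MQWEB", "PSEG7TXM"))  # first use
print(checker.accept("COLIN", "MQWEB", "PSEG7TXM"))  # replay
```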

The server can use the function __login_applid() (or a similar function) to run a thread as the specified userid. You pass the userid, password (pass ticket) and the application to use.

The MQWeb server is code running on top of Liberty Web server.

For my MQWeb server, running as started task CSQ9WEB, it was configured so my mqweb/mqwebuser.xml configuration file had <safCredentials profilePrefix="MQWEB" …/>

I created a passticket for my userid COLIN, and application MQWEB, and I was able to logon to the MQWEB server using userid COLIN and with the pass ticket as my password.

Creating and using pass tickets on z/OS.

As part of looking into secure ways of logging on to z/OS, I looked into pass tickets (because Zowe can generate a pass ticket to connect to other sub-systems). I set up the simplest (and oldest) pass ticket configuration. The best practice is to use enhanced pass tickets, and store values encrypted. With enhanced pass tickets you can specify the validity period of the pass ticket. It defaults to 10 minutes. I wanted the easiest way, so I used the older technique.

With thanks to Philippe Richard for his many comments, I’ve incorporated them in the post.

I had the usual struggles with getting the C program to work, but overall it was quite easy.

I successfully used the RACF callable service R_ticketserv (IRRSPK00).

You pass a userid and an application name and the service returns a temporary, time limited, password for that userid and application.

The application name depends on what system you are logging on to. I submitted a job from TSO on system with SYSID S0W1, and the application name was TSOS0W1. You cannot use a pass ticket for TSO on a CICS system, because the application names will not match.

When you are under TSO and enter submit the application is still TSO, so use TSOS0W1.

If, for instance, you try to submit a job through the internal reader, then it will use application MVSS0W1.

For example:

//INTRDRS1 EXEC PGM=IEBGENER 
//SYSUT1 DD DSN=PASSTIKT.ENHC.JCL(REFRESH), 
// DISP=SHR 
//SYSUT2 DD SYSOUT=(,INTRDR) 
//* 
//SYSPRINT DD SYSOUT=* 
//SYSIN DD DUMMY

and in member REFRESH, you have a job with userid=racf admin user, password= where you substitute the passticket, like:

//SYSADMX JOB 30000000,'MVS JOB CARD ',MSGLEVEL=(1,1), 
// CLASS=A,MSGCLASS=Q,NOTIFY=&SYSUID,TIME=1440,REGION=0M, 
// USER=SYSADM,PASSWORD=PSEG7TXM, 
// JOBRC=MAXRC 
//IEFPROC EXEC PGM=IKJEFT01,REGION=4M,DYNAMNBR=10 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN DD *
  SETROPTS RACLIST(PTKTDATA) REFRESH

It will use MVSS0W1 as the application ID.
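The rule for these two cases can be written as a short Python sketch. It covers only the TSO and batch (internal reader) cases described above. Other subsystems have their own application names, which are not modelled here. The function name is made up.

```python
# Prefix for the pass ticket application name, by type of logon.
PREFIXES = {
    "TSO": "TSO",    # TSO logon, and jobs submitted from TSO
    "BATCH": "MVS",  # jobs submitted through the internal reader
}


def passticket_application(kind, sysid):
    """Build the application name from the type of logon and the system id."""
    return PREFIXES[kind] + sysid


print(passticket_application("TSO", "S0W1"))
print(passticket_application("BATCH", "S0W1"))
```

For system S0W1 this gives TSOS0W1 and MVSS0W1, the names used in this post.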

Andrew Mattingly has written a very detailed blog on passtickets which is well worth a read.

It describes in ample detail the algorithm and the various techniques to generate pass tickets.

Security definitions

The security definitions are in two parts

The profile for using a pass ticket, for example, who can use a ticket for logging on to TSO,
The profile for which userids can create a pass ticket.

Who can use a pass ticket with which application

You can limit who can use the application, for example

a profile just TSOS0W1,
or members of group SYS1 profile TSOS0W1.SYS1,
or a userid COLIN in group SYS1, profile TSOS0W1.SYS1.COLIN

Example definitions for TSOS0W1 profile

RDEFINE PTKTDATA TSOS0W1  SSIGNON(KEYMASKED(7E4304D681920260)) - 
    APPLDATA('NO REPLAY PROTECTION')

SETROPTS RACLIST(FACILITY,PTKTDATA) REFRESH

The server, TSO in this case, can use the function __login_applid() to run the thread as the specified userid. You pass the userid, password (pass ticket) and the application to use (TSOS0W1).

Who can define which pass tickets?

You have to define a RACF profile for the application name, and a profile for userids that can generate a pass ticket for that application.

RDEFINE PTKTDATA   IRRPTAUTH.TSOS0W1.*  UACC(NONE)
PERMIT IRRPTAUTH.TSOS0W1.* CLASS(PTKTDATA) ID(COLIN) ACCESS(UPDATE) 
PERMIT IRRPTAUTH.TSOS0W1.* CLASS(PTKTDATA) ID(IBMUSER) ACCESS(UPDATE) 
SETROPTS RACLIST(PTKTDATA) REFRESH

The above statements define a profile which controls who can generate pass tickets for the TSOS0W1 application.

Userids COLIN and IBMUSER can generate pass tickets for this application.
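A rough Python model of the check follows. It builds the resource name IRRPTAUTH.application.userid and looks for a matching profile. The generic matching uses fnmatch, which is only an approximation of RACF generic profile rules, and the data structure and function name are made up for the example.

```python
from fnmatch import fnmatchcase

ACCESS_LEVELS = ["NONE", "READ", "UPDATE", "CONTROL", "ALTER"]

# Model of the definitions above: profile -> universal access and permits.
profiles = {
    "IRRPTAUTH.TSOS0W1.*": {
        "uacc": "NONE",
        "permits": {"COLIN": "UPDATE", "IBMUSER": "UPDATE"},
    },
}


def can_generate(requester, applid, target_userid, profiles):
    """Can requester generate a pass ticket for target_userid and applid?"""
    resource = f"IRRPTAUTH.{applid}.{target_userid}"
    for pattern, profile in profiles.items():
        if fnmatchcase(resource, pattern):
            level = profile["permits"].get(requester, profile["uacc"])
            return ACCESS_LEVELS.index(level) >= ACCESS_LEVELS.index("UPDATE")
    return False  # no profile: treated as not authorised in this model


print(can_generate("COLIN", "TSOS0W1", "ADCDB", profiles))
print(can_generate("ADCDB", "TSOS0W1", "ADCDB", profiles))
```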

What can you use to generate a pass ticket?

RCVTPTGN service
R_GenSec (IRRSGS00 or IRRSGS64): Generic security API interface
R_ticketserv (IRRSPK00): Parse or extract
IRRPassTicket Java class

My application code

See “C calling a function setting the high order bit on, and passing parameters” for a discussion about calling the callable service, and passing the parameters.

 //   Code to generate a pass ticket 
 #pragma linkage(IRRSPK00 ,OS) 
 #pragma runopts(POSIX(ON)) 
 /*Include standard libraries */ 
  #include <stdio.h> 
  #include <stdlib.h> 
  #include <string.h> 
  #include <stdarg.h> 
  #include <iconv.h> 
                                                                 
int main( int argc, char *argv[]) 
  { 
     if (argc != 3) 
     { 
        printf("Syntax is %s userid applid\n",argv[0]); 
        return 12 ; 
     } 
     if (strlen(argv[1]) >  8) 
     { 
        printf("length of userid must be <= 8\n"); 
        return 12; 
     } 
     if (strlen(argv[2]) > 8) 
     { 
        printf("length of applid must be <= 8\n"); 
        return 12; 
     } 
                                                                 
     char work_area[1024]; 
     int Option_word = 0; 
     int rc; 
     long SAF_RC,RACF_RC,RACF_RS; 
     SAF_RC=0 ; 
     long ALET = 0; 
     short Function_code= 3; 
     struct { 
       short length; 
       char value[8]; 
     } appl; 
     struct { 
       short length; 
       char value[8]; 
     } userid; 
     struct { 
       short length; 
       char value[20]; 
     } ticket; 
     ticket.length=20; 
     char * u= argv[1] ; 
     strncpy(&userid.value[0],u,8); 
     userid.length =strlen(u); 
     char * pAppl = argv[2]; 
     strncpy(&appl.value[0],pAppl,8); 
     appl.length =strlen(pAppl); 

     int Ticket_options = 1; 
     int * pTO = & Ticket_options; 
                                                          
     /* Parameter list for R_ticketserv. The return code, reason   */ 
     /* code and function code parameters each follow an ALET.     */ 
     rc=IRRSPK00( 
          &work_area, 
          &ALET , &SAF_RC, 
          &ALET , &RACF_RC, 
          &ALET , &RACF_RS, 
          &ALET ,&Function_code, 
          &Option_word, 
          &ticket, // length followed by area 
          &pTO, 
          &userid, 
          &appl 
          ); 
     printf("return code SAF %ld RACF %ld RS %ld  \n", 
     SAF_RC,RACF_RC,RACF_RS  ); 
     if (SAF_RC == 0) 
     { 
      int l = ticket.length; 
      printf("Pass ticket:%*.*s\n",l,l,ticket.value);           
     } 
  return SAF_RC; 
                                                               
}

The compile JCL was

//IBMPASST   JOB 1,MSGCLASS=H,COND=(4,LE) 
//S1          JCLLIB ORDER=CBC.SCCNPRC 
// SET LOADLIB=COLIN.LOAD 
//DOCLG       EXEC   PROC=EDCCB,INFILE='COLIN.C.SOURCE(TICKET)', 
//            CPARM='OPTF(DD:COPTS)' 
//COMPILE.ASMLIB DD  DISP=SHR,DSN=SYS1.MACLIB 
//COMPILE.COPTS DD * 
LIST,SOURCE 
aggregate(offsethex) xref 
SEARCH(//'ADCD.C.H',//'SYS1.SIEAHDR.H') 
TEST 
ASM 
RENT ILP32        LO 
OE 
NOMARGINS EXPMAC   SHOWINC XREF 
LANGLVL(EXTENDED) sscom dll 
DEFINE(_ALL_SOURCE) 
DEBUG 
/* 
//BIND.SYSLMOD DD DISP=SHR,DSN=&LOADLIB. 
//*IND.SYSLIB  DD DISP=SHR,DSN=&LIBPRFX..SCEELKED 
//*IND.OBJLIB  DD DISP=SHR,DSN=COLIN.OBJLIB 
//BIND.CSS    DD DISP=SHR,DSN=SYS1.CSSLIB 
//BIND.SYSIN DD * 
   INCLUDE CSS(IRRSPK00) 
   NAME  TICKET(R) 
/* 
//START1   EXEC PGM=TICKET,REGION=0M,PARM='ADCDB TSOS0W1' 
//STEPLIB  DD DISP=SHR,DSN=&LOADLIB 
//SYSERR   DD SYSOUT=*,DCB=(LRECL=200) 
//SYSERROR DD SYSOUT=*,DCB=(LRECL=200)
//SYSOUT   DD SYSOUT=*,DCB=(LRECL=200) 
//SYSPRINT DD SYSOUT=*,DCB=(LRECL=200) 
//CEEDUMP  DD SYSOUT=*,DCB=(LRECL=200) 
/&

Problems

I could not get the R_GenSec (IRRSGS00 or IRRSGS64) generic security API callable service to work, because my program was 31 bit and the service was expecting 64 bit addresses.

This blog post has code which uses R_GenSec in 64 bit C.

Mapping a certificate to a userid and so avoid needing a password is good – but…

You can use the RACDCERT MAP command to map a certificate to a userid, and so avoid the need for specifying a password. Under the covers, code uses pthread_security_np() and passes a certificate, or a userid and password. If these are validated, the thread becomes that userid, just the same as if the userid was logged on.

Is this secure?

If you store a userid and password on your laptop, even though the data may be “protected” someone who has access to your machine may be able to copy the file and so impersonate you.

With a public certificate and private key, if someone can access your machine, they may be able to copy these files and so impersonate you.

You can get dongles which you plug into your laptop on which you can store protected data. In order to use the data, you need the physical device.

You need to protect the RACF command

Because the RACDCERT command has the power to be dangerous, you need to protect it.

You do not want someone to specify their certificate maps to a powerful userid, such as SYS1. The documentation says

To issue the RACDCERT MAP command, you must have the SPECIAL attribute or sufficient authority to the IRR.DIGTCERT.MAP resource in the FACILITY class for your intended purpose.

For a general user to create a mapping associated with their own user ID they need READ access to IRR.DIGTCERT.MAP.

For a general user to create a mapping associated with another user ID or MULTIID, they need UPDATE access to IRR.DIGTCERT.MAP.
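The rule quoted above can be summarised as a small Python sketch. It covers only the READ and UPDATE cases in the quote and ignores userids with the SPECIAL attribute. The function name is made up.

```python
def required_map_access(issuer, target):
    """Access to IRR.DIGTCERT.MAP needed for issuer to create a mapping
    associated with target (a userid, or MULTIID)."""
    if target == "MULTIID" or target != issuer:
        return "UPDATE"  # another userid, or MULTIID
    return "READ"        # the issuer's own userid


print(required_map_access("COLIN", "COLIN"))
print(required_map_access("COLIN", "IBMUSER"))
```

This shows why UPDATE access should be tightly controlled: it is what allows someone to map a certificate to someone else's userid.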

What’s the best way to set this up?

I think that as part of your process for setting up userids, the process should create the mapping for the certificate to a userid. This way you do not have people creating the mapping. If a mapping already exists, you cannot create another mapping.

You may want an automated process which checks the approval, and issues the commands, and so you do not have humans with the authority to issue the commands.

Of course you’ll have a break-glass all powerful userid in case of emergencies.

But….

Even though the password had expired, I could logon using the certificate. If I revoked the userid the logon failed.

I used certificate logon from z/OSMF and issued console commands. This starts a TSO address space, and z/OSMF passes the commands and responses to and from the TSO address space.

Once a TSO address space has been started, there are no more checks to see if the userid is still valid.

If you want to inactivate the userid, you’ll need to revoke it, and then cancel all the TSO address spaces running on behalf of the userid. Walking someone off site is not good enough. There may be scripts which are automated, and will logon with no human intervention.
TSO address spaces may be configured to be cancelled if there is no activity. If the TSO address space is kept busy, (for example by sending it requests) it may never be forced off.

Giving a started task userid a password should be a sackable offence.

Last week I was going through some product documentation, and I got to the part where it said “Now change the started task userid, and give it a password”. I made a note to raise a defect on this, because this is a no-no.
This week someone asked on IBM-MAIN about the impact of giving a started task userid a password. There were many well informed comments covering things I didn’t know about, so I thought I’d make a blog post of the comments.

Best practices dictate that passwords only be provided when a userid will be used by a specific person and that person needs a password for that userid. All other userids should never have a password.

A userid can be revoked because of too many invalid password attempts, or revoked because of inactivity. If a userid is revoked it cannot logon. If the userid of a started task is revoked, the started task may start, but it may be restricted as to what it can do – because it cannot logon.

You can alter a userid to have NOPASSWORD (and have no PHRASE). This means started tasks can start, but the userid cannot be used to logon to the system. This is known as making the userid PROTECTED.

Often started task userids have special capabilities, such as running as different userids, being able to set security options, or modify system storage. This means you do not want your Help Desk staff adding or resetting passwords for protected userids. Changes to these protected userids should be done from a secure, limited access userid.

If you think through the impact of using started tasks, it may be better for all routine production jobs to be run as started tasks. This has the advantage that there are no passwords involved, and you can use automation to issue the start command based on a timer.

You might have a CICS started task userid for all CICS regions or just for a subset of regions. You might have one started task userid for all started tasks, or a started task userid for each logical instance, eg CICS, MQ, DB2, Zowe, TCP/IP etc.

System jobs

System jobs should be run as started tasks, and the started task userids should be protected.

Personal jobs

You can submit jobs from your userid, (and not specify a password) and the job will run under your userid.

You can put USER=name,PASSWORD=… on a job card, and if these validate the job will run with the specified userid. This is not a good idea, as the password may be visible in the dataset.

Departmental userids

You can put USER=name and omit the password, and use surrogate checking. The documentation says

You can allow the use of surrogate users. A surrogate user is a RACF-defined user who has been authorized to submit jobs on behalf of another user (the execution user) without specifying the execution user’s password. Jobs submitted by a surrogate user run with the identity of the execution user. For example, if user JOE submits a job with the following JOB statement, JOE is the surrogate user and TOM is the execution user:

JOE can submit a job containing

//jobname JOB 'accounting-information',USER=TOM

You set up a security profile (see the documentation) to control which userid can specify a userid on the JOB USER= statement.

All access checks are done with TOM’s user ID.

The TOM userid can be a protected userid – without a password, if surrogates are used.

To set up surrogates, define the profile, and give a group access to the profile, rather than give userids access. You are likely to have a group already defined. Administration, such as when someone leaves the department, is much easier, as you just need to remove the person’s userid from the group.

Thanks to

Robert S. Hansel, Seymour J Metz, Steve Beaver, Jack Zukt, Mike Schwab, Jon Perryman for their comments.

Jump up and down: Do not give userids access to resources!

I am doing some work connecting subsystems together. The documentation for all the products involved describes giving a userid permission to access a resource.

This is not best practice. Like many things it will be obvious once you understand it. You need to look at the bigger picture.
The documentation says for userids who want to use “this”, give them access to the profile “…”. What’s wrong with that?

If you have a department of 1000 people, giving them all access to the resource will be tedious.
There are likely to be several resources people need access to, so giving these 1000 userids access to multiple resources will be even more tedious.
Someone joins your department, so you need to add their userid to the access list of each of the resources.
Someone leaves your department. You cannot trivially ask what resources this userid can access. You have to look at the access list for each resource and remove the userid from the access list.

You most probably have groups set up already. Rather than give the userid access, give the group access to the resource.

If someone joins your group – you connect their userid to the groups – and they have access. If someone leaves your department, remove them from the groups – and they no longer have access to the resources.

You may want to support a new product… That’s easy – give the group(s) access to the resources – and the people will auto-magically get access.

As I said it is obvious once you see it.

Do not give userids access to resources – give groups access, and connect the userid to the group. I’ll go and raise documentation comments on the products’ documentation.
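The difference is easy to see in a small Python model. The group and resource names are made up, and the model ignores access levels. It only shows that joiners and leavers are handled by one change to a group, not by a change to every resource.

```python
# Resources give access to groups, not to individual userids.
resource_groups = {
    "MQ.QUEUE.PAYROLL": {"PAYDEPT"},
    "MQWEB.CONSOLE": {"PAYDEPT"},
}
# Userids are connected to groups.
group_members = {"PAYDEPT": {"COLIN"}}


def can_access(userid, resource):
    """True if the userid is in any group which has access to the resource."""
    groups = resource_groups.get(resource, set())
    return any(userid in group_members.get(g, set()) for g in groups)


# Someone joins: one connect to the group gives access to every resource.
group_members["PAYDEPT"].add("ADCDB")
print(can_access("ADCDB", "MQ.QUEUE.PAYROLL"), can_access("ADCDB", "MQWEB.CONSOLE"))

# Someone leaves: one remove takes all of the access away.
group_members["PAYDEPT"].discard("COLIN")
print(can_access("COLIN", "MQ.QUEUE.PAYROLL"), can_access("COLIN", "MQWEB.CONSOLE"))
```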

Zowe: Planning – certificates

A private key is used to sign data, and to decrypt data that was encrypted with the matching public key. It should be kept private. If someone has access to the private key they can impersonate you.

A public key is the counterpart of the private key. It is used to check signatures created with the private key, and to encrypt data that only the holder of the private key can decrypt. The public key can (and should) be generally available.

Certificates are used in TLS for authentication and encryption, and can be used for identification. They include a public key.

A Certificate Authority (CA) certificate is used to validate other certificates. The CA calculates a checksum of a certificate and encrypts it with the CA private key, a process known as signing the certificate.

To validate a certificate, the recipient needs a copy of the CA certificate which was used to sign the original certificate. It can then decrypt the encrypted checksum using the CA public key, and compare it with the certificate’s checksum.
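The sign and validate steps can be shown with a toy Python example. It uses textbook RSA with tiny numbers (p=61, q=53) so that it runs with only the standard library. The key is far too small to be secure, real certificates use padding schemes which are not shown, and the names are made up for the example.

```python
import hashlib

# Toy CA key pair: modulus N, public exponent E, private exponent D.
N, E, D = 3233, 17, 2753


def checksum(data):
    """Checksum of the data, reduced so it fits the toy key size."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data):
    """The CA encrypts the checksum with its private key."""
    return pow(checksum(data), D, N)


def validate(data, signature):
    """The recipient decrypts the signature with the CA public key and
    compares it with the checksum it calculates itself."""
    return pow(signature, E, N) == checksum(data)


certificate = b"CN=server.O=example.C=GB"
signature = sign(certificate)
print(validate(certificate, signature))
```

Only the holder of the private exponent can produce a signature which validates, which is why the CA private key has to be protected.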

If you create a Certificate Authority certificate you will need to distribute it to all machines that might communicate with your server (possibly thousands of machines), and install it into the keystore on those machines. This can be a big task. Most systems have one site wide CA certificate which is distributed to all machines. You might have a second CA to limit access to a system. This CA is used to sign the server certificates.

Creating certificates

As part of creating a certificate you create the private/public keys. There are different algorithms for these keys. Some are stronger than others (it takes more time and CPU to break them). Keys with elliptic curve algorithms are generally stronger than those using RSA techniques, and there are other techniques resistant to Quantum Computing.

You have to use a server certificate with RSA and keysize 2048.

I found that authentication with JWT (JSON Web Tokens) only worked with RSA keys and not Elliptic Curves. This is because of encryption with JWT.

Key stores

You should use keyrings rather than .pem files, as they are more secure. A .pem file can be copied by anyone with authority to read the file. You have to give explicit permission to be allowed to access a keyring. Certificates and the private keys within a keyring can be stored in cryptographic hardware, and the private keys are never exposed in clear text.

Many systems use a keystore for storing the private key used by the server, and a trust store for the Certificate Authority certificates needed to validate any client certificate sent to the server. The trust store can be the same as the keystore, but this is not recommended.

The keystore will need

- the server key. You can specify which key should be used.
- the Certificate Authority certificates needed to validate the server key.

The server userid needs access to the keyring. If the private key belongs to the server’s userid, then the server’s userid needs read access to the keyring. If the private key belongs to a different userid, the server’s userid needs update access to the keyring. See here for more information.

The trust store needs all the CA certificates which may be needed to verify a client certificate.

The client machine will have one or more certificates. A copy of the CA certificate used to sign these needs to be installed in the trust store on the server.

I understand that if you add a certificate to the trust store, you need to restart Zowe for it to be picked up, so try to get a list of all the CAs you will need before you start.

Validating certificates

The trust store needs the certificates to validate any client certificates sent to the server. This will usually just be Certificate Authority certificates.

Zowe works with z/OSMF. They communicate with certificates. The z/OSMF trust store keyring needs the CA of the Zowe server certificate, and the Zowe trust store keyring needs the CA of the z/OSMF server key. If they are not set up properly you can get messages “LoadBalancer does not contain an instance for the service zaas”.

If you have a site wide CA, which is the same for both of the server keys, then you do not need to do any more work, as both trust store keyrings will already have the CA. If Zowe and z/OSMF have different CA certificates, the CA certificates need to be connected to the other keyrings.
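As a rough sketch of that check, the Python below works out which CA certificates still need connecting to which trust store. The server names and CA labels are hypothetical, and the dictionaries stand in for whatever you get from listing your keyrings.

```python
def missing_connections(signing_ca, trust_store):
    """Return (trust store owner, CA label) pairs that still need connecting.

    signing_ca  maps each server to the CA that signed its server certificate.
    trust_store maps each server to the set of CA labels in its trust store.
    """
    missing = []
    for server, ca in signing_ca.items():
        for other, cas in trust_store.items():
            if other != server and ca not in cas:
                missing.append((other, ca))
    return sorted(missing)


# Hypothetical labels: Zowe and z/OSMF signed by different CAs.
signing_ca = {"ZOWE": "ZoweCA", "ZOSMF": "zOSMFCA"}
trust_store = {"ZOWE": {"ZoweCA"}, "ZOSMF": {"zOSMFCA"}}
print(missing_connections(signing_ca, trust_store))

# Hypothetical labels: one site wide CA, already in both trust stores.
site = {"ZOWE": "SiteCA", "ZOSMF": "SiteCA"}
site_stores = {"ZOWE": {"SiteCA"}, "ZOSMF": {"SiteCA"}}
print(missing_connections(site, site_stores))
```

With different CAs, each trust store is reported as missing the other server’s CA. With a site wide CA the list is empty.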

Subject Alternative Name

The Subject Alternative Name within a certificate provides the IP addresses, or host names, of the server. A client can check this against the IP address of the session, and terminate the session if they do not match. This is considered best practice. This check can be disabled in Zowe by using verifyCertificates: NONSTRICT in the zowe.yaml file.

RACF allows one name or address when the certificate is defined, for example ALTNAME(IP(127.0.0.1)).

On my z/OS system the command TSO NETSTAT HOME gives three addresses: 127.0.0.1, 10.1.1.2 and 10.1.2.6.
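To show why this matters, here is a simplified sketch in Python of the check a client makes. The function name is made up, and a real TLS implementation also handles wildcards and other name types. The certificate has only 127.0.0.1 in its Subject Alternative Name, and the three addresses are the ones from my system.

```python
import ipaddress


def san_matches(san_entries, session_host):
    """True if the address or name used for the session is in the SAN."""
    try:
        target = ipaddress.ip_address(session_host)
    except ValueError:
        # Not an IP address: compare as a case-insensitive host name.
        return session_host.lower() in {str(e).lower() for e in san_entries}
    for entry in san_entries:
        try:
            if ipaddress.ip_address(entry) == target:
                return True
        except ValueError:
            continue   # this entry is a host name, not an address
    return False


san = ["127.0.0.1"]    # from ALTNAME(IP(127.0.0.1))
for address in ["127.0.0.1", "10.1.1.2", "10.1.2.6"]:
    print(address, san_matches(san, address))
```

Only a session to 127.0.0.1 passes. A client connecting to either of the other two addresses would reject the certificate when strict checking is in use.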

You can configure your sysplex, so all systems in the sysplex have the same IP address, and traffic gets routed internally to the correct system. Without this, if you start a server on a different LPAR it will have a different IP address, and so the validation will fail.

On my system, the z/OSMF certificate did not have an ALTNAME specified, and so failed the Zowe checks. I had to set the Zowe option verifyCertificates: NONSTRICT for it to work, until I fixed the certificate.

If the z/OSMF certificate has an ALTNAME(IP…) specified, use the IP address value when you configure z/OSMF in the zowe.yaml file, for example:

zOSMF: 
  host: 127.0.0.1 
  port: 10443 
  applId: IZUDFLT

Mapping certificates

If you are using a client certificate to authenticate rather than a userid and password, then you’ll need to map certificates to userids, for example with the RACDCERT MAP command. You can specify which CA the certificates were signed with, and fields from the subject Distinguished Name. The question of which is better, using a userid and password or a client certificate to log on, has no easy answer.

It is easy for a hacker to get a password (lost handbag, yellow sticky stuck to the screen, a couple of pints down the pub). It is easy to change a password once it is compromised.
It is harder for a hacker to get a certificate – but it is harder to change and re-issue a certificate. You have to get the updated certificate down to the client’s machine. It could be stolen if a hacker has access to the machine.
Using Biometric data to logon is the ultimate limit in this area. Hackers could steal it – but there is no way of changing it if it is compromised!

Decisions

You need to decide:

Are you going to have a CA just for Zowe, or reuse the site CA?
- If you are going to have a Zowe specific CA, how are you going to distribute it to all the client machines?
- You’ll need to ensure Zowe and z/OSMF have each other’s CA certificates in their trust stores.
Are you going to use a Subject Alternative Name, for example ALTNAME(IP(10.1.1.2))?
- What value of verifyCertificates (STRICT or NONSTRICT) are you going to specify in the zowe.yaml file?
Are you going to authenticate using certificates? If so, you will need to set up a mapping from certificate to userid, for example with RACDCERT MAP.
Will you be using JWT, and so need an RSA key in the server certificate?

Why can’t java use my key ring?

I had a problem with z/OSMF. I configured it to use an existing keyring, but it consistently refused to use it. I had messages like

[WARNING ] CWPKI0809W: There is a failure loading the defaultKeyStore keystore. If an SSL configuration references the defaultKeyStore keystore, then the SSL configuration will fail to initialize.

This blog post covers how I debugged this situation.

What seemed strange was this only occurred when an Elliptic Curve certificate was being used – and not an RSA certificate.

Even more curiouser was the documentation mentioned access to the <ringOwner>.<ringName>.LST resource in the RDATALIB class. See here. I didn’t have this defined and yet RSA certificates would work! So curiouser and curiouser (or for the people who like correct grammar, curiouser and more curiouser).

All applications needing access to certificates and private keys use the R_datalib callable service.

The bottom line

z/OSMF has userid IZUSVR
I had a keyring and used two certificates
- An RSA certificate, CCPKeyring.IZUDFLT, belonging to userid IZUSVR – based on the sample JCL provided by z/OSMF
- An existing Elliptic Curve certificate NISTEC224 belonging to userid COLIN. This works elsewhere.
Without <ringOwner>.<ringName>.LST defined in class(RDATALIB) the RSA certificate worked
Without <ringOwner>.<ringName>.LST defined in class(RDATALIB) the Elliptic Curve certificate failed
Once I found the problem I defined <ringOwner>.<ringName>.LST in class(RDATALIB), and gave the userid IZUSVR Update access to it – and the Elliptic Curve certificate worked
The reasons (being wise after the event)
- R_datalib checks access on one profile in the RDATALIB class first – <ringowner>.<ringname>.LST. If there is none, it will fall back to check on two profiles in the FACILITY class – IRR.DIGTCERT.LISTRING and IRR.DIGTCERT.GENCERT. If the certificate is not owned by the accessing ID (except CERTAUTH or SITE), RDATALIB class has to be used for private key access.
- The RSA certificate is owned by the accessing userid (IZUSVR), so the check used the IRR.DIGTCERT.LISTRING profile in class(FACILITY), and IZUSVR had access. So this worked.
- For the Elliptic Curve certificate, the caller’s userid (IZUSVR) is not the owner of the certificate (COLIN), so this fails, and the logic drops through to the RDATALIB checking.
- The caller’s userid needs READ or UPDATE authority to the <ringOwner>.<ringName>.LST resource in the RDATALIB class. READ access enables retrieving one’s own private key, UPDATE access enables retrieving other’s. The profile did not exist, and so this access was not given.
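The logic above can be summarised as a small Python model. This is my simplified reading of the checks, not the R_datalib implementation: the function and parameter names are made up, the CERTAUTH and SITE exception is left out, and it assumes READ access to the FACILITY profile is enough for your own certificate.

```python
LEVEL = {"NONE": 0, "READ": 1, "UPDATE": 2}


def can_get_private_key(caller, cert_owner, rdatalib_access=None,
                        facility_access="NONE"):
    """Simplified model of the private key access checks described above.

    rdatalib_access is None when no <ringOwner>.<ringName>.LST profile exists.
    facility_access is the caller's access to IRR.DIGTCERT.LISTRING.
    """
    own_certificate = caller == cert_owner
    if rdatalib_access is not None:
        # Profile exists: READ for your own key, UPDATE for someone else's.
        needed = "READ" if own_certificate else "UPDATE"
        return LEVEL[rdatalib_access] >= LEVEL[needed]
    if not own_certificate:
        # The FACILITY fallback cannot give access to another userid's key.
        return False
    return LEVEL[facility_access] >= LEVEL["READ"]


# RSA certificate owned by IZUSVR, no RDATALIB profile defined.
print(can_get_private_key("IZUSVR", "IZUSVR", None, "READ"))
# Elliptic Curve certificate owned by COLIN, no RDATALIB profile defined.
print(can_get_private_key("IZUSVR", "COLIN", None, "READ"))
# After defining the RDATALIB profile and giving IZUSVR UPDATE access.
print(can_get_private_key("IZUSVR", "COLIN", "UPDATE"))
```

The three calls correspond to the three situations I hit: the RSA certificate working, the Elliptic Curve certificate failing, and the Elliptic Curve certificate working once the profile was defined.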

How did I debug this? – Using Java trace

Adding configuration to z/OSMF

I copied /global/zosmf/configuration/local_override.cfg to /global/zosmf/configuration/local_override.colin

I edited /global/zosmf/configuration/local_override.cfg and changed the JVM options line to

JVM_OPTIONS="-Xoptionsfile='/global/zosmf/configuration/local_override.colin'"

I edited the local_override.colin, deleted all but the JVM options line, then split the line at \n so it looks like

-Dcom.ibm.ws.classloading.tcclLockWaitTimeMillis=300000
-Xscmx150M
-Xquickstart
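If you prefer not to split the line by hand, a few lines of Python will do it. This sketch assumes the original JVM_OPTIONS line held the options in double quotes, separated by a literal \n (a backslash followed by n); the function name is made up.

```python
def split_jvm_options(cfg_line: str) -> list:
    """Turn one JVM_OPTIONS=... line into a list with one option per entry."""
    _, _, value = cfg_line.partition("=")      # drop the JVM_OPTIONS= prefix
    value = value.strip().strip('"')           # drop the surrounding quotes
    # The options are separated by a literal backslash followed by n.
    return [option for option in value.split("\\n") if option]


line = ('JVM_OPTIONS="-Dcom.ibm.ws.classloading.tcclLockWaitTimeMillis=300000'
        '\\n-Xscmx150M\\n-Xquickstart"')
for option in split_jvm_options(line):
    print(option)
```

Each printed line is one line of the options file.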

Add debug information to the configuration file

I added

-Djava.security.auth.debug=pkcs11keystore
-Dlog.level=Error

The output

[err] Jan 17, 2025 8:18:52 AM com.ibm.crypto.ibmjcehybrid.provider.HybridRACFKeyStore engineLoad 
TRACE: Loading keyring CCPKeyring.IZUDFLT as a JCECCARACFKS type keystore. 
...
[err] Jan 17, 2025 8:19:02 AM com.ibm.crypto.hdwrCCA.provider.RACFInputStream getEntry 
FINER: The private key of NISTEC224 is not available or no authority to access the private key 
[err] Jan 17, 2025 8:19:02 AM com.ibm.crypto.ibmjcehybrid.provider.HybridRACFKeyStore engineLoad 
TRACE: Error loading and storing certificates and key material from underlying JCECCARACFKS keyring CCPKeyring.IZUDFLT 
java.io.IOException: The private key of NISTEC224 is not available or no authority to access the private key . This can be expected if the IBMJCECCA is not setup correctly or 
ICSF is down. Will now attempt to load the keyring as a JCERACFKS keyring.

Which is not a very helpful message.
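The useful part of the message is the certificate label. A small Python sketch like the one below can pull the labels out of a large log. It assumes the message text is exactly as shown above and that the label contains no blanks; the function name is made up.

```python
import re

# The message text is taken from the trace output shown above.
PATTERN = re.compile(
    r"The private key of (\S+) is not available or no authority")


def certificates_without_key_access(log_text):
    """Return the certificate labels named in the message, without repeats."""
    labels = []
    for label in PATTERN.findall(log_text):
        if label not in labels:
            labels.append(label)
    return labels


sample = (
    "FINER: The private key of NISTEC224 is not available or no authority "
    "to access the private key\n"
    "java.io.IOException: The private key of NISTEC224 is not available or "
    "no authority to access the private key .\n"
)
print(certificates_without_key_access(sample))
```

For the output above this reports the one label, NISTEC224, which tells you which certificate to check the ownership and access for.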

How did I debug this? – Using RACF trace

R_datalib is the callable service used by all the exploiters which need access to a RACF keyring (certificates and private keys). It is known as r_datalib, or by its alias irrsdl00, and has callable type number 41.

Enable the RACF trace

#SET TRACE(CALLABLE(TYPE(41))JOBNAME(IZU*))

Start GTF

S GTF.GTF,M=GTFRACF

This reported

IEF403I GTF - STARTED - TIME=08.17.03                                  
IEF188I PROBLEM PROGRAM ATTRIBUTES ASSIGNED                            
AHL121I  TRACE OPTION INPUT INDICATED FROM MEMBER GTFRACF  OF PDS      
USER.Z24C.PROCLIB                                                      
TRACE=USRP                                                             
USR=(F44)                                                              
END                                                                    
AHL103I  TRACE OPTIONS SELECTED --USR=(F44)                            
AHL906I THE OUTPUT BLOCK SIZE OF    27998 WILL BE USED FOR OUTPUT 702  
        DATA SETS:                                                     
          SYS1.TRACE

I started z/OSMF until it failed.

Stop GTF

p GTF 
AHL006I GTF ACKNOWLEDGES STOP COMMAND                    
AHL904I THE FOLLOWING TRACE DATASETS CONTAIN TRACE DATA :
          SYS1.TRACE

Use IPCS to look at the trace data, using the command GTF USR(ALL). Go to the bottom of the output, and use the command report view. This gives an ISPF edit session.

x all
f 'RACF Reason code:' all
- You are interested in the non-zero codes. “Label” each line of interest using the line prefix command .a, .b etc.
reset
loc .a
- This will position you by the labelled line. Look up the RACF return and reason codes here. I had Reason Code 2c, which is decimal 44. Look for the keyring, or other information. I do not know which data tells you which sub operation r_datalib was doing, but for me it had the keyring name “CCPKeyring.IZUDFLT “. The description in the reason code documentation does not cover the situation of not having update access to the keyring, so I’ve raised a doc comment on it.
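If you save the formatted trace to a file, the same search can be sketched in Python. The layout of the sample lines is hypothetical: I have assumed the reason code is a hex value following the text “RACF Reason code:” on the same line, so check this against your own trace before relying on it.

```python
import re

# Assumption: the hex reason code follows "RACF Reason code:" on the same line.
REASON = re.compile(r"RACF Reason code:\s*([0-9A-Fa-f]+)")


def nonzero_reason_codes(report_lines):
    """Return (line number, hex text, decimal value) for non-zero codes."""
    hits = []
    for number, line in enumerate(report_lines, start=1):
        match = REASON.search(line)
        if match and int(match.group(1), 16) != 0:
            hits.append((number, match.group(1), int(match.group(1), 16)))
    return hits


# Hypothetical sample lines, not copied from a real trace.
sample = [
    "RACF Reason code: 00000000",
    "RACF Reason code: 0000002C",
]
print(nonzero_reason_codes(sample))
print(int("2c", 16))    # the hex reason code converted to decimal
```

The decimal value is the one to look up in the return and reason code documentation.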