Configuring and using the RMF GPM Server

RMF provides information on the usage of system resources, such as CPU, channel usage, and disk response time. You can get reports from an attached 3270 screen, from a web server, and from a REST request.

For the web server and REST requests, you need the GPM server running. It took me a while to get this running, and to get useful data out of it.

GPMSERVE uses basic authentication with a userid and password. Alternatively, it can use a certificate from the client to authenticate on z/OS.

There are two versions of GPMSERVE. It looks like the newer one is written in Java. I only have access to the old version.

GPM Setup

I used the following procedure, followed by my server options

//GPMSERVE PROC MEMBER=00 
//STEP1    EXEC PGM=GPMDDSRV,REGION=128M,TIME=1440, 
//         PARM='TRAP(ON)/&MEMBER' 
//*        PARM='TRAP(ON),ENVAR(ICLUI_TRACETO=STDERR)/&MEMBER' 
//* 
//*STEPLIB DD   DISP=SHR,DSN=CEE.SCEERUN 
//*        DD   DISP=SHR,DSN=CBC.SCLBDLL 
//GPMINI   DD   DISP=SHR,DSN=SYS1.SERBPWSV(GPMINI) 
//GPMHTC   DD   DISP=SHR,DSN=SYS1.SERBPWSV(GPMHTC) 
//GPMPPJCL DD   DISP=SHR,DSN=SYS1.SERBPWSV(GPMPPJCL) 
//CEEDUMP  DD   SYSOUT=* 
//SYSPRINT DD   SYSOUT=* 
//SYSOUT   DD   SYSOUT=* 
//         PEND

CACHESLOTS(4)                   /* Number of timestamps in CACHE     */ 
DEBUG_LEVEL(3)                  /*   informational messages        */ 
SERVERHOST(10.1.1.2) 
HTTPS(ATTLS) /* AT-TLS setup required */ 
MAXSESSIONS_HTTP(20)            /* MaxNo of concurrent HTTP requests */ 
HTTP_PORT(8803)                 /* Port number for HTTP requests     */ 
HTTP_ALLOW(*)                   /* Mask for hosts that are allowed   */ 
HTTP_NOAUTH()                   /* No server can access without auth.*/ 
CLIENT_CERT(NONE) 
/* CLIENT_CERT(ACCEPT) */

The essence of my AT-TLS definitions is (from my Easy-ATTLS)

LocalPortRange : 8803
Direction : Both
ApplicationControlled : Off
TTLSEnabled : On
CtraceClearText : On
Trace : 2
HandshakeRole : Server
Keyring : start1/TN3270
TLSv1.1 : Off
TLSv1.2 : On
TLSv1.3 : Off
HandshakeTimeout : 3
ClientECurves : Any
ServerCertificateLabel : NISTECCTEST
V3CipherSuites : [
   1302  TLS_AES_256_GCM_SHA384,
   1301  TLS_AES_128_GCM_SHA256,
   003D  TLS_RSA_WITH_AES_256_CBC_SHA256,
   C02C  TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
 ]

I used CtraceClearText : On so I could trace the flows and see the traffic in clear text rather than encrypted.

The Chrome browser used ECDHE* cipher specs. I had specified C02C TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, and I could see this was being used.

The Chrome browser prompted for a userid and password, which were passed up to the server.
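A REST client has to do the same two things as the browser: negotiate TLS 1.2 and send the userid and password as a basic authentication header. The following is a minimal Python sketch of that, not a documented GPM client. The host and port come from the options above. The path /gpm/root.xml, the userid and password values, and the function names are assumptions for illustration.

```python
import base64
import ssl
import urllib.request


def make_tls_context(cafile=None):
    """Client context limited to TLS 1.2, matching the AT-TLS rule (TLSv1.2 : On)."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def build_request(host, port, path, userid, password):
    """Build (but do not send) an HTTPS GET carrying a basic authentication header."""
    token = base64.b64encode(f"{userid}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(f"https://{host}:{port}{path}")
    req.add_header("Authorization", f"Basic {token}")
    return req


def fetch(req, ctx, timeout=10):
    """Send the request and return (HTTP status, first 500 bytes of the reply)."""
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return resp.status, resp.read(500)
```

To try it, something like fetch(build_request("10.1.1.2", 8803, "/gpm/root.xml", "COLIN", "secret"), make_tls_context(cafile="ca256.pem")) should work, provided ca256.pem contains the CA that signed the server certificate. Python also checks that the certificate matches the host name or IP address used in the URL.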

Issuing commands

You start the server with

S GPMSERVE

If it abends with

IEF450I GPMSERVE GPMSERVE - ABEND=S0C4 U0000 REASON=00000011

Check that RMF is active, and check that you have issued F RMF,START III to start the data collection.

You stop the server with

p gpmserve

You can display information about the server with

f gpmserve,display

The newer version of GPMSERVE uses commands like F GPMSERVE,APPL=DISPLAY

The output is like

+GPM062I DDS-REFR 01/02 084125 CYCLE=314. WAITING 10 SEC
+GPM062I HTTP-LIS 01/02 084119 MAX=20 ACTIVE=0 SUSPEND=1
+GPM062I RMF_DDS_ATTLS 01/02 074900 STARTING …
+GPM062I RMF_DDS_OPTS 01/02 074900 STARTING …
+GPM062I HTTP-CLI 01/02 083219 ::FFFF:10.1.0.2 TERMINATED. SUSPENDED.

Where 01/02 is January 2nd, and 074900 is 07:49:00.
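If you capture this output and want to process it, the date and time fields can be pulled apart as in the Python sketch below. It assumes the layout shown in the GPM062I messages above (task name, MM/DD, HHMMSS, then free text). The message carries no year, so the caller has to supply one. The function name is hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the sample output: +GPM062I <task> <MM/DD> <HHMMSS> <detail>
GPM062I = re.compile(r"^\+?GPM062I\s+(\S+)\s+(\d{2})/(\d{2})\s+(\d{6})\s*(.*)$")


def parse_gpm062i(line, year):
    """Return task name, timestamp and detail text, or None if the line does not match."""
    m = GPM062I.match(line.strip())
    if m is None:
        return None
    task, month, day, hhmmss, detail = m.groups()
    when = datetime.strptime(f"{year}{month}{day}{hhmmss}", "%Y%m%d%H%M%S")
    return {"task": task, "when": when, "detail": detail}


entry = parse_gpm062i("+GPM062I DDS-REFR 01/02 084125 CYCLE=314. WAITING 10 SEC", 2025)
print(entry["task"], entry["when"].isoformat())  # DDS-REFR 2025-01-02T08:41:25
```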

Certificate and keyring set up

I reused an existing keyring. The AT-TLS definitions specify that the keyring is start1/TN3270 and the certificate to use is NISTECCTEST.

List the ring contents

tso RACDCERT listring(TN3270) id(START1)

The keyring included the CA for my NISTECCTEST certificate, and the CA for the client’s certificate (on Linux).

For certificate authentication to work, I needed the client certificate connected to the keyring.

On Linux I had

ca256.pem the Certificate Authority
colinpaice.pem the user's certificate

I FTPed these to z/OS as VB data sets, COLIN.CA256.PEM, and COLIN.PAICE.PEM.

Import the CA into z/OS

//IBMRACFI JOB 1,MSGCLASS=H 
//S1  EXEC PGM=IKJEFT01,REGION=0M 
//SYSPRINT DD SYSOUT=* 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN DD * 
RACDCERT CHECKCERT('COLIN.CA256.PEM') 
RACDCERT DELETE - 
  (LABEL('CA256')) CERTAUTH 
RACDCERT CERTAUTH     ADD('COLIN.CA256.PEM')       - 
  WITHLABEL('CA256')  TRUST 
RACDCERT CERTAUTH LISTCHAIN(LABEL('CA256')) 
                                                          
RACDCERT CONNECT(CERTAUTH   LABEL('CA256') - 
  RING(TN3270)               ) ID(START1) 
SETROPTS RACLIST(DIGTNMAP, DIGTCRIT) REFRESH 
/*

and import the user's .pem file.

//IBMRACFI JOB 1,MSGCLASS=H 
//S1  EXEC PGM=IKJEFT01,REGION=0M 
//SYSPRINT DD SYSOUT=* 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN DD * 
RACDCERT CHECKCERT('COLIN.PAICE.PEM') 
RACDCERT DELETE - 
  (LABEL('RMFCERT')) ID(COLIN) 
RACDCERT ID(COLIN)    ADD('COLIN.PAICE.PEM')        - 
  WITHLABEL('RMFCERT')  TRUST 
RACDCERT ID(COLIN) LISTCHAIN(LABEL('RMFCERT')) 
RACDCERT  ID(START1)  CONNECT(ID(COLIN )    LABEL('RMFCERT') - 
  RING(TN3270)) 
SETROPTS RACLIST(DIGTNMAP, DIGTCRIT) REFRESH 
/*

When a user connects with a certificate, GPMSERVE looks in the keyring for the passed certificate, and finds the userid for it.

Setting up the security profiles

You need to set up a CLASS(APPL) profile for GPMSERVE. Give any authorised userids read access to the profile.

//IBMRACF  JOB 1,MSGCLASS=H 
//S1  EXEC PGM=IKJEFT01,REGION=0M 
//SYSPRINT DD SYSOUT=* 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN DD * 
* Delete and redefine the profile
* List it first
RLIST APPL        GPMSERVE    authuser 
RDELETE APPL GPMSERVE 
SETROPTS RACLIST(APPL) refresh 
RDEFINE APPL GPMSERVE UACC(NONE)  NOTIFY(COLIN) 
PERMIT GPMSERVE CLASS(APPL) ID(IBMUSER) ACCESS(READ) 
PERMIT GPMSERVE CLASS(APPL) ID(COLIN  ) ACCESS(READ) 
PERMIT GPMSERVE CLASS(APPL) ID(ADCDB  ) ACCESS(NONE) 
SETROPTS RACLIST(APPL) refresh 
RLIST APPL        GPMSERVE    authuser 
SETROPTS RACLIST(APPL) refresh 
/*

I specified RDEFINE APPL GPMSERVE UACC(NONE) NOTIFY(COLIN) so the userid COLIN gets notified if anyone tries to use the profile and fails. Using WARNING does not work.

Changing security

If you give a userid read permission to the CLASS(APPL) GPMSERVE profile, you need to stop and restart GPMSERVE to pick up the changes. It looks like GPMSERVE caches the access after first use, and there is no refresh security command.

When tracing a job it helps to trace the correct address space.

The title When tracing a job it helps to trace the correct address space is a clue – it looks obvious, but the problem was actually subtle.

The scenario

I was testing the new version of Zowe, and one of the components failed to start because it could not find a keyring. Other components could find it ok. I did a RACF trace and there were no records. The question is why were there no records?

The execution environment.

I start Zowe with S ZOWE33. This spawns some processes such as ZOWE335. This runs a Bash script which starts a Java program.

I start a GTF trace with

s gtf.gtf,m=gtfracf
#set trace(callable(type(41)),jobname(Zowe*))

Where callable type 41 is for r_datalib services to access a keyring.

No records were produced

What is the problem?
Pause for a few minutes to think about it.

Solution

After 3 days I stumbled on the solution, having noticed but ignored the evidence. I wondered if the Java code to process keyrings did not use the R_datalib API. I wondered if Java 21 uses a different jar file for processing keyrings (yes, it does), but this didn't solve the problem.

The solution was I should have been tracing job ZWE33CS! Whoa – where did that come from?

The Java program was started with

_BPX_JOBNAME=ZWE33CS /usr/lpp/java/J21.0_64/bin/java

See here which says

When a new z/OS® UNIX process is started, it runs in a z/OS UNIX initiator (a BPXAS address space). By default, this address space has an assigned job name of userIDx, where userID is the user ID that started the process, and x is a decimal number. You can use the _BPX_JOBNAME environment variable to set the job name of the new process. Assigning a unique job name to each … process helps to identify the purpose of the process and makes it easier to group processes into a WLM service class.

If I use the command D A,L it lists all of the address spaces running on the system. I had seen the ZOWE33* ones, and also the ZWE* ones, but ignored the ZWE* ones. Once I knew the solution it was so obvious.
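The mismatch is easy to see if you check the job names against the trace mask. The Python sketch below is only an illustration: it assumes the mask behaves like a simple upper-case wildcard pattern, which may not be exactly how the RACF trace filter works, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def jobs_matching(mask, jobnames):
    """Return the job names that a wildcard mask such as Zowe* would select."""
    mask = mask.upper()
    return [job for job in jobnames if fnmatchcase(job.upper(), mask)]


jobs = ["ZOWE33", "ZOWE335", "ZWE33CS"]
traced = jobs_matching("Zowe*", jobs)
missed = [job for job in jobs if job not in traced]
print(traced)  # ['ZOWE33', 'ZOWE335']
print(missed)  # ['ZWE33CS']
```

The address space that actually opened the keyring is the one the mask never selected.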

What is my Unix process doing?

I was familiar with the USS command ps -ef which displays output like

     UID        PID       PPID  C    STIME TTY       TIME CMD 
  WEBSRV   16842766          1  - 07:23:05 ?         0:00 -sh -c   /web/httpd1/bin/apachectl -k start -f /web/httpd1/conf/httpd.conf   -DNO_f

For Zowe threads I was getting

/u/tmp/zowep33//bin/utils/configmgr -script /u/tmp/zowep33//bin/commands/inter

which was annoyingly truncated.

The command ps -e -o args > aa gives the whole command line (up to 1024 bytes) such as

/u/tmp/zowep33//bin/utils/configmgr -script /u/tmp/zowep33//bin/commands/internal/start/component/cli.js

Another useful command when you know it.

How do I logon to ISPF and allocate my data sets?

Yes, I know you do not logon to ISPF, but the title is shorter than how do I logon to TSO, and start ISPF so my data sets are allocated as I want them.
I wrote this blog post because I was trying to use ISMF and save information into ISPF tables, but I could not use the information in the tables because my table data set was not in the ISPTLIB concatenation.

When I used TSO ISRDDN to display the data sets allocated to my TSO session I had

ISPTABL -> COLIN.S0W1.ISPF.ISPPROF
ISPTLIB -> ISP.SISPTENU        
        -> SYS1.DGTTLIB        
        -> SYS1.SBLSTBL0       
        ...

COLIN.S0W1.ISPF.ISPPROF was not in the list of data sets in the ISPTLIB concatenation.

This led me to the question – how do I add COLIN.S0W1.ISPF.ISPPROF to the ISPTLIB concatenation?

How do I allocate my datasets to ISPF

When I logon to ISPF I get

------------------------------- TSO/E LOGON -----------------------------------
                                                                               
                                                                               
   Enter LOGON parameters below:                   RACF LOGON parameters:     
   Userid    ===> COLIN                                                        
   Password  ===>                                                             
   Procedure ===> ISPFPROC                         Group Ident  ===>           
   Acct Nmbr ===> ACCT#                                                        
   Size      ===> 2096128                                                      
   Perform   ===>                                                             
   Command   ===> ex 'colin.zlogon.clist'

You can influence what happens by specifying a different Procedure, or specifying a command in Command.

The PROCEDURE ===> ISPFPROC is JCL to start a TSO address space and allocate system wide datasets.

Once ISPF has started, you can issue the command TSO ISRDDN to display all of the datasets allocated to TSO.
Within ISRDDN, the command MEMBER ISPFPROC finds and shows you which of the allocated data sets contain the member.
It gave me

                           Current Data Set Allocation         Member was found
 Command ===>                                                  Scroll ===> PAGE
                                                                                
   Message             Act DDname   Data Set Name   Actions: B E V M F C I Q   
  Member: ISPFPROC    >_   SYSPROC  ADCD.Z31B.PROCLIB

You can enter the B command in the >_ field to browse the member directly.

Aside:

The Actions: B E V M F C I Q are commands for

B Browse the first sixteen data sets or a single data set.
E Edit the first sixteen data sets or a single data set.
V View the first sixteen data sets or a single data set.
M Show an enhanced member list for the first sixteen data sets or a single data set.
F Free the entire DDNAME.
C Compress a PDS using the existing allocation.
I Provide additional data set information.
Q Display list of users or jobs using a data set.

Browse the member

This member has

//********************************************************************    
//*                                                                       
//*                 ISPF FULL-FUNCTION LOGON PROC                         
//*                                                                       
//*********************************************************************   
//ISPFPROC PROC ROOT='/usr/lpp/zosmf'  /* ZOSMF INSTALL ROOT     */       
//         EXPORT SYMLIST=(XX)                                            
//         SET QT=''''                                                    
//         SET XX=&QT.&ROOT.&QT.                                          
//ISPFPROC EXEC PGM=IKJEFT01,REGION=0M,DYNAMNBR=200,                      
//             PARM='%ISPFCL'                                             
//CEEOPTS DD *,SYMBOLS=JCLONLY                                            
 ENVAR("PATH=/bin:&XX./bin")                                              
//SYSUADS  DD  DISP=SHR,DSN=SYS1.UADS                                     
//SYSLBC   DD  DISP=SHR,DSN=SYS1.BRODCAST                                 
//SYSPROC  DD  DISP=SHR,DSN=USER.&SYSVER..CLIST                           
//         DD  DISP=SHR,DSN=FEU.&SYSVER..CLIST                            
//         DD  DISP=SHR,DSN=ADCD.&SYSVER..CLIST                           
//         DD  DISP=SHR,DSN=ISP.SISPCLIB 
...                                  
//ISPTLIB  DD  DISP=SHR,DSN=ISP.SISPTENU         
//         DD  DISP=SHR,DSN=SYS1.DGTTLIB         
... 
//SDSFMENU DD  DSN=ISF.SISFPLIB,DISP=SHR         
//ISPTABL  DD  DSN=SYS1.SMP.OTABLES,DISP=SHR

This JCL

Creates the environment variable PATH=/bin:/usr/lpp/zosmf/bin
Allocates lots of data sets, for example SYSPROC has USER.&SYSVER..CLIST, where the name depends on the value of the global symbol &SYSVER (Z31B at the moment). If I IPL a different level of z/OS it may have a different value, such as Z24C
Allocates fixed name data sets such as ISP.SISPCLIB
Allocates lots of ISPF tables for input
Allocates an SDSF menu data set
Allocates a table ISPTABL for ISPF
But does not allocate an ISPTABL for my personal tables.

In the JCL it has

//ISPFPROC EXEC PGM=IKJEFT01,REGION=0M,DYNAMNBR=200,          
//             PARM='%ISPFCL'

Which says invoke TSO (IKJEFT01) and execute the %ISPFCL Clist (or REXX).

Use PF3 to return from ISRDDN.

Where is ISPFCL?

The above JCL uses CLIST/REXX ISPFCL as a profile to do additional processing, such as allocating additional data sets.

You could allocate datasets in the ISPF JCL instead of through the CLIST – but the CLIST allows conditional processing, such as if the ISPFPROF data set does not exist, then allocate it.

You can use TSO ISRDDN again and specify MEMBER ISPFCL. The member was found in four places (see the Member: lines below)

                           Current Data Set Allocations           Row 98 of 118
 Command ===>  _____________________                            Scroll ===> PAGE
                                                                                
   Message             Act DDname   Data Set Name   Actions: B E V M F C I Q   
  Member: ISPFCL      >_   SYSPROC  USER.Z31B.CLIST                             
                      >_            FEU.Z31B.CLIST                              
  Member: ISPFCL      >_            ADCD.Z31B.CLIST                             
                      >_            ISP.SISPCLIB
  Member: ISPFCL      >_            USER.Z31B.PROCLIB                           
                      >_            FEU.Z31B.PROCLIB                            
  Member: ISPFPROC    >_            ADCD.Z31B.PROCLIB                           
                      >_            ISM403.SFMNEXEC                             
                      >_            AUT430.SINGREXX                             
                      >_   SYSUADS  SYS1.UADS                                   
                      >_   SYSUDUMP ---------- JES2 Subsystem file -------------

The member is found in 4 places. You can browse a member by entering B in the >_ field.

The first ISPFCL member has

PROC 0 VOL(B3SYS1)                                                       
 CONTROL NOMSG NOFLUSH ASIS                                              
PROFILE NOMODE MSGID PROMPT INTERCOM WTPMSG                              
WRITE *****************************************************************
...
FREE FILE(ISPPROF ISPTABL)                                   
SET &SDSFTAB= &STR(&SYSUID..SDSF.ISFTABL)                    
ALLOC DA('&SDSFTAB') SHR FILE(ISFTABL)                       
                                                             
SET &DSNAME = &STR(&SYSUID..&SYSNAME..ISPF.ISPPROF)          
ALLOC DA('&DSNAME') SHR FILE(ISPPROF)                        
ALLOC DA('&DSNAME') SHR FILE(ISPTABL)
IF &LASTCC ¬= 0 THEN DO      
/* Allocate the ISPF Prof dataset  */
...
END

The FREE FILE(ISPPROF ISPTABL) says drop (ignore) the existing definitions for ISPPROF and ISPTABL. The CLIST will reallocate them.
The ALLOC DA('&DSNAME') SHR FILE(ISPTABL) allocates my data set to the ISPTABL ddname.
The problem is that you cannot easily concatenate my data sets to the ISPTLIB concatenation. You can use the TSO ALLOCate command to allocate a list of data sets to a DDNAME, but not just to add one data set to an existing allocated DDNAME. See Adding a data set to an existing DDNAME in TSO.

Starting ISPF

When you logon to the TSO Logon panel it has

Command   ===> ex 'colin.zlogon.clist'

The command (if specified) will be processed after any command found in the PARM field of the EXEC JCL statement in your logon procedure.

You can specify ISPF, a clist, or other command.
If you want to invoke ISPF from your clist, you will need to invoke the ISPF command, for example

/* Rexx */                                                        
  trace r                      /* trace while testing; remove later */
  say "in colin.zlogon.clist"                                     
  address TSO                                                     
                                          
  /* Allocate my SDSF table data set, if it exists */             
  zl = userid()".SDSF.ISFTABL" /* for example COLIN.SDSF.ISFTABL */
  if sysdsn("'"zl"'") = "OK" then                                 
  do                                                              
   "alloc fi(isftabl) da('"zl"') shr reuse"                       
  end                                                             
  /* Allocate my tables to a temporary DDNAME, then add it to     */
  /* the end of the existing ISPTLIB concatenation                */
  req = "ALLOC FI(tmp) DA('COLIN.S0W1.ISPF.ISPPROF') SHR "        
  if bpxwdyn(req) = 0 then                                        
         call bpxwdyn "concat ddlist(ISPTLIB,tmp) "               
  "ispf"

With this, ISPF starts with my data sets allocated as I want them!

Adding a data set to an existing DDNAME in TSO.

I wanted to add a data set to the already allocated ISPTLIB concatenation. You can use the TSO ALLOCate command to allocate a list of data sets, but not to add a data set to an existing definition.

Lionel B. Dyck pointed me to the TSO function bpxwdyn.

When I logon to TSO I invoke a userid.ZLOGON.REXX data set

/* Rexx */                                                              
                                      
  address TSO                                                           
  userid = userid()                                                     
  dsn= userid".S0W1.ISPF.ISPPROF"                                                                  
  req = "ALLOC FI(tmp) DA('"dsn"') SHR "              
  if bpxwdyn(req )  =0 then                                             
         call bpxwdyn "concat ddlist(ISPTLIB,tmp) "                     
 "ispf"

The bpxwdyn(req ) allocates the dataset to the DDNAME TMP.
The call bpxwdyn "concat ddlist(ISPTLIB,tmp)" adds the data set(s) in the tmp DDNAME to the end of the ISPTLIB concatenation.
ispf starts ISPF.

The TSO ISRDDN command gave me

                          Current Data Set Allocations           Row 68 of 122
Command ===>                                                  Scroll ===> CSR
                                                                              
 Volume   Disposition Act DDname   Data Set Name   Actions: B E V M F C I Q   
 B3RES1   SHR,KEEP   >    ISPTLIB  ISP.SISPTENU                               
...                          
 A4USR1   SHR,KEEP   >             COLIN.S0W1.ISPF.ISPPROF

Easy once you know how.

On the CBTAPE are KONCAT and CONCAT which do a similar function.

Using the Java Health centre for looking into z/OSMF, MQWEB and other Liberty products.

The Java Health centre has an agent running in the JVM of interest, and there is an Eclipse plug-in to display the data.

A Java server such as Liberty (as used in z/OSMF and MQWEB) can provide information on how the server is running. I was running MQWEB with OpenJ9, Java 21 (Semeru).

You need to configure the Liberty server and have something to process the data such as Health Center running on Eclipse.

You can display information in graphical time line format, such as

CPU used, system and application as used by the JVM
Which classes are being used
The environment – such as the parameters used to start the JVM
Garbage collection activity
I/O – number of files open, and open activity
Method profiling
Threads in use.

Configure Eclipse

I installed Health Center from the Eclipse Marketplace.

How to collect the data

You can configure the JVM in different modes:

headless – data is collected and written to the local file system
collect from the start – and view in Eclipse, this means you get all of the Java class loading activity
start collecting only after Eclipse has started, and connected to the JVM. I use this method. I start my server, and run a workload to “warm up the JVM” then use Eclipse to show the activity due to my testing.

Configure the JVM server

The options are listed here.

You can specify the JVM options on the command line or the jvm.options file.

You can specify them on the -Xhealthcenter:… statement, or as

-Dcom.ibm.diagnostics.healthcenter...=...

values. For example

-Xhealthcenter:level=off,readonly=off,jmx=on,port=1972

-Xhealthcenter:level=off
-Dcom.ibm.java.diagnostics.healthcenter.agent.port=1972 
-Dcom.ibm.diagnostics.healthcenter.jmx=on
-Dcom.ibm.diagnostics.healthcenter.readonly=on

To run headless

In the server

I added the following to my jvm.options

-Xhealthcenter:level=headless 
-Dcom.ibm.java.diagnostics.healthcenter.headless.delay.start=2 
-Dcom.ibm.diagnostics.healthcenter.headless=on 
-Dcom.ibm.java.diagnostics.healthcenter.data.collection.level=headless 
-Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/u/tmp/zowec/ 
-Dcom.ibm.diagnostics.healthcenter.readonly=on

Download the files to your workstation, and use File -> Load Data to process the files.

To run the Health centre in real time

In the server

-Xhealthcenter:level=off,readonly=off,jmx=on,port=1972 
-Dcom.ibm.diagnostics.healthcenter.logging.level=debug

Note the jmx=on and the port number. You need these for the Eclipse configuration. The level=off means do not start collecting data until the Health centre client (Eclipse) connects.
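If you keep these options in a jvm.options file, a small script can pull out the values you need to type into Eclipse. The Python sketch below only splits the option string shown above into key and value pairs. It is not part of Health Center, and the function name is hypothetical.

```python
def parse_xhealthcenter(option):
    """Split an -Xhealthcenter:key=value,... option into a dictionary of settings."""
    prefix = "-Xhealthcenter"
    if not option.startswith(prefix):
        raise ValueError("not an -Xhealthcenter option")
    rest = option[len(prefix):]
    settings = {}
    if rest.startswith(":"):
        for pair in rest[1:].split(","):
            key, _, value = pair.partition("=")
            settings[key.strip()] = value.strip()
    return settings


opts = parse_xhealthcenter("-Xhealthcenter:level=off,readonly=off,jmx=on,port=1972")
print(opts["jmx"], opts["port"])  # on 1972
```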

In Eclipse

File -> New Connection… -> Enable an application for monitoring -> Next.

On the Select connector panel I used

Once it worked, I enabled security.

Click Next

The Health Centre then starts searching at the specified port. I disabled the Scan next 100 ports… option. When it manages to connect to the port, click Finish.

I initially had problems connecting to the server, see Why can’t I connect to a z/OS port?

It takes a few seconds to start the data collection, and start downloading the data.

Let the JVM warm up

The image below shows the CPU usage from the start of the server.

For the first 5 minutes, this is the JVM starting up with no workload. Afterwards the CPU used drops to a low value.

After 5 minutes, I started my workload. For the first 12 or so minutes the CPU is high, but after about 13 minutes it levels out. If you want to do any measurements of cost per transaction, you should take them from the period after it has levelled out. During the “warm up” period, the JVM is optimising the code etc.

The green line shows the system CPU usage. The red line (and grey area) shows the Application usage. We can see most of the CPU used is application usage.

The number of methods profiled is the JVM optimising the code. It takes the “hottest” classes and does those first… until all (most) of the classes are optimised.

Long term monitoring.

From this diagram you can see the JVM startup, the initial part of my test where the JVM was warming up, the remainder of the test, and the JVM overhead after the test.

You need to take all of these into consideration when running performance tests.

Running performance tests

I set up my Work Load Manager configuration to record the number of MQ transactions, and had a report class for the MQWEB server. From this I can calculate the cost per transaction.
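The calculation itself is a simple division: CPU seconds used in the measurement interval, divided by the number of transactions in the same interval. A minimal Python sketch follows. The numbers in the example are made up for illustration and are not from my measurements.

```python
def cpu_per_transaction(cpu_seconds, transactions):
    """CPU seconds used per transaction over one measurement interval."""
    if transactions <= 0:
        raise ValueError("need at least one transaction in the interval")
    return cpu_seconds / transactions


# Hypothetical interval taken after the JVM has warmed up
cost = cpu_per_transaction(cpu_seconds=12.0, transactions=6000)
print(f"{cost * 1000:.3f} ms of CPU per transaction")
```

Take both values from the same interval, and from the steady state period rather than the warm up period, otherwise the JVM optimisation cost is charged to the transactions.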

Health centre agent logging

With

-Dcom.ibm.diagnostics.healthcenter.logging.level=finest

I had output in the STDERR output

[06:51:52] com.ibm.diagnostics.healthcenter.Agent FINE: System receiver, version 1.0 
[06:51:52] com.ibm.diagnostics.healthcenter.Agent FINE: /usr/lpp/java/J21.0_64//lib/libhcapiplugin.so, version 1.0                                                                                                                    
[06:51:52] com.ibm.diagnostics.healthcenter.java FINE: Health Center Agent 4.0.7 
06:51:53com.ibm.java.diagnostics.healthcenter.agent.mbean.HCLaunchMBean <init> 
INFO: Agent version "3.0.21.202109031203" 
06:51:56 com.ibm.java.diagnostics.healthcenter.agent.mbean.HCLaunchMBean startAgent 
INFO: Health Center agent running in off mode. 
06:51:56 com.ibm.java.diagnostics.healthcenter.agent.mbean.HCLaunchMBean startAgent 
INFO: Health Center agent started on port 1972.

and in STDOUT many

com.ibm.lang.management.OperatingSystemMXBean.getTotalPhysicalMemory()

I can’t automatically allocate a data set, and my SMS set up is not helping.

I’m running my little zD&T z/OS system on my laptop. I am the only person on this system, so I have to do everything myself.

I started my MQ system last week, and now it is complaining that it cannot allocate archive logs. From my experience with MQ, I know this is serious. I know I have lots of space on my disks, so why can’t MQ use it?
I’ll go through the diagnostic path I took, which shows the SMS commands I used, and give the solution.

The blog post One minute SMS covers many of the concepts (and commands used).

The error messages

CSQJ072E %CSQ9 ARCHIVE LOG DATA SET 'CSQARC2.CSQ9.B0000002' HAS BEEN ALLOCATED TO NON-TAPE DEVICE AND CATALOGUED, OVERRIDING CATALOG PARAMETER                                    
IGD17272I VOLUME SELECTION HAS FAILED FOR INSUFFICIENT SPACE FOR DATA SET CSQARC2.CSQ9.A0000002 JOBNAME (CSQ9MSTR) STEPNAME (CSQ9MSTR) PROGNAME (CSQYASCP) 
REQUESTED SPACE QUANTITY = 120960 KB                                        
STORCLAS (SCMQS) MGMTCLAS (        ) DATACLAS (        )                    
STORGRPS (SGMQS SGBASE SGEXTEAV  )                                          
IKJ56893I DATA SET CSQARC2.CSQ9.A0000002 NOT ALLOCATED+                     
IGD17273I ALLOCATION HAS FAILED FOR ALL VOLUMES SELECTED FOR DATA SET      
CSQARC2.CSQ9.A0000002                                                       
IGD17277I THERE ARE (247) CANDIDATE VOLUMES OF WHICH (7) ARE ENABLED OR     
QUIESCED                                                                    
IGD17290I THERE WERE 3 CANDIDATE STORAGE GROUPS OF WHICH THE FIRST 3 814    
WERE ELIGIBLE FOR VOLUME SELECTION.                                         
THE CANDIDATE STORAGE GROUPS WERE:SGMQS SGBASE SGEXTEAV                     
IGD17279I 240 VOLUMES WERE REJECTED BECAUSE THEY WERE NOT ONLINE            
IGD17279I 240 VOLUMES WERE REJECTED BECAUSE THE UCB WAS NOT AVAILABLE       
IGD17279I 7 VOLUMES WERE REJECTED BECAUSE THEY DID NOT HAVE SUFFICIENT      
SPACE (041A041D)

Why is it using the storage class SCMQS?

From the ISMF panels,

option 7 Automatic Class Selection
option 5 Display – Display ACS Object Information

Gives a panel

   Panel  Utilities  Help                                                       
 ────────────────────────────────────────────────────────────────────────────── 
                               ACS OBJECT DISPLAY                              
 Command ===>                                                                  
                                                                                
 CDS Name  : ACTIVE                                                             
                                                                                
 ACS Rtn   Source Data Set ACS      Member    Last Trans  Last Date  Last Time
 Type      Routine Translated from  Name      Userid      Translated Translated
 --------  -----------------------  --------  ----------  ---------- ----------
 DATACLAS  SYS1.S0W1.DFSMS.CNTL     DATACLAS  IBMUSER     2019/12/17  15:21   
 MGMTCLAS  -----------------------  --------  --------    ----------  -----        
 STORCLAS  SYS1.S0W1.DFSMS.CNTL     STORCLAS  IBMUSER     2020/12/02  11:23    
 STORGRP   SYS1.S0W1.DFSMS.CNTL     STORGRP   IBMUSER     2019/12/17  15:23

So the ACS routine is in SYS1.S0W1.DFSMS.CNTL(STORCLAS)

This file has

PROC STORCLAS 
FILTLIST MQS_HLQ          INCLUDE(CSQ*.**, 
                                  CSQ.**, 
                                  MQS.**, 
                                  MQS*.**)
...
SELECT 
...
WHEN (&DSN = &MQS_HLQ) 
  DO 
    SET &STORCLAS = 'SCMQS' 
    EXIT CODE(0) 
  END 
...
END
END

This says that for any data set name (&DSN) that matches the list (&MQS_HLQ), which has the CSQ* and MQS* masks, set the storage class to 'SCMQS'.
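To check which data set names the filter list picks up, you can approximate the mask rules in a few lines of Python. This is a sketch under my understanding of the masks (* matches characters within one qualifier, ** matches zero or more whole qualifiers, % matches one character). It is not the ACS translator, it does not handle a leading **, and the function names are hypothetical.

```python
import re


def mask_to_regex(mask):
    """Translate a data set name mask into a regular expression (approximation)."""
    pieces = []
    for i, qual in enumerate(mask.upper().split(".")):
        if qual == "**":
            if i == 0:
                raise ValueError("a leading ** is not handled in this sketch")
            pieces.append(r"(?:\.[^.]+)*")  # zero or more whole qualifiers
            continue
        body = ""
        for ch in qual:
            if ch == "*":
                body += "[^.]*"  # any characters within one qualifier
            elif ch == "%":
                body += "[^.]"  # exactly one character
            else:
                body += re.escape(ch)
        pieces.append((r"\." if i else "") + body)
    return re.compile("".join(pieces))


# The masks from the FILTLIST above
MQS_HLQ = ["CSQ*.**", "CSQ.**", "MQS.**", "MQS*.**"]


def storage_class_for(dsn):
    """Return 'SCMQS' if the data set name matches the MQS_HLQ list, else None."""
    for mask in MQS_HLQ:
        if mask_to_regex(mask).fullmatch(dsn.upper()):
            return "SCMQS"
    return None


print(storage_class_for("CSQARC2.CSQ9.A0000002"))    # SCMQS
print(storage_class_for("COLIN.S0W1.ISPF.ISPPROF"))  # None
```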

What storage groups are connected with the MQ data set?

Member SYS1.S0W1.DFSMS.CNTL(STORGRP) has

...
WHEN (&STORCLAS= 'SCMQS')                             
  DO                                                  
    SET &STORGRP = 'SGMQS','SGBASE','SGEXTEAV'        
    EXIT CODE(0)                                      
  END                                                 
...

so these are the storage groups that MQ data sets will use.

What DASD volumes are in the storage group?

D SMS,SG(SGbase)                             
IGD002I 13:34:38 DISPLAY SMS 699             
                                                                
STORGRP  TYPE    SYSTEM= 1                                      
SGBASE   POOL            +                                      
  SPACE INFORMATION:                                            
  TOTAL SPACE = 29775MB USAGE% = 98 ALERT% = 0                  
  TRACK-MANAGED SPACE = 29775MB USAGE% = 98 ALERT% = 0

This shows there is 29775 MB allocated, and it is 98% full.
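A little arithmetic on these numbers shows why the allocation fails even though the group is not completely full. The Python sketch below uses the requested size from message IGD17272I and the SGBASE figures above. USAGE% is a rounded value, so the free space is approximate. The even split over 7 volumes is purely my assumption for illustration, using the volume count from message IGD17277I.

```python
def free_mb(total_mb, usage_percent):
    """Approximate free space in a storage group from TOTAL SPACE and USAGE%."""
    return total_mb * (100 - usage_percent) / 100


requested_mb = 120960 / 1024       # REQUESTED SPACE QUANTITY = 120960 KB
sgbase_free = free_mb(29775, 98)   # TOTAL SPACE = 29775MB USAGE% = 98
per_volume = sgbase_free / 7       # assumption: free space spread evenly over 7 volumes

print(requested_mb)                # 118.125
print(sgbase_free)                 # 595.5
print(per_volume < requested_mb)   # True
```

So although there is about 595 MB free in total, under that assumption no single volume has the 118 MB the archive log needs, which is consistent with message IGD17279I.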

D SMS,SG(SGMQS)                                                        
IGD002I 13:31:33 DISPLAY SMS 678           
                                                                                          
STORGRP  TYPE    SYSTEM= 1                                                                
SGMQS    POOL            +                                                                
  SPACE INFORMATION:                                                                      
    NOT AVAILABLE TO BE DISPLAYED                                                         
***************************** LEGEND *****************************                        
. THE STORAGE GROUP OR VOLUME IS NOT DEFINED TO THE SYSTEM                                
+ THE STORAGE GROUP OR VOLUME IS ENABLED                                                  
- THE STORAGE GROUP OR VOLUME IS DISABLED                                                 
* THE STORAGE GROUP OR VOLUME IS QUIESCED                                                 
D THE STORAGE GROUP OR VOLUME IS DISABLED FOR NEW ALLOCATIONS ONLY                        
Q THE STORAGE GROUP OR VOLUME IS QUIESCED FOR NEW ALLOCATIONS ONLY                        
> THE VOLSER IN UCB IS DIFFERENT FROM THE VOLSER IN CONFIGURATION                         
SYSTEM  1 = S0W1

There are no volumes allocated to this storage group.

What volumes are in the storage group?

D SMS,SG(SGBASE),LISTVOL                                             
IGD002I 13:39:07 DISPLAY SMS 705                                     
                                                                     
STORGRP  TYPE    SYSTEM= 1                                           
SGBASE   POOL            +                                           
  SPACE INFORMATION:                                                 
  TOTAL SPACE = 29775MB USAGE% = 98 ALERT% = 0                       
  TRACK-MANAGED SPACE = 29775MB USAGE% = 98 ALERT% = 0               
                                                                     
VOLUME UNIT MVS  SYSTEM= 1                               STORGRP NAME
B3USR1 0ADA ONRW         +                                 SGBASE    
USER0A                   +                                 SGBASE    
USER0B                   +                                 SGBASE    
USER0C                   +                                 SGBASE    
USER0D                   +                                 SGBASE    
USER0E                   +                                 SGBASE    
USER0F                   +                                 SGBASE    
USER00 0A9C ONRW         +                                 SGBASE    
USER01                   +                                 SGBASE    
USER02 0AB0 ONRW         +                                 SGBASE    
USER03 0ACE ONRW         +                                 SGBASE    
USER04 0AB2 ONRW         +                                 SGBASE    
USER05 0AB5 ONRW         +                                 SGBASE    
USER06 0A83 ONRW         +                                 SGBASE
...
+ THE STORAGE GROUP OR VOLUME IS ENABLED
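Some volumes in the list have a UNIT and an MVS status (ONRW) and some do not. A minimal sketch to separate the two kinds is below. It assumes the column layout shown above, the sample is a shortened copy of that output, and the function name is hypothetical. My reading is that volumes with no UNIT are defined to the storage group but not online to this system, but treat that as an assumption.

```python
SAMPLE = """\
VOLUME UNIT MVS  SYSTEM= 1                               STORGRP NAME
B3USR1 0ADA ONRW         +                                 SGBASE
USER0A                   +                                 SGBASE
USER00 0A9C ONRW         +                                 SGBASE
USER01                   +                                 SGBASE
"""


def split_volumes(listvol_text):
    """Return (volumes with a unit, volumes without a unit) from LISTVOL rows."""
    with_unit, without_unit = [], []
    for line in listvol_text.splitlines():
        fields = line.split()
        if not fields or line.startswith("VOLUME"):
            continue  # skip blank lines and the column heading
        if len(fields) == 5:      # volser, unit, MVS status, SMS status, group
            with_unit.append((fields[0], fields[1]))
        elif len(fields) == 3:    # volser, SMS status, group
            without_unit.append(fields[0])
    return with_unit, without_unit


online, not_online = split_volumes(SAMPLE)
print(online)      # [('B3USR1', '0ADA'), ('USER00', '0A9C')]
print(not_online)  # ['USER0A', 'USER01']
```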

How do I see how much space is available in my disks?

In ISMF, select

option 2 – Volume
option 1 – DASD

This gives a panel

                          VOLUME SELECTION ENTRY PANEL              Page 1 of 3
 Command ===>                                                                  
                                                                                
 Select Source to Generate Volume List  . . 2  (1 - Saved list, 2 - New list)  
   1  Generate from a Saved List         Query Name To                         
        List Name  . . COLIN             Save or Retrieve                       
   2  Generate a New List from Criteria Below                                  
        Specify Source of the New List  . . 1  (1 - Physical, 2 - SMS)         
        Optionally Specify One or More:                                        
        Enter "/" to select option      Generate Exclusive list                 
          Type of Volume List . . . 1          (1-Online,2-Not Online,3-Either)
          Volume Serial Number  . . USER*      (fully or partially specified)  
          Device Type . . . . . . .            (fully or partially specified)  
          Device Number . . . . . .            (fully specified)               
            To Device Number  . . .            (for range of devices)          
          Acquire Physical Data . . Y          (Y or N)                        
          Acquire Space Data  . . . Y          (Y or N)                        
          Storage Group Name  . . .            (fully or partially specified)  
          CDS Name . . . . . . .                                               
                                               (fully specified or 'Active')   
 Use ENTER to Perform Selection; Use DOWN Command to View next Selection Panel;
 Use HELP Command for Help; Use END Command to Exit.

        Enter "/" to select option      Generate Exclusive list                 
          Type of Volume List . . . 1          (1-Online,2-Not Online,3-Either)
          Volume Serial Number  . . *          (fully or partially specified)  
          Device Type . . . . . . .            (fully or partially specified)  
          Device Number . . . . . .            (fully specified)               
            To Device Number  . . .            (for range of devices)          
          Acquire Physical Data . . Y          (Y or N)                        
          Acquire Space Data  . . . Y          (Y or N)                        
          Storage Group Name  . . . SGBASE     (fully or partially specified)  
          CDS Name . . . . . . . 'ACTIVE'
                                               (fully specified or 'Active')

You can specify a Volume Serial prefix, a Storage Group Name, or a combination of both.

You need to select Acquire Physical Data, and Acquire Space Data.

You get output like

 LINE       VOLUME FREE       %     ALLOC      FRAG   LARGEST    FREE     
 OPERATOR   SERIAL SPACE      FREE  SPACE      INDEX  EXTENT     EXTENTS  ... ...
---(1)----  -(2)-- ---(3)---  (4)-  ---(5)---  -(6)-  ---(7)---  --(8)--  
            B3USR1   149186K     2   8165315K    375     34032K       36  
            USER00    67067K     1   8247434K    718      2490K      133  
            USER02    30601K     1   2740899K    412     11621K       31  
            USER03     3209K     0   2768291K    333      2213K        6  
            USER04   146198K     5   2625302K    280     42332K       19  
            USER05    64466K     2   2707034K      9     63802K        3  
            USER06   273304K    10   2498196K    177    105581K       14

Which shows I do not have much free space.
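To put a number on "not much free space", you can add up the FREE SPACE and ALLOC SPACE columns. The sketch below uses the rows from the list above, pasted as text. It assumes every value is in K as shown, and the function name is hypothetical.

```python
ROWS = """\
B3USR1   149186K     2   8165315K    375     34032K       36
USER00    67067K     1   8247434K    718      2490K      133
USER02    30601K     1   2740899K    412     11621K       31
USER03     3209K     0   2768291K    333      2213K        6
USER04   146198K     5   2625302K    280     42332K       19
USER05    64466K     2   2707034K      9     63802K        3
USER06   273304K    10   2498196K    177    105581K       14
"""


def summarise(rows_text):
    """Return (total free K, total allocated K) over all volume rows."""
    free_k = alloc_k = 0
    for line in rows_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a volume row
        free_k += int(fields[1].rstrip("K"))   # column 3: FREE SPACE
        alloc_k += int(fields[3].rstrip("K"))  # column 5: ALLOC SPACE
    return free_k, alloc_k


free_k, alloc_k = summarise(ROWS)
pct_free = 100 * free_k / (free_k + alloc_k)
print(free_k, alloc_k)  # 734031 29752471
print(f"{pct_free:.1f}% free across these volumes")
```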

Add more space

As it looks like my storage group pools are low on disk space, I need to allocate more volumes.

See Adding more disk space to z/OS, creating volumes and adding them to SMS.

Once I added the volume to the SGBASE storage group, its usage went from

TOTAL SPACE = 29775MB USAGE% = 98 ALERT% = 0                      
TRACK-MANAGED SPACE = 29775MB USAGE% = 98 ALERT% = 0

to

TOTAL SPACE = 32482MB USAGE% = 89 ALERT% = 0                      
TRACK-MANAGED SPACE = 32482MB USAGE% = 89 ALERT% = 0
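As a quick check of what the extra volume bought, the arithmetic on the two displays is below. The figures are taken from the output above. USAGE% is reported as a whole number, so the free space values are approximate.

```python
def free_mb(total_mb, usage_pct):
    """Approximate free space in MB from TOTAL SPACE and USAGE%."""
    return total_mb * (100 - usage_pct) / 100


before = (29775, 98)  # TOTAL SPACE in MB, USAGE% before adding the volume
after = (32482, 89)   # the same values after adding the volume

added_mb = after[0] - before[0]
print(f"capacity added : {added_mb} MB")
print(f"free before    : about {free_mb(*before):.0f} MB")
print(f"free after     : about {free_mb(*after):.0f} MB")
```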

What CEA TSO operator commands are there?

Part of the CEA (Common Event Adapter) service on z/OS provides the capability for an application to start TSO address spaces, send them TSO commands, and receive the responses. This is used by products like z/OSMF. A user can have CEA TSO address spaces as well as a “normal” TSO session, where you log on and use ISPF.

Change the CEA parameters F CEA,CEA=(x1,x2,…xN)
Display the CEA configuration parameters F CEA,D,P
Display a summary of CEA TSO regions F CEA,D,S
Display client summary F CEA,D,CLIENTSUMMARY and D CEA,CLIENT=*
Display the session information F CEA,DIAG,SESSTABLE

More information about the commands

Change the CEA parameters F CEA,CEA=(x1,x2,…xN)

Display the CEA configuration parameters F CEA,D,P

STATUS: ACTIVE-FULL      CLIENTS: 0  INTERNAL: 0            
CEA = (00)                                                  
SNAPSHOT           = N                                      
HLQLONG            = CEA         HLQ          =             
BRANCH             =             COUNTRYCODE  =             
CAPTURE RANGE FOR SLIP DUMPS:                               
LOGREC             = 01:00:00    LOGRECSUMMARY= 04:00:00    
OPERLOG            = 00:30:00                               
CAPTURE RANGE FOR ABEND DUMPS:                              
...                           
CAPTURE RANGE FOR CONSOLE DUMPS:                            
...                          
TSOASMGR:                                                   
RECONSESSIONS      = 0           RECONTIME    = 00:00:00    
MAXSESSIONS        =   50        MAXSESSPERUSER=   10

Display a summary of CEA TSO regions F CEA,D,S

STATUS: ACTIVE-FULL      CLIENTS: 0  INTERNAL: 0         
EVENTS BY TYPE:  #WTO: 0  #ENF: 0  #PGM: 0               
TSOASMGR:     ALLOWED: 50  IN USE: 1 HIGHCNT: 0
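The TSOASMGR line is the interesting one if you want to know how close you are to the session limit. A minimal sketch to pull the numbers out of captured output follows. It assumes the line layout shown above, and the function name is hypothetical.

```python
import re

SAMPLE = """\
STATUS: ACTIVE-FULL      CLIENTS: 0  INTERNAL: 0
EVENTS BY TYPE:  #WTO: 0  #ENF: 0  #PGM: 0
TSOASMGR:     ALLOWED: 50  IN USE: 1 HIGHCNT: 0
"""


def tso_sessions(text):
    """Return the TSOASMGR counters as a dict, or None if the line is missing."""
    m = re.search(
        r"TSOASMGR:\s+ALLOWED:\s*(\d+)\s+IN USE:\s*(\d+)\s+HIGHCNT:\s*(\d+)",
        text,
    )
    if m is None:
        return None
    allowed, in_use, highcnt = map(int, m.groups())
    return {
        "allowed": allowed,
        "in_use": in_use,
        "highcnt": highcnt,
        "available": allowed - in_use,
    }


print(tso_sessions(SAMPLE))
# {'allowed': 50, 'in_use': 1, 'highcnt': 0, 'available': 49}
```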

Display client summary F CEA,D,CLIENTSUMMARY and D CEA,CLIENT=*

STATUS: ACTIVE-FULL      CLIENTS: 0  INTERNAL: 0                   
EVENTS BY TYPE:  #WTO: 0  #ENF: 0  #PGM: 0                         
TSOASMGR:     ALLOWED: 50  IN USE: 1 HIGHCNT: 0                    
NO CLIENTS KNOWN TO CEAS AT THIS TIME                              
12I CN=L700     DEVNUM=0700 SYS=S0W1

Display the session information F CEA,DIAG,SESSTABLE

INDEX=0001 USERID=COLIN    APPID=IZUCONAP ASID=004E MSGQID=00060018                       
 COUNT=0001 ASCBADDR=FC3B80 STOKEN=0000013800000009 STTIME=15:34:43.966                   
 LRTIME=15:34:43.967 LOGONPROC=IZUFPROC GROUP=         REGION=50000                       
 CODEPG=1047 CHARSET=697 ROWS=204 COLS=160 RECONN=N RCTIME=00:00:00.000                   
 ACCT=ACCT#                                                                               
 HOST REMOTESYS=         REMOTEQID=00000000 CALLERSYS=

This shows information such as the TSO LOGON procedure used, the screen size, the region size and the account number.
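The session table output is a series of KEY=value pairs, so it is easy to turn into a dictionary. This sketch assumes a single session entry laid out as above. With several entries the later values would overwrite the earlier ones, so you would need to split on INDEX= first. The function name is hypothetical.

```python
import re

SAMPLE = """\
INDEX=0001 USERID=COLIN    APPID=IZUCONAP ASID=004E MSGQID=00060018
 COUNT=0001 ASCBADDR=FC3B80 STOKEN=0000013800000009 STTIME=15:34:43.966
 LRTIME=15:34:43.967 LOGONPROC=IZUFPROC GROUP=         REGION=50000
 CODEPG=1047 CHARSET=697 ROWS=204 COLS=160 RECONN=N RCTIME=00:00:00.000
 ACCT=ACCT#
 HOST REMOTESYS=         REMOTEQID=00000000 CALLERSYS=
"""


def parse_session(text):
    """Return the KEY=value pairs of one session entry; empty values become ''."""
    return dict(re.findall(r"(\w+)=(\S*)", text))


session = parse_session(SAMPLE)
print(session["USERID"], session["ASID"], session["LOGONPROC"])  # COLIN 004E IZUFPROC
print(session["ROWS"], session["COLS"], session["REGION"])       # 204 160 50000
```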

Mapping a certificate to a userid and so avoid needing a password is good – but…

You can use the RACDCERT MAP command to map a certificate to a userid, and so avoid the need to specify a password. Under the covers, the code uses pthread_security_np and passes a certificate, or a userid and password. If these are validated, the thread becomes that userid, just the same as if the userid had logged on.

Is this secure?

If you store a userid and password on your laptop, even though the data may be “protected” someone who has access to your machine may be able to copy the file and so impersonate you.

With a public certificate and private key, if someone can access your machine, they may be able to copy these files and so impersonate you.

You can get dongles which you plug into your laptop on which you can store protected data. In order to use the data, you need the physical device.

You need to protect the RACF command

Because the RACDCERT command can be dangerous, you need to protect it.

You do not want someone to specify that their certificate maps to a powerful userid, such as SYS1. The documentation says

To issue the RACDCERT MAP command, you must have the SPECIAL attribute or sufficient authority to the IRR.DIGTCERT.MAP resource in the FACILITY class for your intended purpose.

For a general user to create a mapping associated with their own user ID they need READ access to IRR.DIGTCERT.MAP.

For a general user to create a mapping associated with another user ID or MULTIID, they need UPDATE access to IRR.DIGTCERT.MAP.
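The rule quoted above can be written as a small decision function. This is only a model of the documented rule for thinking about who can do what. It does not call RACF, and the function and parameter names are hypothetical.

```python
# RACF access levels in increasing order of authority.
ACCESS_LEVELS = ["NONE", "READ", "UPDATE", "CONTROL", "ALTER"]


def may_create_mapping(issuer, target, has_special, access):
    """Model of the quoted rule for RACDCERT MAP.

    issuer      - userid issuing the command
    target      - userid the certificate maps to (or "MULTIID")
    has_special - True if the issuer has the SPECIAL attribute
    access      - issuer's access to IRR.DIGTCERT.MAP in the FACILITY class
    """
    if has_special:
        return True
    needed = "READ" if target == issuer else "UPDATE"
    return ACCESS_LEVELS.index(access) >= ACCESS_LEVELS.index(needed)


print(may_create_mapping("COLIN", "COLIN", False, "READ"))    # True
print(may_create_mapping("COLIN", "IBMUSER", False, "READ"))  # False
print(may_create_mapping("COLIN", "IBMUSER", False, "UPDATE"))  # True
```

The second case is the one to worry about: a user with only READ access cannot map a certificate to someone else's userid, so UPDATE access is the authority to restrict.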

What’s the best way to set this up?

I think that as part of your process for setting up userids, the process should create the mapping for the certificate to a userid. This way you do not have people creating the mapping. If a mapping already exists, you cannot create another mapping.

You may want an automated process which checks the approval, and issues the commands, and so you do not have humans with the authority to issue the commands.

Of course you’ll have a break-glass all powerful userid in case of emergencies.

But….

Even though the password had expired, I could logon using the certificate. If I revoked the userid the logon failed.

I used certificate logon from z/OSMF and issued console commands. This starts a TSO address space, and z/OSMF passes the commands to the TSO address space and gets the responses back.

Once a TSO address space has been started, there are no more checks to see if the userid is still valid.

If you want to inactivate the userid, you’ll need to revoke it, and then cancel all the TSO address spaces running on behalf of the userid. Walking someone off site is not good enough. There may be scripts which are automated, and will logon with no human intervention.
TSO address spaces may be configured to be cancelled if there is no activity. If the TSO address space is kept busy (for example, by sending it requests), it may never be forced off.

Getting a CTRACE

Component TRACE (CTRACE) is the z/OS system trace capability for z/OS components. Most z/OS components use it.

From a “capturing a trace” perspective, there are two aspects.

Capturing the trace data
- The trace can be an in-memory trace, which is available when a dump is taken. This is often the default. For example, by default a trace is enabled to capture errors, and the in-memory trace is used.
- You can have a trace writer started task which writes to a data set. When you start the trace you give the name of the started task. Data is passed to the trace writer job. You can then use the trace data set in IPCS.

Enabling the trace for the component
- Usually there are options you can specify, for example all entries or just error entries, and how big the in-memory trace should be.

To trace a z/OS component, you need to know the CTRACE component name, and what you want to trace.

I tried to capture a CTRACE of a z/OS component, and struggled, because I didn’t know the name of the component.

What are the trace component names?

The z/OS command

TRACE STATUS

gave

IEE843I 16.10.22  TRACE DISPLAY 940                               
        SYSTEM STATUS INFORMATION                                 
 ST=(ON,0001M,00005M) AS=ON  BR=OFF EX=ON  MO=OFF MT=(ON,064K)    
 COMPONENT MODE  COMPONENT MODE  COMPONENT MODE  COMPONENT MODE   
 --------------------------------------------------------------   
 CSF       ON    NFSC      ON    SYSGRS    MIN   SYSANT00  MIN    
 SYSJES2   SUB   SYSRRS    MIN   SYSIEAVX  MIN   SYSSPI    OFF    
 SYSJES    SUB   SYSHZS    MIN   SYSSMS    OFF   SYSAXR    MIN    
 SYSDLF    MIN   SYSOPS    MIN   SYSXCF    MIN   SYSDUMP   ON     
 SYSLLA    MIN   SYSXES    ON    SYSUNI    OFF   SYSCATLG  MIN    
 SYSTTRC   OFF   SYSTCPDA  SUB   SYSRSM    SUB   SYSAOM    MIN    
 SYSVLF    MIN   SYSTCPIP  SUB   SYSLOGR   ON    SYSOMVS   MIN    
 SYSCEA    MIN   SYSWLM    MIN   SYSTCPIS  SUB   SYSTCPRE  SUB    
 SYSIOS    MIN   SYSANTMN  MIN   SYSDMO    MIN   SYSIEFAL  ON     
 SYSTCPOT  SUB

I was after a CEA trace, and from the above, the name is SYSCEA. It is MIN, so is already active.
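With this many components it can be easier to search the display with a script. The sketch below turns the COMPONENT/MODE table into a dictionary. It assumes the four-pairs-per-line layout shown above and uses a shortened copy of the output. The function name is hypothetical.

```python
SAMPLE = """\
 COMPONENT MODE  COMPONENT MODE  COMPONENT MODE  COMPONENT MODE
 --------------------------------------------------------------
 CSF       ON    NFSC      ON    SYSGRS    MIN   SYSANT00  MIN
 SYSVLF    MIN   SYSTCPIP  SUB   SYSLOGR   ON    SYSOMVS   MIN
 SYSCEA    MIN   SYSWLM    MIN   SYSTCPIS  SUB   SYSTCPRE  SUB
 SYSTCPOT  SUB
"""


def component_modes(text):
    """Return {component: mode} from the table that follows the dashed line."""
    modes = {}
    in_table = False
    for line in text.splitlines():
        if set(line.strip()) == {"-"}:
            in_table = True  # rows after the dashed line are name/mode pairs
            continue
        if in_table:
            fields = line.split()
            modes.update(zip(fields[0::2], fields[1::2]))
    return modes


modes = component_modes(SAMPLE)
print(modes["SYSCEA"])                               # MIN
print(sorted(c for c in modes if "CEA" in c))        # ['SYSCEA']
```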

What is the trace’s status?

d trace,comp=SYSCEA

gave me

COMPONENT     MODE BUFFER HEAD SUBS                           
------------------------------------------------------------- 
SYSCEA        MIN  0002M                                      
   ASIDS      *NONE*                                          
   JOBNAMES   *NONE*                                          
   OPTIONS    ERROR                                           
   WRITER     *NONE*

So it is active, capturing errors, and writing to the in-memory trace (because there is no WRITER). I recognised the options as the defaults in parmlib member CTICEA00.

I had my own trace writer started task

Member CTWTR in proclib

//CTWTR PROC                                                                  
//DELETE  EXEC PGM=IEFBR14                                                    
//TRCOUT01 DD DSNAME=IBMUSER.CTRACE1,                                         
// SPACE=(CYL,(10),,CONTIG),DISP=(MOD,DELETE)                                 
//*                                                                           
//IEFPROC EXEC PGM=ITTTRCWR,TIME=999                                          
//TRCOUT01 DD DSNAME=IBMUSER.CTRACE1,                                         
// SPACE=(CYL,(10),,CONTIG),DISP=(NEW,CATLG)                                  
//SYSPRINT DD SYSOUT=*

I started my CTRACE writer

TRACE CT,WTRSTART=CTWTR

I created my own member CTICEACP in parmlib

TRACEOPTS 
   ON 
   BUFSIZE(20m) 
   OPTIONS('ALL') 
   WTR(CTWTR)

The WTR ties up with my CTRACE writer started task name.

Stop the current trace

TRACE CT,OFF,COMP=SYSCEA

Start the CEA trace using my member

TRACE CT,ON,COMP=sysCEA,PARM=CTICEACP

Run the test

Stop the CEA trace

TRACE CT,OFF,COMP=SYSCEA

Stop the trace writer

TRACE CT,WTRSTOP=CTWTR

The output from the CTWTR task gave me

IEF196I IEF142I CTWTR CTWTR - STEP WAS EXECUTED - COND CODE 0000         
IEF196I IGD104I IBMUSER.CTRACE1                              RETAINED,   
IEF196I DDNAME=TRCOUT01

which gives me the name of the data set IBMUSER.CTRACE1.
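The order of the commands matters (start the writer, switch the trace, run the test, stop the trace, stop the writer), so it can help to generate them from one place. This sketch only builds the command strings used above. It does not issue them, and the function name is hypothetical.

```python
def ctrace_commands(component, parm_member, writer):
    """Return (commands to issue before the test, commands to issue after it)."""
    comp = component.upper()
    before_test = [
        f"TRACE CT,WTRSTART={writer}",                    # start the trace writer
        f"TRACE CT,OFF,COMP={comp}",                      # stop the current trace
        f"TRACE CT,ON,COMP={comp},PARM={parm_member}",    # start with my member
    ]
    after_test = [
        f"TRACE CT,OFF,COMP={comp}",                      # stop the trace
        f"TRACE CT,WTRSTOP={writer}",                     # stop the trace writer
    ]
    return before_test, after_test


before, after = ctrace_commands("sysCEA", "CTICEACP", "CTWTR")
for command in before + ["(run the test)"] + after:
    print(command)
```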

Use IPCS to look at the trace

option =0 to specify the name of the data set
=6
dropd
The above command clears out any old information about the data set
CTRACE COMP(SYSCEA) full

I had some data in the trace – but not for the problem I had…. so I need to try something else.

The advanced class.

You do not need to have a member in parmlib. You can use

TRACE CT,ON,COMP=SYSCEA

and do not specify a PARM. This will then prompt for the parameters, asid, jobname, writer and options.