One minute mvs: data set file system (/dsfs) on z/OS

There is a good overview of dsfs here.

My understanding of how dsfs works is that you can access z/OS datasets, members and spool, through a Unix file system interface, and so use commands like “ls”. When you read a member, it is read into the dsfs file system, and the Unix commands work on this data. When you write to the Unix file, it is written to the dsfs file system, and when the file is closed, the data is written to the dataset.

Some information is cached in the dsfs file system, and some definitions, such as DCB information on a per userid basis is also stored.

DSFS is defined to OMVS as a Physical File System. Part of the definition of the PFS is the started task name. When the PFS is defined to OMVS, it starts the DSFS started task.

To stop it, you issue a command to OMVS to stop the PFS.

Permissions

The dsfs started task has threads which do work. When you issue a request, one of the threads becomes your userid and accesses the data sets, so there is no change to the standard RACF data set protection.

User defaults

You can define defaults for when data sets are created. These are user dependant. To issue these for another user, you have to become a super user to become that id, and then issue the dfsadm command.

Changing configuration

You can change the configuration file and restart DSFS (which will be disruptive).

You can make changes to the active system (which are not saved) using the omvs command

Diagnosis

Messages are produced, and are documented in the M&C manual. if you get a reason code, you can use bpxmtext code to display the meaning of the code.

One minute mvs: catalogs and datasets.

In days of old, when 64KB was a lot of real storage, to reference a data set you had to specify the data set name and the DASD volume the data set was on. DSN=MY.JCL,VOL=SER=USER00

After this, the idea of a catalog was developed. Just like the catalog in a library, it told you where things were stored. When you created a data set, and specified DISP=(NEW,CATLG), the data set name and volume were stored in the catalog. When you wanted to use a data set, and did not specify the volume, then the catalog was used to find the volume for the data set.

As systems grew and the number of data sets grew, the catalog grew and quickly became difficult to manage. For example if you deleted data sets, the entry was logically removed from the catalog, resulting in gaps in the catalog.

After this a multi level catalog was developed. You have one master catalog. You can have many user catalogs. You define an alias in the master catalog saying for data sets starting with a specific high level qualifier, use that user catalog.

When a userid was created, most system programmers would also create an alias pointing to a user catalog. They may define a user catalog for each user, or a user catalog could be shared by many aliases.

The catalogs are managed by the VSAM component of z/OS.

Entity naming

PDS and sequential files are referred to as datasets. VSAM provides simple database objects,

  • Relative Record ( where you say get me the n’th record)
  • Key sequence. You define a primary index, and you can an index on different columns using an ALTERNATIVE INDEX. 

VSAM uses the term cluster to what you use in you JCL or application. A cluster has a data component, and zero or more index components.

Moving systems.

I have been running on a self contained ADCD system at z/OS level 2.4. I have recently installed a self contained system at z/OS 2.5. How do I get my data sets into the new system?
You can import connect a user catalog into a new (master) catalog, and define an alias in the new master catalog pointing to the user catalog.

When I did this I could then see my COLIN.* data sets. To be able to use the data sets, I need the volumes to be attached to the z/OS system.

Useful IDCAMS commands

In batch you use the IDCAMS program (IDC = prefix for VSAM, AMS is for Access Management Services!)

If you do not specify a catalog, it defaults to the master catalog.

Create a user catalog

//IBMDF  JOB 1,MSGCLASS=H                                   
//S1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE USERCATALOG -
( NAME('A4USR1.ICFCAT') -
MEGABYTES(15 15) -
VOLUME(A4USR1) -
ICFCATALOG -
FREESPACE(10 10) -
STRNO(3 ) ) -
DATA( CONTROLINTERVALSIZE(4096) -
BUFND(4) ) -
INDEX(BUFNI(4) )
/*

Creating an alias to use the catalog

The JCL below creates two aliases in the master catalog. They both point to a catalog called A4USR1.ICFCAT (which is in the master catalog)

//IBMUSERT JOB 1,MSGCLASS=H                                     
//S1 EXEC PGM=IDCAMS,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE ALIAS (NAME(BACKUP) RELATE('A4USR1.ICFCAT') )
DEFINE ALIAS (NAME(COLIN ) RELATE('A4USR1.ICFCAT') )
/*

To import an existing user catalog into a master catalog

//IBMUSERT JOB 1,MSGCLASS=H                                   
//S1 EXEC PGM=IDCAMS,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
IMPORT CONNECT -
OBJECTS -
(('A4USR1.ICFCAT' VOLUME(A4USR1) DEVICETYPE(3390) -
))
/*

From the information provided, the master catalog knows the name, and location of the user catalog.

List an entry

You can list information about an entry, such as a data set, or a catalog using the LISTCAT command

LISTCAT ENT(COLIN.USERS) ALL

Listing aliases

You can use the IDCAMS command LISTCAT with alias

LISTCAT ALIAS

which gives a one line per entry list of all of the aliases

ALIAS --------- ADCDA       
ALIAS --------- ADCDB
ALIAS --------- ADCDC
ALIAS --------- ADCDD
...

LISTCAT ALIAS ALL

gives

ALIAS --------- ADCDA                                                       
...
ENCRYPTIONDATA
DATA SET ENCRYPTION-----(NO)
ASSOCIATIONS
USERCAT--USERCAT.Z24C.USER
ALIAS --------- ADCDB
...
ENCRYPTIONDATA
DATA SET ENCRYPTION-----(NO)
ASSOCIATIONS
USERCAT--USERCAT.Z24C.USER

So we can see that the the alias ADCDA maps to user catalog USERCAT.Z24C.USER

Listing a catalog

The command

LISTCAT CATALOG(USERCAT.Z24C.USER)

gives

CLUSTER ------- 00000000000000000000000000000000000000000000       
DATA ------- USERCAT.Z24C.USER
INDEX ------ USERCAT.Z24C.USER.CATINDEX
NONVSAM ------- ADCDA.S0W1.ISPF.ISPPROF
NONVSAM ------- ADCDA.S0W1.SPFLOG1.LIST
NONVSAM ------- ADCDB.S0W1.ISPF.ISPPROF
...
CLUSTER ------- SYS1.VVDS.VC4CFG1
DATA ------- SYS1.VVDS.VC4CFG1

Which shows there is a data component of the catalog called USERCAT.Z24C.USER, and there is index component called USERCAT.Z24C.USER.CATINDEX.

The catalog has a data component (USERCAT.Z24C.USER) and an index component USERCAT.Z24C.USER.CATINDEX.

Within the catalog are entries for data sets such as ADCDA.S0W1.ISPF.ISPPROF, and system (DFDSS) dataset SYS1.VVDS.VC4CFG1 - which contains information what is on the SMS DASD volume C4CFG1.

One minute MVS: What is IBM Multi Factor Authentication on z/OS?

Most people are familiar with Multi Factor Authentication (MFA). For example when accessing a banking site through the internet, you have a digital code sent to your phone which you enter in the web page.

There is a phrase associated with MFA. Something you know, something you have. When using internet banking, you use a userid and password(something you know) and the 6 digit code sent to your phone (something you have). At airports, the staff use a badge to get access to secure areas. They swipe the badge (something you have) and have to enter a 4 digit code (something you know).

In-band and out-of-band

With some applications you enter two factors to logon to the application. For example, I can logon to TSO with a “password” 983211:passw0rd - where 983211 is a one time code (which changes every minute) and passw0rd is my password. This is in-band (you enter the combined password >IN<to the application).

I can use a certificate to logon to a web page, get a one time password and enter that into the TSO logon screen. This is indirect, or out-of-band authentication, you set up the password >OUT<side of the application.

What is available on z/OS?

You can set up one-time-codes, or password (or pass phrase), or one-time-code and password (or pass phrase). A password can be up to 8 characters. A pass phrase must be between 14 to 100 characters in length (inclusive).

You can get a one-time-code from several sources:

  • A small hardware device, which you can hang on your keyring
  • Generated from software. I use the IBM Security Verify application on my Android phone. There are other applications, such as Google Authenticator and Duo Mobile, but the code generated by these was not accepted by z/OS. See below.

To set up the mobile phone application you logon on the IBM MFA web browser on your z/OS and get a QR code displayed. This code contains a secret, and other information such as algorithm=SHA256, period=60 seconds, digits=6. The app on your phone reads this and stores the information. When ever you use the app, it displays a code which you enter on z/OS. The value is time limited and expires after a short interval, typically 30 or 60 seconds.

I configured MFA to use just the TOTP (One Time Password). When I logged on to TSO with userid TOTP and the code I got

ICH70008I IBM MFA Message:
AZF1105I TOTP PASSCODE ACCEPTED
ICH70001I TOTP LAST ACCESS AT 07:37:37 ON SATURDAY, JANUARY 6, 2024

When I configured MFA to require TOTP and userid, I had to enter a password like 345112:PASSW0RD, where 345112 was the one time code from the application.

You configure the MFA on a per userid basis. I set up MFA for a new userid called TOTP, and this has to logon with two factors. Another userid only has to logon with the password.

The IBM Security verify application worked out of the box.

With applications Duo Mobile, and Google Authenticator I got message

AZF5042E Preflight saw invalid account metadata

because they provided an invalid code. The applications only worked with the following system wide options

  • Digest Algorithm . . . . . 1 (SHA-1)
  • Token Code Length . .. . 1 (6-digit)
  • Token Period. . . . . . . . . 2 (30 seconds)

See this page for more information.

Yubikey

A Yubikey is a small USB device from which you can get a one time password. I found the site and what you need to order confusing, and purchased the wrong device. On one of the Yubikey pages it compares the different devices. I needed a Yubikey 5 series; I had (wrongly) purchased a Yubikey Security key Series. When the new key arrives I’ll write up how to use it.

Using a certificate

To make your certificate known to the MFA instance, you logon to a web page using TLS. The web browser port has been secured using AT-TLS. When you logon to the web page, the web browser displays the list of valid certificates for you to choose. After you have selected one, the application running in the web server can extract information from the certificate, and update the userid information in the MFA profile for the userid.

You set up an “out-of-bounds policy” saying use which authentication method (use certificates) and how long the password is valid for (60 seconds).

You configure the userid to be able to use the policy.

To be able to logon, the userid logs on to a different web page ../mfa/mypolicy using the same certificate and enters the userid. A TLS handshake is done to the server (validating the certificate), and a password is returned. You enter the password in your application.

One minute MVS – Using individual data set encryption on z/OS.

Overview

You can have full disk encryption. This prevents the disk from being read if it is removed from the environment. The disk subsystem requests the keys from a key manager, not z/OS, as the disk subsystem is doing the encryption and decryption. The keys are requested at power on of the disk subsystem.

On z/OS you can have data set encryption. The data set contents are encrypted on disk. Each data set could have a unique encryption key. Users on the system need permission to read the data set, and need access to the encrypt key.

If a userid is permitted to the data set, and to the encryption key, the userid has access to the data and can read and write it, the same was as if the data set was not encrypted.

Once you have set up the definitions, they are used when the data set is created. To encrypt a data set, you can…

  • Create a new (encrypted) dataset
  • Copy the old to the new.
  • Delete the old, and rename the new to old.

If you delete the key, then the data is not accessible unless you have a backup of the key – or you have a copy of the key on another system.

This encryption does not apply to files in Unix System Services, because these are not RACF protected.

MQ 9.2 and later supports encryption, including for page sets and log data sets. See here. DB2 can use data set encryption for its page sets and logs, see here.

Topics

Implementation

You create an encryption key using the ICSF component on z/OS.

ISPF interface

If you are using the ICSF ISPF interface use options use : 5 Utility, 5 CKDS Keys, 7 Generate AES DATA keys. In the field Enter the CKDS record label for the new AES DATA key enter a memorable name. In the Red book, it uses a prefix of DATASET.name , I used COLINAES.

In AES key bit length: select 256 – other values give errors.

Batch interface

Use the operator command d icsf,kds to display the current datasets being used by ICSF. It gave me CSF.CSFCKDS.NEW .

The JCL below deletes the key, and creates a new key. It then refreshes the in memory data. (Once you delete the key, any data sets which used it cannot be read).

//IBMICSF  JOB 1,MSGCLASS=H 
//STEP10 EXEC PGM=CSFKGUP 
//  SET CKDS=CSF.CSFCKDS.NEW 
//CSFCKDS DD DISP=OLD,DSN=&CKDS 
//* LENGTH(32) GENERATES A 256 BIT KEY 
//CSFIN DD *,LRECL=80 
DELETE TYPE(DATA) LABEL(COLINBATCHAES ) 
ADD TYPE(DATA) ALGORITHM(AES), 
LABEL(COLINBATCHAES          ) LENGTH(32) 
/* 
//CSFDIAG DD SYSOUT=*,LRECL=133 
//CSFKEYS DD SYSOUT=*,LRECL=1044 
//CSFSTMNT DD SYSOUT=*,LRECL=80 
//* Refresh the in memory data
//REFRESH  EXEC PGM=CSFEUTIL,PARM='&CKDS,REFRESH' 

This gave

CSFG0321 STATEMENT SUCCESSFULLY PROCESSED.
CSFG0780 A REFRESH OF THE IN-STORAGE CKDS IS NECESSARY TO ACTIVATE CHANGES MADE BY KGUP.

and the refresh gave

CSFU002I CSFEUTIL COMPLETED, RETURN CODE = 0, REASON CODE = 0

Security profiles

The encryption information is used when the data set is created. This can be specified in JCL, VSAM DEFINE, or in the DFP extension of a dataset RACF profile.

Create and use the encryption key profiles

Use batch TSO. The statements below:

  • Uses SET to define the variable, as it is used in several places
  • Delete the old profile (there is no define replace)
  • Create the profile
  • Give userid IBMUSER read access to the profile
  • Refreshes the RACLIST information
  • Alters the data sets profile to set the DFP segment to use the key just defined
//IBMRACF2 JOB 1,MSGCLASS=H 
// EXPORT SYMLIST=*
// SET KEY=COLINAES
//S1 EXEC PGM=IKJEFT01,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *,SYMBOLS=JCLONLY
RDELETE CSFKEYS &KEY
RDEFINE CSFKEYS &KEY +
ICSF(SYMCPACFWRAP(YES) SYMCPACFRET(YES)) +
UACC(NONE)
PERMIT &KEY +
CLASS(CSFKEYS) ID(IBMUSER ) +
ACCESS(READ)
SETROPTS RACLIST(CSFKEYS) REFRESH

RLIST CSFKEYS &KEY AUTHUSER ICSF


ALTDSD 'COLIN.ENCR.*' UACC(NONE) +
DFP(DATAKEY(&KEY))

/*
//* LISTCAT ENTRIES('COLIN.ENCR.DSN') ALL

This encryption information is only used when a data is created.

If you use LISTCAT, it will show old information, until the data set is recreated.

More set up

When I tried creating a data set with the encryption label I got

IEF344I IBMRACF2 S3 DD2 - ALLOCATION FAILED DUE TO DATA FACILITY SYSTEM ERROR           
IGD17157I DSNTYPE BASIC IS NOT A SUPPORTED DATA SET TYPE FOR ENCRYPTION
BECAUSE STGADMIN.SMS.ALLOW.DATASET.SEQ.ENCRYPT IS NOT DEFINED
IGD17151I ALLOCATION FAILED FOR DATA SET
COLIN.ENCR.DSN BECAUSE A KEY LABEL IS
SPECIFIED FOR AN UNSUPPORTED DATA SET TYPE.

See Specifying a key label for a non-extended format data set.

You can either use an SMS DC with Extended Format specified, or define the RACF resource

STGADMIN.SMS.ALLOW.DATASET.SEQ.ENCRYPT.

TSO RDEFINE FACILITY STGADMIN.SMS.ALLOW.DATASET.SEQ.ENCRYPT
TSO SETR RACLIST(FACILITY) REFRESH

Use the definitions with a dataset

You can specify the encryption key reference in

  • JCL using DSKEYLBL
  • Via a RACF data set profile and the DFP extension
  • DEFINE IDCAMS, with KEYLABEL(MYLABEL)
  • SMS definitions

If there is no DFP segment to the RACF profile you can use

//SYSUT2 DD   DSN=COLIN.ENCR.DSN,SPACE=(CYL,(1,1)), 
//       DSKEYLBL=COLINBATCHAES, 
//       DISP=(MOD,CATLG), 
//       DCB=(RECFM=FB,LRECL=80,BLKSIZE=800) 

In the JCL output it has

IGD17150I DATA SET COLIN.ENCR.DSN IS ELIGIBLE FOR ACCESS METHOD ENCRYPTION. KEY LABEL IS (COLINBATCHAES)

LISTCAT output gave

LISTCAT ENTRIES('COLIN.ENCR.DSN') ALL                           
NONVSAM ------- COLIN.ENCR.DSN                                  
     IN-CAT --- A4USR1.ICFCAT                                   
     HISTORY                                                    
       ...  
     SMSDATA                                                    
      ... 
     ENCRYPTIONDATA                                             
       DATA SET ENCRYPTION----(YES)
       DATA SET KEY LABEL-----COLINBATCHAES                              

Doing interesting things with encrypted data sets

You can use DFDSS to copy the encrypted dataset, without decrypting it. Any encryption parameters are copied to the new data set.

You need access to the CSFKEYS profile.

The JCL below

  • Deletes the old data set
  • Copies from COLIN.ENCR.DSN creating the output renaming COLIN to ADCD
  • List the catalog for the output data set
//IBMDFDSS JOB 1,MSGCLASS=H                                       
//S1 EXEC PGM=IEFBR14,REGION=0M                                  
//SYSPRINT DD SYSOUT=*                                            
//DDOLD DD DSN=ADCD.ENCR.DSN,SPACE=(CYL,(1,1)),DISP=(MOD,DELETE) 
//* 
//S1  EXEC PGM=ADRDSSU,REGION=0M PARM='TYPRUN=NORUN'              
//SYSPRINT DD SYSOUT=*                                            
//SYSIN DD *                                                      
 COPY  -                                                          
    DATASET(INCLUDE(COLIN.ENCR.DSN))       -               
    REPLACE  -                                                    
    RENUNC(ADCD )                                                 
/*                                                                
//S1  EXEC PGM=IKJEFT01,REGION=0M                                 
//SYSPRINT DD SYSOUT=*                                            
//SYSTSPRT DD SYSOUT=*                                            
//SYSTSIN DD *                                                    
LISTCAT ENTRIES('ADCD.ENCR.DSN') ALL                              
/*                                                                

The userid(COLIN) that ran this job had permission to read the data set, and has access to the key.

The output data set has

ENCRYPTIONDATA                      
  DATA SET ENCRYPTION----(YES)      
  DATA SET KEY LABEL-----COLINBATCHAES   

The data set has been copied encrypted with the same key as the original data set.

You can print the encrypted data in the file using DFDSS (ADRDSSU) PRINT DATASET(..) command.

Other questions

Does the size of the data set change?

It looks like the encrypted dataset is the same size as the unencrypted data set.

What happens if I delete and recreate the encryption key?

DFDDSS COPY worked (RC 0) – but gave a message IEC143I 213-91 which means the label points to a different key.

If I try to read the copied data set, I get the same message. The data set can be copied, but cannot be decrypted.

The key has been thrown away, and the contents are unreadable unless you have a backup of the key, or have a copy of the key on another system.

One minute MVS: DLLs and load modules

This is one of a series of short blog posts on z/OS topics. This post is about using Dynamic Link Library. It follows on from One minute MVS: Binder and loader which explains some of the concepts used.

Self contained load modules are good but…

An application may have a main program and use external functions bound to the main program. This is fine for the simple case. In more complex environments where a program loads external modules this may be less effective.

If MOD1 and MOD2 are modules and they both have common subroutines MYSUBS, then when MOD1 and MOD2 are loaded, they each have a copy of MYSUBS. This can lead to inefficient use of storage (and not as fast as one copy which every thing uses).

You may not be responsible for MYSUBS, you just use the code. If they are bound to your load module, then in order to use a later version you will have to rebind your load modules. This is not very usable.

Stubs are better

There are several techniques to get round these problems.

In your program you can use a stub module. This sits between your application and the code which does all of the work.

  • For example for many of the z/OS provided facilities, the stub modules chain through z/OS control blocks to find the address of the routine to use.
  • You can have the subroutine code in a self-contained load module. Your stub can load the module then branch to it. This way you can use the latest available version of code. For example MQCONN loads a module, other MQ verbs use the module, MQDISC releases the load module.

If the stub loads a module, then other threads using the stub just increment the in-use count and do not have to reload the module (from disk). This means you do not have multiple copies of the load module in memory.

The Dynamic Link Library support does this load of the module for you, is it a build option rather than change your code.

Creating a DLL Object

When you compile the C source you need to have options DLL and EXPORTALL. (The documentation says it is better to use #PRAGMA EXPORT(name) for each entry, rather than use the EXPORTALL option – but this means making a small change to your program).

When you bind the module you need parameter DYNAM=DLL, and specify //BIND.SYSDEFSD DD… pointing to a member, for example COLIN.SIDE(CERT)

The member will have information (known as a side deck) with information about the functions your program has exposed. For example

 IMPORT DATA,'CERT','ascii_tab' 
 IMPORT DATA,'CERT','other_tab'  
 IMPORT CODE,'CERT','isbase64'                       
 IMPORT CODE,'CERT','printHex'                    
 IMPORT CODE,'CERT','UnBase64'                    f

Where

  • IMPORT …. ‘CERT’ says these are for load module CERT
  • IMPORT DATA – these labels are data constants
  • IMPORT CODE – these are functions within CERT

If this was a 64 bit program, it would have IMPORT CODE64, and IMPORT DATA64.

If you compile in Unix Services you get a side-deck file like /u/tmp/console/cert.x containing

IMPORT DATA64,'cert.so','ascii_tab'                                         IMPORT CODE64,'cert.so','isbase64'
...

and the load module /u/tmp/console/cert.so.

The .x file is only used by the binder. The .so file is loaded and used by programs.

When your program tries to use one of these functions, for example isbase64, module CERT(cert.so) is loaded and used. This means you need the library containing this module to be available to the job.

Binding with the DLL

Instead of binding your program with the CERT object. You bind it with the side deck and use the equivalent to INCLUDE COLIN.SIDE(CERT).
The binder associates the external reference isbase64 with the code in load module CERT.

Conceptually the isbase64 reference, points to some stub code which does

  • Load module CERT
  • Find entry point isbase64 within this module
  • Replace the address of the isbase64 external reference, so it now points to the real isbase64 entry point in module CERT.
  • Branch to the real isbase64 code.

The next time isbase64 is used – it goes directly to the isbase64 function in the CERT module.

Using DLLs

You can use DLLs in Unix, or normal address spaces.

For a program running in Unix services you can use the C run time functions

  • dlopen(modules_name)
  • dlclose(dlHandle) release the module
  • dlsym(dlHandle,function_name|variable_name) to get information about a function name or variable
  • dlerror() to get diagnostic information about the last dynamic link error

One minute MVS: Binder and loader

This topic is in the series of “One minute MVS” giving the essentials of a topic.

Your program

The use of functions or subroutines are very common in programming. For a simple call

x = mysub()

which calls an external function mysub has generated code like

MYPROG  CSECT     
     L 15,mysub the function 
     LA 1,PARMLIB
     BASR  14,15 or BALR in older programs 
...
mysub  DC  V(MYSUB)

where

  • MYPROG is an entry point to the program
  • the mysub variable defines some storage for the external reference(ER) to MYSUB.

The output of the assembler or compiler is a file or dataset member, known as an an “object deck” or “object file”. It cannot be executed, as it does not have the external functions or subroutines.

The binder (or linkage editor)

The binder program takes object decks, includes any subroutines and external code and creates a load module or program object.

In early days load modules were stored in PDS datasets. In the directory of a member was information about the size of the load module, and the entry point. As the binder got more sophisticated, the directory did not have enough space for all of the data that was created. As a result PDSE (Extended PDSs) were created, which have an extendable directory entry. For files in Unix Services Load modules are stored in the the Unix file system.

The term Program Object is used to cover load modules and files in the Unix file system. I still think of them both as Load Modules.

The binder takes the parts needed to create the program object, for example functions you created and are stored in a PDS or Unix, and includes code, for example for the prinf() function. These are merged into one file.

Pictorially the merged files look like

  • Offset 0 some C code.
  • Offset 200 MYPROG Object
    • Offset 10 within MYPROG, MYPROG entry point (so offset 210 from the start of the merged files)
    • Offset 200 within MYPROG, mysub:V(MYSUB)
    • Offset 310 within MYPROG end of MYPROG
  • Offset 512 FUNCTION1 object
  • Offset 800 MYSUB1 Object
    • Offset 28 within MYSUB1, MYSUB entry point
    • Offset 320 within MYSUB1, end of MYSUB

The binder can now resolve references. It knows that MYSUB entry point is at offset 28 within MYSUB1 object, and MYSUB1 Object is 800 from the start of the combined files. It can now replace the mysub:V(MYSUB) in MYPROG with the offset value 828.

The entire files is stored as a load module(program object) as one object, with a name that you give it, for example COLIN.

The loader

When load module COLIN is loaded. The loader loads the load module from disk into memory. For example at address 200,000. As part of the loading, it looks at the external references and calculates the address in memory from the offset value. So 200,000 + offset 828 is address 200828. This value is stored in the mysub variable.

When the function is about to be called via L 15,mysub, register 15 has the address of the code in memory and the program can branch to execute the code.

It gets more complex than this

Consider two source programs

int value = 0;
int total = 0;
void main()
{
  value =1;
  total = total + value; 
  printTotal();
  
}
int total;
int done;
void printotal()
{
  printf("Total = %d\n",total);
  done = 1; 
}

There are some global static variables. The variable “total” is used in each one – it is the same variable.

These programs are defined as being re-entrant, and could be loaded into read only storage.

The variables “value” and “total”, cannot go into read only storage as they change during the execution of the program.

There are three global variables: “value”, “total” and “done”; total is common to both programs.

These variables go into a storage area called Writeable Static Area (WSA).

If there are multiple threads running the program, each gets its own copy of the WSA, but they can all shared instructions.

A program can also have 31 bit resident code, and 64 bit resident code. The binder takes all of these parts and creates “classes” of data

  • The WSA class. This contains the merged list of static variables.
  • 64-bit re-entrant code – class. It takes the 64-bit resident code from all of the programs, and included subroutines and creates a “64-bit re-entrant” blob.
  • 31- bit re-entrant code -class. It takes the 31-bit resident code from all of the programs, and included subroutines and creates a “31-bit re-entrant” blob.
  • 64-bit data – class, from all objects
  • 31-bit data – class, from all objects

When the loader loads the modules

  • It creates a new copy of the WSA for each thread
  • It loads the 64 bit re-entrant code (or reuses any existing copy of the code) into 64 bit storage
  • It loads the 31 bit re-entrant code (or reuses any existing copy of the code) into 31 bit storage.

How can I see what is in the load module?

If you look at the output from the binder you get output which includes content like

CLASS  B_TEXT            LENGTH =      4F4  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  C_DATA64          LENGTH =        0  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  C_CODE64          LENGTH =     1A38  ATTRIBUTES = CAT,   LOAD, RMODE= 64 
CLASS  C_@@QPPA2         LENGTH =        8  ATTRIBUTES = MRG,   LOAD, RMODE= 64 
CLASS  C_CDA             LENGTH =     3B50  ATTRIBUTES = MRG,   LOAD, RMODE= 64 
CLASS  B_LIT             LENGTH =      140  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  B_IMPEXP          LENGTH =      A6B  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  C_WSA64           LENGTH =      6B8  ATTRIBUTES = MRG, DEFER , RMODE= 64 
CLASS  C_COPTIONS        LENGTH =      304  ATTRIBUTES = CAT, NOLOAD 
CLASS  B_PRV             LENGTH =        0  ATTRIBUTES = MRG, NOLOAD 

Where

  • B_TEXT is from HLASM (assembler program). Any sections are conCATenated together (Attributes =CAT)
  • C_WSA64 is the 64 bit WSA. Any data in these sections have been MeRGed (see the “total” variable above) (Attributes = MRG)
  • C_OPTIONS contains the list of C options used at compile time. The loader ignores this section (NOLOAD), but it is available for advanced programs such as debuggers to extract this information from the load module.

To introduce even more complexity. You can have class segments. These are an advanced topic where you want groups of classes to be independently loaded. Most people use the default of 1 segment.

Layout of the load module

Class layout

You can see the layout of the classes in the segment.

  • Class B_TEXT starts at offset 0 and is length 4F4.
  • Class C_CODE64 is offset 4F8 (4F4 rounded up to the nearest doubleword) and of length 1A38.
CLASS  B_TEXT            LENGTH =      4F4  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
                         OFFSET =        0 IN SEGMENT 001     ALIGN = DBLWORD 
  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
CLASS  C_CODE64          LENGTH =     1A38  ATTRIBUTES = CAT,   LOAD, RMODE= 64 
                         OFFSET =      4F8 IN SEGMENT 001     ALIGN = DBLWORD ff

Within each class

CLASS  C_CODE64          LENGTH =     1A38  ATTRIBUTES = CAT,   LOAD, RMODE= 64 
                         OFFSET =      4F8 IN SEGMENT 001     ALIGN = DBLWORD 
--------------- 
                                                                                                  
 SECTION    CLASS                                      ------- SOURCE -------- 
  OFFSET   OFFSET  NAME                TYPE    LENGTH  DDNAME   SEQ  MEMBER 
                                                                                                  
                0  $PRIV000010        CSECT      1A38  /0000001  01 
       0        0     $PRIV000011        LABEL 
      B8       B8     PyInit_zconsole    LABEL 
     8B8      8B8     or_bit             LABEL 
     C30      C30     cthread            LABEL 
    1158     1158     cleanup            LABEL 
    11B8     11B8     printHex           LABEL 
  • The label $PRIV000010 CSECT is generated because I did not have a #pragma CSECT(CODE,”….”) statement in my source. If you use #pragma CSECT(STATIC,”….”) you get the name in the CLASS C_WSA64 section (see the following section)
  • The C function “or_bit” is at offset B8 in the class C_CODE64.

The static area

For the example below, the C module had #pragma CSECT(STATIC,”SZCONSOLE”)

CLASS  C_WSA64           LENGTH =      6B8  ATTRIBUTES = MRG, DEFER , RMODE= 64 
                         OFFSET =        0 IN SEGMENT 002     ALIGN = QDWORD 
--------------- 
                                                                                     
            CLASS 
           OFFSET  NAME                TYPE    LENGTH   SECTION 
                0  $PRIV000012      PART            10 
               10  SZCONSOLE        PART           5A0  ZCONSOLE 
              5B0  ascii_tab        PART           100  ascii_tab 
              6B0  gstate           PART             4  gstate 

There are two global static variables, common to all routines, ascii_tab, and gstate. They each have an entry defined in the class.

All of the static variables internal to routines are in the SZCONSOLE section. They do not have a explicit name because they are internal.

One minute SMS

For many people SMS is something on z/OS that gets in the way of doing real work, and you find you are trying to work around it. A bit like Clippy on Windows.

SMS is System Managed Storage. Before SMS, the systems programmer would be responsible for which data sets go on which volume; how often to backup these data sets, and get annoyed when the z/OS users allocated data sets inefficiently, wasting space and using the wrong data set attributes.

SMS has made the systems programmers job much easier. Instead of having a list of disk volumes which can be used by developers, and a list of volumes reserved for systems people, then telling people which disk to use; SMS allows you to say group these volumes as SYSPROGS, and these volumes as OTHER, and when you allocate a data set the system says “COLIN.*” maps to OTHER; SYS1.* map to SYSPROGS.

You can disks have which are not SMS managed, but these days, most of the disks are SMS managed.

You can also specify rules such as all SYS1.*.PARMLIB get backed up daily – and keep the last 10 copies; and do not backup these temporary files.

The system can also do house keeping and say if these user data sets have not been used for a long period, backup them and move them to slower volumes or tape.

The operator commands for SMS are here.

SMS topics

SMS has different “topics”.

A data class is used when creating a data set. It influences the space and data set attributes. You could set up a data class of USERFB80 for user data sets which are Fixed block 80 records. This data class can specify optimum block size to be used, and override any value the user has specified.

A management class provides information about how often it is to be backed up, how long it is to be kept for before automatic deletion, and if it can be compressed or moved out to tape if it has not been used for a period.

A Storage class provides information about which disk volumes to use. You may have some high performing disks, which you want databases to use, and some old spinning disks, for the end users to use.

A storage group is where data sets get created. For example storage group Pool SGCICS has volumes CICS00 through to CICS1F, storage group SGMQ has volumes SGMQS – MQ0000 through MQ004, and OLD001.

You can define a storage group which uses Virtual IO (paging), instead of disk. This is Storage Group VIO.

The Automatic Class Selection (ACS) is the magic which says “my datasets get fast disks”; “your data sets get slow disks, and the data sets are migrated out to tape if they haven’t been used for a week”. At a conceptual level it is like

if DSN in “SYS1.**” then STORCLAS = “SGSYS1”
if SIZE > 4052MB then DATACLAS= BIGFILE
Select
When(STORCLAS=”IMS”) then STORGRP = “SGIMS”
When(STORCLAS=”MQ”) then STORGRP=”SGMQ”
end

When are these visible?

I compiled a C program job. In the JCL output it had

IGD101I SMS ALLOCATED TO DDNAME (SYSLIN )
DSN (SYS22013.T150656.RA000.COLINC3.LOADSET.H01 )
STORCLAS (SCBASE) MGMTCLAS ( ) DATACLAS ( )
VOL SER NOS= VIO

So we can see the storage class used (SGBASE), and the data class, and management class (both not specified).

I’ve run out of space – but there is plenty of space.

I got the following message, and struggled to resolve the problem.

IGD17272I VOLUME SELECTION HAS FAILED FOR INSUFFICIENT SPACE FOR
DATA SET SYS21355.T163955.RA000.COLINMQ.TEMP.H01
JOBNAME (COLINMQ ) STEPNAME (S1 )
PROGNAME (AMATERSE) DDNAME (SYSUT2 )
REQUESTED SPACE QUANTITY = 415019 KB
STORCLAS (SCBASE) MGMTCLAS ( ) DATACLAS ( )
STORGRPS (SGBASE SGEXTEAV SGVIO )

SMS is clearly involved – but how to display more information? You can use ISPF ISMF panels to display information, but sometimes it is easier to use the DISPLAY SMS operator command. For example

What is the status of a volume

d sms,vol(C4USR1)
IGD002I DISPLAY SMS
VOLUME UNIT MVS   SYSTEM= 1   STORGRP NAME
C4USR1 0A9B ONRW          +   SGBASE
* LEGEND *
. THE STORAGE GROUP OR VOLUME IS NOT DEFINED TO THE SYSTEM
+ THE STORAGE GROUP OR VOLUME IS ENABLED
* - THE STORAGE GROUP OR VOLUME IS DISABLED
D THE STORAGE GROUP OR VOLUME IS QUIESCED
Q THE STORAGE GROUP OR VOLUME IS DISABLED FOR NEW ALLOCATIONS ONLY

Where the + in the line with the volume is described in the following lines. So the C4USR1 means the volume is enabled for SMS.

ONRW is ONline ReadWrite

What storage groups are defined?

D SMS,STORGRP(ALL)
D SMS,SG(SGBASE)

Gave

SGBASE POOL +
SPACE INFORMATION:
TOTAL SPACE = 16240MB USAGE% = 91 ALERT% = 0
TRACK-MANAGED SPACE = 16240MB USAGE% = 91 ALERT% = 0

if you want more information

D SMS,SG(SGBASE),LISTVOL

Gave

STORGRP TYPE SYSTEM= 1
SGBASE POOL +
SPACE INFORMATION:
TOTAL SPACE = 16240MB USAGE% = 91 ALERT% = 0
TRACK-MANAGED SPACE = 16240MB USAGE% = 91 ALERT% = 0 

VOLUME UNIT MVS SYSTEM= 1 STORGRP NAME
C4USR1 0A9B ONRW        + SGBASE
USER00 0A9C ONRW        + SGBASE
USER01                  + SGBASE
USER0A                  + SGBASE
...

There were lots of volumes defined … but only two were online and available.

Using ISMF with Storage groups

If you use

  • ISMF option 6 (Storage group)
  • Storage Group Name *
  • Option 1 (List)

This gave me

 LINE       STORGRP  SG                VIO      VIO  
 OPERATOR   NAME     TYPE              MAXSIZE  UNIT 
---(1)----  --(2)--- -------(3)------  --(4)--  (5)- 
            SGDB2    POOL              -------  ---- 
            SGEXTEAV POOL              -------  ---- 
            SGIMS    POOL              -------  ---- 
            SGMQS    POOL              -------  ---- 
            SGVIO    VIO               2000000  3390 
----------  -------- --------  ----------  ----------

We can see that the Storage Group SGVIO is of type VIO. The maximum dataset size that can be allocated in this is Storage Group. 2000000 is 2,000,000 KB.

SMS Enable/disable volumes

You can disable a volume from SMS using

V SMS,VOL(C4USR1),DISABLE

The command
D SMS,VOL(C4USR1)

now gives

VOLUME UNIT MVS    SYSTEM= 1 STORGRP NAME
C4USR1 0A9B ONRW           - SGBASE
+ THE STORAGE GROUP OR VOLUME IS ENABLED
- THE STORAGE GROUP OR VOLUME IS DISABLED

The JCL below will allocate a (temporary) data set on volume C4CFG1, and then delete it.

//IBMUSER1 JOB 1,MSGCLASS=H
//S1 EXEC PGM=IEFBR14
//DD1 DD DSN=&TEMP,DISP=(NEW,PASS),
// SPACE=(CYL,(1,1)),DCB=(LRECL=80,RECFM=FB),
// VOL=SER=(C4CFG1),
// STORCLAS=SCNOSMS
//S2 EXEC PGM=IEFBR14
//DD1 DD DSN=*.S1.DD1,DISP=(OLD,DELETE)

If I removed the STORCLAS=SCNOSMS, the data set was allocated on a different volume C4USR1 with STORCLAS (SCBASE).

Where are the definitions stored?
On my zD&T system, the ACS routines etc are defined in SYS1.S0W1.DFSMS.CNTL. I do not think SMS uses this directly. When creating definitions you may point the ISMF panels to members in this data set.

Documentation

One Minute MVS – tuning stack and heap pools

These days many applications use a stack and heap to manage storage used by an application. For C and Cobol programs on z/OS these use the C run time facilities. As Java uses the C run time facilities, it also uses the stack and heap.

If the stack and heap are not configured appropriately it can lead to an increase in CPU. With the introduction of 64 bit storage, tuning the heap pools and stack is no longer critical. You used to have to carefully manage the stack and heap pool sizes so you didn’t run out of storage.

The 5 second information on what to check, is the number of segments freed for the stack and heap should be zero. If the value is large then a lot of CPU is being used to manage the storage.

The topics are

Kinder garden background to stack.

When a C (main) program starts, it needs storage for the variables uses in the program. For example

int i;
for (ii=0;ii<3:ii++)
{}

char * p = malloc(1024);

The variables ii and p are variables within the function, and will be on the functions stack. p is a pointer.

The block of storage from the malloc(1024) will be obtained from the heap, and its address stored in p.

When the main program calls a function the function needs storage for the variables it uses. This can be done in several ways

  1. Each function uses a z/OS GETMAIN request on entry, to allocate storage, and a z/OS FREEMAIN request on exit. These storage requests are expensive.
  2. The main program has a block of storage which functions can use. For the main program uses bytes 0 to 1500 of this block, and the first function needs 500 bytes, so uses bytes 1501 to 2000. If this function calls another function, the lower level function uses storage from 2001 on wards. This is what usually happens, it is very efficient, and is known as a “stack”.

Intermediate level for stack

It starts to get interesting when initial block of storage allocated in the main program is not big enough.

There are several approaches to take when this occurs

  1. Each function does a storage GETMAIN on entry, and FREEMAIN on exit. This is expensive.
  2. Allocate another big block of storage, so successive functions now use this block, just like in the kinder garden case. When functions return to the one that caused a new block to be allocated,
    1. this new block is freed. This is not as expensive as the previous case.
    2. this block is retained, and stored for future requests. This is the cheapest case. However a large block has been allocated, and may never be used again.

How big a block should it allocate?

When using a stack, the size of the block to allocate is the larger of the user specified size, and the size required for the function. If the specified secondary size was 16KB, and a function needs 20KB of storage, then it will allocate at least 20KB.

How do I get the statistics?

For your C programs you can specify options in the #PRAGMA statement or, the easier way, is to specify it through JCL. You specify C run time options through //CEEOPTS … For example

//CEEOPTS DD *
STACK(2K,12K,ANYWHERE,FREE,2K,2K)
RPTSTG(ON)

Where

  • STACK(…) is the size of the stack
  • RPTSTG(ON) says collect and display statistics.

There is a small overhead in collecting the data.

The output is like:

STACK statistics:                                                
  Initial size:                                2048     
  Increment size:                             12288     
  Maximum used by all concurrent threads:  16218808     
  Largest used by any thread:              16218808     
  Number of segments allocated:                2004     
  Number of segments freed:                    2002     

Interpreting the stack statistics

From the above data

  • This shows the initial stack size was 2KB and an increment of 12KB.
  • The stack was extended 2004 times.
  • Because the statement had STACK(2K,12K,ANYWHERE,FREE,2K,2K), when the secondary extension became free it was FREEMAINed back to z/OS.

When KEEP was used instead of FREE, the storage was not returned back to z/OS.

The statistics looked like

STACK statistics:                                                
  Initial size:                               2048     
  Increment size:                            12288     
  Maximum used by all concurrent thread:  16218808     
  Largest used by any thread:             16218808     
  Number of segments allocated:               1003     
  Number of segments freed:                      0     

What to check for and what to set

For most systems, the key setting is KEEP, so that freed blocks are not released. You can see this a) from the definition b) Number of segments freed is 0.

If a request to allocate a new segment fails, then the C run time can try releasing segments that are not in use. If this happens the “”segments freed” will be incremented.

Check that the “segments freed” is zero, and if not, investigate why not.

When a program is running for a long time, a small number of “segments allocated” is not a problem.

Make the initial size larger, closer to the “Largest used of any thread” may improve the storage utilisation. With smaller segments there is likely to be unused space, which was too small for a functions request, causing the next segment to be used. So a better definition would be

STACK(16M,12K,ANYWHERE,KEEP,2K,2K)

Which gave

STACK statistics:                                                          
  Initial size:                                     16777216               
  Increment size:                                      12288               
  Maximum used by all concurrent threads:           16193752               
  Largest used by any thread:                       16193752               
  Number of segments allocated:                            1               
  Number of segments freed:                                0               

Which shows that just one segment was allocated.


Kinder garden background to heap

When there is a malloc() request in C, or a new … in Java, the storage may exist outside of the function. The storage is obtained from the heap.

The heap has blocks of storage which can be reused. The blocks may all be of the same size, or or different sizes. It uses CPU time to scan free blocks looking for the best one to reuse. With more blocks it can use increasing amounts of CPU.

There are heap pools which avoids the costs of searching for the “right” block. It uses a pools of blocks. For example:

  1. there is a heap pool with 1KB fixed size blocks
  2. there is another heap pool with 16KB blocks
  3. there is another heap pool with 256 KB blocks.

If there is a malloc request for 600 bytes, a block will be taken from the 1KB heap pool.

If there is a malloc request for 32KB, a block would be used from the 256KB pool.

If there is a malloc request for 512KB, it will issue a GETMAIN request.

Intermediate level for heap

If there is a request for a block of heap storage, and there is no free storage, a large segment of storage can be obtained, and divided up into blocks for the stack. If the heap has 1KB blocks, and a request for another block fails, it may issue a GETMAIN request for 100 * 1KB and then add 100 blocks of 1KB to the heap. As storage is freed, the blocks are added to the free list in the heap pool.

There is the same logic as for the stack, about returning storage.

  1. If KEEP is specified, then any storage that is released, stays in the thread pool. This is the cheapest solution.
  2. If FREE is specified, then when all the blocks in an additional segment have been freed, then free the segment back to the z/OS. This is more expensive than KEEP, as you may get frequent GETMAIN and FREEMAIN requests.

How many heap pools do I need and of what size blocks?

There is usually a range of block sizes used in a heap. The C run time supports up to 12 cell sizes. Using a Liberty Web server, there was a range of storage requests, from under 8 bytes to 64KB.

With most requests there will frequently be space wasted. If you want a block which is 16 bytes long, but the pool with the smallest block size is 1KB – most of the storage is wasted.
The C run time gives you suggestions on the configuration of the heap pools, the initial size of the pool and the size of the blocks in the pool.

Defining a heap pool

How to define a heap pool is defined here.

You specify the size of overall size of storage in the heap using the HEAP statement. For example for a 16MB total heap size.

HEAP(16M,32768,ANYWHERE,FREE,8192,4096)

You then specify the pool sizes


HEAPPOOL(ON,32,1,64,2,128,4,256,1,1024,7,4096,1,0)

The figures in bold are the size of the blocks in the pool.

  • 32,1 says maximum size of blocks in the pool is 32 bytes, allocate 1% of the heap size to this pool
  • 64,2 says maximum size of blocks in the pool is 64 bytes, allocate 2% of the heap size to this pool
  • 128,4 says maximum size of blocks in the pool is 128 bytes, allocate 4% of the heap size to this pool
  • 256,1 says maximum size of blocks in the pool is 256 bytes, allocate 1% of the heap size to this pool
  • 1024,7 says maximum size of blocks in the pool is 1024 bytes, allocate 7% of the heap size to this pool
  • 4096,1 says maximum size of blocks in the pool is 4096 bytes, allocate 1% of the heap size to this pool
  • 0 says end of definition.

Note, the percentages do not have to add up to 100%.

For example, with the CEEOPTS

HEAP(16M,32768,ANYWHERE,FREE,8192,4096)
HEAPPOOLS(ON,32,50,64,1,128,1,256,1,1024,7,4096,1,0)

After running my application, the data in //SYSOUT is


HEAPPOOLS Summary:                                                         
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In 
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use      
  ------------------------------------------------------------------------ 
       32        40    50      209715           0           0           0 
       64        72      1        2330           1        1002           2 
      128       136      1        1233           0           0           0 
      256       264      1         635           0           0           0 
     1024      1032      7        1137           1           2           0 
     4096      4104      1          40           1           1           1 
  ------------------------------------------------------------------------ 

For the cell size of 32, 50% of the pool was allocated to it,

Each block has a header, and the total size of the 32 byte block is 40 bytes. The number of 40 bytes units in 50% of 16 MB is 8MB/40 = 209715, so these figures match up.

(Note with 64 bit heap pools, you just specify the absolute number you want – not a percentage of anything).

Within the program there was a loop doing malloc(50). This uses cell pool with size 64 bytes. 1002 blocks(cells) were used.

The output also has

Suggested Percentages for current Cell Sizes:
HEAPP(ON,32,1,64,1,128,1,256,1,1024,1,4096,1,0)


Suggested Cell Sizes:
HEAPP(ON,56,,280,,848,,2080,,4096,,0)

I found this confusing and not well documented. It is another of the topics that once you understand it it make sense.

Suggested Percentages for current Cell Sizes

The first “suggested… ” values are the suggestions for the size of the pools if you do not change the size of the cells.

I had specified 50% for the 32 byte cell pool. As this cell pool was not used ( 0 allocated cells) then it suggests making this as 1%, so the suggestion is HEAPP(ON,32,1

You could cut and paste this into you //CEEOPTS statement.

Suggested Cell Sizes

The C run times has a profile of all the sizes of blocks used, and has suggested some better cell sizes. For example as I had no requests for storage less than 32 bytes, making it bigger makes sense. For optimum storage usage, it suggests of using sizes of 56, 280,848,2080,4096 bytes.

Note it does not give suggested number of blocks. I think this is poor design. Because it knows the profile it could have an attempt at specifying the numbers.

If you want to try this definition, you need to add some values such as

HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

Then rerun your program, and see what percentage figures it recommends, update the figures, and test again. Not the easiest way of working.

What to check for and what to set

There can be two heap pools. One for 64 bit storage ( HEAPPOOL64) the other for 31 bit storage (HEAPPOOL).

The default configuration should be “KEEP”, so any storage obtained is kept and not freed. This saves the cost of expensive GETMAINS and FREEMAINs.

If the address space is constrained for storage, the C run time can go round each heap pool and free up segments which are in use.

The value “Number of segments freed” for each heap should be 0. If not, find out why (has the pool been specified incorrectly, or was there a storage shortage).

You can specify how big each pool is

  • for HEAPPOOL the HEAP size, and the percentage to be allocated to each pool – so two numbers to change
  • for HEAPPOOL64 you specify the size of each pool directly.

The sizes you specify are not that sensitive, as the pools will grow to meet the demand. Allocating one large block is cheaper that allocating 50 smaller blocks – but for a server, this different can be ignored.

With a 4MB heap specified

HEAP(4M,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

the heap report was

 HEAPPOOLS Summary: 
   Specified Element   Extent   Cells Per  Extents    Maximum      Cells In 
   Cell Size Size      Percent  Extent     Allocated  Cells Used   Use 
   ------------------------------------------------------------------------ 
        56        64      1         655           2        1002           2 
       280       288      1         145           1           1           0 
       848       856      1          48           1           1           0 
      2080      2088      1          20           1           1           1 
      4096      4104      1          10           0           0           0 
   ------------------------------------------------------------------------ 
   Suggested Percentages for current Cell Sizes: 
     HEAPP(ON,56,2,280,1,848,1,2080,1,4096,1,0) 

With a small(16KB) heap specified

HEAP(16K,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

The output was

HEAPPOOLS Summary:                                                            
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In    
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use         
  ------------------------------------------------------------------------    
       56        64      1           4         251        1002           2    
      280       288      1           4           1           1           0    
      848       856      1           4           1           1           0    
     2080      2088      1           4           1           1           1    
     4096      4104      1           4           0           0           0    
  ------------------------------------------------------------------------    
  Suggested Percentages for current Cell Sizes:                               
    HEAPP(ON,56,90,280,2,848,6,2080,13,4096,1,0)                             

and we can see it had to allocate 251 extents for all the request.

Once the system has “warmed up” there should not be a major difference in performance. I would allocate the heap to be big enough to start with, and avoid extensions.

With the C run time there are heaps as well as heap pools. My C run time report gave

64bit User HEAP statistics:
31bit User HEAP statistics:
24bit User HEAP statistics:
64bit Library HEAP statistics:
31bit Library HEAP statistics:
24bit Library HEAP statistics:
64bit I/O HEAP statistics:
31bit I/O HEAP statistics:
24bit I/O HEAP statistics:

You should check all of these and make the initial size the same as the suggested recommended size. This way the storage will be allocated at startup, and you avoid problems of a request to expand the heap failing due to lack of storage during a buys period.

Advanced level for heap

While the above discussion is suitable for many workloads, especially if they are single threaded. It can get more complex when there are multiple thread using the heappools.

If you have a “hot” or highly active pool you can get contention when obtaining and releasing blocks from the heap pool. You can define multiple pools for an element size. For example

HEAPP(ON,(56,4),1,280,1,848,1,2080,1,4096,1,0)

The (56,4) says make 4 pools with block size of 56 bytes.

The output has

HEAPPOOLS Summary:                                                          
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In  
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use       
  ------------------------------------------------------------------------  
       56       64     1           4         251        1002           2  
       56       64     1           4           0           0           0  
       56       64     1           4           0           0           0  
       56       64     1           4           0           0           0  
      280       288      1           4           1           1           0  
      848       856      1           4           1           1           0  
     2080      2088      1           4           1           1           1  
     4096      4104      1           4           0           0           0  
  ------------------------------------------------------------------------  

We can see there are now 4 pools with cell size of 56 bytes. The documentation says Multiple pools are allocated with the same cell size and a portion of the threads are assigned to allocate cells out of each of the pools.

If you have 16 threads you might expect 4 threads to be allocated to each pool.

How do you know if you have a “hot” pool.

You cannot tell from the summary, as you just get the maximum cells used.

In the report is the count of requests for different storage ranges.

Pool  2     size:   160 Get Requests:           777707 
  Successful Get Heap requests:    81-   88                 77934 
  Successful Get Heap requests:    89-   96                 59912 
  Successful Get Heap requests:    97-  104                 47233 
  Successful Get Heap requests:   105-  112                 60263 
  Successful Get Heap requests:   113-  120                 80064 
  Successful Get Heap requests:   121-  128                302815 
  Successful Get Heap requests:   129-  136                 59762 
  Successful Get Heap requests:   137-  144                 43744 
  Successful Get Heap requests:   145-  152                 17307 
  Successful Get Heap requests:   153-  160                 28673
Pool  3     size:   288 Get Requests:            65642  

I used ISPF edit, to process the report.

By extracting the records with size: you get the count of requests per pool.

Pool  1     size:    80 Get Requests:           462187 
Pool  2     size:   160 Get Requests:           777707 
Pool  3     size:   288 Get Requests:            65642 
Pool  4     size:   792 Get Requests:            18293 
Pool  5     size:  1520 Get Requests:            23861 
Pool  6     size:  2728 Get Requests:            11677 
Pool  7     size:  4400 Get Requests:            48943 
Pool  8     size:  8360 Get Requests:            18646 
Pool  9     size: 14376 Get Requests:             1916 
Pool 10     size: 24120 Get Requests:             1961 
Pool 11     size: 37880 Get Requests:             4833 
Pool 12     size: 65536 Get Requests:              716 
Requests greater than the largest cell size:               1652 

It might be worth splitting Pool 2 and seeing if makes a difference in CPU usage at peak time. If it has a benefit, try Pool 1.

You can also sort the “Successful Heap requests” count, and see what range has the most requests. I don’t know what you would use this information for, unless you were investigating why so much storage was being used.

Ph D level for heap

For high use application on boxes with many CPUs you can get contention for storage at the hardware cache level.

Before a CPU can use storage, it has to get the 256 byte cache line into the processor cache. If two CPU’s are fighting for storage in the same 256 bytes the throughput goes down.

By specifying

HEAPP(ALIGN….

It ensures each block is isolated in its own cache line. This can lead to an increase in virtual storage, but you should get improved throughput at the high end. It may make very little difference when there is little load, or on an LPAR with few engines.

One minute MVS – synchronous I/O.

This blog is one of a series of blog posts about performance topics on z/OS. This post is about synchronous I/O technology which reduces I/O time from milliseconds down to microseconds.

There is a 2018 redbook with some good content. This 2020 presentation is a good introduction. This gets a bit technical and looks at it from the Storage Controller view point.

History

Some protocols are very chatty. For example when you download a television program to your television, you see it as one download. Under the covers the software sees this television program as many parts, and there will be a conversation like “Here is part 1, the first 10 MBs”… “ok got that – please send the next”. Within this conversation is TCP/IP chatter, “here is packet 4013” … “ok got that” etc.

40 years ago the I/O to disks was a bit like the following conversation between the mainframe and the disk controller. Imaging this as a series of phone calls. The “Hello” is where they initiate a new call, and “Bye” is where they hang up

  • Mainframe: Hello storage controller, I want to read from disk with volume label A1USR1
  • Storage controller: Sorry, it is in use, please try later. Bye.
  • Mainframe : Hello? the telephone line is busy, I’ll retry.
  • Mainframe: Hello, is disk A1USR1 available now?
  • Storage controller: yes. I’ve reserved it for you
  • Mainframe: Please move the disk read heads so they are under cylinder 61
  • Storage controller: OK will do. Bye.
  • Storage controller: Hello. OK the heads are under cylinder 61
  • Mainframe: Read from head 4 (track 4) third record
  • Storage controller: OK will do. Bye.
  • Storage controller: Hello. OK, done that – here is the data
  • Mainframe: Thank you, I’ve finished with the volume A1USR1.
  • Storage controller: You’re welcome – that took 25 milliseconds. Bye.

Today’s conversation is much faster, as mostly data is in the storage controller’s cache. There are still a few phone calls.

Today’s synchronous I/O

With the new synchronous I/O technology (zHyperLink) the conversation is more like the following. (Using the new premium support phone number)

  • Mainframe: Hello Storage controller, I want to read from disk with volume label A1USR1 Cylinder 61 track 4.
  • Storage controller: Just a second – we have it in cache, here is the data. Bye.

Or perhaps the conversation goes like.

  • Mainframe: Hello Storage controller, I want to read from disk with volume label A9SYS1 Cylinder 995 track 2.
  • Storage controller: Just a second – I’m sorry we do not have it in cache. Please use the old way of doing it, using the old customer support number. Bye.

Technical background

With the old way of doing I/O there were multi asynchronous requests coming back from the Storage Controller. It needed CPU to start the I/O ( Start subchannel) then suspend the program until the I/O completed and then resume the program. The program could have been re-dispatched on a different CPU, so its data was not in the processor cache.

Sometimes the amount of CPU used was more than the duration of I/O request! With the coupling facility(CF), came technology to issue an I/O request synchronously. With this the “Issue CF request” was one mainframe instruction which started the I/O, waited for the response and then resumed. There was no z/OS interrupt, and there was no dispatcher involved. Generally this used less CPU than the asynchronous request, and was faster.

The duration of the synchronous request depends on the work being done in the CF. The more work it does, the longer the instruction takes. For example the CF is doing some work on some data for a different processor, and locks a resource. This will delay other users of the data, who get locked out. Also as the physical distance between the CF and the mainframe increased, the overall duration increased (due to the speed to light).

At some point it is more efficient to use the asynchronous request; send the request; suspend until it has completed; rather than the send the request and wait.

The CF code has logic to determine if the request should be synchronous or asynchronous. Even though the recent requests have all been asynchronous, it will periodically try a synchronous request to see if it is worth switching to synchronous mode.

Additional hardware

For the mainframe to support this synchronous I/O it needs additional hardware.

  • It continues to need the FIbreCONnection (FICON) today for those requests that do not support synchronous I/O.
  • It needs new zHyperLink cables connected from the mainframe to the Storage Controller. You need a direct connection, not via a switch.
  • The CF and mainframe need to be closer than 150 meters.
  • You can have duplexed CFs which both have to be within 150 meters.

One Minute MVS performance – Work Load Manager – looking at WLM reports.

I have a set of blog posts relating to getting started with z/OS performance. This blog post follows on the overview of WLM, and describes the contents of the reports, and how you can tell if work is being delayed, and why it is being delayed.

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a systems independent measure of CPU usage)
    1. 80% requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there no goals – just do your best

How do I tell what is going on and if the goals have been met?

RMF can display data in near real time (every minute or so).

RMF captures data and produces SMF records which can be processed by RMF and other products.

You can report on

  1. How well the service class did against its goals
  2. How well transactions or work did, from a reporting class.

You could have all CICS transactions in a service class, so they get the same CPU profile etc, but have different reporting classes. You can monitor CE* transaction, and PAY* transactions differently.

You could have a reporting class for work coming in from other systems, depending on the userid.

I set up a reporting class for z/OSMF. In the RMF batch report SYSRPTS(WLMGL(RCPER(ZOSMF)).

One part of the report was contained


         z/OS V2R4               SYSPLEX ADCDPL             DATE 06/14/2021           INTERVAL 05.00.003   
                                 RPT VERSION V2R4 RMF       TIME 09.25.00
POLICY=ETPBASE                        REPORT CLASS=ZOSMF                                   PERIOD=1 
 -TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF  TRANS-APPL%-----CP-IIPCP/AAPCP-IIP/AAP  ---ENCLAVES--- 
 AVG        1.00  ACTUAL                    0  TOTAL        66.25       64.20  173.99  AVG ENC   0.00 
 MPL        1.00  EXECUTION                 0  MOBILE        0.00        0.00    0.00  REM ENC   0.00 
 ENDED         0  QUEUED                    0  CATEGORYA     0.00        0.00    0.00  MS ENC    0.00 
 END/S      0.00  R/S AFFIN                 0  CATEGORYB     0.00        0.00    0.00 
                                                                                                                
 ----SERVICE----   SERVICE TIME  ---APPL %---  --PROMOTED--  --DASD I/O---  ----STORAGE----  -PAGE-IN RATES- 
 IOC        2366K  CPU  720.505  CP     66.25  BLK    0.000  SSCHRT    0.2  AVG    81420.24  SINGLE      0.0 
 CPU      617333   SRB    0.223  IIPCP  64.20  ENQ    0.000  RESP      0.0  TOTAL  81421.05  BLOCK       0.0 
 MSO      154219   RCT    0.000  IIP   173.99  CRM    0.000  CONN      0.0  SHARED     0.00  SHARED      0.0 
 SRB         191   IIT    0.013  AAPCP   0.00  LCK    0.889  DISC      0.0                   HSP         0.0 
 TOT        3138K  HST    0.000  AAP      N/A  SUP    0.000  Q+PEND    0.0 
 GOAL: EXECUTION VELOCITY 70.0%     VELOCITY MIGRATION:   I/O MGMT  28.3%     INIT MGMT 28.3% 
                                                                                                                
          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  -------------- EXEC DELAYS % -----------  
 SYSTEM                    VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU                                
 S0W1        --N/A--       28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19                               

Key fields:

INTERVAL 05.00.003

This tells the duration of the requests.

POLICY=ETPBASE REPORT CLASS=ZOSMF PERIOD=1

This tells you this is a report class (rather than a service class) the name is zOSMF, and is for period 1 . When you have service classes which have more than one criteria , such as high priority for the first 0.5 seconds of CPU – then low priority, these will have multiple periods.

-TRANSACTIONS–
AVG 1.00
MPL 1.00
ENDED 0
END/S 0.00

This says on average there was one instance running. You can have multiple transactions or jobs in a class. Add up the total duration of all jobs/transactions and divide by the interval to get the average(AVG).

MPL (multi programming level) is an advanced topic and describes how many instances were concurrently active.

No jobs/transactions ended in this interval, with a ending rate of 0 in 5 minutes.

—APPL %—
CP 66.25
IIPCP 64.20
IIP 173.99
AAPCP 0.00
AAP N/A

This shows the percentage of CPU used over the interval

  • 66.25 percent on GP engines
  • 64.20 percent IIPCP is 64.20 % of GP engine was doing work that could have run on a ZIIP – if there had been spare ZIIP capacity. 66.25 – 64.20 = 2.05 of work on a GP that was not ZIIP eligible.
  • 173.99 percent of ZIIP work running on a ZIIP engine – so nearly 2 ZIIP engines were being used
  • 0 AAPCP – there was no ZAAP eligible work offloaded onto a GP
  • 0 AAP there was no work running on an ZAAP

The total ZIIP used was 173.99 in ZIIP engines, +64.20 of a GP = 238 or almost 2.5 ZIIP engines worth.

It is good to run on ZIIPs where possible, because ZIIPs are cheaper ($$) than GPs, and GPs may be configured to be slower than a ZIIP.

GOAL: EXECUTION VELOCITY 70.0%

The performance goal for this work was defined as Execution Velocity of 70 %.

 
         EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM  VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU      
 S0W1    28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19       
  • The achieved execution velocity was 28.3% against a target of 70%
  • The performance index was 2.5. The performance goal is goal/actual. A value of 1 or smaller is good. The value here shows the goal was not met. You need to consider
    • Changing the goal for this work so the target goal is what you can achieve on a normal day
    • Changing the importance of the work for when the system is constrained.
    • If you change the goal for one set of work – it may impact other work, so you need to look at the system as a whole and decide which is your important work.
    • Add more CPUs or ZIIPs – these may not help if the delays are not CPU… see below
  • Average number of address spaces in this class 1.
  • EXEC USING%. The figures above were for true CPU used. WLM samples activities 4 times a second. Of the samples where jobs were running or waiting for waiting for a resource.
    • 8% of an CPU engine was used – this includes ZIIP work running on GP.
    • 20% of a ZIIP engine
    • The ratio 8:20 is similar to CPU on GP and ZIIP actually used in this period of 66.25: 173.99.
  • EXEC DELAYS
    • The total delay was 72% = ( 100 – (8+20) “using samples” above)
    • for 53% of all the samples it was was waiting for a ZIIP engine
    • for 19% of all the the samples it was waiting for a GP engine.
    • You can have other delays listed here, for example paging, or your program is capped to limit how much CPU it is allowed.

Once z/OSMF had started, and settled down, there were still delays for IIP (28%). To me this looks like a lumpy workload, that perhaps there is a timer which pops and runs multiple threads. There are more threads than IIPs – so some have to wait.

Reports for transactional work

I defined a transaction so I could measure the response times (and CPU used) for a service in z/OSMF. A TSO address space is started, and z/OSMF sends a client/server request to the TSO address space. The response time is sub-second so a good candidate to demonstrate WLM for a transaction.

I configured z/OSMF to have

<zosWorkloadManager collectionName=”MOPZCET”/>
<wlmClassification>
<httpClassification transactionClass=”ZCI3″ resource=”/zosmf/webispf/*/“/>
</wlmClassification>

The collection name is passed to WLM to determine the service class and report class of the work. The default is the server name.

All ISPF (with a URL of /zosmf/webispf/*) requests were classified as ZCI3.

I then used WLM to configure

  • a service class ZCI3 with Average response time of 00:00:00.010
  • a classification rule for type CB, a rule for CN=MOPZCET, and sub-rule TC = ZCI4. This gave the service class and report class.

The data in the report had

-TRANSACTIONS–
AVG 0.01
MPL 0.01
ENDED 21
END/S 0.07

21 transactions in 5 minutes is 0.07 a second.

MPL (MultiProgramming Limit is the target which represents the number of address spaces that must be in the swapped-in state for the service class period to meet its goals. I’ve never used it!

TRANS-TIME HHH.MM.SS.FFFFFF
ACTUAL               140526
EXECUTION            139950
QUEUED                  575

The average time was 0.140 seconds.

GOAL: RESPONSE TIME 000.00.00.010 AVG

That was the specification in WLM (note the specified value of 0.010 is very different to the 0.140 achieved)


          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP 
 S0W1     000.00.00.140526 66.7 14.1   0.0  0.0 N/A  18 0.0  9.1 9.1  

This shows the average response time was 0.140 seconds, used 18% on a ZIIP, and waited 9% of the time for a ZIIP

To the right of the data in the report was

--- DELAY % --- 
UNK IDL CRY CNT                 
 64 0.0 0.0 0.0 

Which says there was 64% of the delay was unknown. This could be

  • waiting for end user input
  • waiting for TCP/IP data
  • the program sent off a request and is waiting for a response.

For example the ISPF transaction in z/OSMF had sent a request to an address space running TSO. This address space processed the request and sent the response back. I am guessing that the 64% delay was waiting for TSO to process the request and send back the response.

You also get a response time profile based on the service class

                              ----------RESPONSE TIME DISTRIBUTION---------- 
   -----TIME------  # TRANS   0    10   20   30   40   50   60   70   80   90   100 
   HH.MM.SS.FFFFFF  IN BUCKET |....|....|....|....|....|....|....|....|....|....| 
<= 00.00.00.014000          0  > 
<= 00.00.00.015000          0  > 
<= 00.00.00.020000          2  >>>>>> 
<= 00.00.00.040000          5  >>>>>>>>>>>>> 
>  00.00.00.040000         14  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 

This shows that out of the 21 requests, 7 were below 0.040 seconds, and 14 were over 0.040 seconds.

From the service class, it was specified as GOAL: RESPONSE TIME 000.00.00.010 AVG so this goal is very badly specified. It would be better set to average of 0.140 seconds.

I changed the service class to a goal of 0.140 seconds and activated it. After I had run some tests the output was

          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS %
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT            
 S0W1     000.00.00.097733  100  0.7   0.0  0.0 N/A  50 0.0  0.0            

Which showed no delays

and a response time profile

                                ---RESPONSE TIME DISTRIBUTION--- 
    -----TIME------  --# TRANS  0    10   20   30   40   50   60
    HH.MM.SS.FFFFFF  IN BUCKET  |....|....|....|....|....|....|.
 <= 00.00.00.070000          0  > 
 <= 00.00.00.084000          5  >>>>>>>>>>>>>>> 
 <= 00.00.00.098000          9  >>>>>>>>>>>>>>>>>>>>>>>>>> 
 <= 00.00.00.112000          1  >>>> 
 <= 00.00.00.126000          0  > 
 <= 00.00.00.140000          1  >>>> 
 <= 00.00.00.154000          1  >>>> 
 <= 00.00.00.168000          0  > 
 <= 00.00.00.182000          0  > 
 <= 00.00.00.196000          0  > 
 <= 00.00.00.210000          1  >>>> 
 <= 00.00.00.280000          0  > 

An average of 0.10 seconds, with some taking up to 0.210 seconds.

Real time information

You can get the information in near real time from RMF (or other monitors)

For example for processor delays

            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) ---------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name 
IZUSVR1  SO STCHIM   CP     2  35 56.53   91 IZUSVR1    4 JES2MON    2 TCPIP 
                     IIP   94  95 183.1   89 IZUSVR1                         

This shows that job IZUSVR1

  • Was delayed for 2% of the time on a GP
  • Used 35% of the GP engines
  • Was delayed 94% of the time on a ZIIP
  • and used 95% of the available ZIIP resource
  • The jobs using CPU were IZUSVR1 (using 91%) JES2MON and TCPIP
  • The jobs using ZIIP were IZUSVR1

What to do now?

You need to identify the goals of your work, and set sensible goals. This may take several iterations. You also need to understand the priorities of the work, and userid.

Once you have configured your system to report on response times of your business critical work, you can adjust the service classes so your work achieves it goals.

Define reporting classes so you can monitor different groups of work and that they are meeting their goals.