How do I enter a password on the z/OS console for my program?

I wanted to run a job/started task which prompts the operator for a password. Of course being a password, you do not want it written to the job log for every one to see.

In assembler you can write a message on the console, and have z/OS post an ECB when the message is replied to.

         WTOR  'ROUTECD9 ',reply,40,ecb,ROUTCDE=(9) 
         wait 1,ECB=ECB 
         ...
ECB      DC   F'0' 
REPLY    DS  CL40

The documentation for ROUTCDE says

9 System Security. The message gives information about security checking, such as a request for a password.

When this ran, the output on the console was as follows The … is where I typed R 6,abcdefg

@06 ROUTECD9 
...                                
R 6 SUPPRESSED                              
IEE600I REPLY TO 06 IS;SUPPRESSED

With ROUTCDE=(1) the output was

@07 ROUTECD1                      
R 7,ABCDEFG                      
IEE600I REPLY TO 07 IS;ABCDEFG

With no ROUTCDE keyword specified the output was

@08 NOROUTECD                          
R 8 SUPPRESSED                       
IEE600I REPLY TO 08 IS;SUPPRESSED

The lesson is that you have to specify ROUTCDE=(1) if you want the reply to be displayed. If you omit the ROUTCDE keyword, or specify a value of 9 – the output is supressed.

Can I do this from a C program?

The C run time _console2() function allows you to issue console messages. If you pass and address for modstr, the _console2() function waits until there is an operator stop of modify command issued for the job. If a NULL address is passed in the modstr, then the message is displayed, and control returns immediately. The text of the modify command is visible on the console.

To get suppressed text you would need to issue the WTOR Macro using __ASM(…) in your C program.

Can I share a VSAM file (ZFS) between systems?

I had the situation where I am using ZD&T – which is a z/OS emulator running on Linux, where there 3390 disks are emulated on Linux files. I have an old image, and a new image, and I want to use a ZFS from the new image on the old image to test out a fix.

The high level answer to the original question is “it depends”.

Run in a sysplex

This is how you run in a production environment. You have a SYSPLEX, and have a (master) catalog shared by all systems. I cannot create the environment in zD&T. Setting up a sysplex is a lot of work for a simple requirement.

Copy the Linux file

Because the 3390 volumes are emulated as Linux files, you can copy the Linux file and use that file in the old zPTD image, and avoid the risk of damaging the new copy. The Linux file name is different, but the VOLID is the same. I was told you can use import catalog to get this to work. I haven’t tried it.

The cluster is in a shared user catalog.

If the VSAM cluster is defined in a user catalog, and the user catalog can be used on both systems, then the cluster can be used on both systems (but not at the same time). When the cluster is used, information about the active system is stored in the cluster. When the file system is unmounted, or OMVS is shutdown, this system information is removed. If you do not unmount, or shutdown OMVS cleanly, then when the file system is mounted on the other system, the mount will detect the file system was last used on another system, and wait for a minute or so to make sure the other system is inactive. If the mount command is issued during OMVS startup OMVS will wait for this time. If you have 10 file systems shared, OMVS will wait for each in turn – which can significantly delay OMVS start up.

When the cluster is in the master catalog

Someone suggested

You could mount the volume to your new system and import connect the master catalog of the old system to the new one and define the old alias for the ZFS in the new master pointing to the old master which is now a user catalog to the new system. If it’s not currently different, you could rename it on the old system to a new HLQ that is different from the existing one and then do the import connect of the master as a usercat and define the new alias pointing to the old ZFS.

This feels too dangerous to me!

Pax the files in the directory

You can use Pax to unload the contents of the directory to a dataset, then load the data from the dataset on the other system.

cd /usr/lpp....
pax -W “seqparms=’space=(cyl,(10,10))'” -wzvf “//’COLIN.PAX.PYMQI2′” -x os390 .

On the other system

mkdir mydir
cd mydir 
pax -rf “//’COLIN.PAX.PYMQI2A'” .

Note when using cut and paste make sure you have all of the single quotes and double quotes. I found they sometimes got lost in the pasting.

Using DFDSS

See Migrating an ADCD z/OS release: VSAM files

I can’t even spell Ansible on z/OS

The phrase “I can’t even spell….” is a British phrase which means “I know so little about this that I cannot even pronounce or write the word.”

I wanted to see if I could use Ansible to extract some information from z/OS. There is a lot of documentation available, but it felt like the documentation started at chapter 2 of the instruction book, and missing the first set of instructions.

Below are the instructions to get the most basic ping request working.

On z/OS

Ansible is a python package which you need to install.

pip install ansible-core

This may install several packages

It is better to do this in an SSH terminal session rather than from ISPF -> OMVS. For example it may display a progress bar.

On Linux

Setup

sudo apt install ansible

I made a directory to store my Ansible files in

mkdir ansible
cd ansible

There is some good documentation here.

Edit the inventory.ini

[myhosts]
10.1.1.2

[myhosts:vars]
ansible_python_interpreter=/usr/lpp/IBM/cyp/v3r12/pyz/bin/python

Where

[myhosts]… is the IP address of the remote system.
[myhosts:vars] ansible_python_interpreter=… is needed for Ansible to work. It it the location of Python on z/OS.

Check the connection

Ansible uses an SSH session to get to the back end. Check this works before you use Ansible.

ssh colin@10.1.1.2

I have set this up for password less logon.

Try the ping

ansible myhosts -u colin -m ping -i inventory.ini

Where

-i inventory.ini specifies the configuration file
myhosts which sections in the configuration file
-u colin logon with this userid
-m ping issue this command

When this worked I got

10.1.1.2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

The command took about 10 seconds to run.

You may not need to specify the -u information.

What can go wrong?

I experienced

Invalid userid

ansible myhosts -u colinaa -m ping -i inventory.ini

10.1.1.2 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: colinaa@10.1.1.2: Permission denied (publickey,password).",
    "unreachable": true
}

This means you got to the system, but you specified an invalid user, or the userid was unable to connect over SSH.

Python configuration missing

ansible myhosts -u colin -m ping -i inventory.ini

This originally gave me

[WARNING]: No python interpreters found for host 10.1.1.2 (tried ['python3.12', 'python3.11',
'python3.10', 'python3.9', 'python3.8', 'python3.7', 'python3.6', '/usr/bin/python3',
'/usr/libexec/platform-python', 'python2.7', '/usr/bin/python', 'python'])
10.1.1.2 | FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "module_stderr": "Shared connection to 10.1.1.2 closed.\r\n",
    "module_stdout": "/usr/bin/python: FSUM7351 not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}

Edit the inventoy.ini and add the ansible_python_interpreter information.

[myhosts]
10.1.1.2

[myhosts:vars]
ansible_python_interpreter=/usr/lpp/IBM/cyp/v3r12/pyz/bin/python

Logging on to Git (on z/OS)

I’ve gradually been moving away from being 100% ISPF, and moving to OMVS. I use SSH terminals to access the Command Line Interface (CLI) just like I use on Linux, and I do most of my editing with VScode on Linux accessing the files on z/OS over sshfs so they look as if they are in a local Linux directory.

I wanted to use Git on z/OS. It was easy to install and start using, but I had problems logging on to Git.

As I understand it there are several ways of logging on to Git. I’ve used two, HTTPS and SSH.

HTTPS

You can logon to Git with a userid and a Personal Access Token. A PAT is like a sophisticated password. To get a PAT, go to your Git home page, click on your photo, and click settings. On the public profile page which is displayed, at the bottom of the left hand column is<> Developer settings. Click on this link. Click on Personal access tokens.

Click on Tokens (classic) -> Generate new token (classic). You have to verify, so I clicked send code via email. Copy the PAT.

When you create a new PAT you can specify what the token can do, for example

full control of the private repository, or just access the public repository, or access the commit state.
can control the public keys
delete repositories

Click on generate token. A token is displayed such as ghp_7OSehXd6lP1234Gy0KRvqpmABALX8L618ycad. Copy this and save it somewhere securely. If you lose it, it is easy to delete and create another.

If you use Git using https, for example https://github.com/colinpaiceABC/ColinsRepo it will prompt for userid (colinpaiceABC) and password. Password means use a PAT.

You can store the userid and PAT for scripts etc to use to logon.

When you create the PAT you specify the validity period, for example two weeks, so you will need to have a process in place to renew the token.

SSH

You can logon to Git using SSH. Because keys are stored on your local machine, and on the Git server, you do not need to enter userid and password/PAT each time.

Git has excellent documentation on using ssh.

You need an SSH key. Check in directory ~/ssl, for files like id_….pub I have id_ed25519.pub and id_rsa.pub . If you do not have a key, follow the git documentation to create one.

Once I had my key I used the documentation on how to add it to Git.

Check you are using the ….pub file. It looks like

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...XX/Xk colin@ColinNew

Add it using picture -> settings -> SSH and GPG keys ….

To use this you access Git via

git clone git@github.com/colinpaicemq/MQTools.git

If it doesn’t work as expected.

I got into a mess because I used

git clone https://github.com/colinpaicemq/MQTools.git

to clone the repository. When I tried to update the repository it asked me for userid and password!

You can change whether you use HTTPS or SSH to logon. For example to set SSH

git remote set-url origin git@github.com:colinpaicetest/testrepro.git

See the documentation.

Accessing SMF Real Time data.

The traditional way of processing SMF data (product statistics, and audit information), is to post-process SMF datasets. This might be done hourly, or daily, (or more frequently). This means there is a delay between the records being created, and being available for processing.

With SMF Real Time, you can connect an application program to SMF, and get records from the SMF buffers, as the records are created.

Configuring SMF

SMF needs to be in logstream mode. See Many is so last year – logstreams is the way to go.

You need to configure SMF to generate the records. See the SMFPRMxx parameter in parmlib.

I created an entry dynamically using

setsmf inmem(IFASMF.INMEM,RESSIZMAX(128M),TYPE(30,42))

Note: 128M is the smallest buffer size.

The IBM documentation Defining in-memory resources covers various topics.

Displaying information

The command

D SMF

gave me

IFA714I 11.46.50 SMF STATUS 101                
      LOGSTREAM NAME               BUFFERS        STATUS      
      A-IFASMF.DEFAULT                      0       CONNECTED   
      A-IFASMF.COLIN                        0       CONNECTED   
      A-IFASMF.INMEM                  4826066       IN-MEMORY

The command

 D SMF,M

Gave showed my Real time, in Memory resource in use

d smf,m                                                   
IFA714I 11.48.15 SMF STATUS 109                   
IN MEMORY CONNECTIONS                                                
  Resource: IFASMF.INMEM                                             
  Con#: 0001 Connect Time: 2026.019 10:07:20                         
  ASID: 004B                                                         
  Con#: 0002 Connect Time: 2026.019 11:48:10                         
  ASID: 0049

The Application Programming Interface.

The API is pretty easy to use. I based my C application on the IBM example.

I called my program from Python, so that was an extra challenge.

Query

You can issue the query API request. This returns the name of the INMEM definitions available to you, and the SMF record types in the definition.

Capture the data

You need to issue

connect, passing the name of the INMEM definition. It returns a 16 byte token. Once the connect has completed successfully, SMF will capture the data in a buffer.
get, passing the token. You can specify a flag saying blocking – so the thread waits until data is available. You do not get records from before the connect.
If there is too much data for your application to process – or your application is slow to process the data, SMF will wrap the data, and so lose records. The application will get return code IFAINMMissedData (Meaning: Records were skipped due to buffer re-use—that is, wrapping of the data in the in-memory resource. In this case, the output buffer might not contain a valid record.) You should reissue the get.
disconnect, passing the token. The disconnect can be done on a different thread. If so, it notifies any thread in a blocking get request, which gets a return code IFAINMGetForcedOut.

Problems

The problems I originally had were that my SMF was not running in log stream mode.
Once I set this up, I could get data back.

I set up INMEM record for SMF 30 records, and although I submitted some batch jobs, I did not get any SMF 30 records in my program.
If I logged off TSO, I got a record. If I issued tso omvs from ISPF I got records.

I added

SUBSYS(JES2)

to my SMFPRMLS member, and I got SMF 30 records for batch jobs.

I later changed this to be

SUBSYS(JES2,EXITS(IEFU29,IEFU83,IEFU84,IEFUJP,IEFUSO))

to be consistent wit the SUBSYS(STC… parameter)
I got SMF 30 records when logging on using SSH, from using TSO OMVS, or spawning a thread in OMVS to run in the background, for example ls &

It is curious that I do not have SUBSYS(TSO) defined – but I get entries for TSO usage.

It is OK, but…

The code works and generates records. One problem I have is how to stop my program running.

You could use a non blocking call, loop around getting records until you get no records available, then return, do an external wait, and then reloop. This puts the control in your application, but does use CPU as it loops periodically (every second perhaps) looking for records.

You could use a blocking call where the request waits until a record is available, or another thread issues the disconnect call. This means an extra programming challenge creating a thread for the blocking request to run off, and another thread to handle the disconnect request.

The first case, non blocking case, feels easier to code – but at the cost of higher CPU.

Getting SSH to work to z/OS

I have two versions of z/OS, old and new(!). I had problems getting ssh to work because of key problems.

The problem

I tried to update my laptop key to the server

ssh-copy-id colin@10.1.1.2

This gave

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ERROR: @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ERROR: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
ERROR: Someone could be eavesdropping on you right now (man-in-the-middle attack)!
ERROR: It is also possible that a host key has just been changed.
ERROR: The fingerprint for the ED25519 key sent by the remote host is
ERROR: SHA256:2mUOVfdSedJVQIzZiGsRkOe9Vkc1bkyuDNp5H+VrZ98.
ERROR: Please contact your system administrator.
ERROR: Add correct host key in /home/colin/.ssh/known_hosts to get rid of this message.
ERROR: Offending ED25519 key in /home/colin/.ssh/known_hosts:1
ERROR:   remove with:
ERROR:   ssh-keygen -f '/home/colin/.ssh/known_hosts' -R '10.1.1.2'
ERROR: Host key for 10.1.1.2 has changed and you have requested strict checking.
ERROR: Host key verification failed.

Searching the internet I got suggestions saying “delete the old line from the file”. I didn’t want to do this because it meant I would not be able to go back to the old system and work as before.

Solutions

I edited /home/colin/.ssh/known_hosts and commented out line 1, with a # at the front (the :1 above is the first line). I repeated the command and it report the same message for line :2. I commented that out as well.

I got further

colin@ColinNew:~$ ssh-copy-id colin@10.1.1.2
The authenticity of host '10.1.1.2 (10.1.1.2)' can't be established.
ED25519 key fingerprint is SHA256:2mUOVfdSedJVQIzZiGsRkOe9Vkc1bkyuDNp5H+VrZ98.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 2 key(s) remain to be installed -- if you are prompted now it is to install the new keys
colin@10.1.1.2: Permission denied (publickey,hostbased).

I had to start the SYSLOGD on z/OS to capture the output from SSHD.

In the /var/logSSHD (your’s may be different) it said

FOTS2307 User COLIN from 10.1.0.2 not allowed because not listed in AllowUsers

In my SSHD config file /etc/ssh/sshd_config I had

# Allow specific user IDs 
AllowUsers IBMUSER

I added COLIN to the list and restarted SSHD. (I do not know how to refresh SSHD)

This time the error log had

trying public key file /u/tmp/zowet/colin/.ssh/authorized_keys 
Could not open authorized keys '/u/tmp/zowet/colin/.ssh/authorized_keys': ...

I fixed this, tried to logon, and this time it worked.

On Linux, I edited /home/colin/.ssh/known_hosts and un-commented the lines I had commented out before.
I tried the ssh command again, and it still worked!

Many is so last year – logstreams is the way to go.

I’ve been looking into the SMF Real Time, where an application program can get records directly from SMF, and not have to post-process SMF datasets or log streams. To use the real time support, SMF needs to use log streams.

What is SMF?

SMF is System Management Facility. z/OS and the subsystems can write data to SMF for post processing. Typical records are audit and accounting records from z/OS, RACF or CICS, changes to SMS, and changes to resources. Each product has one or more SMF record-type numbers allocated to it. Within each SMF record type you can have sub-types, for example the z/OS SMF 30 record has a sub-type for job start, another sub-type for job step end, and another sub-type for job end.

Display SMF options

The command

d smf

gave

   NAME                VOLSER SIZE(BLKS) %FULL  STATUS    
 P-SYS1.S0W1.MAN1      B3SYS1      7200     0  ALTERNATE  
 S-SYS1.S0W1.MAN3      USER04     72000     1  ACTIVE

showing the dataset are being used, and giving information about the datasets

The command

d smf,o

displays all of the SMF options, and where they came from – for example a parmlib member, or from the SETSMF command.

IEE967I 08.44.41 SMF PARAMETERS 489                
        MEMBER = SMFPRM00   
        ...   
        SYNCVAL(00) -- DEFAULT                                      
        DUMPABND(RETRY) -- DEFAULT                                  
        INMEM(IFASMF.COLIN,TYPE(30,42),RESSIZMAX(0128M)) -- PARMLIB 
        SUBSYS(STC,NOTYPE(14:19,62:69,99)) -- SYS   
        ...
        STATUS(010000) -- PARMLIB            
        INTVAL(01) -- PARMLIB                
        MAXDORM(0001) -- PARMLIB             
        REC(PERM) -- PARMLIB                 
        NOPROMPT -- PARMLIB                  
        DSNAME(SYS1.S0W1.MAN3) -- PARMLIB    
        DSNAME(SYS1.S0W1.MAN1) -- PARMLIB    
        ACTIVE -- PARMLIB

The old way of recording SMF data

SMF had set of datasets it would use in turn. Typically these were named like SYS1.MANX, SYS1.MANY, or SYS1.PROD.MAN2 etc.. When the active dataset filled up, SMF would switch to the next empty dataset. You (or automation) then runs a job to either copy the records to another dataset, or post process the records; and then clear the dataset for reuse.

As computers got bigger, more work was done, more records were written and writing records to disk could not keep up.

Logstreams is the way forward.

A log stream is a stream of data which can be written to a Coupling Facility(CF) structure, or to a dataset on disk. Typically writing to a CF is faster than writing to disk.

With MANx datasets, all records were written to one dataset. With logstreams, you can configure SMF have multiple logstreams and you configure which record type(s) go to which log stream. This means you can have CICS records going to the “CICS log stream”, and RACF records going to the “RACF logstream”, and the remainder going to a default log stream.

Having multiple logstreams means data can be written to many log streams concurrently, and so avoids the bottleneck of writing to a MANx dataset.

Setting up security profiles

It took me several attempts to configure the security profiles.

Be able to define and delete logstreams

//IBMUSER1 JOB   1,MSGCLASS=H 
//KEYCERTS EXEC PGM=IKJEFT01 
//SYSPRINT DD SYSOUT=* 
//SYSTSPRT DD SYSOUT=* 
//SYSTSIN  DD * 
 RDEFINE FACILITY RESOURCE(MVSADMIN.LOGR) UACC(NONE) 
 permit  MVSADMIN.LOGR class(FACILITY)   - 
                    access(control) ID(SYS1) 
 setr raclist(facility) refresh

Define individual logstreams

RDEFINE LOGSTRM IFASMF.** UACC(NONE) 
PERMIT IFASMF.** class(LOGSTRM ) - 
                    access(ALTER  ) ID(SYS1) 
setr raclist(logstrm ) refresh

Giving SMF access to the logstreams

RDEFINE FACILITY IFA.IFASMF.* UACC(READ)
setr raclist(facility) refresh

Setting up logstreams

You need to set up at least one log stream. It is easy to define more and change the SMF configuation.

I used the define logstream command

//IBMLOG JOB 1,MSGCLASS=H 
//LOGDEF EXEC PGM=IXCMIAPU,REGION=4M 
//SYSPRINT DD SYSOUT=* 
//SYSIN DD * 
DATA TYPE(LOGR) REPORT(YES) 
                                                                       
DELETE LOGSTREAM NAME(IFASMF.DEFAULT) 
DEFINE LOGSTREAM NAME(IFASMF.DEFAULT) 
                   DESCRIPTION(SMF_LOGSTREAM) 
                   MODEL(NO) 
                   DASDONLY(YES)         
                   STG_SIZE(65532) 
                   LS_SIZE(15000) 
                   HLQ(IXGLOGR) 
                   HIGHOFFLOAD(80) 
                   LOWOFFLOAD(0) 
                   AUTODELETE(YES)           /* DELETE OPTION */ 
                   OFFLOADRECALL(NO) 
                   MAXBUFSIZE(65532) 
                   DIAG(NO) 
                   RETPD(1)                  /* DELETE 1 DAYS */ 
//

I also define a log stream IFASMF.COLIN

With the HLQ(IXGLOGR) definition, behind the logstreams were data sets like

Dataset                              Volume  
IXGLOGR.IFASMF.COLIN.ADCDPL          *VSAM*
IXGLOGR.IFASMF.COLIN.ADCDPL.DATA     USER05
IXGLOGR.IFASMF.COLIN.A0000000        *VSAM*
IXGLOGR.IFASMF.COLIN.A0000000.DATA   USER04

Configure SMF

I created a member SMFPRMLS in a user.parmlib

ACTIVE                          /* ACTIVE SMF RECORDING             */ 
DSNAME(SYS1.&SYSNAME..MAN1, 
       SYS1.&SYSNAME..MAN3) 
RECORDING(LOGSTREAM) 
NOPROMPT                        /* DO NOT PROMPT OPERATOR           */ 
REC(PERM)                       /* TYPE 17 PERM RECORDS ONLY        */ 
MAXDORM(0001)                   /* WRITE IDLE BUFFER AFTER 1 SEC    */ 
INTVAL(01)                      /* EVEY MINUTE                      */ 
STATUS(010000)                  /* WRITE SMF STATS AFTER 1 HOUR     */ 
JWT(0400)                       /* 522 AFTER 30 MINUTES             */ 
SID(&SYSNAME(1:4)) 
LISTDSN                         /* LIST DATA SET STATUS AT IPL      */ 
DEFAULTLSNAME(IFASMF.DEFAULT) 
LSNAME(IFASMF.COLIN,TYPE(30,42)) 
AUTHSETSMF 
SYS(NOTYPE(14:19,62:69,99),EXITS(IEFU83,IEFU84,IEFACTRT, 
              IEFUSI,IEFUJI,IEFU29),NOINTERVAL,NODETAIL) 
SUBSYS(STC,EXITS(IEFU29,IEFU83,IEFU84,IEFUJP,IEFUSO)) 
INMEM(IFASMF.COLI2,RESSIZMAX(128M),TYPE(30,42))

I activated it using the command

t smf=ls

When this failed, because my log stream definitions were not correct, the SMF collection defaulted to using the specified SYS1.MANx datasets.
The important bits of the SMFPRMxx file are

RECORDING(LOGSTREAM) – use logstreams rather than datasets
LSNAME(IFASMF.COLIN,TYPE(30,42)) for record types 30 and 42 write them to this log stream
DEFAULTLSNAME(IFASMF.DEFAULT) If there is no LSNAME for a record type – then write them to this log stream

You can issue setsmf commands to override the existing definition.

Processing SMF records

For SMF datasets

For the Use JCL like

// SET SMFPDS=SYS1.S0W1.MAN1                
// SET SMFSDS=SYS1.S0W1.MAN3                
//SMFDUMP  EXEC PGM=IFASMFDP 
//DUMPINA  DD   DSN=&SMFPDS,DISP=SHR,AMP=('BUFSP=65536') 
//DUMPINB  DD   DSN=&SMFSDS,DISP=SHR,AMP=('BUFSP=65536') 
//DUMPOUT  DD   DISP=(NEW,CATLG),DSN=&RMF,SPACE=(CYL,(10,10)) 
//*             DCB=(LRECL=32760,RECFM=VBS) 
//*            DCB=(BLKSIZE=0,LRECL=32760,RECFM=VBS) 
//*UMPOUT  DD   DISP=SHR,DSN=IBMUSER.RMF,SPACE=(CYL,(1,1)) 
//SYSPRINT DD   SYSOUT=* 
//SYSIN  DD * 
  INDD(DUMPINA,OPTIONS(DUMP)) 
  INDD(DUMPINB,OPTIONS(DUMP)) 
  OUTDD(DUMPOUT,TYPE(42,80,30)) 
  RELATIVEDATE(BYDAY,0,1)
  START(0000) 
  END(2300) 
/*

This processes records within the specified time range in the datasets.

For log streams

Use JCL like the following – using PGM=IFASMFDL

//IBMSMFL  JOB 1,MSGCLASS=H 
//* DUMP THE SMF DATASETS 
// SET SMF=IBMUSER.SMF 
//* 
//S1 EXEC PGM=IEFBR14 
//DUMPOUT  DD   DISP=(MOD,DELETE),DSN=&SMF,SPACE=(CYL,(1,1)) 
//* 
//SMFDUMP   EXEC PGM=IFASMFDL,REGION=0M 
//DUMPOUT  DD   DISP=(NEW,CATLG),DSN=&SMF,SPACE=(CYL,(10,10)) 
//SYSPRINT DD   SYSOUT=* 
//SYSIN  DD * 
  LSNAME(IFASMF.COLIN,OPTIONS(DUMP)) 
  OUTDD(DUMPOUT,TYPE(30)) 
  RELATIVEDATE(BYDAY,0,1)
  START(0000) 
  END(2300) 
/* 
//

When you specify a date range, it will read not only the active log stream datasets, but any archive ones it created, and which are available.

Display SMF

With logstream the D SMF command gave

   LOGSTREAM NAME               BUFFERS        STATUS            
 A-IFASMF.DEFAULT                    774       CONNECTED         
 A-IFASMF.COLIN                      584       CONNECTED         
 A-IFASMF.INMEM                        0       IN-MEMORY

Dumping SMF data – last n day’s worth

For many years, I’ve been processing SMF data, and using the date option like DATE(2026012,2027000). Every day, I had to change it to match today’s date, and submit the job.

I’ve just discovered you can give relative dates. For example RELATIVEDATE(BYDAY,0,1), which says go back 0 days and includes 1 day – so just do today.

The output listing has, for today’s date day 19 of 2026:

IFA834I RELATIVEDATE PARAMETER RESULTS IN START DATE 2026.019, END         
                DATE 2026.019                                                
IFA836I RELATIVEDATE RANGE EXTENDS INTO FUTURE, END DATE AND TIME USED       
                IS 2026.019 11:29

You can specify BYDAY, BYWEEK, and BYMONTH.

This function has been around for years! I wonder how much time I’ve wasted on doing it the old way.

Pandas 102, notes on using pandas

Pandas is a great tool for displaying data from Python. You give it arrays of data, and it can display, summarise, group, print and plot. It is used for the simplest data, up to data analysts processing megabytes of data.

There is a lot of good information about getting started with Pandas, and how you can do advanced things with Pandas. I did the Pandas 101 level of reading, I struggled with the next step, so my notes for the 102 level of reading are below. Knowing that something can be done means you can go and look for it. If you look but cannot find, it may be that you are using the wrong search arguments, or there is no information on it.

Working with data
Feeding data into Pandas
What can you do with it?
Which columns are displayed ?
Select which rows you want displayed
You can select rows and columns
- What does .loc do?
You can process the data, such as sort
Saving data
- Export the data
- Import and do the analysis
Processing multiple data sources as one
Processing data within fields
Basic operations on columns
Doing aggregation, count, sum, maximum, minimum etc.
- Simple aggregation .sum()
- More complex aggregation .agg()
Updating some values
Display all fields for one row over multiple lines

Working with data

I’ve been working with “flat files” on z/OS. For example the output of DCOLLECT which is information about dataset etc from SMS.

One lesson I learned was you should isolate the extraction from the processing (except for trivial amounts of data). Extracting data from flat files can be expensive, and take a long time, for example it may include conversion from EBCDIC to ASCII. It is better to capture the data from the flat file in python variables, then write the data to disk using JSON, or pickle (Python object serialisation). As a separate step read the data into memory from your saved file, then do your data processing work, with pandas, or other tools.

Feeding data into Pandas

The work I’ve done has been two dimensional, rows and columns; you can have multi dimensional.

You can use a list of dictionaries(dicts), or dict of list:

# a dict of lists
datad = {"dsn":["ABC","DEF"],
         "volser":["SYSRES","USER02"]}
pdd =  pd.DataFrame(datal)  

# a list of dicts
datal = [{"dsn":"ABC","volser":"SYSRES"},
         {"dsn":"DEF","volser":"USER02S"},
        ]   

pdl = pd.DataFrame.from_records(datal)

Processing data like pdd = pd.DataFrame(datal) creates a pandas data frame. You take actions on this data frame. You can create other data frames from an original data fram, for example with a subset of the rows and columns.

I was processing a large dataset of data, and found it easiest to create a dict for each row of data, and then accumulate each row as a list. Before I used Pandas, I had just printed out each row. I do not know which performs better. Someone else used a dict of lists, and appended each row’s data to the “dsn” or “volser” list.

What can you do with it?

The first thing is to print it. Once the data is in Pandas you can use either of pdd or pdl above.

print(pdd)

gave

   dsn   volser
0  ABC   SYSRES
1  DEF  USER02S

Where the 0, 1 are the row numbers of the data.

With my real data I got

                                             DSN  ... AllocSpace
0   SYS1.VVDS.VA4RES1                             ...       1660
1   SYS1.VTOCIX.A4RES1                            ...        830
2   CBC.SCCNCMP                                   ...     241043
3   CBC.SCLBDLL                                   ...        885
4   CBC.SCLBDLL2                                  ...        996
..                                           ...  ...        ...
93  SYS1.SERBLPA                                  ...        498
94  SYS1.SERBMENU                                 ...        277
95  SYS1.SERBPENU                                 ...      17652
96  SYS1.SERBT                                    ...        885
97  SYS1.SERBTENU                                 ...        332

[98 rows x 7 columns]

The data was formatted to match my window size. With a larger window I got more columns.

You can change this by using

			
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

Which columns are displayed ?

Rather than all of the columns being displayed you can select which columns are displayed.

You can tell from the data you passed to pandas, or use the command

print(list(opd.columns.values))

This displays the values of the column names, as a list.

To display the columns you specify use

print(opd[["DSN","VOLSER","ExpDate","CrDate","LastRef","63bit alloc space KB", "AllocSpace"]])

You can say display all but the specified columns

print(opd.loc[:, ~opd.columns.isin(["ExpDate","CrDate","LastRef"])])

Select which rows you want displayed

print(opd.loc[opd["VOLSER"].str.startswith("A4"),["DSN","VOLSER",]])

print(opd.loc[opd["DSN"].str.startswith("SYS1."),["DSN","VOLSER",]])

gave

                                             DSN  VOLSER  
0   SYS1.VVDS.VA4RES1                             A4RES1  
1   SYS1.VTOCIX.A4RES1                            A4RES1  
12  SYS1.ADFMAC1                                  A4RES1  
13  SYS1.CBRDBRM                                  A4RES1  
14  SYS1.CMDLIB                                   A4RES1  
..                                           ...     ...  
93  SYS1.SERBLPA                                  A4RES1  
94  SYS1.SERBMENU                                 A4RES1  
95  SYS1.SERBPENU                                 A4RES1  
96  SYS1.SERBT                                    A4RES1  
97  SYS1.SERBTENU                                 A4RES1  

[88 rows x 2 columns]

From this we can see 88 (out of 97) rows were displayed. Row 0, 1 , 12, 13, but not rows 2, 3, …

What does .loc do?

My interpretation of this (which I haven’t seen documented)

If there is one parameter, this is a list of the columns you want.

If there are two parameters, the second is the list of the columns you want displayed. The first column is conceptually a list of True or False, with one value per row, saying if the row should be selected or not. So for

print(opd.loc[opd["VOLSER"].str.startswith("A4"),["DSN","VOLSER",]])

opd[“VOLSER”].str.startswith(“A4”)

says take the column called VOLSER, convert it to a string. If it starts with the string “A4” then return True, else return False. This returns a list of one entry per row.
The opd.loc[opd[“VOLSER”].str.startswith(“A4”),…) then selects the rows.

You can select rows and columns

print(opd.loc[opd["VOLSER"].str.startswith("A4"),["DSN","VOLSER","63bit alloc space KB",]])

You can process the data, such as sort

The following statements extracts columns from the original data, sorts the data, and creates a new data frame. The new data frame is printed.

sdata= opd[["DSN","VOLSER","63bit alloc space KB",]].sort_values(by=["63bit alloc space KB","DSN"], ascending=False)
print(sdata)

This gave

                                             DSN  VOLSER  63bit alloc space KB
2   CBC.SCCNCMP                                   A4RES1                241043
35  SYS1.MACLIB                                   A4RES1                210664
36  SYS1.LINKLIB                                  A4RES1                166008
90  SYS1.SEEQINST                                 A4RES1                103534
42  SYS1.SAMPLIB                                  A4RES1                 82617
..                                           ...     ...                   ...
62  SYS1.SBPXTENU                                 A4RES1                    55
51  SYS1.SBDTMSG                                  A4RES1                    55
45  SYS1.SBDTCMD                                  A4RES1                    55
12  SYS1.ADFMAC1                                  A4RES1                    55
6   FFST.SEPWMOD3                                 A4RES1                    55

[98 rows x 3 columns]

Showing that all the rows, and all the (three) columns which had been copied to the sdata data frame.

You can use

>>> df.sort_values(
...     by=["FirstSortCol", "SecondSortCol],
...     ascending=[True, False]
...     )[["datacol", "FirstSortCol", "moreData8","SecondSortCol"]]

Renaming columns

pf2 = pdSubset.rename({'DSN': 'Dataset Name', '31AllocInKB': 'SpaceInKB'}, axis='columns')

This renames the column DSN to Dataset Name, and 31AllocInKB to SpaceIn KB, and leaves the remaining columns unchanged.

Saving data

Reading an external file and processing the data into Python arrarys took an order of magnitude longer than processing it Pandas.

You should consider a two step approach to looking at data

Extract the data and exported it in an access format, such as Pickle or JSON. While getting this part working, use only a few rows of data. Once it works, you can process all of the data.
Do the analysis using the exported data.

Export the data

You should consider externalising the data in JSON or pickles format for example

# write out the data to a file
fPickle = open('pickledata', 'wb')    
    # source, destination
pickle.dump(opd, fPickle)                    
fPickle.close()

Import and do the analysis


# and read it in
fPickle = open('pickledata', 'rb')    
opd = pickle.load(fPickle)
fPickle.close()
print(odp)

Processing multiple data sources as one

If you have multiple sets of data, for example for Monday, Tuesday, Wednesday, etc you can use

week =  pd.concat(monday,tuesday,wednesday,thursday,friday)

Processing data within fields

Within my data, I have a field with information like

                   DSN  VOLSER           FormatType
0    SYS1.VVDS.VC4RES1  C4RES1                   []
1   SYS1.VTOCIX.C4RES1  C4RES1              [Fixed]
2          CBC.SCCNCMP  C4RES1    [Fixed, Variable]
3          CBC.SCLBDLL  C4RES1    [Fixed, Variable]
4         CBC.SCLBDLL2  C4RES1    [Fixed, Variable]

Where the data under FormatType is a list. You can reference elements in a list.

For example

x =  data.FormatType.apply(lambda x: 'Variable' in x)
print(x)

gives

0     False
1     False
2      True
3      True
4      True

The command

print(data.loc[ data.FormatType.apply(lambda x: 'Blocked' in x)])

gives

              DSN  VOLSER           FormatType
2     CBC.SCCNCMP  C4RES1    [Fixed, Variable]
3     CBC.SCLBDLL  C4RES1    [Fixed, Variable]
4    CBC.SCLBDLL2  C4RES1    [Fixed, Variable]

Basic operations on columns

You can do basic operations on columns such as

print(dataset[["CountIO","CacheHits"]].sum())

The sum() (and count() etc) functions add up the specified columns.

This gave

[361 rows x 10 columns]
CountIO      74667.0
CacheHits     1731.0
dtype: float64

An operation like

print(dataset.sum())

Would have totalled all the columns, including some which are meaningless, for example, maximum value found.

Doing aggregation, count, sum, maximum, minimum etc.

Simple aggregation

You can aggregate data

# Extract just the fields of interest
d = dataset[["DSN","CountIO","CacheHits"]]
print(d.groupby("DSN").sum())

Gave

                                        CountIO  CacheHits
DSN                                                       
ADCD.Z31B.PARMLIB                          68.0       60.0
ADCD.Z31B.PROCLIB                          66.0       66.0
ADCD.Z31B.VTAMLST                         141.0      141.0
COLIN.TCPPARMS                              4.0        4.0
FEU.Z31B.PARMLIB                            4.0        0.0
IXGLOGR.ATR.S0W1.RM.DATA.A0000000.DATA      4.0        0.0
SYS1.DAE                                    0.0        0.0
SYS1.DBBLIB                               974.0      932.0

More complex aggregation

The .agg() gives you much more control as to what, and how you want to process data.

print(d.groupby("DSN").agg({'DSN' : ['count'], 'CountIO' : ['sum','max'],"CacheHits": ["sum"]}))

gave

                                         DSN  CountIO          CacheHits
                                       count      sum      max       sum
DSN                                                                     
ADCD.Z31B.PARMLIB                         19     68.0      7.0      60.0
ADCD.Z31B.PROCLIB                         30     66.0      3.0      66.0
ADCD.Z31B.VTAMLST                          6    141.0     41.0     141.0
COLIN.TCPPARMS                             2      4.0      3.0       4.0
FEU.Z31B.PARMLIB                           1      4.0      4.0       0.0
IXGLOGR.ATR.S0W1.RM.DATA.A0000000.DATA     4      4.0      1.0       0.0
SYS1.DAE                                   1      0.0      NaN       0.0
SYS1.DBBLIB                                2    974.0    932.0     932.0

Notes:

The columns are not in the order I specified. It is hard to see which field Max applies to
There is a Not a Number (Nan) in one of the value. You need to allow for this.
In the simple case using .sum() by default it tries to sum all of the columns. Using .agg you can specify which columns you want to process

Updating some values

I want to update values in one column depending on other columns.

For example for small files, make the space 9999

for i, row in pf2.iterrows():
    if row["SpaceInKB"] < 1000:
        pf2.at[i,"SpaceInKB"] = 9999

Display all fields for one row over multiple lines

print(pfz.iloc[1])

gave me

Dataset Name          CATALOG.VS01.CICS63                         
VolSer                                                      CICS63
SpaceInKB                                                     9960
EncryptionKeyLabel                                            NULL
EncrytionType                                               0xffff
Name: 1, dtype: object

Processing a dataset is easy in Python.

I’ve been doing more with Python on z/OS, and have spent time using datasets. With the pyzfile package this is very easy! (Before this you had to copy a data set to a file in Unix Services).

You can do all the things you normally expect to do: open, close, read, write, locate, info etc.

from pyzfile import * 
def readfile(): 
   try: 
        with ZFile("//'COLIN.DCOLLECT.OUT'", "rb,type=record,noseek") as file: 
            for kw,value in file.info().items():            
               print(kw,":",value)
            for rec in file: 
               yield  rec 
    except ZFileError as e: 
        print(e,file=sys.stderr) 
                                                                                           
def reader(nrecords): 
    nth = 0                                                                            
    for line in readfile(): 
        nth += 1 
        if  nth > nrecords: 
            break 
        if nth % 100 == 99:
            print("Progress:",nth+1,file=sys.stderr)
        ## Do something..

It gave

access_method : QSAM
blksize : 27998
device : DISK
dsname : COLIN.DCOLLECT.OUT
dsorgConcat : False
dsorgHFS : False
dsorgHiper : False
dsorgMem : False
dsorgPDE : False
dsorgPDSdir : False
dsorgPDSmem : False
dsorgPO : False
dsorgPS : True
dsorgTemp : False
dsorgVSAM : False
maxreclen : 1020
mode : {'append': False, 'read': True, 'update': False, 'write': False}
noseek_to_seek : NOSWITCH
openmode : RECORD
recfmASA : False
recfmB : True
recfmBlk : True
recfmF : False
recfmM : False
recfmS : False
recfmU : False
recfmV : True
vsamRKP : 0
vsamRLS : NORLS
vsamkeylen : 0
vsamtype : NOTVSAM
Progress: 100
...

Where the fields are taken from fldata().

Great stuff!

Of course once you’ve got a record, doing something with it may be harder, because , for example Python is ASCII, and you’ll need to convert any character strings from EBCDIC to ASCII.