What’s going on in my program in Unix Services?

On Linux, starting a Python program is subsecond. On z/OS, running on zD&T (so emulated hardware), it takes about 2 seconds. I wondered what was causing this – is my ZFS file system slow?

I used the Unix command

bpxtrace -o /tmp/trace -f format -c  python ac.py

to capture a trace.

This produced output like

       PID ASID TCB    Local time      System call           Additional trace
- - - - - - - - - - - - - - - - - - - - - - - - - -
65589 0049 8FB2F8 08:27:13.722080 Call open parms: 0000004D
65589 0049 8FB2F8 08:27:13.722714 Exit open rv=00000004
65589 0049 8FB2F8 08:27:13.722741 Call fstat parms: 00000004
65589 0049 8FB2F8 08:27:13.722785 Exit fstat rv=00000000
65589 0049 8FB2F8 08:27:13.722817 Call lseek parms: 00000004
65589 0049 8FB2F8 08:27:13.722824 Exit lseek rv=00000000
65589 0049 8FB2F8 08:27:13.722836 Call lseek parms: 00000004
65589 0049 8FB2F8 08:27:13.722883 Exit lseek rv=00000000
65589 0049 8FB2F8 08:27:13.722896 Call lseek parms: 00000004
65589 0049 8FB2F8 08:27:13.722901 Exit lseek rv=00000000

This is ok, but I want to know how long the calls took.

I wrote an ISPF Rexx script

/* REXX */
/*
exec to calculate the elapsed time of each system call in the trace
*/
ADDRESS ISPEXEC
'ISREDIT MACRO (lines) '
"ISREDIT (f) = LINENUM .ZLAST"
sum = 0
do j = 3 by 2 to f
  k = j + 1
  "ISREDIT (bef) = LINE " j
  "ISREDIT (aft) = LINE " k
  parse var bef . 27 mm 29 . 30 ss 39 . 40 what
  parse var aft . 27 am 29 . 30 as 39
  before = 60 * mm + ss
  after  = 60 * am + as
  sum = sum + (after - before)
  delta = format(after - before,1,6)
  string = "== "delta what
  "ISREDIT LINE (k) = (string)"
end
say sum
exit

Running the REXX script produced output in the file like

65589 0049 8FB2F8 08:27:13.722080 Call open
== 0.000630 Call open
65589 0049 8FB2F8 08:27:13.722741 Call fstat
== 0.000050 Call fstat
65589 0049 8FB2F8 08:27:13.722817 Call lseek
== 0.000000 Call lseek

Using the ISPF commands

  • X all
  • f "==" all
  • del all x
  • sort 1 11

This gave

== 0.000000 Call lseek 
== 0.000000 Call lseek
...
0.000820 Call close parms: 00000005 00000000 00000000 05FC0119
0.001450 Call cond_timed_wait parms: 00000000 000F4240 00000001 00000000 20861040
0.002450 Call loadhfs parms: 00000048 /u/tmp/zowet/colin/envz/lib/python3...
0.004200 Call loadhfs parms: 00000008 CELQDCPP 00000000 0000010C 61E9F3F1
0.004310 Call loadhfs parms: 00000007 CXXRT64 00000000 0000010C 61E9F3F1
0.034780 Call mvsprocclp parms: 00000100 00000000 00000000 00000000
0.042970 Call mvsprocclp parms: 00000100 2081D1D8 2086CC75 00000000

There were nearly 500 lines in the output file. 400 entries were 100 microseconds or less, and 6 entries took longer than 1 millisecond.
My trace covered a duration of 1.1 seconds. Adding up the individual times gave 0.13 seconds, so it looks like the delays are not caused by the file system, and I need to look elsewhere.

From data to reports, missing the potholes

I’ve been doing work with datasets on z/OS to produce reports. These range from SMF data to DCOLLECT data on datasets and SMS data.

It took a while to “get it right”, because I made some poor decisions as to how to process the data, and some of my processing was much more complex than it needed to be. It was easiest to start again!

I’ve been working with Python and Python tools, and other tools available on the platform. See Pandas 102, notes on using pandas (below).

My current approach is to use some Python code to read a record, parse the record into a dictionary (dict), then add the dict to a list of records. I then either pass the list of dicts to Pandas to display, or externalise the data and have a second Python program read the externalised data and do the Pandas processing.

Reading the data

The data is usually in data sets rather than files in Unix Services. You can copy a data set to a file, but it is easier to use the Python package pyzfile to read data sets directly.

import sys
from pyzfile import *

def read_records():
    try:
        with ZFile("//'COLIN.DCOLLECT.OUT'", "rb,type=record,noseek") as file:
            for rec in file:
                # l = len(rec)
                yield rec
    except ZFileError as e:
        print(e, file=sys.stderr)

Often a data source will contain a mixture of record types; for example, a dump of SMF data sets may contain many different record types and subtypes.

You need to consider if you want to process all record types in one pass, or process one record type in one run, and a different record type in a different run.
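
For example, if you process several record types in one pass, you could dispatch on the record type field. A minimal sketch, assuming hypothetical parser functions (parse_type_d, parse_type_v), an illustrative offset for the type field, and a records list to accumulate into:

# dispatch on record type in a single pass
parsers = {"D": parse_type_d, "V": parse_type_v}
rtype = rec[4:6].decode('cp500').strip()   # EBCDIC type field -> "D", "V", ...
parser = parsers.get(rtype)
if parser is not None:                     # quietly skip types we do not handle
    records.append(parser(rec))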

Processing the data

You will normally have a mapping of the layout of the data in a record. Often there is a mix of record types; you need to decide which record types you process and which you ignore.

Field names

Some of the field names in a record are cryptic; they were created when field names could only be 8 characters or less. For example DCDDSNAM. This stands for DCOLLECT record type D, field name DS NAMe. You need to decide what to call the field. Do you name it DCDDSNAM, and tell the reader to go and look in the documentation to understand the field names in the report, or do you try to add value and just call it DSN, or DataSetName? You cannot guess some fields, such as DCDVSAMI, which is "VSAM Inconsistency".

You also need to consider the printed report. If you have a one-character field in the record, and a field name which is 20 characters long, then by default the printed column will be 20 characters wide, and so waste space in the report. If the field is rarely used you could call it BF1, for Boring Field 1.

Character strings

Python works in ASCII, and strings need to be in ASCII to be printable, so you will need to convert character data from EBCDIC to ASCII.

You can use a slice to extract data from a record. For example, to extract a string and convert it:

DSN = record[20:63].decode('cp500').strip()

Integers

Integers – you will need to convert these to internal format. I found the Python struct module very good for this. You give it a format string of conversion characters (integer, integer, …) and it returns a tuple of the data. If you are processing the data on a different platform, you may need to worry about big-endian and little-endian conversion of numbers.
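
A minimal sketch using the struct module; the format string and offsets are illustrative, not from a real record map, and rec is a bytes record read as above:

import struct

# ">HHI" = big-endian: two 2-byte unsigned integers then one 4-byte unsigned integer
reclen, rectype, count = struct.unpack(">HHI", rec[0:8])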

Strange integers

Some records have units like hundredths of a second. You may want to convert these to a float:

float_value = float(input_value)/100

Packed numbers

Packed numbers are a representation of a number in "decimal" format, often used for dates. For example, a yyyyddd date for year 2025, day 5 is 0x2025005F, where the F is a sign digit. You cannot just print it as an integer (it comes out as 539295839).
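
A minimal sketch of converting such a packed date, assuming a positive (F sign) value held in four bytes:

def packed_date_to_int(data):
    # b'\x20\x25\x00\x5f' -> "2025005f" -> drop the sign nibble -> 2025005
    return int(data.hex()[:-1])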

Bit masks

Bit masks are very common, for example there is a 1 byte field DCVFLAG1 with values:

  • DCVUSPVT 0x20 Private
  • DCVUSPUB 0x10 Public
  • DCVUSSTG 0x08 Storage
  • DCVSHRDS 0x04 Device is sharable

If the value of the field is 0x14, what do you return? I would create a field Flag1 with the value of a list: ["Public", "Shareable"]. If all the bits were off, this would return an empty list []. It would be easy to return ["DCVUSPUB", "DCVSHRDS"], or just display the hex value 14 (or 0x14), but this makes it hard for the people reading the reports to interpret the data.
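
A minimal sketch of decoding the flag byte into a list of names, using the values above (the display names are my choice):

DCVFLAG1_BITS = {0x20: "Private", 0x10: "Public", 0x08: "Storage", 0x04: "Shareable"}

def decode_dcvflag1(flag_byte):
    # 0x14 -> ["Public", "Shareable"]; 0x00 -> []
    return [name for bit, name in DCVFLAG1_BITS.items() if flag_byte & bit]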

Triplets

SMF records contain triplets. These are defined by [offset to the start of the data, length of each section, count of sections] within the record.

For example, in the SMF 30 record there are many triplet sections. There is one for "usage data", used in usage-based pricing. There can be zero or more sections, each containing

  • Product owner
  • Product name
  • TCB time used in hundredths of a second

How are you going to process this?
The SMF record has three fields describing the usage data triplet:

  • SMF30UDO Offset to Usage Data section in SMF 30 record
  • SMF30UDL Length of each Usage Data section in SMF 30 record
  • SMF30UDN Number of Usage Data sections in SMF 30 record

I would create a variable UsageData = [{"ProdOwner": …, "ProdName": …, "TCBTime": …}, {"ProdOwner": …, "ProdName": …, "TCBTime": …}, ]

and convert TCBTime from an integer representing hundredths of a second to a floating point number.
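
A minimal sketch of walking the usage triplet; the field offsets within each section are illustrative, not taken from the real SMF 30 mapping:

import struct

def usage_sections(record, offset, length, number):
    # "number" sections, each "length" bytes long, starting at "offset" in the record
    sections = []
    for i in range(number):
        s = record[offset + i * length: offset + (i + 1) * length]
        sections.append({
            "ProdOwner": s[0:16].decode('cp500').strip(),    # illustrative offsets
            "ProdName":  s[16:32].decode('cp500').strip(),
            "TCBTime":   struct.unpack(">I", s[32:36])[0] / 100.0,
        })
    return sections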

These triplets make printing the record a challenge. You could decide to

  • omit this data
  • summarise the data – and provide only a sum of the TCBTime value
  • give the data as a list of dicts, then have a Pandas step to copy only the fields you need for your reports.

For this usage data, I may want a report showing which jobs used which product, and how much CPU the job used in that product. Although I may capture the data as a list of products, I could extract the data and create another data record with

  • jobname1, product1, … CPU used1
  • jobname1, product2, … CPU used2
  • jobname2, product1, … CPU used1
  • jobname2, product3, … CPU used3

and remove the product data from the original data record.

Do you want all of the fields?

You may want to ignore fields such as reserved values, record length values, record type, and any fields you are not interested in. The record length tends to be the first field, and it is usually not interesting when generating default reports.

How to handle a different length record?

The format of many records changes with new releases, typically adding new fields.

You need to be able to handle records from the previous release, where the record is shorter. For example, do not add the missing fields to your dict, or add them with a value of None.
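
A minimal sketch, with an illustrative offset and length for a field added in a newer release:

import struct

# only extract the newer field if this record is long enough to contain it
if len(rec) >= 68:                                  # illustrative length
    data["NewField"] = struct.unpack(">I", rec[64:68])[0]
else:
    data["NewField"] = None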

Now I’ve got a record – now what?

Once you have your record, and have created a dict from the contents {"fieldname1": value1, "fieldname2": value2, …}, you could just add it to the list to be passed to Pandas. It is not always that simple.

I found that some records need post processing before saving.

Calculations

For a DCOLLECT record, there is a field which says

DCVFRESP: Free Space on Volume (in KB when DCVCYLMG is set to 0 or in MB when DCVCYLMG is set to 1)

You need to check bit DCVCYLMG and have logic like

if DCVCYLMG == 1:
    data["FreeSpVolKB"] = data["FreeSpVolKB"] * 1024

Adding or deleting fields

For some fields I did some calculations to simplify later processing. For example, I wanted the average time when I had total_time and count.

I created average_time = total_time / count, added this field, and deleted the total_time and count fields.
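
A minimal sketch, assuming the record dict is called data (with a guard against a zero count):

data["average_time"] = data["total_time"] / data["count"] if data["count"] else 0.0
del data["total_time"]
del data["count"]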

Error handling

I found some records have an error flag, for example “Error calculating volume capacity”. You need to decide what to do.

  • Do you include them, and risk that the calculations/display of volume capacity might be wrong?
  • Do you report the record during the collection stage, and not include it in the overall data (as in the sketch below)?
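
A minimal sketch of the second option, assuming a hypothetical error flag field (VolCapacityError) in the parsed dict:

import sys

# report and skip records flagged in error, keep the rest
if data.get("VolCapacityError"):        # hypothetical flag name
    print("Skipping record with capacity error:", data.get("VOLSER"), file=sys.stderr)
else:
    grand_data.append(data)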

How do you accumulate the data: dicts or lists?

When using Pandas you can build each record as a dict of values {"kw1": "v1", "kw2": "v2"}, then build a list of dicts [{}, {}…],

or have each "column" hold a list of values {"Jobname": ["job1", "job2"…], "CPUUsed": [99, 101…] … }. As you process each field you append its value to the appropriate "column" list.

import pandas as pd

# a dict of lists
datad = {"dsn": ["ABC", "DEF"],
         "volser": ["SYSRES", "USER02"]}
datad["dsn"].append("GHI")
datad["volser"].append("OLDRES")

pdd = pd.DataFrame(datad)


# a list of dicts
dictl = [
    {"dsn": "ABC", "volser": "SYSRES"},
    {"dsn": "DEF", "volser": "USER02S"}]
newdict = {"dsn": "GHI", "volser": "OLDRES"}

dictl.append(newdict)

pdl = pd.DataFrame.from_records(dictl)

I think it is better to capture your data in a dict, then add the dict to the list of records.

For example with

DCVFRESP: Free Space on Volume (in KB when DCVCYLMG is set to 0 or in MB when DCVCYLMG is set to 1)

If you use a dict to collect the data, you can then easily massage the values, before adding the dict to the list.

if DCVCYLMG == 1:
    data["FreeSpVolKB"] = data["FreeSpVolKB"] * 1024

grand_data.append(data)

If you try to do this using “column” values it gets really messy trying to do a similar calculation.

Using the data

It took a long time to process the data set and create the Python data, so I found it quicker overall to process the data set once and externalise the data using pickle or JSON, then have different Python programs which read the data in and process it. For example:

  • Creating a new data structure using just the columns I was interested in.
  • Filtering which rows I wanted.
  • Saving the result (a sketch follows below).
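
A minimal sketch of the second step, assuming the first step pickled a list of dicts to a file called dcollect.pkl (an assumed name) with DSN, VOLSER and AllocSpace fields; the filter value is illustrative:

import pickle
import pandas as pd

with open('dcollect.pkl', 'rb') as f:
    records = pickle.load(f)

df = pd.DataFrame.from_records(records)
small = df[["DSN", "VOLSER", "AllocSpace"]]   # just the columns of interest
big = small[small["AllocSpace"] > 100000]     # filter which rows I want
big.to_pickle('bigdatasets.pkl')              # save it for the reporting programs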

Pandas 102, notes on using pandas

Pandas is a great tool for displaying data from Python. You give it arrays of data, and it can display, summarise, group, print and plot them. It is used for everything from the simplest data up to data analysts processing megabytes of data.

There is a lot of good information about getting started with Pandas, and about how you can do advanced things with it. I did the Pandas 101 level of reading, but struggled with the next step, so my notes for the 102 level are below. Knowing that something can be done means you can go and look for it. If you look but cannot find it, it may be that you are using the wrong search arguments, or there is no information on it.

Working with data

I’ve been working with "flat files" on z/OS, for example the output of DCOLLECT, which is information about data sets etc. from SMS.

One lesson I learned is that you should isolate the extraction from the processing (except for trivial amounts of data). Extracting data from flat files can be expensive and take a long time, for example it may include conversion from EBCDIC to ASCII. It is better to capture the data from the flat file in Python variables, then write the data to disk using JSON or pickle (Python object serialisation). As a separate step, read the data into memory from your saved file, then do your data processing work with pandas or other tools.

Feeding data into Pandas

The work I’ve done has been two-dimensional, rows and columns; you can have multi-dimensional data.

You can use a list of dictionaries (dicts), or a dict of lists:

import pandas as pd

# a dict of lists
datad = {"dsn": ["ABC", "DEF"],
         "volser": ["SYSRES", "USER02"]}
pdd = pd.DataFrame(datad)

# a list of dicts
datal = [{"dsn": "ABC", "volser": "SYSRES"},
         {"dsn": "DEF", "volser": "USER02S"},
         ]

pdl = pd.DataFrame.from_records(datal)

Processing data like pdd = pd.DataFrame(datad) creates a Pandas data frame. You take actions on this data frame. You can create other data frames from an original data frame, for example with a subset of the rows and columns.

I was processing a large data set, and found it easiest to create a dict for each row of data, and then accumulate the rows in a list. Before I used Pandas, I had just printed out each row. I do not know which performs better. Someone else used a dict of lists, and appended each row’s data to the "dsn" or "volser" list.

What can you do with it?

The first thing is to print it. Once the data is in Pandas you can use either of pdd or pdl above.

print(pdd)

gave

   dsn   volser
0  ABC   SYSRES
1  DEF  USER02S

Where the 0, 1 are the row numbers of the data.

With my real data I got

                   DSN  ...  AllocSpace
0    SYS1.VVDS.VA4RES1  ...        1660
1   SYS1.VTOCIX.A4RES1  ...         830
2          CBC.SCCNCMP  ...      241043
3          CBC.SCLBDLL  ...         885
4         CBC.SCLBDLL2  ...         996
..                 ...  ...         ...
93        SYS1.SERBLPA  ...         498
94       SYS1.SERBMENU  ...         277
95       SYS1.SERBPENU  ...       17652
96          SYS1.SERBT  ...         885
97       SYS1.SERBTENU  ...         332

[98 rows x 7 columns]

The data was formatted to match my window size. With a larger window I got more columns.

You can change this by using

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

Which columns are displayed?

Rather than all of the columns being displayed you can select which columns are displayed.

You can tell the column names from the data you passed to Pandas, or use the command

print(list(opd.columns.values))

This displays the values of the column names, as a list.

To display the columns you specify use

print(opd[["DSN","VOLSER","ExpDate","CrDate","LastRef","63bit alloc space KB", "AllocSpace"]])

You can also display all but the specified columns:

print(opd.loc[:, ~opd.columns.isin(["ExpDate","CrDate","LastRef"])])

Select which rows you want displayed

print(opd.loc[opd["VOLSER"].str.startswith("A4"),["DSN","VOLSER",]])

or

print(opd.loc[opd["DSN"].str.startswith("SYS1."),["DSN","VOLSER",]])

gave

                   DSN  VOLSER
0    SYS1.VVDS.VA4RES1  A4RES1
1   SYS1.VTOCIX.A4RES1  A4RES1
12        SYS1.ADFMAC1  A4RES1
13        SYS1.CBRDBRM  A4RES1
14         SYS1.CMDLIB  A4RES1
..                 ...     ...
93        SYS1.SERBLPA  A4RES1
94       SYS1.SERBMENU  A4RES1
95       SYS1.SERBPENU  A4RES1
96          SYS1.SERBT  A4RES1
97       SYS1.SERBTENU  A4RES1

[88 rows x 2 columns]

From this we can see that 88 (out of 98) rows were displayed: rows 0, 1, 12, 13, …, but not rows 2, 3, …

What does .loc do?

My interpretation of this (which I haven’t seen documented) is:

If there is one parameter, this is a list of the columns you want.

If there are two parameters, the second is the list of the columns you want displayed. The first parameter is conceptually a list of True or False values, with one value per row, saying if the row should be selected or not. So for

print(opd.loc[opd["VOLSER"].str.startswith("A4"),["DSN","VOLSER",]])

opd["VOLSER"].str.startswith("A4")

says take the column called VOLSER, treat it as strings, and for each row return True if it starts with the string "A4", else False. This returns a list with one entry per row.
The opd.loc[opd["VOLSER"].str.startswith("A4"), …] then selects those rows.

You can select rows and columns

print(opd.loc[opd["VOLSER"].str.startswith("A4"),["DSN","VOLSER","63bit alloc space KB",]])

You can process the data, such as sort

The following statement extracts columns from the original data, sorts the data, and creates a new data frame. The new data frame is then printed.

sdata= opd[["DSN","VOLSER","63bit alloc space KB",]].sort_values(by=["63bit alloc space KB","DSN"], ascending=False)
print(sdata)

This gave

                   DSN  VOLSER  63bit alloc space KB
2          CBC.SCCNCMP  A4RES1                241043
35         SYS1.MACLIB  A4RES1                210664
36        SYS1.LINKLIB  A4RES1                166008
90       SYS1.SEEQINST  A4RES1                103534
42        SYS1.SAMPLIB  A4RES1                 82617
..                 ...     ...                   ...
62       SYS1.SBPXTENU  A4RES1                    55
51        SYS1.SBDTMSG  A4RES1                    55
45        SYS1.SBDTCMD  A4RES1                    55
12        SYS1.ADFMAC1  A4RES1                    55
6        FFST.SEPWMOD3  A4RES1                    55

[98 rows x 3 columns]

This shows all the rows, and all three columns, which had been copied to the sdata data frame.

Saving data

Reading an external file and processing the data into Python arrays took an order of magnitude longer than processing it in Pandas.

You should consider a two-step approach to looking at data:

  • Extract the data and export it in an accessible format, such as pickle or JSON. While getting this part working, use only a few rows of data. Once it works, you can process all of the data.
  • Do the analysis using the exported data.

Export the data

You should consider externalising the data in JSON or pickle format, for example

import pickle

# write out the data to a file
fPickle = open('pickledata', 'wb')
# source, destination
pickle.dump(opd, fPickle)
fPickle.close()

Import and do the analysis


# and read it in
fPickle = open('pickledata', 'rb')
opd = pickle.load(fPickle)
fPickle.close()
print(opd)

Processing multiple data sources as one

If you have multiple sets of data, for example for Monday, Tuesday, Wednesday, etc you can use

week = pd.concat([monday, tuesday, wednesday, thursday, friday])

Processing data within fields

Within my data, I have a field with information like

                  DSN  VOLSER         FormatType
0   SYS1.VVDS.VC4RES1  C4RES1                 []
1  SYS1.VTOCIX.C4RES1  C4RES1            [Fixed]
2         CBC.SCCNCMP  C4RES1  [Fixed, Variable]
3         CBC.SCLBDLL  C4RES1  [Fixed, Variable]
4        CBC.SCLBDLL2  C4RES1  [Fixed, Variable]

Where the data under FormatType is a list. You can reference elements in a list.

For example

x =  data.FormatType.apply(lambda x: 'Variable' in x)
print(x)

gives

0    False
1    False
2     True
3     True
4     True

The command

print(data.loc[data.FormatType.apply(lambda x: 'Variable' in x)])

gives

            DSN  VOLSER         FormatType
2   CBC.SCCNCMP  C4RES1  [Fixed, Variable]
3   CBC.SCLBDLL  C4RES1  [Fixed, Variable]
4  CBC.SCLBDLL2  C4RES1  [Fixed, Variable]

Basic operations on columns

You can do basic operations on columns such as

print(dataset[["CountIO","CacheHits"]].sum())

The sum() function adds up the specified columns (count(), etc. work in a similar way).

This gave

[361 rows x 10 columns]
CountIO      74667.0
CacheHits     1731.0
dtype: float64

An operation like

print(dataset.sum())

would have totalled all the columns, including some for which a total is meaningless, for example a "maximum value found" column.

Doing aggregation, count, sum, maximum, minimum etc.

Simple aggregation

You can aggregate data

# Extract just the fields of interest
d = dataset[["DSN","CountIO","CacheHits"]]
print(d.groupby("DSN").sum())

Gave

                                        CountIO  CacheHits
DSN
ADCD.Z31B.PARMLIB                          68.0       60.0
ADCD.Z31B.PROCLIB                          66.0       66.0
ADCD.Z31B.VTAMLST                         141.0      141.0
COLIN.TCPPARMS                              4.0        4.0
FEU.Z31B.PARMLIB                            4.0        0.0
IXGLOGR.ATR.S0W1.RM.DATA.A0000000.DATA      4.0        0.0
SYS1.DAE                                    0.0        0.0
SYS1.DBBLIB                               974.0      932.0

More complex aggregation

The .agg() function gives you much more control over what you process, and how.

print(d.groupby("DSN").agg({'DSN' : ['count'], 'CountIO' : ['sum','max'],"CacheHits": ["sum"]}))

gave

                                         DSN  CountIO         CacheHits
                                       count      sum    max        sum
DSN
ADCD.Z31B.PARMLIB                         19     68.0    7.0       60.0
ADCD.Z31B.PROCLIB                         30     66.0    3.0       66.0
ADCD.Z31B.VTAMLST                          6    141.0   41.0      141.0
COLIN.TCPPARMS                             2      4.0    3.0        4.0
FEU.Z31B.PARMLIB                           1      4.0    4.0        0.0
IXGLOGR.ATR.S0W1.RM.DATA.A0000000.DATA     4      4.0    1.0        0.0
SYS1.DAE                                   1      0.0    NaN        0.0
SYS1.DBBLIB                                2    974.0  932.0      932.0

Notes:

  • The columns are not in the order I specified, and it is hard to see which field the max applies to.
  • There is a Not a Number (NaN) in one of the values. You need to allow for this (see the sketch after this list).
  • In the simple case, .sum() by default tries to sum all of the columns. Using .agg() you can specify which columns you want to process.
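
A minimal sketch of allowing for the NaN, by replacing it with zero after the aggregation:

summary = d.groupby("DSN").agg({'DSN': ['count'], 'CountIO': ['sum', 'max'], "CacheHits": ["sum"]})
summary = summary.fillna(0)
print(summary)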

Processing a dataset is easy in Python.

I’ve been doing more with Python on z/OS, and have spent time using datasets. With the pyzfile package this is very easy! (Before this you had to copy a data set to a file in Unix Services).

You can do all the things you normally expect to do: open, close, read, write, locate, info etc.

import sys
from pyzfile import *

def readfile():
    try:
        with ZFile("//'COLIN.DCOLLECT.OUT'", "rb,type=record,noseek") as file:
            for kw, value in file.info().items():
                print(kw, ":", value)
            for rec in file:
                yield rec
    except ZFileError as e:
        print(e, file=sys.stderr)

def reader(nrecords):
    nth = 0
    for line in readfile():
        nth += 1
        if nth > nrecords:
            break
        if nth % 100 == 99:
            print("Progress:", nth + 1, file=sys.stderr)
        ## Do something..

It gave

access_method : QSAM
blksize : 27998
device : DISK
dsname : COLIN.DCOLLECT.OUT
dsorgConcat : False
dsorgHFS : False
dsorgHiper : False
dsorgMem : False
dsorgPDE : False
dsorgPDSdir : False
dsorgPDSmem : False
dsorgPO : False
dsorgPS : True
dsorgTemp : False
dsorgVSAM : False
maxreclen : 1020
mode : {'append': False, 'read': True, 'update': False, 'write': False}
noseek_to_seek : NOSWITCH
openmode : RECORD
recfmASA : False
recfmB : True
recfmBlk : True
recfmF : False
recfmM : False
recfmS : False
recfmU : False
recfmV : True
vsamRKP : 0
vsamRLS : NORLS
vsamkeylen : 0
vsamtype : NOTVSAM
Progress: 100
...

Where the fields are taken from fldata().

Great stuff!

Of course, once you’ve got a record, doing something with it may be harder because, for example, Python works in ASCII, and you’ll need to convert any character strings from EBCDIC to ASCII.

What is using all the space on this file system?

Help! My ZFS has filled up shows how to use the du command to list entries under a specified directory, and display the files using the most space. This may not be what you want, because you could have a second file system mounted over a sub-directory, and you may not want to include this in the output.

To display the size of files on this file system, and not on file systems mounted under it, use the command

find /u -xdev -size +2048 -exec ls -o {} \;|awk '{print  $4, $8 }' |sort -n -r |head -n 20

This command does

  • find
  • /u start in this directory
  • -xdev only files on this same file system – do not (x) go to a different device
  • -size +2048 where the size is greater (+) than 2048 blocks of 512 bytes (1 MiB)
  • -exec ls -o {} \; execute the command ls -o on the name of each file
  • awk '{print $4, $8 }' extract the 4th and 8th fields from the data – the size and file name
  • sort -n -r sort the data, treating it as numbers (-n), in reverse order (-r). It defaults to sorting on the first column
  • head -n 20 display only the first 20 lines of output.

I used this, and found I had many large files I didn’t want. I removed them, and got a lot of disk space back!

0/10 for JCL 101 homework

When I worked with customers, you could often tell which people were not experienced at setting up subsystems and applications.

“My first JCL”

//STEPLIB   DD DSN=ABC120.EG1.SABCAUTH,DISP=SHR   /* USER LIB  */
//          DD DSN=ABC120.SABCANLE,DISP=SHR
//          DD DSN=ABC120.SABCAUTH,DISP=SHR
//          DD DSN=CEE.SCEERUN,DISP=SHR
...
//FILE1     DD DSN=ABC120.EG1.FILE01,DISP=SHR
//FILE2     DD DSN=ABC120.EG1.FILE02,DISP=SHR

Where the FILE datasets contain user data.

All the ABC120.* datasets were shipped on a volume ABCV12. When the system was updated to a newer service level, the volume ABCV12 was refreshed and put on all systems.

What could go wrong?

The first problem – whoops

With the volume ABCV12 being replaced, all the user data is replaced – Whoops.

Solution: You need to keep your libraries and user data separate. Product libraries on ABCV12, and user data on USRxxx. You might want to make the volume for the product libraries (ABCV12) read only.

The second problem – what is this?

Once you have fixed the problem and separated the data onto different volumes, you upgrade to the next version on volume ABCV13.

Now your JCL is

//STEPLIB   DD DSN=ABC130.EG1.SABCAUTH,DISP=SHR   /* USER LIB  */
//          DD DSN=ABC130.SABCANLE,DISP=SHR
//          DD DSN=ABC130.SABCAUTH,DISP=SHR
//          DD DSN=CEE.SCEERUN,DISP=SHR
...
//FILE1     DD DSN=ABC120.EG1.FILE01,DISP=SHR
//FILE2     DD DSN=ABC120.EG1.FILE02,DISP=SHR

People looking at this will be confused, and will ask what release we are running: the libraries say 1.3, but the data sets say 1.2.

Solution: Use a name like ABCUSER.EG1.FILE01 instead of ABC120.EG1.FILE01. These names never change when you migrate to a newer release.

You can enforce which HLQs can use which volumes using HSM rules.

You want to do a test upgrade to the next release – how much work is it!

Over the weekend, you want to test out the next release and go from release 1.3 to 1.4. You look at all your JCL, use SRCHFOR ABC130. and find all the places where you have ABC130 (wow, lots of places). You will have to change the JCL to run the subsystem at the start of the test, run your test, and change it all back ready for production next week. With all the changes you need to be careful not to make a mistake. (And of course all the change request paperwork needs to be approved.)

A better way is to use dataset aliases.

DEFINE ALIAS(NAME(ABC.SABCAUTH) RELATE(ABC120.SABCAUTH))
DEFINE ALIAS(NAME(ABC.SABCANLE) RELATE(ABC120.SABCANLE))
etc

So when you use ABC.SABCAUTH, under the covers it uses ABC120.SABCAUTH.

Your JCL now looks like

//STEPLIB   DD DSN=ABCUSER.EG1.SABCAUTH,DISP=SHR   /* USER LIB  */
//          DD DSN=ABC.SABCANLE,DISP=SHR
//          DD DSN=ABC.SABCAUTH,DISP=SHR
//          DD DSN=CEE.SCEERUN,DISP=SHR
...
//FILE1     DD DSN=ABCUSER.EG1.FILE01,DISP=SHR
//FILE2     DD DSN=ABCUSER.EG1.FILE02,DISP=SHR

You do not need to worry about APF authorising the ABC.SABCAUTH alias. The dataset which is checked is the dataset the alias points to.

To test the next release this weekend, you delete the aliases and define new ones pointing at the new libraries. You do not need to change your JCL libraries. You run your tests, and at the end delete the new aliases and redefine the old ones. The JCL to redefine the aliases will fit on one screen, which is much easier than changing all your JCL libraries, and less error prone. (And someone else can review the changes before you make them.)

An alternative way

You could use a system symbol, for example EGHLQ = ABC120, and refer to it as in //STEPLIB DD DSN=&EGHLQ..SABCAUTH.

Hands up…

If you are guilty of the problems raised in the blog post, you can get round them.

You can implement the alias for the product libraries, and gradually change all references to use the aliases.

Where you have

//FILE1     DD DSN=ABC120.EG1.FILE01,DISP=SHR 
//FILE2     DD DSN=ABC120.EG1.FILE02,DISP=SHR

You can define an alias for these and use DSN=ABCUSER.EG1.FILE01. Once you’ve made the change your friends will appreciate the clarity, and the only people who know about the mess you made are the storage administrators.

In the long term you may be able to fix it by copying the datasets to new ones with the proper name, and deleting the old ones.

Defining dataset aliases is good, but take care

I can define a dataset alias CSQ.SCSQAUTH which points to dataset CSQ920.SCSQAUTH, and use CSQ.SCSQAUTH in my JCL. When I want to change to the next level of code, I just change CSQ.SCSQAUTH to point to CSQ940.SCSQAUTH, and my JCL works unchanged! Everyone should do this.

Background

As part of z/OS catalogs, you can define aliases to keep user information out of the master catalog. For example point a high level qualifier to a catalog

DEFINE ALIAS (NAME(CSQ920) RELATE('USERCAT.Z31B.PRODS'))
DEFINE ALIAS (NAME(CSQ) RELATE('USERCAT.Z31B.PRODS'))

If I now define a dataset CSQ.COLIN, the information about this data set will be stored in the catalog USERCAT.Z31B.PRODS. When the dataset is used, the name is looked up in the master catalog, which says go and use catalog USERCAT.Z31B.PRODS.

Dataset level aliases

I can also define an alias at the data set level. For example CSQ.SCSQAUTH is an alias of CSQ920.SCSQAUTH. I can then use CSQ.SCSQAUTH in my JCL instead of CSQ920.SCSQAUTH.

When the next version of code is available, I can change the alias CSQ.SCSQAUTH to point to CSQ940.SCSQAUTH, and my JCL will work as before. I do not need to go through my JCL libraries replacing the old name with the new. This is great – everyone should use it.

Create the alias using

DEFINE ALIAS(NAME(CSQ.SCSQAUTH) RELATE(CSQ920.CSQ9.SCSQAUTH))

For this to work the data set alias CSQ.SCSQAUTH must be in the same catalog as the data set it references, so both name and target need to be in USERCAT.Z31B.PRODS.

If I use ISPF 3.4 with CSQ.SCSQAUTH it gives volume of *ALIAS. If I browse the dataset it shows data set CSQ920.CSQ9.SCSQAUTH.

You do not need to worry about APF authorisation because controls are on the target data set CSQ920.CSQ9.SCSQAUTH.

What problems did I have?

I had a frustrating hour or so until I got it to work.

I had a different user catalog for the CSQ HLQ.

DEFINE ALIAS (NAME(CSQ920) RELATE('USERCAT.Z31B.PRODS'))
DEFINE ALIAS (NAME(CSQ) RELATE('USERCAT.COLINS'))
DEFINE ALIAS(NAME(CSQ.SCSQAUTH) RELATE(CSQ920.CSQ9.SCSQAUTH))

The above statements worked successfully, but ISPF 3.4 did not show CSQ.SCSQAUTH.

The commands

LISTCAT CATALOG('CATALOG.Z31B.MASTER') ALIAS
LISTCAT CATALOG('USERCAT.COLINS') ALIAS

did not show any entries for CSQ.SCSQAUTH.
If I tried to redefine the data set alias, it said DUPLICATE entry.

I had to use

LISTCAT CATALOG('USERCAT.Z31B.PRODS') ALIAS

and there was my CSQ.SCSQAUTH.

The documentation says

If the entryname in the RELATE parameter is non-VSAM, choose an aliasname in the NAME parameter that resolves to the same catalog as the entryname.

which I missed the first time round.

I had to delete the dataset alias from the catalog for the target dataset

DELETE CSQ.SCSQAUTH  CATALOG('USERCAT.Z31B.PRODS') 

I then deleted the alias for CSQ, redefined it to point to the correct user catalog, redefined the data set alias and it worked.

DELETE    CSQ          ALIAS 
DEFINE ALIAS (NAME(CSQ) RELATE('USERCAT.Z31B.PRODS'))
DEFINE ALIAS(NAME(CSQ.SCSQAUTH) RELATE(CSQ920.CSQ9.SCSQAUTH))

Getting sshfs to work with z/OS

You can "mount" a remote file system as a local directory over sshfs (the ssh file system).

Getting this working was a challenge. I do not know if it was an sftp problem, or a z/OS problem.

The command, from Linux, is

sshfs colin@10.1.1.2: ~/mountpoint

where mountpoint is a local directory, and my z/OS system is on 10.1.1.2

This flows into the SSH daemon (SSHD) on z/OS which handles the handshake and encryption.

For the IBM provided SSHD, the /etc/ssh/sshd_config config file has

Subsystem sftp /usr/lib/ssh/sftp-server 

Where /usr/lib/ssh/sftp-server is the executable which does the work. The IBM-supplied object is a load module. You could replace this with a script or another module.

Once the session has been established you can access the files, as if they were on the local system.

What is running on z/OS?

If you use the ps -ef command it displays

     UID        PID      PPID  CMD
OMVSKERN  50397264  67174474  /usr/sbin/sshd -f /etc/ssh/sshd_config -R
COLIN     67174482  50397264  /usr/sbin/sshd -f /etc/ssh/sshd_config -R
COLIN     50397267  67174482  sh -c /usr/lib/ssh/sftp-server
COLIN     83951719  50397267  /usr/lib/ssh/sftp-server

This shows the calling chain: the first (SSHD) is at the top, and the last, /usr/lib/ssh/sftp-server, is doing the work to process the files.

The shell used depends on the OMVS(PROGRAM()) defined for the userid.

When did sshfs work?

If I had OMVS(PROGRAM('/bin/sh')) then sshfs worked OK, and I could use the files as expected.

If the program was bash or zsh, then the data as seen from Linux was in EBCDIC and so was not usable.

So how do I use zsh or bash?

I got round this problem…

I specified the userid as having OMVS(PROGRAM('/bin/sh')), and switched to the bash shell in the logon script.

If I logon with ssh colin@10.1.1.2 then there are environment variables available in /etc/profile and ~/.profile, such as

SSH_CLIENT="10.1.0.2 44898 22"
SSH_CONNECTION="10.1.0.2 44898 10.1.1.2 22"
SSH_TTY="/dev/ttyp0000"

In my ~/.profile I’ve put (thanks to Kirk Wolf for suggesting the better "if interactive shell" statement)

# for all interactive sessions use the following if
# if [[ $- = *i* ]];
# for sessions coming in over ssh use the following if statement
if [[ ! -z "$SSH_CONNECTION" ]]
then
  # ssh: switch to bash....
  # set -x
  # bash="/usr/lpp/Rocket/rsusr/ported/bin/bash"
  bash="/u/zopen/usr/local/bin/bash"
  echo "shell was $SHELL bash $bash"
  if [[ $SHELL != $bash ]]
  then
    echo "using the bash shell" $bash
    export SHELL="$bash"
    exec "$bash"  # replace the current shell with bash
    # any code after the exec is not executed
  fi
fi

which says: if the $SSH_CONNECTION variable is not the empty string (the session came in over an ssh connection), then invoke $bash, which replaces the current shell with /u/zopen/usr/local/bin/bash.

With this I could use both sshfs for remote file access, and ssh for terminal access.

If there are better ways of doing this, please let me know.

OMVS is the way ahead!

If you have any suggestions in this area – please let me know!

I recently found this article which covers the same ground with more/better explanations.

With lots of development of open source tools for OMVS on z/OS, I thought I would try it out. I’ve been amazed at how good it is. This blog post is a set of one-liners to help people get started and over the initial hump of moving towards this. I’ll add more blog posts as I go further down the path.

My original task was to develop some Python scripts to extract profiles from RACF. I use Ubuntu Linux on my laptop.

I used to use OMVS from ISPF, because I thought the interface through SSH was poor in comparison. I now think that the OMVS interface is limited compared to the SSH interface, because of all of the enhancements and packages available through SSH.

One example is the "less" command, which I use on Linux very frequently. It does not work with ISPF OMVS, but it is available through SSH.

See Setting Up the z/OS UNIX Shell (Correctly & Completely) for an excellent article on moving to OMVS through SSH.

Editing is easy

  • Create a mountpoint on your laptop.
  • use sshfs colin@10.1.1.2: ~/mountpoint
  • use vscode to edit the files. This is a very popular editor/IDE.
  • you have to be careful about tagging the file. Create a file using touch, then use "chtag -t -c ISO8859-1 filename", and then edit it. It is editable in vscode and ISPF (but not at the same time, of course). Yesterday the files needed the tag ISO8859-1; today they only work without the tag (chtag -r newtry.py) – I do not know what has changed!

I used other tools, such as diff, from my laptop against files in my z/OS home directory.

You can install packages like Zowe on z/OS and use vscode to edit files and datasets from lists, to issue z/OS commands, and to look at the spool. This is a heavyweight package, but is very popular. The editing via sshfs is very easy.

Install zopen

zopen is a set of packages ported from open source. It was easy to install.

I used zopen install … to install packages. I used

  • zopen install which; which tells you the full path of a command.
  • zopen install less; less gives a fast display of a file in a terminal, with search capability. It is often faster than an editor/browser. less is a more advanced version of the more command, which lets you page through a file.

Use the bash or zsh shell

In my OMVS userid profile I set PROGRAM(/u/zopen/usr/local/bin/bash). This version of bash has proper key support. For example, delete really does delete characters. For the Rocket port of bash the delete key is a dead key.

If your delete key does not work on the command line

Use the zopen bash shell, PROGRAM(/u/zopen/usr/local/bin/bash) or the zsh shell PROGRAM(/bin/zsh).

Logon

Use a command like ssh colin@10.1.1.2 to get to z/OS. You can configure SSH and transfer a key file to z/OS, so you can logon without a password.

Using the right shell

The default Bourne shell is very back level. You should use bash or zsh.

bash

Use bash from zopen: PROGRAM(/u/zopen/usr/local/bin/bash) in your RACF profile. The zopen bash has more function than Rocket’s bash – and the delete key works as expected.

zsh

Many people recommend zsh. Use PROGRAM(/bin/zsh) in your RACF profile to use it. See Oh My Zsh or: How I Learned to Stop Worrying and Love the Shell for a good introduction to zsh extensions. For example, there are sudo and git interfaces.

Issuing TSO commands

You can use the tsocmd command to issue a TSO command and get the response back:

tsocmd "LU COLIN" |less

You can then page through the output.

Command completion

Bash and zsh have command completion.
If you type zopen [tab] [tab] it will display the options available for the zopen command.

You can use ls [tab] [tab] to display all the files in the current directory

RACF

I’ve been using the Python interface (pysear) to RACF to display information, and manage profiles. It’s great and very flexible.

SDSF

There is a Python interface to SDSF, available in z/OS 3.1, but it is not available in the 3.1 images I have.

My ~/.profile

export _BPXK_AUTOCVT=ON
export _CEE_RUNOPTS="FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)"
export _TAG_REDIR_ERR=txt
export _TAG_REDIR_IN=txt
export _TAG_REDIR_OUT=txt
export CXX="/bin/xlclang++"
export CC="/bin/xlC"
export CC="/bin/xlclang++"
export PATH=${PATH}:/bin
export PATH=${PATH}:/u/colin/.local/bin
export PATH=${PATH}:/u/tmp/zowep/bin/
export PATH=${PATH}:/usr/lpp/IBM/cyp/v3r12/pyz/bin
export LIBPATH=${LIBPATH}:/usr/lpp/IBM/cyp/v3r12/pyz/lib
. /u/zopen//etc/zopen-config --override-zos-tools
# if I've come in from SSH....
if [[ -z "$SSH_CLIENT" ]]
then
  # dummy
  xxx="$SSH_CLIENT"
else
  set -x
  zopenkw="alt audit build clean compare-versions compute-builddeps \
config-helper create-cicd-job create-repo diagnostics \
generate help2man info init install md2man migrate-buildenv \
migrate-groovy promote publish query remove split-patch \
Cupdate-cacert usage version whichproject "
  echo $zopenkw
  complete -W "$zopenkw " zopen
fi

That’s as far as I’ve got.

Keeping people out of the master catalog.

I had written Here’s another nice mess I’ve gotten into! My master catalog is full of junk, which describes what I did once I found my master catalog was full of stuff which should not be there.

I’ve now got round to finding out how to stop people from putting rubbish there in the first place!

See One minute mvs: catalogs and datasets for an introduction to master and user catalogs.

The master catalog should have some system datasets, aliases, and not much else.

An alias says: for this high level qualifier (COLIN), go to the user catalog (USER.COLIN.CATALOG).

A catalog is a dataset, and you can use a RACF profile to protect it, so only authorised people can update it. Typically, when you define a userid or a high level qualifier, you should also define an alias for that userid (or HLQ), pointing to a user catalog.

To keep user data out of the master catalog you need

  • one or more user catalogs – for example, do you give each user their own catalog, have one per department, or one for all users? These catalogs are typically defined by storage administrators (or automation set up by storage administrators).
  • an alias for each userid, giving the name of the catalog that userid should use. These aliases are set up by the people (or automation) who define userids.
  • an alias for each dataset High Level Qualifier (HLQ), giving the name of the catalog that HLQ should use. These aliases are set up by the people (or automation) who define the high level qualifiers. Example HLQs are CEE and DB2.

If you migrate to a system with a new master catalog (for example with zPDT or zD&T), you will need to import the user catalogs into the new master catalog, and redefine the aliases.

Import a user catalog

When I tried to import a user catalog into the master catalog, I got

ICH408I USER(COLIN   ) GROUP(TEST    ) NAME(CCPAICE             )
  CATALOG.Z31B.MASTER CL(DATASET ) VOL(B3SYS1)
  INSUFFICIENT ACCESS AUTHORITY
  FROM CATALOG.Z31B.* (G)
  ACCESS INTENT(UPDATE )  ACCESS ALLOWED(READ )

So any userid importing or exporting a catalog needs update access to the master catalog.

Defining and deleting an alias

Having set up RACF profiles, and given my userid COLIN only READ access to the master catalog, I found my userid could still define and delete aliases. It took a couple of days to find out why.

  • If a userid has ALTER access to the CLASS(FACILITY) profile STGADMIN.IGG.DEFDEL.UALIAS, the userid can define and delete aliases. This overrides the dataset access checks on the catalog.
  • If a userid does not have ALTER access to the profile, then normal dataset access checks are made.

What I learned…

  • My userid had "special". As the documentation says, the RACF SPECIAL attribute allows you to update any profile in the RACF database. This meant I could display and update any profile.
  • There is a profile in class(facility), STGADMIN.IGG.DEFDEL.UALIAS, which allows you to define and delete user aliases in the (master) catalog.
  • If my userid had SPECIAL, or the userid was in group SYS1, I could issue the command

rlist facility STGADMIN.IGG.DEFDEL.UALIAS

and it gave

CLASS      NAME
-----      ----
FACILITY   STGADMIN.IGG.* (G)

LEVEL  OWNER     UNIVERSAL ACCESS  YOUR ACCESS  WARNING
-----  --------  ----------------  -----------  -------
 00    IBMUSER        NONE            ALTER        NO

USER      ACCESS
----      ------
SYS1      ALTER
IBMUSER   ALTER

If my userid did not have special and was not in SYS1, I got

ICH13002I NOT AUTHORIZED TO LIST STGADMIN.IGG.*

When my userid was connected to the group SYS1, it got ALTER access to the facility profile, and this overrode the RACF profile for the catalog data set.

Which is my master catalog?

At IPL, it reports

IEA370I MASTER CATALOG SELECTED IS CATALOG.Z31B.MASTER 

You can use the operator command D IPLINFO

SYSTEM IPLED AT 07.26.58 ON 01/02/2026                                              
RELEASE z/OS 03.01.00 LICENSE = z/OS
USED LOADCP IN SYS1.IPLPARM ON 00ADF

My load parm member, SYS1.IPLPARM(LOADCP) has

IODF     99 SYS1 
INITSQA 0000M 0008M
SYSCAT B3SYS1113CCATALOG.Z31B.MASTER
SYSPARM CP
IEASYM (00,CP)

The catalog is called CATALOG.Z31B.MASTER and is on volume B3SYS1

Does a RACF profile exist?

See What RACF profile is used for a data set?

tso listdsd dataset('CATALOG.Z31B.MASTER')
tso listdsd dataset('CATALOG.Z31B.MASTER') generic

Showed there was no profile defined.

Create the profile

* DELDSD  'CATALOG.Z31B.*'                                   
ADDSD 'CATALOG.Z31B.*' UACC(READ)
PERMIT 'CATALOG.Z31B.*' ID(IBMUSER ) ACCESS(CONTROL)
PERMIT 'CATALOG.Z31B.*' ID(COLIN ) ACCESS(READ )

When I tried to use the master catalog from a general userid the request failed.

DELETE TEST  ALIAS                                                                                                 
IDC3018I SECURITY VERIFICATION FAILED+
IDC3009I ** VSAM CATALOG RETURN CODE IS 56 - REASON CODE IS IGG0CLFT-6
IDC0551I ** ENTRY COLIN.TEST NOT DELETED
IDC0014I LASTCC=8

Hmm, that’s strange

With userid COLIN, I could still issue commands, such as DELETE TEST ALIAS, even though I had given it only read access.
If I displayed the profile from userid COLIN it had

INFORMATION FOR DATASET CATALOG.Z31B.* (G)

LEVEL  OWNER     UNIVERSAL ACCESS  WARNING  ERASE
-----  --------  ----------------  -------  -----
 00    COLIN          READ            NO      NO

YOUR ACCESS  CREATION GROUP  DATASET TYPE
-----------  --------------  ------------
    READ          SYS1         NON-VSAM

This had me confused for several hours. That’s when I found out about the presence of the STGADMIN.IGG.DEFDEL.UALIAS profile.

Summary

You want user (non-system) datasets to be in a user catalog, rather than the master catalog. This makes migrating to a new master catalog much easier: you just have to import the user catalogs, and redefine the aliases.

You need to set up

  • one (or more) user catalogs
  • aliases to connect the userids (and High Level Qualifiers) to a catalog
  • give authorised users ALTER access to the class(facility) profile STGADMIN.IGG.DEFDEL.UALIAS to allow them to maintain aliases
  • define a RACF profile for the master catalog and make it UACC(READ)
  • give the people who need to define, import or export catalogs update access to the master catalog dataset.