Here’s another nice mess I’ve gotten into! My master catalog.

Or “How to clean up the master catalog when you have filled it up with junk”. Looking at my z/OS system,  I was reminded of my grandfathers garage/workshop where the tools were carefully hung up on walls, the chisels were carefully stored a a cupboard to keep them sharp etc.   He had boxes of screws, different sizes and types in different boxes.   My father had a shed with a big box of tools.  In the box were his chisels, hammers, saws etc..  He had a big jar of “Screws – miscellaneous – to be sorted”.    The z/OS master catalog should be like my grandfather’s garage, but I had made it like my father’s shed.

Well, what a mess I found!   This blog post describes some of the things I had to do to clean it up and follow best practices.

In days of old, well before PCs were invented, all data sets were cataloged in the master catalog.  Once you got 10s of thousands of data sets on z/OS, the time to search the catalog for a dataset name increased, and the catalogs got larger and harder to manage.  They solved this about 40 years ago by developing the User Catalog  – a catalog for User entries.
Instead of 10,000 entries for my COLIN.* data sets, there should be an Alias COLIN in the Master Catalog which points to another catalog which could be just for me, or can be shared by other users. This means that even if I have 1 million datasets in the user catalog, the access time for system datasets is not affected.  What I expected to see in the master catalog is the system datasets, and aliases for the userids.  I had over 400 entries for COLIN.* datasets, 500 BACKUP.COLIN.* datasets, 2000, MQ.ARCHIVE.* datasets etc.  What a mess!

Steps to solve this.

Prevention is better than cure.

I’ve more recently created Keeping people out of the master catalog, which should help you prevent a mess in your catalog.

You can use the RACF Option PROTECTALL.  This says a userid needs a RACF profile before it can create a dataset.  This means each userid (and group) needs a profile  like ‘COLIN.*’, and give the userid access to this profile.  Once you have done this for all userids, you can use the RACF command SETROPTS PROTECTALL(WARNING) to enable this support.   This will allow users to create datasets, when there is no profile, but produces a warning message on the operator console – so you can fix it.  An authorised person can use SETROPTS NOPROTECTALL to turn this off.  Once you have this running with no warnings you can use the command SETROPTS PROTECTALL to make it live – without warnings, and you will live happily ever after, or at least till the next problem.

Action:

  1. Whenever you create a userid (or a high level qualifier) you need to create the RACF dataset profile for the userid (high level qualifier)
  2. You also need to set up an ALIAS for the new userid(high level qualifier) to point to a User Catalog.

How bad is the problem?

You can use IDCAMS to print the contents of a catalog

//S1 EXEC PGM=IDCAMS 
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
LISTCAT CATALOG(CATALOG.Z24A.MASTER) NAME
/*

This has output like

NONVSAM ------- BACKUP.USER.Z24A.VTAMLST.D201210 
NONVSAM ------- BACKUP.USER.Z24A.VTAMLST.D201222
NONVSAM ------- BACKUP.USER.Z24A.VTAMLST.D201224
ALIAS --------- BAQ300

This says there are datasets BACKUP… which should not be in the catalog.
There is an Alias BAQ300 which points to a user catalog.   This is what I expect.

The IDCAMS command

LISTCAT ALIAS CATALOG(CATALOG.Z24A.MASTER) ALL

list all of the aliases in the catalog, for example

ALIAS --------- BAQ300 
... 
ASSOCIATIONS
USERCAT--USERCAT.Z24A.PRODS

This shows for high level qualifier BAQ3000, go and look in the user catalog  USERCAT.Z24A.PRODS.

Moving the entries out of the Master Catalog

The steps to move the COLIN.* entries out of the Master Catalog are

  1. Create a User Catalog
  2. Create an ALIAS COLIN2 which points to this User Catalog. 
  3. Rename COLIN…. to COLIN2….
  4. Create an ALIAS COLIN for all new data sets.
  5. Rename COLIN2… to COLIN…
  6. Delete the ALIAS COLIN2.

Create a user catalog

Use IDCAMS to create a user catalog

 DEFINE USERCATALOG - 
( NAME('A4USR1.ICFCAT') -
MEGABYTES(15 15) -
VOLUME(A4USR1) -
ICFCATALOG -
FREESPACE(10 10) -
STRNO(3 ) ) -
DATA( CONTROLINTERVALSIZE(4096) -
BUFND(4) ) -
INDEX(BUFNI(4) )

To list what is in a user catalog

Use a similar IDCAMS command to list the master catalog 

LISTCAT ALIAS CATALOG(A4USR1.ICFCAT) ALL

Create an alias for COLIN2

 DEFINE ALIAS (NAME(COLIN2) RELATE('A4USR1.ICFCAT') ) 

Get the COLIN.* entries from the Master Catalog into the User Catalog

This was a bit of a challenge as I could not see how to do a global  rename.

You can rename non VSAM dataset either using ISPF 3.4 or use the TSO rename command in batch.

The problem occurs with the VSAM data sets.   When I tried to use the IDCAMS rename, I got an error code IGG0CLE6-122 which says I tried to do a rename, which would cause a change of catalog.

The only way I found of doing it was to copy the datasets to a new High Level Qualifier, and delete the originals.   Fortunately DFDSS has a utility which can do this for you.

//S1 EXEC PGM=ADRDSSU,REGION=0M PARM='TYPRUN=NORUN' 
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
COPY -
DATASET(INCLUDE(COLIN.** )) -
DELETE -
RENUNC(COLIN2)
/*

Most of the data sets were “renamed” to COLIN2… but I had a ZFS which was in use, and some dataset aliases.  I used

  •  the TSO command unmount filesystem(‘COLIN.ZCONNECT.BETA.ZFS’)
  • the  IDCAMS command DELETE COLIN.SCEERUN ALIAS for each of the aliases.

and reran the copy job.   This time it renamed the ZFS.  The renaming steps are

  • Check there are no datasets with the HLQ COLIN.
  • Define an alias for COLIN in the master catalog to point to a user catalog.
  • Rerun the copy job to copy from COLIN2 back to COLIN.
  • Mount the file system.
  • Redefine the alias to data sets (eg COLIN.SCEERUN).
  • Delete the alias for COLIN2.

To be super efficient, and like my grandfather, I could have upgraded the SMS ACS routines to force data sets to have the “correct” storage class, data class, or management class.  The job output showed  “Data set COLIN2.STOQ.CPY has been allocated with newname  COLIN.STOQ.CPY using STORCLAS SCBASE,  no DATACLAS, and no MGMTCLAS“.  These classes were OK for me, but may not be for a multi-user z/OS system.

One last thing, don’t forget to add the new user catalog to your list of datasets to backup.

What should I monitor for MQ on z/OS – logging statistics

For the monitoring of MQ on z/OS, there are a couple of key metrics you need to keep an eye on for the logging component, as you sit watching the monitoring screen.

I’ll explain how MQ logging works, and then give some examples of what I think would be key metrics.

Quick overview of MQ logging

  1. MQ logging has a big(sequential) buffer for logging data, it wraps.
  2. Application does an MQPUT of a persistent message.
  3. The queue manager updates lots of values (eg queue depth, last put time) as well as move data into the queue manager address space.  This data is written to log buffers. A 4KB page can hold data from many applications.
  4. An application does an MQCOMMIT.  MQ takes the log buffers up and including the current buffer and writes it to the current active log data set.  Meanwhile other applications can write to other log buffers.
  5. The I/O finishes and the log buffers just written can be reused.
  6. MQ can write up to 128 pages in an I/O. If there are more than 128 buffers to write there will be more than 1 I/O.
  7. If application 1 commits, the IO starts,  and then application 2 commits. The I/O for the commit in application 2 has to wait for the first set of disk writes to finish, before the next write can occur.
  8. Eventually the active log data set fills up.  MQ copies this active log to an archive data set.  This archive can be on disk or tape.   This archive data set may never be used again in normal operation.  It may be needed for recovery of transactions or after a failure.   The Active log which has just been copied can now be reused.

What is interesting?

Displaying how much data is logged per second.

Today       XXXXXXXXXXXXXXXXXXXX
Last week XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Yesterday XXXXXXXXXX          
      0                     100MB/Sec    200 MB/Sec

This shows that the logging rate today is lower than last week.   This could be caused by

  1. Today is just quieter than last week
  2. There is a problem and there are fewer requests coming into MQ.   This could be caused by problems in another component, or a problem in MQ.    When using persistent messages the longest part of a transaction is the commit and waiting for the log disk I/O.  If this I/O is slower it can affect the overall system throughput.
  3. You can get the MQ log IO response times from the MQ log data.

Displaying MQ log I/O response time

You can break down where time is spent in doing I/O into the following area

  1. Scheduling the I/O – getting the request into the I/O processor on the CPU
  2. Sending the request down to the Disk controller(eg 3990)
  3. Transferring data
  4. The I/O completes, and send an interrupt to z/OS, z/OS has to catch this interrupt and wake up the requester.

 Plotting the I/O time does not give an entirely accurate picture, as the time to transfer the data depends on the amount of data to transfer.  On a well run system there should be enough capacity so the other times are constant.    (I was involved in a critical customer situation where the MQ logging performance “died” every Sunday morning.   They did backups, which overloaded the I/O system).

In the MQ log statistics you can calculate the average I/O time.  There are two sets of data for each log

  1. The number of requests, and sum of the times of the requests to write 1 page.  This should be pretty constant, as the data is for when only one 4KB was transferred
  2. The number of requests, and sum of the times of the requests to more than 1 page.  The average I/O time will depend on the amount of data transferred.
  • When the system is lightly loaded, there will be many requests to write just one page. 
  • When big messages are being processed (over 4 KB) you will see multiple pages per I/O.
  • If an application processes many messages before it commits you will get many pages per I/O.   This is typical of a channel with a high achieved batch size.
  • When the system is busy you may find that most of the I/O write more than one page, because many requests to write a small amount of data fills up more than one page.

I think displaying the average I/O times would be useful.   I haven’t done tried this in a customer environment (as I dont have customer environment to use).    So if the data looks like

Today         XXXXXXXXXXXXXXXXXXXXXXXX
Last week     XXXXXXXXXXXXXXXXXXXXXXXXXXXXX  
One hour ago XXXXXXXXXXXXXXXXXXX
time in ms 0 1 2 3

it gives you a picture of the I/O response time.

  • The dark green is for I/O with just one page, the size of the bar should be constant.
  • The light green is for I/O with more than one page, the size of the bar will change slightly with load.  If it changes significantly then this indicates a problem somewhere.

Of course you could just display the total I/O response time = (total duration of I/Os) / (total number of I/Os), but you lose the extra information about the writing of 1 page.

Reading from logs

If an application using persistent messages decides to roll back:

  • MQ reads the log buffers for the transaction’s data and undoes any changes.
  • It may be the data is old and not in the log buffers, so the data is read from the active log data sets.
  • It may be that the request is really old (for example half an hour or more), MQ reads from the archive logs (perhaps on tape).

Looking at the application doing a roll back, and having to read from the log.

  • Reading from buffers is OK.   A large number indicates application problem or a DB/2 deadlock type problem.  You should investigate why there is so much rollback activity
  • Reading from Active logs … . this should be infrequent.  It usually indicates an application coding issue where the transaction took too long before commit.  Perhaps due to a database deadlock, or bad application design (where there is a major delay before the commit)
  • Reading from Archive logs… really bad news…..  This should never happen.

Displaying reads from LOGS

Today         XXXXXXXXXXXXXXXXXXXXXXXX
Last week     X
One hour ago  XXXXX
rate          0        10    20     40

Where green is “read from buffer”, orange is “read from active log”, red is “read from Archive log. Today looks a bad day”.

Do we still need the wine maker’s nose, the mechanics ear and the performance analyst’s glasses?

When I left University one of my university friends went into the wine industry.  We met up a few years later and said that his nose was more useful than his PhD in Chemistry.  Although they had moved towards gas chromatography (which gave you a profile of all of the chemicals in the wine), this was good at telling you if there were bad chemicals in the brew, but not if it would be a good vintage, for that they needed the human nose.

My father would tune his motorbike by listening to it.  He said the bike would tell you when you had tuned it just right, and got it “in the sweet spot”.  These days you plug the computer in and the computer tells you what to do.  A friend of mine had an expensive part replaced, because the computer said so.  A week later he took the car back to the garage because the computer “knew” there was a problem with the same expensive part, and said it should be replaced.    This time the more experienced mechanic cleaned a sensor and solved the problem.  Computers do not always know best.

When I first started in the performance role, the RMF performance reports were bewildering.   These reports were lots of numbers in a small font (so you needed your glasses).  Worse than that, they had several reports on the same page, and to a novice there was a blur of numbers.  Someone then helped me with comments like, you can ignore all the data, except for this number  3 inches in and 4 inches down.  That should be less than 95%.  On this other page – check this column is zeros, and so on.  As you gain more experience in performance, you get to know the “smell” of the data.  It just needs a quick sniff test to check things are OK.  If not, then it takes more time to dig into the data.

There are many tools for processing the SMF data and printing out reports full of numbers, but they add little value.    “The disconnect time is 140 microseconds” – is this good, or is it bad, it better than a disconnect time of 100?.    If the tools were smart enough to say “The disconnect time is 140 microseconds. This value should typically be zero” then this give you useful information instead of just data.

If you think that they could control the Starship Enterprise from one operations desk, they clearly did not have all of the raw data displayed.  It must have been smart enough to report “The impulse engines are running hot: colour red, suggest you reduce power”, because that is what Scottie the engineer kept saying.

If there were smart reports of the problems rather than just displaying data, it would reduce the skill needed to interpret reports, and the need for the performance analyst’s glasses.  Producing these smart reports is difficult and needs experience to know what is useful, and what is just confusing.

Sometimes it feels like the statistics produced have not been thought through.  One example I recently experienced; there is a counter of  the number of reads+writes to disk rather than cache.  For reads, there should be no reads from disk.  For writes, it may be good to write directly to disk, and not flood the cache.  Instead of one number for reads, and one number for write, there is one number for both.  So If I had 10 disk reads, 10 disk writes and 10 disk accesses – is this good or bad ? I don’t know.   This is not a head banging problem, as you usually have only reads or only writes – but not both.  I just had to use my nose,  10 million would be a problem, just 10 – not a problem, and I’ll still need my glasses.

 

What data set is my C program using?

I wanted to know what data set my C program was using.  There is a facility BPXWDYN: a text interface to dynamic allocation and dynamic output designed for REXX users, but callable from C.

This is not very well documented, so here is my little C program based on the sample IBM provided.

The documentation says use  RTARG dsname = {45,”rtdsn”};  but this is for alloc.  With “info” it gives the error message IKJ56231I TEXT UNIT X’0056′ CONTAINS INVALID KEY .  Which basically means rtdsn is not value.     I had to use RTARG dsname = {45,”INRTDSN”};

#include <stdio.h> 
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main(int argc, char * argv[]) {
typedef int EXTF();
#pragma linkage(EXTF,OS)
EXTF *bpxwdyn=(EXTF *)fetch("BPXWDY2 ");
int i,j,rc;
typedef struct s_rtarg {
short len;
char str[260];
} RTARG;
char *info ="info DD(APF1) ";

RTARG dsname = {45,"INRTDSN"}; // not rtdsn as the doc says
RTARG ddname = {9,"INRTDDN"}; // not rtddn as the doc says
RTARG volser = {7,"INRTVOL"};
RTARG msg = {3,"MSG "};
RTARG m[4] = {258,"msg.1",258,"msg.2",258,"msg.3",258,"msg.4"};

rc=bpxwdyn(info,&dsname,&ddname,&volser,
&msg,&m[0],&m[1],&m[2],&m[3]);
if (rc!=0) printf("bpxwdyn rc=%X %i\n",rc,rc);

if (*ddname.str) printf("ddname=%s\n",ddname.str);
if (*dsname.str) printf("dsname=%s\n",dsname.str);
if (*volser.str) printf("volser=%s\n",volser.str);
for (i=0,j=atoi(msg.str);i<j && i<4;i++)
printf("%s\n",m[i].str);

return;
}

 

How hard is it to delete lots of data sets? – Easy!

I was configuring a product and had some problems, so I needed to clean up.  I had hundreds of VSAM clusters.  I started using ISPF 3.4 and using the delete line command, but you have extra typing to do for VSAM files, so I gave up.

I had a faint memory of using a MASK to delete things, and a quick search gave me

//S1 EXEC PGM=IDCAMS 
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE COLIN.O.RTE.RK* MASK
/*

Which deleted all my data sets.

Wasn’t this easy!

How do I do things with a subset of PDS members matching a pattern?

There are some clever things you can do on a subset of members of a PDS.

If you use ISPF  (Browse) or ISPF 2 (Edit) you can specify a data set name of

  • ‘COLIN.AAA.PROCLIB(%%%%%%00)’ and it displays only the members ending in 00.
  • ‘COLIN.AAA.PROCLIB(*AH*)’ to display all member with an AH in the name.
  • ‘COLIN.AAA.PROCLIB’  for all of the members.

If you use ISPF 3;4 I havent found a way of doing the same.

Acting on a subset.

If you have a list of members, for example ISPF 1,2,3;4  you can issue a primary command

sel *99 e 

which says select all those members ending in  99, and use the command “e” in front.  Similary  sel %%%%%%00 b.

Sorting the list

You can sort the list by many fields, name, size last changed.  For example “Sort Name”.

I have “Tab to point-and-shoot fields” enabled.   I can tab to column headers, and press enter.   The rows are sorted by this column.

I often use “sort changed” to find the ones I changed recently, and “sort id” to see who else has been changing the members.

Srchfor

I use “srchfor ” or “srchfor value” to look for the members containing a string (or two).

When this command has completed tab to “prompt” and press enter, or enter “sort prompt” to sort the members with hit to the top of the list.

Refresh

If the member list has changed, you can use “refresh” to refresh it.

 

 

How do I compare the directories of two PDS(E)s?

I wanted to compare  two directories to find the differences.   I could see that the number of members was different, but it was hard to see what was missing.

I browsed the web, and found that this was a commonly asked question, and often the solution was to write some Rexx and use the ISPF LM* functions.  I felt this was the wrong way.  

I had used Superc to compare members of different files – could it tell me same information about the member list – yes!

SuperC has different compare types

  1. File –  Compares source data sets for differences, but does not show what
    the differences are.
  2. Line – Compares source data sets for line differences.  It is record-oriented and points out inserted or deleted lines.
  3. Word – Compares source data sets for word differences.  If two data sets contain the same words in the same order, SuperC considers them to be identical, even if those words are not
    on the same lines.
  4. Byte – Compares source data sets for byte differences.  This compare type is most useful for comparing  machine readable data.

Example output of the File comparison type.

NEW: COLIN.ZZZ.PROCLIB  OLD: HLQ.Y.ABCNPARU                                                                                     
MEMBER SUMMARY LISTING (FILE COMPARE)                                                                                     
DIFF SAME MEMBERS   N-BYTES O-BYTES N-LINES O-LINES  HASH1 HASH2 
                                                                                     
 **       ABC11111   171120  173200    2139    2165  78D5C 1113D
      **  ABC9999       640     640       8       8  AB58A AB58A 
     

We can see

  • ABC1111 is different because the “**” in the DIFF column, and the hash code at the right is different
  • ABC9999 is the same in each because the “**” is in the SAME column, and the hash value is the same

You also get a summary of differences

   10   TOTAL MEMBER(S) PROCESSED AS A PDS 
    1   TOTAL MEMBER(S) PROCESSED HAD CHANGES 
    9   TOTAL MEMBER(S) PROCESSED HAD NO CHANGES 
    9   TOTAL NEW FILE MEMBER(S) NOT PAIRED 
  179   TOTAL OLD FILE MEMBER(S) NOT PAIRED 

List of members not in both

MEMBER SUMMARY LISTING (FILE COMPARE)                                 
NON-PAIRED NEW FILE MEMBERS | NON-PAIRED OLD FILE MEMBERS               
     ABC$$$$$               |       ZAA$$$$ 
     ABCSCLRR               |       ZYZAPST5 
                            |       ZYZAPST6
  • Member ABC$$$$ and one other are in the “new” PDS, but not in the “old” PDS.
  • Member ZAA$$$$ and 2 others are in “old” PDS, but not in the “new” PDS.

Like most things – easy – once you know how do to it!

Using Line mode

When I used line mode I got output like

                                                  N-LN# O-LN# 
I - SYSNAME                      &SYSNAME.        00004 00003
D - SYSNAME                      S0W1 

For one member, the  “new-file” at line 4 was similar to the line in the “old-file” at line 3.

To get from the old file to the new file, delete the line with S0W1 in it and insert the line with &SYSNAME.

Avoiding I/O by caching your PDSEs (It might not be worth it)

When you use most PDS datasets, the data has to be read from disk each time. (The exception is data sets in the Linklist LookAside(LLA) which do get cached. This blog post explains the set up to get your PDSEs cached in z/OS. There is a Red book Partitioned Data Set Extended Usage Guide SG24-6106-01 which covers this topic.
One of the benefits of using a PDSE is that you can get the data sets cached in Hiperspace in z/OS memory.

A C program I am working on takes about 8 seconds to compile in batch, and spends less than half a second doing I/O, so caching your PDSEs may not give you much benefit. You should try it youself as mileage may vary.

SMSPDSEs

The caching of information for PDSEs is doing in the SMSPDSE component of SMS.

You can have two addresses spaces for caching PDSE data sets

  1. SMSPDSE caches the directory of PDSE data sets.  It also caches PDSEs that are contained in the LNKLIST.  SMSPDSE is configured using the parmlib concatenation member IGDSMSxx.  If you want to change the configuration you have to re ipl.
  2. SMPPDSE1. This is used to cache other eligible PDSEs.   SMSPDSE1 is configured using the parmlib concatenation member IGDSMSxx. You can issue a command to restart this address space, and pick up any parameter changes – this is why is is known as the restartable address space. 

It is easy to create the SMPDSE1 address space.  It is described here.

Making PDSE data sets eligible for caching.

It is more complex than just setting a switch on a data set.

The Storage Class controls whether a PDSE is eligible for caching.  It is more complex than just setting a simple switch.  The eligibility of caching is controlled by the Direct MilliSecond Response time.  (Which means the Response time in MilliSeconds of Direct (non sequential) requests).  If you use ISMF to display the Storage Classes, one of the fields is the Direct MSR.  The documentation says If the MSR is < 9 then the value is “must cache”, 10 -998 “may cache”, 999 “never cache”.  I only got caching if MSR was <= 9.

If you change the Storage Class remember to use the command setsms scds(SYS1.S0W1.DFSMS.SCDS) to refresh SMS.
Change your data set to use the appropriate Storage Class with the valid Direct MSR.

By default the SMSPDSE1 address space caches the PDSE until the data set is closed.  This means that PDSEs are not cached between jobs.  You can change this using the commands

setsms PDSE1_BUFFER_BEYOND_CLOSE(YES)
VARY SMS,PDSE1,RESTART

Or just update the parameter in the parmlib IGDSMSxx member.
If you now use your PDSE it should be cached in Hiperspace.

You can use the command d sms,pdse1,hspstats to see what is cached.

This gave me

D SMS,PDSE1,HSPSTATS                                                   
IGW048I PDSE HSPSTATS Start of Report(SMSPDSE1) 531                    
HiperSpace Size: 256 MB                                                
LRUTime : 50 Seconds   LRUCycles: 200 Cycles                           
BMF Time interval 300 Seconds                                          
---------data set name-----------------------Cache--Always-DoNot       
                                             Elig---Cache--Cache
CSQ911.SCSQAUTH                                N      N      N         
CSQ911.SCSQMSGE                                N      N      N         
CSQ911.SCSQPNLE                                N      N      N         
CSQ911.SCSQTBLE                                N      N      N         
CBC.SCCNCMP                                    N      N      N         
CEE.SCEERUN2                                   N      N      N
COLIN.JCL                                      Y      Y      N         
COLIN.SCEEH.SYS.H                              Y      Y      N         
COLIN.SCEEH.H                                  Y      Y      N         
PDSE HSPSTATS  End of Report(SMSPDSE1)

The CSQ9* data sets are PDSEs in Link List.  The COLIN.* data sets are my PDSEs in storage class SCAPPL.   They have Always Cache specified.  If you restart the SMSPDSE1 address space, the cache will be cleared.

You can use the commands

  • d sms,pdse1,hspstats,DSN(COLIN.*)  to display a subset of data sets
  • d sms,pdse1,hspstats,STORCLAS(SCAPPL) to display the data sets in a storage class

SMF data on datasets

There were SMF 42.6 records for the SMSPDSE1 address space showing I/O to the PDSEs.
My jobs doing I/O to the PDSEs did not have a record for the PDSE in the SMF 42.6.

SMF data on SMSPDSE* buffer usage

Below is the printout from the SMF 42 subtype 1 records.

  • BMF:==TOTAL==
    • Data pages read: 20304 read by BMF: 567 <not read by BMF: 19737 ( 97 %) >
    • Directory pages read: 649 read by BMF: 642 <not read by BMF: 7 ( 1 %) >
  • SC:SCBASE
    • Data pages read: 183 read by BMF: 0 <not read by BMF: 183 (100 %)>
    • Directory pages read: 64 read by BMF: <60 not read by BMF: 4 ( 6 %) >
  • SC:SCAPPL
    • Data pages read: 567 read by BMF: 567 <not read by BMF: 0 ( 0 %) >
    • Directory pages read: 472 read by BMF: 472 <not read by BMF: 0 ( 0 %) >
  • SC:**NONE**
    • Data pages read: 19554 read by BMF: 0 <not read by BMF: 19554 (100 %)>
    • Directory pages read: 113 read by BMF: 110 <not read by BMF: 3 ( 2 %)>

We can see that for Storage Class SCAPPL all pages requested were in the cache.

Will this speed up my thousands of C compiles ?

Not necessarily.  See the problems I had.

  • The C header files are in a PDS – not a PDSE, so you would have to convert the PDSs to PDSEs
  • The C compiler uses the SEARCH(“CEE.SCEE.H.*”) option which says read from this library.   This may override your JCL if you decide to create new PDSEs for the C header files.
  • When I compiled in USS my defaults had SEARCH(/usr/include/).  This directory was on ZFS.Z24A.VERSION a ZFS file system.   The files on the ZFS may be cached.

When I ran my compile,there were 31 SMF 42.6 records for CEE.SCEE.H, giving a total of 111 I/Os, there were 2 records for CEE.SCEE.SYS.H with a total I/O count of 14.  If each I/O takes 1 millisecond this is 125 milliseconds doing disk I/O to the PDS, so I expect it is not worth converting compiles to use PDSEs and caching them.

 

Why does my C compile fail if I remove data sets I do not use?

I was playing with caching of header file PDSEs when I compiled a C program. I could see from the SMF 42.6 records that CEE.SCEEH.H PDS was being used.  It took nearly two hours before my job did not use this PDS!
I created a PDSE called COLIN.SCEEH.H and copied CEE.SCEEH.H into it.  I updated my JCL to use the new libraries,  reran my job and the SMF records show I was till using CEE.SCEEH.H.  Hmm this was very strange.

I renamed CEE.SCEEH.H to COLIN.CEE.SCEEH.H.  Did it work ?   No – I got compile errors, so I renamed it back again.  Removing the data set clearly does not work.

I then spotted in the compiler listing that I had the default SEARCH(//’CEE.SCEEH.+’).   I added SE(//’COLIN.SCEEH.+’)  and thought Fixed it!  No … still not fixed,  it still used CEE.SCEEH…

I had to use C options NOSEARCH, SE(//’COLIN.SCEEH.+’) .  The first option turns off the SEARCH(//’CEE.SCEEH.+’) , and the second one creates a new one.   After a cup of tea and a biscuit I remembered I had hit this about 20 years ago!

Undestanding SMF 42.6 data set statistics

DFSMS provides SMF records to report activity, for example

  1. Data set statistics.
  2. Storage class statistics.
  3. SMS buffer usage.

This blog post describes the data set statistics in SMF 42 records, subtype 6.    IBM does not provide a formatter for these statistics, but there are formatters available from other companies.

My formatter written in C is available on Github.

Overview

There are statistics available in the following areas

  1. Job name (neither step name nor program name)
  2. Data set name, type (Linear, PDSE etc), volume (A4USR1) etc
  3. Number of I/Os, and number of read I/Os
  4. Number of normal cache requests, and number of successful cache requests
  5. Number of other cache requests, Sequential, 1 Track, cache bypass.  See Understanding Storage Controller caching and the caching statistics.
  6. Average response time – and where the time is spent. See Pending, disconnect and other gobbledegook.
  7. Number of application  requests (Access Manager) and what type (sequential, direct, or directory) .  Your application may issue 10 fread() requests, but only one disk I/O was done.

It feels like a bit like a “dogs’ dinner” (a dish with some very good stuff, and some other stuff, all mixed up).

When are the records produced?

  1. When the file is closed.
  2.  Immediately after the recording of the type 30 interval record.
  3. For PDSEs,  from the SMSPDSE and SMSPDSE1 address spaces; after the the BMFTIME interval.

Sometimes an SMF record has data for more than one data set, other times, there is one dataset per record, and so multiple SMF 4.26 records per job step.  I could not find the reason for which one is used.  It may be as follows

  • If the data set is closed write one data set record per job.
  • Else if the SMF 30 time interval expired – write as many data set records as will fit in the SMF record.

What data is collected ?

Data is collected on reads and writes.  The I/O for reading the VTOC to find the dataset, or opening and closing the file are not recorded in the SMF 42.6.  They are recorded in SMF 42.5 records.  I had a program which read from a PDS member and wrote it to a sequential file.  For the PDS there were

  1. 4 I/Os to the VTOC – including 2 to the VTOC index
  2. 1 I/O to the PDS directory.  The entries in the directory are not sorted, multiple directory entries may need to be read to locate the required entry.
  3. 6 I/Os to the PDS member – reported in the SMF 42.6 record
  4. 1 I/O to the PDS directory for the close
  5. 1 I/O to the VTOC on the close.

Data set information.

You get information on most data sets

  1. Physical sequential eg a listings file
  2. PDS’s such as SYS1.MACLIB
  3. Linear data sets, such as MQ or DB2 logs and page set sets
  4. Striped data sets, such as MQ logs, have a record for each volume used.  I had a log with two stripes, and had two records, one for each volume.
  5. Temporary data set may have no I/O to disk if the data can be kept in VIO (virtual memory).  It may still have Application Manager reads and write.

But only some data on PDSEs.    The read I/Os were from the SMSPDSE address space which caches PDSE directory information.   When the binder output a load module to a PDSE load libary there were SMF 42.6 records for the write, from the SMSPDSE address space. There were no records produced when I had a C program write to a member of a PDSE.

Storage Controller caching information.

For a disk read, if the data is in the cache of the Storage Controller, the data can be returned immediately;  this is a cache hit.  If not, then a disk will have to be read to get the data; this is a cache miss.

For a disk write, usually the data is written to the Non Volatile Cache, and the request returns. This is known as DASD Fast Write (DFW).  At a later time, the data is moved from the cache to the disk.  This is a cache hit.  If the request says do not use cache (Bypass DASD Fast Write) this counts as a cache miss.

  1. There is a count of total normal cache requests (read and write), and a count of cache hits (read and write).  This normal cache request usually process a couple of tracks.
  2. There is a count of total normal cache write requests, and a count of normal cache write hits.
  3. You can calculate the number of normal read caches, from (total cache – write cache).
  4. There is a count of sequential cache requests.   This request tells the storage controller to bring many records into the cache, or this is going to write many records to the cache.   There is one combined counter for read and write requests.
  5. For direct access, reading just one track there is direct access , (what I call 1Track cache).  There is one combined counter for read and write requests.
  6. There is bypass cache where the record is written directly to disk.  There is one combined counter for read and write requests.

I/O response times

The average response time of the request, and the break down into connect time, disconnect time etc is reported.  See Pending, disconnect and other gobbledegook.

There are two sets of response times, the first set are in units of 128 microseconds, from the old days when disk response times were measured in milliseconds, and a newer set, which are in microseconds.   The old ones should be ignored and the more granular set of values used.

Most, but not all, of the response time fields are available in the SMF 42.6 records.

For one data set, I had

  • overall response time of 96 microseconds,
  • time to get to the Storage controller (pending time) 22, 
  • time in the Storage Controller(connect time) 17,
  • time doing to disks(disconnect time) 0.  

The difference is “other time” (57 microseconds) is the time from the IO request was initiated till the Start SubCHannel (SSCH) was issued, and the time from the I/O interrupt indicating I/O complete, before the z/OS issuing the Test SubCHannel(TSCH) to clear the I/O request and get the status.

Access manager counts

Applications usually use facilities like C functions fread() and fwrite(), rather than do the I/O themselves.  This makes the application much simpler, as the Access Method hides most of the complexity of doing the actual I/O.

Consider the scenario where your application opens a file and reads 10 records.  Under the covers the Access Method reads the disk,  a track’s worth of data, and gives the application data when asked.  We have 10 reads to the access manager, and one read from disk.

There are statistics on the number of sequential access, the number of direct access (get this particular record) and the number of directory records accessed – though this always seems to be zero!

Access manager response times

There are response times for the AM requests, I dont think they are very useful.

 The documentation says

  • delay values are in units of 128 microseconds.
  • The I/O delay is calculated by the access method when the access method checks whether an I/O
    buffer is available to be reused. For example, when an access method has issued an I/O request to commit the buffer. The timer starts when the access method makes the check request; it does not start  when the I/O is requested. The timer ends when the I/O request completes. Thus, the delay value that is reported is the amount of time that the caller had to wait to reuse a buffer.

It is not clear to me what this means.  I think it means there is logic like

  1. start asynchronous read 1
  2. start asynchronous read 2
  3. do something
  4. start timer
  5. check read 1 and wait till it has finished
  6. stop timer, and calculate the duration
  7. start timer
  8. check read 2 and wait till it has finished
  9. stop timer and calculate the duration
  10. Add up all the durations and report the sum.

In the documentation for these field, some fields say “total I/O delay”, others say “I/O delay”.  It looks like the field is total I/O delay.

Reading a file, it had 527 I/Os, with an average I/O response time of 527 microseconds.  The AM statistics had number of sequential read blocks 527, sequential read time 314368 microseconds, or an average of 597 microseconds, which is consistent with the disk I/O response time.

I only seemed to get AM delays when the average disk I/O response time was over 500 microseconds.

Do the numbers make sense?

The answer is sometimes.  I have cases where the numbers do not add up.

  1. For the data set CEE.SCEERUN, it had 21 I/Os, 21 reads from cache, but 14 disk reads.  I did an IO trace, and there were 21 reads from normal cache.  I do not know where the missing 7 were reported.  There was no write activity.
  2. For an SMF data set, this had 586 I/Os, 586 using Sequential cache, and 586 read requests.  This makes sense.
  3. For CBC.SCCNCMP  this had 305 I/Os, 5 normal reads from cache, 300 from Sequential cache, and number of disk reads 305.  This makes sense.
  4. When using the binder my PDSE had 21 I/Os,2 Read cache 12 write cache, and 7 Sequential cache, number of disk reads 2.  For this to make sense the 7 sequential cache would be for the write.

How do you tell the number of read I/Os. 

This is a bit tricky.  First, ask yourself – do you care ? Or are you really interested in the total I/Os.

  1.  If the number of I/OS = (number of cache requests – number of write cache requests), this is the number of read I/Os.
  2. If the number of I/Os != number of cache requests, there should be counts for sequential, 1 Track, or No Cache.  You cannot tell if these are for reads or writes.

Other interesting “features”

  1. The userid is blank.   The documentation says User-defined identification field (taken from common exit parameter area, not from USER=parameter on job statement).  I do not think Ihave a common exit.
  2. There is no step name, just the job name.  I was able to manually connect the SMF 30 records to the SMF 42.6 records by using the time stamp interval of the data set.
  3. There are SMF type 14 records for input data sets,  this was available when z/OS was MVS/360.  It’s partner SMF 15 is for output data sets.  It records by DDName, not data set name, and so is not very useful.  it provides an overlap of information with the SMF 42.6.   You have to enable the collection of SMF 14,15.  As you get one SMF record per DDNAME enty, you get many SMF records.  Most customers do not collect these records. The DISPLAY SMF command on my system gives SYS(NOTYPE(14:19,62:69,99)) which says do not collect records  with SMF types: 14 through 19, 62 through 69, or 99.
  4. Block size can be 0 – which means use the maximum the device supports.
  5. There is a field “Average disconnect time for reads”.   If a read has disconnect time, then there was a cache miss – the data was not in the 3990 storage cache.  You can get read cache miss, if the data has not been used for a while, and the data has been removed from the cache.
  6. There is a field “Maximum data set I/O response time” and “Service time associated with maximum I/O response time” in units of 128 microseconds.  The difference in these values is the time interval between the I/O interrupt occurring, and the Test SubCHannel, which processed the interrupt. My values were 0.000128 and 0.000000 – not very helpful.  Another set of values was 0.001152 0.000640.   A good idea to have this – but it has the wrong granularity.