How do I do things with a subset of PDS members matching a pattern?

There are some clever things you can do on a subset of members of a PDS.

If you use ISPF 1 (Browse) or ISPF 2 (Edit), you can specify a data set name such as

‘COLIN.AAA.PROCLIB(%%%%%%00)’ to display only the members ending in 00 (each % stands for one character).
‘COLIN.AAA.PROCLIB(*AH*)’ to display all members with AH in the name.
‘COLIN.AAA.PROCLIB’ to display all of the members.
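
To make the pattern rules concrete, here is a minimal Python sketch of the matching, not ISPF's own code. It assumes that % matches exactly one character and * matches zero or more characters. The function names and the member names are hypothetical, chosen for illustration only.

```python
import re

def member_pattern_to_regex(pattern):
    """Translate an ISPF-style member pattern into a compiled regex.

    Assumed semantics: '%' matches exactly one character,
    '*' matches zero or more characters.
    """
    parts = []
    for ch in pattern.upper():
        if ch == "%":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def select_members(members, pattern):
    """Return the member names that match the whole pattern."""
    rx = member_pattern_to_regex(pattern)
    return [m for m in members if rx.fullmatch(m.upper())]

# Hypothetical member names, for illustration only.
members = ["CSQ4MS00", "CSQ4MS99", "BAHTEST", "ZFS", "AH"]

print(select_members(members, "%%%%%%00"))  # members ending in 00
print(select_members(members, "*AH*"))      # members containing AH
print(select_members(members, "*99"))       # members ending in 99
```

With these names, the first pattern selects only CSQ4MS00, because six single-character placeholders followed by 00 require an eight-character name.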

If you use ISPF 3;4, I haven’t found a way of doing the same.

Acting on a subset.

If you have a list of members, for example from ISPF 1, 2, or 3;4, you can issue a primary command such as

sel *99 e

which says select all those members ending in 99, and put the line command “e” in front of each. Similarly, sel %%%%%%00 b browses the members ending in 00.

Sorting the list

You can sort the list by many fields: name, size, last changed. For example “Sort Name”.

I have “Tab to point-and-shoot fields” enabled. I can tab to column headers, and press enter. The rows are sorted by this column.

I often use “sort changed” to find the ones I changed recently, and “sort id” to see who else has been changing the members.

Srchfor

I use “srchfor” or “srchfor value” to look for the members containing a string (or two).

When this command has completed, tab to “prompt” and press enter, or enter “sort prompt”, to sort the members with hits to the top of the list.

Refresh

If the member list has changed, you can use “refresh” to refresh it.

Avoiding I/O by caching your PDSEs (It might not be worth it)

When you use most PDS data sets, the data has to be read from disk each time. (The exception is data sets in the Library Lookaside (LLA), which do get cached.) This blog post explains the setup needed to get your PDSEs cached in z/OS. There is a Redbook, Partitioned Data Set Extended Usage Guide SG24-6106-01, which covers this topic.
One of the benefits of using a PDSE is that you can get the data sets cached in Hiperspace in z/OS memory.

A C program I am working on takes about 8 seconds to compile in batch, and spends less than half a second doing I/O, so caching your PDSEs may not give you much benefit. You should try it yourself, as mileage may vary.

SMSPDSEs

The caching of information for PDSEs is done in the SMSPDSE component of SMS.

You can have two address spaces for caching PDSE data sets:

SMSPDSE. This caches the directory of PDSE data sets. It also caches PDSEs that are contained in the LNKLIST. SMSPDSE is configured using the parmlib concatenation member IGDSMSxx. If you want to change the configuration you have to re-IPL.
SMSPDSE1. This is used to cache other eligible PDSEs. SMSPDSE1 is configured using the parmlib concatenation member IGDSMSxx. You can issue a command to restart this address space and pick up any parameter changes, which is why it is known as the restartable address space.

It is easy to create the SMSPDSE1 address space. It is described here.

Making PDSE data sets eligible for caching.

It is more complex than just setting a switch on a data set.

The Storage Class controls whether a PDSE is eligible for caching. Eligibility is controlled by the Direct MilliSecond Response time (MSR), which means the response time in milliseconds of direct (non-sequential) requests. If you use ISMF to display the Storage Classes, one of the fields is the Direct MSR. The documentation says that if the MSR is < 9 the value is “must cache”, 10-998 is “may cache”, and 999 is “never cache”. I only got caching if the MSR was <= 9.
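
The thresholds can be written down as a small Python sketch. This is only an illustration of the rule as described here, not SMS logic: the boundary at 9 follows what I observed (caching when the MSR was <= 9), the valid range of 1 to 999 is an assumption, and the function name is made up.

```python
def cache_eligibility(direct_msr):
    """Classify a Storage Class Direct MSR value.

    Assumptions: values lie in 1..999, and the 'must cache' band
    ends at 9, matching the observed behaviour rather than the
    '< 9' wording quoted from the documentation.
    """
    if not 1 <= direct_msr <= 999:
        raise ValueError("Direct MSR assumed to be in the range 1..999")
    if direct_msr <= 9:
        return "must cache"
    if direct_msr <= 998:
        return "may cache"
    return "never cache"

for msr in (1, 9, 10, 998, 999):
    print(msr, cache_eligibility(msr))
```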

If you change the Storage Class remember to use the command setsms scds(SYS1.S0W1.DFSMS.SCDS) to refresh SMS.
Change your data set to use the appropriate Storage Class with the valid Direct MSR.

By default the SMSPDSE1 address space caches the PDSE until the data set is closed. This means that PDSEs are not cached between jobs. You can change this using the commands

setsms PDSE1_BUFFER_BEYOND_CLOSE(YES)
VARY SMS,PDSE1,RESTART

Or just update the parameter in the parmlib IGDSMSxx member.
If you now use your PDSE it should be cached in Hiperspace.

You can use the command d sms,pdse1,hspstats to see what is cached.

This gave me

D SMS,PDSE1,HSPSTATS                                                   
IGW048I PDSE HSPSTATS Start of Report(SMSPDSE1) 531                    
HiperSpace Size: 256 MB                                                
LRUTime : 50 Seconds   LRUCycles: 200 Cycles                           
BMF Time interval 300 Seconds                                          
---------data set name-----------------------Cache--Always-DoNot       
                                             Elig---Cache--Cache
CSQ911.SCSQAUTH                                N      N      N         
CSQ911.SCSQMSGE                                N      N      N         
CSQ911.SCSQPNLE                                N      N      N         
CSQ911.SCSQTBLE                                N      N      N         
CBC.SCCNCMP                                    N      N      N         
CEE.SCEERUN2                                   N      N      N
COLIN.JCL                                      Y      Y      N         
COLIN.SCEEH.SYS.H                              Y      Y      N         
COLIN.SCEEH.H                                  Y      Y      N         
PDSE HSPSTATS  End of Report(SMSPDSE1)

The CSQ9* data sets are PDSEs in Link List. The COLIN.* data sets are my PDSEs in storage class SCAPPL. They have Always Cache specified. If you restart the SMSPDSE1 address space, the cache will be cleared.
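
If you capture this output regularly, a short script can pick out the cache-eligible data sets. The Python sketch below assumes the row layout shown above (a data set name followed by three Y/N columns) and uses a few lines copied from my report as sample input. The function and key names are my own, not part of any IBM interface.

```python
import re

# One data row: a data set name followed by the three Y/N flag columns.
ROW = re.compile(r"^([A-Z0-9$#@.-]+)\s+([YN])\s+([YN])\s+([YN])$")

def parse_hspstats(report):
    """Return one dict per data set row found in the report text."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            dsn, elig, always, do_not = match.groups()
            rows.append({
                "dsn": dsn,
                "cache_eligible": elig == "Y",
                "always_cache": always == "Y",
                "do_not_cache": do_not == "Y",
            })
    return rows

# Sample input: a subset of the report shown above.
sample = """\
IGW048I PDSE HSPSTATS Start of Report(SMSPDSE1) 531
HiperSpace Size: 256 MB
---------data set name-----------------------Cache--Always-DoNot
                                             Elig---Cache--Cache
CSQ911.SCSQAUTH                                N      N      N
CEE.SCEERUN2                                   N      N      N
COLIN.JCL                                      Y      Y      N
COLIN.SCEEH.H                                  Y      Y      N
PDSE HSPSTATS  End of Report(SMSPDSE1)
"""

rows = parse_hspstats(sample)
eligible = [r["dsn"] for r in rows if r["cache_eligible"]]
print(eligible)
```

Header and trailer lines do not fit the row pattern, so they are skipped without any special handling.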

You can use the commands

d sms,pdse1,hspstats,DSN(COLIN.*) to display a subset of data sets
d sms,pdse1,hspstats,STORCLAS(SCAPPL) to display the data sets in a storage class

SMF data on datasets

There were SMF 42.6 records for the SMSPDSE1 address space showing I/O to the PDSEs.
My jobs doing I/O to the PDSEs did not have a record for the PDSE in the SMF 42.6.

SMF data on SMSPDSE* buffer usage

Below is the printout from the SMF 42 subtype 1 records.

BMF:==TOTAL==
- Data pages read: 20304 read by BMF: 567 <not read by BMF: 19737 ( 97 %) >
- Directory pages read: 649 read by BMF: 642 <not read by BMF: 7 ( 1 %) >
SC:SCBASE
- Data pages read: 183 read by BMF: 0 <not read by BMF: 183 (100 %)>
- Directory pages read: 64 read by BMF: 60 <not read by BMF: 4 ( 6 %) >
SC:SCAPPL
- Data pages read: 567 read by BMF: 567 <not read by BMF: 0 ( 0 %) >
- Directory pages read: 472 read by BMF: 472 <not read by BMF: 0 ( 0 %) >
SC:**NONE**
- Data pages read: 19554 read by BMF: 0 <not read by BMF: 19554 (100 %)>
- Directory pages read: 113 read by BMF: 110 <not read by BMF: 3 ( 2 %)>

We can see that for Storage Class SCAPPL all pages requested were in the cache.
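
The "not read by BMF" figures are just the pages read minus the pages satisfied by BMF, expressed as a percentage. The Python sketch below redoes that arithmetic from the counts in the printout. It assumes the percentage is truncated to a whole number, which is what matches the printed values; the helper name is hypothetical.

```python
def bmf_miss(pages_read, read_by_bmf):
    """Return (pages not read by BMF, truncated percentage of pages read)."""
    not_by_bmf = pages_read - read_by_bmf
    percent = 100 * not_by_bmf // pages_read if pages_read else 0
    return not_by_bmf, percent

# (label, pages read, pages read by BMF), taken from the printout above.
samples = [
    ("TOTAL data", 20304, 567),
    ("TOTAL directory", 649, 642),
    ("SCBASE directory", 64, 60),
    ("SCAPPL data", 567, 567),
    ("**NONE** directory", 113, 110),
]

for label, pages, hits in samples:
    missed, percent = bmf_miss(pages, hits)
    print(f"{label}: not read by BMF {missed} ({percent} %)")
```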

Will this speed up my thousands of C compiles?

Not necessarily. See the problems I had.

The C header files are in a PDS – not a PDSE, so you would have to convert the PDSs to PDSEs
The C compiler uses the SEARCH(“CEE.SCEE.H.*”) option which says read from this library. This may override your JCL if you decide to create new PDSEs for the C header files.
When I compiled in USS my defaults had SEARCH(/usr/include/). This directory was on ZFS.Z24A.VERSION a ZFS file system. The files on the ZFS may be cached.

When I ran my compile, there were 31 SMF 42.6 records for CEE.SCEE.H, giving a total of 111 I/Os, and 2 records for CEE.SCEE.SYS.H with a total I/O count of 14. If each I/O takes 1 millisecond, this is 125 milliseconds doing disk I/O to the PDS, so I expect it is not worth converting compiles to use PDSEs and caching them.
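
The estimate is simple enough to check in a few lines of Python. The 1 millisecond per I/O is the guess used above, and the 8 second compile time is the batch figure quoted earlier for my C program; treating both as applying to this compile is an assumption.

```python
# I/O counts from the SMF 42.6 records described above.
io_counts = {"CEE.SCEE.H": 111, "CEE.SCEE.SYS.H": 14}

ASSUMED_MS_PER_IO = 1.0   # assumption: each I/O takes 1 millisecond
COMPILE_SECONDS = 8.0     # assumption: batch compile time quoted earlier

total_ios = sum(io_counts.values())
io_ms = total_ios * ASSUMED_MS_PER_IO
share = io_ms / (COMPILE_SECONDS * 1000)

print(f"{total_ios} I/Os, about {io_ms:.0f} ms, {share:.1%} of the compile")
```

Even if caching removed every one of these I/Os, the saving would be under 2 % of the elapsed time on these figures.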

What’s the difference between a PDS and a PDSE?

I’ve been using PDSEs for years. I thought that a PDSE was a slight improvement on a PDS, in that you did not have to compress PDSEs like you had to with PDSs, and binding programs requires a PDSE.

I’ve found there is a big difference. IBM documents it here. For me the differences are

A PDSE can be larger than a PDS – it can have more extents.
When you delete a member from a PDS, the space is not reclaimed. When you add a member to a PDS it uses up free space “from the free end”. When the PDS is full you have to compress it to reorganise the space. With a PDSE the data is managed in 4KB pages. When a member is deleted the space is available immediately.
With a PDS you can get “directory full”, if you did not allocate enough directory blocks when you created the data set. With a PDSE, if it needs a new “directory block” it gets any free block.
The directory of a PDS is in create order. To find a member you have to search the directory. With a PDSE the directory is indexed.
With a PDS only one thread can update it at a time. With a PDSE, multiple tasks can update it – including in a sysplex.
Old fashioned link edits can go into a PDS or a PDSE. The binder (the enhanced linkage editor) can only store modules in a PDSE. One reason is that there is more information in the directory entry.
PDSEs are faster. When you read a PDS there is IO to the disk, firstly to get the directory blocks, to search for the entry, then to read the member from disk. With a PDSE, the system address space SMSPDSE may have cached directory entries, or the pages themselves, and so eliminated the need for IOs. Even if it is not cached the directory search may be shorter.
Some system load libraries have to be PDS and not PDSE, as the PDSE code may not be loaded early in the IPL.
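
The space behaviour of a PDS described above (deleted space is not reused, new members go at the free end, compress reclaims the gaps) can be illustrated with a toy model in Python. It is a deliberately simplified sketch: the class, its methods and the unit of "space" are all invented for illustration and do not reflect the real on-disk format.

```python
class PDSFull(Exception):
    """Raised when there is no room left at the free end."""

class ToyPDS:
    def __init__(self, capacity):
        self.capacity = capacity
        self.next_free = 0      # start of the free area at the end
        self.members = {}       # name -> (start, size)

    def add(self, name, size):
        # New members always go at the free end.
        if self.next_free + size > self.capacity:
            raise PDSFull("no space at the free end, compress needed")
        self.members[name] = (self.next_free, size)
        self.next_free += size

    def delete(self, name):
        # Only the entry goes; the space it used is not reclaimed.
        del self.members[name]

    def dead_space(self):
        used = sum(size for _, size in self.members.values())
        return self.next_free - used

    def compress(self):
        # Slide the surviving members together, freeing the gaps.
        position = 0
        by_start = sorted(self.members.items(), key=lambda kv: kv[1][0])
        for name, (_, size) in by_start:
            self.members[name] = (position, size)
            position += size
        self.next_free = position

pds = ToyPDS(capacity=10)
pds.add("A", 4)
pds.add("B", 4)
pds.delete("A")
print("dead space after delete:", pds.dead_space())
try:
    pds.add("C", 4)
except PDSFull as exc:
    print("add failed:", exc)
pds.compress()
pds.add("C", 4)
print("dead space after compress:", pds.dead_space())
```

In this model a PDSE would correspond to a delete that returns the space straight away, so the failed add and the compress step would not be needed.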

You can find out about PDSEs here