DFSMS provides SMF records to report activity, for example
- Data set statistics.
- Storage class statistics.
- SMS buffer usage.
This blog post describes the data set statistics in SMF 42 records, subtype 6. IBM does not provide a formatter for these statistics, but there are formatters available from other companies.
My formatter written in C is available on Github.
There are statistics available in the following areas
- Job name (neither step name nor program name)
- Data set name, type (Linear, PDSE etc), volume (A4USR1) etc
- Number of I/Os, and number of read I/Os
- Number of normal cache requests, and number of successful cache requests
- Number of other cache requests, Sequential, 1 Track, cache bypass. See Understanding Storage Controller caching and the caching statistics.
- Average response time – and where the time is spent. See Pending, disconnect and other gobbledegook.
- Number of application requests (Access Manager) and what type (sequential, direct, or directory) . Your application may issue 10 fread() requests, but only one disk I/O was done.
It feels like a bit like a “dogs’ dinner” (a dish with some very good stuff, and some other stuff, all mixed up).
When are the records produced?
- When the file is closed.
- Immediately after the recording of the type 30 interval record.
- For PDSEs, from the SMSPDSE and SMSPDSE1 address spaces; after the the BMFTIME interval.
Sometimes an SMF record has data for more than one data set, other times, there is one dataset per record, and so multiple SMF 4.26 records per job step. I could not find the reason for which one is used. It may be as follows
- If the data set is closed write one data set record per job.
- Else if the SMF 30 time interval expired – write as many data set records as will fit in the SMF record.
What data is collected ?
Data is collected on reads and writes. The I/O for reading the VTOC to find the dataset, or opening and closing the file are not recorded in the SMF 42.6. They are recorded in SMF 42.5 records. I had a program which read from a PDS member and wrote it to a sequential file. For the PDS there were
- 4 I/Os to the VTOC – including 2 to the VTOC index
- 1 I/O to the PDS directory. The entries in the directory are not sorted, multiple directory entries may need to be read to locate the required entry.
- 6 I/Os to the PDS member – reported in the SMF 42.6 record
- 1 I/O to the PDS directory for the close
- 1 I/O to the VTOC on the close.
Data set information.
You get information on most data sets
- Physical sequential eg a listings file
- PDS’s such as SYS1.MACLIB
- Linear data sets, such as MQ or DB2 logs and page set sets
- Striped data sets, such as MQ logs, have a record for each volume used. I had a log with two stripes, and had two records, one for each volume.
- Temporary data set may have no I/O to disk if the data can be kept in VIO (virtual memory). It may still have Application Manager reads and write.
But only some data on PDSEs. The read I/Os were from the SMSPDSE address space which caches PDSE directory information. When the binder output a load module to a PDSE load libary there were SMF 42.6 records for the write, from the SMSPDSE address space. There were no records produced when I had a C program write to a member of a PDSE.
Storage Controller caching information.
For a disk read, if the data is in the cache of the Storage Controller, the data can be returned immediately; this is a cache hit. If not, then a disk will have to be read to get the data; this is a cache miss.
For a disk write, usually the data is written to the Non Volatile Cache, and the request returns. This is known as DASD Fast Write (DFW). At a later time, the data is moved from the cache to the disk. This is a cache hit. If the request says do not use cache (Bypass DASD Fast Write) this counts as a cache miss.
- There is a count of total normal cache requests (read and write), and a count of cache hits (read and write). This normal cache request usually process a couple of tracks.
- There is a count of total normal cache write requests, and a count of normal cache write hits.
- You can calculate the number of normal read caches, from (total cache – write cache).
- There is a count of sequential cache requests. This request tells the storage controller to bring many records into the cache, or this is going to write many records to the cache. There is one combined counter for read and write requests.
- For direct access, reading just one track there is direct access , (what I call 1Track cache). There is one combined counter for read and write requests.
- There is bypass cache where the record is written directly to disk. There is one combined counter for read and write requests.
I/O response times
The average response time of the request, and the break down into connect time, disconnect time etc is reported. See Pending, disconnect and other gobbledegook.
There are two sets of response times, the first set are in units of 128 microseconds, from the old days when disk response times were measured in milliseconds, and a newer set, which are in microseconds. The old ones should be ignored and the more granular set of values used.
Most, but not all, of the response time fields are available in the SMF 42.6 records.
For one data set, I had
- overall response time of 96 microseconds,
- time to get to the Storage controller (pending time) 22,
- time in the Storage Controller(connect time) 17,
- time doing to disks(disconnect time) 0.
The difference is “other time” (57 microseconds) is the time from the IO request was initiated till the Start SubCHannel (SSCH) was issued, and the time from the I/O interrupt indicating I/O complete, before the z/OS issuing the Test SubCHannel(TSCH) to clear the I/O request and get the status.
Access manager counts
Applications usually use facilities like C functions fread() and fwrite(), rather than do the I/O themselves. This makes the application much simpler, as the Access Method hides most of the complexity of doing the actual I/O.
Consider the scenario where your application opens a file and reads 10 records. Under the covers the Access Method reads the disk, a track’s worth of data, and gives the application data when asked. We have 10 reads to the access manager, and one read from disk.
There are statistics on the number of sequential access, the number of direct access (get this particular record) and the number of directory records accessed – though this always seems to be zero!
Access manager response times
There are response times for the AM requests, I dont think they are very useful.
The documentation says
- delay values are in units of 128 microseconds.
- The I/O delay is calculated by the access method when the access method checks whether an I/O
buffer is available to be reused. For example, when an access method has issued an I/O request to commit the buffer. The timer starts when the access method makes the check request; it does not start when the I/O is requested. The timer ends when the I/O request completes. Thus, the delay value that is reported is the amount of time that the caller had to wait to reuse a buffer.
It is not clear to me what this means. I think it means there is logic like
- start asynchronous read 1
- start asynchronous read 2
- do something
- start timer
- check read 1 and wait till it has finished
- stop timer, and calculate the duration
- start timer
- check read 2 and wait till it has finished
- stop timer and calculate the duration
- Add up all the durations and report the sum.
In the documentation for these field, some fields say “total I/O delay”, others say “I/O delay”. It looks like the field is total I/O delay.
Reading a file, it had 527 I/Os, with an average I/O response time of 527 microseconds. The AM statistics had number of sequential read blocks 527, sequential read time 314368 microseconds, or an average of 597 microseconds, which is consistent with the disk I/O response time.
I only seemed to get AM delays when the average disk I/O response time was over 500 microseconds.
Do the numbers make sense?
The answer is sometimes. I have cases where the numbers do not add up.
- For the data set CEE.SCEERUN, it had 21 I/Os, 21 reads from cache, but 14 disk reads. I did an IO trace, and there were 21 reads from normal cache. I do not know where the missing 7 were reported. There was no write activity.
- For an SMF data set, this had 586 I/Os, 586 using Sequential cache, and 586 read requests. This makes sense.
- For CBC.SCCNCMP this had 305 I/Os, 5 normal reads from cache, 300 from Sequential cache, and number of disk reads 305. This makes sense.
- When using the binder my PDSE had 21 I/Os,2 Read cache 12 write cache, and 7 Sequential cache, number of disk reads 2. For this to make sense the 7 sequential cache would be for the write.
How do you tell the number of read I/Os.
This is a bit tricky. First, ask yourself – do you care ? Or are you really interested in the total I/Os.
- If the number of I/OS = (number of cache requests – number of write cache requests), this is the number of read I/Os.
- If the number of I/Os != number of cache requests, there should be counts for sequential, 1 Track, or No Cache. You cannot tell if these are for reads or writes.
Other interesting “features”
- The userid is blank. The documentation says User-defined identification field (taken from common exit parameter area, not from USER=parameter on job statement). I do not think Ihave a common exit.
- There is no step name, just the job name. I was able to manually connect the SMF 30 records to the SMF 42.6 records by using the time stamp interval of the data set.
- There are SMF type 14 records for input data sets, this was available when z/OS was MVS/360. It’s partner SMF 15 is for output data sets. It records by DDName, not data set name, and so is not very useful. it provides an overlap of information with the SMF 42.6. You have to enable the collection of SMF 14,15. As you get one SMF record per DDNAME enty, you get many SMF records. Most customers do not collect these records. The DISPLAY SMF command on my system gives SYS(NOTYPE(14:19,62:69,99)) which says do not collect records with SMF types: 14 through 19, 62 through 69, or 99.
- Block size can be 0 – which means use the maximum the device supports.
- There is a field “Average disconnect time for reads”. If a read has disconnect time, then there was a cache miss – the data was not in the 3990 storage cache. You can get read cache miss, if the data has not been used for a while, and the data has been removed from the cache.
- There is a field “Maximum data set I/O response time” and “Service time associated with maximum I/O response time” in units of 128 microseconds. The difference in these values is the time interval between the I/O interrupt occurring, and the Test SubCHannel, which processed the interrupt. My values were 0.000128 and 0.000000 – not very helpful. Another set of values was 0.001152 0.000640. A good idea to have this – but it has the wrong granularity.