This started as an investigation into why Java seemed slow on z/OS: was it due to a zFS tuning problem? It changed into the question of what performance health checks I can do with zFS.
It may be that zFS is so good you do not need to check its status, but I could find no useful reports on what to check, and I found that basic reports are not available and useful data is missing. I would rather check than assume things are working OK.
- zFS on z/OS concepts, from a performance perspective
- How to collect zFS statistics
- Example of zFS statistics
- zFS performance reports I would like to use on z/OS (but can’t)
Getting the data
Data is available from SMF 92 records. Records are produced on a timer, either the SMF Interval broadcast, or the zFS -smf_recording interval.
Data is available from the zFS commands, for example query -reset -usercache.
If you use the display command, you get the data accumulated since the system was started, or since the last reset was issued.
You may want to have a process to issue the display and reset commands periodically to provide a profile throughout the day. Having data accumulated for a whole day does not allow you to see peaks and troughs.
Some data does not include the duration of the data (or reset time), so you cannot directly calculate rates. You might need to save the reset time in a file, and use this to calculate the interval.
query fsinfo includes the reset time; query metacache, usercache and dircache do not include the reset time.
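The "save the reset time in a file" idea can be sketched in Python as follows. This is only an illustration: the function names and the JSON file layout are my own invention, and nothing here calls zFS. You would call save_reset_time when you issue a reset, and rate_since_reset when you next display a counter.

```python
import json
import time
from pathlib import Path


def save_reset_time(path, now=None):
    """Record the time a reset was issued (seconds since the epoch)."""
    stamp = time.time() if now is None else now
    Path(path).write_text(json.dumps({"reset_time": stamp}))
    return stamp


def rate_since_reset(path, counter, now=None):
    """Return the counter value per second since the saved reset time."""
    reset_time = json.loads(Path(path).read_text())["reset_time"]
    now = time.time() if now is None else now
    interval = now - reset_time
    if interval <= 0:
        raise ValueError("no elapsed time since the reset")
    return counter / interval
```

For example, a counter of 1200 displayed 600 seconds after the saved reset time gives a rate of 2 per second.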
There is an API, BPX1PCT(“ZFS “,ZFSCALL_STATS, … This returns the data in a C structure, but z/OS does not seem to provide this as a header file! It does provide sample C programs for printing each sort of data. I do not know if the data is cumulative, or since the last reset.
Consider the simple scenario,
- I have a web server (Liberty on z/OS), for example z/OSMF, z/OS Connect, or WAS, with people using it.
- There are people developing a Java application.
- I have a production Java program which runs every hour, reads in data from a file, does some processing, and sends it over HTTP to a monitoring system. This could be reading SMF data and converting it to JSON.
What basic reports did I expect?
The questions below would apply to any work, for example a business transaction using CICS, DB2, MQ and IMS. zFS is just another component within a transaction.
- When I start my Java application, it sometimes takes much longer to start than at other times: 20 seconds longer. What is causing this? Is it due to delays in reading files, or should I look elsewhere?
- For each job, I would like to know the total time spent processing files, and identify the files used by the job where most time is spent.
- We had a slow down last week, can we demonstrate that zFS is not the problem?
- Do I need to take any actions on zFS?
- Today – because it is slow
- Next week – because I can see an increase in disk I/O over the past few weeks.
- Can I tell which files or file systems are using most of the cache, and what can I do about it?
For each job, I would like to know the total time spent processing files, and identify the files used by the job where most time is spent.
This information is not available.
From the SMF 92-11 records you can get some information
- Job name
- File name. Some files are given as /u/adcd/j.sh, other files are given as write.c with no path, just the name used. This is not very helpful, as it means I am unable to identify the specific file used.
- Time file was opened
- Time file was closed (so you can calculate the open duration)
- The number of directory reads. For the file “.” this had 1 read.
- The number of reads, blocks read, and bytes read
- The number of writes, blocks written, and bytes written. For example, an application did 10,000 writes with a buffer length of 4096. There were 10,003 blocks written and 40,960,000 bytes written.
This information does not tell you how long requests took. A fread() could require data to be read from the file, or it may be available in the cache.
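As a rough sketch of what can be derived from one of these records, the Python below computes the open duration and the average bytes per write. The class and field names are hypothetical (they are not the real SMF 92-11 field names), and the job name and times are made up. The write counts are the ones from the example above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileCloseRecord:
    # Hypothetical field names, not the real SMF 92-11 field names.
    job_name: str
    file_name: str
    opened: datetime
    closed: datetime
    writes: int
    bytes_written: int


def summarise(rec):
    """Derive the open duration and average write size from one record."""
    open_seconds = (rec.closed - rec.opened).total_seconds()
    avg_write = rec.bytes_written / rec.writes if rec.writes else 0.0
    return {"open_seconds": open_seconds, "avg_bytes_per_write": avg_write}


example = FileCloseRecord(
    job_name="MYJOB",                        # made-up job name
    file_name="/u/adcd/temp.temp",           # illustrative path
    opened=datetime(2021, 5, 30, 11, 0, 0),  # made-up times
    closed=datetime(2021, 5, 30, 11, 0, 12),
    writes=10_000,
    bytes_written=40_960_000,
)
```

The open duration is the wall-clock time the file was open. It is not the time spent doing I/O, which is the number I really want.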
You cannot get this information from the zfs commands. You can get other information. For example, I wrote to a file and issued the command fileinfo -path /u/adcd/temp.temp -both, which gave
path: /u/adcd/temp.temp
owner                 S0W1
file seq read         yes
file seq write        yes
file unscheduled      0
file pending          625
file segments         625
file dirty segments   0
file meta issued      0
file meta pending     0
The data is described here.
- unscheduled Number of 4K pages in user file cache that need to be written.
- pending Number of 4K pages being written.
- segments Number of 64K segments in user cache.
- dirty segments Number of segments with pages that need to be written.
Given a filename you can query how many segments it has, but I could not find a way of listing the files in the cache. You would have to search the whole tree, and query each file to find this. This operation would significantly impact the metadata cache.
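The segment count can be turned into a size with simple arithmetic. The sketch below uses the 4K page and 64K segment sizes from the field descriptions above. The constant and function names are mine.

```python
PAGE_BYTES = 4 * 1024      # user file cache page size, from the descriptions
SEGMENT_BYTES = 64 * 1024  # user cache segment size, from the descriptions


def segments_to_bytes(segments):
    """Size in bytes of the given number of 64K user cache segments."""
    return segments * SEGMENT_BYTES


def pages_per_segment():
    """How many 4K pages fit in one 64K segment."""
    return SEGMENT_BYTES // PAGE_BYTES


cached_bytes = segments_to_bytes(625)
```

The 625 segments reported above come to 40,960,000 bytes, which is the same as 10,000 writes of 4096 bytes. I am assuming, rather than asserting, that this is the file from that write test.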
We had a slow down last week. Can we demonstrate that zFS is not the problem?
You can get information on
- the number of pages in the various pools
- the number of reads from the file system, and the number of requests that were satisfied from the cache, which gives the cache hit ratio. A good cache hit ratio is typically over 95%.
- Steal Invocations tells you if the cache was too small, so pages had to be reused.
- The I/O activity (number of reads and writes, and number of bytes) by file system.
- The average I/O wait time by volume.
- The number of free pages never goes back up, so you can use it to see the highest number of pages in use since zFS started. If it reached 95% full on Monday, it will stay at 95% until restart.
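The cache hit ratio mentioned in the list above is a simple calculation. A minimal sketch follows. The function names are mine, the 95% threshold is the rule of thumb given above, and you would feed in the request and hit counts from whichever cache report you are using.

```python
def cache_hit_ratio(requests, hits):
    """Return the percentage of requests satisfied from the cache."""
    if requests == 0:
        return None  # no activity in the interval, so no ratio
    return 100.0 * hits / requests


def looks_healthy(requests, hits, threshold=95.0):
    """True if the hit ratio is at or above the threshold (or no activity)."""
    ratio = cache_hit_ratio(requests, hits)
    return ratio is None or ratio >= threshold
```

Because the counters are cumulative, calculate the ratio over an interval (the difference between two displays) rather than over the whole day, otherwise a bad half hour is hidden by hours of good data.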
If you compare the problem period with a normal period you should be able to see if the data is significantly different.
You need to decide how granular you want the data, for example capture it every 10 minutes, or every minute.
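If you capture cumulative counters every interval without resetting them, you can turn successive snapshots into per-second rates. The sketch below assumes you have already extracted the counters into a dictionary per snapshot. The snapshot layout and the function name are my own choices.

```python
def interval_rates(snapshots):
    """Convert cumulative snapshots into per-second rates per interval.

    snapshots is a list of (time_in_seconds, {counter_name: value}),
    in time order. Returns a list of (end_time, {counter_name: rate}).
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(snapshots, snapshots[1:]):
        seconds = t1 - t0
        if seconds <= 0:
            raise ValueError("snapshots must be in increasing time order")
        row = {name: (c1[name] - c0.get(name, 0)) / seconds for name in c1}
        rates.append((t1, row))
    return rates
```

A negative rate means the counters were reset (or zFS restarted) between the two snapshots, so that interval should be discarded.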
Do I need to take any actions on zFS?
Today – because it is slow
Display the key data for the cache, such as cache hits, and compare the amount of I/O today with a comparable day.
I do not think there are any statistics to tell you how much to increase the size of the cache. Making the cache bigger may not always help performance. For example, if a program is writing a 1 GB file, then while the cache is smaller than 1 GB the program will flood the cache with pages to be written, and read-only pages will be overwritten.
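The flooding effect can be illustrated with a toy model. The Python below simulates a plain least-recently-used cache. This is an assumption made purely for illustration: I do not know the replacement policy zFS actually uses, and the model ignores the fact that written pages are eventually flushed to disk. It loads some read pages, streams a large write through the cache, and counts how many read pages survive.

```python
from collections import OrderedDict


def surviving_read_pages(cache_pages, read_pages, write_pages):
    """Toy LRU model: load read pages, then stream a big sequential write.

    Returns how many of the read pages are still in the cache afterwards.
    """
    cache = OrderedDict()

    def touch(key):
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used

    for i in range(read_pages):
        touch(("read", i))
    for i in range(write_pages):
        touch(("write", i))
    return sum(1 for key in cache if key[0] == "read")
```

In this model a 1000 page cache holding 200 read pages keeps none of them after a 5000 page write. Doubling the cache to 2000 pages makes no difference. The read pages survive only once the cache is larger than the read pages plus the whole write.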
You can monitor the number of reads and writes per file, and the number of file system I/Os, but you cannot directly see the files causing the file system I/O.
If there is a lot of sustained I/O to a file system, you may want to move it to a less heavily used volume, or move subdirectories to a different file system, on a different volume.
There are several caches: User Cache, Meta data cache, VNode cache, Log cache. The size of these can all be reconfigured, but I cannot see how to tell how full they are, and if they need to be increased in size.
Can I tell which files or file systems are using most of the cache, and what can I do about it?
The SMF record 92-59 contains the number of pages the file system has in the user cache, and in the meta cache.
The field SMF92FSUS has the number of pages this file system has allocated in the user cache.
The field SMF92FSMT has the number of pages this file system has allocated in the meta data cache.
For 40 file systems, the records were all created within 2 ms of each other, so you should be able to group records with a similar time stamp, for example to save the data and show the percentage of buffers per file system.
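Grouping the records by time stamp and showing the percentage per file system could look something like the sketch below. It assumes the records have already been read and reduced to tuples of (time stamp in milliseconds, file system name, user cache pages, that is the SMF92FSUS value). The tuple layout, the function name and the 2 ms default window are my own choices.

```python
from collections import defaultdict


def percent_of_user_cache(records, window_ms=2):
    """Group records created close together and show each file system's share.

    records is an iterable of (timestamp_ms, file_system, user_pages).
    Records within window_ms of the first record of a group belong to it.
    Returns one {file_system: percent_of_pages} dictionary per group.
    """
    groups = []
    current, start = None, None
    for ts, fs, pages in sorted(records):
        if start is None or ts - start > window_ms:
            current, start = defaultdict(int), ts  # start a new interval
            groups.append(current)
        current[fs] += pages
    result = []
    for group in groups:
        total = sum(group.values())
        result.append({fs: (100.0 * p / total if total else 0.0)
                       for fs, p in group.items()})
    return result
```

The same function works for the meta data cache if you pass the SMF92FSMT value instead.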
The command fsinfo -full -aggregate ZFS.USERS provides the same information. It gave me
Statistics Reset Time: May 30 11:09:51 2021
Status:RW,NS,GF,GD,SE,NE,NC
Legend: RW=Read-write, GF=Grow failed, GD=AGGRGROW disabled
        NS=Mounted NORWSHARE, SE=Space errors reported, NE=Not encrypted
        NC=Not compressed
*** local data from system S0W1 (owner: S0W1) ***
Vnodes: 48                   LFS Held Vnodes: 4
Open Objects: 0              Tokens: 0
User Cache 4K Pages: 5011    Metadata Cache 8K Pages: 39
Application Reads: 11239     Avg. Read Resp. Time: 0.046
Application Writes: 22730    Avg. Writes Resp. Time: 0.081
Read XCF Calls: 0            Avg. Rd XCF Resp. Time: 0.000
Write XCF Calls: 0           Avg. Wr XCF Resp. Time: 0.000
ENOSPC Errors: 1             Disk IO Errors: 0
This also showed:
- there was 1 no-space error
- Status had
- GF=Grow failed
- GD=AGGRGROW disabled
- There were 48 Vnodes (files) in the meta cache.
It looks like the Application Reads and Writes are true application requests. I had a program which wrote 10,000 4KB records, and the Application Writes increased by 10002. The reads increased by 23. I think this is due to the running of the program.
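If you capture this output periodically, you need to pull the numbers out of the text. A minimal sketch using a regular expression is shown below. It relies only on the "Label: number" layout visible in the output above, and the function name is mine. The sample string is part of the output shown earlier.

```python
import re


def parse_counters(text, labels):
    """Pull 'Label: number' values out of the report text for given labels."""
    values = {}
    for label in labels:
        match = re.search(re.escape(label) + r":\s*([0-9.]+)", text)
        if match:
            values[label] = float(match.group(1))
    return values


sample = ("Application Reads: 11239 Avg. Read Resp. Time: 0.046 "
          "Application Writes: 22730 Avg. Writes Resp. Time: 0.081")

counters = parse_counters(
    sample, ["Application Reads", "Application Writes", "Avg. Read Resp. Time"]
)
```

Parsing two captures this way and subtracting them is how you would see an increase such as the 10002 writes mentioned above. Be careful with short labels: "Vnodes" also appears inside "LFS Held Vnodes", so the first match is the one returned.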
The command also gave
VOLSER PAV      Reads     KBytes     Writes     KBytes      Waits   Average
------ --- ---------- ---------- ---------- ---------- ---------- ---------
A4USS2   1         55        532       1658      91216         83     0.990
------ --- ---------- ---------- ---------- ---------- ---------- ---------
TOTALS             55        532       1658      91216         83     0.990
The number of writes (to the file system) increased by 630, and the KB written increased by 40,084 KB, which is approximately the size of the file (40,000 KB).
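The arithmetic behind that comparison, using only the numbers quoted above, is worked through here in Python. The variable names are mine.

```python
# What the application wrote: 10,000 records of 4096 bytes.
app_writes = 10_000
record_bytes = 4096
app_kb = app_writes * record_bytes / 1024      # KB written by the program

# What the file system reported for the same period.
disk_writes = 630
disk_kb = 40_084

extra_kb = disk_kb - app_kb                    # KB beyond the file's data
avg_kb_per_disk_write = disk_kb / disk_writes  # average size of a disk write
```

The file system wrote 84 KB more than the 40,000 KB of file data, and the average disk write was about 63.6 KB. That is close to the 64K segment size mentioned earlier, although whether that is the reason is a guess on my part.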
You can use the command fileinfo -path /u/adcd/aa -both and it will display information about the file system the file is on.
Although you can see how much data was written to the file system, I could not easily find which file it came from. The SMF 92-11 records can give an indication, but writing 10MB to a file and then deleting the file may mean no data is written to disk, so the SMF 92-11 records are not 100% reliable.