Creating a ZFS – which way should I do it? IDCAMS LINEAR or IDCAMS ZFS?

When I looked into creating a ZFS (so I could use it in the Unix environment) I found there were two ways of doing it; both have the same end result.

The “old” way – a three step process

You use DEFINE CLUSTER …LINEAR to create the data set, then use PGM=IOEAGFMT to format it, then mount it.

//IBMUZFS  JOB ,' ',COND=(4,LE) RESTART=MOUNT 
//DEFINE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE COLIN.ZOPEN.ZFS CLUSTER
SET MAXCC=0
DEFINE -
CLUSTER -
(NAME(COLIN.ZOPEN.ZFS)-
LINEAR -
VOLUMES(USER10 ) -
STORCLAS(SGBASE ) -
MEGABYTES(6000 1000) -
SHAREOPTIONS(3 3))
/*
//FORMATFS EXEC PGM=IOEAGFMT,REGION=0M,COND=(0,NE,DEFINE),
// PARM=('-aggregate COLIN.ZOPEN.ZFS ')
//* PARM=('-aggregate COLIN.ZOPEN.ZFS -compat')
//SYSPRINT DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
//*
//*
//MOUNT EXEC PGM=IKJEFT1A,COND=((0,NE,DEFINE),(0,NE,FORMATFS))
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
MOUNT FILESYSTEM('COLIN.ZOPEN.ZFS') TYPE(ZFS) +
MOUNTPOINT('/u/zopen') +
MODE(RDWR) PARM('AGGRGROW') AUTOMOVE
/*

The define took less than a second, the format took about 16 seconds, and the mount took less than one second.

The “new” way – a two step process (sounds like a dance for system administrators)

You create the data set with type ZFS, then you mount it, and the mount formats it.

//IBMUZFS  JOB ,' ',COND=(4,LE) RESTART=MOUNT 
//DEFINE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE COLIN.ZOPEN.ZFS CLUSTER
SET MAXCC=0
DEFINE -
CLUSTER -
(NAME(COLIN.ZOPEN.ZFS)-
ZFS -
VOLUMES(USER10 ) -
STORCLAS(SGBASE ) -
MEGABYTES(6000 1000) -
SHAREOPTIONS(3 3))
/*
//MOUNT EXEC PGM=IKJEFT1A,COND=((0,NE,DEFINE))
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
MOUNT FILESYSTEM('COLIN.ZOPEN.ZFS') TYPE(ZFS) +
MOUNTPOINT('/u/zopen') +
MODE(RDWR) PARM('AGGRGROW') AUTOMOVE
/*

The define took less than a second – the mount took 17 seconds, because it had to do the format.

What’s the difference?

Overall the time to execute the job was the same.

I think I prefer the first way of doing it, as I have more control and can check that the format was as I expected.

If you use the second way and define the ZFS in parmlib, I do not know whether the formatting would hold up OMVS startup.

And don’t forget

Update your BPXPRMxx parmlib member so the ZFS is mounted automatically at IPL.
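For example, the BPXPRMxx entry for the ZFS created above might look like the following. This is a sketch; compare it with the MOUNT entries already in your member (parmlib statements do not need the TSO '+' continuation character).

MOUNT FILESYSTEM('COLIN.ZOPEN.ZFS')
      TYPE(ZFS)
      MODE(RDWR)
      MOUNTPOINT('/u/zopen')
      PARM('AGGRGROW')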

Migrating an ADCD z/OS release: ZFS files.

Start here: Migrating an ADCD z/OS release to the next release.

For background see Should I use tar or pax to backup my Unix files?

System files

If you have done any configuration to products which use Unix services, you are likely to have changed files in the file system. For example /etc/syslog.conf.

Configuration files

Many configuration files live in the /etc directory.

If you want to find which files have changed since you started using the system, you could use the ls -ltr command in each directory; it lists the files with the most recently changed at the bottom.

This gets very tedious when you have a lot of directories to examine. However, you can ask Unix to list all files which match a criterion, such as changed in the last n days, or newer than an existing file.
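For example, to list the files changed in the last 7 days (a sketch using the standard find -mtime test; adjust the number of days to suit):

find . -type f -mtime -7 | xargs ls -ltr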

Create a file of the comparison data

touch -t 202202211456 /tmp/foo

This creates a file /tmp/foo with a modification time of 2022-02-21 14:56.

find . -type f -newer /tmp/foo |xargs ls -ltr > aa

This is two commands.

  • The find command
    • looks in the current directory (.) and subdirectories
    • for objects with type of files (rather than directories etc)
    • which have been changed more recently than /tmp/foo.
  • The file names are passed (via xargs) to the ls -ltr command, which displays the date and time information, newest last, into the file aa.
-rwx------  1 OMVSKERN OMVSGRP     32 Feb 13  2023 ./cssmtp.env
-rw-r--r--  1 OMVSKERN OMVSGRP   3453 Feb 13  2023 ./mail/ezatmail.cf
-rwxr-xr-x  1 OMVSKERN OMVSGRP    250 Feb 28  2023 ./hosts
-rw-r--r--  1 OMVSKERN OMVSGRP 226441 Jun 16  2023 ./pkiserv/pkiserv.tmpl.old
-rw-r--r--  1 OMVSKERN OMVSGRP  22406 Jun 16  2023 ./pkiserv/pkiserv.conf.old
...

If you use

find . -type f -newer /tmp/foo | tar -cvf ~/etc.tar -

it will create a tar file containing the changed files. You can then transport the etc.tar file to the newer system and untar it.
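On the new system the untar might look like this (a sketch: it assumes etc.tar has been copied to your home directory, for example with sftp, and because the find above produced relative names you untar from the matching directory):

cd /etc
tar -xvf ~/etc.tar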

You can use the pax command to package the files

find . -type f -newer /tmp/foo | pax -W "seqparms='space=(cyl,(10,10))'" -o saveext -wzvf "//'COLIN.PAX.TEST'" -x os390

to save the files in pax format into the data set COLIN.PAX.TEST. If the high level qualifier has been defined as an alias in the master catalog on both systems, the data set will be visible on both systems.
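If the alias does not already exist on the new system you can define it with IDCAMS. A sketch; USERCAT.ZFS is just an example name, use the user catalog on your system:

//DEFALIAS EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE ALIAS (NAME(COLIN) RELATE(USERCAT.ZFS))
/*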

Note: If you specify

find . -type d -newer /tmp/foo

it lists all directory entries which have been changed; when the output is piped to xargs ls -ltr it also displays all the files within those directories.

For example

total 1344 
-rw-r--r-- 1 OMVSKERN OMVSGRP 3252 May 7 2019 ssh_config
-rw-r--r-- 1 OMVSKERN OMVSGRP 553761 May 7 2019 moduli
-rwx------ 1 OMVSKERN OMVSGRP 65 Oct 29 2019 sshd.sh

For example the sshd directory was changed in September this year, so all the files below it are listed. 

Application output files

These are usually in the /var sub-directory. You may not need to move these files across.

User data

You will have your own data in the Unix file systems. If you put the data under the /u file system it should be easy to find! 

You may have configured userids so their home directory is on a “user” ZFS file system, or your home directory could be mixed in with the system files. For example the file system for IBMUSER is on an ADCD file system (D5USS2).

IBMUSER:/S0W1/etc: >df -P ~
Filesystem  512-blocks    Used Available Capacity Mounted on
ZFS.USERS      2880000 1277970   1602030      45% /u

The newer system also has a ZFS.USERS. You cannot have both the old and the new ZFS.USERS mounted at the same time, as the mount uses the cataloged data set.

For my application data

IBMUSER:/S0W1/etc: >df -P /u/tmp
Filesystem  512-blocks   Used Available Capacity Mounted on
COLIN.ZFS2     2817120 788338   2028782      28% /u/tmp

I can take this ZFS system and mount it on the newer system.

You can tar up the files under a directory and move the tar file to the new system, or you can use pax, which I think is better, as it packages the ZFS files into a data set.

Using tar

Note: if you use an absolute path in the tar command, when you untar the data it will use the same directory – which may overwrite data you wanted to keep.

If you use a relative path, the data is untarred relative to the current directory.

cd /u/colin
tar -cvf ~/relative.tar *

is better than

tar -cvf ~/absolute.tar /u/colin

On the new system if I use

mkdir oldcolin
cd oldcolin
tar -xvf relative.tar

it will restore the files in oldcolin.

If I use

mkdir oldcolin
cd oldcolin
tar -xvf absolute.tar

it will restore the files to /u/colin – and overwrite any files which were there.

What directories were created?

You can use the command

find /u -type d -newer /tmp/foo | xargs ls -ltrd > aa

to display the directories created/modified since the time of the /tmp/foo file.

This gives output like

drwxrwxrwx 10 OMVSKERN SYS1   8192 Dec  8 13:04 ./adcd
drwxr-xr-x  2 OMVSKERN 1000   8192 Mar 15  2023 ./tmp/oemput
drwxrwxrwx  2 OMVSKERN WEBGRP 8192 Apr 25  2023 ./mqweb3/logs
drwxr-xr-x  2 OMVSKERN SYS1   8192 Jun 16  2023 ./mqweb2/oldconf

You can use the pax -E option to display the contents and extended attributes.

pax -E -f "//'COLIN.PAX.HTTP2'"

Note the double // and both sets of quotes.

To unpax the files on the new system, cd into the directory and use

pax -k -rvf  "//'COLIN.PAX.TEST'" .

The options are:

  • -k means do not overwrite existing files
  • -r read (restore) the files
  • -v display the details
  • -f read from the following file
  • the trailing . means restore into the current directory

To mount a ZFS on the system

You can have an entry in a BPXPRMxx parmlib member, so that a ZFS is mounted at IPL time.

You can also use a TSO command, or a batch job to mount a ZFS

//IBMMFAMO JOB 1,MSGCLASS=H
//MOUNT EXEC PGM=IKJEFT1A
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
MOUNT FILESYSTEM('AZF220.ZFS') TYPE(ZFS) +
MOUNTPOINT('/u/mfa') +
MODE(RDWR) PARM('AGGRGROW') AUTOMOVE

Sharing File systems

You can display the mounted file systems using the D OMVS,F command

This gives output like

ZFS            16 ACTIVE                      RDWR  01/15/2024  L=30
  NAME=ZFS.USERS                              14.38.54    Q=0
  PATH=/u
  OWNER=S0W1     AUTOMOVE=N CLIENT=N
ZFS            37 ACTIVE                      RDWR  01/15/2024  L=48
  NAME=COLIN.ZFS2                             14.38.56    Q=0
  PATH=/u/tmp
  OWNER=S0W1     AUTOMOVE=Y CLIENT=N

This shows there are two ZFS file systems, both mounted read/write. The first has

  • data set name ZFS.USERS. On the z24C system this is the z24C file system. You cannot mount both the z24C and the z25D file systems at the same time, because the mount command uses the cataloged data set.
  • mounted as path /u

The second has

  • data set name COLIN.ZFS2
  • mounted as path /u/tmp

You can choose to mount your data under an existing path, or create a new tree such as “/my”.
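For example, to create a new tree and mount a file system on it (a sketch: COLIN.MY.ZFS is a made-up data set name, and mkdir /my assumes the root file system is mounted read/write):

mkdir /my

and then from TSO

MOUNT FILESYSTEM('COLIN.MY.ZFS') TYPE(ZFS) MOUNTPOINT('/my') MODE(RDWR)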

Tracing the z/OS ZFS file system, and using ZFS commands.

I was looking into a little Java problem, and wanted to know which files were being used by my Java program. The “obvious” answer was a trace – but the IBM documentation was about 8 years out of date!

The key lesson from this post is to use commands like

f OMVS,PFS=ZFS,… instead of MODIFY ZFS,… if zFS is running in the OMVS address space.

The documentation topic zFS running in the z/OS UNIX address space says:

In releases before z/OS V2R2, the amount of 31-bit virtual storage that was needed by both z/OS UNIX and zFS combined would have exceeded the size of a 2 GB address space. Due to that size limitation, zFS and z/OS UNIX could not coexist in the same address space.

In z/OS V2R2, zFS caches are moved above the 2 GB bar into 64-bit storage. You can now choose to have zFS run in its own colony address space or in the address space that is used by z/OS UNIX, which is OMVS.

When running zFS in the OMVS address space, each file system vnode operation (such as creating a directory entry, removing a directory entry, or reading from a file) will have better overall performance. Each operation will take the same amount of time while inside zFS itself. The performance benefit occurs because z/OS UNIX can call zFS for each operation in a more efficient manner.

Some inherent differences exist when zFS is run in the OMVS address space.

MODIFY commands must be passed to zFS through z/OS UNIX. Use the form MODIFY OMVS,pfs=zfs,cmd. For more information, see the section on passing a MODIFY command string to a physical file system (PFS) through a logical file system (LFS) in z/OS MVS System Commands. This form of the MODIFY command can be used whether zFS is in its own address space or in the OMVS address space.

Issuing commands

So what do you do when the documentation tells you to issue a MODIFY ZFS command? For example, the documentation gives these steps:

Steps for tracing on zFS

If you are re-creating a problem and need to collect a zFS trace, use the following steps:

1. Allocate the trace output data set as a PDSE, RECFM=VB, LRECL=133 with a primary allocation of at least 50 cylinders and a secondary allocation of 30 cylinders.

2. Define the zFS trace output data set to zFS by either using the IOEFSPRM trace_dsn option, or dynamically by using the zfsadm config -trace_dsn command. If you use the IOEFSPRM option, zFS must be stopped and then restarted to pick up the change, unless you also dynamically activate the trace output data set with the zfsadm config -trace_dsn command.

3. When you are ready to re-create the problem, reset the zFS trace table using the MODIFY ZFS,TRACE,RESET command.

4. Re-create the problem.

5. Enter the MODIFY ZFS,TRACE,PRINT command. This formats and prints the trace table to the PDSE defined on the trace_dsn option.

You still use the Unix command to define the output destination of the trace

zfsadm config -trace_dsn 'IBMUSER.ZFSTRACE'

but you use the following console command to cause the trace to be formatted to the file from the internal buffer.

f OMVS,PFS=ZFS,TRACE,PRINT
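So the sequence in steps 3 to 5 above becomes, when zFS is running in the OMVS address space:

f OMVS,PFS=ZFS,TRACE,RESET
(re-create the problem)
f OMVS,PFS=ZFS,TRACE,PRINT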

Easy when you know how…

Help ! My ZFS has filled up

The file system I was using in Unix Services filled up – but it did not tell me, I just had a truncated file. I had piped the output of a shell script to a file; as the file system filled up, it could not write the “file system full” message.

To solve this file-system-full problem, I had to explore other areas of z/OS which I was not so familiar with: ADRDSSU to move data sets, zfsadm commands, and how to stop SMS from being too helpful.

Having made the ZFS.USERS data set larger, I then found IEC070I 104-204 when the data set went over 4 GB. So some of this blog post is wrong!

Later I found the ZFS.USERS data set had 123 extents – and could not be expanded.

The Unix Services command

df -P /u/pymqi

tells you the file system – and how full it is. This gave me

Filesystem 512-blocks   Used Available Capacity Mounted on
ZFS.USERS      204480 203954       526     100% /u

So we can see the data set is ZFS.USERS and it is 100% full.

What is using all of the space?

du -a ./ | sort -n -r | head -n 30

lists the space used by every file under the current directory, sorts the list with the largest first, and displays the top 30.

The command

zfsadm fsinfo ZFS.USERS

gives more (too much) information, and

zfsadm aggrinfo ZFS.USERS

doesn't quite give enough information; df -P … is best.

I used the command

zfsadm grow ZFS.USERS -size 144000

to make it bigger, but I got the Unix Services message

IOEZ00326E Error 133 extending ZFS.USERS

and on the system log

IOEZ00445E Error extending ZFS.USERS. 591
DFSMS return code = 104, PDF code = 204.
IOEZ00308E Aggregate ZFS.USERS failed dynamic grow, (by user COLIN).
IOEZ00323I Attempting to extend ZFS.USERS to 36000 4096 byte control intervals.
IEF196I IEC070I 104-204,OMVS,OMVS,SYS00022,0A9E,C4USS2,ZFS.USERS,
IEF196I IEC070I ZFS.USERS.DATA,CATALOG.Z24C.MASTER
IEC070I 104-204,OMVS,OMVS,SYS00022,0A9E,C4USS2,ZFS.USERS, 594
IEC070I ZFS.USERS.DATA,CATALOG.Z24C.MASTER
IOEZ00445E Error extending ZFS.USERS. DFSMS return code = 104, PDF code = 204.

Use the return codes from the IEC070I message. Search for IEC070I 104-204.

DFSMS return code = 104, PDF code = 204 means no space on the volume.

I used ISPF 3.4 to display the volume the ZFS.USERS data set was on: C4USS2.

I used ISPF 3.4 to display what data sets were on the C4USS2 volume. If you use PF11 you can see the space allocated to each data set.

I could try moving this data set to another volume, but that would mean unmounting it, moving it, and remounting it. I thought it easier to move other data sets off the volume.

On the volume, I found a ZFS which I was not using and unmounted it

unmount filesystem('ZFS.Z24C.ZCX') normal

Moving it looked easy using DFDSS (program ADRDSSU) COPY DATASET:

//IBMUSER1 JOB 1,MSGCLASS=H
//STEP1 EXEC PGM=ADRDSSU,REGION=0M
//SYSPRINT DD SYSOUT=A
//SYSIN DD *
COPY DATASET(INCLUDE(ZFS.Z24C.ZCX))-
ODY(C4USS1) DELETE CATALOG
/*

When I ran this job it moved the data set, but moved it to a USER00 volume, filling up most of the space on that volume. I had just moved the problem: SMS intercepted my request and “managed” the disk storage for me.

I added BYPASSACS and NULLSTORCLAS:

COPY DATASET(INCLUDE(ZFS.Z24C.ZCX)) -
BYPASSACS(ZFS.Z24C.ZCX) -
NULLSTORCLAS -
ODY(C4USS2) DELETE CATALOG

and this worked.

  • BYPASSACS – do not use ACS routines to decide where to put the data set
  • NULLSTORCLAS (or STORCLAS(xxxxx)) – do not use a storage class.
  • ODY (OUTDYNAM) specifies that the output DASD volume is to be dynamically allocated.
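Putting it together, the complete job was along these lines (a sketch based on the ADRDSSU job above, with my volume and data set names):

//IBMUSER1 JOB 1,MSGCLASS=H
//STEP1 EXEC PGM=ADRDSSU,REGION=0M
//SYSPRINT DD SYSOUT=A
//SYSIN DD *
COPY DATASET(INCLUDE(ZFS.Z24C.ZCX)) -
BYPASSACS(ZFS.Z24C.ZCX) -
NULLSTORCLAS -
ODY(C4USS2) DELETE CATALOG
/*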

I then had enough space to be able to grow the ZFS.

When I tried the same unmount and move on a different ZFS, I got the message

BPXF137E RETURN CODE 00000072, REASON CODE 058800AA

which means there is a file system mounted within it. On the console I got

IOEZ00048I Detaching aggregate COLIN.ZFS2

I unmounted this

unmount filesystem('COLIN.ZFS2') Immediate

I could then unmount ZFS.USERS, and then move it.

Once I had moved it, I expanded it, and remounted COLIN.ZFS2. (See the MOUNT command in parmlib for the ZFSs.)

zFS performance reports I would like to use on z/OS (but can’t)

What started off as an investigation into why Java seemed slow on z/OS (was it due to a zFS tuning problem?) turned into: what performance health checks can I do with zFS?

It may be that zFS is so good that you do not need to check its status, but I could find no useful guidance on what to check, found that basic reports are not available, and found that useful data is missing. I would rather check than assume things are working OK.

Getting the data

Data is available from SMF 92 records. Records are produced on a timer, either the SMF Interval broadcast, or the zFS -smf_recording interval.

Data is available from the zFS commands, for example query -reset -usercache.

If you use the display command, you get the data accumulated since the system was started, or since the last reset was issued.

You may want to have a process to issue the display and reset commands periodically to provide a profile throughout the day. Having data accumulated for a whole day does not allow you to see peaks and troughs.

Some data does not include the duration of the data (or reset time), so you cannot directly calculate rates. You might need to save the reset time in a file, and use this to calculate the interval.
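A minimal sketch of such a capture step, using the query -reset -usercache command mentioned above (the file names are just examples):

date '+%Y-%m-%d %H:%M:%S' >> /u/colin/zfs.reset.times
zfsadm query -reset -usercache >> /u/colin/zfs.usercache.log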

query fsinfo includes the reset time; query metacache, usercache and dircache do not include the reset time.

There is an API, BPX1PCT("ZFS ", ZFSCALL_STATS, …). This returns the data in a C structure, but z/OS does not seem to provide this as a header file! It provides sample C programs for printing each sort of data. I do not know if the data is cumulative, or since the last reset.

Simple scenario

Consider the simple scenario,

  • I have a web server (Liberty on z/OS), for example z/OSMF, z/OS Connect, or WAS, with people using it.
  • There are people developing a Java application.
  • I have a production Java program which runs every hour, reads in data from a file, does some processing, and sends it over HTTP to a monitoring system. This could be reading SMF data and converting it to JSON.

What basic reports did I expect?

The questions below would apply to any work, for example a business transaction using CICS, DB2, MQ and IMS; zFS is just another component within a transaction.

  • When I start my Java application it sometimes takes much longer to start than at other times – 20 seconds longer. What is causing this? Is it due to delays in reading files, or should I look elsewhere?
    • For each job, I would like to know the total time spent processing files, and identify the files, used by the job, where most time is spent.
  • We had a slow down last week, can we demonstrate that zFS is not the problem?
  • Do I need to take any actions on zFS
    • Today – because it is slow
    • Next week – because I can see an increase in disk I/O over the past few weeks.
  • Can I tell which files or file systems are using most of the cache, and what can I do about it?

For each job, I would like to know the total time spent processing files, and identify the files, used by the job, where most time is spent.

This information is not available.

From the SMF 92-11 records you can get some information

  • Job name
  • File name. Some files are given as /u/adcd/j.sh, other files are given as write.c with no path, just the name used. This is not very helpful, as it means I am unable to identify the specific file used.
  • Time file was opened
  • Time file was closed (so you can calculate the open duration)
  • The number of directory reads. For the file “.” this had 1 read.
  • The number of reads, blocks read, and bytes read.
  • The number of writes, blocks written, and bytes written. For example an application did 10,000 writes, with a buffer length of 4096. There were 10,003 blocks written and 40,960,000 bytes written.
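One way to get at the SMF 92-11 records is to dump them with IFASMFDP. A sketch, assuming SMF is recording to MANx data sets rather than log streams; the input and output data set names are examples and will be different on your system:

//SMF92    JOB 1,MSGCLASS=H
//DUMP     EXEC PGM=IFASMFDP
//INDD1    DD DISP=SHR,DSN=SYS1.S0W1.MAN1
//OUTDD1   DD DISP=(NEW,CATLG),DSN=COLIN.SMF92.DATA,
//            SPACE=(CYL,(10,10)),RECFM=VBS,LRECL=32760
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  INDD(INDD1,OPTIONS(DUMP))
  OUTDD(OUTDD1,TYPE(92(11)))
/*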

This SMF information does not tell you how long requests took. A fread() could require data to be read from the file system, or it may be satisfied from the cache.

You cannot get this information from the zFS commands. You can get other information; for example, I wrote to a file and issued the command fileinfo -path /u/adcd/temp.temp -both, which gave


path: /u/adcd/temp.temp 
owner                S0W1       file seq read           yes 
file seq write       yes        file unscheduled        0 
file pending         625        file segments           625 
file dirty segments  0          file meta issued        0 
file meta pending    0          
 

The data is described here.

  • unscheduled Number of 4K pages in user file cache that need to be written.
  • pending Number of 4K pages being written.
  • segments Number of 64K segments in user cache.
  • dirty segment Number of segments with pages that need to be written.

Given a filename you can query how many segments it has, but I could not find a way of listing the files in the cache. You would have to search the whole tree, and query each file to find this. This operation would significantly impact the metadata cache.

We had a slow down last week. Can we demonstrate that zFS is not the problem?

You can get information on

  • the number of pages in the various pools
  • the number of reads from the file system, and the number of requests that were satisfied from the cache – the cache hit ratio. A good cache hit ratio is typically over 95%.
  • Steal Invocations tells you if the cache was too small, so pages had to be reused.
  • The I/O activity (number of reads and writes, and number of bytes) by file system.
  • The average I/O wait time by volume.
  • The number of free pages never goes down; you can use it to see the highest number of pages in use since zFS started. If it reached 95% full on Monday, it will stay at 95% until restart.
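This data comes from the zfsadm query command (or the equivalent console form, f OMVS,PFS=ZFS,QUERY,…). A sketch using the cache reports referred to earlier in this post:

zfsadm query -usercache    # user file cache statistics, including cache hits and Steal Invocations
zfsadm query -metacache    # metadata cache statistics
zfsadm query -dircache     # directory cache statistics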

If you compare the problem period with a normal period you should be able to see if the data is significantly different.

You need to decide how granular you want the data, for example capture it every 10 minutes, or every minute.

Do I need to take any actions on ZFS?

Today – because it is slow

Display the key data for the cache (cache hits), and compare the amount of I/O today with a comparable day.

I do not think there are any statistics to tell you how much to increase the size of the cache. Making the cache bigger may not always help performance; for example, if a program is writing a 1 GB file, then while the cache is below 1 GB the program will flood the cache with pages to be written, and read-only pages will have been overwritten.
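If you do decide to change a cache size, I believe it can be done dynamically with zfsadm config; a sketch, assuming the option name matches the IOEFSPRM user_cache_size option (512M is just an illustrative value):

zfsadm config -user_cache_size 512M
zfsadm configquery -user_cache_size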

Next week

You can monitor the number of reads and writes per file, and the number of file system I/Os, but you cannot directly see the files causing the file system I/O.

If there is a lot of sustained I/O to a file system, you may want to move it to a less heavily used volume, or move subdirectories to a different file system, on a different volume.

There are several caches: the user cache, the metadata cache, the vnode cache, and the log cache. The size of each can be reconfigured, but I cannot see how to tell how full they are, or whether they need to be increased in size.

Can I tell which files or file systems are using most of the cache, and what can I do about it?

The SMF record 92-59 contains the number of pages the file system has in the user cache, and in the meta cache.

The field SMF92FSUS has the number of pages this file system has allocated in the user cache.

The field SMF92FSMT has the number of pages this file system has allocated in the meta data cache

For 40 file systems, the records were all created within 2 ms of each other, so you should be able to group records with a similar time stamp, for example save the data and show the percentage of buffers used per file system.

The command fsinfo -full -aggregate ZFS.USERS provides the same information. It gave me

Statistics Reset Time:     May 30 11:09:51 2021 
Status:RW,NS,GF,GD,SE,NE,NC 
Legend: RW=Read-write, GF=Grow failed, GD=AGGRGROW disabled                                  
        NS=Mounted NORWSHARE, SE=Space errors reported, NE=Not encrypted                     
        NC=Not compressed                                                                    
   *** local data from system S0W1 (owner: S0W1) ***                                         
Vnodes:              48              LFS Held Vnodes:         4       
Open Objects:        0               Tokens:                  0       
User Cache 4K Pages: 5011           Metadata Cache 8K Pages: 39      
Application Reads:   11239           Avg. Read Resp. Time:    0.046   
Application Writes:  22730           Avg. Writes Resp. Time:  0.081   
Read XCF Calls:      0               Avg. Rd XCF Resp. Time:  0.000   
Write XCF Calls:     0               Avg. Wr XCF Resp. Time:  0.000   
ENOSPC Errors:       1               Disk IO Errors:          0 

This also showed:

  • there was 1 no-space error
  • Status had
    • GF=Grow failed
    • GD=AGGRGROW disabled
  • There were 48 Vnodes (files) in the meta cache.

It looks like the Application Reads and Writes are true application requests. I had a program which wrote 10,000 4 KB records, and the Application Writes increased by 10,002. The reads increased by 23; I think this is due to the running of the program.

The command also gave


VOLSER PAV    Reads      KBytes     Writes     KBytes     Waits    Average           
------ --- ---------- ---------- ---------- ---------- ---------- ---------          
A4USS2   1         55        532       1658      91216         83 0.990              
------ --- ---------- ---------- ---------- ---------- ---------- ---------          
TOTALS             55        532       1658      91216         83 0.990              

The number of writes (to the file system) increased by 630, and the KB written increased by 40,084 KB, which is approximately the size of the file (40,000 KB).
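As a rough check: 40,084 KB over 630 file system writes is about 64 KB per I/O, which is consistent with zFS writing out whole 64 KB user cache segments.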

You can use the command fileinfo -path /u/adcd/aa -both and it will display information about the file system the file is on.

Although you can see how much data was written to the file system, I could not easily find which file it came from. The SMF 92-11 records can give an indication, but writing 10 MB to a file and then deleting the file may mean no data is written to disk, so the SMF 92-11 records are not 100% reliable.