How do I see what is in my z/OS HCD?

I’m running ADCD z/OS on zD&T, and wanted to see what was in my HCD (Hardware Configuration Definition). There are some ISPF panels, which allow you to print things, but they had some “required” parameters which I didn’t have and didn’t need.

This post shows the JCL I used, and gives a quick overview of the output

The JCL to create a batch report. This uses program CBDMGHCP. This parameters are described here. Note you can have the output created in XML format.

//IBMHCD JOB MSGCLASS=H
//GCREP EXEC PGM=CBDMGHCP,
// PARM='REPORT,CSMEN,,,,,00'
//HCDIODFS DD DSN=SYS1.IODF99,DISP=SHR
//HCDRPT DD SYSOUT=H,
// DCB=(RECFM=FBA,LRECL=200,BLKSIZE=6400)
//HCDMLOG DD SYSOUT=H,
// DCB=(RECFM=FBA,LRECL=200,BLKSIZE=6400)

The output data sets require the DCB information. If this is not provided you get messages like

IEC141I 013-34,IGG0199G,IBMHCD,GCREP,HCDRPT.

The output of the report in HCDRPT

At the top is

TIME: 16:46 DATE: 2021-07-12
IODF NAME: SYS1.IODF99
IODF TYPE: Production
IODF VERSION: 5
IODF VOLUME: A4SYS1
DESCRIPTION: m3000 with SCSI 08-0B only

It lists the sections

PROCESSOR SUMMARY REPORT                         A
PARTITION REPORT                                 B
IOCDS REPORT                                     C
CHANNEL PATH SUMMARY REPORT                      D
CONTROL UNIT SUMMARY REPORT                      G
DEVICE SUMMARY REPORT                            I
SWITCH SUMMARY REPORT                            K 
SWITCH DETAIL  REPORT                            L
SWITCH CONFIGURATION SUMMARY REPORT              M
SWITCH CONFIGURATION DETAIL REPORT               N
OPERATING SYSTEM SUMMARY REPORT                  O
OS DEVICE REPORT                                 P
OS DEVICE DETAIL REPORT                          Q
EDT REPORT  (MVS ONLY)                           R
OS CONSOLE REPORT                                S

Component reports heading

Following the list of sections are the data. Each section has a header like

CONTROL UNIT SUMMARY REPORT TIME:... DATE:...  PAGE G- 1

Where the title (CONTROL UNIT SUMMARY REPORT) and the character following the PAGE match the list of sections, which has “CONTROL UNIT SUMMARY REPORT G”.

To go to a section use Find ‘PAGE G’ .

CONTROL UNIT SUMMARY REPORT

   CONTROL UNIT        
 NUMBER  TYPE-MODEL    
 _____________________ 
 -  -  -  -  -  -  -  -
  0700   3174 
 -  -  -  -  -  -  -  -
  0A80   3990 
  0A81   3990 
  0A82   3990 
  0A83   3990 

Where 0700 is a console, and 0a80 is disk on an emulated 3990.

DEVICE SUMMARY REPORT

--- DEVICE ---    DEVICE                                                
 NUMBER,RANGE    TYPE-MODEL    ATTACHING CONTROL UNITS         
______________  _____________  |____|___
   0700         3270-X          0700 
   0A80         3390            0A80

OPERATING SYSTEM SUMMARY REPORT

 OPERATING                                     
 SYSTEM ID   TYPE        GEN    DESCRIPTION    
 _________   ________    ___    _______________
 OS390       MVS                ADCD ZOS IODF 

MVS DEVICE REPORT

DEV#,RANGE  TYPE-MODEL ... 
__________ __________  ...

0700,64    3270-X....
0A80,112   3390  ....          

Followed by columns of data with headings ( given in the report)

KEY            KEY DESCRIPTION 
---            --------------- 
DEV#,RANGE  -  DEVICE NUMBER, COUNT OF DEVICES (DECIMAL) 
TYPE-MODEL  -  DEVICE TYPE AND MODEL 
SS          -  SUBCHANNEL SET ID 
BASE        -  BASE DEVICE NUMBER FOR MULTIPLE EXPOSURE DEVICES 
UCB-TYPE    -  UCB TYPE BYTES 
ERP-NAME    -  ERROR RECOVERY PROGRAM 
DDT-NAME    -  DEVICE DESCRIPTOR TABLE 
MLT-NAME    -  MODULE LIST TABLE 
OPT         -  OPTIONAL MLT INDICATOR 
UIM-NAME    -  UNIT INFORMATION MODULE SUPPORTING THE DEVICE 
ATI         -  ATTENTION TABLE INDEX (UCBATI) 
AL          -  ALTERNATE CONTROL UNIT (UCBALTCU) 
SH          -  SHARED UP OPTION (UCBSHRUP) 
SW          -  DEVICE CAN BE SWAPPED BY DDR (UCBSWAPF) 
MX          -  DEVICE HAS MULTIPLE EXPOSURES (UCBMTPXP) 
MI          -  MIH PROCESSING SHOULD BE BYPASSED (UCBMIHPB) 
O           -  MLT IS OPTIONAL 
Y           -  DEVICE SUPPORTS THIS FEATURE 
BLANK       -  DEVICE DOES NOT SUPPORT THIS FEATURE 

The UIM is a an object with parameters defined for the device type. They are defined here. The options are defined here. For example a 3270 can have a selector pen!. a 3800 printer can have a burster.

There is a summary of devices.

        TOTAL NUMBER OF DEVICES BY CLASS 
        -------------------------------- 
CLASS NAME             CLASS TYPE    DEVICE COUNT 
----------             ----------    ------------ 
TAPE                       80               64 
COMMUNICATION DEVICES      40                0 
C-T-C                      41               24 
DASD                       20             1001 
GRAPHICS                   10               95 
UNIT RECORD                08                3 
CHARACTER READERS          04                0 
TOTAL NUMBER OF I/O DEVICES DEFINED BY THIS I/O CONFIGURATION     1187 

MVS DEVICE DETAIL REPORT

NUMBER,RANGE TYPE   SS PARAMETER                        FEATURE
____________ ______ __ _______________________________ _______
   0700,64   3270-X  0 OFFLINE=NO                       SELPEN,... 
   0A80,112  3390    0 OFFLINE=NO,DYNAMIC=YES,LOCANY=NO SHARED 

E D T REPORT

The Eligible Device Table Report has

                                AFFINITY  ALLOCATION   
 NAME TYPE  VIO  TOKEN  PREF    INDEX     DEVICE TYPE  DEVICE NUMBER LIST 
 _________  ___  _____  ______  ________  ___________  ___________________
 GENERIC                   280    FFFF     3010200F    ... 0A80- 0AFF ...
 GENERIC                  3800    FFFF     12001009    0700- 073F   

N I P Console REPORT

The Nucleus IP l Console Report has

 DEVICE #        TYPE-MODEL 
 ________       _____________ 
 0700           3270-X 
 0701           3270-X 

Have a good REST and save a fortune in CPU with Python

Following on from Have a good REST and save a fortune in CPU. The post gives some guidance on reducing the costs of using Liberty based servers from a Python program.

Certificate set up

I used certificate authentication from Linux to z/OS. I used

  • A certificate defined on Linux using Openssl.
  • I sent the Linux CA certificate to z/OS and imported it to the TRUST keyring.
  • I created a certificate on z/OS and installed it into the KEY keyring.
  • I exported the z/OS CA, sent it to Linux, and created a file called tempca.pem.

Python set up

Define the names of the user certificate private key, and certificate

cf=”colinpaicesECp256r1.pem”
kf=”colinpaicesECp256r1.key.pem”
cpcert=(cf,kf)

Define the name of the certificate for validating the server’s certificate

v=’tempca.pem’

Set up a cookie jar to hold the cookies sent down from the server

jar = requests.cookies.RequestsCookieJar()

Define the URL and request

geturl =”https://10.1.1.2:9443/ibmmq/rest/v1/admin/qmgr/

Define the headers

import base64
useridPassword = base64.b64encode(b’colin:passworm’)
my_header = {
‘Content-Type’: ‘application/json’,
‘Authorization’: useridPassword,
‘ibm-mq-rest-csrf-token’ : ‘ ‘
}

An example flow of two requests, using two connections

For example using python

s = requests
response1 = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert)
response2 = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert)

creates two session, each has a TLS handshake, issue a request, get a response and end.

An example of two requests using one session

For example using python

s = requests.Session()
response1 = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert1)
response2 = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert2)

The initial request has one expensive TLS handshake, the second request reuses the session.

Reusing this session means there was only one expensive Client Hello,Server Hello exchange for the whole conversation.

Even though the second request specified a different set of certificates, the certificates from when the session was established, using cpcert1 were used. (No surprise here as the certificates are only used when the session is established).

For the authentication, in both cases the first requests received a cookie with the LtpaToken2 cookie in it.

When this was passed up on successive requests, the userid information from the first request was used.

What is the difference?

I ran a workload of a single thread doing 200 requests. The ratios are important, not the absolute values.

Shared sessionOne session per requests
TCP flows to server1 11
CPU cost1 5
Elapsed time16

What they don’t tell you about using a REST interface.

After I stumbled on a change to my Python program which gave 10 times the throughput to a Web Server, I realised that I knew only a little about using REST. It is the difference between the knowledge to get a Proof Of Concept working, and the knowledge to run properly in production; it is the difference between one request a minute to 100 requests a second.

This blog post compares REST and traditional client server and suggests ways of using REST in production. The same arguments also apply to long running classical client server applications.

A REST request is a stateless, self contained request which you send to the back-end server, and get one response back. It is also known as a one shot request. Traditional client server applications can send several requests to the back-end as part of a unit of work.

In the table below I compare an extreme REST transaction, and an extreme traditional Client Server

AttributeRESTClient Server
ConnectionCreate a new connection for every request.Connect once, stay connected all day, reuse the session, disconnect at end of day.
Workload BalancingThe request can select from any available server, and so on average, requests will be spread across all connections. If a new server is added, then it will get used.The application connects to a server and stays connected. If the session ends and restarts, it may select a different server.
If a new server is added, it may not be used.
AuthenticationEach request needs authentication. If the userid is invalidated, the request will fail. Note that servers cache userid information, so it may take minutes before the request is
re-authenticated.
Authentication is done as part of the connection. If the userid is invalidated during the day, the application will carry on working until it restarts.
IdentificationBoth userid+password, and client certificate can be used to give the userid.Both userid+password, and client certificate can be used to give the userid. If you want to change which identity is used, you should disconnect and reconnect.
CostIt is very expensive to create a new connection. It is even more expensive when using TLS, because of the generation of the secret key. As a result it is very very expensive to use REST requests.The expensive create connection is done once, at start of day. Successive request do not have this overhead, so are much cheaper
Renew TLS session keyBecause there is only one transfer per connection you do not need to renew the encryption key.Using the same session key for a whole day is weak, as it makes it easier to break it. Renewing the session key after an amount of data has been processed, or after a time period is good practice.
RequestSome requests are suitable for packaging in one request, for example where just one server is involved.This can support more complex requests, for example DB2 on system A, and MQ on system B.
Number of connectionsThe connection is active only when it is used.The connection is active even though it has not been used for a long time. This can waste resources, and prevent other connections from being made to the server.
StatisticsYou get an SMF record for every request. Creating an SMF record costs CPU.You get one SMF record for collection of work, so reducing the overall costs. The worst case is one SMF record for the whole day.

What are good practices for using REST (and Client Server) in production?

Do not have a new connection for every request. Create a session which can be reused for perhaps 50 requests or ten minutes, depending on workload. This has the advantages :

  • You reduce the costs of creating the new connection for every request, by reusing the session.
  • You get workload balancing. With the connection ending and being recreated periodically, you will get the connections spread across all available connections. You should randomise the time a connection is active for, so you do not get a lot of time-out activity occurring at the same time
  • You get the re-authentication regularly.
  • The TLS key is renewed periodically.
  • You avoid the long running connections doing nothing.
  • For a REST request you may get fewer SMF records, for a Client-Server you get more SMF requests, and so more granular data.

How can I do this?

With Java you can open a connection, and have the client control how long it is open for.

With Python and the requests package, you can use

s = requests.Session()
res = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert)

res = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert)
etc

With Curl you can reuse the session.

Do I need to worry if my throughput is low?

No, If you are likely to have only one request to a server, and so cannot benefit from having multiple requests per connection you might just as well stay with a “one shot” and not use any of the tuning suggestions.

My favourite TCP commands for z/OS

Console commands

Display VIPA stuff

OSPF

Either D TCPIP,,OMP,command or F OMPOUTE,command. If you use S OMPROUTE.P1, you can use F P1,command

Netstat

Netstat has two formats TSO and OMVS

  • TSO format is like NETSTAT CONN (can also be issued from operator console)
  • OMVS format is like netstat -c

There is a comparison table here

The omvs command is good for netstat -c > filename

TSO command

  1. netstat conn Displays the information about each active TCP connection and UDP socket
  2. netstat conn ( PORT 10443 who is using port 10443 Gives Foreign socket 10.1.0.2..48518 (see below)
  3. netstat allconn Provides information for all TCP connections and UDP sockets, including recently closed ones.
  4. netstat allconn ( ipport 10.1.0.2+48518 Show information about this socket coming in from 10.1.0.2 port 48518
  5. netstat all ( ipport 10.1.0.2+48518 shows all information about the remote port.
  6. netstat conn TCP TCPIP2 for TSO command for TCPIP stack TCPIP2
  7. netstat home give the IP addresses this TCPIP stack uses. For example 10.1.1.2 amd 127.0.0.1
  8. netstat Dev give stats about all interfaces
  9. netstat home report dsn ‘colin.output’ issue the home command, and write the output to ‘colin.output’
  10. netstat conn report hlq colin ( port 1414 the hlq says output the report with data set name colin.netstat.conn
  11. netstat allconn (client csq9web show all current connections (ports in use), and recent ones for the job csq9web. netstat conn just shows active ones.

OMVS

  1. netstat -c Displays the information about each active TCP connection and UDP socket
  2. netstat -c -P 10443 who is using port 10443 Gives Foreign socket 10.1.0.2..48518 (see below)
  3. netstat -a Provides information for all TCP connections and UDP sockets, including recently closed ones.
  4. netstat -a -B 10.1.0.2+48518 Show information about this socket coming in from 10.1.0.2 port 48518
  5. netstat -A -B 10.1.0.2+48518 shows all information about the remote port.
  6. netstat -c -p TCPIP2 display all connections for TCPIP jobname TCPIP2
  7. netstat -c -p TCPIP2 -P 10443 which jobs are using port 10443 – direct the request to TCPIP2 stack
  8. netstat -h give the IP addresses this TCPIP stack uses. For example 10.1.1.2 amd 127.0.0.1
  9. netstat -d give stats about all connections

Ping

TSO Ping address

USS ping address

Trace route

TSO TRACERTE address

USS traceroute address

DNS resolver

The DNS resolver maps WWW.xxx.yyy to an IP address, and an IP address to WWW.xxx.yyy.

  • F RESOLVER,DISPLAY to display the configuration
  • F RESOLVER,REFRESH,SETUP=ADCD.Z31B.TCPPARMS(GBLRESOL) to change the resolver to use the specified file. You can use SETUP=…. to specify a data set, member or Unix file.

Ubuntu commands

Installing python on z/OS.

I wanted to use Python on z/OS to run a REST workload. It took a bit of time, but it worked with no major problems.

As well as installing Python, I had to install some packages so I could use the request package to issue REST requests.

This page has a good article about using python on z/OS. I used it to install additional packages.

This post has topics

There is a Python from Rocket software, and one from IBM. They may be the same; I used the IBM version.

There are instructions here. I logged on to my usual IBM web site, where my usual browser remembered my userid and password, then used https://www.ibm.com/products/open-enterprise-python-zos/pricing .

I used the web site https://www.ibm.com/docs/en/python-zos/3.9?topic=configuration-installing-configuring-pax-edition.

When I used the web site directly, I had to enter my userid and password.

The web site has a .pax file, and instructions.

The HAMB390.runnable.pax.Z file used 492418 * 512 byte blocks on z/OS.

I did not have enough space in my ZFS on z/OS, so I had to allocate a new ZFS

Allocate a ZFS

//DEFINE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE –
CLUSTER –
(NAME(COLIN.ZFS1 ) –
VOLUMES(USER00) –
LINEAR –
MEGABYTES(254 25) –
SHAREOPTIONS(3 3))
/*

Format it

//FORMATFS EXEC PGM=IOEAGFMT,REGION=0M,COND=(0,NE,DEFINE),
// PARM=(‘-aggregate COLIN.ZFS1 -compat’)
//SYSPRINT DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
//*

Mount it

//MOUNT EXEC PGM=IKJEFT1A,COND=((0,NE,DEFINE),(0,NE,FORMATFS))
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
MOUNT FILESYSTEM(‘COLIN.ZFS1’) TYPE(ZFS) +
MOUNTPOINT(‘/colin2’) MODE(RDWR) PARM(‘AGGRGROW’) AUTOMOVE
/*

If you want to use this ZFS after an IPL you will have to mount it. If you only want to use the ZFS as a temporary ZFS you can unmount and delete it.

I created another ZFS for the unpacked Python with size MEGABYTES(550 50).

I followed the install instructions https://www.ibm.com/docs/en/python-zos/3.9?topic=configuration-installing-configuring-pax-edition using the directory /colin3.

After I had unpacked it, there was a directory usr/lpp/IBM/cyp/v3r9 with members IBM and pyz.

The path for python is /colin3/usr/lpp/IBM/cyp/v3r9/pyz/bin

You may want to consider setting up a symbolic link from /usr/lpp/pyz to the new libraries. From an authorised userid issue

ln -s /colin3/usr/lpp/IBM/cyp/v3r9/pyz /usr/lpp/pyz

Set up a profile

I set up a pythonprof file with

export LIBPATH=/usr/lpp/pyz/lib:$LIBPATH
export _BPXK_AUTOCVT=’ON’
export _CEE_RUNOPTS=’FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)’
# if you are using a bash shell, you should also set these:
export _TAG_REDIR_ERR=txt
export _TAG_REDIR_IN=txt
export _TAG_REDIR_OUT=txt
export PATH=/usr/lpp/pyz/bin:$PATH

When I wanted to run a python profile, I use . ./pythonprof to do the set up.

I then checked python worked using

/usr/lpp/pyz/bin/python3 –version

Which gave me

Python 3.9.2

Install the request package and co requisites

I used the information here to install additional packages so I could use a rest interface from python. I have copied the relevant section and expanded it.

If you want to use private keys with a password you need to use package requests_pkcs12. Use Pip download requests_pkcs12 instead of download requests below.

Our z/OS service isn’t connected to the internet, so we did it like this:

On a laptop with Python and pip installed:

a. mkdir requests
b. cd requests
c. pip download requests

This pulled the “requests” package and four dependencies: The files end in .whl

1. certifi
2. chardet
3. idna
4. urllib

Binary FTP all five to a USS directory (/u/myid/pipdl)

On z/OS, in USS in the directory we uploaded to, install all of the packages using:

. /u/adcd/pythonprof
pip3 install requests --find-links ./

Note lots of error messages as pip tries to access the internet version, but then success at the end.

You can now erase the .whl files.

I ran my python programs – and they worked!

One minute MVS – synchronous I/O.

This blog is one of a series of blog posts about performance topics on z/OS. This post is about synchronous I/O technology which reduces I/O time from milliseconds down to microseconds.

There is a 2018 redbook with some good content. This 2020 presentation is a good introduction. This gets a bit technical and looks at it from the Storage Controller view point.

History

Some protocols are very chatty. For example when you download a television program to your television, you see it as one download. Under the covers the software sees this television program as many parts, and there will be a conversation like “Here is part 1, the first 10 MBs”… “ok got that – please send the next”. Within this conversation is TCP/IP chatter, “here is packet 4013” … “ok got that” etc.

40 years ago the I/O to disks was a bit like the following conversation between the mainframe and the disk controller. Imaging this as a series of phone calls. The “Hello” is where they initiate a new call, and “Bye” is where they hang up

  • Mainframe: Hello storage controller, I want to read from disk with volume label A1USR1
  • Storage controller: Sorry, it is in use, please try later. Bye.
  • Mainframe : Hello? the telephone line is busy, I’ll retry.
  • Mainframe: Hello, is disk A1USR1 available now?
  • Storage controller: yes. I’ve reserved it for you
  • Mainframe: Please move the disk read heads so they are under cylinder 61
  • Storage controller: OK will do. Bye.
  • Storage controller: Hello. OK the heads are under cylinder 61
  • Mainframe: Read from head 4 (track 4) third record
  • Storage controller: OK will do. Bye.
  • Storage controller: Hello. OK, done that – here is the data
  • Mainframe: Thank you, I’ve finished with the volume A1USR1.
  • Storage controller: You’re welcome – that took 25 milliseconds. Bye.

Today’s conversation is much faster, as mostly data is in the storage controller’s cache. There are still a few phone calls.

Today’s synchronous I/O

With the new synchronous I/O technology (zHyperLink) the conversation is more like the following. (Using the new premium support phone number)

  • Mainframe: Hello Storage controller, I want to read from disk with volume label A1USR1 Cylinder 61 track 4.
  • Storage controller: Just a second – we have it in cache, here is the data. Bye.

Or perhaps the conversation goes like.

  • Mainframe: Hello Storage controller, I want to read from disk with volume label A9SYS1 Cylinder 995 track 2.
  • Storage controller: Just a second – I’m sorry we do not have it in cache. Please use the old way of doing it, using the old customer support number. Bye.

Technical background

With the old way of doing I/O there were multi asynchronous requests coming back from the Storage Controller. It needed CPU to start the I/O ( Start subchannel) then suspend the program until the I/O completed and then resume the program. The program could have been re-dispatched on a different CPU, so its data was not in the processor cache.

Sometimes the amount of CPU used was more than the duration of I/O request! With the coupling facility(CF), came technology to issue an I/O request synchronously. With this the “Issue CF request” was one mainframe instruction which started the I/O, waited for the response and then resumed. There was no z/OS interrupt, and there was no dispatcher involved. Generally this used less CPU than the asynchronous request, and was faster.

The duration of the synchronous request depends on the work being done in the CF. The more work it does, the longer the instruction takes. For example the CF is doing some work on some data for a different processor, and locks a resource. This will delay other users of the data, who get locked out. Also as the physical distance between the CF and the mainframe increased, the overall duration increased (due to the speed to light).

At some point it is more efficient to use the asynchronous request; send the request; suspend until it has completed; rather than the send the request and wait.

The CF code has logic to determine if the request should be synchronous or asynchronous. Even though the recent requests have all been asynchronous, it will periodically try a synchronous request to see if it is worth switching to synchronous mode.

Additional hardware

For the mainframe to support this synchronous I/O it needs additional hardware.

  • It continues to need the FIbreCONnection (FICON) today for those requests that do not support synchronous I/O.
  • It needs new zHyperLink cables connected from the mainframe to the Storage Controller. You need a direct connection, not via a switch.
  • The CF and mainframe need to be closer than 150 meters.
  • You can have duplexed CFs which both have to be within 150 meters.

One Minute MVS performance – TCP/IP

Question: In your car how do you tell if your car has a problem? Answer: You look at the dashboard and see if there is a red light showing. You may not know how to fix it – but you know that you need to get help to fix it.

The aim of this series of blog posts is to show you what to look for in z/OS performance and if you have a problem.

I will cover

What is a TCP/IP performance problem?

People complain about a TCP/IP performance problem when “it” seems slow. This could be caused by a variety of problems

  • Data between two ends is being discarded. This can occur on an unreliable, or overloaded component, whose default action is to throw away data, knowing it will be resent.
  • The time taken to get from one end to the other and back (“a ping”) is slow. This can be caused by slow or overloaded components.
  • There is a lot of data to send, for example a movie, or a web page with lots of javascript or graphics.
  • Or all of the above.

There is a quote “Never under estimate the bandwith with of a lorry full of tapes”. It might take 10 hours, but a truck 6 ft wide by 20 ft long could hold 300,000 1TB tapes and deliver 8 TBytes/second (with a round trip time of 20 hours). Which is more than the internet can provide!

You need to know

  • Are packets being thrown away? You see this from the number of packets which were resent.
  • What is the round trip time? (You could use ping – but you may not be able to)
  • Is data being sent efficiently – in big blocks?

TCP/IP concepts

With TCP/IP there is a connection between a sender and a receiver. The sender sends numbered packets of data to the receiver. The receiver sends an acknowledgement that a packet has been received.

The following is a representation of the flow

  • The sender sends packet 1
  • The sender sends packet 2
  • The sender sends packet 3
  • The receiver receives packet 1 and sends an acknowledgement for packet 1
  • The sender sends packet 4
  • The receiver receives packet 2 and sends an acknowledgement for packet 2
  • The sender waits until the acknowledgement of packet 1 has been received
  • The sender sends packet 5 and waits till the acknowledgement of packet2 has been received
  • etc

This way it is self limiting. It means the sender cannot send more than the receiver can handle.

If a packet goes missing, eventually the sender gets a time out, and resends it.

There are two parts to “performance”.

  1. FTP like: How much data can be sent per second. This is of interest to FTP and MQ, where there is mainly a one way transmission of lots of data. The round trip time is not so critical if you can have a lot of data in transit.
  2. Transactional: Send some data and wait for the remote end to respond, for example a web browser. The amount of data may be measured in KB, but the round trip time is important.

The term “window” is often used in TCP/IP.

The term “send window” on the sender side represents the total number of packets yet to be acknowledged by the receiver. With a bigger window, there is more data in the pipe line, and the throughput goes up. With a window of 1, one packet is sent and the sender waits for the acknowledgement before sending the next. With this, if there is a high latency, the overall throughput will be low.

More details

One of the factors that affects performance is the receive buffer size. If this was set to 4KB, it means that an application can read up to 4 KB of data at a time. This receive buffer size is sent to the sender, and basically says “send chunks up to this size – as that is all the receiver can take” – this sets the send-buffer-size.

The term Dynamic Right Sizing(DRS) allows the TCP receive buffer size to expand if the network conditions are favourable.

The term Outbound Right Sizing(ORS) allows the TCP send buffer size to expand if the network conditions are favourable.

Another term used is congestion window. If too much data is sent, or the network is unreliable, packets will get lost or thrown away. The congestion window is a measure of how much data can be in-flight. If packets get lost, the congestion window is made smaller. If packets are not lost, then it will try to increase the congestion window. This is a very rough indication of the quality of the network.

FTP like performance

There are several factors which can improve the throughput down a connection

  • Make packets bigger. In the early days of TCP/IP a typical packet was 256 bytes. These days a typical default packet size can be 64KB or more.
    • One of the Smarts in the protocol is called dynamic right sizing, where TCP will send increasing larger packets until the receiver says “big enough”. The packet size can change with load.
  • How much data to send before waiting for the acknowledgement. For a reliable connection, where data is never lost, it is efficient to send a lot of data before waiting. This is called a large send window.
  • If the connection is unreliable, it may be more efficient to have only a small send window, before waiting for the acknowledgement.

Transactional work

  • Having big buffers may not improve throughput, for example with a web page, the data may all fit into 2KB. In this case having a buffer size of 16KB or 64 KB may make no difference to throughput or performance.
  • Typically if one packet contains all the data, then this will be acknowledge as soon as it arrives.
  • Some web pages with a lot of javascript or images, may require big buffers, and many packets.

How to see what is going on

You can use the well known “ping” command to send data to the remote end, and get the response. This gives a measure of the network time.

I found most of the data for looking at performance, is available from the netstat command. I found it useful to capture the output of the command in a file or data set.

What connections are connected to this server?

I use the netstat command in TSO , because my fingers are more used to it, and the command options are more memorable than the omvs command ( for example with omvs netstat, do I need the -a or -A option)

netstat conn (port 1414
netstat conn report hlq colin ( port 1414
netstat conn report dsn ‘colin.output’ ( port 1414

These all gave the same output. The report hlq colin creates a data set colin.netstat.conn. The data set name is from the hlq, ‘netstat’, and the subcommand. You can specify a data set name using the ‘dsn’ option.

For omvs you can use

netstat -c -p TCPIP -P 1414 > filename

That lists all of the connections for port 1414.

The command gave me

MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:18:34    
User Id  Conn     Local Socket           Foreign Socket         State    
-------  ----     ------------           --------------         -----    
CSQ9CHIN 00000023 10.1.1.2..1414         10.1.0.2..60538        Establsh 
CSQ9CHIN 00000022 0.0.0.0..1414          0.0.0.0..0             Listen   

There is one connection established from 10.1.0.2 port 60538 to the server with the port listening on 1414.

The commands below give a lot of information about the connection

netstat all report hlq colin (ipport 10.1.0.2+60538
netstat -A -p TCPIP -B 10.1.0.2+60538 > all.port1

Output from the netstat command

The fields are described at the bottom of this page.

Both commands gave me the same output.

There is a lot of data. I’ve broken it into sections with comments after the interesting fields.

  MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:23:29 
  Client Name: CSQ9CHIN                 Client Id: 00000023 
  Local Socket: 10.1.1.2..1414          Foreign Socket: 10.1.0.2..60538 
  BytesIn:            0000002988        BytesOut:           0000002912 
  SegmentsIn:         0000000019        SegmentsOut:        0000000011   
  
  • 09:23:29 is the time when request was made. If you repeat the command you can get the interval between commands, and so calculate rates.
  • You get the client (job) name CSQ9CHIN.
  • The listener socket for the job (local socket) 10.1.1.2 with port 1414.
  • The foreign socket – the remote end of the connection. IP address 10.1.0.2 port 60538.
  • You can get the data rate If you repeat the command, calculate the deltas BytesIn and BytesOut, and divide by the time between measurement.
  StartDate:          06/16/2021        StartTime:          10:00:21 
  Last Touched:       10:20:37          State:              Establsh 
  RcvNxt:             2019327903        SndNxt:             0864946572 
  ClientRcvNxt:       2019327903        ClientSndNxt:       0864946572 
  InitRcvSeqNum:      2019324914        InitSndSeqNum:      0864943659 
  CongestionWindow:  0000018720        SlowStartThreshold: 0000065535 
  

Look at the congestion window. Big is good. Small may indicate small amounts of data being sent or it may indicate network problems, either slow connections or packets are being dropped.

IncomingWindowNum:  2019458463        OutgoingWindowNum:  0865008524 
SndWl1: 2019327903 SndWl2: 0864946572
SndWnd: 0000061952 MaxSndWnd: 0000064256

Check the send window. A small (1KB) send window can indicate poor configuration at the remote client, or only small amounts of data are being sent.

SndUna:             0864946572        rtt_seq:            0864946064 
MaximumSegmentSize: 0000001440 DSField: 00
Round-trip information:
Smooth trip time: 6.000
SmoothTripVariance: 12.000

Monitor the smooth route trip time (in milliseconds) this the local end to the remote end, and back. The variance gives a measure of the spread of response times. These are not strictly averages.

If you had a million requests taking 1 millisecond, and then had a long request taking 1000 milliseconds. The “Average” response time would change by a very small amount (to 1.09 milliseconds). The smoothed (or weighted average) may be something like – (99 * previous average + current value) /100. In this case the “average” goes up to 10.9 milliseconds, which is noticeable different.

ReXmt:              0000000000        ReXmtCount:         0000000000 

The re transmits should be zero – or not changing. If this number increases it means the network has lost packets.

DupACKs:            0000000000        RcvWnd:             0000130560  

The receive window is usually set to 2 * receive buffer.

SockOpt:            88                TcpTimer:           00   

Check SockOpt. Check bit 0x08. If set this indicates “delayed acknowledgement disabled”. See Nagle algorithm. This value being set is good.

If this is not set, then sender can delay sending data for up to about 200 ms, and so combine data from different applications into the same packet for the same destination. This reduces network traffic as there are fewer packets, but it delays the data being sent.

TcpSig:             04                TcpSel:             40 
TcpDet: E4 TcpPol: 00
TcpPrf: 81 TcpPrf2: 20
TcpPrf3: 00

For FTP type applications check the TCP Performance Flag TcpPrf. This says if Dynamic Right sizing (using bigger buffers) is enabled. The flag bits are x80 – enabled, x40 Active, x20 Active but disabled. X80 |X40 is good.

The TCP performance flag2 TcpPrf2. This is for outbound right sizing (ORS). A non zero value is good.

DelayAck:           Yes 
QOSPolicy: No
TTLSPolicy: No
RoutingPolicy: No
ReceiveBufferSize: 0000065536 SendBufferSize: 0000065536

These buffer sizes should be large with 64KB or larger, if so the system can dynamically increase them.

They can be configured at the TCP/IP level, or by the application. If they are 64KB or higher then TCP Dynamic Right Sizing can be used (adjust the buffers to match the load).

ReceiveDataQueued:  0000000000 
SendDataQueued: 0000000000

These should always be zero.

  • Received data queued means the application is slow to retrieve the data
  • Send data queued – the application has issued a send – but TCP/IP cannot process it.
SendStalled:        No 
Ancillary Input Queue: N/A

Send stalled should always be no.

What do you need to check?

  • SendStalled, ReceiveDataQueued,SendDataQueued should all be 0. They usually are 0. They would be non zero if there was a problem right now. If the problem gets better, these values would be 0.
  • Check ReXmt = The total number of times a packet has been retransmitted for this connection. This count is historical for the life of the connection.
    • If this is zero then there have been no re transmits, and so no packets lost.
    • If this is non zero, then it could be a historical problem. Wait and reissue the netstat command. If the ReXmt value has changed, this indicates packets are being lost.
  • Check the round trip time (and variance). Is the value what you expected? If there is traffic flowing on the connection, display the value multiple times, and see if there is significant variation.
  • Check ReceiveBufferSize and SendBufferSize. Values of 64KB or larger are good. Small is not good.
  • Check congestion window.

It is good to have some data for a normal day, and a problem day. For example if the packets are often lost, then this may not be the problem. If the SendBufferSize is only 8KB today and was 64KB last week – this would a good place to start looking. So capture and save NETSTAT reports for typical sessions.

What about connections into z/OS

Windows has a netstat command.

On Linux Netstat has been superseded with ss for example

ss –info dst 10.1.1.2
ss –info dst 10.1.1.2:1414
ss –info src 101.0.2

This is ss dash dash info …

gives similar information for connections going to 10.1.1.2, or the address and port 10.1.1.2:1414

Example netstat output from a slow FTP in connection

Client Name: IBMUSER                  Client Id: 000006FE 
Local Socket: 10.1.1.2..1109          Foreign Socket: 10.1.0.2..35508 
  BytesIn:            0220191104        BytesOut:           0000000000
  SegmentsIn:         0000152946        SegmentsOut:        0000083051
  StartDate:          06/28/2021        StartTime:          13:47:56 
  Last Touched:       14:24:28          State:              Establsh 
  RcvNxt:             3569682809        SndNxt:             2105824963
  ClientRcvNxt:       3569577977        ClientSndNxt:       2105824963
  InitRcvSeqNum:      3349491704        InitSndSeqNum:      2105824962
  CongestionWindow:   0000005760        SlowStartThreshold: 0000065535
  IncomingWindowNum:  3569946679        OutgoingWindowNum:  2105889219
  SndWl1:             3569681369        SndWl2:             2105824963
  SndWnd:             0000064256        MaxSndWnd:          0000064256
  SndUna:             2105824963        rtt_seq:            2105824962
  MaximumSegmentSize: 0000001440        DSField:            00 
  Round-trip information: 
    Smooth trip time: 3.000             SmoothTripVariance: 2.000 
  ReXmt:              0000000000        ReXmtCount:         0000000000
  DupACKs:            0000000000        RcvWnd:             0000263870 
  SockOpt:            A0                TcpTimer:           00 
  TcpSig:             04                TcpSel:             40 
  TcpDet:             E0                TcpPol:             00 
  TcpPrf:             E0                TcpPrf2:            28 
  TcpPrf3:            00 
  DelayAck:           Yes 
  QOSPolicy:          No 
  TTLSPolicy:         No 
  RoutingPolicy:      No 
  ReceiveBufferSize:  0000184351        SendBufferSize:     0000184320 
  ReceiveDataQueued:  0000104832 
    OldQDate:         06/28/2021        OldQTime:           14:24:27 
  SendDataQueued:     0000000000 
  SendStalled:        No 
  Ancillary Input Queue: N/A 
  Application Data:   EZAFTP0S D IBMUSER   C      FSSH 

Comments

  • Congestion window low
  • Smooth trip time: 3.00 good
  • ReXmt: 0 good
  • Receive buffr 184351- good
  • Receive buffer queued 104832 – BAD

One Minute MVS performance – Work Load Manager – looking at WLM reports.

I have a set of blog posts relating to getting started with z/OS performance. This blog post follows on the overview of WLM, and describes the contents of the reports, and how you can tell if work is being delayed, and why it is being delayed.

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a systems independent measure of CPU usage)
    1. 80% requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there no goals – just do your best

How do I tell what is going on and if the goals have been met?

RMF can display data in near real time (every minute or so).

RMF captures data and produces SMF records which can be processed by RMF and other products.

You can report on

  1. How well the service class did against its goals
  2. How well transactions or work did, from a reporting class.

You could have all CICS transactions in a service class, so they get the same CPU profile etc, but have different reporting classes. You can monitor CE* transaction, and PAY* transactions differently.

You could have a reporting class for work coming in from other systems, depending on the userid.

I set up a reporting class for z/OSMF. In the RMF batch report SYSRPTS(WLMGL(RCPER(ZOSMF)).

One part of the report was contained


         z/OS V2R4               SYSPLEX ADCDPL             DATE 06/14/2021           INTERVAL 05.00.003   
                                 RPT VERSION V2R4 RMF       TIME 09.25.00
POLICY=ETPBASE                        REPORT CLASS=ZOSMF                                   PERIOD=1 
 -TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF  TRANS-APPL%-----CP-IIPCP/AAPCP-IIP/AAP  ---ENCLAVES--- 
 AVG        1.00  ACTUAL                    0  TOTAL        66.25       64.20  173.99  AVG ENC   0.00 
 MPL        1.00  EXECUTION                 0  MOBILE        0.00        0.00    0.00  REM ENC   0.00 
 ENDED         0  QUEUED                    0  CATEGORYA     0.00        0.00    0.00  MS ENC    0.00 
 END/S      0.00  R/S AFFIN                 0  CATEGORYB     0.00        0.00    0.00 
                                                                                                                
 ----SERVICE----   SERVICE TIME  ---APPL %---  --PROMOTED--  --DASD I/O---  ----STORAGE----  -PAGE-IN RATES- 
 IOC        2366K  CPU  720.505  CP     66.25  BLK    0.000  SSCHRT    0.2  AVG    81420.24  SINGLE      0.0 
 CPU      617333   SRB    0.223  IIPCP  64.20  ENQ    0.000  RESP      0.0  TOTAL  81421.05  BLOCK       0.0 
 MSO      154219   RCT    0.000  IIP   173.99  CRM    0.000  CONN      0.0  SHARED     0.00  SHARED      0.0 
 SRB         191   IIT    0.013  AAPCP   0.00  LCK    0.889  DISC      0.0                   HSP         0.0 
 TOT        3138K  HST    0.000  AAP      N/A  SUP    0.000  Q+PEND    0.0 
 GOAL: EXECUTION VELOCITY 70.0%     VELOCITY MIGRATION:   I/O MGMT  28.3%     INIT MGMT 28.3% 
                                                                                                                
          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  -------------- EXEC DELAYS % -----------  
 SYSTEM                    VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU                                
 S0W1        --N/A--       28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19                               

Key fields:

INTERVAL 05.00.003

This tells the duration of the requests.

POLICY=ETPBASE REPORT CLASS=ZOSMF PERIOD=1

This tells you this is a report class (rather than a service class) the name is zOSMF, and is for period 1 . When you have service classes which have more than one criteria , such as high priority for the first 0.5 seconds of CPU – then low priority, these will have multiple periods.

-TRANSACTIONS–
AVG 1.00
MPL 1.00
ENDED 0
END/S 0.00

This says on average there was one instance running. You can have multiple transactions or jobs in a class. Add up the total duration of all jobs/transactions and divide by the interval to get the average(AVG).

MPL (multi programming level) is an advanced topic and describes how many instances were concurrently active.

No jobs/transactions ended in this interval, with a ending rate of 0 in 5 minutes.

—APPL %—
CP 66.25
IIPCP 64.20
IIP 173.99
AAPCP 0.00
AAP N/A

This shows the percentage of CPU used over the interval

  • 66.25 percent on GP engines
  • 64.20 percent IIPCP is 64.20 % of GP engine was doing work that could have run on a ZIIP – if there had been spare ZIIP capacity. 66.25 – 64.20 = 2.05 of work on a GP that was not ZIIP eligible.
  • 173.99 percent of ZIIP work running on a ZIIP engine – so nearly 2 ZIIP engines were being used
  • 0 AAPCP – there was no ZAAP eligible work offloaded onto a GP
  • 0 AAP there was no work running on an ZAAP

The total ZIIP used was 173.99 in ZIIP engines, +64.20 of a GP = 238 or almost 2.5 ZIIP engines worth.

It is good to run on ZIIPs where possible, because ZIIPs are cheaper ($$) than GPs, and GPs may be configured to be slower than a ZIIP.

GOAL: EXECUTION VELOCITY 70.0%

The performance goal for this work was defined as Execution Velocity of 70 %.

 
         EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM  VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU      
 S0W1    28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19       
  • The achieved execution velocity was 28.3% against a target of 70%
  • The performance index was 2.5. The performance goal is goal/actual. A value of 1 or smaller is good. The value here shows the goal was not met. You need to consider
    • Changing the goal for this work so the target goal is what you can achieve on a normal day
    • Changing the importance of the work for when the system is constrained.
    • If you change the goal for one set of work – it may impact other work, so you need to look at the system as a whole and decide which is your important work.
    • Add more CPUs or ZIIPs – these may not help if the delays are not CPU… see below
  • Average number of address spaces in this class 1.
  • EXEC USING%. The figures above were for true CPU used. WLM samples activities 4 times a second. Of the samples where jobs were running or waiting for waiting for a resource.
    • 8% of an CPU engine was used – this includes ZIIP work running on GP.
    • 20% of a ZIIP engine
    • The ratio 8:20 is similar to CPU on GP and ZIIP actually used in this period of 66.25: 173.99.
  • EXEC DELAYS
    • The total delay was 72% = ( 100 – (8+20) “using samples” above)
    • for 53% of all the samples it was was waiting for a ZIIP engine
    • for 19% of all the the samples it was waiting for a GP engine.
    • You can have other delays listed here, for example paging, or your program is capped to limit how much CPU it is allowed.

Once z/OSMF had started, and settled down, there were still delays for IIP (28%). To me this looks like a lumpy workload, that perhaps there is a timer which pops and runs multiple threads. There are more threads than IIPs – so some have to wait.

Reports for transactional work

I defined a transaction so I could measure the response times (and CPU used) for a service in z/OSMF. A TSO address space is started, and z/OSMF sends a client/server request to the TSO address space. The response time is sub-second so a good candidate to demonstrate WLM for a transaction.

I configured z/OSMF to have

<zosWorkloadManager collectionName=”MOPZCET”/>
<wlmClassification>
<httpClassification transactionClass=”ZCI3″ resource=”/zosmf/webispf/*/“/>
</wlmClassification>

The collection name is passed to WLM to determine the service class and report class of the work. The default is the server name.

All ISPF (with a URL of /zosmf/webispf/*) requests were classified as ZCI3.

I then used WLM to configure

  • a service class ZCI3 with Average response time of 00:00:00.010
  • a classification rule for type CB, a rule for CN=MOPZCET, and sub-rule TC = ZCI4. This gave the service class and report class.

The data in the report had

-TRANSACTIONS–
AVG 0.01
MPL 0.01
ENDED 21
END/S 0.07

21 transactions in 5 minutes is 0.07 a second.

MPL (MultiProgramming Limit is the target which represents the number of address spaces that must be in the swapped-in state for the service class period to meet its goals. I’ve never used it!

TRANS-TIME HHH.MM.SS.FFFFFF
ACTUAL               140526
EXECUTION            139950
QUEUED                  575

The average time was 0.140 seconds.

GOAL: RESPONSE TIME 000.00.00.010 AVG

That was the specification in WLM (note the specified value of 0.010 is very different to the 0.140 achieved)


          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP 
 S0W1     000.00.00.140526 66.7 14.1   0.0  0.0 N/A  18 0.0  9.1 9.1  

This shows the average response time was 0.140 seconds, used 18% on a ZIIP, and waited 9% of the time for a ZIIP

To the right of the data in the report was

--- DELAY % --- 
UNK IDL CRY CNT                 
 64 0.0 0.0 0.0 

Which says there was 64% of the delay was unknown. This could be

  • waiting for end user input
  • waiting for TCP/IP data
  • the program sent off a request and is waiting for a response.

For example the ISPF transaction in z/OSMF had sent a request to an address space running TSO. This address space processed the request and sent the response back. I am guessing that the 64% delay was waiting for TSO to process the request and send back the response.

You also get a response time profile based on the service class

                              ----------RESPONSE TIME DISTRIBUTION---------- 
   -----TIME------  # TRANS   0    10   20   30   40   50   60   70   80   90   100 
   HH.MM.SS.FFFFFF  IN BUCKET |....|....|....|....|....|....|....|....|....|....| 
<= 00.00.00.014000          0  > 
<= 00.00.00.015000          0  > 
<= 00.00.00.020000          2  >>>>>> 
<= 00.00.00.040000          5  >>>>>>>>>>>>> 
>  00.00.00.040000         14  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 

This shows that out of the 21 requests, 7 were below 0.040 seconds, and 14 were over 0.040 seconds.

From the service class, it was specified as GOAL: RESPONSE TIME 000.00.00.010 AVG so this goal is very badly specified. It would be better set to average of 0.140 seconds.

I changed the service class to a goal of 0.140 seconds and activated it. After I had run some tests the output was

          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS %
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT            
 S0W1     000.00.00.097733  100  0.7   0.0  0.0 N/A  50 0.0  0.0            

Which showed no delays

and a response time profile

                                ---RESPONSE TIME DISTRIBUTION--- 
    -----TIME------  --# TRANS  0    10   20   30   40   50   60
    HH.MM.SS.FFFFFF  IN BUCKET  |....|....|....|....|....|....|.
 <= 00.00.00.070000          0  > 
 <= 00.00.00.084000          5  >>>>>>>>>>>>>>> 
 <= 00.00.00.098000          9  >>>>>>>>>>>>>>>>>>>>>>>>>> 
 <= 00.00.00.112000          1  >>>> 
 <= 00.00.00.126000          0  > 
 <= 00.00.00.140000          1  >>>> 
 <= 00.00.00.154000          1  >>>> 
 <= 00.00.00.168000          0  > 
 <= 00.00.00.182000          0  > 
 <= 00.00.00.196000          0  > 
 <= 00.00.00.210000          1  >>>> 
 <= 00.00.00.280000          0  > 

An average of 0.10 seconds, with some taking up to 0.210 seconds.

Real time information

You can get the information in near real time from RMF (or other monitors)

For example for processor delays

            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) ---------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name 
IZUSVR1  SO STCHIM   CP     2  35 56.53   91 IZUSVR1    4 JES2MON    2 TCPIP 
                     IIP   94  95 183.1   89 IZUSVR1                         

This shows that job IZUSVR1

  • Was delayed for 2% of the time on a GP
  • Used 35% of the GP engines
  • Was delayed 94% of the time on a ZIIP
  • and used 95% of the available ZIIP resource
  • The jobs using CPU were IZUSVR1 (using 91%) JES2MON and TCPIP
  • The jobs using ZIIP were IZUSVR1

What to do now?

You need to identify the goals of your work, and set sensible goals. This may take several iterations. You also need to understand the priorities of the work, and userid.

Once you have configured your system to report on response times of your business critical work, you can adjust the service classes so your work achieves it goals.

Define reporting classes so you can monitor different groups of work and that they are meeting their goals.

One Minute MVS performance – Work Load Manager – background

Question: In your car how do you tell if your car has a problem? Answer: You look at the dashboard and see if there is a red light showing. You may not know how to fix it – but you know that you need to get help to fix it.

The aim of this series of blog posts is to show you what to look for in z/OS performance and if you have a problem.

I will cover

Ive written a blog post on how to understand reports from WLM

Managing workload

40 years ago the pilot of a commercial jet had many knobs and dials to control the performance (and speed) of the aeroplane. These days computers do most of the small tuning; the pilot sets the overall goals, and the computer does the rest.

It is the same with managing workload performance on z/OS. The systems programmer used to individually adjust the performance of jobs and transactions running on z/OS. These days the systems programmer sets the overall goals and the computer does the rest.

The work on z/OS is managed by the WorkLoad Manager (WLM). The systems programmer defined goals like

  • these CICS transactions should run with an average response time of 1 second.
  • The trivial TSO commands should run in under 10 milliseconds.
  • Batch – I dont care… it can run in the background.

I remember one customer saying that one day when he switched on WLM (so the WLM managed the workload), he noticed that the batch workload finished early, it made every thing go faster!

This is because when the system had been manually “tuned”; the CICS transactions were finishing in under half a second (much faster than the requirements of of 1 second). WLM worked to the goals, and the CICS transactions executing with a response time of 1 second. This meant there was spare CPU, and more batch workload could be done during the day.

How to monitor work?

For short lived requests, like CICS transactions or TSO commands, the response time is the obvious metric. Typically the response time is under a second, and at the end of a minute you should have many data points to tell you if you are achieving your response time goals.

Long running jobs or started tasks may run for weeks, so the time to run the job is meaningless. It could run slower when the system is busy, or run faster when the system is lightly loaded. It needs a second by second metric to measure progress.

WLM uses the concept of how much the work can be delayed. It uses a metric called Execution velocity which in concept is “the ratio of CPU used” to “time waiting for a resource”. In simple terms

100 * CPU used /(CPU used + wait time for a resource).

WLM can periodically check this ratio and adjust the priority of the work to achieve the goals.

If the execution velocity is 100 then it is not delayed for CPU.

If the execution velocity is 1 then if the work used 1 second (or millisecond) of CPU, then it is OK for the job to be delay for I/O or waiting for CPU for 100 seconds (or milliseconds). The ratio is important – not the absolute values.

How to work out which work to dispatched?

Every 10 seconds WLM looks at the data to decide which service classes need more or less CPU

WLM looks at all the work, and if it is meeting the goals for the service class (the definition of the goals).

  1. If all the work is within its goals pick any waiting work to dispatch
  2. If any work is not within its goal, adjust the dispatching priorities. Start with work with Importance 1, when this is within its goals, look at work with Importance 2 etc..

What can you configure

You can configure the system with goals like

  1. CICS transactions should take 1 second elapsed time to execute.
  2. Quick TSO commands using less than 0.5 seconds of CPU have high velocity.
  3. Slow TSO command using more than 0.5, but less than 5 seconds of CPU have Importance 3.
  4. Expensive TSO commands using more than 5 seconds of CPU have low priority
  5. Colin’s TSO userid always gets high priority regardless of the commands.
  6. Batch jobs with this accounting information, can run with high velocity
  7. Long batch jobs, or those batch jobs using more than 1 second of CPU, have low priority.

Work can get tracked across the system, and if WLM detects that CICS transactions are slowing down, then when the CICS issues a DB2 request in a different LPAR in the sysplex, it makes sure the request in DB2 has a high enough priority to keep the response time goals. WLM can also prioritise I/O so that the I/O for one transaction takes precedence over the I/O for a batch job.

The systems programmer creates a few broad categories of work, and specifies the goals of the service class. These service classes control the priority of work.

Use the WLM redbook for guidance on defining WLM service classes.

Service classes define the goals.

You have reporting classes for groups of similar jobs or transactions to report WLM information on these similar jobs or transactions. So although a group of work has the same service class, you can report it different ways, for example by transaction, or by userid.

You can define CICS, IMS, or Liberty as a server, and transactions/work within the server get WLM classified. So for job CICSA, the transactions PAY1,PAY2,PAY3 have high priority; for z/OSMF, userid COLIN has high priority.

What class is my work in?

You can display which service class a job is in using the D A,jobname operator command, for example it gave

WKL=STARTED SCL=STCLOM .

The Service CLass is STCLOM.

You can use SDSF DA, and use the column SrvClass. (You need to start RMF, then go into SDSF to display the Srvclass and other WLM related parameters).

You can change the service class of a job by using the operator command

RESET IZUSVR1,SRVCLASS=STCMDM

(Note which way round the letters are jobname IZUSVR1, service class SRVCLASS=…)

or, if you are authorised, overtype the field in SDSF, or from z/OSMF WLM plugin.

To change it permanently you’ll need to change the WLM definitions.

More details of how it works

I found the WLM redbook useful.

I described above that the execution velocity was 100 * CPU used /(CPU used + wait time for a resource).

The concept is correct – but the implementation is different. If my job had used 1000 seconds of CPU since it started, it is not helpful in seeing it behaviour over the last few minutes, as the execution velocity would be insensitive.

Every 250 milliseconds (4 times a second) WLM looks at every job/transaction in the system. It then updates internal control blocks for each Service Class and Report Class and increments a table.

  • executing – add 1 to the active (or using) CPU
  • transferring data to a device (connect time) add 1 to the active ( or using) I/O
  • waiting in z/OS to start an I/O – add one to the delayed for I/O
  • being paged in – add 1 to the delayed for”page in”
  • etc
  • waiting for the end user to enter data – do nothing.
  • waiting for TCPIP data – do nothing.

Execution velocity = 100 * (Total active samples /(Total active samples + Total delayed samples).

If during a 25 second period the transaction was

  • using CPU, in 20 samples,
  • transferring data to disk, in 10 sample
  • waiting to start an I/O, in 25 samples
  • waiting for the end user to type some data, in 45 samples.

From this we can see…

  • The count of active samples is 20 + 10
  • The count of delayed is 25 samples.
  • 45 samples are not used.

Execution velocity = 100 * ( (20 + 10) /(20+10) + 25)) = 55 .

An execution velocity of 100 means that when ever the job was sampled, it was always either dispatched and running; or transferring data to I/O.

An execution velocity says if we expect the job to use 50 seconds of CPU, and has a velocity of 10 then we expect the job to run in about 500 seconds. If it used 50 seconds of CPU, and was transferring data (connect time) of 20 seconds, the execution velocity would be 100 * (50 + 20) /((50+ 20) + 450) = 13 % which is close enough to 10% velocity.

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a systems independent measure of CPU usage)
    1. 80% requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there no goals – just do your best

I cut the CPU cost of doing nothing.

I was running z/OSMF and saw that the CPU costs where high when it was sitting there doing nothing. I managed to reduce the CPU costs by more than half. This would apply to other Liberty based web servers, such as MQWEB, and z/OS Connect.

I could see from the MVS system trace there was a lot of activity creating a thread, and deleting a thread, a lot of costs associated with these activities, such as allocating and freeing storage.

I increased the number of threads so that this allocating a thread and delete a thread activity disappeared.

In the xml configuration file (based from server.xml) was the default

<executor name=”LargeThreadPool” id=”default” coreThreads=”100″
maxThreads=”0″ keepAlive=”60s” stealPolicy=”STRICT”
rejectedWorkPolicy=”CALLER_RUNS” />

I changed this to

<executor name=”LargeThreadPool” id=”default”
coreThreads=”300″ maxThreads=”600″ keepAlive=”60s”
stealPolicy=”STRICT” rejectedWorkPolicy=”CALLER_RUNS” />

and restarted the server.

The options are documented here. There is an option keepAlive which defaults to 60 seconds. If a thread has been idle for this time, the thread is a candidate to be freed to reduce the pool back to corethreads size.

I was alerted to this problem when I looked at an MVS system trace. This is described here.

There is a discussion how sun thread pools work in this post. It is not obvious. This may or may not be how this executor works.

What value should you use?

This is a hard question, as Liberty does not provide this information directly.

I used the Health Checker connects from Eclipse to the JVM and extracts information about the JVM and applications.

This shows that at rest there was a lot of activity. I increased it to 250 threads and restarted the server and got

So better … but still some activity. I increased it to 300 threads, and the graph was flat.

I set up USER.Z24A.PROCLIB(CEEOPT) with

RPTSTG(ON),
RPTOPTS(ON)

in my z/OSMF job I had

//CEEOPTS DD DISP=SHR,DSN=USER.Z24A.PROCLIB(CEEOPT)

This printed out a lot of useful information about the stack and heap usage. It the bottom it said

Largest number of threads concurrently active: 397

The number of threads includes threads from the pool I had specified, plus other threads that z/OSMF creates. The health check showed there were 372 threads, event though coreThreads was set to “300”.

I also used jconsole to display information about the highest thread usage. The URL was service:jmx:rest://10.1.1.2:10443/IBMJMXConnectorREST. It displays peak threads and live threads.

Security

I found the security of both jconsole, and health check, was weak (userid and password). I was unable to successfully set up a TLS certificate logon to the server.

The information from rptstg was only available at shutdown.

Why does increasing the number of threads reduce the CPU when idle?

The thread pool has logic to remove unused threads and shrink it to the coreThreads size. If the pool size is too small it has to create threads and delete threads according to the load. See here. The keepAlive mentioned at the top is how long a thread can be idle for, before it can be considered a candidate for deletion.

Summary

Monitor the CPU used when idle and see if increasing the threadpool to 300 helps.