Running a python rest application on z/OS

I installed Python and co-req packaged on my z/OS system, described here. I wanted to run a REST workload into z/OSMF. I could have used, Liberty, z/OS Connect or MQWEB as the backend.

It makes use of the python requests package.

Initial script

#!/usr/bin/env python3 
import requests 
from timeit import default_timer as timer 
import urllib3 
my_header = { 
  'Connection': 'keep-alive', 
  'Content-Type': 'application/json', 
  'Cache-Control': 'max-age=0', 
  'Authorization': 'Basic Y395aW46cGFu67GhlMG4=', 
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 
  'Accept-Language': 'en-GB,en;q=0.5', 
  'DNT': '1', 
  'Connection': 'keep-alive', 
  'Upgrade-Insecure-Requests': '1', 
  'Cache-Control': 'max-age=0' 
}
urllib3.disable_warnings() 
geturl ="https://10.1.1.2:29443/zosmf/rest/mvssubs?ssid=IZUG" 
jar = requests.cookies.RequestsCookieJar() 
start=timer() 
res = requests.get(geturl,headers=my_header,verify=False,cookies=jar) 
end=timer() 
print("duration=",end-start) 
if res.status_code != 200: 
  print(res.status_code) 
print("Output=",res.text) 
jar=  res.cookies

Comments on the python script

Authorization’: ‘Basic Y395aW46cGFu67GhlMG4=’ is the 64 bit encoding of the userid and password. Which is trivial(!) to decode.
urllib3.disable_warnings() Without this set you get a message InsecureRequestWarning: Unverified HTTPS request is being made to host ‘10.1.1.2’. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings. This is because the certificate sent down from the server has not been validated.
jar= res.cookies says save the cookies into the jar dictionary, for future use

The output was

duration= 1.210928201675415
output= {“items”:[
{“subsys”:”IZUG”, “active”:true, “dynamic”:true, “funcs”:[10]}
],”JSONversion”:1}

Verifying TLS certificate

With urllib3.disable_warnings() present it will cause error warnings to be suppressed.
When this statement is not present, there will be warnings about certificate problems.

In the statement res = requests.get(geturl,headers=my_header,verify=False,cookies=jar) verify is either “False” or the name of a CA .pem file containing the CA certificates. I used verify=ABC and got

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:

I got the error because “ABC” is not a valid file, and the verification could not be done.

I exported the CA certificate used by the server using

RACDCERT CERTAUTH   EXPORT(LABEL('TEMP-CA')) -         
       DSN('IBMUSER.CERT.TEMP.CA.PEM')   -             
       FORMAT(CERTB64) -                               
       PASSWORD('password')

I could only get verify=…. to work with a USS file, so I had to copy the dataset IBMUSER.CERT.TEMP.CA.PEM into a USS file CACert.pem. Then when I used

res = requests.get(geturl,headers=my_header,verify=CACert.pem,cookies=jar)

it worked fine.

Using a client certificate.

You cannot use RACF certificate with the requests facility, because the underlying code does not support it. You have to use .pem style certificate.

The support does not allow you to specify a password for the private key, so this is not very secure.

You define

cf=”colinpaicesECp256r1.pem”
kf=”colinpaicesECp256r1.key.pem”
cpcert=(cf,kf)

where

cf is the name of the file with the certificate in it
kf is the name of the file with the private key in it
cpcert is the python tuple.

If your certificate file also includes the private key, you do not need the kf, just use cpcert=(cf).

You use it

res = requests.get(geturl,headers=my_header,verify=CACert.pem,cookies=jar,cert=cpcert)

I tried exporting a certificate from z/OS using RACDCERT EXPORT … format(PKCS12B64), copying it to a uss file, and using that, but it did not work. The file could not be read.

I tried creating a private key with a password (to make it more secure) but when I used it I got the message

urllib3.exceptions.SSLError: Client private key is encrypted, password is required

There is a package request_pkcs12 which provides support for a password on the certificate. https://github.com/m-click/requests_pkcs12. I did not use this, I recreated my certificate and private key without a password.

I tried running on Linux using my Hardware Security Module (which plugs into a USB socket). This also failed as I could not enter the PIN for the device.

Compare the response time to running across the network.

I ran the same python script on z/OS and on Linux. The round trip time of the rest request was

1.41 seconds on z/OS
0.92 seconds on Linux.

I think I’ll run my tests from Linux in future.

Whoops I left my private key unprotected.

I had set up my certificates and private key so I could use them to work with a web server on z/OS. I set them up on Linux and copied them to z/OS. My tests all worked. It wasn’t till this morning when I thought, those test should have failed.

With Python and the requests package you can send REST requests to a backend server, using digital certificates for authentication. You configure which private key to use, but you are unable to specify the password for the private key. As this worked, there must be no password on the key file – whoops.

This is a bit like leaving your front door unlocked with a note “If we are not here, please come in and make yourself at home”.

To generate a private key I had used

openssl ecparam -name prime256v1 -genkey -noout -out $name.key.pem

but there is no way of specifying a password. With hindsight it was obvious there was no password protection.

I had to use

openssl ecparam -name prime256v1 -genkey -noout | openssl ec -out $name.key.pem -passout file:password.file

Where the first part of the command generated a private key, and wrote it to the output stream. The second part does nothing with the data, but wrote it to the output file using the password in password.file.

When I used this private key, the python requests failed because it needed a password!

I should have remembered

I remember helping with some security testing many years ago.

Userid1 was able to use resource1, good, that worked
Userid2 was able to use resource2, good, that worked
Userid1 was able to use resource2, good ,that worked – hang on… that should have failed.

The person customising security had missed one configuration step, so all users had access to all resources, and the test plan did not have any negative testing.

Installing python on z/OS.

I wanted to use Python on z/OS to run a REST workload. It took a bit of time, but it worked with no major problems.

As well as installing Python, I had to install some packages so I could use the request package to issue REST requests.

This page has a good article about using python on z/OS. I used it to install additional packages.

This post has topics

Allocate a ZFS
Format the ZFS
Mount the ZFS
Set up a profile
Install the request package and co requisites

There is a Python from Rocket software, and one from IBM. They may be the same; I used the IBM version.

There are instructions here. I logged on to my usual IBM web site, where my usual browser remembered my userid and password, then used https://www.ibm.com/products/open-enterprise-python-zos/pricing .

I used the web site https://www.ibm.com/docs/en/python-zos/3.9?topic=configuration-installing-configuring-pax-edition.

When I used the web site directly, I had to enter my userid and password.

The web site has a .pax file, and instructions.

The HAMB390.runnable.pax.Z file used 492418 * 512 byte blocks on z/OS.

I did not have enough space in my ZFS on z/OS, so I had to allocate a new ZFS

Allocate a ZFS

//DEFINE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE –
CLUSTER –
(NAME(COLIN.ZFS1 ) –
VOLUMES(USER00) –
LINEAR –
MEGABYTES(254 25) –
SHAREOPTIONS(3 3))
/*

Format it

//FORMATFS EXEC PGM=IOEAGFMT,REGION=0M,COND=(0,NE,DEFINE),
// PARM=(‘-aggregate COLIN.ZFS1 -compat’)
//SYSPRINT DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
//*

Mount it

//MOUNT EXEC PGM=IKJEFT1A,COND=((0,NE,DEFINE),(0,NE,FORMATFS))
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
MOUNT FILESYSTEM(‘COLIN.ZFS1’) TYPE(ZFS) +
MOUNTPOINT(‘/colin2’) MODE(RDWR) PARM(‘AGGRGROW’) AUTOMOVE
/*

If you want to use this ZFS after an IPL you will have to mount it. If you only want to use the ZFS as a temporary ZFS you can unmount and delete it.

I created another ZFS for the unpacked Python with size MEGABYTES(550 50).

I followed the install instructions https://www.ibm.com/docs/en/python-zos/3.9?topic=configuration-installing-configuring-pax-edition using the directory /colin3.

After I had unpacked it, there was a directory usr/lpp/IBM/cyp/v3r9 with members IBM and pyz.

The path for python is /colin3/usr/lpp/IBM/cyp/v3r9/pyz/bin

You may want to consider setting up a symbolic link from /usr/lpp/pyz to the new libraries. From an authorised userid issue

ln -s /colin3/usr/lpp/IBM/cyp/v3r9/pyz /usr/lpp/pyz

Set up a profile

I set up a pythonprof file with

export LIBPATH=/usr/lpp/pyz/lib:$LIBPATH
export _BPXK_AUTOCVT=’ON’
export _CEE_RUNOPTS=’FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)’
# if you are using a bash shell, you should also set these:
export _TAG_REDIR_ERR=txt
export _TAG_REDIR_IN=txt
export _TAG_REDIR_OUT=txt
export PATH=/usr/lpp/pyz/bin:$PATH

When I wanted to run a python profile, I use . ./pythonprof to do the set up.

I then checked python worked using

/usr/lpp/pyz/bin/python3 –version

Which gave me

Python 3.9.2

Install the request package and co requisites

I used the information here to install additional packages so I could use a rest interface from python. I have copied the relevant section and expanded it.

If you want to use private keys with a password you need to use package requests_pkcs12. Use Pip download requests_pkcs12 instead of download requests below.

Our z/OS service isn’t connected to the internet, so we did it like this:

On a laptop with Python and pip installed:

a. mkdir requests
b. cd requests
c. pip download requests

This pulled the “requests” package and four dependencies: The files end in .whl

1. certifi
2. chardet
3. idna
4. urllib

Binary FTP all five to a USS directory (/u/myid/pipdl)

On z/OS, in USS in the directory we uploaded to, install all of the packages using:

. /u/adcd/pythonprof
pip3 install requests --find-links ./

Note lots of error messages as pip tries to access the internet version, but then success at the end.

You can now erase the .whl files.

I ran my python programs – and they worked!

One minute MVS – synchronous I/O.

This blog is one of a series of blog posts about performance topics on z/OS. This post is about synchronous I/O technology which reduces I/O time from milliseconds down to microseconds.

There is a 2018 redbook with some good content. This 2020 presentation is a good introduction. This gets a bit technical and looks at it from the Storage Controller view point.

History

Some protocols are very chatty. For example when you download a television program to your television, you see it as one download. Under the covers the software sees this television program as many parts, and there will be a conversation like “Here is part 1, the first 10 MBs”… “ok got that – please send the next”. Within this conversation is TCP/IP chatter, “here is packet 4013” … “ok got that” etc.

40 years ago the I/O to disks was a bit like the following conversation between the mainframe and the disk controller. Imaging this as a series of phone calls. The “Hello” is where they initiate a new call, and “Bye” is where they hang up

Mainframe: Hello storage controller, I want to read from disk with volume label A1USR1
Storage controller: Sorry, it is in use, please try later. Bye.
Mainframe : Hello? the telephone line is busy, I’ll retry.
Mainframe: Hello, is disk A1USR1 available now?
Storage controller: yes. I’ve reserved it for you
Mainframe: Please move the disk read heads so they are under cylinder 61
Storage controller: OK will do. Bye.
Storage controller: Hello. OK the heads are under cylinder 61
Mainframe: Read from head 4 (track 4) third record
Storage controller: OK will do. Bye.
Storage controller: Hello. OK, done that – here is the data
Mainframe: Thank you, I’ve finished with the volume A1USR1.
Storage controller: You’re welcome – that took 25 milliseconds. Bye.

Today’s conversation is much faster, as mostly data is in the storage controller’s cache. There are still a few phone calls.

Today’s synchronous I/O

With the new synchronous I/O technology (zHyperLink) the conversation is more like the following. (Using the new premium support phone number)

Mainframe: Hello Storage controller, I want to read from disk with volume label A1USR1 Cylinder 61 track 4.
Storage controller: Just a second – we have it in cache, here is the data. Bye.

Or perhaps the conversation goes like.

Mainframe: Hello Storage controller, I want to read from disk with volume label A9SYS1 Cylinder 995 track 2.
Storage controller: Just a second – I’m sorry we do not have it in cache. Please use the old way of doing it, using the old customer support number. Bye.

Technical background

With the old way of doing I/O there were multi asynchronous requests coming back from the Storage Controller. It needed CPU to start the I/O ( Start subchannel) then suspend the program until the I/O completed and then resume the program. The program could have been re-dispatched on a different CPU, so its data was not in the processor cache.

Sometimes the amount of CPU used was more than the duration of I/O request! With the coupling facility(CF), came technology to issue an I/O request synchronously. With this the “Issue CF request” was one mainframe instruction which started the I/O, waited for the response and then resumed. There was no z/OS interrupt, and there was no dispatcher involved. Generally this used less CPU than the asynchronous request, and was faster.

The duration of the synchronous request depends on the work being done in the CF. The more work it does, the longer the instruction takes. For example the CF is doing some work on some data for a different processor, and locks a resource. This will delay other users of the data, who get locked out. Also as the physical distance between the CF and the mainframe increased, the overall duration increased (due to the speed to light).

At some point it is more efficient to use the asynchronous request; send the request; suspend until it has completed; rather than the send the request and wait.

The CF code has logic to determine if the request should be synchronous or asynchronous. Even though the recent requests have all been asynchronous, it will periodically try a synchronous request to see if it is worth switching to synchronous mode.

Additional hardware

For the mainframe to support this synchronous I/O it needs additional hardware.

It continues to need the FIbreCONnection (FICON) today for those requests that do not support synchronous I/O.
It needs new zHyperLink cables connected from the mainframe to the Storage Controller. You need a direct connection, not via a switch.
The CF and mainframe need to be closer than 150 meters.
You can have duplexed CFs which both have to be within 150 meters.

One Minute MVS performance – TCP/IP

Question: In your car how do you tell if your car has a problem? Answer: You look at the dashboard and see if there is a red light showing. You may not know how to fix it – but you know that you need to get help to fix it.

The aim of this series of blog posts is to show you what to look for in z/OS performance and if you have a problem.

I will cover

What is a TCP/IP performance problem?

People complain about a TCP/IP performance problem when “it” seems slow. This could be caused by a variety of problems

Data between two ends is being discarded. This can occur on an unreliable, or overloaded component, whose default action is to throw away data, knowing it will be resent.
The time taken to get from one end to the other and back (“a ping”) is slow. This can be caused by slow or overloaded components.
There is a lot of data to send, for example a movie, or a web page with lots of javascript or graphics.
Or all of the above.

There is a quote “Never under estimate the bandwith with of a lorry full of tapes”. It might take 10 hours, but a truck 6 ft wide by 20 ft long could hold 300,000 1TB tapes and deliver 8 TBytes/second (with a round trip time of 20 hours). Which is more than the internet can provide!

You need to know

Are packets being thrown away? You see this from the number of packets which were resent.
What is the round trip time? (You could use ping – but you may not be able to)
Is data being sent efficiently – in big blocks?

TCP/IP concepts

With TCP/IP there is a connection between a sender and a receiver. The sender sends numbered packets of data to the receiver. The receiver sends an acknowledgement that a packet has been received.

The following is a representation of the flow

The sender sends packet 1
The sender sends packet 2
The sender sends packet 3
The receiver receives packet 1 and sends an acknowledgement for packet 1
The sender sends packet 4
The receiver receives packet 2 and sends an acknowledgement for packet 2
The sender waits until the acknowledgement of packet 1 has been received
The sender sends packet 5 and waits till the acknowledgement of packet2 has been received
etc

This way it is self limiting. It means the sender cannot send more than the receiver can handle.

If a packet goes missing, eventually the sender gets a time out, and resends it.

There are two parts to “performance”.

FTP like: How much data can be sent per second. This is of interest to FTP and MQ, where there is mainly a one way transmission of lots of data. The round trip time is not so critical if you can have a lot of data in transit.
Transactional: Send some data and wait for the remote end to respond, for example a web browser. The amount of data may be measured in KB, but the round trip time is important.

The term “window” is often used in TCP/IP.

The term “send window” on the sender side represents the total number of packets yet to be acknowledged by the receiver. With a bigger window, there is more data in the pipe line, and the throughput goes up. With a window of 1, one packet is sent and the sender waits for the acknowledgement before sending the next. With this, if there is a high latency, the overall throughput will be low.

More details

One of the factors that affects performance is the receive buffer size. If this was set to 4KB, it means that an application can read up to 4 KB of data at a time. This receive buffer size is sent to the sender, and basically says “send chunks up to this size – as that is all the receiver can take” – this sets the send-buffer-size.

The term Dynamic Right Sizing(DRS) allows the TCP receive buffer size to expand if the network conditions are favourable.

The term Outbound Right Sizing(ORS) allows the TCP send buffer size to expand if the network conditions are favourable.

Another term used is congestion window. If too much data is sent, or the network is unreliable, packets will get lost or thrown away. The congestion window is a measure of how much data can be in-flight. If packets get lost, the congestion window is made smaller. If packets are not lost, then it will try to increase the congestion window. This is a very rough indication of the quality of the network.

FTP like performance

There are several factors which can improve the throughput down a connection

Make packets bigger. In the early days of TCP/IP a typical packet was 256 bytes. These days a typical default packet size can be 64KB or more.
- One of the Smarts in the protocol is called dynamic right sizing, where TCP will send increasing larger packets until the receiver says “big enough”. The packet size can change with load.
How much data to send before waiting for the acknowledgement. For a reliable connection, where data is never lost, it is efficient to send a lot of data before waiting. This is called a large send window.
If the connection is unreliable, it may be more efficient to have only a small send window, before waiting for the acknowledgement.

Transactional work

Having big buffers may not improve throughput, for example with a web page, the data may all fit into 2KB. In this case having a buffer size of 16KB or 64 KB may make no difference to throughput or performance.
Typically if one packet contains all the data, then this will be acknowledge as soon as it arrives.
Some web pages with a lot of javascript or images, may require big buffers, and many packets.

How to see what is going on

You can use the well known “ping” command to send data to the remote end, and get the response. This gives a measure of the network time.

I found most of the data for looking at performance, is available from the netstat command. I found it useful to capture the output of the command in a file or data set.

What connections are connected to this server?

I use the netstat command in TSO , because my fingers are more used to it, and the command options are more memorable than the omvs command ( for example with omvs netstat, do I need the -a or -A option)

netstat conn (port 1414
netstat conn report hlq colin ( port 1414
netstat conn report dsn ‘colin.output’ ( port 1414

These all gave the same output. The report hlq colin creates a data set colin.netstat.conn. The data set name is from the hlq, ‘netstat’, and the subcommand. You can specify a data set name using the ‘dsn’ option.

For omvs you can use

netstat -c -p TCPIP -P 1414 > filename

That lists all of the connections for port 1414.

The command gave me

MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:18:34    
User Id  Conn     Local Socket           Foreign Socket         State    
-------  ----     ------------           --------------         -----    
CSQ9CHIN 00000023 10.1.1.2..1414         10.1.0.2..60538        Establsh 
CSQ9CHIN 00000022 0.0.0.0..1414          0.0.0.0..0             Listen

There is one connection established from 10.1.0.2 port 60538 to the server with the port listening on 1414.

The commands below give a lot of information about the connection

netstat all report hlq colin (ipport 10.1.0.2+60538
netstat -A -p TCPIP -B 10.1.0.2+60538 > all.port1

Output from the netstat command

The fields are described at the bottom of this page.

Both commands gave me the same output.

There is a lot of data. I’ve broken it into sections with comments after the interesting fields.

  MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:23:29 
  Client Name: CSQ9CHIN                 Client Id: 00000023 
  Local Socket: 10.1.1.2..1414          Foreign Socket: 10.1.0.2..60538 
  BytesIn:            0000002988        BytesOut:           0000002912 
  SegmentsIn:         0000000019        SegmentsOut:        0000000011

09:23:29 is the time when request was made. If you repeat the command you can get the interval between commands, and so calculate rates.
You get the client (job) name CSQ9CHIN.
The listener socket for the job (local socket) 10.1.1.2 with port 1414.
The foreign socket – the remote end of the connection. IP address 10.1.0.2 port 60538.
You can get the data rate If you repeat the command, calculate the deltas BytesIn and BytesOut, and divide by the time between measurement.

  StartDate:          06/16/2021        StartTime:          10:00:21 
  Last Touched:       10:20:37          State:              Establsh 
  RcvNxt:             2019327903        SndNxt:             0864946572 
  ClientRcvNxt:       2019327903        ClientSndNxt:       0864946572 
  InitRcvSeqNum:      2019324914        InitSndSeqNum:      0864943659 
  CongestionWindow:  0000018720        SlowStartThreshold: 0000065535

Look at the congestion window. Big is good. Small may indicate small amounts of data being sent or it may indicate network problems, either slow connections or packets are being dropped.

IncomingWindowNum:  2019458463        OutgoingWindowNum:  0865008524 
SndWl1:             2019327903        SndWl2:             0864946572 
SndWnd:             0000061952        MaxSndWnd:          0000064256

Check the send window. A small (1KB) send window can indicate poor configuration at the remote client, or only small amounts of data are being sent.

SndUna:             0864946572        rtt_seq:            0864946064 
MaximumSegmentSize: 0000001440        DSField:            00 
Round-trip information:
  Smooth trip time: 6.000              SmoothTripVariance: 12.000

Monitor the smooth route trip time (in milliseconds) this the local end to the remote end, and back. The variance gives a measure of the spread of response times. These are not strictly averages.

If you had a million requests taking 1 millisecond, and then had a long request taking 1000 milliseconds. The “Average” response time would change by a very small amount (to 1.09 milliseconds). The smoothed (or weighted average) may be something like – (99 * previous average + current value) /100. In this case the “average” goes up to 10.9 milliseconds, which is noticeable different.

ReXmt:              0000000000        ReXmtCount:         0000000000

The re transmits should be zero – or not changing. If this number increases it means the network has lost packets.

DupACKs:            0000000000        RcvWnd:             0000130560

The receive window is usually set to 2 * receive buffer.

SockOpt:            88                TcpTimer:           00

Check SockOpt. Check bit 0x08. If set this indicates “delayed acknowledgement disabled”. See Nagle algorithm. This value being set is good.

If this is not set, then sender can delay sending data for up to about 200 ms, and so combine data from different applications into the same packet for the same destination. This reduces network traffic as there are fewer packets, but it delays the data being sent.

TcpSig:             04                TcpSel:             40 
TcpDet:             E4                TcpPol:             00 
TcpPrf:            81                TcpPrf2:            20 
TcpPrf3:            00

For FTP type applications check the TCP Performance Flag TcpPrf. This says if Dynamic Right sizing (using bigger buffers) is enabled. The flag bits are x80 – enabled, x40 Active, x20 Active but disabled. X80 |X40 is good.

The TCP performance flag2 TcpPrf2. This is for outbound right sizing (ORS). A non zero value is good.

DelayAck:           Yes 
QOSPolicy:          No 
TTLSPolicy:         No 
RoutingPolicy:      No 
ReceiveBufferSize:  0000065536        SendBufferSize:     0000065536

These buffer sizes should be large with 64KB or larger, if so the system can dynamically increase them.

They can be configured at the TCP/IP level, or by the application. If they are 64KB or higher then TCP Dynamic Right Sizing can be used (adjust the buffers to match the load).

ReceiveDataQueued:  0000000000 
SendDataQueued:     0000000000

These should always be zero.

Received data queued means the application is slow to retrieve the data
Send data queued – the application has issued a send – but TCP/IP cannot process it.

SendStalled:        No 
Ancillary Input Queue: N/A

Send stalled should always be no.

What do you need to check?

SendStalled, ReceiveDataQueued,SendDataQueued should all be 0. They usually are 0. They would be non zero if there was a problem right now. If the problem gets better, these values would be 0.
Check ReXmt = The total number of times a packet has been retransmitted for this connection. This count is historical for the life of the connection.
- If this is zero then there have been no re transmits, and so no packets lost.
- If this is non zero, then it could be a historical problem. Wait and reissue the netstat command. If the ReXmt value has changed, this indicates packets are being lost.
Check the round trip time (and variance). Is the value what you expected? If there is traffic flowing on the connection, display the value multiple times, and see if there is significant variation.
Check ReceiveBufferSize and SendBufferSize. Values of 64KB or larger are good. Small is not good.
Check congestion window.

It is good to have some data for a normal day, and a problem day. For example if the packets are often lost, then this may not be the problem. If the SendBufferSize is only 8KB today and was 64KB last week – this would a good place to start looking. So capture and save NETSTAT reports for typical sessions.

What about connections into z/OS

Windows has a netstat command.

On Linux Netstat has been superseded with ss for example

ss –info dst 10.1.1.2
ss –info dst 10.1.1.2:1414
ss –info src 101.0.2
This is ss dash dash info …

gives similar information for connections going to 10.1.1.2, or the address and port 10.1.1.2:1414

Example netstat output from a slow FTP in connection

Client Name: IBMUSER                  Client Id: 000006FE 
Local Socket: 10.1.1.2..1109          Foreign Socket: 10.1.0.2..35508 
  BytesIn:            0220191104        BytesOut:           0000000000
  SegmentsIn:         0000152946        SegmentsOut:        0000083051
  StartDate:          06/28/2021        StartTime:          13:47:56 
  Last Touched:       14:24:28          State:              Establsh 
  RcvNxt:             3569682809        SndNxt:             2105824963
  ClientRcvNxt:       3569577977        ClientSndNxt:       2105824963
  InitRcvSeqNum:      3349491704        InitSndSeqNum:      2105824962
  CongestionWindow:   0000005760        SlowStartThreshold: 0000065535
  IncomingWindowNum:  3569946679        OutgoingWindowNum:  2105889219
  SndWl1:             3569681369        SndWl2:             2105824963
  SndWnd:             0000064256        MaxSndWnd:          0000064256
  SndUna:             2105824963        rtt_seq:            2105824962
  MaximumSegmentSize: 0000001440        DSField:            00 
  Round-trip information: 
    Smooth trip time: 3.000             SmoothTripVariance: 2.000 
  ReXmt:              0000000000        ReXmtCount:         0000000000
  DupACKs:            0000000000        RcvWnd:             0000263870 
  SockOpt:            A0                TcpTimer:           00 
  TcpSig:             04                TcpSel:             40 
  TcpDet:             E0                TcpPol:             00 
  TcpPrf:             E0                TcpPrf2:            28 
  TcpPrf3:            00 
  DelayAck:           Yes 
  QOSPolicy:          No 
  TTLSPolicy:         No 
  RoutingPolicy:      No 
  ReceiveBufferSize:  0000184351        SendBufferSize:     0000184320 
  ReceiveDataQueued:  0000104832 
    OldQDate:         06/28/2021        OldQTime:           14:24:27 
  SendDataQueued:     0000000000 
  SendStalled:        No 
  Ancillary Input Queue: N/A 
  Application Data:   EZAFTP0S D IBMUSER   C      FSSH

Comments

Congestion window low
Smooth trip time: 3.00 good
ReXmt: 0 good
Receive buffr 184351- good
Receive buffer queued 104832 – BAD

One Minute MVS performance – Work Load Manager – looking at WLM reports.

I have a set of blog posts relating to getting started with z/OS performance. This blog post follows on the overview of WLM, and describes the contents of the reports, and how you can tell if work is being delayed, and why it is being delayed.

Real goals from my system

For TSO on my z/OS there are goals

For the first 800 service units (a systems independent measure of CPU usage)
1. 80% requests to complete within 00:00:00.30
2. Work has importance 2
After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

Execution velocity of 30
Importance 3

For started tasks with Low Priority the goals are

Discretionary – there no goals – just do your best

How do I tell what is going on and if the goals have been met?

RMF can display data in near real time (every minute or so).

RMF captures data and produces SMF records which can be processed by RMF and other products.

You can report on

How well the service class did against its goals
How well transactions or work did, from a reporting class.

You could have all CICS transactions in a service class, so they get the same CPU profile etc, but have different reporting classes. You can monitor CE* transaction, and PAY* transactions differently.

You could have a reporting class for work coming in from other systems, depending on the userid.

I set up a reporting class for z/OSMF. In the RMF batch report SYSRPTS(WLMGL(RCPER(ZOSMF)).

One part of the report was contained


         z/OS V2R4               SYSPLEX ADCDPL             DATE 06/14/2021           INTERVAL 05.00.003   
                                 RPT VERSION V2R4 RMF       TIME 09.25.00
POLICY=ETPBASE                        REPORT CLASS=ZOSMF                                   PERIOD=1 
 -TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF  TRANS-APPL%-----CP-IIPCP/AAPCP-IIP/AAP  ---ENCLAVES--- 
 AVG        1.00  ACTUAL                    0  TOTAL        66.25       64.20  173.99  AVG ENC   0.00 
 MPL        1.00  EXECUTION                 0  MOBILE        0.00        0.00    0.00  REM ENC   0.00 
 ENDED         0  QUEUED                    0  CATEGORYA     0.00        0.00    0.00  MS ENC    0.00 
 END/S      0.00  R/S AFFIN                 0  CATEGORYB     0.00        0.00    0.00 
                                                                                                                
 ----SERVICE----   SERVICE TIME  ---APPL %---  --PROMOTED--  --DASD I/O---  ----STORAGE----  -PAGE-IN RATES- 
 IOC        2366K  CPU  720.505  CP     66.25  BLK    0.000  SSCHRT    0.2  AVG    81420.24  SINGLE      0.0 
 CPU      617333   SRB    0.223  IIPCP  64.20  ENQ    0.000  RESP      0.0  TOTAL  81421.05  BLOCK       0.0 
 MSO      154219   RCT    0.000  IIP   173.99  CRM    0.000  CONN      0.0  SHARED     0.00  SHARED      0.0 
 SRB         191   IIT    0.013  AAPCP   0.00  LCK    0.889  DISC      0.0                   HSP         0.0 
 TOT        3138K  HST    0.000  AAP      N/A  SUP    0.000  Q+PEND    0.0 
 GOAL: EXECUTION VELOCITY 70.0%     VELOCITY MIGRATION:   I/O MGMT  28.3%     INIT MGMT 28.3% 
                                                                                                                
          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  -------------- EXEC DELAYS % -----------  
 SYSTEM                    VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU                                
 S0W1        --N/A--       28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19

Key fields:

INTERVAL 05.00.003

This tells the duration of the requests.

POLICY=ETPBASE REPORT CLASS=ZOSMF PERIOD=1

This tells you this is a report class (rather than a service class) the name is zOSMF, and is for period 1 . When you have service classes which have more than one criteria , such as high priority for the first 0.5 seconds of CPU – then low priority, these will have multiple periods.

-TRANSACTIONS–
AVG 1.00
MPL 1.00
ENDED 0
END/S 0.00

This says on average there was one instance running. You can have multiple transactions or jobs in a class. Add up the total duration of all jobs/transactions and divide by the interval to get the average(AVG).

MPL (multi programming level) is an advanced topic and describes how many instances were concurrently active.

No jobs/transactions ended in this interval, with a ending rate of 0 in 5 minutes.

—APPL %—
CP 66.25
IIPCP 64.20
IIP 173.99
AAPCP 0.00
AAP N/A

This shows the percentage of CPU used over the interval

66.25 percent on GP engines
64.20 percent IIPCP is 64.20 % of GP engine was doing work that could have run on a ZIIP – if there had been spare ZIIP capacity. 66.25 – 64.20 = 2.05 of work on a GP that was not ZIIP eligible.
173.99 percent of ZIIP work running on a ZIIP engine – so nearly 2 ZIIP engines were being used
0 AAPCP – there was no ZAAP eligible work offloaded onto a GP
0 AAP there was no work running on an ZAAP

The total ZIIP used was 173.99 in ZIIP engines, +64.20 of a GP = 238 or almost 2.5 ZIIP engines worth.

It is good to run on ZIIPs where possible, because ZIIPs are cheaper ($$) than GPs, and GPs may be configured to be slower than a ZIIP.

GOAL: EXECUTION VELOCITY 70.0%

The performance goal for this work was defined as Execution Velocity of 70 %.

 
         EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM  VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU      
 S0W1    28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19

The achieved execution velocity was 28.3% against a target of 70%
The performance index was 2.5. The performance goal is goal/actual. A value of 1 or smaller is good. The value here shows the goal was not met. You need to consider
- Changing the goal for this work so the target goal is what you can achieve on a normal day
- Changing the importance of the work for when the system is constrained.
- If you change the goal for one set of work – it may impact other work, so you need to look at the system as a whole and decide which is your important work.
- Add more CPUs or ZIIPs – these may not help if the delays are not CPU… see below
Average number of address spaces in this class 1.
EXEC USING%. The figures above were for true CPU used. WLM samples activities 4 times a second. Of the samples where jobs were running or waiting for waiting for a resource.
- 8% of an CPU engine was used – this includes ZIIP work running on GP.
- 20% of a ZIIP engine
- The ratio 8:20 is similar to CPU on GP and ZIIP actually used in this period of 66.25: 173.99.
EXEC DELAYS
- The total delay was 72% = ( 100 – (8+20) “using samples” above)
- for 53% of all the samples it was was waiting for a ZIIP engine
- for 19% of all the the samples it was waiting for a GP engine.
- You can have other delays listed here, for example paging, or your program is capped to limit how much CPU it is allowed.

Once z/OSMF had started, and settled down, there were still delays for IIP (28%). To me this looks like a lumpy workload, that perhaps there is a timer which pops and runs multiple threads. There are more threads than IIPs – so some have to wait.

Reports for transactional work

I defined a transaction so I could measure the response times (and CPU used) for a service in z/OSMF. A TSO address space is started, and z/OSMF sends a client/server request to the TSO address space. The response time is sub-second so a good candidate to demonstrate WLM for a transaction.

I configured z/OSMF to have

<zosWorkloadManager collectionName=”MOPZCET”/>
<wlmClassification>
<httpClassification transactionClass=”ZCI3″ resource=”/zosmf/webispf/*/“/>
</wlmClassification>

The collection name is passed to WLM to determine the service class and report class of the work. The default is the server name.

All ISPF (with a URL of /zosmf/webispf/*) requests were classified as ZCI3.

I then used WLM to configure

a service class ZCI3 with Average response time of 00:00:00.010
a classification rule for type CB, a rule for CN=MOPZCET, and sub-rule TC = ZCI4. This gave the service class and report class.

The data in the report had

-TRANSACTIONS–
AVG 0.01
MPL 0.01
ENDED 21
END/S 0.07

21 transactions in 5 minutes is 0.07 a second.

MPL (MultiProgramming Limit is the target which represents the number of address spaces that must be in the swapped-in state for the service class period to meet its goals. I’ve never used it!

TRANS-TIME HHH.MM.SS.FFFFFF
ACTUAL               140526
EXECUTION            139950
QUEUED                  575

The average time was 0.140 seconds.

GOAL: RESPONSE TIME 000.00.00.010 AVG

That was the specification in WLM (note the specified value of 0.010 is very different to the 0.140 achieved)


          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP 
 S0W1     000.00.00.140526 66.7 14.1   0.0  0.0 N/A  18 0.0  9.1 9.1

This shows the average response time was 0.140 seconds, used 18% on a ZIIP, and waited 9% of the time for a ZIIP

To the right of the data in the report was

--- DELAY % --- 
UNK IDL CRY CNT                 
 64 0.0 0.0 0.0

Which says there was 64% of the delay was unknown. This could be

waiting for end user input
waiting for TCP/IP data
the program sent off a request and is waiting for a response.

For example the ISPF transaction in z/OSMF had sent a request to an address space running TSO. This address space processed the request and sent the response back. I am guessing that the 64% delay was waiting for TSO to process the request and send back the response.

You also get a response time profile based on the service class

                              ----------RESPONSE TIME DISTRIBUTION---------- 
   -----TIME------  # TRANS   0    10   20   30   40   50   60   70   80   90   100 
   HH.MM.SS.FFFFFF  IN BUCKET |....|....|....|....|....|....|....|....|....|....| 
<= 00.00.00.014000          0  > 
<= 00.00.00.015000          0  > 
<= 00.00.00.020000          2  >>>>>> 
<= 00.00.00.040000          5  >>>>>>>>>>>>> 
>  00.00.00.040000         14  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This shows that out of the 21 requests, 7 were below 0.040 seconds, and 14 were over 0.040 seconds.

From the service class, it was specified as GOAL: RESPONSE TIME 000.00.00.010 AVG so this goal is very badly specified. It would be better set to average of 0.140 seconds.

I changed the service class to a goal of 0.140 seconds and activated it. After I had run some tests the output was

          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS %
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT            
 S0W1     000.00.00.097733  100  0.7   0.0  0.0 N/A  50 0.0  0.0

Which showed no delays

and a response time profile

                                ---RESPONSE TIME DISTRIBUTION--- 
    -----TIME------  --# TRANS  0    10   20   30   40   50   60
    HH.MM.SS.FFFFFF  IN BUCKET  |....|....|....|....|....|....|.
 <= 00.00.00.070000          0  > 
 <= 00.00.00.084000          5  >>>>>>>>>>>>>>> 
 <= 00.00.00.098000          9  >>>>>>>>>>>>>>>>>>>>>>>>>> 
 <= 00.00.00.112000          1  >>>> 
 <= 00.00.00.126000          0  > 
 <= 00.00.00.140000          1  >>>> 
 <= 00.00.00.154000          1  >>>> 
 <= 00.00.00.168000          0  > 
 <= 00.00.00.182000          0  > 
 <= 00.00.00.196000          0  > 
 <= 00.00.00.210000          1  >>>> 
 <= 00.00.00.280000          0  >

An average of 0.10 seconds, with some taking up to 0.210 seconds.

Real time information

You can get the information in near real time from RMF (or other monitors)

For example for processor delays

            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) ---------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name 
IZUSVR1  SO STCHIM   CP     2  35 56.53   91 IZUSVR1    4 JES2MON    2 TCPIP 
                     IIP   94  95 183.1   89 IZUSVR1

This shows that job IZUSVR1

Was delayed for 2% of the time on a GP
Used 35% of the GP engines
Was delayed 94% of the time on a ZIIP
and used 95% of the available ZIIP resource
The jobs using CPU were IZUSVR1 (using 91%) JES2MON and TCPIP
The jobs using ZIIP were IZUSVR1

What to do now?

You need to identify the goals of your work, and set sensible goals. This may take several iterations. You also need to understand the priorities of the work, and userid.

Once you have configured your system to report on response times of your business critical work, you can adjust the service classes so your work achieves it goals.

Define reporting classes so you can monitor different groups of work and that they are meeting their goals.

One Minute MVS performance – Work Load Manager – background

The aim of this series of blog posts is to show you what to look for in z/OS performance and if you have a problem.

I will cover

Ive written a blog post on how to understand reports from WLM

Managing workload

40 years ago the pilot of a commercial jet had many knobs and dials to control the performance (and speed) of the aeroplane. These days computers do most of the small tuning; the pilot sets the overall goals, and the computer does the rest.

It is the same with managing workload performance on z/OS. The systems programmer used to individually adjust the performance of jobs and transactions running on z/OS. These days the systems programmer sets the overall goals and the computer does the rest.

The work on z/OS is managed by the WorkLoad Manager (WLM). The systems programmer defined goals like

these CICS transactions should run with an average response time of 1 second.
The trivial TSO commands should run in under 10 milliseconds.
Batch – I dont care… it can run in the background.

I remember one customer saying that one day when he switched on WLM (so the WLM managed the workload), he noticed that the batch workload finished early, it made every thing go faster!

This is because when the system had been manually “tuned”; the CICS transactions were finishing in under half a second (much faster than the requirements of of 1 second). WLM worked to the goals, and the CICS transactions executing with a response time of 1 second. This meant there was spare CPU, and more batch workload could be done during the day.

How to monitor work?

For short lived requests, like CICS transactions or TSO commands, the response time is the obvious metric. Typically the response time is under a second, and at the end of a minute you should have many data points to tell you if you are achieving your response time goals.

Long running jobs or started tasks may run for weeks, so the time to run the job is meaningless. It could run slower when the system is busy, or run faster when the system is lightly loaded. It needs a second by second metric to measure progress.

WLM uses the concept of how much the work can be delayed. It uses a metric called Execution velocity which in concept is “the ratio of CPU used” to “time waiting for a resource”. In simple terms

100 * CPU used /(CPU used + wait time for a resource).

WLM can periodically check this ratio and adjust the priority of the work to achieve the goals.

If the execution velocity is 100 then it is not delayed for CPU.

If the execution velocity is 1 then if the work used 1 second (or millisecond) of CPU, then it is OK for the job to be delay for I/O or waiting for CPU for 100 seconds (or milliseconds). The ratio is important – not the absolute values.

How to work out which work to dispatched?

Every 10 seconds WLM looks at the data to decide which service classes need more or less CPU

WLM looks at all the work, and if it is meeting the goals for the service class (the definition of the goals).

If all the work is within its goals pick any waiting work to dispatch
If any work is not within its goal, adjust the dispatching priorities. Start with work with Importance 1, when this is within its goals, look at work with Importance 2 etc..

What can you configure

You can configure the system with goals like

CICS transactions should take 1 second elapsed time to execute.
Quick TSO commands using less than 0.5 seconds of CPU have high velocity.
Slow TSO command using more than 0.5, but less than 5 seconds of CPU have Importance 3.
Expensive TSO commands using more than 5 seconds of CPU have low priority
Colin’s TSO userid always gets high priority regardless of the commands.
Batch jobs with this accounting information, can run with high velocity
Long batch jobs, or those batch jobs using more than 1 second of CPU, have low priority.

Work can get tracked across the system, and if WLM detects that CICS transactions are slowing down, then when the CICS issues a DB2 request in a different LPAR in the sysplex, it makes sure the request in DB2 has a high enough priority to keep the response time goals. WLM can also prioritise I/O so that the I/O for one transaction takes precedence over the I/O for a batch job.

The systems programmer creates a few broad categories of work, and specifies the goals of the service class. These service classes control the priority of work.

Use the WLM redbook for guidance on defining WLM service classes.

Service classes define the goals.

You have reporting classes for groups of similar jobs or transactions to report WLM information on these similar jobs or transactions. So although a group of work has the same service class, you can report it different ways, for example by transaction, or by userid.

You can define CICS, IMS, or Liberty as a server, and transactions/work within the server get WLM classified. So for job CICSA, the transactions PAY1,PAY2,PAY3 have high priority; for z/OSMF, userid COLIN has high priority.

What class is my work in?

You can display which service class a job is in using the D A,jobname operator command, for example it gave

WKL=STARTED SCL=STCLOM .

The Service CLass is STCLOM.

You can use SDSF DA, and use the column SrvClass. (You need to start RMF, then go into SDSF to display the Srvclass and other WLM related parameters).

You can change the service class of a job by using the operator command

RESET IZUSVR1,SRVCLASS=STCMDM

(Note which way round the letters are jobname IZUSVR1, service class SRVCLASS=…)

or, if you are authorised, overtype the field in SDSF, or from z/OSMF WLM plugin.

To change it permanently you’ll need to change the WLM definitions.

More details of how it works

I found the WLM redbook useful.

I described above that the execution velocity was 100 * CPU used /(CPU used + wait time for a resource).

The concept is correct – but the implementation is different. If my job had used 1000 seconds of CPU since it started, it is not helpful in seeing it behaviour over the last few minutes, as the execution velocity would be insensitive.

Every 250 milliseconds (4 times a second) WLM looks at every job/transaction in the system. It then updates internal control blocks for each Service Class and Report Class and increments a table.

executing – add 1 to the active (or using) CPU
transferring data to a device (connect time) add 1 to the active ( or using) I/O
waiting in z/OS to start an I/O – add one to the delayed for I/O
being paged in – add 1 to the delayed for”page in”
etc
waiting for the end user to enter data – do nothing.
waiting for TCPIP data – do nothing.

Execution velocity = 100 * (Total active samples /(Total active samples + Total delayed samples).

If during a 25 second period the transaction was

using CPU, in 20 samples,
transferring data to disk, in 10 sample
waiting to start an I/O, in 25 samples
waiting for the end user to type some data, in 45 samples.

From this we can see…

The count of active samples is 20 + 10
The count of delayed is 25 samples.
45 samples are not used.

Execution velocity = 100 * ( (20 + 10) /(20+10) + 25)) = 55 .

An execution velocity of 100 means that when ever the job was sampled, it was always either dispatched and running; or transferring data to I/O.

An execution velocity says if we expect the job to use 50 seconds of CPU, and has a velocity of 10 then we expect the job to run in about 500 seconds. If it used 50 seconds of CPU, and was transferring data (connect time) of 20 seconds, the execution velocity would be 100 * (50 + 20) /((50+ 20) + 450) = 13 % which is close enough to 10% velocity.

Real goals from my system

For TSO on my z/OS there are goals

For the first 800 service units (a systems independent measure of CPU usage)
1. 80% requests to complete within 00:00:00.30
2. Work has importance 2
After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

Execution velocity of 30
Importance 3

For started tasks with Low Priority the goals are

Discretionary – there no goals – just do your best

How long can you talk for without drawing breath?

I remember reading a book on technical writing which said you should read every report aloud to make sure the text flows, and makes sense. A comma is where you pause, (or have a side comment), and a full stop is where you finish the thought, and take a breath. If you cannot express the sentence in one breath – the sentence is too long and you should rewrite it.

If you have to read a sentence twice or more, consider rewriting it. Also remember the people reading your document may not have English as their first language. “Foreign words” take more brain power and the reader may not be entirely familiar with the word or the context used. I recently heard a conversation

“is it right to turn left here?”,
“no, right”

which maybe valid English, but it takes a moment to understand it.

Another advantage of reading it aloud is it it helps you to spot duplicated words or missing. (There was a double whoops in that sentence.)

I was reading a description in a technical report which took me several minutes to understand it. It used terms(defining some of them inline <and also had nested comments about the terms >), then using terms like “this” (when “this” could be one of several objects) (I kept thinking which this?, that this, or this this), and so I copied the text into an editor so I could break it into phrases (logical groups of thoughts) and work out what was important. See – it is hard work to understand some sentences.

I am a great believer that reading should be linear, you should not have to keep going back to reread text. I also think that people need signposts to help them through the text, along the lines of the old ~~adage~~ saying for a good presentation “tell them what you are going to tell them, tell them, tell them what you have told them”. Instead of one long rambling paragraph it might be clearer to say

There are three areas we need to consider, area1, area2, area3
Area1…
Area2…
Area3…

I have noticed also noticed that people often read the first sentence, and ignore the second part, this is where good signposting can help. This applies to emails and conversations. My wife said to me “We’ll need some of the nice bread with seeds on it, and some rolls, from the bakers, but the bakers isn’t open yet’. So off I went to the bakers to find it closed. I was busy parsing the first part of the sentence, and missed the second part.

I found with some people that if you have two questions send them as two emails not one. They read the first question, reply to it, and do not read the rest of the email.

I cut the CPU cost of doing nothing.

I was running z/OSMF and saw that the CPU costs where high when it was sitting there doing nothing. I managed to reduce the CPU costs by more than half. This would apply to other Liberty based web servers, such as MQWEB, and z/OS Connect.

I could see from the MVS system trace there was a lot of activity creating a thread, and deleting a thread, a lot of costs associated with these activities, such as allocating and freeing storage.

I increased the number of threads so that this allocating a thread and delete a thread activity disappeared.

In the xml configuration file (based from server.xml) was the default

<executor name=”LargeThreadPool” id=”default” coreThreads=”100″
maxThreads=”0″ keepAlive=”60s” stealPolicy=”STRICT”
rejectedWorkPolicy=”CALLER_RUNS” />

I changed this to

<executor name=”LargeThreadPool” id=”default”
coreThreads=”300″ maxThreads=”600″ keepAlive=”60s”
stealPolicy=”STRICT” rejectedWorkPolicy=”CALLER_RUNS” />

and restarted the server.

The options are documented here. There is an option keepAlive which defaults to 60 seconds. If a thread has been idle for this time, the thread is a candidate to be freed to reduce the pool back to corethreads size.

I was alerted to this problem when I looked at an MVS system trace. This is described here.

There is a discussion how sun thread pools work in this post. It is not obvious. This may or may not be how this executor works.

What value should you use?

This is a hard question, as Liberty does not provide this information directly.

I used the Health Checker connects from Eclipse to the JVM and extracts information about the JVM and applications.

This shows that at rest there was a lot of activity. I increased it to 250 threads and restarted the server and got

So better … but still some activity. I increased it to 300 threads, and the graph was flat.

I set up USER.Z24A.PROCLIB(CEEOPT) with

RPTSTG(ON),
RPTOPTS(ON)

in my z/OSMF job I had

//CEEOPTS DD DISP=SHR,DSN=USER.Z24A.PROCLIB(CEEOPT)

This printed out a lot of useful information about the stack and heap usage. It the bottom it said

Largest number of threads concurrently active: 397

The number of threads includes threads from the pool I had specified, plus other threads that z/OSMF creates. The health check showed there were 372 threads, event though coreThreads was set to “300”.

I also used jconsole to display information about the highest thread usage. The URL was service:jmx:rest://10.1.1.2:10443/IBMJMXConnectorREST. It displays peak threads and live threads.

Security

I found the security of both jconsole, and health check, was weak (userid and password). I was unable to successfully set up a TLS certificate logon to the server.

The information from rptstg was only available at shutdown.

Why does increasing the number of threads reduce the CPU when idle?

The thread pool has logic to remove unused threads and shrink it to the coreThreads size. If the pool size is too small it has to create threads and delete threads according to the load. See here. The keepAlive mentioned at the top is how long a thread can be idle for, before it can be considered a candidate for deletion.

Summary

Monitor the CPU used when idle and see if increasing the threadpool to 300 helps.

What’s going on? – getting performance data from a z/OS systrace

On my little z/OS system, one address space was using a lot of CPU – but doing nothing. What was going on? The address space was a z/OSMF based on the Liberty Web server.

The blog post tells you how to take a dump, and use IPCS to display useful information from the system trace. The system trace contains deep down information like

Task A was dispatched on this processor at this time.
It issued a request MVS to get a block of storage, and time.
The request completed, with this return code, at this time.
Task A was interupted at this time
Task B was dispatched

There is a lot of detailed information, and it is overwhelming when you first look at it. This blog post shows how you can get summary information from the trace – while ignoring all of the detailed, scary stuff. It does not require any prior knowledge of IPCS or dumps.

Take your dump
Go into IPCS
Systrace output

Take your dump

DUMP COMM=(COLINS DUMP)
R xx,jobname=IZUSVR1

This gives output like

IEE600I REPLY TO 01 IS;JOBNAME=IZUSVR1
IEA045I AN SVC DUMP HAS STARTED AT TIME=16.24.56 DATE=06/21/2021 044
FOR ASID (0044) QUIESCE = YES
…
IEA611I COMPLETE DUMP ON SYS1.S0W1.Z24A.DMP00004

Go into IPCS

I find it easier to use a wide(132) screen for IPCS.

This may be in your ISPF panels, or you might need to issue a command before starting ISPF. You might need to talk to your system programmer.

You get the primary menu

 ------------------- z/OS 02.04.00 IPCS PRIMARY OPTION MENU
 OPTION  ===>                                              
                                                           
    0  DEFAULTS    - Specify default dump and options      
    1  BROWSE      - Browse dump data set                  
    2  ANALYSIS    - Analyze dump contents                 
    3  UTILITY     - Perform utility functions             
    4  INVENTORY   - Inventory of problem data             
    5  SUBMIT      - Submit problem analysis job to batch  
    6  COMMAND     - Enter subcommand, CLIST or REXX exec  
    T  TUTORIAL    - Learn how to use the IPCS dialog      
    X  EXIT        - Terminate using log and list defaults

Select option 0

------------------------- IPCS Default Values ---------------------------------
 Command ===>                                                                   
                                                                                
   You may change any of the defaults listed below.  The defaults shown before  
   any changes are LOCAL.  Change scope to GLOBAL to display global defaults.   
                                                                                
   Scope   ==> LOCAL   (LOCAL, GLOBAL, or BOTH)                                 
                                                                                
   If you change the Source default, IPCS will display the current default      
   Address Space for the new source and will ignore any data entered in         
   the Address Space field.                                                     
                                                                                
   Source  ==> DSNAME('SYS1.CTRACE1')
   Address Space   ==> RBA
   Message Routing ==> NOPRINT TERMINAL NOPDS
   Message Control ==> CONFIRM VERIFY FLAG(WARNING)
   Display Content ==> NOMACHINE REMARK REQUEST NOSTORAGE SYMBOL ALIGN
                                                                                
 Press ENTER to update defaults.                                                
                                                                                
 Use the END command to exit without an update.

Replace the source with DSN(‘your dumpname’).

Change Scope from LOCAL to BOTH

Press enter to update. Use =6 on the command line to get to the IPCS command window.

Enter a free-form IPCS subcommand or a CLIST or REXX exec invocation below: 
                                                                            
===>                                                                        
                                                                            
                                                                            
----------------------- IPCS Subcommands and Abbreviations -----------------
ADDDUMP           | DROPDUMP, DROPD   | LISTDUMP, LDMP    | RENUM,    REN   
ANALYZE           | DROPMAP,  DROPM   | LISTMAP,  LMAP    | RUNCHAIN, RUNC  
ARCHECK           | DROPSYM,  DROPS   | LISTSYM,  LSYM    | SCAN            
ASCBEXIT, ASCBX   | EPTRACE           | LISTUCB,  LISTU   | SELECT          
ASMCHECK, ASMK    | EQUATE,   EQU, EQ | LITERAL           | SETDEF,   SETD  
CBFORMAT, CBF     | FIND,     F       | LPAMAP            | STACK           
CBSTAT            | FINDMOD,  FMOD    | MERGE             | STATUS,   ST    
CLOSE             | FINDUCB,  FINDU   | NAME              | SUMMARY,  SUMM  
COPYDDIR          | GTFTRACE, GTF     | NAMETOKN          | SYSTRACE        
COPYDUMP          | INTEGER           | NOTE,     N       | TCBEXIT,  TCBX  
COPYTRC           | IPCS HELP, H      | OPEN              | VERBEXIT, VERBX 
CTRACE            | LIST,     L       | PROFILE,  PROF    | WHERE,    W

If you use the command “systrace” you will see the scary internal trace. PF3 out of it.
Use the command

systrace jobname(IZUSVR1) PERFDATA(DOWHERE) time(LOCAL)

Go to the bottom of the report ( type m and press PF8) and type

REPORT VIEW

This gives you the report in an editor session, so you can exclude, delete, sort count etc.

This gives a lot of data. It is in sections, the sections are…FLocal

Summary of the dump

Analysis from 06/21/2021 16:24:46.391102 to 16:24:56.042146 9.651044 seconds

This gives the time of day, and the interval of the trace is 9.65 seconds.

Summary of CPU usage by engine

CPU  Type Pol  Park   SRB Time              TCB Time             Idle Time           
---- ---- ---- ---- --------------------- --------------------- ---------------------
0000 CP   High No       0.190562   1.974%     0.828988   8.589%     8.603271  89.143%
0001 CP   High No       0.098836   1.024%     0.393259   4.074%     9.143735  94.743%
0002 CP   High No       0.086573   0.897%     0.415063   4.300%     9.136385  94.667%
0003 zIIP High No       0.015463   0.160%     2.227832  23.083%     7.398707  76.662%
0004 zIIP High No       0.000000   0.000%     1.094373  11.339%     8.551280  88.604%
---- ---- ---- ---- --------------------- --------------------- ---------------------
                        0.391434              4.959518             42.833380

This shows

Most of the time was spent in TCB “application thread” mode (4.959 seconds of CPU) rather than SRB “system thread” mode (0.391 seconds of CPU).
One ZIIP was busy 23 % of the time, the other ZIIP was busy 11 % of the time.

Summary of CPU overall over 5 engines

 SRB time      :     0.391434 
 TCB time      :     4.959518 
 Idle time     :    42.833380 
 CPU Overhead  :     0.070886 
                 ------------ 
         Total :    48.255220

This summarises the data

4.95 seconds of CPU in TCB mode in 9.65 seconds of trace
42 seconds idle
5 engines * 9.65 seconds duration = 48.25

CPU break down by ASID/Jobname

CPU breakdown by ASID: 
                                                    
ASID Jobname    SRB Time     TCB Time    Total Time 
---- -------- ------------ ------------ ------------
0001 *MASTER*     0.011086     0.017940     0.029027
0003 RASP         0.000186     0.000000     0.000186
0005 DUMPSRV      0.035545     0.008959     0.044504
0006 XCFAS        0.021590     0.074411     0.096001
...   
0044 IZUSVR1      0.021217     3.638295     3.659513
0045 COLIN        0.000000     0.000000     0.000000
0046 RMF          0.010238     0.020204     0.030442
0047 RMFGAT       0.019961     0.160512     0.180474
              ------------ ------------ ------------
                  0.391434     4.959518     5.350953

Most of the CPU was in ASID 44 for job IZUSVR1.

Breakdown by system thread (SRB) by address space/jobname

SRB breakdown by ASID: (WHERE command bypassed for CPU usage less than 0.100000): 
                                                                                   
ASID: 0001   Jobname: *MASTER* 
IASN      SRB PSW      # of SRBs     Time 
---- ----------------- --------- ------------ 
0001 070C0000 83D0E8A0         2     0.000314 
...  
ASID: 0003   Jobname: RASP 
IASN      SRB PSW      # of SRBs     Time

Ignore this unless the SRB usage was high.

Breakdown of CPU by used thread by address space/jobname


TCB breakdown by ASID: 
                                                   
ASID Jobname  TCB Adr  # of DSPs     Time 
---- -------- -------- --------- ------------ 
0001 *MASTER* 008EDE88         1     0.000535
... 
ASID Jobname  TCB Adr  # of DSPs     Time 
---- -------- -------- --------- ------------ 
0044 IZUSVR1  008C8E88        22     0.013143 
0044 IZUSVR1  008AD7A0        30     0.006694 
0044 IZUSVR1  008B97B8        37     0.015926 
0044 IZUSVR1  008BA3E8        50     0.017547 
0044 IZUSVR1  008B2628        15     0.007748 
0044 IZUSVR1  008C4840        19     0.008433 
0044 IZUSVR1  008BD2D8        20     0.008551 
0044 IZUSVR1  008CDC68        14     0.008107 
0044 IZUSVR1  008C8328        15     0.006540 
0044 IZUSVR1  008CAC68        16     0.006612 
0044 IZUSVR1  008C9E88        14     0.006634 
0044 IZUSVR1  008B5C68        14     0.005668 
0044 IZUSVR1  008CBBE0        28     0.015650 
0044 IZUSVR1  008ADE00        17     0.005861 
0044 IZUSVR1  008B9470        15     0.006014 
0044 IZUSVR1  008BEA48        17     0.017092 
0044 IZUSVR1  008C6CF0        20     0.010093
...
0044 IZUSVR1  008CC2D8       548     0.827491 
0044 IZUSVR1  008D2E88        25     0.445230 
0044 IZUSVR1  008D2510       819     0.412526 
0044 IZUSVR1  008CEE88        14     0.158703
0044 IZUSVR1  008D3E88         8     0.003189 
0044 IZUSVR1  008C4CF0        18     0.013237 
                                 ------------ 
                                     3.638295

There were 166 TCB’s which did something in the time period.
TCB with address 008D2510 was dispatched 819 times times in 9 seconds – using 0.4 seconds of CPU! This was being dispatched 100 times a second, and used 5 milliseconds of CPU on average per dispatch. This looks high considering the system was not doing any work.
TCB with address 008d2E88 was dispatched 25 times in 9 seconds, and used 0.44 seconds of CPU or 17 ms of CPU per dispatch. This is doing more work per dispatch than the previous TCB.

Display lock usage

Lock events for CEDQ   
  None found                
Lock events for CSMF       
  None found               
Lock events for CLAT        
  None found              
Lock events for CMS         
  None found                
Lock events for OTHR        
  None found

Nothing of interest here.

Display local lock usage – locking the job

Lock events for LOCL of ASID 0010 OMVS 
Lock ASID Jobname  TCB/WEB  Type    PSW Address    IASN  Suspended at    Resumed at     Suspend Time 
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------ 
CML  0044 IZUSVR1  008C33E8 TCB  00000000_04868084 0010 16:24:49.612051 16:24:49.612076     0.000025 
CML  0044 IZUSVR1  008B4938 TCB  00000000_048687E4 0010 16:24:49.612090 16:24:49.612570     0.000480 
... 
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------ 
Suspends:      6  Contention Time:     0.000821    0.008%               WU Suspend Time:    0.000823 
                                                                                                        
Lock events for LOCL of ASID 0044 IZUSVR1 
Lock ASID Jobname  TCB/WEB  Type    PSW Address    IASN  Suspended at    Resumed at     Suspend Time 
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------ 
LOCL 0044 IZUSVR1  008D3E88 TCB  00000000_010CCD62 0044 16:24:46.404417 16:24:46.404561     0.000144 
LOCL 0044 IZUSVR1  008D3E88 TCB  00000000_010ADA78 0044 16:24:46.410188 16:24:46.412182     0.001993
Suspends:     83  Contention Time:     0.042103    0.436%               WU Suspend Time:    0.079177

The LOCal Lock (LOCL) is the MVS lock used to serialise on the address space, for example updating some MVS control blocks. For example if MVS wants to cancel an address space, it has to get the Local lock, to make sure that critical work completes.

For the OMVS addess space, address space IZUZVSR1 got the lock 6 times, and was delayed for 0.823 milliseconds waiting for the local lock.
For the IZUSVR1 address space, 83 TCBs got the local lock, and were suspended for a total of 79 milliseconds.

Display timer events (CPU Clock comparator CLKC and timer TIMR)

ASID: 0044   Jobname: IZUSVR1 
SRB/TCB  IASN   Interrupt PSW    Count   Where processing 
-------- ---- -------- -------- -------- ----------------------------------------
00000000 0044 070C0000 81D83CB8        2 IEANUC01.ASAXMPSR+00     READ/WRITE NUCLEUS 
...
008CEE88 0044 078D0401 945F2B28       11 AREA(Subpool252Key00)+CB28 EXTENDED PRIVATE 
008CEE88 0044 078D2401 945F99BE        1 AREA(Subpool252Key00)+0139BE EXTENDED PRIVATE 
008CEE88 0044 078D2401 FA63F71E        1 SPECIALNAME+03F71E     EXTENDED PRIVATE 
008CEE88 0044 078D0401 FAB00178        1 SPECIALNAME+0178     EXTENDED PRIVATE 
008CEE88 0044 078D2401 FAB447BE        1 SPECIALNAME+0447BE     EXTENDED PRIVATE 
008CEE88 0044 078D0401 FAD9E660        1 SPECIALNAME+29E660     EXTENDED PRIVATE 
008B17B8 0044 070C0000 81C92030        1 IEANUC01.IAXVP+4048     READ ONLY NUCLEUS 
008B27C0 0044 072C1001 91AF2460        1 BBGZAFSM+7520     EXTENDED CSA 
...
008D2E88 0044 078D0401 945F2B28       22 AREA(Subpool252Key00)+CB28     EXTENDED PRIVATE 
008D2E88 0044 078D0401 FB036F08        1 SPECIALNAME+036F08     EXTENDED PRIVATE 
...
008D2E88 0044 078D0401 FC145732        1 AREA(Subpool229Key00)+A732     EXTENDED PRIVATE

This displays

The TCB
The virtual adress where the interrupt occurred
These entries are in time sequence, and so we can see the second entry had 11 interupts in quick succession (count is 11).
The Where processing, is a guess at converting the address into a module name. Sometimes it works, for example Module IEANUC01, csect ASAXMPSR, offset 0. Sometimes it cannot tell, for example from Java code.

This shows 2 things

The application said wake me up in a certain time period
The TCB was executing and z/OS interrupted it because it needed to go and dispatch some other work. This gives a clue as to hot spots in the code. If the same address occurs many times – the code was executing here many times. I look in the raw systrace to see if this is a TIMer (boring) or a CLKC interesting. Interesting gives you a profile of what code is CPU intensive.
You can delete all the records outside of this block, then sort 15 32 to sort on PSW address. For my IPCS dump the address 078D0401 945F2B28 occurred 35 times.

I/O activity

 Device   SSCH Issued    I/O Occurred     Duration   
 ------ --------------- --------------- ------------ 
  0A99  16:24:48.009819 16:24:48.010064     0.000244 
  0A99  16:24:48.033619 16:24:48.033856     0.000236 
  0A99  16:24:48.051014 16:24:48.051081     0.000066 
  0A99  16:24:48.057377 16:24:48.057434     0.000056 
  0A99  16:24:48.080331 16:24:48.080430     0.000098 
                                        ------------ 
                                            0.000702 
                                                     
    Events for 0A99 :            5                   
    Quickest I/O    :     0.000056                   
    Slowest  I/O    :     0.000244                   
    Total           :     0.000702                   
    Average         :     0.000140

This says for the device 0A99 there were 5 I/O requests, total time 0.7 milliseconds

I used the REPORT VIEW to get the data into ISPF edit,
deleted all the records above the I/O section
Used X ALL
F TOTAL ALL
This shows the totals for all I/Os. Most totals were under 1 ms. One I/O was over 5 seconds.
Displaying the detailed records above this TOTAL record showed one I/O took over 5 seconds!

End of report

End of PERFDATA analysis.

Advanced topic: Look at hot spots

I had seen that PSW 078D0401 945F2B28 was hot. If you go back to the IPCS command panel, you may be able to use the command

L 945F2B28

To display the storage. This will not work. You have to remove the top bit (80), so

L 145F2B28

may work.

If the first character is a letter (A-F) then you need to put a 0 on the front for example

L 0D2345678

You might need to put the address space in as well for example

L 145F2B28 ASID ID(x’44’)

You can say

L 145F2B28 ASID ID(x’44’) LENGTH(4096)

To display large sections of storage

Dig into the trace

You can use

systrace jobname(izusvr1)tcb(x’008CC2D8′)

to display all entries for the specified TCB and jobname.

Go to the bottom ( type Max press PF8)
use the report view command to get into and edit session
Columns 79 to 88 contain a description of some of the system calls
use X ALL;f ‘ ‘ 79 84 all;del all nx to delete all lines without a description

This gives you a picture of the MVS services being used.