Using SSH to get to z/OS

What is SSH?

SSH is Secure SHell. It allows you to log on securely to a remote Unix-like shell.

SSH has little in common with SSL or TLS. For example, you cannot keep SSH “certificates” in z/OS keyrings. (The documentation says you can, but it is talking about something else.)

SSH uses a different protocol and a different certificate format from TLS – you cannot use a TLS certificate for SSH encryption and authentication.

Basic use

You can issue

ssh colin@10.1.1.2

and this will set up a secure session to the host 10.1.1.2 with the userid colin. By default it will prompt for a password. If you copy your public key to the server, you can do password-less logons.

The first time you set up a connection you get asked for additional information (along the lines of “are you sure you want to connect to this system”). It stores information about the host so it recognises it when you reuse the address.

To get out of a remote session command prompt use exit.

Configuring the server

I’ve written about configuring the SSH daemon on z/OS here.

Different ways of using SSH

Entering the ssh command and the password may be acceptable in many cases. In other cases, such as within a shell script, you do not want to enter the password. There are several ways of doing this:

Enter the password as part of the ssh command. The command and password can be seen in the history file, and over the shoulder, so this is not secure.
Store the password in a file, read the file and pass the password to the command. For example use sshpass.
Use keys. You create a key pair on your client machine, and copy the public key to the userid(s) on the server. When you connect with the key, the server checks that the userid has the matching public key; if so it does not need a password.
Use signed certificates. This makes administration much easier (well, different). You create a key, and get an SSH Certificate Authority to generate a certificate which includes your public key, the userids it applies to, and other information such as validity dates. The server has just a copy of the CA’s public key. When you send your certificate to the server, the CA’s public key is used to validate it. The server has no additional work to do.

If you use a passphrase for a key you have the same problem: how do you enter the passphrase when using a script? For scripted use, do not specify a passphrase.

You need to ensure that the password file, passphrase, and key are secure, for example so that only the owner can read them.
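As a minimal sketch of the "only the owner can read it" check, the Python below tests whether a file has any group or other permissions and tightens it to 0600. The function names are hypothetical, and the demonstration runs against a throwaway temporary file rather than a real key.

```python
import os
import stat
import tempfile


def is_owner_only(path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def restrict_to_owner(path):
    """Set the file to read/write for the owner only (0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Demonstration on a temporary file (a stand-in for a key or password file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
os.chmod(demo_path, 0o644)          # deliberately too open
before = is_owner_only(demo_path)
restrict_to_owner(demo_path)
after = is_owner_only(demo_path)
os.remove(demo_path)
print("owner-only before:", before, "after:", after)
```

A script could call is_owner_only on each key or password file before using it, and refuse to continue if the check fails.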

You can store command information in ~/.ssh/config. For example

# simple ssh command
Host 10.1.0.3
        HostName 10.1.0.3
        User colin

# ssh command using certificate and keys
Host 10.1.1.2
        HostName 10.1.1.2
        User ibmuser
        IdentitiesOnly yes
        IdentityFile /home/colinpaice/ssl/ssh/colin.key
        CertificateFile /home/colinpaice/ssl/ssh/colin.key-cert.pub

# ssh command for using a key        
Host ss
        HostName 10.1.1.2
        User adcda
        IdentitiesOnly yes
        IdentityFile /home/colinpaice/ssl/ssh/colin.selfsigned

If I use

ssh 10.1.0.3 it will use the first definition and user colin
ssh 10.1.1.2 it will logon to userid ibmuser, use the key in the colin.key, and the (signed) certificate in colin.key-cert.pub
ssh ss it will logon with userid adcda using the colin.selfsigned key file. Userid adcda on the server needs a copy of the matching public key in its authorized_keys file.
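To illustrate how a Host name selects a block of options, here is a rough Python sketch that reads a config in the layout shown above and returns the options for each Host entry. It is an illustration only, not how ssh itself parses the file: it ignores wildcards, Match blocks and multiple patterns on one Host line, and parse_ssh_config is a hypothetical helper name.

```python
def parse_ssh_config(text):
    """Return {host_alias: {option_name_lowercase: value}} for simple Host blocks."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = line.split(None, 1)       # keyword, then the rest of the line
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "host":
            current = hosts.setdefault(value, {})
        elif current is not None:
            current.setdefault(keyword, value)   # keep the first value seen
    return hosts


SAMPLE = """
# ssh command using certificate and keys
Host 10.1.1.2
        HostName 10.1.1.2
        User ibmuser
        IdentitiesOnly yes
        IdentityFile /home/colinpaice/ssl/ssh/colin.key
        CertificateFile /home/colinpaice/ssl/ssh/colin.key-cert.pub

# ssh command for using a key
Host ss
        HostName 10.1.1.2
        User adcda
        IdentitiesOnly yes
        IdentityFile /home/colinpaice/ssl/ssh/colin.selfsigned
"""

hosts = parse_ssh_config(SAMPLE)
print(hosts["ss"]["user"], hosts["ss"]["hostname"])
```

The nickname ss and the address 10.1.1.2 both reach the same machine, but with different userids and keys.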

Using plain ol’ SSH with a password

You need do no special setup for this.

Using keys

You need to create the keys once, then use them in future.

You can specify different key types, for example ed25519, dsa, and rsa. It defaults to rsa-sha2-512.

On Linux create the user key pair with ssh-keygen -t ed25519

It prompts

Enter file in which to save the key (/home/colin/.ssh/id_ed25519):

for where to save the private key. It also creates the public key, ~/.ssh/id_ed25519.pub .

You need to copy the .pub file to the server. You can use

ssh-copy-id ibmuser@10.1.1.2

to copy the public key(s) to the userid (ibmuser). It will prompt for the userid’s password.

To use this file use the command

ssh ibmuser@10.1.1.2

You can explicitly say which key file to use. You can specify -f name on ssh-keygen, and -i name on the ssh-copy-id and ssh commands, to create and use a file name of your choosing.

The command

ssh -Q HostKeyAlgorithms

gives a list

ssh-ed25519                                      
ssh-ed25519-cert-v01@openssh.com                 
sk-ssh-ed25519@openssh.com                       
sk-ssh-ed25519-cert-v01@openssh.com              
ssh-rsa                                          
rsa-sha2-256                                     
rsa-sha2-512

I do not know if this is a prioritised list, but the ssh-ed25519 key was chosen for the handshake when I had both rsa and ed25519 keys.

If you want to be able to log on to multiple userids, issue the ssh-copy-id and ssh commands for each userid.

With this you will not need a password to log on to the server. You will have entered the password as part of the ssh-copy-id command, or copied the file to the userid yourself, so it assumes you have access to the userid’s files.

Note: even if you change the password on the server, you can still log on using the key.

To stop someone using the key to log on to a userid (ibmuser), remove it from the /u/ibmuser/.ssh/authorized_keys file on the server. There could be several lines in the file. At the end of each line in the file is the client userid@system. For my client it was colinpaice@colinpaice. For example

ssh-ed25519 AAAAC3NzaC1...NY3Xpp50OeHB colinpaice@colinpaice
ssh-ed25519 AAAAC3NzaC1...Txwd2NxlrKKZ colin@ColinNew

This file needs limited access (0600), for example

+ ls -ltr .ssh/authorized_keys
-rw-------   1 COLIN    SYS1        2256 Dec 18 08:21 .ssh/authorized_keys
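Removing a key from authorized_keys can be scripted. The Python below is a sketch that drops the entries whose trailing comment matches a given client userid@system. It assumes each entry has the simple form "type key comment" shown above (the comment is free text and may be absent or contain spaces in other files), and remove_keys_by_comment is a hypothetical helper name.

```python
def remove_keys_by_comment(lines, comment):
    """Return the authorized_keys lines, minus entries whose last field equals comment."""
    kept = []
    for line in lines:
        fields = line.split()
        is_entry = bool(fields) and not line.lstrip().startswith("#")
        if is_entry and fields[-1] == comment:
            continue                 # this is the key being revoked
        kept.append(line)
    return kept


sample = [
    "ssh-ed25519 AAAAC3NzaC1...NY3Xpp50OeHB colinpaice@colinpaice",
    "ssh-ed25519 AAAAC3NzaC1...Txwd2NxlrKKZ colin@ColinNew",
]
remaining = remove_keys_by_comment(sample, "colinpaice@colinpaice")
print("\n".join(remaining))
```

After writing the remaining lines back, the file still needs its 0600 permissions.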

If the logon without a password fails, use ssh -v colin@10.1.1.2 to get more information.

On the client, you can list the keys in ~/.ssh/known_hosts that the client has for a server using

ssh-keygen -F 10.1.1.2

where 10.1.1.2 is the server name.

Using certificates

When you create a certificate the public key is signed by the CA. You can also add information such as validity dates, and a list of userids this certificate can be used for with no password. I think this is a security exposure, because the list of userids is given when the certificate is signed. This action is out of the control of the z/OS systems programmer.

Even if you change the password on the back end, the logon will work, unless the userid is revoked.

Wikibooks has a good article on certificates.

Logically there are three machines involved in this

An isolated machine, which has the CA private key. Public keys are sent to this machine for signing, and the certificates are returned.
My client machine – for me this is running Ubuntu Linux.
The server machine – this is z/OS

The steps I took were

On the isolated CA machine create a Certificate Authority. The command ssh-keygen -t ed25519 -f ~/.ssh/user_ca_key -C 'User Certificate Authority for *.example.com' created files
- /home/colinpaice/.ssh/user_ca_key.pub
- /home/colinpaice/.ssh/user_ca_key
On z/OS I created the file /etc/ssh/user_ca_key.pub and copied the contents of the user_ca_key.pub file from Linux into it, using cut and paste.
Make the z/OS file universal read
- chmod 644 /etc/ssh/user_ca_key.pub
On z/OS update /etc/ssh/sshd_config and add the following (to point to the file):
- TrustedUserCAKeys /etc/ssh/user_ca_key.pub
On z/OS restart SSHD
- C SSHD3
- S SSHD
On Linux create the user key pair with ssh-keygen -t ed25519 -f colin.key. This creates files
- colin.key
- colin.key.pub. This contains data like ssh-ed25519 AAAAC3Nz…OeHB colinpaice@colinpaice
Send the .pub file to the CA machine
On the CA machine issue ssh-keygen -s ~/.ssh/user_ca_key -I 'colinlog' -z '0002' -n colin,joe colin.key.pub where
- -I colinlog is the value which is logged. For example on z/OS, when using the certificate, the SSHD log file had
  - Sep 10 13:11:40 S0W1 sshd[50397213]: Accepted certificate ID "colinlog" (serial 0) signed by ED25519 CA SHA256:s…TA via /etc/ssh/user_ca_key.pub
- -z '0002' you can specify a serial number, or omit this
- -n colin,joe is a list of userids within the certificate. If you want to log on to z/OS userid colin or joe you will not be asked for a password.
This creates colin.key-cert.pub. Send this file back to the requester.
Connect to z/OS. On Linux
- ssh -o CertificateFile=colin.key-cert.pub -i colin.key colin@10.1.1.2
You can store the configuration information in ~/.ssh/config

Host 10.1.1.2
        Hostname 10.1.1.2 
        User colin
        IdentitiesOnly yes
        IdentityFile /home/colinpaice/ssl/ssh/colin.key
        CertificateFile /home/colinpaice/ssl/ssh/colin.key-cert.pub

Where

Host is the nickname
Hostname is the address to use
User is the userid to logon to at the remote machine (z/OS)
IdentityFile is the private key for my Linux userid
CertificateFile is the signed certificate sent to the server.

You can then use ssh 10.1.1.2 which will pick up the other parameters from the .ssh/config file.

This will get you into an OMVS session. Use exit to leave.
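If the CA machine signs many keys, the signing command from the steps above can be assembled by a small script. This Python sketch only builds the argument list; it does not run ssh-keygen. The helper name build_sign_command is hypothetical, the paths are the ones used in this post, and the optional -V validity interval is an ssh-keygen option that was not used in the steps above.

```python
def build_sign_command(ca_key, identity, principals, public_key,
                       serial=None, validity=None):
    """Build the ssh-keygen argument list for signing a user public key."""
    args = ["ssh-keygen", "-s", ca_key, "-I", identity]
    if serial is not None:
        args += ["-z", str(serial)]       # certificate serial number
    if validity is not None:
        args += ["-V", validity]          # for example "+52w" (assumption: not used in the post)
    args += ["-n", ",".join(principals), public_key]
    return args


cmd = build_sign_command(
    "/home/colinpaice/.ssh/user_ca_key",  # CA private key
    "colinlog",                           # identity written to the sshd log
    ["colin", "joe"],                     # userids the certificate may log on to
    "colin.key.pub",
    serial=2,
)
print(" ".join(cmd))
```

The list could be passed to subprocess.run(cmd, check=True) on the CA machine; keeping the arguments as a list avoids shell quoting problems.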

Another way of doing it.

You can use ssh to copy the key around.

Generate a key (if you do not have one)

Look in ~/.ssh for a file with extension .pub

ssh-keygen -t rsa 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/colinpaice/.ssh/id_rsa): 
/home/colinpaice/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/colinpaice/.ssh/id_rsa
Your public key has been saved in /home/colinpaice/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:7S50/Zf8Q7J2VLH71v2WBB2KEVFwyZn21aeI1Tzhk8c colinpaice@colinpaice

You can specify where to put the private and public keys by using -f /u/colin/.ssh/mykey

Copy the key to z/OS

Specify the public key file, and the target userid and destination

ssh-copy-id -i /home/colinpaice/.ssh/id_rsa.pub colin@10.1.1.2

Connect without a password

ssh -i /home/colinpaice/.ssh/id_rsa colin@10.1.1.2

If I did not specify the file name, I was prompted for the password.

Configuring RSEAPI on z/OS to use TLS

The RSEAPI server is the Apache Tomcat server plus RSEAPI specific stuff. If you know how to configure Tomcat, you know most of what you need. The Tomcat customising is documented here.

This post follows on from Getting REST to work into z/OS. I was unclear at first how to correctly specify overrides. I’ve blogged an article Passing parameters to Java program to show how some parameters are specified as RSEAPI_KEYSTORE_FILE=… and other parameters are specified as -Djava.protocol.handler.pkgs=…

See Java Parameters for how I configured RSEAPI to be able to flip configuration options.

Update your level of Java

I had various problems getting TLS to work with RSEAPI.

TLSv1.3 was not supported on the level of Java V8 I originally had.
I had to override the /etc/zexpl/java.security file so that it understood keyrings of the format safkeyring://START1/MQRING

When I refreshed the level of Java (to SR8 FP6 dated June 2023), things worked much better. I would recommend getting a level of Java shipped within the last year.

I tested this by changing rseapi.env to include

export JAVA_HOME="/usr/lpp/java/new/J8.0_64" 
export LIBPATH="$JAVA_HOME/bin:$JAVA_HOME/bin/classic:"$LIBPATH 
export PATH="$JAVA_HOME/bin:"$PATH

Without this I got various Java problems, such as an unresolved dependency.

Getting RSE to work with TLS was not trivial

The original version of RSEAPI was v1.0.5 (see /usr/lpp/IBM/rseapi/tomcat.base/bin/current_version.txt). Another version, v1.1.0 created 7 July 2022, is available on GitHub, and is the one I worked with.

TLS configuration changes

RSEAPI supports only one port. To use TLS change the procedure to use SECURE='true' (or override it at startup).

The RSEAPI proc has the location of the configuration files. Mine says /etc/zexpl.

The main file to edit is /etc/zexpl/rseapi.env . The sample has a lot of commented out statements. I added at the bottom

RSEAPI_KEYSTORE_FILE="safkeyring://START1/MQRING " 
RSEAPI_KEYSTORE_TYPE="JCERACFKS" 
RSEAPI_KEYSTORE_PASS="password" 
RSEAPI_USING_ATTLS=false 
RSEAPI_SSL_ENABLED_PROTOCOLS=TLSv1.2

Which server certificate to use?

By default, the first key read from the keystore will be used. See certificateKeyAlias in Apache Tomcat configuration.

To change this I edited /usr/lpp/IBM/rseapi/tomcat.base/conf/sserver.xml and added

 <Connector port="${port.http}" protocol="${http.protocol}" 
   certificateKeyAlias="${serverCert}" 
   ...
/>

I then edited /etc/zexpl/rseapi.env and added

d3=" -DserverCert=NISTECCTEST" 
JAVA_OPTS="$d3 " 
CATALINA_OPTS=$JAVA_OPTS 
export JAVA_OPTS 
export CATALINA_OPTS

Using TLSv1.3

I blogged about configuring Java to support TLSv1.3 in a separate post, TLS 1.3, Java and z/OS.

The additional RSEAPI specific configuration was

RSEAPI_HTTP_PROTOCOL=HTTP/1.1 
RSEAPI_SSL_ENABLED_PROTOCOLS=TLSv1.3 
RSEAPI_SSL_CIPHERS=TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256 
RSEAPI_KEYSTORE_FILE="safkeyring://START1/MQRING " 
RSEAPI_KEYSTORE_TYPE="JCERACFKS" 
RSEAPI_KEYSTORE_PASS="password" 
RSEAPI_USING_ATTLS=false

You can use RSEAPI_SSL_ENABLED_PROTOCOLS=TLSv1.3,TLSv1.2 to get both TLS 1.2 and 1.3 support.

ClientAuth Support

The web server (Tomcat) has support for requiring clients to specify a certificate as part of the TLS handshake.

I got this to work by editing /usr/lpp/IBM/rseapi/tomcat.base/conf/sserver.xml, and changing clientAuth="false" to clientAuth="${clientAuth}"

<Connector port="${port.http}" protocol="${http.protocol}" 
     clientAuth="${clientAuth}" sslProtocol="TLS" 
...
>

and setting the value to “required” or specifying the value in the startup options:

d1="-DclientAuth=required"
t1=" -Djavax.net.ssl.trustStoreType=JCERACFKS" 
t2=" -Djavax.net.ssl.trustStore=safkeyring://START1/MQRING"
JAVA_OPTS=" $d1 $t1 $t2" 
CATALINA_OPTS=$JAVA_OPTS  
export JAVA_OPTS 
export CATALINA_OPTS

The …trustStoreType and …trustStore options provide the defaults if none are specified in the sserver.xml.

Use of keystore and trust store

The use of a trust store to hold the CA certificates and any self-signed certificates is recommended. The keystore then contains just the private keys needed by the server. This means you can have one trust store per LPAR, which saves administration.

If you use a combined key store and trust store, and this is shared by applications, then applications may get access to private keys used by other applications, so this is not as secure.

The Tomcat documentation describes the truststore* parameters. These are in the <Connector… element within file /usr/lpp/IBM/rseapi/tomcat.base/conf/sserver.xml .

For example

<SSLHostConfig 
       protocols="${ssl.enabled.protocols}"> 
       <Certificate type="EC" 
          certificateKeyAlias="NISTECCTEST" 
          certificateKeystoreFile="${keystoreFile}" 
          certificateKeystorePassword="${keystorePass}" 
          truststoreType="${trustStoreType}"                     
          truststoreFile="${trustStoreFile}"
          truststorePassword="${trustStorePass}"
      /> 
</SSLHostConfig>

and specify -Dxx where xx is the name in ${xx}, such as -DtrustStoreType="JCERACFKS" . You can also hard code the values.
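The relationship between a -Dname=value option and a ${name} placeholder can be shown with a few lines of Python. This is only an illustration of the idea, not Tomcat's implementation; parse_d_options and substitute are hypothetical helper names, and placeholders with no matching -D option are left unchanged here so that missing options are easy to spot.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_d_options(options):
    """Turn a string of -Dname=value options into a dictionary."""
    props = {}
    for token in options.split():
        if token.startswith("-D") and "=" in token:
            name, _, value = token[2:].partition("=")
            props[name] = value.strip('"')
    return props


def substitute(text, props):
    """Replace each ${name} with its value; leave unknown names as they are."""
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), text)


props = parse_d_options(
    "-DtrustStoreType=JCERACFKS -DtrustStoreFile=safkeyring://START1/MQRING"
)
line = ('truststoreType="${trustStoreType}" truststoreFile="${trustStoreFile}" '
        'truststorePassword="${trustStorePass}"')
resolved = substitute(line, props)
print(resolved)
```

In the output the first two attributes are resolved, and ${trustStorePass} is still visible because no -DtrustStorePass option was supplied.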

Setup problems

I had a variety of problems. Most were solved by going to a newer level of Java or RSEAPI. For example, earlier versions did not support TLSv1.3.

Authority issue

I got the message

Caused by: java.lang.IllegalArgumentException: The private key of NEWTECCTEST is not available or no authority to access the private key

This was caused by the certificate belonging to userid START1, while I was running RSEAPI under userid STCRSE. For userid STCRSE to be able to access the private key of a certificate belonging to another userid, STCRSE needs UPDATE access to the keyring profile.

My keyring was safkeyring://START1/MQRING. I needed

permit START1.MQRING.LST   class(RDATALIB) access(update) id(STCRSE)
setropts raclist(rdatalib) refresh
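The RDATALIB profile name follows from the keyring URL: owner, ring name, then LST. As a sketch, the Python below derives the profile and the permit command text from a safkeyring:// URL. The helper names are hypothetical, and the commands still have to be issued by someone with the authority to do so.

```python
def rdatalib_profile(keyring_url):
    """Derive the RDATALIB profile name (owner.ring.LST) from a safkeyring URL."""
    prefix = "safkeyring://"
    if not keyring_url.startswith(prefix):
        raise ValueError("not a safkeyring URL: " + keyring_url)
    owner, sep, ring = keyring_url[len(prefix):].strip().partition("/")
    if not sep or not owner or not ring:
        raise ValueError("expected safkeyring://OWNER/RING")
    return f"{owner}.{ring}.LST"


def permit_command(keyring_url, userid, access="update"):
    """Build the RACF permit command giving userid access to the keyring."""
    profile = rdatalib_profile(keyring_url)
    return f"permit {profile} class(RDATALIB) access({access}) id({userid})"


print(permit_command("safkeyring://START1/MQRING", "STCRSE"))
print("setropts raclist(rdatalib) refresh")
```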

Getting REST to work into z/OS with RSEAPI

I was asked if there was a REST API into z/OS, to enable a Python program to work with z/OS files.

The answer is yes, and it is pretty easy to set up and get working.

z/OS Explorer and Zowe on z/OS have a REST interface into z/OS. For example, with z/OS Explorer you can use VS Code to edit files on z/OS.

You can just use the server; you do not have to use z/OS Explorer or Zowe.

Remote System Explorer API (RSEAPI) is some RSEAPI-specific code on top of the Apache Tomcat web server. The customising is documented here.

See Configuring RSEAPI on z/OS to use TLS.

Which program/stc to use?

I found two Remote System Explorer (RSE) servers on my z/OS

RSED (dated on my system 2016)
RSEAPI (dated on my system 2020).

RSED used an internal interface, and is there for backwards compatibility.

RSEAPI is strategic with a REST API. It uses the Apache Tomcat Java web server.

The notes below are how I got RSEAPI to work on z/OS, and run my REST request into z/OS. I was running on z/PDT where the product was installed in HUH100.* libraries, but the system was only partially configured.

There are at least two versions of RSEAPI.

v1.0.5 from 2021 only supports Java V8 – and you should use a recent fix pack for Java.
v1.1.0 from 2022 supports Java V8 and Java V11. You should use recent fix packs for these, as earlier ones do not have the latest TLS support.

I found it easier to use a current level of Java.

Basic setup

Mount the file system

The REST server is started with the RSEAPI started task.

The file system was not mounted. Use the TSO command

mount filesystem('HUH100.ZFS') mountpoint('/usr/lpp/IBM/rseapi/')       
type(ZFS) mode(read)

You can update your BPXPRMxx to include the same statements.

Start RSEAPI

The setup had mostly been done on my system; I just had to start it.

S RSEAPI,SECURE='false'

SECURE='false' says do not use TLS.

This starts several subtasks, including Java. It took over 1 minute for it to accept a connection and over 200 seconds before it was fully up, and able to respond to requests. The time to start is typical of starting a Java Server on my little z/OS running on zPDT on my Linux machine. On real hardware it takes just seconds so I’ve been told.
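Because the server can take minutes to start, a script that drives it may want to wait until the port accepts connections before sending requests. The Python below is a minimal sketch of such a wait; wait_for_port is a hypothetical helper, and accepting a TCP connection only shows the listener is up, not that the server is ready to respond to every request.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Poll until host:port accepts a TCP connection, or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True              # something is listening
        except OSError:
            if time.monotonic() >= deadline:
                return False             # gave up waiting
            time.sleep(interval)
```

For example, wait_for_port("10.1.1.2", 6800) could be called after issuing the start command and before the first REST request.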

Once it had started the response time was ok.

Stopping RSEAPI

Within the STDOUT from the RSEAPI was

Registering MVS Console Listener for job RSEAPI6

To stop RSEAPI you have to use P RSEAPI6. Once Java had started successfully, it took less than 30 seconds to shut down. If Java was still starting up, it would not shut down until Java had finished starting, so I tended to cancel the RSEAPI job (cancel RSEAPI6).

Changing the configuration

While exploring RSEAPI, I needed to change the configuration, for example using Java shared classes to improve start up time.

Some configuration is done using RSE specific environment variables in /etc/zexpl/rseapi.env, such as

RSE specific parameters

RSEAPI_KEYSTORE_FILE="safkeyring://START1/MQRING "

The level of Java

I changed the level of Java using

export JAVA_HOME="/usr/lpp/java/new/J8.0_64" 
export LIBPATH="$JAVA_HOME/bin:$JAVA_HOME/bin/classic:"$LIBPATH 
export PATH="$JAVA_HOME/bin:"$PATH

Java parameters

I added some Java specific parameters.

d1=" -verbose:dynload,class " 
d1="" 
d2=" -Dlog.level=INFO "                                                                                
JAVA_OPTS=" $d1 $d2  " 
CATALINA_OPTS=$JAVA_OPTS 
export JAVA_OPTS 
export CATALINA_OPTS

I built up a big list of variables and added them to the JAVA_OPTS, for example

JAVA_OPTS="$d1 $d2 $d3 $p1 $p2"

In the above example d1 is blank, and is not passed to Java. If I reorder the two d1 statements I can easily change the configuration, and later change it back again.

Reading the error logs

I had various problems getting TLS working. One hiccup was that Java writes error messages to //STDERR – in ASCII! and so is not easily read. I changed this to

//STDERR   DD PATH='/var/zexpl/logs/rseapi_6800.1/stderr',                   
//            PATHOPTS=(OWRONLY,OCREAT,OTRUNC),         
//            PATHMODE=SIRWXU

Normally this file is empty. You can use date and time in the file name

// SET PATH='/var/zexpl/logs/rseapi_6800.1' 
//RSEAPI   EXEC PGM=BPXBATSL,REGION=0M,TIME=NOLIMIT, 
//            PARM='PGM &HOME./tomcat.base/start.sh' 
//STDOUT   DD SYSOUT=* 
//STDERR DD PATH='&PATH/stderr.D&YYMMDD..T&HHMMSS', 
//        PATHOPTS=(OWRONLY,OCREAT,OTRUNC),PATHMODE=SIRWXU
//STDERR   DD SYSOUT=* 
//CEEOPTS DD * 
RPTSTG(ON) 
/* 
//STDENV   DD *,SYMBOLS=(JCLONLY) 
_BPXK_AUTOCVT=ON 
...

To look at the output I used the omvs command

oedit /var/zexpl/logs/rseapi_6800.1/

which lists the contents of the directory, then used E to edit stderr – it displays EBCDIC text, or EA to display the file in ASCII – for the Java stuff.

The TLS support writes messages to the same (/var/zexpl/logs/rseapi_6800.1/) directory. Files have format description.yyyy-mm-dd

The files of interest

catalina.2023-08-07 has information from Java about problems with TLS.
localhost_access.2023-08-07 shows the request and the return code such as "GET /rseapi/api/v1/datasets/COLIN.D%2A/list HTTP/1.1" 401 437
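The request part of a localhost_access entry is easy to pick apart with a script, for example to find all the 401 responses. This Python sketch assumes the entry ends with the quoted request followed by the status and a byte count, as in the example above; parse_access_entry is a hypothetical helper name.

```python
import re
from urllib.parse import unquote

ACCESS_ENTRY = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_entry(line):
    """Return the request details from one access log line, or None if no match."""
    match = ACCESS_ENTRY.search(line)
    if match is None:
        return None
    size = match.group("size")
    return {
        "method": match.group("method"),
        "path": unquote(match.group("path")),     # %2A becomes *
        "protocol": match.group("protocol"),
        "status": int(match.group("status")),
        "size": None if size == "-" else int(size),
    }


entry = parse_access_entry(
    '"GET /rseapi/api/v1/datasets/COLIN.D%2A/list HTTP/1.1" 401 437'
)
print(entry)
```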

Enhanced startup messages

By specifying

-Dlog.level=finer

I got useful information in stderr and catalina….log files. For example

Server version name:   Apache Tomcat/10.0.23 
Server built:          Jul 14 2022 08:16:11 UTC 
Server version number: 10.0.23.0 
OS Name:               z/OS 
OS Version:            02.04.00 
Architecture:          s390x 
Java Home:             /Z24C/usr/lpp/java/J8.8_64/J8.0_64 
JVM Version:           8.0.8.6 - pmz6480sr8fp6-20230601_01(SR8 FP6) 
JVM Vendor:            IBM Corporation 
CATALINA_BASE:         /u/ibmuser/aaa/tomcat.base 
CATALINA_HOME:         /u/ibmuser/aaa/tomcat.home 
...
Command line argument: -Duser.dir=/S0W1/tmp 
Command line argument: -Dlog.level=FINER

Using the browser interface

The URL http://10.1.1.2:6800/rseapi/api-docs/ displays a Swagger page, where you can try out the different commands, for example list dataset names, or display a member.

http: because I have not enabled https yet
10.1.1.2 is the address of my z/OS image
6800 is the port
/rseapi/api-docs/ is the URL to display the swagger documentation.

Expand MVS Datasets and it gives a list of operations, including a GET to list the data set names matching a filter.

I expanded the GET to get all dataset names matching the filter. I clicked on Try it out. I entered a High Level Qualifier, and selected execute. The first time the session issues a request it prompts for userid and password. It returns with the data about my data sets, and the strings

curl: curl -X GET "http://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN/list" -H "accept: application/json"
Request Url: http://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN/list

This is the information I need to issue a curl request.

For one of the operations I got

HTTP Status 401 – Unauthorized

This is because the userid using the service did not have a R/W home directory. I sometimes got

ICH408I USER(COLIN ) GROUP(SYS1 ) NAME(COLIN PAICE)
/u/.rseapi CL(DIRACC ) FID(…)
INSUFFICIENT AUTHORITY TO MKDIR
ACCESS INTENT(-W-) ACCESS ALLOWED(GROUP R-X)

Using the curl interface.

I used the shell script

trace="-v"
url='http://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN.ZL*/list'
curl $trace  --config  curlapi.config $url --user "colin:xxxxxxxx"

and the configuration file curlapi.config

--header "accept: application/json"
--header "Accept-Encoding: gzip, deflate"
--header "Accept-Language: en-GB,en-US;q=0.9,en;q=0.8"
--header "Connection: keep-alive"

or combining them

# Keep each header as its own array element so the embedded spaces survive (bash)
headers=(
  --header "accept: application/json"
  --header "Accept-Encoding: gzip, deflate"
  --header "Accept-Language: en-GB,en-US;q=0.9,en;q=0.8"
  --header "Connection: keep-alive"
)
url='http://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN.ZL*/list'
curl $trace  --user "colin:xxxxxxxx"  "${headers[@]}" "$url"

The output body was

{"items": [{
  "name": "COLIN.ZLOGON.CLIST",
  "migrated": false
* Connection #5 to host 10.1.1.2 left intact
}]}

This took about 2 seconds to process one file name. It took 7 seconds to process 300 file names.
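The same request can be built with the Python standard library. The sketch below only constructs the request object (URL, Accept header and Basic authentication header) so it can be inspected without a server; build_list_request is a hypothetical helper name, and the password is a placeholder.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def build_list_request(base_url, pattern, user, password):
    """Build a GET request that lists the data sets matching pattern."""
    # quote() turns * into %2A, the encoded form used in the request URL.
    url = f"{base_url}/rseapi/api/v1/datasets/{quote(pattern, safe='')}/list"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Accept": "application/json",
        "Authorization": f"Basic {token}",
    }
    return Request(url, headers=headers, method="GET")


req = build_list_request("http://10.1.1.2:6800", "COLIN.ZL*", "colin", "xxxxxxxx")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it. Basic authentication only encodes the userid and password, it does not encrypt them, so over plain http they are readable on the network.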

Processing multiple requests from CURL

There is an overhead setting up the connection. You can issue multiple requests from CURL, so this connection is done once, and is faster than doing multiple CURL requests.

The examples below are for a TLS session

I used a shell script

rsecurl.sh

trace="-v"
tls="--cert  ./$name.pem:password --key $name.key.pem --cacert doczosca.pem --tlsv1.2" 
post="GET"
user='--user colin:xxxxxxx'
curl $trace -X $post $tls  --config  curlapi.config $user -H@curlapi.headers

curlapi.headers

Accept: application/json
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Authorization: Basic Y29saW46cFFu3Gh2MDI=
Cache-Control: no-cache
Connection: keep-alive
Dnt: 1
Pragma: no-cache

curlapi.config

This has the two requests – with a different URL. The -o directs the output to a file

-o ./COLIN.LIST
url = "https://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN.D%2A/list"

-o ./ADCD.LIST
url = "https://10.1.1.2:6800/rseapi/api/v1/datasets/ADCD.%2A/list"

The script ran and created COLIN.LIST and ADCD.LIST.

Using Python to issue a REST request

The Python code below issues two requests.

from io import BytesIO

import pycurl

home = "/home/colinpaice/ssl/ssl2/"
ca=home+"doczosca.pem"
cert=home+"docec384.pem"
key=home+"docec384.key.pem"
cookie=home+"cookie.jar.txt"
# url="https://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN.D%2A/list"

# header_function is not shown in the original post; this minimal version is an assumption.
headers = []
def header_function(header_line):
  # pycurl passes each response header line as bytes
  headers.append(header_line.decode('iso-8859-1').rstrip())

buffer = BytesIO()
c = pycurl.Curl()
print("C=",c)
try:
  c.setopt(c.URL, "https://10.1.1.2:6800/rseapi/api/v1/datasets/COLIN.Z%2A/list")
  c.setopt(c.WRITEDATA, buffer)
  c.setopt(pycurl.CAINFO, ca)
  c.setopt(pycurl.CAPATH, "") 
  c.setopt(pycurl.SSLKEY, key)
  c.setopt(pycurl.SSLCERT, cert)
  c.setopt(pycurl.COOKIEFILE,cookie)  # read cookies from this file
  c.setopt(pycurl.COOKIEJAR,cookie)   # write cookies to this file when the handle is closed
  c.setopt(pycurl.SSLKEYPASSWD , "password") 
  c.setopt(c.HEADERFUNCTION, header_function)
  c.setopt(pycurl.HTTPHEADER, ['Accept: application/json'])
  c.setopt(c.USERPWD, 'colin:xxxxxxxx')
  c.setopt(pycurl.VERBOSE, True)
  c.perform()
  body = buffer.getvalue()
  print(body.decode('iso-8859-1'))
  # now a second one, reusing the same handle and connection
  buffer.seek(0)
  buffer.truncate()  # discard the first response before reusing the buffer
  c.setopt(c.URL, "https://10.1.1.2:6800/rseapi/api/v1/datasets/ADCD.*/list")
  c.perform()
  body = buffer.getvalue()
  print(body.decode('iso-8859-1'))
  print("==================")
except Exception as e:
  print("exception :",e  )
finally:
  c.close()

This gave the data in JSON format. The c.setopt(pycurl.VERBOSE, True) gave

C= <pycurl.Curl object at 0x55cc87355170>
*   Trying 10.1.1.2:6800...
* Connected to 10.1.1.2 (10.1.1.2) port 6800 (#0)
* found 1 certificates in /home/colinpaice/ssl/ssl2/doczosca.pem
* found 0 certificates in 
* GnuTLS ciphers: NORMAL:-ARCFOUR-128:-CTYPE-ALL:+CTYPE-X509:-VERS-SSL3.0
* ALPN, offering h2
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_ECDSA_AES_256_GCM_SHA384
*   server certificate verification OK
*   server certificate status verification SKIPPED
*   common name: 10.1.1.2 (matched)
*   server certificate expiration date OK
*   server certificate activation date OK
*   certificate public key: EC/ECDSA
*   certificate version: #3
*   subject: O=NISTECCTEST,OU=SSS,CN=10.1.1.2
*   start date: Sun, 02 Jul 2023 00:00:00 GMT
*   expire date: Tue, 02 Jul 2024 23:59:59 GMT
*   issuer: O=COLIN,OU=CA,CN=DocZosCA
* ALPN, server did not agree to a protocol
* Server auth using Basic with user 'colin'

Which may be useful when trying to debug TLS problems.
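Once the JSON body has been received, the data set names can be pulled out with the standard json module. This sketch assumes the body has the shape shown in the curl output earlier (an items list whose entries have name and migrated fields); dataset_names is a hypothetical helper name.

```python
import json


def dataset_names(body, include_migrated=True):
    """Return the data set names from a dataset list response body."""
    data = json.loads(body)
    return [
        item["name"]
        for item in data.get("items", [])
        if include_migrated or not item.get("migrated", False)
    ]


sample_body = '{"items": [{"name": "COLIN.ZLOGON.CLIST", "migrated": false}]}'
print(dataset_names(sample_body))
```

With the pycurl program, body.decode('iso-8859-1') could be passed in place of sample_body.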

CEE3501S The module libpython3.8.so was not found.

Running some Python programs on z/OS I got the above error when using Python 3.11.

It seems that when the C code was compiled, an option (which I cannot find documented) says make it downward compatible.

The fix is easy…

Mount the Python 3.11 file system r/w (this is a one-off)
cd /u/ibmuser/python/v3r11/lib or whatever library you are using
ln -s libpython3.11.so libpython3.8.so
This says… if you are looking for libpython3.8.so … go and use libpython3.11.so.

The command ls -ltr /u/ibmuser/python/v3r11/lib/libpython* gave

-rwxr-xr-x ... Jul 15 12:09 /u/ibmuser/python/v3r11/lib/libpython3.11.so                     
lrwxrwxrwx ... Sep 6 12:11  /u/ibmuser/python/v3r11/lib/libpython3.8.so -> libpython3.11.so
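The same symbolic link can be created from Python, which is handy if the fix is part of a setup script. This is a sketch with a hypothetical helper name, add_compat_link; the demonstration uses a temporary directory with an empty stand-in file rather than the real Python library directory.

```python
import os
import tempfile


def add_compat_link(lib_dir, actual, expected):
    """Create lib_dir/expected as a symbolic link to actual, if it is not already there."""
    link_path = os.path.join(lib_dir, expected)
    if not os.path.lexists(link_path):
        os.symlink(actual, link_path)     # relative link, like ln -s run inside lib_dir
    return link_path


# Demonstration in a temporary directory with an empty stand-in library file.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "libpython3.11.so"), "wb").close()
    link = add_compat_link(demo_dir, "libpython3.11.so", "libpython3.8.so")
    target = os.readlink(link)
    resolves = os.path.exists(link)
print(target, resolves)
```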

How to take (and process) a RACF GTF trace with Java

When trying to resolve a certificate problem in a Java program, see here, I tried unsuccessfully to take a RACF trace to see what calls were being issued, and what reason codes were being returned.

The RACF GTF had no entries for the Java program!

Start RACF trace

My started task was called OZUSVR4. I had to specify a jobname of OZUSVR4* on the RACF trace because Java spawns address spaces, and it was a spawned address space that did all of the Java work. If your started task name is 8 characters long, just specify the 8 character name.

The trace command was the RACF SET TRACE command, where # is my RACF subsystem recognition character.

#SET TRACE(CALLABLE(TYPE(41))JOBNAME(OZUSVR4*))

Where type(41) is for IRRSDL00 which performs the R_datalib, keyring processing.

Start GTF

S GTF.GTF
R 1,trace=usrp
R 2,USR=(F44) 
R 3,END
R 4,U

Run the test

I ran my started task, and stopped the RACF trace

#SET TRACE(CALLABLE(NONE)JOBNAME(OZUSVR4*)) 
#set list

The output of the #set list command included

TRACE OPTIONS                   - NOIMAGE                                    
                                - NOAPPC                                     
                                - NOSYSTEMSSL                                
                                - NORRSF                                     
                                - NORACROUTE                                 
                                - NOCALLABLE                                 
                                - NOPDCALLABLE                               
                                - NODATABASE                                 
                                - NOGENERICANCHOR                            
                                - NOASID                                     
                                - JOBNAME                                    
                                   OZUSVR4*                                  
                                - NOCLASS                                    
                                - NOUSERID                                   
SUBSYSTEM USERID                - START1

So the traces are off…. but it still has a reference to OZUSVR4 – strange.

Process the GTF file.

I used IPCS to look at the GTF file

=0 and specify the GTF file name
=6 dropd to drop any saved status from last time that dataset was used
gtf usr(all) It displays the output in an editor like window.
report view displays it in ISPF editor, view mod.
You can then do things like
- x all
- f 'RACF Reason code' all

to help find the records with non-zero return codes.

The output is very chatty – and it was hard to find the data I wanted from data with a hex dump of the string “OFFSET” etc. For example

Trace Identifier:             00000036                           
Record Eyecatcher:            RTRACE                             
Trace Type:                   OMVSPRE                            
Ending Sequence:              ........                           
Calling address:              00000000  79403A2D                 
Requestor/Subsystem:          ........  ........                 
Primary jobname:              OZUSVR44                           
Primary asid:                 00000035                           
Primary ACEEP:                00000000  008FC8A0                 
Home jobname:                 OZUSVR44                           
Home asid:                    00000035                           
Home ACEEP:                   00000000  008FC8A0                 
Task address:                 00000000  008CF298                 
Task ACEEP:                   00000000  00000000                 
Time:                         DDD4C11D  776E2A40                 
Error class:                  ........                           
Service number:               00000029                           
RACF Return code:             00000000                           
RACF Reason code:             00000000                           
Return area address:          00000000  00000000                 
Parameter count:              0000002B    
...                       
Area length:                  00000008                                                                                
                                                                                                                      
Area value:                                                                                                  
D6C6C6E2  C5E30050                               | OFFSET.&                         |  
                                                                                                                      
Area length:                  00000007                                                                                
                                                                                                                      
Area value:                                                                                                           
06E2E3C1  D9E3F1                                 | .START1                          |
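The area values are EBCDIC, and in the second one above the first byte (06) looks like a length. A minimal Python sketch of decoding such a value follows. The function name decode_area is hypothetical, and code page 037 is an assumption (it agrees with the other common EBCDIC code pages for letters and digits).

```python
def decode_area(hex_words: str) -> str:
    """Decode an 'Area value' hex dump whose first byte is a length.

    Assumption: the text is EBCDIC in code page 037.
    """
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    length = raw[0]                           # one-byte length prefix
    return raw[1:1 + length].decode("cp037")  # the text after the prefix


print(decode_area("06E2E3C1  D9E3F1"))        # START1
```

Values without a length prefix, such as the OFFSET string, can be decoded directly with bytes.fromhex(...).decode("cp037").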

I wrote a REXX exec which post-processes the output and removes what I think is irrelevant data.

An example of what I think is useful is below. Non-zero return codes have ! in column 1.

! Return code: 00000008 8 
! Reason code: 00000004 4  4 Parameter list error occurred. 
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
! Return code: 00000008 8 
! Reason code: 0000002C 44 44 No certificate found with the specified status 
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Area value: 
00000050  10AFC67C  ...
...
  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Area value:          | .START1                          | 
06E2E3C1  D9E3F1                                                
Area value:          | .MQRING                          | 
06D4D8D9  C9D5C7
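The filtering idea can be sketched in a few lines of Python. This is not the REXX exec, only an illustration of the same approach under the assumption that the report lines look like the formatted trace shown earlier. The function name flag_nonzero is hypothetical.

```python
def flag_nonzero(report_lines):
    """Return the RACF return/reason code lines, with '!' marking non-zero codes."""
    wanted = ("RACF Return code", "RACF Reason code")
    flagged = []
    for line in report_lines:
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label not in wanted:
            continue                      # drop everything else
        try:
            code = int(value, 16)         # the trace prints the codes in hex
        except ValueError:
            continue                      # for example a field shown as "........"
        marker = "!" if code else " "
        flagged.append(f"{marker} {label}: {value} {code}")
    return flagged
```

Each returned line carries the hex value followed by the decimal value, so 0000002C is shown with 44 next to it.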

You can download the rexx exec from

racfgtf Download

You need to upload it to a CLIST available to ISPF.

Solving certificate problems in Java on z/OS

I spent many an hour trying to understand why z/OSMF was getting a message saying certificate not found in keyring, when it was always there when I checked it.

I tried Java trace options but they did not help. I have my own Java program, and that gave me a message from IRRSDL00 (the callable service to access keyrings). But when I did a RACF GTF trace to see what was going on I got no entries in the trace. Weird. Once I solved the problems, the solution was obvious.

My Java program reported

java.io.IOException: The private key of NEWTECCTEST is not available or no authority to access the private key

z/OSMF reported

[ERROR ] CWPKI0024E: The NISTECCTEST certificate alias specified by the attribute serverKeyAlias is either not found in KeyStore safkeyring://START1/MQRING or it is invalid.

The problem and the solution

The message “The private key … is not available or no authority to access the private key” has a hint as to the problem. The documentation is hidden away. It was not as bad as

“It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

but it is not easy to find. It says

Applications can call the R_datalib callable service (IRRSDL00) to extract the private keys from certain certificates after they have access to the key ring. A private key is returned only when the following conditions are met:

For RACF real key rings:

User certificates An application can extract the private key from a user certificate if the following conditions are met:

The certificate is connected to the key ring with the PERSONAL usage option.

One of the following two conditions is true:

The caller’s user ID is the user ID associated with the certificate if the access to the key ring is through the checking on IRR.DIGTCERT.LISTRING in the FACILITY CLASS, or

The caller’s user ID has READ or UPDATE authority to the <ringOwner>.<ringName>.LST resource in the RDATALIB class. READ access enables retrieving one’s own private key, UPDATE access enables retrieving other’s.

I had a keyring START1.MQRING and the started task userid had READ access to it. Within the keyring was the certificate NISTECCTEST owned by userid START1. The started task userid needs UPDATE access to the keyring to be able to access the private key belonging to a different userid.

Reasons for “not found” reason code

Under the covers the callable service IRRSDL00 is called. The reason codes are documented here. You might get SAF return code 8, RACF return code 8, RACF reason code 44. Possible causes are:

The certificate was not in the keyring.
It was NOTRUST.
It had expired.
The CA for the certificate was not in the keyring.
The userid did not have UPDATE access to the keyring when there are private certificates from other userids.

Java and z/OS keyrings – unlocking the puzzle

I had been trying to follow some instructions on how to use a z/OS keyring in a Java application, but I was having lots of error messages and little success. Searching the internet did not help because most of the documentation on the internet does not tell you what version of Java they were using; but fortunately I managed to stumble on the magic combination of options which worked.

I ended up with one of my configuration files with information for Java 8 and Java 11, which I had to clean up.

There are several pieces of the puzzle, all of which have to be right.

Recent Java versions were better than old ones. Java V8 SR8 FP6 from 2023 was better than Java V8 SR6 FP19 from 2020
The java.security file; this contains the master set of security definitions for the system. Your application can override options within this file, or use a totally different file
Which jar file to use.
Which classes to use – specifying the high level qualifier
How you specify your keyring for example safkeyring://START1/MQRING, or safkeyringjce://START1/MQRING
Which Java start up overrides your Java program uses.

What worked for me

On Java 8 the following are typical default values

-Djava.protocol.handler.pkgs=com.ibm.crypto.provider
-Djavax.net.ssl.keyStoreType=JCERACFKS
-Djavax.net.ssl.keyStore=safkeyring://START1/MQRING

Because I was using Apache Tomcat web server, some of the definitions were taken from the server.xml file, and override the -Djavax.net.ssl.keyStoreType and -Djavax.net.ssl.keyStore parameters.

RSEAPI_KEYSTORE_FILE="safkeyring://START1/MQRING " 
RSEAPI_KEYSTORE_TYPE="JCERACFKS"

The Java security file.

The default file is /usr/lpp/java/J8.0/lib/security/java.security or /usr/lpp/java/J8.0_64/lib/security/java.security depending on which JVM you are using (31 bit or 64 bit).

You can override this using

-Djava.security.properties==/etc/zexpl/java.security to specify which file to use instead of the default file.
-Djava.security.properties=/etc/zexpl/java.security for entries in the specified file to override the entries in the default file.
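The difference between = and == can be modelled with a small Python sketch. It only illustrates the two behaviours described above and is not how the JVM is implemented. The function names and the sample property values in the usage note are hypothetical.

```python
PREFIX = "-Djava.security.properties="


def parse_option(option):
    """Split the option into (path, replace), where replace is True for '=='."""
    if not option.startswith(PREFIX):
        raise ValueError("not a java.security.properties option")
    rest = option[len(PREFIX):]
    if rest.startswith("="):          # '==' : use only the named file
        return rest[1:], True
    return rest, False                # '='  : named file overrides the default


def effective_properties(defaults, custom, replace):
    """Model which security properties end up in effect."""
    if replace:
        return dict(custom)
    return {**defaults, **custom}     # entries in custom win over defaults
```

With defaults {"a": "1", "b": "2"} and a custom file containing {"b": "9"}, the = form gives {"a": "1", "b": "9"} and the == form gives {"b": "9"}.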

The content of interest is like

security.provider.1=com.ibm.jsse2.IBMJSSEProvider2
security.provider.2=com.ibm.crypto.provider.IBMJCE
security.provider.3=com.ibm.crypto.plus.provider.IBMJCEPlus
security.provider.4=com.ibm.security.jgss.IBMJGSSProvider
security.provider.5=com.ibm.security.cert.IBMCertPath
security.provider.6=com.ibm.security.sasl.IBMSASL

I’ll explain what this means below.

Which classes – the high level qualifier

This is defined like -Djava.protocol.handler.pkgs=com.ibm.crypto.provider

This usage is explained below.

How Java loads things

My explanation below may be wrong – but it should give the concepts.

By some magic Java knows which classes are in which jars. (Java may look in every jar at startup).
My keyring is defined as safkeyring://START1/MQRING.
I have -Djava.protocol.handler.pkgs=com.ibm.crypto.provider
Java combines this information and looks through the classes for com.ibm.crypto.provider.safkeyring.Handler class.
It finds the class in /usr/lpp/java/J8.0/lib/ext/ibmjceprovider.jar

If there is more than one potential class, it appears to take the one from the jar file named in the security.provider.n list.
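The way the class name is built from the keyring URL and the handler package list can be sketched in Python. This shows only the naming rule (package, then URL scheme, then Handler). The function name handler_candidates is hypothetical, and the sketch does not search any jar files.

```python
def handler_candidates(url, handler_pkgs):
    """List the handler class names Java would look for, in order.

    handler_pkgs is the value of java.protocol.handler.pkgs, which may hold
    several package prefixes separated by '|'.
    """
    scheme = url.split("://", 1)[0]
    return [f"{pkg}.{scheme}.Handler" for pkg in handler_pkgs.split("|") if pkg]


print(handler_candidates("safkeyring://START1/MQRING", "com.ibm.crypto.provider"))
```

For the values above this prints a list holding the single name com.ibm.crypto.provider.safkeyring.Handler.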

When I used -Djava.protocol.handler.pkgs=com.ibm.crypto.hdwrCCA.provider, it searched for class com.ibm.crypto.hdwrCCA.provider.safkeyring.Handler which was in /usr/lpp/java/J8.0_64/lib/ext/ibmjcecca.jar .

With this setting, initially I got an ICSF security violation (which was good news as it showed I was using ICSF). Once I fixed the security problem I got a Java exception

org.apache.catalina.LifecycleException: Protocol handler initialization failed 
Caused by: java.lang.IllegalArgumentException: Invalid keystore format 
Caused by: java.io.IOException: Invalid keystore format

There is documentation, but I could not get round this problem.

If I stopped ICSF (p CSF) I got Java exceptions

java.security.NoSuchAlgorithmException: no such algorithm: EC for provider IBMJCECCA

What did not work for me

IBMJCEPlus

There was a reference which said in order to use TLS V1.3 you need to use IBMJCEPlus.

I overrode the java.security file by using

security.provider.1=com.ibm.crypto.plus.provider.IBMJCEPlus

but this gave a Java exception

java.security.ProviderException: Could not load dependent library

which looks like it could not load a load module (.so) from the file system.

Upgrading the Java fixpack worked for me

I upgraded the level of Java V8 from 2020 to a 2023 version, going from fix pack 16 to fix pack 25, and lots of problems went away.

Java 11

Java 11 has different providers for the keyring support.

security.provider.1=OpenJCEPlus
security.provider.2=IBMZSecurity
security.provider.3=SUN
security.provider.4=SunRsaSign
security.provider.5=SunEC
security.provider.6=SunJSSE
security.provider.7=SunJCE
security.provider.8=SunJGSS
security.provider.9=SunSASL
security.provider.10=XMLDSig
security.provider.11=SunPCSC

I did not try to use these, as the application I was using did not support Java 11.

There is some documentation on Java 11 and security.

Where are my omvs address spaces?

I was running a Java program in batch, and it started an OMVS address space to run the Java. When Java stopped, I could not find the OMVS output – because I was looking in the wrong place!

The program I was trying to run was RSEAPI. When this starts it creates other jobs with jobnames like RSEAPI6.

Display the jobs

In SDSF it had

JOBNAME  StepName ProcStep JobID    Owner  
RSEAPI6  STEP1             STC06719 STCRSE 
RSEAPI1  STEP1             STC06722 STCRSE 
RSEAPI   RSEAPI   RSEAPI   STC06728 STCRSE

The jobids of the RSEAPIn jobs are lower than the jobid of RSEAPI; this is because the address spaces were reused.

Shutdown or cancel

I cancelled RSEAPI6 and the other jobs stopped as well.

When I looked in the spool for RSEAPI* it showed only job RSEAPI.

Where are the other jobs?

There are system address spaces called BPXAS. If your program issues a spawn or fork, it runs the work in one of these address spaces. When the work request finishes, the address space stays running and becomes available for other work.

If you were hoping to find End of Step SMF statistics displayed (such as CPU and IO counts), these will be displayed when the BPXAS address space shuts down, and the figures are for all work which ran in that address space.

Purging the BPXAS job output

If you display the BPXAS jobs, it shows they are PROTected. This stops the casual end user from purging them. You have to add PROT to the command, for example $PS6723,PROT

Interacting with these address spaces

I tried to set CEE run time options to display the run time storage options, and to set heap size etc. I could not find how to do this.

LPA for Unix System Services .so modules

I discovered a shared library facility for loading OMVS modules (for example .so) into storage which can be shared by all OMVS address spaces.

This is an unusual blog, in that I’ve written about a topic – then say “do not use it!”

Phil Wakelyn of CICS strategy said….

The UNIX shared library region, this is not something we currently recommend – see this doc, and it is now disabled by default in CICS at the JVM level using the variable

_BPXK_DISABLE_SHLIB=YES

The reasons for this are that:

It has negligible difference on JVM startup time
It has a substantial negative impact on virtual storage below the bar, as storage is allocated from the high private area in MB chunks for each library (dll), so there is lots of wasted space and you can quickly use up 100MB or more. This has been a large customer support issue for CICS customers where space is very tight below the bar and MVS private storage SOS conditions are fatal.
Shared libraries are loaded once per address space, so are cached at the address space level

Where are modules stored?

They are stored in a common area below the 31 bit line – so taking storage from all regions.

If you issue the command

D OMVS,L

it gives

BPXO051I 10.32.34 DISPLAY OMVS 293                              
OMVS     0010 ACTIVE             OMVS=(00,01,BP,IZ,RZ,BB,ZW,PY) 
SYSTEM WIDE LIMITS:         LIMMSG=NONE                         
                  CURRENT  HIGHWATER     SYSTEM                 
                    USAGE      USAGE      LIMIT                 
...               
SHRLIBRGNSIZE    58720256   65011712   67108864   
SHRLIBMAXPAGES          0          0     409600 *

I made SHRLIBMAXPAGES larger than the default (4096) by using the operator command

SETOMVS SHRLIBMAXPAGES=409600

How do you load modules into the shared region?

You need to set the extended attribute +l for example

extattr +l /usr/lpp/java/J8.8_64/J8.0_64/bin/j9vm/libdbgwrapper80.so

When this module is loaded, it will be loaded into the shared region – if there is enough space.

Be careful how you reference the modules

You should use a consistent reference to files. For example

/usr/lpp/java/J8.8_64/libj9a2e.so is the same file as /Z24C/usr/lpp/java/J8.8_64/libj9a2e.so, but the shared library will treat them as two different objects, and load both of them. This will waste space.
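One way to check whether two path names refer to the same file before using them is sketched below in Python. It assumes the two names resolve to one file on disk (for example through a symbolic link). The function names are hypothetical.

```python
import os


def same_object(path_a, path_b):
    """True when the two path names refer to the same file on disk."""
    return os.path.samefile(path_a, path_b)


def canonical(path):
    """Resolve symbolic links so that every reference uses one spelling."""
    return os.path.realpath(path)
```

Using the canonical form everywhere (scripts, LIBPATH, configuration files) is one way to keep the references consistent.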

How do you unload a module from the shared region?

I reset the attribute using extattr -l. This did not unload it. When it was next loaded, it appeared to be unloaded from the shared region, and the file on disk was used.

How do you know what is in the shared region?

There is no IBM answer. There is a rexx exec OMVS command written by an IBMer

The syntax is

wjsigshl -p

This gives output

Usage  Meg Used-Unused-Pgs Pathname 
    1    1       8     248 647857ED xxx/libracfimp.so 
...
    1   25    6349      51 647857EC xxx/compressedrefs/libj9jit29.so 
...
 Total Storage (Meg)             60 
 Total Module (Pages)         10146 
 Total Unused (Pages)          5214 
 Total Module Count              28

I changed /Z24C/usr/lpp/java/J8.8_64/J8.0_64/lib/s390x to xxx so the output would fit within the area.

Each module gets storage in multiples of 1MB, so you waste space with small objects.
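The Meg, Used and Unused columns are consistent with rounding each module up to a whole number of megabytes. A small Python sketch of that arithmetic, assuming 4 KB pages (so 256 pages per MB); the function name region_cost is hypothetical:

```python
PAGE_BYTES = 4096                                 # assumed page size
PAGES_PER_MB = (1024 * 1024) // PAGE_BYTES        # 256 pages in 1 MB


def region_cost(used_pages):
    """Return (megabytes_allocated, unused_pages) for one module."""
    megabytes = -(-used_pages // PAGES_PER_MB)    # round up to a whole MB
    unused = megabytes * PAGES_PER_MB - used_pages
    return megabytes, unused


print(region_cost(8))       # (1, 248)  as shown for libracfimp.so
print(region_cost(6349))    # (25, 51)  as shown for libj9jit29.so
```

The totals agree as well: 10146 module pages plus 5214 unused pages is 15360 pages, which is 60 MB.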

The used pages does not tie up directly with the size of the object on disk. For example for libjgsk8iccs_64.so

using ls -ltr the size is 913408
from the shared region mapping information it is 379648

It may be that there is information in the disk version which is not needed once the module is loaded.

How much space do I need?

I set the +l attribute for all of the .so objects in Java V8 SR 8. When I ran my Java program (which uses TLS) there were 28 modules loaded, and 60 MB of data used.

How do I turn it off for a job?

For your job, you can use the environment variable

_BPXK_DISABLE_SHLIB=YES

The documentation says

System shared libraries are disabled. When loading a program with the system shared library extended attribute (st_sharelib), the attribute is ignored and the program is loaded into the caller’s private storage. The _BPXK_DISABLE_SHLIB setting is propagated on both fork and spawn from the parent to the child process.

End words

Now you know how it works – do not use it.
I asked on one of the z/OS news groups if anyone used this facility and unusually I got no replies. It looks like it is not used in the z/OS community.

Turbo start your Java program on z/OS and save a bucket of CPU

This blog post follows on from Some of the mysteries of Java shared classes and gives some CPU figures.

This should help you with any of the Java applications running on z/OS, such as z/OSMF, z/OS Connect, MQWEB, RSEAPI, and ZOWE.

I ran the scenarios on z/OS on zPDT running on my Ubuntu Linux machine, and so the figures are nothing like you may expect on a real z/OS machine – but my figures should show you the potential.

Topics covered:

Overview of Java shared classes
Measurements
Scenarios
Analysis of the results
- Observation
Setting up to use the shared classes
- Strange behaviour
Where do you harden the cache to?
What happens if I change my Java program?
What happens internally?
Should I use .class files or package the .class files into a .jar file?
Should I use BPXBATCH or BPXBATSL?
Problems I experienced while setting this up.

Overview of Java shared classes

With Java shared classes support, as a Java program starts it reads the jar and class files and also copies them into memory somewhere. Successive starts can use the in-memory copy and avoid the read from disk and the initial processing.

You can save the in-memory copy to disk, and restore this disk copy to memory, for example across IPLs.

Measurements

I measured the CPU used by the address space once the system was started.

The Java program provides a high level trace. I noted the time difference between the first message and the “I am up” message.

Scenarios

I used three scenarios

IPL and start Java program with no shared classes
Enable shared classes
IPL and restore the shared classes, and start the program

No shared classes

Scenario	CPU	Duration seconds
First run after IPL	394	172
Second run	425	183

Enable shared classes

I enabled shared classes by using the Java option

-Xshareclasses:verbose,name=rseapi,cachedir=/tmp/,groupAccess,nonpersistent,nonfatal,cacheDirPerm=0777

Scenario	CPU	Duration seconds
First run after shared classes enabled	500	200
Second run after shared classes enabled	292	116
Third run after shared classes enabled	251	81

IPL and restore snapshot

Scenario	CPU	Duration seconds
First run after IPL and restore snapshot	274	99
Second run	272	121
Third run	279	116
Fourth run	264	111

Analysis of the results

Using the shared classes saved CPU in the region of 25% and reduced the elapsed time by about a half.

The first run, which creates the shared class data, has a higher CPU cost and a longer elapsed time. The savings in CPU and elapsed time when the shared cache is reused outweigh this one-time cost.
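The savings can be recomputed from the tables above. The Python sketch below compares the mean of the two runs with no shared classes against the mean of the four runs after IPL and restore; the exact percentage depends on which runs are compared, so treat it as one way of reading the figures. The variable and function names are hypothetical.

```python
from statistics import mean

# Figures copied from the tables above (CPU, duration in seconds).
no_cache = {"cpu": [394, 425], "elapsed": [172, 183]}
restored = {"cpu": [274, 272, 279, 264], "elapsed": [99, 121, 116, 111]}


def saving(before, after):
    """Percentage reduction of the mean of 'after' relative to the mean of 'before'."""
    return 100 * (mean(before) - mean(after)) / mean(before)


print(f"CPU saving:     {saving(no_cache['cpu'], restored['cpu']):.1f}%")
print(f"Elapsed saving: {saving(no_cache['elapsed'], restored['elapsed']):.1f}%")
```

Comparing single runs instead of means (for example the third run after enabling shared classes against the first run after IPL) gives different values.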

Observation

It appears that each time you restart using shared classes the CPU drops. I think this is due to the optimisation being done on the classes, but it may be some totally different effect – or it may just be co-incidence!

Setting up to use the shared classes

I added two job steps to my Java program JCL

Before – restore the shared classes cache from the backup copy

// EXPORT SYMLIST=* 
// SET J='/usr/lpp/java/J8.8_64/J8.0_64/bin' 
// SET C='/tmp/' 
// SET N='rseapi' 
// SET V='restoreFromSnapshot'
// SET Q='cacheDirPerm=0777,groupAccess' 
//RESTORE  EXEC PGM=BPXBATCH,REGION=0M,PARMDD=PARMDD 
//PARMDD  DD *,SYMBOLS=(JCLONLY) 
SH &J/java -Xshareclasses:cacheDir=&C,name=&N,&V,&Q 
/*

If the in-memory cache already exists you get the message

JVMSHRC726E Non-persistent shared cache “rseapi” already exists. It cannot be restored from the snapshot.

After – save the shared class cache to disk

// SET V='snapshotCache' 
// SET J='/usr/lpp/java/J8.8_64/J8.0_64/bin' 
//SAVECAC  EXEC PGM=BPXBATCH,REGION=0M, 
//   PARM='SH &J/java -Xshareclasses:cacheDir=&C,name=&N,&V' 
//STDERR   DD   SYSOUT=* 
//STDOUT   DD   SYSOUT=*

Strange behaviour

By using the startup option -verbose:class,dynload you can get information about the classes as they are loaded.

When not using shared classes, there were records saying <Loaded ….. and giving durations of the loads etc.

When using shared classes there were still a few instances of <Loaded… . I could not find out why some classes were read from disk, and the rest were read from the shared classes cache.

If we could fix these, then the startup would be even faster!

After some investigation I can explain some of the strange behaviour.

When a jar is first used there is a <Loaded… for the class that requested the jar.
A class like sun/reflect/GeneratedMethodAccessor1, with a number at the end, gets a <Loaded… entry.
Some other classes in a jar file get loaded using <Loaded… though they do not look any different to classes which are loaded from the shared cache!

All in all, very strange.

Where do you harden the cache to?

By default the cache is saved to /tmp. As /tmp is often cleared at IPL, this means the cache will not exist across IPLs. You may wish to save it in an instance specific location such as /var/myprogram.

What happens if I change my Java program?

I had a small test program which I recompiled, and then recreated the jar file. The Java source was

public class hw   { 
  public static void main(String[] args) throws Exception { 
    System.out.println("This will be printed"); 
    System.out.println("HELLo" )  ; 
    CPUtil.print(); // this prints Util.line 10 
    hw2.print(); 
  } 
}

When I reran the program the output contained

JVMSHRC169I Change detected in /u/adcd/hw.jar... 
  ...marked 3 cached classes stale 
class load: sun/launcher/LauncherHelper$FXHelper from: .../lib/rt.jar 
<Loaded CPUtil> 
<  Class size 427; ROM size 416; debug size 0> 
<  Read time 4 usec; Load time 108 usec; Translate time 595 usec> 
class load: CPUtil from: file:/u/adcd/hw.jar 
Output from CPUtil.line 10 
<Loaded hw2> 
<  Class size 386; ROM size 368; debug size 0> 
<  Read time 3 usec; Load time 107 usec; Translate time 635 usec> 
class load: hw2 from: file:/u/adcd/hw.jar

You can see that the output from my program is intermixed with the loader activity.

What happens internally?

From the previous topic, it seems that Java has to read the files on disk, for example to spot that a class has changed. This may just be a matter of reading the time stamp of the file on disk, or it may go into the file itself.
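The time stamp idea can be sketched in Python as follows. This is only an illustration of the guess above, not a description of what the JVM really does. The function names recorded_stamp and is_stale are hypothetical.

```python
import os


def recorded_stamp(jar_path):
    """Time stamp to remember when classes from the jar are put in the cache."""
    return os.stat(jar_path).st_mtime_ns


def is_stale(jar_path, cached_stamp):
    """True when the jar on disk has changed since its classes were cached."""
    return os.stat(jar_path).st_mtime_ns != cached_stamp
```

A check like this needs only one stat call per jar, which would be much cheaper than reading and comparing the contents.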

Should I use .class files or package the .class files into a .jar file?

This will be a hand waving type answer. Generally the answer is use a .jar file.

Use one .jar file	Use multiple .class files
One directory access and one security access check should reduce the CPU usage.	Multiple directory access and multiple security checks are required.
Reading one large file may be faster than reading many smaller files. An I/O has “set-up I/O”, “transfer data” and “shutdown I/O” phases; with one file there is one set-up and one shutdown.	Each file I/O has set-up and shutdown time as well as the transfer time, so this is generally slower than processing bigger files. (Think about large block sizes for data sets).
The .jar files are compressed so there is less data to transfer. The decompression of the jar file takes CPU.	Files do not need to be decompressed
For integrity reasons you can have your .jar file cryptographically signed.	You cannot sign .class files.

Should I use BPXBATCH or BPXBATSL?

In the Tomcat script for starting the web server it issued

exec "/usr/lpp/java/J8.8_64/J8.0_64/bin/java" ...  &

The & makes it run in the background. As I was running this as a started task, this seemed unnecessary and I removed the &.

I also used EXEC PGM=BPXBATSL instead of EXEC PGM=BPXBATCH

The combination of both reduced the start time significantly!

I had to specify environment variable _BPX_SPAWN_SCRIPT=YES to be able to run the script. Without it I got

BPXM047I BPXBATCH FAILED BECAUSE SPAWN (BPX1SPN) OF … FAILED WITH RETURN CODE 00000082 REASON CODE 0B1B0C27

Problems I experienced while setting this up.

Group access

When restoring from a snapshot I used

java -Xshareclasses:cacheDir=/tmp,name=rseapi,restoreFromSnapshot,cacheDirPerm=0777,groupAccess

Which worked.

When I omitted groupAccess I had the following messages in stderr of my Java program.

JVMSHRC020E An error has occurred while opening semaphore 
JVMSHRC336E Port layer error code = -197358 
JVMSHRC337E Platform error message: semget : EDC5111I Permission denied. 
JVMSHRC028E Permission Denied 
JVMSHRC670I Error recovery: attempting to use shared cache in readonly mode if the shared memory region exists, in response to "-Xshareclasses:nonfatal" option.                                                                                                                      
JVMSHRC659E An error has occurred while opening shared memory 
JVMSHRC336E Port layer error code = -393966 
JVMSHRC337E Platform error message: shmget : EDC5111I Permission denied. 
JVMSHRC028E Permission Denied 
JVMSHRC627I Recreation of shared memory control file is not allowed when running in read-only mode. 
JVMSHRC840E Failed to start up the shared cache. 
JVMSHRC686I Failed to startup shared class cache. Continue without using it as -Xshareclasses:nonfatal is specified c

The OMVS command ipcs -m gave

>ipcs -m
IPC status as of Mon Aug 21 17:33:54 2023
Shared Memory:
T         ID     KEY        MODE       OWNER    GROUP
m       8196 0x6100c70e --rw-rw---- OMVSKERN     SYS1
m       8197 0x6100c30e --rw------- OMVSKERN STCGROUP

When the correct group access was specified the ipcs -m command gave

>ipcs -m
IPC status as of Mon Aug 21 17:38:40 2023                         
Shared Memory:                                                    
T         ID     KEY        MODE       OWNER    GROUP             
m       8196 0x6100c70e --rw-rw---- OMVSKERN     SYS1             
m      73733 0x6100c30e --rw-rw---- OMVSKERN STCGROUP

and the group part of the mode has the value rw-.
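Checking the MODE column by eye is easy to get wrong. A small Python sketch that picks the group permissions out of the mode string is below. It assumes the last nine characters of MODE are the owner, group and other permissions, as in the output above. The function name is hypothetical.

```python
def group_can_read_write(mode):
    """mode is the MODE column from ipcs -m, for example '--rw-rw----'."""
    perms = mode[-9:]        # owner, group and other, three characters each
    group = perms[3:6]
    return group[0] == "r" and group[1] == "w"


print(group_can_read_write("--rw-------"))   # False, groupAccess omitted
print(group_can_read_write("--rw-rw----"))   # True,  groupAccess specified
```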

Wrong owner

I submitted a job to run Java which created the shared cache. I then tried running the same program using a started task with a different userid.

The cache files on disk had the following access

-rw-rw----   1 COLIN    SYS1          32 Aug 25 11:05 C290M4F1A64_semaphore_zosmf_G41L00       
-rw-rw----   1 COLIN    SYS1          40 Aug 25 11:05 C290M4F1A64_memory_zosmf_G41L00

But my started task was running with a different userid and group.

I got messages

JVMSHRC684E An error has occurred while opening semaphore. Control file could not be locked.         
JVMSHRC336E Port layer error code = -102                                                             
JVMSHRC337E Platform error message: EDC5111I Permission denied. (errno2=0xEF076015)                  
JVMSHRC028E Permission Denied

I deleted the cache entries, and restarted the started task. I also added another step to the started task to issue snapshotCache.