Introduction to Java Shared classes on z/OS

Java shared classes provides the facility for Java to store classes, and JITted classes in shared memory which persists across the restart of a Java application. You can harden this shared memory to disk, and reload it after an IPL.

If shared classes are being used, then as the JVM starts it looks for the classes in the shared cache, and if not found, loads them from the file system, processes them and stores them in the shared classes memory. With this scheme, the second start of the JVM gains from the shared classes.

See the introduction documentation and some practical documentation.

Background

To support the use of the shared memory, Java uses standard facilities for shared memory, and for semaphores (to ensure single threading of updates).

I have my shared classes in directory /u/tmp/zowec. When the shared classes are used, two files are created.

  • javasharedresources/C290M17F1A64_semaphore_zoweGW_G43L00
  • javasharedresources/C290M17F1A64_memory_zoweGW_G43L00

Listing them gives

672 -rw-r--r--   1 ZWESVUSR ZWEADMIN ... C290M17F1A64_semaphore_zoweGW_G43L00  
673 -rw-r--r-- 1 ZWESVUSR ZWEADMIN ... C290M17F1A64_memory_zoweGW_G43L00

The files are owned by userid ZWESVUSR and group ZAWADMIN from my started task userid. The semaphore file has an inode of 672 (0x2a0), and the shared memory file has an inode of 673(0x2a1).

The shared memory and semaphores use InterProcess Communication (ipc) facilities.

The id in hex of the files is used as part of the ipc key.

The command ipcs gave

Shared Memory:                                                
T ID KEY MODE OWNER GROUP
m 204805 0x6102a131 --rw------- ZWESVUSR ZWEADMIN

Semaphores:
T ID KEY MODE OWNER GROUP
s 200708 0x8102a031 --ra------- ZWESVUSR ZWEADMIN

These resources are owned by userid:ZWESVUSR, and group:ZWEADMIN. The userid and group comes from the userid that created the shared classes cache.

If you need to “clean up” you can delete these using the ipcrm command.

Why can’t Java find my class?

I had specified a classpath to load a class loading program, but my program could not be found.

I specified the Java options -XshowSettings.

The value of java.class.path had just one .jar, and did not include my classpath!

I found in some Oracle documentation

When you use -jar, the specified JAR file is the source of all user classes, and other class path settings are ignored. If you’re using JAR files, then see jar.

Simple – once you know…

How to fix it

I added my class to the Jar

/usr/lpp/javaJ21.0_64/bin/jar uf /u/tmp/zowep/components/gateway/bin/gateway-service.jar -C /u/tmp/java CCLoader.class

My class was found – and it worked!

Finding the needle in a field of hay.

As part of a debugging a problem I collected some traces. One was over 22 million lines of code! How do you extract information from it?

My trace file was trace.txt and I was looking for entries with java/lang/Object in them.

How big is the field?

The command (wc – lower case l)

wc -l trace.txt

gave me

22662732 trace.txt

22.6 million lines of output

Which part of the field?

grep -n java/lang/Object trace.txt |less

The -n option numbers the source lines.

The grep command gave me output starting with

601428:13:39:41.475442621 0x000000002174c200 j9vm.351 Entry >loader 0x0 class java/lang/Object attemptDynamicClassLoad entry

The first number is the position in the file, 601428.

Searching for “exit” (/exit) gave me

904914:13:39:44.830209269 0x000000002174c200 j9vm.319

line number 904919 in the file.

Extract some of the records

I want to ignore the first 601428-1 lines – then take up to and including line 904914 is a line count of 303486.

tail -n +601428 trace.txt |head -n 303487 |less -N

The plus sign in tail -n + says -ignore the given number of lines.

This extracts the lines of interest and displays them with line numbers ( -N option). You can use >file1 to pipe it into a file, and then use less -N file.

I then iterated, now searching for j9shr – the Java Share Classes component, until I found the couple of interesting lines.

Extract data from a record.

You can extract field, or substrings using awk. For example piping data into

awk '{print substr($0,1,100)}'

Will display only the first 100 characters in a line.

awk '{print substr($0,7,6 )    substr($0,39,140)}'

Displays part of the time stamp – start column 7 for 6 characters, and from column 39 to 140.

You can also use cut to cut give specific fields, or field ranges, or column ranges. For example

cut --fields 1,3-5 -d " "
cut --bytes 1-8,60-100

Where -d is the delimiter.

Clever stuff

The Java trace formatting gives you the option to indent the descriptions, if function p calls function q, function q is indented by 2 additional characters.

I used this, and a bit of Python code to calculate how long each request was active for and display this on the “exit” line of the function. I also saved how many times a function was used, and total time in the function.

The results were interesting – what I thought would be “slow” functions, were actually not slow.

Using a cuckoo function

I was trying to track down a problem in Java where shared memory was not being accessed as it should. This was too early in the JVM startup for trace – so how to debug it?

One option I tried was using a cuckoo function. A cuckoo is well know for laying an egg in another bird’s nest; the nest owner then treats the cuckoo’s egg as its own. I inserted some code between Java and a C function, and Java called my code.

The Java code uses the C run time function shmget() — Get a shared memory segment. I wanted to write a stub function to display parameters, and then call the system version of shmget.

To do this I had to get the Java program to use my cuckoo version of shmget.

My program

void wto( char * msg) { 
// time_t ltime;
// time(&ltime);
//printf(" the time is %s", ctime(&ltime));
struct __cons_msg cmsg;
int rc;
int cmsg_cmd = 0;
/* fill in the __cons_msg structure */
cmsg.__format.__f1.__msg = msg;
cmsg.__format.__f1.__msg_length = strlen(msg);
rc = __console(&cmsg,NULL,&cmsg_cmd);
}
// this has the same signature as the run time code
cpshmget(
key_t key, size_t size, int shmflg)

{
char buffer[200];
int i, j;
j = sprintf(buffer, ">shmget key %8.8x sz %i flg 0x%8.8x\0", key,size,shmflg);
wto(buffer);
int myresult = shmget(key,size,shmflg);
//sprintf(buffer,"<shmget1 r %i e %i e2 0x%8.8x\0",myresult, errno, __errno2());
//wto(buffer);
//myresult = shmget(key,size,shmflg);
return myresult;
}

I compiled it using a shell script (I found it easier than a make file)

name2=cp 
p1=" -DNDEBUG -O3 -qarch=10 -qlanglvl=extc99 -q64"
p2="-Wc,DLL -D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS"
p2="-D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS"
p3="-D_OPEN_SYS_FILE_EXT -qstrict "
p4="-Wa,asa,goff -qgonumber -qenum=int"
p5="-I."
p6=""
p7="-Wc,ASM,SHOWINC,ASMLIB(//'SYS1.MACLIB') "
p8="-Wc,LIST(c.lst),SOURCE,NOWARN64,XREF -Wa,LIST,RENT"
/bin/xlc $p1 $p2 $p3 $p4 $p5 $p6 $p7 $p8 -c $name2.c -o $name2.o

It is used as an object .o code – not as an executable module.

Bind it to the Java code

I wanted to cuckoo the module libj9prt29.so .

I saved a copy of the original module libj9prt29.so as libj9prt29.so.orig. I could then include this into the binder, and not worry about any of my old code staying around.

When I needed “a clean run”, without any of my code, I copied libj9prt29.so.orig over libj9prt29.so .

The bind job

//IBMBIND JOB 1,MSGCLASS=H 
// SET BPARM='CASE=MIXED,SIZE=(900K,124K),XREF,RMODE=ANY,CALL=YES'
// SET CPARM='MAP,DYNAM=DLL,LIST=ALL,AMODE=64,TERM=YES'
//BIND EXEC PGM=IEWL,REGION=0M,PARM='&CPARM,&BPARM'
//SYSLIB DD DSN=CEE.SCEEBND2,DISP=SHR
// DD DSN=CBC.SCCNOBJ,DISP=SHR
// DD DSN=SYS1.CSSLIB,DISP=SHR
//Z8921 DD DSN=CEE.SCEELIB(CELQS003),DISP=SHR
//SYSLMOD DD PATH='/usr/lpp/java/J17.0_64/lib/default/',
// PATHDISP=(KEEP,KEEP)
//SYSPRINT DD SYSOUT=*
//SYSLIN DD *
ORDER CELQSTRT
ENTRY CELQSTRT
INCLUDE /u/tmp/java/cp.o
change shmget(cpshmget)
INCLUDE /usr/lpp/java/J17.0_64/lib/default/libj9prt29.so.orig
include Z8921
name libj9prt29.so
//

Notes:

  • You specify SYSLMOD as a directory and specify the “name” later. This was not obvious from the documentation
  • INCLUDE /u/tmp/java/cp.o copies in my code
  • change shmget(cpshmget) says any code which is included after this, cuckoo the entry point shmget to point to cpshmget
  • INCLUDE /usr/lpp/java/J17.0_64/lib/default/libj9prt29.so.orig include the original module
  • include Z8921 = CEE.SCEELIB(CELQS003) contains statements like IMPORT CODE64,CELQV003,’shmget’,280 which defines where shmget is really defined
  • name libj9prt29.so save it as this name in //SYSLMOD

This worked well – but…

I originally had

  wto(buffer); 
int myresult = shmget(key,size,shmflg);
sprintf(buffer,"<shmget1 r %i e %i e2 0x%8.8x\0",myresult, errno, __errno2());
wto(buffer);
//myresult = shmget(key,size,shmflg);

But the wto function, which invokes __console(), sets errno and __errno2. The values passed back to Java were the values from the __console function, not the shmget function.

Issuing the myresult = shmget(key,size,shmflg); a second time set the errno and __errno2 – but if the request was to create, or delete, issuing it a second time gave a different error code. For example if “delete” was specified – the first time worked, the second time gave “not found”, and this confused the Java program calling this function.

What’s going on inside my Java program?

You can use the IBM healthcenter to get information from what is running within Java.

I used Java startup option

-Xhealthcenter
-Dcom.ibm.diagnostics.healthcenter.headless=on
-Dcom.ibm.java.diagnostics.healthcenter.data.collection.level=headless
-Dcom.ibm.java.diagnostics.healthcenter.headless.files.max.size=50000000
-Dcom.ibm.java.diagnostics.healthcenter.headless.delay.start=1
-Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/u/tmp/zowe
-Dcom.ibm.diagnostics.healthcenter.readonly=on
-Dcom.ibm.diagnostics.healthcenter.logging.level=fine
-Dcom.ibm.diagnostics.healthcenter.jmx=off
-Dcom.ibm.java.diagnostics.healthcenter.agent.transport=jrmp
-Xtrace:buffers={2m,dynamic}

which are described here.

I used -Dcom.ibm.java.diagnostics.healthcenter.headless.delay.start=1 to wait for a minute before recording, so I didn’t get the Java startup information.

When I started the Java program I got out messages

com.ibm.diagnostics.healthcenter.headless INFO: Headless data collection has started 
com.ibm.diagnostics.healthcenter.headless INFO: Agent will keep last 5 hcd files
com.ibm.diagnostics.healthcenter.headless INFO: Headless collection output directory is /u/tmp/zowec

Shutdown the Java program.

I got a file healthcenter080225_160620_33620365_1.hcd in /u/tmp/zowec.

Download this in binary to your workstation.

Start Eclipse. Window-> Perspective-> Open Perspective-> Other-> Health center profiling

File-> Load Data… Select your downloaded file.

From the status window you can select different views of the data

Method profiling

This display information like

  • Under Tree%this says 18.49 of the time was spent in this class…. including other classes it called
  • Under Self% this is the amount of time spent within just this class. 9.86 of the total time was spent within this class. 15.75 of the total time was spent in just this class. It was recorded in 1617 sample
  • If you select a line, in another part of the display it shows you what called it, and what it called.

CPU

Shows the amount of CPU used over time

Classes

Shows time from start when classes were loaded, and if they were shared classes or not,

Environment

Give you the start option, and information about the JVM

Garbage collection

Gives you a profile and other information on garbage collection

IO

Gives you information about IO. My display said “Your application has opened 45 files and closed 39 files.” It did not tell me which files were opened – it just showed the 6 which were still open, so not very useful!

Using Elliptic Curves is better than RSA – but not as easy

I’ve spent a few days down some rabbit holes trying to solve a problem with an Elliptic Curve, and when I found the answer, I remember I had hit this problem before.

Using an Elliptic Curve generated with

RACDCERT ID( IZUSVR ) GENCERT + 
SUBJECTSDN(CN('CONN2') O('IBM') OU('C2PDFLT')) +
ALTNAME(IP(10.1.1.2))-
WITHLABEL('CONN2.IZUDFLT'), +
NISTECC +
SIZE(521) +

SIGNWITH(CERTAUTH LABEL('DOCZOSCAEC' )) NOTAFTER(DATE(2026/01/01))

I was getting in a Zowe trace

javax.net.ssl.SSLException: No supported CertificateVerify signature algorithm for EC key

This was with TLSv1.3.

In the TLS handshake from Chrome I could see

Supported Groups (5 groups)
Supported Group: Reserved (GREASE) (0x9a9a)
Supported Group: Unknown (0x11ec)
Supported Group: x25519 (0x001d)
Supported Group: secp256r1 (0x0017)
Supported Group: secp384r1 (0x0018)

I’m sure you spotted the problem immediately. The NISTECC SIZE(521) Elliptic Curve has a name of secp521r1, and this is not in the list. At the server, the logic checks that it’s certificate is supported by the groups sent from the client, and so reports

No supported CertificateVerify signature algorithm for EC key

I found Why is secp521r1 no longer supported in Chrome, others?. The article says that the keysize (256) is a strong as keysize(384) – so why bother with keysize(384).

I changed my certificate to keysize(256) and it worked.

What’s C.E.E.1 and which function is this address in?

I had an abend in a module created from C programs, and wanted to know which function had the problem.

In the old days, each function had an eye-catcher and a compile date at the top of each function, so it was easy to scroll upwards towards the start of the dump until you came across the eye-catcher.

And then we had XPLINK… I think the XP link stands for eXtra Performance. The output of the C compiler changed to make it the code more efficient, especially with Java functions. For example:

  • before XPLINK, a call to a function would save all of the registers, update the save area chain – call the module… return and restore the registers afterwards. Some functions were as simple as set this value to 0 – and the overhead of the call was many times the cost of the instructions. The overhead was reduced by only saving what needed to be saved, and passing more parameters in registers.
  • by moving constant data, such as character strings, out of the mainline instructions, meant that sometimes fewer pages were needed for instructions – and so fewer pages were needed in the hardware instruction cache, and so may be faster. Moving the function name and compile time into the “data page” and so into the data cache, was part of this.

The function name etc is available – just not in an obvious place.

What does the code look like?

Meta data is stored in Program Prolog Areas (PPAs)

In the listing, at the start of each XPLINK function is “C.E.E.1”

00000E70    00C300C5 00C500F1 00000080 000000A0a *.C.E.E.1.....

At offset 00000080 from 00000E70 is a block of storage (PPA1) which identifies this function.

0006D8  02           =AL1(2)            Version 
0006D9 CE =AL1(206) CEL signature
...
0006F0 0003 AL2(3),C'pit'
0006F8 FFFFF9C0 =F'-1600' Offset to Entry Point Marker
  • At offset 0 is 0x02… so you know you are in the right control block
  • At offset 2 is 0xCE
  • At offset 10 is the code length
  • At offset 18 is 2 bytes of length, followed by the function name, possibly with up to 3 bytes of padding to align the next field on a 4 byte work boundary.
  • (Sometimes,) after this, the offset will vary because the name length is variable, is a field like 0XFF… such as FFFFFA60. This is – 0x5a0 (-1440). The address of this section (the 0x02) minus this value gets you to the C.E.E.1. This does not always seem to work!

The start of the function “pit” is at x6d8 – 1600 = x6d8 -x640 = x98

At this address in the listing was

                                *  void pit() 
000098 00C300C5 DC =F'12779717' XPLink entrypoint marker
00009C 00C500F1 DC =F'12910833'
0000A0 00000640 DC =F'1600'
0000A4 00000080 DC =F'128'
0000A8 pit DS 0D
* Start of executable code
0000A8 9049 4780 STM r4,r9,1920(r4) .....

Note the value’s 1600 and -1600 tie up.

So not too difficult to find the name of the function.

Ripping the covers off Java and seeing what’s configured

I needed to see what definitions were being used by Java applications like z/OSMF and Zowe. I could not find an easy way, so I had to resort to taking a dump!

Use the Unix command ps -ef to display all of the Unix threads, and what they are running

For example

IBMUSER:/global/zosmf/configuration: >ps -ef  |grep zosmf                                                                         
OMVSKERN 50397205 83951638 - 18:36:28 ? 6:48 /usr/lpp/zosmf/liberty/lib/native/zos/s390x/bbgzsrv --clean zosmfServer
IZUSVR 50397230 50397231 - 18:36:18 ? 0:00 /usr/lpp/zosmf/liberty/lib/native/zos/s390x/bbgzangl COLD=N NAME=IZUANG1

Send a command to take a dump

kill -QUIT 50397205

This created a dump START1.JVM.IZUSVR0.D250206.T184027.X001 and a file /global/zosmf/data/logs/zosmfServer/javacore.20250206.184027.50397205.0002.txt

The .txt file is human readable, and may have all the data you need.

Extract information from the binary dump

Use the same Java as the application was using

  1. Copy the dump dataset (known as a TRANSACTION DUMP) to a Unix file, such as dump
    cp "//'START1.JVM.IZUSVR0.D250206.T184027.X001 '" dump
  2. extract a java dump from TRANSACTION DUMP using /usr/lpp/java/J11.0_64/bin/jextract dump -v, this creates a dump.zip
  3. Use the jdmpview to look at the dump.
  4. Run dump viewer /usr/lpp/java/J11.0_64/bin/jdmpview -J-Xmx768m -zip dump.zip
  5. Run info thread in jdmpview to display information about current thread java/native stack trace.

Why didn’t my Elliptic Curve certificate work all the time?

I had created a certificate on z/OS, signed it with a CA, and it seemed that I could use in in some scenarios, but not when Java was involved. Sometimes when Java was involved, the Java program just hung!

The short answer.

After several weeks on and off, I found a reference saying an EC curve needs a CA which is also an EC. Having a EC certificate with an RSA certificate in the chain can cause problems!

I changed my CA, (and distributed it to all my clients) and it worked!

I also had to increase the length of the key…

The medium length answer

The definition which worked

RACDCERT ID( IZUSVR ) GENCERT + 
SUBJECTSDN(CN('CONN2') O('IBM') OU('C2PDFLT')) +
ALTNAME(IP(10.1.1.2))-
WITHLABEL('CONN2.IZUDFLT'), +
NISTECC SIZE(521) +
SIGNWITH(CERTAUTH LABEL('DOCZOSCAEC' )) NOTAFTER(DATE(2026/01/01))
RACDCERT ALTER(LABEL('CONN2.IZUDFLT')) ID(IZUSVR) TRUST

With a CA

RACDCERT GENCERT  -                                             
CERTAUTH -
SUBJECTSDN(CN('DocZosCAECC')-
O('ECC') -
OU('CA')) -
NOTAFTER( DATE(2027-07-02 ))-
KEYUSAGE( CERTSIGN ) -
NISTECC SIZE(521) -
WITHLABEL('DOCZOSCAEC')

The long and interesting answer

My debug trail was along the path of taking a System SSL trace, formatting it, and looking at the return code – to find a blog post I had written about the same return code.

With the chrome browser you can give a url chrome://net-export, and chrome will trace the flow.

You

  • start the trace
  • run the workload
  • stop the trace and save the file
  • use “The log file can be loaded using the netlog_viewer.” Click on the link.
  • drop your file onto the tools.
  • pick events
  • I found the last line of output to my server. It was like
  • Tick the box at the front of the line.

This gave me

2222: SOCKET
https://10.1.1.2:7557 <https://10.1.1.2 same_site>
Start Time: 2025-02-04 08:49:00.638

t=10890 [st= 0] +SOCKET_ALIVE [dt=128]
--> source_dependency = 2213 (SSL_CONNECT_JOB)
t=10890 [st= 0] +TCP_CONNECT [dt=8]
--> address_list = [
"10.1.1.2:7557"
]
--> aliases = []
t=10890 [st= 0] TCP_CONNECT_ATTEMPT [dt=8]
--> address = "10.1.1.2:7557"
t=10898 [st= 8] -TCP_CONNECT
--> local_address = "10.1.0.2:51284"
--> remote_address = "10.1.1.2:7557"
t=10898 [st= 8] +SSL_CONNECT [dt=119]
t=10900 [st= 10] SSL_HANDSHAKE_MESSAGE_SENT
...
t=11016 [st=126] SSL_HANDSHAKE_MESSAGE_RECEIVED
--> bytes =
... 1]T
--> type = 15
t=11016 [st=126] SSL_CERTIFICATES_RECEIVED
--> certificates =
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

t=11017 [st=127] SOCKET_BYTES_SENT
--> byte_count = 30
t=11017 [st=127] SSL_ALERT_SENT
--> bytes =
02 2F ./
t=11017 [st=127] SSL_HANDSHAKE_ERROR
--> error_lib = 16
--> error_reason = 245
--> file = "../../third_party/boringssl/src/ssl/extensions.cc"
--> line = 451
--> net_error = -107 (ERR_SSL_PROTOCOL_ERROR)
--> ssl_error = 1

This lead me to Feature: Deprecate TLS SHA-1 server signatures. This says Chrome is removing support for signature algorithms using SHA-1 for server signatures during the TLS handshake. This does not affect SHA-1 support in server certificates, which was already removed, or in client certificates, which continues to be supported.

This didn’t help me, but a link from a link from one hit mentioned the CA also needs to be an Elliptic Curve.

Why can’t java use my key ring?

I had a problem with z/OSMF. I configured it to use an exiting keyring, but it consistently refused to use it. I had messages like

[WARNING ] CWPKI0809W: There is a failure loading the defaultKeyStore keystore. If an SSL configuration references the defaultKeyStore keystore, then the SSL configuration will fail to initialize.

This blog post covers how I debugged this situation.

What seemed strange was this only occurred when an Elliptic Curve certificate was being used – and not an RSA certificate.

Even more curiouser was the documentation mentioned access to the <ringOwner>.<ringName>.LST resource in the RDATALIB class. See here. I didn’t have this defined and yet RSA certificates would work! So curiouser and curiouser (or for the people who like correct grammar, curiouser and more curiouser).

All applications needing access to certificates and private keys use the R_datalib callable service.

The bottom line

  • z/OSMF has userid IZUSVR
  • I had a keyring and used two certificates
    • An RSA certificate, CCPKeyring.IZUDFLT, belonging to userid IZUSVR – based on the sample JCL provided by z/OSMF
    • An existing Elliptic Curve certificate NISTEC224 belonging to userid COLIN. This works else where.
  • Without <ringOwner>.<ringName>.LST defined the class(RDATALIB) the RSA certificate worked
  • Without <ringOwner>.<ringName>.LST defined the class(RDATALIB) the Elliptic Curve certificate failed
  • Once I found the problem I defined <ringOwner>.<ringName>.LST in class(RDATALIB), and gave the userid IZUSVR Update access to it – and the Elliptic curve worked
  • The reasons (being wise after the event)
    • R_datalib checks access on one profile in the RDATALIB class first – <ringowner>.<ringname>.LST. If there is none, it will fall back to check on two profiles in the FACILITY class – IRR.DIGTCERT.LISTRING and IRR.DIGTCERT.GENCERT. If the certificate is not owned by the accessing ID (except CERTAUTH or SITE), RDATALIB class has to be used for private key access.
    • This is true for the RSA certificate, used the IRRDIGTCERT.LISTRING class(FACILITY) and had access. So this worked.
    • For the Elliptic Curve, the caller’s userid (IZUSVR) is not the associated with the certificate (COLIN) so this fails, and the logic drops through to the RDATALIB checking.
    • The caller’s user ID has READ or UPDATE authority to the ..LST resource in the RDATALIB class. READ access enables retrieving one’s own private key, UPDATE access enables retrieving other’s. The ring did not exist, and so this access was not given.

How did I debug this? – Using Java trace

Adding configuration to z/OSMF

I copied /global/zosmf/configuration/local_override.cfg to /global/zosmf/configuration/local_override.colin

I edited/global/zosmf/configuration/local_override.cfg and changes the JVM options line to

JVM_OPTIONS=”-Xoptionsfile=’/global/zosmf/configuration/local_override.colin'”

I edited the local_override.colin, deleted all but the JVM options line, then split the line at \n so it looks like

-Dcom.ibm.ws.classloading.tcclLockWaitTimeMillis=300000
-Xscmx150M
-Xquickstart

Add debug information to the configuraton file

I added

-Djava.security.auth.debug=pkcs11keystore
-Dlog.level=Error

The output

[err] Jan 17, 2025 8:18:52 AM com.ibm.crypto.ibmjcehybrid.provider.HybridRACFKeyStore engineLoad 
TRACE: Loading keyring CCPKeyring.IZUDFLT as a JCECCARACFKS type keystore.
...
[err] Jan 17, 2025 8:19:02 AM com.ibm.crypto.hdwrCCA.provider.RACFInputStream getEntry
FINER: The private key of NISTEC224 is not available or no authority to access the private key
[err] Jan 17, 2025 8:19:02 AM com.ibm.crypto.ibmjcehybrid.provider.HybridRACFKeyStore engineLoad
TRACE: Error loading and storing certificates and key material from underlying JCECCARACFKS keyring CCPKeyring.IZUDFLT
java.io.IOException: The private key of NISTEC224 is not available or no authority to access the private key . This can be expected if the IBMJCECCA is not setup correctly or
ICSF is down. Will now attempt to load the keyring as a JCERACFKS keyring.

Which is not a very helpful message.

How did I debug this? – Using RACF trace

R_datalib is the callable service to ALL the exploiters which need access to a RACF keyring (certificates and private keys). It is r_datalib or its alias irrsdl00 with callable type number 41.

Enable the RACF trace

#SET TRACE(CALLABLE(TYPE(41))JOBNAME(IZU*))

Start GTF

S GTF.GTF,M=GTFRACF

This reported

IEF403I GTF - STARTED - TIME=08.17.03                                  
IEF188I PROBLEM PROGRAM ATTRIBUTES ASSIGNED
AHL121I TRACE OPTION INPUT INDICATED FROM MEMBER GTFRACF OF PDS
USER.Z24C.PROCLIB
TRACE=USRP
USR=(F44)
END
AHL103I TRACE OPTIONS SELECTED --USR=(F44)
AHL906I THE OUTPUT BLOCK SIZE OF 27998 WILL BE USED FOR OUTPUT 702
DATA SETS:
SYS1.TRACE

I started z/OSMF until it failed.

Stop GTF

p GTF 
AHL006I GTF ACKNOWLEDGES STOP COMMAND
AHL904I THE FOLLOWING TRACE DATASETS CONTAIN TRACE DATA :
SYS1.TRACE

Use IPCS to look at the dump, using command GTF USR(ALL). Go to the bottom of the output, use the command report view. This gives an ISPF edit session.

  • x all
  • f ‘RACF Reason code:’ all
    • You are interested in the non zero codes. “Label” each line of interest using the line prefix command .a, .b etc.
  • reset
  • loc .a
    • This will position you by the labelled line. Look up the RACF return and reason codes here. I had Reason Code 2c, which is decimal 44. Look for the keyring, or other information. I do not know which data tells you which sub operation r_datalib was doing, but for me it had the keyring name “CCPKeyring.IZUDFLT “. The description in the reason code documentation does not cover the situation of not having update access to the keyring, so I’ve raised a doc comment on it.