z/OSMF autostart: how to stop it, and how to use it (or not)

I upgraded my z/OS from ADCD Z24A to ADCD Z24C. This has updates to lots of the software, including z/OSMF. This includes some performance fixes, so z/OSMF start up is much quicker and uses much less CPU. However the newer level of ADCD Z24C now starts z/OSMF automatically. It took a few attempts to stop this.

When z/OS starts, it takes configuration parameters from IEASYSxx. You can see which IEASYSxx you are using with the DISPLAY IPLINFO operator command. You can see which IZU parameter you are using with

d iplinfo,izu
IEE255I SYSTEM PARAMETER 'IZU': AS

With the DISPLAY PARMLIB command, you get the parmlib concatenation

D PARMLIB
IEE251I 08.34.02 PARMLIB DISPLAY
PARMLIB DATA SETS SPECIFIED AT IPL
ENTRY FLAGS VOLUME DATA SET
    1   S   C4CFG1 USER.Z24C.PARMLIB
    2   S   C4CFG1 FEU.Z24C.PARMLIB
    3   S   C4SYS1 ADCD.Z24C.PARMLIB
    4   S   C4RES1 SYS1.PARMLIB

Where 'S' means it came from a LOADxx parameter, and 'D' would mean the default SYS1.PARMLIB.

Look in each data set in turn for the IZUPRMxx member (xx=AS in my case).

Contents of the IZUPRMxx member

Within the member is SERVER_PROC('IZUSVR1'). This tells the IPL code which server to start.

Within the member is a line with AUTOSTART(…). The value can be

  • CONNECT – I think of this as AUTOSTART(NO)
  • LOCAL – I think of this as AUTOSTART(MAYBE)

See here for a discussion.

It is a bit more complex than YES|NO. It has the capability to allow just one of a group of z/OSMF servers to start.

If you have AUTOSTART(CONNECT), specify AUTOSTART_GROUP(NONE).

If you have AUTOSTART(LOCAL) and AUTOSTART_GROUP(COLIN) for more than one z/OSMF server, then at IPL it checks to see whether a z/OSMF server with AUTOSTART(LOCAL) and AUTOSTART_GROUP(COLIN) is already active. If so, this instance does not start.

The documentation says it checks by having an ENQ on the file system with the AUTOSTART_GROUP value. This implies you need the z/OSMF data directories to be on the same ZFS file system.

Should I use autostart?

This is a tough question. I cannot test it because I only have one LPAR, but I have some thoughts.

Single LPAR, single z/OSMF instance

This is relatively easy. You can start z/OSMF automatically through commands at IPL, you can use the z/OSMF IZUPRMxx method, or you can start it manually.

Multiple LPARs in a sysplex, single z/OSMF instance.

If you have a shared file system, you can start the z/OSMF instance on any LPAR. If you start the instance more than once, it detects this and will only allow one instance to be active.

You have to plan to be able to start an instance on different systems. For example the IP address and port for the base system will be different. You'll need to set up a TCP/IP environment to support this. See HA Liberty web server – introduction to using VIPA to provide high availability connectivity and the z/OSMF documentation.

Multiple LPARs in a sysplex, multiple z/OSMF instances.

This is where the autostart may be useful. The first LPAR to be started will start the z/OSMF instance. When other LPARs start, they detect that another z/OSMF in the group is active, and will not start the z/OSMF instance. As with starting a single z/OSMF instance in a multi LPAR environment, you need to plan the connectivity. See HA Liberty web server – introduction to using VIPA to provide high availability connectivity and the z/OSMF documentation.

I struggle to see why starting just one instance is useful. For availability I would want more than one instance to be running at the same time. With only one instance, if you stop it and restart it on a different LPAR, you have a period of a minute or more where you do not have z/OSMF running.

I would prefer a group token, so each instance could register that the "group name" is active. An application could then ask to be notified when a member of the group becomes active, using standard z/OS services.

Stateless z/OSMF instances

If you are using z/OSMF facilities which save state, the autostart of just one server will not work. For example, if you are using any workflow facilities, state is saved in the file system. You need to log on to the same instance to be able to continue working on the workflow. If today you run on LPARA's z/OSMF and tomorrow you run on LPARB's z/OSMF, you cannot continue your workflow.

You need to plan your z/OSMF usage: have "stateless" z/OSMF servers which can use AUTOSTART, and workflow servers – for which you have only one instance (which can be moved around) – which do not use autostart.

One Minute MVS – tuning stack and heap pools

These days many applications use a stack and heap to manage storage used by an application. For C and COBOL programs on z/OS these use the C run time facilities. As Java uses the C run time facilities, it also uses the stack and heap.

If the stack and heap are not configured appropriately it can lead to an increase in CPU usage. Before 64 bit storage you had to carefully manage the stack and heap pool sizes so you did not run out of storage; with 64 bit storage that tuning is no longer critical, and the focus is now on CPU.

The five second summary of what to check: the number of segments freed for the stack and heap should be zero. If the value is large, a lot of CPU is being used to manage the storage.

The topics below cover some background on the stack and the heap, how to get and interpret the statistics, and what to check and set.

Kindergarten background to the stack

When a C (main) program starts, it needs storage for the variables used in the program. For example

int ii;
for (ii=0;ii<3;ii++)
{}

char * p = malloc(1024);

The variables ii and p are variables within the function, and will be on the function's stack. p is a pointer.

The block of storage from the malloc(1024) will be obtained from the heap, and its address stored in p.

When the main program calls a function, the function needs storage for the variables it uses. This can be done in several ways:

  1. Each function uses a z/OS GETMAIN request on entry, to allocate storage, and a z/OS FREEMAIN request on exit. These storage requests are expensive.
  2. The main program has a block of storage which functions can use. For example, the main program uses bytes 0 to 1500 of this block; the first function needs 500 bytes, so it uses bytes 1501 to 2000. If this function calls another function, the lower level function uses storage from 2001 onwards. This is what usually happens; it is very efficient, and is known as a "stack".
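As a rough sketch of that idea (Python used purely for illustration; the names are invented, this is not the actual run time code):

# "Bump the pointer" stack allocation: each function call takes the next slice
# of one pre-allocated block, and returning just moves the pointer back.
class Stack:
    def __init__(self, size):
        self.block = bytearray(size)   # one block obtained once, e.g. at thread start
        self.next_free = 0             # offset of the next free byte

    def enter_function(self, bytes_needed):
        # reserve the function's local storage - a pointer bump, not a GETMAIN
        start = self.next_free
        if start + bytes_needed > len(self.block):
            raise MemoryError("segment full - a real run time would allocate another segment")
        self.next_free += bytes_needed
        return start

    def leave_function(self, start):
        # give the storage back by moving the pointer down again
        self.next_free = start

stack = Stack(1024 * 1024)                # a 1MB stack segment
main_frame = stack.enter_function(1500)   # main program's variables
func_frame = stack.enter_function(500)    # first function's variables
stack.leave_function(func_frame)          # cheap to release on return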

Intermediate level for stack

It starts to get interesting when the initial block of storage allocated in the main program is not big enough.

There are several approaches to take when this occurs

  1. Each function does a storage GETMAIN on entry, and FREEMAIN on exit. This is expensive.
  2. Allocate another big block of storage, so successive functions now use this block, just like in the kindergarten case. When functions return to the one that caused a new block to be allocated,
    1. this new block is freed. This is not as expensive as the previous case.
    2. this block is retained, and stored for future requests. This is the cheapest case. However a large block has been allocated, and may never be used again.

How big a block should it allocate?

When using a stack, the size of the block to allocate is the larger of the user specified size, and the size required for the function. If the specified secondary size was 16KB, and a function needs 20KB of storage, then it will allocate at least 20KB.

How do I get the statistics?

For your C programs you can specify options in the #pragma statement or, the easier way, specify them through JCL. You specify C run time options through a //CEEOPTS DD statement. For example

//CEEOPTS DD *
STACK(2K,12K,ANYWHERE,FREE,2K,2K)
RPTSTG(ON)

Where

  • STACK(…) is the size of the stack
  • RPTSTG(ON) says collect and display statistics.

There is a small overhead in collecting the data.

The output is like:

STACK statistics:                                                
  Initial size:                                2048     
  Increment size:                             12288     
  Maximum used by all concurrent threads:  16218808     
  Largest used by any thread:              16218808     
  Number of segments allocated:                2004     
  Number of segments freed:                    2002     

Interpreting the stack statistics

From the above data

  • This shows the initial stack size was 2KB and an increment of 12KB.
  • The stack was extended 2004 times.
  • Because the statement had STACK(2K,12K,ANYWHERE,FREE,2K,2K), when the secondary extension became free it was FREEMAINed back to z/OS.

When KEEP was used instead of FREE, the storage was not returned back to z/OS.

The statistics looked like

STACK statistics:                                                
  Initial size:                               2048     
  Increment size:                            12288     
  Maximum used by all concurrent thread:  16218808     
  Largest used by any thread:             16218808     
  Number of segments allocated:               1003     
  Number of segments freed:                      0     

What to check for and what to set

For most systems, the key setting is KEEP, so that freed blocks are not released. You can see this (a) from the definition and (b) because "Number of segments freed" is 0.

If a request to allocate a new segment fails, then the C run time can try releasing segments that are not in use. If this happens the "segments freed" count will be incremented.

Check that the “segments freed” is zero, and if not, investigate why not.

When a program is running for a long time, a small number of “segments allocated” is not a problem.

Making the initial size larger, closer to the "Largest used by any thread" value, may improve the storage utilisation. With smaller segments there is likely to be unused space which was too small for a function's request, causing the next segment to be used. So a better definition would be

STACK(16M,12K,ANYWHERE,KEEP,2K,2K)

Which gave

STACK statistics:                                                          
  Initial size:                                     16777216               
  Increment size:                                      12288               
  Maximum used by all concurrent threads:           16193752               
  Largest used by any thread:                       16193752               
  Number of segments allocated:                            1               
  Number of segments freed:                                0               

Which shows that just one segment was allocated.


Kindergarten background to the heap

When there is a malloc() request in C, or a new … in Java, the storage may need to exist outside of the function that allocated it. This storage is obtained from the heap.

The heap has blocks of storage which can be reused. The blocks may all be of the same size, or of different sizes. It uses CPU time to scan free blocks looking for the best one to reuse. With more blocks it can use increasing amounts of CPU.

There are heap pools, which avoid the cost of searching for the "right" block. They use pools of fixed size blocks (cells). For example:

  1. there is a heap pool with 1KB fixed size blocks
  2. there is another heap pool with 16KB blocks
  3. there is another heap pool with 256 KB blocks.

If there is a malloc request for 600 bytes, a block will be taken from the 1KB heap pool.

If there is a malloc request for 32KB, a block would be used from the 256KB pool.

If there is a malloc request for 512KB, it will issue a GETMAIN request.
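A sketch of that selection logic (for illustration only; this is not the actual C run time code), using the three example pools above:

POOL_CELL_SIZES = [1024, 16 * 1024, 256 * 1024]   # the 1KB, 16KB and 256KB pools above

def allocate(size):
    # pick the smallest pool whose cell size is big enough; otherwise fall back to GETMAIN
    for cell_size in POOL_CELL_SIZES:
        if size <= cell_size:
            return "use a cell from the %dKB pool" % (cell_size // 1024)
    return "bigger than the largest cell - issue a GETMAIN"

print(allocate(600))          # 1KB pool
print(allocate(32 * 1024))    # 256KB pool (a 16KB cell is too small)
print(allocate(512 * 1024))   # GETMAIN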

Intermediate level for heap

If there is a request for a block of heap storage, and there is no free storage, a large segment of storage can be obtained, and divided up into blocks for the pool. If the heap has 1KB blocks, and a request for another block fails, it may issue a GETMAIN request for 100 * 1KB and then add 100 blocks of 1KB to the heap. As storage is freed, the blocks are added to the free list in the heap pool.

There is the same logic as for the stack, about returning storage.

  1. If KEEP is specified, then any storage that is released stays in the heap pool. This is the cheapest solution.
  2. If FREE is specified, then when all the blocks in an additional segment have been freed, the segment is freed back to z/OS. This is more expensive than KEEP, as you may get frequent GETMAIN and FREEMAIN requests.

How many heap pools do I need and of what size blocks?

There is usually a range of block sizes used in a heap. The C run time supports up to 12 cell sizes. Using a Liberty Web server, there was a range of storage requests, from under 8 bytes to 64KB.

With most requests there will frequently be space wasted. If you want a block which is 16 bytes long, but the pool with the smallest block size is 1KB – most of the storage is wasted.
The C run time gives you suggestions on the configuration of the heap pools, the initial size of the pool and the size of the blocks in the pool.

Defining a heap pool

How to define a heap pool is described here.

You specify the overall size of storage in the heap using the HEAP statement. For example, for a 16MB total heap size:

HEAP(16M,32768,ANYWHERE,FREE,8192,4096)

You then specify the pool sizes


HEAPPOOLS(ON,32,1,64,2,128,4,256,1,1024,7,4096,1,0)

The pairs of figures after ON give, for each pool, the size of the blocks (cells) in the pool and the percentage of the heap to allocate to that pool.

  • 32,1 says maximum size of blocks in the pool is 32 bytes, allocate 1% of the heap size to this pool
  • 64,2 says maximum size of blocks in the pool is 64 bytes, allocate 2% of the heap size to this pool
  • 128,4 says maximum size of blocks in the pool is 128 bytes, allocate 4% of the heap size to this pool
  • 256,1 says maximum size of blocks in the pool is 256 bytes, allocate 1% of the heap size to this pool
  • 1024,7 says maximum size of blocks in the pool is 1024 bytes, allocate 7% of the heap size to this pool
  • 4096,1 says maximum size of blocks in the pool is 4096 bytes, allocate 1% of the heap size to this pool
  • 0 says end of definition.

Note, the percentages do not have to add up to 100%.

For example, with the CEEOPTS

HEAP(16M,32768,ANYWHERE,FREE,8192,4096)
HEAPPOOLS(ON,32,50,64,1,128,1,256,1,1024,7,4096,1,0)

After running my application, the data in //SYSOUT is


HEAPPOOLS Summary:                                                         
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In 
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use      
  ------------------------------------------------------------------------ 
       32        40    50      209715           0           0           0 
       64        72      1        2330           1        1002           2 
      128       136      1        1233           0           0           0 
      256       264      1         635           0           0           0 
     1024      1032      7        1137           1           2           0 
     4096      4104      1          40           1           1           1 
  ------------------------------------------------------------------------ 

For the cell size of 32, 50% of the heap was allocated to it.

Each cell has a header, and the total element size for the 32 byte cell is 40 bytes. The number of 40 byte units in 50% of 16MB is 8MB/40 = 209715, so these figures match up.
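A quick check of that arithmetic (the 8 byte header is inferred from the 32 byte cell size and 40 byte element size in the report):

heap_size = 16 * 1024 * 1024                     # HEAP(16M,...)
percent = 50                                     # the 32,50 pair in the HEAPPOOLS statement
element = 40                                     # 32 byte cell + 8 byte header
print((heap_size * percent // 100) // element)   # 209715 - the Cells Per Extent figure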

(Note with 64 bit heap pools, you just specify the absolute number you want – not a percentage of anything).

Within the program there was a loop doing malloc(50). This used the cell pool with size 64 bytes; 1002 blocks (cells) were used.

The output also has

Suggested Percentages for current Cell Sizes:
HEAPP(ON,32,1,64,1,128,1,256,1,1024,1,4096,1,0)


Suggested Cell Sizes:
HEAPP(ON,56,,280,,848,,2080,,4096,,0)

I found this confusing and not well documented. It is another of those topics that, once you understand it, makes sense.

Suggested Percentages for current Cell Sizes

The first "suggested…" values are the suggestions for the sizes of the pools if you do not change the size of the cells.

I had specified 50% for the 32 byte cell pool. As this cell pool was not used (0 allocated cells), it suggests making this 1%, so the suggestion starts HEAPP(ON,32,1,…

You could cut and paste this into your //CEEOPTS statement.

Suggested Cell Sizes

The C run time has a profile of all the sizes of blocks used, and has suggested some better cell sizes. For example, as I had no requests for storage of less than 32 bytes, making the smallest cell bigger makes sense. For optimum storage usage, it suggests using sizes of 56, 280, 848, 2080 and 4096 bytes.

Note it does not give a suggested number of blocks. I think this is poor design; because it knows the profile, it could have attempted to suggest the percentages as well.

If you want to try this definition, you need to add some values such as

HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

Then rerun your program, and see what percentage figures it recommends, update the figures, and test again. Not the easiest way of working.

What to check for and what to set

There can be two sets of heap pools: one for 64 bit storage (HEAPPOOLS64), the other for 31 bit storage (HEAPPOOLS).

The default configuration should be "KEEP", so any storage obtained is kept and not freed. This saves the cost of expensive GETMAINs and FREEMAINs.

If the address space is constrained for storage, the C run time can go round each heap pool and free up segments which are no longer in use.

The value “Number of segments freed” for each heap should be 0. If not, find out why (has the pool been specified incorrectly, or was there a storage shortage).

You can specify how big each pool is

  • for HEAPPOOLS you specify the HEAP size, and the percentage to be allocated to each pool – so two numbers to change
  • for HEAPPOOLS64 you specify the size of each pool directly.

The sizes you specify are not that sensitive, as the pools will grow to meet the demand. Allocating one large block is cheaper than allocating 50 smaller blocks – but for a long running server, this difference can be ignored.

With a 4MB heap specified

HEAP(4M,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

the heap report was

 HEAPPOOLS Summary: 
   Specified Element   Extent   Cells Per  Extents    Maximum      Cells In 
   Cell Size Size      Percent  Extent     Allocated  Cells Used   Use 
   ------------------------------------------------------------------------ 
        56        64      1         655           2        1002           2 
       280       288      1         145           1           1           0 
       848       856      1          48           1           1           0 
      2080      2088      1          20           1           1           1 
      4096      4104      1          10           0           0           0 
   ------------------------------------------------------------------------ 
   Suggested Percentages for current Cell Sizes: 
     HEAPP(ON,56,2,280,1,848,1,2080,1,4096,1,0) 

With a small (16KB) heap specified

HEAP(16K,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

The output was

HEAPPOOLS Summary:                                                            
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In    
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use         
  ------------------------------------------------------------------------    
       56        64      1           4         251        1002           2    
      280       288      1           4           1           1           0    
      848       856      1           4           1           1           0    
     2080      2088      1           4           1           1           1    
     4096      4104      1           4           0           0           0    
  ------------------------------------------------------------------------    
  Suggested Percentages for current Cell Sizes:                               
    HEAPP(ON,56,90,280,2,848,6,2080,13,4096,1,0)                             

and we can see it had to allocate 251 extents for all the requests.

Once the system has “warmed up” there should not be a major difference in performance. I would allocate the heap to be big enough to start with, and avoid extensions.

With the C run time there are heaps as well as heap pools. My C run time report gave

64bit User HEAP statistics:
31bit User HEAP statistics:
24bit User HEAP statistics:
64bit Library HEAP statistics:
31bit Library HEAP statistics:
24bit Library HEAP statistics:
64bit I/O HEAP statistics:
31bit I/O HEAP statistics:
24bit I/O HEAP statistics:

You should check all of these and make the initial size the same as the suggested size. This way the storage will be allocated at startup, and you avoid a request to expand the heap failing due to lack of storage during a busy period.

Advanced level for heap

While the above discussion is suitable for many workloads, especially if they are single threaded, it can get more complex when there are multiple threads using the heap pools.

If you have a “hot” or highly active pool you can get contention when obtaining and releasing blocks from the heap pool. You can define multiple pools for an element size. For example

HEAPP(ON,(56,4),1,280,1,848,1,2080,1,4096,1,0)

The (56,4) says make 4 pools with block size of 56 bytes.

The output has

HEAPPOOLS Summary:                                                          
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In  
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use       
  ------------------------------------------------------------------------  
       56       64     1           4         251        1002           2  
       56       64     1           4           0           0           0  
       56       64     1           4           0           0           0  
       56       64     1           4           0           0           0  
      280       288      1           4           1           1           0  
      848       856      1           4           1           1           0  
     2080      2088      1           4           1           1           1  
     4096      4104      1           4           0           0           0  
  ------------------------------------------------------------------------  

We can see there are now 4 pools with a cell size of 56 bytes. The documentation says "Multiple pools are allocated with the same cell size and a portion of the threads are assigned to allocate cells out of each of the pools."

If you have 16 threads you might expect 4 threads to be allocated to each pool.

How do you know if you have a "hot" pool?

You cannot tell from the summary, as you just get the maximum cells used.

In the report there is a count of requests for different storage ranges.

Pool  2     size:   160 Get Requests:           777707 
  Successful Get Heap requests:    81-   88                 77934 
  Successful Get Heap requests:    89-   96                 59912 
  Successful Get Heap requests:    97-  104                 47233 
  Successful Get Heap requests:   105-  112                 60263 
  Successful Get Heap requests:   113-  120                 80064 
  Successful Get Heap requests:   121-  128                302815 
  Successful Get Heap requests:   129-  136                 59762 
  Successful Get Heap requests:   137-  144                 43744 
  Successful Get Heap requests:   145-  152                 17307 
  Successful Get Heap requests:   153-  160                 28673
Pool  3     size:   288 Get Requests:            65642  

I used ISPF edit to process the report.

By extracting the records containing "size:" you get the count of requests per pool.

Pool  1     size:    80 Get Requests:           462187 
Pool  2     size:   160 Get Requests:           777707 
Pool  3     size:   288 Get Requests:            65642 
Pool  4     size:   792 Get Requests:            18293 
Pool  5     size:  1520 Get Requests:            23861 
Pool  6     size:  2728 Get Requests:            11677 
Pool  7     size:  4400 Get Requests:            48943 
Pool  8     size:  8360 Get Requests:            18646 
Pool  9     size: 14376 Get Requests:             1916 
Pool 10     size: 24120 Get Requests:             1961 
Pool 11     size: 37880 Get Requests:             4833 
Pool 12     size: 65536 Get Requests:              716 
Requests greater than the largest cell size:               1652 

It might be worth splitting Pool 2 and seeing if it makes a difference in CPU usage at peak time. If it has a benefit, try Pool 1 as well.

You can also sort the "Successful Get Heap requests" counts, and see which range has the most requests. I don't know what you would use this information for, unless you were investigating why so much storage was being used.
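If you prefer a script to ISPF edit, a small sketch like the following does the same filtering (the file name, and the exact layout of the report lines, are assumptions based on the extract above):

import re

pools = []
with open("rptstg.txt") as report:               # the RPTSTG report saved to a file
    for line in report:
        m = re.match(r"\s*Pool\s+(\d+)\s+size:\s+(\d+)\s+Get Requests:\s+(\d+)", line)
        if m:
            pools.append(tuple(int(g) for g in m.groups()))

# busiest pools first - candidates for splitting with (size,n) as described above
for number, size, gets in sorted(pools, key=lambda p: p[2], reverse=True):
    print("Pool %2d  cell size %6d  get requests %10d" % (number, size, gets))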

PhD level for heap

For high use applications on machines with many CPUs you can get contention for storage at the hardware cache level.

Before a CPU can use storage, it has to get the 256 byte cache line into the processor cache. If two CPUs are fighting for storage in the same 256 bytes, the throughput goes down.

By specifying

HEAPP(ALIGN,…

This ensures each cell is isolated in its own cache line. This can lead to an increase in virtual storage, but you should get improved throughput at the high end. It may make very little difference when there is little load, or on an LPAR with few engines.

I’m sorry I haven’t a clue…

As well as being a very popular British comedy, it is how I sometimes feel about what is happening inside the Liberty Web servers, and products like z/OSMF, z/OS Connect and MQWEB. It feels like a spacecraft in cartoons – there are usually only two controls – start and stop.

One reason for this is that the developers often do not have to use the product in production, and have not sat there, head in hand saying “what is going on ?”.

In this post I'll cover what data values to expose, what attributes to expose, how to expose the data, how to build up the list of what is needed, and what you can use to display the data.

What data values to expose

As a concept, if you give someone a lever to pull, you need to give them a way of seeing the effect of pulling the lever.

If you give someone a tuning parameter, they need to know the impact of using the tuning parameter. For example

  • you implement a pool of blocks of storage.
  • you can configure the number of maximum number of blocks
  • if a thread needs some storage, and there is a free block in the pool, then assign the block to the thread. When the thread has finished with it, the block goes back into the pool.
  • if all the blocks in the pool are in use, allocate a new block. When the thread has finished with the block, free it.
  • if you specify a very large number of blocks it could cause a storage shortage

The big question with this example is "how big do you make the pool?"

To be able to specify the correct pool size you need to know information like

  • What was the maximum number of blocks used – in total
  • How many times were additional blocks allocated (and freed)
  • What was the total number of blocks requested.

You might decide that the pool is big enough if less than 1% of requests had to allocate a block.

If you find that the maximum number of blocks used was 1% of the size of the pool, you can make the pool much smaller.

If you find that 99% of the requests had to allocate (and free) a block, this indicates the pool is much too small and you need to increase the size.
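A minimal sketch of such a pool, instrumented with exactly those counters (the names are invented for illustration):

class BlockPool:
    def __init__(self, pool_size, block_size):
        self.capacity = pool_size
        self.block_size = block_size
        self.free = [bytearray(block_size) for _ in range(pool_size)]
        self.in_use = 0
        self.total_requests = 0        # total number of blocks requested
        self.max_in_use = 0            # maximum number of blocks used - in total
        self.overflow_allocations = 0  # times a block had to be allocated (and later freed)

    def get_block(self):
        self.total_requests += 1
        self.in_use += 1
        self.max_in_use = max(self.max_in_use, self.in_use)
        if self.free:
            return self.free.pop()
        self.overflow_allocations += 1      # pool empty - allocate outside the pool
        return bytearray(self.block_size)

    def return_block(self, block):
        self.in_use -= 1
        if len(self.free) < self.capacity:
            self.free.append(block)         # keep it in the pool for reuse
        # otherwise drop it - the equivalent of freeing the extra block

If overflow_allocations is more than about 1% of total_requests the pool is too small; if max_in_use is only a tiny fraction of the pool size, the pool can be made smaller.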

For other areas you could display

  • The number of authentication requests that were userid+ password, or were from a certificate.
  • The number of authentication requests which failed.
  • The list of userid names in the userid cache.
  • How many times each application was invoked.
  • The number of times a thread had to wait for a resource.
  • The elapsed time waiting for a resource, and what the resource was.

What attributes to expose

You look at the data to ask

  • Do I have a problem now?
  • Will I have a problem in the future? You need to collect information over time and look at trends.
  • When we had a problem yesterday, did this component contribute to it? You need to have historical data.

It is not obvious what data attributes you should display.

  • The "value now" is easy to understand.
  • The "average value" is harder. Is this from the start of the application (6 months ago), or a weighted average such as (99 * previous average + current value)/100? With this weighted average, a change since the previous value indicates the trend (see the small sketch after this list).
  • The maximum value is hard – from when? There may have been a peak at startup, and small peaks since then will not show up. Having a “reset command” can be useful, or have it reset on a timer – such as display and reset every 10 minutes.
  • If you “reset” the values and display the value before any activity, what do you display? “0”s for all of the values, or the values when the reset command was issued.
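A small sketch of that weighted average, to show how it exposes the trend:

def weighted_average(previous_average, current_value, weight=99):
    # (99 * previous average + current value) / 100 - the average moves slowly,
    # so comparing the latest value with it shows whether the metric is trending up
    return (weight * previous_average + current_value) / (weight + 1)

avg = 100.0
for sample in [100, 100, 250, 250, 250]:   # the workload steps up
    avg = weighted_average(avg, sample)
    print("sample %3d  average %6.1f  above average: %s" % (sample, avg, sample > avg))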

Resetting values can make it easier to understand the data. Comparing two 8 digit numbers is much harder than comparing two 2 digit numbers.

How to expose data

Java has a Java Management eXtension (JMX) for reporting management information. It looks very well designed, is easy to use, and very compact! There is an extensive document from Oracle here.

I found Basic Introduction to JMX by Baeldung was an excellent article, with code samples on GitHub. I got these working in Eclipse within an hour!

The principle behind JMX is…

For each field you want to expose you have a get… method.

You define an interface with the name <class name>||"MBean" which defines all of the methods for displaying the data.

public interface myClassMBean {
    public String getOwner();
    public int getMaxSize();
}

You define the class and the methods to expose the data.

public class myClass implements myClassMBean {

    // and the methods to expose the data

    public String getOwner() {
        return fileOwner;
    }

    public int getMaxSize() {
        return fileSize;
    }

}

And you register the instance with JMX

myClass myClassInstance = new myClass(); // create the instance of myClass

MBeanServer server = ManagementFactory.getPlatformMBeanServer();
ObjectName objectName = ...;
server.registerMBean(myClassInstance, objectName);

Where myClassInstance is a class instance. The JMX code extracts the name of the class from the object, and can then identify all the methods defined in the class||"MBean" interface. Tools like jconsole can then query these methods, and invoke them.

ObjectName is an object like

ObjectName objectName = new ObjectName("ColinJava:type=files,name=onefile");

Where "ColinJava" is a high level element, "type" is a category, and "name" is the description of the instance.

That’s it.

When you use jconsole (or other tools) to display it, you see the object and the values returned by the get… methods.

You could have

MBeanServer server = ManagementFactory.getPlatformMBeanServer();

ObjectName bigPoolName = new ObjectName("ColinJava:type=threadpool,name=BigPool");
server.registerMBean(bigpoolInstance, bigPoolName);

ObjectName medPoolName = new ObjectName("ColinJava:type=threadpool,name=MedPool");
server.registerMBean(medpoolInstance, medPoolName);

ObjectName smPoolName = new ObjectName("ColinJava:type=threadpool,name=SmallPool");
server.registerMBean(smallpoolInstance, smPoolName);

This would display the stats data for three pools

  • ColinJava
    • threadpool
      • Bigpool..
      • MedPool….
      • SmallPool…

And so build up a tree like

  • ColinJava
    • threadpool
      • Bigpool..
      • MedPool….
      • SmallPool…
    • Userids
      • Userid+password
      • Certificate
    • Applications
      • Application 1
      • Application 2
    • Errors
      • Applications
      • Authentication

You can also have set…() methods to set values, but you need to be more careful: checking authorities, and possibly synchronising updates with other concurrent activity.

You can also have methods like resetStats() which show up within jconsole as Operations.

How do I build up the list of what is needed?

It is easy to expose data values which have little value. I remember MQ had a field in the statistics “Number of times the hash table changed”. I never found a use for this. Other times I thought “If only we had a count of ……”

You can collect information from problems reported to you. “It was hard to diagnose because… if we had the count of … the end user could have fixed it without calling us”.

Your performance team is another good source of candidate fields. Part of the performance team's job is to identify statistics which make it easier to tune the system and reduce the resources used. It is not just about identifying hot spots.

Before you implement the collection of data, you could present to your team how the data will be used, and produce some typical graphs. You should get some good feedback, even if it is "I don't understand it".

What can I use to display the data

There are several ways of displaying the data.

  • jconsole – which comes as part of Java can display the data in a window
  • python – you can issue a query and capture the data. I have this set up to capture the data every 10 seconds (see the sketch after this list)
  • other tools using the standard interfaces.
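As a sketch of such a polling script, using the Liberty JMX REST connector mentioned later in this post – the host, port, credentials, MBean name and the exact URL layout are all assumptions, so check them against your server's documentation:

import time
import requests

# one MBean's attributes; the object name may need URL-encoding
url = ("https://10.1.1.2:10443/IBMJMXConnectorREST/mbeans/"
       "ColinJava:type=threadpool,name=BigPool/attributes")

while True:
    res = requests.get(url, auth=("userid", "password"), verify=False)
    print(time.strftime("%H:%M:%S"), res.status_code, res.text)
    time.sleep(10)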

Have a good REST and save a fortune in CPU

The REST protocol is a common programming model on the internet. It is basically a one shot model, which scales and has high availability, but can have a very high CPU cost. There are things you can do to reduce the CPU cost. Also, the MQWEB server has implemented some changes to reduce the cost. See here for the MQ documentation.

This post gives some guidance on reducing the costs for Liberty based servers.

The traditional model and the REST model

The traditional application model may have a client and a flow to the server

  • Connect to the server and authenticate
  • Debit my account by £500 within syncpoint
  • Credit your account by £500 within syncpoint
  • Commit the transaction
  • Do the next transaction etc
  • At the end of the day, disconnect from the server.

The REST model would be

  • Connect to the server and authenticate and do (Debit my account by £500, credit your account by £500), disconnect

This model has the advantage that it scales. When you initiate a transaction it can go to any one of the available back-end servers. This spreads the load and improves availability.

With the traditional model, the client connects to any available server at the start of the day and stays connected all day. If a new server becomes available during the day, it may get no workload.

The downside of the REST model is the cost. Establishing a connection and authenticating can be very expensive. I could not find much useful documentation on how to reduce these costs.

There are two parts to getting a REST connection:

  • Establishing the connection
  • Authentication

Establishing the connection

You can have each REST request use a new session, which involves a full TLS handshake for every request. Two requests could go to different servers, or to the same server.

Or you can issue multiple REST requests over the same session, to the same backend server.

On my little z/OS, using z/OSMF it takes

  • about 1 second to create a new connection and issue a request and terminate
  • about 0.1 seconds to use the shared session, per REST request.

Establishing the TLS session is expensive, as there is a lot of computation to generate the keys.

For MQWEB, the results are very similar.
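With the Python requests package (used in the script later in this post) you can reuse a session. A sketch, with the URL taken from that script and a placeholder Authorization header:

import requests

geturl = "https://10.1.1.2:29443/zosmf/rest/mvssubs?ssid=IZUG"
my_header = {"Authorization": "Basic ...", "Content-Type": "application/json"}

with requests.Session() as session:
    session.headers.update(my_header)
    session.verify = False           # or the name of a CA .pem file - see below
    for i in range(10):
        # after the first request the TLS session (and any cookies, such as
        # LtpaToken2) are reused, so these requests are much cheaper
        res = session.get(geturl)
        print(i, res.status_code)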

Authentication

Once the session has been established each REST request requires authentication.

If you are using userid and password, the values are checked with z/OS.

If you are using client certificate authentication the Subject DN is looked up in the security manager, and if there is a DN to userid mapping, the userid is returned.

Once you have a valid userid, the userid’s access can be obtained from the security manager.

All of these values can be cached in the Liberty web server. So the first time a certificate or userid is used, it will take longer than successive times.

Information about authentication is then encrypted and passed back in the REST response as the LtpaToken2 cookie.

If a REST request passes the cookie back to the server, then the information in the cookie is used by the server, and fewer checks need to be done.

This cookie can expire, and when it does expire the userid and password, or the certificate DN is checked as before, and the cookie will be updated.

If you do not send the LtpaToken2 cookie, this will cause the passed authentication information to be revalidated. If you want to change userid, do not send the cookie.

Is any of this documented?

There is not a lot of documentation. There is some information in Configuring the authentication cache in Liberty.

There is a parameter javax.net.ssl.sessionCacheSize. If this is not set the default is 20480.

Running a python rest application on z/OS

I installed Python and co-requisite packages on my z/OS system, described here. I wanted to run a REST workload into z/OSMF. I could have used Liberty, z/OS Connect or MQWEB as the backend.

It makes use of the python requests package.

Initial script

#!/usr/bin/env python3 
import requests 
from timeit import default_timer as timer 
import urllib3 
my_header = { 
  'Connection': 'keep-alive', 
  'Content-Type': 'application/json', 
  'Cache-Control': 'max-age=0', 
  'Authorization': 'Basic Y395aW46cGFu67GhlMG4=', 
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 
  'Accept-Language': 'en-GB,en;q=0.5', 
  'DNT': '1', 
  'Connection': 'keep-alive', 
  'Upgrade-Insecure-Requests': '1', 
  'Cache-Control': 'max-age=0' 
}
urllib3.disable_warnings() 
geturl ="https://10.1.1.2:29443/zosmf/rest/mvssubs?ssid=IZUG" 
jar = requests.cookies.RequestsCookieJar() 
start=timer() 
res = requests.get(geturl,headers=my_header,verify=False,cookies=jar) 
end=timer() 
print("duration=",end-start) 
if res.status_code != 200: 
  print(res.status_code) 
print("Output=",res.text) 
jar=  res.cookies  

Comments on the python script

  • 'Authorization': 'Basic Y395aW46cGFu67GhlMG4=' is the base64 encoding of the userid and password, which is trivial(!) to decode – see the sketch after this list.
  • urllib3.disable_warnings() Without this set you get a message InsecureRequestWarning: Unverified HTTPS request is being made to host '10.1.1.2'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings. This is because the certificate sent down from the server has not been validated.
  • jar = res.cookies saves the cookies into the jar, for future use.
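To show just how trivial it is (the userid and password here are invented, not the ones in the script):

import base64

token = base64.b64encode(b"colin:secretpw").decode()   # what goes after "Basic "
print(token)                                           # Y29saW46c2VjcmV0cHc=
print(base64.b64decode(token).decode())                # colin:secretpw - trivially recovered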

The output was

duration= 1.210928201675415
output= {"items":[
{"subsys":"IZUG", "active":true, "dynamic":true, "funcs":[10]}
],"JSONversion":1}

Verifying TLS certificate

With urllib3.disable_warnings() present, the warnings are suppressed. When this statement is not present, there will be warnings about certificate problems.

In the statement res = requests.get(geturl,headers=my_header,verify=False,cookies=jar), verify is either False or the name of a CA .pem file containing the CA certificates. I used verify="ABC" and got

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:

I got the error because “ABC” is not a valid file, and the verification could not be done.

I exported the CA certificate used by the server using

RACDCERT CERTAUTH   EXPORT(LABEL('TEMP-CA')) -         
       DSN('IBMUSER.CERT.TEMP.CA.PEM')   -             
       FORMAT(CERTB64) -                               
       PASSWORD('password')                            

I could only get verify=…. to work with a USS file, so I had to copy the dataset IBMUSER.CERT.TEMP.CA.PEM into a USS file CACert.pem. Then when I used

res = requests.get(geturl,headers=my_header,verify="CACert.pem",cookies=jar)

it worked fine.

Using a client certificate.

You cannot use a RACF certificate with the requests facility, because the underlying code does not support it. You have to use a .pem style certificate.

The support does not allow you to specify a password for the private key, so this is not very secure.

You define

cf = "colinpaicesECp256r1.pem"
kf = "colinpaicesECp256r1.key.pem"
cpcert = (cf, kf)

where

  • cf is the name of the file with the certificate in it
  • kf is the name of the file with the private key in it
  • cpcert is the python tuple.

If your certificate file also includes the private key, you do not need the kf, just use cpcert=(cf).

You use it

res = requests.get(geturl,headers=my_header,verify="CACert.pem",cookies=jar,cert=cpcert)

I tried exporting a certificate from z/OS using RACDCERT EXPORT … FORMAT(PKCS12B64), copying it to a USS file, and using that, but it did not work. The file could not be read.

I tried creating a private key with a password (to make it more secure) but when I used it I got the message

urllib3.exceptions.SSLError: Client private key is encrypted, password is required

There is a package requests_pkcs12 which provides support for a password on the certificate: https://github.com/m-click/requests_pkcs12. I did not use this; I recreated my certificate and private key without a password.

I tried running on Linux using my Hardware Security Module (which plugs into a USB socket). This also failed as I could not enter the PIN for the device.

Comparing the response time to running across the network

I ran the same python script on z/OS and on Linux. The round trip time of the REST request was

  • 1.41 seconds on z/OS
  • 0.92 seconds on Linux.

I think I’ll run my tests from Linux in future.

One Minute MVS performance – Workload Manager – looking at WLM reports.

I have a set of blog posts relating to getting started with z/OS performance. This blog post follows on from the overview of WLM, and describes the contents of the reports, how you can tell if work is being delayed, and why it is being delayed.

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a system independent measure of CPU usage)
    1. 80% requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there are no goals – just do your best

How do I tell what is going on and if the goals have been met?

RMF can display data in near real time (every minute or so).

RMF captures data and produces SMF records which can be processed by RMF and other products.

You can report on

  1. How well the service class did against its goals
  2. How well transactions or work did, from a reporting class.

You could have all CICS transactions in a service class, so they get the same CPU profile etc, but have different reporting classes. You can monitor CE* transactions and PAY* transactions separately.

You could have a reporting class for work coming in from other systems, depending on the userid.

I set up a reporting class for z/OSMF, and requested it in the RMF batch report with SYSRPTS(WLMGL(RCPER(ZOSMF))).

One part of the report contained


         z/OS V2R4               SYSPLEX ADCDPL             DATE 06/14/2021           INTERVAL 05.00.003   
                                 RPT VERSION V2R4 RMF       TIME 09.25.00
POLICY=ETPBASE                        REPORT CLASS=ZOSMF                                   PERIOD=1 
 -TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF  TRANS-APPL%-----CP-IIPCP/AAPCP-IIP/AAP  ---ENCLAVES--- 
 AVG        1.00  ACTUAL                    0  TOTAL        66.25       64.20  173.99  AVG ENC   0.00 
 MPL        1.00  EXECUTION                 0  MOBILE        0.00        0.00    0.00  REM ENC   0.00 
 ENDED         0  QUEUED                    0  CATEGORYA     0.00        0.00    0.00  MS ENC    0.00 
 END/S      0.00  R/S AFFIN                 0  CATEGORYB     0.00        0.00    0.00 
                                                                                                                
 ----SERVICE----   SERVICE TIME  ---APPL %---  --PROMOTED--  --DASD I/O---  ----STORAGE----  -PAGE-IN RATES- 
 IOC        2366K  CPU  720.505  CP     66.25  BLK    0.000  SSCHRT    0.2  AVG    81420.24  SINGLE      0.0 
 CPU      617333   SRB    0.223  IIPCP  64.20  ENQ    0.000  RESP      0.0  TOTAL  81421.05  BLOCK       0.0 
 MSO      154219   RCT    0.000  IIP   173.99  CRM    0.000  CONN      0.0  SHARED     0.00  SHARED      0.0 
 SRB         191   IIT    0.013  AAPCP   0.00  LCK    0.889  DISC      0.0                   HSP         0.0 
 TOT        3138K  HST    0.000  AAP      N/A  SUP    0.000  Q+PEND    0.0 
 GOAL: EXECUTION VELOCITY 70.0%     VELOCITY MIGRATION:   I/O MGMT  28.3%     INIT MGMT 28.3% 
                                                                                                                
          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  -------------- EXEC DELAYS % -----------  
 SYSTEM                    VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU                                
 S0W1        --N/A--       28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19                               

Key fields:

INTERVAL 05.00.003

This tells you the length of the reporting interval (5 minutes).

POLICY=ETPBASE REPORT CLASS=ZOSMF PERIOD=1

This tells you this is a report class (rather than a service class), the name is ZOSMF, and the data is for period 1. When you have service classes with more than one period, such as high priority for the first 0.5 seconds of CPU and then a lower priority, these will have multiple periods.

-TRANSACTIONS–
AVG 1.00
MPL 1.00
ENDED 0
END/S 0.00

This says on average there was one instance running. You can have multiple transactions or jobs in a class. Add up the total duration of all jobs/transactions and divide by the interval to get the average (AVG).

MPL (multiprogramming level) is an advanced topic and describes how many instances were concurrently active.

No jobs/transactions ended in this interval, so the ending rate over the 5 minutes was 0 per second.

—APPL %—
CP 66.25
IIPCP 64.20
IIP 173.99
AAPCP 0.00
AAP N/A

This shows the percentage of CPU used over the interval

  • 66.25 percent of a GP engine was used
  • IIPCP 64.20 means that 64.20% of a GP engine was running work that could have run on a ZIIP – if there had been spare ZIIP capacity. 66.25 – 64.20 = 2.05% of a GP was running work that was not ZIIP eligible.
  • 173.99 percent of ZIIP-eligible work was running on ZIIP engines – so nearly 2 ZIIP engines were being used
  • 0 AAPCP – there was no ZAAP eligible work offloaded onto a GP
  • 0 AAP – there was no work running on a ZAAP

The total ZIIP-eligible work was 173.99% on ZIIP engines + 64.20% of a GP = 238%, or almost 2.5 ZIIP engines' worth.

It is good to run on ZIIPs where possible, because ZIIPs are cheaper ($$) than GPs, and GPs may be configured to be slower than a ZIIP.

GOAL: EXECUTION VELOCITY 70.0%

The performance goal for this work was defined as Execution Velocity of 70 %.

 
         EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM  VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU      
 S0W1    28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19       
  • The achieved execution velocity was 28.3% against a target of 70%
  • The performance index was 2.5. For a velocity goal the performance index is the goal velocity divided by the achieved velocity (there is a worked check after this list). A value of 1 or smaller is good. The value here shows the goal was not met. You need to consider
    • Changing the goal for this work so the target goal is what you can achieve on a normal day
    • Changing the importance of the work for when the system is constrained.
    • If you change the goal for one set of work – it may impact other work, so you need to look at the system as a whole and decide which is your important work.
    • Add more CPUs or ZIIPs – these may not help if the delays are not CPU… see below
  • The average number of address spaces in this class was 1.
  • EXEC USING%. The figures above were for actual CPU used. WLM samples activity 4 times a second, recording whether work was running or waiting for a resource. Of those samples:
    • 8% of a CPU engine was used – this includes ZIIP work running on a GP.
    • 20% of a ZIIP engine was used
    • The ratio 8:20 is similar to the CPU on GP and ZIIP actually used in this period, 66.25:173.99.
  • EXEC DELAYS
    • The total delay was 72% (= 100 – the (8+20) "using" samples above)
    • for 53% of all the samples the work was waiting for a ZIIP engine
    • for 19% of all the samples the work was waiting for a GP engine.
    • You can have other delays listed here, for example paging, or your program is capped to limit how much CPU it is allowed.
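A worked check of the performance index arithmetic for this velocity goal:

goal_velocity = 70.0        # GOAL: EXECUTION VELOCITY 70.0%
achieved_velocity = 28.3    # EX VEL% from the report
print(round(goal_velocity / achieved_velocity, 1))   # 2.5 - the PERF INDX column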

Once z/OSMF had started, and settled down, there were still delays for IIP (28%). To me this looks like a lumpy workload: perhaps there is a timer which pops and runs multiple threads. There are more threads than ZIIPs – so some have to wait.

Reports for transactional work

I defined a transaction so I could measure the response times (and CPU used) for a service in z/OSMF. A TSO address space is started, and z/OSMF sends a client/server request to the TSO address space. The response time is sub-second so a good candidate to demonstrate WLM for a transaction.

I configured z/OSMF to have

<zosWorkloadManager collectionName="MOPZCET"/>
<wlmClassification>
  <httpClassification transactionClass="ZCI3" resource="/zosmf/webispf/*/"/>
</wlmClassification>

The collection name is passed to WLM to determine the service class and report class of the work. The default is the server name.

All ISPF requests (with a URL of /zosmf/webispf/*) were classified as ZCI3.

I then used WLM to configure

  • a service class ZCI3 with Average response time of 00:00:00.010
  • a classification rule for type CB, a rule for CN=MOPZCET, and sub-rule TC = ZCI4. This gave the service class and report class.

The data in the report had

-TRANSACTIONS–
AVG 0.01
MPL 0.01
ENDED 21
END/S 0.07

21 transactions in 5 minutes is 0.07 a second.

MPL (multiprogramming level) is the target which represents the number of address spaces that must be in the swapped-in state for the service class period to meet its goals. I've never used it!

TRANS-TIME HHH.MM.SS.FFFFFF
ACTUAL               140526
EXECUTION            139950
QUEUED                  575

The average time was 0.140 seconds.

GOAL: RESPONSE TIME 000.00.00.010 AVG

That was the specification in WLM (note the specified value of 0.010 is very different to the 0.140 achieved)


          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP 
 S0W1     000.00.00.140526 66.7 14.1   0.0  0.0 N/A  18 0.0  9.1 9.1  

This shows the average response time was 0.140 seconds, used 18% on a ZIIP, and waited 9% of the time for a ZIIP

To the right of the data in the report was

--- DELAY % --- 
UNK IDL CRY CNT                 
 64 0.0 0.0 0.0 

Which says that 64% of the time was an unknown delay. This could be

  • waiting for end user input
  • waiting for TCP/IP data
  • the program sent off a request and is waiting for a response.

For example the ISPF transaction in z/OSMF had sent a request to an address space running TSO. This address space processed the request and sent the response back. I am guessing that the 64% delay was waiting for TSO to process the request and send back the response.

You also get a response time profile based on the service class

                              ----------RESPONSE TIME DISTRIBUTION---------- 
   -----TIME------  # TRANS   0    10   20   30   40   50   60   70   80   90   100 
   HH.MM.SS.FFFFFF  IN BUCKET |....|....|....|....|....|....|....|....|....|....| 
<= 00.00.00.014000          0  > 
<= 00.00.00.015000          0  > 
<= 00.00.00.020000          2  >>>>>> 
<= 00.00.00.040000          5  >>>>>>>>>>>>> 
>  00.00.00.040000         14  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 

This shows that out of the 21 requests, 7 were below 0.040 seconds, and 14 were over 0.040 seconds.

From the service class, the goal was specified as GOAL: RESPONSE TIME 000.00.00.010 AVG, so this goal is very badly specified. It would be better set to an average of 0.140 seconds.

I changed the service class to a goal of 0.140 seconds and activated it. After I had run some tests the output was

          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS %
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT            
 S0W1     000.00.00.097733  100  0.7   0.0  0.0 N/A  50 0.0  0.0            

Which showed no delays

and a response time profile

                                ---RESPONSE TIME DISTRIBUTION--- 
    -----TIME------  --# TRANS  0    10   20   30   40   50   60
    HH.MM.SS.FFFFFF  IN BUCKET  |....|....|....|....|....|....|.
 <= 00.00.00.070000          0  > 
 <= 00.00.00.084000          5  >>>>>>>>>>>>>>> 
 <= 00.00.00.098000          9  >>>>>>>>>>>>>>>>>>>>>>>>>> 
 <= 00.00.00.112000          1  >>>> 
 <= 00.00.00.126000          0  > 
 <= 00.00.00.140000          1  >>>> 
 <= 00.00.00.154000          1  >>>> 
 <= 00.00.00.168000          0  > 
 <= 00.00.00.182000          0  > 
 <= 00.00.00.196000          0  > 
 <= 00.00.00.210000          1  >>>> 
 <= 00.00.00.280000          0  > 

An average of 0.10 seconds, with some taking up to 0.210 seconds.

Real time information

You can get the information in near real time from RMF (or other monitors)

For example for processor delays

            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) ---------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name 
IZUSVR1  SO STCHIM   CP     2  35 56.53   91 IZUSVR1    4 JES2MON    2 TCPIP 
                     IIP   94  95 183.1   89 IZUSVR1                         

This shows that job IZUSVR1

  • Was delayed for 2% of the time on a GP
  • Used 35% of the GP engines
  • Was delayed 94% of the time on a ZIIP
  • and used 95% of the available ZIIP resource
  • The jobs using CPU were IZUSVR1 (using 91%) JES2MON and TCPIP
  • The jobs using ZIIP were IZUSVR1

What to do now?

You need to identify the goals of your work, and set sensible goals. This may take several iterations. You also need to understand the priorities of the work, and which userids run it.

Once you have configured your system to report on response times of your business critical work, you can adjust the service classes so your work achieves its goals.

Define reporting classes so you can monitor different groups of work and check that they are meeting their goals.

I cut the CPU cost of doing nothing.

I was running z/OSMF and saw that the CPU costs were high when it was sitting there doing nothing. I managed to reduce the CPU costs by more than half. This would apply to other Liberty based web servers, such as MQWEB, and z/OS Connect.

I could see from the MVS system trace there was a lot of activity creating and deleting threads, and a lot of cost associated with these activities, such as allocating and freeing storage.

I increased the number of threads so that this create-a-thread and delete-a-thread activity disappeared.

In the xml configuration file (based on server.xml) was the default

<executor name="LargeThreadPool" id="default" coreThreads="100"
maxThreads="0" keepAlive="60s" stealPolicy="STRICT"
rejectedWorkPolicy="CALLER_RUNS" />

I changed this to

<executor name="LargeThreadPool" id="default"
coreThreads="300" maxThreads="600" keepAlive="60s"
stealPolicy="STRICT" rejectedWorkPolicy="CALLER_RUNS" />

and restarted the server.

The options are documented here. There is an option keepAlive which defaults to 60 seconds. If a thread has been idle for this time, the thread is a candidate to be freed to reduce the pool back to the coreThreads size.

I was alerted to this problem when I looked at an MVS system trace. This is described here.

There is a discussion of how Sun thread pools work in this post. It is not obvious. This may or may not be how this executor works.

What value should you use?

This is a hard question, as Liberty does not provide this information directly.

I used the Health Checker, which connects from Eclipse to the JVM and extracts information about the JVM and applications.

This showed that at rest there was a lot of activity. I increased it to 250 threads, restarted the server, and looked again: better, but still some activity. I increased it to 300 threads, and the graph was flat.

I set up USER.Z24A.PROCLIB(CEEOPT) with

RPTSTG(ON),
RPTOPTS(ON)

in my z/OSMF job I had

//CEEOPTS DD DISP=SHR,DSN=USER.Z24A.PROCLIB(CEEOPT)

This printed out a lot of useful information about the stack and heap usage. At the bottom it said

Largest number of threads concurrently active: 397

The number of threads includes threads from the pool I had specified, plus other threads that z/OSMF creates. The health check showed there were 372 threads, even though coreThreads was set to "300".

I also used jconsole to display information about the highest thread usage. The URL was service:jmx:rest://10.1.1.2:10443/IBMJMXConnectorREST. It displays peak threads and live threads.

Security

I found the security of both jconsole and the Health Center was weak (userid and password). I was unable to successfully set up TLS certificate logon to the server.

The information from RPTSTG was only available at shutdown.

Why does increasing the number of threads reduce the CPU when idle?

The thread pool has logic to remove unused threads and shrink it to the coreThreads size. If the pool size is too small it has to create threads and delete threads according to the load. See here. The keepAlive mentioned at the top is how long a thread can be idle for, before it can be considered a candidate for deletion.

Summary

Monitor the CPU used when idle and see if increasing the thread pool to 300 helps.

The best way to save money is not to spend it. The same is true for CPU.

I was trying to understand why a z/OSMF address space, using the Liberty web server (written in Java), was using a lot of CPU when it was doing no work. I looked into the MVS system trace and saw some interesting behaviour. If you are trying to investigate high CPU usage in a z/OS job, I hope the following may help you with where to start looking.

If you are looking at a Java program, knowing there is a problem does not tell you what is causing it. There is Java, which uses C code, which uses USS services, which uses MVS services, which is what you see in the system trace. The symptom is a long way from the source code. You might be able to correlate the time stamps in the system trace with the Java trace.

The ‘interesting’ behaviour…

  • Getmain/freemain storage requests. These are heavyweight requests for getting and freeing storage. Once warmed up, I would expect no storage requests.
  • “Storage Obtain”/”Storage Release” requests. These are medium-weight requests for getting and freeing storage. Once warmed up, I would expect no storage requests.
  • Attach task and detach task, or pthread_create(). I would have expected tasks to be attached at start up, with no need for more tasks afterwards. I can see that under load more tasks may be required until the system stabilises.
  • Many timer pops a second. There were 50 timer pops every second (one every 20 milliseconds). Is this efficient? No! It may be more efficient to increase the duration between timer pops when there is no load, so a timer pop 10 times a second may be acceptable when the system is idle, and reset it to a short interval when the system is busy – or change the programming model to be wait-post rather than spinning the wheels.

I’ll discuss these in more detail below.

Storage requests, GETMAINs, FREEMAINs, STORAGE requests.

For a non-trivial application such as a web server, MQ, DB2 etc., I would not expect to see any storage requests in the system trace once the system has warmed up. When I worked in MQ development, we went through the system trace, and every time we found a GETMAIN or STORAGE OBTAIN we worked to eliminate it, until there were none left.

Use a stack

Instead of each subroutine using GETMAIN or “Storage request” to get a block of storage for its variables, I would expect a program stack to be used. For example, the top level program for the thread allocates a 1MB block of storage and uses this as a stack. The top level program uses the first 2KB of this buffer. The first subroutine uses the buffer from 2KB for 3KB. If this subroutine calls another subroutine, the lower level subroutine uses the buffer from 5KB for 2KB. This is a very efficient way of managing storage: each subroutine needs only tens of instructions to get and release storage from the stack.
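
To make the idea concrete, here is a purely illustrative sketch in Java (on z/OS the LE run time does this for you in C): stack-style storage management is just one big buffer and an offset that moves as routines are entered and left.

// Illustrative sketch of stack-style storage management:
// getting and releasing storage is just moving an offset,
// with no GETMAIN or STORAGE OBTAIN once the buffer exists.
public class StackStorage {
    private final byte[] buffer = new byte[1024 * 1024]; // the 1MB "stack"
    private int offset = 0;                               // next free byte

    // On entry to a routine: remember the current offset and bump it.
    int enter(int bytesNeeded) {
        int saved = offset;
        offset += bytesNeeded;  // real code would check for stack overflow here
        return saved;
    }

    // On return from the routine: just restore the previous offset.
    void leave(int saved) {
        offset = saved;
    }
}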

A problem can occur if the stack is not big enough, and there is logic like “If no space in the stack – then GETMAIN a block of storage”. If this happens the request quickly becomes expensive.

C (Language Environment) programs on z/OS can set the stack size, and when the system is shutdown, print out statistics on the stack usage.

Use the heap

A subroutine may need some “external storage” which exists outside of the subroutine, for example to store entries in a table for the life of the job. A heap (or heap pool) is a very efficient way of managing this storage. When your program gets some storage, it does not return it to the system; when the block has been finished with, the program “adds it to the heap” so it can be reused.

A simple heap example.

This might be an array of 3 pointers:

  • storage[0] is a chain of free 1KB blocks,
  • storage[1] is a chain of free blocks from 1K+1 bytes to 10KB,
  • storage[2] is a chain of free blocks from 10K+1 bytes to 50KB.

If your program needs a 512 byte block, it looks to see if there is a free block chained from storage[0]; if not, it allocates a 1KB block (not 512 bytes). When it has finished with the block, it puts it onto the storage[0] chain.

Over time the number of elements on each chain becomes sufficient to run the workload, and there should be no more storage requests. An increase in throughput may increase the demand for storage, so during this “warm up” period there may be more storage requests.
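
A minimal sketch of this free-list idea in Java (illustrative only; the class and the size classes are mine): free blocks are kept on chains by size class and reused, so in the steady state no new allocations are needed.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a simple heap: free blocks are chained by size
// class and reused instead of being allocated and freed each time.
public class SimpleHeap {
    private static final int[] SIZES = {1024, 10 * 1024, 50 * 1024};
    private final List<ArrayDeque<byte[]>> free = new ArrayList<>();

    public SimpleHeap() {
        for (int i = 0; i < SIZES.length; i++) free.add(new ArrayDeque<>());
    }

    byte[] get(int bytesNeeded) {
        int i = sizeClass(bytesNeeded);
        byte[] block = free.get(i).poll();                 // reuse a free block if there is one
        return block != null ? block : new byte[SIZES[i]]; // otherwise allocate a full-size block
    }

    void release(byte[] block) {
        free.get(sizeClass(block.length)).push(block);     // put it back on the chain for reuse
    }

    private int sizeClass(int bytes) {
        for (int i = 0; i < SIZES.length; i++) if (bytes <= SIZES[i]) return i;
        throw new IllegalArgumentException("block too big for this simple heap");
    }
}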

C run time statistics

For C programs on z/OS you can get the C runtime component to print out statistics on the stack and heap usage, and give recommendations on the best sizes to specify.

In the //CEEOPTS data set you can specify the following
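
For example (RPTSTG and RPTOPTS are the options shown earlier; the HEAPPOOLS lines are illustrative only – use the values the RPTSTG report recommends for your workload):

RPTSTG(ON),
RPTOPTS(ON),
HEAPPOOLS(ALIGN,…),
HEAPPOOLS64(ALIGN,…)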

You may want to use HEAPPOOLS64(ALIGN…) and HEAPPOOLS(ALIGN…) when there are multiple threads so the blocks are hardware cache friendly, and you do not have two CPUs fighting for the same hardware cache data.

I've blogged about this in One Minute MVS – tuning stack and heap pools.

Smart programs

MVS can call exit programs, for example when an asynchronous event has happened, such as a timer expiring. These programs are expected to allocate storage for their variables, do some work, give back the storage, and return. This can be very expensive – you have the cost of getting and freeing a block of storage just to set a few bits.

You may be able to write your exit program so it only uses registers, and does not need any virtual storage for variables. If this is not possible, then consider passing a block of storage into the called program. For example, the RACF Admin function

CALL IRRSEQ00 (Work_area,… )

Work_area: The name of a 1024-byte work area for SAF and RACF usage. The work area must be in the primary address space.

Example exit program

You could use the assembler macro STIMERM (Set TIMER). You specify the time interval, the address of the exit, and a user parameter. This user parameter is passed to the exit program when it gets control.

  • This could be a pointer to a WAIT ECB block,
  • or a pointer to a structure, one element in the structure is the WAIT ECB block, another element is the address of a block of storage the exit can use.

Attach task and detach task.

It is expensive to attach and detach tasks, so it is important to do it as little as possible. From a USS perspective the attach is from pthread_create.

A common design template to eliminate the attach/detach model is to have a pool of threads to do work.

  • A work request comes in, the dispatcher task gets a worker thread from the pool, and gives the work request to it. When the worker has finished it puts itself back in the pool.
  • If there is no worker thread available, check the configuration for the maximum number of threads. If this limit has not been reached, create a new worker thread.
  • If there is no worker thread available and the number of threads is at the limit, then wait until a worker thread is free.
  • Some thread pools have logic to shrink the pool if it gets too big. Without this logic a thread pool could be very large because it hit a peak usage weeks ago, and the pool has only been little used since.

Having a pool means that some of the expensive set up is done only once per thread, for example connecting to DB2 or MQ. You also avoid the expensive create (attach) and delete (detach) of threads. The application has logic like this (a code sketch follows the list):

  • Dispatching application attaches a new thread.
  • start thread
    • perform the expensive set up – for example connect to DB2 or connect to MQ
    • add task to the thread pool
    • do until told to shutdown
      • wait for work
      • do the work
    • end do
    • disconnect from DB2 or MQ
    • thread returns and is detached
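
A minimal sketch of this pattern in Java (illustrative only; connect() and disconnect() are placeholders for whatever expensive per-thread set up your application does, such as connecting to DB2 or MQ):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of a worker-thread pool: each worker does its expensive
// set up once, then loops waiting for work, so there is no attach/detach
// (pthread_create) for every request.
public class WorkerPool {
    private final BlockingQueue<Runnable> work = new ArrayBlockingQueue<>(1000);

    public WorkerPool(int threads) {
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                connect();                          // expensive set up, done once per thread
                try {
                    while (true) {
                        Runnable job = work.take(); // wait for work
                        job.run();                  // do the work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // told to shut down
                } finally {
                    disconnect();                   // clean up on shutdown
                }
            });
            worker.start();
        }
    }

    // The dispatcher just queues the request; a free worker picks it up.
    public void submit(Runnable job) throws InterruptedException {
        work.put(job);
    }

    private void connect()    { /* placeholder: e.g. connect to DB2 or MQ */ }
    private void disconnect() { /* placeholder: disconnect from DB2 or MQ */ }
}
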
Problems with the thread pool

One problem with using a thread pool is if the minimum pool size is too small. Smart thread pools have options like

  • lowest number of threads in thread pool
  • maximum number of threads in the thread pool
  • maximum idle time of a thread. If there are more threads than the lowest number of threads, and a thread has been idle for longer than this time then free the thread.

You can get “thrashing” on a low-usage system:

  • The lowest number of threads is specified as 10 threads.
  • The main program needs 50 threads – it uses 10 from the pool, and allocates 40 new threads. These are added to the pool when the work has finished.
  • The clean-up process periodically checks the pool. If there are more threads than the lowest number, it purges ones which have been idle for more than the specified idle time. 40 threads are purged.
  • Repeat:
  • The main program needs 50 threads – it uses 10 from the pool, and allocates 40 new threads. These are added to the pool when the work has finished.
  • The clean-up process periodically checks the pool. If there are more threads than the lowest number, it purges ones which have been idle for more than the specified idle time. 40 threads are purged.

In this case there is a lot of attach/purge activity.

Making the pool size 50, or making the maximum idle time very large, will prevent this thrashing:

  • The lowest number of threads is specified as 50 threads.
  • The main program needs 50 threads – it uses 50 from the pool.
  • The clean-up process periodically checks the pool. The pool size is OK – do nothing.
  • Repeat:
  • The main program needs 50 threads – it uses 50 from the pool.
  • The clean-up process periodically checks the pool. The pool size is OK – do nothing.

In this case the number of threads stays constant and you do not get the create/delete (attach/purge) of threads.

In one test this reduced the CPU time used when idling by more than 50 %.

Many timer pops a second

In my system trace I could see a task wake up, set a timer for 20 milliseconds later, and suspend itself. This is very inefficient. It should be a wait-post model instead of an application in a loop, sleeping and checking for something to do.
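
A minimal Java sketch of the two models (my own illustrative code, not the z/OSMF implementation): the polling loop wakes up every 20 ms whether or not there is work; the wait-post loop uses no CPU until work is actually queued.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Contrast a polling timer loop with a wait-post design.
public class WaitPostDemo {
    private static final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Polling model: a timer pop every 20 ms, even when there is nothing to do.
    static void pollingLoop() throws InterruptedException {
        while (true) {
            Thread.sleep(20);               // wake up, burn a little CPU
            Runnable job = queue.poll();    // usually finds nothing
            if (job != null) job.run();
        }
    }

    // Wait-post model: the thread is suspended until work is posted.
    static void waitPostLoop() throws InterruptedException {
        while (true) {
            Runnable job = queue.take();    // zero CPU until someone calls queue.put(...)
            job.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> { try { waitPostLoop(); } catch (InterruptedException e) { } }).start();
        queue.put(() -> System.out.println("work arrived and was processed immediately"));
    }
}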

When investigating this you need to think about the speed of your box. Consider an application which just does

  • setting a timer to wake up in 10 milliseconds
  • it wakes up a thread which does nothing but set another timer for 10 ms later (so this happens 100 times a second)

On my slow box this could take 1 ms of CPU to do once – or 100 ms of CPU per second for 100 iterations. One engine would be busy 10% of the time.

If I had a box which was 10 times faster, it would take only 0.1 ms of CPU to do the same work. For 100 iterations this would be 10 ms of CPU per second, or 1% of an engine. To some people this is at the “noise level” and not worth looking at.

To you, 1% CPU may be “noise level”; to me, 10% CPU is a flashing red light, a loud klaxon, and people in body armour running past.

Some of the mysteries of Java shared classes

When Java executes a program it reads in the jar file, breaks it into the individual classes, converts the byte codes into instructions, and, when executing, may replace instructions with more efficient instructions (JITting). It can also convert the byte codes into instructions ahead of time, so called Ahead Of Time (AOT) compilation.

With shared classes, the converted byte codes, any Jitted code, and any AOT code can be saved in a data space.

  • When the Java program runs a second time, it can reuse the data in the data space, avoiding the overhead of reading the jar file from the file system and converting the byte codes into instructions.
  • The data space can be hardened to a file, and restored to a data space, so can be used across system IPLs.

Using this reduced the start-up time of my program by over 20 seconds on my slow zPDT system. The default size of the cache is 16MB – one of my applications needed 100 MB, so most of the benefits of the shared classes could not be exploited if the defaults were used.

This blog post describes more information about this, and what tuning you can do.

Issuing commands to manage the shared classes cache

Commands to manage the shared classes cache are issued like

java -Xshareclasses:cacheDir=/tmp,name=client6,printStats

which can be done using JCL

//* Note: the later SET V overrides the earlier one; the last value set is used.
// SET V='listAllCaches'
// SET V='printStats'
// SET C='/tmp'
// SET N='client6'
//S1 EXEC PGM=BPXBATCH,REGION=0M,
// PARM='SH java -Xshareclasses:cacheDir=&C,name=&N,verbose,&V'
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*

Enabling share classes

You specify -Xshareclasses information as a parameter to the program, for example on the command line or in a jvm.options file.

To use the shared classes capability you have to specify all of the parameters on one line, like

-Xshareclasses:verbose,name=client6,cacheDirPerm=0777,cacheDir=/tmp

Having it like

-Xshareclasses:name=client6,cacheDirPerm=0777,cacheDir=/tmp
-Xshareclasses:verbose

means the name, etc. all take their defaults; only -Xshareclasses:verbose would be used.

Changing share classes parameters

You can have more than one cache; you specify a name. You specify a directory where an image is stored when the cache is hardened to disk.

Some of the options, like name= and cacheDir=, are picked up when the JVM starts. Other parameters, like cacheDirPerm, are only used when the cache is (re-)created.

You can delete the cache in two ways.

Delete the cache from your Java program

When you are playing around, you can add reset to the end of the -Xshareclasses string to cause the cache to be deleted and recreated. This gives output like

JVMSHRC010I Shared cache "client6" is destroyed
JVMSHRC158I Created shared class cache "client6"
JVMSHRC166I Attached to cache "client6", size=20971328 bytes

This was especially useful when tuning the storage allocations.

Delete the cache independently

java -Xshareclasses:cacheDir=/tmp,name=client6,destroy

How to allocate the size of the cache

You specify the storage allocations using -Xsc.. (where sc stands for shareclasses)

If you have -Xshareclasses:verbose… specified then when the JVM shuts down you get

JVMSHRC168I Total shared class bytes read=11660. Total bytes stored=5815522
JVMSHRC818I Total unstored bytes due to the setting of shared cache soft max is 0.
Unstored AOT bytes due to the setting of -Xscmaxaot is 1139078.
Unstored JIT bytes due to the setting of -Xscmaxjitdata is 131832.

This shows the values of maxaot and maxjitdata were too small. They were

-Xscmx20m
-Xscmaxaot2k
-Xscmaxjitdata2k

When the values were big enough I got

JVMSHRC168I Total shared class bytes read=12960204. Total bytes stored=8885038
JVMSHRC818I Total unstored bytes due to the setting of shared cache soft max is 0.
Unstored AOT bytes due to the setting of -Xscmaxaot is 0.
Unstored JIT bytes due to the setting of -Xscmaxjitdata is 0.

How big a cache do I need?

If you use -Xshareclasses:verbose… it will display messages

for example

JVMSHRC166I Attached to cache “client6”, size=2096960 bytes
JVMSHRC269I The system does not support memory page protection

JVMSHRC096I Shared cache “client6” is full. Use -Xscmx to set cache size.
JVMSHRC168I Total shared class bytes read=77208. Total bytes stored=2038042

Message JVMSHRC096I Shared cache "client6" is full. Use -Xscmx to set cache size tells you the cache is full – but gives no information about how big it needs to be.

You can use

java -Xshareclasses:cacheDir=/tmp,name=client6,printStats

to display statistics like

[-Xshareclasses persistent cache disabled]
[-Xshareclasses verbose output enabled]                                            
JVMSHRC159I Opened shared class cache "client6"                                    
JVMSHRC166I Attached to cache "client6", size=2096960 bytes                        
JVMSHRC269I The system does not support memory page protection                     
JVMSHRC096I Shared cache "client6" is full. Use -Xscmx to set cache size.          
                                                                                   
Current statistics for cache "client6": 
cache size                           = 2096592                       
softmx bytes                         = 2096592                       
free bytes                           = 0                             
ROMClass bytes                       = 766804                        
AOT bytes                            = 6992                          
Reserved space for AOT bytes         = -1                            
Maximum space for AOT bytes          = 1048576                       
JIT data bytes                       = 212                           
Reserved space for JIT data bytes    = -1                            
Maximum space for JIT data bytes     = 1048576                       
Zip cache bytes                      = 1131864                       
Startup hint bytes                   = 0                             
Data bytes                           = 13904                         
Metadata bytes                       = 12976                         
Metadata % used                      = 0%                            
Class debug area size                = 163840                        
Class debug area used bytes          = 119194                        
Class debug area % used              = 72%

Cache is 100% full  
                                                                             

This shows the cache is 100% full, and how much space is used for AOT and JIT. The default value of -Xscmx I had was almost 16MB. I made it 200MB and this was large enough.

I could not find a way of getting my program to issue printStats.

How do I harden the cache?

You can use the

java -Xshareclasses:cacheDir=/tmp,name=zosmf,verbose,snapshotCache

command to create the cache on disk. Afterwards the listAllCaches command gave

Cache name level        cache-type     feature 
client6    Java8 64-bit non-persistent cr
client6    Java8 64-bit snapshot       cr

This shows the non-persistent data space, and the snapshot file.

You can use restoreFromSnapshot to restore from the file to the data cache before you start your Java program. You would typically do this after an IPL.
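
Following the same pattern as the other commands above (assuming the cache is called client6 and its snapshot is in /tmp), the command would be something like

java -Xshareclasses:cacheDir=/tmp,name=client6,restoreFromSnapshot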

How can I tell what is going on and if shared classes is being used?

The Java option -verbose:dynload,class

reports on the

  • dynamic loading of the files, and processing them,
  • what classes are being processed.

For example

<Loaded java/lang/reflect/AnnotatedElement from /Z24A/usr/lpp/java/J8.0_64/lib/rt.jar>
< Class size 3416; ROM size 2672; debug size 0>
< Read time 1196 usec; Load time 330 usec; Translate time 1541 usec>
class load: java/lang/reflect/AnnotatedElement from: /Z24A/usr/lpp/java/J8.0_64/lib/rt.jar
class load: java/lang/reflect/GenericDeclaration from: /Z24A/usr/lpp/java/J8.0_64/lib/rt.jar

dynload gave

<Loaded java/lang/reflect/AnnotatedElement from /Z24A/usr/lpp/java/J8.0_64/lib/rt.jar>
< Class size 3416; ROM size 2672; debug size 0>
< Read time 1196 usec; Load time 330 usec; Translate time 1541 usec>

This tells you a jar file was read from the file system, and how long it took to process it.

class gave

class load: java/lang/reflect/AnnotatedElement from: /Z24A/usr/lpp/java/J8.0_64/lib/rt.jar
class load: java/lang/reflect/GenericDeclaration from: /Z24A/usr/lpp/java/J8.0_64/lib/rt.jar

This shows two classes were extracted from the jar file.

In a perfect system you will get the class load entries, but not <Loaded….

Even when I had a very large cache size, I still got dynload entries. These tended to be for loading individual class files rather than jar files.

For example there was a dynload entry for com/ibm/tcp/ipsec/CaApplicationProperties. This was file /usr/lpp/zosmf./installableApps/izuCA.ear/izuCA.war/WEB-INF/classes/com/ibm/tcp/ipsec/CaApplicationProperties.class

If you can make these into a .jar file you may get better performance. (But you may not get better performance, as it may take more time to load a large jar file).

I also noticed that there was dynload for com/ibm/xml/crypto/IBMXMLCryptoProvider which is in /Z24A/usr/lpp/java/J8.0_64/lib/ext/ibmxmlcrypto.jar, so shared classes has some deeper mysteries!

What happens if the .jar file changes?

As part of the class load, it checks the signature of the file on disk against the signature in the data space. If they are different, the data space will be updated.

“Why were the options I passed to Java ignored” – or “how to tell what options were passed to my Java?”

I was struggling to understand a problem with shared classes in Java and I found the options being used by my program were not as I expected.

I thought it would be a very simple task to display the options used at start up. It may be, but I could not find how to do it. If anyone knows the simple answer please tell me.

I found one way – take a dump! This seems a little extreme, but it was all I could find. With Liberty you can take a javacore dump (F IZUSVR1,JAVACORE) and display it, or you can take a dump at start up.

In the jvm.options I specified

-Xdump:java:events=vmstart

This gave me in //STDERR

JVMDUMP039I Processing dump event "vmstart", detail "" at 2021/05/20 13:19:06 – please wait.
JVMDUMP032I JVM requested Java dump using '/S0W1/var/…/javacore.20210520.131906.65569.0001.txt'
JVMDUMP010I Java dump written to /S0W1/var…/javacore.20210520.131906.65569.0001.txt
JVMDUMP013I Processed dump event "vmstart", detail "".

In this file was a list of all the options passed to the JVM

1CIUSERARGS UserArgs:
2CIUSERARG -Xoptionsfile=/usr/lpp/java/J8.0_64/lib/s390x/compressedrefs/options.default
2CIUSERARG -Xlockword:mode=default,noLockword=java/lang/String,noLockword=java/util/Ma
2CIUSERARG -Xjcl:jclse29
2CIUSERARG -Djava.home=/usr/lpp/java/J8.0_64
2CIUSERARG -Djava.ext.dirs=/usr/lpp/java/J8.0_64/lib/ext
2CIUSERARG -Xthr:tw=HEAVY
2CIUSERARG -Xshareclasses:name=liberty-%u,nonfatal,cacheDirPerm=1000,cacheDir=…
2CIUSERARG -XX:ShareClassesEnableBCI
2CIUSERARG -Xscmx60m
2CIUSERARG -Xscmaxaot4m
2CIUSERARG -Xdump:java:events=vmstart
2CIUSERARG -Xscminjitdata5m
2CIUSERARG -Xshareclasses:nonFatal
2CIUSERARG -Xshareclasses:groupAccess
2CIUSERARG -Xshareclasses:cacheDirPerm=0777
2CIUSERARG -Xshareclasses:cacheDir=/tmp,name=zosmf2
2CIUSERARG -Xshareclasses:verbose
2CIUSERARG -Xscmx100m

the storage limits

1CIUSERLIMITS User Limits (in bytes except for NOFILE and NPROC)
NULL ------------------------------------------------------------------------
NULL         type            soft limit  hard limit
2CIUSERLIMIT RLIMIT_AS       1831837696   unlimited
2CIUSERLIMIT RLIMIT_CORE        4194304     4194304
2CIUSERLIMIT RLIMIT_CPU       unlimited   unlimited
2CIUSERLIMIT RLIMIT_DATA      unlimited   unlimited
2CIUSERLIMIT RLIMIT_FSIZE     unlimited   unlimited
2CIUSERLIMIT RLIMIT_NOFILE        10000       10000
2CIUSERLIMIT RLIMIT_STACK     unlimited   unlimited
2CIUSERLIMIT RLIMIT_MEMLIMIT 4294967296  4294967296

and environment variables used.

1CIENVVARS Environment Variables
2CIENVVAR LOGNAME=IZUSVR
2CIENVVAR _CEE_ENVFILE_S=DD:STDENV
2CIENVVAR _EDC_ADD_ERRNO2=1
2CIENVVAR _EDC_PTHREAD_YIELD=-2
2CIENVVAR WLP_SKIP_MAXPERMSIZE=true
2CIENVVAR _BPX_SHAREAS=NO
2CIENVVAR JAVA_HOME=/usr/lpp/java/J8.0_64

All interesting stuff, including the -X.. parameters. I could see that the parameters I had specified were not being picked up because they were specified higher up! Another face-palm moment.

There was a lot more interesting stuff in the file, but this was not relevant to my little problems.

Once z/OSMF was active I took a dump using the f izusvr1,javacore command and looked at the information on the shared classes cache

1SCLTEXTCMST Cache Memory Status
1SCLTEXTCNTD Cache Name      Cache path
2SCLTEXTCMDT sharedcc_IZUSVR /tmp/javasharedresources/..._IZUSVR_G37
2SCLTEXTCPF Cache is 85% full
2SCLTEXTCSZ Cache size = 104857040
2SCLTEXTSMB Softmx bytes = 104857040
2SCLTEXTFRB Free bytes = 14936416

This is where I found the shared cache was not what I was expecting! I originally spotted that the cache was too small – and made it bigger.

Afterwards…

Remember to delete the javacore files.

I removed the -Xdump:java:events=vmstart statement, because I found it more convenient to use the f izusvr1,javacore command to take a dump when needed.