Using z/OSMF Network Configuration assistant for TCPIP, to define AT-TLS configuration

I initially found it hard to set up the AT-TLS configuration for MQ. The easiest way was to use the sample configurations provided by MQ. See here for an overview. I used Scenario 5 – Between an IBM MQ for z/OS queue manager and a client application running on IBM MQ for Multiplatforms.

Using the MQ samples, this took about 10 minutes once I had PAGENT and SYSLOGD set up.

I thought I would try to use the TCP provided facilities. There is a lot of documentation, but it is not easy to find what you need. It has been written as an IBM developer, rather than from an end user or task based perspective.

I then thought I would try to use the “way of the future” and the z/OS configuration tool z/OSMF. You use a browser to connect to z/OSMF and do your work through the browser interface. The z/OSMF interface has configuration tools, and workflow tools which guide you step by step through configuration.

I’ve blogged Using z/OSMF workflows for TCPIP. Using the workflow was not very successful.

The Network Configuration Assistance is used to configure the PAGENT, and I used it to define a AT-TLS configuration. Initially this was a struggle as there was no documentation to tell me how to do it. Once I had been through the configuration a couple of times, I found the correct path through the configuration progress and it is relatively painless.

My mission.

My mission was to configure AT-TLS and to provide two ports for use with MQ.

I wanted to do this using two people (me with two userids) and do the typical steps when changing systems, such as saving configurations before changing them, and deploying them, when I had a “change window”.

Using the Network configuration assistant (CA)

AT-TLS concepts

You need to be aware of the AT-TLS concepts when defining the configuration. From an administrator’s perspective:

  • What ports you want to protect? This is known by the CA as Traffic Descriptors. You can specify
    • An IP port
    • A range of IP ports
  • What IP addresses you want to protect.
    • The IP address. A TCP/IP stack can support different IP addresses. You can use a specific IP address. You can select on all IPV4, or all IPV6, or all addresses.
    • The name of a group of IP addresses. z/OSMF CA calls these Address Groups.
  • How do you want to protect the session. For example what levels of TLS, and what cipher specs do you want to use. This is known by the CA as Security levels.
  • The mapping of ports to protecting the session. z/OSMF calls this Requirement Maps.
  • You configure at the TCPIP stack level.
  • z/OSMF has groups of z/OS instances, with one or more z/OS instances, and you can have multiple TCPIP stacks in a z/OS instance.

Backing store

The configuration assistant(CA), stores configuration in a backing store. You can use tools to copy the current store. I found a copy of the file in /global/zosmf/data/app/CAV2R4/backingStore/save Data. I should be able to use standard backup procedures to keep a copy of the file. The resulting configuration is created in a file which is used by PAGENT.

You can copy a backing store within the CA, and so go back to the original one if you need to.

Before you start.

You should collect the information that you will be used to configure PAGENT. For example

  • Which systems and IP stacks will be used.
  • Which keyrings and certificates will be used?
  • For each port you want to protect.
  • What rules do you want, for example which cipher specs.

I found the terms used when creating the rules manually – did not map to the CA concepts, but once you understood the difference in terminology it was ok.

How to define the resources

If you define the configuration bottom up. You define all of the rules, then when you get to configure the TCPIP stack, the rules and other components should all be there, you just have to select them.

If you define the configuration top down, you define the TCPIP stack, then the rules for the stack. You have to define the TCPIP stack, then the rules, then go back and reconfigure the TCPIP stack to add the rules.

I think bottom up configuration is better while you gain experience of the tool. Once you are familiar with the tool then the top down approach will work ok, and may be how you update the configuration.

Getting started

  • Double click on the Network Configuration Assistant icon.
  • You get a page Welcome to V2R4 Configuration Assistant for z/OS Communications Server. On this page you can specify the name of the backing store. The pull down lists the existing backing stores. If you do not have a backing store create one. You can use “tools”button to copy the backing store.
  • The “getting started” gives you information on how to use the panels. I found it a little confusing at times. It displays the help in a separate window. In the Table of Contents, it has “AT/TLS – getting started”. I didn’t find the help or tutorials much use.
  • On the Welcome page, press Proceed.
    • I sometimes get “The backing store is locked by your id.” I got this after I had shutdown down z/OSMF without logging off.
    • You can use “Tools” to manage your backing store, and configuration.
    • “Select a TCP/IP technology to configure” : AT-TLS
    • The layout of the panels, make me think you create the definitions from top to bottom, and so the tabs are defined left to right. I think it is easier to define resources then create the group/image/stack.

Define the rules for which ports to be protected

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Traffic Descriptors tab.

  • Actions -> New…
  • Name COLINTD
  • Actions-> New…
  • Under the Details tab, specify the port or port range and any other information
  • Under the KeyRing tab, specify the keyring and the Certificate label or let it default to the TCPIP stack level keyring.
  • Under the Advanced tab, I let everything default.
  • Click OK
  • You can define a second port within this Traffic Descriptor
  • Click OK

You can press Save to save the current definitions.

Define which IP addresses you want to protect (optional)

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Address Groups tab.

By default it has

  • All_IPv4_Addresses
  • All_IPv6_Addresses
  • All_IP_Addresses

A TCPIP stack can host different IP addresses, one for each connector coming in. If you want to limit rules to particular stack owned IP addresses, create a definition.

  • Actions-> New
  • Name: COLINAG
  • IP Address: 10.1.1.2
  • IP Address: 10.1.1.3
  • OK

You can press Save to save the current definitions.

How do you want to protect the session.

For example what levels of TLS, and what cipher specs do you want to use.

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Security Levels tab.

  • Actions: -> New…
  • Name: COLINSL
  • Select levels of TLS you want to use
  • Next, then select cipher specs. I used “Use 2019 suggested values”
  • Next – I took the default (“Do not generate FIPS 140 support“)
  • Click on Advanced settings.
    • If you want to use client authentication click the “Use client authentication” box
    • OK
  • Finish

Your definition should be in the list.

You can press Save to save the current definitions.

Mapping of ports to Session protection

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Requirement Maps tab.

  • Actions: -> New…
  • Name: COLINMAP .
  • In the mappings table,
    • use the Traffic Descriptor pull down and select the Traffic Descriptor you created above. For example COLINTD.
    • Under Security Level pull down select the security definition you created above. For example COLINSL.
  • OK

If I changed an existing definition, I had a pop-up

Modify Requirement Map.
The requirement map you are changing may be referenced in at least one connectivity rule.

Prior to making this change you may want to see which connectivity rules are referencing this requirement map. Click OK to show where used. Click Proceed to proceed with the Modify; otherwise, click Cancel.

Click OK to show where it is used.

Click Proceed

You can press Save to save the current definitions.

Create the group, z/OS instance and TCPIP Stack

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Systems tab.

  • Action: -> Add z/OS group…
  • Name: COLINGR
  • Click OK
  • Action: -> Add z/OS system image…
  • Name: COLMVSA
  • Keyring: START1.COLINRING
  • Press OK
  • I get a pop-up Proceed to the next step? Connectivity rules are configured for each TCP/IP stack. To continue with configuration you need to add a TCP/IP stack to the new z/OS system image. Do you want to add a TCP/IP stack now? Click on Proceed.
  • This gets to the Add TCP/IP stack
  • Name:TCPIP
  • OK
  • I get a pop-up. Proceed to the next Step? o continue with the configuration you should add connectivity rules to the TCP/IP stack. Do you want to be directed to the TCP/IP stack rules panel? Proceed.
    • If you cancel you can use the Actions -> rules to define the rules.
  • I get a pop-up Proceed to the Next Step? Do you want to start a wizard to create a connectivity rule? Click Proceed.
  • This gets to the Data End points where you associate the IP addresses to the stack instance.
  • Name: COLINRN
  • Select from the address group pull-down, or let it default.
  • Press Next
  • This gets to the Requirement Mapping to Stack association.
    • You can select from the existing requirements map see Mapping of ports to Session protection above, or create a new mapping.
    • You can create a new map, for example Name: COLINMP
      • Select from the Traffic Descriptor pull down
      • Select from the Security level pull down.
  • Press Next
  • You can specify advanced settings, such as Tracing, Tuning, Environment, Effective times, Handshake
  • Finish
  • Close

You can press Save to save the current definitions.

Join the bits up

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Systems tab.

  • Select a group instance
  • Actions: Install All files for this group
  • This will list the configuration files.
  • On the Configured File Name,
    • right click on the file name value, and ->Show Configuration File. This will show you the configuration as it might be deployed.
    • right click on the file name value and -> Install … . Specify the a file name and click GO.
    • Close
  • You can use
    • Actions: Install to create the configuration file
    • Actions: Show configuration file to see the generated configuration

You can now use the configuration file as input to PAGENT.

You can press Save to save the current definitions.

Extending the configuration to add a new rule

It took a while to work out how to do this, but it is really easy.

In the page Network Configuration Assistant (Home) -> AT-TLS page, click on the Traffic Descriptors tab.

  • Create a new Traffic descriptor as above
  • Get back to Network Configuration Assistant (Home) -> AT-TLS page, and click on the Systems tab.
  • Select a TCPIP instance, and click Actions: -> Rules..
  • Actions: -> New
  • Connectivity rule name: rule2
  • Press Next
  • You can select from the existing requirements map see Mapping of ports to Session protection above, or create a new mapping. If you have just created a new rule, then you may not have defined a mapping, and create it “inline”.

Then install it.

Use the configuration

You need to change the PAGENT JCL to use the created configuration file. You may want to copy it to a backup, as the next time you reconfigure it can overwrite the file. Or just create a new file perhaps with a date in the filename.

If you have problems with a newly reconfigured file you need a copy of the previous, working, definitions.

Display the configuration

On many items, you can use right click -> Show where used. This will then display the group, image, stack, connectivity rules and data end points where the item is used.

Should I use this just to get started, or every time.

When I created my definitions by hand, I could put definitions in to a “Common” section, and have multiple TCPIP stacks in one configuration file. I could have small files with bits of configuration in them.

If you use the CA, “common” definitions are copied into the configuration file, and you have one configuration file per TCPIP stack instance, so you do not need to have a common section etc.

As a configuration tool, now I know how to use it, I might continue to use it – but it is slightly more complex than this.

I want to enable trace for one definition. To do so means I have to…

  • Change the configuration to set the trace. This can be difficult if someone else is in the middle of changing the configuration.
  • Deploy the whole configuration. You may pick up incomplete changes which have been made, but not deployed.
  • If a second TCPIP stack is using the configuration, this may get trace enabled if the configuration file is recreated.

Overall (my views may change tomorrow), I would use the CA to create my configuration. Then not use it again – or use it again to generate definitions which I can copy into my master configuration files. I would restructure the configuration so create small files with specific content.

Using z/OSMF workflows for TCPIP.

I found it hard to set up the AT-TLS configuration for MQ. The easiest way was to use the sample configurations provided by MQ. See here for an overview. I used Scenario 5 – Between an IBM MQ for z/OS queue manager and a client application running on IBM MQ for Multiplatforms.

This took about 10 minutes once I had PAGENT and SYSLOGD set up.

I thought I would try to use the TCP provided facilities. There is a lot of documentation, but it is not easy to find what you need. It has been written as an IBM developer, rather than from an end user perspective.

I then thought I would try to use the “way of the future” and the z/OS configuration tool z/OSMF. You use a browser to connect to z/OSMF and do your work through the browser interface. The z/OSMF interface has configuration tools, and workflow tools which guide you step by step through configuration.

I found using the workflow tools was harder than using the TCPIP documentation and TCPIP samples, and I would not recommend its use.

Ive blogged Using z/OSMF Network Configuration assistant for TCPIP, to define AT-TLS configuration. Which worked.

The workflow stuff makes the easy bit “easier”, but does not help with the hard stuff. An improvement would be to skip the workflow, and have one page of instructions saying copy samples into Proclib, and Unix; run a RACF job. We could do with a workflow to help configure syslogd, which I had a struggle to get working in a non trivial situation. For example having error messages for PAGENT go to one file, and have the TLS trace go into another file.

My mission.

My mission was to configure AT-TLS and to provide two ports for use with MQ.

I wanted to do this using two people (me with two userids) and do the typical steps when changing systems, such as saving configurations before changing them, and deploying them, when I had a “change window”.

Initial steps

z/OSMF provides facilities like ISPF, Workload management configuration, system status etc. I used Workflow.

It was hard to know where to start. I assumed (wrongly) that there would be a workflow to define the AT-TLS definitions.

It seems you use Workflow to define the PAGENT and syslogd JCL, and not for configuring the PAGENT or syslogd.

Instructions to use Workflow to configure TCPIP JCL procedures

  • Double click the workflow icon.
  • From the actions pull down, select Create workflow…
  • You need to select Workflow definition file: I could not find what I had to specify. There was no prompting. The “?” basically said “put a value here”. The help key just gave me a panel with information about using creating a workflow.
  • I found an IBM support document which says
    • Workflows for Policy-based Networking
    • ezb_pagent_setup_wizard.xml – This workflow provides the steps for setting up the Policy Agent (Pagent). Pagent is required for all of the policy-based networking technologies: IPSec, AT-TLS, IDS, PBR, and QoS. Pagent uses syslogd for logging.
    • ezb_syslogd_setup_wizard.xml – This workflow provides the steps for setting up syslogd.
    • ezb_tcpip_profile_sample_wizard.xml – This workflow provides a sample TCP/IP profile which contains common statements required to enable AT-TLS and IP Security, and additionally includes port reservation statements for running daemons.
  • I had to use the fully qualified filename /usr/lpp/zosmf/workflow/plugins/izuca/ezb_syslogd_setup_wizard.xml
  • This came up with an error in the workflow name because the default name has ‘z/OS… ‘ and ‘/’ is not a valid character. I removed the ‘/’.
  • At the bottom of the page you can Assign all steps to owner user id. I did not do this, and had to assign steps below
  • You get a list of steps that need to be done.
  • Assign the work to a userid
    • Select all of the steps, and use Actions-> Assignment and ownership -> Add assignees.
    • This displays the assigned roles. I used Actions -> add to add my SAF userid. I pressed OK and returned to the list of steps – all now assigned to me.
  • I selected the first step “define the “RACF userid for Syslogd”, Actions -> Accept .
  • Click on the task, and it gives you a window with tabs. The important tab is Perform. If this is greyed out, you have not accepted the task!
    • Fill in the details and click Next, Next etc. You can edit the contents.
    • You can save it – but you need to give a data set. It suggested SYS1(SYSLOGD). I had to change it (every time) to COLIN.ATTLS(…)
    • Next – gives you the save panel. You have to specify the dataset where you want to save it. The default was wrong for me.
    • Once saved you have to submit it manually, check the output, and edit the file if needed.
  • Back at the workflow details, it had step 1 complete (even though you may not have submitted it)
  • I accepted step 2 and started working on it.
    • It asks for Dataset HLQ – but I could not change it.
    • I stepped through the definitions – and had to type in my dataset again (why can’t it remember what I specified last time).
    • This step just creates a job with some RACF definitions in it.
  • I ran step 3 -again just creating a JCL member of definitions
  • Step 4 “Sample Syslogd Configuration Setup“. This just copies in a sample configuration.
    • “Save” did not do anything
  • Step 5 “Sample started procedure for Syslogd” creates a sample Procedure.
  • On the workflows page, it shows the workflow is 100% complete.

Having been through all of this, the create JCL did not run, one line in error was

//SYSLOGD PROC PROG=”,
// VARS=”,
// PARMS=”
//SYSLOGD EXEC PGM=&PROG., REGION=0K,TIME=NOLIMIT,
// PARM=(‘POSIX(ON) ALL31(ON)’,
// ‘ENVAR(“_CEE_ENVFILE=DD:VARS”)’,
// ‘/&PARMS.’)

  • &PROG had not been specified – you gave to go and find what you need to specified (SYSLOGD)
  • There is a blank after the &PROG., so the REGION=0K,TIME=NOLIMIT, is ignored
  • The location of the configuration (in &VARs) is not specified.

Create the PAGENT JCL

I followed the same process to create the PAGENT file.

I used file /usr/lpp/zosmf/workflow/plugins/izuca/ezb_pagent_setup_wizard.xml.

When this JCL ran, it produced messages

06/16 08:00:20 SYSERR :000: …plfm_config_medium_open: cannot open ‘/etc/pagent.conf’, errno EDC5129I No such file or directory.

You have to know to copy the configuration file from the PDS to /etc/pagent.conf.

Comments on using the workflows

This seems a lot of work to produce code which does not work. The process feels unloved. I am surprised that the problems I found have not been fixed – they are Unit Test level bugs.

I think it is far simpler to follow the documentation, for example to create the procedure. The documentation says

Update the cataloged procedure, syslogd, by copying the sample in SEZAINST(SYSLOGD) to your system or recognized PROCLIB. Specify syslogd parameters and change the data set names to suit your local configurtion See the syslog daemon section of SEZAINST(EZARACF) for SAF considerations for started procedures

The instructions could be on one side of paper, and would be quicker than using the workflow.

z/OSMF autostart: how to stop it, and how to use it (or not)

I upgraded my z/OS from ADCD Z24A to ADCD Z24C. This has updates to lots of the software, including z/OSMF. This includes some performance fixes, so z/OSMF start up is much quicker and uses much less CPU. However the newer level of ADCD Z24C now starts z/OSMF automatically. It took a few attempts to stop this.

When z/OS starts, it takes configuration parameters from IEASYSxx. You can see which IEASYSxx you are using with the DISPLAY IPLINFO operator command. You can see which IZU parameter you are using with

d iplinfo,izu
IEE255I SYSTEM PARAMETER ‘IZU’: AS

With the DISPLAY PARMLIB command, you get the parmlib concatenation

D PARMLIB
IEE251I 08.34.02 PARMLIB DISPLAY
PARMLIB DATA SETS SPECIFIED AT IPL
ENTRY FLAGS VOLUME DATA SET
    1   S   C4CFG1 USER.Z24C.PARMLIB
    2   S   C4CFG1 FEU.Z24C.PARMLIB
    3   S   C4SYS1 ADCD.Z24C.PARMLIB
    4   S   C4RES1 SYS1.PARMLIB

Where the ‘S’ means it came from a LOADxx parameter. A ‘D’ means Default SYS1.PARMLIB.

Look in each data set in turn for the IZUPRMxx member (xx=AS in my case).

Contents of the IZUPRMxx member

Within the member is SERVER_PROC(‘IZUSVR1’) This tells the IPL code which server to start.

Within the member is line with AUTOSTART(…). The value can be

  • CONNECT – I think of this as AUTOSTART(NO)
  • LOCAL – I think of this as AUTOSTART(MAYBE)

See here for a discussion.

It is a bit more complex than YES|NO. It has capability to allow one of a group of z/OSMF servers to start.

If you have AUTOSTART(CONNECT) specify AUTOSTART_GROUP(NONE).

If you have AUTOSTART(LOCAL) and AUTOSTART_GROUP(COLIN) for more than one IZU servers. Then at IPL it checks to see if a Z/OSM server with AUTOSTART(LOCAL) and AUTOSTART_GROUP(COLIN) is already active. If so – the instance does not start.

The documentation says it checks by having an ENQ on the file system with the AUTOSTART_GROUP value. This implies you need the z/OSMF data directories to be on the same ZFS file system.

Should I use autostart?

This is a tough question. I cannot test it because I only have one LPAR, but I have some thoughts.

Single LPAR, single Z/OSMF instance

This is relatively easy. You can start z/OSMF automatically though commands at IPL, or you can use the z/OSMF IZUPRMxx method, or start it manually.

Multiple LPARs in a sysplex, single Z/OSMF instance.

If you have a shared file system, you can start the z/OSMF instance on any LPAR. If you start the instance more than once, it detects this and will only allow one instance to be active.

You have to plan to be able to starting an instance on different systems. For example the IP address and port for the base system will be different. You’ll need to set up a TCP/IP environment to support this. See HA Liberty web server – introduction to using VIPA to provide high availability connectivity and the z/OSMF documentation

Multiple LPARs in a sysplex, multiple z/OSMF instances.

This is where the autostart may be useful. The first LPAR to be started will start the z/OSMF instance. When other LPARs start, they detect that another z/OSMF in the group is active, and will not start the z/OSMF instance. As with starting a single z/OSMF instance in a multi LPAR environment, you need to plan the connectivity. See HA Liberty web server – introduction to using VIPA to provide high availability connectivity and the z/OSMF documentation.

I struggle to see why starting just one instance is useful. For availability I would want more than once instances to be running at the same time. With only one instance. If you stop it, and restart on a different LPAR, you have a period of a minute or more where you do not have z/OSMF running.

I would have a group_token, so each instance can register the “group name” is active. An application can then ask to be notified when a member of the group becomes active, using standard z/OS services.

Stateless z/OSMF instances

If you are using z/OSMF facilities which save state, the autostart of just one server will not work. For example if you are using any workflow facilities, state is saved in the file system. You need to logon to the same instance to be able to continue working on the workflow. If today you run on LPARA’s z/OSMF and tomorrow you run on LPARB’s z/OSMF you cannot do your workflow.

You need to plan your z/OSMF usage and plan to have “stateless” z/OSMF servers which can use AUTOSTART; and workflow servers – for which you have only one instance (which can be moved around) and do not use autostart.

One Minute MVS – tuning stack and heap pools

These days many applications use a stack and heap to manage storage used by an application. For C and Cobol programs on z/OS these use the C run time facilities. As Java uses the C run time facilities, it also uses the stack and heap.

If the stack and heap are not configured appropriately it can lead to an increase in CPU. With the introduction of 64 bit storage, tuning the heap pools and stack is no longer critical. You used to have to carefully manage the stack and heap pool sizes so you didn’t run out of storage.

The 5 second information on what to check, is the number of segments freed for the stack and heap should be zero. If the value is large then a lot of CPU is being used to manage the storage.

The topics are

Kinder garden background to stack.

When a C (main) program starts, it needs storage for the variables uses in the program. For example

int i;
for (ii=0;ii<3:ii++)
{}

char * p = malloc(1024);

The variables ii and p are variables within the function, and will be on the functions stack. p is a pointer.

The block of storage from the malloc(1024) will be obtained from the heap, and its address stored in p.

When the main program calls a function the function needs storage for the variables it uses. This can be done in several ways

  1. Each function uses a z/OS GETMAIN request on entry, to allocate storage, and a z/OS FREEMAIN request on exit. These storage requests are expensive.
  2. The main program has a block of storage which functions can use. For the main program uses bytes 0 to 1500 of this block, and the first function needs 500 bytes, so uses bytes 1501 to 2000. If this function calls another function, the lower level function uses storage from 2001 on wards. This is what usually happens, it is very efficient, and is known as a “stack”.

Intermediate level for stack

It starts to get interesting when initial block of storage allocated in the main program is not big enough.

There are several approaches to take when this occurs

  1. Each function does a storage GETMAIN on entry, and FREEMAIN on exit. This is expensive.
  2. Allocate another big block of storage, so successive functions now use this block, just like in the kinder garden case. When functions return to the one that caused a new block to be allocated,
    1. this new block is freed. This is not as expensive as the previous case.
    2. this block is retained, and stored for future requests. This is the cheapest case. However a large block has been allocated, and may never be used again.

How big a block should it allocate?

When using a stack, the size of the block to allocate is the larger of the user specified size, and the size required for the function. If the specified secondary size was 16KB, and a function needs 20KB of storage, then it will allocate at least 20KB.

How do I get the statistics?

For your C programs you can specify options in the #PRAGMA statement or, the easier way, is to specify it through JCL. You specify C run time options through //CEEOPTS … For example

//CEEOPTS DD *
STACK(2K,12K,ANYWHERE,FREE,2K,2K)
RPTSTG(ON)

Where

  • STACK(…) is the size of the stack
  • RPTSTG(ON) says collect and display statistics.

There is a small overhead in collecting the data.

The output is like:

STACK statistics:                                                
  Initial size:                                2048     
  Increment size:                             12288     
  Maximum used by all concurrent threads:  16218808     
  Largest used by any thread:              16218808     
  Number of segments allocated:                2004     
  Number of segments freed:                    2002     

Interpreting the stack statistics

From the above data

  • This shows the initial stack size was 2KB and an increment of 12KB.
  • The stack was extended 2004 times.
  • Because the statement had STACK(2K,12K,ANYWHERE,FREE,2K,2K), when the secondary extension became free it was FREEMAINed back to z/OS.

When KEEP was used instead of FREE, the storage was not returned back to z/OS.

The statistics looked like

STACK statistics:                                                
  Initial size:                               2048     
  Increment size:                            12288     
  Maximum used by all concurrent thread:  16218808     
  Largest used by any thread:             16218808     
  Number of segments allocated:               1003     
  Number of segments freed:                      0     

What to check for and what to set

For most systems, the key setting is KEEP, so that freed blocks are not released. You can see this a) from the definition b) Number of segments freed is 0.

If a request to allocate a new segment fails, then the C run time can try releasing segments that are not in use. If this happens the “”segments freed” will be incremented.

Check that the “segments freed” is zero, and if not, investigate why not.

When a program is running for a long time, a small number of “segments allocated” is not a problem.

Make the initial size larger, closer to the “Largest used of any thread” may improve the storage utilisation. With smaller segments there is likely to be unused space, which was too small for a functions request, causing the next segment to be used. So a better definition would be

STACK(16M,12K,ANYWHERE,KEEP,2K,2K)

Which gave

STACK statistics:                                                          
  Initial size:                                     16777216               
  Increment size:                                      12288               
  Maximum used by all concurrent threads:           16193752               
  Largest used by any thread:                       16193752               
  Number of segments allocated:                            1               
  Number of segments freed:                                0               

Which shows that just one segment was allocated.


Kinder garden background to heap

When there is a malloc() request in C, or a new … in Java, the storage may exist outside of the function. The storage is obtained from the heap.

The heap has blocks of storage which can be reused. The blocks may all be of the same size, or or different sizes. It uses CPU time to scan free blocks looking for the best one to reuse. With more blocks it can use increasing amounts of CPU.

There are heap pools which avoids the costs of searching for the “right” block. It uses a pools of blocks. For example:

  1. there is a heap pool with 1KB fixed size blocks
  2. there is another heap pool with 16KB blocks
  3. there is another heap pool with 256 KB blocks.

If there is a malloc request for 600 bytes, a block will be taken from the 1KB heap pool.

If there is a malloc request for 32KB, a block would be used from the 256KB pool.

If there is a malloc request for 512KB, it will issue a GETMAIN request.

Intermediate level for heap

If there is a request for a block of heap storage, and there is no free storage, a large segment of storage can be obtained, and divided up into blocks for the stack. If the heap has 1KB blocks, and a request for another block fails, it may issue a GETMAIN request for 100 * 1KB and then add 100 blocks of 1KB to the heap. As storage is freed, the blocks are added to the free list in the heap pool.

There is the same logic as for the stack, about returning storage.

  1. If KEEP is specified, then any storage that is released, stays in the thread pool. This is the cheapest solution.
  2. If FREE is specified, then when all the blocks in an additional segment have been freed, then free the segment back to the z/OS. This is more expensive than KEEP, as you may get frequent GETMAIN and FREEMAIN requests.

How many heap pools do I need and of what size blocks?

There is usually a range of block sizes used in a heap. The C run time supports up to 12 cell sizes. Using a Liberty Web server, there was a range of storage requests, from under 8 bytes to 64KB.

With most requests there will frequently be space wasted. If you want a block which is 16 bytes long, but the pool with the smallest block size is 1KB – most of the storage is wasted.
The C run time gives you suggestions on the configuration of the heap pools, the initial size of the pool and the size of the blocks in the pool.

Defining a heap pool

How to define a heap pool is defined here.

You specify the size of overall size of storage in the heap using the HEAP statement. For example for a 16MB total heap size.

HEAP(16M,32768,ANYWHERE,FREE,8192,4096)

You then specify the pool sizes


HEAPPOOL(ON,32,1,64,2,128,4,256,1,1024,7,4096,1,0)

The figures in bold are the size of the blocks in the pool.

  • 32,1 says maximum size of blocks in the pool is 32 bytes, allocate 1% of the heap size to this pool
  • 64,2 says maximum size of blocks in the pool is 64 bytes, allocate 2% of the heap size to this pool
  • 128,4 says maximum size of blocks in the pool is 128 bytes, allocate 4% of the heap size to this pool
  • 256,1 says maximum size of blocks in the pool is 256 bytes, allocate 1% of the heap size to this pool
  • 1024,7 says maximum size of blocks in the pool is 1024 bytes, allocate 7% of the heap size to this pool
  • 4096,1 says maximum size of blocks in the pool is 4096 bytes, allocate 1% of the heap size to this pool
  • 0 says end of definition.

Note, the percentages do not have to add up to 100%.

For example, with the CEEOPTS

HEAP(16M,32768,ANYWHERE,FREE,8192,4096)
HEAPPOOLS(ON,32,50,64,1,128,1,256,1,1024,7,4096,1,0)

After running my application, the data in //SYSOUT is


HEAPPOOLS Summary:                                                         
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In 
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use      
  ------------------------------------------------------------------------ 
       32        40    50      209715           0           0           0 
       64        72      1        2330           1        1002           2 
      128       136      1        1233           0           0           0 
      256       264      1         635           0           0           0 
     1024      1032      7        1137           1           2           0 
     4096      4104      1          40           1           1           1 
  ------------------------------------------------------------------------ 

For the cell size of 32, 50% of the pool was allocated to it,

Each block has a header, and the total size of the 32 byte block is 40 bytes. The number of 40 bytes units in 50% of 16 MB is 8MB/40 = 209715, so these figures match up.

(Note with 64 bit heap pools, you just specify the absolute number you want – not a percentage of anything).

Within the program there was a loop doing malloc(50). This uses cell pool with size 64 bytes. 1002 blocks(cells) were used.

The output also has

Suggested Percentages for current Cell Sizes:
HEAPP(ON,32,1,64,1,128,1,256,1,1024,1,4096,1,0)


Suggested Cell Sizes:
HEAPP(ON,56,,280,,848,,2080,,4096,,0)

I found this confusing and not well documented. It is another of the topics that once you understand it it make sense.

Suggested Percentages for current Cell Sizes

The first “suggested… ” values are the suggestions for the size of the pools if you do not change the size of the cells.

I had specified 50% for the 32 byte cell pool. As this cell pool was not used ( 0 allocated cells) then it suggests making this as 1%, so the suggestion is HEAPP(ON,32,1

You could cut and paste this into you //CEEOPTS statement.

Suggested Cell Sizes

The C run times has a profile of all the sizes of blocks used, and has suggested some better cell sizes. For example as I had no requests for storage less than 32 bytes, making it bigger makes sense. For optimum storage usage, it suggests of using sizes of 56, 280,848,2080,4096 bytes.

Note it does not give suggested number of blocks. I think this is poor design. Because it knows the profile it could have an attempt at specifying the numbers.

If you want to try this definition, you need to add some values such as

HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

Then rerun your program, and see what percentage figures it recommends, update the figures, and test again. Not the easiest way of working.

What to check for and what to set

There can be two heap pools. One for 64 bit storage ( HEAPPOOL64) the other for 31 bit storage (HEAPPOOL).

The default configuration should be “KEEP”, so any storage obtained is kept and not freed. This saves the cost of expensive GETMAINS and FREEMAINs.

If the address space is constrained for storage, the C run time can go round each heap pool and free up segments which are in use.

The value “Number of segments freed” for each heap should be 0. If not, find out why (has the pool been specified incorrectly, or was there a storage shortage).

You can specify how big each pool is

  • for HEAPPOOL the HEAP size, and the percentage to be allocated to each pool – so two numbers to change
  • for HEAPPOOL64 you specify the size of each pool directly.

The sizes you specify are not that sensitive, as the pools will grow to meet the demand. Allocating one large block is cheaper that allocating 50 smaller blocks – but for a server, this different can be ignored.

With a 4MB heap specified

HEAP(4M,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

the heap report was

 HEAPPOOLS Summary: 
   Specified Element   Extent   Cells Per  Extents    Maximum      Cells In 
   Cell Size Size      Percent  Extent     Allocated  Cells Used   Use 
   ------------------------------------------------------------------------ 
        56        64      1         655           2        1002           2 
       280       288      1         145           1           1           0 
       848       856      1          48           1           1           0 
      2080      2088      1          20           1           1           1 
      4096      4104      1          10           0           0           0 
   ------------------------------------------------------------------------ 
   Suggested Percentages for current Cell Sizes: 
     HEAPP(ON,56,2,280,1,848,1,2080,1,4096,1,0) 

With a small(16KB) heap specified

HEAP(16K,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)

The output was

HEAPPOOLS Summary:                                                            
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In    
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use         
  ------------------------------------------------------------------------    
       56        64      1           4         251        1002           2    
      280       288      1           4           1           1           0    
      848       856      1           4           1           1           0    
     2080      2088      1           4           1           1           1    
     4096      4104      1           4           0           0           0    
  ------------------------------------------------------------------------    
  Suggested Percentages for current Cell Sizes:                               
    HEAPP(ON,56,90,280,2,848,6,2080,13,4096,1,0)                             

and we can see it had to allocate 251 extents for all the request.

Once the system has “warmed up” there should not be a major difference in performance. I would allocate the heap to be big enough to start with, and avoid extensions.

With the C run time there are heaps as well as heap pools. My C run time report gave

64bit User HEAP statistics:
31bit User HEAP statistics:
24bit User HEAP statistics:
64bit Library HEAP statistics:
31bit Library HEAP statistics:
24bit Library HEAP statistics:
64bit I/O HEAP statistics:
31bit I/O HEAP statistics:
24bit I/O HEAP statistics:

You should check all of these and make the initial size the same as the suggested recommended size. This way the storage will be allocated at startup, and you avoid problems of a request to expand the heap failing due to lack of storage during a buys period.

Advanced level for heap

While the above discussion is suitable for many workloads, especially if they are single threaded. It can get more complex when there are multiple thread using the heappools.

If you have a “hot” or highly active pool you can get contention when obtaining and releasing blocks from the heap pool. You can define multiple pools for an element size. For example

HEAPP(ON,(56,4),1,280,1,848,1,2080,1,4096,1,0)

The (56,4) says make 4 pools with block size of 56 bytes.

The output has

HEAPPOOLS Summary:                                                          
  Specified Element   Extent   Cells Per  Extents    Maximum      Cells In  
  Cell Size Size      Percent  Extent     Allocated  Cells Used   Use       
  ------------------------------------------------------------------------  
       56       64     1           4         251        1002           2  
       56       64     1           4           0           0           0  
       56       64     1           4           0           0           0  
       56       64     1           4           0           0           0  
      280       288      1           4           1           1           0  
      848       856      1           4           1           1           0  
     2080      2088      1           4           1           1           1  
     4096      4104      1           4           0           0           0  
  ------------------------------------------------------------------------  

We can see there are now 4 pools with cell size of 56 bytes. The documentation says Multiple pools are allocated with the same cell size and a portion of the threads are assigned to allocate cells out of each of the pools.

If you have 16 threads you might expect 4 threads to be allocated to each pool.

How do you know if you have a “hot” pool.

You cannot tell from the summary, as you just get the maximum cells used.

In the report is the count of requests for different storage ranges.

Pool  2     size:   160 Get Requests:           777707 
  Successful Get Heap requests:    81-   88                 77934 
  Successful Get Heap requests:    89-   96                 59912 
  Successful Get Heap requests:    97-  104                 47233 
  Successful Get Heap requests:   105-  112                 60263 
  Successful Get Heap requests:   113-  120                 80064 
  Successful Get Heap requests:   121-  128                302815 
  Successful Get Heap requests:   129-  136                 59762 
  Successful Get Heap requests:   137-  144                 43744 
  Successful Get Heap requests:   145-  152                 17307 
  Successful Get Heap requests:   153-  160                 28673
Pool  3     size:   288 Get Requests:            65642  

I used ISPF edit, to process the report.

By extracting the records with size: you get the count of requests per pool.

Pool  1     size:    80 Get Requests:           462187 
Pool  2     size:   160 Get Requests:           777707 
Pool  3     size:   288 Get Requests:            65642 
Pool  4     size:   792 Get Requests:            18293 
Pool  5     size:  1520 Get Requests:            23861 
Pool  6     size:  2728 Get Requests:            11677 
Pool  7     size:  4400 Get Requests:            48943 
Pool  8     size:  8360 Get Requests:            18646 
Pool  9     size: 14376 Get Requests:             1916 
Pool 10     size: 24120 Get Requests:             1961 
Pool 11     size: 37880 Get Requests:             4833 
Pool 12     size: 65536 Get Requests:              716 
Requests greater than the largest cell size:               1652 

It might be worth splitting Pool 2 and seeing if makes a difference in CPU usage at peak time. If it has a benefit, try Pool 1.

You can also sort the “Successful Heap requests” count, and see what range has the most requests. I don’t know what you would use this information for, unless you were investigating why so much storage was being used.

Ph D level for heap

For high use application on boxes with many CPUs you can get contention for storage at the hardware cache level.

Before a CPU can use storage, it has to get the 256 byte cache line into the processor cache. If two CPU’s are fighting for storage in the same 256 bytes the throughput goes down.

By specifying

HEAPP(ALIGN….

It ensures each block is isolated in its own cache line. This can lead to an increase in virtual storage, but you should get improved throughput at the high end. It may make very little difference when there is little load, or on an LPAR with few engines.

I’m sorry I haven’t a clue…

As well as being a very popular British comedy, it is how I sometimes feel about what is happening inside the Liberty Web servers, and products like z/OSMF, z/OS Connect and MQWEB. It feels like a spacecraft in cartoons – there are usually only two controls – start and stop.

One reason for this is that the developers often do not have to use the product in production, and have not sat there, head in hand saying “what is going on ?”.

In this post I’ll cover

What data values to expose

As a concept, if you give someone a lever to pull – you need to give them a way of showing the effect of pulling the level.

If you give someone a tuning parameter, they need to know the impact of using the tuning parameter. For example

  • you implement a pool of blocks of storage.
  • you can configure the number of maximum number of blocks
  • if a thread needs some storage, and there is a free block in the pool, then assign the block to the thread. When the thread has finished with it, the thread goes back into the pool.
  • if all the blocks in the pool are in-use, allocate a block. When the thread has finished with the block – free it.
  • if you specify a very large number of blocks it could cause a storage shortage

The big questions with this example is “how big do you make the pool”?

To be able to specify the correct pool size you need to know information like

  • What was the maximum number of blocks used – in total
  • How many times were additional blocks allocated (and freed)
  • What was the total number of blocks requested.

You might decide that the pool is big enough if less than1% of requests had to allocate a block.

If you find that the maximum value used was 1% of the size of the pool, you can make the pool much smaller.

If you find that 99% of the requests were allocated/freed, this indicates the pool is much to small and you need to increase the size.

For other areas you could display

  • The number of authentication requests that were userid+ password, or were from a certificate.
  • The number of authentication requests which failed.
  • The list of userid names in the userid cache.
  • How many times each application was invoked.
  • The number of times a thread had to wait for a resource.
  • The elapsed time waiting for a resource, and what the resource was.

What attributes to expose

You look at the data to ask

  • Do I have a problem now?
  • Will I have a problem in the future? You need to collect information over time and look at trends.
  • When we had a problem yesterday, did this component contribute to it? You need to have historical data.

It is not obvious what data attributes you should display.

  • The “value now” is is easy to understand.
  • The “average value” is harder. Is this from the start of the application (6 months ago), or a weighted average (99 * previous average + current value)/100. With this weighted average, a change since the previous value indicates the trend.
  • The maximum value is hard – from when? There may have been a peak at startup, and small peaks since then will not show up. Having a “reset command” can be useful, or have it reset on a timer – such as display and reset every 10 minutes.
  • If you “reset” the values and display the value before any activity, what do you display? “0”s for all of the values, or the values when the reset command was issued.

Resetting values can make it easier to understand the data. Comparing two 8 digit numbers is much harder than comparing two 2 digit numbers.

How to expose data

Java has a Java Management eXtension (JMX) for reporting management information. It looks very well designed, is easy to use, and very compact! There is an extensive document from Oracle here.

I found Basic Introduction to JMX by Baeldung , was an excellent article with code samples on GitHub. I got these working in Eclipse within an hour!

The principal behind JMX is …

For each field you want to expose you have a get… method.

You define an interface with name class| |”MBean” which defines all of the methods for displaying the data.

public interface myClassMBean {
public String getOwner();
public int getMaxSize();
}

You define the class and the methods to expose the data.

public class myClass implements myClassMBean{

// and the methods to expose the data

public String getOwner() {
return fileOwner;
}

public int getMaxSize() {
return fileSize;
}

}

And you tell JMX to implement it

myClass myClassInstance = new myClass(); // create the instance of myClass

MBeanServer server = ManagementFactory.getPlatformMBeanServer();
ObjectName objectName =….
server.registerMBean(myClassInstance, objectName);

Where myClassInstance is a class instance. The JMX code extracts the name of the class from the object, and can the identify all the methods defined in the class||”MBean” interface. Tools like jconsole can then query these methods, and invoke them.

ObjectName is an object like

ObjectName objectName = new ObjectName(“ColinJava:type=files,name=onefile”);

Where “ColinJava” is a high level element, “type” is a category, and “name” is the description of the instance .

That’s it.

When you use jconsole ( or other tools) to display it you get

You could have

MBeanServer server = ManagementFactory.getPlatformMBeanServer();

ObjectName bigPoolName = new ObjectName(“ColinJava:type=threadpool,name=BigPool”);
server.registerMBean(bigpoolInstance, bigPoolName);

ObjectName medPoolName = new ObjectName(“ColinJava:type=threadpool,name=MedPool”);
server.registerMBean(medpoolInstance, medPoolname);

ObjectName smPoolName = new ObjectName(“ColinJava:type=threadpool,name=SmallPool”);
server.registerMBean(smallpoolInstance,smPoolName);

This would display the stats data for three pools

  • ColinJava
    • threadpool
      • Bigpool..
      • MedPool….
      • SmallPool…

And so build up a tree like

  • ColinJava
    • threadpool
      • Bigpool..
      • MedPool….
      • SmallPool…
    • Userids
      • Userid+password
      • Certificate
    • Applications
      • Application 1
      • Application 2
    • Errors
      • Applications
      • Authentication

You can also have set…() methods to set values, but you need to be more careful; checking authorities, and possibly synchronising updates with other concurrent activity.

You can also have methods like resetStats() which show up within jconsole as Operations.

How do I build up the list of what is needed?

It is easy to expose data values which have little value. I remember MQ had a field in the statistics “Number of times the hash table changed”. I never found a use for this. Other times I thought “If only we had a count of ……”

You can collect information from problems reported to you. “It was hard to diagnose because… if we had the count of … the end user could have fixed it without calling us”.

Your performance team is another good source of candidates fields. Part of the performance team’s job is to identify statistics to make it easier to tune the system, and reduce the resources used. It is not just about identifying hot spots.

Before you implement the collection of data, you could present to your team on how the data will be used, and produce some typical graphs. You should get some good feedback, even if it is “I dont understand it”.

What can I use to display the data

There are several ways of displaying the data.

  • jconsole – which comes as part of Java can display the data in a window
  • python – you can issue a query can capture the data. I have this set up to capture the data every 10 seconds
  • other tools using the standard interfaces.

Have a good REST and save a fortune in CPU

The REST protocol is a common programming model with the internet. It is basically a one shot model, which scales and has high availability, but can have a very high CPU cost. There are things you can do to reduce the CPU cost. Also, the MQWeb server, has implemented some changes to reduce the cost. See here for the MQ documentation.

The post gives some guidance on reducing the costs, for Liberty based servers.

The traditional model and the REST model

The traditional application model may have a client and a flow to the server

  • Connect to the server and authenticate
  • Debit my account by £500 within syncpoint
  • Credit your account by £500 within syncpoint
  • Commit the transaction
  • Do the next transaction etc
  • At the end of the day, disconnect from the server.

The REST model would be

  • Connect to the server and authenticate and do (Debit my account by £500, credit your account by £500), disconnect

This model has the advantage that it scales. When you initiate a transaction it can go to any one of the available back-end servers. This spreads the load and improves availability.

With the traditional model, the clients connects any available server at the start of day stays connected all day. If a new server becomes available during the day, it may get no workload.

The downside of the REST model is the cost. Establishing a connection and authenticating can be very expensive. I could not find much useful documentation on how to reduce these costs.

There are two parts of getting a REST connection.

  • Establishing the connection
  • Authentication

Establishing the connection

You can have each REST request use a new session for every REST request each of which which involves a full TLS handshake. Two requests could go to different servers, or go to the same server.

You can issue multiple REST request over the same session, to the same backend server.

On my little z/OS, using z/OSMF it takes

  • about 1 second to create a new connection and issue a request and terminate
  • about 0.1 seconds to use the shared session, per REST request.

Establishing the TLS session is expensive, as there is a lot of computation to generate the keys.

For MQWEB, the results are very similar.

Authentication

Once the session has been established each REST request requires authentication.

If you are using userid and password, the values are checked with z/OS.

If you are using client certificate authentication the Subject DN is looked up in the security manager, and if there is a DN to userid mapping, the userid is returned.

Once you have a valid userid, the userid’s access can be obtained from the security manager.

All of these values can be cached in the Liberty web server. So the first time a certificate or userid is used, it will take a longer than successive times.

Information about authentication is then encrypted and passed back in the REST response as the LtpaToken2 cookie.

If a REST request passes the cookie back to the server, then the information in the cookie is used by the server, and fewer checks need to be done.

This cookie can expire, and when it does expire the userid and password, or the certificate DN is checked as before, and the cookie will be updated.

If you do not send the LtpaToken2 cookie, this will cause the passed authentication information to be revalidated. If you want to change userid, do not send the the cookie.

Is any of this documented?

There is not a lot of documentation. There is information Configuring the authentication cache in Liberty.

There is a parameter javax.net.ssl.sessionCacheSize. If this is not set the default is 20480.

Running a python rest application on z/OS

I installed Python and co-req packaged on my z/OS system, described here. I wanted to run a REST workload into z/OSMF. I could have used, Liberty, z/OS Connect or MQWEB as the backend.

It makes use of the python requests package.

Initial script

#!/usr/bin/env python3 
import requests 
from timeit import default_timer as timer 
import urllib3 
my_header = { 
  'Connection': 'keep-alive', 
  'Content-Type': 'application/json', 
  'Cache-Control': 'max-age=0', 
  'Authorization': 'Basic Y395aW46cGFu67GhlMG4=', 
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 
  'Accept-Language': 'en-GB,en;q=0.5', 
  'DNT': '1', 
  'Connection': 'keep-alive', 
  'Upgrade-Insecure-Requests': '1', 
  'Cache-Control': 'max-age=0' 
}
urllib3.disable_warnings() 
geturl ="https://10.1.1.2:29443/zosmf/rest/mvssubs?ssid=IZUG" 
jar = requests.cookies.RequestsCookieJar() 
start=timer() 
res = requests.get(geturl,headers=my_header,verify=False,cookies=jar) 
end=timer() 
print("duration=",end-start) 
if res.status_code != 200: 
  print(res.status_code) 
print("Output=",res.text) 
jar=  res.cookies  

Comments on the python script

  • Authorization’: ‘Basic Y395aW46cGFu67GhlMG4=’ is the 64 bit encoding of the userid and password. Which is trivial(!) to decode.
  • urllib3.disable_warnings() Without this set you get a message InsecureRequestWarning: Unverified HTTPS request is being made to host ‘10.1.1.2’. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings. This is because the certificate sent down from the server has not been validated.
  • jar= res.cookies says save the cookies into the jar dictionary, for future use

The output was

duration= 1.210928201675415
output= {“items”:[
{“subsys”:”IZUG”, “active”:true, “dynamic”:true, “funcs”:[10]}
],”JSONversion”:1}

Verifying TLS certificate

With urllib3.disable_warnings() present it will cause error warnings to be suppressed.
When this statement is not present, there will be warnings about certificate problems.

In the statement res = requests.get(geturl,headers=my_header,verify=False,cookies=jar) verify is either “False” or the name of a CA .pem file containing the CA certificates. I used verify=ABC and got

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:

I got the error because “ABC” is not a valid file, and the verification could not be done.

I exported the CA certificate used by the server using

RACDCERT CERTAUTH   EXPORT(LABEL('TEMP-CA')) -         
       DSN('IBMUSER.CERT.TEMP.CA.PEM')   -             
       FORMAT(CERTB64) -                               
       PASSWORD('password')                            

I could only get verify=…. to work with a USS file, so I had to copy the dataset IBMUSER.CERT.TEMP.CA.PEM into a USS file CACert.pem. Then when I used

res = requests.get(geturl,headers=my_header,verify=CACert.pem,cookies=jar)

it worked fine.

Using a client certificate.

You cannot use RACF certificate with the requests facility, because the underlying code does not support it. You have to use .pem style certificate.

The support does not allow you to specify a password for the private key, so this is not very secure.

You define

cf=”colinpaicesECp256r1.pem”
kf=”colinpaicesECp256r1.key.pem”
cpcert=(cf,kf)

where

  • cf is the name of the file with the certificate in it
  • kf is the name of the file with the private key in it
  • cpcert is the python tuple.

If your certificate file also includes the private key, you do not need the kf, just use cpcert=(cf).

You use it

res = requests.get(geturl,headers=my_header,verify=CACert.pem,cookies=jar,cert=cpcert)

I tried exporting a certificate from z/OS using RACDCERT EXPORT … format(PKCS12B64), copying it to a uss file, and using that, but it did not work. The file could not be read.

I tried creating a private key with a password (to make it more secure) but when I used it I got the message

urllib3.exceptions.SSLError: Client private key is encrypted, password is required

There is a package request_pkcs12 which provides support for a password on the certificate. https://github.com/m-click/requests_pkcs12. I did not use this, I recreated my certificate and private key without a password.

I tried running on Linux using my Hardware Security Module (which plugs into a USB socket). This also failed as I could not enter the PIN for the device.

Compare the response time to running across the network.

I ran the same python script on z/OS and on Linux. The round trip time of the rest request was

  • 1.41 seconds on z/OS
  • 0.92 seconds on Linux.

I think I’ll run my tests from Linux in future.

One Minute MVS performance – Work Load Manager – looking at WLM reports.

I have a set of blog posts relating to getting started with z/OS performance. This blog post follows on the overview of WLM, and describes the contents of the reports, and how you can tell if work is being delayed, and why it is being delayed.

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a systems independent measure of CPU usage)
    1. 80% requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there no goals – just do your best

How do I tell what is going on and if the goals have been met?

RMF can display data in near real time (every minute or so).

RMF captures data and produces SMF records which can be processed by RMF and other products.

You can report on

  1. How well the service class did against its goals
  2. How well transactions or work did, from a reporting class.

You could have all CICS transactions in a service class, so they get the same CPU profile etc, but have different reporting classes. You can monitor CE* transaction, and PAY* transactions differently.

You could have a reporting class for work coming in from other systems, depending on the userid.

I set up a reporting class for z/OSMF. In the RMF batch report SYSRPTS(WLMGL(RCPER(ZOSMF)).

One part of the report was contained


         z/OS V2R4               SYSPLEX ADCDPL             DATE 06/14/2021           INTERVAL 05.00.003   
                                 RPT VERSION V2R4 RMF       TIME 09.25.00
POLICY=ETPBASE                        REPORT CLASS=ZOSMF                                   PERIOD=1 
 -TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF  TRANS-APPL%-----CP-IIPCP/AAPCP-IIP/AAP  ---ENCLAVES--- 
 AVG        1.00  ACTUAL                    0  TOTAL        66.25       64.20  173.99  AVG ENC   0.00 
 MPL        1.00  EXECUTION                 0  MOBILE        0.00        0.00    0.00  REM ENC   0.00 
 ENDED         0  QUEUED                    0  CATEGORYA     0.00        0.00    0.00  MS ENC    0.00 
 END/S      0.00  R/S AFFIN                 0  CATEGORYB     0.00        0.00    0.00 
                                                                                                                
 ----SERVICE----   SERVICE TIME  ---APPL %---  --PROMOTED--  --DASD I/O---  ----STORAGE----  -PAGE-IN RATES- 
 IOC        2366K  CPU  720.505  CP     66.25  BLK    0.000  SSCHRT    0.2  AVG    81420.24  SINGLE      0.0 
 CPU      617333   SRB    0.223  IIPCP  64.20  ENQ    0.000  RESP      0.0  TOTAL  81421.05  BLOCK       0.0 
 MSO      154219   RCT    0.000  IIP   173.99  CRM    0.000  CONN      0.0  SHARED     0.00  SHARED      0.0 
 SRB         191   IIT    0.013  AAPCP   0.00  LCK    0.889  DISC      0.0                   HSP         0.0 
 TOT        3138K  HST    0.000  AAP      N/A  SUP    0.000  Q+PEND    0.0 
 GOAL: EXECUTION VELOCITY 70.0%     VELOCITY MIGRATION:   I/O MGMT  28.3%     INIT MGMT 28.3% 
                                                                                                                
          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  -------------- EXEC DELAYS % -----------  
 SYSTEM                    VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU                                
 S0W1        --N/A--       28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19                               

Key fields:

INTERVAL 05.00.003

This tells the duration of the requests.

POLICY=ETPBASE REPORT CLASS=ZOSMF PERIOD=1

This tells you this is a report class (rather than a service class) the name is zOSMF, and is for period 1 . When you have service classes which have more than one criteria , such as high priority for the first 0.5 seconds of CPU – then low priority, these will have multiple periods.

-TRANSACTIONS–
AVG 1.00
MPL 1.00
ENDED 0
END/S 0.00

This says on average there was one instance running. You can have multiple transactions or jobs in a class. Add up the total duration of all jobs/transactions and divide by the interval to get the average(AVG).

MPL (multi programming level) is an advanced topic and describes how many instances were concurrently active.

No jobs/transactions ended in this interval, with a ending rate of 0 in 5 minutes.

—APPL %—
CP 66.25
IIPCP 64.20
IIP 173.99
AAPCP 0.00
AAP N/A

This shows the percentage of CPU used over the interval

  • 66.25 percent on GP engines
  • 64.20 percent IIPCP is 64.20 % of GP engine was doing work that could have run on a ZIIP – if there had been spare ZIIP capacity. 66.25 – 64.20 = 2.05 of work on a GP that was not ZIIP eligible.
  • 173.99 percent of ZIIP work running on a ZIIP engine – so nearly 2 ZIIP engines were being used
  • 0 AAPCP – there was no ZAAP eligible work offloaded onto a GP
  • 0 AAP there was no work running on an ZAAP

The total ZIIP used was 173.99 in ZIIP engines, +64.20 of a GP = 238 or almost 2.5 ZIIP engines worth.

It is good to run on ZIIPs where possible, because ZIIPs are cheaper ($$) than GPs, and GPs may be configured to be slower than a ZIIP.

GOAL: EXECUTION VELOCITY 70.0%

The performance goal for this work was defined as Execution Velocity of 70 %.

 
         EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM  VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU      
 S0W1    28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19       
  • The achieved execution velocity was 28.3% against a target of 70%
  • The performance index was 2.5. The performance goal is goal/actual. A value of 1 or smaller is good. The value here shows the goal was not met. You need to consider
    • Changing the goal for this work so the target goal is what you can achieve on a normal day
    • Changing the importance of the work for when the system is constrained.
    • If you change the goal for one set of work – it may impact other work, so you need to look at the system as a whole and decide which is your important work.
    • Add more CPUs or ZIIPs – these may not help if the delays are not CPU… see below
  • Average number of address spaces in this class 1.
  • EXEC USING%. The figures above were for true CPU used. WLM samples activities 4 times a second. Of the samples where jobs were running or waiting for waiting for a resource.
    • 8% of an CPU engine was used – this includes ZIIP work running on GP.
    • 20% of a ZIIP engine
    • The ratio 8:20 is similar to CPU on GP and ZIIP actually used in this period of 66.25: 173.99.
  • EXEC DELAYS
    • The total delay was 72% = ( 100 – (8+20) “using samples” above)
    • for 53% of all the samples it was was waiting for a ZIIP engine
    • for 19% of all the the samples it was waiting for a GP engine.
    • You can have other delays listed here, for example paging, or your program is capped to limit how much CPU it is allowed.

Once z/OSMF had started, and settled down, there were still delays for IIP (28%). To me this looks like a lumpy workload, that perhaps there is a timer which pops and runs multiple threads. There are more threads than IIPs – so some have to wait.

Reports for transactional work

I defined a transaction so I could measure the response times (and CPU used) for a service in z/OSMF. A TSO address space is started, and z/OSMF sends a client/server request to the TSO address space. The response time is sub-second so a good candidate to demonstrate WLM for a transaction.

I configured z/OSMF to have

<zosWorkloadManager collectionName=”MOPZCET”/>
<wlmClassification>
<httpClassification transactionClass=”ZCI3″ resource=”/zosmf/webispf/*/“/>
</wlmClassification>

The collection name is passed to WLM to determine the service class and report class of the work. The default is the server name.

All ISPF (with a URL of /zosmf/webispf/*) requests were classified as ZCI3.

I then used WLM to configure

  • a service class ZCI3 with Average response time of 00:00:00.010
  • a classification rule for type CB, a rule for CN=MOPZCET, and sub-rule TC = ZCI4. This gave the service class and report class.

The data in the report had

-TRANSACTIONS–
AVG 0.01
MPL 0.01
ENDED 21
END/S 0.07

21 transactions in 5 minutes is 0.07 a second.

MPL (MultiProgramming Limit is the target which represents the number of address spaces that must be in the swapped-in state for the service class period to meet its goals. I’ve never used it!

TRANS-TIME HHH.MM.SS.FFFFFF
ACTUAL               140526
EXECUTION            139950
QUEUED                  575

The average time was 0.140 seconds.

GOAL: RESPONSE TIME 000.00.00.010 AVG

That was the specification in WLM (note the specified value of 0.010 is very different to the 0.140 achieved)


          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP 
 S0W1     000.00.00.140526 66.7 14.1   0.0  0.0 N/A  18 0.0  9.1 9.1  

This shows the average response time was 0.140 seconds, used 18% on a ZIIP, and waited 9% of the time for a ZIIP

To the right of the data in the report was

--- DELAY % --- 
UNK IDL CRY CNT                 
 64 0.0 0.0 0.0 

Which says there was 64% of the delay was unknown. This could be

  • waiting for end user input
  • waiting for TCP/IP data
  • the program sent off a request and is waiting for a response.

For example the ISPF transaction in z/OSMF had sent a request to an address space running TSO. This address space processed the request and sent the response back. I am guessing that the 64% delay was waiting for TSO to process the request and send back the response.

You also get a response time profile based on the service class

                              ----------RESPONSE TIME DISTRIBUTION---------- 
   -----TIME------  # TRANS   0    10   20   30   40   50   60   70   80   90   100 
   HH.MM.SS.FFFFFF  IN BUCKET |....|....|....|....|....|....|....|....|....|....| 
<= 00.00.00.014000          0  > 
<= 00.00.00.015000          0  > 
<= 00.00.00.020000          2  >>>>>> 
<= 00.00.00.040000          5  >>>>>>>>>>>>> 
>  00.00.00.040000         14  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 

This shows that out of the 21 requests, 7 were below 0.040 seconds, and 14 were over 0.040 seconds.

From the service class, it was specified as GOAL: RESPONSE TIME 000.00.00.010 AVG so this goal is very badly specified. It would be better set to average of 0.140 seconds.

I changed the service class to a goal of 0.140 seconds and activated it. After I had run some tests the output was

          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS %
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT            
 S0W1     000.00.00.097733  100  0.7   0.0  0.0 N/A  50 0.0  0.0            

Which showed no delays

and a response time profile

                                ---RESPONSE TIME DISTRIBUTION--- 
    -----TIME------  --# TRANS  0    10   20   30   40   50   60
    HH.MM.SS.FFFFFF  IN BUCKET  |....|....|....|....|....|....|.
 <= 00.00.00.070000          0  > 
 <= 00.00.00.084000          5  >>>>>>>>>>>>>>> 
 <= 00.00.00.098000          9  >>>>>>>>>>>>>>>>>>>>>>>>>> 
 <= 00.00.00.112000          1  >>>> 
 <= 00.00.00.126000          0  > 
 <= 00.00.00.140000          1  >>>> 
 <= 00.00.00.154000          1  >>>> 
 <= 00.00.00.168000          0  > 
 <= 00.00.00.182000          0  > 
 <= 00.00.00.196000          0  > 
 <= 00.00.00.210000          1  >>>> 
 <= 00.00.00.280000          0  > 

An average of 0.10 seconds, with some taking up to 0.210 seconds.

Real time information

You can get the information in near real time from RMF (or other monitors)

For example for processor delays

            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) ---------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name 
IZUSVR1  SO STCHIM   CP     2  35 56.53   91 IZUSVR1    4 JES2MON    2 TCPIP 
                     IIP   94  95 183.1   89 IZUSVR1                         

This shows that job IZUSVR1

  • Was delayed for 2% of the time on a GP
  • Used 35% of the GP engines
  • Was delayed 94% of the time on a ZIIP
  • and used 95% of the available ZIIP resource
  • The jobs using CPU were IZUSVR1 (using 91%) JES2MON and TCPIP
  • The jobs using ZIIP were IZUSVR1

What to do now?

You need to identify the goals of your work, and set sensible goals. This may take several iterations. You also need to understand the priorities of the work, and userid.

Once you have configured your system to report on response times of your business critical work, you can adjust the service classes so your work achieves it goals.

Define reporting classes so you can monitor different groups of work and that they are meeting their goals.

I cut the CPU cost of doing nothing.

I was running z/OSMF and saw that the CPU costs where high when it was sitting there doing nothing. I managed to reduce the CPU costs by more than half. This would apply to other Liberty based web servers, such as MQWEB, and z/OS Connect.

I could see from the MVS system trace there was a lot of activity creating a thread, and deleting a thread, a lot of costs associated with these activities, such as allocating and freeing storage.

I increased the number of threads so that this allocating a thread and delete a thread activity disappeared.

In the xml configuration file (based from server.xml) was the default

<executor name=”LargeThreadPool” id=”default” coreThreads=”100″
maxThreads=”0″ keepAlive=”60s” stealPolicy=”STRICT”
rejectedWorkPolicy=”CALLER_RUNS” />

I changed this to

<executor name=”LargeThreadPool” id=”default”
coreThreads=”300″ maxThreads=”600″ keepAlive=”60s”
stealPolicy=”STRICT” rejectedWorkPolicy=”CALLER_RUNS” />

and restarted the server.

The options are documented here. There is an option keepAlive which defaults to 60 seconds. If a thread has been idle for this time, the thread is a candidate to be freed to reduce the pool back to corethreads size.

I was alerted to this problem when I looked at an MVS system trace. This is described here.

There is a discussion how sun thread pools work in this post. It is not obvious. This may or may not be how this executor works.

What value should you use?

This is a hard question, as Liberty does not provide this information directly.

I used the Health Checker connects from Eclipse to the JVM and extracts information about the JVM and applications.

This shows that at rest there was a lot of activity. I increased it to 250 threads and restarted the server and got

So better … but still some activity. I increased it to 300 threads, and the graph was flat.

I set up USER.Z24A.PROCLIB(CEEOPT) with

RPTSTG(ON),
RPTOPTS(ON)

in my z/OSMF job I had

//CEEOPTS DD DISP=SHR,DSN=USER.Z24A.PROCLIB(CEEOPT)

This printed out a lot of useful information about the stack and heap usage. It the bottom it said

Largest number of threads concurrently active: 397

The number of threads includes threads from the pool I had specified, plus other threads that z/OSMF creates. The health check showed there were 372 threads, event though coreThreads was set to “300”.

I also used jconsole to display information about the highest thread usage. The URL was service:jmx:rest://10.1.1.2:10443/IBMJMXConnectorREST. It displays peak threads and live threads.

Security

I found the security of both jconsole, and health check, was weak (userid and password). I was unable to successfully set up a TLS certificate logon to the server.

The information from rptstg was only available at shutdown.

Why does increasing the number of threads reduce the CPU when idle?

The thread pool has logic to remove unused threads and shrink it to the coreThreads size. If the pool size is too small it has to create threads and delete threads according to the load. See here. The keepAlive mentioned at the top is how long a thread can be idle for, before it can be considered a candidate for deletion.

Summary

Monitor the CPU used when idle and see if increasing the threadpool to 300 helps.

The best way to save money, is not to spend it. The same is true for CPU.

I was trying to understand why a z/OSMF address space, using the Liberty web server (written in Java) was using a lot of CPU – when it was doing no work. I looked into the MVS system trace and saw some interesting behaviour. If you are trying to investigate a high CPU usage in a z/OS Job, I hope the following may help you with where you need to start looking.

If you are looking at a Java program, knowing there is a problem does not help you with what is causing the problem. There is Java, which uses C code, which uses USS services, which uses MVS services which is what you see in the system trace. The symptom is a long way from the source code. You might be able to correlate the time stamp in the system trace with the Java trace.

The ‘interesting’ behaviour…

  • Getmain/freemain storage requests. These are heavy weight requests for getting and freeing storage. Once warmed up, I would expect no storage requests.
  • “Storage Obtain”/”Storage Release” requests. These are medium weight requests for getting and freeing storage. Once warmed up, I would expect no storage requests.
  • Attach task and detach tasks, or pthread_create(). I would have expected tasks would have been attached at start up, and there would be no need for more tasks. I can see that under load more tasks may be required until the system stabilises.
  • Many timer pops a second. There were 50 time pops every second (one every 20 milliseconds). Is this efficient? No! It may be more efficient to have the duration between timer pops increase if there is no load, so a timer pop 10 times a second may be acceptable when the system is idle – and reset it to a short interval when the system is busy, or change the programming model to be wait-post rather than spinning the wheels.

I’ll discuss these in more detail below.

Storage requests, GETMAINs, FREEMAINs, STORAGE requests.

For a non trivial application such as a web server, MQ, DB2 etc, I would not expect to see any storage requests in the system trace once the system has warmed up. When I worked for MQ development, we went through the system trace, and every time we found a GETMAIN or STORAGE OBTAIN, we worked to eliminate it, until there were non left.

Use a stack

Instead of each subroutine using GETMAIN or “Storage request” to get a block of storage for its variables, I would expect a program stack to be used. For example the top level program for the thread allocates a 1MB block of storage and uses this as a stack. The top level program uses the first 2KB from this buffer. The first subroutine uses this buffer from 2KB for 3KB. If this subroutine calls another subroutine, the lower level subroutine uses this buffer from 5KB for 2KB. This is a very efficient way of managing storage and each subroutine needs only 10’s of instructions to get and release storage from the stack.

A problem can occur if the stack is not big enough, and there is logic like “If no space in the stack – then GETMAIN a block of storage”. If this happens the request quickly becomes expensive.

C (Language Environment) programs on z/OS can set the stack size, and when the system is shutdown, print out statistics on the stack usage.

Use the heap

A subroutine may need some “external storage”, which exist outside of the subroutine, for example store entries in a table for the life of the job. A heap (or heappool) is a very efficient way of managing the storage. If your program gets some storage, it does not return it, when the block has been finished with, the program “adds it to the heap” so it can be reused.

A simple heap example.

This might be an array of 3 pointers;

  • storage[0] is a chain of free 1KB blocks,
  • storage[1] is a chain of free blocks from 1K+1 bytes to 10KB,
  • storage[2] is a chain of free blocks from 10K+1 bytes to 50KB.

If your program needs a 512 byte block – it looks to see if there is a free block chained from storage[0], if not allocate a 1KB block (not 512 byte). When it has finished with the block, put it onto the storage[0] chain.
Over time the number of elements on each chain is sufficient to run the workload, and there should be no more storage requests. An increase in throughput may increase the demand for storage, and so during this “warm up” period, there may be more storage requests.

C run time statistics

For C programs on z/OS you can get the C runtime component to print out statistics on the stack and heap usage, and gives recommendations on the best size to specify.

In the //CEEOPTS data set you can specify the following

You may want to use HEAPPOOLS64(ALIGN…) and HEAPPOOLS(ALIGN…) when there are multiple threads so the blocks are hardware cache friendly, and you do not have two CPUs fighting for the same hardware cache data.

Ive blogged One Minute MVS – tuning stack and heap pools.

Smart programs

MVS can call exit programs, for example when an asynchronous event has happened, such as a timer has expired. These programs are expected to allocate storage for their variables ,do some work, give back the storage, and return. This can be very expensive – you have the cost of getting and freeing a block of storage just to set a few bits.

You may be able to write your exit program so it only uses registers, and does not need any virtual storage for variables. If this is not possible then consider passing a block of storage into the called program. For example the RACF Admin function

CALL IRRSEQ00 (Work_area,… )

Work_area: The name of a 1024-byte work area for SAF and RACF usage. The work area must be in the primary address space.

Example exit program

You could use the assembler macro STIMERM (Set TIMER). You specify the time interval, the address of the exit, and a user parameter. This user parameter is passed to the exit program when it gets control.

  • This could be a pointer to a WAIT ECB block,
  • or a pointer to a structure, one element in the structure is the WAIT ECB block, another element is the address of a block of storage the exit can use.

Attach task and detach task.

It is expensive to attach and detach tasks, so it is important to do it as little as possible. From a USS perspective the attach is from pthread_create.

A common design template to eliminate the attach/detach model is to have a pool of threads to do work.

  • A work request comes in, the dispatcher task gets a worker thread from the pool, and gives the work request to it. When the worker has finished it puts itself back in the pool.
  • If there was no worker thread available, check the configuration for the maximum number of threads, If this limit has not been reached, create a new worker thread.
  • If the was no worker thread available and the number of threads was at the limit, then wait until a worker thread is free.
  • Some thread pools have logic to shrink the pool if it gets too big. Without this logic a thread pool could be very large because it hit a peak usage weeks ago, and the pool has only been little used since.

Having a pool means that some of the expensive set up is done only once per thread, for example connect to DB2 or connect to MQ. You also avoid the expensive create (attach) of a thread, and delete (detach) of the thread. The application has logic like

  • Dispatching application attaches a new thread.
  • start thread
    • perform the expensive set up – for example connect to DB2 or connect to MQ
    • add task to the thread pool
    • do until told to shutdown
      • wait for work
      • do the work
    • end do
    • disconnect from DB2 or MQ
    • thread returns and is detached
Problems with the thread pool

One problem with using a thread pool is if the minimum pool size is too small. Smart thread pools have options like

  • lowest number of threads in thread pool
  • maximum number of thread thread pool
  • maximum idle time of a thread. If there are more threads than the lowest number of threads, and a thread has been idle for longer than this time then free the thread.

You can get the”thrashing” on a low usage system

  • The lowest number of threads is specified as 10 threads.
  • The main program needs 50 threads – it uses 10 from the pool, and allocates 40 new threads. These are added to the pool when the work has finished.
  • The clean-up process periodically checks the pool. If there are more threads than the lowest number, then purge ones which have been idle for more than the specified idle-time. 40 threads are purged
  • Repeat:
  • The main program needs 50 threads- it uses 10 from the pool, and allocates 40 new threads. These are added to the pool when the work has finished.
  • The clean-up process periodically checks the pool. If there are more threads than the lowest number, then purge ones which have been idle for more than the specified idle-time. 40 threads are purged

In this case there is a lot of attach/purge activity.

Making the pool size 50, or the maximum idle time very large will prevent this thrashing…

  • The lowest number of threads is specified as 50 threads.
  • The main program needs 50 threads – it uses 50 from the pool.
  • The clean-up process periodically checks the pool. The pool size is OK – do nothing.
  • Repeat:
  • The main program needs 50 threads- it uses 50 from the pool.
  • The clean-up process periodically checks the pool. The pool size is OK – do nothing

In this case the number of threads stays constant and you do not get the create/delete (attach/purge) of threads.

In one test this reduced the CPU time used when idling by more than 50 %.

Many timer pops a second

In my system trace I can see a task wakes up, it sets a timer for 20 milliseconds later, and suspends itself. This is very inefficient. This should be a wait-post model instead of an application in a loop, sleeping and checking something.

When investigating this you need to think about the speed of your box. Consider an application which just does

  • setting a timer to wake up in 10 milliseconds
  • it wakes up a thread which does nothing – but set a timer for 10 ms later (or 100 times a second)

On my slow box this could take me 1 ms of CPU to do this once, – or 100 ms of CPU for 100 times a second. One engine would be busy 10% of the time.

If I had a box which was 10 times faster and only took 0.1 ms of CPU to do the same work. For 100 iterations this would be 10 ms of CPU or 1% of an engine. To some people this is at the “noise level” and not worth looking at.

To you 1% CPU per second is “noise level”, to me the noise level of 10% CPU per second is a flashing red light, a loud klaxon and people in body armour running past.