One Minute MVS performance – Work Load Manager – looking at WLM reports.

I have a set of blog posts about getting started with z/OS performance. This post follows on from the overview of WLM. It describes the contents of the reports, how you can tell if work is being delayed, and why it is being delayed.

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a system-independent measure of CPU usage)
    1. 80% of requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there are no goals – just do your best

How do I tell what is going on and if the goals have been met?

RMF can display data in near real time (every minute or so).

RMF captures data and produces SMF records which can be processed by RMF and other products.

You can report on

  1. How well the service class did against its goals
  2. How well transactions or work did, from a reporting class.

You could have all CICS transactions in one service class, so they get the same CPU profile and so on, but give them different reporting classes. You can then monitor CE* transactions and PAY* transactions separately.

You could have a reporting class for work coming in from other systems, depending on the userid.

I set up a reporting class for z/OSMF, and selected it in the RMF batch report with SYSRPTS(WLMGL(RCPER(ZOSMF))).

One part of the report contained


         z/OS V2R4               SYSPLEX ADCDPL             DATE 06/14/2021           INTERVAL 05.00.003   
                                 RPT VERSION V2R4 RMF       TIME 09.25.00
POLICY=ETPBASE                        REPORT CLASS=ZOSMF                                   PERIOD=1 
 -TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF  TRANS-APPL%-----CP-IIPCP/AAPCP-IIP/AAP  ---ENCLAVES--- 
 AVG        1.00  ACTUAL                    0  TOTAL        66.25       64.20  173.99  AVG ENC   0.00 
 MPL        1.00  EXECUTION                 0  MOBILE        0.00        0.00    0.00  REM ENC   0.00 
 ENDED         0  QUEUED                    0  CATEGORYA     0.00        0.00    0.00  MS ENC    0.00 
 END/S      0.00  R/S AFFIN                 0  CATEGORYB     0.00        0.00    0.00 
                                                                                                                
 ----SERVICE----   SERVICE TIME  ---APPL %---  --PROMOTED--  --DASD I/O---  ----STORAGE----  -PAGE-IN RATES- 
 IOC        2366K  CPU  720.505  CP     66.25  BLK    0.000  SSCHRT    0.2  AVG    81420.24  SINGLE      0.0 
 CPU      617333   SRB    0.223  IIPCP  64.20  ENQ    0.000  RESP      0.0  TOTAL  81421.05  BLOCK       0.0 
 MSO      154219   RCT    0.000  IIP   173.99  CRM    0.000  CONN      0.0  SHARED     0.00  SHARED      0.0 
 SRB         191   IIT    0.013  AAPCP   0.00  LCK    0.889  DISC      0.0                   HSP         0.0 
 TOT        3138K  HST    0.000  AAP      N/A  SUP    0.000  Q+PEND    0.0 
 GOAL: EXECUTION VELOCITY 70.0%     VELOCITY MIGRATION:   I/O MGMT  28.3%     INIT MGMT 28.3% 
                                                                                                                
          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  -------------- EXEC DELAYS % -----------  
 SYSTEM                    VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU                                
 S0W1        --N/A--       28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19                               

Key fields:

INTERVAL 05.00.003

This is the length of the reporting interval, 5 minutes.

POLICY=ETPBASE REPORT CLASS=ZOSMF PERIOD=1

This tells you this is a report class (rather than a service class), the name is ZOSMF, and the data is for period 1. Service classes which have more than one set of goals, such as high priority for the first 0.5 seconds of CPU and then low priority, have multiple periods.

-TRANSACTIONS--
AVG 1.00
MPL 1.00
ENDED 0
END/S 0.00

This says on average there was one instance running. You can have multiple transactions or jobs in a class. Add up the total duration of all jobs/transactions and divide by the interval to get the average(AVG).

MPL (multi programming level) is an advanced topic and describes how many instances were concurrently active.

No jobs or transactions ended in this interval, so the ending rate (END/S) was 0.00 per second.

---APPL %---
CP 66.25
IIPCP 64.20
IIP 173.99
AAPCP 0.00
AAP N/A

This shows the percentage of a CPU engine used over the interval

  • CP 66.25: 66.25 percent of a GP engine was used
  • IIPCP 64.20: 64.20 percent of a GP engine was doing work that could have run on a ZIIP, if there had been spare ZIIP capacity. 66.25 – 64.20 = 2.05 percent of a GP engine was doing work that was not ZIIP eligible.
  • IIP 173.99: 173.99 percent of a ZIIP engine was used, so nearly 2 ZIIP engines were busy
  • AAPCP 0.00: no ZAAP eligible work ran on a GP
  • AAP N/A: no work ran on a ZAAP

The total ZIIP eligible work was 173.99 on ZIIP engines + 64.20 on a GP = 238.19, or almost 2.5 ZIIP engines’ worth.
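As a rough check of the arithmetic above, here is a small Java sketch. The class and method names are invented for illustration and are not part of RMF; the numbers are the APPL % values from the report.

```java
// Illustrative only: the names are hypothetical, the values come from the report above.
final class ApplPercent {
    // ZIIP eligible work = work that ran on a ZIIP + eligible work that ran on a GP.
    static double ziipEligible(double iip, double iipcp) {
        return iip + iipcp;
    }

    // GP work that could not have run on a ZIIP.
    static double gpOnly(double cp, double iipcp) {
        return cp - iipcp;
    }

    public static void main(String[] args) {
        System.out.printf("ZIIP eligible: %.2f%%%n", ziipEligible(173.99, 64.20));
        System.out.printf("GP only:       %.2f%%%n", gpOnly(66.25, 64.20));
    }
}
```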

It is good to run on ZIIPs where possible, because ZIIPs are cheaper ($$) than GPs, and GPs may be configured to be slower than a ZIIP.

GOAL: EXECUTION VELOCITY 70.0%

The performance goal for this work was defined as Execution Velocity of 70 %.

 
         EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM  VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP CPU      
 S0W1    28.3  2.5   1.0  8.0 N/A  20 0.0   72  53  19       
  • The achieved execution velocity was 28.3% against a target of 70%
  • The performance index was 2.5. For a velocity goal the performance index is goal/actual (70/28.3). A value of 1 or smaller is good. The value here shows the goal was not met. You need to consider
    • Changing the goal for this work so the target goal is what you can achieve on a normal day
    • Changing the importance of the work for when the system is constrained.
    • If you change the goal for one set of work – it may impact other work, so you need to look at the system as a whole and decide which is your important work.
    • Add more CPUs or ZIIPs – these may not help if the delays are not CPU… see below
  • Average number of address spaces in this class 1.
  • EXEC USING%. The figures above were for true CPU used. WLM samples activity 4 times a second. Of the samples where jobs were running or waiting for a resource
    • 8% were using a GP engine – this includes ZIIP work running on a GP.
    • 20% were using a ZIIP engine
    • The ratio 8:20 is similar to the CPU on GP and ZIIP actually used in this period of 66.25:173.99.
  • EXEC DELAYS
    • The total delay was 72% = (100 – (8+20) “using” samples above)
    • for 53% of all the samples it was waiting for a ZIIP engine
    • for 19% of all the samples it was waiting for a GP engine.
    • You can have other delays listed here, for example paging, or your program is capped to limit how much CPU it is allowed.
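The performance index calculation can be sketched in Java as below. The class and method names are made up for this illustration and are not part of RMF or WLM. For a velocity goal the index is goal divided by achieved. For a response time goal it is the other way up, achieved divided by goal, so in both cases a value above 1 means the goal was missed.

```java
// A minimal sketch of the performance index arithmetic; names are hypothetical.
final class PerformanceIndex {
    // Velocity goal: goal velocity divided by achieved velocity.
    static double forVelocityGoal(double goalVelocity, double achievedVelocity) {
        return goalVelocity / achievedVelocity;
    }

    // Response time goal: achieved response time divided by goal response time.
    static double forResponseTimeGoal(double goalSeconds, double achievedSeconds) {
        return achievedSeconds / goalSeconds;
    }

    // A performance index of 1 or smaller means the goal was met.
    static boolean goalMet(double performanceIndex) {
        return performanceIndex <= 1.0;
    }

    public static void main(String[] args) {
        double pi = forVelocityGoal(70.0, 28.3); // values from the z/OSMF report class
        System.out.printf("PI %.1f, goal met: %b%n", pi, goalMet(pi));
    }
}
```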

Once z/OSMF had started and settled down, there were still delays for IIP (28%). To me this looks like a lumpy workload: perhaps there is a timer which pops and runs multiple threads. There are more threads than IIPs, so some have to wait.

Reports for transactional work

I defined a transaction so I could measure the response times (and CPU used) for a service in z/OSMF. A TSO address space is started, and z/OSMF sends a client/server request to the TSO address space. The response time is sub-second so a good candidate to demonstrate WLM for a transaction.

I configured z/OSMF to have

<zosWorkloadManager collectionName="MOPZCET"/>
<wlmClassification>
<httpClassification transactionClass="ZCI3" resource="/zosmf/webispf/*/"/>
</wlmClassification>

The collection name is passed to WLM to determine the service class and report class of the work. The default is the server name.

All ISPF (with a URL of /zosmf/webispf/*) requests were classified as ZCI3.

I then used WLM to configure

  • a service class ZCI3 with Average response time of 00:00:00.010
  • a classification rule for type CB, with a rule for CN=MOPZCET and a sub-rule TC=ZCI3. This gave the service class and report class.

The data in the report had

-TRANSACTIONS--
AVG 0.01
MPL 0.01
ENDED 21
END/S 0.07

21 transactions in 5 minutes is 0.07 a second.

MPL (MultiProgramming Limit) is the target which represents the number of address spaces that must be in the swapped-in state for the service class period to meet its goals. I’ve never used it!

TRANS-TIME HHH.MM.SS.FFFFFF
ACTUAL               140526
EXECUTION            139950
QUEUED                  575

The average time was 0.140 seconds.

GOAL: RESPONSE TIME 000.00.00.010 AVG

That was the specification in WLM (note the specified value of 0.010 is very different from the 0.140 achieved).


          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS % -
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT IIP 
 S0W1     000.00.00.140526 66.7 14.1   0.0  0.0 N/A  18 0.0  9.1 9.1  

This shows the average response time was 0.140 seconds, the work used a ZIIP for 18% of the time, and waited for a ZIIP for 9% of the time.

To the right of the data in the report was

--- DELAY % --- 
UNK IDL CRY CNT                 
 64 0.0 0.0 0.0 

This says that for 64% of the samples the delay was unknown. This could be

  • waiting for end user input
  • waiting for TCP/IP data
  • the program sent off a request and is waiting for a response.

For example the ISPF transaction in z/OSMF had sent a request to an address space running TSO. This address space processed the request and sent the response back. I am guessing that the 64% delay was waiting for TSO to process the request and send back the response.

You also get a response time profile based on the service class

                              ----------RESPONSE TIME DISTRIBUTION---------- 
   -----TIME------  # TRANS   0    10   20   30   40   50   60   70   80   90   100 
   HH.MM.SS.FFFFFF  IN BUCKET |....|....|....|....|....|....|....|....|....|....| 
<= 00.00.00.014000          0  > 
<= 00.00.00.015000          0  > 
<= 00.00.00.020000          2  >>>>>> 
<= 00.00.00.040000          5  >>>>>>>>>>>>> 
>  00.00.00.040000         14  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 

This shows that out of the 21 requests, 7 took 0.040 seconds or less, and 14 took more than 0.040 seconds.

The service class was specified with GOAL: RESPONSE TIME 000.00.00.010 AVG, so this goal is very badly specified. It would be better set to an average of 0.140 seconds.

I changed the service class to a goal of 0.140 seconds and activated it. After I had run some tests the output was

          RESPONSE TIME    EX   PERF  AVG   --EXEC USING%--  - EXEC DELAYS %
 SYSTEM   HHH.MM.SS.FFFFFF VEL% INDX ADRSP  CPU AAP IIP I/O  TOT            
 S0W1     000.00.00.097733  100  0.7   0.0  0.0 N/A  50 0.0  0.0            

This showed no delays.

The response time profile was

                                ---RESPONSE TIME DISTRIBUTION--- 
    -----TIME------  --# TRANS  0    10   20   30   40   50   60
    HH.MM.SS.FFFFFF  IN BUCKET  |....|....|....|....|....|....|.
 <= 00.00.00.070000          0  > 
 <= 00.00.00.084000          5  >>>>>>>>>>>>>>> 
 <= 00.00.00.098000          9  >>>>>>>>>>>>>>>>>>>>>>>>>> 
 <= 00.00.00.112000          1  >>>> 
 <= 00.00.00.126000          0  > 
 <= 00.00.00.140000          1  >>>> 
 <= 00.00.00.154000          1  >>>> 
 <= 00.00.00.168000          0  > 
 <= 00.00.00.182000          0  > 
 <= 00.00.00.196000          0  > 
 <= 00.00.00.210000          1  >>>> 
 <= 00.00.00.280000          0  > 

An average of 0.10 seconds, with some taking up to 0.210 seconds.

Real time information

You can get the information in near real time from RMF (or other monitors)

For example for processor delays

            Service  CPU  DLY USG EAppl  ----------- Holding Job(s) ---------
Jobname  CX Class    Type  %   %    %     %  Name      %  Name      %  Name 
IZUSVR1  SO STCHIM   CP     2  35 56.53   91 IZUSVR1    4 JES2MON    2 TCPIP 
                     IIP   94  95 183.1   89 IZUSVR1                         

This shows that job IZUSVR1

  • Was delayed for 2% of the time on a GP
  • Used 35% of the GP engines
  • Was delayed 94% of the time on a ZIIP
  • and used 95% of the available ZIIP resource
  • The jobs using CPU were IZUSVR1 (using 91%), JES2MON and TCPIP
  • The jobs using ZIIP were IZUSVR1

What to do now?

You need to identify the goals of your work, and set sensible goals. This may take several iterations. You also need to understand the priorities of the work and of the userids.

Once you have configured your system to report on response times of your business critical work, you can adjust the service classes so your work achieves its goals.

Define reporting classes so you can monitor different groups of work and check that they are meeting their goals.

One Minute MVS performance – Work Load Manager – background

Question: In your car how do you tell if your car has a problem? Answer: You look at the dashboard and see if there is a red light showing. You may not know how to fix it – but you know that you need to get help to fix it.

The aim of this series of blog posts is to show you what to look for in z/OS performance, and how to tell if you have a problem.

I will cover

I’ve written a blog post on how to understand reports from WLM.

Managing workload

40 years ago the pilot of a commercial jet had many knobs and dials to control the performance (and speed) of the aeroplane. These days computers do most of the small tuning; the pilot sets the overall goals, and the computer does the rest.

It is the same with managing workload performance on z/OS. The systems programmer used to individually adjust the performance of jobs and transactions running on z/OS. These days the systems programmer sets the overall goals and the computer does the rest.

The work on z/OS is managed by the WorkLoad Manager (WLM). The systems programmer defines goals like

  • These CICS transactions should run with an average response time of 1 second.
  • The trivial TSO commands should run in under 10 milliseconds.
  • Batch – I don’t care… it can run in the background.

I remember one customer saying that on the day he switched on WLM (so that WLM managed the workload), he noticed that the batch workload finished early; it made everything go faster!

This is because when the system had been manually “tuned”, the CICS transactions were finishing in under half a second (much faster than the requirement of 1 second). WLM worked to the goals, and the CICS transactions executed with a response time of 1 second. This meant there was spare CPU, and more batch workload could be done during the day.

How to monitor work?

For short lived requests, like CICS transactions or TSO commands, the response time is the obvious metric. Typically the response time is under a second, and at the end of a minute you should have many data points to tell you if you are achieving your response time goals.

Long running jobs or started tasks may run for weeks, so the time to run the job is meaningless. It could run slower when the system is busy, or run faster when the system is lightly loaded. It needs a second by second metric to measure progress.

WLM uses the concept of how much the work can be delayed. It uses a metric called Execution velocity which in concept is “the ratio of CPU used” to “time waiting for a resource”. In simple terms

100 * CPU used /(CPU used + wait time for a resource).

WLM can periodically check this ratio and adjust the priority of the work to achieve the goals.

If the execution velocity is 100 then it is not delayed for CPU.

If the execution velocity is 1, and the work used 1 second (or millisecond) of CPU, then it is OK for the job to be delayed for I/O or waiting for CPU for about 100 seconds (or milliseconds). The ratio is important – not the absolute values.

How to work out which work to dispatch?

Every 10 seconds WLM looks at the data to decide which service classes need more or less CPU.

WLM looks at all the work, and checks whether it is meeting the goals of its service class (the service class holds the definition of the goals).

  1. If all the work is within its goals pick any waiting work to dispatch
  2. If any work is not within its goal, adjust the dispatching priorities. Start with work with Importance 1; when this is within its goals, look at work with Importance 2, and so on.
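The ordering in the two steps above can be sketched in Java. This is a heavily simplified illustration with invented names and sample values, not how WLM is implemented: it only picks out the service classes that are missing their goals (performance index above 1) and lists them with the most important first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified sketch: all names and values are hypothetical.
final class GoalCheck {
    static final class ServiceClass {
        final String name;
        final int importance;          // 1 is the most important
        final double performanceIndex; // above 1 means the goal is being missed

        ServiceClass(String name, int importance, double performanceIndex) {
            this.name = name;
            this.importance = importance;
            this.performanceIndex = performanceIndex;
        }
    }

    // Names of the service classes missing their goals, most important first.
    static List<String> needingHelp(List<ServiceClass> classes) {
        List<ServiceClass> missing = new ArrayList<>();
        for (ServiceClass sc : classes) {
            if (sc.performanceIndex > 1.0) {
                missing.add(sc);
            }
        }
        missing.sort(Comparator.comparingInt((ServiceClass sc) -> sc.importance));
        List<String> names = new ArrayList<>();
        for (ServiceClass sc : missing) {
            names.add(sc.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<ServiceClass> classes = Arrays.asList(
            new ServiceClass("BATCH", 3, 1.8),
            new ServiceClass("CICS", 1, 1.2),
            new ServiceClass("TSO", 2, 0.9));
        System.out.println(needingHelp(classes));
    }
}
```

An empty list corresponds to step 1, where all the work is within its goals.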

What can you configure?

You can configure the system with goals like

  1. CICS transactions should take 1 second elapsed time to execute.
  2. Quick TSO commands using less than 0.5 seconds of CPU have high velocity.
  3. Slow TSO commands using more than 0.5 seconds, but less than 5 seconds, of CPU have Importance 3.
  4. Expensive TSO commands using more than 5 seconds of CPU have low priority
  5. Colin’s TSO userid always gets high priority regardless of the commands.
  6. Batch jobs with this accounting information, can run with high velocity
  7. Long batch jobs, or those batch jobs using more than 1 second of CPU, have low priority.

Work can get tracked across the system, and if WLM detects that CICS transactions are slowing down, then when the CICS issues a DB2 request in a different LPAR in the sysplex, it makes sure the request in DB2 has a high enough priority to keep the response time goals. WLM can also prioritise I/O so that the I/O for one transaction takes precedence over the I/O for a batch job.

The systems programmer creates a few broad categories of work, and specifies the goals of the service class. These service classes control the priority of work.

Use the WLM redbook for guidance on defining WLM service classes.

Service classes define the goals.

You have reporting classes for groups of similar jobs or transactions, to report WLM information on them. So although a group of work has the same service class, you can report on it in different ways, for example by transaction, or by userid.

You can define CICS, IMS, or Liberty as a server, and transactions/work within the server get WLM classified. So for job CICSA, the transactions PAY1,PAY2,PAY3 have high priority; for z/OSMF, userid COLIN has high priority.

What class is my work in?

You can display which service class a job is in using the D A,jobname operator command, for example it gave

WKL=STARTED SCL=STCLOM .

The service class is STCLOM.

You can use SDSF DA, and use the column SrvClass. (You need to start RMF, then go into SDSF to display the Srvclass and other WLM related parameters).

You can change the service class of a job by using the operator command

RESET IZUSVR1,SRVCLASS=STCMDM

(Note which way round the letters are jobname IZUSVR1, service class SRVCLASS=…)

or, if you are authorised, overtype the field in SDSF, or use the z/OSMF WLM plugin.

To change it permanently you’ll need to change the WLM definitions.

More details of how it works

I found the WLM redbook useful.

I described above that the execution velocity was 100 * CPU used /(CPU used + wait time for a resource).

The concept is correct – but the implementation is different. If my job had used 1000 seconds of CPU since it started, that figure is not helpful for seeing its behaviour over the last few minutes, as the execution velocity would be insensitive to recent changes.

Every 250 milliseconds (4 times a second) WLM looks at every job/transaction in the system. It then updates internal control blocks for each Service Class and Report Class and increments a table.

  • executing – add 1 to the active (or using) CPU
  • transferring data to a device (connect time) add 1 to the active ( or using) I/O
  • waiting in z/OS to start an I/O – add one to the delayed for I/O
  • being paged in – add 1 to the delayed for “page in”
  • etc
  • waiting for the end user to enter data – do nothing.
  • waiting for TCPIP data – do nothing.

Execution velocity = 100 * Total active samples / (Total active samples + Total delayed samples).

If during a 25 second period the transaction was

  • using CPU, in 20 samples,
  • transferring data to disk, in 10 samples
  • waiting to start an I/O, in 25 samples
  • waiting for the end user to type some data, in 45 samples.

From this we can see…

  • The count of active samples is 20 + 10
  • The count of delayed is 25 samples.
  • 45 samples are not used.

Execution velocity = 100 * (20 + 10) / ((20 + 10) + 25) = 55 (rounded from 54.5).
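The sample counting above can be sketched in Java. This is only an illustration of the formula, with invented names; it is not how WLM is implemented. Samples where the work was idle (waiting for the end user or for TCP/IP data) are simply not passed in.

```java
// A minimal sketch of the execution velocity formula; names are hypothetical.
final class ExecutionVelocity {
    // usingSamples: samples using CPU or transferring data to a device.
    // delaySamples: samples delayed for a resource.
    static double fromSamples(int usingSamples, int delaySamples) {
        int counted = usingSamples + delaySamples;
        if (counted == 0) {
            return Double.NaN; // no using or delay samples: the velocity is undefined in this sketch
        }
        return 100.0 * usingSamples / counted;
    }

    public static void main(String[] args) {
        // 20 CPU samples, 10 I/O samples, 25 delayed; the 45 idle samples are ignored.
        System.out.printf("velocity %.1f%n", fromSamples(20 + 10, 25));
    }
}
```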

An execution velocity of 100 means that whenever the job was sampled, it was always either dispatched and running, or transferring data to I/O.

An execution velocity tells you how long a job might take. If we expect the job to use 50 seconds of CPU, and it has a velocity of 10, then we expect the job to run in about 500 seconds. If it used 50 seconds of CPU, spent 20 seconds transferring data (connect time), and was delayed for 450 seconds, the execution velocity would be 100 * (50 + 20) / ((50 + 20) + 450) = 13%, which is close enough to a velocity of 10.
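The estimate above can be written as a small Java sketch. The names are invented for illustration, and the calculation is only the rough rule of thumb from the text, not a WLM interface.

```java
// Rule-of-thumb arithmetic only; names are hypothetical.
final class VelocityEstimate {
    // Expected elapsed time for work that needs usingSeconds of CPU (and connect time)
    // when it achieves the given execution velocity.
    static double expectedElapsedSeconds(double usingSeconds, double velocity) {
        return usingSeconds * 100.0 / velocity;
    }

    // Velocity achieved when the work was using for usingSeconds and delayed for delaySeconds.
    static double velocity(double usingSeconds, double delaySeconds) {
        return 100.0 * usingSeconds / (usingSeconds + delaySeconds);
    }

    public static void main(String[] args) {
        System.out.printf("elapsed %.0f seconds%n", expectedElapsedSeconds(50, 10));
        System.out.printf("velocity %.0f%n", velocity(50 + 20, 450));
    }
}
```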

Real goals from my system

For TSO on my z/OS there are goals

  1. For the first 800 service units (a system-independent measure of CPU usage)
    1. 80% of requests to complete within 00:00:00.30
    2. Work has importance 2
  2. After this, any work has an execution velocity of 40.

For started tasks with Medium Priority the goals are

  1. Execution velocity of 30
  2. Importance 3

For started tasks with Low Priority the goals are

  1. Discretionary – there are no goals – just do your best

The bear traps when using enclaves

I hit several problems when trying to use the enclave support.

In summary

  1. The functions to set up and use an enclave are available from C, but the functions to query and display usage are not available from C (and so not available from Java).
  2. Some functions caused an infinite loop because they overwrote the save area.
  3. Not all classify functions are available in C, for example ClientIPAddr.
  4. I had problems in 64 bit mode.
  5. Various documentation problems
  6. It is not documented that you need to pass the connection token with __server_classify(_SERVER_CLASSIFY_CONNTKN, (char * ) connToken). Without it you get errno2=0x0330083B, “Home address space does not own the connect token from the input parameter list”.
  7. You can query the CPU used by your enclave using the IWMEQTME macro (in supervisor state!). I had to specify CURRENT_DISP=YES to cause the dispatcher to be called to update the CPU figures.  By default the CPU usage figures are updated at the end of a dispatch cycle.  On my low use system, my transactions were running without being redispatched, and so the CPU “used” was reported as 0.

In more detail…

Minimum functionality for C programs.

You cannot obtain the CPU used by the enclaves from a C program, as the functions are not defined.  I had to write my own assembler code to call the assembler macros to obtain the information.  Some of these macros require supervisor state.

Many macros clobber the save area

Many macros use a program call to execute a function.  Other macros, such as IWMEQTME, use a BASR instruction, and the called code then does a standard save of the registers.  This means that your program needs to provide its own standard save area.  Without this, the caller’s save area was used and the saved registers were overwritten, so the branch back to the caller just branched to the instruction after the macro (the infinite loop mentioned above).

Instead of a function like

EDEL     RMODE  ANY 
EDEL     AMODE  31 
EDEL     CSECT 
          USING *,12 
          STM  14,12,12(13) 
          LR   12,15 
          L    6,0(1)  the work area
          L    2,4(1)  ADDRESS OF THE passed data              
          IWM4EDEL ETOKEN=0(2),MF=(E,0(6),COMPLETE),                   XX 
                CPUTIME=8(2),ZAAPTIME=16(2),ZIIPTIME=24(2),            XX 
                RSNCODE=32(2),RETCODE=36(2) 
          LM   14,12,12(13) 
          SR   15,15 
          BR   14 

I needed to add in code to create a save area, for example with a different macro

QCPU     RMODE  ANY 
QCPU     AMODE  31 
QCPU     CSECT 
** CAUTION THE IWMEQTME CORRUPTS SAVE AREA SO PROGRAM NEEDS ITS OWN
** SAVE AREA 
      USING *,12 
      STM  14,12,12(13) 
      LR   2,1 
      LR   12,15 
      LA    0,WORKLEN 
      STORAGE OBTAIN,LENGTH=(0) 
      ST     1,8(,13) FORWARD CHAIN IN PREV SAVE AREA 
      ST     13,4(,1) BACKWARD CHAIN IN NEXT SAVE AREA 
      LR     13,1     SET UP NEW SAVE AREA/REENTRANT WORKAREA 
      L    2,0(2)  ADDRESS OF THE CPUTIME 
      IWMEQTME CPUTIME=8(2),ZAAPTIME=16(2),ZIIPTIME=24(2),          X 
            CURRENT_DISP=YES,                                       X 
            RSNCODE=4(2),RETCODE=0(2),MF=(E,32(2),COMPLETE) 
      LR   3,15 
* free the register save area
      LR     1,13               ADDRESS TO BE RELEASED 
      L     13,4(,13)          ADDRESS OF PRIOR SAVE AREA 
      LA    0,WORKLEN           LENGTH OF STORAGE TO RELEASE 
      STORAGE RELEASE,           RELEASE REENTRANT WORK AREA        X 
            ADDR=(1),            ..ADDRESS IN R1                    X 
            LENGTH=(0)           ..LENGTH IN R0 
      L    14,12(13) 
      LR  15,3 
      LM   0,12,20(13) 
      SR   15,15
      BR   14   

Problems using a 64 bit program

I initially had my C program in 64 bit mode. This caused problems when I wrote some stub code to use the assembler interface: the assembler macros are supported in AMODE 31, but my program and its storage areas were 64 bit.

Various documentation problems

  1. It is not documented that you need to pass the connection token with __server_classify(_SERVER_CLASSIFY_CONNTKN, (char * ) connToken). Without it you get errno2=0x0330083B, “Home address space does not own the connect token from the input parameter list”.
  2. The documentation for _SERVER_CLASSIFY_SUBSYSTEM_PARM says “Set the transaction subsystem parameter. When specified, value contains a NULL-terminated character string of up to 255 characters containing the subsystem parameter being used for the __server_pwu() call.”  This applies to __server_classify() as well as __server_pwu().   The same applies to _SERVER_CLASSIFY_TRANSACTION_CLASS, _SERVER_CLASSIFY_TRANSACTION_NAME and _SERVER_CLASSIFY_USERID.
  3. Getting the report class and service class back from __server_classify
    1. It is  _SERVER_CLASSIFY_SRVCLSNM not _SERVER_CLASSIFY_SERVCLSNM.
    2. You use _SERVER_CLASSIFY_RPTCLSNM@, _SERVER_CLASSIFY_SERVCLS@, _SERVER_CLASSIFY_SERVCLSNM@ without the @ at the end.   I think this is meant to imply these are pointers.
    3. They did not work for me.  I could not see when the fields are available.   The classify work is only done during the CreateWorkUnit() request.  I requested the values before this function, and after this function, and only got back a string of hex 0s.

Using enclaves in a java program

I’ve blogged about using enclaves from a C program.  There is an interface from Java which uses this C interface.

It is relatively easy to use enclave services from a Java program, as there are Java classes for most of the functions, available from the JZOS toolkit.  For example the WorkloadManager class is in the com.ibm.jzos.wlm package.

Below is a program I used to get the Work Load Manager(WLM) services working.

import java.util.concurrent.TimeUnit;
import com.ibm.jzos.wlm.ServerClassification;
import com.ibm.jzos.wlm.WorkUnit;
import com.ibm.jzos.wlm.WorkloadManager;

public class main
{
    // run it with /usr/lpp/java/J8.0_64/bin/java main
    public static void main(String[] args) throws Exception
    {
        // connect to WLM, once per JVM
        WorkloadManager wlmToken = new WorkloadManager("JES", "SM3");
        ServerClassification serverC = wlmToken.createServerClassification();
        serverC.setTransactionName("TCI3");
        for (int j = 0; j < 1000; j++)
        {
            WorkUnit wU = new WorkUnit(serverC, "MAINCP"); // create the enclave
            wU.join();   // CPU used by this thread is now recorded against the enclave
            float f;
            for (int i = 0; i < 1000000; i++) f = i * i * 2; // use some CPU
            TimeUnit.MICROSECONDS.sleep(20 * 1000); // 20 milliseconds
            wU.leave();  // CPU is recorded against the address space again
            wU.delete(); // end the workload
        }
        wlmToken.disconnect();
    }
}

The WLM statements are explained below.

WorkloadManager wlmToken = new WorkloadManager("JES", "SM3");

This connects to the Work Load Manager and returns a connection token.    This needs to be done once per JVM.  You can use any relevant subsystem type, I used JES, and a SubsystemInstance (SI) of SM3. As a test, I created a new  subsystem category in WLM called DOG, and used that.  I defined ServerInstance SI with a value of SM3 within DOG and it worked.

z/OS uses subsystems such as JES for jobs submitted into JES2, and STC for started tasks.

ServerClassification serverC = wlmToken.createServerClassification();

If your application is going to classify the transaction to determine the WLM service class and reporting  class you need this.  You create it, then add the classification criteria to it, see the following section.

Internally this passes the connection token wlmToken to the createServerClassification function.

serverC.setTransactionName("TCI3");

This passes information to WLM to determine the best service class and reporting class.  Within Subsystem CAT, Subsystem Instance SM1, I had a sub rule TransactionName (TN) with a value TCI3.  I defined the service class and a reporting class.

WorkUnit wU = new WorkUnit(serverC, "MAINCP");

This creates the Independent (business transaction) enclave.  I have not seen the value MAINCP reported in any reports.   This invokes the C run time function CreateWorkUnit(). The CreateWorkUnit function requires a STCK value of when the work unit started.  The Java code does this for you and passes the STCK through.

wU.join();

This connects the current task to the enclave, and any CPU it uses will be recorded against the enclave.

wU.leave();

Disconnect the current task from the enclave.  After this call any CPU used by the thread will be recorded against the address space.

wU.delete();

The Independent enclave(Business transaction) has finished. WLM records the elapsed time and resources used for the business transaction.

wlmToken.disconnect();

The program disconnects from WLM.

Reporting class output.

I used RMF to print the SMF 72 records for this program.   The Reporting class for this program had

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF 
AVG        0.29  ACTUAL                36320 
MPL        0.29  EXECUTION             35291 
ENDED       998  QUEUED                 1028 
END/S      8.31  R/S AFFIN                 0 
#SWAPS        0  INELIGIBLE                0 
EXCTD         0  CONVERSION                0 
                 STD DEV               18368 
                                             
----SERVICE----   SERVICE TIME  ---APPL %--- 
IOC           0   CPU   12.543  CP      0.01 
CPU       10747   SRB    0.000  IIPCP   0.01 
MSO           0   RCT    0.000  IIP    10.44 
SRB           0   IIT    0.000  AAPCP   0.00 
TOT       10747   HST    0.000  AAP      N/A 

From this we can see that for the interval

  1. 998 transactions ended.  (Another report interval had 2 transactions ending)
  2. the response time was an average of 36.3 milliseconds
  3. a total of 12.543 seconds of CPU was used.
  4. it spent 10.44 % of the time on a ZIIP.
  5. 0.01 % of the time it was executing ZIIP eligible work on a CP as there was no available ZIIP.
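Some per-transaction figures can be derived from these numbers, sketched below in Java. The class and method names are invented for illustration. The CPU figure is for all work in the interval, including transactions which ended in another interval, so the per-transaction value is only approximate.

```java
// Rough arithmetic on the report figures; names are hypothetical.
final class ReportArithmetic {
    // Average CPU per ended transaction, in milliseconds.
    static double cpuMillisPerTransaction(double cpuSeconds, int ended) {
        return cpuSeconds * 1000.0 / ended;
    }

    // Ending rate (END/S) for a given interval length.
    static double endRatePerSecond(int ended, double intervalSeconds) {
        return ended / intervalSeconds;
    }

    public static void main(String[] args) {
        // 12.543 seconds of CPU and 998 ended transactions, from the report above
        System.out.printf("CPU per transaction: %.3f ms%n", cpuMillisPerTransaction(12.543, 998));
        // 21 transactions in a 5 minute interval, from the earlier z/OSMF report
        System.out.printf("END/S: %.2f%n", endRatePerSecond(21, 300.0));
    }
}
```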

Additional functions.

The functions below

  • ContinueWorkUnit – for dependent enclave
  • JoinWorkUnit – as before
  • LeaveWorkUnit – as before
  • DeleteWorkUnit – as before

can be used to record CPU against the dependent (Address space) enclave.  There is no WLM classify for a dependent enclave.

Java threads and WLM

A common application pattern is to use connection pooling.  For example the connect/disconnect to a database or MQ is expensive.  If you have a pool of threads which connect and stay connected, an application can request a thread and get one which is already connected to the resource manager.

It should be a simple matter of changing the interface from

connectionPool.getConnection()

to

connectionPool.getConnection(WorkUnit wU)
{
 connection = connectionPool.getConnection()
 connection.join(wU)
}

and add a connection.leave(wU) to the releaseConnection.
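A minimal sketch of that idea in Java is below. Everything in it is hypothetical: the Enclave interface is a stand-in for the JZOS WorkUnit (so the sketch compiles off z/OS), and the pool and connection classes are invented for illustration. In a real pool the join and leave would have to be issued on the thread that does the work for the connection, because join connects the current task to the enclave.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: none of these types come from JZOS or a real pooling library.
final class EnclaveAwarePool {
    // Stand-in for the join/leave calls of a WLM work unit.
    interface Enclave {
        void join();
        void leave();
    }

    static final class Connection {
        private Enclave current;

        void join(Enclave workUnit) {
            workUnit.join();       // record CPU against the enclave
            current = workUnit;
        }

        void leave() {
            if (current != null) {
                current.leave();   // record CPU against the address space again
                current = null;
            }
        }
    }

    private final Deque<Connection> idle = new ArrayDeque<>();

    EnclaveAwarePool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push(new Connection());
        }
    }

    // Hand out an idle connection, already joined to the caller's work unit.
    synchronized Connection getConnection(Enclave workUnit) {
        Connection connection = idle.poll();
        if (connection == null) {
            throw new IllegalStateException("no idle connection");
        }
        connection.join(workUnit);
        return connection;
    }

    // Leave the work unit before the connection goes back into the pool.
    synchronized void releaseConnection(Connection connection) {
        connection.leave();
        idle.push(connection);
    }

    public static void main(String[] args) {
        Enclave demo = new Enclave() {
            public void join()  { System.out.println("join"); }
            public void leave() { System.out.println("leave"); }
        };
        EnclaveAwarePool pool = new EnclaveAwarePool(2);
        Connection connection = pool.getConnection(demo);
        pool.releaseConnection(connection);
    }
}
```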