Using enclaves in a Java program

I've blogged about using enclaves from a C program. There is an interface from Java which uses this C interface.

It is relatively easy to use enclave services from a Java program, as there are Java classes for most of the functions, available from the JZOS toolkit. For example the WorkloadManager class is defined here.

Below is a program I used to get the Work Load Manager (WLM) services working.

import java.util.concurrent.TimeUnit;
import com.ibm.jzos.wlm.ServerClassification;
import com.ibm.jzos.wlm.WorkUnit;
import com.ibm.jzos.wlm.WorkloadManager;

public class main
{
  // run it with /usr/lpp/java/J8.0_64/bin/java main
  public static void main(String[] args) throws Exception
  {
    // connect to WLM once per JVM: subsystem type JES, subsystem instance SM3
    WorkloadManager wlmToken = new WorkloadManager("JES", "SM3");
    ServerClassification serverC = wlmToken.createServerClassification();
    serverC.setTransactionName("TCI3");
    for (int j = 0; j < 1000; j++)
    {
      WorkUnit wU = new WorkUnit(serverC, "MAINCP"); // create the independent enclave
      wU.join();                                     // CPU is now charged to the enclave
      float f;
      for (int i = 0; i < 1000000; i++) f = i * i * 2; // use some CPU
      TimeUnit.MICROSECONDS.sleep(20*1000); // 20 milliseconds
      wU.leave();                                    // CPU goes back to the address space
      wU.delete(); // end the workload
    }
    wlmToken.disconnect();
  }
}

The WLM statements are explained below.

WorkloadManager wlmToken = new WorkloadManager("JES", "SM3");

This connects to the Work Load Manager and returns a connection token. This needs to be done once per JVM. You can use any relevant subsystem type; I used JES, and a Subsystem Instance (SI) of SM3. As a test, I created a new subsystem category in WLM called DOG, and used that. I defined Subsystem Instance (SI) with a value of SM3 within DOG and it worked.

z/OS uses subsystems such as JES for jobs submitted into JES2, and STC for started tasks.

ServerClassification serverC = wlmToken.createServerClassification();

If your application is going to classify the transaction to determine the WLM service class and reporting  class you need this.  You create it, then add the classification criteria to it, see the following section.

Internally this passes the connection token wlmToken to the createServerClassification function.

serverC.setTransactionName("TCI3");

This passes information to WLM to determine the best service class and reporting class.  Within Subsystem CAT, Subsystem Instance SM1, I had a sub rule TransactionName (TN) with a value TCI3.  I defined the service class and a reporting class.

WorkUnit wU = new WorkUnit(serverC, "MAINCP");

This creates the Independent (business transaction) enclave. I have not seen the value MAINCP reported in any reports. This invokes the C run time function CreateWorkUnit(). The CreateWorkUnit function requires a STCK value of when the work unit started. The Java code does this for you and passes the STCK through.

wU.join();

This connects the current task to the enclave, and any CPU it uses will be recorded against the enclave.

wU.leave();

Disconnect the current task from the enclave.  After this call any CPU used by the thread will be recorded against the address space.

wU.delete();

The Independent enclave (business transaction) has finished. WLM records the elapsed time and resources used for the business transaction.

wlmToken.disconnect();

The program disconnects from WLM.
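The sample program above assumes nothing fails between join() and leave(). Here is a minimal sketch of one way to make sure leave() and delete() are always called, using try/finally. EnclaveHandle and EnclaveScope are hypothetical names of my own: EnclaveHandle stands in for the JZOS WorkUnit so the sketch compiles without the JZOS toolkit, and it ignores any exceptions the real methods may declare (the original program declares throws Exception).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for com.ibm.jzos.wlm.WorkUnit, so the sketch compiles without JZOS.
interface EnclaveHandle {
    void join();
    void leave();
    void delete();
}

class EnclaveScope {
    // Runs the work inside the enclave; leave() and delete() run even if the work throws.
    static void runInEnclave(EnclaveHandle enclave, Runnable work) {
        enclave.join();
        try {
            work.run();
        } finally {
            try {
                enclave.leave();
            } finally {
                enclave.delete();
            }
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        // A recording handle, so we can see the order of the calls.
        EnclaveHandle recording = new EnclaveHandle() {
            public void join()   { calls.add("join"); }
            public void leave()  { calls.add("leave"); }
            public void delete() { calls.add("delete"); }
        };
        try {
            runInEnclave(recording, () -> { throw new IllegalStateException("work failed"); });
        } catch (IllegalStateException e) {
            calls.add("caught");
        }
        System.out.println(calls); // prints [join, leave, delete, caught]
    }
}
```

With the real classes, the body of the for loop in the program would become the work passed to runInEnclave.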

Reporting class output.

I used RMF to print the SMF 72 records for this program.   The Reporting class for this program had

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF 
AVG        0.29  ACTUAL                36320 
MPL        0.29  EXECUTION             35291 
ENDED       998  QUEUED                 1028 
END/S      8.31  R/S AFFIN                 0 
#SWAPS        0  INELIGIBLE                0 
EXCTD         0  CONVERSION                0 
                 STD DEV               18368 
                                             
----SERVICE----   SERVICE TIME  ---APPL %--- 
IOC           0   CPU   12.543  CP      0.01 
CPU       10747   SRB    0.000  IIPCP   0.01 
MSO           0   RCT    0.000  IIP    10.44 
SRB           0   IIT    0.000  AAPCP   0.00 
TOT       10747   HST    0.000  AAP      N/A 

From this we can see that for the interval

  1. 998 transactions ended.  (Another report interval had 2 transactions ending)
  2. the average response time was 36.3 milliseconds (ACTUAL is 36320 microseconds)
  3. a total of 12.543 seconds of CPU was used.
  4. it spent 10.44 % of the time on a zIIP.
  5. 0.01 % of the time it was executing zIIP eligible work on a CP as there was no available zIIP.
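As a cross-check of the numbers in the list above, here is a small sketch of the arithmetic. It converts the microsecond value to milliseconds, derives the interval length from ENDED and END/S, and expresses the CPU seconds as a percentage of that interval. Treating APPL % as CPU seconds divided by the interval length is my reading of the report, so take it as an assumption. The class and method names are mine.

```java
class IntervalCheck {
    // TRANS-TIME values below one second are printed as microseconds.
    static double microsToMillis(long micros) {
        return micros / 1000.0;
    }

    // END/S is ENDED divided by the interval length, so the interval is ENDED / END/S.
    static double intervalSeconds(long ended, double endedPerSecond) {
        return ended / endedPerSecond;
    }

    // CPU seconds as a percentage of the interval (assumed to be what APPL % reports).
    static double percentOfInterval(double cpuSeconds, double intervalSeconds) {
        return 100.0 * cpuSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        double interval = intervalSeconds(998, 8.31);
        System.out.println("Average response time (ms): " + microsToMillis(36320));
        System.out.println("Interval length (s):        " + interval);
        System.out.println("CPU as % of interval:       " + percentOfInterval(12.543, interval));
    }
}
```

The interval comes out at about 120 seconds, and 12.543 seconds of CPU is about 10.44 % of that, which is in line with the IIP figure in the report.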

Additional functions.

The functions below

  • ContinueWorkUnit – for dependent enclave
  • JoinWorkUnit – as before
  • LeaveWorkUnit – as before
  • DeleteWorkUnit – as before

can be used to record CPU against the dependent (Address space) enclave.  There is no WLM classify for a dependent enclave.

Java threads and WLM

A common application pattern is to use connection pooling. For example the connect/disconnect to a database or MQ is expensive. If you have a pool of threads which connect and stay connected, an application can request a thread and get one which has already been connected to the resource manager.

It should be a simple matter of changing the interface from

connectionPool.getConnection()

to

connectionPool.getConnection(WorkUnit wU)
{
 connection = connectionPool.getConnection()
 connection.join(wU)
}

and add a connection.leave(wU) to the releaseConnection.
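Here is a minimal sketch of that idea, to show where the join and leave would go. All the names are hypothetical: WorkUnitLike stands in for the JZOS WorkUnit so the sketch compiles without JZOS, and PooledConnection and EnclaveAwarePool are placeholders for a real pool. In the sketch the connection remembers its work unit, so releaseConnection needs no extra argument. Everything runs on the calling thread; with a real pool the join would have to be issued on whichever thread does the work, because join connects the current task to the enclave.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for com.ibm.jzos.wlm.WorkUnit.
interface WorkUnitLike {
    void join();
    void leave();
}

// Hypothetical pooled connection; a real one would wrap a database or MQ connection.
class PooledConnection {
    private WorkUnitLike current; // the enclave this connection is working for, if any

    void join(WorkUnitLike wU) {
        wU.join();
        current = wU;
    }

    void leave() {
        if (current != null) {
            current.leave();
            current = null;
        }
    }

    boolean inEnclave() {
        return current != null;
    }
}

class EnclaveAwarePool {
    private final Deque<PooledConnection> idle = new ArrayDeque<>();

    EnclaveAwarePool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push(new PooledConnection());
        }
    }

    // Hand out an already connected connection and join it to the caller's enclave.
    synchronized PooledConnection getConnection(WorkUnitLike wU) {
        PooledConnection c = idle.pop(); // throws NoSuchElementException if the pool is empty
        c.join(wU);
        return c;
    }

    // Leave the enclave before the connection goes back into the pool.
    synchronized void releaseConnection(PooledConnection c) {
        c.leave();
        idle.push(c);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Leaving the enclave in releaseConnection matters: a connection returned to the pool while still joined would keep charging its CPU to a business transaction that has finished with it.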

Enclaves in practice. How to capture all the CPU your application uses, and knowing where to look for it.

z/OS has enclaves to manage work.

  1. When an enclave is used, a transaction can issue a DB2 request, then DB2 uses some TCBs in the DB2 address spaces on behalf of the original request. The CPU used by these DB2 TCBs can be charged back to the original application.
  2. When an enclave is not used, the CPU used by the original TCB is charged to its address space, and the DB2 TCBs are charged to the DB2 address space. You do not get a complete picture of the CPU used by the application.
  3. A transaction can be defined to Work Load Manager (WLM) to set the priority of the transaction, so online transactions have high priority, and background work gets low priority. With an enclave, the DB2 TCBs have the same priority as the original request. With no enclave the TCBs have the priority as determined by DB2.

When an application sets up an enclave

  1. Threads can join the enclave, so any CPU the thread uses while in the enclave, is recorded against the enclave.
  2. These threads can be in the same address space, a different address space on the same LPAR, or even in a different LPAR in the SYSPLEX.
  3. Enclaves are closely integrated with Work Load Manager (WLM). When you create an enclave you can give information about the business transaction (such as transaction name and userid). WLM classifies the work using this information.
  4. The classification maps to a service class.   This service class determines the appropriate priority profile.  Any threads using the enclave will get this priority.
  5. WLM reports on the elapsed time of the business transaction, and the CPU used.

What enclave types are there?

In this simple explanation there are two enclave types, plus work which is not in an enclave

  1. Independent enclave – what I think of as Business Transaction, where work can span multiple address spaces.  You pass transaction information (transaction, userid, etc) to WLM so it can set the priority for the enclave. You can get reports on the enclave showing elapsed time, and CPU used.  There can be many independent enclaves in the lifetime of a job.  You can have these enclaves running in parallel within a job.
  2. Dependent enclave or Address space enclave.   I cannot see the reason for this.  This is for tasks running within an address space which are not doing work for an independent enclave.  It could be used for work related to transactions in general.   In the SMF 30 job information records you get information on CPU used in the dependent enclave.  
  3. Work not in an enclave.  Threads by default run with the priority assigned to the address space.  CPU is charged to the address space.

To help me understand enclave reports, I set up two jobs

  1. The parent job,
    1. Creates an independent (transactional) enclave with “subsystem=JES, definition=SM3” and “TRANSACTION NAME=TCI2”.  It displays the enclave token.
    2. Sets up a dependent enclave.
    3. Joins the dependent enclave.
    4. Does some CPU intensive work.
    5. Sleeps for 30 seconds.
    6. Leaves the dependent enclave.
    7. Deletes the dependent enclave.
    8. Deletes the independent enclave.
    9. Ends.
  2. The child, or subtask, job.
    1. This reads the enclave token as a parameter.
    2. Joins the enclave. If the enclave does not exist, it uses the dependent enclave.
    3. Does some CPU intensive work.
    4. Leaves the enclave.
    5. Ends.

Where is information reported?

  1. Information about a job and the resources used by the job is in SMF 30 records. It reports total CPU used, CPU used by independent enclaves, and CPU used by the dependent enclave. In JCL output where it reports the CPU etc used by the job step, this comes from the SMF 30 record.
  2. Information about the independent enclave is summarised in an SMF 72 record over a period (typically 15 minutes) and tells you information about the response time distribution, and the CPU used.

I used three scenarios

  1. Run the parent job, but not the child. This shows the costs of just the parent, when there is no child running the workload for it.
  2. Run the child but not the parent. This shows the cost of the expensive workload.
  3. Both parent and child active. This shows the costs of running the independent enclave in the child are charged to the parent.

SMF job resource used report

From the SMF 30 record we get the CPU for the Independent enclave.

Scenario          Parent CPU                     Child CPU
Parent, no child  Total               : 0.070    Not applicable
                  Dependent enclave   : 0.020
Child, no parent  Not applicable                 Total CPU         : 2.930
                                                 Dependent enclave : 2.900
                                                 Non enclave       : 0.030
Parent and child  Total               : 2.860    Total             : 0.020
                  Independent enclave : 2.820    No enclave CPU reported
                  Dependent enclave   : 0.010
                  Non enclave         : 0.030

From the parent and child we can see that the CPU used by the enclave work in the child job has been recorded against the parent’s job under “Independent enclave CPU”.

The SMF type 30 record shows the Parent job had CPU under Independent enclave, Dependent enclave, and a small amount (0.03) which was not enclave.

SMF WLM reports

From the SMF 72 data displayed by RMF (see below for an example) you get the number of transactions and CPU usage for the report class, and service class. I had a report class for each of the parent, and the child job, and for the Independent enclave transaction.

                      Total CPU (s)  Elapsed time (s)  Ended
Parent                        2.818             34.97      1
Child                         2.636              6.76      1
Business transaction          2.819             30.04      1

It is clear there is some double accounting. The CPU used for the child doing enclave processing, is also recorded in the Parent’s cost. The CPU used for the Business transaction is another view of the data from the parent and child address spaces.

For charging based on address spaces you should use the SMF 30 records.

You can use the SMF 72 records for reporting on the transaction costs.

RMF workload reports

When processing the SMF data using RMF you get out workload reports

//POST EXEC PGM=ERBRMFPP 
//MFPINPUT DD DISP=SHR,DSN=smfinput.dataset  
//SYSIN DD * 
SYSRPTS(WLMGL(SCPER,RCLASS,RCPER,SCLASS)) 
/* 

For the child address space report class RMF reported

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF
 AVG        0.05  ACTUAL             6.760380
 MPL        0.05  EXECUTION          6.254239
 ENDED         1  QUEUED               506141
 ...
 ----SERVICE----   SERVICE TIME  ---APPL %---
 IOC        2152   CPU    2.348  CP      2.20
 CPU        2012   SRB    0.085  IIPCP   0.00
 MSO         502   RCT    0.004  IIP     0.00
 SRB          73   IIT    0.197  AAPCP   0.00
 TOT        4739   HST    0.002  AAP      N/A

There was 1 occurrence of the child job, it ran for 6.76 seconds on average, and used a total of 2.636 seconds of CPU (if you add up the service time).
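The 2.636 seconds comes from adding the five SERVICE TIME values (CPU, SRB, RCT, IIT and HST). A small sketch of the sum, using BigDecimal so the result is exact rather than subject to binary floating point rounding. The class and method names are mine.

```java
import java.math.BigDecimal;

class ServiceTimeTotal {
    // Adds the SERVICE TIME values exactly, taking them as printed in the report.
    static BigDecimal total(String... seconds) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String s : seconds) {
            sum = sum.add(new BigDecimal(s));
        }
        return sum;
    }

    public static void main(String[] args) {
        // CPU, SRB, RCT, IIT, HST from the child report class
        System.out.println(total("2.348", "0.085", "0.004", "0.197", "0.002")); // prints 2.636
    }
}
```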

For a more typical job using many short duration independent enclaves the report looked like

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF 
 AVG        0.11  ACTUAL                13395 
 MPL        0.11  EXECUTION             13395 
 ENDED      1000  QUEUED                    0 
 END/S      8.33  R/S AFFIN                 0 
 SWAPS         0  INELIGIBLE                0
 EXCTD         0  CONVERSION                0 
                  STD DEV                1325 
 ----SERVICE----   SERVICE TIME  
 IOC           0   CPU    1.448   
 CPU        1241   SRB    0.000   
 MSO           0   RCT    0.000  
 SRB           0   IIT    0.000  
 TOT        1241   HST    0.000  

This shows 1000 transactions ended in the period and the average transaction response time was 13.395 milliseconds. The total CPU time used was 1.448 seconds, or an average of 1.448 milliseconds of CPU per transaction.

For the service class with a response time definition, you get a response time profile. The data below shows that most response times were between 11 and 15 ms, with the peak in the 14 to 15 ms bucket. The service class was defined with “Average response time of 00:00:00.010”. This drives the range of response times reported. If this data was for a production system you may want to adjust the “Average response time” to 00:00:00.015 to get the peak in the middle of the range.

-----TIME--- -% TRANSACTIONS-    0  10   20   30   40   50 
    HH.MM.SS.FFF CUM TOTAL BUCKET|...|....|....|....|....|
 <= 00.00.00.005       0.0  0.0   >                           
 <= 00.00.00.006       0.0  0.0  >                           
 <= 00.00.00.007       0.0  0.0  >                           
 <= 00.00.00.008       0.0  0.0  >                           
 <= 00.00.00.009       0.0  0.0  >                           
 <= 00.00.00.010       0.0  0.0  >                           
 <= 00.00.00.011       0.3  0.3  >                           
 <= 00.00.00.012      10.6 10.3  >>>>>>                      
 <= 00.00.00.013      24.9 14.3  >>>>>>>>                    
 <= 00.00.00.014      52.3 27.4  >>>>>>>>>>>>>>              
 <= 00.00.00.015      96.7 44.4  >>>>>>>>>>>>>>>>>>>>>>>     
 <= 00.00.00.020     100.0  3.2  >>                          
 <= 00.00.00.040     100.0  0.1  >                           
 >  00.00.00.040     100.0  0.0  >
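The bucket boundaries in the report are consistent with fixed percentages of the goal: 50 % to 150 % in steps of 10 %, then 200 % and 400 %. I inferred these percentages from the report above, so treat them as an assumption. The sketch below, with names of my own, computes the boundaries for a goal of 10 ms and for a goal of 15 ms, to show how changing the goal moves the range.

```java
import java.util.Arrays;

class ResponseTimeBuckets {
    // Percentages of the goal, inferred from the report above (an assumption).
    static final int[] PERCENT_OF_GOAL =
        {50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 400};

    // Upper boundary of each bucket, in microseconds, for the given goal.
    static long[] boundariesMicros(long goalMicros) {
        long[] out = new long[PERCENT_OF_GOAL.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = goalMicros * PERCENT_OF_GOAL[i] / 100;
        }
        return out;
    }

    public static void main(String[] args) {
        // Goal of 10 ms: 5000, 6000, ... 15000, 20000, 40000, matching the report
        System.out.println(Arrays.toString(boundariesMicros(10_000)));
        // Goal of 15 ms: 7500, 9000, ... 22500, 30000, 60000
        System.out.println(Arrays.toString(boundariesMicros(15_000)));
    }
}
```

With a goal of 15 ms the 14 to 15 ms peak would fall in the bucket ending at 15 ms, which is the sixth of the thirteen boundaries.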

Take care.

The field ENDED is the number of transactions that ended in the interval. If you have a measurement that spans an interval, you will get CPU usage in both intervals, but “ENDED” only when the transaction ends. With a large number of transactions this effect is small. With few transactions it could cause divide by zero exceptions!
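If you post-process these reports yourself, a guard like the sketch below avoids the problem. It returns an empty result when no transactions ended in the interval, rather than dividing. The class and method names are mine, and the 0.5 seconds in the second call is a made-up value for illustration. In Java a floating point division by zero gives Infinity or NaN rather than an exception, and an integer division throws ArithmeticException; neither is useful in a report.

```java
import java.util.OptionalDouble;

class PerTransactionCpu {
    // Average CPU per ended transaction in milliseconds, or empty when nothing ended.
    static OptionalDouble cpuMillisPerTransaction(double cpuSeconds, long ended) {
        if (ended <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(cpuSeconds * 1000.0 / ended);
    }

    public static void main(String[] args) {
        // 1.448 seconds of CPU across 1000 ended transactions, from the report above
        System.out.println(cpuMillisPerTransaction(1.448, 1000));
        // CPU was used in the interval, but no transaction ended in it (made-up value)
        System.out.println(cpuMillisPerTransaction(0.5, 0));
    }
}
```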