Using enclaves in a C program.

On z/OS, enclaves allow you to set the priority of business transactions within your program, and to record the CPU used by the threads involved in a transaction – even if they are in a different address space.

Many of the WLM functions are assembler macros which require supervisor state. There are C run-time functions which provide a subset of the WLM functions, and Java methods which invoke the C run-time functions.

Not all of the WLM functions are available in the C run-time environment. You can set up enclaves, but the application cannot query information about an enclave, such as the total CPU used or the WLM parameters.

Minimal C program

The minimal program is below, and the key WLM functions are explained afterwards.

#include <sys/__wlm.h>

int main(void)
{
  wlmetok_t enclavetoken;
  server_classify_t classify;
  long rc;
  //
  // Connect to the work manager
  //
  unsigned int connectToken = ConnectWorkMgr("JES", "SM3");
  classify = __server_classify_create();
  // Pass the connection token to the classify area.
  // This is needed but not documented.
  rc = __server_classify(classify,
                         _SERVER_CLASSIFY_CONNTKN,
                         (char *)&connectToken);
  rc = __server_classify(classify,
                         _SERVER_CLASSIFY_TRANSACTION_NAME,
                         "TCI2");
  for (int loop = 0; loop < 1000; loop++)
  {
    rc = CreateWorkUnit(&enclavetoken,
                        classify,
                        NULL,          // arrival time: see the notes below
                        "COLINS");
    rc = JoinWorkUnit(&enclavetoken);

    // do some work to burn some CPU
    for (int i = 0; i < 100000; i++)
    {
      double xx = i / 0.5;
      double yy = xx * xx;
      (void)yy;
    }
    rc = LeaveWorkUnit(&enclavetoken);
    rc = DeleteWorkUnit(&enclavetoken);
  }
  // Disconnect once, after all the work units have been processed.
  rc = DisconnectServer(&connectToken);
  return 0;
}

What are the key functions?

unsigned int connectToken = ConnectWorkMgr("JES", "SM3");

This creates a connection to WLM, using the subsystem type JES and the subsystem name SM3.  Note: on my system it is JES, not JES2.  The WLM dialog, option 6 (Classification Rules), lists the available subsystem types.  You can browse a subsystem type and see the available definitions.  I had:

         -------Qualifier--------                 -------Class--------  
Action   Type Name     Start        Service     Report   
 ____  1 SI   SM3      ___          TCI1SC      THRU
 ____  2   TN  TCI3    ___          TCI1SC      TCI3
 ____  2   TN  TCI2    ___          TCI1SC      TCI2

server_classify_t classify = __server_classify_create();

CreateWorkUnit, the function used to create an independent enclave (business transaction), needs to be able to classify the transaction to determine which service class (priority) to give the enclave.  This request sets up the classify control block.

rc = __server_classify(classify, _SERVER_CLASSIFY_CONNTKN, (char *)&connectToken );

The documentation does not tell you to pass the connection token.  If you omit this step, CreateWorkUnit fails with errno2=0x0330083B.

__server_classify expects a char * as the value, so you have to cast: (char *)&connectToken.

rc = __server_classify(classify, _SERVER_CLASSIFY_TRANSACTION_NAME, "TCI2");

This tells WLM about the transaction we want to use.  TRANSACTION_NAME matches the TN qualifier in the WLM definitions above, and says the business transaction is called TCI2.  There are other criteria, such as userid, plan, or LU; see here for the list.  The list is incomplete, as it does not support classifiers like client IP address, which are available with the assembler macros.

rc = CreateWorkUnit(&enclavetoken, classify, NULL, "COLINS");

This uses the classification parameter defined above to create the independent enclave, and returns the enclave token.

The documentation for CreateWorkUnit says you need to pass the arrival time: the address of a doubleword (unsigned long long) field that contains the arrival time of the work request in STCK format.  I created a small assembler function which just returned a STCK value to my C program.  However, I passed NULL and it seemed to produce the correct values – I think CreateWorkUnit does a STCK for you.
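If you do want to supply the arrival time yourself, here is a minimal, untested sketch.  It assumes the XL C __stck() hardware built-in from <builtins.h> is available to store the TOD clock; as noted above, I actually used a small assembler routine instead.

#include <builtins.h>              // __stck() built-in (assumed available)

unsigned long long arrivalTime;    // doubleword, STCK format
__stck(&arrivalTime);              // store the TOD clock value
rc = CreateWorkUnit(&enclavetoken,
                    classify,
                    &arrivalTime,  // arrival time instead of NULL
                    "COLINS");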

You pass in the name of the function ("COLIN").  The only place I have seen this used is in QueryWorkUnitClassification(), which extracts the classification information.  For example, QueryWorkUnitClassification gave a control block with the non-empty fields _ecdtrxn[8]=TCI2, _ecdsubt[4]=JES, _ecdfcn[8]=COLIN, _ecdsubn[8]=SM3.  This function does not return the report class or service class.

rc = JoinWorkUnit(&enclavetoken);

This causes any work this TCB does to be recorded against the enclave.

rc = LeaveWorkUnit(&enclavetoken);

This stops work being recorded against the enclave.  Subsequent work gets charged to the home address space.

rc = DeleteWorkUnit(&enclavetoken);

The business transaction has finished.  Information about the response time and CPU used is stored.

rc = DisconnectServer(&connectToken);

This disconnects from WLM.

Using a subtask

I had a different thread in the program which did some work for the transaction. Using the enclave token, this work can be recorded against the transaction using:

// The enclave token is passed with the request
rc = JoinWorkUnit(&enclaveToken);
// ... do some work ...
rc = LeaveWorkUnit(&enclaveToken);

This would be useful if you are using a connection pool to connect to an MQ or DB2 subsystem. You have a pool of threads which have done the expensive connect with a particular userid, and a thread from the pool is used to execute the MQ or DB2 requests as that userid.
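A minimal sketch of the idea, assuming the pool worker is a pthread in the same address space and that the dispatcher passes the enclave token as the thread argument (the names are my own, for illustration):

#include <pthread.h>
#include <sys/__wlm.h>

// Hypothetical pool worker: CPU used between Join and Leave is charged
// to the business transaction that owns the enclave token.
void *worker(void *arg)
{
  wlmetok_t *enclaveToken = (wlmetok_t *)arg;
  long rc;
  rc = JoinWorkUnit(enclaveToken);   // start charging CPU to the enclave
  // ... do the MQ or DB2 work as the pooled userid ...
  rc = LeaveWorkUnit(enclaveToken);  // back to the home address space
  return NULL;
}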

Other functions available

Dependent (address space) enclave

The above discussion was for a business transaction, known in the publications as an independent enclave.  An address space can have a dependent enclave, where the CPU is recorded as "Dependent Enclave" within the address space.  You use the function ContinueWorkUnit(&enclave) to return the enclave token.  You then use JoinWorkUnit and LeaveWorkUnit as before.  I cannot see why you might want to use this.
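A minimal sketch, following the same pattern as the independent enclave above:

wlmetok_t depEnclave;
long rc;
rc = ContinueWorkUnit(&depEnclave);  // returns the dependent-enclave token
rc = JoinWorkUnit(&depEnclave);      // CPU now recorded as "Dependent Enclave"
// ... do some work ...
rc = LeaveWorkUnit(&depEnclave);
rc = DeleteWorkUnit(&depEnclave);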

Display the classification

You can use QueryWorkUnitClassification to return a structure containing the classification.

Reset the classification

If you want to classify a different transaction, you can use server_classify_init() to reset the structure.
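A sketch of reclassifying for a second transaction – note that I have not verified the exact prototype of server_classify_init(), so treat that call as an assumption:

// Assumed: server_classify_init() resets the classify area for reuse.
rc = server_classify_init(classify);
rc = __server_classify(classify,
                       _SERVER_CLASSIFY_CONNTKN,
                       (char *)&connectToken);
rc = __server_classify(classify,
                       _SERVER_CLASSIFY_TRANSACTION_NAME,
                       "TCI3");   // the other transaction defined above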

Set up a server

You can set up a server where your application puts work onto WLM queues, and other threads can get work.   This is an advanced topic which I have not looked into.

Make your enclave visible across the sysplex

You can use ExportWorkUnit and ImportWorkUnit to make your enclave visible across the sysplex.

Query which systems in the sysplex are running in goal mode

You can use QueryMetrics() to obtain the systems in the sysplex that are in goal mode, including the available CPU capacity and resource constraint status.

What is not available to the C interface

One reason why I was investigating enclaves was to understand the enclave data in the SMF 30 records.  There is an assembler macro, IWMEQTME, which returns the CPU, zIIP, and zAAP times used by the independent enclaves.  Unfortunately this requires supervisor state.  I wrote some assembler code to extract and display the data.  Another complication is that the IWM macros are AMODE 31, so it did not work with my 64-bit C program.

Enclaves in practice: how to capture all the CPU your application uses, and where to look for it.

z/OS has enclaves to manage work.

  1. When an enclave is used, a transaction can issue a DB2 request, and DB2 then uses some TCBs in the DB2 address spaces on behalf of the original request.  The CPU used by these DB2 TCBs can be charged back to the original application.
  2. When an enclave is not used, the CPU used by the original TCB is charged to its address space, and the DB2 TCBs are charged to the DB2 address space.  You do not get a complete picture of the CPU used by the application.
  3. A transaction can be defined to Workload Manager (WLM) to set the priority of the transaction, so online transactions get high priority and background work gets low priority.  With an enclave, the DB2 TCBs have the same priority as the original request.  With no enclave, the TCBs have the priority determined by DB2.

When an application sets up an enclave

  1. Threads can join the enclave, so any CPU a thread uses while in the enclave is recorded against the enclave.
  2. These threads can be in the same address space, a different address space on the same LPAR, or even in a different LPAR in the sysplex.
  3. Enclaves are closely integrated with Workload Manager (WLM).  When you create an enclave you can give information about the business transaction (such as transaction name and userid).  You classify the application against different factors.
  4. The classification maps to a service class.  This service class determines the appropriate priority profile.  Any threads using the enclave will get this priority.
  5. WLM reports on the elapsed time of the business transaction, and the CPU used.

What enclave types are there?

In this simple explanation there are two enclave types, plus work that runs outside any enclave:

  1. Independent enclave – what I think of as a business transaction, where work can span multiple address spaces.  You pass transaction information (transaction name, userid, etc.) to WLM so it can set the priority for the enclave.  You can get reports on the enclave showing elapsed time and CPU used.  There can be many independent enclaves in the lifetime of a job, and you can have these enclaves running in parallel within a job.
  2. Dependent enclave, or address space enclave.  I cannot see the reason for this.  It is for tasks running within an address space which are not doing work for an independent enclave, and could be used for work related to transactions in general.  In the SMF 30 job information records you get information on the CPU used in the dependent enclave.
  3. Work not in an enclave.  Threads by default run with the priority assigned to the address space.  CPU is charged to the address space.

To help me understand enclave reports, I set up two jobs

  1. The parent job,
    1. Creates an independent (transactional) enclave with "subsystem=JES, definition=SM3" and "TRANSACTION NAME=TCI2".  It displays the enclave token.
    2. Sets up a dependent enclave.
    3. Joins the dependent enclave.
    4. Does some CPU intensive work.
    5. Sleeps for 30 seconds.
    6. Leaves the dependent enclave.
    7. Deletes the dependent enclave.
    8. Deletes the independent enclave.
    9. Ends.
  2. The child, or subtask, job.
    1. This reads the enclave token as a parameter.
    2. Joins the independent enclave; if it does not exist, it uses the dependent enclave instead (a sketch of this logic follows the list).
    3. Does some CPU intensive work.
    4. Leaves the enclave.
    5. Ends.
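A sketch of the child's join-or-fall-back logic.  The way the token is passed (a 16-hex-digit argv string) and the assumption that JoinWorkUnit returns non-zero when the independent enclave does not exist are my own, for illustration:

#include <stdio.h>
#include <string.h>
#include <sys/__wlm.h>

int main(int argc, char *argv[])
{
  wlmetok_t enclaveToken;
  long rc;
  memset(&enclaveToken, 0, sizeof(enclaveToken));
  // Hypothetical: the parent displayed its enclave token as 16 hex digits,
  // and it arrives here as argv[1] (this assumes the token is a doubleword).
  if (argc > 1)
    sscanf(argv[1], "%16llx", (unsigned long long *)&enclaveToken);

  rc = JoinWorkUnit(&enclaveToken);
  if (rc != 0)                            // independent enclave not found...
  {
    rc = ContinueWorkUnit(&enclaveToken); // ...fall back to the dependent one
    rc = JoinWorkUnit(&enclaveToken);
  }

  for (long i = 0; i < 100000000; i++)    // CPU-intensive work
  {
    double xx = i / 0.5;
    double yy = xx * xx;
    (void)yy;                             // keep the compiler quiet
  }

  rc = LeaveWorkUnit(&enclaveToken);
  return 0;
}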

Where is information reported?

  1. Information about a job and the resources used by the job is in the SMF 30 records.  These report the total CPU used, the CPU used by independent enclaves, and the CPU used by the dependent enclave.  The CPU figures reported in the JCL output for a job step come from the SMF 30 record.
  2. Information about the independent enclave is summarised in an SMF 72 record over a period (typically 15 minutes), which tells you about the response time distribution and the CPU used.

I used three scenarios

  1. Run the parent job, but not the child.  This shows the cost of just the parent, when there is no child running the workload for it.
  2. Run the child but not the parent.  This shows the cost of the expensive workload.
  3. Both parent and child active.  This shows that the costs of running the independent enclave work in the child are charged to the parent.

SMF job resource used report

From the SMF 30 records we get the following CPU figures, including the CPU for the independent enclave.

                   Parent CPU                      Child CPU
Parent, no child   Total               : 0.070     Not applicable
                   Dependent enclave   : 0.020

Child, no parent   Not applicable                  Total CPU         : 2.930
                                                   Dependent enclave : 2.900
                                                   Non enclave       : 0.030

Parent and child   Total               : 2.860     Total : 0.020
                   Independent enclave : 2.820     No enclave CPU reported
                   Dependent enclave   : 0.010
                   Non enclave         : 0.030

From the parent and child scenario we can see that the CPU used by the enclave work in the child job has been recorded against the parent job under "Independent enclave CPU".

The SMF type 30 record shows the parent job had CPU under the independent enclave, the dependent enclave, and a small amount (0.030) which was not in an enclave.

SMF WLM reports

From the SMF 72 data displayed by RMF (see below for an example) you get the number of transactions and the CPU usage for each report class and service class. I had a report class for each of the parent job, the child job, and the independent enclave transaction.

                      Total CPU   Elapsed time   Ended
Parent                2.818       34.97          1
Child                 2.636        6.76          1
Business transaction  2.819       30.04          1

It is clear there is some double counting. The CPU used by the child doing enclave processing is also recorded in the parent's cost. The CPU used by the business transaction is another view of the data from the parent and child address spaces.

For charging based on address spaces you should use the SMF 30 records.

You can use the SMF 72 records for reporting on the transaction costs.

RMF workload reports

When you process the SMF data using the RMF post processor (ERBRMFPP), you get workload reports:

//POST EXEC PGM=ERBRMFPP 
//MFPINPUT DD DISP=SHR,DSN=smfinput.dataset  
//SYSIN DD * 
SYSRPTS(WLMGL(SCPER,RCLASS,RCPER,SCLASS)) 
/* 

For the child address space report class RMF reported

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF
 AVG        0.05  ACTUAL             6.760380
 MPL        0.05  EXECUTION          6.254239
 ENDED         1  QUEUED               506141
 ...
 ----SERVICE----   SERVICE TIME  ---APPL %---
 IOC        2152   CPU    2.348  CP      2.20
 CPU        2012   SRB    0.085  IIPCP   0.00
 MSO         502   RCT    0.004  IIP     0.00
 SRB          73   IIT    0.197  AAPCP   0.00
 TOT        4739   HST    0.002  AAP      N/A

There was one occurrence of the child job; it ran for 6.76 seconds, and used a total of 2.636 seconds of CPU (adding up the service times: 2.348 + 0.085 + 0.004 + 0.197 + 0.002 = 2.636).

For a more typical job, using many short-duration independent enclaves, the report looked like:

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF 
 AVG        0.11  ACTUAL                13395 
 MPL        0.11  EXECUTION             13395 
 ENDED      1000  QUEUED                    0 
 END/S      8.33  R/S AFFIN                 0 
 SWAPS         0  INELIGIBLE                0
 EXCTD         0  CONVERSION                0 
                  STD DEV                1325 
 ----SERVICE----   SERVICE TIME  
 IOC           0   CPU    1.448   
 CPU        1241   SRB    0.000   
 MSO           0   RCT    0.000  
 SRB           0   IIT    0.000  
 TOT        1241   HST    0.000  

This shows 1000 transactions ended in the period, and the average transaction response time was 13.395 milliseconds. The total CPU time used was 1.448 seconds, an average of 1.448 milliseconds of CPU per transaction.

For the service class with a response time definition, you get a response time profile. The data below shows that most response times were between 11 and 15 ms. The service class was defined with "Average response time of 00:00:00.010". This drives the range of response times reported. If this data were for a production system you might want to adjust the "Average response time" to 00:00:00.015 to get the peak in the middle of the range.

-----TIME--- -% TRANSACTIONS-    0  10   20   30   40   50 
    HH.MM.SS.FFF CUM TOTAL BUCKET|...|....|....|....|....|
 <= 00.00.00.005       0.0  0.0   >                           
 <= 00.00.00.006       0.0  0.0  >                           
 <= 00.00.00.007       0.0  0.0  >                           
 <= 00.00.00.008       0.0  0.0  >                           
 <= 00.00.00.009       0.0  0.0  >                           
 <= 00.00.00.010       0.0  0.0  >                           
 <= 00.00.00.011       0.3  0.3  >                           
 <= 00.00.00.012      10.6 10.3  >>>>>>                      
 <= 00.00.00.013      24.9 14.3  >>>>>>>>                    
 <= 00.00.00.014      52.3 27.4  >>>>>>>>>>>>>>              
 <= 00.00.00.015      96.7 44.4  >>>>>>>>>>>>>>>>>>>>>>>     
 <= 00.00.00.020     100.0  3.2  >>                          
 <= 00.00.00.040     100.0  0.1  >                           
    00.00.00.040     100.0  0.0  >                                                                                

Take care.

The field ENDED is the number of transactions that ended in the interval. If you have a measurement that spans an interval, you will get CPU usage in both intervals, but "ENDED" is incremented only in the interval where the transaction ends. With a large number of transactions this effect is small; with few transactions it could cause divide-by-zero exceptions!
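If you post-process these numbers yourself, guard the division. A trivial sketch, with hypothetical variable names for the values taken from one SMF 72 interval:

// ended and totalCpuSeconds are hypothetical values from one SMF 72 interval.
double avgCpuPerTran = 0.0;
if (ended > 0)                      // avoid dividing by zero
  avgCpuPerTran = totalCpuSeconds / ended;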

No, No, think before you create a naming convention

I remember doing a review of a large customer who had grown by mergers and acquisitions.  We were discussing naming conventions, and whether they had any.

"Naming conventions," he said, "we love them.  We have hundreds of them around the place."  He said it was too hard and disruptive to try to get down to a small number of naming conventions.

I saw someone's MQ configuration and wished they had thought through their naming convention, or asked someone with more experience.  This is what I saw:

  • The MQ libraries were called CSQ910.SCSQAUTH
    • This is OK as it tells you what level of MQ you are using
    • It would be good to have a dataset alias of CSQ pointing to CSQ910 (see the IDCAMS sketch after this list).  Without this you have to change the JCL for all jobs, compiles, runs, etc. which use CSQ910.  When you move from CSQ810 to CSQ910 you have to change the JCL, and if you then decide to go back to CSQ810 for a week, you have to change the JCL again.  With the alias it is easy: change the alias and the JCL does not need to change.
  • The MQ logs were called CSQ710.QM.LOGCOPY1.DS01, … DS02,…DS03
    • This shows the classic problem of having the queue manager release as part of the object names.  It would have been better to have names like CSQ.QM.LOGCOPY1.DS01 without the MQ version in it.
    • The name does include a queue manager name of sorts, but a queue manager name of QM is not very good.  If you need another queue manager you will have names like QM, QMA, QMB – an inconsistent set of names.
    • It is good to have the queue manager name as part of the data set name, so if the queue manager was QM01 then have CSQ.QM01.
  • The page sets were CSQ710.QM.PAGESET0, CSQ710.QM.PAGESET1,  CSQ710.QM.PAGESET2,  CSQ710.QM.PAGESET3,  CSQ810.QM.PAGESET4, CSQ910.QM.PAGESET5
    • This shows the naming standard problem as it evolved over time.  They added more page sets, and used the then-current MQ release as the high level qualifier.  The page sets are CSQ710…, CSQ810…, CSQ910… – each following the naming standard of the day.
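As an illustration of the alias point above, an IDCAMS DEFINE ALIAS statement along these lines (a sketch, using the data set names from the example; you would repeat it for each library) lets CSQ.SCSQAUTH resolve to the current release's library:

DEFINE ALIAS (NAME(CSQ.SCSQAUTH) RELATE(CSQ910.SCSQAUTH))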

You do not invent a naming convention in isolation; you need to put an architect's hat on and see the bigger picture, where you have production and test queue managers and different versions of MQ, and where MQ is just a small part of the z/OS infrastructure.

  • People often have one queue manager per LPAR, and name the queue manager after the LPAR.
  • You are likely to have multiple machines – for example, to provide availability – so plan for multiple queue managers.
  • You may want different HLQs to be able to distinguish production queue manager data sets from test queue manager data sets.
  • The security team will need to set up profiles for queue managers.  Having MQPROD and MQTEST as HLQs may make this easier to set up.
  • The storage team (what I used to call data managers) set up SMS with rules for data set placement.  For example, production page sets with names like MQPROD.**.PSID* go on the newest, fastest, mirrored disks; MQTEST.** goes on older disks.
  • As part of the SMS definitions, the storage team define how often, and when, to back up data sets.  A production page set may be backed up using flash copy once an hour.  (This is done within the storage subsystem and takes seconds; it copies a data set by copying the pointers to the records on disk.)  Non-production data sets get backed up overnight.

 

Lessons learned

  • For the IBM-provided libraries, include the version/release/modification (VRM) in the data set names.
  • Define an alias pointing to the current libraries so applications do not need to change their JCL.  You could have a Unix System Services symbolic link for the files in the zFS.
  • Do not put the MQ release in the queue manager data sets names.
  • Use queue manager names that are relevant and scale.
  • Talk to your security and storage managers about the naming conventions; what you want protected, and how you want your queue manager data sets to be managed.