Understanding the z/OS Connect SMF 123 subtype 2 record

Introduction to the z/OS Connect SMF records

z/OS Connect can provide two types of SMF record:

  1. SMF 120 subtype 11, provided by the base Liberty support. This gives information on the URL used to access Liberty, and the CPU used to perform requests. This is enabled at the server level – so you can have records for all requests, or no requests. There is one SMF record for each web server request.
  2. SMF 123 provides information about the API and the service used, and about the “pass through” services. It provides the elapsed time of the request, and of the “pass through” requests. It does not provide CPU usage figures. This can be configured to produce records depending on the HTTP host and port used to access z/OS Connect. One SMF record can have data for multiple web server requests. The SMF records are produced when the SMF record is full – or when the server is shut down.

The SMF 120-11 and SMF 123 records are produced independently, and there is no correlating field between them. They both have a URI field, and time stamps, so at low volumes it may be possible to correlate the SMF data.

I’ll document the fields which I think are interesting. If you think other fields are useful please let me know and I’ll update this document.

I have written an SMF formatter in C which prints out interesting data, and summarises it.

SMF 123 subtype 2 fields

  • You get the standard date and time the record was produced, and the LPAR it was produced on. You can use PGM=IFASMFDP with the following to filter which records are copied:
DATE(2020282,2020282)
START(1000)
END(2359)
  • There are the server version (3), system (SOW1), sysplex (ADCDPLEX), and job ID (STC04774), which are not very interesting.
  • Server job name (SM3) is more interesting. I started the server with s baqstrt,parms='MQTEST',jobname=sm3
  • The config dir (/var/zosconnect/servers/MQTEST/) is boring – as is server level (open beta)
  • The HTTP code, for example 200 OK, 403 Forbidden. You may want to report requests with errors in a separate file
    1. So you know you have errors and can fix them
    2. Your statistics, such as average response time do not have dirty data in them.
  • A 1-character HTTP flag – this has always been 00 for me. I cannot find it documented.
  • The client IP address, for example 10.1.1.1.
  • You get userid information.
    • I used a certificate to authenticate. The DN from the certificate is not available; you only get the userid from the RACF mapping of DN to userid. This mapped userid was in the 64 byte field, and the 8 byte userid field was empty for me. The lack of the certificate DN, and having the userid in the wrong field, feel like a couple of buglets. If you use LDAP, I think the long ID is stored in the long field, and the z/OS userid in the short field – so the two cases are inconsistent.
  • You get the URL (URI) used, /stockmanager/stock/items/999999. I treat this as a main key for processing and summarising data. You may need to process this value, as the part number (999999) will be different for each query. You may want to have standards which say the first two elements (/stockmanager/stock) are useful for reporting, and the remaining elements should be ignored when summarising records (see the sketch after this list).
  • The start and stop times (2020/10/08 09:18:19.852360 and 2020/10/08 09:18:22.073946) are interesting. You can calculate the duration – which is the difference between them.
  • Request type: API, Service, or Admin. An Admin request is, for example, using a URL like /zosConnect/services/stockQuery to get information about the stockQuery service.
  • The API name and version – stockmanager 1.0.0
  • The service name and version – stockQuery 1.0.0. You get the version information here; if you do an online display of the service, the version is not available.
  • The HTTP method: GET, POST, etc.
  • The service provider. This is the code which does the real work of connecting to CICS or MQ, or of passing the request through – for example IBM MQ for z/OS, or restclient-1.0.
  • Request id starts at 1 and increments for the life of the server. If you restart the server it will restart from 1. I do not think this is very useful.
  • For “pass through” requests, z/OS Connect confusingly calls the back-end service the System Of Record (SOR). (MQ is a transport, not a system of record.) The “pass through” service definition is built from a parameter file and the zconbt program. The reported data is:
    • SOR ID: the host and port, 10.1.3.10:19443. These come from the host and port values of the <zosconnect_zosConnectServiceRestClientConnection…> definition.
    • SOR Reference: restClientServiceY. This is from the connectionRef= in the parameter file and the <zosconnect_zosConnectServiceRestClientConnection…> definition.
    • SOR Resource: zosConnect/apis/through. This is from the uri= in the parameter file.
    • Time of entry and time of return of the SOR service.
    • From these times you can calculate the difference to get the duration of the remote request.
  • It would be useful to have this “pass through time” for services calling MQ, CICS etc, so we could get a true picture of the time spent processing the requests.
  • The size of the data payload (0), and the size of the response (94), excluding any headers.
  • A tracking token. This 64 byte hex string is passed to some of the called servers – to some back ends (CICS and “pass through”) – so that the request can be correlated across systems. It is not passed to MQ. See X-Correlation-ID below for an example. This field is nulls for Admin requests. When a request was “passed through” to another z/OS Connect server which processed the request, the tracking token was not reported in the SMF data of the second system. I don't know if the CICS SMF data records this token, but it is of little use for MQ, or for “pass through”.
  • You get 4 request header fields and 4 response header fields. They were blank in my measurements, even though headers were passed to the “pass through” service. Looking at the HTTP traffic, the request coming in had a header “Content-Type: application/json”. The request passed through to the back end included:
    • User-Agent: IBM_zOS_Connect_REST_SP/open beta/20200803-0918
    • X-Correlation-ID: BAQ1wsHYAQAYwcTDxNfTQEDi8ObxQEBAQNikjpk+1klAAA==
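
As an illustration of deriving the reporting key mentioned above, here is a minimal C sketch (my own illustration, not part of any product) which keeps just the first two elements of the URI. The depth of 2 is an assumption – each API may need its own rule.

#include <stdio.h>

/* Reduce a URI such as /stockmanager/stock/items/999999 to a reporting
   key made of its first two path elements, e.g. /stockmanager/stock.   */
static void uri_key(const char *uri, char *key, size_t keylen, int depth)
{
    size_t out = 0;
    int slashes = 0;
    for (const char *p = uri; *p != '\0' && out + 1 < keylen; p++) {
        if (*p == '/' && ++slashes > depth)  /* stop at element depth+1 */
            break;
        key[out++] = *p;
    }
    key[out] = '\0';                         /* always nul terminated   */
}

int main(void)
{
    char key[64];
    uri_key("/stockmanager/stock/items/999999", key, sizeof(key), 2);
    printf("%s\n", key);                     /* prints /stockmanager/stock */
    return 0;
}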

What can you do with the data?

Do I need to use the SMF data?

From a performance perspective these records do not provide much information, as they lack information about CPU usage. From an audit perspective they have some useful information – but they are missing fields which would make them really useful for audit. There is an overlap between the information in the SMF 123 records and the …/servers/…logs/http_access.log file, which provides date, time, userid, URI, and HTTP code.

What do I want to report on?

Decide what elements of the URI you want to report on. For example the URI /stockmanager/stock/items/999999 includes the stock part number, which may be different for each request. You might decide to summarise API usage on just the first two elements /stockmanager/stock/. You may have to treat each API individually to extract the key information.

I’ll use the term key for the interesting part of the URI – for example /stockmanager/stock.

What reports are interesting?

I think typical questions are:

  1. Which is the most popular API key?
  2. Is the usage of an API key increasing?
  3. How many API key requests were unsuccessful? This can show set-up problems, or penetration attempts.
  4. What is the response time profile of the requests? Are you meeting the business response time criteria?
  5. Which sites are sending in most of the requests? You cannot charge back on CPU used, as you do not know the CPU usage. You could do charge back at a fixed cost per API request, with each API having a different rate.
  6. Which userids are sending in most of the requests? You may want to provide more granular certificate-to-userid mapping to give you more detailed information.
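
As a sketch of how some of these questions could be answered, here is a small C fragment (my own illustration, not the formatter mentioned above) which accumulates per-key counts, error counts, and the average response time of the successful requests – keeping the failed requests out of the average, as discussed above. The table size and the field names are assumptions.

#include <stdio.h>
#include <string.h>

#define MAXKEYS 100

static struct keystat {
    char   key[64];        /* e.g. /stockmanager/stock          */
    long   requests;       /* total requests for this key       */
    long   errors;         /* requests with HTTP code >= 400    */
    double good_seconds;   /* sum of durations of good requests */
} stats[MAXKEYS];
static int nkeys = 0;

/* Call this once per SMF 123 request, with the key, HTTP code, and
   duration (stop time minus start time, in seconds).                */
void record(const char *key, int http_code, double duration)
{
    int i;
    for (i = 0; i < nkeys; i++)
        if (strcmp(stats[i].key, key) == 0)
            break;
    if (i == nkeys) {                  /* first time this key is seen */
        if (nkeys == MAXKEYS) return;  /* table full - ignore         */
        snprintf(stats[i].key, sizeof(stats[i].key), "%s", key);
        nkeys++;
    }
    stats[i].requests++;
    if (http_code >= 400)
        stats[i].errors++;             /* keep dirty data out of the average */
    else
        stats[i].good_seconds += duration;
}

void report(void)
{
    for (int i = 0; i < nkeys; i++) {
        long good = stats[i].requests - stats[i].errors;
        printf("%-40s requests %6ld errors %6ld average %f\n",
               stats[i].key, stats[i].requests, stats[i].errors,
               good > 0 ? stats[i].good_seconds / good : 0.0);
    }
}

int main(void)   /* a trivial demonstration using the figures above */
{
    record("/stockmanager/stock", 200, 2.221586);
    record("/stockmanager/stock", 403, 0.010000);
    report();
    return 0;
}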

Understanding z/OS Connect SMF 120 subtype 11 data

z/OS Connect can provide two types of SMF record, as described at the top of this post: SMF 120 subtype 11, provided by the base Liberty support, and SMF 123, provided by z/OS Connect itself. Would I use the SMF 120-11 record to report CPU used? No – see the bottom of this post.

SMF 120-11

  • You get the standard date and time the record was produced, and the LPAR it was produced on. You can use PGM=IFASMFDP with the following to filter which records are copied:
DATE(2020282,2020282)
START(1000)
END(2359)
  • There are the server version (3), system (SOW1), and job ID (STC04774), which are not very interesting.
  • Server job name (SM3) is more interesting. I started the server with s baqstrt,parms='MQTEST',jobname=sm3
  • The config dir (/var/zosconnect/servers/MQTEST/) is boring – as is code level (20.0.0.6)
  • The start and stop times (2020/10/08 09:18:19.852360 and 2020/10/08 09:18:22.073946) are interesting, as is the duration – which is the difference between them.
  • You get userid information.
    • I used a certificate to authenticate. The DN from the certificate is not available; you only get the userid from the RACF mapping of DN to userid. This mapped userid was in the 64 byte field, and the 8 byte userid field was empty for me. The lack of the certificate DN, and having the userid in the wrong field, feel like a couple of buglets.
  • You get the URL used, /stockmanager/stock/items/999999. I treat this as a main key for processing and summarising data. If you want to summarise the data, you may want to summarise it just on /stockmanager/stock/. The full URI contains the part number – and so I would expect a large number of parts.
  • You can classify your requests with WLM. For example:
<wlmClassification>
<httpClassification transactionClass="TCI1" method="GET" 
    resource="/zosConnect/services/stockQuery"/>
</wlmClassification>

This produced the following in the SMF record:

WLMTRan :TCI1
WLM Classify type :URI :/zosConnect/services/stockQuery
WLM Classify type :Target Host :10.1.3.10
WLM Classify type :Target Port :19443

This means that the URL, the host, and the port were passed to WLM to classify.

If you get the WLM classification you also get CPU figures when the enclave request ended (was deleted).

  • You get the ports associated with the request.
    • Which port was used on the server – Target Port :9443
    • Where did the request come from? Origin :10.1.1.1 and port :36786
  • The number of bytes in the response Response bytes :791
  • CPU figures for the CPU used on the TCB. See the discussion below on the usefulness of this number. You get the CPU figures before the request, and after the request – so you have to calculate the difference yourself! It does not look like the developers knew what the end user wanted. The values come from the timeused facility. You can calculate the delta (see the sketch after this list) and get:
    • CPU Used Total : 0.967417
    • CPU Used on CP : 0.026327
    • and calculate the difference between these to get CPU Delta on Z**P : 0.941090. This is the CPU offloaded to zIIP or zAAP.
  • If you had the URI classified with WLM, you get Enclave data, see below for a discussion on what the numbers mean.
    • Enclave CPU time : 0.148803
    • Enclave CPU service : 0.000000
    • Enclave ZIIP time : 0.148803
    • Enclave ZIIP Service : 0.000000
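
As a sketch of the delta calculation (the structure and field names here are my own illustration, not the real SMF 120-11 mapping names):

#include <stdio.h>

/* Before and after CPU values, in seconds, taken from one record. */
typedef struct {
    double total_before, total_after;  /* timeused, all engine types */
    double cp_before, cp_after;        /* timeused, CP engines only  */
} Cpu12011;

static void print_cpu_deltas(const Cpu12011 *r)
{
    double total = r->total_after - r->total_before;
    double cp    = r->cp_after    - r->cp_before;
    printf("CPU Used Total    : %f\n", total);
    printf("CPU Used on CP    : %f\n", cp);
    printf("CPU Delta on Z**P : %f\n", total - cp); /* offloaded to zIIP/zAAP */
}

int main(void)
{
    Cpu12011 r = { 0.0, 0.967417, 0.0, 0.026327 };
    print_cpu_deltas(&r);   /* reproduces the figures above */
    return 0;
}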

What do the CPU numbers mean?

Typically a transaction flow is as follows

  1. A listening thread listens on the HTTP(s) host and port.
  2. When a request arrives, it passes the request to a worker thread, and goes back to listening
    1. The worker thread may do some work and send the response back
    2. The worker thread may need to call another thread to do some work, for example to issue an MQ request:
      1. The MQ code looks for a thread in a pool with a matching queue manager and userid. If it finds one, it uses that thread and issues the MQ request.
      2. If it does not find a matching thread, it may allocate a new thread and issue an MQCONN to connect to MQ. These are both expensive operations, which is why having a pool of threads, each with a queue manager and userid, is a good way of saving CPU.
      3. The work is done
      4. The thread is put back into the MQ pool, and the application returns to the worker thread
      5. The worker thread sends the response back to the originator

A thread can ask the operating system how much CPU time it (the thread) has used. What usually happens is:

  1. the thread requests how much CPU it has used
  2. the thread does some work
  3. the thread requests how much CPU it has used,
  4. the thread calculates the difference between the two CPU values and reports this delta.
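
As a rough analogue of this pattern in C – using the POSIX getrusage service purely for illustration rather than the z/OS timeused facility, and reporting user-mode CPU for the whole process rather than one thread:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage before, after;
    double delta;

    getrusage(RUSAGE_SELF, &before);   /* 1. how much CPU used so far? */

    /* 2. ... do the work ... */

    getrusage(RUSAGE_SELF, &after);    /* 3. how much CPU used now?    */

    /* 4. report the difference between the two values */
    delta = (after.ru_utime.tv_sec  - before.ru_utime.tv_sec)
          + (after.ru_utime.tv_usec - before.ru_utime.tv_usec) / 1e6;
    printf("CPU used by the work: %f seconds\n", delta);
    return 0;
}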

I think the SMF 120 record records the CPU from just the worker thread – and no other thread.

Enclaves

When there is more than one thread involved it gets more complex, as you could have a CICS transaction issuing an MQ request, then a DB2 request, and then an IMS request. You can set up z/OS Workload Manager (WLM) to say “these CICS transactions in this CICS region are high priority”.

With some subsystems you can pass a WLM token into a request. The thread being invoked can tell WLM that it is now working on behalf of this token. The thread does some work, and tells WLM when it has finished doing the work. WLM can manage the priority of the threads to achieve the best throughput, for example making a thread high or low priority. WLM can manage a thread doing work in multiple LPARs across a sysplex!

WLM records the CPU used by the threads while performing the work, and accumulates and reports this.

This use of multiple threads for a business transaction across one or more address spaces is known as an enclave.

What happens with enclaves?

  1. A request arrives at the listener thread.
  2. Liberty looks up the URI in the <wlmClassification httpClassification…> definitions. It compares the server's host, the server's port, the URI resource (/stockmanager…), and the method (GET), and finds the best match for the transactionClass.
    1. If there is a transactionClass,
      1. the server calls WLM with the Subsystem type of CB, the specified collectionName, and the transactionClass.
      2. WLM looks up these parameters, and if it has a matching definition then it will manage the priority of the work.
      3. WLM returns a WLM token.
      4. This WLM token is passed to threads which are set up for enclaves.
    2. If there is no transactionClass specified in Liberty, or WLM does not have a matching subsystem, collectionName, and transactionClass, then there is no WLM token (or a null one).
    3. The work continues as before.
    4. If another thread is used, the WLM token is passed to it. If the code is set up for WLM tokens, it reports “work started” to WLM, and when it has finished it reports “work ended”.

What happens if the request is not known to WLM?

The worker thread calculates the CPU used for just its work, and reports this. The CPU used by any other thread is not reported. The figures reported are the CPUTotal timeused values; you have to calculate the difference yourself.

What happens if the request is known to WLM?

You get the timeused CPU for the worker thread – as with the case where the request is not known to WLM.

From RMF (or other products) you get reports for an interval with:

  1. The number of requests in the interval
  2. The rate of requests in the interval
  3. The amount of time on a CP engine in seconds
  4. The amount of time on a ZIIP engine in seconds
  5. The amount of time on a ZAAP in seconds.
  6. Over the interval, what percentage of time was CP on CP engines, zAAP on zAAP engines, zAAP on CP engines, zIIP on zIIP engines.

From the SMF 120 records you get

Enclave CPU time
Enclave ZAAP time
Enclave ZIIP time

Example Enclave figures.

For 100 API requests, the figures below are as reported by SMF 120-11; I averaged the values.

  1. Average CPU(1) 0.023
  2. Average CPU(2) 0.0008
  3. Enclave CPU 0.029
  4. Enclave ZAAP 0
  5. Enclave ZIIP 0.028

The figures reported by RMF per request

  1. CPU 0.031
  2. ZIIP 0.039
  3. ZAAP 0.000
  4. Total 0.070 seconds of CPU per transaction

These figures tie up – the Enclave CPU, ZIIP, and ZAAP are similar.

The CPU used by the server address space was

  1. CPU 30.1 seconds
  2. ZIIP 28.7 seconds
  3. ZAAP 0 seconds.
  4. Total 58.8 seconds.

Each request took 0.070 seconds, and there were 100 requests – so 7 seconds of CPU were reported.

The difference (51 seconds) is not reported in the transaction costs. It looks like the “timeused” value is less than 1% of the CPU value, and the enclave figures are under 2% of the grand total.

Looking at the trace in a dump, I can see many hot TCBs using much more CPU than is reported by WLM and RMF. I expect that many TCBs are used in a request, but they do not have the enclave support in them. Overall – pretty useless for charge back and understanding the cost per transaction.

What’s the difference between MQ Web, and z/OS Connect MQ support?

With MQ Web

  1. You can issue commands to administer MQ, for example to display, define, and delete MQ objects.
  2. You can put and get messages to and from a queue.  The message is what you specify – typically a character string.

With z/OS Connect MQ support

  1. You can put and get messages to and from a queue, and do transformations on the message.  For example mapping a COBOL structure to JSON.  
  2. You can do field validation.
  3. You can convert HTTP code “200” to “great, it worked”.

What is common?

They both use z/OS WebSphere Liberty to provide the basic web server.

Is “how many fruit do I get per kilogram” a better metric than MIPS?

I was reading an article about the cost of a transaction and how many MIPS (Millions of Instructions Per Second) it used.  This annoyed me as it is wrong at so many levels.

  1. MIPS depends on how long the transaction ran for. Millions of Instructions Used (MINU?) would be a better metric.
  2. Instructions are different sizes, and the CPU used by an instruction varies depending on many things.
  3. If I was to talk about “how many fruit I can get per kilogram”, you would rightly say that apples weigh more than grapes. You get different sorts of apples of different sizes, so you might get 12 Cox's apples per kilogram, but 10 Granny Smiths. I'm sure you are thinking that number of fruit per kilogram is a stupid measurement. I agree – but this is the same discussion as MIPS.
  1. Obvious statement : The duration of an instruction depends on the instruction.   Instructions that access memory are often slower than instructions that just access registers.
  2. Not so obvious statement.    The same instruction (for example load register) can have different durations
  3. Not at all obvious:  The same physical instruction – for example in a loop – can have different durations
  4. Not at all obvious: A register to register instruction can be slower than a memory to register instruction.

Let's take the covers off and have a look underneath.

  1. The z machines have the processors in “books”; in reality these are big cards.
  2. There can be multiple chips per book.
  3. There is memory in each book
  4. The length of the wire between two books might be 1 metre.
  5. Inside each chip are the processors.
  6. Each processor has cache for instructions, and cache for data.  This memory is within a few millimetres of the processors
  7. There is cache on the chip, independent of the processors; it is within 1 cm of the processors.

Just a few more observations

  1. The Load instruction is not really a single instruction – it is actually a microprogram using the underlying chip technology.
  2. Some instructions were written in millicode – and then a release or so later were written in microcode.   The MVCL instruction is a good example of this.  It can move Megabytes of data in one instruction.

Consider the instruction L R4,20(R3). This says: load register 4 with the contents of the word 20 bytes past where register 3 points. I'll look at how long this instruction takes to execute.

Put this load instruction in a little loop and go round the loop twice.

JUMPHERE SR   R1,R1      clear register 1
LOOP     CFI  R1,2       has register 1 reached 2?
         BE   PAST       if so, go to PAST
         L    R4,20(R3)  load register 4 from storage
         AFI  R1,1       add 1 to register 1
         B    LOOP       go round the loop again
PAST     DS   0H

What are the differences in the duration of the L R4,20(R3) instruction, first time and second time around the loop?

  1. If the application issued a goto/jump to JUMPHERE, this block of code may not be in the processor's instruction cache. It may be in the chip's cache. If so – there will be a delay while the block of instructions is loaded into the processor's cache.
  2. When the Load instruction is executed – both times it is in the processor’s cache, so there is no time difference.
  3. The first time, the instruction has to be parsed into “Load”, “register 4”, “20 bytes from what register 3 points to”. The second time, it can reuse this information – so it will be quicker.
  4. “Go and access the storage from what register 3 points to”. You have to convert the address in register 3 to a real page. There are lots of calculations to translate a virtual address to its “real address”. For example, each address space has an address 2000; the processor needs to use the look-up tables for that address space to map 2000 to the real page. Once it has calculated the “real” page, it keeps it in the Translation Lookaside Buffer (TLB) (LPAR#.ASID#.2000 -> 39945000). The second time, this information is already in the TLB.
  5. The “real” storage may be in the same chip as you are executing on, or it may be on a different chip. It could be on a chip on a different book. There is about 1 metre of cabling between where the real storage is and the instruction requesting it. The speed of light determines how long it will take to access the data – whether it is 0.000001 of a metre away or 1 metre away. The second time the load instruction is executed, the data is already in the processor's cache.

In this contrived example I would expect the second load to be much faster than the first load – perhaps 1000 times faster.  Apples and grapes…
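
If you want to see this effect without writing assembler, here is a small C experiment (an illustration only; the buffer size is an arbitrary choice and the results depend entirely on the machine). The first pass over the buffer has to fault the pages in and load the caches and TLB; the second pass finds everything warm.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (4 * 1024 * 1024)  /* 4 MB buffer - an arbitrary choice */

/* Time one pass over the buffer, touching one byte per 64-byte cache line. */
static double one_pass(volatile char *buf)
{
    struct timespec t0, t1;
    long sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i += 64)
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    volatile char *buf = malloc(N);
    if (buf == NULL) return 1;
    printf("first pass  (cold) %f seconds\n", one_pass(buf));
    printf("second pass (warm) %f seconds\n", one_pass(buf));
    free((void *)buf);
    return 0;
}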

The z hardware can report where time is being spent, and allows IBM to tune the programs to reduce delays.

There was one z/OS product which had a pointer to the next free buffer in the trace table. The code would get the buffer and update the pointer to the next free buffer. This worked fine when there was only one thread running. On a big box with 16 engines, every processor was trying to grab and update this pointer. Sometimes it had to grab it from a different book; sometimes it had to grab it from another processor in the same chip. From the hardware instrumentation it was obvious that a lot of time was spent waiting for this “latch”. Giving each processor its own trace table totally removed this contention, and the trace buffers were close to the processors – so a win-win situation.

I said that a register-to-register instruction could be slower than a memory-to-register one – how?

A processor can process multiple instructions in parallel. While it is waiting for some storage to arrive from 1 metre away, it can be decoding the next 10 instructions. There is what I like to think of as a “commit” stage in all processing, and the commit processing has to be done in strict order.

So if we had

         L    R1,20(R3)
         SR   R4,R4
         SR   R6,R6

The two Subtract Registers may be very quick and almost completed before the Load instruction has finished. They cannot “commit” until the Load has committed.   Picture trying to debug this if the instructions had been executed out of order!

Now consider

         L    R1,20(R3)
         LTR  R1,R1       test if register 1 is zero
         LH   R6,100(R7)

The Load and Test Register (LTR) is using register 1 – which is being used by the Load instruction. The LTR has to wait until the previous Load has “committed” before it can access the register. In effect the LTR has to wait until the Load's data is retrieved from 1 metre away – so it is very slow.

The data for the Load Halfword (LH R6,100(R7)) may be in the data cache – and so be very fast. We therefore have the situation that the register-to-register instruction LTR takes a “long” time, and the memory-to-register Load Halfword is blazingly fast.

Things ain’t what they used to be.

I remember when IBM changed from bipolar processors to CMOS. We went from a machine with one engine delivering 50 MIPS to a machine with 5 processors each delivering 11 MIPS. The bean counters said “be happy – you have more MIPS”. The transactions were now about 5 times slower, but you could now have 5 transactions running in parallel, and so overall throughput was very similar. I wonder if these bean counters are the ones who think that if the average family has 2.5 children, there must be a lot of half-brothers and half-sisters.

For those people who have to compare one box with another, or want to know how much CPU is needed for a transaction: use CPU seconds. There are tables giving benchmarks of the same workload on different machines, so you can make comparisons between the machines.