Understanding z/OS Connect SMF 123 subtype 2 records

Introduction to the z/OS Connect SMF records

z/OS Connect can provide two types of SMF record:

  1. SMF 120 subtype 11, provided by the base Liberty support. This gives information on the URL used to access Liberty, and the CPU used to process requests. It is enabled at the server level – so you get records for all requests, or no requests. There is one SMF record for each web server request.
  2. SMF 123, which provides information about the API and service used, and about the “pass through” services. It provides the elapsed time of the request, and of the “pass through” requests, but does not provide CPU usage figures. It can be configured to produce records depending on the HTTP host and port used to access z/OS Connect. One SMF record can hold data for multiple web server requests; the records are produced when the SMF record is full, or when the server is shut down.

The SMF 120-11 and SMF 123 records are produced independently, and there is no correlating field between them. They both have a URI field, and time stamps, so at low volumes it may be possible to correlate the SMF data.

I’ll document the fields which I think are interesting. If you think other fields are useful please let me know and I’ll update this document.

I have written an SMF formatter in C which prints out interesting data, and summarises it.

SMF 123 subtype 2 fields

  • You get the standard date and time the record was produced, along with the LPAR name. You can use PGM=IFASMFDP to filter which records are copied.
  • There is the server version (3), system (SOW1), sysplex (ADCDPLEX) and job id (STC04774), which are not very interesting.
  • The server job name (SM3) is more interesting. I started the server with s baqstrt,parms='MQTEST',jobname=sm3
  • The config dir (/var/zosconnect/servers/MQTEST/) is boring – as is the server level (open beta).
  • The HTTP status code, for example 200 OK or 403 Forbidden. You may want to report requests with errors in a separate file, so that:
    1. you know you have errors and can fix them
    2. your statistics, such as average response time, do not have dirty data in them.
  • A 1-character HTTP flag – this has always been 00 for me; I cannot find it documented.
  • The Client IP address.
  • You get userid information.
    • I used a certificate to authenticate. The DN from the certificate is not available; you only get the userid from the RACF mapping of DN to userid. This mapped userid was in the 64-byte field, and the 8-byte userid field was empty for me. The lack of the certificate DN, and the userid being in the wrong field, feel like a couple of buglets. If you use LDAP, I think the long ID is stored in the long field and the z/OS userid in the short field – so it is inconsistent.
  • You get the URL (URI) used, for example /stockmanager/stock/items/999999. I treat this as the main key for processing and summarising data. You may need to process this value, as the part number (999999) will be different for each query. You may want to have standards which say the first two elements (/stockmanager/stock) are useful for reporting, and the remaining elements should be ignored when summarising records.
  • The start and stop times (2020/10/08 09:18:19.852360 and 2020/10/08 09:18:22.073946) are interesting. You can calculate the duration – which is the difference between them.
  • The request type: API, Service or Admin. An Admin request is one like using the URL /zosConnect/services/stockQuery to get information about the stockQuery service.
  • The API name and version – stockmanager 1.0.0
  • The service name and version – stockQuery 1.0.0. Note that you get the version information here; if you do an online display of the service, the version is not available.
  • The HTTP method – GET, POST etc.
  • The service provider. This is the code which does the real work of connecting to CICS or MQ, or of passing the request through – for example IBM MQ for z/OS or restclient-1.0.
  • The request id starts at 1 and increments for the life of the server. If you restart the server, it restarts from 1. I do not think this is very useful.
  • For “pass through” requests, z/OS Connect confusingly calls the back-end service the System of Record (SOR). (MQ is a transport, not a System of Record.) The “pass through” service definition is built from a parameter file by the zconbt program. The reported data is:
    • SOR ID – the host and port. These come from the host and port values of the <zosconnect_zosConnectServiceRestClientConnection…> definition.
    • SOR Reference – restClientServiceY. This comes from the connectionRef= in the parameter file and the <zosconnect_zosConnectServiceRestClientConnection…> definition.
    • SOR Resource – zosConnect/apis/through. This comes from the uri= in the parameter file.
    • The time of entry to, and time of return from, the SOR service. From these times you can calculate the duration of the remote request.
  • It would be useful to have this “pass through time” for services calling MQ, CICS etc, so we could get a true picture of the time spent processing the requests.
  • The size of the request payload (0) and the size of the response (94), excluding any headers.
  • A tracking token. This 64-byte hex string is passed to some back ends (CICS and “pass through”) so that the request can be correlated across systems. It is not passed to MQ. See X-Correlation-ID below for an example. The field is nulls for Admin requests. When a request was “passed through” to another z/OS Connect server which processed the request, the tracking token was not reported in the SMF data of the second system. I don't know if the CICS SMF data records this token, but it is of little use for MQ or for “pass through”.
  • You get four request-header and four response-header fields. They were blank in my measurements, even though headers were passed to the pass-through service. Looking at the HTTP traffic, the incoming request had a header “Content-Type:application/json”. The request passed through to the back end included:
    • User-Agent: IBM_zOS_Connect_REST_SP/open beta/20200803-0918
    • X-Correlation-ID: BAQ1wsHYAQAYwcTDxNfTQEDi8ObxQEBAQNikjpk+1klAAA==

What can you do with the data?

Do I need to use the SMF data?

From a performance perspective these records do not provide much information, as they lack CPU usage figures. From an audit perspective they have some useful information, but they are also missing fields which would make them truly useful for auditing. There is an overlap between the information in the SMF 123 records and the …/servers/…logs/http_access.log file, which provides date, time, userid, URI and HTTP code.

What do I want to report on?

Decide which elements of the URI you want to report on. For example, the URI /stockmanager/stock/items/999999 includes the stock part number, which may be different for each request. You might decide to summarise API usage on just the first two elements, /stockmanager/stock. You may have to treat each API individually to extract the key information.

I’ll use the term key for the interesting part of the URI – for example /stockmanager/stock.

What reports are interesting?

I think typical questions are:

  1. Which is the most popular API key?
  2. Is the usage of an API key increasing?
  3. How many API key requests were unsuccessful? This can show set-up problems, or penetration attempts.
  4. What is the response time profile of the requests? Are you meeting the business response time criteria?
  5. Which sites are sending in most of the requests? You cannot charge back on CPU used, as you do not know the CPU usage, but you could charge back at a fixed cost per API request, with each API having a different rate.
  6. Which userids are sending in most of the requests? You may want to provide more granular certificate-to-userid mapping to give you more detailed information.
