Understanding z/OS Connect SMF 123 subtype 2 record

Introduction to the z/OS Connect SMF records

z/OS Connect can provide two types of SMF record:

  1. SMF 120 subtype 11, provided by the base Liberty support. This gives information on the URL used to access Liberty, and the CPU used to perform requests. This is enabled at the server level – so you can have records for all requests, or no requests. There is one SMF record for each web server request.
  2. SMF 123 provides information about the API and service used, and about the “pass through” services. It provides the elapsed time of the request, and of the “pass through” requests. It does not provide CPU usage figures. This can be configured to produce records depending on the HTTP host and port used to access z/OS Connect. One SMF record can have data for multiple web server requests. The SMF records are produced when the SMF record is full – or when the server is shut down.

The SMF 120-11 and SMF 123 records are produced independently, and there is no correlating field between them. They both have a URI field, and time stamps, so at low volumes it may be possible to correlate the SMF data.

I’ll document the fields which I think are interesting. If you think other fields are useful please let me know and I’ll update this document.

I have written an SMF formatter in C which prints out interesting data, and summarises it.

SMF 123 subtype 2 fields

  • You get the standard date and time the record was produced, and the LPAR. You can use PGM=IFASMFDP with the following to filter which records are copied
DATE(2020282,2020282)
START(1000)
END(2359)
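As a minimal sketch, a complete filtering job might look like the following – the input and output dataset names (SYS1.MANA, COLIN.SMF123) are my own examples, not from the original:

//SMFCOPY  EXEC PGM=IFASMFDP
//INDD1    DD DISP=SHR,DSN=SYS1.MANA
//OUTDD1   DD DISP=(NEW,CATLG),DSN=COLIN.SMF123,
//            DCB=(RECFM=VBS,LRECL=32760),SPACE=(CYL,(10,10))
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  INDD(INDD1,OPTIONS(DUMP))
  OUTDD(OUTDD1,TYPE(123))
  DATE(2020282,2020282)
  START(1000)
  END(2359)
/*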
  • There are the server version (3), system (SOW1), SYSPLEX (ADCDPLEX), and job id (STC04774), which are not very interesting.
  • Server job name(SM3) is more interesting. I started the server with s baqstrt,parms='MQTEST',jobname=sm3
  • The config dir (/var/zosconnect/servers/MQTEST/) is boring – as is server level (open beta)
  • The HTTP code, for example 200 OK, 403 Forbidden. You may want to report requests with errors in a separate file
    1. So you know you have errors and can fix them
    2. Your statistics, such as average response time, do not have dirty data in them.
  • An HTTP flag, one character long – this has always been 00 for me. I cannot find it documented.
  • The client IP address, for example 10.1.1.1.
  • You get userid information.
    • I used a certificate to authenticate. The DN from the certificate is not available. You only get the userid from the RACF mapping of DN to userid. This mapped userid was in the 64 byte field; the 8 byte userid field was empty for me. The lack of the certificate DN, and having the userid in the wrong field, feel like a couple of buglets. If you use LDAP, I think the long ID is stored in the long field, and the z/OS userid in the short field – so it is inconsistent.
  • You get the URL (URI) used, /stockmanager/stock/items/999999. I treat this as the main key for processing and summarising data. You may need to process this value, as the part number (999999) will be different for each query. You may want to have standards which say the first two elements (/stockmanager/stock) are useful for reporting, and the remaining elements should be ignored when summarising records.
  • The start and stop times (2020/10/08 09:18:19.852360 and 2020/10/08 09:18:22.073946) are interesting. You can calculate the duration – which is the difference between them.
  • Request type: API, Service, or Admin. An Admin request is, for example, using a URL like /zosConnect/services/stockQuery to get information about the stockQuery service.
  • The API name and version – stockmanager 1.0.0
  • The service name and version – stockQuery 1.0.0. You get the version information here; if you do an online display of the service, the version is not available.
  • Method GET/POST etc
  • The service provider. This is the code which does the real work of connecting to CICS or MQ, or of passing the request through, for example IBM MQ for z/OS, or restclient-1.0.
  • Request id starts at 1 and increments for the life of the server. If you restart the server it will restart from 1. I do not think this is very useful.
  • For “pass through” requests, z/OS Connect confusingly calls the back end service the System Of Record (SOR). (MQ is a transport, not a system of record.) The “pass through” service definition is built from a parameter file and the zconbt program. The reported data is
    • SOR ID – the host and port, 10.1.3.10:19443. These are from the <zosconnect_zosConnectServiceRestClientConnection…> host and port values.
    • SOR Reference – restClientServiceY. This is from the connectionRef= in the parameter file and the <zosconnect_zosConnectServiceRestClientConnection…> definition.
    • SOR Resource – zosConnect/apis/through. This is from the uri= in the parameter file.
    • The time of entry to, and time of return from, the SOR service.
    • From these times you can calculate the difference to get the duration of the remote request.
  • It would be useful to have this “pass through” time for services calling MQ, CICS etc., so we could get a true picture of the time spent processing the requests.
  • The size of the data payload (0), and the size of the response (94), excluding any headers.
  • A tracking token. This 64 byte hex string is passed to some of the called servers. It is passed to some backends (CICS and “pass through”) so that the request can be correlated across systems. It is not passed to MQ. See X-Correlation-ID below for an example. This field is nulls for Admin requests. When a request was “passed through” to another z/OS Connect server which processed the request, the tracking token was not reported in the SMF data of the second system. I don’t know if the CICS SMF data records this token, but it is of little use for MQ, or for “pass through”.
  • You get 4 request header and 4 response header fields. They were blank in my measurements, even though headers were passed to the “pass through” service. Looking at the HTTP traffic, the request coming in had a header “Content-Type:application/json”. The request passed through to the back end included
    • User-Agent: IBM_zOS_Connect_REST_SP/open beta/20200803-0918
    • X-Correlation-ID: BAQ1wsHYAQAYwcTDxNfTQEDi8ObxQEBAQNikjpk+1klAAA==

What can you do with the data?

Do I need to use the SMF data?

From a performance perspective these records do not provide much information, as they lack CPU usage figures. From an audit perspective they have some useful information – but they are missing fields which would make them a complete audit record. There is an overlap between the information in the SMF 123 records and the …/servers/…logs/http_access.log file, which provides date, time, userid, URI, and HTTP code.

What do I want to report on?

Decide what elements of the URI you want to report on. For example the URI /stockmanager/stock/items/999999 includes the stock part number, which may be different for each request. You might decide to summarise API usage on just the first two elements /stockmanager/stock/. You may have to treat each API individually to extract the key information.

I’ll use the term key for the interesting part of the URI – for example /stockmanager/stock.
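As a sketch of how you might extract such a key when processing the records, here is a small C routine (my own illustration – the function name and buffer size are arbitrary) which keeps just the first two elements of a URI:

#include <stdio.h>

/* Copy just the first two path elements of a URI into key,
   for example "/stockmanager/stock/items/999999" -> "/stockmanager/stock" */
static void uriKey(const char *uri, char *key, size_t keyLen)
{
    size_t i = 0, slashes = 0;
    while (uri[i] != '\0' && i < keyLen - 1)
    {
        if (uri[i] == '/' && ++slashes > 2) /* the third '/' ends the key */
            break;
        key[i] = uri[i];
        i++;
    }
    key[i] = '\0';
}

int main(void)
{
    char key[64];
    uriKey("/stockmanager/stock/items/999999", key, sizeof(key));
    printf("%s\n", key); /* prints /stockmanager/stock */
    return 0;
}

You would use the resulting key as the bucket for counting requests and accumulating response times.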

What reports are interesting?

I think typical questions are:

  1. Which is the most popular API key?
  2. Is the usage of an API key increasing?
  3. How many API key requests were unsuccessful? This can show set-up problems, or penetration attempts.
  4. What is the response time profile of the requests? Are you meeting the business response time criteria?
  5. Which sites are sending in most of the requests? You cannot charge back on CPU used, as you do not know the CPU usage. You could do charge back at a fixed cost per API request, with each API having a different rate.
  6. Which userids are sending in most of the requests? You may want to provide more granular certificate to userid mapping to give you more detailed information.

Understanding z/OS Connect SMF 120 subtype 11 data

z/OS Connect can provide two types of SMF record, as described at the start of this document: the SMF 120 subtype 11 record, provided by the base Liberty support, and the SMF 123 record, provided by z/OS Connect itself. This section covers the SMF 120 subtype 11 fields. Would I use this record to report CPU used? No – see the bottom of this section.

SMF 120-11

  • You get the standard date and time the record was produced, and the LPAR. You can use PGM=IFASMFDP (see the example JCL earlier) with the following to filter which records are copied
DATE(2020282,2020282)
START(1000)
END(2359)
  • There are the server version (3), system (SOW1), and job id (STC04774), which are not very interesting.
  • Server job name(SM3) is more interesting. I started the server with s baqstrt,parms='MQTEST',jobname=sm3
  • The config dir (/var/zosconnect/servers/MQTEST/) is boring – as is code level (20.0.0.6)
  • The start and stop times (2020/10/08 09:18:19.852360 and 2020/10/08 09:18:22.073946) are interesting as is the duration – which is the difference between them.
  • You get userid information.
    • I used a certificate to authenticate. The DN from the certificate is not available. You only get the userid from the RACF mapping of DN to userid. This mapped userid was in the 64 byte field; the 8 byte userid field was empty for me. The lack of the certificate DN, and having the userid in the wrong field, feel like a couple of buglets.
  • You get the URI used, /stockmanager/stock/items/999999. I treat this as the main key for processing and summarising data. If you want to summarise the data, you may want to summarise it just on /stockmanager/stock/. The full URI contains the part number – and so I would expect a large number of parts.
  • You can configure your requests to WLM. For example
<wlmClassification>
<httpClassification transactionClass="TCI1" method="GET" 
    resource="/zosConnect/services/stockQuery"/>
</wlmClassification>

This produced in the SMF record

WLMTRan :TCI1
WLM Classify type :URI :/zosConnect/services/stockQuery
WLM Classify type :Target Host :10.1.3.10
WLM Classify type :Target Port :19443

This means that the URL, the host, and the port were passed to WLM to classify.

If you get the WLM classification you also get CPU figures when the enclave request ended (was deleted).

  • You get the ports associated with the request.
    • Which port was used on the server – Target Port :9443
    • Where did the request come from? Origin :10.1.1.1 and port :36786
  • The number of bytes in the response Response bytes :791
  • CPU figures for the CPU used on the TCB. See the discussion below on the usefulness of this number. You get the CPU figures from before the request, and from after the request – so you have to calculate the difference yourself! It does not look like the developers knew what the end user wanted. The values come from the timeused facility. You can calculate the deltas and get
    • CPU Used Total : 0.967417
    • CPU Used on CP : 0.026327
    • and subtract these to get CPU Used on Z**P : 0.941090. This is the CPU offloaded to a zIIP or zAAP.
  • If you had the URI classified with WLM, you get Enclave data, see below for a discussion on what the numbers mean.
    • Enclave CPU time : 0.148803
    • Enclave CPU service : 0.000000
    • Enclave ZIIP time : 0.148803
    • Enclave ZIIP Service : 0.000000

What do the CPU numbers mean?

Typically a transaction flow is as follows

  1. A listening thread listens on the HTTP(s) host and port.
  2. When a request arrives, it passes the request to a worker thread, and goes back to listening.
    1. The worker thread may do some work and send the response back.
    2. The worker thread may need to call another thread to do some work, for example to issue an MQ request:
      1. the MQ code looks for a thread in a pool with a matching queue manager and userid. If it finds one, it uses that thread and issues the MQ request.
      2. If it does not find a matching thread, it may allocate a new thread, and issue an MQCONN to connect to MQ. These are both expensive operations, which is why having a pool of threads keyed by queue manager and userid is a good way of saving CPU.
      3. The work is done.
      4. The thread is put back into the MQ pool, and the application returns to the worker thread.
      5. The worker thread sends the response back to the originator.

A thread can ask the operating system how much CPU time it (the thread) has used. What usually happens is

  1. the thread requests how much CPU it has used
  2. the thread does some work
  3. the thread requests how much CPU it has used,
  4. the thread calculates the difference between the two CPU values and reports this delta.

I believe the SMF 120 record records the CPU from just the worker thread – and no other thread.

Enclaves

When more than one thread is involved it gets more complex, as you could have a CICS transaction issuing an MQ request, then a DB2 request, and then an IMS request. You can set up z/OS Workload Manager (WLM) to say “these CICS transactions in this CICS region are high priority”.

With some subsystems you can pass a WLM token into a request. The thread being invoked can tell WLM that it is now working on behalf of this token. The thread does some work, and tells WLM that it has finished doing the work. WLM can manage the priority of the threads to achieve the best throughput, for example making a thread high or low priority. WLM can manage work running in multiple LPARs across a sysplex!

WLM records the CPU used by the thread while performing the work, accumulates and reports this.

This use of multiple threads for a business transaction across one or more address spaces is known as an enclave.

What happens with enclaves?

  1. A request arrives at the listener thread.
  2. Liberty looks up the URI in the <wlmClassification httpClassification…> definitions. It compares the server’s host, the server’s port, the URI resource (/stockmanager…) and the method (GET), and finds the best match for the transactionClass.
    1. If there is a transactionClass,
      1. the server calls WLM with the Subsystem type of CB, the specified collectionName, and the transactionClass.
      2. WLM looks for these parameters and if WLM has a matching definition then WLM will manage the priority of the work,
      3. WLM returns a WLM token.
      4. This WLM token is passed to threads which are set up for enclaves.
    2. If there is no transaction class specified in Liberty, or WLM does not have the subsystem, collectionName, and transactionClass defined, then there is no WLM token (or a null token).
    3. The work continues as before.
    4. If another thread is used, the WLM token is passed to it. If the code is set up for WLM tokens, it reports “work started” to WLM, and when it has finished, reports “work ended”.

What happens if the request is not known to WLM?

The worker thread calculates the CPU used for just its work, and reports this. The CPU used by any other thread is not reported. The figures reported are the CPUTotal timeused values. You have to calculate the difference yourself.
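For example, a formatter might calculate the deltas like this – a sketch in C, where the before/after variables stand for the timeused values in the record (the variable names are mine, not the SMF field names; the sample numbers reproduce the deltas quoted earlier):

#include <stdio.h>

int main(void)
{
    /* timeused values, in seconds, captured before and after the request */
    double beforeTotal = 10.000000, afterTotal = 10.967417;
    double beforeCP    =  2.000000, afterCP    =  2.026327;

    double cpuTotal = afterTotal - beforeTotal; /* CPU on all engines     */
    double cpuCP    = afterCP    - beforeCP;    /* CPU on CP engines      */
    double cpuZxxP  = cpuTotal   - cpuCP;       /* offloaded to zIIP/zAAP */

    printf("Total %f CP %f zIIP/zAAP %f\n", cpuTotal, cpuCP, cpuZxxP);
    return 0;
}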

What happens if the request is known to WLM?

You get the timeused CPU for the worker thread – as with the case where the request is not known to WLM.

From RMF (or other products) you get reports for an interval with

  1. The number of requests in the interval
  2. The rate of requests in the interval
  3. The amount of time on a CP engine in seconds
  4. The amount of time on a ZIIP engine in seconds
  5. The amount of time on a ZAAP in seconds.
  6. Over the interval, what percentage of time was CP on CP engines, zAAP on zAAP engines, zAAP on CP engines, zIIP on zIIP engines.

From the SMF 120 records you get

Enclave CPU time
Enclave ZAAP time
Enclave ZIIP time

Example Enclave figures.

For 100 API requests, the figures below were reported by SMF 120-11; I averaged the values.

  1. Average CPU(1) 0.023
  2. Average CPU(2) 0.0008
  3. Enclave CPU 0.029
  4. Enclave ZAAP 0
  5. Enclave ZIIP 0.028

The figures reported by RMF per request

  1. CPU 0.031
  2. ZIIP 0.039
  3. ZAAP 0.000
  4. Total 0.070 seconds of CPU per transaction

These figures tie up – the Enclave CPU, ZIIP, and ZAAP are similar.

The CPU used by the server address space was

  1. CPU 30.1 seconds
  2. ZIIP 28.7 seconds
  3. ZAAP 0 seconds.
  4. Total 58.8 seconds.

Each request took 0.070 seconds, and there were 100 requests – so 7 seconds of CPU were reported.

The difference, 51 seconds, is not reported in the transaction costs. It looks like the “timeused” value is less than 1% of the CPU value, and the enclave figures are under 2% of the grand total.

Looking at the trace in a dump, I can see many hot TCBs using much more CPU than is reported by WLM and RMF. I expect that many TCBs are used in a request, but they do not have the enclave support in them. Overall – pretty useless for charge back and understanding the cost per transaction.

What’s the difference between MQ Web, and z/OS Connect MQ support?

With MQ Web

  1. You can issue commands to administer MQ, for example display, define, and delete MQ objects.
  2. You can put and get messages to and from a queue. The message is what you specify – typically a character string.

With z/OS Connect MQ support

  1. You can put and get messages to and from a queue, and do transformations on the message, for example mapping a COBOL structure to JSON.
  2. You can do field validation.
  3. You can convert HTTP code “200” to “great it worked”.

What is common?

They both use z/OS WebSphere Liberty to provide the basic web server.

Looking for an MQ reason code in Liberty? Get your safari helmet, anti malarial tablets and follow me to find the treasure.

I was using an MQ application in Liberty, and rather than do things the easy way, I did what I normally do, and did it the hard way. On my z/OS I did not have the queue manager defined, because I wanted to see what happened. I was not expecting the expedition.

You configure MQ in Liberty using configuration like

<jmsConnectionFactory jndiName="jms/cf1" connectionManagerRef="ConMgr1"> 
  <properties.wmqJms transportType="BINDINGS" queueManager="MQPA"/> 
</jmsConnectionFactory>

I was expecting a message like the following in the job output.

Application COLINAPP MQCONN call to MQPA failed with compcode 
'2' ('MQCC_FAILED') reason '2058' ('MQRC_Q_MGR_NAME_ERROR').

Oh no, it was not that easy.  It was quite a trek into the jungle to find the information.

In the Liberty server’s logs directory there is a message.log file.  In this file I had

9/14/20 19:16:32:242 GMT 00000060 com.ibm.ws.logging.internal.impl.IncidentImpl I FFDC1015I: An FFDC Incident has been created: "com.ibm.mq.connector.DetailedResourceException: MQJCA1011: Failed to allocate a JMS connection., error code: MQJCA1011 An internal error caused an attempt to allocate a connection to fail. See the linked exception for  details of the failure. com.ibm.ejs.j2c.poolmanager.FreePool.createManagedConnectionWithMCWrapper 199" at 
ffdc_20.09.14_19.16.28.0.log

This was one long line, and I had to scroll sideways (just like you did) to see the content (or use the ISPF line prefix command “tf” to flow the text to the display width). A key hint was the message “MQJCA1011 An internal error caused an attempt to allocate a connection to fail”, so I knew I was on the right trail. I now knew the name of the file – ffdc_20.09.14_19.16.28.0.log.

Knowing the name of the file did not help very much, because if you use ISPF 3.17 (z/OS UNIX Directory List) it shows a list of 40 files with the name ffdc_20.09.14_1 (ffdc_yy.mm.dd_h). This is because it only displays the first part of the name. Thanks to Steve Porter who said:

To increase column size in 3.17, >
Options
1. Directory List Options…
Width of filename column . . . . . . . . 15 (Default value – increase as necessary)


The file has the name ffdc_20.09.14_19.16.28.0.log and a displayed time stamp of 2020/09/14 18:16:32, which is close enough – allowing for the time zone difference and the time taken to write the file. I was fortunate not to be running a workload and producing many of these files.

I edited the file – and I could see the full file name at the top of the page, so I knew I was in the right file.

The file has long lines, so I had to scroll or use the “tf” line command to reformat it.

Near the top it had

Stack Dump = com.ibm.mq.connector.DetailedResourceException: 
MQJCA1011: Failed to allocate a JMS connection., error code:  
MQJCA1011 An internal error caused an attempt to allocate a connection to fail. 
See the linked exception for details of the  failure.

Further down it had

Caused by: com.ibm.msg.client.jms.DetailedJMSException: 
JMSWMQ0018: Failed to connect to queue manager 'MQPA' with connection 
mode 'Bindings' and host name 'localhost(1414)'.

and further further down (line 50) I found the treasure

Caused by: com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with 
compcode '2' ('MQCC_FAILED') reason '2058'  ('MQRC_Q_MGR_NAME_ERROR').

What a trek to find the information I needed!

Next time I’ll just list the logs/ffdc directory, edit (not browse) each file, and search for “compcode”. You cannot use “grep compcode” from USS because the file is in UTF-8, so grep does not find the string. You can just use oedit file_name in USS.
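A possible workaround (a sketch, assuming the files really are UTF-8 and your shell code page is IBM-1047) is to convert each file before searching, for example:

cd /var/zosconnect/servers/MQTEST/logs/ffdc
for f in ffdc*.log; do
  # convert to EBCDIC so grep can match the pattern
  iconv -f UTF-8 -t IBM-1047 "$f" | grep -q compcode && echo "$f"
done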

It would be nice if the MQ code could be enhanced to have an option “makeErrorsHardToFind” which you could set to “no”, and still keep the default “yes”.


Getting z/OS Explorer to work with z/OS Connect EE

I’ve been trying to set up z/OS Connect, so I could look at the MQ support within it.

Setting up z/OS Connect in the first place was a challenge, which I’ll blog about some other time. I was looking for an Installation Verification Program (IVP) and tried to use the z/OS Explorer. This was another challenge. Like many problems, there are answers, but it is hard to find the information.

Installing z/OS Explorer

This was easy. I started here and installed z/OS Explorer for Aqua – Eclipse tools. Then select IBM z/OS Connect EE. I selected Aqua 3.2, and chose to install using Eclipse p2. I have tried to avoid Installation Manager as it always seemed very complex and frustrating.

I tried to extend an existing Eclipse, but this failed due to incompatibilities. I started from fresh, and this worked fine.

Adjust the z/OS Connect server configuration.

I enabled logon logging.

 <httpEndpoint id="defaultHttpEndpoint" 
    host="*" 
    accessLoggingRef="hal1" 
    httpPort="19080" 
    httpsPort="19443" > 
   <accessLogging enabled="true" 
     logFormat='h:%h i:%i u:%u t:%t r:%r s:%s b:%b D: %D m:%m' 
    /> 
<sslOptions sslRef="defaultSSLSettings"/> 
</httpEndpoint> 

This creates the file http_access.log within the logs directory. It has output like

10.1.1.1 ADCDC 08/Sep/2020:17:50:40 +0000 "GET /zosConnect/services/stockQuery HTTP/1.1" 200

You can see where the request came from (10.1.1.1), user (ADCDC), the date and time, the request (“GET /zosConnect/services/stockQuery HTTP/1.1”), and the response code(200).

Getting started with z/OS Explorer

You need to define host connections.

If you totally disable security on your server you can use http.

  1. On z/OS Explorer, display the Connections tab (Window -> Show View -> Host connections).
  2. Right click on z/OS Connect Enterprise Edition, and select New z/OS Connect Enterprise Edition, Connection
    1. Name: this is displayed in the tooling
    2. Host name: I used 10.1.3.10 which is my VIPA address of the server
    3. Port number: This comes from the httpEndpoint for the server. The default is http:9080 and https:9443 – but as every Liberty product uses these values, your server may have different values. I used 19080.
    4. I initially left Secure connection(TLS/SSL) unticked
    5. Click Save and Connect
  3. A panel was displayed asking for credentials. Either create new credentials (userid and password) or select an existing credential.
  4. Double click on the connection you just created.
    1. An error of “302, Found” is an HTTP response meaning redirection. In the z/OS Connect case, this means you are trying to use an http connection when https (a TLS connection) was expected. I got this because I had not disabled security in my server.

The normal way of accessing z/OS Connect is to use TLS to protect the session. As well as TLS to protect the session, you can also use client certificate authentication. This is what I used.

You will need to set up certificates, keystores and keyrings on z/OS and get the Certificate Authority certificates sent to the “other” system.  I used my definitions from using MQWEB.

  1. On z/OS explorer, set up the keystores
    1. Window -> Preferences -> Explorer-> certificate manager
    2. The truststore contains the CA certificates used to validate the certificate sent down from the z/OS server. Enter the file name (or use Browse), the pass phrase, and the key store type. My truststore was JKS.
    3. The keystore contains the client certificate used to identify this client to the server.
    4. Smart card details. Ignore this (despite it saying you must configure a PKCS11 driver). This section is used if you select a smart card to identify yourself, and it would be better if the wording said “If you are using smart card authentication you must configure a PKCS11 driver”.
    5. Leave the “Do not validate server certificate trust” unticked.  This will check the passwords etc of the key stores.
    6. At the bottom I used “Secure socket protocol-> TLS v1.2” though this is optional.
    7. Select Apply and Close
  2. Display the Connections tab. (Window -> Show View -> Host connections)
  3. Right click on z/OS Connect Enterprise Edition, and select New z/OS Connect Enterprise Edition, Connection
    1. Name: this is displayed in the tooling
    2. Host name: I used 10.1.3.10 which is my VIPA address of the server
    3. Port number: This comes from the httpEndpoint for the server. The default is http:9080 and https:9443 – but as every Liberty product uses these values, your server may have different values. I used 19443.
    4. I ticked Secure connection (TLS/SSL). If you do not select this, you will not be able to use a certificate to logon.
    5. Click Save and Connect
  4. A panel was displayed asking for credentials. When I used an existing credential, I failed to connect to the server.
    1. Select Create new credentials
    2. Click on Username and Password pull down – and select Certificate from Keystore.
    3. Enter credentials name – this is just used within the tooling
    4. Userid – this seems to be ignored. I used certificate mapping on z/OS to map the certificate to a userid.
    5. Choose a certificate – select one from the pull down.  In my Linux box the choice of certificates came out in yellow writing on a yellow background!
    6. Click OK
    7. The connection should appear on the Connections page, under z/OS Connect Enterprise Edition. It should go yellow while it is connecting, and then green, with a padlock, once it has connected.

Use z/OS Connect

Use Window-> Show View -> zOS Connect EE Servers

You should see your connection displayed with the IP address and port. Underneath this are any APIs or Services you have defined.

If you have any APIs or Services, you should be able to right click and select Show Properties View. You can click on the links, or copy the links and use them, for example in a web browser directly, or via curl.

If you try to use the APIs or Services, you may not be authorised.  You will need to configure

  1. <zosconnect_zosConnectManager …>
  2. <zosconnect_zosConnectAPIs> <zosConnectAPI name="stockmanager" ….
  3. <zosconnect_service> <service name="stockquery"

Good luck.


Planning your Liberty – this is not an escape plan

With the web being the new front end to z/OS, most z/OS products are using Liberty as their web server to deliver web content. Each product seems to be documented as if it were the only Liberty instance on the z/OS image; for example, they all default to using http port 9080.

This blog post helps identify what planning you need to do before you can configure a Liberty instance, be it z/OSMF, MQWEB, CICS, or zOS Connect EE.

Roles

Often the person installing and configuring the server is part of the “installation team”, and may not be familiar with the detailed use of the product, or the product based tooling, for example the Eclipse based z/OS tooling. This person’s role is to configure the server so it meets the enterprise requirements; the configuration within the server is down to the team who requested it.

Update proclib

You need to decide if each instance will need its own procedure in proclib, or one procedure can start multiple servers.

You will need an Angel process – there can only be one Angel started task across all of your Liberty instances on an LPAR.  Ensure it is at the highest service level.

If you want isolation you may want to set up a test instance with a different started task userid from the production instance.

Where do the product ZFS libraries go

Usually the product code goes in /usr/lpp/IBM/product_name. Typically you make a directory /usr/lpp/IBM/product_name, then mount the product file system over this directory. Sometimes this file system needs to be mounted RW during customisation. When you upgrade you can just mount the new file system instead of the old file system.

Where do the configuration files go?

The Liberty configuration files can go in /var/… or /u/… . If you intend the server to be started on another LPAR, then the configuration files need to be available on the other LPAR. Having a variable with the LPAR name as part of the directory will not work. The location of the configuration files is defined with the WLP_USER_DIR environment variable. You may want shell scripts which define this variable. For example the shell script prodmqweb could have export WLP_USER_DIR=/u/mqweb/production/MQPA. You then use sh prodmqweb to define the variable, or pass commands to it, such as sh prodmqweb dspmqweb, so you can be sure you are using the right configuration file.
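A minimal sketch of such a script – the directory is the example above, and passing the rest of the command line through is my own convention:

# prodmqweb: point at the production MQPA configuration, then
# run any command passed to the script, e.g. sh prodmqweb dspmqweb
export WLP_USER_DIR=/u/mqweb/production/MQPA
"$@"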

The UNIX Directory List Utility (ISPF 3.17) has space for 56 characters in the default directory name, but only 40 when working with files, for example new file or rename. You may want to have a short prefix, or use an alias, for example /u/zoscA/servers/stockManager/server.xml instead of /MVSA/var/zosconnect/servers/stockManager/server.xml.

What configuration files will be shared?

If you already have Liberty running in your environment you may be able to reuse some files, and include them in your server.xml. For example, if your keystores are defined in a file, you could use <include location="…./keystore.xml"/>.

If you are providing duplicate servers for availability, you can put your common definitions in one file and share this, and server specific definitions in a different file.

It is easier to manage the configuration files if you provide small, function specific files, for example saf.xml, trace.xml, applications.xml, and keystores.xml.

What TCPIP ports will be used

Each product will need its own ports, typically one port for http, and another port for https. You can define multiple ports with different characteristics. With httpEndpoint you can specify that for this port you log this information, and for that port you log different information to a different place.

If you want to have two instances running on an LPAR using the same port, the port needs to be defined with SHAREPORT, as shown below.
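In the TCPIP profile this looks something like the following – the job name BAQSTRT* is my own example:

PORT
   9443 TCP BAQSTRT* SHAREPORT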

You may want to have the same port defined on each LPAR, so that no matter which LPAR is used, clients use port 9443.

You may want to use VIPA, so you have one external IP address into your sysplex, and use z/OS Sysplex Distributor to route or distribute connection requests to available servers. If you want to do this you will need a configured VIPA TCPIP address.

You may need to specify which TCPIP instance to use if you have more than one TCPIP instance on an LPAR.

What keystores will be used

You need two keystores

  1. To identify the server – the keystore
  2. To validate certificates passed into the instance – the trust store.  Typically this has Certificate Authority certificates, and any self signed certificates.

These can be file based, or SAF based using z/OS keyrings. For example <keyStore filebased="false" id="racfKeyStore"
location="safkeyring://START1/KEY" password="password" readOnly="true" type="JCERACFKS"/>

You might have enterprise keystores available to everyone, or provide isolation, so you have keystores for bank1 servers, and different keystores for bank2 servers.

Defining the APPL and SERVER resource

The Liberty default APPL definition is BBGZDFLT. This allows people to access the server (the front door). If you already have a Liberty installed then you may be able to use the existing definition.

If you want isolation, for example test and production, or between two major applications you will need to select and define different APPL resources.

You will need

<safCredentials profilePrefix="ZZZZDFLT"/>

and

RDEFINE APPL ZZZZDFLT UACC(NONE) 
PERMIT ZZZZDFLT CLASS(APPL) ACCESS(READ) ID(START1) 
PERMIT ZZZZDFLT CLASS(APPL) ACCESS(READ) ID(GRALL) 
SETROPTS RACLIST(APPL) REFRESH 

RDEFINE SERVER BBG.SECPFX.ZZZZDFLT UACC(NONE) 
PERMIT BBG.SECPFX.ZZZZDFLT CLASS(SERVER) ACCESS(READ) ID(START1) 
SETROPTS RACLIST(SERVER,APPL) REFRESH

   /* for z/OS Connect
RDEFINE EJBROLE ZZZZDFLT.zos.connect.access.roles.zosConnectAccess +
   UACC(NONE) 
PERMIT ZZZZDFLT.zos.connect.access.roles.zosConnectAccess + 
   CLASS(EJBROLE) ACCESS(READ) ID(IBMUSER) 
SETROPTS RACLIST(EJBROLE) REFRESH 

   /* for MQ Web
RDEFINE EJBROLE MQWEB.com.ibm.mq.console.MQWebAdmin UACC(NONE)

These tend to be mixed case, so take care when defining them.

Starting the instance

If you use

S BAQSTRT,PARMS='server1'
S BAQSTRT,PARMS='server2'

If you use STOP BAQSTRT then both servers will stop.

If you use

S BAQSTRT.BAQ1,PARMS='server1' 
S BAQSTRT.BAQ2,PARMS='server2'

You can use STOP BAQ1 to stop just server1.

You can also use

S BAQSTRT,PARMS='server1',jobname=BAQ1
S BAQSTRT,PARMS='server2',jobname=BAQ2

Then you can use P BAQ1, and also have WLM give BAQ1 and BAQ2 different service classes, and so give them different priorities.

Set up monitoring

There may be SMF data available, which you can collect if you enable the SMF collection classes.
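For example, I believe the Liberty SMF 120-11 records described earlier are produced when the server has the zosRequestLogging-1.0 feature enabled – a sketch:

<featureManager> 
    <feature>zosRequestLogging-1.0</feature> 
</featureManager>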

You may have data from JMX which you collect and report.

Accessing the server

You will need to set up a profile and give permission to groups of people, just to be able to use the server.

You may need to protect individual applications, for example ability to start or stop applications, or to invoke the application.  This can be done once the basic setup has been done, and the system handed over.

For example in server.xml for z/OS connect

<zosconnect_service> 
<service name="stockquery" 
    serviceDescription="stockQueryServiceDescriptionColin" 
    id="stockQueryService" 
adminGroup="a3Admin,a4Admin" 
invokeGroup="a3Invoke" 
operationsGroup="a3Ops" 
readerGroup="a3Reader" 
/> 
</zosconnect_service>

It is good policy to only grant access to groups, and not to individual ids, as it simplifies userid administration. You would define a3Admin, a4Admin, a3Invoke, etc. as groups.


How many servers do I need? Everyone knows this – or no one knows this

I was planning on installing a product on z/OS and was going through the documentation. It is hard to see things that are not there, but I was surprised to see nothing about initial planning and how to set up the product. I looked at other products – and they were also missing this information. It feels like this information is so obvious that everyone knows it – or the person responsible for the installation instructions only had experience of installing one image and documenting it. (Tick the box, job done.)

There are two reasons for configuring more than one instance of a product:

  1. You want multiple copies of the same configuration
  2. You want a different configuration.

This is obvious, but it took me half an hour to realise it. I’ll cover the implications of these decisions below.

You want multiple copies of the same configuration

There are many reasons for this:

  1. You have to support more than one LPAR.
  2. You want to have more than one instance on an LPAR for availability, scalability and performance. For example, with a z/OS queue manager: it can support up to 10,000 channels, and log at about 100MB a second. If you want to do more than this you need a second queue manager.
  3. You want availability by having instances running on multiple LPARs, so if one LPAR is shut down, work can continue to flow to the other LPARs.

You want a different configuration

The main reason for this is isolation:

  1. You have different environments for example production and test.
  2. You have different customers, for example you support bank1 and bank2. You want isolation, so that if bank1 fills up the disk space, bank2 is not affected.
  3. You want bank1 and bank2 systems to use different certificate authorities, so bank1 end users cannot connect to bank2 systems.
  4. You have different performance criteria – you configure bank1 systems to have higher priority in WLM than bank2 systems.
  5. If you have bank1 and bank2 sharing a system, and you want to restart it, you have to get agreement from both sets of end users. Getting agreement for a date from one bank is easier than getting agreement from multiple banks.

Implications of “you want multiple copies of the same configuration”

Ideally you want to replicate a system with very little work – a so called cookie-cutter approach.

Configuration files

Create self contained configuration files. For example, with the Liberty web server on z/OS, each server has its own server.xml file. Create a file called keystore.xml and include that in the server.xml file. The server.xml configuration file may look like

<server>
<include location="${server.config.dir}/keystore.xml"/> 
<include location="${server.config.dir}/mq.xml"/> 
<include location="${server.config.dir}/saf.xml"/>
instance specific data
</server>

With JCL you can use

//OUTPUT1 INCLUDE MEMBER=....

to include common configuration.

TCPIP ports

You are likely to use the same port number across different LPARs. You can define a port with SHAREPORT, and have multiple applications listening on the same port number on the same LPAR.

Started task userid

Have the instances start with the same userid, so they have the same  access to resources.

Liberty profilePrefix

Have the instances use the same profile prefix, default BBGZDFLT. This is defined using RDEFINE APPL BBGZDFLT…

Define who can access the instance using a SAF EJBROLE, for example giving different groups of people access to the BBGZDFLT.zos.connect.access.roles.zosConnectAccess profile.

Isolated instances

If you want to isolate instances, then use the above list and make sure you use different values.

Future proof your definitions.

An advantage of “define something once in an include file, and reuse it” is that if you change the contents, for example /usr/lpp/mqm/V9R1M1/java/lib, you change it once, and restart the servers.

If the servers each have a unique configuration file, you have to make the same change in each file.   This can be good – for example you want to change this server this week, and that server next week.

You could also have an alias /usr/mq pointing to /usr/lpp/mqm/V9R1M1. When you want to change the version of MQ, change the alias and restart the servers. To undo the change, change the alias back to the old version and restart. This is much easier than changing the individual files (how many change records would you have to raise?).
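A sketch of setting this up from USS – the new version V9R2M0 is a made-up example:

ln -s /usr/lpp/mqm/V9R1M1 /usr/mq    # create the alias
rm /usr/mq                           # move to a new version...
ln -s /usr/lpp/mqm/V9R2M0 /usr/mq    # ...by re-pointing the alias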

Liberty Specials.

With Liberty you can have one JCL procedure, and start multiple servers. This environment is a mixture of “multiple copies of the same configuration” and “isolated instances”.

Each server has its own server.xml file, but uses the same JCL file, so common userid etc.

You can start a liberty server

s baqstrt,parms='stockManager',jobname=smanager

And have WLM classify this under SMANAGER.

The default Liberty profile is stored under /var/zosconnect. If you want to have a copy of this running on two LPARs you will need to have the profiles stored in different places. You could have

/var/zosconnect/LPAR1/…

but if you need to move the server to a different LPAR this might point to the wrong directory.

You could change the procedure so you pass in the location of the profile.

s baqstrt,parms='stockManager',jobname=smanager,profile='/var/zosconnect1/instance1'


How do I find out about my VIPA configuration?

This follows on from setting up VIPA for the Liberty web server to provide High Availability.  I had a few problems setting it up, and this blog post is about some of the commands I used to get it working.

I cover

  1. Is the VIPA active?
  2. Where is it running?
  3. Are applications processing requests?

Some IP basics.

  1. Every connection has an IP address at each end. An address looks like 10.3.4.15 – four 8-bit numbers.
  2. My machine has several connections, ethernet, wireless, and a tunnelling connection to z/OS. Each connection has a different IP address.
  3. Packets get routed through the network depending on the destination IP address. The router has logic like: packets going to 10.4.5.* go down this connection, packets for 17.2.2.* go down that connection, and any other packets – try sending them down the connection to 11.13.6.6.
  4. The router uses a netmask to calculate which connection to use.
    1. A netmask is a string of 1s followed by 0s, for example 255.255.255.0 – or 3 * 8 = 24 ones.
    2. A router takes a packet IP address and a netmask and logically ands them together, and uses the result to decide where to route the packet.
    3. A connection handling 10.4.1.0 to 10.4.1.255 would have a netmask of 255.255.255.0 (also written /24, for 24 bits); a default connection may handle all packets for 10.* with a netmask of 255.0.0.0, or /8.
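As a small illustration of the netmask logic – a sketch in C, using the addresses from the example above:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Does address addr belong to the network net/mask? */
static int matches(const char *addr, const char *net, const char *mask)
{
    uint32_t a = ntohl(inet_addr(addr));
    uint32_t n = ntohl(inet_addr(net));
    uint32_t m = ntohl(inet_addr(mask));
    return (a & m) == (n & m); /* logically AND, then compare */
}

int main(void)
{
    /* 10.4.1.37 is inside 10.4.1.0/24; 10.4.2.1 is not */
    printf("%d\n", matches("10.4.1.37", "10.4.1.0", "255.255.255.0")); /* 1 */
    printf("%d\n", matches("10.4.2.1",  "10.4.1.0", "255.255.255.0")); /* 0 */
    return 0;
}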

My scenario

  1. I have my desktop machine running Ubuntu Linux.
  2. I have z/OS (called SOW1) running on my desktop using the zPDT.
  3. I have 3 TCPIP images (stacks) running on the z/OS image
    1. TCPIP running as the front end
    2. TCPIP2 running as a backend – this could be on another LPAR
    3. TCPIP3 running as a backend
  4. I have a VIPA defined with address 10.1.3.10

What configuration does Ubuntu have?

There are many commands to display network configuration information on Linux.

What address does Ubuntu have?

ip address gives a lot of information – but I did not use it

What packet routing does my desktop have?

The command ip route gives

  1. 10.1.0.0/24 dev eno1 proto kernel scope link src 10.1.0.3 metric 100
  2. 10.1.1.0/24 dev tap0 proto kernel scope link src 10.1.1.1
  3. 10.1.2.0/24 dev tap1 proto kernel scope link src 10.1.2.1
  4. 10.1.3.0/24 dev tap0 scope link
  5. 10.20.2.4 dev tap0 scope link
  6. 192.168.1.0/24 dev wlxd037450ab7ac proto kernel scope link src 192.168.1.67 metric 600

Line (2) shows

  • Traffic for any address between 10.1.1.0 and 10.1.1.255 (remember the netmask /24 means 24 bits, or 255.255.255.0) goes to device (connection) tap0
  • The IP address for the desktop end of the connection is 10.1.1.1

Line (4) shows

  • that any traffic for 10.1.3.0 to 10.1.3.255 goes to device tap0

The command used to set this up was sudo ip route add 10.1.3.0/24 dev tap0

Line (5) shows

  • that traffic to 10.20.2.4 goes to device tap0.

The command used to set this up was sudo ip route add  10.20.2.4 dev tap0

What is the routing for an IP address ?

You can use traceroute command to display which route a packet would take. For example

  • traceroute 10.1.3.10
    • traceroute to 10.1.3.10 (10.1.3.10), 30 hops max, 60 byte packets
    • 1 10.1.3.10 (10.1.3.10) 4.963 ms 4.980 ms 5.887 ms

For a connection that is not defined

traceroute 10.20.2.5 
traceroute to 10.20.2.5 (10.20.2.5), 30 hops max, 60 byte packets
1 bthub.home (192.nnn.1.mmm) 3.170 ms 4.742 ms 6.379 ms
2 * * *

So we can see it went to my BT Hub wireless router.

You can also use the ping command.  On linux there is the -R option for display route.

ping -R 10.1.3.10 
PING 10.1.3.10 (10.1.3.10) 56(124) bytes of data.
64 bytes from 10.1.1.2: icmp_seq=1 ttl=64 time=2.54 ms
NOP
    RR: 10.1.1.1
        10.1.1.2
        10.1.1.1

The request went to 10.1.1.1.  10.1.1.2 caught it, and sent the reply back, via 10.1.1.1

I was looking for my VIPA address, 10.1.3.10, and we can see it got to 10.1.1.2.

For the ping to work, there must be a server processing the ping request.  If there are no applications processing the VIPA, the VIPA is not active, so a ping will fail.

A successful ping to a VIPA address means a packet can get to the LPAR, be processed, and the reply sent back. If the ping does not respond it could be that

  1. The VIPA is not active
  2. The VIPA is active and a packet was sent to the LPAR hosting the VIPA, but it could not send a response back due to a set up error.

How to issue change TCPIP configuration on z/OS

You can change the configuration of a TCPIP image using the operator command

V TCPIP,TCPIPn,OBEY,filename

Where

  1. V TCPIP tells z/OS to route this command to TCPIP
  2. TCPIPn is the name of the TCPIP address space to direct the command to, for example V TCPIP,TCPIP2.  If there is only one TCPIP running you can use V TCPIP,,
  3. OBEY this is the TCP command
  4. filename is the parameter passed to the OBEY command – the file containing the commands/configuration to be executed.

How to display information on z/OS

There are three ways of displaying TCPIP information, for example the IP address(es) of the TCPIP image:

  1. The operator command D TCPIP,TCPIP2,NETSTAT,HOME… similar in syntax to the V TCPIP command above
  2. The TSO command NETSTAT HOME TCP TCPIP2
  3. The USS command netstat -h -p tcpip. The commands are similar to, but different from, the Linux commands!

The output is usually similar between the commands.

What is the IP address of my TCPIP image?

From the TSO NETSTAT HOME command

EZZ2350I MVS TCP/IP NETSTAT CS V2R4 TCPIP Name: TCPIP2 17:15:53
EZZ2700I Home address list:
EZZ2701I Address   Link         Flg
EZZ2702I -------   ----         ---
EZZ2703I 10.1.1.3  ETH1         P
EZZ2703I 10.1.2.3  ETHB
EZZ2703I 172.1.1.2 EZASAMEMVS
EZZ2703I 10.1.3.10 VIPL0A01030A I
EZZ2703I 127.0.0.1 LOOPBACK

10.1.1.3 ties up with the information on the desktop, which had IP address 10.1.1.1 for device tap0; and 10.1.2.3 ties up with 10.1.2.1 for device tap1.

For the links

  1. I configured link ETH1 and ETHB.
  2. The VIPL0A01030A link name takes the IP address and converts it to hex, so 10.1.3.10 becomes VIPL 0A 01 03 0A – see the sketch below.
  3. EZASAMEMVS is prefix EZA and “SAME MVS”. This is generated by TCPIP from the DYNAMICXCF configuration.
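A one-line illustration of that conversion – a sketch in C:

#include <stdio.h>

int main(void)
{
    /* TCPIP builds the link name from the four address bytes in hex,
       so 10.1.3.10 becomes VIPL0A01030A */
    printf("VIPL%02X%02X%02X%02X\n", 10, 1, 3, 10);
    return 0;
}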

What routing is there?

The command TSO command NETSTAT ROUTE TCP TCPIP2 or the USS command netstat -r -p tcpip gives

MVS TCP/IP NETSTAT CS V2R4 TCPIP Name: TCPIP2 16:15:43 
Destination  Gateway  Flags Refcnt     Interface 
----------- -------   ----- ------     --------- 
Default      10.1.1.1 UGS   0000000000 ETH1 
10.0.0.0/8   0.0.0.0  US    0000000000 ETH1 
10.1.1.3/32  0.0.0.0  UH    0000000000 ETH1 
10.1.2.0/24  0.0.0.0  US    0000000000 ETHB 
10.1.2.3/32  0.0.0.0  UH    0000000000 ETHB 
127.0.0.1/32 0.0.0.0  UH    0000000000 LOOPBACK 
172.1.1.1/32 0.0.0.0  UHS   0000000000 EZASAMEMVS 
172.1.1.2/32 0.0.0.0  UH    0000000000 EZASAMEMVS 
172.1.1.3/32 0.0.0.0  UHS   0000000000 EZASAMEMVS

This shows that traffic to addresses 10.1.2.0 to 10.1.2.255 (with a netmask of /24, or 255.255.255.0) goes by link (interface) ETHB.

What is happening to my VIPA on  z/OS?

On the OSA connection (think Ethernet connection) from the desktop to my z/OS environment there could be several LPARs using the OSA, each with multiple TCPIP images.

The operator command D TCPIP,TCPIP2,SYSPLEX,VIPADYN, issued on any active TCPIP image on any LPAR, gives a sysplex view of the VIPA configuration

11.54.05 STC09473  EZZ8260I SYSPLEX CS V2R4 387                         C  
VIPA DYNAMIC DISPLAY FROM TCPIP    AT S0W1                                 
IPADDR: 10.1.3.10  LINKNAME: VIPL0A01030A                                  
  ORIGIN: VIPADEFINE                                                       
  TCPNAME  MVSNAME  STATUS RANK ADDRESS MASK    NETWORK PREFIX  DIST       
  -------- -------- ------ ---- --------------- --------------- ----       
  TCPIP    S0W1     ACTIVE      255.255.255.0   10.1.3.0        DIST       
  TCPIP2   S0W1     BACKUP 001                                  DEST       
  TCPIP3   S0W1     ACTIVE                                      DEST       
IPADDR: 10.1.4.10                                                          
  TCPNAME  MVSNAME  STATUS RANK ADDRESS MASK    NETWORK PREFIX  DIST       
  -------- -------- ------ ---- --------------- --------------- ----       
  TCPIP3   S0W1     ACTIVE      255.255.255.0   10.1.4.0                   
  TCPIP2   S0W1     MOVING      255.255.255.0   0.0.0.0                    

IPADDR:10.1.3.10

The VIPA 10.1.3.10 was created using a VIPADEFINE.

We see that TCPIP on S0W1 “owns” the VIPA  10.1.3.10 and is responsible for distributing requests.  This image is DISTributing requests to other TCPIP Images.

The DEST means it is a target for connections ( a DESTination)  and has a server processing requests. BOTH means it is a DESTination and  DISTributing connections, and has a server processing them.

IPADDR:10.1.4.10

We can see that TCPIP3 is processing requests. TCPIP2 is not processing requests; it does not have a network prefix.

How are DVIPA connection requests distributed?

You need to ask the TCP that owns the VIPA. In my case, from the previous section, this is TCPIP.

The TSO command NETSTAT VDPT TCP TCPIP  or the USS command netstat -O -p tcpip gives

MVS TCP/IP NETSTAT CS V2R4 TCPIP Name: TCPIP 16:49:42 
Dynamic VIPA Destination Port Table for TCP/IP stacks: 
Dest IPaddr DPort DestXCF Addr Rdy TotalConn  WLM TSR Flg 
----------- ----- ------------ --- ---------  --- --- --- 
10.1.3.10   08443 172.1.1.2    001 0000000005  01 100 
  DistMethod: Roundrobin 
  TCSR: 100 CER: 100 SEF: 100 
  ActConn:   0000000000 
10.1.3.10   08443 172.1.1.3    000 0000000000  01 100 
  DistMethod: Roundrobin 
  TCSR: 100 CER: 100 SEF: 100 
  ActConn:  0000000000

We have a heading showing the TCPIP image name, and we are looking at Dynamic VIPA Destination Port Table for TCP/IP stacks.

When report was generated the application on 172.1.1.2 (TCPIP2) was active, and the application on TCPIP3 had been stopped.

From

Dest IPaddr DPort DestXCF Addr Rdy TotalConn  WLM TSR Flg 
----------- ----- ------------ --- ---------  --- --- --- 
10.1.3.10   08443 172.1.1.2    001 0000000005  01 100 
ActConn: 0000000000

We can see

  • Dest IPaddr: 10.1.3.10 is our VIPA address
  • DPort :08443 is the destination port
  • DestXCF Addr: 172.1.1.2 is where the request is going – we know this is TCPIP2.  It would be good if it could say SOW1.TCPIP2
  • Rdy: 001 there is one active application listening
  • TotalConn: 0000000005 there have been 5 requests to this application
  • ActConn: 0000000000 there are no active connections to this application

As TotalConn is greater than 0, there have been connections to the application, which is a good sign that the set-up is working.

Because the front end TCPIP is distributing the requests using Roundrobin – each TCPIP should get a connection in turn.

I then started the application on TCPIP3, and started another application on TCPIP2. When I ran a workload, 10 requests went to TCPIP3 and 10 requests went to TCPIP2. On TCPIP2 the requests were evenly distributed between the two servers. It looked like round robin, but I do not know if this was design or chance.

How do I know if I have a backup configuration defined?

I set up TCPIP with a VIPABACKUP configuration.   The operator command d tcpip,tcpip,sysplex,vipadyn  gave me

VIPA DYNAMIC DISPLAY FROM TCPIP AT S0W1 
IPADDR: 10.1.3.10 LINKNAME: VIPL0A01030A 
ORIGIN: VIPADEFINE 
TCPNAME  MVSNAME  STATUS RANK ADDRESS MASK    NETWORK PREFIX  DIST 
-------- -------- ------ ---- --------------- --------------- ---- 
TCPIP    S0W1     ACTIVE      255.255.255.0   10.1.3.0        DIST 
TCPIP2   S0W1     BACKUP 001                                  DEST 
TCPIP3   S0W1     ACTIVE                                      DEST

We can see that TCPIP2 is defined as being the backup.

What connections does this TCPIP have?

You can use the TSO command NETSTAT ALLCONN TCP TCPIP2 or the USS command  netstat -a -p tcpip2  to show what sessions are active.

MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP2          07:19:15  
User Id  Conn     Local Socket           Foreign Socket         State  
-------  ----     ------------           --------------         -----  
MYSERVER 0000003F 10.1.3.10..8443        0.0.0.0..0             Listen 
MYSERVER 0000003C 10.1.3.10..8443        0.0.0.0..0             Listen 
MYSERVER 0000004B 10.1.3.10..8443        0.0.0.0..0             Listen 

This shows there are 3 instances of MYSERVER running using IP address 10.1.3.10 and port 8443.

There will usually be a lot of output.  You can filter the request by

  • tso netstat allconn tcp tcpip2 (ipaddr 10.1.3.10
  • tso netstat allconn tcp tcpip2 (port 8443
  • uss  netstat -a -p tcpip2 -I 10.1.3.10 
  • uss netstat -a -p tcpip2 -P 8443
  • operator D TCPIP,tcpip2,netstat,allconn,ipaddr=10.1.3.10 
  • operator D TCPIP,tcpip2,netstat,allconn,port=8443


What VIPA stuff does this TCPIP have?

USS netstat -v  -p tcpip3 or TSO NETSTAT VIPADYN TCP TCPIP3

MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP3          10:21:04 
Dynamic VIPA: 
  IP Address      AddressMask     Status    Origination     DistStat 
  ----------      -----------     ------    -----------     -------- 
  10.1.3.10       255.255.255.0   Active                    Dest 
    ActTime:      08/30/2020 10:40:10 
  10.1.4.10       255.255.255.0   Active    VIPARange Bind 
    ActTime:      08/30/2020 11:03:05        JobName:        MYSERVER

The 10.1.3.10 VIPA was created using VIPADEFINE. VIPA 10.1.4.10 was created by means of VIPARANGE.

There may be multiple jobs processing the port. MYSERVER is just one of them.


HA Liberty web server – implementing VIPA with distributed connections

Overview of VIPA solutions

You can implement VIPA, where you give your application its own IP address, across multiple TCPIP images.   This solves the problem of certificates not matching the host IP address.

You can have

  • One TCPIP image processing the connection requests. You have multiple TCPIP images – but only one TCPIP image at a time processes the connections.   If the TCPIP image stops, another can take over.
  • Multiple TCPIP stacks can process connection requests. A front end TCPIP image takes the connection requests and distributes them to TCPIP instances where the application is running. You can load balance across multiple TCPIP images using connection distribution techniques such as Round Robin or Hot Standby. This is based on Sysplex Distributor technology.

This blog post discusses the second case.


Sysplex Distributor background

Sysplex Distributor is like having a router inside TCPIP on z/OS; it can route traffic transparently to other TCPIP images in the environment. Sysplex Distributor can distribute connection requests for a VIPA to the TCPIP images where the application is running. This set up is called Distributed VIPA (DVIPA).

It took me about 2 weeks to get a Sysplex DVIPA working – about a week understanding the documentation on VIPA, and the other week trying to understand why it didn’t work – and finding the simple configuration error I had made.

I’ll break it down into simple stages which should help you understand the documentation.

The scenario

The scenario I used went from my laptop to z/OS running under zPDT on the same laptop.  In effect there was an OSA connection between Ubuntu and my z/OS LPAR.  If you do not know what an OSA is, think of it as an Ethernet adapter which can be plugged into multiple z/OS LPARs.

  1. My Ubuntu had an address of 10.1.1.1 over the tunnel connection
  2. I had one LPAR  with three TCPIP images
    1. TCPIP, host address 10.1.1.1 for the primary “front end”
    2. TCPIP2, host address 10.1.1.2 for the backup “front end” and where a server instance was running
    3. TCPIP3, host address 10.1.1.3 with a server instance running.
  3. I used a VIPA address of 10.1.3.10

The steps are

  1. Connect from Ubuntu to the z/OS
  2. Configure the LPAR(s)
  3. Define the VIPA configuration
  4. Define the “routing” to where the server was running.
  5. Getting the server to use the VIPA
  6. Commands to see what is going on (or not as the case may be)

Connect from Ubuntu to the z/OS

I had an existing tunnel connection from Ubuntu to z/OS.

I used the ip route command

sudo ip route add 10.1.3.10 dev tap0

to define the route to 10.1.3.10 via the tunnelling device tap0.  This looks like an OSA connection into z/OS.
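
You can check that the route is in place by asking Linux which route it would use for the VIPA address; this should report that 10.1.3.10 goes via tap0:

ip route get 10.1.3.10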

Configure the LPAR(s)

Sysplex Distributor uses XCF communications between the LPARs and the TCPIP images on the LPARs.

You configure each TCPIP with a statement

IPCONFIG DYNAMICXCF 172.1.1.x

My “frontend” TCP/IP had IPCONFIG DYNAMICXCF 172.1.1.1; the other two TCP/IP images had 172.1.1.2 and 172.1.1.3.

The 172.1.1.x addresses can be any addresses not used by your enterprise.  They are internal to the Sysplex configuration.
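
You can check that the dynamic XCF addresses have been created by displaying the home addresses of each stack, for example

  • tso netstat home tcp tcpip2
  • operator D TCPIP,tcpip2,netstat,home

The 172.1.1.x address should appear in the list of home addresses.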

Define the VIPA configuration

You define the configuration once, in the front end TCPIP.  It is visible from the other TCP/IP images because the information is shared via the DYNAMICXCF.

You define the VIPA in the front end TCP/IP image with a VIPADEFINE netmask address.  I used

VIPADYNAMIC
  VIPADEFINE 255.255.255.0 10.1.3.10
...
ENDVIPADYNAMIC

You can define VIPABACKUP in another TCPIP image, so if the main front end TCP/IP is not available then a backup can take the traffic and distribute it to the other TCP/IP stacks.

When the main front end TCP/IP image is restarted, you can have it take back the routing.

Define the “routing” to where the server was running.

You can define a variety of ways of routing the work

  • A Hot Standby – where the input to the front end TCP/IP image is routed to a single “backend” application’s TCP/IP image.  If this fails, the work is routed to a running backup application.
  • A Round Robin – where requests are routed in turn to each TCPIP with an active application.
  • Routing depending on WLM or other load characteristics.

This routing is done by the VIPADISTRIBUTE statement in the front end TCP/IP image.

The definitions for the front end TCPIP

IPCONFIG SYSPLEXROUTING 
    DYNAMICXCF 172.1.1.1 255.255.255.0 3 

VIPADYNAMIC 
   VIPADEFINE 255.255.255.0 10.1.3.10 

   VIPADISTRIBUTE DEFINE DISTMETHOD ROUNDROBIN 10.1.3.10 PORT 8443 
      DESTIP 
         172.1.1.2 
         172.1.1.3 
ENDVIPADYNAMIC 

This routes connection requests to the TCPIP images with DYNAMICXCF addresses 172.1.1.2 (TCPIP2) and 172.1.1.3 (TCPIP3).
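
Once these statements are active, you can display the dynamic VIPA configuration, including the VIPADISTRIBUTE targets, with for example

  • tso netstat vipadcfg tcp tcpip
  • operator D TCPIP,tcpip,netstat,vipadcfg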

The definitions for TCPIP2

IPCONFIG SYSPLEXROUTING 
DYNAMICXCF 172.1.1.2 255.255.255.0 3 

The definitions for TCPIP3

IPCONFIG SYSPLEXROUTING 
DYNAMICXCF 172.1.1.3 255.255.255.0 3

Use of VIPABACKUP

If the backup front end TCPIP image is used, it can have its own VIPADISTRIBUTE statement, or just use the statement shared from the main front end TCP/IP image.  It is better for the backup to have its own VIPADISTRIBUTE statements, for the case when the backup TCPIP is started before the front end TCPIP; the backup then needs the VIPADISTRIBUTE statements itself.  (These statements can be put into a PDS, and included in both the primary and backup environments using the INCLUDE dataset(member) statement.)

To define TCPIP2 as a backup I used

VIPADYNAMIC 
    VIPABACKUP MOVEABLE IMMEDIATE 255.255.255.0 10.1.3.10 
    VIPADISTRIBUTE DEFINE DISTMETHOD ROUNDROBIN 10.1.3.10 PORT 8443 
        DESTIP 
        172.1.1.2 
        172.1.1.3 
ENDVIPADYNAMIC 
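
To test takeover and takeback without shutting down a whole TCPIP image, I believe you can deactivate the DVIPA on the owning stack, and later reactivate it, with operator commands such as

V TCPIP,tcpip,SYSPLEX,DEACTIVATE,DVIPA=10.1.3.10
V TCPIP,tcpip,SYSPLEX,REACTIVATE,DVIPA=10.1.3.10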

Getting the server to use the VIPA

The TCP/IP images hosting the applications just have the IPCONFIG DYNAMICXCF aa.bb.cc.dd statement.  They do not have any VIPADYNAMIC … ENDVIPADYNAMIC statements unless they are the main or backup front end TCP/IP images.

The application can use the VIPA address directly, for example by creating its SSL server socket listener with the VIPA address.  You can also configure TCP/IP so that when a port is used, it binds to a particular IP address, for example

PORT 8443 TCP * BIND 10.1.3.10

So an application using port 8443 to listen will get IP address 10.1.3.10 – which in my case is a VIPA address.

You can use

PORT
9443 TCP * SHAREPORT BIND 10.1.3.7

to allow the port to be shared by many applications on a TCPIP instance.
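
For Liberty itself, an alternative to the PORT … BIND statement is to bind the HTTP endpoint to the VIPA address in server.xml.  A minimal sketch, assuming the default endpoint id:

<httpEndpoint id="defaultHttpEndpoint"
              host="10.1.3.10"
              httpsPort="8443" />

With host specified, Liberty listens only on that address, rather than on all interfaces.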

How are the connections distributed?

The VIPADISTRIBUTE statement has many routing options.  I used Hot Standby and Round Robin.

With RoundRobin, I had

  • the front end TCPIP
  • TCPIP2 with two Liberty servers
  • TCPIP3 with  one Liberty server

I ran some workload and found that the server on TCPIP3 had half the requests, and each of the two servers on TCPIP2 had a quarter of the overall requests.  This shows that the distribution is done at the TCPIP image level – not by the number of servers.
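
You can check where individual connections were routed by displaying the VIPA Connection Routing Table on the front end TCPIP image, for example

  • tso netstat vcrt tcp tcpip
  • operator D TCPIP,tcpip,netstat,vcrt

Each entry shows a connection and the dynamic XCF address of the target stack it was routed to.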

HA Liberty web server – implementing VIPA using the simpler VIPARANGE technology

Overview of VIPA solutions

You can implement VIPA, where you give your application its own IP address, across multiple TCPIP images.   This solves the problem of certificates not matching the host IP address.

  • One TCPIP image processes the connection requests. You have multiple TCPIP images – but only one TCPIP image at a time processes the connections.   If the TCPIP image stops, another can take over.
  • Multiple TCPIP stacks can process connection requests. This uses Sysplex Distributor;  a front end TCPIP image takes the connection requests and distributes them to TCPIP instances where the application is running.   You can use load balancing such as Round Robin, or Hot Standby.

This blog post discusses the first case.

For background information, see the Sysplex Distributor background section above.

Using VIPARANGE configuration

The technique uses the VIPARANGE configuration statement.

The concept is that many LPARs can be attached to an OSA adapter; one, and only one, TCPIP stack (I do not know which of the available images) takes the connection requests and passes them to the application on that TCPIP image.

You allocate a range of TCPIP addresses for your applications, with the same network prefix, for example 9.4.6.x.  Allocate a host id to a Liberty instance, for example 9.4.6.7.  The Liberty instance keeps this address wherever it runs.  You configure your routers so that 9.4.6.* addresses are routed to the OSA adapter.
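
In my zPDT environment, the “configure your routers” step was just another route on Ubuntu; a sketch, assuming the same tap0 tunnel as in the Sysplex Distributor scenario above:

sudo ip route add 9.4.6.0/24 dev tap0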

For each TCPIP image where you want to run Liberty, add to the TCPIP startup configuration (or to an OBEY file – see the command after the statements below)

VIPADYNAMIC 
   VIPARANGE DEFINE 255.255.255.0 9.4.6.7 
ENDVIPADYNAMIC
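
If you use an OBEY file, you can activate these statements without restarting the TCPIP image, with a command something like the following (the data set name is just an example)

V TCPIP,TCPIP2,OBEYFILE,USER.TCPPARMS(VIPAR)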

The 255.255.255.0 is the  subnet mask.  If your organisation uses a different subnet mask, it affects the IP addresses used.

This statement defines a range of IP addresses on this LPAR: 9.4.6.1 to 9.4.6.254.

If an application connects to TCPIP, and the bind specifies a value in this range (9.4.6.1 to 9.4.6.254) then it is considered a VIPA address.  If the application used 9.4.6.7 this would count as a VIPA address.

When your application (Liberty) connects to TCPIP and binds to an address in the VIPARANGE, the TCPIP instance will create a dynamic VIPA.  When I started my server application, I got a z/OS console message

EZD1205I DYNAMIC VIPA 9.4.6.7 WAS CREATED USING BIND BY jobname ON TCPIP2.

When I shut down the server I got

EZD1298I DYNAMIC VIPA  9.4.6.7 DELETED FROM TCPIP2
EZD1207I DYNAMIC VIPA 9.4.6.7 WAS DELETED USING CLOSE API BY jobname ON TCPIP2

If the VIPA address is active on more than one TCPIP image, just one image will get all of the requests.  If you stop this image, another TCPIP image can take over.

If you have a different server using the same IP address but a different port number, the same LPAR will process those requests, because the routing is based on the IP address.

With VIPARANGE you do not get connections distributed across more than one TCPIP image.

In your browser use the address 9.4.6.7:9443.  The network router routes this to the OSA; a TCPIP image captures it and passes it to the application (Liberty).  As part of the TLS handshake Liberty sends down its certificate, which has a SAN of 9.4.6.7; this matches the IP address, so it works.

On another day, when a different z/OS image is capturing the VIPA connections, the address is 9.4.6.7 as before, so it still matches the SAN in the certificate.

Testing it

In a test I used ping -R 9.4.6.7 to the VIPA address (the -R option records the route taken).  This reported that the request was handled by the TCPIP stack with address 10.1.1.2.  When I shut this TCPIP image down, ping reported the request was sent to 10.1.1.3.  It did this with no manual intervention.
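
To see which TCPIP image currently owns the VIPA across the sysplex, you can use the operator command

D TCPIP,TCPIP2,SYSPLEX,VIPADYN

which shows the status of the dynamic VIPA (for example ACTIVE or BACKUP) on each stack.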