Trace PAGENT and AT-TLS

Many components of TCPIP write information to syslogd. This is a process that captures the data sent to it over a UDP socket, and writes it to files in the Unix file system. If syslogd is not active, messages may be written to the job log. When I was trying to set up AT-TLS, I had tens of messages on the job log each time a client tried to use AT-TLS.

The IBM documentation is not very clear. It tells you how to turn on debug, trace etc, but does not clearly explain the difference between them, or when each is used.

It looks like PAGENT's job is to take a configuration file, parse it, and pass the data to TCPIP.

If you are using AT-TLS to set up TLS channels, the trace data comes from the TCPIP address space.

Modify the PAGENT address space.

You can pass commands to the PAGENT address space.

Configuration processing.

You can control how much information is logged when parsing configuration statements. The value 127 covers most levels of information (including warnings).

F PAGENT,LOGLEVEL,LEVEL=127
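
The level looks like a sum of flags, one per message type. Below is a minimal Python sketch of how 127 breaks down. The flag values in the table are an assumption taken from my reading of the Policy Agent LogLevel documentation, not from anything shown in this post, so check them against your release.

```python
# Assumed PAGENT LogLevel flag values. These are NOT taken from this post;
# verify them against the IBM documentation for your release.
LOG_LEVEL_BITS = {
    1: "SYSERR",
    2: "OBJERR",
    4: "PROTERR",
    8: "WARNING",
    16: "EVENT",
    32: "ACTION",
    64: "INFO",
    128: "ACNTING",
    256: "TRACE",
}


def decode_log_level(level):
    """Return the names of the flags that are set in a LogLevel value."""
    return [name for bit, name in LOG_LEVEL_BITS.items() if level & bit]


print(decode_log_level(127))
```

If the assumed values are right, 127 selects everything from SYSERR up to INFO, which matches the SYSERR, OBJERR and EVENT records I see in the log.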

You can use

F PAGENT,TRACE,LEVEL=..
F PAGENT,DEBUG,LEVEL=…

But these do not seem to control the level of trace produced.

Trace PAGENT startup and parse of the configuration

To trace the PAGENT startup, and display information on the configuration file as it is processed, change the started task JCL to include the -d option.

//PAGENT EXEC PGM=PAGENT,REGION=0K,TIME=NOLIMIT,
// PARM='ENVAR("_CEE_ENVFILE_S=DD:STDENV")/ -d 4'

By default the output trace goes to /tmp/pagent.log. It has content like

05/29 17:17:54 EVENT :005: pzos_install_A_PolicyRule: Finished installing policy rule: 'REMOTE-TO-CSQ1'

Trace PAGENT use of TLS

My PAGENT JCL is

//CPAGENT  PROC 
//  SET EN='ENVAR("_CEE_ENVFILE_S=DD:STDENV")' 
//PAGENT   EXEC PGM=PAGENT,REGION=0K,TIME=NOLIMIT, 
//       PARM='&EN/                                      -d 4' 
//STDENV   DD DISP=SHR,DSN=USER.Z24C.TCPPARMS(PAGENTEN) 
//SYSPRINT DD SYSOUT=H 
//SYSERR   DD SYSOUT=H 
//SYSOUT   DD SYSOUT=H 
//* 
//CEEDUMP  DD SYSOUT=*,DCB=(RECFM=FB,LRECL=132,BLKSIZE=132)

With the environment file USER.Z24C.TCPPARMS(PAGENTEN) having

_CEE_ENVFILE_COMMENT=# 
PAGENT_CONFIG_FILE=//'USER.Z24C.TCPPARMS(PAGENTCF)' 
LIBPATH=/usr/lib 
GSK_TRACE=0x00
GSK_TRACE_FILE=/var/log/GSK

You can collect the GSK calls made by PAGENT at startup by using the environment variables

GSK_TRACE=0xFF
GSK_TRACE_FILE=/var/log/GSK

Note: This turns it on for all requests! I could not find how to do selective tracing.

You have to format the trace file using

gsktrace /var/log/GSK /var/log/GSK.txt

This has about 40 lines with information like

05/28/2022-17:53:30 Thd-5 INFO crypto_init(): SHA-1 crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): SHA-224 crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): SHA-256 crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): SHA-384 crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): SHA-512 crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): DES crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): DES3 crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): AES 128-bit crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): AES 256-bit crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): AES-GCM crypto assist is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): Cryptographic accelerator is not available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): Cryptographic coprocessor is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): Public key hardware support is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): Max RSA key sizes in hardware – signature 4096, encryption 4096,
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): Maximum RSA token size 3500
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): ECC clear key support is available
05/28/2022-17:53:30 Thd-5 INFO crypto_init(): ECC secure key support is available. Maximum key size 521

Remember this is the PAGENT invoking GSK – but PAGENT does not do any TLS work – this is done by TCPIP.

Trace an AT-TLS connection.

You need to enable trace in the AT-TLS configuration, for example

TTLSEnvironmentAction CSQ1-INBOUND-ENVIRONMENT-ACTION
{
HandshakeRole SERVER
TTLSKeyringParmsRef CSQ1-KEYRING
TTLSCipherParmsRef CSQ1-CIPHERPARM
TTLSEnvironmentAdvancedParmsRef CSQ1-ENVIRONMENT-ADVANCED
Trace 255
}
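
The Trace value is a sum of flags, so 255 turns everything on. Here is a small Python sketch to decode a value. The flag table is an assumption based on my reading of the AT-TLS documentation; within this post only "Flow" and "Data" show up (in the EZD1284I and EZD1285I messages), so treat the rest as unverified.

```python
# Assumed AT-TLS Trace flag values; check them against the IBM documentation.
TRACE_BITS = {
    1: "Error (to the TCPIP job log)",
    2: "Error (to syslogd)",
    4: "Info",
    8: "Event",
    16: "Flow",
    32: "Data",
}


def decode_trace(value):
    """Return the descriptions of the trace flags set in an AT-TLS Trace value."""
    return [name for bit, name in TRACE_BITS.items() if value & bit]


for name in decode_trace(255):
    print(name)
```

Decoding a smaller value such as 2 shows a single "Error (to syslogd)" entry, which is the value pasearch reports for a group action where I did not specify Trace.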

If the syslogd daemon is not configured, the output goes to the TCPIP job log.

If the syslogd daemon is configured, you need to have a syslogd configuration with

*.TCPIP.*.* /var/log/TCPIP
*.*.*.* /var/log/all

Where TCPIP is the TCPIP address space name, and *.*.*.* is a catch-all. It took me about a day to realize that my trace was being thrown away because I had neither the TCPIP entry nor the catch-all.

The trace file has data like

May 29 09:25:30 S0W1 TTLS[67174439]: 09:25:30 TCPIP EZD1284I TTLS Flow GRPID: 00000021 ENVID: 00000009 CONNID: 00000053 RC: 0 Set GSK_USER_DATA(200) – 000000007F280610

May 29 09:25:30 S0W1 TTLS[67174439]: 09:25:30 TCPIP EZD1285I TTLS Data CONNID: 00000053 RECV CIPHER 160303007B

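This shows a GSK call was made to set GSK_USER_DATA, which completed with return code 0, and that the connection RECeiVed data. CIPHER means the data as it arrived from the network. The bytes 160303007B look like a TLS record header: 16 is a handshake record, 0303 is the TLS 1.2 record version, and 007B is the record length (123 bytes).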
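
The hex string after RECV CIPHER can be decoded by hand, but a few lines of Python make it quicker. This is a minimal sketch which assumes the string starts with the standard five byte TLS record header (content type, two byte version, two byte length); the function name is my own.

```python
# Content types and record versions from the TLS record layer definition.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
RECORD_VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
}


def decode_record_header(hex_text):
    """Decode the first five bytes of a TLS record given as a hex string."""
    data = bytes.fromhex(hex_text)
    if len(data) < 5:
        raise ValueError("need at least 5 bytes of record header")
    content_type = data[0]
    version = int.from_bytes(data[1:3], "big")
    length = int.from_bytes(data[3:5], "big")
    return {
        "type": CONTENT_TYPES.get(content_type, hex(content_type)),
        "version": RECORD_VERSIONS.get(version, hex(version)),
        "length": length,
    }


print(decode_record_header("160303007B"))
# {'type': 'handshake', 'version': 'TLS 1.2', 'length': 123}
```

Note that the version in the record header is not always the negotiated protocol level, so use it as a hint rather than as proof of what was agreed.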

You need to configure the syslogd procedure.

See if SYSLOGD is running, if not, try to start it. If it does not exist…

Copy /usr/lpp/tcpip/samples/syslog.conf to its default configuration file /etc/syslog.conf, or another file.
Copy TCPIP.SEZAINST(SYSLOGD) to your proclib concatenation.
The program uses environment variables defined in STDENV to control operations. The default configuration file location is /etc/syslog.conf

You can configure syslog.conf for example

*.TCPIP.*.* /var/log/%Y/%m/%d/TCPIP
*.SYSLOGD.*.* /var/log/%Y/%m/%d/syslogd
*.err /var/log/%Y/%m/%d/errors

This says all messages for SYSLOGD go to a file like /var/log/2022/05/14/syslogd, and error messages go to /var/log/2022/05/14/errors

This means you get a file of messages for each day. For me, I just used /var/log/syslogd.log and /var/log/errors.log, and deleted them periodically.
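
The %Y, %m and %d in the file name are date substitutions. Python's strftime happens to use the same codes, so a quick sketch can show the file name you would get for a given day. This only mimics the substitution; it is not how syslogd itself is implemented.

```python
from datetime import date


def expand_log_path(template, day):
    """Fill in %Y, %m and %d in a log file name for the given date."""
    return day.strftime(template)


print(expand_log_path("/var/log/%Y/%m/%d/TCPIP", date(2022, 5, 14)))
# /var/log/2022/05/14/TCPIP
```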

I also added, at the end of the file, the catch-all

*.*.*.* /var/log/all.log

ISPF interface

There is an ISPF syslog browser tool which displays information about the logs, and helps you browse the logs of interest. The documentation for this is not very good.

I got this to work by experimentation. I created an exec like MYSYSLOG

/* Rexx */ 
address ispexec 
/* Make the syslogd browser messages and panels available */ 
"LIBDEF ISPMLIB DATASET ID('TCPIP.SEZAMENU') STACK" 
"LIBDEF ISPPLIB DATASET ID('TCPIP.SEZAPENU') STACK" 
/* The trailing comma continues the command on the next line */ 
address tso "ALTLIB ACTIVATE APPLICATION(CLIST)", 
            "DATASET('TCPIP.SEZAEXEC')" 
"SELECT CMD(EZASYRGO) NEWPOOL PASSLIB NEWAPPL(EZAS)" 
/* Clean up */ 
address tso "ALTLIB DEACTIVATE APPLICATION(CLIST)" 
"LIBDEF ISPPLIB" 
"LIBDEF ISPMLIB"

You can execute this from ISPF option 6 or have this built into the ISPF panels.

Originally this exec was called syslogd; when I used it, I got

SYSTEM COMPLETION CODE=4C5 REASON CODE=77A53217

Where 4C5 is the TCPIP abend code, and reason 3217 means the program has the wrong authorization code (APF related). This is because there is a command syslogd which was executed in preference to my exec. When I renamed the exec to MYSYSLOG, it used the exec and it worked fine!

The first panel is

EZASYP01 ----------------- z/OS CS Syslogd Browser ---------------- Row 1 of 1
Command ===>                                                  Scroll ===> PAGE
                                                                               
Enter syslogd browser options                                                  
  Recall migrated data sets ==> NO     (Yes/No) Recall data sets or not        
  Maximum hits to display   ==> 200    (1-99999) Search results to display     
  Maximum file archives     ==> 30     (0-400) Days to look for file archives  
  Display start date/time   ==> YES    (Yes/No) Retrieve start date/time       
  Display active files only ==> NO     (Yes/No) Active files only, no archives 
  DSN Prefix override value ==>                                                
                                                                               
Enter file or data set name of syslogd configuration, or select one from below:
                                                                               
  File/DS Name ==> /etc/syslog.conf
                                                                               
Press ENTER to continue, or press END PF key to exit without a selection       
                                                                               
Line commands: S Select, R Remove from list, B Browse content, E Edit content  
                                                                               
Cmd Recently used syslogd configuration file or data set name                  
--- -------------------------------------------------------------------------- 
    /etc/syslog.conf                                                          
******************************* Bottom of data ********************************

Pressing Enter gave me another panel with

EZASYP00 ----------------- z/OS CS Syslogd Browser ---------------- Row 1 of 6
OPTION ===>                                                   Scroll ===> PAGE
                                                                               
Select one of the following, or press END PF key to exit the syslogd browser   
                                                                               
  1 Change current syslogd configuration file and/or options                   
  2 Guide me to a possible syslogd destination                                 
  3 Clear guide-me hits (indicated by ==> in the Cmd column)                   
  4 Search across all active syslogd files                                     
                                                                               
Current config file ==> /etc/syslog.conf                                      
                                                                               
Line commands: B Browse, A List archives, S Search active file and archives,   
               SF Search active file, SA Search archives, I File/DSN info      
                                                                    Archive    
Cmd Rule/Active UNIX file name                    Start Time        Type Avail.
--- --------------------------------------------- ----------------- ---- ------
    *.SYSLOGD*.*.*                                28 May 2022 13:31 None 0     
    /var/log/syslogd                                                          
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    *.INETD*.*.*                                  Empty       N/A   None 0     
    /var/log/inetd                                                            
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    auth.*                                        Empty       N/A   None 0     
    /var/log/auth

I could then browse the error log for SYSLOGD.

You can search for userid, strings etc, and give date ranges.

However, for my small, one-person usage, I found it was easier to use Unix services and use the command

oedit /var/syslogd.log

to edit the file.

Capturing the right data

In the config file you can specify options like

*.TCPIP.*.* /var/log/TCPIP
*.*.*.* /var/log/all

The entries are userid.jobname.facility.priority.

AT-TLS requests configured through PAGENT are reported via TCPIP.

I could not find how to filter the TCPIP data so the AT-TLS data went to one file, and other TCPIP data went to another file. For TCPIP it looks like the “facility” is either “daemon” or “auth”, which you can specify in the TTLS configuration. So not very useful.
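
One workaround is to split the file after the event. Below is a minimal Python sketch, assuming the AT-TLS records carry the TTLS[...] tag seen in the trace records earlier. The second sample line is made up for illustration.

```python
import re

# The program tag that AT-TLS trace records carry in the samples in this post.
TTLS_TAG = re.compile(r"\sTTLS\[\d+\]:")


def split_ttls(lines):
    """Separate AT-TLS trace records from everything else."""
    ttls, other = [], []
    for line in lines:
        (ttls if TTLS_TAG.search(line) else other).append(line)
    return ttls, other


sample = [
    "May 29 09:25:30 S0W1 TTLS[67174439]: 09:25:30 TCPIP EZD1285I TTLS Data "
    "CONNID: 00000053 RECV CIPHER 160303007B",
    # Hypothetical non AT-TLS record, invented for this example.
    "May 29 09:25:31 S0W1 OTHER[12345]: some other TCPIP message",
]
ttls, other = split_ttls(sample)
print(len(ttls), len(other))
```

To use it on a real file, pass the open file (for example /var/log/TCPIP) instead of the sample list, and write the two lists to separate files.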

Getting AT-TLS and PAGENT to work on z/OS – start here.

With traditional TLS applications, the application code has to issue the requests to use TLS (for example, to specify the keystore and which cipher specs to use), and it does the encryption and decryption of the data. The application then issues TCP send and receive requests as usual.

With AT-TLS, the TLS work is moved out of the application and into the TCPIP subsystem. The application just does the normal sends and receives, and TCPIP does the work of establishing the session and handling the encryption. There are rules and policies to define how the session should be established. It uses the PAGENT address space (Policy Agent) to manage the configuration.

Is it easier than having MQ or WAS Liberty do the TLS stuff? – I don’t think so. When it works it is fine. Getting it working is a challenge, because the trace and diagnostics are poor.

What is PAGENT?

Having used PAGENT to configure AT-TLS with TCPIP, I see PAGENT is a program which reads configuration information from a file, and gives the configuration to TCPIP. TCPIP then does the work.

General

It feels that the PAGENT setup and configuration was not designed with the z/OS environment in mind. It “breaks” so many things.

You can have only one PAGENT running per LPAR, even with a different name. This means you cannot have a “test” and a production PAGENT in the same LPAR.
PAGENT can be configured to have information on:
1. Common intrusion detection services (IDS).
2. Common IP filtering, and manual and dynamic virtual private network (VPN) tunnels (IPSec).
3. Common routing. (Policy-based routing enables the TCP/IP stack to make routing decisions that take into account criteria other than just the destination IP address. The additional criteria can include job name, source port, destination port, protocol type (TCP or UDP), source IP address, NetAccess security zone, and multilevel secure environment security label.)
4. AT-TLS common definitions.
5. AT-TLS for TCPIP image level, which can have sections on
  1. IDS
  2. IPSec
  3. QoS
  4. Routing
  5. AT-TLS.
As there is only one active PAGENT allowed per LPAR, you have to make your configuration changes to the production PAGENT, refresh it, and fix any configuration errors. The documentation says “make a change to production, if it doesn’t work back out the changes”!
There is one initial configuration file per PAGENT, which can “include” other files. You cannot have a concatenated list of files.
You cannot validate definitions before making them active. The configuration is processed only when the referenced TCPIP stack is active.
Error messages do not have message numbers, so you cannot look the messages up.
It lacks good diagnostics. For example
1. I got the error message “Resource temporarily unavailable” when it could not find the security profile “EZB.INITSTACK.*.TCPIP2” on my system. The PAGENT code checks to see if the profile exists and, if not, it dies quietly. It does not actually use the security profile, which would cause RACF to produce a message saying the profile is missing.
2. I deliberately misconfigured a file to use a file that does not exist. It just reported …processing_Stmt_TTLSConfig: processing: ' TTLSConfig //'USER.Z24C.TCPPARMS(BLAHBLAH)' . It should report file not found. Some missing files get “Cannot get FILE handle for information.”

My set up

I could not find any good guidance on setting up PAGENT and AT-TLS, so I’ve documented what I did. It may not be correct…

It took about a day to understand the AT-TLS setup, as I was a typical user making typos etc, which slowed me down.

Errors

I naively assumed errors would be reported in //SYSPRINT. On my system they were in /tmp/pagent.log. This file location can be configured with an environment variable (PAGENT_LOG_FILE).

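The output can be verbose, so I use oedit and the ISPF find command

f err 15 25

to find the errors. This looks for “err” in columns 15 to 25, which is where the message type is. The message types to look for are SYSERR and OBJERR.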
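
The same search can be done with a short script. Below is a minimal Python sketch, assuming every record has the "date time TYPE :nnn: text" layout of the pagent.log records shown in this post. The first sample record is shortened.

```python
import re

# date time TYPE :thread: text, as in the /tmp/pagent.log records in this post.
RECORD = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d) (\w+)\s*:(\d+): (.*)$")


def find_errors(lines):
    """Return (timestamp, type, text) for records whose type ends in ERR."""
    errors = []
    for line in lines:
        match = RECORD.match(line.rstrip("\n"))
        if match and match.group(2).endswith("ERR"):
            errors.append((match.group(1), match.group(2), match.group(4)))
    return errors


sample = [
    "05/30 07:21:12 EVENT :005: pinit_fetch_TTLS_policy_profile: Processing Image TTLS config file",
    "05/30 07:21:12 OBJERR :005: process_TTLS_attribute_table: "
    "Unknown attribute 'ZocalAddr' for TTLSRule",
]
for timestamp, kind, text in find_errors(sample):
    print(timestamp, kind, text)
```

To run it against the real log, pass the open file, for example find_errors(open("/tmp/pagent.log")).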

When errors occur, you do not get file and line number of the error. You have to hunt around. Invalid statements are often just ignored.

With a configuration error the PAGENT job gave me a message on syslog

EZZ8438I PAGENT POLICY DEFINITIONS CONTAIN ERRORS FOR TCPIP : TTLS

In the /tmp/pagent.log file I had

05/30 07:21:12 EVENT :005: pinit_fetch_TTLS_policy_profile: Processing Image TTLS config file: '//'USER.Z24C.TCPPARMS(TTLS)'' for image 'TCPIP'

05/30 07:21:12 OBJERR :005: process_TTLS_attribute_table: Unknown attribute 'ZocalAddr' for TTLSRule

My common mistakes were

Spelling errors, for example TLSConfig instead of TTLSConfig. (I commented, then uncommented a line and lost the initial T.)
Incorrect data set names, either the data set or the member.
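
As PAGENT often ignores a misspelt statement, a crude check before refreshing can save time. Below is a minimal Python sketch which compares the first word of each top level line with the statement names used in this post and suggests the closest one. The list of names is deliberately incomplete, and treating names as case insensitive and # as the comment marker are my assumptions.

```python
import difflib

# Statement names that appear in this post; a real list would be much longer.
KNOWN = [
    "CommonTTLSConfig", "TcpImage", "TTLSConfig", "TTLSRule",
    "TTLSGroupAction", "TTLSEnvironmentAction", "PortRange",
]


def check_statements(lines):
    """Return (line_number, word, suggestion) for unknown top level statements."""
    known = {name.lower(): name for name in KNOWN}
    problems = []
    depth = 0
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if not words or words[0].startswith("#"):  # assumed comment marker
            continue
        if words[0] == "{":
            depth += 1
            continue
        if words[0] == "}":
            depth -= 1
            continue
        if depth > 0:
            continue  # attributes inside { } are not checked in this sketch
        word = words[0]
        if word.lower() in known:
            continue
        close = difflib.get_close_matches(word.lower(), list(known), n=1, cutoff=0.8)
        problems.append((number, word, known[close[0]] if close else None))
    return problems


sample = [
    "CommonTTLSConfig //'USER.Z24C.TCPPARMS(TTLSCOM)'",
    "TLSConfig //'USER.Z24C.TCPPARMS(TTLS2)' FLUSH PURGE",
]
print(check_statements(sample))
# [(2, 'TLSConfig', 'TTLSConfig')]
```

It does not check data set names or the attributes inside a definition, so it only catches the first of my two common mistakes.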

Configuration concepts

In the PAGENT configuration file, the AT-TLS specific statements are like

CommonTTLSConfig //'USER.Z24C.TCPPARMS(TTLSCOM)'
TcpImage TCPIP //'USER.Z24C.TCPPARMS(PAGENT)'
TcpImage TCPIP2 //'USER.Z24C.TCPPARMS(PAGENTT2)'

This defines the common AT-TLS definitions in //'USER.Z24C.TCPPARMS(TTLSCOM)', and each TCPIP image has its own file.

The TCPIP specific file has

TTLSConfig //'USER.Z24C.TCPPARMS(TTLS2)' FLUSH PURGE

This says the TTLS definitions are in the member TTLS2.

You can have the entry without a file or dataset name.

TTLSConfig FLUSH PURGE

This says use the definition in the CommonTTLSConfig.

You need a TTLSConfig statement to get AT-TLS definitions configured in the LPAR.

How to update definitions

So that I did not break “production”, I created a second TCPIP stack (TCPIP2), and created a configuration within PAGENT for the TCPIP2 stack. (This seems a lot of work just to validate some definitions. I raised an RFE on this, but it was declined.)

When I was happy with the definitions, I merged them with the common/production ones.

When I defined a second TCPIP (TCPIP2), the configuration statements were only parsed when TCPIP2 was started, and so PAGENT produced the error messages once TCPIP2 was active.

PAGENT has started – what next?

PAGENT operator commands

You can “modify” the PAGENT address space

For example

f pagent,loglevel,level=n
f pagent,trace,level=m
f pagent,debug,level=d
f pagent,query
f pagent,update

What is my configuration?

Once you have configured PAGENT you can use the Unix command

pasearch -c 1>a

to give output like

TCP/IP pasearch CS V2R4 Image Name: TCPIP1
Date: 05/23/2022 Time: 17:34:44
PAPI Version: 14 DLL Version: 14
TTLS Policy Object:
ConfigLocation: Local LDAPServer: False
CommonFileName: //'USER.Z24C.TCPPARMS(TTLSCOM)'
ImageFileName: 


TCP/IP pasearch CS V2R4 Image Name: TCPIP2
Date: 05/23/2022 Time: 17:34:44
PAPI Version: 14 DLL Version: 14
TTLS Policy Object:
ConfigLocation: Local LDAPServer: False
CommonFileName: //'USER.Z24C.TCPPARMS(TTLSCOM)'
ImageFileName: //'USER.Z24C.TCPPARMS(TTLS2)'

The command

pasearch -f CPJES2IN > a

gave the output for just the TTLSRule CPJES2IN.

The command

pasearch -p TCPIP2 1>a

gave the configuration for just the TCPIP stack TCPIP2, including

...
policyRule:             TLSCOM 
  Rule Type:            TTLS 
...
policyRule:             TLSCP3 
  Rule Type:            TTLS 
...
policyRule:             TLSCP4 
  Rule Type:            TTLS 
...

You get the definitions – but you do not know where they came from. I happen to know that TLSCOM comes from the common definition.
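
With many rules the pasearch output gets long, so I find it useful to pull out just the rule names. Below is a minimal Python sketch, assuming the output has been saved to a file and has the "policyRule:" and "Rule Type:" lines shown above. The function name is my own.

```python
import re

RULE = re.compile(r"^policyRule:\s+(\S+)")
RULE_TYPE = re.compile(r"^\s+Rule Type:\s+(\S+)")


def list_rules(lines):
    """Return (rule name, rule type) pairs from saved pasearch output."""
    rules = []
    current = None
    for line in lines:
        match = RULE.match(line)
        if match:
            current = match.group(1)
            continue
        match = RULE_TYPE.match(line)
        if match and current:
            rules.append((current, match.group(1)))
            current = None
    return rules


sample = [
    "policyRule:             TLSCOM ",
    "  Rule Type:            TTLS ",
    "policyRule:             TLSCP3 ",
    "  Rule Type:            TTLS ",
]
print(list_rules(sample))
# [('TLSCOM', 'TTLS'), ('TLSCP3', 'TTLS')]
```

Running it on the output for each stack, and comparing the lists, gives a rough idea of which rules come from the common file.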

A definition can be in both Common and TCPIP Image files.

Instead of relying on PAGENT to report configuration errors I used the Unix command pasearch to display the configuration.

Display the configuration for a TCPIP image

Use the Unix command pasearch to display the configuration.

pasearch -p TCPIP2 >a

Display the object types configured to PAGENT

pasearch -c 1>a

TCP/IP pasearch CS V2R4 Image Name: TCPIP
Qos Policy Object:…
Ids Policy Object:…
IPSec Policy Object:…
IpFilter Policy Object:…
KeyExchange Policy Object:…
LocalDynVpn Policy Object:…
Routing Policy Object:…
TTLS Policy Object:…

TCP/IP pasearch CS V2R4 Image Name: TCPIP2
…

For example

TTLS Policy Object:
ConfigLocation: Local LDAPServer: False
CommonFileName: //'ADCD.Z24C.TCPPARMS(TLSPOLY1)'
ImageFileName: //'ADCD.Z24C.TCPPARMS(TLSPOLY1)'
ApplyFlush: True PolicyFlush: True
ApplyPurge: True PurgePolicies: True
AtomicParse: True DeleteOnNoflush: False
DummyOnEmptyPolicy: True ModifyOnIDChange: False
Configured: True UpdateInterval: 1800
TTLS Enabled: True
InstanceId: 1653375346
LastPolicyChanged: Tue May 24 07:55:46 2022

Overall

PAGENT feels like it is not of the standard that I would expect z/OS products to have. For example, you cannot validate changes before making them live, and the changes are only validated when the TCPIP stack is active.

This means you are making unvalidated changes to your production system!

Configuring PAGENT for AT-TLS.

I covered the initial setup of PAGENT in “Getting AT-TLS and PAGENT to work on z/OS – start here”.

What does TLS need?

When setting up TLS you need to make decisions. Once you decide on the classification, you need to decide which attributes are to be used, for example

Is TLS to be used or not?
What levels of SSL and TLS will be supported?
Which keyring is to be used on the z/OS end?
Does the server need the client to authenticate and send its certificate?
Should there be any constraints on the TLS parameters, such as Cipher Spec, key size etc?
The preferred order of cipher specs to be used?
Any GSK specific parameters?
Should parameters be retrieved from LDAP?
Should OCSP be used to validate a certificate?

You can configure PAGENT to map sessions to TLS definitions, by giving rules and configuration data.

You need to create rules to match between the users, and the TLS configuration they get.

You can create rules based on

Input port numbers
Input IP addresses
Output port numbers
Output IP addresses
Jobnames (on z/OS)
Userids (on z/OS)

The starting point for the configuration is a TTLSRule entry.

Some simple rules and associated definitions

TTLSRule TLS1414
{
   LocalPortRange 1414
   TTLSGroupActionRef GrpActOn2
   TTLSEnvironmentActionRef        TNCP3-GrpEnvAct  
}
TTLSRule TLSGRPA
{
   LocalPortRangeRef  MYPORTS
   TTLSGroupActionRef GrpActOn2
   TTLSEnvironmentActionRef        TNCP3-GrpEnvAct  
}
TTLSGroupAction                   GrpActOn2             
{                                                       
  TTLSEnabled                     On
}                                                       
PortRange MYPORTS
{
  Port 2141 2151
}
TTLSEnvironmentAction             TNCP3-GrpEnvAct
{
  TTLSKeyringParms...
  HandshakeRole...such as ServerWithClientAuth
  TTLSCipherParms...
  Trace...
}

This example shows

You can have multiple rules, each with a unique name.
You can specify information inline, for example LocalPortRange 1414
You can point to a (shared) definition: LocalPortRangeRef MYPORTS -> PortRange MYPORTS.
Every TTLSRule needs a group action, which is pointed to by a TTLSGroupActionRef statement.
The body of a definition is enclosed in { }, each at the start of a line.

If you use the Unix command pasearch -p TCPIP2 1>a you can display the configuration for the TCPIP instance, and get output like

policyRule:             TLS1414 
  Rule Type:            TTLS 
  Version:              3                 Status:            Active 
  Weight:               1                 ForLoadDist:       False 
  Priority:             1                 Sequence Actions:  Don't Care 
  No. Policy Action:    2 
  policyAction:         GA1 
   ActionType:          TTLS Group 
   Action Sequence:     0 
  policyAction:         TNCP3-GrpEnvAct 
   ActionType:          TTLS Environment 
   Action Sequence:     0 
  Time Periods: 
     ...
  TTLS Condition Summary:                 NegativeIndicator: Off 
   Local Address: 
    FromAddr:           All 
    ToAddr:             All 
   Remote Address: 
    FromAddr:           All 
    ToAddr:             All 
   LocalPortFrom:       1414              LocalPortTo:       1414 
   RemotePortFrom:      0                 RemotePortTo:      0 
   JobName:                               UserId: 
   ServiceDirection:    Inbound 
  Policy created: Tue May 24 11:01:04 2022 
  Policy updated: Tue May 24 11:01:04 2022
...

Within this output is

TTLS Action:                  GA1 
  Version:                    3 
  Status:                     Active 
  Scope:                      Group 
   TTLSEnabled:                On 
   CtraceClearText:            Off 
   Trace:                      2 
   FIPS140:                    Off 
   TTLSGroupAdvancedParms: 
    SecondaryMap:              Off 
    SyslogFacility:            Daemon 
   Policy created: Tue May 24 11:01:04 2022 
   Policy updated: Tue May 24 11:01:04 2022 

TTLS Action:                  TNCP3-GrpEnvAct 
  Version:                    3 
  Status:                     Active 
  Scope:                      Environment 
    HandshakeRole:              Server 
    SuiteBProfile:              Off 
    TTLSKeyringParms: 
     Keyring:                   TNCP4.TTLS 
...

Where

Scope: Group is for the TTLSGroupAction GA1 {} definition
Scope: Environment is for the TTLSEnvironmentAction {} definition
The keyring is TNCP4.TTLS

Changing the configuration

If you change the configuration files, you can use F PAGENT,REFRESH to reprocess them. You can configure PAGENT to check whether Unix files have been changed, and do an automatic refresh.

If you have a mistake with your definitions, then the new definitions are not activated. If you stop and restart PAGENT while the configuration has errors, then you will get no AT-TLS definitions!

Setting up syslogd on z/OS

The IBM documentation is not very clear. It tells you how to turn on debug, trace etc, but does not clearly explain the difference between them, or when each is used.

It looks like PAGENT’s job is to take a configuration file, parse it, and pass the configuration data to TCPIP.

If you are using AT-TLS to set up TLS channels, the trace data comes from the TCPIP address space and goes to syslogd.

Configure syslogd

See if SYSLOGD is running, if not, try to start it. If it does not exist…

Copy /usr/lpp/tcpip/samples/syslog.conf to its default configuration file /etc/syslog.conf, or another file.
Copy TCPIP.SEZAINST(SYSLOGD) to your proclib concatenation.
The program uses environment variables defined in STDENV to control operations. The default configuration file location is /etc/syslog.conf .

You can configure syslog.conf for example

*.TCPIP.*.* /var/log/%Y/%m/%d/TCPIP
*.SYSLOGD.*.* /var/log/%Y/%m/%d/syslogd
*.err /var/log/%Y/%m/%d/errors

This says all messages for SYSLOGD go to a file like /var/log/2022/05/14/syslogd, and error messages go to /var/log/2022/05/14/errors

This means you get a file of messages for each day. For me, I just used /var/log/syslogd.log and /var/log/errors.log, and deleted them periodically. My syslog.conf is

*.INETD*.*.*       /var/log/inetd 
auth.* /var/log/auth 
mail.* /var/log//mail -F 640 -D 770 
local1.err       /var/log/local1 
*.err            /var/log/errors 
*.CPAGENT.*.*       /var/log/CPAGENT 
*.TTLS*.*.*          /var/log/TTLS 
*.Pagent.*.*        /var/log/Pagent 
*.TCPIP.*.debug     /var/log/TCPIPdebug 
*.TCPIP.*.warning   /var/log/TCPIP 
*.TCPIP.*.err       /var/log/TCPIPerr 
*.TCPIP.*.info      /var/log/TCPIPinfo 
*.SYSLOGD*.*.*      /var/log/syslogd 
*.TN3270*.*.*       /var/log/tn3270 
*.SSHD*.*.*         /var/log/SSHD

The syntax is

userid.jobname.facility.priority …
facility.priority ….

Priority

Data logged to syslogd has a “priority”. For example, AT-TLS trace level 32 (Data) records have a priority of “debug”. You can use this, for example

*.TCPIP.*.debug     /var/log/TCPIPdebug 
*.TCPIP.*.*         /var/log/TCPIP

This says

For messages from TCPIP with priority debug or higher (debug, info, notice, warning, err, crit, alert, emerg), write the data to /var/log/TCPIPdebug.
Write all messages from TCPIP to /var/log/TCPIP.

As debug is the lowest priority, these two statements are effectively the same.

It may be better to have

*.TCPIP.*.debug     /var/log/TCPIPdebug 
*.TCPIP.*.warning   /var/log/TCPIP

The priority can be “none” which means do not log any messages.

How do I capture messages not handled elsewhere?

This is a bit clumsy.

When a message arrives, each of the rules is checked. If the check is true, the message is logged.

You can have compound checks, separated by semicolons, for a rule.

For example

*.*.*.* ; *.TCPIP.*.none ; *.PAGENT.*.none /var/log/all.log

This says log all messages, but not those from TCPIP or PAGENT. If you have one file for each of 20 jobs, you need to have 20 statements with the semicolon.

You can spread the definition over several lines. A semicolon at the end of a line says read the next line. The file name /var/log/all cannot go on a line of its own, so using a ‘dummy’ entry may make it easier to maintain.

*.*.*.*; 
       *.TCPIP.*.none; 
       *.PAGENT.*.none;
       *.DUMMY.*.none /var/log/all
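
To check my understanding of these rules, I find it helps to model them. Below is a minimal Python sketch of the behaviour described above: a selector matches its priority or higher, and a matching “none” selector stops the message being logged. It only handles the four part userid.jobname.facility.priority form, the userid START1 is made up, and it is my model of the behaviour, not syslogd's actual code.

```python
from fnmatch import fnmatchcase

# Priorities from lowest to highest severity.
PRIORITIES = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def rule_logs(rule, userid, jobname, facility, priority):
    """Return True if the rule (selectors separated by ;) would log the message."""
    selected = False
    for selector in rule.split(";"):
        s_user, s_job, s_fac, s_pri = selector.strip().split(".")
        names_match = (
            fnmatchcase(userid, s_user)
            and fnmatchcase(jobname, s_job)
            and fnmatchcase(facility, s_fac)
        )
        if not names_match:
            continue
        if s_pri == "none":
            return False  # an exclusion wins over any other selector
        if s_pri == "*" or PRIORITIES.index(priority) >= PRIORITIES.index(s_pri):
            selected = True
    return selected


catch_all = "*.*.*.*; *.TCPIP.*.none; *.PAGENT.*.none"
print(rule_logs(catch_all, "START1", "SSHD", "auth", "info"))
print(rule_logs(catch_all, "START1", "TCPIP", "daemon", "debug"))
```

The first call prints True, because only the catch-all selector matches. The second prints False, because the TCPIP exclusion applies.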

ISPF interface

There is an ISPF syslog browser tool which displays information about the logs, and helps you browse the logs of interest. The documentation for this is not very good.

I got this to work by experimentation. I created an exec like MYSYSLOG

/* Rexx */ 
address ispexec 
"LIBDEF ISPMLIB DATASET ID('TCPIP.SEZAMENU') STACK" 
"LIBDEF ISPPLIB DATASET ID('TCPIP.SEZAPENU') STACK" 
address tso "ALTLIB ACTIVATE APPLICATION(CLIST) 
              DATASET('TCPIP.SEZAEXEC') " 
"SELECT CMD(EZASYRGO) NEWPOOL PASSLIB NEWAPPL(EZAS)" 
address tso "ALTLIB DEACTIVATE APPLICATION(CLIST)" 
"LIBDEF ISPPLIB" 
"LIBDEF ISPMLIB"

You can execute this from ISPF option 6 or have this built into the ISPF panels.

Originally this exec was called syslogd; when I used it, I got

SYSTEM COMPLETION CODE=4C5 REASON CODE=77A53217

Where 4C5 is TCPIP’s abend code and 3217 – the program has the wrong Authrorization Code (APF related). This is because there is a command syslogd which was executed in preference to my exec. When I renamed the exec to MYSYSLOG it used the exec and it worked fine!

Getting TCPIP syslogd working, and tracing PAGENT

You need to configure the syslogd procedure.

See if SYSLOGD is running, if not, try to start it. If it does not exist…

Copy /usr/lpp/tcpip/samples/syslog.conf to its default configuration file /etc/syslog.conf, or another file.
Copy TCPIP.SEZAINST(SYSLOGD) to your proclib concatenation.
The program uses environment variables defined in STDENV to control operations. The default configuration file location is /etc/syslog.conf

You can configure syslog.conf for example

*.SYSLOGD.*.* /var/log/%Y/%m/%d/syslogd
*.err /var/log/%Y/%m/%d/errors

This says all messages for SYSLOGD go to a file like /var/log/2022/05/14/syslogd, and error messages go to /var/log/2022/05/14/errors.

This means you get a new file of messages each day. For my own use, I just used /var/log/syslogd.log and /var/log/errors.log, and deleted them periodically.
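As a rough illustration of how the %Y/%m/%d file names turn into one file per day, here is a small Python sketch. It assumes the codes behave like the usual strftime year, month and day codes; the function name expand_log_path is made up for this example, and this is not how syslogd itself is implemented.

```python
from datetime import date


def expand_log_path(template: str, day: date) -> str:
    """Expand %Y, %m and %d in a syslog.conf style file name for one day.

    Assumption: the codes mean the same as the strftime codes
    (4 digit year, 2 digit month, 2 digit day of month).
    """
    return day.strftime(template)


# The two rules from the example configuration, for 14 May 2022.
print(expand_log_path("/var/log/%Y/%m/%d/syslogd", date(2022, 5, 14)))
print(expand_log_path("/var/log/%Y/%m/%d/errors", date(2022, 5, 14)))
```

A different date gives a different directory, which is why the log files never grow beyond one day of messages.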

ISPF interface

There is an ISPF syslog browser tool which displays information about the logs, and helps you browse the logs of interest. The documentation for this is not very good.

I got this to work by experimentation. I created an exec like MYSYSLOG

/* Rexx */ 
address ispexec 
"LIBDEF ISPMLIB DATASET ID('TCPIP.SEZAMENU') STACK" 
"LIBDEF ISPPLIB DATASET ID('TCPIP.SEZAPENU') STACK" 
address tso "ALTLIB ACTIVATE APPLICATION(CLIST)", 
            "DATASET('TCPIP.SEZAEXEC')" 
"SELECT CMD(EZASYRGO) NEWPOOL PASSLIB NEWAPPL(EZAS)" 
address tso "ALTLIB DEACTIVATE APPLICATION(CLIST)" 
"LIBDEF ISPPLIB" 
"LIBDEF ISPMLIB"

You can execute this from ISPF option 6 or have this built into the ISPF panels.

Originally this exec was called syslogd; when I used it, I got

SYSTEM COMPLETION CODE=4C5 REASON CODE=77A53217

The first panel is

EZASYP01 ----------------- z/OS CS Syslogd Browser ---------------- Row 1 of 1
Command ===>                                                  Scroll ===> PAGE
                                                                               
Enter syslogd browser options                                                  
  Recall migrated data sets ==> NO     (Yes/No) Recall data sets or not        
  Maximum hits to display   ==> 200    (1-99999) Search results to display     
  Maximum file archives     ==> 30     (0-400) Days to look for file archives  
  Display start date/time   ==> YES    (Yes/No) Retrieve start date/time       
  Display active files only ==> NO     (Yes/No) Active files only, no archives 
  DSN Prefix override value ==>                                                
                                                                               
Enter file or data set name of syslogd configuration, or select one from below:
                                                                               
  File/DS Name ==> /etc/syslog.conf
                                                                               
Press ENTER to continue, or press END PF key to exit without a selection       
                                                                               
Line commands: S Select, R Remove from list, B Browse content, E Edit content  
                                                                               
Cmd Recently used syslogd configuration file or data set name                  
--- -------------------------------------------------------------------------- 
    /etc/syslog.conf                                                          
******************************* Bottom of data ********************************

Pressing enter, gave me another panel with

EZASYP00 ----------------- z/OS CS Syslogd Browser ---------------- Row 1 of 6
OPTION ===>                                                   Scroll ===> PAGE
                                                                               
Select one of the following, or press END PF key to exit the syslogd browser   
                                                                               
  1 Change current syslogd configuration file and/or options                   
  2 Guide me to a possible syslogd destination                                 
  3 Clear guide-me hits (indicated by ==> in the Cmd column)                   
  4 Search across all active syslogd files                                     
                                                                               
Current config file ==> /etc/syslog.conf                                      
                                                                               
Line commands: B Browse, A List archives, S Search active file and archives,   
               SF Search active file, SA Search archives, I File/DSN info      
                                                                    Archive    
Cmd Rule/Active UNIX file name                    Start Time        Type Avail.
--- --------------------------------------------- ----------------- ---- ------
    *.SYSLOGD*.*.*                                28 May 2022 13:31 None 0     
    /var/log/syslogd                                                          
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    *.INETD*.*.*                                  Empty       N/A   None 0     
    /var/log/inetd                                                            
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    auth.*                                        Empty       N/A   None 0     
    /var/log/auth

I could then browse the error log for SYSLOGD.

You can search for userid, strings etc, and give date ranges.

However for my small, one person usage, I found it was easier to use Unix services and use the command

oedit /var/syslogd.log

to edit the file.

Capturing the right data

In the config file you can specify options like

*.TCPIP.*.* /var/log/TCPIP
*.*.*.* /var/log/all

The entries are Userid.Jobname.facility.priority.

PAGENT AT-TLS requests are reported via TCPIP.

Why is this Linux slower to download than that one

I have a laptop which is my primary work station, and an under desk server for running my z/OS system on top of the same Linux.

Running “apt update” was always faster on the laptop than on the server. Was this because all traffic for the server was going through my laptop? How do I tell?

The boxes are connected with an Ethernet cable. My laptop has a built in wireless adapter, and I had to purchase a wireless dongle for my server.

The Linux ifconfig or ip command gives information about the configuration. For example ifconfig gave

eno1: flags=4163 mtu 1500
    inet 10.1.0.3 netmask 255.255.255.0 broadcast 10.1.0.255
    inet6 fe80::.... prefixlen 64 scopeid 0x20
    ether 00:... txqueuelen 1000 (Ethernet)
    RX packets 5136 bytes 1445665 (1.4 MB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 4933 bytes 1692274 (1.6 MB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    device interrupt 17 memory 0xb1200000-b1220000  
...
wlxd037450ab7ac: flags=4163 mtu 1500
    inet 192.... netmask 255.255.255.0 broadcast 192....
    inet6 2a00:... prefixlen 64 scopeid 0x0
    inet6 fe80::... prefixlen 64 scopeid 0x20
    inet6 2a00:... prefixlen 64 scopeid 0x0
    ether d0:... txqueuelen 1000 (Ethernet)
    RX packets 42427 bytes 60919847 (60.9 MB)
    RX errors 0 dropped 1 overruns 0 frame 0
    TX packets 25996 bytes 2397812 (2.3 MB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The EtherNet connection eno1 has received 5136 packets and 1.4 MB of data.
The WireLess connection wlxd037450ab7ac has received 42427 packets and 60.9 MB of data.

As I had just done an apt upgrade, the wireless had all of the traffic to download the files, so the traffic was not coming through my laptop.

Once the system was updated, the only traffic flowing was down the Ethernet cable as I used the server from my laptop.
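The same comparison can be scripted. The sketch below assumes a Linux system that exposes the per-interface counters as files under /sys/class/net/&lt;interface&gt;/statistics; the function names are made up for this example.

```python
from pathlib import Path


def rx_tx_bytes(interface: str, sys_net: str = "/sys/class/net") -> tuple[int, int]:
    """Return (bytes received, bytes sent) for one network interface.

    Assumption: the kernel exposes the counters as the text files
    rx_bytes and tx_bytes in <sys_net>/<interface>/statistics.
    """
    stats = Path(sys_net) / interface / "statistics"
    rx = int((stats / "rx_bytes").read_text())
    tx = int((stats / "tx_bytes").read_text())
    return rx, tx


def busiest_receiver(interfaces: list[str], sys_net: str = "/sys/class/net") -> str:
    """Return the interface name which has received the most bytes."""
    return max(interfaces, key=lambda name: rx_tx_bytes(name, sys_net)[0])
```

For example busiest_receiver(["eno1", "wlxd037450ab7ac"]) run just after a download shows which adapter carried the traffic.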

Ping

A ping from each system gave a similar response time.

traceroute

traceroute shows you the hops to the destination.

For example

traceroute abc.xyz.com

To specify the interface you need to run as a superuser.

sudo traceroute abc.xyz.com -i wlxd037450ab7ac

gave

1 bthub.home (192....) 4.654 ms 38.438 ms 38.425 ms
2 * * *
3 * * *
4 31.55.185.184 (31.55.185.184) 75.897 ms 75.890 ms 75.861 ms

If you are not running as a superuser you will get:

setsockopt SO_BINDTODEVICE: Operation not permitted

What else is there to help me?

On z/OS the netstat command gives a lot of information about the session, for example the send window size, the receive window size etc. This information tends not to be available on other platforms.

On Linux there is the ss (Socket Statistics) command.

Example output from ss -t -i included

ESTAB 0 0 192.168.1.223:58212 192.0.78.12:https
cubic the congestion algorithm name, the default congestion algorithm is “cubic”
wscale:9,7 if window scale option is used, this field shows the send scale factor and receive scale factor
rto:232
rtt:29.236/5.693
ato:40 mss:1452
pmtu:1500
rcvmss:880
advmss:1460
cwnd:10 congestion window size
bytes_sent:68335
bytes_acked:68336
bytes_received:16334
segs_out:276
segs_in:202
data_segs_out:151
data_segs_in:123
send 4.0Mbps number of bits sent/time of send.
lastsnd:376 how long time since the last packet sent, the unit is millisecond
lastrcv:348 how long time since the last packet received, the unit is millisecond
lastack:348 how long time since the last packet acknowledged, the unit is millisecond
pacing_rate 7.9Mbps
delivery_rate 2.6Mbps
delivered:152
app_limited busy:3700ms
rcv_space:14600
rcv_ssthresh:64076
minrtt:23.773

I did not find most of this information very useful. It is all too easy for a developer (I have done it myself) to provide statistics from information which is readily available, rather than ask what information would be useful to debug problems – then collect and publish that information.

My favourite TCP commands for z/OS

Console commands

d tcpip – show which TCPs are active
d tcpip,tcpip,netstat,home issue netstat home command to TCPIP job TCPIP
d tcpip,tcpip,help show all the console commands available for job TCPIP
d tcpip,,help show all the console commands available for default TCPIP
d tcpip,,netstat,home issue netstat home command to default (only) TCPIP stack
d tcpip,tcpip2,netstat,devlinks issue netstat devlinks to TCPIP job TCPIP2
v tcpip,tcpip,syntaxcheck,USER.Z24A.TCPPARMS(TCIP) Check the syntax of commands
v tcpip,tcpip,obeyfile,USER.Z24A.TCPPARMS(TCIP) Issue the commands.

Display VIPA stuff

D TCPIP,TCPIP,NETSTAT,VIPACFG Displays the current dynamic VIPA configuration information
D TCPIP,TCPIP,N,VIPADYN Displays the current dynamic VIPA and VIPAROUTE information
D TCPIP,TCPIP,N,VIPADYN,DVIPA Displays the current dynamic VIPA information only.
D TCPIP,TCPIP,N,VIPADYN,VIPAROUTE Displays the current VIPAROUTE information only.
D TCPIP,TCPIP,N,VCRT Displays the dynamic VIPA Connection Routing Table information
D TCPIP,TCPIP,N,VDPT Displays the dynamic VIPA Destination Port Table information.

OSPF

Either D TCPIP,,OMP,command or F OMPROUTE,command. If you use S OMPROUTE.P1, you can use F P1,command

F OMP1,OSPF,areasum Display a summary of the configuration, number of links, number of interfaces (one line of output per area).
F OMP1,ospf,database,areaid=0.0.0.0 Display the routers.
F OMP1,ospf,neighbor Display the directly connected links.
F OMP1,OSPF, LSA, LSTYPE=2,LSID=… Display all links, by IP address from
F OMP1,OSPF, LSA, LSTYPE=1,LSID=… Display links for a specific ospf router.
Display the IP addresses in the network. Either use F OMP1,RTTABLE or for each router F OMP1,OSPF,LSA,LSTYPE=1,LSID=…. , LINK ID: is the IP address of the remote end, LINK DATA: is the IP address of the router’s end.
F OMP1,ospf,database,areaid=0.0.0.0 Display all of the routers in the network and extract “LS ORIGINATOR”
F OMP1,OSPF,LSA,LSTYPE=1,LSID=…. Display the connection from an ospf router.
F OMPROUTE,OSPF,LIST,ALL
F OMPROUTE,OSPF,LIST,AREAS
F P1,OSPF,LIST,IFS display interfaces – but it does not tell you the interface name.
F P1,OSPF,LIST,NBRS list the NeighBoRS
F P1,OSPF,IF display the interface, one line per interface
F P1,OSPF,IF,NAME=name displays a lot of data including traffic statistics
F P1,OSPF,NBR display the neighbour
F P1,OSPF,STATISTICS
F P1,OSPF,ROUTERS The doc says
- Displays all routes to area-border routers and autonomous system boundary routers that have been calculated by OSPF and are currently present in the routing table.

Netstat

Netstat has two formats TSO and OMVS

TSO format is like NETSTAT CONN (can also be issued from operator console)
OMVS format is like netstat -c

There is a comparison table here

The omvs command is good for netstat -c > filename

TSO command

netstat conn Displays the information about each active TCP connection and UDP socket
netstat conn ( PORT 10443 who is using port 10443 Gives Foreign socket 10.1.0.2..48518 (see below)
netstat allconn Provides information for all TCP connections and UDP sockets, including recently closed ones.
netstat allconn ( ipport 10.1.0.2+48518 Show information about this socket coming in from 10.1.0.2 port 48518
netstat all ( ipport 10.1.0.2+48518 shows all information about the remote port.
netstat conn TCP TCPIP2 for TSO command for TCPIP stack TCPIP2
netstat home give the IP addresses this TCPIP stack uses. For example 10.1.1.2 and 127.0.0.1
netstat Dev give stats about all interfaces
netstat home report dsn 'colin.output' issue the home command, and write the output to 'colin.output'
netstat conn report hlq colin ( port 1414 the hlq says output the report with data set name colin.netstat.conn
netstat allconn (client csq9web show all current connections (ports in use), and recent ones for the job csq9web. netstat conn just shows active ones.

OMVS

netstat -c Displays the information about each active TCP connection and UDP socket
netstat -c -P 10443 who is using port 10443 Gives Foreign socket 10.1.0.2..48518 (see below)
netstat -a Provides information for all TCP connections and UDP sockets, including recently closed ones.
netstat -a -B 10.1.0.2+48518 Show information about this socket coming in from 10.1.0.2 port 48518
netstat -A -B 10.1.0.2+48518 shows all information about the remote port.
netstat -c -p TCPIP2 display all connections for TCPIP jobname TCPIP2
netstat -c -p TCPIP2 -P 10443 which jobs are using port 10443 – direct the request to TCPIP2 stack
netstat -h give the IP addresses this TCPIP stack uses. For example 10.1.1.2 and 127.0.0.1
netstat -d give stats about all interfaces

Ping

TSO Ping address

USS ping address

Trace route

TSO TRACERTE address

USS traceroute address

DNS resolver

The DNS resolver maps WWW.xxx.yyy to an IP address, and an IP address to WWW.xxx.yyy.

F RESOLVER,DISPLAY to display the configuration
F RESOLVER,REFRESH,SETUP=ADCD.Z31B.TCPPARMS(GBLRESOL) to change the resolver to use the specified file. You can use SETUP=…. to specify a data set, member or Unix file.

Ubuntu commands

ip route display configured routes
ip route get 10.1.2.3 show the path to 10.1.2.3
sudo ip route add 10.1.1.10 dev tap0 define route to 10.1.1.10 via device tap0

One Minute MVS performance – TCP/IP

Question: In your car how do you tell if your car has a problem? Answer: You look at the dashboard and see if there is a red light showing. You may not know how to fix it – but you know that you need to get help to fix it.

The aim of this series of blog posts is to show you what to look for in z/OS performance and if you have a problem.

I will cover

What is a TCP/IP performance problem?

People complain about a TCP/IP performance problem when “it” seems slow. This could be caused by a variety of problems

Data between two ends is being discarded. This can occur on an unreliable, or overloaded component, whose default action is to throw away data, knowing it will be resent.
The time taken to get from one end to the other and back (“a ping”) is slow. This can be caused by slow or overloaded components.
There is a lot of data to send, for example a movie, or a web page with lots of javascript or graphics.
Or all of the above.

There is a quote “Never underestimate the bandwidth of a lorry full of tapes”. It might take 10 hours, but a truck 6 ft wide by 20 ft long could hold 300,000 1TB tapes and deliver about 8 TBytes/second (with a round trip time of 20 hours). Which is more than the internet can provide!
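The lorry arithmetic is easy to check. This is just a worked calculation using the figures quoted above (300,000 tapes of 1TB each, and a 10 hour one way journey); the function name is made up for this example.

```python
def lorry_bandwidth_tb_per_second(tapes: int, tb_per_tape: float, journey_hours: float) -> float:
    """Total data carried divided by the one way journey time in seconds."""
    return tapes * tb_per_tape / (journey_hours * 3600)


# 300,000 TB delivered in 36,000 seconds.
print(round(lorry_bandwidth_tb_per_second(300_000, 1, 10), 2))  # 8.33
```

The throughput is huge, but so is the latency, which is the point of the FTP like versus transactional split later in this post.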

You need to know

Are packets being thrown away? You see this from the number of packets which were resent.
What is the round trip time? (You could use ping – but you may not be able to)
Is data being sent efficiently – in big blocks?

TCP/IP concepts

With TCP/IP there is a connection between a sender and a receiver. The sender sends numbered packets of data to the receiver. The receiver sends an acknowledgement that a packet has been received.

The following is a representation of the flow

The sender sends packet 1
The sender sends packet 2
The sender sends packet 3
The receiver receives packet 1 and sends an acknowledgement for packet 1
The sender sends packet 4
The receiver receives packet 2 and sends an acknowledgement for packet 2
The sender waits until the acknowledgement of packet 1 has been received
The sender sends packet 5 and waits till the acknowledgement of packet 2 has been received
etc

This way it is self limiting. It means the sender cannot send more than the receiver can handle.

If a packet goes missing, eventually the sender gets a time out, and resends it.

There are two parts to “performance”.

FTP like: How much data can be sent per second. This is of interest to FTP and MQ, where there is mainly a one way transmission of lots of data. The round trip time is not so critical if you can have a lot of data in transit.
Transactional: Send some data and wait for the remote end to respond, for example a web browser. The amount of data may be measured in KB, but the round trip time is important.

The term “window” is often used in TCP/IP.

The term “send window” on the sender side represents the total number of packets yet to be acknowledged by the receiver. With a bigger window, there is more data in the pipe line, and the throughput goes up. With a window of 1, one packet is sent and the sender waits for the acknowledgement before sending the next. With this, if there is a high latency, the overall throughput will be low.
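A rough way to see the effect of the window is that, at best, one window of data can be delivered per round trip. The sketch below assumes no packet loss and a sender which always has data ready; the function name and the 20 millisecond round trip time are made up for this example.

```python
def max_throughput_bytes_per_second(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput: one window of data per round trip."""
    if rtt_ms <= 0:
        raise ValueError("round trip time must be positive")
    return window_bytes * 1000.0 / rtt_ms


# A 64KB window compared with a window of one 1440 byte packet,
# both over a connection with a 20 ms round trip time.
print(max_throughput_bytes_per_second(65536, 20))  # 3276800.0
print(max_throughput_bytes_per_second(1440, 20))   # 72000.0
```

Doubling the round trip time halves both numbers, which is why a small window hurts most on a high latency connection.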

More details

One of the factors that affects performance is the receive buffer size. If this was set to 4KB, it means that an application can read up to 4 KB of data at a time. This receive buffer size is sent to the sender, and basically says “send chunks up to this size – as that is all the receiver can take” – this sets the send-buffer-size.

Dynamic Right Sizing (DRS) allows the TCP receive buffer size to expand if the network conditions are favourable.

Outbound Right Sizing (ORS) allows the TCP send buffer size to expand if the network conditions are favourable.

Another term used is congestion window. If too much data is sent, or the network is unreliable, packets will get lost or thrown away. The congestion window is a measure of how much data can be in-flight. If packets get lost, the congestion window is made smaller. If packets are not lost, then it will try to increase the congestion window. This is a very rough indication of the quality of the network.

FTP like performance

There are several factors which can improve the throughput down a connection

Make packets bigger. In the early days of TCP/IP a typical packet was 256 bytes. These days a typical default packet size can be 64KB or more.
- One of the Smarts in the protocol is called dynamic right sizing, where TCP will send increasingly larger packets until the receiver says “big enough”. The packet size can change with load.
How much data to send before waiting for the acknowledgement. For a reliable connection, where data is never lost, it is efficient to send a lot of data before waiting. This is called a large send window.
If the connection is unreliable, it may be more efficient to have only a small send window, before waiting for the acknowledgement.

Transactional work

Having big buffers may not improve throughput, for example with a web page, the data may all fit into 2KB. In this case having a buffer size of 16KB or 64 KB may make no difference to throughput or performance.
Typically if one packet contains all the data, then this will be acknowledged as soon as it arrives.
Some web pages with a lot of javascript or images, may require big buffers, and many packets.

How to see what is going on

You can use the well known “ping” command to send data to the remote end, and get the response. This gives a measure of the network time.

I found most of the data for looking at performance, is available from the netstat command. I found it useful to capture the output of the command in a file or data set.

What connections are connected to this server?

I use the netstat command in TSO, because my fingers are more used to it, and the command options are more memorable than the omvs command (for example with omvs netstat, do I need the -a or -A option?)

netstat conn (port 1414
netstat conn report hlq colin ( port 1414
netstat conn report dsn 'colin.output' ( port 1414

These all gave the same output. The report hlq colin creates a data set colin.netstat.conn. The data set name is from the hlq, ‘netstat’, and the subcommand. You can specify a data set name using the ‘dsn’ option.

For omvs you can use

netstat -c -p TCPIP -P 1414 > filename

That lists all of the connections for port 1414.

The command gave me

MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:18:34    
User Id  Conn     Local Socket           Foreign Socket         State    
-------  ----     ------------           --------------         -----    
CSQ9CHIN 00000023 10.1.1.2..1414         10.1.0.2..60538        Establsh 
CSQ9CHIN 00000022 0.0.0.0..1414          0.0.0.0..0             Listen

There is one connection established from 10.1.0.2 port 60538 to the server with the port listening on 1414.

The commands below give a lot of information about the connection

netstat all report hlq colin (ipport 10.1.0.2+60538
netstat -A -p TCPIP -B 10.1.0.2+60538 > all.port1

Output from the netstat command

The fields are described at the bottom of this page.

Both commands gave me the same output.

There is a lot of data. I’ve broken it into sections with comments after the interesting fields.

  MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:23:29 
  Client Name: CSQ9CHIN                 Client Id: 00000023 
  Local Socket: 10.1.1.2..1414          Foreign Socket: 10.1.0.2..60538 
  BytesIn:            0000002988        BytesOut:           0000002912 
  SegmentsIn:         0000000019        SegmentsOut:        0000000011

09:23:29 is the time when the request was made. If you repeat the command you can get the interval between commands, and so calculate rates.
You get the client (job) name CSQ9CHIN.
The listener socket for the job (local socket) is 10.1.1.2 with port 1414.
The foreign socket is the remote end of the connection: IP address 10.1.0.2 port 60538.
You can get the data rate if you repeat the command, calculate the deltas of BytesIn and BytesOut, and divide by the time between the measurements.
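The rate calculation can be sketched in a few lines of Python. The first sample uses the time and byte counts from the report above; the second sample is invented for illustration, and the code assumes both samples were taken on the same day.

```python
from datetime import datetime


def byte_rates(first: tuple[str, int, int], second: tuple[str, int, int]) -> tuple[float, float]:
    """Return (bytes in per second, bytes out per second) between two samples.

    Each sample is (report time as HH:MM:SS, BytesIn, BytesOut).
    Assumption: both samples are from the same day, second after first.
    """
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(second[0], fmt) - datetime.strptime(first[0], fmt)
    seconds = elapsed.total_seconds()
    if seconds <= 0:
        raise ValueError("second sample must be later than the first")
    return (second[1] - first[1]) / seconds, (second[2] - first[2]) / seconds


sample1 = ("09:23:29", 2988, 2912)    # from the report above
sample2 = ("09:24:29", 62988, 8912)   # hypothetical second report
print(byte_rates(sample1, sample2))   # (1000.0, 100.0)
```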

  StartDate:          06/16/2021        StartTime:          10:00:21 
  Last Touched:       10:20:37          State:              Establsh 
  RcvNxt:             2019327903        SndNxt:             0864946572 
  ClientRcvNxt:       2019327903        ClientSndNxt:       0864946572 
  InitRcvSeqNum:      2019324914        InitSndSeqNum:      0864943659 
  CongestionWindow:  0000018720        SlowStartThreshold: 0000065535

Look at the congestion window. Big is good. Small may indicate small amounts of data being sent or it may indicate network problems, either slow connections or packets are being dropped.

IncomingWindowNum:  2019458463        OutgoingWindowNum:  0865008524 
SndWl1:             2019327903        SndWl2:             0864946572 
SndWnd:             0000061952        MaxSndWnd:          0000064256

Check the send window. A small (1KB) send window can indicate poor configuration at the remote client, or only small amounts of data are being sent.

SndUna:             0864946572        rtt_seq:            0864946064 
MaximumSegmentSize: 0000001440        DSField:            00 
Round-trip information:
  Smooth trip time: 6.000              SmoothTripVariance: 12.000

Monitor the smooth round trip time (in milliseconds). This is the time from the local end to the remote end, and back. The variance gives a measure of the spread of response times. These are not strictly averages.

Suppose you had a million requests taking 1 millisecond, and then had one long request taking 1000 milliseconds. The “average” response time would change by a very small amount (to about 1.001 milliseconds). The smoothed (or weighted) average may be something like (99 * previous average + current value) / 100. In this case the “average” goes up to about 11 milliseconds (10.99), which is noticeably different.
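The difference between a plain average and a smoothed one can be checked with a few lines of Python. The 99/100 weighting is the illustrative formula from the text, not necessarily what z/OS uses, and the function names are made up for this example.

```python
def plain_average(count: int, old_average: float, new_value: float) -> float:
    """Average after adding one more value to 'count' earlier values."""
    return (count * old_average + new_value) / (count + 1)


def smoothed_average(previous: float, new_value: float, weight: int = 99) -> float:
    """Weighted average: (weight * previous + new value) / (weight + 1)."""
    return (weight * previous + new_value) / (weight + 1)


# A million 1 ms requests followed by one 1000 ms request.
print(plain_average(1_000_000, 1.0, 1000.0))   # about 1.001
print(smoothed_average(1.0, 1000.0))           # about 10.99
```

The plain average hides the slow request, while the smoothed value jumps and then decays back as normal requests arrive.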

ReXmt:              0000000000        ReXmtCount:         0000000000

The retransmits should be zero – or not changing. If this number increases it means the network has lost packets.

DupACKs:            0000000000        RcvWnd:             0000130560

The receive window is usually set to 2 * receive buffer.

SockOpt:            88                TcpTimer:           00

Check SockOpt. Check bit 0x08. If set this indicates “delayed acknowledgement disabled”. See Nagle algorithm. This value being set is good.

If this is not set, then sender can delay sending data for up to about 200 ms, and so combine data from different applications into the same packet for the same destination. This reduces network traffic as there are fewer packets, but it delays the data being sent.

TcpSig:             04                TcpSel:             40 
TcpDet:             E4                TcpPol:             00 
TcpPrf:            81                TcpPrf2:            20 
TcpPrf3:            00

For FTP type applications check the TCP Performance Flag TcpPrf. This says if Dynamic Right sizing (using bigger buffers) is enabled. The flag bits are x80 – enabled, x40 Active, x20 Active but disabled. X80 |X40 is good.

The TCP performance flag2 TcpPrf2. This is for outbound right sizing (ORS). A non zero value is good.
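The flag fields are hexadecimal bit masks, so checking them by eye is error prone. The sketch below decodes only the bits mentioned in this post (0x08 in SockOpt, and 0x80, 0x40 and 0x20 in TcpPrf); the function names and the wording of the labels are mine, and the other bits are ignored.

```python
def bit_is_set(hex_value: str, mask: int) -> bool:
    """True if the bit(s) in mask are set in a hex field such as '88'."""
    return (int(hex_value, 16) & mask) != 0


# Bit meanings as described in the text above (other bits not decoded).
TCPPRF_BITS = {
    0x80: "Dynamic Right Sizing enabled",
    0x40: "Dynamic Right Sizing active",
    0x20: "Dynamic Right Sizing active but disabled",
}


def describe_tcpprf(hex_value: str) -> list[str]:
    """List the Dynamic Right Sizing flags set in a TcpPrf value."""
    value = int(hex_value, 16)
    return [label for mask, label in TCPPRF_BITS.items() if value & mask]


print(bit_is_set("88", 0x08))   # SockOpt from the report above: True
print(describe_tcpprf("81"))    # TcpPrf from the report above
```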

DelayAck:           Yes 
QOSPolicy:          No 
TTLSPolicy:         No 
RoutingPolicy:      No 
ReceiveBufferSize:  0000065536        SendBufferSize:     0000065536

These buffer sizes should be 64KB or larger. If so, the system can dynamically increase them.

They can be configured at the TCP/IP level, or by the application. If they are 64KB or higher then TCP Dynamic Right Sizing can be used (adjust the buffers to match the load).

ReceiveDataQueued:  0000000000 
SendDataQueued:     0000000000

These should always be zero.

Received data queued means the application is slow to retrieve the data
Send data queued – the application has issued a send – but TCP/IP cannot process it.

SendStalled:        No 
Ancillary Input Queue: N/A

Send stalled should always be no.

What do you need to check?

SendStalled should be No, and ReceiveDataQueued and SendDataQueued should both be 0. They usually are. They would be non zero if there was a problem right now. If the problem gets better, these values go back to 0.
Check ReXmt = The total number of times a packet has been retransmitted for this connection. This count is historical for the life of the connection.
- If this is zero then there have been no retransmits, and so no packets lost.
- If this is non zero, then it could be a historical problem. Wait and reissue the netstat command. If the ReXmt value has changed, this indicates packets are being lost.
Check the round trip time (and variance). Is the value what you expected? If there is traffic flowing on the connection, display the value multiple times, and see if there is significant variation.
Check ReceiveBufferSize and SendBufferSize. Values of 64KB or larger are good. Small is not good.
Check congestion window.

It is good to have some data for a normal day, and a problem day. For example if the packets are often lost, then this may not be the problem. If the SendBufferSize is only 8KB today and was 64KB last week – this would be a good place to start looking. So capture and save NETSTAT reports for typical sessions.
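If you save the reports, part of the checklist can be automated. This is a minimal sketch, assuming the reports are saved as text in the "Name: value" layout shown above; the function names, the messages and the 64KB threshold check are my own choices, not part of netstat.

```python
import re


def field(report: str, name: str) -> str:
    """Return the value printed after 'name:' in a saved netstat report."""
    match = re.search(rf"(?<![\w-]){re.escape(name)}:\s+(\S+)", report)
    if match is None:
        raise KeyError(name)
    return match.group(1)


def check_connection(earlier: str, later: str) -> list[str]:
    """Compare two reports for the same connection and list possible problems."""
    problems = []
    # ReXmt is historical, so only an increase between reports matters.
    if int(field(later, "ReXmt")) > int(field(earlier, "ReXmt")):
        problems.append("ReXmt is increasing: packets are being lost")
    for name in ("ReceiveDataQueued", "SendDataQueued"):
        if int(field(later, name)) != 0:
            problems.append(f"{name} is not zero")
    if field(later, "SendStalled") != "No":
        problems.append("SendStalled is not No")
    for name in ("ReceiveBufferSize", "SendBufferSize"):
        if int(field(later, name)) < 64 * 1024:
            problems.append(f"{name} is below 64KB")
    return problems
```

Calling check_connection with the text of two reports taken a few minutes apart returns an empty list when none of these checks fire.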

What about connections into z/OS

Windows has a netstat command.

On Linux, netstat has been superseded by ss, for example

ss --info dst 10.1.1.2
ss --info dst 10.1.1.2:1414
ss --info src 10.1.0.2

These give similar information for connections going to 10.1.1.2, or the address and port 10.1.1.2:1414, or coming from 10.1.0.2.

Example netstat output from a slow FTP in connection

Client Name: IBMUSER                  Client Id: 000006FE 
Local Socket: 10.1.1.2..1109          Foreign Socket: 10.1.0.2..35508 
  BytesIn:            0220191104        BytesOut:           0000000000
  SegmentsIn:         0000152946        SegmentsOut:        0000083051
  StartDate:          06/28/2021        StartTime:          13:47:56 
  Last Touched:       14:24:28          State:              Establsh 
  RcvNxt:             3569682809        SndNxt:             2105824963
  ClientRcvNxt:       3569577977        ClientSndNxt:       2105824963
  InitRcvSeqNum:      3349491704        InitSndSeqNum:      2105824962
  CongestionWindow:   0000005760        SlowStartThreshold: 0000065535
  IncomingWindowNum:  3569946679        OutgoingWindowNum:  2105889219
  SndWl1:             3569681369        SndWl2:             2105824963
  SndWnd:             0000064256        MaxSndWnd:          0000064256
  SndUna:             2105824963        rtt_seq:            2105824962
  MaximumSegmentSize: 0000001440        DSField:            00 
  Round-trip information: 
    Smooth trip time: 3.000             SmoothTripVariance: 2.000 
  ReXmt:              0000000000        ReXmtCount:         0000000000
  DupACKs:            0000000000        RcvWnd:             0000263870 
  SockOpt:            A0                TcpTimer:           00 
  TcpSig:             04                TcpSel:             40 
  TcpDet:             E0                TcpPol:             00 
  TcpPrf:             E0                TcpPrf2:            28 
  TcpPrf3:            00 
  DelayAck:           Yes 
  QOSPolicy:          No 
  TTLSPolicy:         No 
  RoutingPolicy:      No 
  ReceiveBufferSize:  0000184351        SendBufferSize:     0000184320 
  ReceiveDataQueued:  0000104832 
    OldQDate:         06/28/2021        OldQTime:           14:24:27 
  SendDataQueued:     0000000000 
  SendStalled:        No 
  Ancillary Input Queue: N/A 
  Application Data:   EZAFTP0S D IBMUSER   C      FSSH

Comments

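Congestion window 5760: low (only four segments of 1440 bytes)
Smooth trip time 3.000: good
ReXmt 0: good
ReceiveBufferSize 184351: good
ReceiveDataQueued 104832: BAD (data is queued waiting for the application to receive it)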
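The checks in these comments can be sketched as a small Python script. The 64KB buffer limit comes from the text above; the other two thresholds (a congestion window of fewer than 10 segments, and more than half the receive buffer queued) are my assumptions for illustration, not IBM guidance, and the function names are made up.

```python
import re

MIN_BUFFER = 64 * 1024   # from the text: 64KB or larger is good
MIN_CWND_SEGMENTS = 10   # assumption: fewer segments than this is "low"
MAX_QUEUED_FRACTION = 0.5  # assumption: more than half the buffer queued is bad

SAMPLE = """\
  CongestionWindow:   0000005760        SlowStartThreshold: 0000065535
  MaximumSegmentSize: 0000001440        DSField:            00
  ReXmt:              0000000000        ReXmtCount:         0000000000
  ReceiveBufferSize:  0000184351        SendBufferSize:     0000184320
  ReceiveDataQueued:  0000104832
"""


def field(report: str, name: str) -> int:
    """Return the decimal value following 'name:' in the report."""
    match = re.search(r"\b" + re.escape(name) + r":\s+(\d+)", report)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


def review(report: str) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if field(report, "CongestionWindow") < MIN_CWND_SEGMENTS * field(report, "MaximumSegmentSize"):
        findings.append("congestion window low")
    for name in ("ReceiveBufferSize", "SendBufferSize"):
        if field(report, name) < MIN_BUFFER:
            findings.append(name + " below 64KB")
    if field(report, "ReXmt") > 0:
        findings.append("segments have been retransmitted")
    if field(report, "ReceiveDataQueued") > MAX_QUEUED_FRACTION * field(report, "ReceiveBufferSize"):
        findings.append("receive data queued, application is not reading it")
    return findings


if __name__ == "__main__":
    for finding in review(SAMPLE):
        print(finding)
```

With the lines taken from the report above, this flags the congestion window and the queued receive data, matching the comments.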

HOME sweet HOME – understanding TCP/IP home statements

I was just(!) trying to move my Liberty web server on z/OS to a different LPAR, and get it working there. Moving it was easy, but the server’s certificate needs the IP address of the TCP/IP stack – with RACF you can only have one “Subject Alternative Name”. A SAN of IP:10.1.2.4 works fine when it comes from TCP/IP stack 10.1.2.4 – but not from TCP/IP stack 10.1.2.5. The web browser checks, and complains if they do not match.

To get this to work I read the z/OS TCP/IP documentation. There is lots of it, but it seems to be written for people who are already experts. There is a saying “Question: how do you eat an elephant? Answer: A bit at a time”. This post takes a small area – and expands it in terms I understand. It may not be accurate – but the concepts should be right.

What is a TCP/IP stack?

This is another name for a TCP/IP instance, a started address space.

What is HOME?

Each connection coming into a TCP/IP instance has an IP address. On TCPIP1 I have a connection (a virtual bit of wire) defined for IP address 10.1.1.2

On Linux if I use the command ip route get 10.1.1.2 it says

10.1.1.2 dev tap0 src 10.1.1.1 uid 1000

So 10.1.1.2 is going via device tap0 (a TAP device, which is a virtual network interface used for tunnelling). The Linux machine has IP address 10.1.1.1. Through some configuration magic this ends up in my TCP/IP instance as

DEVICE PORTA MPCIPA
LINK ETH1 IPAQENET PORTA
HOME 10.1.1.2 ETH1

Where 10.1.1.2 is the address for a link called ETH1 on the TCP/IP instance on my LPAR. The magic is a bit like the spiritual Dem Bones, which has “Thigh bone connected to the hip bone, Hip bone connected to the back bone, Back bone connected to the shoulder bone, Now hear the word of the Lord”. ETH1 is defined as being on PORTA, and PORTA is a device which maps to a tunnelled device using protocol MPCIPA, … which maps to a VTAM TRL definition. Now hear the word of the Lord.

For the TCP/IP address space called TCPIP1 I can use the TSO command netstat home tcp tcpip1

Home address list:
Address       Link        Flg
-------       ----        ---
10.1.1.2      ETH1        P
192.168.0.61  ETH2
10.1.1.5      EZASAMEMVS
127.0.0.1     LOOPBACK
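If you save this output, turning it into a link-to-address table takes a few lines. This is a minimal sketch in Python; the function name is made up, and I am assuming the P in the Flg column marks the primary interface.

```python
import ipaddress


def parse_home(output: str) -> dict[str, tuple[str, bool]]:
    """Map link name -> (IP address, has the P flag) from a netstat home report."""
    homes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            # Only data rows start with an IP address; headings do not.
            ipaddress.ip_address(parts[0])
        except ValueError:
            continue
        homes[parts[1]] = (parts[0], "P" in parts[2:])
    return homes
```

For the report above this gives an entry for each of ETH1, ETH2, EZASAMEMVS and LOOPBACK, with only ETH1 flagged.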

For external connections the IP address, 10.1.1.2, must match the definition on Linux. This may be pointed to from outside Linux. The other “home” connections are described below.

I have multiple instances working together.

I have three TCP/IP instances on my LPAR. You might have an instance to talk to your internal network (the intranet), and an instance talking to the internet, facing out from your enterprise.

You can also have TCP/IP instances with different security profiles, and provide total isolation.

You can set up connections between your enterprise and my enterprise, these definitions will need a DEVICE, and a LINK etc as above (or an INTERFACE definition).

Setting up multiple instances within a Sysplex, or within an LPAR.

In your TCP/IP definitions you can set up

IPCONFIG DYNAMICXCF 10.1.1.6 255.255.255.0 2

and another instance with 10.1.1.5 etc. The DYNAMICXCF says this is within the Sysplex (or LPAR). The software is smart enough to generate the device and link statements automatically. In the netstat home command above, it gave 10.1.1.5 EZASAMEMVS which is eza_SAME_MVS, and it has found a second TCP/IP instance in the same LPAR. See the IBM documentation for dynamic XCF, and for the IPCONFIG statement.

I think you can use any unused IP address range; so you could use 2.2.2.5 and 2.2.2.6 instead of 10.1.1.5 and 10.1.1.6. I believe these addresses are only used within the Sysplex. So as long as these addresses are consistent and not being used elsewhere, the values are not critical.

For a TCP/IP instance TCPIP3 on my LPAR with no external connections netstat home tcp tcpip3 gave me

Home address list:
Address   Link       Flg
-------   ----       ---
10.1.1.7  EZASAMEMVS P
127.0.0.1 LOOPBACK

This has the ever present LOOPBACK, and a virtual connection 10.1.1.7 to a TCP/IP instance in the same LPAR because of the ezaSAMEmvs definition.

A practical path to installing Liberty and z/OS Connect servers – 6 Enabling TLS

Introduction

I’ll cover the instructions to install z/OS Connect, but the instructions are similar for other products. The steps are to create the minimum server configuration and gradually add more function to it.

The steps below guide you through

Overview
planning to help you decide what you need to create, and what options you have to choose
initial customisation and creating a server, creating defaults and creating function specific configuration files, for example a file for SAF
starting the server
enable logon security and add SAF definitions
add keystores for TLS, and client authentication
adding an API and service application
protecting the API and service applications
collecting monitoring data including SMF
use the MQ sample
using WLM to classify a service

With each step there are instructions on how to check the work has been successful.

Configuring TLS

You can configure the server to create a keystore file on its first use. This creates a self signed certificate. This is good enough to provide encryption of the traffic. Certificates sent from the client are ignored as the trust store does not have the Certificate Authority certificate to validate them.
You can use your site’s keystore and trust store. The server can use them to process certificates sent from the client for authentication.

Decide how you want to authenticate

Most of the functions require an https connection. This will require a keystore.

You can decide whether

The server uses the client’s certificate for authentication. If that does not work, either
1. fall back to userid and password, or
2. fail the request; there is no fall back to userid and password.
The server does not use the client’s certificate. Either
1. userid and password are used for authentication, or
2. there is no authentication.

Have the server create a keystore.

You can get Liberty to create a keystore for you. This creates a self signed certificate and is used to encrypt the traffic between client and server. This is a good start, while you validate the set up, but is not a good long term solution.

Create keystore.xml with

<server>
<keyStore id="defaultKeyStore" password="${keystore_password}" /> 

<ssl clientAuthentication="false" 
    clientAuthenticationSupported="false" 
    keyStoreRef="defaultKeyStore" 
    id="defaultSSLSettings" 
    sslProtocol="TLSv1.2" 
/> 
</server>

Add to the bottom of the server.xml file

 <include location="${server.config.dir}/keystore.xml"/>

If you have keyStore id="defaultKeyStore" (it must be defaultKeyStore) and do not have a keystore defined, the server will create the keystore in the default location (${server.output.dir}/resources/security/key.p12) with the password taken from the server.env file.

Restart the server.

I got the messages

CWWKO0219I: TCP Channel defaultHttpEndpoint-ssl has been started 
and is now listening for requests on host 10.1.3.10  
(IPv4: 10.1.3.10) port 9443.

This shows TLS was active, and listening on port 9443.

If the keystore was created, you will get messages like

[AUDIT   ] CWPKI0803A: SSL certificate created in 87.578 seconds. 
SSL key file: /var/zosconnect/servers/d3/resources/security/key.p12 
[INFO    ] Successfully loaded default keystore: 
/var/zosconnect/servers/d3/resources/security/key.p12 of type: PKCS12

The certificate has a problem (a bug). It has been generated with CN=localhost, O=ibm, OU=d3, where d3 is the server name. The Subject Alternative Name (SAN) is DNS:localhost. It should have a SAN of the server’s IP address (10.1.3.10 in my case).

Clients check the SAN and compare it with the server’s IP address.

Chrome complains “Your connection is not private NET::ERR_CERT_AUTHORITY_INVALID”, and gives the option to accept it.
Firefox gives “Warning: Potential Security Risk Ahead”, and the option to accept it.
z/OS Explorer gives a Server certificate alert pop up, saying “Host:10.1.3.10 does not match certificate:localhost” and gives two buttons Decline or Accept.
With curl I got SSL_ERROR_SYSCALL.

You can accept it, and use it until you have your own keystores set up. You can also reset this decision.
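The comparison the clients make between the address they connected to and the certificate’s SAN can be sketched in Python. This is a simplified illustration, not what any particular browser does: it uses exact matching only (no wildcards), and the function name is made up. The input is a list of (type, value) pairs, the shape Python’s ssl module uses for the subjectAltName entry returned by getpeercert().

```python
import ipaddress


def host_matches_san(host: str, subject_alt_names) -> bool:
    """Return True if host matches one of the (type, value) SAN entries."""
    try:
        wanted_ip = ipaddress.ip_address(host)
    except ValueError:
        wanted_ip = None  # host is a name, not an IP address

    for kind, value in subject_alt_names:
        if wanted_ip is not None:
            # An IP address must match an "IP Address" entry, not a DNS entry.
            if kind == "IP Address" and ipaddress.ip_address(value) == wanted_ip:
                return True
        elif kind == "DNS" and value.lower() == host.lower():
            return True
    return False


generated = (("DNS", "localhost"),)          # what the server created
wanted = (("IP Address", "10.1.3.10"),)      # what the clients need
print(host_matches_san("10.1.3.10", generated))
print(host_matches_san("10.1.3.10", wanted))
```

The generated certificate only matches if the client connects using the name localhost, which is why every client above complained when connecting to 10.1.3.10.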

Using a RACF keyring as the keystore.

You can use a file based keystore or a RACF keyring. Below are the definitions for my RACF keyrings. The started task userid is START1. The keystore (containing the private key for the server) is keyring START1/KEY. The server should use key ZZZZ.

The trust store, containing the Certificate Authority certificates and any self signed certificates from clients, is START1/TRUST.

The <ssl.. /> points to the different keystores, so it makes sense to keep all these definitions in one file. You may already have a file of these definitions which you can use from another Liberty server.

<server>

<sslDefault sslRef="defaultSSLSettings"/> 
<ssl clientAuthentication="true" 
    clientAuthenticationSupported="true" 
    id="defaultSSLSettings" keyStoreRef="racfKeyStore"  
    serverKeyAlias="ZZZZ" 
    sslProtocol="TLSv1.2" 
    trustStoreRef="racfTrustStore"/> 
                                                                                                                  
  <keyStore filebased="false" id="racfKeyStore" 
     location="safkeyring://START1/KEY" 
     password="password" 
     readOnly="true" 
     type="JCERACFKS"/> 
                                                                                                                  
  <keyStore filebased="false" id="racfTrustStore" 
     location="safkeyring://START1/TRUST" 
     password="password" 
     readOnly="true" 
     type="JCERACFKS"/>                                                                                                                  
</server>

This sets clientAuthentication="true" and clientAuthenticationSupported="true".

Specify if you want to use a client certificate for authentication

If you specify clientAuthenticationSupported=”true”… the server requests that a client sends a certificate. However, if the client does not have a certificate, or the certificate is not trusted by the server, the handshake might still succeed.

The default keystore will not be able to validate any certificates sent from the client. When connecting from Chrome with certificates set up, I got an FFDC and messages

[INFO ] FFDC1015I: An FFDC Incident has been created: “java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl could not build a valid CertPath.; internal cause is: java.security.cert.CertPathValidatorException: The certificate issued by CN=SSCA8, OU=CA, O=SSS, C=GB is not trusted; internal cause is: java.security.cert.CertPathValidatorException:
[ERROR ] CWWKO0801E: Unable to initialize SSL connection. Unauthorized access was denied or security settings have expired.

If you specify clientAuthentication=”false” (the default) the server does not request that a client send a certificate during the handshake.

If the client certificate is not used, or the certificate authentication fails,

if you specify <webAppSecurity allowFailOverToBasicAuth="true" /> the user will be prompted for userid and password;
if you specify <webAppSecurity allowFailOverToBasicAuth="false" />, or do not specify it, the connection will fail.

If a userid and password can be used, the first time a browser uses the server the user will be prompted for userid and password. As part of the handshake, the LTPA2 cookie is sent from the server. This holds the authenticated userid, encrypted. If you close down the browser and restart it (not just restart it from within the browser) you will be prompted again for userid and password. You can also be prompted for userid and password once the LTPA cookie has expired.

If you are using z/OS Explorer and get a code 401, unauthorised, you may be using a certificate credential (format userid@CertificateAuthority(CommonName)) rather than a userid and password with format of just the userid, for example COLIN. Use “Set Credentials” to change credentials.

You can see what userid is being used for the requests, from the …/logs/http_access.log file.

To make it even more complex you can have different keystores for different connections or ports, but I would not try that just yet.

Map client certificates to a SAF userid

If you are using certificate authentication you will need to map the certificate to a userid using the RACDCERT MAP command.

Testing it

If the server starts successfully you can use a web browser with URL

  https://10.1.3.10:9443/zosConnect/api-docs

and it should display JSON data.

If you get “Context Root Not Found” or code 404 you should wait and retry, as the https processing code is active, but the code to process the requests is not yet active.
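The same check, including the wait and retry on a 404, can be scripted. This is a minimal sketch using only the Python standard library; the function names, the number of attempts and the delay are my choices for illustration. It turns off certificate checking because of the self signed certificate described earlier, so it is only suitable for a first test. If the server insists on authentication you will get an HTTP 401 error instead, which this sketch does not handle.

```python
import json
import ssl
import time
import urllib.error
import urllib.request


def api_docs_url(host: str, port: int = 9443) -> str:
    return f"https://{host}:{port}/zosConnect/api-docs"


def fetch_api_docs(host, port=9443, attempts=5, delay=5.0,
                   opener=urllib.request.urlopen):
    """Fetch the api-docs JSON, retrying while the server returns 404."""
    # Testing only: accept the self signed certificate without checking it.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    for attempt in range(attempts):
        try:
            with opener(api_docs_url(host, port), context=context,
                        timeout=30) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as error:
            # 404 may mean the request processing code is not yet active.
            if error.code != 404 or attempt == attempts - 1:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    print(fetch_api_docs("10.1.3.10"))
```

The opener parameter is only there so the function can be tried out with a stand-in instead of a real server.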

Review the contents of …/servers/…/logs/http_access.log to see the request being issued and the http completion code.

If you have problems connecting clients over TLS add -Djavax.net.debug=ssl:handshake to the jvm.options file and restart the server.

If you connect to the z/OS Explorer, and logon to the z/OS Connect EE Server, you should have a folder for APIs and Services – which may have no elements.