Colin’s list of MQ messages

This blog post is my annotations to MQ messages, containing descriptions of what I did to get an MQ message, and what I did to fix the problem. It is meant for web search programs, rather than humans.

I will extend it as I experience problems.

AMQ7026E: A principal or group name was invalid.

I could display an auth entry

dspmqaut -m qml -t qmgr -g “cn=dynamic, o=Your Company”


Entity cn=dynamic, o=Your Company has the following authorizations for object qml:
connect

but not delete it

setmqaut -m qml -t qmgr -g “cn=dynamic,o=Your Company” -connect

AMQ7026E: A principal or group name was invalid.

I had set this up using LDAP, but MQ was not able to find the record in LDAP. This is because I had changed the configuration. There was an LDAP search

(&(objectClass=groupOfNames)(OU=cn=dynamic, o=Your Company))

It could not be found because there was no LDAP entry with objectClass=groupOfNames with an entry (OU=cn=dynamic, o=Your Company)

I had changed

DEFINE AUTHINFO(MYLDAP) +
AUTHTYPE(IDPWLDAP) +
GRPFIELD(sn)

to

GRPFIELD(ou)

So when MQ came to delete it, it looked for

(&(objectClass=groupOfNames)(OU=cn=dynamic, o=Your Company))

instead of

(&(objectClass=groupOfNames)(SN=cn=dynamic, o=Your Company))

and could not delete it.

I changed the LDAP group to add the OU, and the SETMQAUT command worked.

AMQ5532E: Error authorizing entity in LDAP

EXPLANATION:
The LDAP authorization service has failed in the ldap_first_entry call while
trying to find user or group ‘NULL’. Returned count is 0. Additional context is
‘cn=mqadmin,ou=groups,o=your Company’.

Colin’s comments

I had defined a dynamic group, by specifying the group name as part of the LDAP user entry.

When I changed the authinfo object yo have nestgrp(yes), I got the above message because there was no record for cn=mqadmin,ou=groups,o=your Company’.

Define the record with the appropriate object class as defined by the MQ AUTHINFO attribute CLASSGRP. (CLASSGRP(‘groupOfNames’) in my case).

AMQ5530E: Error from LDAP authentication and authorization service

EXPLANATION:
The LDAP authentication and authorization service has failed. The
‘ldap_ssl_environment_init’ call returned error 113 : ‘SSL initialization call
failed’. The context string is ‘keyfile=”/var/mqm/qmgrs/qml/ssl/key.kdb”
SSL/TLS rc=408 (ERROR BAD KEYFILE PASSWORD)’. Additional code is 0.

Colin’s comments.

I had changed the keyring using alter qmgr CERTLABL(ECRSA1024) SSLKEYR(‘/home/colinpaice/mq/zzserver’)

AMQ5530E: Error from LDAP authentication and authorization service

EXPLANATION:
The LDAP authentication and authorization service has failed. The
‘ldap_simple_bind’ call returned error 49 : ‘Invalid credentials’. The context
string is ‘10.1.1.2:389 ‘. Additional code is 0.

Colin’s comments

An anonymous logon to LDAP was attempted (LDAPUSER and LDAPPWD omitted) and the LDAP server had allowAnonymousBinds off.

Specify userid and password.


LDAP error messages and codes

This blog post is a repository for the LDAP error codes I experienced, and the actions I took to resolve the problems.

LDAP return codes

Messages include return codes like “3”, but the LDAP programming book has terms like “LDAP_PARAM_ERROR”.

These are defined in

/usr/include/ldap*.h,

SSL initialization failures reason codes.

https://www.ibm.com/docs/en/zos/2.5.0?topic=utilities-ssltls-information-ldap-client#idg18488__failrc

GLD1342E Unwilling to open file or directory ‘/var/ldap/schema’:

File or directory UID 0, UID of program 990023, GID of file or directory 1, GIDs of program (990018).

Colin’s comments

The LDAP started task expects to be the file owner of the /var/ldap/* files. On ADCD they were OMVSKERN:OMVSGRP. I used

chown -R gldsrv:gldgrp /var/ldap/*

to change the file owner.

Object class violation: additional info: R001026 No structural object class specified for ‘cn=ibmuser, o=Your Company’.

Colin’s comments

I had an ldif file with

dn: cn=mq, o=Your Company
changetype: add
objectclass: top
#objectclass: person
#objectclass: organizationalPerson
objectclass: ibm-nativeAuthentication
cn: mq
telephoneNumber: 1234567
telephoneNumber: 12345672
sn: Administrator

And no proper object class. When I uncommented person or organizationalPerson it worked.

R001030 Entry contains attribute ‘ibm-nativeid’ which is not allowed for object class

I was trying to add ‘ibm-nativeid’ to an entry. This attributed belongs to object class ibm-nativeAuthentication. The object has to have this object class, for example, add the lines in the bold font.

dn: cn=colin, o=Your Company
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: ibm-nativeAuthentication
cn: LDAP Administrator
sn: Administrator
ibm-nativeId: COLIN

Credentials are not valid: R004062 Credentials are not valid

By accident I overwrote my administration userid definition.

Colin’s comments

Edit GLD.CNFOUT(DSCONFIG) ( or what every config file you are using)

Comment out the adminDN, and add in the cn=admin and its password.

From

adminDN “cn=ibmuser, o=Your Company”
# adminDN “cn=Admin”
# adminPW secret

to

# adminDN “cn=ibmuser, o=Your Company”
adminDN “cn=Admin”
adminPW secret

  • Stop and restart LDAP.
  • Fix the userid
  • Change the admin definitions back
  • restart LDAP.

R003070 Access denied because user does not have ‘write’ permission for all modified attributes

I used a command like ldapmodify -a -h 127.0.0.1 -p 389 -D “cn=adcda, o=Your Company” -w adcdapw1 -f mqacl.* to change some definitions, but the userid cn=adcda,o=Your Company did not have the correct permissions.

You can enable the LDAP acl trace using f gldsrv,debug 128, and reset it using f gldsrv,debug 0

To change/add/delete an ACL the id needs restricted:rscw

For example

dn: o=Your Company
changetype: modify
replace: aclEntry
aclEntry : access-id:cn=ibmuser, o=Your Company:
object:ad:normal:grant:rscw:sensitive:rscw:critical:rscw
aclEntry : access-id:cn=adcda, o=Your Company:
object:ad:normal:rscw:sensitive:rscw:critical:rscw:restricted:rscw

Insufficient access: R003057 Access denied because user does not have ‘add’ permission for the parent entry

Trying to add an entry.

The userid is not authorised to add an entry. It needs an acl with object:ad ( a for add, d for delete)

dn: o=Your Company
changetype: modify
replace: aclEntry
aclEntry : access-id:cn=ibmuser, o=Your Company:
object:ad:normal:rscw:sensitive:rscw:critical:rscw
aclEntry : access-id:cn=adcda, o=Your Company:
object:ad:normal:rscw

GLD1063E Unable to initialize the SSL environment: 416 – Permission denied.

BPXM023I (GLDSRV) GLD1160E Unable to initialize the LDAP client SSL support: Error 113, Reason -17.

ICH408I USER(GLDSRV ) GROUP(GLDGRP ) IRR.DIGTCERT.LISTRING CL(FACILITY) INSUFFICIENT ACCESS AUTHORITY ACCESS INTENT(READ ) ACCESS ALLOWED(NONE )

In the LDAP config file I had sslKeyRingFile START1.MQRING. The userid GLDSRV did not have read access to the list ring facitity IRR.DIGTCERT.LISTRING CL(FACILITY)

permit IRR.DIGTCERT.LISTRING CL(FACILITY) id(GLDSRV)
SETROPTS RACLIST(FACILITY) REFRESH

GLD1160E Unable to initialize the LDAP client SSL support: Error 113, Reason 2.
GLD1063E Unable to initialize the SSL environment: 202 – Error detected while opening the certificate database.

  • reason code 2: Keyring open error
  • SSL return code 202: Keyring open error

Actions

  • Check value specified
  • Check access
    • rdefine rdatalib START1.MQRING.LST UACC(NONE)
    • SETROPTS RACLIST(RDATALIB) REFRESH
    • permit START1.MQRING.LST class(RDATALIB) ACCESS(READ) id(GLDSRV)

Check the keyring exists ( list the contents of it)

RACDCERT LISTRING(name) ID(COLIN)

Get out a gsk trace .

  • Add GSK_TRACE=0xff to the env file.
  • By default the output goes to gskssl.*.trc
  • Format it using gsktrace gskssl.*.trc gsktrace.out
  • oedit gsktrace.out search for ERROR. I had ERROR gsk_open_keyring(): IRRSDL00 GetData failed: SAF 8, RC 8, Reason 84.
  • These are documented here. 8 8 84 means keyring not found.

GLD1116E Unable to initialize an SSL connection …

533 – Remote partner indicates unsupported certificate.

LDAPSEARCH client on Linubbx

ldap_search_ext: Bad search filter (-7)

with -b “o=Your Company” “&(objectClass=*)”

remove the &()s

-b “o=Your Company” “objectClass=*”

worked

What they don’t tell you about using a REST interface.

After I stumbled on a change to my Python program which gave 10 times the throughput to a Web Server, I realised that I knew only a little about using REST. It is the difference between the knowledge to get a Proof Of Concept working, and the knowledge to run properly in production; it is the difference between one request a minute to 100 requests a second.

This blog post compares REST and traditional client server and suggests ways of using REST in production. The same arguments also apply to long running classical client server applications.

A REST request is a stateless, self contained request which you send to the back-end server, and get one response back. It is also known as a one shot request. Traditional client server applications can send several requests to the back-end as part of a unit of work.

In the table below I compare an extreme REST transaction, and an extreme traditional Client Server

AttributeRESTClient Server
ConnectionCreate a new connection for every request.Connect once, stay connected all day, reuse the session, disconnect at end of day.
Workload BalancingThe request can select from any available server, and so on average, requests will be spread across all connections. If a new server is added, then it will get used.The application connects to a server and stays connected. If the session ends and restarts, it may select a different server.
If a new server is added, it may not be used.
AuthenticationEach request needs authentication. If the userid is invalidated, the request will fail. Note that servers cache userid information, so it may take minutes before the request is
re-authenticated.
Authentication is done as part of the connection. If the userid is invalidated during the day, the application will carry on working until it restarts.
IdentificationBoth userid+password, and client certificate can be used to give the userid.Both userid+password, and client certificate can be used to give the userid. If you want to change which identity is used, you should disconnect and reconnect.
CostIt is very expensive to create a new connection. It is even more expensive when using TLS, because of the generation of the secret key. As a result it is very very expensive to use REST requests.The expensive create connection is done once, at start of day. Successive request do not have this overhead, so are much cheaper
Renew TLS session keyBecause there is only one transfer per connection you do not need to renew the encryption key.Using the same session key for a whole day is weak, as it makes it easier to break it. Renewing the session key after an amount of data has been processed, or after a time period is good practice.
RequestSome requests are suitable for packaging in one request, for example where just one server is involved.This can support more complex requests, for example DB2 on system A, and MQ on system B.
Number of connectionsThe connection is active only when it is used.The connection is active even though it has not been used for a long time. This can waste resources, and prevent other connections from being made to the server.
StatisticsYou get an SMF record for every request. Creating an SMF record costs CPU.You get one SMF record for collection of work, so reducing the overall costs. The worst case is one SMF record for the whole day.

What are good practices for using REST (and Client Server) in production?

Do not have a new connection for every request. Create a session which can be reused for perhaps 50 requests or ten minutes, depending on workload. This has the advantages :

  • You reduce the costs of creating the new connection for every request, by reusing the session.
  • You get workload balancing. With the connection ending and being recreated periodically, you will get the connections spread across all available connections. You should randomise the time a connection is active for, so you do not get a lot of time-out activity occurring at the same time
  • You get the re-authentication regularly.
  • The TLS key is renewed periodically.
  • You avoid the long running connections doing nothing.
  • For a REST request you may get fewer SMF records, for a Client-Server you get more SMF requests, and so more granular data.

How can I do this?

With Java you can open a connection, and have the client control how long it is open for.

With Python and the requests package, you can use

s = requests.Session()
res = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert)

res = s.get(geturl,headers=my_header,verify=v,cookies=jar,cert=cpcert)
etc

With Curl you can reuse the session.

Do I need to worry if my throughput is low?

No, If you are likely to have only one request to a server, and so cannot benefit from having multiple requests per connection you might just as well stay with a “one shot” and not use any of the tuning suggestions.

One Minute MVS performance – TCP/IP

Question: In your car how do you tell if your car has a problem? Answer: You look at the dashboard and see if there is a red light showing. You may not know how to fix it – but you know that you need to get help to fix it.

The aim of this series of blog posts is to show you what to look for in z/OS performance and if you have a problem.

I will cover

What is a TCP/IP performance problem?

People complain about a TCP/IP performance problem when “it” seems slow. This could be caused by a variety of problems

  • Data between two ends is being discarded. This can occur on an unreliable, or overloaded component, whose default action is to throw away data, knowing it will be resent.
  • The time taken to get from one end to the other and back (“a ping”) is slow. This can be caused by slow or overloaded components.
  • There is a lot of data to send, for example a movie, or a web page with lots of javascript or graphics.
  • Or all of the above.

There is a quote “Never under estimate the bandwith with of a lorry full of tapes”. It might take 10 hours, but a truck 6 ft wide by 20 ft long could hold 300,000 1TB tapes and deliver 8 TBytes/second (with a round trip time of 20 hours). Which is more than the internet can provide!

You need to know

  • Are packets being thrown away? You see this from the number of packets which were resent.
  • What is the round trip time? (You could use ping – but you may not be able to)
  • Is data being sent efficiently – in big blocks?

TCP/IP concepts

With TCP/IP there is a connection between a sender and a receiver. The sender sends numbered packets of data to the receiver. The receiver sends an acknowledgement that a packet has been received.

The following is a representation of the flow

  • The sender sends packet 1
  • The sender sends packet 2
  • The sender sends packet 3
  • The receiver receives packet 1 and sends an acknowledgement for packet 1
  • The sender sends packet 4
  • The receiver receives packet 2 and sends an acknowledgement for packet 2
  • The sender waits until the acknowledgement of packet 1 has been received
  • The sender sends packet 5 and waits till the acknowledgement of packet2 has been received
  • etc

This way it is self limiting. It means the sender cannot send more than the receiver can handle.

If a packet goes missing, eventually the sender gets a time out, and resends it.

There are two parts to “performance”.

  1. FTP like: How much data can be sent per second. This is of interest to FTP and MQ, where there is mainly a one way transmission of lots of data. The round trip time is not so critical if you can have a lot of data in transit.
  2. Transactional: Send some data and wait for the remote end to respond, for example a web browser. The amount of data may be measured in KB, but the round trip time is important.

The term “window” is often used in TCP/IP.

The term “send window” on the sender side represents the total number of packets yet to be acknowledged by the receiver. With a bigger window, there is more data in the pipe line, and the throughput goes up. With a window of 1, one packet is sent and the sender waits for the acknowledgement before sending the next. With this, if there is a high latency, the overall throughput will be low.

More details

One of the factors that affects performance is the receive buffer size. If this was set to 4KB, it means that an application can read up to 4 KB of data at a time. This receive buffer size is sent to the sender, and basically says “send chunks up to this size – as that is all the receiver can take” – this sets the send-buffer-size.

The term Dynamic Right Sizing(DRS) allows the TCP receive buffer size to expand if the network conditions are favourable.

The term Outbound Right Sizing(ORS) allows the TCP send buffer size to expand if the network conditions are favourable.

Another term used is congestion window. If too much data is sent, or the network is unreliable, packets will get lost or thrown away. The congestion window is a measure of how much data can be in-flight. If packets get lost, the congestion window is made smaller. If packets are not lost, then it will try to increase the congestion window. This is a very rough indication of the quality of the network.

FTP like performance

There are several factors which can improve the throughput down a connection

  • Make packets bigger. In the early days of TCP/IP a typical packet was 256 bytes. These days a typical default packet size can be 64KB or more.
    • One of the Smarts in the protocol is called dynamic right sizing, where TCP will send increasing larger packets until the receiver says “big enough”. The packet size can change with load.
  • How much data to send before waiting for the acknowledgement. For a reliable connection, where data is never lost, it is efficient to send a lot of data before waiting. This is called a large send window.
  • If the connection is unreliable, it may be more efficient to have only a small send window, before waiting for the acknowledgement.

Transactional work

  • Having big buffers may not improve throughput, for example with a web page, the data may all fit into 2KB. In this case having a buffer size of 16KB or 64 KB may make no difference to throughput or performance.
  • Typically if one packet contains all the data, then this will be acknowledge as soon as it arrives.
  • Some web pages with a lot of javascript or images, may require big buffers, and many packets.

How to see what is going on

You can use the well known “ping” command to send data to the remote end, and get the response. This gives a measure of the network time.

I found most of the data for looking at performance, is available from the netstat command. I found it useful to capture the output of the command in a file or data set.

What connections are connected to this server?

I use the netstat command in TSO , because my fingers are more used to it, and the command options are more memorable than the omvs command ( for example with omvs netstat, do I need the -a or -A option)

netstat conn (port 1414
netstat conn report hlq colin ( port 1414
netstat conn report dsn ‘colin.output’ ( port 1414

These all gave the same output. The report hlq colin creates a data set colin.netstat.conn. The data set name is from the hlq, ‘netstat’, and the subcommand. You can specify a data set name using the ‘dsn’ option.

For omvs you can use

netstat -c -p TCPIP -P 1414 > filename

That lists all of the connections for port 1414.

The command gave me

MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:18:34    
User Id  Conn     Local Socket           Foreign Socket         State    
-------  ----     ------------           --------------         -----    
CSQ9CHIN 00000023 10.1.1.2..1414         10.1.0.2..60538        Establsh 
CSQ9CHIN 00000022 0.0.0.0..1414          0.0.0.0..0             Listen   

There is one connection established from 10.1.0.2 port 60538 to the server with the port listening on 1414.

The commands below give a lot of information about the connection

netstat all report hlq colin (ipport 10.1.0.2+60538
netstat -A -p TCPIP -B 10.1.0.2+60538 > all.port1

Output from the netstat command

The fields are described at the bottom of this page.

Both commands gave me the same output.

There is a lot of data. I’ve broken it into sections with comments after the interesting fields.

  MVS TCP/IP NETSTAT CS V2R4       TCPIP Name: TCPIP           09:23:29 
  Client Name: CSQ9CHIN                 Client Id: 00000023 
  Local Socket: 10.1.1.2..1414          Foreign Socket: 10.1.0.2..60538 
  BytesIn:            0000002988        BytesOut:           0000002912 
  SegmentsIn:         0000000019        SegmentsOut:        0000000011   
  
  • 09:23:29 is the time when request was made. If you repeat the command you can get the interval between commands, and so calculate rates.
  • You get the client (job) name CSQ9CHIN.
  • The listener socket for the job (local socket) 10.1.1.2 with port 1414.
  • The foreign socket – the remote end of the connection. IP address 10.1.0.2 port 60538.
  • You can get the data rate If you repeat the command, calculate the deltas BytesIn and BytesOut, and divide by the time between measurement.
  StartDate:          06/16/2021        StartTime:          10:00:21 
  Last Touched:       10:20:37          State:              Establsh 
  RcvNxt:             2019327903        SndNxt:             0864946572 
  ClientRcvNxt:       2019327903        ClientSndNxt:       0864946572 
  InitRcvSeqNum:      2019324914        InitSndSeqNum:      0864943659 
  CongestionWindow:  0000018720        SlowStartThreshold: 0000065535 
  

Look at the congestion window. Big is good. Small may indicate small amounts of data being sent or it may indicate network problems, either slow connections or packets are being dropped.

  IncomingWindowNum:  2019458463        OutgoingWindowNum:  0865008524 
  SndWl1:             2019327903        SndWl2:             0864946572 
  SndWnd:             0000061952        MaxSndWnd:          0000064256 
  

Check the send window. A small (1KB) send window can indicate poor configuration at the remote client, or only small amounts of data are being sent.

  SndUna:             0864946572        rtt_seq:            0864946064 
  MaximumSegmentSize: 0000001440        DSField:            00 
  Round-trip information:
    Smooth trip time: 6.000              SmoothTripVariance: 12.000 
  

Monitor the smooth route trip time (in milliseconds) this the local end to the remote end, and back. The variance gives a measure of the spread of response times. These are not strictly averages.

If you had a million requests taking 1 millisecond, and then had a long request taking 1000 milliseconds. The “Average” response time would change by a very small amount (to 1.09 milliseconds). The smoothed (or weighted average) may be something like – (99 * previous average + current value) /100. In this case the “average” goes up to 10.9 milliseconds, which is noticeable different.

  ReXmt:              0000000000        ReXmtCount:         0000000000 

The re transmits should be zero – or not changing. If this number increases it means the network has lost packets.

  DupACKs:            0000000000        RcvWnd:             0000130560  

The receive window is usually set to 2 * receive buffer.

   SockOpt:            88                TcpTimer:           00   

Check SockOpt. Check bit 0x08. If set this indicates “delayed acknowledgement disabled”. See Nagle algorithm. This value being set is good.

If this is not set, then sender can delay sending data for up to about 200 ms, and so combine data from different applications into the same packet for the same destination. This reduces network traffic as there are fewer packets, but it delays the data being sent.

  TcpSig:             04                TcpSel:             40 
  TcpDet:             E4                TcpPol:             00 
  TcpPrf:            81                TcpPrf2:            20 
  TcpPrf3:            00

For FTP type applications check the TCP Performance Flag TcpPrf. This says if Dynamic Right sizing (using bigger buffers) is enabled. The flag bits are x80 – enabled, x40 Active, x20 Active but disabled. X80 |X40 is good.

The TCP performance flag2 TcpPrf2. This is for outbound right sizing (ORS). A non zero value is good.

  DelayAck:           Yes 
  QOSPolicy:          No 
  TTLSPolicy:         No 
  RoutingPolicy:      No 
  ReceiveBufferSize:  0000065536        SendBufferSize:     0000065536  

These buffer sizes should be large with 64KB or larger, if so the system can dynamically increase them.

They can be configured at the TCP/IP level, or by the application. If they are 64KB or higher then TCP Dynamic Right Sizing can be used (adjust the buffers to match the load).

  ReceiveDataQueued:  0000000000 
  SendDataQueued:     0000000000  

These should always be zero.

  • Received data queued means the application is slow to retrieve the data
  • Send data queued – the application has issued a send – but TCP/IP cannot process it.
  SendStalled:        No 
  Ancillary Input Queue: N/A  

Send stalled should always be no.

What do you need to check?

  • SendStalled, ReceiveDataQueued,SendDataQueued should all be 0. They usually are 0. They would be non zero if there was a problem right now. If the problem gets better, these values would be 0.
  • Check ReXmt = The total number of times a packet has been retransmitted for this connection. This count is historical for the life of the connection.
    • If this is zero then there have been no re transmits, and so no packets lost.
    • If this is non zero, then it could be a historical problem. Wait and reissue the netstat command. If the ReXmt value has changed, this indicates packets are being lost.
  • Check the round trip time (and variance). Is the value what you expected? If there is traffic flowing on the connection, display the value multiple times, and see if there is significant variation.
  • Check ReceiveBufferSize and SendBufferSize. Values of 64KB or larger are good. Small is not good.
  • Check congestion window.

It is good to have some data for a normal day, and a problem day. For example if the packets are often lost, then this may not be the problem. If the SendBufferSize is only 8KB today and was 64KB last week – this would a good place to start looking. So capture and save NETSTAT reports for typical sessions.

What about connections into z/OS

Windows has a netstat command.

On Linux Netstat has been superseded with ss for example

ss –info dst 10.1.1.2
ss –info dst 10.1.1.2:1414
ss –info src 101.0.2

This is ss dash dash info …

gives similar information for connections going to 10.1.1.2, or the address and port 10.1.1.2:1414

Example netstat output from a slow FTP in connection

Client Name: IBMUSER                  Client Id: 000006FE 
Local Socket: 10.1.1.2..1109          Foreign Socket: 10.1.0.2..35508 
  BytesIn:            0220191104        BytesOut:           0000000000
  SegmentsIn:         0000152946        SegmentsOut:        0000083051
  StartDate:          06/28/2021        StartTime:          13:47:56 
  Last Touched:       14:24:28          State:              Establsh 
  RcvNxt:             3569682809        SndNxt:             2105824963
  ClientRcvNxt:       3569577977        ClientSndNxt:       2105824963
  InitRcvSeqNum:      3349491704        InitSndSeqNum:      2105824962
  CongestionWindow:   0000005760        SlowStartThreshold: 0000065535
  IncomingWindowNum:  3569946679        OutgoingWindowNum:  2105889219
  SndWl1:             3569681369        SndWl2:             2105824963
  SndWnd:             0000064256        MaxSndWnd:          0000064256
  SndUna:             2105824963        rtt_seq:            2105824962
  MaximumSegmentSize: 0000001440        DSField:            00 
  Round-trip information: 
    Smooth trip time: 3.000             SmoothTripVariance: 2.000 
  ReXmt:              0000000000        ReXmtCount:         0000000000
  DupACKs:            0000000000        RcvWnd:             0000263870 
  SockOpt:            A0                TcpTimer:           00 
  TcpSig:             04                TcpSel:             40 
  TcpDet:             E0                TcpPol:             00 
  TcpPrf:             E0                TcpPrf2:            28 
  TcpPrf3:            00 
  DelayAck:           Yes 
  QOSPolicy:          No 
  TTLSPolicy:         No 
  RoutingPolicy:      No 
  ReceiveBufferSize:  0000184351        SendBufferSize:     0000184320 
  ReceiveDataQueued:  0000104832 
    OldQDate:         06/28/2021        OldQTime:           14:24:27 
  SendDataQueued:     0000000000 
  SendStalled:        No 
  Ancillary Input Queue: N/A 
  Application Data:   EZAFTP0S D IBMUSER   C      FSSH 

Comments

  • Congestion window low
  • Smooth trip time: 3.00 good
  • ReXmt: 0 good
  • Receive buffr 184351- good
  • Receive buffer queued 104832 – BAD

How to move a queue from one page set to another page set on z/OS?

I was asked this excellent question, and a quick search in the documentation showed there is a section in the documentation How to balance loads on page sets. Great – this worked a treat – for user queues, but there are a few additional things you need to consider. You also need to be careful when moving system queues.

Moving application queues.

Once you have moved the queues, you should backup the definitions so if you have to recreate the queue manager, you have a copy of the correct definitions you can use.

You need to update your central repository with the new storage class, and the updated definition for the queues and storage class. This is for when you deploy a new queue manager, it picks up the correct definitions.

Moving system queues.

This is the same as for application queues, but you have to do more.


Many people use the CSQINP1 and CSQINP2 data sets provided by the queue manager so they are executed at startup. This is what happens if you use the default QMGR JCL. If you move the SYSTEM.* queues you will need to make a copy of the datasets, make changes to the data sets, so the objects have the correct storage class, and then change the queue manager job to point to the new data sets. Alternatively create a file with the DEFINE QL.. objects you have changed, and have this member first in the list. This file would be executed first. If the objects do not exist, they would be created.

Note: If definitions have DEFINE … REPLACE the definition will override any existing definition.

SYSTEM.COMMAND.INPUT queue

You will not be able to move SYSTEM.COMMAND.INPUT using commands in CSQUTIL, as the command processor reads from this queue. You need to

+cpf alter ql(SYSTEM.COMMAND.INPUT) get(disabled) put(disabled)

This will stop the command processor.

Use commands from the operator console to move it to the new page set

Once you have moved the queue use

+cpf start cmdserv

to restart the command server.

If you want to use SYSTEM.* objects used by the CHINIT you will need to stop the CHINIT for the duration of the moves.

Other system queues

You should also review any model queues, for example SYSTEM.CLUSTER.TRANSMIT.MODEL.QUEUE, and SYSTEM.COMMAND.REPLY.MODEL, so any future queues are created on the correct page set.

Someone pointed out that they were not able to move SYSTEM.PENDING.DATA.QUEUE because a system thread had it open. The altered the queue to get(disabled), the system thread closed the queue, and so they were able to move it.

When to do this?

You may want to schedule an outage while moving queues around, especially SYSTEM.* queues. The moves should be very quick (unless you have deep queues). The other tasks may take longer to do.

MQ Context on z/OS

Having struggled to get MQ Context working on mid range MQ, I thought I would try the same on z/OS.

If you want to allow applications to set Putdate, Putime, PutApplName etc. The application needs access to MQ Context. MQ MCA channels use this when putting a message from a remote queue manager, to keep the original values.

Which profiles are used?

You can disable context checking by defining a profile ‘qmgr.NO.CONTEXT.CHECKS’. If you want to enable context checking remove this profile if it exists.

You can display it using

RLIST MQADMIN CSQ9.NO.CONTEXT.CHECKS

You configure queue context using the profile qmrg.context.queue

for example

RLIST MQADMIN CSQ9.CONTEXT.CP0000 all
CLASS NAME
----- ----
MQADMIN CSQ9.CONTEXT.** (G)
...
LEVEL  OWNER      UNIVERSAL ACCESS  YOUR ACCESS  WARNING
-----  --------   ----------------  -----------  -------
 00    IBMUSER          NONE             ALTER    NO
...
USER ACCESS
---- ------
IBMUSER ALTER

This says that for the queue CP0000, display the profile CSQ9.CONTEXT.CP0000. It returned

  • MQADMIN CSQ9.CONTEXT.** this is the profile used
  • IBMUSER ALTER the only user authorised to this resource – with ALTER access it IBMUSER
  • The default access is NONE.

When a userid tried to open the queue – with set context options, the open got return code 2035 and a message on the console.

ICH408I USER(COLIN ) GROUP(SYS1 ) NAME(COLIN PAICE )
CSQ9.CONTEXT.CP0000 CL(MQADMIN )
INSUFFICIENT ACCESS AUTHORITY
FROM CSQ9.CONTEXT.** (G)
ACCESS INTENT(CONTROL) ACCESS ALLOWED(NONE )

This shows the resource used CSQ9.CONTEXT.CP0000. The RACF profile used was CSQ9.CONTEXT.**. The userid had NONE access, and wanted CONTROL access.

You could define a more specific profile for example CSQ9.CONTEXT.CP*, and that would be used in preference to the CSQ9.CONTEXT.** profile.

The z/OS documentation Determining RACF protection says

Although multiple generic profiles can match a general resource name, only the most specific profile
actually protects it. For example, AB.CD, AB.CD.* and AB.**.CD all match the general resource name AB.CD, but AB.CD.* protects the resource.

With Midrange MQ on Unix, the permission is taken from all of the groups the userid is in- if one of the userid’s groups has get authority, the userid has get authority. With z/OS just one profile is used.

Changing a profile – don’t forget to refresh.

When changing a profile you need to remember to refresh the RACF in memory profiles, and tell MQ to pick up the changes.

I changed a profile

ralter MQADMIN CSQ9.CONTEXT.** UACC(CONTROL)

Refreshed the RACF in-memory profiles

setropts racflist(MQADMIN) refresh

And told MQ to refresh its profiles

%csq9 refresh security

How easy is it to display security information for MQ on z/OS?

I asked this question for midrange, and here are the answers for z/OS.

Key question are

Displaying security information

The RACF commands are

RLIST to display profile information, and who has access to the profile

SEARCH This allows you to search for profiles matching a parameter.

RLIST display profile information

An example command and output

RLIST MQADMIN CSQ9.CONTEXT.CP0000 all
CLASS NAME
----- ----
MQADMIN CSQ9.CONTEXT.** (G)
...
LEVEL  OWNER      UNIVERSAL ACCESS  YOUR ACCESS  WARNING
-----  --------   ----------------  -----------  -------
 00    IBMUSER          NONE             ALTER    NO
...
USER ACCESS
---- ------
IBMUSER ALTER

This says display the profile CSQ9.CONTEXT.CP0000. It returned

  • MQADMIN CSQ9.CONTEXT.** this is the profile used by RACF to determine the permissions
  • IBMUSER ALTER the only user authorised to this resource is IBMUSER, with ALTER access
  • The default access for any userid not covered is NONE.

SEARCH for a profile

An example command to list all MQQueue profiles for queue manager CSQ9.

SEARCH CLASS(MQQUEUE) FILTER(CSQ9.*)
CSQ9.AMSQ
CSQ9.NONE
CSQ9.ZZZZ
CSQ9.** (G)

Is a user authorised to use this queue?

Use RLIST to tell you the profile used for checking

  • Check the Universal Access
  • Check to see if the userid in the list
  • Check the groups in the list and see if the userid is a member of the group.

Which profile gave what access to the queue

Use the RLIST MQQUEUE qmgr.queueName.

Who is authorised to this queue

Use the rlist command as described above. You may have to write a script to post process the data, and replace the group name with the member of the group. I used the Rexx interface IRRXUTIL and wrote about 100 lines of code to do this. Please contact me if you are interested in this.

Can I audit the list of people and their access to queues beginning with CP?

Not easily.

The command

SEARCH CLASS(MQQUEUE) FILTER(CSQ9.CP*)

gives ICH31005I NO ENTRIES MEET SEARCH CRITERIA

The command

SEARCH CLASS(MQQUEUE) FILTER(CSQ9.A*)

Gives one queue (CSQ9.AMSQ). It does not list the default CSQ9.** for any other queues

You would have to issue the MQ command to get a list of queues, the parse the list, and pass the queue name to the RLIST command, and collect the set of userids and groups. Finally, change any groups to the list of members of the group.

I used the Rexx interface IRRXUTIL and wrote about 100 lines of code to do this. Please contact me if you are interested in this.

Where do the security violations go for MQ on z/OS?

This question came in from a customer who was reviewing the subsystem security on z/OS. For example CICS reports its own violations.

MQ security violations are reported by the security manager, RACF, and are displayed on the job log.

MQ delegates the security checks to RACF, so auditing is mostly done by RACF. The only exception is the RESLEVEL profile, which MQ writes its own audit records to RACF.

See a section in the IBM documentation.

For example, userid COLIN is not authorised to issue MQ commands, so there are messages on the job log.

%CSQ9 START CHINIT
ICH408I USER(COLIN ) GROUP(SYS1 ) NAME(COLIN PAICE )
CSQ9.START.CHINIT CL(MQCMDS )
INSUFFICIENT ACCESS AUTHORITY
FROM CSQ9.** (G)
ACCESS INTENT(CONTROL) ACCESS ALLOWED(NONE )

%CSQ9 DEF QL(AAAA)
ICH408I USER(COLIN ) GROUP(SYS1 ) NAME(COLIN PAICE )
CSQ9.DEFINE.QLOCAL CL(MQCMDS )
INSUFFICIENT ACCESS AUTHORITY
FROM CSQ9.** (G)
ACCESS INTENT(ALTER ) ACCESS ALLOWED(NONE )

Trying to use a queue

ICH408I USER(COLIN ) GROUP(SYS1 ) NAME(COLIN PAICE )
CSQ9.ZZZZ CL(MQQUEUE )
INSUFFICIENT ACCESS AUTHORITY
ACCESS INTENT(UPDATE ) ACCESS ALLOWED(NONE )

The queue had been define with AUDITING FAILURES(READ)

Another queue had been defined with NOTIFY(COLIN). This means that whenever there was a violation, userid COLIN got a message sent to its TSO session.

RACF reports violations and audit information to SMF. You can use standard RACF facilities, such as RACF report writer, to process the SMF data.

Using RACF report writer

This RACFRW command is documented in the Z/OS Security Server RACF Auditors Guide. (Note this is deprecated, but the replacement seems to leave it to the user to do all the summarising etc.)

//SMFDUMP EXEC PGM=IFASMFDP,REGION=0M
//SYSPRINT DD SYSOUT=A
//ADUPRINT DD SYSOUT=A
//OUTDD DD DISP=(MOD,CATLG),DSN=IBMUSER.SMF,
// SPACE=(CYL,(5,5)),
// DCB=(BLKSIZE=13000,RECFM=VB)
//SMFDATA DD DISP=SHR,DSN=SYS1.S0W1.MAN1
//SMFDATB DD DISP=SHR,DSN=SYS1.S0W1.MAN2
//SMFOUT DD DISP=(NEW,PASS,DELETE),SPACE=(CYL,(10,1)),
// DSN=&&SMFOUT
//SYSIN DD *
  INDD(SMFDATA,OPTIONS(DUMP))
  INDD(SMFDATB,OPTIONS(DUMP))
  OUTDD(SMFOUT,TYPE(020,030,080,081,083))
  DATE(2020221,2022221)
  START(0000)
  ABEND(NORETRY)
  USER2(IRRADU00) 
  USER3(IRRADU86) 
/* 
//S1  EXEC PGM=IKJEFT01,REGION=0M 
//SYSPRINT DD SYSOUT=* 
//SORTWK01 DD  DISP=(NEW,PASS,DELETE),SPACE=(CYL,(10,1)), 
//             DSN=&&SORT1 
//SYSTSPRT DD SYSOUT=* 
//RSMFIN  DD DISP=(SHR,DELETE),DSN=*.SMFDUMP.SMFOUT 
//SYSTSIN DD * 
  RACFRW TITLE('RACF REPORTS') GENSUM 
  SELECT VIOLATIONS 
  SUMMARY RESOURCE BY(USER) 
  END 
/* 

The report gave


USER/                                               -------- I N T E N T S--------           
    *JOB                  SUCCESS WARNING VIOLATION ALTER CONTROL UPDATE READ TOTAL 

MQCMDS =+CSQ9.REFRESH.SECURITY                                                                                    
    COLIN      COLIN PAICE      0       0         1     1       0      0    0     1 

MQQUEUE=CSQ9.ZZZZ 
    ADCDC      ADCDC            0       0         1     0       0      1    0     1 
    COLIN      COLIN PAICE      0       0         6     0       0      6    0     6

From this we can see userid COLIN (with owner’s name COLIN PAICE) had 6 violations trying to get UPDATE access to the queue(MQQUEUE) ZZZZ in queue manager CSQ9.

The userid COLIN also tried to use the REFRESH SECURITY command. The + in +CSQ9, means that a generic profile was used. There was one violation, needing ALTER access.

Auditing successes

When the queue had AUDITING ALL(READ) it wrote a record for all accesses to the queue – success or failure.

using

//SYSTSIN DD *
RACFRW TITLE('RACF REPORTS') GENSUM
SUMMARY RESOURCE BY(USER)
END
/*

and no Select statement, it reported all records. I had an application which opened a queue for output, put a message to it, opened the queue for input, got the message. The output of RACFRW had

USER/                                               -------- I N T E N T S--------           
    *JOB            SUCCESS WARNING VIOLATION ALTER CONTROL UPDATE READ TOTAL                                                                     
MQADMIN = 
    COLIN   COLIN PAICE    8     0         0     0       0      0    0      8
MQQUEUE = CSQ9.ZZZZ 
    COLIN   COLIN PAICE   14     0         1     0       0     15    0     15
    IBMUSER                2     0         0     2       0      0    0      2

For every open/close of the ZZZZ queue, there were two opens for update, and and open of the MQADMIN class – with no object.

With AUDITING FAILURES(READ), so only failures of READ access or above are logged, the output was

USER/                                               -------- I N T E N T S--------           
    *JOB            SUCCESS WARNING VIOLATION ALTER CONTROL UPDATE READ TOTAL                                                                     
MQADMIN = 
    COLIN   COLIN PAICE    2     0         0     0       0      0    0      2

With an entry once for each job.

How to administer AMS policies, and use the set policy command.

I had been using the setmqspl command (on z/OS and midrange) to manage my AMS policies. This command has the drawback that if you want to change a policy, for example add a new recipipient, you had to specify the whole command. Jon Rumsey pointed out the mid range MQSC commands “set policy” and “display policy” which allow you to add, delete, or replace; recipients and signers.

Examples of midrange runmqsc set policy command

Exporting parameters

If you want to keep a copy of the AMS definitions you can use display policy command, but this gives output like RECIP(CN=BBB,C=GB), without quotes. The set policy command needs the value within single quotes. The dmpmqcfg command does not support AMS policies.

To be able to capture the output so you can reuse it, you need to use the dspmqspl -export command. This gives output like

setmqspl -m QMA -p ABC -s SHA512 -e AES256 -r “CN=BBB,C=GB” -c 0 -t 0

This gives the parameters if a format that can be used directly.

Add or remove recipients or signers

Using runmqsc define a policy using the default action(replace)

set policy(ABC) signalg(SHA512) recip(‘CN=AAA,C=GB’)  ENCALG(AES256) 

You can add a new recipient

set policy(ABC) signalg(SHA512) recip(‘CN=BBB,C=GB’) ENCALG(AES256) action(ADD)

You can now display it

DIS policy (ABC)

AMQ9086I: Display IBM MQ Advanced Message Security policy details.
POLICY(ABC) SIGNALG(SHA512)
ENCALG(AES256)
RECIP(CN=BBB,C=GB)
RECIP(CN=AAA,C=GB)
KEYREUSE(DISABLED)
ENFORCE

You can delete a recipient

set policy(ABC) SIGNALG(SHA256) ENCALG(AES128) RECIP(‘CN=AAA,C=GB’) action(remove)

and display it

DIS policy(Abc)
AMQ9086I: Display IBM MQ Advanced Message Security policy details.
POLICY(ABC) SIGNALG(SHA512)
ENCALG(AES256) RECIP(CN=BBB,C=GB)


KEYREUSE(DISABLED)
ENFORCE

You have to specify SIGNALG and/or ENCALG each time, but for action(REMOVE|ADD) it can have any valid value (except NONE). The value is only used when ACTION(REPLACE) is used, or ACTION() is omitted. The following will add the recipient, and not change the signalg or encalg values.

set policy(ABC) recip(‘CN=CCC,C=GB’) action(ADD) signalg(MD5) encalg(RC2)

You can specify multiple RECIP

set policy(ABC) signalg(SHA512) recip(‘CN=BBB,C=GB’) recip(‘CN=DDD,C=GB’) ENCALG(AES256) action(ADD)

or multiple signers

set policy(ABC) signalg(SHA512) signer(‘CN=BBB,C=GB’) signer(‘CN=DDD,C=GB’) ENCALG(AES256) action(ADD)

or multiple signers and recipients.

Changing other parameters

If want to change an algorithm, the tolerate|enforce that every message must be protected, or the key reuse, then you must use the action(replace), and specify all the parameters, so it might be easier to use setmqspl -m … -policy … -export, and output it to a file, then modify the file.

Administering AMS on z/OS

On z/OS (and mid-range) you have dspmqspl and setmqspl commands. With the setmqspl command, you replace the entire statement.

It is good practice to have a PDSE with all of your definitions in, one member per policy, or perhaps all policies in one member – depending on how many policies you have. If you have a problem with your queue manager, you have a copy of the definitions.

Another good practice is to take a copy of a definition before you make the change (and keep it unchanged), so you can roll back to it if you need to undo a change.

You can use the export command, to output all policies, or a selected policy. You can have this going into a sequential data set or a PDSE member. You might want to have two copies,

  1. The before image – from before the change
  2. The copy you update.

Of course you could always use the previous copy, but you cannot tell if someone has updated the definitions outside of your change control system, so taking a copy of the existing definitions is a good idea. You could always compare the previous copy, with the copy you just created to check there were no unauthorised changes.

You may want to make the same change to multiple queue managers, so having updates in a PDSE member is a good way of doing it. Just change the queue manager name and rerun the job.

On z/OS, remember to use the refresh command on the AMS address space for it to pick up any changes.

Other AMS blog posts

checkAMS: program to check your AMS defintions are consistent with z/OS keyring

A C program to verify that the certificates in MQ AMS configuration are in a RACF keyring. See here.

Overview of program

With AMS you specify the Distinquished Names(DN) of users who are allowed to sign or encrypt MQ messages. The certificates for these DN’s need to be in the xxxxAMSM’s drq.ams.keyring. If they are not present, or have problems, such as they are not valid, the messages from AMS are not very helpful. The messages are as helpful as “one of the DN’s in the configuration has a problem but I am not telling you which DN it was, nor what the problem was”.

CheckAMS has two parts:

  1. Provide a useful list of information in the keyring
  2. Takes the output of the AMS dspmqspl command, and checks the DN’s are in the key store

Provide a useful list of the contents of a keyring.

With the RACDCERT commands you can list the contents of a keyring, for example owner and label; and you can display details about a certificate, such as the DN of the subject, and the Certificate Authority, but you cannot issue one command to display all the important information, nor ask, “is the DN for this issuer in the keystore”.

Example output from checkAMS, listing certificates in keyring:

Subject CN=SSCARSA1024,OU=CA,O=SSS,C=GB                                                         
Issuer  CN=SSCARSA1024,OU=CA,O=SSS,C=GB                                                         
Self signed                                                                                     
Valid date range 21/02/13 12:32:33 to 24/02/13 12:32:33                                         
Owner irrcerta/LINUXCA                                                                          
Usage:Certauth Status:Trust                                                                     
                                                                                                           
Subject CN=colin,OU=longou,O=SSS                                                                
Issuer  CN=TEMP4Certification Authority,OU=TEST,O=TEMP                                          
Valid date range 21/03/25 00:00:00 to 22/03/25 23:59:59                                         
Owner COLIN/TEST                                                                                
Usage:Site Status:Trust      

The first certificate is owned by irrcerta and has label LINUXA. Userid irrcerta means it belongs to CERTAUTH. The certificate is self signed, and has a long validity date. It has a usage of CERTAUTH, and is trusted.

The second certificate belongs to userid COLIN, it has label TEST. It has a subject DN of Subject CN=colin,OU=longou,O=SSS, and was issued by CN=TEMP4Certification Authority,OU=TEST,O=TEMP. It has a usage of Site, and is trusted.

Check the AMS set up

The program takes as input the output of the dspmqspl -m… -export command, and checks the DN against certificates in the keyring.

Example output

Userid START1, ring drq.ams.keyring                                                                                  
* Exported on Mon Mar 29 09:23:31 2021                                                                               
                                                                                                                      
dspmqspl -m CSQ9  -export                                                                                          
setmqspl -m CSQ9                                                                                                     
 -p AMSQ                                                                                                             
 -s SHA256                                                                                                           
 -a "CN=COLIN,O=SSS"                                                                                                 
   Owner COLIN/AMS Usage:Site Status:Trust Valid date range 21/03/21 00:00:00 to 22/03/21 18:45:00                  
 -a "O=aaaa, C=GB,CN=ja2"                                                                                            
 ! O=aaaa,C=GB,CN=ja2 Not found in key ring                                                                           
 -e AES256                                                                                                           
 -r "CN=COLIN,O=SSS"                                                                                                 
  Owner COLIN/AMS Usage:Site Status:Trust Valid date range 21/03/21 00:00:00 to 22/03/21 18:45:00                  
 -r "CN=ADCDB,O=SSS"

This shows the keyring was START1/drq.ams.keyring.

It prints out the exported file, and for the -a and -r records, it adds information about the certificate, or reports if it is not found.

It reports that “CN=COLIN,O=SSS” was found, the certificate belongs to userid COLIN,label AMS, it has usage of Site, it is trusted, and has a valid date.

It also reports O=aaaa,C=GB,CN=ja2 Not found in key ring This is because the definition in AMS has the wrong order. The standard order is CN=ja2,O=aaaa,c=GB. This certificate is in the keyring , but the program could not find it. I could not see a way of converting bad format DNs to good DNs.

Contents of package.

The package is on git.

FTP the amscheck.xmit.bit to z/OS as binary. Then use TSO receive indsn(amscheck.xmit) to create the load module in a PDS.

Upload runamsch, ccasmch, asmcheck. and parmlist.h to a PDS.

Edit and submit runamsch. It runs dspmqspl and puts the output into a temporary file. The parm PARM=’START1 drq.ams.keyring’ is for userid START1 and the keyring drq.ams.keyring. Your userid will need access to the userid’s keyring.

if you want to compile the program

If you want to compile the program, you can edit ccasmch, and change the SYSIN, and where the header file is imported from.