Understanding PCF cmdscope on z/OS

I wanted to check the response to a PCF command when cmdscope="*" was issued.

It took a while to understand it. It feels like it was implemented by different people with different strategies. I’ve documented what I found in case someone else struggles down this path. As an extra challenge I used Python to issue the command and process the responses.

High level view of the cmdscope reply

A PCF request is sent to a server, and the responses are returned to the specified reply-to-queue. These reply messages all have the same correlid, but different msgids.

  • The first message gives information about the remainder of the messages, and how many queue managers are involved.
  • The last message says “finished”, and ties up loosely with the first message.
  • In between are sets of messages, one set from each queue manager. Each set contains one or more messages, and has its own identifier.

More information

The first message

  • This has a Response id – it is not used anywhere else.
  • It says there are replies from N queue managers; the value of N is given.
  • There is a Response Set structure (A) giving the Response id for the “end of command scope” message.
  • There are one or more Response Set structures giving the Response id for each queue manager. There are N of these structures.

The last message has a Response-id which matches the Response set (A). The content of this last message is “the request (with command scope) has finished”.

Each queue manager returns a set of one or more messages.

  • All messages in the set have the set’s Response-id, which is given in the first message above. The name of the queue manager providing the information is also given.
  • The last message in the set has the “Last message in the set” flag set in the PCF header.

Response-id values

Each message has a Response-id field. This has a value like “CSQ M801 GM802…” where the … is hex data. The original request was sent to queue manager M801, and in this case the response was for queue manager M802. There are several fields that look similar to this, for example “CSQ M801 RM801…”. The content of this field is not an API, so do not rely on its format.
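Because the content is not an API, the safest approach is to treat the Response-id purely as an opaque key when grouping the replies into per-queue-manager sets. Below is a minimal Python sketch of that idea; it assumes you have already decoded each reply into its Response-id bytes and a body dictionary (the decoding itself is not shown, and the replies argument is hypothetical).

from collections import defaultdict

def group_by_response_id(replies):
    # replies: a list of (response_id_bytes, body_dict) pairs produced by
    # your own PCF decoding code (hypothetical - not shown here).
    sets = defaultdict(list)
    for response_id, body in replies:
        sets[response_id].append(body)
    # Each key is now either one queue manager's set of messages (the
    # "...G..." ids) or one of the first/last control messages.
    return sets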

How many messages should I expect?

The answer is “it depends”.

  • You will get a matching first and last message (2)
  • You will get a set of messages for each queue manager
    • There will be one or more messages with data.
    • The number of messages with data varies. For example, the Inquire Archive request returns the values at startup (Sysp Type : Type Initial).
    • If a Set Archive command has been issued, a second record, with Sysp Type : Type Set, will be present, showing the parameters which have been changed since startup.
    • A “last in set” message.

You know you have received the last message when you get the “the request (with command scope) has finished” message.
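As a sanity check, the arithmetic for the example later in this post (three queue managers, where M801 returns two data messages and the others return one each) is sketched below; the per-queue-manager counts are taken from that example.

# Expected replies for cmdscope="*":
# 1 first message + (data messages + 1 summary) per queue manager + 1 final message.
data_messages = {"M801": 2, "M802": 1, "M803": 1}
expected = 1 + sum(n + 1 for n in data_messages.values()) + 1
print(expected)  # 9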

At a practical level…

I wrote code like that below to get all the messages in a request. It is more complex in practice, as the PCF body may not contain the structure with the expected data type.

cmdscopeloop = no

Loop: get a message with a PCF header and PCF body

## If cmdscope was specified, keep looping until the "end of cmdscope" message
if header.type == MQCFT_XR_MSG and
   body[MQIACF_COMMAND_INFO] == MQCMDI_CMDSCOPE_ACCEPTED
then cmdscopeloop = yes

if header.type == MQCFT_XR_MSG and
   body[MQIACF_COMMAND_INFO] == MQCMDI_CMDSCOPE_COMPLETED
then cmdscopeloop = no

## Requests like Inquire chinit have a "command accepted" message
if header.type == MQCFT_XR_MSG and
   body[MQIACF_COMMAND_INFO] == MQCMDI_COMMAND_ACCEPTED
then state = "CommandAccepted"

## Most messages have
if header.type == MQCFT_XR_ITEM
then state = "Item"

## The summary record means end of set.
## For many requests this is the end of the data.
if header.type == MQCFT_XR_SUMMARY
then state = "endset"

## See if we need to keep looping.
if state == "endset" and cmdscopeloop == no
then return
else loop to get more messages
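The same logic in Python might look like the sketch below. It is only a sketch: get_reply() is a hypothetical helper (not shown) which gets the next reply message and decodes it into the PCF header type and a dictionary keyed by parameter constant; the constants themselves should be available in pymqi’s CMQCFC module.

from pymqi import CMQCFC  # PCF constants: MQCFT_*, MQIACF_*, MQCMDI_*

def get_all_replies(get_reply):
    # get_reply() is a hypothetical helper: it returns (header_type, body)
    # for the next reply message, where body is a dict keyed by PCF
    # parameter constants.
    cmdscope = False
    items = []
    while True:
        header_type, body = get_reply()

        if header_type == CMQCFC.MQCFT_XR_MSG:
            info = body.get(CMQCFC.MQIACF_COMMAND_INFO)
            if info == CMQCFC.MQCMDI_CMDSCOPE_ACCEPTED:
                cmdscope = True          # first message: more sets will follow
            elif info == CMQCFC.MQCMDI_CMDSCOPE_COMPLETED:
                return items             # the final "end of cmdscope" message
            elif info == CMQCFC.MQCMDI_COMMAND_ACCEPTED:
                pass                     # e.g. Inquire chinit: command accepted

        elif header_type == CMQCFC.MQCFT_XR_ITEM:
            items.append(body)           # a data message

        elif header_type == CMQCFC.MQCFT_XR_SUMMARY:
            if not cmdscope:             # without cmdscope, the summary is end of data
                return items

With cmdscope, the loop keeps reading past each queue manager’s summary message until the “cmdscope completed” message arrives.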

The request and detailed responses

I used PCF with the command MQCMD_INQUIRE_ARCHIVE and cmdscope="*". I sent the request to queue manager M801. There were three queue managers in the QSG: M801, M802, M803.

The data descriptions, like “Response Q Mgr Name”, are decoded from the PCF values and post-processed to make them more readable, using the MQ sample code (provided on MQ Midrange).

Values like b'M801' are Python byte strings or hexadecimal strings. Other character data may be in the UTF-8 code page.

Response (1) meta information

PCF.Type 17 MQCFT_XR_MSG
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 1
PCF.Control 1 Last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 8

  1. Response Id : CSQ M801 RM801…
  2. Response Q Mgr Name : b’M801′
  3. Command Info : Cmdscope Accepted
  4. Cmdscope Q Mgr Count : 3
  5. Response Set : CSQ M801 SM801… This is for the final “end of cmdscope” message
  6. Response Set : CSQ M801 GM801… These are for the data returned from a queue manager
  7. Response Set : CSQ M801 GM802…
  8. Response Set : CSQ M801 GM803…

Response (2), for queue manager M801. The initial values

PCF.Type 18 MQCFT_XR_ITEM
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 1
PCF.Control 0 Not last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 19

  1. Response Id : CSQ M801 GM801… Matching a response set value in the first message
  2. Response Q Mgr Name : b’M801′
  3. Sysp Type : Type Initial
  4. Sysp Archive Unit1 : b’TAPE’
    ….

19. Sysp Quiesce Interval : 5

Response (3), for queue manager M801. The fields which have been set

On queue manager M801 the command SET ARCHIVE UNIT(DISK) was issued. If the DISPLAY ARCHIVE command is issued it will display DISK under the “SET value” column.

PCF.Type 18 MQCFT_XR_ITEM
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 2
PCF.Control 0 Not last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 4

  1. Response Id : CSQ M801 GM801…
  2. Response Q Mgr Name : b’M801′
  3. Sysp Type : Type Set
  4. Sysp Archive Unit1 : b’DISK’

Response(4), for queue manager M801. End of queue manager’s data

PCF.Type 19 MQCFT_XR_SUMMARY
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 3
PCF.Control 1 Last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 2

  1. Response Id : CSQ M801 GM801…
  2. Response Q Mgr Name : b’M801′

Response (5), for queue manager M802. The initial values

PCF.Type 18 MQCFT_XR_ITEM
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 1
PCF.Control 0 Not last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 19

  1. Response Id : CSQ M801 GM802…
  2. Response Q Mgr Name : b’M802′
  3. Sysp Type : Type Initial
  4. Sysp Archive Unit1 : b’TAPE’

Response (6), for queue manager M802. End of queue manager’s data

PCF.Type 19 MQCFT_XR_SUMMARY
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 2
PCF.Control 1 Last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 2

  1. Response Id : CSQ M801 GM802…
  2. Response Q Mgr Name : b’M802′

Response (7), for queue manager M803. The initial values

PCF.Type 18 MQCFT_XR_ITEM
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 1
PCF.Control 0 Not last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 19

  1. Response Id : CSQ M801 GM803…
  2. Response Q Mgr Name : b’M803′
  3. Sysp Type : Type Initial
  4. Sysp Archive Unit1 : b’TAPE’

Response (8), for queue manager M803. End of queue manager’s data

PCF.Type 19 MQCFT_XR_SUMMARY
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 2
PCF.Control 1 Last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 2

  1. Response Id : CSQ M801 GM803…
  2. Response Q Mgr Name : b’M803′

Response (9), for queue manager M801. End of command.

PCF.Type 17 MQCFT_XR_MSG
PCF.Command 114 MQCMD_INQUIRE_ARCHIVE
PCF.MsgSeq 1
PCF.Control 1 Last message in set
PCF.CompCode 0
PCF.Reason 0
PCF.ParmCount 3

  1. Response Id : CSQ M801 SM801… Matching the “end of cmdscope” Response Set value in the first message
  2. Response Q Mgr Name : b’M801′
  3. Command Info : Cmdscope Completed

What does MQRCCF_PARM_CONFLICT mean?

I’ve been using PCF (from Python) and have been getting MQRCCF_PARM_CONFLICT. For example, using MQCMD_CHANGE_Q_MGR to set MQIA_TCP_CHANNELS to 201.
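As an illustration only (pymqi’s PCFExecute does not necessarily handle the z/OS reply format described in the cmdscope post above, and qmgr is assumed to be an existing pymqi.QueueManager connection), the failing change and the reason-code check might look like this:

import pymqi
from pymqi import CMQC, CMQCFC

def change_tcp_channels(qmgr, value=201):
    # qmgr: an existing pymqi.QueueManager connection (assumption).
    pcf = pymqi.PCFExecute(qmgr)
    try:
        pcf.MQCMD_CHANGE_Q_MGR({CMQC.MQIA_TCP_CHANNELS: value})
    except pymqi.MQMIError as e:
        if e.reason == CMQCFC.MQRCCF_PARM_CONFLICT:
            # The PCF reply does not say which parameter conflicts;
            # the z/OS console messages (CSQM150I, below) do.
            print("MQRCCF_PARM_CONFLICT:", e)
        else:
            raise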

The documentation does not help. It says

Explanation

Incompatible parameters or parameter values.

The parameters or parameter values for a command are incompatible. One of the following occurred:

  1. A parameter was not specified that is required by another parameter or parameter value.
    • The MQIA_TCP_CHANNELS can be used on its own – so it is not this one.
  2. A parameter or parameter value was specified that is not allowed with some other parameter or parameter value.
    • The MQIA_TCP_CHANNELS was specified on its own – so it is not this one.
  3. The values for two specified parameters were not both blank or non-blank.
    • Only one parameter was specified – so not this one.
  4. The values for two specified parameters were incompatible.
    • Only one parameter was specified – so not this one.

I found the problem by issuing the command and seeing what the response was.

QM01 ALTER QMGR TCPCHL(201)

gave

CSQM150I QM01 CSQMAMMS 'TCPCHL' AND 'MAXCHL' VALUES ARE INCOMPATIBLE
CSQ9023E QM01 CSQMAMMS ' ALTER QMGR' ABNORMAL COMPLETION

So I would add a reason:

5. The specified value is inconsistent with the configuration.

How do I use SFTP to ftp to a z/OS data set?

You cannot do it directly; you have to use a two-step process.

Sending files to z/OS using SFTP

SFTP can send files to the Unix Services file system. It cannot send to data sets.

sftp colin@10.99.88.77

You can use commands like

  • cd to change directory on the remote system
  • lcd change directory on the local system
  • put
  • get
  • chmod
  • chown
  • exit

There are no “bin” or “quote …” commands.

Getting from a Unix Services file to a dataset.

You can use the cp command.

To copy a binary file to a dataset

This is the equivalent of using FTP with bin; quote site cyl pri=1 sec=1 recfm=fb blksize=3200 lrecl=80; put mp1b.load.xmit 'COLIN.MP1B.LOAD.XMIT'.

cp -W "seqparms='RECFM=FB,SPACE=(500,100),LRECL=80,BLKSIZE=3200'" mp1b.load.xmit "//'COLIN.MP1B.LOAD.XMIT'"

The seqparms values must be in upper case. If they are in mixed case you get

FSUM6258 cannot open file “… “: EDC5121I Invalid argument.

I could then do TSO RECEIVE INDSN('COLIN.MP1B.LOAD.XMIT').

To copy a text file to a data set

If you use SFTP to copy a text file to Unix Services, it is sent in binary and, on z/OS, looks like

ñà ÈÇÁ ËøÁÄÑÃÑÁÀ….

You can tag a file so Unix Services knows it is an ASCII file, using

chtag -tc ISO8859-1 aaa

This makes the file editable from Unix Services, but you cannot just use cp to copy and create a dataset, as above.

You can convert it from ASCII to EBCDIC using

iconv -f ISO8859-1 -t IBM-037 ascii_file  > ebcdic-file

Then use

cp -W "seqparms='RECFM=VB,SPACE=(CYL,(1,1)),LRECL=800,BLKSIZE=8000'" ebcdic-file "//'COLIN.EBCDFILE'"

Setting up SMDS in MQ on z/OS

Setting up and using SMDS

MQ on z/OS provides shared queues, where a coupling facility (CF) structure is shared amongst the queue managers in a sysplex to provide a queue sharing group (QSG).

You can set up a Shared Message Data Set (SMDS) to handle large messages (over 63 KB in size), or to provide additional capacity: as the structure fills up, messages are written to a data set but remain available from all queue managers in the QSG. Ultimately the structure just has pointers to the messages on the data set.

For messages under 63 KB in size, instead of using SMDS you can use the z/OS structure overflow called Storage Class Memory. I think of this as paging for a structure. If you have many messages on a queue it will “page out” the messages in the middle of the queue, and, as the queue is processed sequentially, it will “page in” the messages before they are needed.

There is lots of good information on SMDS, but I found it hard to locate the information I needed.

This blog post aims to fill some of the holes.

Before you start

The term CFLEVEL is used in two contexts: MQ and z/OS.

  • z/OS Coupling Facility Resource Management (CFRM) uses a CFLEVEL. For example, the CFRM definitions include whether the structure is duplexed. If the CFRM CFLEVEL is greater than 21, it supports asynchronous duplexing.
  • MQ has a CFLEVEL. MQ CFLEVEL(5) allows offload, and allows the queue manager to tolerate a loss of connectivity to the CF structure.

SMDS requires the CF structures to have MQ CFLEVEL(5). This is not a major change, but it needs to be planned. Increasing the MQ CFLEVEL will probably increase the size needed for the z/OS CF structure definition.

“Backing up and recovering data in SMDS” below discusses having a queue manager within the QSG just for backing up and recovering messages in structures, especially when there are a large number of messages in SMDS. You need to consider this and decide if you want to implement it; it can be done at any time. You may also need to review how often you back up your MQ CF structures.

Backing up (and recovering) data in SMDS

Unlike MQ pagesets you do not recover an SMDS by restoring it from a backup. The CF structure and the SMDS are one logical entity, with the structure pointing to the location on disk. If you restore a backup of an SMDS, the CF will not have the matching information and you will have corrupt data.

Instead of backing up the SMDS you use the MQ BACKUP CFSTRUCT command. This reads the CF and any messages on the SMDSs, and writes the content to the log data sets. Ideally you do this when the CF structure is close to empty. If you have a large SMDS full of messages, this will write a lot of data to the active logs, possibly impacting normal throughput.

When the RECOVER CFSTRUCT command is issued (to recreate the data), the log is read to recover the data. If you have normal message data logged, this could mean that a lot of log data needs to be read to recover the structure, because the normal message traffic and the backup data will be intermixed on the log.

Some customers have a queue manager dedicated to backing up the CF Structures. This has the following advantages:

  • The backup activity writing to the active log does not impact normal business activity
  • When recover cfstruct is issued, only the data for CF structures is in the logs, and so less data overall needs to be read from the logs, which means that recovery is faster.

Create the SMDS

Each queue manager needs its own SMDS; the queue manager can read and write its own SMDS.

A queue manager can read the data in another queue manager’s SMDS (but updates are only made by the owning queue manager).

The SMDSs can be of different sizes. For example, if two queue managers are “customer facing” and two are “back end facing”, the “customer facing” SMDSs may get a lot of activity, and the “back end facing” ones may get no activity, because those queue managers only do gets.

You allocate an SMDS with a primary size, and can specify a secondary size. If a secondary size is zero or not specified, and expansion is allowed, then the queue manager will try expanding with a secondary space of 10% of the existing size. This means the 100th extent will be larger than the 10th extent.

An SMDS can have up to 255 extents. If the data set is SMS managed and the SMS configuration has Extent Constraint Removal specified in the DFSMS Data Class, then the data set can be on up to 59 volumes, with up to 123 extents per volume (or 7257 extents). See here for more information.

You cannot make the SMDS smaller.

If enabled, automatic expansion is attempted when the SMDS is 90% full. If the message rate is not too high then the automatic expansion may be enough so the SMDS does not get 100% full.

If there is a spike in workload, then the expansion may not be able to keep up, and so some requests will get “no space available” until any in-progress expansion has completed. In this case, to avoid applications getting the “no space” condition, you should either allocate the SMDS with a primary extent big enough to hold the expected peak of messages, or allocate a queue on the CF structure and put many messages to force expansion (then drain or delete the queue).

Note: The time taken to expand the SMDS depends on the size of the increment. Larger extents mean there are more pages to format, and so take longer.

Allocating the data set

When the CF Structure is not full, then most of the messages may be in the CF Structure. As the CF becomes fuller, more messages are written to the SMDS. If all messages are on an SMDS, and there is a high message rate, the SMDS will have a lot of I/O activity, read and write.

You need to allocate the SMDS on fast disks to minimise response time of processing messages.

If you are not using SMS managed data sets, you need to specify additional volumes to allow for expansion.

What logical block size should be used?

You need to know the profile of the size of your messages so you know what value to use. Each page within SMDS is 4KB. The block size defines how many of these 4KB pages are allocated to each message. I/O is done at the block level.

The default DSBLOCK is 256K.

Once an SMDS has been opened the DSBLOCK value cannot be changed, so you need to plan this before you start to use the SMDS.

If you find the value is not optimum, then

  • if the value is too large, more disk space will be used and the 64-bit buffers will be larger.
  • if the value is too small, more I/O operations (of smaller blocks) will be done, and processing messages may take longer (possibly only by a few milliseconds).

If you have big messages then a DSBLOCK value of 256K may be OK. If you have many small messages (under 64K) then 64K or smaller may be better.

Defining SMDS to MQ

If you define a new CF Structure and specify CFLEVEL(5) then the offload parameter defaults to SMDS.

If you alter an existing CF Structure from a lower CFLEVEL to CFLEVEL(5) the OFFLOAD defaults to DB2. This was done for migration reasons.

Once you have created the SMDS for each queue manager, you use the commands:

  • alter cfstruct(...) cflevel(5)
  • alter cfstruct(...) DSGROUP('data.set.name.*') DSBLOCK(...)
  • alter cfstruct(...) offload(SMDS)

The number of buffers (of size DSBLOCK) is specified in DSBUFS. This can be set on a queue manager using the ALTER SMDS DSBUFS command.

This value can be changed dynamically; it defaults to 100.

You need to work with the z/OS systems programmer before using this, to make sure there are enough resources. For example, I defined the block size as 1 MB and defined 9999 buffers. This is over 9 GB of virtual storage, and z/OS ran out of auxiliary storage and real storage. I had to shut down the QSG and reset the size in CSQINP2.
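As a rough calculation of what that configuration asked for (the buffer pool is roughly DSBUFS buffers of DSBLOCK bytes each, as described above):

dsblock = 1 * 1024 * 1024                      # 1 MB block size
dsbufs = 9999                                  # number of buffers defined
print(f"{dsblock * dsbufs / 2**30:.1f} GB")    # about 9.8 GB of 64-bit virtual storage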

When reading from a different queue manager’s SMDS, a buffer is allocated from the pool, the data is read in, and the buffer is released back to the pool.

When a queue manager is writing to its own SMDS, a buffer is obtained, the data is written to the SMDS, and the buffer is released (still with its contents). If an MQGET is then issued, the data may still be in a buffer, avoiding disk I/O.

Generally SMDS works ok with a small number of buffers – as long as there are enough buffers for the concurrent requests. Having more buffers may make little difference to performance. You need to look at the statistics to see if there were buffer shortages.

Other SMDS activities

If you need to work with the SMDS, you can use the RESET SMDS command to disable access to the SMDS. Message processing which does not use the SMDS can continue.

The RESET SMDS ACCESS(DISABLED) causes all of the queue managers to close it normally and deallocate it. When the data set is ready to be used, it can be altered to ACCESS(ENABLED) allowing the queue managers to access it again.

Media recovery

If there is a problem with shared queue, the CF structure may need to be rebuilt, and the messages restored from the log data sets.

One queue manager does the media recovery. This needs access to the logs of all the queue managers in the QSG, and update access to the SMDS data sets.

Automatic expansion

You can enable automatic expansion of the SMDS using DSEXPAND(YES) .

You can disable automatic expansion of the SMDS using DSEXPAND(NO). You may want to do this if you have an exceptional peak and do not want the SMDS to expand (because you cannot make an SMDS smaller). If the CF structure or the SMDS is full, a put request will get a “no space available” reason code.

Automatic expansion may fail

  • If the number of extents exceeds the limit
  • If there is no space available to expand. You may be able to fix this by making more volumes available to be used. You may need to talk to your data manager about the best way to resolve this problem.

What to monitor?

SMF

You should collect SMF data from a good day, so if you have a bad day, you have some comparison data.

SMDS writes statistics data to SMF. A program to print the SMF data is available in SupportPac MP1B, which describes the fields and how to interpret the data. There are other tools which format the data, but they tend not to interpret it and give useful information.

Information available includes

  • I/O requests to an SMDS, write to local, read from local, read from other qmgr’s SMDS
    • count of requests,
    • number of pages,
    • average I/O duration,
    • average wait before the I/O could be started.
  • Numbers of buffers
    • Total,
    • In use,
    • Current count of waiting for a free buffer,
    • Highest count in the interval of waiting for a free buffer,
    • Current count of waiting for a busy buffer,
    • Highest count in the interval of waiting for a busy buffer,
    • Number of times there were no buffers available.

Display command

SupportPac MP16: Capacity Planning and Tuning for IBM MQ for z/OS describes SMDS. It gives information on DSBLOCK and the impact of message size, and other good information.

If there are insufficient buffers to handle the maximum concurrent number of I/O requests, then requests will have to wait for buffers, causing a significant performance impact. This can be seen in the CSQE285I message (issued as a response to the “DISPLAY USAGE TYPE(SMDS)” command) when the “lowest free” is zero or negative and the “wait rate” is non-zero. If the “lowest free” is negative, increasing the DSBUFS parameter by that number of buffers should avoid waits in similar situations.

MP16

The number of extents

You can use MQ commands (or MQ SMF data) to monitor the size of the SMDS. You can also automate on the expansion messages, as expansion should be an exception rather than an everyday occurrence.

You might prepare for SMDS expansion by deciding in advance whether you want to use automatic expansion or disable it, and prepare your system operations to handle this.

MQ commands relating to SMDS

Some of the commands for SMDS are dissimilar to normal MQ commands. For example

DIS smds(*) CFSTRUCT(CSQSYSAPPL)

gets a reply from all queue managers in the QSG. You can issue the following command from queue manager QM01, about queue manager QM02

QM01 DIS SMDS(QM02) CFSTRUCT(CSQSYSAPPL)

Or you could just issue the command to QM02

QM02 DIS SMDS(QM02) CFSTRUCT(CSQSYSAPPL)

I forget this every time!

  • ALTER CFSTRUCT. Define the OFFLOAD parameters, data set name, and thresholds for offloading messages to SMDS. It also sets the QSG-wide default number of buffers; the number of buffers for individual queue managers can be overridden by ALTER SMDS.
  • ALTER SMDS. Change the number of buffers for a queue manager; Change DSEXPAND().
  • DISPLAY CFSTATUS. Display the status of one or more CF application structures.
    • DISPLAY CFSTATUS TYPE(SMDS). Display shared message data set information. You get a reply from each queue manager.
  • DISPLAY CFSTRUCT. Display the attributes of one or more CF application structures.
  • DISPLAY SMDS. Display the parameters of existing SMDSs associated with a specified application structure.
  • DISPLAY SMDSCONN. Display status and availability information about the connection between the queue manager and the shared message data sets for the specified CFSTRUCT
  • RECOVER CFSTRUCT. Recreate the persistent messages in the CF (and SMDS) from the logs.
  • RESET SMDS. Modify availability or status information relating to one or more shared message data sets associated with a specific application structure. For example make SMDS not available to queue managers, while it is made larger.

Should stress testing be stressful? And a tale of two managers.

I had two different things about stress testing happen in the last week – one of life’s little coincidences. The first was a paper I rediscovered about CICS stress testing written for a prestigious IBM journal – about 30 years ago, which was not accepted for publication because it was “obvious” and didn’t have graphs and complex equations. The other event was talking to someone who was involved with testing military hardware.

Testing military hardware

John’s department in the army was to take new kit from the manufacturers, use it, and give feedback. A couple of weeks later they would retest it, after “development” had “fixed” the problems. John found that the easy problems were usually fixed, but the hard problems were not.

John hatched a cunning plan. He got managers from the manufacturer to come down, have a good lunch, and see their vehicles in action. After the good lunch, he “invited” the VIPs to get into army fatigues and go for a ride, so they could experience their product first hand. After their ride at speed over the army vehicle assault course, seeing all the capabilities of their product first hand (e.g. firing guns), they eventually got back to the lunch venue. The visitors got out of the vehicle looking very ill; some had been sick, some had bruises, some were temporarily deaf.

After they had time to recover, John went through the list of outstanding problems, with words like “As you may have experienced…”. Afterwards, John’s commanding officer called him into his office and gave him a telling off for putting civilians’ lives at risk etc… then said “well done – good work – don’t do it again – dismiss”. Overall this was a success, as the next set of vehicles to test had some of the major problems fixed.

The moral of this story is you need to test in a realistic situation with all of the problems that your end user may encounter.

The IBM non-article.

The IBM article I recently discovered in a box was written in the early days of stress testing. I remember doing the CICS stress test 40 years ago: 10 of us sat in a room, each with a terminal, and typed in random CICS commands for 1 hour. If CICS didn’t crash, this was a successful test. A few years later testing had moved on. The testing had simulated users running scripts. There were 1000 “end users” running complex applications. From this base line it was significantly enhanced. This enhanced testing was so successful that the team were asked to write it up for the prestigious IBM Journal of Research and Development. I reviewed the document, and thought it was great. Rather than talk about CICS, terminal control, SIP etc., they approached system testing as testing a hypothetical “IBM car”, which everyone could understand. While some testers took the car on the road to visit their parents, the stress testers had a plan: go off road and see what happens.

  • “Bang” a tyre caught on a sharp rock and caused a puncture. Ahh there is no spare tyre. Defect.
  • Once that was fixed, do it again, “Bang”. This time there is a spare wheel, but no jack to allow the wheel to be changed. Defect
  • Whoops they bumped into a gate. Take the car back to the garage, and get them to repair it. No matching paint? Defect.
  • Now that all works. Try driving it in reverse across a field – hmm poor visibility. Defect
  • No problems found? – do it even faster.
  • Now fill the car with (heavy) bags of compost (hmmm, there is a nasty lip on the edge of the boot – defect) and the car is hard to drive round corners at speed – defect.
  • Get your partner to drive to your parents – what do you mean, she cannot reach the pedals?

As I said – it was a great article, but not what the journal wanted.

Thoughts about stress testing.

CICS has various pools of resources, for example the number of threads it can support. If this is tunable, you may set the value at 1000, and try running with 1000 threads, and more than 1000 threads to make sure the code works at the limit. You may find you run out of CPU before you can drive the workload at these limits.

Testing with 1000 concurrent threads is hard. It is better to make the pool smaller, say 10 threads, and test with 10 or more threads. To simulate peak workloads you can either increase the number of threads, or make the pool size smaller, so while CICS is running decrease the size of the pool down from 1000 to 10. Let CICS sort itself out, and then put the limit back to 1000 – and repeat.

In the car analogy, drive everywhere in first gear – this will test the high revs without going very fast! It depends on how you look at the problem.

A tale of two managers

There is the story of a large development team. There was a senior development manager. Under him were two managers, and each of these had four managers who managed the day to day work.

One second line manager said he liked graphs with lots of green “Test cases successful”. “Problem” test cases were reviewed, and if they were considered unrealistic, or not likely to happen, they were quietly removed from the test plan. He worked hard to get more green on his charts. He was very proud of the green charts outside of his office.

The other manager said “Green charts mean you are not testing hard enough. I want to know where it breaks, why it breaks, and the impact on the overall system”. When his charts showing successful test cases had 75% green and 20% red (and 5% investigate), he called the test team in, said “here is a bigger box”, and told them to get testing. The percentage of successful tests went down to 50% – and he was very pleased. Eventually they got the successful tests up to 95% green.

The product was shipped to customers, and the “red” manager became the change team manager, in charge of customer support. It was interesting that although both areas had customer problems, the “green” code had more problems. Some of the “interesting” problems found in the green code had had tests, which had quietly been dropped.

Because the “red” code had experienced tougher problems during testing, development had added better diagnostics, so difficult customer problems were “easier” to debug (still hard – just not so hard).

The moral of the story is if you are not breaking the product – you are not pushing it hard enough. You can have as many green lines on charts as you like – they may not reflect reality.

Why is this Linux slower to download than that one

I have a laptop which is my primary work station, and an under desk server for running my z/OS system on top of the same Linux.

Running “apt update” was always faster on the laptop than on the server. Was this because all traffic for the server was going through my laptop? How do I tell?

The boxes are connected with an Ethernet cable. I had to purchase a wireless dongle for my server; my laptop has a built-in wireless adapter.

The Linux ifconfig or ip commands give information about the configuration. For example, ifconfig gave

eno1: flags=4163 mtu 1500
    inet 10.1.0.3 netmask 255.255.255.0 broadcast 10.1.0.255
    inet6 fe80::.... prefixlen 64 scopeid 0x20
    ether 00:... txqueuelen 1000 (Ethernet)
    RX packets 5136 bytes 1445665 (1.4 MB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 4933 bytes 1692274 (1.6 MB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    device interrupt 17 memory 0xb1200000-b1220000  
...
wlxd037450ab7ac: flags=4163 mtu 1500
    inet 192.... netmask 255.255.255.0 broadcast 192....
    inet6 2a00:... prefixlen 64 scopeid 0x0
    inet6 fe80::... prefixlen 64 scopeid 0x20
    inet6 2a00:... prefixlen 64 scopeid 0x0
    ether d0:... txqueuelen 1000 (Ethernet)
    RX packets 42427 bytes 60919847 (60.9 MB)
    RX errors 0 dropped 1 overruns 0 frame 0
    TX packets 25996 bytes 2397812 (2.3 MB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
  • The Ethernet connection eno1 has received 5136 packets and 1.4 MB of data.
  • The wireless connection wlxd037450ab7ac has received 42427 packets and 60.9 MB of data.

As I had just done an apt upgrade, the wireless connection had carried all of the traffic to download the files, so the traffic was not coming through my laptop.

Once the system was updated, the only traffic flowing was down the Ethernet cable as I used the server from my laptop.

Ping

A ping from each system gave a similar response time.

traceroute

traceroute shows you the hops to the destination.

For example

traceroute abc.xyz.com

To specify the interface you need to run as a superuser.

sudo traceroute abc.xyz.com -i wlxd037450ab7ac

gave

1 bthub.home (192....) 4.654 ms 38.438 ms 38.425 ms
2 * * *
3 * * *
4 31.55.185.184 (31.55.185.184) 75.897 ms 75.890 ms 75.861 ms

If you are not running as a superuser you will get:

setsockopt SO_BINDTODEVICE: Operation not permitted

What else is there to help me?

On z/OS the netstat command gives a lot of information about the session, for example the send window size, the receive window size, etc. This information tends not to be available on other platforms.

On Linux there is the ss (Socket Statistics) command.

Example output from ss -t -i included

  • ESTAB 0 0 192.168.1.223:58212 192.0.78.12:https
  • cubic the congestion algorithm name, the default congestion algorithm is “cubic”
  • wscale:9,7 if window scale option is used, this field shows the send scale factor and receive scale factor
  • rto:232
  • rtt:29.236/5.693
  • ato:40 mss:1452
  • pmtu:1500
  • rcvmss:880
  • advmss:1460
  • cwnd:10 congestion window size
  • bytes_sent:68335
  • bytes_acked:68336
  • bytes_received:16334
  • segs_out:276
  • segs_in:202
  • data_segs_out:151
  • data_segs_in:123
  • send 4.0Mbps number of bits sent/time of send.
  • lastsnd:376 time since the last packet was sent, in milliseconds
  • lastrcv:348 time since the last packet was received, in milliseconds
  • lastack:348 time since the last packet was acknowledged, in milliseconds
  • pacing_rate 7.9Mbps
  • delivery_rate 2.6Mbps
  • delivered:152
  • app_limited busy:3700ms
  • rcv_space:14600
  • rcv_ssthresh:64076
  • minrtt:23.773

I did not find most of this information very useful. It is all too easy for a developer (I have done it myself) to provide statistics from information which is readily available, rather than ask what information would be useful to debug problems – then collect and publish that information.

Using CISCO openconnect to tunnel to another system from Linux

I needed to use openconnect (an open source client for the Cisco AnyConnect VPN) to be able to logon from my Ubuntu system to someone else’s z/OS system.

This was pretty easy, but understanding some of the under-the-covers bits took a bit of time.

Basic install

  • Use sudo apt install openconnect
  • Download the VPNC script from http://www.infradead.org/ 
  • Create the configuration script
    • I saved the script as vpnc-script.sh
    • Using ls /etc/vpnc showed the directory did not exist. Create it and move the file
    • sudo mkdir /etc/vpnc/
    • sudo mv vpnc-script.sh /etc/vpnc/
    • sudo chmod +x /etc/vpnc/vpnc-script.sh
  • You need information from the owners of the vpn server.
    • vpn userid
    • vpn password
    • name of their system
    • IP address of their internal system
    • tso userid
    • tso password
  • I created a script (openc.sh), where XXXXXX is the vpn userid and “password” is the vpn password:
    • printf '%s' "password" | sudo openconnect --user=XXXXXX --script=/etc/vpnc/vpnc-script.sh vpn.customer.com
  • When you run openc.sh it prompts for your sudo password on the machine. The printf… means you can store the vpn password in the shell script. If you do not specify it, openconnect will prompt you for it.
  • Once the connection is made you can use ping 10.66.77.88, or x3270 -model 5 10.66.77.88 to access the system, where 10.66.77.88 is the IP address the owner of the vpn server gave you.

x3270

The owner of the vpn server gave me the address of the z/OS machine, my userid and password.

I then used

x3270 -model 5 10.66.77.88 to logon to the system.

Hot key

I like to hot key to my z/OS sessions. I used Ubuntu “Settings” -> “Keyboard Shortcuts”, and added a shortcut

  • name: mvsCust
  • Command: wmctrl -a 10.66.77.88
  • Hot key: Ctrl + H

The wmctrl -a says make active the window which has 10.66.77.88 in its title.

When I press Ctrl + H it makes the customer’s x3270 session the active window.

Change the x3270 colours

I wanted to change the screen colours, to distinguish it from other 3270 sessions. See Making x3270 green screens blue or red, or yellow with green bits.

FTP

I had to use sftp colin@10.66.77.88 to transfer files to the remote z/OS system (where colin is my TSO userid).

What happens with openconnect, under the covers.

The handshake has several stages

  • Establish a TLS session using the certificate from the server. Once this has completed, any traffic is encrypted. In my case I used the vpn userid and password. The vpn server can be configured to accept certificates instead of userid and password.
  • The server sends down configuration information from the vpn server’s configuration. For example
    • The IP addresses it supports, such as 10.66.0.0 and netmask 255.255.0.0
    • Any changes to the DNS configuration, so it knows to route 10.66.77.78 via the VPN session.
    • The “banner” such as “Welcome to mycom.com. Users of this system do so at their own risk”.
    • A default domain.
    • Which tunnelling device to use – such as tun0.
    • How many configuration statements.
    • Each set of configuration statements.
    • You can see this information by using the -v option on the openconnect command.
  • Using the information sent from the vpn server, the openconnect client creates environment variables.
  • The script defined (or defaulted, for example /etc/vpnc/vpnc-script.sh) on the openconnect command is invoked, and it uses these environment variables to manage the ip and dns configuration, changing files like /etc/resolv.conf (the local DNS file).

Oh p*x, I’ve lost my changes

I have been using pax to back up the files in my Unix Services directory, and needed to restore a file so I could compare it with the last version (and work out why my updates didn’t work). Unfortunately I managed to overwrite my latest version instead of creating a copy.
I backed up my directory using

pax -W "seqparms='space=(cyl,(10,10))'" -wzvf "//'COLIN.PAX.PYMQI2'" -x os390 /u/tmp/pymqi2/

This created a data set COLIN.PAX.PYMQI2 with the given space parameters, in os390 format.

To list the contents of this file use

pax -f "//'COLIN.PAX.PYMQI2'"

To display a subset of the files use

pax -f "//'COLIN.PAX.PYMQI2'" /u/tmp/pymqi2/code

which gave

/u/tmp/pymqi2/code/
/u/tmp/pymqi2/code/pymqi/
/u/tmp/pymqi2/code/pymqi/__init__.py
/u/tmp/pymqi2/code/pymqi/old__init__.old
/u/tmp/pymqi2/code/pymqi/aa

The -v option provides more information:

drwxrwxrwx 1 COLIN    1000      0 Jan 22 17:04 /u/tmp/pymqi2/code/
drwxr-xr-x 1 COLIN    1000      0 Feb 11 13:10 /u/tmp/pymqi2/code/pymqi/
-rw-r--r-- 1 OMVSKERN 1000 133011 Feb 22 13:15 /u/tmp/pymqi2/code/pymqi/init.py
-rw-r----- 1 COLIN    1000 119592 Feb  3 12:59 /u/tmp/pymqi2/code/pymqi/old__init__.old
-rwx------ 1 OMVSKERN 1000 119565 Jan 22 16:43 /u/tmp/pymqi2/code/pymqi/aa

The whoops

To restore an individual file and overwrite the original I used the -r option.

pax -rf "//'COLIN.PAX.PYMQI2'" /u/tmp/pymqi2/pymqi/__init__.py

I was expecting the file to be restored relative to the directory I was in. No: because I had backed up the files using an absolute path, it restored the file to the same place, and so it overwrote my changes to the file. I had changed to a temporary directory, but I had not realised how the command worked.

There are several ways of doing it properly.

Restore with rename

pax -rf "//'COLIN.PAX.PYMQI2'" -i /u/tmp/pymqi2/pymqi/__init__.py

The -i option means rename.

I ran the command and it prompted me to rename it

Rename “/u/tmp/pymqi2/pymqi/__init__.py” as…

/tmp/oldinit.py

Set “do not overwrite”

I could also have used the -k option which prevents the overwriting of existing files.

Rename on restore

I could also have renamed the file as it was restored, using

pax -rf "//'COLIN.PAX.PYMQI2'" -s#/u/tmp/pymqi2/pymqi#/tmp/# /u/tmp/pymqi2/pymqi/__init__.py

Where the -s#/u/tmp/pymqi2/pymqi#/tmp/# says use the regular expression to change /u/tmp/pymqi2/pymqi to /tmp, and so restore it to a different place. Note: the more obvious -s/abc/xyz/, where / is used as the delimiter, would not work, as there is a ‘/’ in the file path.

All of the above

I could have used all of the options: -i -k -s….

A better way to back up.

I had specified an absolute directory, /u/tmp/pymqi2/. If I had been in this directory when I did the backup I could have used

pax … -x os390 .

Where the . at the end means “from this directory”, and so it backs up relative paths.

If I list the files I get

pax -f "//'COLIN.PAX.PYMQI2A'" ./aa
./aa

And now if I restore the file…

pax -rf "//'COLIN.PAX.PYMQI2A'" ./aa

It restored the file into my working directory, as /tmp/aa.

So out of all the good ways of backing up and restoring – I chose the worst one. It only took me about 2 hours to remake all the changes I had lost.