Getting the best throughput with MQ TCPIP channels
Nov 9 2015
There are some tuning options you can set to improve the throughput of MQ channels on z/OS. These may involve both MQ and TCP changes. See also the instructions for distributed servers, because you need to change both ends of the channel.
(This entry was updated February 2017 to add comments about Outbound Right Sizing(ORS) introduced in z/OS 2.2)
An MQ sender channel gets a message from the XMIT queue, and sends the data in chunks of up to 32KB. At the end of the batch of data, the remote queue manager sends back a small end-of-batch response.
TCP uses various techniques to control the flow of data between two end points. One of these is known as a send window (this is well documented). This send window can increase and decrease if the receiving application changes the rate at which it receives the data.
The size of the send window is determined automatically by TCP/IP, but it can be influenced by configuring the receive buffer size on the remote host.
The send window only affects how much TCP can send. Write blocking only comes into effect when there is no send window available AND the sender's TCP send buffer becomes filled.
By default TCP uses the values in SYS1.TCPPARMS(PROFxx) for TCPCONFIG: TCPSENDBFRSIZE xxxx and TCPRCVBUFRSIZE yyyy. These may not be in the configuration, in which case defaults apply.
- In z/OS 2.1 the default sizes are 64KB (65536 bytes), before this the default sizes were 16KB
An application can configure a socket by using the setsockopt() function with the SO_RCVBUF parameter to specify the receive buffer size, and setsockopt() with the SO_SNDBUF parameter to specify the sender's buffer size.
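To see what setsockopt() does, here is a minimal sketch in Python using the standard sockets API. This is illustrative only, not how the MQ channel initiator sets its buffers; note that the operating system may round or cap the requested values.

```python
import socket

# Create a TCP socket and request larger buffers before connecting.
# The OS may adjust the request (for example Linux doubles the value
# and caps it at a system-wide maximum), so read the value back with
# getsockopt() to see what was actually granted.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 262144)  # sender's buffer
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 262144)  # receive buffer

sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("granted SO_SNDBUF:", sndbuf, "SO_RCVBUF:", rcvbuf)
s.close()
```

Comparing the granted values against what you requested is a quick way to discover that a system-wide maximum (on z/OS, TCPMAXSENDBUFRSIZE or TCPMAXRCVBUFRSIZE) is limiting you.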
The maximum receive buffer size is specified in SYS1.TCPPARMS(PROFxx), TCPCONFIG, TCPMAXRCVBUFRSIZE.
- The default maximum size is 256KB. In z/OS 2.1 the largest maximum you can specify is 2MB; before this it was 512KB.
For z/OS 2.1 the maximum send buffer size is specified in SYS1.TCPPARMS(PROFxx), TCPCONFIG, TCPMAXSENDBUFRSIZE.
- The default maximum size is 256KB, and the largest value you can specify is 2MB.
The amount of data on the network is limited by the smaller of the send buffer size and the receive window size.
Space in the send buffer is not released until TCP receives confirmation of receipt of the packet from the other end of the conversation. A larger send window means more packets can be in flight between sender and receiver, and less synchronization between the sending application and the network. If the send window is too small, only a small number of packets (of data) can be sent before TCP/IP blocks the sending application. Even if there is plenty more bandwidth available, it cannot be used unless the send window can be scaled up, and that cannot be done if the send buffer is too small.
Network latency also plays a role here. On high latency networks the longer delays while waiting for acknowledgements can greatly limit the effective bandwidth between the two locations. When high latency is a factor, the send window (the receiver’s TCP receive window) and the sender’s TCP send buffer must be further increased in order to make full efficient use of the network link.
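The window needed to keep a link busy is the classic bandwidth-delay product. A quick illustrative calculation (the link speed and round-trip time are example figures, not measurements):

```python
def bandwidth_delay_product(bits_per_second, round_trip_seconds):
    """Bytes that must be in flight to keep the link fully utilized."""
    return bits_per_second * round_trip_seconds / 8

# A 1 Gbit/s link with a 20 ms round-trip time needs about 2.5 MB
# in flight -- far more than a 16KB or 64KB send window allows.
print(bandwidth_delay_product(1_000_000_000, 0.020))  # 2500000.0 bytes

# Conversely, with only a 64KB window the best achievable throughput
# is window / RTT, regardless of how fast the link is:
print(65536 / 0.020)  # 3276800.0 bytes/second, about 3.3 MB/s
```

This is why the send window and send buffer must both grow on high-latency links before the extra bandwidth can be used.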
With modern networks, a send window of 16KB or even 64KB is often too small.
If the initial receive buffer size is 64KB or greater, then TCP can use a technique called Dynamic Right Sizing (DRS) to change the size of this send window. TCP will gradually increase the size of the send window until the optimum size is found.
If the receiving application is slow to process the data in the buffers, DRS will be disabled. In z/OS 2.1 and before, once it has been disabled it remains disabled until the connection is restarted.
Customers have increased TCPRCVBUFRSIZE and TCPSENDBFRSIZE to 256KB and got a major increase in throughput. This is system wide, so test it before changing these values. Also note that these values may still be too small, and may need to be increased. In z/OS CommServer V2R2, autonomics have been added to dynamically grow send and receive buffers based on network flows between the two streaming hosts.
The network condition varies significantly in different customers’ environment, so you need to test, tune, and iterate to find out the best values for your environment.
How can I tell what sizes my buffers are – and if DRS is being used?
The TSO NETSTAT CONFIG command reports the default receive buffer size, the default send buffer size, and the default maximum receive buffer size. For example
TCP Configuration Table:
DefaultRcvBufSize: 00065536 DefaultSndBufSize: 00065536
DefltMaxRcvBufSize: 00262144 (256KB)
The NETSTAT command for the receiving end of the connection, such as TSO NETSTAT ALL (IPPORT nnnn, where nnnn is the port number, reports information like
RcvWnd: 0000131072 (128KB)
For the sender, the SndWnd value was much smaller when the channel first started, and was
SndWnd: 0000524288 (512KB)
after it had sent lots of messages, showing the send window had increased.
For the receiving end, the byte field TcpPrf gives information about DRS.
If this has x’80’ set then the connection is eligible for DRS.
If this has x’40’ set then DRS is being used.
If it has x’02’ set then DRS was enabled, but has now been disabled. If DRS has been disabled, the receive window size (RcvWnd) is reset to the initial value.
TcpPrf: E0 shows that DRS is enabled.
SndWnd: 0000524288 (512KB)
MaxSndWnd: 0000524288 (512KB)
The send window (SndWnd) advertised from the remote host has increased to 524288 and has not decreased.
In z/OS 2.2 there is now Outbound Right Sizing (ORS). This seems to be better than DRS.
You can tell if this is being used by looking at the TcpPrf2 field.
- 40 .1.. .... If outbound right sizing (ORS) is active for this connection, the stack expanded the send buffer beyond its original size.
- 20 ..1. .... Indicates that this connection is eligible for ORS optimization support.
- 10 ...1 .... Indicates that ORS is active for this connection, so the stack automatically tunes the send buffer size. The SendBufferSize field shows the current size of the send buffer for this connection.
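A small helper to decode these NETSTAT flag bytes; the bit meanings are taken from the TcpPrf and TcpPrf2 descriptions above.

```python
# Bit meanings for the NETSTAT TcpPrf and TcpPrf2 bytes, as described above.
TCPPRF_BITS = {
    0x80: "eligible for DRS",
    0x40: "DRS in use",
    0x02: "DRS was enabled but is now disabled",
}
TCPPRF2_BITS = {
    0x40: "ORS expanded the send buffer beyond its original size",
    0x20: "eligible for ORS",
    0x10: "ORS active, stack is auto-tuning the send buffer",
}

def decode_flags(hex_byte, bit_names):
    """Return the descriptions of the bits set in a NETSTAT flag byte."""
    value = int(hex_byte, 16)
    return [name for bit, name in bit_names.items() if value & bit]

# TcpPrf: E0 (x'80' + x'40' + x'20') decodes as eligible and in use.
print(decode_flags("E0", TCPPRF_BITS))
```

Note that only the bits described above are decoded; other bits in these bytes have meanings documented in the Communications Server NETSTAT reference.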
How do I set these send and receive buffer sizes?
There are several ways of setting the values of the buffer sizes
1. Changing the values of TCPSENDBFRSIZE and TCPRCVBUFRSIZE in the TCPPARMS(PROFxx) member. This will change the values for all connections so you should take care if you change these values.
2. You may need to change the value of TCPMAXRCVBUFRSIZE (and TCPMAXSENDBUFRSIZE) in the TCPPARMS(PROFxx) member to allow applications to use big buffers (specify 2M), even if you do not want to change the values of TCPSENDBFRSIZE and TCPRCVBUFRSIZE. You cannot set TCPSENDBFRSIZE or TCPRCVBUFRSIZE to a value greater than the configured TCPMAXSENDBUFRSIZE or TCPMAXRCVBUFRSIZE maximum.
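Putting options 1 and 2 together, the TCPCONFIG block in SYS1.TCPPARMS(PROFxx) might look like the following. The 256KB defaults and 2MB maximums are example values to be tested in your environment, not recommendations:

```
TCPCONFIG
   TCPSENDBFRSIZE     262144    ; default send buffer, system wide
   TCPRCVBUFRSIZE     262144    ; default receive buffer, system wide
   TCPMAXSENDBUFRSIZE 2097152   ; largest SO_SNDBUF an application may set
   TCPMAXRCVBUFRSIZE  2097152   ; largest SO_RCVBUF an application may set
```

Remember TCPMAXSENDBUFRSIZE is only available from z/OS 2.1, and the default statements change buffers for every connection on the stack.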
3. MQ has some tuning options to set the buffer size values.
For MQ V8 you can use the commands
+cpf RECOVER QMGR(TUNE CHINTCPRBDYNSZ nnnnn)
+cpf RECOVER QMGR(TUNE CHINTCPSBDYNSZ nnnnn)
These set the SO_RCVBUF and SO_SNDBUF for the channels to the size in bytes specified in nnnnn.
You can display the current values, for example +cpf RECOVER QMGR(TUNE CHINTCPRBDYNSZ)
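For example, to request 256KB buffers for the channels (an example value to validate in your own tests, not a recommendation):

```
+cpf RECOVER QMGR(TUNE CHINTCPRBDYNSZ 262144)
+cpf RECOVER QMGR(TUNE CHINTCPSBDYNSZ 262144)
```

The sizes requested this way cannot usefully exceed the TCPMAXRCVBUFRSIZE and TCPMAXSENDBUFRSIZE limits configured in TCPPARMS.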
For V710 contact IBM service for instructions on how to change the buffer sizes.
You will need to change the configuration at each end of the channel. See the appropriate documentation; for example, see RcvBuffSize in the MQ distributed configuration file qm.ini.
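On the distributed end this is done in the TCP stanza of qm.ini; a sketch with example values (check the RcvBuffSize description in the MQ documentation for the attributes your version supports):

```
TCP:
   SndBuffSize=262144
   RcvBuffSize=262144
```

The queue manager reads qm.ini at startup, so restart it after making the change.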
Updated information setting up MQ logs
A customer asked me about setting up the logs for MQ. We went through the documentation, and I found that some of it was out of date and did not reflect today's environments. I've copied some of the doc from the info center and made changes to bring it up to date.
Planning your logging environment
Logs are used:
- to write recovery information about persistent messages
- to record information about units of work using persistent messages
- to record information about changes to objects, such as define queue
- for backups of CF structures
- for other internal information.
Use this topic to plan the number, size and placement of the logs, and log archives used by WebSphere MQ.
The WebSphere® MQ logging environment is established using the system parameter macros to specify options, such as whether to have single or dual active logs, what media to use for the archive log volumes, and how many log buffers to have. These macros are described in Task 14: Create the bootstrap and log data sets and Task 18: Tailor your system parameter module.
This section contains information about the following topics:
- Planning your logs
- Log data set definitions
- Logs and archive storage
- Planning your log archive storage.
Note: If you are using queue-sharing groups, ensure that you define the bootstrap and log data sets with SHAREOPTIONS(2 3).
Log data set definitions
Use this topic to decide on the most appropriate configuration for your log data sets.
This topic contains information about the following:
- Should your installation use single or dual logging?
- How many active log data sets do you need?
- How large should the active logs be?
- Active log placement
- Should your installation use single or dual logging?
In general you should use dual logging for production to minimize the risk of losing data. If you want your test system to reflect production, both should use dual logging; otherwise test systems can use single logging.
With single logging data is written to one set of log data sets. With dual logging data is written to two sets of log data sets, so in the event of a problem with one log data set, such as the data set being accidentally deleted, the equivalent data set in the other set of logs can be used to recover the data. With dual logging you will require twice as much DASD as with single logging.
If you are using dual logging, then also use dual BSDSs and dual archiving to ensure adequate provision for data recovery.
Dual active logging adds a small performance cost.
Attention: Always use dual logging and dual BSDSs rather than dual writing to DASD (mirroring). If a mirrored data set is accidentally deleted, both copies are lost.
If you use persistent messages, single logging can increase maximum throughput by 10-30% and can also improve response times.
Single logging uses 2 – 31 active log data sets, whereas dual logging uses 4 – 62 to provide the same number of active logs. Thus single logging reduces the amount of data logged, which might be important if your installation is I/O constrained.
How many active log data sets do you need?
The number of logs depends on the activities of your queue manager. For a test system with low throughput, 3 active log data sets may be suitable. For a high throughput production system you may want the maximum number of logs available, so if there is a problem with offloading logs you have more time to resolve the problems.
You must have at least three active log data sets but it is preferable to define more. For example, if the time taken to fill a log is likely to approach the time taken to archive a log during peak load, define more logs. You are also recommended to define more logs to offset possible delays in log archiving. If you use archive logs on tape, allow for the time required to mount the tape.
Consider having enough active log space to keep a day’s worth of data, in case the system is unable to archive because of lack of DASD or because it cannot write to tape.
It is possible to dynamically define new active log data sets as a way of minimizing the effect of archive delays or problems. New data sets can be brought on-line rapidly, using the DEFINE LOG command to avoid queue manager ‘stall’ due to lack of space in the active log.
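For example, after allocating and pre-formatting a new data set, you could add it to the active log inventory with a command like the following (the data set name is illustrative):

```
+cpf DEFINE LOG('MQM.QMGR1.LOGCOPY1.DS099') COPY(1)
```

COPY(1) or COPY(2) says which set of active logs the data set is added to when dual logging is in use.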
How large should the active logs be?
On V710 and before, the maximum supported active log size when archiving to disk was 3GB. With V800 this was increased to 4GB.
When archiving to tape the maximum active log size is 4GB.
You should create active logs of at least 1GB in size for production and test systems.
Important: You need to be careful when allocating data sets because IDCAMS will round up the size you allocate.
To allocate a 3GB log, specify the size using one of the IDCAMS space parameters (for example CYLINDERS, MEGABYTES, TRACKS, or RECORDS); because of the rounding, the allocation comes out at 2.99995 GB.
To allocate a 4GB log, use the same approach; the allocation comes out at 3.9997 GB.
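As an illustration, an IDCAMS definition for one active log copy might look like the following. The data set name and volume are illustrative; CYLINDERS(5825) is an example that comes out at 3.9997 GB on 3390 DASD (verify the figure for your device geometry before using it):

```
DEFINE CLUSTER (NAME(MQM.QMGR1.LOGCOPY1.DS01) -
       LINEAR -
       VOLUMES(VOL001) -
       CYLINDERS(5825) -
       SHAREOPTIONS(2 3))
```

Active logs are VSAM linear data sets, and SHAREOPTIONS(2 3) matches the queue-sharing group note earlier in this topic.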
When using striped data sets, where the data set is spread across multiple volumes, the specified size value is allocated on each DASD volume used for striping. So if you want to use 4GB logs and 4 volumes for striping, specify CYLINDERS(1456) or MEGABYTES(1023).
These allocate 4 * 1456 = 5824 cylinders, or 4 * 1023 = 4092 megabytes, in total.
Note: Striping is supported when using extended format data sets. This is usually set by the storage manager.
Active log placement
For performance reasons you should consider striping your active log data sets. The I/O is spread across multiple volumes, which reduces I/O response times and gives higher throughput. See above for information about allocating the size of the active logs when using striping.
You should review the I/O statistics for the MQ data sets using reports from RMF or a similar product, monthly or more frequently, to ensure there are no delays due to the location of the data sets. In some situations there can be a lot of MQ page set I/O, and this can impact MQ log performance if they are located on the same DASD.
If you use dual logging, ensure that each set of active and archive logs is kept apart. For example, allocate them on separate DASD subsystems, or on different devices. This reduces the risk of them both being lost if one of the volumes is corrupted or destroyed. If both copies of the log are lost, the probability of data loss is high.
When you create a new active log data set you should pre-format it using CSQJUFMT. If the log is not pre-formatted, the queue manager formats the log the first time it is used, which impacts performance.
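A sketch of the pre-format job. The library and data set names are illustrative; check the CSQJUFMT description in the documentation for your release for the exact DD names it expects:

```
//JUFMT    EXEC PGM=CSQJUFMT
//STEPLIB  DD DISP=SHR,DSN=thlqual.SCSQANLE
//         DD DISP=SHR,DSN=thlqual.SCSQAUTH
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=MQM.QMGR1.LOGCOPY1.DS099
```

Run the job once per new log data set, before adding the data set to the queue manager.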
With older DASD with large spinning disks, you had to be careful which volumes were used to get the best performance. With modern DASD where data is spread over many PC sized disks, you do not need to worry so much about which volumes are used. Your storage manager should be reviewing the enterprise DASD to review and resolve any performance problems. For availability you may want to use one set of logs on one DASD subsystem, and the dual logs on a different subsystem.
Planning your log archive storage
Use this topic to understand the different ways of maintaining your archive log data sets.
You can place archive log data sets on standard-label tapes, or DASD, and you can manage them by data facility hierarchical storage manager (DFHSM). Each z/OS® logical record in an archive log data set is a VSAM control interval from the active log data set. The block size is a multiple of 4 KB.
Archive log data sets are dynamically allocated, with names chosen by WebSphere® MQ. The data set name prefix, block size, unit name, and DASD sizes needed for such allocations are specified in the system parameter module. You can also choose, at installation time, to have WebSphere MQ add a date and time to the archive log data set name.
You cannot tell MQ to use specific volumes for new archive logs, but you can use storage management routines to manage this. If allocation errors occur, offloading is postponed until the next time offloading is triggered.
If you specify dual archive logs at installation time, each log control interval retrieved from the active log is written to two archive log data sets. The log records that are contained in the pair of archive log data sets are identical, but the end-of-volume points are not synchronized for multivolume data sets.
Should your archive logs reside on tape or DASD?
When deciding whether to use tape or DASD for your archive logs, there are a number of factors that you should consider:
Review your operating procedures before deciding between tape and disk. For example, if you choose to archive to tape, there must be enough tape drives available when they are required. After a disaster, all subsystems may want tape drives, and you may not have as many free tape drives as you expect.
During recovery, archive logs on tape are available as soon as the tape is mounted. If DASD archives have been used, and the data sets migrated to tape using hierarchical storage manager (HSM), there is a delay while HSM recalls each data set to disk. You can recall the data sets before the archive log is used. However, it is not always possible to predict the correct order in which they are required.
When using archive logs on DASD, if many logs are required (which might be the case when recovering a page set after restoring from a backup) you might require a significant quantity of DASD to hold all the archive logs.
In a low-usage system or test system, it might be more convenient to have archive logs on DASD to eliminate the need for tape mounts.
Both issuing a RECOVER CFSTRUCT command and backing out a persistent unit of work result in the log being read backwards. Tape drives with hardware compression perform badly on operations that read backwards. Plan sufficient log data on DASD to avoid reading backwards from tape.
Archiving to DASD offers faster recoverability but is more expensive than archiving to tape. If you use dual logging, you can specify that the primary copy of the archive log go to DASD and the secondary copy go to tape. This increases recovery speed without using as much DASD, and you can use the tape as a backup.
Archiving to tape
If you choose to archive to a tape device, WebSphere MQ can extend an archive log data set to a maximum of 20 volumes.
If you are considering changing the size of the active log data set so that the set fits on one tape volume, note that a copy of the BSDS is placed on the same tape volume as the copy of the active log data set. Adjust the size of the active log data set downward to offset the space required for the BSDS on the tape volume.
If you use dual archive logs on tape, it is typical for one copy to be held locally, and the other copy to be held off-site for use in disaster recovery.
Archiving to DASD volumes
WebSphere MQ requires that you catalog all archive log data sets allocated on non-tape devices (DASD). If you choose to archive to DASD, the CATALOG parameter of the CSQ6ARVP macro must be YES. If this parameter is NO, and you decide to place archive log data sets on DASD, you receive message CSQJ072E each time an archive log data set is allocated, although WebSphere MQ still catalogs the data set.
If the archive log data set is held on DASD, the archive log data sets can extend to another volume; multi-volume is supported.
If you choose to use DASD, make sure that the primary space allocation (both quantity and block size) is large enough to contain either the data coming from the active log data set, or that from the corresponding BSDS, whichever is the larger of the two. This minimizes the possibility of unwanted z/OS X’B37′ or X’E37′ abends during the offload process. The primary space allocation is set with the PRIQTY (primary quantity) parameter of the CSQ6ARVP macro.
Archive log data sets cannot exist on large or extended-format sequential data sets. If using Automatic Class Selection (ACS) the archive log data set should not be assigned a data class which has DSNTYPE(LARGE) or DSNTYPE(EXT).
Using SMS with archive log data sets
If you have MVS/DFP storage management subsystem (DFSMS) installed, you can write an Automatic Class Selection (ACS) user-exit filter for your archive log data sets, which helps you convert them for the SMS environment. Such a filter, for example, can route your output to a DASD data set, which DFSMS can manage. You must exercise caution if you use an ACS filter in this manner. Because SMS requires DASD data sets to be cataloged, you must make sure the CATALOG DATA field of the CSQ6ARVP macro contains YES. If it does not, message CSQJ072E is returned; however, the data set is still cataloged by WebSphere MQ.
For more information about ACS filters, see the DFP Storage Administration Reference manual, and the SMS Migration Planning Guide.
How long do I need to keep archive logs for?
You specify how long archive logs are kept, in days, using the ARCRETN parameter in CSQ6ARVP or the SET SYSTEM command. After this period the data sets may be deleted by z/OS.
You can manually delete archive log data sets when they are no longer needed, but consider the following:
1. The queue manager may need the archive logs for recovery. The queue manager can only keep the most recent 1000 archives in the BSDS. When the archive logs are not in the BSDS they cannot be used for recovery, and are only of use for audit, analysis, and replay type purposes.
2. You may want to keep the archive logs so that you can extract information from the logs, for example extracting messages from the log and reviewing which userid put or got the message.
The BSDS contains information about logs and other recovery information. This data set is a fixed size; when the number of archive logs reaches the value of MAXARCH in CSQ6LOGP, or when the BSDS fills up, the oldest archive log information is overwritten. There are utilities to remove archive log entries from the BSDS, but most customers just let the BSDS wrap and overlay the oldest archive log record.
Use the DISPLAY USAGE TYPE(ALL) command to display the log RBA (and so the log data sets) needed for recovery.
When is an archive log needed?
You need to backup your page sets regularly. The frequency of backups determines which archive logs are needed in the event of losing a page set.
You need to backup your CF structures regularly. The frequency of backups determines which archive logs are needed in the event of losing data in the CF structure.
The archive log may be needed for recovery. The information below explains when the archive log may be needed when there are problems with different MQ resources.
Loss of Page set 0
Recover from backup. Restart the queue manager. The logs from when the backup was taken plus up to three active logs are needed.
Loss of any other page set
Recover from backup. Restart the queue manager. The logs from when the backup was taken plus up to three active logs are needed.
All LPARS lose connectivity to a structure, or the structure is unavailable
Use the RECOVER CFSTRUCT command to read from the last CF backup on the logs. If you have been doing frequent backups of the CF, the data should be in active logs – archive logs should not be needed.
Admin structure rebuild
If the admin structure needs to be rebuilt then the information is read from the last checkpoint of each queue manager’s log. If a queue manager is not active, then another queue manager will read the log. Archive logs should not be needed.
Loss of a SMDS dataset
If you lose an SMDS dataset or it gets corrupted, then it will become unusable and the status for it is set to FAILED. The CF structure is unchanged. In order to restore the SMDS dataset, it needs to be redefined and the CF structure needs to be failed and then recovered.
Issuing the RECOVER CFSTRUCT command twice achieves this: issuing it the first time sets the structure state to failed; issuing it a second time does the actual recovery. Note that all non persistent messages on the CF structure will be lost. All persistent messages will be restored.
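For example, with an illustrative structure name, the first command marks the structure as failed and the second performs the recovery:

```
+cpf RECOVER CFSTRUCT(APP1)
+cpf RECOVER CFSTRUCT(APP1)
```

Check the structure status between the two commands with DISPLAY CFSTATUS if you want to confirm the failed state before recovering.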
You will need the logs from the time the BACKUP CFSTRUCT command was issued, so this may require archive logs.
If all LPARs lose connectivity to the structure then the structure is recreated, possibly in an alternative CF (your structure CFRM PREFLIST attribute must contain multiple CFs) and persistent messages are recreated by reading the log for the last CF backup, reading the logs from all queue managers that have used the structure and merging updates since the backup.
Non persistent messages will be lost.
The logs from all queue managers that have accessed the structure since the last backup will be required, back to the time when the backup was taken, plus the structure backup itself in the log of the qmgr that took the backup.
Do you need single or dual BSDS?
If you are using dual active logs you should use dual BSDSs.
How big does the BSDS need to be?
The BSDS does not need to be very large; a primary and secondary allocation of one cylinder should be sufficient.
These blog posts are from when I worked at IBM and are copyright © IBM 2015.