Oct 24 2014
My colleague Tony Sharkey did some work to compare IMS V12 with the new IMS V13, and was initially disappointed at the low throughput he obtained.
The steps taken to tune IMS are listed below; eventually IMS V13 had a higher throughput than IMS V12.
Running with IMS V12, the IMS Bridge tests were driving the throughput to around 11,000 transactions per second. Running with IMS V13, the throughput dropped to 700 transactions per second!
The environment was 3 queue managers (QMs) in a QSG, where each QM was on a separate LPAR.
QM1 was located on the same LPAR as the IMS control region.
QM2 and QM3 accessed IMS via XCF.
Batch jobs on each LPAR looped, putting a message to the IMS bridge queue and waiting for its reply.
The MPRs were on the same LPAR as QM1 and the IMS system.
Each LPAR was initially defined with 3 dedicated CPs, but LPAR 1 can use up to 32 dedicated CPs, LPAR 2 can use up to 10, and LPAR 3 is limited to 3. The CF has 4 dedicated processors.
Actions taken to improve performance
* Ensure PWFI is on.
This was actually already on, but it does make a difference.
* Check IMS attribute QBUF.
For some reason, the QBUF setting changed between IMS 10 and IMS 13 in the default Hursley IMS installs.
IMS 10 had been tuned to 255 (by me).
IMS 12 did not specify QBUFS and fell back to the value defined in BUFFERS in the MSGQUEUE macro. In our case this was set to 255.
IMS 13’s initial value was 5.
In the IMS trace report there is a field “NUMBER OF WAITS BECAUSE NO BUFFER AVAILABLE” which gives a hint as to a lack of buffers.
* Check IMS attribute PSBW
When reviewing the performance data, it was observed that not all of the available MPRs were processing workload.
This was because there wasn’t enough storage set aside to run all of the MPRs concurrently.
By increasing the value of the PSBW from 24 to 200 (KB), we were able to use all 16 MPRs.
* Increase the size of the online datasets
The default sizes of the DFSOLP* and DFSOLS* datasets meant that they were switching frequently.
By increasing them from 50 to 1500 cylinders each, we reduced the frequency of switching.
* Increase the number of online datasets
With the increased throughput, the IMS archive jobs were trying to process multiple logs in each step. It got to the point where no logs were available.
By increasing from 5 to 20 logs, we were no longer waiting for active logs.
* Increase the size of the IMSMON dataset
Strictly this isn’t a performance improvement, but the size originally allocated was far too small to capture 1 minute’s worth of data.
We increased it to 270 cylinders, which currently seems large enough (when trace is enabled).
* Enable zHPF
Given that IMS always logs data to disk, zHPF makes sense, and in some cases we saw a 20% improvement in throughput.
(Indeed it turned out that this was the difference between my “baseline” and “fast” tests!)
Even though zHPF was enabled across all 3 LPARs, the impact was greatest on the LPAR with the IMS region.
For some reason, making IMS archiving faster allowed the QM SRBs to get/send messages at a higher rate on QM1 than on the other QMs.
When zHPF was disabled, even with sufficient CPUs, the distribution of workload was more even across the QMs.
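The online dataset resizing above can be sanity-checked with a little arithmetic. This is only a sketch: it assumes standard 3390 geometry (15 tracks of 56,664 bytes per cylinder), and the 10 MB/sec logging rate is an invented example, not a measured figure.

```python
# Estimate how often the IMS online log datasets (OLDS) switch at a
# given logging rate. Assumes 3390 DASD geometry; the logging rate
# used below is an illustrative figure, not a measurement.

BYTES_PER_CYL = 15 * 56_664  # 15 tracks/cylinder x 56,664 bytes/track

def olds_switch_interval_secs(cylinders: int, log_mb_per_sec: float) -> float:
    """Seconds between OLDS switches for a dataset of the given size."""
    dataset_mb = cylinders * BYTES_PER_CYL / 1_000_000
    return dataset_mb / log_mb_per_sec

# At an assumed 10 MB/sec, a 50-cylinder OLDS fills in roughly 4 seconds,
# while a 1500-cylinder OLDS lasts about 2 minutes.
small = olds_switch_interval_secs(50, 10.0)
large = olds_switch_interval_secs(1500, 10.0)
```

Switching every few seconds leaves the archive jobs no chance to keep up, which is consistent with the shortage of active logs described above.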
All of these changes brought the IMS V13 bridge tests up to parity with IMS V12.
What was slowing it down?
At this point, Tony wanted to see what was slowing it down.
It turned out that with sufficient TPIPES in use, the issue was CPU on the LPAR that hosted the IMS region.
Increasing the CPUs to 6:3:3 across the LPARs saw the transaction rate increase to 19,000 transactions per second, but it was still constrained.
With a further increase to 9:3:3, the transaction rate peaked at 21,400 per second.
At this point, LPARs 2 and 3 were constrained. Additionally the CF was running at 75% busy, which is above the recommended usage for a multi-way CF.
Potentially it could be driven harder by adding more CPUs (simply on LPARs 1 and 2, but with a system change on LPAR 3 and on the CF).
Oct 21 2014
It is hard to tell what definitions you need to enable security on a z/OS queue manager.
The definitions below show what you need to do to define queue security for system.* queues.
The checking depends on the presence or absence of a profile. Where it says “defined”, you need to define the appropriate profile. Where it says “not defined”, the profile must not be defined. This applies to generic profiles as well as specific profiles. You can use a RACF command like rlist mqadmin QMB.NO.SUBSYS.SECURITY to see if a profile exists for QMB.NO.SUBSYS.SECURITY.
The term “ssid profile” means the profile for a particular queue manager; “QSG” is used for the queue sharing group name.
* Use QSG profiles and not ssid profiles
Define MQADMIN ssid.NO.QMGR.CHECKS for each ssid.
To have the QMGR QMA definitions override the QSG, define MQADMIN QMA.YES.QMGR.CHECKS.
See below for the selection logic.
* Use qmgr (ssid) profiles and not QSG profiles
* Use both QSG and ssid profiles
To protect a queue you need MQQUEUE ssid.system.* or MQQUEUE qsg.system.*. The selection logic is:
    if profile MQQUEUE ssid.system.* is found
        get the userid’s access to profile MQQUEUE ssid.system.*
    else (profile MQQUEUE ssid.system.* not found)
        get the userid’s access to MQQUEUE qsg.system.*
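The selection logic can be sketched as a small function. The dictionary below stands in for the RACF database, and the profile names and access levels are invented for illustration:

```python
# Sketch of the ssid-first, QSG-fallback profile selection described above.
# The dict stands in for RACF; names and access levels are made up.

def select_profile(profiles: dict, ssid: str, qsg: str, resource: str):
    """Return the (profile, access) pair that the matching rules would pick."""
    ssid_profile = f"{ssid}.{resource}"
    qsg_profile = f"{qsg}.{resource}"
    if ssid_profile in profiles:      # ssid profile found: it is used,
        return ssid_profile, profiles[ssid_profile]
    if qsg_profile in profiles:       # otherwise fall back to the QSG profile
        return qsg_profile, profiles[qsg_profile]
    return None, None                 # no profile at all: access is refused

# QMB has its own profile for SYSTEM.* queues; QMA falls back to the QSG one.
racf = {"QMB.SYSTEM.*": "READ", "QSG1.SYSTEM.*": "UPDATE"}
qmb = select_profile(racf, "QMB", "QSG1", "SYSTEM.*")
qma = select_profile(racf, "QMA", "QSG1", "SYSTEM.*")
```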
How can you tell what is being used?
The command +cpf REFRESH SEC(*) will list the profiles found
Oct 9 2014
QREP is an IBM product that does DB2-to-DB2 replication over MQ, mirroring updates from one DB2 environment to a remote DB2.
This can involve sending large amounts of data (many MB/second).
If you are using QREP, there are some tuning points you should consider:
- Use a dedicated queue manager just for QREP
- Each QREP consistency group should have a dedicated XMITQ
- One XMITQ per page set
- One page set per buffer pool
- Enable read ahead. WMQ V7.1 needs APAR PM81785; see http://www-01.ibm.com/support/docview.wss?uid=swg1PM81785 for instructions. This is needed on both the capture and apply queue managers.
- Use the QREP option transbatchsize, so there are many requests per MQ message. A message size between 10KB and 1MB should be OK.
- Use the QREP parameter MAX_MESSAGE_SIZE=1M to limit the amount of data written in an MQ message.
- Use a batch size of 200 messages.
- Use a batchlim of at least 5000 (the default). Depending on network capacity and speed, larger batchlim values, up to 100000, may provide higher throughput.
- The buffer pool for a queue needs to be bigger than the data put within one unit of work from QREP plus the size of the data in an MQ batch. In theory there are at most 2 messages in flight at any one time.
- At the apply side, if the MQ queues have lots of messages, then you need to tune QREP, perhaps more threads, or reduce DB2 contention.
- You should apply MQ APAR PM71966 to increase the TCP/IP buffer sizes. Your SYS1.TCPPARMS(xx) needs to have TCPCONFIG TCPSENDBFRSIZE 65536 TCPRCVBUFRSIZE 65536 to allow dynamic right-sizing.
- In SupportPac MP1B there is a C program (MQCMD) which issues MQ commands and traps the response. Do this for the channels, issuing DIS CHS(channel) ALL at the sending end. This displays how much data per message, how many messages per second, how many batches per second, and the nettime. If the nettime is high (> 1ms) then there could be a network problem.
- Use a TSO ping of the remote system (count 10 verbose length 4096) and a ping of the remote system (count 10 verbose length 256) to get an estimate of the network time.
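The buffer pool rule of thumb in the list above can be turned into a rough calculation. This is only a sketch: the 2x headroom factor and the input figures are assumptions for illustration, not QREP recommendations.

```python
# Rough sizing of a 4KB-page buffer pool for a QREP transmission queue:
# room for the data put in one unit of work plus one MQ channel batch,
# with headroom. The headroom factor and inputs are illustrative.

PAGE_SIZE = 4096

def bufferpool_pages(max_msg_bytes: int, batch_msgs: int, headroom: float = 2.0) -> int:
    """Pages needed for one in-flight unit of work plus one channel batch."""
    uow_bytes = max_msg_bytes                 # one MAX_MESSAGE_SIZE message per UOW
    batch_bytes = max_msg_bytes * batch_msgs  # a full channel batch of messages
    total = (uow_bytes + batch_bytes) * headroom
    return -(-int(total) // PAGE_SIZE)        # ceiling division

# 1MB messages and 200 messages per batch, with 2x headroom:
pages = bufferpool_pages(1_000_000, 200)      # just under 100,000 pages (~400MB)
```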
- Log performance
- Place MQ volumes on modern DASD, e.g. DS8870 instead of DS8300
- Stripe the MQ log volumes with 4 stripes
- Enable zHPF
- If using dual DASD consider using separate DASD control units for each log copy to reduce I/O contention
- If the queue manager is only used for QREP, consider disabling log archiving
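The MQCMD channel-status check earlier in the list can also be scripted. This sketch pulls the NETTIME pair (short-term and long-term averages, in microseconds) out of DIS CHS output and flags values above 1ms; the sample line is invented.

```python
import re

# Extract the NETTIME values from DIS CHS(...) output and flag a slow
# network. NETTIME is a (short-term, long-term) pair in microseconds;
# the sample line below is invented for illustration.

def nettime_micros(chstatus_text: str):
    """Return the (short_term, long_term) NETTIME values, or None."""
    m = re.search(r"NETTIME\((\d+),(\d+)\)", chstatus_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

sample = "CHSTATUS(TO.QM2) CHLTYPE(SDR) NETTIME(250,31875) XBATCHSZ(50,50)"
short_us, long_us = nettime_micros(sample)
suspect = long_us > 1_000   # above 1ms suggests a possible network problem
```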
We had a customer ask if their SSL channels were using hardware assist, and how they could tell. This blog explains what happens, and what hardware can be used.
What cryptographic hardware options are there ?
On-chip assist. This provides symmetric encryption and decryption of data. This capability is known as the CP Assist for Cryptographic Functions (CPACF).
A card you pay for and plug into the processor. The Crypto Express4S feature is a tamper-sensing, tamper-responding, programmable cryptographic feature providing a secure cryptographic environment.
It can be configured in three ways; see http://www-03.ibm.com/systems/z/advantages/security/zec12cryptography.html for more details:
1) IBM Common Cryptographic Architecture (CCA) coprocessor
2) IBM Enterprise PKCS #11 (EP11) coprocessor
3) Accelerator
What does MQ use?
MQ uses System SSL to securely transport data over both MCA and SVRCONN channels.
There are 2 phases involved:
1) The handshake – will use accelerator or coprocessor, or in the absence of these will use software
2) The data transport – will use CPACF to encrypt and decrypt the data, but if the cipherspec is not supported, will use software.
What does the handshake use?
MQ’s use of SSL channels means that the only work that can be offloaded to the crypto hardware (co-processor or accelerator) is the secret key negotiation.
The secret key negotiation occurs in 2 places – channel start and when the amount of data flowed over the channel exceeds the value in the SSLRKEYC attribute.
Use of the DISPLAY CHSTATUS command can show how frequently the secret key is being negotiated. It may be that for a channel sending high volumes of data that the frequency is high.
For example if a channel is sending 50MB/sec, does it need to re-negotiate every 1MB or 50MB or would 500MB (or even larger) be sufficient?
By contrast, a channel that only sends data occasionally (< 1MB/sec) may not want to renegotiate every 500MB.
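The trade-off is easy to quantify with some back-of-envelope arithmetic; the data rates used here are invented examples:

```python
# How often SSLRKEYC forces a secret key renegotiation at a given data
# rate. Purely illustrative arithmetic; the rates are invented.

def renegotiations_per_hour(mb_per_sec: float, sslrkeyc_mb: float) -> float:
    """Renegotiations per hour when every sslrkeyc_mb of data triggers one."""
    return mb_per_sec * 3600 / sslrkeyc_mb

# 50 MB/sec with SSLRKEYC at 1MB means 180,000 handshakes an hour;
# raising SSLRKEYC to 500MB cuts that to 360.
busy = renegotiations_per_hour(50, 1)
relaxed = renegotiations_per_hour(50, 500)
```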
Any use of the coprocessor or accelerator hardware will show up in the RMF reports.
What does encryption/decryption use?
Data encryption/decryption will be performed in hardware or software.
On the zEC12 there is 1 CPACF (CP Assist for Cryptographic Functions) for each core. Older models, for example the zEnterprise 196, shared 1 CPACF between 2 cores. When the CPACF is used, a synchronous instruction is used to execute the request on the CPACF. This does not show up in any RMF reports.
System SSL dynamically determines what ciphers are supported on the hardware it is running on, and will exploit whatever is there. If the CPACF is not available, the encryption/decryption will be performed in the software layer.
Whether the encryption/decryption is performed in software or by CPACF, there will be an increase in CPU cost in the channel initiator address space, but the software path may be more expensive.
To determine whether there is hardware available for secret key negotiation and/or encryption, you can add the following to the channel initiator JCL and restart the channel initiator.
//CEEOPTS DD *
When the channel is started, information is logged to STDOUT in the form:
System SSL: SHA-1 crypto assist is available
System SSL: SHA-224 crypto assist is available
System SSL: SHA-256 crypto assist is available
System SSL: SHA-384 crypto assist is available
System SSL: SHA-512 crypto assist is available
System SSL: DES crypto assist is available
System SSL: DES3 crypto assist is available
System SSL: AES 128-bit crypto assist is available
System SSL: AES 256-bit crypto assist is available
System SSL: AES-GCM crypto assist is available
System SSL: Cryptographic accelerator is available – if available will be used for secret key negotiation
System SSL: Cryptographic coprocessor is available – if available will be used for secret key negotiation
System SSL: Public key hardware support is available
System SSL: Max RSA key sizes in hardware – signature 4096, encryption 4096, verification 4096
System SSL: ECC secure key support is available. Maximum key size 521
System SSL: ICSF Secure key PKCS11 support is not available
System SSL: ICSF FMID is HCR77A0
To determine where the encryption is being performed, it is necessary either to:
1) enable System SSL trace
2) or to collect crypto counter data using the CPU/MF tool (HIS)
Not all cipher specs are supported on the CPACF.
What affects the costs?
1) Message size
2) SSLRKEYC setting
3) CIPHERSPEC being used
4) Key size
In our measurements, enabling SSL on a workload using 10KB messages saw a 42% increase in cost over the non-SSL baseline (the channel initiator cost increased nearly 2x).
In our measurements with larger messages, enabling SSL with 1MB messages saw a 64% increase in cost, although the cost in the sending channel initiator increased nearly 3x.
The action of renegotiating the secret key is a relatively costly operation.
If the production environment enforces a more frequent renegotiation either by allowing fewer bytes through before (i.e. SSLRKEYC is lower in production than test), or the message size is greater (e.g. test uses 2KB and production uses 1MB), you may expect to see a higher cost per transaction.
“DISPLAY CHS(<chlName>) SSLRKEYS MSGS” will show how many negotiations have taken place and how many messages. If this is run several times, it is possible to determine how many messages are flowing between each renegotiation – and how many messages are flowing over the channel. It might be that for large messages flowing at a high rate, the secret key could be negotiated less frequently.
In our tests, renegotiating when 1MB of data flowed over the channel saw the channel initiator cost per message increase 2.5x for 10KB messages but for 1MB messages (i.e. renegotiating every message) the channel initiator increased nearly 6x.
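The sampling procedure above can be sketched simply: take two snapshots of DISPLAY CHS(...) SSLRKEYS MSGS and divide the deltas. The snapshot values here are invented for illustration.

```python
# Estimate messages per secret key renegotiation from two samples of
# DISPLAY CHS(...) SSLRKEYS MSGS. Sample values are invented.

def msgs_per_renegotiation(before: dict, after: dict) -> float:
    """Messages flowed per renegotiation between the two samples."""
    d_msgs = after["MSGS"] - before["MSGS"]
    d_keys = after["SSLRKEYS"] - before["SSLRKEYS"]
    return d_msgs / d_keys if d_keys else float("inf")

t0 = {"SSLRKEYS": 12, "MSGS": 10_000}
t1 = {"SSLRKEYS": 112, "MSGS": 60_000}
rate = msgs_per_renegotiation(t0, t1)   # 50,000 msgs over 100 renegotiations
```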
CIPHERSPEC being used:
Different cipherspecs cost different amounts. In our tests using 1MB messages and no secret key negotiation, the TRIPLE_DES_SHA_US transfer cost in the channel initiator was 30% higher than ECDHE_RSA_AES_256_CBC_SHA384, although this was with GSK_TRACE enabled.
Key size: The size of the key used can make a difference to the cost of the renegotiation; the larger the key, the higher the cost. That said, this additional cost will only be incurred when negotiating the secret key, so the prime concern should be how often the secret key is being negotiated.
These blog posts are from when I worked at IBM and are copyright © IBM 2014.