Running a z/Linux container as an address space on z/OS – WOW!

I was at the Guide Share Europe conference in the UK last week.  I had not been for a few years, and it was great to be able to brush up my z/OS skills.  It had the largest attendance, over 500 people, with about 50 young people, which was great (so young they were not allowed to drink alcohol).  It was also CICS’s 50th birthday, so there was a dinner, lots of cake and impressive fireworks.

One presentation caught my eye: running a z/Linux container in a z/OS address space.  Yes, a z/Linux container in an address space, not USS.  Instead of having to install z/VM, or having to carve out an LPAR for z/Linux, you “just” configure the address space.  It looks about as complex as installing MQ on z/OS.  For example, you have to define linear data sets for the Linux to use.  These are accessed by page number, just like a page set.  You control it using the z/OS MODIFY command.  You access it via TCP/IP, so there are no cross-memory interfaces into it.
You can now run all of the cloud stuff like Jenkins within z/OS in an address space – WOW!

Recently someone said that virtualization had made a huge difference to the way systems are deployed these days.  I said I was using virtualization on VM/370 before he was born.
I wonder what will be “new” on z/OS in 20 years’ time?

Does your highly available solution depend on a bit of rusty kit?

I heard second (or third) hand about a customer involved in distribution who found a little problem with his highly available system.

They had great software that made sure that avocados and aubergines were sent to Arundel, blackberries and blackcurrants were sent to Blackpool, and chives and chicory were sent to Chichester.  The software would give instructions to the packers on where to store the vegetables, and in which order to put the trolleys into the container, so when the container was delivered the right goods were in the right place in the container.  This made unloading very efficient.  Things happened automatically, or instructions were sent to tablets telling people what to do.  There was almost no paper involved in the distribution.

Paper was used by the drivers, who would come to the shed to get instructions as to which container to collect, and were told where to go, so that the delivery did not go from Arundel to Chichester by way of Blackpool.  The teeny weeny problem they had was when the printer got old and finally stopped working.  They could not print out the drivers’ instructions, and so the drivers did not know where to go.  They could not route the printing to another printer, as the other printers were not configured to CICS.  As a result they had a day when they could not deliver the containers, and their perishable contents had to be thrown away.

 

So remember: the end-to-end solution is truly end to end, not just within the walls of your machine room.

 

Midrange now has the DIS APSTATUS command

This is a new command in 9.1.3 mid-range, part of the “uniform clustering” support.  (Uniform clustering is what I would call connection balancing; see Uniform clustering gets a tick from me.)

For example, I have two instances of the program oemput, and it gave

dis apSTATUS('oemput') 
AMQ8932I: Display application status details.
  APPLNAME(oemput) CLUSTER( )
  COUNT(2) MOVCOUNT(0) 
  BALANCED(NOTAPPLIC)

and

dis apSTATUS('oemput') type(local)
AMQ8932I: Display application status details.
  APPLNAME(oemput) 
  CONNTAG(MQCT4509BF5D0368DB23QMA_2018-08-16_13.32.14oemput)
  CONNS(1) IMMREASN(NOTCLIENT)
  IMMCOUNT(0) IMMDATE( )
  IMMTIME( ) MOVABLE(NO)
AMQ8932I: Display application status details.
  APPLNAME(oemput) 
  CONNTAG(MQCT4509BF5D017BDB23QMA_2018-08-16_13.32.14oemput)
  CONNS(1) IMMREASN(NOTCLIENT)
  IMMCOUNT(0) IMMDATE( )
  IMMTIME( ) MOVABLE(NO)

 

There is a different conntag for each instance of the program.  DIS QMGR QMID gives QMID(QMA_2018-08-16_13.32.14).

The tags are MQCT4509BF5D017BDB23QMA_2018-08-16_13.32.14oemput and  MQCT4509BF5D0368DB23QMA_2018-08-16_13.32.14oemput.
(Thanks to eagle eyed Morag for pointing out the difference.)

Stackoverflow: What throughput can a standalone Java program achieve?

There was a question in the MQ section on StackOverflow:

I have a standalone multi threaded java application which listen messages from IBM MQ.
Current system take around 500ms for processing of 1 message after it read from queue and till it commit.
I want to know how many messages I can consume

  • Concurrently:
  • Max number of messages can be processed? or throttle limit

A good meaty performance question I thought.  Let me break this into pieces.

Current system take around 500ms for processing of 1 message after it read from queue and till it commit.

Processing one message and committing should take about 10 milliseconds or less (say 30 ms for a two-phase commit).  There is clearly something else going on.  Fix this first.  Possible causes include:

  1. A long database call.   This could be due to database locking, or a badly designed statement, for example a query which needs to access thousands or millions of rows.
  2. A request to a server far far away
  3. A file system with the speed of writing an illuminated letter to parchment

How many messages I can consume: Concurrently:

Take the worst case of using persistent messages, which require log IO during commit.

For one thread, processing multiple messages before doing a commit means the thread can do more work.  Consider a get taking 1 millisecond, and a commit taking 10 ms.  This is one message processed every 11 ms.  If you did 50 gets, taking 50 ms, and a commit taking 10 ms, this is 50 messages in 50 + 10 = 60 ms, which equates to one message every 1.2 milliseconds, about 9 times faster.  This is how channels can send messages efficiently.  There is a “sweet spot” of messages per commit which gives you maximum data processed per second.  This depends on the message size, logging rates and other factors.  For a 100MB message it is one message per commit.  For 10KB messages, this may be 1000 messages per commit.
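
The arithmetic above can be sketched in a few lines of Java.  This is only a back-of-envelope model: it assumes the get and commit times are fixed (1 ms and 10 ms, the figures used above) and ignores message size, logging rates and everything else that moves the sweet spot.  The class and method names are made up for illustration.

```java
// Back-of-envelope model of "messages per commit" for a single thread.
// Assumption: every get costs the same, and one commit ends each batch.
class CommitBatching {

    /** Elapsed time for one unit of work: n gets followed by one commit. */
    public static double batchMillis(int messagesPerCommit, double getMillis, double commitMillis) {
        return messagesPerCommit * getMillis + commitMillis;
    }

    /** Average elapsed time per message when n messages share one commit. */
    public static double perMessageMillis(int messagesPerCommit, double getMillis, double commitMillis) {
        return batchMillis(messagesPerCommit, getMillis, commitMillis) / messagesPerCommit;
    }

    public static void main(String[] args) {
        double getMillis = 1.0;      // figure taken from the text
        double commitMillis = 10.0;  // figure taken from the text
        for (int n : new int[] {1, 50}) {
            System.out.printf("%d per commit: %.0f ms per batch, %.1f ms per message%n",
                    n,
                    batchMillis(n, getMillis, commitMillis),
                    perMessageMillis(n, getMillis, commitMillis));
        }
    }
}
```

Changing the batch sizes in the loop shows how quickly the benefit flattens out once the commit cost has been spread over enough messages.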

This may be selfish

This is clearly a great improvement, but possibly selfish.  If the application logic is a get followed by a database insert, followed by a commit, then doing 50 gets, 50 inserts and a commit will work much faster.  The downside is that the database requests will keep locks until the commit.  These locks may prevent other applications from accessing data, whether the recently inserted records, page locks, or index locks.  So overall MQ throughput goes up, but the business transaction suffers.  You need to understand the database and find the optimum number of requests per commit for your business transaction.

How long before the data is visible?

Rather than have one thread process 1000 messages per commit (taking 1010 ms) you may want to have multiple threads processing 10 messages per commit, each taking 20 ms.  This means that the data in the database (or replies etc.) is visible earlier.  This may be important to your business transaction if you have to worry about response time.

Parallel  threads

  1. Using more threads should improve throughput, unless this is delayed by external factors – such as database locks.
  2. One customer found one thread was optimum because there were no database delays.

How many messages I can consume: Max number of messages can be processed? or throttle limit

There are papers written on this but here is a one minute overview

As fast as the queue manager can process data

  1. The rate at which MQ can write its logs
  2. Keep queue data in memory (buffer pools on z/OS, queue buffers on midrange), which means keeping few messages on the queue.

Threads

  1. Having parallel threads gives you better throughput than one thread.  You get overlapped writing to the log, the units of work are shorter in duration, you can get parallel IO.
  2. You may be limited by the network.   Having multiple threads from an application means the network can be better utilized.  One thread can be receiving data down the wire, while another thread is waiting in commit.
  3. You may be limited by where your programs run, e.g. short of CPU, or slow IO (for your System.out.println statements).

Application design

  1. You may get delays due to serialization if all threads are using the same queue.
  2. Remove the debug printf or System.out.println statements.
  3. Using a queue per business application is better than all applications sharing the same queue
  4. Using one reply-to queue per web server may be better than a shared reply-to queue, especially if you use Apache Camel.
  5. Use get first if possible.  Avoid scans of the queues.

 

The short answer….

You should be able to get thousands of 1KB messages a second through your Java application when using multiple threads.

 

What’s the difference between an MQ Message and a JMS Message

I had problems using the MQI to create a message for a JMS program to receive.

To see what was in the JMS message,  I used a Java program using JMS to write a message, and used my trusty C program to display it.

I could see that there were message properties in the message

Property 0 name <mcd.Msd> value <jms_text>
Property 1 name <jms.Dst> value <queue:///JMSQ1>
Property 2 name <jms.Rto> value <queue:///JMSQ2>
Property 3 name <jms.Tms> value <1571902099742>
Property 4 name <jms.Dlv> value <2>

These are described here.

The mcd.Msd value is one of jms_none, jms_text, jms_bytes, jms_map, jms_stream, jms_object.  This depends on whether you declare Message message, BytesMessage message, etc. to define your message type.  The JMS program receiving the message may be expecting a particular type.

The jms.Rto comes from the message.setJMSReplyTo(…).  It was set in the MQMD.ReplyToQ  as well as the message property.

It took me some time to find how to specify values such as deliveryMode.  I found it here.  For example message.setDeliveryMode(DeliveryMode.NON_PERSISTENT).  (This comes from javax.jms.DeliveryMode.NON_PERSISTENT, not a com.ibm…. file.)

I converted my simple program from JMQI to JMS, in a couple of hours, and was surprised to find it used fewer lines of code than using the JMQI.   Of course I may find I omitted some work, such as error handling, but it seems to be working OK.

Magic methods to decode Java MQ constants to strings.

I had been struggling with MQ and Java, decoding what the return code numbers meant, and found some real gems of methods here.

String reasonCode = MQConstants.lookup(2035, "MQRC_.*");  gave MQRC_NOT_AUTHORIZED

and

String decode = MQConstants.decodeOptions(gmo.options, "MQGMO_.*");  gave me

MQGMO_WAIT | MQGMO_SYNCPOINT_IF_PERSISTENT | MQGMO_FAIL_IF_QUIESCING

I wish I had these a couple of years ago – it would have saved me a lot of time!

 

The methods are

static java.lang.String decodeOptions(int optionsP, java.lang.String optionPattern)
  This helper method takes an integer representing a set of IBM MQ options for an MQI structure, and converts them into a string displaying the constants that the options represent.

static int getIntValue(java.lang.String name)
  Returns the value of the named MQSeries constant as an int.

static java.lang.Object getValue(java.lang.String name)
  Returns the value of the named MQSeries constant.

static java.lang.String lookup(int value, java.lang.String filter)
  Returns the MQSeries constant name or names for the supplied int value.

static java.lang.String lookup(java.lang.Object value, java.lang.String filter)
  Returns the MQSeries constant name or names for the supplied value of type Integer, String, byte[], or char[].

static java.lang.String lookupCompCode(int reason)
  Convenience method for finding the constant name for a completion code.

static java.lang.String lookupReasonCode(int reason)
  Convenience method for finding the constant name for a reason code.

static void main(java.lang.String[] args)

MQRC_DATA_LENGTH_ERROR with client

We had an application working on one system, and when we moved it to another system we got MQ RC 2010, data length error.  It turned out that SYSTEM.DEF.SVRCONN had a MAXMSGL of 1, so the maximum message size allowed on this channel was 1 byte.

You can specify the maximum message length on the client, for example in the MQCD or the client channel definition table, but I think the negotiated value is the lower of the values at each end.

 

Setting the value to one on the z/OS end was part of stopping people using the default channel definitions.

Any port in a storm? No.

I’ve just spent a day resolving a problem with specifying a port value when trying to connect to MQ.

I had

public long port = 1414;
String channel = "MYCHANNEL";
String hostname = "127.0.0.1";
Hashtable<String, Object> h = new Hashtable<String, Object>();
h.put(MQConstants.PORT_PROPERTY, dd.port);
h.put(MQConstants.CHANNEL_PROPERTY, channel);
h.put(MQConstants.TRANSPORT_PROPERTY, MQConstants.TRANSPORT_MQSERIES_CLIENT);
h.put(MQConstants.HOST_NAME_PROPERTY, hostname);
queueManager = new MQQueueManager("QMA", h);

(did you spot the problem?)

This failed with

MQConnection to QMA com.ibm.mq.MQException: MQJE001: Completion Code '2', Reason '2538'.
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2538;AMQ9204: Connection to host '127.0.0.1(0)' rejected.

This is saying it tried to connect with port 0!

I tried

String port = "1414"; and that failed the same way.

If I used

MQEnvironment.port = 1414; it worked.

This was tough to resolve, as I could find no documentation to help me.

Someone suggested public int port = 1414; and it worked!  What a way to spend a nice autumn day.
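
A possible explanation, shown as a small sketch: a long placed in a Hashtable<String, Object> is autoboxed to java.lang.Long, and code that only accepts an Integer will quietly ignore it.  The readPort method below is a hypothetical stand-in, not the MQ client code, and the "port" key is just a placeholder for MQConstants.PORT_PROPERTY.  Falling back to 0 is an assumption chosen to match the 127.0.0.1(0) in the error message.

```java
import java.util.Hashtable;

// Illustrates how the declared type of the port changes what ends up in the Hashtable.
class PortPropertyTypes {

    /**
     * Hypothetical reader that only understands an Integer port.
     * Anything else (Long, String, missing) is treated as "not set" and gives 0.
     */
    public static int readPort(Hashtable<String, Object> props) {
        Object value = props.get("port");
        return (value instanceof Integer) ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        long longPort = 1414;
        int intPort = 1414;
        Hashtable<String, Object> h = new Hashtable<>();

        h.put("port", longPort);  // autoboxed to java.lang.Long
        System.out.println(h.get("port").getClass().getName() + " gives port " + readPort(h));

        h.put("port", "1414");    // a String is not an Integer either
        System.out.println(h.get("port").getClass().getName() + " gives port " + readPort(h));

        h.put("port", intPort);   // autoboxed to java.lang.Integer
        System.out.println(h.get("port").getClass().getName() + " gives port " + readPort(h));
    }
}
```

Because the Hashtable is declared with Object values, the compiler is happy with all three puts, which is why nothing complained until the connection was attempted.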

Whoops - deploying an MDB in WebLogic

I was quite happily using my MDB in WebLogic, but when I changed its configuration, it did not pick up the changes.  It took a day to find out why, and I have learned much more about deploying MDBs.

My connection factory was using SYSTEM.DEF.SVRCONN, and I changed it to use a different client channel.  I stopped SYSTEM.DEF.SVRCONN (so I could check the change had worked) and restarted the WebLogic instance.  I was surprised when my MDB failed to start because the channel was stopped: the MDB was still trying to use that channel.  It took a lot of head scratching to get it to work as I expected.

  • I had messages like <BEA-015073>  Message-Driven Bean …  is configured with unknown activation-config-property name failIfQuiesce.  This message is wrong, failIfQuiesce is supported by the IBM Resource adapter.
  • I had the same message with activation-config-property name cfLookup.   This was my problem.  I should have specified connectionFactoryLookup.
  • If you have <activation-config-property-name>connectionFactoryLookup… (specified in the ejb-jar) any other parameters you specify in the ejb-jar.xml file are ignored.
  • If you do not specify a connectionFactoryLookup, nor properties in the ejb-jar.xml file, defaults are provided, see Configuring the resource adapter for inbound communication.  In my case I had not specified  activation-config-property-name channel, and this defaulted to SYSTEM.DEF.SVRCONN, which is why it continued to use that channel.
  • It is worth putting <activation-config-property-name>applicationName … in your definitions so you can see what you are using.
    • dis qstatus(JMSQ3) type(handle) gave me APPLTAG(CF3Name) so I can tell which definitions are being used.
    • If you get APPLTAG(weblogic.Server) then you are taking the defaults.
  • The Oracle documentation says the precedence order is as below.  I do not think this is 100% accurate.  (I could not specify some of the parameters in the weblogic-ejb-jar.xml file.)  I didn’t try the Java program.
    1. properties set in the weblogic-ejb-jar.xml deployment descriptor
    2. activation-config-property properties set in the ejb-jar.xml deployment descriptor
    3. activationConfigProperty annotation properties in the java program.

What do I need to specify?

As a minimum you need to use connectionFactoryLookup or  specify

  1. applicationName – so you can identify which definitions are being used
  2. channel – which channel to use
  3. failIfQuiesce
  4. hostName
  5. port

 

The ejb-jar.xml file is in the META-INF directory.  Change the ejb-jar.xml or weblogic-ejb-jar.xml file, update the jar file using a command like jar -uvf MDB4.jar META-INF/ejb-jar.xml, and redeploy it.

How do I get a client to disconnect?

I had a question from a customer who asked how they can reduce the number of client connections in use.  They had tried setting a disconnect interval (DISCINT) on the channel, but the connections were like weeds – you kill them off, and they grow back again.

DISCINT is “the length of time after which a channel closes down, if no message arrives during that period”.  This sounds perfect for most people.   The application is in an MQGET, and if no messages arrive, the channel can be disconnected, and the application gets connection broken.   The application can then decide to disconnect or reconnect.
If the application is not in an MQGET, then it will get notified of the broken connection next time it tries to use MQ.

Independent applications

Many applications are well written, in that when they get Connection Broken they just reconnect, and so DISCINT has no effect on reducing the number of connections.  This may be good for availability but not for resource usage.  It may be good to have 1000 application instances running during the day, but perhaps not overnight when there is no work to do.  I’ve seen instances where the applications do an MQGET every minute, and with 1000 instances this can use a lot of CPU while doing no useful work.  In this case you want unused application instances to stop, and be restarted when needed.

You cannot use triggering with client connections (unless you have a very smart trigger monitor to produce an event which says start a client program over there).

Use automation to periodically check the queue depth and the number of input handles.  If there is a high queue depth, or a low number of handles (e.g. 2), then start more application instances across your back-end servers.  Your applications can then disconnect if they have not received a message within, say, 10 minutes.  This should keep the right number of application instances active.

An administrator should be able to get this automation set up, but getting the application to disconnect could be a challenge, as this requires the application developer to change the code!
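
The decision logic for this kind of automation could look something like the sketch below.  The thresholds, the class and the method names are all hypothetical; the queue depth and number of input handles are assumed to have been collected already (for example from DISPLAY QSTATUS, which reports CURDEPTH and IPPROCS).

```java
// Sketch of the "start more instances / let idle ones go" decisions.
// It makes no MQ calls; the caller supplies the numbers it has collected.
class InstanceScaler {

    /**
     * Returns how many extra application instances to start.
     * More are started if the queue is deep, or if very few programs
     * have the queue open for input.
     */
    public static int instancesToStart(int queueDepth, int inputHandles,
                                       int highDepth, int lowHandles, int step) {
        if (queueDepth >= highDepth || inputHandles <= lowHandles) {
            return step;
        }
        return 0;
    }

    /** The application side: disconnect once idle for longer than the limit. */
    public static boolean shouldDisconnect(long idleMillis, long maxIdleMillis) {
        return idleMillis >= maxIdleMillis;
    }

    public static void main(String[] args) {
        int highDepth = 1000;  // hypothetical "high queue depth" threshold
        int lowHandles = 2;    // the "e.g. 2" from the text
        int step = 5;          // hypothetical number of instances to add each time

        System.out.println("Deep queue: start " + instancesToStart(5000, 20, highDepth, lowHandles, step));
        System.out.println("Few handles: start " + instancesToStart(0, 2, highDepth, lowHandles, step));
        System.out.println("Quiet and well served: start " + instancesToStart(10, 20, highDepth, lowHandles, step));

        long tenMinutes = 10 * 60 * 1000L;
        System.out.println("Idle for 11 minutes, disconnect? " + shouldDisconnect(11 * 60 * 1000L, tenMinutes));
    }
}
```

Starting a fixed step of instances each time is just one choice; the check runs periodically, so more get added on the next pass if the queue is still deep.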

Running under a web server

If your applications are running under a web server you may have mis-configured connection pools.  You can specify the initial size of the pool, and this many connections are made.  As more connections are needed, more can be added to the pool until the pool maximum is reached.  You should specify a timeout value, so periodically the pool gets cleaned up, and unused connections are removed, until the pool is back to the initial size.  You should review the initial size of the pools (is it too large?), and the timeout value.

This should just be an administrative change.

Good luck, you may be successful in reducing the number of client connections, but do not set your hopes too high.