Strange x3270 messages

x3270 may give the message

Warning: Cannot convert string "-*-helvetica-bold-r-normal--14-*-100-100-p-*-iso8859-1" to type FontStruct

This is because some fonts are not installed. On Ubuntu, use the following commands:

  • sudo apt-get install xfonts-75dpi
  • sudo apt-get install xfonts-100dpi
  • xset +fp /usr/share/fonts/X11/75dpi/
  • xset +fp /usr/share/fonts/X11/100dpi/
  • xset fp rehash

Then reboot.

Using the z/OS console screen

Most people are familiar with using SDSF from an ISPF screen to manage z/OS, rather than the operator console. I've been running z/OS on my laptop using zPDT, and I've had to brush up my operator skills because I could not get the system up far enough to use ISPF and SDSF!

If you have problems and cannot get ISPF started, you will need to use the operator console.
Use the MVS CONTROL (K) command to manage the console display.

Hints for using the z/OS console

If you issue a command to z/OS (for example D A,L), you get a display like this:

- 09.23.55 STC00406 VTAMAP98I 011 system command(s) issued.
- 09.23.55 STC00406 VTAMAP99I Execution completed.
- 09.23.55 STC00406 IEF404I VTAM00 - ENDED - TIME=09.23.55
- 09.24.26 IEF404I IEESYSAS - ENDED - TIME=09.24.26
09.24.26 HWI006I BCPII ADDRESS SPACE HAS ENDED.
- 09.25.32 STC00418 GFSC507I CLIENT LOG DATA SET, NFS.CLIENT.LOG1, IS
- BEING USED.
- 09.25.32 STC00418 GFSC284I NETWORK FILE SYSTEM CLIENT COULD NOT GET GSS
- CREDENTIALS
- FOR THE NFS CLIENT : GSS API krb5_get_default_realm() FAILED
- WITH GSS MAJOR STATUS 96C73ADF GSS MINOR STATUS 00000000
- 09.25.32 STC00418 GFSC700I z/OS NETWORK FILE SYSTEM CLIENT (HDZ223N)
- started. OA53881, GFSC4XLO, Sep 22 2017 14:49:23 .
- 09.27.24 d a,l
   CNZ4105I 09.27.24 DISPLAY ACTIVITY   FRAME LAST   F      E   SYS=S0W1     
    JOBS     M/S    TS USERS    SYSAS    INITS   ACTIVE/MAX VTAM     OAS        
   00005    00016    00000      00033    00016    00000/00040       00015       
    LLA      LLA      LLA      NSW  S  JES2     JES2     IEFPROC  NSW  S        
    VLF      VLF      VLF      NSW  S  HZR      HZR      IEFPROC  NSW  S        
    VTAM     VTAM     VTAM     NSW  S  DLF      DLF      DLF      NSW  S        
    RACF     RACF     RACF     NSW  S  RRS      RRS      RRS      NSW  S        
    TSO      TSO      STEP1    OWT  S  SDSF     SDSF     SDSF     NSW  S        
    TCPIP    TCPIP    TCPIP    NSW  SO TN3270   TN3270   TN3270   NSW  SO       
    HTTPD1   HTTPD1   *OMVSEX  OWT  SO CSF      CSF      CSF      NSW  S        
    HZSPROC  HZSPROC  HZSSTEP  NSW  SO HTTPD11  STEP1    WEBSRV   OWT  AO       
    PORTMAP  PORTMAP  PMAP     OWT  SO HTTPD17  STEP1    WEBSRV   IN   AO       
    SSHD3    STEP1    START1   OWT  AO HTTPD18  STEP1    WEBSRV   IN   AO       
    HTTPD19  STEP1    WEBSRV   OWT  AO                                          
  IEE612I CN=L700     DEVNUM=0700 SYS=S0W1                                      


  IEE163I MODE= R 

To clear the scrollable messages in the top box, use the K command. Action messages will remain.
To clear (and remove) the lower box, use the K E,D command.

To remove action messages from the top box, use K E,1,1. This deletes the top message.
If there are too many action messages, you will not see the scrollable messages, so you will need to delete some of the action messages.
You can use the K S,REF command to change how the console is configured, including which messages automatically scroll off the screen.

Handling MQ events

I found it hard to find information about MQ events and what to do with them, so I've documented my thoughts below.

Thanks to Gwydion and Morag for the many corrections!

MQ writes messages to system event queues when specific activities occur, for example when a queue is made put(disabled) or a channel stops. This allows you to have a program read these messages and take action.

I've categorised the actions into:

  • When
    • Now – for example queue full – needs to be actioned today (now!)
    • Tomorrow – for example a configuration error – raise a change ticket and get it fixed
  • Who
    • Operations
    • System administrators – who define objects
    • Application programmers – responsible for application queues

Different event queues are used depending on the type of event.
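The When/Who triage above could be expressed as a lookup table in whatever program reads the event queues. The sketch below is a minimal Python illustration, not an MQ API: it assumes the event message has already been read and its PCF header decoded into the event queue name and a reason name such as MQRC_Q_FULL. The queue and reason names are standard IBM MQ ones, but which events count as "now" or "tomorrow", and which team owns each one, are hypothetical choices you would replace with your own.

```python
from typing import NamedTuple


class Action(NamedTuple):
    when: str  # "now" or "tomorrow"
    who: str   # team that owns the fix


# Hypothetical triage rules keyed on (event queue, event reason name).
TRIAGE = {
    ("SYSTEM.ADMIN.PERFM.EVENT", "MQRC_Q_FULL"): Action("now", "operations"),
    ("SYSTEM.ADMIN.CHANNEL.EVENT", "MQRC_CHANNEL_STOPPED"): Action("now", "operations"),
    ("SYSTEM.ADMIN.QMGR.EVENT", "MQRC_PUT_INHIBITED"): Action("now", "application programmers"),
    ("SYSTEM.ADMIN.QMGR.EVENT", "MQRC_UNKNOWN_ALIAS_BASE_Q"): Action("tomorrow", "system administrators"),
}

# Anything not listed is treated as a configuration issue to raise a ticket for.
DEFAULT = Action("tomorrow", "system administrators")


def triage(queue: str, reason: str) -> Action:
    """Return when the event must be handled and who should handle it."""
    return TRIAGE.get((queue, reason), DEFAULT)


if __name__ == "__main__":
    print(triage("SYSTEM.ADMIN.PERFM.EVENT", "MQRC_Q_FULL"))
```

Unknown events fall into the default entry, so a new kind of event still gets looked at rather than being silently ignored.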

Useful links

The one trick magician

Our neighbour’s son came up to me and said “I am a magician, look, here is my trick”. The trick was a good one, but then I had to explain that being a magician is more than doing just one trick.

I thought of this as I was reviewing some old notebooks and found the comments I had made about a customer visit, and the “MQ architect and lead MQ programmer”.
This architect knew about Request-Reply and Fire-and-Forget, but he was missing the other tricks.

The conversation went along the following lines:

Messages processed in strict sequence

Me: Do your messages have a requirement to be strictly processed in sequence?
Him: I don't know, why?

Me: You can only have one putting application, one channel, and one application getting messages. If you have more than one of any of these, messages can be processed in the wrong order.
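To see why a second getter breaks the ordering, here is a toy Python simulation (not MQ code). It assumes messages are handed out in arrival order to whichever getter becomes free first, and that each message takes a fixed, made-up processing time.

```python
def completion_order(processing_ms, getters):
    """Return message ids in the order they finish processing.

    processing_ms[i] is the (hypothetical) time message i takes to process.
    Each message goes to the getter that is free earliest; ties go to the
    lowest-numbered getter.
    """
    free_at = [0] * getters  # time at which each getter is next free
    finished = []
    for msg_id, cost in enumerate(processing_ms):
        g = min(range(getters), key=lambda i: free_at[i])
        free_at[g] += cost
        finished.append((free_at[g], msg_id))
    # Sort by finish time (then id) to get the order work was completed.
    return [msg_id for _, msg_id in sorted(finished)]


if __name__ == "__main__":
    print(completion_order([30, 10, 10], getters=1))
    print(completion_order([30, 10, 10], getters=2))
```

With one getter the messages finish in the order they were put; with two, the slow first message lets the later ones overtake it.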

Availability and scalability

Me: What response time requirements do you have for the end-to-end transaction?

Him: under 5 seconds

Me: If the back end queue manager goes down, how long does it take to restart?
Him: About a minute

Me: So one backend will not provide the availability you need.

Him: But we use clustering!
Me: On how many queue managers is the queue defined?

Him: One

Me: Do you use different clusters for online and batch traffic?
Him: Why?
Me: To isolate traffic, and to ensure that batch does not impact online, either in channel throughput or by filling up the SYSTEM.CLUSTER.TRANSMIT.QUEUE.

And so it went on.


Some MQ Cluster defaults are dangerous

The defaults for some cluster workload attributes are not very helpful. For example, CLWLPRTY and CLWLRANK on cluster-receiver channels and cluster queues both default to 0, which is the lowest value. (CLWLWGHT defaults to 50, so it does not have this problem.)

If you want to make one connection or queue a lower priority, you have to alter all the cluster-receiver channels and local queues to have a higher value, such as 5, let the definitions propagate around the network, and then change the one you wanted!

It may be worth doing this over an extended period, so that you get good values without disrupting your MQ environment.
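If you have many objects to change, generating the MQSC commands is less error-prone than typing them. The Python sketch below assumes you already have lists of your cluster-receiver channel names and clustered queue names (the names in the example are made up), and it only emits ALTER commands for CLWLPRTY and CLWLRANK, whose valid range is 0 to 9. Review the output before running it against a queue manager.

```python
def raise_cluster_defaults(channels, queues, value=5):
    """Yield MQSC commands that set CLWLPRTY and CLWLRANK to `value`.

    channels: names of CLUSRCVR channels to alter.
    queues:   names of clustered local queues to alter.
    """
    if not 0 <= value <= 9:
        raise ValueError("CLWLPRTY and CLWLRANK must be between 0 and 9")
    attrs = f"CLWLPRTY({value}) CLWLRANK({value})"
    for chl in channels:
        yield f"ALTER CHANNEL('{chl}') CHLTYPE(CLUSRCVR) {attrs}"
    for q in queues:
        yield f"ALTER QLOCAL('{q}') {attrs}"


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    for cmd in raise_cluster_defaults(["TO.QM1", "TO.QM2"], ["APP.REQUEST"]):
        print(cmd)
```

Generating the commands once and applying them in stages fits the idea of spreading the change over an extended period.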


Average is not good enough

I was talking to someone in the MQ distributed change team about averages and how misleading they can be.
In Winchester, England, the average number of arms that people have is 1.999. Wow, this is amazing! Is this caused by years of inbreeding, so that their left arm is 1 cm shorter than their right arm? No, it is because Winchester has some people with only one arm. If you add up the number of arms and divide by the number of people, you get 1.999!
The average number of children in a family is 2.5, but you do not see many half-children being pushed around in a pram.
These examples show that averages can be misleading.
Moving on to what the change team guy was saying: from an MQ log perspective, he could see log response times of between 20 and 30 milliseconds, with an average of 25 milliseconds. From the Linux iostat command, the average was 10 ms. From the SAN perspective, the average was 2 ms. Who was right?
They all were!
From a Linux perspective, the MQ requests were about 10% of the total requests. The other 90% had a response time of 5 ms. On average (total time doing IO / number of IOs) this was 10 ms.
From the SAN perspective, there were many systems connected to the SAN, and they got a 1 ms response time. So the average (sum of elapsed times / count) came out as 2 ms!
What is a better measure? This is tricky, because the average is easy to calculate. The median (sort the times and pick the middle one) is difficult to calculate, because you need to remember the times of all the IOs. The maximum does not help. With MQ on z/OS we capture the longest IO time, but I did not find this useful.
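A small Python example, using made-up IO times, shows how one slow IO drags the mean away from the typical value, and why the median is harder: it needs every time to be kept and sorted, while the mean only needs a running total and a count.

```python
from statistics import mean, median

# Hypothetical IO response times in milliseconds: nine fast IOs and one slow one.
io_times_ms = [1, 1, 1, 1, 1, 1, 1, 1, 1, 11]

# mean() is just total / count, so it could be kept with two counters.
print("mean:", mean(io_times_ms))
# median() sorts all the values and takes the middle, so every value must be kept.
print("median:", median(io_times_ms))
```

The mean comes out at 2 ms even though nine of the ten IOs took 1 ms; the median is 1 ms.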
MQ uses calculations like long_average_value = (1023 * previous long_average_value + current) / 1024. It also uses short_average_value = (63 * previous short_average_value + current) / 64.
These are both easy to calculate, and give a long-term view and a short-term view, so you can see a trend. I don't know how accurate or usable these are.
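The two smoothed averages can be sketched in Python as below. This is a minimal illustration of the formulas quoted above, assuming the average is seeded with the first sample; MQ's own code may seed, round, or store the values differently. The sample data (a steady 2 ms followed by a burst of 20 ms IOs) is invented.

```python
def smoothed(previous: float, current: float, weight: int) -> float:
    """((weight - 1) * previous + current) / weight, as in the formulas above."""
    return ((weight - 1) * previous + current) / weight


def run(samples, weight):
    """Apply the smoothing formula to every sample and return the final average."""
    avg = samples[0]  # assumption: start from the first sample
    for sample in samples[1:]:
        avg = smoothed(avg, sample, weight)
    return avg


if __name__ == "__main__":
    # Hypothetical workload: 1000 IOs at 2 ms, then 50 IOs at 20 ms.
    samples = [2.0] * 1000 + [20.0] * 50
    print("short (weight 64):  ", round(run(samples, 64), 2))
    print("long (weight 1024): ", round(run(samples, 1024), 2))
```

After the burst, the short-term average has moved more than half way towards 20 ms, while the long-term average has moved by less than 1 ms. Comparing the two is what shows the trend.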
Perhaps the best approach is to have buckets: count the number of IOs in the ranges 0 to 1 ms, 1 to 2 ms, 2 to 5 ms, 5 to 10 ms, 10 to 20 ms, and over 20 ms. However, I don't think I'll be able to persuade people to change their code.
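A bucket histogram is cheap to keep: one counter per range, however many IOs there are. Here is a minimal Python sketch using the ranges above, assuming each upper bound is inclusive (so an IO of exactly 1 ms counts in the 0 to 1 ms bucket); the sample times are hypothetical.

```python
from bisect import bisect_left

# Upper bound (inclusive) of each bucket in ms; anything larger is "over 20 ms".
BOUNDS_MS = [1, 2, 5, 10, 20]
LABELS = ["0-1 ms", "1-2 ms", "2-5 ms", "5-10 ms", "10-20 ms", "over 20 ms"]


def bucket_counts(times_ms):
    """Count how many IO times fall into each bucket."""
    counts = [0] * len(LABELS)
    for t in times_ms:
        # bisect_left returns the index of the first bound >= t,
        # or len(BOUNDS_MS) (the "over 20 ms" bucket) if t is above them all.
        counts[bisect_left(BOUNDS_MS, t)] += 1
    return dict(zip(LABELS, counts))


if __name__ == "__main__":
    print(bucket_counts([0.5, 1, 1.5, 3, 7, 15, 25, 30]))
```

Unlike the median, this keeps a fixed amount of state, yet it still shows whether a "good" average is hiding a tail of slow IOs.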
My own experience of being confused by averages was when I had a big LPAR with 64 engines, and I was testing the impact of putting just one persistent message (this was MQ 2.1). The MVS data said I was short of CPU, but the box was only 1% busy. How can I be short of CPU with just one putter, 100 getters, and 64 engines?
When I put the message, it woke up all 100 of the getters. 63 of these were able to run. The other 37 had to wait for an engine to become free. This showed we were short of CPU. The getting application put the reply and issued a commit. During the commit, no applications were busy, so the CPU usage dropped to 0! The average (total CPU used / elapsed time) showed we were 1% busy, yet we were short of CPU!
It is hard to say what would be a better metric, as there was a spike of work for 5 microseconds followed by no activity for 1 millisecond.
So what does an average tell you? It can give an indication, but it may not show what you want it to, so be careful.