How long can you talk for without drawing breath?

I remember reading a book on technical writing which said you should read every report aloud to make sure the text flows, and makes sense. A comma is where you pause, (or have a side comment), and a full stop is where you finish the thought, and take a breath. If you cannot express the sentence in one breath – the sentence is too long and you should rewrite it.

If you have to read a sentence twice or more, consider rewriting it. Also remember the people reading your document may not have English as their first language. “Foreign words” take more brain power and the reader may not be entirely familiar with the word or the context used. I recently heard a conversation

  • “is it right to turn left here?”,
  • “no, right”

which may be valid English, but it takes a moment to understand it.

Another advantage of reading it aloud is it it helps you to spot duplicated words or missing. (There was a double whoops in that sentence.)
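Reading aloud is the real test, but a quick mechanical check can catch the duplicated words first. The following is only a minimal sketch; the function name is made up for illustration, and it will not find missing words.

```python
import re

# Matches a word, some whitespace, and then the same word again ("it it").
_REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_repeated_words(text):
    """Return each word that appears twice in a row, in order of appearance."""
    return [match.group(1) for match in _REPEATED_WORD.finditer(text)]


if __name__ == "__main__":
    sample = "Another advantage of reading it aloud is it it helps you."
    print(find_repeated_words(sample))
```

The word boundaries stop it from flagging pairs such as "the theory", where the second word only starts with the first.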

I was reading a description in a technical report which took me several minutes to understand it. It used terms(defining some of them inline <and also had nested comments about the terms >), then using terms like “this” (when “this” could be one of several objects) (I kept thinking which this?, that this, or this this), and so I copied the text into an editor so I could break it into phrases (logical groups of thoughts) and work out what was important. See – it is hard work to understand some sentences.

I am a great believer that reading should be linear: you should not have to keep going back to reread text. I also think that people need signposts to help them through the text, along the lines of the old adage for a good presentation: “tell them what you are going to tell them, tell them, tell them what you have told them”. Instead of one long rambling paragraph it might be clearer to say

There are three areas we need to consider, area1, area2, area3

Area1

Area2

Area3

I have also noticed that people often read the first sentence, and ignore the second part; this is where good signposting can help. This applies to emails and conversations. My wife said to me “We’ll need some of the nice bread with seeds on it, and some rolls, from the bakers, but the bakers isn’t open yet”. So off I went to the bakers to find it closed. I was busy parsing the first part of the sentence, and missed the second part.

I found that with some people, if you have two questions, it is better to send them as two emails, not one. They read the first question, reply to it, and do not read the rest of the email.

Why are you telling me this – and what do you want me to do with the information

I was talking to someone about providing useful information in reports, and we got onto the subject of “Why are you telling me this”, a topic I once covered in a mentoring presentation.

Why are you telling me this?

I was in a meeting with the Lab Director, on a hot summer’s afternoon, with the sound of the bees coming through the open window, and the smell of freshly cut grass. A team were presenting a topic to us, and it was only mildly interesting. It was easy to shut your eyes and listen (a polite way of saying fall asleep). At the end of the presentation they said “And we would like you (the Lab Director) to fund £1 million for four people to take this forward.” Whoa – we all woke up. The Lab Director said he had not been listening with his “funding ears”, and didn’t think he could fund it. He went on to say “Next time you want something from me, then at the beginning of the presentation say ‘We want you to fund this project to £1 Million’ and I’ll know what you want before we start.”

It is very useful to know why someone has come to see you. It would be very helpful if the person were to say:

  • I’m telling you this for your information. You do not need to do anything. You may get asked about it.
  • I’m telling you this for your action. I would like you to sign these expenses; give me advice; or go and talk to someone on my behalf.
  • I’ve had a great success – I just want to share it with someone!
  • I’m bored and I just want a chat. (We had someone who would come round and chat. We solved this by sending a message through the computer to a friend saying “please phone me”).

Writing reports

The “Why are you telling me this” question applies to writing reports as well. You need to know what you expect your audience to do with the report.

I was asked to review a 60-page performance report before it was presented to the customer. Although I was jet lagged, I spent the evening going through the graphs and the explanation and marked up many comments. The next day I asked who wanted my comments, and the reply was “we don’t want any comments, we just wanted you to review it”!

They planned to spend a couple of hours going through the report with the customer; a senior manager and his team. I managed to persuade them that the senior manager would not know the technical details, so he would not understand most of the charts. I asked the team “what do you expect the manager to do with the report?” the answer was “give it to one of the system programmers”. “What do you expect the systems programmers to do with it?”, “We don’t know”!

We had a lot of discussion, and came up with a short presentation aimed at the executive. We took data from the report and summarised it, for example

  • We tested it up to 50,000 transactions a second before we ran out of CPU. Your requirement is 10,000 a second.
  • The cost per transaction stayed pretty constant. At the high transaction rate, it was 20% more than at a low transaction rate.
  • In a disaster recovery it took us 2 minutes to switch to the backup servers, and continue working. It may take you longer depending on your database.

It is much better to tell people information, than have people work to get the same information. It is much quicker to read

The cost per transaction stayed pretty constant. At the high transaction rate, it was 20% more than at a low transaction rate

than to give them a chart they have to look at – read the title, the axis, the ranges of the data, and interpret it (should it be flat or should it slope up? Why is there a wobble in the data?). Many people do not listen while they are reading.

Graphs have their places, for example showing how the response time changes with transaction rate may be good. But

at 100 transactions a second the response time is 500 ms, and at 1000 transactions a second the response time is 1000 ms.

may be good enough if the requirement is 2000 milliseconds at 1000 transactions a second.
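The same idea can be applied when preparing the summary: do the comparison for the reader and hand over the sentence, not the chart. Below is a small sketch using the numbers from the example above; the function and parameter names are made up for illustration.

```python
def summarise(measurements_ms, required_rate, required_ms):
    """Return a one-line verdict for the response time at the required rate.

    measurements_ms maps transactions per second to response time in ms.
    """
    measured = measurements_ms[required_rate]
    verdict = "meets" if measured <= required_ms else "misses"
    return (
        f"At {required_rate} transactions a second the response time is "
        f"{measured} ms, which {verdict} the requirement of {required_ms} ms."
    )


if __name__ == "__main__":
    # Measured values and the requirement are the ones quoted in the text.
    measurements = {100: 500, 1000: 1000}
    print(summarise(measurements, required_rate=1000, required_ms=2000))
```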

It all comes down to …

What you expect the audience to do with the data.

I have been working on some blog posts about z/OS performance. I had someone review them to make sure they were at the right level and I was surprised at the comments. For example

  • You have told me about the RMF reports, but you haven’t told me how to collect the data, or how to format it.
  • You said if this number was large, you have a problem. What do you mean by large?

I should have written a “specification” in comments at the top of the blog post.

Q:Why am I writing this?
A:So people can collect and report performance data; and see if there is a problem.

Q:What will they do with the report?
A:Use it to collect performance data.
A:Use it to process the data
A:Understand which fields are important, and have background to explain why the field is important
A:Identify good and bad values of fields
A:Understand what they can do about problems.

If I had done this and written the blog post to meet these goals it would have been a better document.

Feel the weight!

I once saw a report about MQ for a customer. It was about 100 pages long! They gave an introduction to MQ, listed all the parameters, gave the size of all the data sets etc. A lot of the data was just cut and paste of other data. When I mentioned this to the team, they replied, the customer executive likes to get value for money, and a long report shows value for money (feel the weight), it shows how much work we did.

I thought the executive was out of his depth.

This reminded me of Parkinson’s law, “work expands so as to fill the time available for its completion”. He also described “the bike shed problem”. A committee has two items on its agenda

  • Should we invest in a nuclear reactor?
  • What colour should we paint the bike shed?

No one had any experience of nuclear reactors, so after 5 minutes discussion they approved it. They spent the next 55 minutes discussing the bike shed. The cost of them discussing it was much more than the cost of a person to repaint the shed if they made the wrong decision.

They understood the bike shed problem, and knew nothing about the nuclear reactors, so focused on things they knew about rather than getting experts in to advise them.

This is not to be confused with the Peter Principle which observes that people in a hierarchy tend to rise to their “level of incompetence”. If the person is competent in the new role, they will be promoted again and will continue to be promoted until reaching a level at which they are incompetent. Being incompetent, the individual will not qualify for promotion again.

Nor is it to be confused with The Dilbert principle which says that incompetent employees are promoted to management positions to get them out of the workflow.

Who do you get to implement this critical fix? The senior techie or a grunt?

I was reading an article about making systems resilient and how events can conspire to cause problems. As Terry Pratchett said, “Million-to-one chances…crop up nine times out of ten”. For example the day you have a power outage and fail over to generators is the day you find the fuel tank for the generators has not been refilled, and the weekly tests have emptied it.

This article made me think about the best way of implementing changes. If you have a critical change that needs to be made with no possible errors; who do you get to implement it – the team leader, or a junior person in the team? (Grunt – slang: Junior person in a team, lacking prestige, and authority. The grunts do all of the heavy lifting, and when people lift heavy things, they grunt as they lift.)

I assume that you have the steps documented, and have tested them. For example, you use cut and paste instead of typing information, and you copy files from test to production rather than creating them from new.

I think it would be good for a junior person in the team to make the change, with the team leader watching. This has several advantages:

  • You only learn by doing. You can watch someone do something a hundred times – but it is only when you have to do it yourself that you learn (and get the fear of doing it wrong).
  • The junior person will be very careful, and double check things. The team leader may just check once – as there were no problems in the past.
  • The junior person will be focused on the changes. Having the team leader there as another pair of eyes, and looking at the systems overall can alert you to problems. Hmm we made the change – now that over there has produced an alert – is that connected? You need the team leader’s experience to make the judgement call, while the rest of the change is implemented.
  • When the change has been made and had unexpected side effects, the junior person can follow the backout process while the team leader is on the phone to management and the incident room to explain what happened, and what they are doing about it. There is nothing worse than trying to resolve a problem while explaining to the incident team conference call that you would rather fix the problem, than do a causal analysis of the root cause of the problems. I remember one manager said to the management call “We estimate it will take an hour to resolve. I will dial back in in 30 minutes and give a status – good bye”. It takes seniority and courage to make that sort of call.
  • When you are in a hole, it is easy to keep trying things, and making the hole deeper. Having the team leader as an observer means they can say “step back from the keyboard and let us discuss what we can do”.
  • One person whom I respected, said that his life goal was to make himself redundant, by getting his team to make the hard decisions. His teams always had a reputation of getting things done, and doing a good job. He said the hardest part of his job was standing back and letting the team gently struggle, and letting them get themselves out of the problem. He never told us what to do – he just asked us questions to make us find the solutions ourselves – to me that was the skilful part. As he said – “I am a manager, I am technical, but not deeply technical, and I could not solve the problem – but my job is to help you find the solution”

Using the Java Health Centre to look at Liberty on z/OS

The Java Health Centre is an Eclipse plugin which allows you to monitor Liberty (and so MQWEB, z/OSMF, z/OS Connect, etc.) on z/OS.

You can look at

  • the Java classes being used,
  • the environment variables, and startup parameters.
  • charts of garbage collection
  • method profiling

Before you get too far into the investigation, be warned: this is basically insecure.

  • By default anyone can connect to it
  • You can specify a file with userid and password in it (in clear text)
  • You can use TLS – but you have to have a keystore in a file rather than using the security manager and RACF keyrings.
  • You cannot use RACF or other security manager for authentication and authorization.

Install the health centre on Eclipse.

I used Eclipse Marketplace to install IBM Monitoring and Diagnostic Tools – Health Center 3.0.12

Configuring Health center on Liberty.

You can use

  • headless mode, where files are written to the local file system, transported to your Eclipse environment and reviewed offline.
  • Connection from Eclipse to give real time information

See Configuring the monitoring agent and Health Center configuration properties.

Headless mode

Add the following to jvm.options (or other startup options), one option per line, changing the directory location.

-Xhealthcenter
-Dcom.ibm.diagnostics.healthcenter.headless=on
-Dcom.ibm.java.diagnostics.healthcenter.data.collection.level=headless
-Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/u/adcd/java/
-Dcom.ibm.diagnostics.healthcenter.readonly=on

Restart the server.

Download the files from the specified directory to your workstation, so the files are available from Eclipse.

In Eclipse

Window -> Open Perspective -> Open Perspective -> Health Center Status Summary

File -> Load Data…

Specify the file name.

Interactive mode

First time use (with little or no security)

-Xhealthcenter:transport=jrmp,port=8125
-Dcom.ibm.diagnostics.healthcenter.logging.level=info
-Dcom.ibm.diagnostics.healthcenter.readonly=on
-Dcom.ibm.diagnostics.healthcenter.data.profiling=off

Note: Use -Dcom.ibm.diagnostics.healthcenter.data.profiling=off to start the server with profiling disabled.

Restart the server. Check the error log for messages like INFO: Health Center agent started on port 8125.

In Eclipse, once you have installed the Health Center:

Window -> Open Perspective -> Open Perspective -> Health Center Status Summary

Open a connection to the server

File -> “New Connection” (do not select the “New” option)

JMX; Host name 10.1.1.2; Port 8125; Untick Scan next 100 ports

This should connect you.

Problems

If you get

  • Unable to contact agent: Non IBM version of Java has been detected. Check the server is fully active, and that you get the message INFO: Health Center agent started on port 8125. in STDERR

Ping the IP address to check connectivity.
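Ping only shows that the host is reachable; it does not show that anything is listening on the agent's port. The following is a minimal sketch of a TCP check, using only the Python standard library. The function name is made up, and a successful connection shows only that the port is open, not that the agent is working properly.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the example values used in this post, the call would be port_is_open("10.1.1.2", 8125).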

Quick overview

Once you have connected, or loaded the files

  • Environment gives you the properties and class paths.
  • Classes
    • Classes loaded shows you the classes loaded, and if they came from the shared cache or not.
    • Class Histogram shows you information of all the fields and variables. For example I had 196106 Ljava/lang/String – so lots of string variables, and 146973 HashMap nodes.
  • Method profiling
    • Self – which was the hottest method.
    • Tree – includes parents. The top element in the tree is java.lang.Thread.run. All work runs on a thread, so all work will be reported under this.
    • Click a line and select Invocation Paths to show who called the method.
  • Where you get a chart with Elapsed time … you can use your mouse, to select a time period and zoom in. Click on “elapsed time” to unzoom.

Securing Health Center

This is documented in Securing Health Center – but is missing standard Z techniques of protecting resources.

  • By default anyone can connect to it
  • You can specify a file with userid and password in it (in clear text)
  • You can use TLS – but you have to have a keystore in a file rather than using the security manager and RACF keyrings.
  • You cannot use RACF or other security manager for authentication and authorization.

Double whoops – I’ve been ransomware-d and … it gets worse

This story comes from a friend of a friend, so I cannot tell how much of it is true, but it is another good example of showing that some things are “obvious” only when you understand them.

I heard that someone’s office systems had been compromised by a ransomware attack, where their files had been encrypted and money was demanded to decrypt them. The first whoops. While someone else was sorting the problem for the office, the person was pleased that he kept backups of all his key files, on a separate portable hard disk drive attached via a USB cable. The person then realised that his backup files had been encrypted as well, so he was unable to restore his own backups of his key files. The second whoops. Looking back, this was an obvious consequence of having the backups connected to the machine.

I would have expected that any decent backup package would make the files read only, but then if the ransomware was able to get into administrator mode, it would be able to change these “protected” files as well.

I think the only answer is to take backups off your machine – for example over a network, and hope the ransomware is not smart enough to corrupt files across a network. You could also back up your key files to a CD which is write once, and then becomes read only.

As I wrote this I remembered that I had been meaning to back up some family photographs and documents that only exist on my machine (and backups). I had sent a copy to my brother, but when he got a new machine he did not copy the files across!

I was also reminded of the University that diligently backed up the system every week. Which was fine until the building with the computer, and cupboard full of backups was destroyed in a fire.

Making diff even better

I was using diff option -y on Linux to compare two files side by side, but it did not use the full width of the window.

diff -W $(( $(tput cols) - 2 )) -y file1 file2 |less

solved it.

It went from being like

javax.net.ssl|AL   |    javax.net.ssl|DE
javax.net.ssl|IN   |    javax.net.ssl|DE
javax.net.ssl|DE   <
javax.net.ssl|DE   <
javax.net.ssl|DE   <
"ClientHello": {        "ClientHello": {

to


javax.net.ssl|ALL|01|main|2021-01-29 16:39:00.297 GMT|Si   |    javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:15.405 GMT|
javax.net.ssl|INFO|01|main|2021-01-29 16:39:00.297 GMT|A   |    javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:15.407 GMT|
javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.297 GMT|   <
javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.298 GMT|   <
javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.299 GMT|   <
"ClientHello": {                                                "ClientHello": {
  "client version"      : "TLSv1.2",                              "client version"      : "TLSv1.2",

diff -W $(( 120 )) … uses an output width of 120, so each side is about 60 characters wide.
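If you want the same width calculation from a script, here is a minimal Python sketch. It only builds the argument list; the function name and the margin parameter are made up for illustration, and the diff options are the same -W and -y used above.

```python
import shutil


def side_by_side_diff_command(file1, file2, columns=None, margin=2):
    """Build the argument list for a full-width side-by-side diff.

    columns defaults to the current terminal width; margin mirrors the
    "- 2" in the shell one-liner.
    """
    if columns is None:
        columns = shutil.get_terminal_size().columns
    width = max(columns - margin, 1)
    return ["diff", "-W", str(width), "-y", file1, file2]
```

The list can be passed to subprocess.run. Remember that diff ends with return code 1 when the files differ, so do not treat a non-zero return code as a failure.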

Finding the nugget of useful information in a TLS(SSL) trace

Understanding the TLS trace, for example trying to get a client to use a web server over TLS, is a difficulty-squared problem.

In this post, I give some tools to reduce the amount of trace data, and provide an annotated trace so you can understand what is going on, and spot the errors. I’ve also annotated the trace with common user errors, and links to possible error causes.

 This post, and the referenced pages are still work in progress while I sort out some of the little problems that creep in.  Please send me comments on any mistakes or suggestions for improvements (or additional reasons why a handshake fails).

Understanding the TLS trace is hard.

It is hard enough to understand what the trace is showing, but several things make it even more difficult to use.

  1. Where there are concurrent threads running; the trace records are interleaved, and it can be hard to tell which data belongs to which thread.
  2. The formatting is sometimes poor. Instead of giving one line with a list of 50 comma separated numbers, it gives you 100 lines with either a number, or a comma.  And when you have two threads doing this, it is a nightmare trying to work out what data belongs to which thread.  (But I usually ignore these records).
  3. The trace tends to be “provide all information that might possibly be useful”, rather than provide information needed by most people to resolve why a client cannot connect to the server.  For example the trace gives a print out of the encrypted data – in hex!
  4. You turn the trace on with a java -D… parameter.  Other applications, such as web servers, have different ways of turning the trace on.    I could not find a way of turning it on and off dynamically so you can get a lot of output if you have to run for a long period.  The output may go into trace files which wrap.
  5. Different implementations have  slightly different trace formats.

All these factors made it very difficult to understand the trace and find the cause of your problems.
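One way to untangle interleaved threads is to split the trace by thread before reading it. The sketch below is an illustration, not the script I use. It assumes header lines of the form javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.297 GMT|…, where the third and fourth fields are taken to be the thread id and thread name; other implementations may format this differently.

```python
from collections import defaultdict


def group_by_thread(lines, prefix="javax.net.ssl|"):
    """Group trace lines by the (thread id, thread name) in their header.

    Lines without a header are attached to the most recent header's
    thread, which is only a heuristic when threads interleave.
    """
    groups = defaultdict(list)
    current = None
    for line in lines:
        if line.startswith(prefix):
            fields = line.split("|", 5)
            if len(fields) >= 4:
                current = (fields[2], fields[3])
        if current is not None:
            groups[current].append(line)
    return dict(groups)
```

Continuation lines (such as the JSON-like message bodies) carry no thread information, so the grouping can still be wrong where two threads write at the same moment.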

What can you do to understand it?

This page from Java walks through a trace, but I don’t find it very helpful. This one, which also covers TLS V1.3, is better.

Do not despair! 

  • On z/OS I created an edit macro, which I use to delete or transform data.  It reduces an 8000 line spool file down to 800 lines.  See ztrace.rexx.
  • On Linux I have a python script  which does the same. See tls.py.

In some traces sections are delimited by ***…. *** to make it easier to see the structure.

To find problems look for *** at the start of the line, or “exception”.

You may need to look at both ends of the trace to understand the problem.   One end may get a response “TLSv1.2 ALERT: warning, description = …” and you need to examine the other end to find the reason for the message.
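The search for *** and “exception” can be automated as a first pass. This is a minimal sketch and not the tls.py mentioned above; the function name is made up.

```python
def interesting_lines(lines):
    """Yield (line number, line) for lines worth a first look.

    Keeps lines that start with "***" or mention "exception" in any case.
    """
    for number, line in enumerate(lines, start=1):
        if line.startswith("***") or "exception" in line.lower():
            yield number, line.rstrip("\n")
```

It can be fed an open file directly, for example for number, line in interesting_lines(open("trace.txt")), where trace.txt is a placeholder name. The line numbers let you jump back into the full trace for context.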

Annotated trace file

I have taken a trace file from each end, and annotated them, so you can see what the flow is, and how and where the data is used. I have colour coded some of the flows, and included some common errors (in red), with a link to possible solutions.
Some lines have hover text.

If you have suggestions for additional information – or reasons why things do not work – please tell me and I’ll update the documentation.  

The trace is from a Linux client going to a Liberty on z/OS.

  1. Server starts up – and waits for a connection from a client
  2. Client starts up, and sends a “Client Hello” request to the server
  3. Server wakes up, processes the request
  4. Server sends “ServerHello” to the client
  5. Optional. If the server wants client authentication, the server sends the “client Authentication request”
  6. *** ServerHelloDone. The server has finished its processing; it sends the data and waits for the reply.
  7. Client wakes up and processes the “ServerHello”, optionally sends back the “CertificationResponse”, and sends verify.
  8. Server processes the verify and ends the handshake.
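When a handshake fails, it helps to know how far through this flow the trace got. The sketch below looks for some of the message names used in the steps above. The list of names is an assumption to adjust, because different implementations spell them differently, and the function name is made up.

```python
import re

# Message names taken from the steps above; treat the list as an
# assumption, because real traces may use other spellings.
HANDSHAKE_MESSAGES = ["ClientHello", "ServerHello", "ServerHelloDone"]


def handshake_progress(trace_text):
    """Return the handshake messages, in expected order, found in the trace."""
    seen = []
    for name in HANDSHAKE_MESSAGES:
        # Word boundaries stop "ServerHello" matching "ServerHelloDone".
        if re.search(rf"\b{re.escape(name)}\b", trace_text):
            seen.append(name)
    return seen
```

If the result stops at “ClientHello”, for example, the other end of the connection is the place to look next.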

 

 

Useful Linux commands

Someone told me about a useful Linux command, for using z/OS… so I thought I would pass on some one-liners to help other people.

sshfs colin@10.1.1.2: ~/mountpointz

This mounts a remote file system to give local access to files on a remote box.  This allows you to use gedit remotely, including z/OS files in USS.  For example gedit mountpointz/mqweb/servers/mqweb/mqwebuser.xml .

For z/OS the files must be tagged. Use chtag -p name to see the tag. For me it needs a tag of “t ISO8859-1 T=on”. Without the tag it displays the file in the wrong code page.

parcellite

This keeps a history of your clipboard, with hot keys.

Linux settings -> keyboard mapping

Ctrl+1 gives me the x3270 with “tso@” in the title. The command executed is wmctrl -a tso@
Ctrl+2 gives me the x3270 with “colin@” in the title. The command executed is wmctrl -a colin@

I start the x3270 with

x3270 -model 5 colin@localhost:3270 &
x3270 -model 5 tso@localhost:3270 &

model 5 makes the screen 132 characters wide by 27 deep

Map the 3270 keyboard to define the uss escape key

See x3270 – where’s the money key?

Gedit hot keys

See here

  • Find a string Ctrl-F , next string Ctrl-G, previous Ctrl-Shift-G
  • Top Ctrl-home, bottom Ctrl-End
  • Move left/right one word Alt-Left or Alt-Right
  • Start/end of line Home/end or  Ctrl-pgup,Ctrl-pgdn
  • Next session Ctrl-Alt Pgup, Ctrl-Alt-Pgdn
  • Create new gedit window Ctrl-N
  • Create new tab Ctrl-T

 

Should I run MQ for Linux, on z/OS on my Linux?

Yes – this sounds crazy.  Let me break it down.

  1. zPDT is a product from IBM that allows me to run System/390 applications on my laptop. I am running z/OS 2.4, and MQ 9.1.3. For normal editing and running MQ – it feels as fast as when I had my own real hardware at IBM.
  2. z/OS 2.4 can run docker images in a special z/OS address space called zCX.
  3. I can run MQ distributed in docker.

Stringing these together I can run MQ distributed in a docker environment. The docker environment runs on z/OS. The z/OS runs on my laptop! To learn more about running this scenario on real z/OS hardware… 

20 years ago, someone was enthusiastically telling me how you could partition distributed servers using a product called VMWare.    They kept saying how good it was, and asking why wasn’t I excited about it.  I said that when I joined IBM – 20 years before the discussion (so 40 years ago), the development platform was multiple VS1 (an early MVS) running on VM/360.  Someone had VM/360 running under VM/360 with VS1 running on that!  Now that was impressive!

Now if only I could get z/OS to run in docker….

You have to be brave to climb back up the slippery slope.

It is interesting that you notice things when you are sensitive to them. Someone bought a new car in “Night Blue” because they wanted a car that no one else had – she had not seen any cars of that colour. Once she got it, she noticed many other cars of the same make and colour.

I was sliding down a slippery slope, and realised that another project I was working on had also gone down a slippery slope.

My slippery slope.

I wanted a small program to do a task.  It worked, and I realised I needed to extend it, so with lots of cutting and pasting, and editing the file soon got to 10 times the size.  I then realised the problem was a bit more subtle and started making even more changes.  I left it, and went to have dinner with a glass of wine.

After the glass of wine I realised that now I understood the problem, there were easier (and finite) solutions to the problem.  Should I continue down the slippery slope or, now that I understood the problem, start again.

I tried a compromise: I wrote some Python code to process a file of data and generate the C code, which I then used. As this worked, I used this solution. So yes, it was worth stopping, going back up the slippery slope, and finding a different solution.
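To show the shape of this approach, here is a minimal sketch of generating C from a data file. It is not my actual program: the “NAME VALUE” input format, the struct layout and the function name are all invented for illustration.

```python
def generate_c_table(lines, table_name="lookup"):
    """Turn "NAME VALUE" data lines into the C source for a lookup table."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments in the data file
        name, value = line.split()
        entries.append(f'    {{"{name}", {int(value)}}},')
    body = "\n".join(entries)
    return (
        "struct entry { const char *name; int value; };\n"
        f"static const struct entry {table_name}[] = {{\n"
        f"{body}\n"
        "};\n"
    )


if __name__ == "__main__":
    print(generate_c_table(["# name value", "RED 1", "GREEN 2"]))
```

The advantage is that the repetitive part lives in the data file, so extending the program means adding a line of data rather than cutting and pasting more C.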

I had a cup of tea to celebrate, and realised that I could see the progress down a slippery path for another project I was working on.

The slippery slope of a product customisation.

I was  trying to configure a product, and thought the configuration process was very complex.    I could see the slippery slope the development team had taken with the product to end up with a very complex solution to a simple problem.

I looked into the configuration expecting to see a complex product which needed a complex configuration tool, but no, it looked just like many other products.

Many products (including MQ on z/OS) configuration consists of

  1. Define some VSAM files and initialize them
  2. Define some non VSAM files
  3. Create some system definitions
  4. Specify parameters for example MQ’s TCP/IP port number.

The developer of the product I was trying to install had realised that there were many parameters, for example the high level qualifier of data sets, as well as product-specific parameters such as TCP/IP port number, and so developed some dialogs to make it easy to enter and validate the parameters. This was good – it means that the values can be checked before they are used.

The next step down the slippery slope was to generate the JCL for the end user. This was OK, but instead of having a job for each component, they had one job with “If configuring for component 1 then create the following datasets” etc. In order to debug problems with this, they then had to capture the output, and save it in a dataset. They then needed a program to check this output and validate it. By now they were well down the slippery slope.

The same configuration parameter was needed in multiple components, and rather than use one file, used by all the JCL, they copied the parameter into each component.
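The single-file alternative can be sketched as follows: one set of parameters, and every component's job is generated from it. The parameter names, values and templates below are invented for illustration and are not the product's real configuration.

```python
from string import Template

# One shared set of parameters; the names and values are made up.
PARAMETERS = {"HLQ": "MQ.V913", "PORT": "1414"}

# Hypothetical per-component templates that all read the same values.
TEMPLATES = {
    "component1": "//DD1 DD DSN=${HLQ}.COMP1.DATA,DISP=SHR",
    "component2": "HLQ(${HLQ}) TCPPORT(${PORT})",
}


def render_jobs(templates, parameters):
    """Fill every template from the one shared parameter dictionary."""
    return {
        name: Template(text).substitute(parameters)
        for name, text in templates.items()
    }
```

Template.substitute raises KeyError when a template uses a parameter that has not been defined, so a missing value is caught when the jobs are generated rather than when they run.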

During  configuration it looks as if it copied files from the SMP target libraries, to intermediate libraries, then to instance specific libraries.  I compared the contents of the SMP target libraries with the final libraries and they were 95% common.  It meant each instance had its own self contained set of libraries. 

I do not want to rerun the configuration in case it overwrites the manual tweaking I had to do.

I would much rather have more control over the configuration, for example put JCL overrides such as where to allocate the volumes, in the JCL, so it is easy to change.

A manager said to me once, the first thing you should do every day, once you have your first coffee is to remind yourself of the overall goal, and ask if the work you are doing is for this goal, and not a distraction.  There is a popular phrase  – when you’re up to your neck in alligators, it’s hard to remember that your initial objective was to drain the swamp.