Who do you get to implement this critical fix? The senior techie or a grunt?

I was reading an article about making systems resilient, and how events can conspire to cause problems. As Terry Pratchett said, “Million-to-one chances… crop up nine times out of ten”. For example, the day you have a power outage and fail over to generators is the day you find the fuel tank for the generators has not been refilled, and the weekly tests have emptied it.

This article made me think about the best way of implementing changes. If you have a critical change that needs to be made with no possible errors, who do you get to implement it – the team leader, or a junior person in the team? (Grunt – slang: a junior person in a team, lacking prestige and authority. The grunts do all of the heavy lifting, and when people lift heavy things, they grunt as they lift.)

I assume that you have the steps documented, and have tested them. For example, you use cut and paste instead of typing information, and you copy files from test to production rather than creating them from new.

I think it would be good for a junior person in the team to make the change, with the team leader watching. This has several advantages:

  • You only learn by doing. You can watch someone do something a hundred times – but it is only when you have to do it yourself that you learn (and get the fear of doing it wrong).
  • The junior person will be very careful, and double-check things. The team leader may check only once – as there were no problems in the past.
  • The junior person will be focused on the changes. Having the team leader there as another pair of eyes, looking at the systems overall, can alert you to problems. Hmm, we made the change – now that system over there has produced an alert – is that connected? You need the team leader’s experience to make the judgement call, while the rest of the change is implemented.
  • When the change has been made and has had unexpected side effects, the junior person can follow the backout process while the team leader is on the phone to management and the incident room, explaining what happened and what they are doing about it. There is nothing worse than trying to resolve a problem while explaining to the incident team conference call that you would rather fix the problem than do a root cause analysis. I remember one manager saying to the management call “We estimate it will take an hour to resolve. I will dial back in in 30 minutes and give a status – goodbye”. It takes seniority and courage to make that sort of call.
  • When you are in a hole, it is easy to keep trying things and make the hole deeper. Having the team leader as an observer means they can say “step back from the keyboard, and let us discuss what we can do”.
  • One person whom I respected said that his life goal was to make himself redundant, by getting his team to make the hard decisions. His teams always had a reputation for getting things done, and doing a good job. He said the hardest part of his job was standing back, letting the team gently struggle, and letting them get themselves out of the problem. He never told us what to do – he just asked us questions to make us find the solutions ourselves – to me that was the skilful part. As he said: “I am a manager. I am technical, but not deeply technical, and I could not solve the problem – but my job is to help you find the solution.”

Using the Java Health Centre to look at Liberty on z/OS

The Java Health Center is an Eclipse plugin which allows you to monitor Liberty (and so MQWEB, z/OSMF, z/OS Connect, etc.) on z/OS.

You can look at

  • the Java classes being used,
  • the environment variables and startup parameters,
  • charts of garbage collection,
  • method profiling.

Before you get too far into your investigation, be warned: this is basically insecure.

  • By default anyone can connect to it
  • You can specify a file with userid and password in it (in clear text)
  • You can use TLS – but you have to have a keystore in a file, rather than using the security manager and RACF keyrings.
  • You cannot use RACF or other security manager for authentication and authorization.

Install the Health Center on Eclipse.

I used Eclipse Marketplace to install IBM Monitoring and Diagnostic Tools – Health Center 3.0.12

Configuring the Health Center on Liberty.

You can use

  • headless mode, where files are written to the local file system, transported to your Eclipse environment, and reviewed offline, or
  • a connection from Eclipse, to give real-time information.

See Configuring the monitoring agent and Health Center configuration properties, and here

Headless mode

Add the following to the jvm.options (or other start-up parameters), changing the directory location to suit:

-Xhealthcenter
-Dcom.ibm.diagnostics.healthcenter.headless=on
-Dcom.ibm.java.diagnostics.healthcenter.data.collection.level=headless
-Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/u/tmp/zowec/
-Dcom.ibm.diagnostics.healthcenter.readonly=on

Restart the server.

Download the files from the specified directory to your workstation, so the files are available to Eclipse.
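For example, you could pull the files down with sftp; a minimal sketch, assuming ssh key authentication is set up and the agent wrote its output (typically .hcd files) to the directory specified above – adjust the host and paths to suit:

# Copy the Health Center output files from z/OS to the workstation.
sftp colin@10.1.1.2 <<'EOF'
cd /u/tmp/zowec
mget *.hcd
EOF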

In Eclipse

Window -> Open Perspective -> Open Perspective -> Health Center Status Summary

File -> Load Data…

Specify the file name.

Interactive mode

First time use (with little or no security)

-Xhealthcenter:transport=jrmp,port=8125 
-Dcom.ibm.diagnostics.healthcenter.logging.level=info
-Dcom.ibm.diagnostics.healthcenter.readonly=on
-Dcom.ibm.diagnostics.healthcenter.data.profiling=off

Note: Use -Dcom.ibm.diagnostics.healthcenter.data.profiling=off to start the server with profiling disabled.

Restart the server. Check the error log for a message like INFO: Health Center agent started on port 8125.

In Eclipse, once you have installed the Health Center:

Window -> Open Perspective -> Open Perspective -> Health Center Status Summary

Open a connection to the server

File->”New Connection” (do not select the “New” option)

JMX; Host name 10.1.1.2; Port 8125; Untick Scan next 100 ports

This should connect you.

Problems

If you get

  • Unable to contact agent: Non IBM version of Java has been detected. Check the server is fully active, and that you got the message INFO: Health Center agent started on port 8125 in STDERR.

Ping the IP address to check connectivity.

Quick overview

Once you have connected, or loaded the files

  • Environment gives you the properties and class paths.
  • Classes
    • Classes loaded shows you the classes loaded, and whether they came from the shared cache or not.
    • Class Histogram shows you information about all the fields and variables. For example I had 196106 Ljava/lang/String – so lots of string variables – and 146973 HashMap nodes.
  • Method profiling
    • Self – shows which methods were the hottest.
    • Tree – includes parents. The top element in the tree is java.lang.Thread.run. All work runs on a thread, so all work will be reported under this.
    • Click a line and select Invocation Paths to show who called the method.
  • Where you get a chart with Elapsed time, you can use your mouse to select a time period and zoom in. Click on “elapsed time” to unzoom.

Securing Health Center

This is documented in Securing Health Center – but it is missing the standard z/OS techniques for protecting resources.

  • By default anyone can connect to it
  • You can specify a file with userid and password in it (in clear text)
  • You can use TLS – but you have to have a keystore in a file, rather than using the security manager and RACF keyrings.
  • You cannot use RACF or other security manager for authentication and authorization.

Double whoops – I’ve been ransomware-d and … it gets worse

This story comes from a friend of a friend, so I cannot tell how much of it is true, but it is another good example of how some things are “obvious” only when you understand them.

I heard that someone’s office systems had been compromised by a ransomware attack: their files had been encrypted, and money was demanded to decrypt them. The first whoops. While someone else was sorting out the problem for the office, the person was pleased that he had kept backups of all his key files on a separate portable hard disk drive attached via a USB cable. Then he realised that his backup files had been encrypted as well, so he was unable to restore his own backups of his key files. The second whoops. Looking back, this was an obvious consequence of having the backups connected to the machine.

I would have expected that any decent backup package would make the files read only, but then if the ransomware was able to get into administrator mode, it would be able to change these “protected” files as well.

I think the only answer is to take backups off your machine – for example over a network – and hope the ransomware is not smart enough to corrupt files across a network. You could also back up your key files to a CD, which is write-once and so becomes read-only.
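For example, a minimal sketch using rsync over ssh from Linux (the machine name backupserver and the paths are my inventions – substitute your own). The --backup options keep the previous version of each changed file, so an encrypted copy cannot silently replace the only good one:

# Back up key files to a separate machine, keeping prior versions
# of anything that changed in a dated directory on the server.
rsync -av --backup --backup-dir=old-$(date +%F) \
    ~/keyfiles/ colin@backupserver:backups/keyfiles/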

As I wrote this I remembered that I had been meaning to back up some family photographs and documents that only exist on my machine (and its backups). I had sent a copy to my brother, but when he got a new machine he did not copy the files across!

I was also reminded of the university that diligently backed up its system every week. Which was fine, until the building containing the computer – and the cupboard full of backups – was destroyed in a fire.

Making diff even better

I was using the diff option -y on Linux to compare two files side by side, but it did not use the full width of the window.

diff -W $(( $(tput cols) - 2 )) -y file1 file2 |less

solved it.

It went from being like

javax.net.ssl|AL   |    javax.net.ssl|DE
javax.net.ssl|IN   |    javax.net.ssl|DE
javax.net.ssl|DE   <
javax.net.ssl|DE   <
javax.net.ssl|DE   <
"ClientHello": {        "ClientHello": {

to


javax.net.ssl|ALL|01|main|2021-01-29 16:39:00.297 GMT|Si   |    javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:15.405 GMT|
javax.net.ssl|INFO|01|main|2021-01-29 16:39:00.297 GMT|A   |    javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:15.407 GMT|
javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.297 GMT|   <
javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.298 GMT|   <
javax.net.ssl|DEBUG|01|main|2021-01-29 16:39:00.299 GMT|   <
"ClientHello": {                                                "ClientHello": {
  "client version"      : "TLSv1.2",                              "client version"      : "TLSv1.2",

diff -W $(( 120 )) … uses a terminal width of 120, with each side 60 wide.
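I wrapped this up in a small shell function (a sketch – put it in your .bashrc; the name ydiff is my own):

# Side-by-side diff using the full terminal width, paged.
ydiff () {
    diff -W $(( $(tput cols) - 2 )) -y "$1" "$2" | less
}

and then use ydiff file1 file2.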

Finding the nugget of useful information in a TLS(SSL) trace

Understanding the TLS trace – for example when trying to get a client to use a web server over TLS – is a difficulty-squared problem.

In this post, I give some tools to reduce the amount of trace data, and provide an annotated trace so you can understand what is going on and spot the errors. I have also annotated the trace with common user errors, and links to possible error causes.

This post, and the referenced pages, are still a work in progress while I sort out some of the little problems that creep in. Please send me comments on any mistakes, or suggestions for improvements (or additional reasons why a handshake fails).

Understanding the TLS trace is hard.

It is hard enough to understand what the trace is showing, and other factors make it even more difficult to use:

  1. Where there are concurrent threads running, the trace records are interleaved, and it can be hard to tell which data belongs to which thread.
  2. The formatting is sometimes poor. Instead of giving one line with a list of 50 comma-separated numbers, it gives you 100 lines with either a number or a comma. And when you have two threads doing this, it is a nightmare trying to work out which data belongs to which thread. (But I usually ignore these records.)
  3. The trace tends to “provide all information that might possibly be useful”, rather than the information needed by most people to resolve why a client cannot connect to the server. For example, the trace gives a print-out of the encrypted data – in hex!
  4. You turn the trace on with a java -D… parameter. Other applications, such as web servers, have different ways of turning the trace on. I could not find a way of turning it on and off dynamically, so you can get a lot of output if you have to run for a long period. The output may go into trace files which wrap.
  5. Different implementations have slightly different trace formats.

All these factors make it very difficult to understand the trace and find the cause of your problems.

What can you do to understand it?

This page from Java walks through a trace, but I don’t find it very helpful. This one, which also covers TLS V1.3, is better.

Do not despair! 

  • On z/OS I created an edit macro, which I use to delete or transform data.  It reduces an 8000 line spool file down to 800 lines.  See ztrace.rexx.
  • On Linux I have a python script  which does the same. See tls.py.

In some traces sections are delimited by ***…. *** to make it easier to see the structure.

To find problems, look for *** at the start of the line, or for “exception”.
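For example, a minimal grep to pull out just those lines (trace.txt is a stand-in for your trace file name):

# Show the section delimiters and any problems, with line numbers.
grep -n -i -e '^\*\*\*' -e 'exception' trace.txt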

You may need to look at both ends of the trace to understand the problem.   One end may get a response “TLSv1.2 ALERT: warning, description = …” and you need to examine the other end to find the reason for the message.

Annotated trace file

I have taken a trace file from each end and annotated them, so you can see what the flow is, and how and where the data is used. I have colour coded some of the flows, and included some common errors (in red), with a link to possible solutions.
Some lines have hover text.

If you have suggestions for additional information – or reasons why things do not work – please tell me and I’ll update the documentation.  

The trace is from a Linux client going to Liberty on z/OS.

  1. The server starts up and waits for a connection from a client.
  2. The client starts up and sends a “ClientHello” request to the server.
  3. The server wakes up and processes the request.
  4. The server sends a “ServerHello” to the client.
  5. Optional: if the server wants client authentication, the server sends a “CertificateRequest”.
  6. *** ServerHelloDone. The server has finished its processing; it sends the data and waits for the reply.
  7. The client wakes up and processes the “ServerHello”, optionally sends back its certificate response, and sends the verify.
  8. The server processes the verify and ends the handshake.


Useful linux commands

Someone told me about a useful Linux command for working with z/OS, so I thought I would pass on some one-liners to help other people.

sshfs colin@10.1.1.2: ~/mountpointz

This mounts a remote file system, to give local access to files on a remote box. It allows you to use gedit on remote files, including z/OS files in USS; for example gedit mountpointz/mqweb/servers/mqweb/mqwebuser.xml .

For z/OS the files must be tagged. Use chtag -p name to see the tag. For me it needs a tag of “t ISO8859-1 T=on”. Without the tag it displays the file in the wrong code page.
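If the file is untagged, you can set the tag yourself. A sketch, run in USS on z/OS (the file name is just an example):

# Tag the file as ISO8859-1 text, so it is displayed in the right code page.
chtag -tc ISO8859-1 mqwebuser.xml
# Check the result; expect something like "t ISO8859-1   T=on".
chtag -p mqwebuser.xml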

parcellite

This keeps a history of your clipboard, with hot keys.

Linux settings -> keyboard mapping

Ctrl+1 gives me the x3270 with “tso@” in the title. The command executed is wmctrl -a tso@
Ctrl+2 gives me the x3270 with “colin@” in the title. The command executed is wmctrl -a colin@

I start the x3270 with

x3270 -model 5 colin@localhost:3270 &
x3270 -model 5 tso@localhost:3270 &

model 5 makes the screen 132 characters wide by 27 deep

Map the 3270 keyboard to define the USS escape key

See x3270 – where’s the money key?

Gedit hot keys

See here

  • Find a string: Ctrl-F; next match: Ctrl-G; previous match: Ctrl-Shift-G
  • Top: Ctrl-Home; bottom: Ctrl-End
  • Move left/right one word: Alt-Left or Alt-Right
  • Start/end of line: Home/End, or Ctrl-PgUp/Ctrl-PgDn
  • Next session: Ctrl-Alt-PgUp, Ctrl-Alt-PgDn
  • Create a new gedit window: Ctrl-N
  • Create a new tab: Ctrl-T


Should I run MQ for Linux, on z/OS on my Linux?

Yes – this sounds crazy.  Let me break it down.

  1. zPDT is a product from IBM that allows me to run System/390 applications on my laptop. I am running z/OS 2.4, and MQ 9.1.3. For normal editing and running MQ, it feels as fast as when I had my own real hardware at IBM.
  2. z/OS 2.4 can run docker images in a special z/OS address space called zCX.
  3. I can run MQ distributed in docker.

Stringing these together, I can run MQ distributed in a docker environment; the docker environment runs on z/OS; and the z/OS runs on my laptop! To learn more about running this scenario on real z/OS hardware…
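As a sketch of item 3 above, this is the sort of command that runs MQ distributed in docker (assuming the ibmcom/mq image from Docker Hub – check the current image name and tags before relying on it):

# Run a disposable MQ queue manager in a container.
docker run --env LICENSE=accept --env MQ_QMGR_NAME=QM1 \
    --publish 1414:1414 --publish 9443:9443 ibmcom/mq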

20 years ago, someone was enthusiastically telling me how you could partition distributed servers using a product called VMware. They kept saying how good it was, and asking why I wasn’t excited about it. I said that when I joined IBM – 20 years before the discussion (so 40 years ago) – the development platform was multiple VS1 systems (an early MVS) running on VM/360. Someone had VM/360 running under VM/360, with VS1 running on that! Now that was impressive!

Now if only I could get z/OS to run in docker….

You have to be brave to climb back up the slippery slope.

It is interesting that you notice things when you are sensitive to them. Someone bought a new car in “Night Blue” because she wanted a car that no one else had – she had not seen any cars of that colour. Once she got it, she noticed many other cars of the same make and colour.

I was sliding down a slippery slope, and realised that another project I was working on had also gone down a slippery slope.

My slippery slope.

I wanted a small program to do a task. It worked, and then I realised I needed to extend it, so with lots of cutting, pasting and editing, the file soon grew to 10 times its original size. I then realised the problem was a bit more subtle, and started making even more changes. I left it, and went to have dinner with a glass of wine.

After the glass of wine I realised that, now that I understood the problem, there were easier (and finite) solutions. Should I continue down the slippery slope, or start again?

I tried a compromise: I wrote some Python code to process a file of data and generate the C code, which I then used. This worked, and I kept this solution – so yes, it was worth stopping, climbing back up the slippery slope, and finding a different solution.

I had a cup of tea to celebrate, and realised that I could see the progress down a slippery path in another project I was working on.

The slippery slope of a product customisation.

I was trying to configure a product, and thought the configuration process was very complex. I could see the slippery slope the development team had taken to end up with a very complex solution to a simple problem.

I looked into the configuration expecting to see a complex product which needed a complex configuration tool, but no – it looked just like many other products.

For many products (including MQ on z/OS), configuration consists of:

  1. Define some VSAM files and initialize them.
  2. Define some non-VSAM files.
  3. Create some system definitions.
  4. Specify parameters, for example MQ’s TCP/IP port number.

The developer of the product I was trying to install had realised that there were many parameters – for example the high-level qualifier of data sets, as well as product-specific parameters such as the TCP/IP port number – and so developed some dialogs to make it easy to enter and validate the parameters. This was good – it means that the values can be checked before they are used.

The next step down the slippery slope was to generate the JCL for the end user. This was OK, but instead of having a job for each component, they had one job with “If configuring for component 1 then create the following datasets” etc. In order to debug problems with this, they then had to capture the output and save it in a dataset. They then needed a program to check this output and validate it. By now they were well down the slippery slope.

The same configuration parameter was needed in multiple components, and rather than using one file, used by all the JCL, they copied the parameter into each component.

During configuration it looks as if files are copied from the SMP/E target libraries to intermediate libraries, then to instance-specific libraries. I compared the contents of the SMP/E target libraries with the final libraries and they were 95% common. It means each instance has its own self-contained set of libraries.

I do not want to rerun the configuration in case it overwrites the manual tweaking I had to do.

I would much rather have more control over the configuration – for example, put JCL overrides, such as where to allocate the volumes, in the JCL, so they are easy to change.

A manager once said to me that the first thing you should do every day, once you have had your first coffee, is to remind yourself of the overall goal, and ask whether the work you are doing is for this goal, and not a distraction. There is a popular phrase: when you’re up to your neck in alligators, it’s hard to remember that your initial objective was to drain the swamp.


If the facts do not match your view – do not ignore the facts.

It is easy to ignore facts if you do not like them, and sometimes this can have major consequences. I recently heard two similar experiences of this.

We were driving along in dense fog, trying to visit someone who lived out in the country. We had been there in daylight and thought we knew the way. The conversation went a bit like the following:

  • It is along here on the right somewhere – there should be a big gate
  • There’s a gate.  Oh they must have painted it white since last time we were here
  • The track is a bit rough, I thought it was better than this.
  • Ah here’s another gate.   They must have installed it since we were here last.
  • Round the corner and here we – oh where’s the house gone?

Of course, we had taken the wrong side road. We had noticed that the facts didn’t match our picture, and so we changed the facts. Instead of thinking “that gate is the wrong colour” we thought “they must have painted the gate”. Instead of “we were not expecting that gate” we thought “they must have installed a new gate”. It was “interesting” backing the car up the track to the main road in the dark.

I was trying to install a product and having problems. I had already experienced a few problems where the messages were a bit vague. I had another message which implied I had mis-specified something. I checked the 6 characters in a file and thought “The data is correct, the message must be wrong, I’ll ignore it”. I gave up for the day. The next day I looked at the problem again, and found I had been editing the wrong file. The message was correct, and I had wasted 3 hours.

Enclaves in practice: how to capture all the CPU your application uses, and where to look for it.

z/OS has enclaves to manage work.

  1. When an enclave is used, a transaction can issue a DB2 request, and DB2 then uses some TCBs in the DB2 address spaces on behalf of the original request. The CPU used by these DB2 TCBs can be charged back to the original application.
  2. When an enclave is not used, the CPU used by the original TCB is charged to its address space, and the DB2 TCBs are charged to the DB2 address space.  You do not get a complete picture of the CPU used by the application.
  3. A transaction can be defined to Workload Manager (WLM) to set the priority of the transaction, so online transactions get high priority, and background work gets low priority. With an enclave, the DB2 TCBs have the same priority as the original request. With no enclave, the TCBs have the priority determined by DB2.

When an application sets up an enclave

  1. Threads can join the enclave, so any CPU a thread uses while in the enclave is recorded against the enclave.
  2. These threads can be in the same address space, a different address space on the same LPAR, or even in a different LPAR in the SYSPLEX.
  3. Enclaves are closely integrated with Workload Manager (WLM). When you create an enclave you can give information about the business transaction (such as transaction name and userid). You classify the application against different factors.
  4. The classification maps to a service class.   This service class determines the appropriate priority profile.  Any threads using the enclave will get this priority.
  5. WLM reports on the elapsed time of the business transaction, and the CPU used.

What enclave types are there?

In this simple explanation there are two enclave types (plus work in no enclave at all):

  1. Independent enclave – what I think of as a business transaction, where work can span multiple address spaces. You pass transaction information (transaction name, userid, etc.) to WLM so it can set the priority for the enclave. You can get reports on the enclave showing elapsed time and CPU used. There can be many independent enclaves in the lifetime of a job, and you can have these enclaves running in parallel within a job.
  2. Dependent enclave, or address space enclave. I cannot see the reason for this. It is for tasks running within an address space which are not doing work for an independent enclave; it could be used for work related to transactions in general. In the SMF 30 job information records you get information on the CPU used in the dependent enclave.
  3. Work not in an enclave. Threads by default run with the priority assigned to the address space, and the CPU is charged to the address space.

To help me understand enclave reports, I set up two jobs

  1. The parent job,
    1. Creates an independent (transactional) enclave with “subsystem=JES, definition=SM3” and “TRANSACTION NAME=TCI2”.  It displays the enclave token.
    2. Sets up a dependent enclave.
    3. Joins the dependent enclave.
    4. Does some CPU intensive work.
    5. Sleeps for 30 seconds.
    6. Leaves the dependent enclave.
    7. Deletes the dependent enclave.
    8. Deletes the independent enclave.
    9. Ends.
  2. The child, or subtask, job.
    1. This reads the enclave token as a parameter.
    2. Joins the enclave; if the enclave does not exist, it uses the dependent enclave.
    3. Does some CPU intensive work.
    4. Leaves the enclave.
    5. Ends.

Where is information reported?

  1. Information about a job, and the resources used by the job, is in the SMF 30 records. They report the total CPU used, the CPU used by independent enclaves, and the CPU used by the dependent enclave. The CPU figures reported in the JCL output for a job step come from the SMF 30 record.
  2. Information about the independent enclave is summarised in an SMF 72 record over a period (typically 15 minutes), which tells you about the response time distribution and the CPU used.

I used three scenarios:

  1. Run the parent job – but not the child. This shows the costs of just the parent, when there is no child running the workload for it.
  2. Run the child but not the parent. This shows the cost of the expensive workload.
  3. Both parent and child active. This shows that the costs of running the independent enclave work in the child are charged to the parent.

SMF job resource used report

From the SMF 30 record we get the CPU for the Independent enclave.

                   Parent CPU                      Child CPU
Parent, no child   Total               : 0.070     Not applicable
                   Dependent enclave   : 0.020
Child, no parent   Not applicable                  Total CPU         : 2.930
                                                   Dependent enclave : 2.900
                                                   Non enclave       : 0.030
Parent and child   Total               : 2.860     Total : 0.020
                   Independent enclave : 2.820     No enclave CPU reported
                   Dependent enclave   : 0.010
                   Non enclave         : 0.030

From the “parent and child” scenario we can see that the CPU used by the enclave work in the child job has been recorded against the parent job under “Independent enclave CPU”.

The SMF type 30 record shows the parent job had CPU under Independent enclave, Dependent enclave, and a small amount (0.030) which was not in an enclave.

SMF WLM reports

From the SMF 72 data displayed by RMF (see below for an example) you get the number of transactions, and the CPU usage, for each report class and service class. I had a report class for each of the parent job, the child job, and the independent enclave transaction.

                       Total CPU   Elapsed time   Ended
Parent                 2.818       34.97          1
Child                  2.636       6.76           1
Business transaction   2.819       30.04          1

It is clear there is some double accounting. The CPU used by the child doing enclave processing is also recorded in the parent’s cost. The CPU used by the business transaction is another view of the same data from the parent and child address spaces.

For charging based on address spaces you should use the SMF 30 records.

You can use the SMF 72 records for reporting on the transaction costs.

RMF workload reports

When processing the SMF data using RMF, you get workload reports.

//POST EXEC PGM=ERBRMFPP 
//MFPINPUT DD DISP=SHR,DSN=smfinput.dataset  
//SYSIN DD * 
SYSRPTS(WLMGL(SCPER,RCLASS,RCPER,SCLASS)) 
/* 

For the child address space report class RMF reported

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF
 AVG        0.05  ACTUAL             6.760380
 MPL        0.05  EXECUTION          6.254239
 ENDED         1  QUEUED               506141
 ...
 ----SERVICE----   SERVICE TIME  ---APPL %---
 IOC        2152   CPU    2.348  CP      2.20
 CPU        2012   SRB    0.085  IIPCP   0.00
 MSO         502   RCT    0.004  IIP     0.00
 SRB          73   IIT    0.197  AAPCP   0.00
 TOT        4739   HST    0.002  AAP      N/A

There was 1 occurrence of the child job; it ran for 6.76 seconds, and used a total of 2.636 seconds of CPU (if you add up the service times).

For a more typical job using many short duration independent enclaves the report looked like

-TRANSACTIONS--  TRANS-TIME HHH.MM.SS.FFFFFF 
 AVG        0.11  ACTUAL                13395 
 MPL        0.11  EXECUTION             13395 
 ENDED      1000  QUEUED                    0 
 END/S      8.33  R/S AFFIN                 0 
 SWAPS         0  INELIGIBLE                0
 EXCTD         0  CONVERSION                0 
                  STD DEV                1325 
 ----SERVICE----   SERVICE TIME  
 IOC           0   CPU    1.448   
 CPU        1241   SRB    0.000   
 MSO           0   RCT    0.000  
 SRB           0   IIT    0.000  
 TOT        1241   HST    0.000  

This shows 1000 transactions ended in the period, and the average transaction response time was 13.395 milliseconds. The total CPU time used was 1.448 seconds, or an average of 1.448 milliseconds of CPU per transaction.

For a service class with a response time definition, you get a response time profile. The data below shows that most response times were between 14 and 15 ms. The service class was defined with “Average response time of 00:00:00.010”; this drives the range of response times reported. If this data were for a production system, you might want to adjust the “Average response time” to 00:00:00.015 to get the peak in the middle of the range.

-----TIME--- -% TRANSACTIONS-    0  10   20   30   40   50 
    HH.MM.SS.FFF CUM TOTAL BUCKET|...|....|....|....|....|
 <= 00.00.00.005       0.0  0.0   >                           
 <= 00.00.00.006       0.0  0.0  >                           
 <= 00.00.00.007       0.0  0.0  >                           
 <= 00.00.00.008       0.0  0.0  >                           
 <= 00.00.00.009       0.0  0.0  >                           
 <= 00.00.00.010       0.0  0.0  >                           
 <= 00.00.00.011       0.3  0.3  >                           
 <= 00.00.00.012      10.6 10.3  >>>>>>                      
 <= 00.00.00.013      24.9 14.3  >>>>>>>>                    
 <= 00.00.00.014      52.3 27.4  >>>>>>>>>>>>>>              
 <= 00.00.00.015      96.7 44.4  >>>>>>>>>>>>>>>>>>>>>>>     
 <= 00.00.00.020     100.0  3.2  >>                          
 <= 00.00.00.040     100.0  0.1  >                           
    00.00.00.040     100.0  0.0  >                                                                                

Take care.

The field ENDED is the number of transactions that ended in the interval. If a transaction spans an interval boundary, you will get CPU usage in both intervals, but an “ENDED” count only in the interval where the transaction ends. With a large number of transactions this effect is small. With few transactions it could cause divide-by-zero exceptions!
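If you post-process these numbers yourself, guard the division. A minimal sketch in shell (the two-column input file intervals.txt – CPU seconds and ENDED count per interval – is my invention for illustration):

# For each interval, print the average CPU per transaction,
# skipping intervals in which no transactions ended.
while read cpu ended; do
  if [ "$ended" -gt 0 ]; then
    echo "average CPU per transaction: $(echo "scale=3; $cpu / $ended" | bc) seconds"
  else
    echo "no transactions ended in this interval - skipping"
  fi
done < intervals.txt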