What’s going on? – getting performance data from a z/OS systrace

On my little z/OS system, one address space was using a lot of CPU – but doing nothing. What was going on? The address space was a z/OSMF based on the Liberty Web server.

The blog post tells you how to take a dump, and use IPCS to display useful information from the system trace. The system trace contains deep down information like

  • Task A was dispatched on this processor at this time.
  • It issued a request MVS to get a block of storage, and time.
  • The request completed, with this return code, at this time.
  • Task A was interupted at this time
  • Task B was dispatched

There is a lot of detailed information, and it is overwhelming when you first look at it. This blog post shows how you can get summary information from the trace – while ignoring all of the detailed, scary stuff. It does not require any prior knowledge of IPCS or dumps.

Take your dump

DUMP COMM=(COLINS DUMP)
R xx,jobname=IZUSVR1

This gives output like

IEE600I REPLY TO 01 IS;JOBNAME=IZUSVR1
IEA045I AN SVC DUMP HAS STARTED AT TIME=16.24.56 DATE=06/21/2021 044
FOR ASID (0044) QUIESCE = YES

IEA611I COMPLETE DUMP ON SYS1.S0W1.Z24A.DMP00004

Go into IPCS

I find it easier to use a wide(132) screen for IPCS.

This may be in your ISPF panels, or you might need to issue a command before starting ISPF. You might need to talk to your system programmer.

You get the primary menu

 ------------------- z/OS 02.04.00 IPCS PRIMARY OPTION MENU
 OPTION  ===>                                              
                                                           
    0  DEFAULTS    - Specify default dump and options      
    1  BROWSE      - Browse dump data set                  
    2  ANALYSIS    - Analyze dump contents                 
    3  UTILITY     - Perform utility functions             
    4  INVENTORY   - Inventory of problem data             
    5  SUBMIT      - Submit problem analysis job to batch  
    6  COMMAND     - Enter subcommand, CLIST or REXX exec  
    T  TUTORIAL    - Learn how to use the IPCS dialog      
    X  EXIT        - Terminate using log and list defaults

Select option 0

------------------------- IPCS Default Values ---------------------------------
 Command ===>                                                                   
                                                                                
   You may change any of the defaults listed below.  The defaults shown before  
   any changes are LOCAL.  Change scope to GLOBAL to display global defaults.   
                                                                                
   Scope   ==> LOCAL   (LOCAL, GLOBAL, or BOTH)                                 
                                                                                
   If you change the Source default, IPCS will display the current default      
   Address Space for the new source and will ignore any data entered in         
   the Address Space field.                                                     
                                                                                
   Source  ==> DSNAME('SYS1.CTRACE1')
   Address Space   ==> RBA
   Message Routing ==> NOPRINT TERMINAL NOPDS
   Message Control ==> CONFIRM VERIFY FLAG(WARNING)
   Display Content ==> NOMACHINE REMARK REQUEST NOSTORAGE SYMBOL ALIGN
                                                                                
 Press ENTER to update defaults.                                                
                                                                                
 Use the END command to exit without an update.                                 

Replace the source with DSN(‘your dumpname’).

Change Scope from LOCAL to BOTH

Press enter to update. Use =6 on the command line to get to the IPCS command window.

Enter a free-form IPCS subcommand or a CLIST or REXX exec invocation below: 
                                                                            
===>                                                                        
                                                                            
                                                                            
----------------------- IPCS Subcommands and Abbreviations -----------------
ADDDUMP           | DROPDUMP, DROPD   | LISTDUMP, LDMP    | RENUM,    REN   
ANALYZE           | DROPMAP,  DROPM   | LISTMAP,  LMAP    | RUNCHAIN, RUNC  
ARCHECK           | DROPSYM,  DROPS   | LISTSYM,  LSYM    | SCAN            
ASCBEXIT, ASCBX   | EPTRACE           | LISTUCB,  LISTU   | SELECT          
ASMCHECK, ASMK    | EQUATE,   EQU, EQ | LITERAL           | SETDEF,   SETD  
CBFORMAT, CBF     | FIND,     F       | LPAMAP            | STACK           
CBSTAT            | FINDMOD,  FMOD    | MERGE             | STATUS,   ST    
CLOSE             | FINDUCB,  FINDU   | NAME              | SUMMARY,  SUMM  
COPYDDIR          | GTFTRACE, GTF     | NAMETOKN          | SYSTRACE        
COPYDUMP          | INTEGER           | NOTE,     N       | TCBEXIT,  TCBX  
COPYTRC           | IPCS HELP, H      | OPEN              | VERBEXIT, VERBX 
CTRACE            | LIST,     L       | PROFILE,  PROF    | WHERE,    W     

If you use the command “systrace” you will see the scary internal trace. PF3 out of it.
Use the command

systrace jobname(IZUSVR1) PERFDATA(DOWHERE) time(LOCAL)

Go to the bottom of the report ( type m and press PF8) and type

REPORT VIEW

This gives you the report in an editor session, so you can exclude, delete, sort count etc.

This gives a lot of data. It is in sections, the sections are…FLocal

Summary of the dump

Analysis from 06/21/2021 16:24:46.391102 to 16:24:56.042146 9.651044 seconds 

This gives the time of day, and the interval of the trace is 9.65 seconds.

Summary of CPU usage by engine

CPU  Type Pol  Park   SRB Time              TCB Time             Idle Time           
---- ---- ---- ---- --------------------- --------------------- ---------------------
0000 CP   High No       0.190562   1.974%     0.828988   8.589%     8.603271  89.143%
0001 CP   High No       0.098836   1.024%     0.393259   4.074%     9.143735  94.743%
0002 CP   High No       0.086573   0.897%     0.415063   4.300%     9.136385  94.667%
0003 zIIP High No       0.015463   0.160%     2.227832  23.083%     7.398707  76.662%
0004 zIIP High No       0.000000   0.000%     1.094373  11.339%     8.551280  88.604%
---- ---- ---- ---- --------------------- --------------------- ---------------------
                        0.391434              4.959518             42.833380 

This shows

  • Most of the time was spent in TCB “application thread” mode (4.959 seconds of CPU) rather than SRB “system thread” mode (0.391 seconds of CPU).
  • One ZIIP was busy 23 % of the time, the other ZIIP was busy 11 % of the time.

Summary of CPU overall over 5 engines

 SRB time      :     0.391434 
 TCB time      :     4.959518 
 Idle time     :    42.833380 
 CPU Overhead  :     0.070886 
                 ------------ 
         Total :    48.255220  

This summarises the data

  • 4.95 seconds of CPU in TCB mode in 9.65 seconds of trace
  • 42 seconds idle
  • 5 engines * 9.65 seconds duration = 48.25

CPU break down by ASID/Jobname

CPU breakdown by ASID: 
                                                    
ASID Jobname    SRB Time     TCB Time    Total Time 
---- -------- ------------ ------------ ------------
0001 *MASTER*     0.011086     0.017940     0.029027
0003 RASP         0.000186     0.000000     0.000186
0005 DUMPSRV      0.035545     0.008959     0.044504
0006 XCFAS        0.021590     0.074411     0.096001
...   
0044 IZUSVR1      0.021217     3.638295     3.659513
0045 COLIN        0.000000     0.000000     0.000000
0046 RMF          0.010238     0.020204     0.030442
0047 RMFGAT       0.019961     0.160512     0.180474
              ------------ ------------ ------------
                  0.391434     4.959518     5.350953

Most of the CPU was in ASID 44 for job IZUSVR1.

Breakdown by system thread (SRB) by address space/jobname

SRB breakdown by ASID: (WHERE command bypassed for CPU usage less than 0.100000): 
                                                                                   
ASID: 0001   Jobname: *MASTER* 
IASN      SRB PSW      # of SRBs     Time 
---- ----------------- --------- ------------ 
0001 070C0000 83D0E8A0         2     0.000314 
...  
ASID: 0003   Jobname: RASP 
IASN      SRB PSW      # of SRBs     Time

Ignore this unless the SRB usage was high.

Breakdown of CPU by used thread by address space/jobname


TCB breakdown by ASID: 
                                                   
ASID Jobname  TCB Adr  # of DSPs     Time 
---- -------- -------- --------- ------------ 
0001 *MASTER* 008EDE88         1     0.000535
... 
ASID Jobname  TCB Adr  # of DSPs     Time 
---- -------- -------- --------- ------------ 
0044 IZUSVR1  008C8E88        22     0.013143 
0044 IZUSVR1  008AD7A0        30     0.006694 
0044 IZUSVR1  008B97B8        37     0.015926 
0044 IZUSVR1  008BA3E8        50     0.017547 
0044 IZUSVR1  008B2628        15     0.007748 
0044 IZUSVR1  008C4840        19     0.008433 
0044 IZUSVR1  008BD2D8        20     0.008551 
0044 IZUSVR1  008CDC68        14     0.008107 
0044 IZUSVR1  008C8328        15     0.006540 
0044 IZUSVR1  008CAC68        16     0.006612 
0044 IZUSVR1  008C9E88        14     0.006634 
0044 IZUSVR1  008B5C68        14     0.005668 
0044 IZUSVR1  008CBBE0        28     0.015650 
0044 IZUSVR1  008ADE00        17     0.005861 
0044 IZUSVR1  008B9470        15     0.006014 
0044 IZUSVR1  008BEA48        17     0.017092 
0044 IZUSVR1  008C6CF0        20     0.010093
...
0044 IZUSVR1  008CC2D8       548     0.827491 
0044 IZUSVR1  008D2E88        25     0.445230 
0044 IZUSVR1  008D2510       819     0.412526 
0044 IZUSVR1  008CEE88        14     0.158703
0044 IZUSVR1  008D3E88         8     0.003189 
0044 IZUSVR1  008C4CF0        18     0.013237 
                                 ------------ 
                                     3.638295
  • There were 166 TCB’s which did something in the time period.
  • TCB with address 008D2510 was dispatched 819 times times in 9 seconds – using 0.4 seconds of CPU! This was being dispatched 100 times a second, and used 5 milliseconds of CPU on average per dispatch. This looks high considering the system was not doing any work.
  • TCB with address 008d2E88 was dispatched 25 times in 9 seconds, and used 0.44 seconds of CPU or 17 ms of CPU per dispatch. This is doing more work per dispatch than the previous TCB.

Display lock usage

Lock events for CEDQ   
  None found                
Lock events for CSMF       
  None found               
Lock events for CLAT        
  None found              
Lock events for CMS         
  None found                
Lock events for OTHR        
  None found                

Nothing of interest here.

Display local lock usage – locking the job

Lock events for LOCL of ASID 0010 OMVS 
Lock ASID Jobname  TCB/WEB  Type    PSW Address    IASN  Suspended at    Resumed at     Suspend Time 
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------ 
CML  0044 IZUSVR1  008C33E8 TCB  00000000_04868084 0010 16:24:49.612051 16:24:49.612076     0.000025 
CML  0044 IZUSVR1  008B4938 TCB  00000000_048687E4 0010 16:24:49.612090 16:24:49.612570     0.000480 
... 
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------ 
Suspends:      6  Contention Time:     0.000821    0.008%               WU Suspend Time:    0.000823 
                                                                                                        
Lock events for LOCL of ASID 0044 IZUSVR1 
Lock ASID Jobname  TCB/WEB  Type    PSW Address    IASN  Suspended at    Resumed at     Suspend Time 
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------ 
LOCL 0044 IZUSVR1  008D3E88 TCB  00000000_010CCD62 0044 16:24:46.404417 16:24:46.404561     0.000144 
LOCL 0044 IZUSVR1  008D3E88 TCB  00000000_010ADA78 0044 16:24:46.410188 16:24:46.412182     0.001993
Suspends:     83  Contention Time:     0.042103    0.436%               WU Suspend Time:    0.079177

The LOCal Lock (LOCL) is the MVS lock used to serialise on the address space, for example updating some MVS control blocks. For example if MVS wants to cancel an address space, it has to get the Local lock, to make sure that critical work completes.

  • For the OMVS addess space, address space IZUZVSR1 got the lock 6 times, and was delayed for 0.823 milliseconds waiting for the local lock.
  • For the IZUSVR1 address space, 83 TCBs got the local lock, and were suspended for a total of 79 milliseconds.

Display timer events (CPU Clock comparator CLKC and timer TIMR)

ASID: 0044   Jobname: IZUSVR1 
SRB/TCB  IASN   Interrupt PSW    Count   Where processing 
-------- ---- -------- -------- -------- ----------------------------------------
00000000 0044 070C0000 81D83CB8        2 IEANUC01.ASAXMPSR+00     READ/WRITE NUCLEUS 
...
008CEE88 0044 078D0401 945F2B28       11 AREA(Subpool252Key00)+CB28 EXTENDED PRIVATE 
008CEE88 0044 078D2401 945F99BE        1 AREA(Subpool252Key00)+0139BE EXTENDED PRIVATE 
008CEE88 0044 078D2401 FA63F71E        1 SPECIALNAME+03F71E     EXTENDED PRIVATE 
008CEE88 0044 078D0401 FAB00178        1 SPECIALNAME+0178     EXTENDED PRIVATE 
008CEE88 0044 078D2401 FAB447BE        1 SPECIALNAME+0447BE     EXTENDED PRIVATE 
008CEE88 0044 078D0401 FAD9E660        1 SPECIALNAME+29E660     EXTENDED PRIVATE 
008B17B8 0044 070C0000 81C92030        1 IEANUC01.IAXVP+4048     READ ONLY NUCLEUS 
008B27C0 0044 072C1001 91AF2460        1 BBGZAFSM+7520     EXTENDED CSA 
...
008D2E88 0044 078D0401 945F2B28       22 AREA(Subpool252Key00)+CB28     EXTENDED PRIVATE 
008D2E88 0044 078D0401 FB036F08        1 SPECIALNAME+036F08     EXTENDED PRIVATE 
...
008D2E88 0044 078D0401 FC145732        1 AREA(Subpool229Key00)+A732     EXTENDED PRIVATE 

This displays

  • The TCB
  • The virtual adress where the interrupt occurred
  • These entries are in time sequence, and so we can see the second entry had 11 interupts in quick succession (count is 11).
  • The Where processing, is a guess at converting the address into a module name. Sometimes it works, for example Module IEANUC01, csect ASAXMPSR, offset 0. Sometimes it cannot tell, for example from Java code.

This shows 2 things

  • The application said wake me up in a certain time period
  • The TCB was executing and z/OS interrupted it because it needed to go and dispatch some other work. This gives a clue as to hot spots in the code. If the same address occurs many times – the code was executing here many times. I look in the raw systrace to see if this is a TIMer (boring) or a CLKC interesting. Interesting gives you a profile of what code is CPU intensive.
  • You can delete all the records outside of this block, then sort 15 32 to sort on PSW address. For my IPCS dump the address 078D0401 945F2B28 occurred 35 times.

I/O activity

 Device   SSCH Issued    I/O Occurred     Duration   
 ------ --------------- --------------- ------------ 
  0A99  16:24:48.009819 16:24:48.010064     0.000244 
  0A99  16:24:48.033619 16:24:48.033856     0.000236 
  0A99  16:24:48.051014 16:24:48.051081     0.000066 
  0A99  16:24:48.057377 16:24:48.057434     0.000056 
  0A99  16:24:48.080331 16:24:48.080430     0.000098 
                                        ------------ 
                                            0.000702 
                                                     
    Events for 0A99 :            5                   
    Quickest I/O    :     0.000056                   
    Slowest  I/O    :     0.000244                   
    Total           :     0.000702                   
    Average         :     0.000140                   

This says for the device 0A99 there were 5 I/O requests, total time 0.7 milliseconds

  • I used the REPORT VIEW to get the data into ISPF edit,
  • deleted all the records above the I/O section
  • Used X ALL
  • F TOTAL ALL
  • This shows the totals for all I/Os. Most totals were under 1 ms. One I/O was over 5 seconds.
  • Displaying the detailed records above this TOTAL record showed one I/O took over 5 seconds!

End of report

End of PERFDATA analysis.

Advanced topic: Look at hot spots

I had seen that PSW 078D0401 945F2B28 was hot. If you go back to the IPCS command panel, you may be able to use the command

L 945F2B28

To display the storage. This will not work. You have to remove the top bit (80), so

L 145F2B28

may work.

If the first character is a letter (A-F) then you need to put a 0 on the front for example

L 0D2345678

You might need to put the address space in as well for example

L 145F2B28 ASID ID(x’44’)

You can say

L 145F2B28 ASID ID(x’44’) LENGTH(4096)

To display large sections of storage

Dig into the trace

You can use

systrace jobname(izusvr1)tcb(x’008CC2D8′)

to display all entries for the specified TCB and jobname.

  • Go to the bottom ( type Max press PF8)
  • use the report view command to get into and edit session
  • Columns 79 to 88 contain a description of some of the system calls
  • use X ALL;f ‘ ‘ 79 84 all;del all nx to delete all lines without a description

This gives you a picture of the MVS services being used.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s