On my little z/OS system, one address space was using a lot of CPU – but doing nothing. What was going on? The address space was a z/OSMF based on the Liberty Web server.
The blog post tells you how to take a dump, and use IPCS to display useful information from the system trace. The system trace contains deep down information like
- Task A was dispatched on this processor at this time.
- It issued a request MVS to get a block of storage, and time.
- The request completed, with this return code, at this time.
- Task A was interupted at this time
- Task B was dispatched
There is a lot of detailed information, and it is overwhelming when you first look at it. This blog post shows how you can get summary information from the trace – while ignoring all of the detailed, scary stuff. It does not require any prior knowledge of IPCS or dumps.
- Take your dump
- Go into IPCS
- Systrace output
- Summary of the dump
- Summary of CPU usage by engine
- Summary of the CPU over all engines
- CPU breakdown by jobname
- Breakdown by jobname and system threads(SRB)
- Breakdown by jobname and application thread(TCB)
- Lock usage
- Local lock (address space lock)
- Timer interrupts
- I/O Activity device and response time
- Advanced topics
Take your dump
DUMP COMM=(COLINS DUMP)
R xx,jobname=IZUSVR1
This gives output like
IEE600I REPLY TO 01 IS;JOBNAME=IZUSVR1
IEA045I AN SVC DUMP HAS STARTED AT TIME=16.24.56 DATE=06/21/2021 044
FOR ASID (0044) QUIESCE = YES
…
IEA611I COMPLETE DUMP ON SYS1.S0W1.Z24A.DMP00004
Go into IPCS
I find it easier to use a wide(132) screen for IPCS.
This may be in your ISPF panels, or you might need to issue a command before starting ISPF. You might need to talk to your system programmer.
You get the primary menu
------------------- z/OS 02.04.00 IPCS PRIMARY OPTION MENU
OPTION ===>
0 DEFAULTS - Specify default dump and options
1 BROWSE - Browse dump data set
2 ANALYSIS - Analyze dump contents
3 UTILITY - Perform utility functions
4 INVENTORY - Inventory of problem data
5 SUBMIT - Submit problem analysis job to batch
6 COMMAND - Enter subcommand, CLIST or REXX exec
T TUTORIAL - Learn how to use the IPCS dialog
X EXIT - Terminate using log and list defaults
Select option 0
------------------------- IPCS Default Values ---------------------------------
Command ===>
You may change any of the defaults listed below. The defaults shown before
any changes are LOCAL. Change scope to GLOBAL to display global defaults.
Scope ==> LOCAL (LOCAL, GLOBAL, or BOTH)
If you change the Source default, IPCS will display the current default
Address Space for the new source and will ignore any data entered in
the Address Space field.
Source ==> DSNAME('SYS1.CTRACE1')
Address Space ==> RBA
Message Routing ==> NOPRINT TERMINAL NOPDS
Message Control ==> CONFIRM VERIFY FLAG(WARNING)
Display Content ==> NOMACHINE REMARK REQUEST NOSTORAGE SYMBOL ALIGN
Press ENTER to update defaults.
Use the END command to exit without an update.
Replace the source with DSN(‘your dumpname’).
Change Scope from LOCAL to BOTH
Press enter to update. Use =6 on the command line to get to the IPCS command window.
Enter a free-form IPCS subcommand or a CLIST or REXX exec invocation below:
===>
----------------------- IPCS Subcommands and Abbreviations -----------------
ADDDUMP | DROPDUMP, DROPD | LISTDUMP, LDMP | RENUM, REN
ANALYZE | DROPMAP, DROPM | LISTMAP, LMAP | RUNCHAIN, RUNC
ARCHECK | DROPSYM, DROPS | LISTSYM, LSYM | SCAN
ASCBEXIT, ASCBX | EPTRACE | LISTUCB, LISTU | SELECT
ASMCHECK, ASMK | EQUATE, EQU, EQ | LITERAL | SETDEF, SETD
CBFORMAT, CBF | FIND, F | LPAMAP | STACK
CBSTAT | FINDMOD, FMOD | MERGE | STATUS, ST
CLOSE | FINDUCB, FINDU | NAME | SUMMARY, SUMM
COPYDDIR | GTFTRACE, GTF | NAMETOKN | SYSTRACE
COPYDUMP | INTEGER | NOTE, N | TCBEXIT, TCBX
COPYTRC | IPCS HELP, H | OPEN | VERBEXIT, VERBX
CTRACE | LIST, L | PROFILE, PROF | WHERE, W
If you use the command “systrace” you will see the scary internal trace. PF3 out of it.
Use the command
systrace jobname(IZUSVR1) PERFDATA(DOWHERE) time(LOCAL)
Go to the bottom of the report ( type m and press PF8) and type
REPORT VIEW
This gives you the report in an editor session, so you can exclude, delete, sort count etc.
This gives a lot of data. It is in sections, the sections are…FLocal
Summary of the dump
Analysis from 06/21/2021 16:24:46.391102 to 16:24:56.042146 9.651044 seconds
This gives the time of day, and the interval of the trace is 9.65 seconds.
Summary of CPU usage by engine
CPU Type Pol Park SRB Time TCB Time Idle Time
---- ---- ---- ---- --------------------- --------------------- ---------------------
0000 CP High No 0.190562 1.974% 0.828988 8.589% 8.603271 89.143%
0001 CP High No 0.098836 1.024% 0.393259 4.074% 9.143735 94.743%
0002 CP High No 0.086573 0.897% 0.415063 4.300% 9.136385 94.667%
0003 zIIP High No 0.015463 0.160% 2.227832 23.083% 7.398707 76.662%
0004 zIIP High No 0.000000 0.000% 1.094373 11.339% 8.551280 88.604%
---- ---- ---- ---- --------------------- --------------------- ---------------------
0.391434 4.959518 42.833380
This shows
- Most of the time was spent in TCB “application thread” mode (4.959 seconds of CPU) rather than SRB “system thread” mode (0.391 seconds of CPU).
- One ZIIP was busy 23 % of the time, the other ZIIP was busy 11 % of the time.
Summary of CPU overall over 5 engines
SRB time : 0.391434
TCB time : 4.959518
Idle time : 42.833380
CPU Overhead : 0.070886
------------
Total : 48.255220
This summarises the data
- 4.95 seconds of CPU in TCB mode in 9.65 seconds of trace
- 42 seconds idle
- 5 engines * 9.65 seconds duration = 48.25
CPU break down by ASID/Jobname
CPU breakdown by ASID:
ASID Jobname SRB Time TCB Time Total Time
---- -------- ------------ ------------ ------------
0001 *MASTER* 0.011086 0.017940 0.029027
0003 RASP 0.000186 0.000000 0.000186
0005 DUMPSRV 0.035545 0.008959 0.044504
0006 XCFAS 0.021590 0.074411 0.096001
...
0044 IZUSVR1 0.021217 3.638295 3.659513
0045 COLIN 0.000000 0.000000 0.000000
0046 RMF 0.010238 0.020204 0.030442
0047 RMFGAT 0.019961 0.160512 0.180474
------------ ------------ ------------
0.391434 4.959518 5.350953
Most of the CPU was in ASID 44 for job IZUSVR1.
Breakdown by system thread (SRB) by address space/jobname
SRB breakdown by ASID: (WHERE command bypassed for CPU usage less than 0.100000):
ASID: 0001 Jobname: *MASTER*
IASN SRB PSW # of SRBs Time
---- ----------------- --------- ------------
0001 070C0000 83D0E8A0 2 0.000314
...
ASID: 0003 Jobname: RASP
IASN SRB PSW # of SRBs Time
Ignore this unless the SRB usage was high.
Breakdown of CPU by used thread by address space/jobname
TCB breakdown by ASID:
ASID Jobname TCB Adr # of DSPs Time
---- -------- -------- --------- ------------
0001 *MASTER* 008EDE88 1 0.000535
...
ASID Jobname TCB Adr # of DSPs Time
---- -------- -------- --------- ------------
0044 IZUSVR1 008C8E88 22 0.013143
0044 IZUSVR1 008AD7A0 30 0.006694
0044 IZUSVR1 008B97B8 37 0.015926
0044 IZUSVR1 008BA3E8 50 0.017547
0044 IZUSVR1 008B2628 15 0.007748
0044 IZUSVR1 008C4840 19 0.008433
0044 IZUSVR1 008BD2D8 20 0.008551
0044 IZUSVR1 008CDC68 14 0.008107
0044 IZUSVR1 008C8328 15 0.006540
0044 IZUSVR1 008CAC68 16 0.006612
0044 IZUSVR1 008C9E88 14 0.006634
0044 IZUSVR1 008B5C68 14 0.005668
0044 IZUSVR1 008CBBE0 28 0.015650
0044 IZUSVR1 008ADE00 17 0.005861
0044 IZUSVR1 008B9470 15 0.006014
0044 IZUSVR1 008BEA48 17 0.017092
0044 IZUSVR1 008C6CF0 20 0.010093
...
0044 IZUSVR1 008CC2D8 548 0.827491
0044 IZUSVR1 008D2E88 25 0.445230
0044 IZUSVR1 008D2510 819 0.412526
0044 IZUSVR1 008CEE88 14 0.158703
0044 IZUSVR1 008D3E88 8 0.003189
0044 IZUSVR1 008C4CF0 18 0.013237
------------
3.638295
- There were 166 TCB’s which did something in the time period.
- TCB with address 008D2510 was dispatched 819 times times in 9 seconds – using 0.4 seconds of CPU! This was being dispatched 100 times a second, and used 5 milliseconds of CPU on average per dispatch. This looks high considering the system was not doing any work.
- TCB with address 008d2E88 was dispatched 25 times in 9 seconds, and used 0.44 seconds of CPU or 17 ms of CPU per dispatch. This is doing more work per dispatch than the previous TCB.
Display lock usage
Lock events for CEDQ None found Lock events for CSMF None found Lock events for CLAT None found Lock events for CMS None found Lock events for OTHR None found
Nothing of interest here.
Display local lock usage – locking the job
Lock events for LOCL of ASID 0010 OMVS
Lock ASID Jobname TCB/WEB Type PSW Address IASN Suspended at Resumed at Suspend Time
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------
CML 0044 IZUSVR1 008C33E8 TCB 00000000_04868084 0010 16:24:49.612051 16:24:49.612076 0.000025
CML 0044 IZUSVR1 008B4938 TCB 00000000_048687E4 0010 16:24:49.612090 16:24:49.612570 0.000480
...
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------
Suspends: 6 Contention Time: 0.000821 0.008% WU Suspend Time: 0.000823
Lock events for LOCL of ASID 0044 IZUSVR1
Lock ASID Jobname TCB/WEB Type PSW Address IASN Suspended at Resumed at Suspend Time
---- ---- -------- -------- ---- ----------------- ---- --------------- --------------- ------------
LOCL 0044 IZUSVR1 008D3E88 TCB 00000000_010CCD62 0044 16:24:46.404417 16:24:46.404561 0.000144
LOCL 0044 IZUSVR1 008D3E88 TCB 00000000_010ADA78 0044 16:24:46.410188 16:24:46.412182 0.001993
Suspends: 83 Contention Time: 0.042103 0.436% WU Suspend Time: 0.079177
The LOCal Lock (LOCL) is the MVS lock used to serialise on the address space, for example updating some MVS control blocks. For example if MVS wants to cancel an address space, it has to get the Local lock, to make sure that critical work completes.
- For the OMVS addess space, address space IZUZVSR1 got the lock 6 times, and was delayed for 0.823 milliseconds waiting for the local lock.
- For the IZUSVR1 address space, 83 TCBs got the local lock, and were suspended for a total of 79 milliseconds.
Display timer events (CPU Clock comparator CLKC and timer TIMR)
ASID: 0044 Jobname: IZUSVR1 SRB/TCB IASN Interrupt PSW Count Where processing -------- ---- -------- -------- -------- ---------------------------------------- 00000000 0044 070C0000 81D83CB8 2 IEANUC01.ASAXMPSR+00 READ/WRITE NUCLEUS ... 008CEE88 0044 078D0401 945F2B28 11 AREA(Subpool252Key00)+CB28 EXTENDED PRIVATE 008CEE88 0044 078D2401 945F99BE 1 AREA(Subpool252Key00)+0139BE EXTENDED PRIVATE 008CEE88 0044 078D2401 FA63F71E 1 SPECIALNAME+03F71E EXTENDED PRIVATE 008CEE88 0044 078D0401 FAB00178 1 SPECIALNAME+0178 EXTENDED PRIVATE 008CEE88 0044 078D2401 FAB447BE 1 SPECIALNAME+0447BE EXTENDED PRIVATE 008CEE88 0044 078D0401 FAD9E660 1 SPECIALNAME+29E660 EXTENDED PRIVATE 008B17B8 0044 070C0000 81C92030 1 IEANUC01.IAXVP+4048 READ ONLY NUCLEUS 008B27C0 0044 072C1001 91AF2460 1 BBGZAFSM+7520 EXTENDED CSA ... 008D2E88 0044 078D0401 945F2B28 22 AREA(Subpool252Key00)+CB28 EXTENDED PRIVATE 008D2E88 0044 078D0401 FB036F08 1 SPECIALNAME+036F08 EXTENDED PRIVATE ... 008D2E88 0044 078D0401 FC145732 1 AREA(Subpool229Key00)+A732 EXTENDED PRIVATE
This displays
- The TCB
- The virtual adress where the interrupt occurred
- These entries are in time sequence, and so we can see the second entry had 11 interupts in quick succession (count is 11).
- The Where processing, is a guess at converting the address into a module name. Sometimes it works, for example Module IEANUC01, csect ASAXMPSR, offset 0. Sometimes it cannot tell, for example from Java code.
This shows 2 things
- The application said wake me up in a certain time period
- The TCB was executing and z/OS interrupted it because it needed to go and dispatch some other work. This gives a clue as to hot spots in the code. If the same address occurs many times – the code was executing here many times. I look in the raw systrace to see if this is a TIMer (boring) or a CLKC interesting. Interesting gives you a profile of what code is CPU intensive.
- You can delete all the records outside of this block, then sort 15 32 to sort on PSW address. For my IPCS dump the address 078D0401 945F2B28 occurred 35 times.
I/O activity
Device SSCH Issued I/O Occurred Duration
------ --------------- --------------- ------------
0A99 16:24:48.009819 16:24:48.010064 0.000244
0A99 16:24:48.033619 16:24:48.033856 0.000236
0A99 16:24:48.051014 16:24:48.051081 0.000066
0A99 16:24:48.057377 16:24:48.057434 0.000056
0A99 16:24:48.080331 16:24:48.080430 0.000098
------------
0.000702
Events for 0A99 : 5
Quickest I/O : 0.000056
Slowest I/O : 0.000244
Total : 0.000702
Average : 0.000140
This says for the device 0A99 there were 5 I/O requests, total time 0.7 milliseconds
- I used the REPORT VIEW to get the data into ISPF edit,
- deleted all the records above the I/O section
- Used X ALL
- F TOTAL ALL
- This shows the totals for all I/Os. Most totals were under 1 ms. One I/O was over 5 seconds.
- Displaying the detailed records above this TOTAL record showed one I/O took over 5 seconds!
End of report
End of PERFDATA analysis.
Advanced topic: Look at hot spots
I had seen that PSW 078D0401 945F2B28 was hot. If you go back to the IPCS command panel, you may be able to use the command
L 945F2B28
To display the storage. This will not work. You have to remove the top bit (80), so
L 145F2B28
may work.
If the first character is a letter (A-F) then you need to put a 0 on the front for example
L 0D2345678
You might need to put the address space in as well for example
L 145F2B28 ASID ID(x’44’)
You can say
L 145F2B28 ASID ID(x’44’) LENGTH(4096)
To display large sections of storage
Dig into the trace
You can use
systrace jobname(izusvr1)tcb(x’008CC2D8′)
to display all entries for the specified TCB and jobname.
- Go to the bottom ( type Max press PF8)
- use the report view command to get into and edit session
- Columns 79 to 88 contain a description of some of the system calls
- use X ALL;f ‘ ‘ 79 84 all;del all nx to delete all lines without a description
This gives you a picture of the MVS services being used.