Using IPCS to look at z/OS dumps

I’ve been trying to find a big, deep problem on z/OS, and I’ve been getting plenty of dumps.

I thought I’d give a short list of commands for people new to IPCS. It is not detailed, but it should at least give you signposts as to what you can do.

First get your dump.

On my system dumps are in data sets like SYS1.S0W1.Z31B.DMP00001.

You can issue the operator command D D,E to give information about the dumps on your system. It provides information on where the problem occurred, and registers etc. This may be enough to help you solve your problem.

You can also print the EREP report. Again, this gives a summary of the problem, and may be enough to solve your problem. See Logrec and example output.

Using ISPF editor on the output

On many of the displays you can use the command

REPORT VIEW

to capture what is currently displayed, and display it in an ISPF view session, so you can use ISPF edit commands on it.

If you display the first 24 lines of output, and use REPORT VIEW, you will get ISPF edit with the 24 lines of output. I tend to display the report, scroll down to the bottom of the report (so it is all displayed), and then use REPORT VIEW on the complete report.

Into IPCS

From the main ISPF IPCS panel, select option 0 defaults.

Tab down to

Source  ==> DSNAME('SYS1.S0W1.Z31B.DMP00001')

If you want IPCS to remember this next time you come in, change

Scope   ==> LOCAL   (LOCAL, GLOBAL, or BOTH)

Change local to both, and press enter.

Type =6

to issue an IPCS command.

If this is the first time you have used this dump, type

DROPD

This says drop all information you know about the current dump. This is because last week you may have had a different dump called SYS1.S0W1.Z31B.DMP00001. Any information stored about that dump does not apply to the current dump.

PF3 back to the command window.

status

This gives you a summary of the contents of the dump.

Dump Title: COMPON=BPX,COMPID=SCPX1,ISSUER=BPXMIPCE,MODULE=BPXVOTHD+00000244, ABEND=S00C4,REASON=00000004       

Use

find  'DIAGNOSTIC DATA REPORT'

This gives you information on

  • The symptom string. If you search on the internet with this string you may find other people have hit the same problem
  • Time of Error Information
    • The PSW pointing to (past) the failing instruction
    • Failing instruction text: 12 bytes of data around the failure
    • Translation exception address: If there was a problem with accessing the data
    • The registers and access registers
    • The Home, Primary and Secondary address space IDs (for cases where your program switches to a different address space).
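Because the PSW usually points past the failing instruction, you can back up by the instruction length to find the instruction itself. This is a minimal Python sketch of that arithmetic, using the PSW address and INSTRUCTION LENGTH from the logrec record shown later in this post. The function name is my own, and it assumes the PSW really was advanced past the instruction, which is not true for every type of program interrupt.

```python
def failing_instruction_address(psw_address: int, instruction_length: int) -> int:
    """Back up from the PSW address by the reported instruction length.

    Assumption: the PSW points just past the failing instruction.
    """
    return psw_address - instruction_length


# PSW: ... 20E1F2F4 and INSTRUCTION LENGTH: 04, taken from the logrec record
address = failing_instruction_address(0x20E1F2F4, 4)
print(f"{address:08X}")
```

You can then feed the resulting address to ip where or ip l ... I as described below.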

Commands

Find which module an address is in (current ASID): IP WHERE 20E1F344

For the given address, IPCS displays information about which module it is in.

ASID(X'0010') 20E1F344. BPXINPVT+618344 IN EXTENDED PRIVATE       
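If you have captured this output with REPORT VIEW and want to process it outside IPCS, the line is easy to pick apart. The sketch below is only an illustration: the regular expression is my own guess at the layout, based solely on the one line above, and the module start is simply the address minus the offset.

```python
import re

# Layout assumed from the single WHERE line shown above.
WHERE_PATTERN = re.compile(
    r"ASID\(X'(?P<asid>[0-9A-F]+)'\)\s+"
    r"(?P<address>[0-9A-F_]+)\.\s+"
    r"(?P<module>\w+)\+(?P<offset>[0-9A-F]+)\s+IN\s+(?P<area>.+)"
)


def parse_where(line: str) -> dict:
    """Split a WHERE result into its parts and derive the module start address."""
    match = WHERE_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like WHERE output")
    address = int(match["address"].replace("_", ""), 16)
    offset = int(match["offset"], 16)
    return {
        "asid": int(match["asid"], 16),
        "module": match["module"],
        "offset": offset,
        "area": match["area"].strip(),
        "module_start": address - offset,
    }


result = parse_where("ASID(X'0010') 20E1F344. BPXINPVT+618344 IN EXTENDED PRIVATE")
print(result["module"], f"{result['module_start']:08X}")
```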

Find which module an address is in (different ASID): ip where 20E1F344 asid(x'59')

If your program is using multiple address spaces, you can specify the address space id.

Display 64 bit data: ip l 050_814F7F30

This displays the storage at the address in standard dump format

Display storage key of the page: ip l 050_814F7F30 display(machine)

Also displays

 ASID(X'0010') ADDRESS(50_814F7F30.) KEY(00) ABSOLUTE(02_03C9AF30.)     

so you can see the key of the page.

Display the offset of the storage within the display: ip l 050_814F7F30 str

The str says display it as a structure – it gives the hex offsets of each line at the start of the line

+002B0 _14F81E0. 00000000 00000000 ...
+002D0 _14F8200. 00000000 00000000 ...

Display the instructions from the page: ip l 20E1F344 I

LIST 20E1F344. ASID(X'0010') LENGTH(X'1000') INSTRUCTION       
ASID(X'0010') ADDRESS(20E1F344.) KEY(00) ABSOLUTE(F9ECA344.)

20E1F354 | D28F 700C 2008 | MVC X'C'(X'90',R7),X'8'(R2)
20E1F35A | 9140 208C | TM X'8C'(R2),X'40'
20E1F35E | A774 0065 | BRC X'7',*+X'CA'
20E1F362 | 9180 208C | TM X'8C'(R2),X'80'
20E1F366 | A7E4 0008 | BRC X'E',*+X'10'
20E1F36A | A798 0004 | LHI R9,X'4'
20E1F36E | 5090 D2D0 | ST R9,X'2D0'(,R13)
20E1F372 | A7F4 0006 | BRC X'F',*+X'C'
20E1F376 | A728 0003 | LHI R2,X'3'
20E1F37A | 5020 D2D0 | ST R2,X'2D0'(,R13)
20E1F37E | 9602 D2D4 | OI X'2D4'(R13),X'02'
20E1F382 | 5898 0000 | L R9,X'0'(R8)
20E1F386 | 41E0 D2D0 | LA R14,X'2D0'(,R13)
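The BRC instructions are shown with a relative target such as *+X'CA'. If you want the absolute address to use with another ip l command, the arithmetic is small enough to sketch in Python. The second halfword of the instruction is a signed count of halfwords, so the target is the instruction address plus twice that value. The function name is my own.

```python
def relative_branch_target(instruction_address: int, halfword_offset: int) -> int:
    """Target of a relative branch such as BRC.

    halfword_offset is the 16-bit field from the instruction (for example
    0x0065 in A774 0065); it is a signed count of halfwords.
    """
    if halfword_offset & 0x8000:          # negative offset: branch backwards
        halfword_offset -= 0x10000
    return instruction_address + 2 * halfword_offset


# 20E1F35E | A774 0065 | BRC X'7',*+X'CA'
print(f"{relative_branch_target(0x20E1F35E, 0x0065):08X}")
# 20E1F366 | A7E4 0008 | BRC X'E',*+X'10'
print(f"{relative_branch_target(0x20E1F366, 0x0008):08X}")
```

The second example gives 20E1F376, which is the LHI R2,X'3' instruction further down the same listing.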

Display storage for a given length: ip l 20E1F344 length(20)

I use

IP SETDEF LENGTH(4096) 

to display 4KB of data each time.

Set defaults: IP SETDEF

The IP SETDEF command (or ISPF option 0) gives

....
/*---------------- Local Default Values for IPCS Subcommands ---------------*/
SETDEF LOCAL NOPRINT TERMINAL NOPDS /* Routing of displays */
SETDEF LOCAL FLAG(WARNING) /* Optional diagnostic messages */
SETDEF LOCAL NOCONFIRM /* Double-checking major acts */
SETDEF LOCAL NOTEST /* IPCS application testing */
SETDEF LOCAL DSNAME('SYS1.S0W1.Z31B.DMP00001')
SETDEF LOCAL LENGTH(4096) /* Default data length */
SETDEF LOCAL VERIFY /* Optional dumping of data */
SETDEF LOCAL DISPLAY( MACHINE) /* Include storage keys, .... */
SETDEF LOCAL DISPLAY( REMARK) /* Include remark text */
SETDEF LOCAL DISPLAY( REQUEST) /* Include model LIST subcommand */
SETDEF LOCAL DISPLAY(NOSTORAGE) /* Include contents of storage */
SETDEF LOCAL DISPLAY( SYMBOL) /* Include associated symbol */
SETDEF LOCAL DISPLAY( ALIGN) /* Align output to byte */
SETDEF LOCAL ASID(X'0010') /* Default address space */

If you are using multiple address spaces, you can set the default address space so you do not have to specify ASID(x'..') every time.

Browsing interesting storage

You can use option 1 from the main IPCS panel, to browse data, and keep track of your interesting data.

You enter the address, (and address space if required) and can browse the data.

This pointer data is stored in IPCS. If you give it a remark, it will persist across sessions, and so help you later.

Once you have defined at least one pointer, you can get a list of all of the pointers you have defined. Use the S prefix command to select it.

Finding data in the dump

You can search for data in the dump, and you can specify a range to be searched. The symbol X (the current address, which a successful find sets to the found location) is useful for repeating a search.

find x'019EDC90' range(0:7fffffff)
ip l x str position(-512) length(4096)

find x'019EDC90' range(x+1:7fffffff)
ip l x str position(-512) length(4096)
  • The first find defines the range.
  • ip l x str position(-512) length(4096) says display 4096 bytes, starting 512 bytes before the found location, so you see the storage on both sides of the data
  • The second find – says search from just past the previous location
  • ip l x str position(-512) length(4096) as above
  • I use PF12 as recall, so it is easy to display the storage, pf3, enter, pf12 etc

The output is like

LIST 8FF91C. ASID(X'0010') POSITION(X'-0200') LENGTH(X'1000') STRUCTURE                                                       
ASID(X'0010') ADDRESS(8FF71C.) KEY(00) ABSOLUTE(FA37771C.)
-001FC 008FF720. 00000000 018E0AD0 .... |.... ...
...
-0001C 008FF900. 001ED022 00000000 ... |..}.. ...
+00004 008FF920. 019EE13C 02F72A00 ... |........
+00024 008FF940. 00000000 00000001 ... |..... ...

System trace

This is where it gets harder.

The command

SYSTRACE

displays information from the system trace. Some of the key fields are

0002-0010 008DC518  SVC      2 00000000_20BC9420 
07041001 80000000
0002-0010 008DC518 SVCR 2 00000000_20BC9420
07041001 80000000
  • 0002 is the CPU number (which I very rarely use)
  • 0010 is the asid
  • 008DC518 is the TCB address
  • SVC/SVCR 2 is the supervisor call/supervisor call return
  • 00000000_20BC9420 is the PSW address where the SVC occurred.
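If you copy trace lines out of an ISPF view session, the fields listed above can be split apart with a few lines of Python. This is a rough sketch that assumes the simple blank-separated layout of the SVC and SVCR lines shown here. Other entry types have different columns and would need their own handling. The function and key names are my own.

```python
def parse_trace_entry(line: str) -> dict:
    """Split an SVC/SVCR system trace line into the fields described above."""
    cpu_asid, tcb, entry_type, code, psw_address = line.split()[:5]
    cpu, asid = cpu_asid.split("-")
    return {
        "cpu": cpu,
        "asid": asid,
        "tcb": tcb,
        "type": entry_type,
        "code": code,
        "psw_address": psw_address,
    }


entry = parse_trace_entry("0002-0010 008DC518  SVC      2 00000000_20BC9420")
print(entry["asid"], entry["tcb"], entry["type"], entry["code"])
```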

On the right of the display is a string like

E29F45B336755380 

This is a store clock (STCK) value. If you use the option SYSTRACE TIME(GMT) it is displayed as

06:39:12.803669218 
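If you only have the hex value (for example in a report you saved earlier), you can convert it yourself. A store clock value counts from 1900-01-01, and bit 51 represents one microsecond, so shifting the 64-bit value right by 12 bits gives microseconds. This is a minimal Python sketch. The function name is my own, it drops the sub-microsecond bits, and it ignores any leap second adjustment your system may apply.

```python
from datetime import datetime, timedelta, timezone

TOD_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def stck_to_datetime(stck_hex: str) -> datetime:
    """Convert a 16-digit store clock value to a UTC datetime."""
    value = int(stck_hex, 16)
    microseconds = value >> 12        # bit 51 of the clock is one microsecond
    return TOD_EPOCH + timedelta(microseconds=microseconds)


print(stck_to_datetime("E29F45B336755380").time())
```

For the value above this gives 06:39:12.803669, which matches the SYSTRACE TIME(GMT) display to the microsecond.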

What is interesting?

I use REPORT VIEW to get an ISPF edit session of the trace, then

  • X ALL
  • F '*' 20 20 all

This gives information such as I/Os completing, but also

  • *RCVY recovery code was entered
  • *SVC D a dump request was made
  • *PGM… an exception occurred – such as invalid storage access.

I then label each line of interest, such as putting .aa in the field on the left hand side.

  • reset
  • loc .aa

These redisplay all of the lines and locate the labelled line of interest. Note that lines of interest will normally be near the bottom of the output, just before the dump was taken.

The system carries on processing while the dump is being taken, so you will continue to get records in the system trace after the point of failure or the request for a dump.

You can use standard ISPF edit commands, such as sort, exclude, delete all X, delete all NX; and use ISPF edit macros (written in Rexx) for special processing.

Note: You cannot use IPCS commands from an ISPF edit session to display information from the dump. You have to come out of ISPF edit.

Note: Some PGM exceptions are valid – such as page not in storage – and z/OS will read it into memory. These do not have the *; they are just PGM, and are usually not interesting.
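If you save the trace to a file, the X ALL and F '*' 20 20 ALL filter can be approximated outside ISPF. The Python sketch below keeps only the lines with an asterisk in column 20. The sample lines are hypothetical and padded by hand to put the entry type in column 20; they are not real trace output.

```python
def starred_entries(lines, column=20):
    """Return (line_number, text) for lines with '*' in the given 1-based column."""
    index = column - 1
    return [
        (number, text.rstrip())
        for number, text in enumerate(lines, start=1)
        if len(text) > index and text[index] == "*"
    ]


# Hypothetical lines: the 19-character prefix puts the next character in column 20.
prefix = "0002-0010 008DC518 "
sample = [
    prefix + " SVC      2",
    prefix + "*RCVY",
    prefix + " PGM    011",
    prefix + "*PGM    004",
]
for number, text in starred_entries(sample):
    print(number, text)
```

Keeping the original line numbers makes it easy to go back to the full trace and look at what happened just before each starred entry.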

Program calls (PC) and SVCs

A program call has output like

 PC     ...   0            00_01A5184A     0030A           LocAscb 
PR ... 0 00_01A5184A 0142923C
  • For every PC there is a matching return (PR) with the same address (00_01A5184A).
  • The PC number is 0030A and, because this is a system-defined value, the trace can show that it is the LOCASCB request (LocAscb).

Other subsystems, such as CICS, MQ and DB2, also use PC routines, but their PC numbers can differ between IPLs, and you can have more than one CICS, MQ or DB2 subsystem on an LPAR.

Other requests in the SYSTRACE

  • DSP this TCB was dispatched
  • SSRV 11E This TCB was paused (suspended because there was no work for it to do)
  • EXT CLKC the system timer interrupted this thread

What is a PC doing?

In my trace I have

 PC     ...   0            00_20DDF46C     0C101 

What is 0c101 doing?

Issue the commands

  • SUMMARY FORMAT
    • This formats most of the significant control blocks.
  • Go to the bottom and use REPORT VIEW
  • FIND '0c101' 1 10

This gives output like

                              **PC INFORMATION** 

AUTH
PC KEY EXEC ENTRY EXEC LATENT
NUMBER MASK ASID ADDRESS STATE PARMS
-------- ---- ---- ------- ----- ------------------ ...
0000C100 8000 0059 20811470 S 00000000 00000000 ...


0000C101 8000 0059 20811D68 S 00000000 00000000 ...

PC number C101 goes to the address space with ASID 59, at address 20811D68.

You can then use the commands

ip   where  20811D68  asid(x'59')
ip l 20811D68 asid(x'59') i

to display which load module the code is in, and the instructions which will be executed.
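If you have many PC numbers to look up, you can generate these two commands from a row of the PC INFORMATION table. This is only a sketch: it assumes the first four blank-separated columns are the PC number, key mask, ASID and entry address, as in the table above, and the function name is my own.

```python
def commands_for_pc_row(row: str):
    """Build the ip where / ip l commands for one row of the PC table."""
    pc_number, _key_mask, asid, entry_address = row.split()[:4]
    asid_hex = asid.lstrip("0") or "0"
    return [
        f"ip where {entry_address} asid(x'{asid_hex}')",
        f"ip l {entry_address} asid(x'{asid_hex}') i",
    ]


row = "0000C101 8000 0059 20811D68 S 00000000 00000000"
for command in commands_for_pc_row(row):
    print(command)
```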

Where’s my dump?

I had been working on a bug and getting out a System Dump (SDUMP). Then, after a fix, I stopped getting dumps, and just got a message with a return code indicating an abend had occurred. So where was my dump?

z/OS has Dump Analysis and Elimination (DAE). This keeps a note of the dumps being taken, and can be configured to say “If you get the same abend many times, don’t bother taking a dump”. This stops you getting many identical dumps, and filling your disk storage.

In IPCS option 3 UTILITY – Perform utility functions, 5 DAE – Process DAE data, you can display the contents of the DAE information. On my system the data set is SYS1.DAE.

Command ===>                                                  Scroll ===> PAGE
Enter an Action Code next to an entry.
Enter / next to an entry to choose from a list of Action Codes.

Dataset: 'SYS1.DAE'
Dumps since last DAE Display: 0 Total Dumps suppressed: 214
Events since last DAE Display: 0 Suppression rate: 81%

A Last Last Total Date of Symptom String information:
C Date System Events Dump Abend Reason Module CSECT
_ 05/01/26 S0W1 7 04/28/26 S0EC6 055B0718 BPXINPVT BPXFSMNT
_ 05/01/26 S0W1 66 04/28/26 S00C4 00000004 BPXINPVT BPXVOTHD
_ 05/01/26 S0W1 10 04/28/26 S00C4 00000004 BPXINPVT BPXVOTHD
s 05/01/26 S0W1 53 04/28/26 S00C4 00000004 BPXINPVT BPXVOTHD

Selecting a record with S gave

                           Date      Time       System Name                    
Last (most recent) Event: 05/01/26 19:02:16 S0W1
Dump Taken: 04/28/26 18:44:59 S0W1

Symptoms used for Dump Suppression:
MVS RETAIN
Key Key Symptom Data Explanation
MOD/ RIDS/ BPXINPVT LOAD MODULE NAME
CSECT/ RIDS/ BPXVOTHD ASSEMBLY MODULE CSECT NAME
PIDS/ PIDS/ 5752SCPX1 PRODUCT/COMPONENT IDENTIFIER
AB/S AB/S 00C4 ABEND CODE-SYSTEM
REXN/ RIDS/ BPXMIPCE RECOVERY ROUTINE CSECT NAME
FI/ VALU/H 7542A0704742A0708BB917 FAILING INSTRUCTION AREA
REGS/ REGS/ C07DC REG/PSW DIFFERENCE
HRC1/ PRCS/ 00000004 ABEND REASON CODE
SUB1/ VALU/C OPENMVS COMPONENT SUBFUNCTION

This information is called a Symptom String. It provides a very short summary of the problem. You can search the internet with this information, to see if the problem has been found before.

For this symptom string, there have been 66 occurrences on my system (see the first display).

DAE can be configured to say that if you get the same symptom string, do not take a dump, as it is most probably the same problem. This means your disks do not fill up with identical system dumps.

Similar dumps

In the DAE display

Command ===>                                                  Scroll ===> PAGE
Enter an Action Code next to an entry.
Enter / next to an entry to choose from a list of Action Codes.

Dataset: 'SYS1.DAE'
Dumps since last DAE Display: 0 Total Dumps suppressed: 214
Events since last DAE Display: 0 Suppression rate: 81%

A Last Last Total Date of Symptom String information:
C Date System Events Dump Abend Reason Module CSECT
_ 05/01/26 S0W1 7 04/28/26 S0EC6 055B0718 BPXINPVT BPXFSMNT
_ 05/01/26 S0W1 66 04/28/26 S00C4 00000004 BPXINPVT BPXVOTHD
_ 05/01/26 S0W1 10 04/28/26 S00C4 00000004 BPXINPVT BPXVOTHD
_ 05/01/26 S0W1 53 04/28/26 S00C4 00000004 BPXINPVT BPXVOTHD

there is a record for S0EC6 055B0718, and several records for S00C4 00000004. The detailed symptom strings for these S00C4 abends are similar but not identical. Each abend occurred at a different place, so its symptom string is slightly different.

Configuring DAE

You configure DAE through the parmlib members ADYSETxx.

On my system member ADYSET01 stops DAE, so I get a dump for every abend.

In member ADYSET00 I have

 SVCDUMP(MATCH,SUPPRESSALL,UPDATE,NOTIFY(3,30)),   

Where

  • MATCH: Specifies that DAE is to compare the symptoms from the current memory dump to those that have already been recorded in the DAE data set. (Coding MATCH does not indicate that DAE suppresses duplicate memory dumps or updates the DAE data set.)
  • SUPPRESSALL: Specifies that duplicate memory dumps are to be suppressed when all criteria for matching and suppressing memory dumps are met except for the VRADAE key.
  • UPDATE: Specifies that the DAE data set is to be updated with the results of matching.
  • NOTIFY: Sends an internal signal to any listening system applications that dumps have occurred. The default is three memory dumps in 30 minutes for a particular symptom string.
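To make the NOTIFY(3,30) threshold concrete, here is a small Python sketch of the rule as I have described it: three dumps for the same symptom string within 30 minutes. It only illustrates the arithmetic. It is not how DAE is implemented, and the function name and sample times are made up.

```python
from datetime import datetime, timedelta


def threshold_reached(dump_times, count=3, minutes=30):
    """True if any `count` dumps fall within a `minutes`-long window."""
    times = sorted(dump_times)
    window = timedelta(minutes=minutes)
    return any(
        times[i + count - 1] - times[i] <= window
        for i in range(len(times) - count + 1)
    )


start = datetime(2026, 5, 1, 18, 0)                      # made-up times
burst = [start + timedelta(minutes=m) for m in (0, 10, 25)]
spread = [start + timedelta(minutes=m) for m in (0, 20, 45)]
print(threshold_reached(burst), threshold_reached(spread))
```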

How am I meant to know what’s happened?

As well as DAE processing, a record is written to LOGREC, which you format with EREP. LOGREC is a repository for hardware errors and software errors.

For example, formatting the data set gave me

TYPE:  SOFTWARE RECORD      REPORT:  SOFTWARE EDIT REPORT           DAY.YEAR      
(PROGRAM INTERRUPT) REPORT DATE: 121.26
...
SEARCH ARGUMENT ABSTRACT
PIDS/5752SCPX1 RIDS/BPXINPVT#L RIDS/BPXVOTHD AB/S00C4 PRCS/00000004 REGS/C082C
RIDS/BPXMIPCE#R

SYMPTOM DESCRIPTION
------- -----------
PIDS/5752SCPX1 PROGRAM ID: 5752SCPX1
RIDS/BPXINPVT#L LOAD MODULE NAME: BPXINPVT
RIDS/BPXVOTHD CSECT NAME: BPXVOTHD
AB/S00C4 SYSTEM ABEND CODE: 00C4
PRCS/00000004 ABEND REASON CODE: 00000004
REGS/C082C REGISTER/PSW DIFFERENCE FOR R0C:-082C
RIDS/BPXMIPCE#R RECOVERY ROUTINE CSECT NAME: BPXMIPCE
...
TIME OF ERROR INFORMATION
PSW: 07047001 80000000 00000000 20E1F2F4
INSTRUCTION LENGTH: 04 INTERRUPT CODE: 0004
FAILING INSTRUCTION TEXT: 00175045 00005049 01B0E368
TRANSLATION EXCEPTION ADDRESS: 00000000_01B6A404
BREAKING EVENT ADDRESS: 00000000_20E1F280

AR/GR 0-1 FFF00001/00000000_00000097 00000000/00000051_01B6AC90
AR/GR 2-3 00000001/00000000_0239A000 00000000/00000051_02FCFA00
...

HOME ASID: 0010 PRIMARY ASID: 0010 SECONDARY ASID: 0056
PKM: 8040 AX: 0001 EAX: 0000

RECOVERY ROUTINE ACTION
THE RECOVERY ROUTINE RETRIED TO ADDRESS 20E1F7DA.
THE REQUESTED SVC DUMP WAS NOT TAKEN. THE DUMP WAS SUPPRESSED BY DAE.
NO LOCKS WERE REQUESTED TO BE FREED.
THE SDWA WAS REQUESTED TO BE FREED BEFORE RETRY.

The information is a superset of the information in DAE. You get the registers at the point of failure, and what recovery action was taken. (Programs can ignore some abends, or pass the decision to a program higher up the stack).

In this case I can see there was no dump taken because of DAE.

The information in the logrec record was enough for me to debug the program. I did not need an SDUMP.
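When two abends look the same, comparing their symptom strings shows quickly where they differ. The Python sketch below splits a search argument into key/value pairs and reports the pairs that are not common to both. The first string is the SEARCH ARGUMENT ABSTRACT from the logrec record above. The second is one I built by hand, swapping in the REGS value (C07DC) from the DAE display earlier, so treat it as an illustration rather than real output. The function names are my own.

```python
def parse_symptoms(search_argument: str):
    """Split 'KEY/value KEY/value ...' into (key, value) pairs."""
    pairs = []
    for token in search_argument.split():
        key, _, value = token.partition("/")
        pairs.append((key, value))
    return pairs


def differing_symptoms(first: str, second: str):
    """Return the (key, value) pairs that appear in only one of the strings."""
    return sorted(set(parse_symptoms(first)) ^ set(parse_symptoms(second)))


logrec = ("PIDS/5752SCPX1 RIDS/BPXINPVT#L RIDS/BPXVOTHD AB/S00C4 "
          "PRCS/00000004 REGS/C082C RIDS/BPXMIPCE#R")
# Hand-built for comparison: same string with the REGS value from the DAE display
dae = logrec.replace("REGS/C082C", "REGS/C07DC")

print(differing_symptoms(logrec, dae))
```

Here only the REGS entry differs, which is consistent with the abend happening at a different place in the same CSECT.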

Processing LOGREC

Logrec is a z/OS dataset which records information about events, such as hardware problems, and software abends. Information is written to the dataset even though the information may have been suppressed elsewhere.

JCL to print it

//IBMPEREP JOB (ACCT),'PRINT LOGREC',CLASS=A,MSGCLASS=H 
//STEP EXEC PGM=IFCEREP1,PARM='CARD'
//SERLOG DD DISP=SHR,DSN=SYS1.S0W1.LOGREC
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,10,,CONTIG)
//EREPPT DD SYSOUT=A,DCB=BLKSIZE=133
//TOURIST DD SYSOUT=A,DCB=BLKSIZE=133
//ZERLOG DD SYSOUT=A,DCB=BLKSIZE=133
//SYSIN DD *
PRINT=PS
ACC=N
ZERO=N
ENDPARM
//

To print and clear specify ZERO=Y.

The command syntax is given here.

Example output

IPL record

IPL RECORD EDIT AND PRINTING SECTION                                                        
DAY YEAR HH MM SS TH
DATE -122 26 TIME -06 05 01 52
MODEL - 1090 CPU SERIAL NO. - 011238
MVS/ESA V7 R3

IPL REASON CODE - DF DEFAULT -U-
SUBSYSTEM ID - 00 SUBSYSTEM NAME - NULL
HIGHEST STORAGE ADDRESS 7FFFFFFF

LAST ACTIVITY INFORMATION :
DAY YEAR HH MM SS TH
DATE -121 26 TIME -19 14 13 68
END OF IPL RECORD

3270 termination

DEVICE NUMBER:  000703                DAY YEAR       JOB IDENTITY: VTAM                     

DEVICE TYPE: 3277
MODEL: 1090 HH MM SS.TH
ERROR PATH: EF-0703 CPU ID: 111238 TIME: 19 16 26.42
RECORD IS: TEMPORARY
MODE IS: 370XA

---UNIT STATUS---- SUB-CHANNEL STATUS
....

DEVICE DEPENDENT DATA

TYPE OF RECORD: CLOSEDOWN (X'20')

TERMINAL NAME: LCL703 SIO CNTR: 00000003 TEMP. ERRORS: 00

Software abend

See Where’s my dump?

How often should I clear it?

Periodically you should archive the data, so you can later do trend analysis, such as which disks are having more I/O problems than usual.

When logrec fills up, your automation can trigger a job to copy the logrec dataset, and clear it.