Not for humans but for search engine – BPX

BPXF024I

You get this message if the syslogd program is not running.

BPXP015I HFS PROGRAM /usr/lpp/zosmf/lib/libIzuCommandJni.so IS NOT MARKED PROGRAM CONTROLLED.
BPXP014I ENVIRONMENT MUST BE CONTROLLED FOR DAEMON (BPX.DAEMON) PROCESSING.

Use the command extattr /usr/lpp/zosmf/lib/libIzuCommandJni.so to check whether the Program Controlled attribute is set. Use extattr +p… to set it if required.

I had the wrong SAF_PREFIX value in USER.Z24A.PARMLIB(IZUPRMCP). SAF_PREFIX('IZUDFLT') was correct.

I had other problems like invalid password when I logged onto the web browser.

Fix the problem and regenerate.

BPXO042I with D OMVS,PFS

I was expecting D OMVS,PFS or D OMVS,P to give me BPXO068I and a list of Physical File Systems.

It gave BPXO042I instead, because the command failed.

This was due to having an HFS definition in my z/OS 3.1 system. HFS is not supported on 3.1. I removed the definition and it worked.

BPX1SOC TTLS_INIT_CONNECTION rv -1 rc ECONNRESET(1121) rs 2007593789 (0x77a9733d) 77a9733d EDC8121I Connection reset

The command bpxmtext 77a9733d gives

TCPIP
JrTtlsHandshakeFailed: AT-TLS was unable to successfully negotiate a secure
TCP connection with the remote end.
Action: Review message EZD1286I for more information about the error.

The syslog had

EZD1287I TTLS Error RC: 403 Initial Handshake

where 403 means that the required certificate was not received from the communication partner.

The Wireshark output had a Certificate flow from the client to the server. This had no certificate in it.

The reason for this was:

  • the client had an RSA certificate
  • the Signature Hash Algorithms sent from the server did not include RSA.

The client was thus unable to send a certificate matching the Signature Hash Algorithms.

If I specified RSA-only signature pairs, I could only use an RSA certificate. An Elliptic Curve certificate (ECDSA) then gave the same message and error code.
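The check the client makes can be sketched roughly as below. This is a simplified illustration, not the AT-TLS implementation: the structure and function names are made up for the example, and the only real-world values used are the TLS 1.2 SignatureAlgorithm numbers (1 for RSA, 3 for ECDSA). A real TLS stack checks more than the key type.

```c
#include <stddef.h>

/* TLS 1.2 SignatureAlgorithm values (rsa = 1, dsa = 2, ecdsa = 3) */
enum sig_alg { SIG_RSA = 1, SIG_DSA = 2, SIG_ECDSA = 3 };

/* One signature/hash pair as offered by the server (hypothetical layout) */
struct sig_hash_pair {
    unsigned char hash;  /* for example 4 = SHA-256, 5 = SHA-384 */
    unsigned char sig;   /* one of the sig_alg values */
};

/* Return 1 if any offered pair uses the certificate's key type, else 0.
   With 0 the client has nothing suitable and sends an empty Certificate. */
int client_can_send_cert(enum sig_alg cert_key,
                         const struct sig_hash_pair *offered, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (offered[i].sig == (unsigned char)cert_key)
            return 1;
    }
    return 0;
}
```

With a server list containing only ECDSA pairs, an RSA certificate gives 0, which matches the empty Certificate flow seen in Wireshark.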

BPX1BND rv -1 rc EADDRINUSE(1115) rs 1951167047 (0x744c7247) EDC8115I Address already in use.

Because a program may not know that the “FIN” (end of conversation) has got to the other end, a socket enters a TIMEWAIT state. The IBM documentation says

If the server cannot wait for one to four minutes, you can use the setsockopt() call in the server to specify SO_REUSEADDR before it issues the bind() call. In that case, the server will be able to bind its socket to the same port number it was using before, even if the TIMEWAIT period has not elapsed. However, the TCP protocol layer still prevents it from establishing a connection to the same partner socket address. As clients normally initiate connections and clients use ephemeral port numbers, the likelihood of this is low.
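A minimal sketch of what the documentation describes, using the standard C sockets API rather than the BPX1 callable services. The function name is made up for the example, and binding to the loopback address is only an assumption to keep the example self-contained.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a TCP listening socket with SO_REUSEADDR set before bind().
   Returns the socket descriptor, or -1 on failure. */
int listen_with_reuseaddr(unsigned short port)
{
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0)
        return -1;

    /* Must be set before bind(), otherwise it has no effect on the bind */
    int on = 1;
    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(sd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* assumption: local only */
    addr.sin_port = htons(port);

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sd, 5) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}
```

The important point is the order: socket(), then setsockopt(), then bind().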

BPX1SND rv -1 rc EOPNOTSUPP(1112) rs 1977578120 (0x75df7288) EDC8112I Operation not supported on socket.

I got this trying to issue bpx1snd() when there was data in the receive buffer. I used bpx1rcv to read the data, and the problem went away.

I peeked at the data before getting it, so I knew the length of the data to get, and so avoided waiting for data.

char buf[4000];
int lbuff = sizeof(buf);
int alet = 0;
int flags = MSG_PEEK;   // look at the data without removing it
BPX1RCV( &sd,    // socket descriptor
         &lbuff, // size of the buffer
         &buf,
         &alet,
         &flags,
         &rv,    // -1 or number of bytes
         &rc,
         &rs);

printf("BPX1RCV Peek bytes %d data...\n", rv);

if (rv > 0)  // only read if the peek found some data
{
  lbuff = rv;  // the number of bytes in the buffer
  flags = 0;   // this time remove the data
  BPX1RCV( &sd,    // socket descriptor
           &lbuff,
           &buf,
           &alet,
           &flags,
           &rv,    // -1 or number of bytes
           &rc,
           &rs);

  printf("BPX1RCV bytes %d data...\n", rv);
}
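The same peek-then-read idea can be sketched with the standard C recv() call, for platforms where the BPX1RCV callable service is not available. The function name is made up for the example; this is a sketch of the technique, not a drop-in replacement for the code above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Peek to learn how many bytes are queued, then read exactly that many.
   Returns the number of bytes read, 0 if the peer closed the connection,
   or -1 on error. */
ssize_t peek_then_read(int sd, char *buf, size_t cap)
{
    /* MSG_PEEK copies the data but leaves it in the receive buffer */
    ssize_t peeked = recv(sd, buf, cap, MSG_PEEK);
    if (peeked <= 0)
        return peeked;

    /* The data is known to be there, so this read does not wait */
    return recv(sd, buf, (size_t)peeked, 0);
}
```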

BPXF135E RETURN CODE 00000079, REASON CODE 055B005C

I got this using the command

MOUNT FILESYSTEM('COLIN.ZFS2') TYPE(ZFS) MOUNTPOINT('/u/ibmuser/temp')

Return code 79 means a value was invalid. Reason code 055B005C means already in use. Either

  • COLIN.ZFS2 is already mounted
  • there is something else mounted on /u/ibmuser/temp

You can use the D OMVS,F command to display the file systems and where they are mounted.
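The return code and reason code in these messages are in hexadecimal. The small sketch below pulls them apart: it converts the return code to the decimal errno value, and splits the reason code into its two halves, where the low half is the part quoted in these notes (for example 005C). The structure and function names are made up for the example.

```c
#include <stdio.h>

struct bpx_codes {
    unsigned int errno_dec;  /* return code as a decimal errno value */
    unsigned int qualifier;  /* high half of the reason code */
    unsigned int reason;     /* low half of the reason code */
};

/* Parse "RETURN CODE xxxxxxxx, REASON CODE yyyyyyyy" (both hexadecimal).
   Returns 0 on success, -1 if the text does not match. */
int decode_bpx_codes(const char *msg, struct bpx_codes *out)
{
    unsigned int rc, rs;
    if (sscanf(msg, "RETURN CODE %x, REASON CODE %x", &rc, &rs) != 2)
        return -1;
    out->errno_dec = rc;          /* 0x79 becomes 121 */
    out->qualifier = rs >> 16;    /* 0x055B */
    out->reason = rs & 0xFFFFu;   /* 0x005C */
    return 0;
}
```

For "RETURN CODE 00000079, REASON CODE 055B005C" this gives errno 121, qualifier 055B and reason 005C. You still pass the full eight digits to bpxmtext.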

BPXF135E RETURN CODE 00000081, REASON CODE 053B006C

This may be because the file system is mounted READ and it needs to be RDWR.

BPXMTEXT 053B006C -> JRFileNotThere: The requested file does not exist.

Problem 1

I had MOUNTPOINT('/u/ibmuser/test') (which did not exist) instead of the correct MOUNTPOINT('/u/ibmuser/temp').

Problem 2

I was trying to mount it at /my. I had to go into Unix and issue mkdir /my. Only then could I mount the file system.

BPXF137E RETURN CODE 00000079, REASON CODE 0588002E.

THE UNMOUNT FAILED FOR FILE SYSTEM …

002E is JRFilesysNotThere. Check that the file system is mounted.

BPXF137E RETURN CODE 00000072, REASON CODE 058800AA

Return code 00000072 means the resource is busy. Reason code 058800AA is JRFsParentFs: the file system has file systems mounted on it.

I was trying to unmount a ZFS file system and got the above message. It means you cannot unmount it, because other file systems are mounted on it. The z/OS console had

BPXF271I FILE SYSTEM ZFS.USERS                             
FAILED TO UNMOUNT BECAUSE IT CONTAINS MOUNTPOINT DIRECTORIES FOR
ONE OR MORE OTHER FILE SYSTEMS WHICH MUST BE UNMOUNTED FIRST,
INCLUDING FILE SYSTEM COLIN.ZFS2

I used

unmount filesystem('COLIN.ZFS2') Immediate

and got this message on the console

IOEZ00048I Detaching aggregate COLIN.ZFS2

BPXF002I FILE SYSTEM … NOT MOUNTED. RETURN CODE = 0000009D, REASON CODE = 11B900B7

This was with a Physical File System (PFS). There was a bad parameter in the OSI_THREAD block: the program named in threadParm.ot_modname (the program to execute) did not exist.

Where’s my dump?

I had been working on a bug and getting out a System Dump (SDUMP). Then, after a fix, I stopped getting dumps, and just got a message with a return code indicating an abend had occurred. So where was my dump?

z/OS has Dump Analysis and Elimination (DAE). This keeps a note of the dumps being taken, and can be configured to say “If you get the same abend many times, don’t bother taking a dump”. This stops you getting many identical dumps which fill up your disk storage.

In IPCS option 3 UTILITY – Perform utility functions, 5 DAE – Process DAE data, you can display the contents of the DAE information. On my system the data set is SYS1.DAE.

Command ===>                                                  Scroll ===> PAGE
Enter an Action Code next to an entry.
Enter / next to an entry to choose from a list of Action Codes.

Dataset: 'SYS1.DAE'
Dumps since last DAE Display: 0 Total Dumps suppressed: 214
Events since last DAE Display: 0 Suppression rate: 81%

A  Last      Last    Total   Date of   Symptom String information:
C  Date      System  Events  Dump      Abend  Reason    Module    CSECT
_  05/01/26  S0W1    7       04/28/26  S0EC6  055B0718  BPXINPVT  BPXFSMNT
_  05/01/26  S0W1    66      04/28/26  S00C4  00000004  BPXINPVT  BPXVOTHD
_  05/01/26  S0W1    10      04/28/26  S00C4  00000004  BPXINPVT  BPXVOTHD
s  05/01/26  S0W1    53      04/28/26  S00C4  00000004  BPXINPVT  BPXVOTHD

Selecting a record with S gave

                           Date      Time      System Name
Last (most recent) Event:  05/01/26  19:02:16  S0W1
Dump Taken:                04/28/26  18:44:59  S0W1

Symptoms used for Dump Suppression:
MVS     RETAIN
Key     Key     Symptom Data            Explanation
MOD/    RIDS/   BPXINPVT                LOAD MODULE NAME
CSECT/  RIDS/   BPXVOTHD                ASSEMBLY MODULE CSECT NAME
PIDS/   PIDS/   5752SCPX1               PRODUCT/COMPONENT IDENTIFIER
AB/S    AB/S    00C4                    ABEND CODE-SYSTEM
REXN/   RIDS/   BPXMIPCE                RECOVERY ROUTINE CSECT NAME
FI/     VALU/H  7542A0704742A0708BB917  FAILING INSTRUCTION AREA
REGS/   REGS/   C07DC                   REG/PSW DIFFERENCE
HRC1/   PRCS/   00000004                ABEND REASON CODE
SUB1/   VALU/C  OPENMVS                 COMPONENT SUBFUNCTION

This information is called a Symptom String. It provides a very short summary of the problem. You can search the internet with this information, to see if the problem has been found before.

On my system there have been 66 occurrences of this symptom string (see the first display).

DAE can be configured to say that if you get the same symptom string, do not take a dump, as it is most probably the same problem. This means your disks are not full of identical system dumps.

Similar dumps

In the DAE display

Command ===>                                                  Scroll ===> PAGE
Enter an Action Code next to an entry.
Enter / next to an entry to choose from a list of Action Codes.

Dataset: 'SYS1.DAE'
Dumps since last DAE Display: 0 Total Dumps suppressed: 214
Events since last DAE Display: 0 Suppression rate: 81%

A  Last      Last    Total   Date of   Symptom String information:
C  Date      System  Events  Dump      Abend  Reason    Module    CSECT
_  05/01/26  S0W1    7       04/28/26  S0EC6  055B0718  BPXINPVT  BPXFSMNT
_  05/01/26  S0W1    66      04/28/26  S00C4  00000004  BPXINPVT  BPXVOTHD
_  05/01/26  S0W1    10      04/28/26  S00C4  00000004  BPXINPVT  BPXVOTHD
_  05/01/26  S0W1    53      04/28/26  S00C4  00000004  BPXINPVT  BPXVOTHD

there is a record for S0EC6 055B0718, and several records for S00C4 00000004. The detailed symptom strings for these S00C4 abends are similar, but not identical. The abend occurred at a different place each time, so the symptom string is slightly different.
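To see where two similar symptom strings differ, a token-by-token comparison is enough. The sketch below is only an illustration of that idea, with a made-up function name. It is not how DAE does its matching, which has its own rules about which symptoms are required.

```c
#include <string.h>

/* Compare two space-separated symptom strings token by token.
   Returns the index (from 0) of the first token that differs,
   or -1 if the strings have the same tokens. */
int first_symptom_difference(const char *a, const char *b)
{
    int index = 0;
    for (;;) {
        while (*a == ' ') a++;          /* skip separators */
        while (*b == ' ') b++;
        size_t la = strcspn(a, " ");    /* length of the next token */
        size_t lb = strcspn(b, " ");
        if (la == 0 && lb == 0)
            return -1;                  /* both strings are finished */
        if (la != lb || strncmp(a, b, la) != 0)
            return index;
        a += la;
        b += lb;
        index++;
    }
}
```

Comparing a string ending in REGS/C082C with one ending in REGS/C07DC, and otherwise the same, reports the REGS/ token as the first difference.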

Configuring DAE

You configure DAE through parmlib members ADYSETxx.

On my system member ADYSET01 stops DAE, so I get a dump for every abend.

In member ADYSET00 I have

 SVCDUMP(MATCH,SUPPRESSALL,UPDATE,NOTIFY(3,30)),   

where

  • MATCH: Specifies that DAE is to compare the symptoms from the current memory dump to those that have already been recorded in the DAE data set. (Coding MATCH does not indicate that DAE suppresses duplicate memory dumps or updates the DAE data set.)
  • SUPPRESSALL: Specifies that duplicate memory dumps are to be suppressed when all criteria for matching and suppressing memory dumps are met except for the VRADAE key.
  • UPDATE: Specifies that the DAE data set is to be updated with the results of matching.
  • NOTIFY: Sends an internal signal to any system applications which are listening, to say that dumps have occurred. The default threshold is three memory dumps in 30 minutes for a particular symptom string.
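One way to picture the NOTIFY(3,30) threshold is as a sliding window over the times of the dumps for one symptom string. The sketch below is my reading of "three dumps in 30 minutes", with a made-up function name. Treating exactly 30 minutes as inside the window is an assumption, and this is not DAE's actual code.

```c
#include <stddef.h>

/* minutes[] holds the times of the dumps for one symptom string,
   in minutes, in ascending order.
   Returns 1 if any `count` consecutive dumps fall within `interval`
   minutes of each other, else 0. */
int notify_threshold_reached(const int *minutes, size_t n,
                             int count, int interval)
{
    if (count <= 0)
        return 1;
    for (size_t i = 0; i + (size_t)count <= n; i++) {
        /* span between the first and last dump of this group */
        if (minutes[i + count - 1] - minutes[i] <= interval)
            return 1;
    }
    return 0;
}
```

With dumps at 0, 10 and 25 minutes the threshold for (3,30) is reached. With dumps at 0, 20 and 45 minutes it is not.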

How am I meant to know what’s happened?

As well as DAE processing, a record is written to LOGREC, which is a repository for hardware errors and software errors. The EREP program formats and prints it.

For example, formatting the data set gave me

TYPE:  SOFTWARE RECORD      REPORT:  SOFTWARE EDIT REPORT           DAY.YEAR      
(PROGRAM INTERRUPT) REPORT DATE: 121.26
...
SEARCH ARGUMENT ABSTRACT
PIDS/5752SCPX1 RIDS/BPXINPVT#L RIDS/BPXVOTHD AB/S00C4 PRCS/00000004 REGS/C082C
RIDS/BPXMIPCE#R

SYMPTOM DESCRIPTION
------- -----------
PIDS/5752SCPX1 PROGRAM ID: 5752SCPX1
RIDS/BPXINPVT#L LOAD MODULE NAME: BPXINPVT
RIDS/BPXVOTHD CSECT NAME: BPXVOTHD
AB/S00C4 SYSTEM ABEND CODE: 00C4
PRCS/00000004 ABEND REASON CODE: 00000004
REGS/C082C REGISTER/PSW DIFFERENCE FOR R0C:-082C
RIDS/BPXMIPCE#R RECOVERY ROUTINE CSECT NAME: BPXMIPCE
...
TIME OF ERROR INFORMATION
PSW: 07047001 80000000 00000000 20E1F2F4
INSTRUCTION LENGTH: 04 INTERRUPT CODE: 0004
FAILING INSTRUCTION TEXT: 00175045 00005049 01B0E368
TRANSLATION EXCEPTION ADDRESS: 00000000_01B6A404
BREAKING EVENT ADDRESS: 00000000_20E1F280

AR/GR 0-1 FFF00001/00000000_00000097 00000000/00000051_01B6AC90
AR/GR 2-3 00000001/00000000_0239A000 00000000/00000051_02FCFA00
...

HOME ASID: 0010 PRIMARY ASID: 0010 SECONDARY ASID: 0056
PKM: 8040 AX: 0001 EAX: 0000

RECOVERY ROUTINE ACTION
THE RECOVERY ROUTINE RETRIED TO ADDRESS 20E1F7DA.
THE REQUESTED SVC DUMP WAS NOT TAKEN. THE DUMP WAS SUPPRESSED BY DAE.
NO LOCKS WERE REQUESTED TO BE FREED.
THE SDWA WAS REQUESTED TO BE FREED BEFORE RETRY.

The information is a superset of the information in DAE. You get the registers at the point of failure, and what recovery action was taken. (Programs can ignore some abends, or pass the decision to a program higher up the stack).

In this case I can see there was no dump taken because of DAE.

The information in the logrec record was enough for me to debug the program. I did not need an SDUMP.

Processing LOGREC

Logrec is a z/OS dataset which records information about events, such as hardware problems, and software abends. Information is written to the dataset even though the information may have been suppressed elsewhere.

JCL to print it

//IBMPEREP JOB (ACCT),'PRINT LOGREC',CLASS=A,MSGCLASS=H 
//STEP EXEC PGM=IFCEREP1,PARM='CARD'
//SERLOG DD DISP=SHR,DSN=SYS1.S0W1.LOGREC
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,10,,CONTIG)
//EREPPT DD SYSOUT=A,DCB=BLKSIZE=133
//TOURIST DD SYSOUT=A,DCB=BLKSIZE=133
//ZERLOG DD SYSOUT=A,DCB=BLKSIZE=133
//SYSIN DD *
PRINT=PS
ACC=N
ZERO=N
ENDPARM
//

To print and clear specify ZERO=Y.

The command syntax is given here.

Example output

IPL record

IPL RECORD EDIT AND PRINTING SECTION                                                        
DAY YEAR HH MM SS TH
DATE -122 26 TIME -06 05 01 52
MODEL - 1090 CPU SERIAL NO. - 011238
MVS/ESA V7 R3

IPL REASON CODE - DF DEFAULT -U-
SUBSYSTEM ID - 00 SUBSYSTEM NAME - NULL
HIGHEST STORAGE ADDRESS 7FFFFFFF

LAST ACTIVITY INFORMATION :
DAY YEAR HH MM SS TH
DATE -121 26 TIME -19 14 13 68
END OF IPL RECORD
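The dates in these reports are given as a day number within the year, for example day 122 of year 26. A small sketch to convert this to a month and day, assuming the two-digit year 26 means 2026; the function name is made up for the example.

```c
/* Convert a day-of-year (starting at 1) to a month and day of the month.
   Returns 0 on success, -1 if the day number is out of range. */
int day_of_year_to_date(int year, int yday, int *month, int *day)
{
    static const int days_in_month[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);

    if (yday < 1 || yday > 365 + leap)
        return -1;

    for (int m = 0; m < 12; m++) {
        int len = days_in_month[m] + (m == 1 && leap); /* February in a leap year */
        if (yday <= len) {
            *month = m + 1;
            *day = yday;
            return 0;
        }
        yday -= len;   /* move on to the next month */
    }
    return -1;
}
```

Day 121 of 2026 is 1 May, which matches the 05/01/26 in the DAE display, and day 122 is 2 May.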

3270 termination

DEVICE NUMBER:  000703                DAY YEAR       JOB IDENTITY: VTAM                     

DEVICE TYPE: 3277
MODEL: 1090 HH MM SS.TH
ERROR PATH: EF-0703 CPU ID: 111238 TIME: 19 16 26.42
RECORD IS: TEMPORARY
MODE IS: 370XA

---UNIT STATUS---- SUB-CHANNEL STATUS
....

DEVICE DEPENDENT DATA

TYPE OF RECORD: CLOSEDOWN (X'20')

TERMINAL NAME: LCL703 SIO CNTR: 00000003 TEMP. ERRORS: 00

Software abend

See Where’s my dump?

How often should I clear it?

Periodically you should archive the data, so you can later do trend analysis, such as which disks are having more I/O problems than usual.

When logrec fills up, your automation can trigger a job to copy the logrec dataset, and clear it.