How to debug a bash script when the easy way does not work.

I was trying to debug a program product, and to specify a Java override. The configuration used bash scripts. My problem was which script should I put my fix in?

How to debug bash scripts

From searching in the internet, the way of debugging a simple self contained bash scripts is either

  • edit the file and add the -x as in #!/bin/sh -x this give output for just this file.
  • or use sh -x … command line option to enable trace

The sh -x … command applies to the specified program name. You cannot enable the trace this way for any programs it calls.

There is no way of saying trace this command …and any others programs it calls.

Editing the script may not easy

Editing the file was a challenge, as it was on a read only file system. I had to unmount it, and mount it read/write. Easy on my one person z/OS ; not so easy on a typical z/OS image with many users. You do not want to make a change and break someone else.

I used the Unix command

df -P /usr/lpp/IBM/... 

on the file to find which file system it was on, then used the TSO commands

unmount filesystem('ABC100.ZFS') Immediate
mount filesystem('ABC100.ZFS') mode(RDWR)                   
  type(HFS) mountpoint('/usr/lpp/IBM/abcdef')

Or change the BPX* parmlib member and re-ipl which seems overkill.

I did not really want to have to edit the production scripts – but this was only at the server startup. Only one instance was affected.

The chicken and the egg problem

How do I know which file to edit to add the “-x”? After lots of investigation I found one file deep down, but I could not see who called it.

I used two commands

ps -o pid,ppid,args -p $PPID   
ps -o pid,ppid,args -u $LOGNAME 

The first says give me information about the threads parent PID.

The second says gave information about threads running with the userid of the thread

The first gave

      PID       PPID COMMAND                                                                         
 33620145   67174575 /bin/sh -c /Z24C/usr/lpp/IBM/..../bin/envvars.sh -F/parameter  

Which shows the thread was called from the envvars.sh script with the parameter -F/parameter. This was a start, but who called envvars.sh?

The second command gave

         PID       PPID COMMAND  
(1) 83951745          1 BPXBATCH    
(2) 83951791   83951745 /bin/sh /usr/lpp/IBM/.../start.sh                                     
(7) 50397360   33619999 oedit /usr/lpp/IBM/.../bin/config.final.env                           
(3) 83951836   s83951791 /bin/sh /Z24C/usr/lpp/IBM/.../bin/abcmain.sh run -config /Z24C       
(5) 67174622   33620191 /bin/sh -x /Z24C/usr/lpp/IBM/.../bin/envvars.sh -F/tmp/RSEAPI_e       
(4) 33620191   83951836 /bin/sh -c /Z24C/usr/lpp/IBM/.../bin/envvars.sh -F/tmp/RSEAPI_e       
(6) 83951881   67174622 ps -o pid,ppid,args -u STCRSE 

You have to start at the top of the process tree, and find the children of each process. The parent and child have the same colour process ids above.

  1. I know the script was started from BPXBATCH. This has a process id of 83951745 (first column of data) in red.
  2. The process with this process id as a parent (PPID) (second column of data) in red. is /bin/sh /usr/lpp/IBM/…/start.sh This is what BPXBATCH executes. If you have a system with lots of BPXBATCH instances running, you can locate the command to find which thread is of interest. This process has a process id of 83951791
  3. This process invokes /bin/sh /Z24C/usr/lpp/IBM/…/bin/abcmain.sh run -config /Z24C with a PID of 83951836.
  4. This process invokes /bin/sh -c /Z24C/usr/lpp/IBM/…/bin/envvars.sh -F/tmp/RSEAPI_e with a PID of 33620191
  5. This invokes /bin/sh -x /Z24C/usr/lpp/IBM/…/bin/envvars.sh -F/tmp/RSEAPI_e (again!)
  6. This issues the command ps -o pid,ppid,args -u STCRSE which displays the thread information. We have got to the end.
  7. Was from me editing a file in an Unix Services session, so not relevant to the investigation

Who am I ?

You can use

echo "path to me ->  ${0}     "

which gave me

path to me -> /home/colinpaice/Downloads/test.sh

What would I do to make it easier to debug?

In each shell script have code like

#if the symbol myprog_text exists
if [ -z ${myprog_test+x} ]; 
then 
   # echo "myprog_test is not set"; 
else
  # echo "specified "${myprog_test}
  # if it has a value > 0 then write the path
  if [ $myprog_test -gt 0 ]
  then 
     echo "# path to me --------------->  ${0}     "
  fi
  #if the value is 2 or more then start the trace
  if [ $myprog_test -gt 1 ]
  then 
     set -x 
  fi
fi

This checks to see if global variable myprog_test is set, if it set to a 1 or larger, it displays the path name, if it is set to 2 or larger it turns trace on, using set -x.

With this each script checks a variable product_myscript (where myscript is the name of the script), and takes the appropriate action.

To turn the traces you list all of the scripts in the directory, then use

export myprod_myscript=1

for each script (of interest). This will then give you a trace of which scripts were invoked. You can then set

Some scripts use a variable $SHLVL which give you the call depth. This would be useful,but his is not supported in shell in z/OS.

Is it as easy(!) as this?

No quite. Many services are started as a started task, where the JCL is like

//ABCAPI   EXEC PGM=BPXBATCH,REGION=0M,TIME=NOLIMIT, 
//            PARM='PGM &HOME./tomcat.base/start.sh' 

You can put your global variables in STDENV. Note: this is a list of variables, not a shell script, so you cannot do

xxx=1
yyy=$xxx