Interacting with a Python script running as a started task on z/OS

I blogged about Running Python as a started task on z/OS, which is fine if your script runs and ends naturally. It is not a good idea for a long-running script such as a server, because you have to cancel it rather than shut it down.

I have put some code on GitHub which allows a Python script to Write To Operator, and to wait for operator requests to Stop or “modify” your job (pass it data).

It also allows you to display some z/OS information: job name, ASCB, thread TCB, and CPU used by the thread.

Please try it and give me feedback.

One minute MVS: DLLs and load modules

This is one of a series of short blog posts on z/OS topics. This post is about using Dynamic Link Libraries (DLLs). It follows on from One minute MVS: Binder and loader, which explains some of the concepts used.

Self contained load modules are good but…

An application may have a main program and use external functions bound to the main program. This is fine for the simple case. In more complex environments where a program loads external modules this may be less effective.

If MOD1 and MOD2 are modules and both use common subroutines MYSUBS, then when MOD1 and MOD2 are loaded, each has its own copy of MYSUBS. This can lead to inefficient use of storage (and may be slower than one copy which everything uses).

You may not be responsible for MYSUBS; you just use the code. If it is bound into your load module, then in order to use a later version you have to rebind your load modules. This is not very usable.

Stubs are better

There are several techniques to get round these problems.

In your program you can use a stub module. This sits between your application and the code which does all of the work.

  • For example for many of the z/OS provided facilities, the stub modules chain through z/OS control blocks to find the address of the routine to use.
  • You can have the subroutine code in a self-contained load module. Your stub can load the module then branch to it. This way you can use the latest available version of code. For example MQCONN loads a module, other MQ verbs use the module, MQDISC releases the load module.

If the stub loads a module, then other threads using the stub just increment the in-use count and do not have to reload the module (from disk). This means you do not have multiple copies of the load module in memory.

The Dynamic Link Library support does this load of the module for you; it is a build option rather than a change to your code.

Creating a DLL Object

When you compile the C source you need to have options DLL and EXPORTALL. (The documentation says it is better to use #pragma export(name) for each entry, rather than use the EXPORTALL option, but this means making a small change to your program.)

When you bind the module you need parameter DYNAM=DLL, and specify //BIND.SYSDEFSD DD… pointing to a member, for example COLIN.SIDE(CERT)

The member will contain information (known as a side deck) about the functions your program has exposed. For example

 IMPORT DATA,'CERT','ascii_tab' 
 IMPORT DATA,'CERT','other_tab'  
 IMPORT CODE,'CERT','isbase64'                       
 IMPORT CODE,'CERT','printHex'                    
 IMPORT CODE,'CERT','UnBase64'

Where

  • IMPORT …. ‘CERT’ says these are for load module CERT
  • IMPORT DATA – these labels are data constants
  • IMPORT CODE – these are functions within CERT

If this was a 64 bit program, it would have IMPORT CODE64, and IMPORT DATA64.

If you compile in Unix Services you get a side-deck file like /u/tmp/console/cert.x containing

IMPORT DATA64,'cert.so','ascii_tab'
IMPORT CODE64,'cert.so','isbase64'
...

and the load module /u/tmp/console/cert.so.

The .x file is only used by the binder. The .so file is loaded and used by programs.

When your program tries to use one of these functions, for example isbase64, module CERT(cert.so) is loaded and used. This means you need the library containing this module to be available to the job.

Binding with the DLL

Instead of binding your program with the CERT object, you bind it with the side deck, using the equivalent of INCLUDE COLIN.SIDE(CERT).
The binder associates the external reference isbase64 with the code in load module CERT.

Conceptually the isbase64 reference points to some stub code which does

  • Load module CERT
  • Find entry point isbase64 within this module
  • Replace the address of the isbase64 external reference, so it now points to the real isbase64 entry point in module CERT.
  • Branch to the real isbase64 code.

The next time isbase64 is used – it goes directly to the isbase64 function in the CERT module.
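The steps above can be modelled in a few lines of portable C. This is only a sketch of the idea, not what the binder or the run time actually generates: load_cert_module() is a hypothetical stand-in for "load module CERT and find the entry point", and the isbase64 signature is an assumption.

```c
#include <ctype.h>

typedef int (*isbase64_fn)(int);

/* Stand-in for the real isbase64 inside module CERT (signature assumed). */
static int real_isbase64(int c)
{
    return isalnum((unsigned char)c) || c == '+' || c == '/';
}

int load_count = 0;   /* how many times the "module" has been loaded */

/* Hypothetical loader: pretends to load CERT and locate isbase64 in it. */
static isbase64_fn load_cert_module(void)
{
    load_count++;
    return real_isbase64;
}

static int isbase64_stub(int c);

/* The external reference: it starts out pointing at the stub. */
static isbase64_fn isbase64_ref = isbase64_stub;

static int isbase64_stub(int c)
{
    isbase64_ref = load_cert_module();  /* load, find, replace the address */
    return isbase64_ref(c);             /* branch to the real code */
}

/* What the caller's isbase64(c) amounts to: a call through the reference. */
int call_isbase64(int c)
{
    return isbase64_ref(c);
}
```

However many times call_isbase64() is used, load_count only ever reaches 1, because the first call replaces the reference and later calls go straight to the real function.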

Using DLLs

You can use DLLs in Unix, or normal address spaces.

For a program running in Unix services you can use the C run time functions

  • dlopen(module_name, mode) to load the module
  • dlclose(dlHandle) release the module
  • dlsym(dlHandle,function_name|variable_name) to get information about a function name or variable
  • dlerror() to get diagnostic information about the last dynamic link error
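A minimal sketch of how these four calls fit together is shown below. The helper name call_isbase64_from, the int (*)(int) signature for isbase64, and the choice of RTLD_LAZY are assumptions for illustration; use the prototype from your own header.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature for the exported function. */
typedef int (*isbase64_fn)(int);

/* Load the DLL, call isbase64 in it, and release the DLL again.
   Returns the function's result, or -1 if the DLL or symbol is not found. */
int call_isbase64_from(const char *dll_path, int c)
{
    void *handle = dlopen(dll_path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    isbase64_fn fn = (isbase64_fn) dlsym(handle, "isbase64");
    if (fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int result = fn(c);
    dlclose(handle);    /* release the module */
    return result;
}
```

A call such as call_isbase64_from("/u/tmp/console/cert.so", 'A') would load the .so built earlier. In a real program you would normally keep the handle open for as long as you need the functions, rather than loading and releasing on every call.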

Creating a C header file from an assembler DSECT

I was writing a program to display information about a RACF keyring, using the RACF callable services. RACF provides a mapping for the structures – but this is an assembler macro, and no C header file is available.
Instead of crafting the header file by hand, I thought I would try the C DSECT conversion utility. This was pretty easy to use (once I had got it working!), but it is not 100% reliable.

The JCL is

//COLINZP  JOB 1,MSGCLASS=H 
//         JCLLIB ORDER=CBC.SCCNPRC 
//DSECT EXEC PROC=EDCDSECT, 
// INFILE=COLIN.C.SOURCE(DSECTA), 
// OUTFILE=COLIN.C.H.VB(TESTASM), 
// DPARM='EQU(BIT),LP64' 
/* 

The LP64 tells it to generate the header file for an LP64 program. It will generate pointers with __ptr32.

COLIN.C.SOURCE(DSECTA) has

AAAAA  CSECT 
*      IRRPCOMP 
AAA    DS  CL4 EYE CATCHER 
ABBB   DC  CL4'ABBB' 
       END 

The name of the structure is taken from the CSECT or DSECT the code is in. The above code produces

#pragma pack(packed)                                                    
                                                                        
struct aaaaa {                                                          
  unsigned char  aaa[4];  /* EYE CATCHER */                             
  unsigned char  abbb[4];                                               
  };                                                                    
                                                                        
#pragma pack(reset)                                                     

You need an output dataset with Variable Blocked format. When I used a Fixed Block dataset, the output was in columns 1 to 79, but by default C reads only columns 1 to 72 of a Fixed Block file.

The JCL invokes HLASM, which creates an ADATA file. This file has information about all of the data and instructions used. The ADATA file is passed to the CCNEDSCT program, which generates C source from the ADATA information.

There are many negative comments on the internet about CCNEDSCT. For example it does not pass block comments through.

I found it a bit buggy.

The source

AAAAA  CSECT 
AAA    DS  CL4 EYE CATCHER 
ABBB   DC  CL4'ABBB' 
ACCC   DC  CL4'ACCC' 
ASIZE  EQU *-AAA 
       END 

gave me

#pragma pack(packed) 
                                                          
struct aaaaa { 
  unsigned char  aaa[4];     /* EYE CATCHER */ 
  unsigned char  abbb[4]; 
  unsigned int          : 4, 
                 asize  : 2, 
                       : 26; 
  }; 
                                                          
#pragma pack(reset) 

This is clearly wrong: variable ACCC is missing, and ASIZE has become a bit field within an integer.
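For comparison, a hand-written mapping of that source might look like the sketch below. It is not output from the utility, and turning ASIZE into a constant is an assumption about what a correct conversion would do. All three fields are character arrays, so no packing pragma is needed here.

```c
/* Hand-written sketch of the expected mapping (not CCNEDSCT output). */
struct aaaaa {
    unsigned char aaa[4];    /* EYE CATCHER */
    unsigned char abbb[4];
    unsigned char accc[4];
};

/* ASIZE EQU *-AAA: the length from AAA to the current location,
   which is three 4-byte fields. */
enum { ASIZE = 12 };
```

With this layout sizeof(struct aaaaa) and ASIZE agree, which is a quick check you can make on any generated header.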

More information about ADATA.

The data is laid out as described in macro (on my system) HLA.SASMMAC1(ASMADATA).

One minute MVS: Binder and loader

This topic is in the series of “One minute MVS” giving the essentials of a topic.

Your program

The use of functions or subroutines is very common in programming. For a simple call

x = mysub()

the call to the external function mysub generates code like

MYPROG  CSECT     
     L 15,mysub the function 
     LA 1,PARMLIB
     BASR  14,15 or BALR in older programs 
...
mysub  DC  V(MYSUB)

where

  • MYPROG is an entry point to the program
  • the mysub variable defines some storage for the external reference(ER) to MYSUB.

The output of the assembler or compiler is a file or dataset member, known as an “object deck” or “object file”. It cannot be executed, as it does not contain the external functions or subroutines.

The binder (or linkage editor)

The binder program takes object decks, includes any subroutines and external code and creates a load module or program object.

In the early days load modules were stored in PDS datasets. The directory entry of a member held information about the size of the load module and its entry point. As the binder got more sophisticated, the directory did not have enough space for all of the data that was created. As a result PDSEs (Extended PDSs) were created, which have an extendable directory entry. In Unix Services, load modules are stored in the Unix file system.

The term Program Object is used to cover load modules and files in the Unix file system. I still think of them both as Load Modules.

The binder takes the parts needed to create the program object, for example functions you created which are stored in a PDS or in Unix, and includes other code, for example the printf() function. These are merged into one file.

Pictorially the merged files look like

  • Offset 0 some C code.
  • Offset 200 MYPROG Object
    • Offset 10 within MYPROG, MYPROG entry point (so offset 210 from the start of the merged files)
    • Offset 200 within MYPROG, mysub:V(MYSUB)
    • Offset 310 within MYPROG end of MYPROG
  • Offset 512 FUNCTION1 object
  • Offset 800 MYSUB1 Object
    • Offset 28 within MYSUB1, MYSUB entry point
    • Offset 320 within MYSUB1, end of MYSUB

The binder can now resolve references. It knows that MYSUB entry point is at offset 28 within MYSUB1 object, and MYSUB1 Object is 800 from the start of the combined files. It can now replace the mysub:V(MYSUB) in MYPROG with the offset value 828.

The entire merged file is stored as one load module (program object), with a name that you give it, for example COLIN.

The loader

When load module COLIN is loaded, the loader reads the load module from disk into memory, for example at address 200,000. As part of the loading, it looks at the external references and calculates each address in memory from the offset value. So 200,000 + offset 828 is address 200,828. This value is stored in the mysub variable.
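The two calculations, one by the binder and one by the loader, are just additions. The sketch below uses the numbers from the example; the function names are made up for illustration.

```c
/* Binder: offset of an entry point from the start of the merged file. */
unsigned long bind_offset(unsigned long object_offset,
                          unsigned long entry_offset)
{
    return object_offset + entry_offset;
}

/* Loader: address in memory once the load module has been loaded. */
unsigned long relocate(unsigned long load_address,
                       unsigned long bound_offset)
{
    return load_address + bound_offset;
}
```

bind_offset(800, 28) gives 828, the value the binder stores for V(MYSUB), and relocate(200000, 828) gives 200828, the address the loader stores in the mysub variable.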

When the function is about to be called via L 15,mysub, register 15 has the address of the code in memory and the program can branch to execute the code.

It gets more complex than this

Consider two source programs

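/* Program 1 */
void printTotal(void);
int value = 0;
int total = 0;
int main(void)
{
  value = 1;
  total = total + value;
  printTotal();
  return 0;
}

/* Program 2 */
#include <stdio.h>
int total;   /* the same variable as "total" in program 1 */
int done;
void printTotal(void)
{
  printf("Total = %d\n",total);
  done = 1;
}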

There are some global static variables. The variable “total” is used in each one – it is the same variable.

These programs are defined as being re-entrant, and could be loaded into read only storage.

The variables “value” and “total”, cannot go into read only storage as they change during the execution of the program.

There are three global variables: “value”, “total” and “done”; total is common to both programs.

These variables go into a storage area called Writeable Static Area (WSA).

If there are multiple threads running the program, each gets its own copy of the WSA, but they can all share the instructions.

A program can also have 31 bit resident code, and 64 bit resident code. The binder takes all of these parts and creates “classes” of data

  • The WSA class. This contains the merged list of static variables.
  • The 64-bit re-entrant code class. The binder takes the 64-bit resident code from all of the programs and included subroutines, and creates a “64-bit re-entrant” blob.
  • The 31-bit re-entrant code class. The binder takes the 31-bit resident code from all of the programs and included subroutines, and creates a “31-bit re-entrant” blob.
  • The 64-bit data class, from all objects.
  • The 31-bit data class, from all objects.

When the loader loads the modules

  • It creates a new copy of the WSA for each thread
  • It loads the 64 bit re-entrant code (or reuses any existing copy of the code) into 64 bit storage
  • It loads the 31 bit re-entrant code (or reuses any existing copy of the code) into 31 bit storage.

How can I see what is in the load module?

The output from the binder includes content like

CLASS  B_TEXT            LENGTH =      4F4  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  C_DATA64          LENGTH =        0  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  C_CODE64          LENGTH =     1A38  ATTRIBUTES = CAT,   LOAD, RMODE= 64 
CLASS  C_@@QPPA2         LENGTH =        8  ATTRIBUTES = MRG,   LOAD, RMODE= 64 
CLASS  C_CDA             LENGTH =     3B50  ATTRIBUTES = MRG,   LOAD, RMODE= 64 
CLASS  B_LIT             LENGTH =      140  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  B_IMPEXP          LENGTH =      A6B  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
CLASS  C_WSA64           LENGTH =      6B8  ATTRIBUTES = MRG, DEFER , RMODE= 64 
CLASS  C_COPTIONS        LENGTH =      304  ATTRIBUTES = CAT, NOLOAD 
CLASS  B_PRV             LENGTH =        0  ATTRIBUTES = MRG, NOLOAD 

Where

  • B_TEXT is from HLASM (assembler program). Any sections are conCATenated together (Attributes =CAT)
  • C_WSA64 is the 64 bit WSA. Any data in these sections have been MeRGed (see the “total” variable above) (Attributes = MRG)
  • C_COPTIONS contains the list of C options used at compile time. The loader ignores this section (NOLOAD), but it is available for advanced programs such as debuggers to extract this information from the load module.

To introduce even more complexity, you can have class segments. These are an advanced topic, for when you want groups of classes to be loaded independently. Most people use the default of 1 segment.

Layout of the load module

Class layout

You can see the layout of the classes in the segment.

  • Class B_TEXT starts at offset 0 and is length 4F4.
  • Class C_CODE64 is offset 4F8 (4F4 rounded up to the nearest doubleword) and of length 1A38.
CLASS  B_TEXT            LENGTH =      4F4  ATTRIBUTES = CAT,   LOAD, RMODE=ANY 
                         OFFSET =        0 IN SEGMENT 001     ALIGN = DBLWORD 
  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
CLASS  C_CODE64          LENGTH =     1A38  ATTRIBUTES = CAT,   LOAD, RMODE= 64 
                         OFFSET =      4F8 IN SEGMENT 001     ALIGN = DBLWORD
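The doubleword rounding can be checked with a one-line helper. This is an illustration of the arithmetic only; the function name is made up, and it assumes the alignment is a power of two.

```c
/* Round an offset up to the next multiple of align
   (align must be a power of two, for example 8 for a doubleword). */
unsigned long align_up(unsigned long offset, unsigned long align)
{
    return (offset + align - 1) & ~(align - 1);
}
```

align_up(0x4F4, 8) gives 0x4F8, which is where class C_CODE64 starts in the listing above.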

Within each class

CLASS  C_CODE64          LENGTH =     1A38  ATTRIBUTES = CAT,   LOAD, RMODE= 64 
                         OFFSET =      4F8 IN SEGMENT 001     ALIGN = DBLWORD 
--------------- 
                                                                                                  
 SECTION    CLASS                                      ------- SOURCE -------- 
  OFFSET   OFFSET  NAME                TYPE    LENGTH  DDNAME   SEQ  MEMBER 
                                                                                                  
                0  $PRIV000010        CSECT      1A38  /0000001  01 
       0        0     $PRIV000011        LABEL 
      B8       B8     PyInit_zconsole    LABEL 
     8B8      8B8     or_bit             LABEL 
     C30      C30     cthread            LABEL 
    1158     1158     cleanup            LABEL 
    11B8     11B8     printHex           LABEL 
  • The label $PRIV000010 CSECT is generated because I did not have a #pragma CSECT(CODE,"….") statement in my source. If you use #pragma CSECT(STATIC,"….") you get the name in the CLASS C_WSA64 section (see the following section).
  • The C function “or_bit” is at offset 8B8 in the class C_CODE64.

The static area

For the example below, the C module had #pragma CSECT(STATIC,"SZCONSOLE")

CLASS  C_WSA64           LENGTH =      6B8  ATTRIBUTES = MRG, DEFER , RMODE= 64 
                         OFFSET =        0 IN SEGMENT 002     ALIGN = QDWORD 
--------------- 
                                                                                     
            CLASS 
           OFFSET  NAME                TYPE    LENGTH   SECTION 
                0  $PRIV000012      PART            10 
               10  SZCONSOLE        PART           5A0  ZCONSOLE 
              5B0  ascii_tab        PART           100  ascii_tab 
              6B0  gstate           PART             4  gstate 

There are two global static variables, common to all routines, ascii_tab, and gstate. They each have an entry defined in the class.

All of the static variables internal to routines are in the SZCONSOLE section. They do not have an explicit name because they are internal.

I thought writing to the operator was easy.

There are a couple of ways of writing a message to the operator and to the system log, but it took me a while to understand the differences. For example what are the circumstances when the following messages are produced (or not produced)?

  1. BPXM023I (COLIN) A Message
  2. A Message
  3. A Message
  4. +A Message
  5. @A Message

Note the messages in 2. and 3. were produced by different methods, and I got message 3. before lunch and message 4. after lunch.

I spent most of a day trying to find out why 3. and 4. were different; why 4. had a plus sign at the start of the message.

Background to operator messages

Message description

Messages written to the operator or syslog have a descriptor code which controls how they are displayed. For example

  • 1 System Failure : The message indicates an error that disrupts system operations. To continue, the operator must reIPL the system or restart a major subsystem.
  • 2 Immediate Action Required: The message indicates that the operator must perform an action immediately. The message issuer could be in a wait state until the action is performed or the system needs the action as soon as possible to improve performance.
  • 3 Eventual Action Required: The message indicates that the operator must perform an action eventually.
  • 6 Job Status: The message indicates the status of a job or job step.
  • 7 Task-Related: The message is issued by an application or system program. Messages with this descriptor code are deleted when the job step that issued them ends.

Most applications will use descriptor codes 3, 6, and 7. They will not use 1 (System Failure) or 2 (Immediate Action Required).

Action messages may have an * sign or @ sign displayed before the first character of the message. The * sign indicates that the WTO was issued by an authorized program. The @ sign indicates that the WTO was issued by an unauthorized program.

From the IBM documentation

Message routing

On a busy production system there are many messages produced. Some messages need an action taken, other messages are for information.

You can route where messages get sent to. For example if you want a tape mounted, then only those people involved with tapes are interested in these messages. Messages about security should be routed to security people. Messages which require operator action should be routed to an operator.

You specify where messages get sent using the ROUTCDE field. The routing codes are described in the documentation for WTO.

You control which consoles get which messages with the CONSOLxx member of the SYS1.PARMLIB concatenation, or with the vary console command.

In the syslog output you get information like

N 4000000 S0W1 22204 10:02:46.19 JOB08909 00000090 @CP Desc 1

The 4000000 is the routing code field, a bit mask read from the left: 8000000 is routing code 1, 4000000 is routing code 2, and so on.
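A short sketch of decoding that field follows. It assumes the seven hex digits hold routing codes 1 to 28 with routing code 1 as the leftmost bit, which is consistent with the two values quoted above; the function names are made up for illustration.

```c
/* Mask for a routing code (1 to 28) in the seven-hex-digit syslog field.
   Returns 0 for a code outside that range. */
unsigned long routing_mask(int code)
{
    if (code < 1 || code > 28)
        return 0;
    return 1UL << (28 - code);
}

/* Nonzero if the syslog routing field has the given routing code set. */
int has_routing_code(unsigned long field, int code)
{
    return (field & routing_mask(code)) != 0;
}
```

routing_mask(1) is 0x8000000 and routing_mask(2) is 0x4000000, so has_routing_code(0x4000000, 2) is true for the syslog line shown above.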

You can configure which consoles get which messages for example

V 703,CONSOLE,ROUT=ALL

You can use this command to add (AROUT=rtcode) or delete (DROUT=rtcode) routing codes from the console.

Creating messages

Use of __console2().

There is a C runtime function __console2() which allows you to write a message to the console.

  • If the userid has read access to BPX.CONSOLE in the FACILITY class, or is running as a superuser (uid 0), you get “A Message”
  • If the user does not have access to BPX.CONSOLE in the FACILITY class, and is not a super user, you get “BPXM023I (userid) A Message”

Use of Write To Log (WTL)

There is an assembler macro WTL which allows an application to write to the system log. This is in SYS1.AMODGEN. The documentation says

Note: IBM recommends you use the WTO macro with the MCSFLAG=HRDCPY parameter instead of WTL, because WTO supplies more data than WTL

Use Write To Operator

The assembler macro

    wto 'A Message'

Gives

+A Message

The assembler services documentation has

If the user supplies a descriptor code in the WTO macro, an indicator is inserted at the start of the message. The indicators are: a blank, an at sign (@), an asterisk (*), or a blank followed by a plus sign (+). The indicator inserted in the message depends on the descriptor code that the user supplies and whether the user is a privileged or APF-authorized program or a non-authorized problem program. Table 1 shows the indicator that is used for each descriptor code.

With the assembler code

  wto 'CP Desc 1',DESC=1 
  wto 'CP Desc 2',DESC=2 
  wto 'CP Desc 3',DESC=3 
  wto 'CP Desc 11',DESC=11 
  wto 'CP Desc 12',DESC=12 

You get messages of different colours, and the messages may or may not be displayed!

Operator console

The operator screen is where messages are sent which require the operator to do something. These days automation processes many of the messages, so there should only be a few messages appearing on the operator console. Some of these messages can roll off the top of the screen.

APF authorised program

Not APF authorised program

From this

  • the APF authorised library displays messages with descriptor codes 1 and 2. It uses “*” and “ ” as the message prefix.
  • the non APF authorised library uses the prefixes “ +” and “@”. It does not display messages with descriptor code 1 or 2, as an application should not be issuing messages of type “System Failure” or “Immediate Action Required”!

SYSLOG (from SDSF)

APF authorised

Not APF authorised

All messages are displayed. From the message prefix you can tell if the message came from an APF library or not.

Job log

APF authorised

Not APF authorised

The same information is displayed, allowing for the difference in the APF library.

Why did the + change over lunch?

Before lunch I had dynamically APF authorised my load library. I re-IPLed over lunch, so the load library was no longer authorised, and so WTO put the “+” on the front. It is obvious once you know.

Make a decision on what you want

If you are going to use __console2 or WTO to write to the operator or syslog, you need to decide

  • If you want the operator to see it; some messages appear on the operator console, and some do not.
  • If the operator needs to take action (messages prefixed with “*” or “@”).
  • Which console to send the message to: just to hard copy, to the operator, or to the tape library.

Recreating your certificates and keyrings with some hints on avoiding the swamp of RACF callable services and gsk.

I was trying to recreate the commands for the MQ certificates and keyring, so I could create a new set for a second queue manager.

At first glance it looked like writing it in REXX would involve a lot of parsing, so I started using the RACF callable service to extract certificate and keyring information from RACF using a C/assembler interface.

This was a slow journey with many pitfalls, and I ended up using a REXX program.

I’ve put the code up on GitHub for both the RACF and the C interface, to help other people who may be considering going into the RACF callable services swamp.

Using the C interface

You use the R_datalib or IRRSDL00 callable service.

The key bits of code are

#pragma linkage(IRRSDL00 ,OS) 
- - - - - - - - - - - - - - - - -
char * workarea = (char *) malloc(1024) ;
- - - - - - - - - - - - - - - - -
struct {
char length;
char value[8];
} RACF_userid;
memcpy(RACF_userid.value,"START1  ",8); /* 8 bytes, blank padded */
RACF_userid.length = 6;
- - - - - - - - - - - - - - - - -
int parmlist_version = 0;
- - - - - - - - - - - - - - - - -
char DataGetFirst = 0x01 ; /*Data getFirst */
int attributes = 0;
rc= IRRSDL00( workarea, // WORKAREA
&ALET1 , // ALET
&SAF_RC, // SAF RC
&ALET2, // ALET
&RACF_RC,// RACF RC
&ALET3 , // ALET
&RACF_RS,// RACF Reason
&DataGetFirst ,// function code
&attributes, // option
&RACF_userid, // RACF userid
&ring_name, // ring name
&parmlist_version, // parmlist version
&parmlist );

That code was not particularly difficult – it got more difficult using the parmlist.

#pragma pack(1) 
struct {
char * results_handle ; //in offset 0
int certificate_usage ; //out offset 4
int isDefault ; // out 8
int certificate_length; //in/out c
char * certificate ; // in 10
int private_key_length; //in/out 14
char * private_key ; //in 18
int private_key_type ; //out 1c
int private_bitsize ; //out 20
int label_length ; //in/out 24
char * label ; //in 28
char cert_useridl; // in 2c
char cert_userid[8]; // in 2d
char temp[3] ; // offset 35
int subjects_dn_length ; //in 38
char * subjects_dn_ptr ; //in 3c
int record_length ; //in/out 40
char * record_ptr ; //inp 44
int cert_status ; //in/out 48
} parmlist;
#pragma pack(4)
- - -
char certificate[2000];
parmlist.certificate_length = 2000 ;
parmlist.certificate =( char *) &certificate;

Notes:

  1. The temp[3] is not in the documentation, and I got various errors without it, such as internal error. This was because the pointers following it were at the wrong offset.
  2. parmlist.cert_useridl must be set to 8, and the value must be padded on the right with blanks. This is different from the RACF_userid field passed to IRRSDL00, where the length is the true length of the userid.
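Both notes can be checked in a portable sketch. Here 32-bit integers stand in for the 31-bit pointers so that the offsets can be verified on any compiler; the names parmlist31 and set_cert_userid are made up for illustration, and _Static_assert needs a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(1)
struct parmlist31 {
    uint32_t results_handle;      /* 0x00 (pointer) */
    int32_t  certificate_usage;   /* 0x04 */
    int32_t  is_default;          /* 0x08 */
    int32_t  certificate_length;  /* 0x0c */
    uint32_t certificate;         /* 0x10 (pointer) */
    int32_t  private_key_length;  /* 0x14 */
    uint32_t private_key;         /* 0x18 (pointer) */
    int32_t  private_key_type;    /* 0x1c */
    int32_t  private_bitsize;     /* 0x20 */
    int32_t  label_length;        /* 0x24 */
    uint32_t label;               /* 0x28 (pointer) */
    char     cert_useridl;        /* 0x2c */
    char     cert_userid[8];      /* 0x2d */
    char     temp[3];             /* 0x35: filler up to the next fullword */
    int32_t  subjects_dn_length;  /* 0x38 */
    uint32_t subjects_dn_ptr;     /* 0x3c (pointer) */
    int32_t  record_length;       /* 0x40 */
    uint32_t record_ptr;          /* 0x44 (pointer) */
    int32_t  cert_status;         /* 0x48 */
};
#pragma pack()

/* Fail the compile if the filler is missing or the wrong size. */
_Static_assert(offsetof(struct parmlist31, cert_userid) == 0x2d,
               "cert_userid offset");
_Static_assert(offsetof(struct parmlist31, subjects_dn_length) == 0x38,
               "subjects_dn_length offset");

/* Set the certificate owner: the length is always 8, the value blank padded. */
void set_cert_userid(struct parmlist31 *p, const char *userid)
{
    size_t n = strlen(userid);
    if (n > 8)
        n = 8;
    memset(p->cert_userid, ' ', 8);
    memcpy(p->cert_userid, userid, n);
    p->cert_useridl = 8;
}
```

Checking offsets at compile time like this would have caught the missing filler before the callable service reported an internal error.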

Compile it

//S1 JCLLIB ORDER=CBC.SCCNPRC 
//DOCLG EXEC PROC=EDCCBG,INFILE='ADCD.C.SOURCE(C)',
// CPARM='OPTF(DD:COPTS)'
//* CPARM='LIST,SSCOMM,SOURCE,LANGLVL(EXTENDED)'
//COMPILE.COPTS DD *
LIST,SSCOMM,SOURCE,LANGLVL(EXTENDED)
aggregate(offsethex) xref
TEST
/*
//COMPILE.SYSIN DD *
...
//BIND.SYSLMOD DD DISP=SHR,DSN=ADCD.PDSE.LOADLIB(CERT)
//BIND.CSS DD DISP=SHR,DSN=SYS1.CSSLIB
//BIND.SYSIN DD *
INCLUDE CSS(IRRSDL00)
/*

The output

The subject (what I was after) looked like this, in hex and ASCII:

00000000 : 3038310B 30090603 55040613 02474231 081.0...U....GB1 
00000010 : 0C300A06 0355040A 0C035353 53310B30 .0...U....SSS1.0
00000020 : 09060355 040B0C02 4341310E 300C0603 ...U....CA1.0...
00000030 : 5504030C 05535343 4138 U....SSCA8

If I list the certificate it reports Subject’s Name: CN=SSCA8.OU=CA.O=SSS.C=GB; you can see the elements in the hex dump.

The data is the DER encoding of an ASN.1 structure. Buried within this is a field with value “GB”, which has type 2.5.4.6 (country). Information is stored in certificates in ASCII and in DER format, so I was not surprised to see the data in this format.
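To show how the bytes in the dump fit together, here is a minimal sketch which walks the name and pulls out one attribute. It is not a general DER parser: it assumes short (one byte) lengths, one attribute per SET, and attribute types of the form 2.5.4.n (encoded as 55 04 n). The function name is made up, and the value is returned as the ASCII bytes from the certificate, so on z/OS it would still need converting to EBCDIC before printing.

```c
#include <stddef.h>
#include <string.h>

/* The subject bytes from the dump above. */
const unsigned char subject[] = {
    0x30,0x38,0x31,0x0B,0x30,0x09,0x06,0x03,0x55,0x04,0x06,0x13,0x02,0x47,0x42,0x31,
    0x0C,0x30,0x0A,0x06,0x03,0x55,0x04,0x0A,0x0C,0x03,0x53,0x53,0x53,0x31,0x0B,0x30,
    0x09,0x06,0x03,0x55,0x04,0x0B,0x0C,0x02,0x43,0x41,0x31,0x0E,0x30,0x0C,0x06,0x03,
    0x55,0x04,0x03,0x0C,0x05,0x53,0x53,0x43,0x41,0x38
};

/* Find attribute 2.5.4.<attr> (3 = CN, 6 = C, 10 = O, 11 = OU) in a DER name.
   Copies the value into out as a NUL-terminated string of ASCII bytes.
   Returns the value length, or -1 if not found or not in the simple form. */
int der_find_attr(const unsigned char *der, size_t len, unsigned char attr,
                  char *out, size_t outlen)
{
    /* Outer SEQUENCE: tag 0x30, one-byte length. */
    if (len < 2 || der[0] != 0x30 || der[1] >= 0x80 ||
        (size_t)der[1] + 2 > len)
        return -1;
    size_t end = 2 + (size_t)der[1];
    size_t pos = 2;

    while (pos + 2 <= end) {
        /* Each element: SET (0x31) { SEQUENCE (0x30) { OID, value } } */
        if (der[pos] != 0x31 || der[pos + 1] >= 0x80)
            return -1;
        size_t set_len = der[pos + 1];
        size_t next = pos + 2 + set_len;
        if (next > end)
            return -1;

        const unsigned char *p = der + pos + 2;
        /* Expect: 30 ll 06 03 55 04 <attr> <string tag> <length> <value> */
        if (set_len >= 9 && p[0] == 0x30 && p[2] == 0x06 && p[3] == 0x03 &&
            p[4] == 0x55 && p[5] == 0x04 && p[6] == attr) {
            size_t vlen = p[8];
            if (vlen >= 0x80 || 9 + vlen > set_len || vlen + 1 > outlen)
                return -1;
            memcpy(out, p + 9, vlen);
            out[vlen] = '\0';
            return (int)vlen;
        }
        pos = next;
    }
    return -1;
}
```

Asking for attribute 6 returns the two bytes 47 42 (“GB” in ASCII), and attribute 3 returns the five bytes of “SSCA8”.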

The easiest way of decoding this is to use gskit (IBM Global Security Kit). Gskit provides C routines for encryption/decryption and management of certificates.

Using GSKIT

Change the compile JCL

//S1 JCLLIB ORDER=CBC.SCCNPRC 
//DOCLG EXEC PROC=EDCCBG,INFILE='ADCD.C.SOURCE(C)',
// CPARM='OPTF(DD:COPTS)',
// GREGSIZ='0M'
//COMPILE.COPTS DD *
LIST,SSCOMM,SOURCE,LANGLVL(EXTENDED)
TEST
LSEARCH(/usr/lpp/gskssl/include/)
SEARCH(//'CEE.SCEEH.+',/usr/lpp/gskssl/include/)
DEFINE(MVS),LONGNAME,RENT,DLL,EXPMAC
aggregate(offsethex) xref
//COMPILE.SYSIN DD *

#pragma runopts(POSIX(ON))
..

//BIND.SYSLMOD DD DISP=SHR,DSN=ADCD.PDSE.LOADLIB(CERT)
//BIND.IMP DD DISP=SHR,DSN=SYS1.SIEASID
//BIND.CSS DD DISP=SHR,DSN=SYS1.CSSLIB
//BIND.SYSIN DD *
INCLUDE CSS(IRRSDL00)
include imp(GSKSSL)
include imp(GSKCMS31)
/*

How to decode the returned data

Gskit uses a gsk_buffer structure to handle strings. It is easy to use as it has

  • length – the length of the string
  • data – a pointer to the string

If gskit returns a gsk_buffer to your program, you should use gsk_free_buffer(…) to release the buffer when you have finished with it.

gsk_buffer name; 
x509_name out;
name.length = parmlist.subjects_dn_length;
name.data = parmlist.subjects_dn_ptr;
gskrc = gsk_decode_name(& name, &out);
if ( gskrc != 0)
printf("gsk_decode_name %s\n",gsk_strerror(gskrc));

All this routine does is convert the ASN.1 encoded string into a structure of elements. You can then use the following to print it

char * pName;
gskrc = gsk_name_to_dn(&out,&pName);
if ( gskrc != 0)
printf("gsk_name_to_dn %s\n",gsk_strerror(gskrc));
else printf("Name:%s\n",pName);

This produced

Name:CN=SSCA8,OU=CA,O=SSS,C=GB 

As an exercise I wrote some code to format some of the x509 value –

for ( int i = 0; i < out.u.dn.count;i++) 
{
for (int j = 0;j < out.u.dn.rdns[i].count; j++ )
{
printf("Type %s ",px509_attribute_type(
out.u.dn.rdns[i].attributes[j].attributeType ));
char * p2 = out.u.dn.rdns[i]. attributes[j]. name.data;
int l = out.u.dn.rdns[i]. attributes[j]. name.length;
printf("%*.*s\n",l,l, fromascii(p2,l));
}
}

It took an hour or so to work out how to access the elements: out.u.dn.rdns[i].attributes[j].attributeType. I had to write a routine px509_attribute_type which took the attribute type and returned a string such as “CN”. I already had a function fromascii which took an ASCII string and converted it to EBCDIC. Using gsk_name_to_dn was much easier.

Configuring a Python external function written in C on z/OS

You can write external functions for Python in C. For example I wrote one which can do a WTO and write to the system logs.

Creating this external function is not difficult; there are just several things to do.

High level view

For an external function called zconsole, there is a DLL (shared object) with name zconsole.so . Within this is an entrypoint label PyInit_zconsole.

PyInit_zconsole is a C function which returns a Python Object with information about the entry points within the DLL.

It can also define additional information such as a Python string “__doc__” which can be used to describe what the function does.

You can use the Python statement

print(dir(zconsole))

to give information about the module, the entry points, the module description, and additional fields like version number.

Conceptually the following are needed

  • The entry point PyInit_xxxxxx builds the module from a PyModuleDef and returns it
  • The PyModuleDef contains
    • The name of the module
    • A description of the module
    • A pointer to the module functions
  • The module functions contain, for each function
    • The function name as used in the Python program
    • A pointer to the C function
    • Specification of the parameters of the function
    • A description of the function

Because C needs something to be defined before it is used, the normal layout of the source is

  • C functions
  • Module functions definitions (which refer to the C functions)
  • PyModuleDef which refers to the Module Functions
  • The entry point which refers to the PyModuleDef

The initial entry point

This creates a Python object from the Python Modules Definitions, and passes it back to Python.

PyMODINIT_FUNC PyInit_zconsole(void) { 
  PyObject *m; 
                                                                             
  /* Create the module and add the functions */ 
  m = PyModule_Create(&console_module); 
                                                                             
return m; 
} 

Python Module Definitions

static char console_doc[] = 
  "z Console interface for Python"; 
static struct PyModuleDef console_module = {
   PyModuleDef_HEAD_INIT,
   "console",
   console_doc,
   -1,
   console_methods
};

Where

  • “console” is the name of the module. For the above, print(zconsole.__name__) gives console
  • console_doc refers to the string above which is a description of the module (defined above it)
  • console_methods define the C entry points – see below.

Methods

The list of functions must end in a NULL entry. The code below defines Python functions acb, taskinfo, and cancel. You can pass the description as a constant string or as a named char array.

char console_acb_doc[] = "...";
char taskinfo_doc[] = "get the ASCB, TCB and TCBTTIME "; 
....
static struct PyMethodDef console_methods[] = { 
    {"acb", (PyCFunction)console_acb, METH_KEYWORDS | METH_VARARGS, console_acb_doc}, 
    {"taskinfo", (PyCFunction)console_taskinfo, METH_KEYWORDS | METH_VARARGS, taskinfo_doc},    
    {"cancel", (PyCFunction)console_cancel, METH_KEYWORDS | METH_VARARGS, "Cancel the subtask"},     
    {NULL, (PyCFunction)NULL, 0, NULL}        /* sentinel */ 
    }; 

A C function

This C function is passed the positional variable object and the keyword object, because of the “METH_KEYWORDS | METH_VARARGS” specified in the methods above.

static PyObject *console_taskinfo(PyObject *self, PyObject *args, PyObject *keywds ) { 
  PyObject *rv = NULL;  // returned object 
  ... 
  // build the return value
  rv = Py_BuildValue("{s:s,s:s,s:s,s:l}", 
          "jobname",jobName, 
          "ascb",  cASCB, 
          "tcb",   cTCB, 
          "tcbttime", ttimer); 
  if (rv == NULL) 
  { 
    PyErr_Print(); 
    PyErr_SetString(PyExc_RuntimeError,"Py_BuildValue in taskinfo"); 
    printf(" Py_BuildValue error in taskinfo\n"); 
  } 
  return rv; 
}

Passing parameters from Python to the C functions.

In Python you can pass parameters as positional variables or as a list of name=value.

Passing a list of variables.

For example in a Python program, call an external function where two parameters are required.

result  = zconsole.acb(ccp,[exit_flag]) 

In the function definition use

static struct PyMethodDef console_methods[] = {
{"acb", (PyCFunction)console_acb, METH_VARARGS, console_acb_doc},
{NULL, (PyCFunction)NULL, 0, NULL} /* sentinel */
};

and use

static PyObject *console_acb(PyObject *self, PyObject *args) {
  PyObject * method;
  PyObject * p1 = NULL;
  if (!PyArg_ParseTuple(args,"OO", // two objects
          &method, // function
          &p1 )// parms
      )
  {
   PyErr_SetString(PyExc_RuntimeError,"Problems parsing parameters");
   return NULL;
  }
...
}

Using positional and keyword parameters

For example in a Python program

rc = zconsole.console2("from console2",routecde=14) 

“from console2” is a positional variable, and routecde=14 is a keyword variable.

The function definition must include the METH_KEYWORDS parameter to be able to process the keywords.

{"console2", (PyCFunction)console_console2,
              METH_KEYWORDS |   METH_VARARGS, 
              console_put_doc},

The C function needs

static PyObject *console_console2(PyObject *self, 
                                  PyObject *args, 
                                  PyObject *keywds 
                                 ) {
...
}

You specify a list of keywords (which must include a keyword for the positional parameters)

static PyObject *console_console2(PyObject *self, 
                                  PyObject *args, 
                                  PyObject *keywds 
                                 ) {
  char * p = "";
  // s# with a Py_ssize_t length needs PY_SSIZE_T_CLEAN defined before #include <Python.h>
  Py_ssize_t lMsg = 0;
  // preset these
  int desc = 0;
  int route = 0;
  static char *kwlist[] = {"text","routecde","descr", NULL};
  // parse the passed data
  if (!PyArg_ParseTupleAndKeywords(args, keywds, 
       "s#|$ii", 
        kwlist,
        &p ,    // s  message text
        &lMsg , // #  length of the message text
        &route, // i  routecde, an integer
        &desc   // i  descr, an integer
  )) 
  {
    // there was a problem
    return NULL;
  }
  ...
}

In the static char *kwlist[] = {"text","routecde","descr", NULL}; you must specify a name for the positional data (text).

In the format specification above

  • s says this is a string
  • # save the length
  • | the following are optional
  • $ end of positional; the parameters after this can only be passed as keywords
  • i an integer parameter
  • i another integer parameter.

You should initialise all variables to a suitable value, because a variable gets a value only if the relevant keyword (or positional) parameter is specified.
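
If it helps to think of it in Python terms, the format "s#|$ii" with the keyword list above behaves much like the function signature below. This is a sketch of the calling convention only; the body is a dummy which just returns what it was given.

```python
def console2(text, *, routecde=0, descr=0):
    """Dummy with the same calling convention as the "s#|$ii" format."""
    # text is required; routecde and descr are optional and keyword-only,
    # and keep their preset value of 0 unless the caller passes them.
    return {"text": text, "length": len(text),
            "routecde": routecde, "descr": descr}

print(console2("from console2", routecde=14))

# Passing routecde positionally is rejected, as it is after the $ in the format
try:
    console2("from console2", 14)
except TypeError as e:
    print("TypeError:", e)
```

The defaults in the signature play the same role as the preset values of the C variables.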

How do I print an object from C?

If you have got an object (for example keywds), you can print it from a C program using

PyObject_Print(keywds,stderr,0);

Advanced configuration

You can configure additional information for example create a special exception type just for your code. You can create this and use it within your C program.

#define Py23Text_FromString PyUnicode_FromString  // converts C char* to Py3 str 
static PyObject *ErrorObj; 

PyMODINIT_FUNC PyInit_zconsole(void) { 
  PyObject *m, *d; 
                                                                                                   
  /* Create the module and add the functions */ 
  m = PyModule_Create(&console_module); 
                                                                                                   
  /* Add some symbolic constants to the module */ 
  d = PyModule_GetDict(m); 
                                                                                                   
  PyDict_SetItemString(d, "__doc__", Py23Text_FromString(console_doc)); 
  PyDict_SetItemString(d,"__version__", Py23Text_FromString(__version__)); 
  ErrorObj = PyErr_NewException("console.error", NULL, NULL); 
  PyDict_SetItemString(d, "console.error", ErrorObj); 
                                                                                                   
return m; 
} 

  • The d = PyModule_GetDict(m) returns the dict object for the module (you can see what is in the dict by using print(dir(zconsole)))
  • PyDict_SetItemString(d, "__doc__", Py23Text_FromString(console_doc)); creates a Unicode string from the console_doc string, and adds it to the dict with name “__doc__”
  • It also adds an entry for the version.
  • You could also define constants that the application might use.
  • The ErrorObj creates a new exception called “console.error”. It is added to the dict as “console.error”. This can be used to report a function specific error. For example
    • PyErr_Format(ErrorObj, "%s wrong size. Given: %lu, expected %lu", name, (unsigned long)given, (unsigned long)expected); return NULL;
    • PyErr_SetString(ErrorObj, "No memory for message"); return NULL;
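
From the Python side, a module specific exception is caught like any other. The sketch below builds a pure Python equivalent of the exception so it can be run without zconsole; the put function and its length check are hypothetical and only there to raise the error.

```python
# Python-level equivalent of PyErr_NewException("console.error", NULL, NULL):
# a class called "error" which reports its module as "console".
ConsoleError = type("error", (Exception,), {"__module__": "console"})

def put(message, expected=8):
    """Hypothetical function which checks the size of its input."""
    if len(message) != expected:
        raise ConsoleError("%s wrong size. Given: %d, expected %d"
                           % ("message", len(message), expected))
    return 0

try:
    put("too long for the field")
except ConsoleError as e:
    print("caught", type(e).__module__ + "." + type(e).__name__, "-", e)
```

Having its own exception type means a caller can tell errors from this module apart from a general RuntimeError.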

How do I compile the C code?

I used a shell script to make it easier to compile. The setup3.py does any Python builds.

touch /u/tmp/console/console.c 
# generate the assembler stuff (outside of setup)
as          -d cpwto.o cpwto.s 
as -a       -d qedit.o qedit.s 1> qedit.lst 
export _C99_CCMODE=1 
python3 setup3.py build bdist_wheel 1>a 2>b 
# copy the module into the Python Path
cp ./build/lib.os390-27.00-1090-3.8/console/zconsole.so .
# display the output.  b should be empty 
oedit a b 

The setup3 Python program is in several logical parts

# Basic imports
import setuptools 
from setuptools import setup, Extension 
import sysconfig 
import os 
os.environ['_C89_CCMODE'] = '1' 
from setuptools.command.build_ext import build_ext 
version = '1.0.0' 

Override the build – so unwanted C compile options can be removed

class BuildExt(build_ext): 
    def build_extensions(self): 
        # show the compile command before changing it
        print(self.compiler.compiler_so) 
        if '-fno-strict-aliasing' in self.compiler.compiler_so: 
            self.compiler.compiler_so.remove('-fno-strict-aliasing') 
        if '-Wa,xplink' in self.compiler.compiler_so: 
            self.compiler.compiler_so.remove('-Wa,xplink') 
        if '-D_UNIX03_THREADS' in self.compiler.compiler_so: 
            self.compiler.compiler_so.remove('-D_UNIX03_THREADS') 
        super().build_extensions() 

setup(name = 'console', 
    version = version, 
    description = 'z/OS console interface. Put, and respond to modify and stop request', 
    long_description= 'provide interface to z/OS console', 
    author='Colin Paice', 
    author_email='colinpaice3@gmail.com', 
    platforms='z/OS', 
    package_dir = {'': '.'}, 
    packages = ['console'], 
    license='Python Software Foundation License', 
    keywords=('z/OS console modify stop'), 
    python_requires='>=3', 
    classifiers = [ 
        'Development Status :: 4 - Beta', 
        'License :: OSI Approved :: Python Software Foundation License', 
        'Intended Audience :: Developers', 
        'Natural Language :: English', 
        'Operating System :: OS Independent', 
        'Programming Language :: C', 
        'Programming Language :: Python', 
        'Topic :: Software Development :: Libraries :: Python Modules', 
        ], 
        cmdclass = {'build_ext': BuildExt}, 
    ext_modules = [Extension('console.zconsole',['console.c'], 
        include_dirs=["//'COLIN.MQ930.SCSQC370'","."], 
        extra_compile_args=["-Wc,ASM,SHOWINC,ASMLIB(//'SYS1.MACLIB')", 
              "-Wc,LIST(c.lst),SOURCE,NOWARN64,XREF","-Wa,LIST,RENT"], 
        extra_link_args=["-Wl,LIST,MAP,DLL","/u/tmp/console/qedit.o", 
                                            "/u/tmp/console/cpwto.o", 
        ], 
       )] 
   ) 
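
The option filtering done in the build override can be tried on its own. This is a minimal sketch: strip_flags is a hypothetical helper (it is not part of setuptools), and the command line is made up for illustration, the real one being whatever self.compiler.compiler_so contains on your system.

```python
import sysconfig

def strip_flags(command, unwanted):
    """Return a copy of a compiler command line without the unwanted options."""
    return [opt for opt in command if opt not in unwanted]

# Illustrative command line only
command = ["xlc", "-fno-strict-aliasing", "-Wa,xplink", "-D_UNIX03_THREADS", "-O"]
unwanted = {"-fno-strict-aliasing", "-Wa,xplink", "-D_UNIX03_THREADS"}
print(strip_flags(command, unwanted))

# The compile options Python itself was built with (can be None on some platforms)
print(sysconfig.get_config_var("CFLAGS"))
```

Building a new list avoids changing a list while testing it, and an option which is not present is simply ignored.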

This code…

  • The cmdclass = {'build_ext': BuildExt}, statement tells it to use the class I had defined.
  • Uses the C header files from MQ, using dataset COLIN.MQ930.SCSQC370
  • The C program uses the __asm__ statement to create inline assembler code. The macro libraries for this assembler source are defined with ASMLIB(//'SYS1.MACLIB').
  • The C listing is put into c.lst.
  • The bind options are LIST,MAP,DLL
  • The generated binder statement is /bin/xlc build/temp.os390-27.00-1090-3.8/console.o -o build/lib.os390-27.00-1090-3.8/console/zconsole.so….
  • The binder statements used, which include the assembler modules generated in the shell script, are

INCLUDE C8920
ORDER CELQSTRT
ENTRY CELQSTRT
INCLUDE './build/temp.os390-27.00-1090-3.8/console.o'
INCLUDE '/u/tmp/console/qedit.o'
INCLUDE '/u/tmp/console/cpwto.o'
INCLUDE '/usr/lpp/IBM/cyp/v3r8/pyz/lib/python3.8/config-3.8/libpython

Some gotchas to look out for

The code is generated as 64 bit code.

Check you have the correct variable types.

You may get messages about conversion from 64 bit values to 31 bit values.

The code is generated with the ASCII option.

This means

printf("Hello world\n"); will continue to work as expected, but a blank is 0x20, not 0x40, and so on, so be careful when using hexadecimal values.

Pass a character string between Python and z/OS

You will have to convert from ASCII to EBCDIC, and vice versa when going back. For example copy the data from Python into a program variable, then convert and use it.

memcpy(pOurBuffer,pPythonData,lPythonData);
__a2e_l(pOurBuffer,lPythonData) ;
...

Returning character data from z/OS back to Python

If the data is in a z/OS field (rather than a variable in your program), you will need to copy it and convert it. You can pass a null terminated string back to Python, or specify a length.
The code below uses Py_BuildValue to create a Python dictionary {"jobname": "COLINJOB"}.

memcpy(&returnJobName,zOSJobName,8); 
returnJobName[8] = 0; // null terminator
__e2a_l(&returnJobName,8);
rv = Py_BuildValue("{s:s}","jobname",returnJobName); // null terminated
// or
rv = Py_BuildValue("{s:s#}","jobname",returnJobName,8); // specify length
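
You can see the difference between the two encodings from Python itself. This sketch assumes the cp037 codec as the EBCDIC code page, because it is in the Python standard library; z/OS commonly uses IBM-1047, which differs from it in a few characters (square brackets, for example), so treat it as an illustration rather than an exact match.

```python
# cp037 is a stand-in for the EBCDIC code page used on your system
EBCDIC = "cp037"

print(" ".encode("ascii").hex())   # a blank in ASCII
print(" ".encode(EBCDIC).hex())    # a blank in EBCDIC

# A z/OS field holding an 8 character job name in EBCDIC (illustrative value)
zos_job_name = "COLINJOB".encode(EBCDIC)

# The equivalent of __e2a_l: EBCDIC bytes become a Python string
job_name = zos_job_name.decode(EBCDIC)
print({"jobname": job_name})

# The equivalent of __a2e_l: text going back to z/OS is encoded again
back_again = job_name.encode(EBCDIC)
print(back_again.hex())
```

The first two lines print 20 and 40, which is the blank example from the gotchas above.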

Python creating a callback for an asynchronous task in an external function.

At the high level, I wanted a Python program running as a started task on z/OS to be able to catch operator requests, such as shutdown. I thought the solutions I came up with were a little complex for what I wanted, then I saw an example of using callback which did “After a period of time, invoke this function with these parameters”. Could this be adapted to provide “call this Python function when an operator issues a command to the started task?” As usual it got me into areas I was unfamiliar with, but the answer is yes it can be adapted.

Background

The interface for an application to be notified of an operator request is the z/OS QEDIT interface. There is an Event Control Block (ECB) which gets posted when there is data for it. An application can wait on this ECB.

There are several approaches that can be taken for a (Python) program

  • Have an application loop round checking the ECB to see if it has been posted. If it has been posted, issue a WAIT on the ECB, which will wake up immediately; get the message and return. This would work, but how long do you wait between loops? The smaller the time, the more frequently you scan, and so use up more CPU.
  • Have a thread which waits to be posted. The thread wakes up and notifies the application.
    • Python has an ASYNC interface where applications can multitask on one thread. The code has to be well behaved. It has to give up control to the main thread when it has no work to do. If the (single) thread does an operating system wait, all work stops until the wait completes. This approach will not work as the thread has to wait for the ECB.
    • Use a thread from the Python thread pool. You can get a thread from Python, which can wait for the ECB. This thread has to be well behaved and release the Global Interpreter Lock (GIL) (which controls Python multi programming). An application can only update Python data when it has the GIL. It prevents problems with concurrent access to fields.
    • Use a thread which is not from the Python thread pool. This thread can call back into Python and run a function.

This blog post is about the last item in the above list; using a thread which is not in the Python thread pool, to call back into a function in the main Python program.

High level view of the program

There are several “moving parts” to the program.

  • A Python external function which is passed the Python function and any parameters for the function. This external function creates a z/OS thread and passes the Python function name and its parameters to the thread.
  • Register a Python shutdown clean-up exit, to wake up or cancel the async thread when the Python program finishes.
  • The C program which runs as an independent thread (also known as a subtask or TCB). It registers the thread with Python (and gets the GIL lock) then loops:
    • Release the GIL lock
    • Waits for the QEDIT ECB to be posted
    • Get the GIL lock
    • Builds the parameter list of the data received
    • Calls the python function passing the original data, and the received data
  • The Python function is passed the original parameters, and the data from the request. The Python function can add data to a queue, update Python variables and can set a Python event. The main task can wait on this event, and so process the requests when they come in.

The main python program

The handle = zconsole.acb(ccp_cb,[exit_event]) creates the async thread, and returns a handle. The handle is used to cancel the outstanding wait.

There is code to update variables in a thread safe manner by using a threadlock.

An event is used to signal completion.

import threading
import zconsole as zconsole 
...
stop = 0            # set to 1 by the callback to end processing
global_counter = 0  # incremented by the callback
# This is the callback function which gets control from the C program
def ccp_cb(args,QEDIT_data) : 
      global stop   # set this to 1 to end processing
      global global_counter  # increment this 
      parms = args[1] # [functionName,[parms]] 
      e  = parms[0]   # event 
      with threadLock: 
          global_counter += 1 
      print("qedit",QEDIT_data) # display what we received
      if QEDIT_data["verb"] == "Stop": 
         stop = 1 
      e.set() # post event - wake up main 
###############################################
threadLock = threading.Lock() # for serialisation of updates
exit_event = threading.Event() # for event processing
# wait for up to 30 seconds at most 4 times
# initiate the console wait, using Asynchronous CallBack
handle = zconsole.acb(ccp_cb,[exit_event]) 
# This returns a handle.
for i in range(4):  # at most 4 times
   exit_event.wait(timeout=30) # set 30 seconds time out                                          
   if not exit_event.is_set(): # we timed out
       break 
   exit_event.clear() # reset, so the next wait blocks until the next request
   print("GlobalCounter",global_counter) 
   print("stop",stop) # debug info
   if stop == 1: 
      break 
print("after stop ",stop)
zconsole.cancel(handle) #  stop the async task  
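
The callback logic can be exercised without z/OS by replacing the C thread with an ordinary Python thread. This is a sketch under assumptions: fake_console is a hypothetical stand-in for zconsole.acb, the "Modify" verb and the data values are made up for illustration, and the callback is passed the list containing the event directly.

```python
import threading
import time

stop = 0
global_counter = 0
thread_lock = threading.Lock()
exit_event = threading.Event()

def ccp_cb(args, qedit_data):
    """Callback with the same shape as the one passed to the external function."""
    global stop, global_counter
    event = args[0]
    with thread_lock:
        global_counter += 1
    if qedit_data["verb"] == "Stop":
        stop = 1
    event.set()              # wake up the main thread

def fake_console(callback, args, requests):
    """Stand-in for the C thread: 'receive' each request, then call back."""
    for request in requests:
        time.sleep(0.1)      # pretend to wait for the operator
        callback(args, request)

requests = [{"verb": "Modify", "data": "hello"}, {"verb": "Stop", "data": ""}]
worker = threading.Thread(target=fake_console,
                          args=(ccp_cb, [exit_event], requests))
worker.start()

while stop == 0:
    if not exit_event.wait(timeout=5):   # False means the wait timed out
        break
    exit_event.clear()                   # ready for the next request
worker.join()
print("requests seen", global_counter, "stop", stop)
```

The callback sets stop before it sets the event, so the main loop sees the stop request even if two requests arrive close together.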

The external function zconsole.acb(function,[parms])

The external acb (asynchronous call back) function (written in C) has code

  • to read the parameters passed to the function
  • increment the use count of the python fields to prevent Python from freeing them. The async thread decrements the use-count
  • attaches a thread to run a program (called cthread).
...
pthread_t thid; 
PyObject * method = NULL; 
PyObject * parms  = NULL; 
// get the data as objects
if (!PyArg_ParseTuple(args,"OO",
    &method,   // function
    &parms ))  // parms
{  /// problem?
    return NULL;
} 
...
// zargs is used to hold the parameters
zargs -> method = method; 
zargs -> parms  = parms; 
// the following are decremented within the Async thread
Py_INCREF(zargs -> parms);  /* Prevent it from being deallocated. */ 
Py_INCREF(zargs -> method);/* Prevent it from being deallocated. */
// create the thread
rc = pthread_create(&thid, NULL, cthread, zargs);  
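
The effect of the extra reference can be watched from Python with sys.getrefcount. This is only an analogy, on the assumption that storing the object in a dict is a fair stand-in for Py_INCREF in the C code; the names are made up for illustration.

```python
import sys

callback_parms = [object()]                  # something the caller passes in
before = sys.getrefcount(callback_parms)

held_by_thread = {"parms": callback_parms}   # plays the part of Py_INCREF
after = sys.getrefcount(callback_parms)
print("references added:", after - before)

del callback_parms                   # the caller drops its reference...
print(held_by_thread["parms"])       # ...but the object is still alive

held_by_thread.clear()               # plays the part of Py_DECREF
```

Without the extra reference, the object could be freed as soon as the caller had finished with it, while the async thread was still going to use it.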

The async C thread to process the QEDIT data

This program

  • is passed the parameter list containing the Python function Object, and the Python function parameter list object
  • releases the GIL
  • executes the assembler program which waits on the QEDIT ECB
  • when this returns, it gets the GIL
  • builds a dictionary of parameters (“name”:”value”,…) from the QEDIT data
  • calls the Python function passing the function object, the parameters passed to the external function, and the dictionary of parameters from the operator request (from QEDIT).

void * cthread(void *_arg) { 
  struct thread_args * zargs  = (struct thread_args *) _arg ; 
                                                                           
  PyGILState_STATE gstate; 
  PyObject *rv = NULL;  // parameters built for the Python function 
  PyObject *x  = NULL;  // object returned by the Python function 
  char * ret  = 0; 
  long  rc; 
  int stop = 0; 
  rc = 0; 

  // register this thread to Python (this also gets the GIL)
  gstate = PyGILState_Ensure(); 
  for (;;) {

    Py_BEGIN_ALLOW_THREADS   // release the GIL while we wait
    //   QEDIT waits to be posted and returns the data    
    rc = QEDIT( pMsg);  // assembler function
    // get the GIL and stop any other work
    Py_END_ALLOW_THREADS 
    ...
    // convert console name from EBCDIC to ASCII
    __e2a_l(  pCIBX ->consolename ,8 ); 
    // build the parameter list to pass to Python function
    // rc is a long, so it needs l rather than i
    rv = Py_BuildValue("{s:l,s:s,s:s#,s:s#,s:y#,s:y#,s:y#}", 
           "rc", rc, 
           "verb", pVerb, 
           "data",&(pCIB -> data[0]),lData, 
           "console",&(pCIBX -> consolename),l8, 
           "cart",&(pCIBX -> CART),l8, 
           "consoleid",&(pCIBX -> consoleid),l8, 
           "oconsoleid",&(pCIBX -> consoleid),l8); 
                                                                     
    Py_INCREF(rv);    /* Prevent it from being deallocated. */ 
    //  Call the Python function
    x = PyObject_CallFunctionObjArgs( zargs -> method,zargs -> a1,rv      , NULL); 

    if ( x != NULL) 
       Py_DECREF(x);    /* Finished with the returned object. */ 
    if (stop >0) 
    { 
      //printf("Stop found - cthread exiting \n"); 
      break; 
    } 
  } // end of main loop
  if ( zargs -> a1  != NULL) 
    Py_DECREF(zargs -> a1);    /* Allow it to be deallocated. */ 
  if ( zargs -> method  != NULL) 
    Py_DECREF(zargs -> method);  /* Allow it to be deallocated. */ 
  // every PyGILState_Ensure needs a matching PyGILState_Release
  PyGILState_Release(gstate); 
  pthread_exit(ret); 
  return 0; 
}
                                                                       

Ending the thread

A thread running asynchronously needs to end when the caller ends. If it stays running you will get a system abend A03.

You have a choice

  • Pass a “shutdown ECB” to the thread, and have the thread wait on an ECBLIST (shutdown ECB, and QEDIT ECB). The high level application can then post this ECB. I had an external function zconsole.cancel(handle). This got the address of the ECB from the parameter, and posted it.
  • Cancel the thread. I had an external function zconsole.cancel(…). This was passed the thread-id, and issued pthread_cancel(thread-id). In the end I used the shutdown ECB as it was cleaner.
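
The “shutdown ECB” idea has a direct equivalent in pure Python, which may help to show why it is the cleaner option. In this sketch a queue plays the part of the ECBLIST and a sentinel object plays the part of the shutdown ECB; the names and the operator request text are assumptions for illustration.

```python
import queue
import threading

SHUTDOWN = object()          # sentinel, playing the part of the shutdown ECB
requests = queue.Queue()     # playing the part of the QEDIT ECB
handled = []

def worker():
    while True:
        item = requests.get()    # blocks until either kind of "post"
        if item is SHUTDOWN:
            break                # leave cleanly, with nothing left held
        handled.append(item)

def cancel(thread):
    """Ask the worker to end, and wait until it has."""
    requests.put(SHUTDOWN)
    thread.join(timeout=5)

t = threading.Thread(target=worker)
t.start()
requests.put("F COLINJOB,hello")     # illustrative operator request
cancel(t)
print("handled", handled, "still running", t.is_alive())
```

The thread decides for itself where it ends, so it never stops part way through an update or while holding a lock.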

I found it best to use a class for my thread, and register for a function to be called at the Python program shutdown.

For example

import atexit
class console: 
    handle = None 
    def __init__(self,a): 
       print("console.__init__",a) 
    def cb(self,a,b): 
       # call the function to create the async task
       # and return the handle
       self.handle =  zconsole.acb(a,b) 
       # register cleanup for shutdown 
       atexit.register(self.cleanup,self.handle) 
                                                                                                    
    def cleanup(self,handle): 
       print("IN CLEANUP") 
       if handle is not None: 
          zconsole.cancel(handle) 

This says that when the cb function is called to set up the callback, the cleanup routine for this object is added to the list of “shutdown” activities. The cleanup function tells the async thread to shut down.

How do you know the thread has ended?

You can use code like pthread_cleanup_push and pthread_cleanup_pop to call an ending function. This function is called when the thread:

  • Calls pthread_exit()
  • Does a return from the start routine
  • Is cancelled because of a pthread_cancel()

In your cleanup routine you need to check for locks and other resources owned by the thread, and release them.

PyGILState_STATE gstate; // referred to from cthread and cleanup
void cleanup(void * arg)
{
   printf("Thread was cancelled!\n\n"); 
   int s = PyGILState_Check();
   printf("chthread Python latch %d\n",s);
   // release the lock if we have it
   if (s)      
      PyGILState_Release(gstate);
}
void * cthread(void *_arg) {
  pthread_cleanup_push(cleanup,NULL);
  struct thread_args * tA = (struct thread_args *) _arg ;
  ...

  pthread_cleanup_pop(0);
  pthread_exit(ret);

}


Why adding a printf caused my program to hang

Or “how to cancel a pthread safely; and reverse time”

I was doing some work with external Python functions, and attaching a subtask to intercept operator requests. It was very frustrating when I added a printf to the C program to provide diagnostic information, and the program did not produce any output, even from a previous printf (spooky). Remove the printf and it worked, including the earlier print("Starting") before my new printf.

After a couple of days, and some long walks I found out the reason why. It was all down to my lack of knowledge about what is available with pthreads, and locking.

Python has a lock to serialise work. While a thread has this lock, no other thread can do any Python work.

An attached thread can be configured as to how it responds to a cancel request. For example you may not want to cancel the thread in the middle of a critical update, for example while holding a lock.

By default it looks like threads are non-cancellable unless you allow for it: a cancel request is only acted on at specific points in the thread's code (see below).

When I ran my job, there was an abend A03: “A task tried to end normally by issuing a RETURN macro or by branching to the return address in register 14. The task was not ready to end processing because …: The task had attached one or more subtasks that had not ended.”

The task needs to be told to shut down, or to respond to a request to cancel the thread.

Creating a thread

#define _OPEN_THREADS 2 
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
struct thread_args {
   PyObject *method;
   ...
   }; 
//create a structure to pass parameters to the thread.
struct thread_args *zargs = malloc (sizeof (struct thread_args));
zargs -> method = method;
...
pthread_t thid; 
int rc; 
// invoke pThread to create thread and pass the parms through 
rc = pthread_create(&thid, NULL, cthread, zargs); 
if (rc != 0) { 
  printf("pthread rc %d \n", rc); 
  perror("pthread_create() error"); 
} 

To cancel a thread

The short answer to how to cancel a thread is

rc = pthread_cancel(thid);
if ( rc != 0) 
{
   perror("Trying to cancel the thread");
}

Return code 0 means the request to cancel the thread was successfully issued, but it does not necessarily mean the thread has been cancelled, because the thread could be set as non-cancellable.

Within the thread program.

You can configure the program running as a thread to be cancellable:

  • Not cancellable
  • Cancellable
    • At this point (the initial setting, see below)
    • At any time.
    • Not between these instructions

To make a thread non cancellable

int previous = pthread_setintr(PTHREAD_INTR_DISABLE);

You can use the returned variable to reset the status with pthread_setintr(previous).

To make a thread cancellable at this point

Set up the thread. Do pthread_setintrtype before pthread_setintr to eliminate a timing window.

// Specify how it is interruptible, any time, or controlled
if (pthread_setintrtype(PTHREAD_INTR_CONTROLLED ) == -1 )
{ perror("error setting pthread_setintrtype");… }

// Say it is interruptible
int previous = pthread_setintr(PTHREAD_INTR_ENABLE);

The initial values are

  • pthread_setintrtype is PTHREAD_INTR_CONTROLLED (0)
  • pthread_setintr is PTHREAD_INTR_ENABLE(0)

So you may not need to use the pthread_setintr* functions.

The thread needs an “interruptible” function.

The documentation says

PTHREAD_INTR_CONTROLLED:
The thread can be cancelled, but only at specific points of execution. These are:

  • When waiting on a condition variable, which is pthread_cond_wait() or pthread_cond_timedwait()
  • When waiting for the end of another thread, which is pthread_join()
  • While waiting for an asynchronous signal, which is sigwait()
  • When setting the calling thread’s cancel-ability state, which is pthread_setintr()
  • Testing specifically for a cancel request, which is pthread_testintr()
  • When suspended because of POSIX functions or one of the following C standard functions: close(), fcntl(), open(), pause(), read(), tcdrain(), tcsetattr(), sigsuspend(), sigwait(), sleep(), wait(), or write().

In my thread I had used the interruptible function pthread_testintr().

printf("before testcancel\n");
pthread_testintr() ;
printf("after testcancel\n");

When my code was running I had

before testcancel
after testcancel

before testcancel
after testcancel

pthread_cancel() was issued and the output was

before testcancel

So we can see the code was behaving as expected, and was cancelled inside the pthread_testintr() function.

To make a thread cancellable at any time

if (pthread_setintrtype(PTHREAD_INTR_ASYNCHRONOUS ) == -1 )
{ perror("error setting pthread_setintrtype");… }
int previous = pthread_setintr(PTHREAD_INTR_ENABLE);

If you are using this you need to design the code so the thread has no locks or mutexes. These will not be released automatically.

To make a thread not cancellable between these instructions

pthread_setintrtype(PTHREAD_INTR_ASYNCHRONOUS);
pthread_setintr(PTHREAD_INTR_DISABLE);
// thread non cancellable

// get a lock
// do some work
// free a lock

pthread_setintr(PTHREAD_INTR_ENABLE);
// thread now cancellable any point after this

The pthread_setintr(PTHREAD_INTR_DISABLE) and pthread_setintr(PTHREAD_INTR_ENABLE) calls protect the non cancellable code.

The pthread_setintrtype(PTHREAD_INTR_ASYNCHRONOUS) says that outside of the non-cancellable code it can be cancelled at any point when interrupts are enabled.
Instead you could use pthread_setintrtype(PTHREAD_INTR_CONTROLLED ) and pthread_testintr(), to make your code interruptible at a specific point.

It is not spooky.

When running my code, I initially had it running so it was interruptible anywhere.

What was happening was

  • get python lock
  • get interrupted. Thread ends

By adding a printf to my code, it changed where the thread was interrupted. With the printf – it was interrupted while the Python lock was held, the thread was cancelled with the lock still held, and no other Python work ran.

Without the additional printf, the thread ended at a point where the Python lock was not held.

By putting the pthread_ calls around the code with the lock I could make sure the lock was released before the thread ended.

Spooky lack of printing

The Python program had used print("starting"), but this was written to the print buffers; it was not forced out to disk.

When I used Python print("starting",flush=True) the data was forced out before progressing.

The C equivalent is fflush(stdout);
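
For diagnostic messages I find it easiest to wrap this up, so that no message can sit in a buffer. A minimal sketch, where log is a hypothetical helper name:

```python
import sys

def log(message):
    """Print a diagnostic line and push it out of the buffer straight away."""
    print(message, flush=True)

log("starting")

# The same effect in two steps
print("still running")
sys.stdout.flush()
```

If the program then hangs or is cancelled, the messages written so far have already left the buffer.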

Overall – not spooky at all, just a lack of understanding.

Why is Ubuntu running out of space? It is /var/log/journal/…

Low Disk space on “Filesystem root”

I’ve been getting this message more frequently – and I’ve found out why.

It could be

  • the system journal file
  • snap files in the cache
  • stuff in /tmp

You may get this message during installation of a large set of packages. Packages get unpacked into a temporary file – which is deleted afterwards, so you get a temporary hump in usage.

System journal file

There is a “systemd journal file” with content like

Jul 12 15:50:09 colinpaice rtkit-daemon[1385]: Successfully made thread 2682 of process 2540 owned by ‘1000’ RT at priority 10.
Jul 12 16:44:41 colinpaice rtkit-daemon[1385]: Supervising 5 threads of 3 processes of 1 users.
Jul 12 16:45:01 colinpaice CRON[7075]: pam_unix(cron:session): session opened for user root by (uid=0)
Jul 12 16:45:01 colinpaice CRON[7076]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 12 16:45:01 colinpaice CRON[7075]: pam_unix(cron:session): session closed for user root
Jul 12 15:58:32 colinpaice kernel: irq_thread+0xda/0x170

This goes back to when I first installed Ubuntu about 4 years ago, but I think a month’s worth of data would be enough.

You can display the disk space used by using

sudo journalctl --disk-usage

and display the contents of the file using

sudo journalctl -n 50 |less

Note: Without sudo you get the userid’s log size… with sudo you get total log size.

The log file is in /var/log/journal/ and was 1.4 GB in size. The size of this file is controlled by the /etc/systemd/journald.conf configuration file. I edited this file (using sudo gedit /etc/systemd/journald.conf).

  • I uncommented SystemMaxFileSize and gave it a value of 500M.
  • I uncommented SystemMaxFiles and gave it a value of 10

You can either reboot, or use

service systemd-journald restart

to restart the systemd journal.

Although I set the value to 500M, after the journal was restarted – it had size 100MB!

I think 100MB is plenty big enough, and I get a lot of disk space back.

Snap files in the cache

sudo du -hs /var/lib/snapd/cache/

gives you the space used.

I then used

sudo rm -r /var/lib/snapd/cache/

Other rubbish

The disk usage analyser gives you a picture of all the space on a file system. Click on “Show Applications” and select Disk Usage Analyser
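
If you do not have a graphical desktop, a few lines of Python give a rough equivalent. This is a sketch only: the function names are made up, /var/log is just an example starting point, and without sudo any directory you cannot read is counted as empty.

```python
import os

def dir_size(path):
    """Total size in bytes of the regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass                 # file vanished or is not readable
    return total

def biggest(parent, count=5):
    """The largest immediate subdirectories of parent, biggest first."""
    sizes = []
    try:
        entries = list(os.scandir(parent))
    except OSError:
        return sizes                 # parent missing or not readable
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:count]

for size, path in biggest("/var/log"):
    print(f"{size / 1e6:10.1f} MB  {path}")
```

Run it with sudo to get the totals for directories such as /var/log/journal which an ordinary userid may only partly see.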