Creating a C external function for Python, an easier way

I wrote about my first steps in creating a C extension in Python. Now I’ve got more experience, I’ve found an easier way of compiling the program and creating a load module. It is not the official way – but it works, and is easier to do!

The traditional way of building a package is to use the setup.py technique. I’ve found that just compiling it works equally well (and is slightly faster). You still need the setup.py for building Python source.

I set up a cp4.sh file

name=zos 
pythonSide='/usr/lpp/IBM/cyp/v3r8/pyz/lib/python3.8/config-3.8/libpython3.8.x' 
export _C89_CCMODE=1 
p1=" -DNDEBUG -O3 -qarch=10 -qlanglvl=extc99 -q64" 
# p2="-Wc,DLL -D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS"   (earlier variant, overridden by the next line)
p2="-D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS" 
p3="-D_OPEN_SYS_FILE_EXT                     -qstrict          " 
p4="-Wa,asa,goff -qgonumber -qenum=int" 
p5="-I//'COLIN.MQ930.SCSQC370' -I. -I/u/tmp/zpymqi/env/include" 
p6="-I/usr/lpp/IBM/cyp/v3r8/pyz/include/python3.8" 
p7="-Wc,ASM,EXPMAC,SHOWINC,ASMLIB(//'SYS1.MACLIB'),NOINFO " 
p8="-Wc,LIST(c.lst),SOURCE,NOWARN64,FLAG(W),XREF,AGG -Wa,LIST,RENT" 
/bin/xlc $p1 $p2 $p3 $p4 $p5 $p6 $p7 $p8  -c $name.c -o $name.o  -qexportall -qagg -qascii 
# l1="-Wl,LIST=ALL,MAP,XREF     -q64"   (earlier variant, overridden by the next line)
l1="-Wl,LIST=ALL,MAP,DLL,XREF     -q64" 
/bin/xlc $name.o  $pythonSide  -o $name.so  $l1 1>a 2>b 
oedit a 
oedit b 

This shell script creates a zos.so load module in the current directory.

You need to copy the output load module (zos.so) to a directory on the PYTHONPATH environment variable.
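A quick way to check that Python can see the load module after the copy is to ask the import machinery where it would load the name from. This is a minimal sketch: the helper name locate_module and the directory /u/colin/pylib are hypothetical and only there for illustration.

```python
import importlib
import importlib.machinery
import importlib.util
import sys


def locate_module(name, extra_dir=None):
    """Return the file Python would load for 'name', or None if it is not found."""
    if extra_dir is not None and extra_dir not in sys.path:
        # Same effect as adding the directory to PYTHONPATH before starting Python.
        sys.path.insert(0, extra_dir)
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# File name endings this interpreter accepts for load modules.
print(importlib.machinery.EXTENSION_SUFFIXES)
# "/u/colin/pylib" is a hypothetical directory that would hold zos.so.
print(locate_module("zos", "/u/colin/pylib"))
```

If the second print shows None, the directory is not on the search path, or the file name does not end in one of the listed suffixes.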

What do the parameters mean?

Many of the parameters I blindly copied from the setup.py script.

  • name=zos
    • This parametrizes the script, for example $name.c $name.o $name.so
  • pythonSide=’/usr/lpp/IBM/cyp/v3r8/pyz/lib/python3.8/config-3.8/libpython3.8.x’
    • This is the location of the Python side deck, which is used to resolve references to functions in the Python code
  • export _C89_CCMODE=1
    • This is needed to prevent the message “FSUM3008 Specify a file with the correct suffix (.c, .i, .s,.o, .x, .p, .I, or .a), or a corresponding data set name, instead of -o./zos.so.”
  • p1=” -DNDEBUG -O3 -qarch=10 -qlanglvl=extc99 -q64″
    • -O3 optimization level
    • -qarch=10 is the architectural level of the code to be produced.
    • -qlanglvl=extc99 says use the C99 language level with extensions. (For example, defining variables in the middle of a program, rather than only at the top.)
    • -q64 says make this a 64 bit program
  • p2=”-D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS”
    • The C #defines to preset
  • p3=”-D_OPEN_SYS_FILE_EXT -qstrict “
    • -qstrict Used to prevent optimizations from re-ordering instructions that could introduce rounding errors.
  • p4=”-Wa,asa,goff -qgonumber -qenum=int”
    • -Wa,asa,goff options for any assembler compiles (not used)
    • -qgonumber include C program line numbers in any dumps etc
    • -qenum=int use integer variables for enums
  • p5=”-I//’COLIN.MQ930.SCSQC370′ -I. -I/u/tmp/zpymqi/env/include”
    • Where to find #includes:
    • the MQ libraries,
    • the current working directory
    • the header files for my component
  • p6=”-I/usr/lpp/IBM/cyp/v3r8/pyz/include/python3.8″
    • Where to find #includes
  • p7=”-Wc,ASM,EXPMAC,SHOWINC,ASMLIB(//’SYS1.MACLIB’),NOINFO “
    • Support the use of __ASM().. to use inline assembler code.
    • Expand macros to show what is generated
    • List the data from #includes
    • If using __ASM__(…), where to find assembler copy files and macros.
    • Do not report information messages
  • p8=”-Wc,LIST(c.lst),SOURCE,NOWARN64,FLAG(W),XREF,AGG -Wa,LIST,RENT”
    • For C compiles, produce a listing in c.lst,
    • include the C source
    • do not warn about problems with 64 bit/31 bit
    • display the cross references (where used)
    • display information about structures
    • For Assembler programs generate a list, and make it reentrant
  • /bin/xlc $p1 $p2 $p3 $p4 $p5 $p6 $p7 $p8 -c $name.c -o $name.o -qexportall
    • Compile $name.c into $name.o ( so zos.c into zos.o) export all entry points for DLL processing
  • l1=”-Wl,LIST=ALL,MAP,DLL,XREF -q64″
    • bind parameters -Wl, produce a report,
    • show the map of the module
    • show the cross reference
    • it is a 64 bit object
  • /bin/xlc $name.o $pythonSide -o $name.so $l1 1>a 2>b
    • take the zos.o, the Python side deck and bind them into the zos.so
    • pass the parameters defined in l1
    • output the cross reference to a and errors to b
  • oedit a
    • This will have the map, cross reference and other output from the bind
  • oedit b
    • This will have any error messages – it should be empty

Notes:

  • -qarch=10 is the default
  • the -Wa are for when compiling assembler source eg xxxx.s
  • -qlanglvl=extc99. EXTENDED may be better than extc99.
  • it needs the -qascii to work with Python.

Python classes, objects, external functions and cleaning up.

I’ve been working on some code to be able to use z/OS datasets, and DD statements. It took me a while to understand how some bits of Python work.

I also did things such as open a file, allocate a 1MB buffer, and wondered how to close the file, and release the buffer to prevent a storage leak.

The Python import

The Python import makes external functions and classes available to a program. The syntax is like

import abc as xyz

x = xyz…..

abc can be

  • a file abc.py
  • a directory abc
  • a load module abc.so

I’ll focus on the load module.

The abc.so load module

This can define a function based approach, so you would use it like

fileHandle = zos.fopen("colin.c","rb")
data = zos.fread(fileHandle)
zos.fclose(fileHandle)

You can provide many functions. Some may return a “handle” object, such as fileHandle which is passed to other functions.

It can also be object based and the C load module external function creates a new type.

file = zos.fopen("colin.c","rb")
data = file.fread()
file.close()

The functions are associated with the object “file”, rather than the load module zos.

Internally the object is passed to the function.
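To illustrate the object being passed to the function, here is a small pure Python sketch. The class File is a hypothetical stand-in for the type a C load module would create; it does no real I/O.

```python
class File:
    """Hypothetical stand-in for a type created by a C load module."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode

    def fread(self):
        # 'self' is the object the method was called on.
        return f"data from {self.name}"


file = File("colin.c", "rb")

# These two calls are equivalent: Python passes 'file' as the first argument.
via_object = file.fread()
via_type = File.fread(file)
print(via_object == via_type)
```

The C code sees the same thing: each method of the type receives the object as its first parameter.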

Cleaning up

Within my code I had fileHandle = fopen(“datasetname”….), which allocated a 1MB buffer for the read function.

I also had fclose(fileHandle) where I closed the file and freed the buffer.

However I could also do

fileHandle = fopen("datasetname1"….)
fileHandle = fopen("datasetname2"….)
fileHandle = fopen("datasetname3"….)
fclose(fileHandle)

with no intermediate fclose(), which would lead to a storage leak, as the fclose routine was not called for the first two files.
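One way to avoid this pattern on the Python side is to wrap the open and close calls in a context manager, so the close always runs. This is only a sketch: fake_fopen and fake_fclose are hypothetical stand-ins for the load module functions, and the set open_handles stands in for the buffers held by the C code.

```python
import itertools
from contextlib import contextmanager

_next_handle = itertools.count(1)
open_handles = set()          # handles whose buffer has not been freed


def fake_fopen(name):
    """Hypothetical stand-in for the C fopen: allocate and return a handle."""
    handle = next(_next_handle)
    open_handles.add(handle)
    return handle


def fake_fclose(handle):
    """Hypothetical stand-in for the C fclose: release the handle."""
    open_handles.discard(handle)


@contextmanager
def opened(name):
    handle = fake_fopen(name)
    try:
        yield handle
    finally:
        fake_fclose(handle)   # runs even if the body raises


# The leaky pattern: the first two handles are never closed.
h = fake_fopen("datasetname1")
h = fake_fopen("datasetname2")
h = fake_fopen("datasetname3")
fake_fclose(h)
leaked = len(open_handles)

# The same three opens through the context manager.
open_handles.clear()
for name in ("datasetname1", "datasetname2", "datasetname3"):
    with opened(name) as h:
        pass
still_open = len(open_handles)
print(leaked, still_open)
```

This relies on the caller using the with statement, so it does not replace clean up in the C code, but it makes the common case safe.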

Using a class to call a function at exit

If you have a Python class for your data you can use

def cb(self, a, b):
    self.handle = zconsole.acb(a, b)
    atexit.register(self.clean_up, self.handle)

def clean_up(self, handle):
    # use the handle that was passed when the routine was registered
    if handle is not None:
        zconsole.cancel(handle)

When function cb is used, it registers with the “at exit” routine atexit, and says, “at exit” call my routine “clean_up”, and pass the handle.

At shutdown the clean_up routine is called once for every instance, and gives the cancel code a chance to clean up.
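The following self contained sketch shows the same idea in a form that can be run anywhere. FakeConsole is a hypothetical stand-in for the zconsole load module, and Session is a hypothetical class name. The clean up routine is written so that calling it twice is harmless, because it may be called explicitly and then again at exit.

```python
import atexit


class FakeConsole:
    """Hypothetical stand-in for the zconsole load module."""
    cancelled = []

    @classmethod
    def acb(cls, a, b):
        return (a, b)                     # pretend this is a handle

    @classmethod
    def cancel(cls, handle):
        cls.cancelled.append(handle)


class Session:
    def __init__(self):
        self.handle = None

    def cb(self, a, b):
        self.handle = FakeConsole.acb(a, b)
        # atexit keeps a reference to the bound method until shutdown.
        atexit.register(self.clean_up)

    def clean_up(self):
        if self.handle is not None:
            FakeConsole.cancel(self.handle)
            self.handle = None            # makes a second call a no-op


s = Session()
s.cb("a", "b")
s.clean_up()    # explicit early clean up
s.clean_up()    # a later call, such as the one at exit, does nothing
print(FakeConsole.cancelled)
```

Note that registering a bound method keeps the instance alive until the program ends, which may matter if many instances are created.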

Using a C external function and “functions”.

Within the C code of the external function is a PyModuleDef structure, which defines the module to Python.

As such there is no way to automatically get your clean up function to be called (and free my 1MB buffer).

However you can exploit the Python module state data. For example

struct {
myparm * ...
...
} myStatic;

static struct PyModuleDef zos_module = {
  PyModuleDef_HEAD_INIT,
  "zos",
  zos_doc,
  sizeof(myStatic),
  zos_methods, // the functions (methods)
  NULL, // Multi phase init. NULL -> single
  NULL, // Garbage collection traversal
  zos_clear, // Garbage collection clear
  zos_free // Garbage collection free
};

The block of state data is allocated for you, and you can issue the PyModule_GetState(PythonModule) function to get access to this block.

You could chain your data from the state data, perhaps in a linked list.

When the clean up occurs, your “zos_free” routine will be called, and you can free all the storage you allocated and clean up.

For example

PyMODINIT_FUNC PyInit_zos(void) {
  PyObject *d;

  /* Create the module  */
  mzos = PyModule_Create(&zos_module);
  // get the state data and initialise it
  state * pState = (state * )  PyModule_GetState(mzos);
  memcpy(pState -> eyec,"state   ",8);
  ...

  PyDict_SetItemString(d, "__doc__", Py23Text_FromString(zos_doc));
  PyDict_SetItemString(d,"__version__", Py23Text_FromString(__version__));

  return mzos;
}

Using a C external function and “objects” or types.

With a “function based” function, you have Python code like

fileHandle = zos.fopen("myfilename"....)
data = zos.fread(fileHandle)
...

With “object based” functions you have Python code like

fileHandle = zos.fopen("myfilename"...)
data = fileHandle.fread()

In this case the object is a Python type. There is a good description here.

As with function based code you define the attributes of the object, including the tp_dealloc function. This gets control when the object is deallocated. In the Custom_dealloc function, you can close the file and free the buffer etc.

// Custom_dealloc is defined first because CustomType refers to it
static void
Custom_dealloc(CustomObject *self)
{
   ... // put your code here
}

static PyTypeObject CustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_new = PyType_GenericNew,
};

static PyModuleDef custommodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "custom",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom(void)
{
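    PyObject *m;

    /* The type must be initialised before it is used */
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);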
    if (m == NULL)
        return NULL;
    Py_INCREF(&CustomType);
    if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) <  0) {
        Py_DECREF(&CustomType);
        Py_DECREF(m);
        return NULL;
    }
    return m;
}

Note: The list of available .tp… definitions is available here.

Python import, packages and modules.

I’ve been building various Python packages (for example pymqi for z/OS, and accessing z/OS datasets from Python). It took me a while to understand how Python import works, for example why I needed two packages, one for my load modules, and one for the Python code.

There is a lot of good documentation, but I felt it was missing the view of an end user who is starting to work in this area.

The import statement

The Python import makes external functions and classes available to a program. The syntax is like

import abc as xyz

x = xyz…..

abc can be

  • a file abc.py
  • a directory abc
  • a load module abc.so

They do the same thing, but differently

The abc.py file

This Python source file can have a class (for objects) or functions in the file. It can import other files.

The abc.pyc file

This is a compiled Python file (from abc.py).

The abc.so load module

The load module is generated from C source.

This can define a function based approach, so you would use it like

fileHandle = zos.fopen("colin.c","rb")
data = zos.fread(fileHandle)
zos.fclose(fileHandle)

You can provide many functions. Some functions may return a “handle” object which is passed to other functions.

It can also be object based and the C code creates a new type.

hFile = zos.fopen("colin.c","rb")
data = hFile.fread()
hFile.fclose()

The function calls are attached to the object (hFile) – rather than the load module zos.

Internally the object is passed to the function.

The abc directory with __init__.py

This is known as a “regular” module package.

It has the __init__.py file, and can have other files and subdirectories.

The __init__.py is run when the package is first imported, so this can import other packages and do other initialisation.
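The "run when first imported" behaviour can be seen with a throwaway package. This is a sketch that builds a package in a temporary directory; the package name demo_regular_pkg and the log file are hypothetical and exist only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_regular_pkg")   # hypothetical package name
os.mkdir(package_dir)
log_file = os.path.join(root, "init.log")

# The __init__.py records each time it is executed and defines one name.
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write(
        "with open({!r}, 'a') as log:\n"
        "    log.write('__init__ ran\\n')\n"
        "greeting = 'set by __init__.py'\n".format(log_file)
    )

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_regular_pkg            # first import: __init__.py is executed
import demo_regular_pkg            # second import: served from sys.modules

with open(log_file) as log:
    runs = log.read().splitlines()
print(demo_regular_pkg.greeting, len(runs))
```

The log has one line even though the package is imported twice, because the second import finds the package already loaded.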

The abc directory without __init__.py

This is the follow-on to the regular module package, known as a “namespace” package. It feels a bit strange, and I guess most people do not need to know about it.

I’ll give the concept view here, and give an expanded description below.

For example you have a couple of directories

  • /u/mystuff/xyz/abc.py
  • /u/mystuff/xyz/a2.py
  • /usr/myprod/xyz/hij.py
  • /usr/myprod/xyz/klm.py

and when the PYTHONPATH has both directories in it, you can use

import xyz
from xyz import abc, klm

which selects the directories in the PYTHONPATH and imports from these.

Packages

The documentation says …

Python defines two types of packages, regular packages and namespace packages. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.

A Namespace package is a composite of various portions, where each portion contributes a sub-package to the parent package. Portions may reside in different locations on the file system. Portions may also be found in zip files, or where-ever else that Python searches during import. Namespace packages may or may not correspond directly to objects on the file system; they may be virtual modules that have no concrete representation.

My view as to how they work is

Regular packages

You have PYTHONPATH pointing to a list of directories.

You want to import foo.

  • For each directory on PYTHONPATH
    • If <directory>/foo/__init__.py is found, return the regular package foo
    • If <directory>/foo.{py,pyc,so,pyd} is found, return the module foo

If this returns with a package then import the package.

Namespace package

You have PYTHONPATH pointing to a list of directories.

You want to import foo.

  • dirList = “”
  • For each directory on PYTHONPATH
    • If <directory>/foo/__init__.py is found, return the regular package foo
    • If <directory>/foo.{py,pyc,so,pyd} is found, return the module foo
    • If “<directory>/foo/” is a directory then dirList += “<directory>/foo/”

If no package was returned, and dirList is not empty then we have a namespace package.
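The search described above can be written as a short function. This is a simplified model of my description, not the real importer; the function name resolve and the directory names are hypothetical.

```python
import os
import tempfile

SUFFIXES = (".py", ".pyc", ".so", ".pyd")


def resolve(name, search_path):
    """Simplified model of the search; returns a (kind, location) pair or None."""
    dir_list = []
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("regular package", candidate)
        for suffix in SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        if os.path.isdir(candidate):
            dir_list.append(candidate)    # remember it, but keep looking
    if dir_list:
        return ("namespace package", dir_list)
    return None


# Two directories, each with a foo subdirectory and no __init__.py.
base = tempfile.mkdtemp()
first = os.path.join(base, "first")
second = os.path.join(base, "second")
os.makedirs(os.path.join(first, "foo"))
os.makedirs(os.path.join(second, "foo"))
result = resolve("foo", [first, second])
print(result[0], len(result[1]))

# Adding an __init__.py to the second directory turns foo into a regular package.
open(os.path.join(second, "foo", "__init__.py"), "w").close()
result_regular = resolve("foo", [first, second])
print(result_regular[0])
```

The second call shows that a regular package later in the path wins over a plain directory found earlier, because the search only falls back to the directory list when nothing else was found.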

This can be used as follows

from foo import abc

has logic like

  • for d in dirList:
    • if d/”abc.*” exists then return d/”abc….”

This has the advantage that you can work on a sub component.

If you have PYTHONPATH = /u/colin:/usr/python, and there are files /u/colin/foo/abc.py and /usr/python/foo/xyz.py, the statement from foo import abc, xyz imports /u/colin/foo/abc.py and /usr/python/foo/xyz.py.
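Here is a runnable sketch of a namespace package split across two directories. The package name nsdemo, the module names abc_part and xyz_part, and the temporary directories are all hypothetical; they play the roles of foo, abc, xyz, /u/colin and /usr/python.

```python
import importlib
import os
import sys
import tempfile

base = tempfile.mkdtemp()
portions = {
    os.path.join(base, "u_colin"): ("abc_part", "WHO = 'colin'\n"),
    os.path.join(base, "usr_python"): ("xyz_part", "WHO = 'python'\n"),
}
for directory, (module, source) in portions.items():
    package_dir = os.path.join(directory, "nsdemo")
    os.makedirs(package_dir)              # note: no __init__.py is created
    with open(os.path.join(package_dir, module + ".py"), "w") as f:
        f.write(source)
    sys.path.append(directory)            # same effect as adding to PYTHONPATH

importlib.invalidate_caches()

import nsdemo
from nsdemo import abc_part, xyz_part

print(abc_part.WHO, xyz_part.WHO)
print(list(nsdemo.__path__))              # both nsdemo directories are listed
```

Each module comes from a different directory, and the package's __path__ lists both portions.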

When is an error not an err?

If you step off the golden path of trying to read a file – you can quickly end up in trouble and the diagnostics do not help.

I had some simple code

FILE * hFile = fopen(...); 
recSize = fread(pBuffer ,1,bSize,hFile); 
if (recSize == 0)
{
  // into the bog!
 if (feof(hFile))printf("end of file\n");
 else if (ferror(hFile)) printf("ferror(hFile) occurred\n");
 else printf("Cannot occur condition\n");
 
}

When running a unit test of the error path of passing a bad file handle, I got the “Cannot occur condition” message, because ferror() returned “OK – no problem”.

The ferror() description is

General description: Tests for an error in reading from or writing to the specified stream. If an error occurs, the error indicator for the stream remains set until you close the stream, call rewind(), or call clearerr().
If a non-valid parameter is given to an I/O function, z/OS XL C/C++ does not turn the error flag on. This case differs from one where parameters are not valid in context with one another.

This gave me 0, so it was not able to detect my error. ( So what is the point of ferror()?)

If I looked at errno and used perror() I got

errno 113
EDC5113I Bad file descriptor. (errno2=0xC0220001)

You may think that I need to ignore ferror() and check errno != 0 instead. Good guess, but it may not be that simple.

The __errno2 (or errnojr – errno junior) description is

General description: The __errno2() function can be used when diagnosing application problems. This function enables z/OS XL C/C++ application programs to access additional diagnostic information, errno2 (errnojr), associated with errno. The errno2 may be set by the z/OS XL C/C++ runtime library, z/OS UNIX callable services or other callable services. The errno2 is intended for diagnostic display purposes only and it is not a programming interface. The __errno2() function is not portable.
Note: Not all functions set errno2 when errno is set. In the cases where errno2 is not set, the __errno2() function may return a residual value. You may use the __err2ad() function to clear errno2 to reduce the possibility of a residual value being returned.

If you are going to use __errno2 you should clear it using __err2ad() before invoking a function that may set it.

I could not find out whether errno is cleared or whether it may return a residual value, so to be sure, set it to 0 before every use of a C run time library function.

Having got your errno value what do you do with it?

There are #define constants in errno.h such as

#define EIO 122 /* Input/output error */

You can use if ( errno == EIO ) …

Like many products there is no mapping of 122 to “EIO”, but you can use strerror(errno) to map the errno to the error string like EDC5113I Bad file descriptor. (errno2=0xC0220001). This also provides the errno2 string value.
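If the errno value is passed back to Python, the standard errno module does provide the mapping from number to name. This is a small sketch; the helper name describe is hypothetical, and the numeric values differ between platforms (EIO is 122 in the z/OS header above, but a different number elsewhere).

```python
import errno
import os


def describe(err):
    """Return 'NAME (number): message' for an errno value."""
    name = errno.errorcode.get(err, "UNKNOWN")
    return f"{name} ({err}): {os.strerror(err)}"


print(describe(errno.EIO))
print(describe(errno.EBADF))
```

errno.errorcode is a dictionary from number to symbolic name, and os.strerror() gives the same text as the C strerror() function.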

Using ASCII stuff in a C program

This is another topic which looks simple on the surface but has hidden depths. (Where “hidden depths” is a euphemism for “looks buggy”). It started off as one page, but by the time I had discovered the unexpected behaviours, it became 10 times the size. The decision tree to say if text will be printed in ASCII or EBCDIC was three levels deep – but I give a solution.

Why do you want to use ASCII stuff in a C program?

Java, Python and Node.js run on z/OS, and they use ASCII for the character strings, so if you are writing JNI interfaces for Java, or C external functions for Python you are likely to use this.

Topics covered in the blog post

The high level view

You can define ASCII character strings in C using:

char pA[] = {0x41, 0x42, 0x43, 0x30, 0x00};  // "ABC0" in ASCII
// or 
#pragma convert(819)
char * pASCII = "CODEPAGE 819 data" ;
#pragma convert(pop)

And EBCDIC strings (for example when using ASCII compile option)

#pragma convert("IBM-1047") 
char * pEBCDIC = "CODEPAGE 1047  data" ; 
#pragma convert(pop) 

You can define that the whole program is “ASCII” by using the -Wc,ASCII or -qascii options at compile time. This also gives you printf and other functions.

You can use requests like fopen(“myfile”…) and the code successfully handles the file name in ASCII. Under the covers I expect the code does __a2e_l() to convert from ASCII to EBCDIC, then uses the EBCDIC version of fopen().

The executable code is essentially the same – the character strings are in ASCII, and the “Flags” data is different (to reflect the different compile options). At bind time different stubs are included.

To get my program, compiled with the -qascii option, to print successfully to the OMVS terminal, to a pipe using |, or to a redirect using 1>, I had to use the C run time function fcntl() to set the code page and tag of STDOUT and STDERR. See below.

You need to set both STDERR and STDOUT – as perror() prints to STDERR, and you may need this if you have errors in the C run time functions.

Some of the hidden depths!

I had a simple “Hello World” program which I compiled in USS with the -qascii option. Depending on how I ran it, I got “Hello World” or “çÁ%%?-ï?Ê%À” (Hello World in ASCII).

  • straight “./prog”. The output in ASCII
  • pipe “./prog | cat”. If I used the environment variable _TAG_REDIR_OUT=”TXT” the output was in EBCDIC – otherwise it came out in ASCII.
  • redirect to a file “./prog 1> aaa”. Depending on the environment variable _BPXK_AUTOCVT=”ON”, on whether the file already existed, and, if it existed, on whether it was tagged, the output could be in EBCDIC or ASCII!

So all in all – it looks a bit of a mess.
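The unreadable output is simply ASCII bytes being displayed as if they were EBCDIC. The sketch below reproduces this on any platform. It assumes the cp037 codec that ships with standard CPython as the EBCDIC code page, because IBM-1047 is not available as a codec everywhere; the two code pages agree for the letters used here.

```python
text = "Hello World"
ascii_bytes = text.encode("iso-8859-1")     # code page 819
ebcdic_bytes = text.encode("cp037")         # an EBCDIC code page (assumption: 037, not 1047)

# ASCII bytes displayed by something that assumes EBCDIC: unreadable.
misread = ascii_bytes.decode("cp037")

# Converting first, which is the job __a2e_l() does in C, gives valid EBCDIC.
converted = ascii_bytes.decode("iso-8859-1").encode("cp037")

print(repr(misread))
print(converted == ebcdic_bytes)
print(hex(ascii_bytes[0]), hex(ebcdic_bytes[0]))   # the letter H in each code page
```

The first print shows the same kind of accented characters as the garbled “Hello World” above.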

Background to outputting data

Initially I found printing of ASCII data was easy; then I found what I had written sometimes did not work, and after a week or so I had a clearer idea of what is involved, then I found a few areas which were even more complex. You may want to read this section once, read the rest of the blog post, then come back to this section.

A file can have

  • a tag – this is typically “this file is binary|text|unknown”
  • the code page of the data – or not specified.
  • you can use ls -T file.name to display the tag and code page of a file.

Knowing that information…

  • If a file has a tag=text and an ASCII code page,
    • if the _BPXK_AUTOCVT=”ON” environment flag is set it will display in EBCDIC (eg cat…)
    • else (_BPXK_AUTOCVT=”OFF”) it will display unreadable ASCII (eg cat…)
  • If a file has a tag=binary then converting to a different code page makes no sense. For example a .jpeg or load module. Converting a load module would change the instructions Store Half Word(x40) to a Load Positive (x20).
  • If a file is not tagged – it becomes a very fuzzy area. The output is dependent on the program running. An ASCII program would output as ASCII, and an EBCDIC program would output as EBCDIC.
  • ISPF edit is smart enough to detect the file is ASCII – and enable ISPF edit command “Source ASCII”.

Other strands to the complexity

  • Terminal type
    • If you are logged on using rlogin, you can use chcp to change the tag or code page of your terminal.
    • If you are logged in through TSO – using OMVS, you cannot use chcp. I can’t find a command to set the tag or code page, but you can set it programmatically.
  • Redirection
    • You can print to the terminal or redirect the output to a file for example ./runHello 1>output.file .
    • The file can be created with the appropriate tag and code page.
    • You can use the environment variables _TAG_REDIR_OUT=TXT|BIN and _TAG_REDIR_ERR=TXT|BIN to specify what the redirected output will be.
    • If you use _TAG_REDIR_OUT=TXT, the output is in EBCDIC.
      • you can use ./prog | cat to take the output of prog and pipe it through cat to display the lines in EBCDIC.
      • you can use ./prog |grep ale etc. For me this displayed the EBCDIC line with Locale in it.
    • You can have the situation where if you print to the terminal it is in ASCII, if you redirect, it looks like EBCDIC!
  • The tag and ccsid are set on the first write to the file. If you write some ASCII data, then set the tag and code page, the results are unpredictable. You should set the tag and ccsid before you try to print anything.

What is printed?

A rough set of rules to determine what gets printed are

  • If the output file has a non zero code page, and _BPXK_AUTOCVT=”ON” is set, then output the data converted from the code page to EBCDIC.
  • If the output file has a zero code page, then use the code page from the program. A program compiled with ASCII will have an ASCII code page, else it will default to 1047.

A program can set the tag, and code page using the fcntl() function.

Once the tag and code page for STDOUT and STDERR have been set, they remain set until reset. This leads to the confusing sequence of commands and output

  • run my ascii compiled “hello world” program – the data comes out as ASCII.
  • run python3 --version which sets the STDOUT code page.
  • rerun my ascii compiled “hello world” program – the data comes out as EBCDIC! This is because the STDOUT code page was set by Python (and not reset).

I could not find a field to tell me which code page should be used for STDOUT and STDERR, so I suggest using 1047 unless you know any better.

Using fcntl() to display and set the code page of the output STDOUT and STDERR

To display the information for an open file you can use the fcntl() function. The stat() function also provides information on the open file. They may not be entirely consistent when using pipe or redirect.

#include <fcntl.h> 
...
struct f_cnvrt f; 
f.cvtcmd = QUERYCVT; 
f.pccsid = 0; 
f.fccsid = 0; 
int action = F_CONTROL_CVT; 
rc =fcntl( STDOUT_FILENO,action, &f ); 
...
switch(f.cvtcmd) 
{ 
  case QUERYCVT : printf("QUERYCVT - Query"); break; 
  case SETCVTOFF: printf("SETCVTOFF -Set off"); break; 
  case SETCVTON : printf("SETCVTON -Set on unconditionally"); break; 
  case SETAUTOCVTON : printf("SETAUTOCVTON - Set on conditionally"); break; 
  default:  printf("cvtcmd %d",f.cvtcmd); 
} 
printf(" pccsid=%hd fccsid=%hd\n",f.pccsid,f.fccsid); 

For my program compiled in EBCDIC the output was

SETCVTOFF -Set off pccsid=1047 fccsid=0

This shows the program was compiled as EBCDIC (1047), and the STDOUT ccsid was not set.

When the ascii compiled version was used, the output, in ASCII was

SETCVTON -Set on unconditionally pccsid=819 fccsid=819

This says the program was ASCII (819) and the STDOUT code page was ASCII (819).

When I reran the EBCDIC compiled version, the output was

SETCVTOFF -Set off pccsid=1047 fccsid=0

Setting the code page

printf("before text\n");
f.cvtcmd = SETCVTON ; 
f.fccsid = 0x0417  ;  // code page 1047
f.pccsid = 0x0000  ;  // program: 0x0333 (819) = ascii, 0 = take default
rc =fcntl( STDOUT_FILENO,action, &f ); 
if ( rc != 0) perror("fcntl"); 
printf("After fcntl\n"); 

The first time this ran as an ASCII program, the “before text” was not readable, but the “After fcntl” was readable. The second time it was readable. For example, the second time:

SETCVTON -Set on unconditionally pccsid=819 fccsid=1047
before text
After fcntl

You may want to put some logic in your program

  • use fcntl and f.cvtcmd = QUERYCVT for STDOUT and STDERR
  • if fccsid == 0 then set it to 1047, using fcntl and f.cvtcmd = SETCVTON

Set the file tag

This is needed to set the file tag for use when output is piped or redirected.

void settag(int fileNumber) 
{ 
  int rc; 
  int action; 
  struct file_tag ft; 
  memset(&ft,0,sizeof(ft));  // arguments are (address, value, length)
  ft.ft_ccsid = 0x0417; 
  ft.ft_txtflag = 1; 
  ft.ft_deferred = 0;  // if on then use the ccsid from the program! 
  action = F_SETTAG; 
  rc =fcntl( fileNumber,action, &ft); 
  if ( rc < 0) 
  { 
     perror("fcntl F_SETTAG");
     printf("F_SETTAG %i %d\n",rc,errno); 
  } 
}

What gets printed (without the fcntl() code)?

It depends on

  • if you are printing to the terminal – or redirecting the output.
  • if the STDOUT fccsid is set to 1047
  • If _BPXK_AUTOCVT=”ON”

Single program case normal program

If you have a “normal” C program and some EBCDIC character data, when you print it, you get output like “Hello World”.

Single program case ASCII program or data – fccsid = 0

If you have a C program, compiled with the ASCII option, and print some ASCII data, when you print it you get output like ø/ËËÁÀ-À/È/-

Single program case ASCII program or data – fccsid = 1047

If you have a C program, compiled with the ASCII option, and print some ASCII data, when you print it you get output like “Hello World”.

Program with both ASCII and EBCDIC data – compiled with ASCII

With a program with

#pragma convert("IBM-1047") 
 char * pEBCDIC ="CODEPAGE 1047  data"  ; 
#pragma convert(pop) 
printf("EBCDIC:%s\n",pEBCDIC); 

#pragma convert(819) 
char * pASCII = "CODEPAGE 819 data" ; 
#pragma convert(pop) 
printf("ASCII:%s\n",pASCII); 

When compiled without ascii it produces

EBCDIC:CODEPAGE 1047 data
ASCII:ä|àá& åá—–À/È/

When compiled with ascii it produces

EBCDIC:ÃÖÄÅ×ÁÇÅ@ñðô÷@@–£-
ASCII:CODEPAGE 819 data

Mixing programs compiled with/without the ASCII options

If you have a main program and function which have been compiled the same way – either both with ASCII compile option, or both without, and pass a string to the function to print, the output will be readable “Hello World”.

If the two programs have been compiled one with, and one without the ASCII options, an ASCII program will try to print the EBCDIC data, and an EBCDIC program will try to print the ASCII data.

If you know you are getting the “wrong” code page data, you can use the C run time functions __e2a_l or __a2e_l to convert the data.

Decoding/displaying the ASCII output

If you redirect the output to a file, for example ./cp2.so 1>a, you can display the contents of the file converted from ASCII using

  • the “source ascii” command in ISPF edit, or
  • oedit . , and use the “ea” (edit ascii) or “/” line command

The “source ascii” works with SDSF output, with the SE line command.

Several times I had a file with EBCDIC data but the file had been tagged as ASCII. I used chtag -tc IBM-1047 file.name to reset it.

Under the covers

At compile time

There is a predefined macro __CHARSET_LIB which is defined to 1 when the ASCII compiler option is in effect and to 0 when the ASCII compile option is not used.

There is C macro logic which defines which function to use, like

#if (__CHARSET_LIB == 1)
#pragma map (printf, "@@A00118")
#else
#pragma map (printf, "printf")
#endif

A program compiled with the ASCII option will use a different C run time module (@@A00118), compared with the non ASCII mode, which will use printf.

The #pragma map “renames” the function as input to the binder.

This is known as bimodal and is described in the documentation.

I want my program to be bimodal.

Bimodal is where your program can detect if it is running as ASCII or EBCDIC and make the appropriate decision. This can happen if you have code which is #included into the end user’s program.

This is documented here.

You can use

#define _AE_BIMODAL 1 
if (__isASCII()) // detect the run time
   __printf_a(format, string);  // use the ascii version
else
   __printf_e(format, string); // use the ebcdic version

fopen trace etc is not so useful

If you can specify an environment variable you can trace the C file operations.

This did not give much useful information, as it did not give the name of the file being processed, and I could not trace the file which was causing fopen problems, so overall a good idea – but a poor implementation.

How to set it up

See File I/O trace, Locating the file I/O trace and the environment variable _EDC_IO_TRACE

For example

export _EDC_IO_TRACE="(*,2,1M)"

Where filter is

Filter Indicates which files to trace.

  • //DD:filter Trace will include the DD names matching the specified filter string.
  • //filter Trace will include the MVS™ data sets matching the specified filter string. Member names of partitioned data sets cannot be matched without the use of a wildcard.
  • filter Trace will include the Unix files matching the specified filter string.
  • //DD:* Trace will include all DD names.
  • //* Trace will include all MVS data sets. This is the default setting.
  • /* Trace will include all Unix files.
  • * Trace will include all MVS data sets and Unix files.

Detail – use 2.

Buffer size such as 1M or 50K .

The output goes to a file in a directory such as /tmp, but you can change this with

export _CEE_DMPTARG="."

This worked for me … but initially I could not read the output file. (It may be because it came from Python, which has been compiled with the ASCII option.)

The command ls -ltrT showed the file was tagged in ASCII, so I used

chtag -r EDC*

to reset it, and I could edit the file.

Sample output

Trace details for ((POSIX)):
        Trace detail level:  2 
        Trace buffer size:   1024K                                                                  
        fdopen(10,r)                                                                 
        fldata: 
            __recfmF:1........ 0            __dsorgVSAM:1..... 0 
            __recfmV:1........ 0            __dsorgHFS:1...... 1 
            __recfmU:1........ 1            __openmode:2...... 1 
...

Which is not very helpful as it does not tell you the file that has been opened!

When I traced a Python program, I only got information on 5 files – instead of the hundreds I was expecting.

Various abends and problems

I’ll list them here for search engines to find.

CEE3250C The system or user abend U4000 R=00007017 was issued.

U4000

  • Explanation: The assembler user exit could have forced an abend for an unhandled condition. These are user-specified abend codes.
  • System action: Task terminated.
  • Programmer response:
  • Check the Language Environment message file for message output. This will tell you what the original abend was.

There were no other messages. BPXBATCH ended with return code 2304 which means a kill -9 was issued.

If I remove the _EDC_IO_TRACE it works.

I also got a file BST-1.20220809.110241.83951661 etc which is tagged as ASCII – but is not.

This file had the trace for the Python file which was being run – including the name of the file.

If you cannot open a data set, amrc may help

I had a C program which opened a dataset and read from it. I enhanced it by adding comments and other stuff, and after lunch it failed to open the dataset.

I undid all my changes, and it still failed to open! Weird.

I got message

  • EDC5061I
    • An error occurred when attempting to define a file to the system. (errno2=0xC00B0403)
    • Programmer response : Check the __amrc structure for more information. See z/OS XL C/C++ Programming Guide for more information on the __amrc structure.
  • C00B0403:
    • The filename argument passed to fopen() or freopen() specified dsname syntax. Allocation of a ddname for the dsname was attempted, but failed.
    • Programmer response: Failure information returned from SVC 99 was recorded in the AMRC structure. Use the information there to determine the cause of the failure.

This feels like the unhelpful messages I've seen: “An error has occurred – we know what the error is – but we won't tell you” type messages.

To find the reason I had to add some code to my program.

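 file =  fopen(fileName, mode     ); 
 /* __amrc is a pointer, so copy the whole structure it points to */
 /* before another I/O call (such as printf) can change it        */
 __amrc_type save_amrc; 
 memcpy(&save_amrc,__amrc,sizeof(save_amrc)); 
 printf("AMRC __svc99_info %hd error %hd\n",
         save_amrc.__code.__alloc.__svc99_info, 
         save_amrc.__code.__alloc.__svc99_error); 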

and it printed

AMRC __svc99_info 0 528

The DYNALLOC (dynamic allocation) which uses SVC 99 to allocate data sets, has a section Interpreting error reason codes from DYNALLOC. The meaning of 528 is Requested data set unavailable. The data set is allocated to another job and its usage attribute conflicts with this request.

And true enough, in one of the ISPF sessions in one of my TSO userids I was editing the file.

It looks like

printf("__errno2 = %08x\n", __errno2());

Would print the same information.

Thoughts

It appears that you cannot tell fopen to open a data set for read when another job holds a write lock on it.

For DYNALLOC, if the request worked, these fields may have garbage in them – as I got undocumented values.

It would be nice if the developer of the fopen code produced messages like

EDC5061I: An error occurred when attempting to define a file to the system. (errno2=0xC00B0403) (AMRC=0x00000210)

Then it would be more obvious!

How do I use BPXBATCH? It is not that obvious.

I was running a Python script via BPXBATCH, with no problem. Then I extended it, and it was unable to find a load module. In getting this to work, I found out a lot about using BPXBATCH, and how things do not work as documented.

Getting started

You can run a shell program, or a “module” (program).

Use STDPARM

This can be used instead of specifying PARM=…. It avoids the 100 character restriction of PARM=…

You can use EXPORT SYMLIST, and SYMBOLS=EXECSYS for JCL variables to be passed into the DD * data.

//  EXPORT SYMLIST=*
// SET PY='/usr/lpp/IBM/cyp/v3r8/pyz/bin/python3'
// SET PR='/u/tmp/zos'
//STEP1 EXEC PGM=BPXBATSL
//STDPARM DD *,SYMBOLS=EXECSYS             
pgm &PY &PR/my.py
/*
//STDOUT   DD SYSOUT=* 
//STDERR   DD SYSOUT=* 
//SYSDUMP  DD SYSOUT=* 
//CEEDUMP  DD SYSOUT=* 
//STDIN    DD DUMMY 

This will execute program /usr/lpp/IBM/cyp/v3r8/pyz/bin/python3 and pass /u/tmp/zos/my.py

Run a program

You can use JCL like

// SET PY='/usr/lpp/IBM/cyp/v3r8/pyz/bin/python3'
// SET PR='/u/tmp/zos'
//STEP1 EXEC PGM=BPXBATSL,PARM='pgm &PY &PR/my.py'
//STDOUT   DD SYSOUT=* 
//STDERR   DD SYSOUT=* 
//SYSDUMP  DD SYSOUT=* 
//CEEDUMP  DD SYSOUT=* 
//STDIN    DD DUMMY 

This will execute program /usr/lpp/IBM/cyp/v3r8/pyz/bin/python3 and pass /u/tmp/zos/my.py

Run a shell

//STEP1   EXEC PGM=BPXBATSL,REGION=0M,TIME=NOLIMIT,MEMLIMIT=NOLIMIT, 
//   PARM='SH /u/tmp/zos/cc.sh' 
//STDENV   DD * 
PATH=/u/abc/ 
xyz=123 
//STDOUT   DD SYSOUT=* 
//STDOUT2  DD SYSOUT=* 
//STDERR   DD SYSOUT=* 
//SYSDUMP  DD SYSOUT=* 
//* SABEND DD SYSOUT=* 
//CEEDUMP  DD SYSOUT=* 
//STDIN    DD DUMMY 

This runs the shell script /u/tmp/zos/cc.sh.

Because this is a shell script, there are some profiles that may run before the script executes.

The documentation says in Customizing the shell environment variables

The places to set environment variables, in the order that the system sets them, are:

1. The RACF® user profile.
2. The /etc/profile file, which is a system-wide file that sets environment variables for all z/OS shell users. This file is only run for login shells.
3. The $HOME/.profile file, which sets environment variables for individual users. This file is only run for login shells.
4. The file named in the ENV environment variable. This file is run for both login shells and subshells.
5. A shell command or shell script.

Later settings take precedence. For example, the values set in $HOME/.profile override those in /etc/profile.

Colin’s notes

  • I can't find how to set any environment variables in the RACF profile.
  • The /etc/profile is only run if a shell(sh) command is issued
  • The $HOME/.profile. This needs a home entry in the RACF userid OMVS segment ( Use the TSO RACF command LU userid OMVS to display the OMVS information)
  • I specified a file in the ENV environment variable – this was not used. If the file did not exist it did not produce an error message. When I had //STDENV DD *… in my JCL the statements were used

//STDENV

When I used

//STDPARM   DD * 
sh export 
//STDENV   DD * 
PATH=/u/abc/ 
xyz=1234 
yy="ABCD" 
xx=$xyz 
/*

The export command listed all of the environment variables. These included

PATH="/bin:/usr/sbin:/usr/lpp/jav….
xx="\$xyz"
xyz="1234"
yy="\"ABCD\""

  • My PATH statement was not used. It was overwritten by /etc/profile (or $HOME/.profile)
  • The special characters $ and ” have been escaped.
  • It is not doing shell processing. For example in a shell, xx=$xyz says assign to xx the value of xyz. All that happens here is that xx is assigned the literal value $xyz.

So overall – it didn’t work as I expected it to, and I need to do some redesign.

Using BPXBATSL

When I copied some JCL which used BPXBATSL I got

BPXM018I BPXBATCH FAILED BECAUSE SPAWN (BPX1SPN) OF /BIN/LOGIN FAILED WITH RETURN CODE 0000009D REASON CODE
0B1B0473

BPXBATSL is an alias of BPXBATCH. I do not think it supports the SH command.

There are many ways to fail to read a file in a C program.

The 10 minute task to read a file from disk using a C program took a week.

There are several options you can specify on the C fopen() function and it was hard to find the one that worked. I basically tried all combinations. Once I got it working, I tried on other files, and these failed, so my little task turned into a big piece of research.

Depending on the fopen options and the use of fgets or fread I got different data when getting from a dataset with two records in it!

  • AAAA – what I wanted.
  • AAAAx'15' – with a hex 15 on the end.
  • AA
  • AAAAx'15'BBBBx'15' – both the records, with each record terminated by a line-end x'15'.
  • AAAABBBB – both records with no line ends.
  • x'00140000' x'00080000'AAAAx'00080000'BBBB.
    • This has a Block Descriptor Word (BDW) (x'00140000') saying the length of the block is x'14'.
    • There is a Record Descriptor Word (RDW) (x'00080000') of length x'0008' including the 4 bytes for the RDW.
    • There is the data for the first record AAAA,
    • A second RDW
    • The second data
    • The size of this block is 4 for the BDW, 4 for the first RDW, 4 for the AAAA, 4 for the second RDW, and 4 for the data = 20 bytes = x'14'.
  • Nothing!

There are different sorts of data

  • z/OS uses data sets which have records. Records can be blocked (for better disk utilisation). It is more efficient to write one block containing 40 records of 80 bytes than to write 40 blocks each of 80 bytes. You can treat the data as one big block (of size 3200 bytes), or as 40 blocks of 80 (or 2 blocks of 1600 …). With good blocking you can get 10 times the amount of data onto a disk. (If you imagine each record on disk needs a 650 byte header you will see why blocking is good.)
  • You can have “text” files, which contain printable characters. They often use the new line character (x'15') to indicate end of line.
  • You can have binary files which should be treated as a blob. In these, characters like x'15' do not represent new line. For example it might just be part of a sequence x'13141516'.
  • Unix (and so OMVS) uses byte-addressable storage.
    • “Normal files” have data in EBCDIC
    • You can use enhanced ASCII, where you can have files with data in ASCII (or another code page). The file metadata contains the type of data (binary or text) and the code page of the text data.
    • With the environment variable _BPXK_AUTOCVT you can have automatic conversion, which converts a text file in ASCII to EBCDIC when it is read.

From this you can see that you need to use the correct way of accessing the data (records or byte addressable), and of using the data (text or binary).

Different ways of getting the data

You can use

  • the fread function which reads the data, according to the file open options (binary|text, blocked)
  • the fgets function which returns text data (often terminated by a line end character).

Introduction to fopen()

The fopen() function takes a file name and information on how the file should be opened and returns a handle. The handle can be used in an fread() function or to provide information about the file. The C runtime reference gives the syntax of fopen(), and there is a lot of information in the C Programming guide.

There is a lot of good information – but it didn’t help me open a file and read the contents.

The file name

The name can be, for example:

  • "DD:SYSIN" – a reference to a data set via JCL
  • "//'USER.PROCLIB(MYPROC)'" – a direct reference to a dataset
  • "/etc/profile" – an OMVS file.

If you are using the Unix shell you need to use "//'datasetname'" with both sets of quotes.

The options

This options string has one or more parameters.

The first parameter defines the operation: read, write, read+write etc. Any other parameters are in the format keyword=value.

The read operation can be

  • “rb” to read a binary file. In a binary file, the data is treated as a blob.
  • “rt” to read a text file. A text file can have auto conversion; see the FILETAG C run time option.
  • “r”

The type can be

  • type=record – read a record of data from the disk, and give the record to the application
  • type=blocked. It is more efficient, in disk storage terms, to combine 10 records of 80 bytes into one 800 byte block (or even 3200 byte blocks). When a blocked record is read, the read request returns the first N bytes, then the next N bytes. When the end of the block is reached it will retrieve the next block from disk.
  • not specified, which implies byte access, rather than record access, and use of fgets function.

An example fopen

FILE * f2 = fopen("//'COLIN.VB'","rt type=blocked") ;

How much data is read at a time?

Depending on the type of data, and the open options a read request may return data

  • up to the end of the record
  • up to and including the new-line character
  • up to the size of the buffer you give it.

If you do not give it a big enough buffer you may have to issue several reads to get whole logical record.

The system may also “compress” data, for example remove trailing blanks from the end of a line. If you were expecting an 80 byte record – you might only get 20 bytes – so you may need to check this, and fill in the missing blanks if needed.

Reading the data

You can use the fread() function

The fread() parameters are:

  • the address of a buffer to hold the data
  • the size of one unit (element) of data, in bytes
  • the number of units to read
  • the file handle.

It returns the number of complete units read.

It is usually used

#define lBuffer 1024

char buffer[lBuffer];
size_t len = fread(buffer, 1 ,lBuffer ,fHandle );

This says the unit of the buffer is 1 character, and there are 1024 of them.

If the record returned was 80 bytes long, then len would be 80.

If it was written as

size_t len = fread(buffer, lBuffer, 1 ,fHandle );

This says the unit of buffer is 1024 – and there is one of them. The returned “length” is 0, as there were no full 1024 size buffers used.

It appears that a returned length of 0 means either end of file or a file error. You can tell if the read is trying to go past the end of the file (feof()) or was due to a file error (ferror()).

if (feof(hFile)) {
   /* end of file … */
}
else if (ferror(hFile))
{ int myerror = errno;   /* save errno before another call changes it */
  perror("Colins error");
}

You can use the fgets function.

The fgets() parameters are:

  • the address of a buffer to hold the data
  • the size of the buffer
  • the file handle.

This returns a null terminated string in the buffer. You can use strlen(buffer) to get the length of it. An empty string would have length 0.

fgets() returns a pointer to the string, or NULL if there is an error. If there is an error you can use feof() or ferror() as above.

Results from reading data sets and files

I did many tests with different options, and configurations, and the results are displayed below.


The following are used in the tables:

  • Run.
    • job. The program was run in JCL using BPXBATSL or via PGM=…
    • unix shell – the program was run in OMVS shell.
  • BPXAUTO. This is used in OMVS shell. The environment variable _BPXK_AUTOCVT can have
    • “ON” – do automatic conversion from ASCII to EBCDIC where applicable
    • “OFF” do not do automatic conversion from ASCII to EBCDIC
  • open. The options used.
    • r read, rb read binary, rt read text
    • “t=r” is shorthand for type=record
  • read. The C function used to read the data.
    • fread – this is the common read function. It can read text files and binary files
    • fgets – this is used to get text from a file. The records usually have the new-line character at the end of the data.
    • aread is a C program which uses fread – but the program is compiled with the ASCII option.
  • data
    • E is the EBCDIC data I was expecting. For example reading from a FB dataset member gave me an 80 byte record with the data in it.
    • E15 is EBCDIC data, but trailing blanks have been removed. The last character is an EBCDIC line end (x15).
    • A0A. The data is returned in ASCII, and has a trailing line end character (x0A). You can convert this data from ASCII to EBCDIC using the C run time function __a2e_l(buffer,len) .
    • Abuffer – data up to the size of the buffer ( or size -1 for fgets) the data in ASCII.
    • Ebuffer – data up to the size of the buffer ( or size -1 for fgets) the data in EBCDIC. This data may have newlines within the buffer

To read a dataset

This worked with a data set and inline data within a JOB.

Run         BPXAUTO  open     read   data
Job         NA       rb,t=r   fread  E
                     r|rt     fread  Ebuffer
                     r        fgets  E15
                     rt       fgets  E15
Unix shell  Any      rb,t=r   fread  E
                     r|rt     fread  Ebuffer
                     r        fgets  E15
                     rt       fgets  E15

Read a normal(EBCDIC) OMVS file

Run         BPXAUTO  open     read   data
Job         OFF      rb,t=r   fread  E
            off      rt       fgets  E15
            off      *        fread  Ebuffer
Unix shell  off      r        fgets  E15
            off      rt       fgets  E15
            off      rb       fgets  E15
            off      *        fread  E15
            on       rb,t=r   fread  E
            on       rb       fgets  E15

Read an ASCII file in OMVS

Run         BPXAUTO  open  read   data
Job         off      r     agets  A0A
            off      rb    agets  A0A
            off      rt    agets  A0A
Unix shell  on       r     fgets  E15
            on       rb    fgets  E15
            on       rt    fgets  E15
            on       *     fread  buffer
            off      r     agets  A0A
            off      rb    agets  A0A
            off      rt    agets  A0A
            off      *     fread  Abuffer

To read a binary file

To read a data set

Run         BPXAUTO  open  read   data
Job         off      rb    fread  E
Unix shell  on       rb    fread  E

Read a binary normal(EBCDIC) OMVS file

Run         BPXAUTO  open  read   data
Job         off      rb    fread  E
Unix shell  on       rb    fread  E
            off      rb    fread  E

Read a binary ASCII file in OMVS

If you list the attributes of a binary file, for example "ls -T ecrsa.p12", it gives

b binary T=off ecrsa1024.p12

Which shows it is a binary file.

Run         BPXAUTO  open     read   data
Job         off      rb,t=r   fread  E
Unix shell  on       rb,t=r   fread  E
            off      rb,t=r   fread  E

Reading data sets from a shell script

If you try to use fopen("DD:xxxx"…) from a shell script you will get

FOPEN: EDC5129I No such file or directory. (errno2=0x05620062)

If you use fopen("//'COLIN.VB'"…) and specify a fully qualified dataset name it will work.

fopen("//VB"..) will put the RACF userid in front of the name. For example it attempts to open "//'COLIN.VB'".

How big a buffer do I need?

If the buffer is not large enough to hold all of the data in the record, the buffer is filled to its capacity. For example using fgets and a 20 byte buffer, 19 characters were returned. The 20th character was a null. The “new-line” character was present only when the end of the record was reached.

How can I tell the size of the buffer I need?

There is data available which helps – but does not give the whole picture.

With the C fldata() — Retrieve file information function you can get information from the file handle such as

  • the name of the dataset (as provided at open time)
  • record format – fixed, variable
  • dataset organisation (dsorg) – Partitioned, Temporary, HFS
  • open mode – text, binary, record
  • device type – disk, printer, hfs, terminal
  • blksize
  • maximum record length

With fstat(), fstat64() — Get status information from a file handle and lstat(), lstat64() — Get status of file name or symbolic link you can get information about an OMVS file name (or file handle). This has the information available in the OMVS “ls” command. For example

  • file serial number
  • owner userid
  • group userid
  • size of the file
  • time last access
  • time last modified
  • time last file status changed
  • file tag
    • ccsid, value or 0x0000 for untagged, or 0xffff for binary
    • pure text flag.

Example output

For data sets and files (along the top of the table)

  • VB a sequential data set Variable Blocked
  • FB a member of user.proclib which is Fixed Block
  • SYSIN inline data in a job
  • Loadlib a library containing load modules (binary file)
  • OMVS file for example zos.c
  • ASCII for example z.py which has been tagged as ASCII

Other values

  • V variable format data
  • F fixed format data
  • Blk it has blocked records
  • B “Indicates whether it was allocated with blocked records”. I do not know the difference between Blk and B
  • U Undefined format
  • PS It is a Physical Sequential
  • PO It is partitioned – it has members
  • PDSE it is a PDSE
  • HFS it is a file from OMVS

               VB       FB         SYSIN    loadlib         OMVS file  ASCII
fldata
  recfm        V Blk B  F Blk B    F Blk B  U               U          U
  dsorg        PS       PO PDSMem  PS       PO PDSMem PDSE  HFS        HFS
  device       disk     disk       other    disk            HFS        HFS
  blocksize    6144     610        80       6400            0          0
  maxreclen    1024     80         80       6400            1024       1024
stat
  ccsid        0        0          0        0               0          819
  file size    0        0          0        0               4780       824

To read a record, you should use the maxreclen size. This may not be the size of the data record but it is the best guess.

It looks like the maxreclen for Unix files is 1024 – I guess this is the page size on disk.