Parsing command line values

I wanted to pass multiple parameters to a z/OS batch program and parse the data. There are several different ways of doing it. Which is the best way?

This question is complicated by

Checking options

Processing command line options can be a two stage process: reading the command line, and then checking that a valid combination of options has been specified.

If you have an option -debug with a value in the range 0 to 3, you can either check the range as the option is processed, or have a separate section of checks once all the parameters have been parsed. If there is no order requirement on the parameters, you need separate code to check the parameters. If you can require the parameters to be in order, you might be able to have code like “if -b is specified, then check -a has already been specified”.

I usually prefer a separate section of code as it makes the code clearer.
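
As a minimal sketch of the two stage approach, assuming three made-up options (-a, -b which requires -a, and -debug with a value from 0 to 3): the structure Options and the functions parse_options and check_options are names invented for this illustration, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option block, the names are for illustration only */
typedef struct {
    int haveA;      /* -a was specified */
    int haveB;      /* -b was specified */
    int haveDebug;  /* -debug was specified */
    int debug;      /* value given with -debug */
} Options;

/* Stage 1: read the parameters in any order, with no cross checks.
   Returns 0 if every parameter was recognised, else 8. */
int parse_options(int argc, char *argv[], Options *o)
{
    int i;
    memset(o, 0, sizeof *o);
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-a") == 0)
            o->haveA = 1;
        else if (strcmp(argv[i], "-b") == 0)
            o->haveB = 1;
        else if (strcmp(argv[i], "-debug") == 0 && i + 1 < argc) {
            o->haveDebug = 1;
            o->debug = atoi(argv[++i]);
        }
        else {
            printf("Unknown parameter or problem near %s\n", argv[i]);
            return 8;
        }
    }
    return 0;
}

/* Stage 2: check the combination once everything has been read.
   Returns the number of problems found. */
int check_options(const Options *o)
{
    int errors = 0;
    if (o->haveDebug && (o->debug < 0 || o->debug > 3)) {
        printf("-debug must be in the range 0 to 3\n");
        errors++;
    }
    if (o->haveB && !o->haveA) {
        printf("-b requires -a\n");
        errors++;
    }
    return errors;
}
```

A main program would call parse_options(argc, argv, &opts) and then check_options(&opts), and stop if either reports a problem. Because the checks run after all of the parameters have been read, "-b -a" and "-a -b" are treated the same.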

Command styles

On z/OS there are two styles of commands

def xx(abc) parm1(value) xyz

or the Unix way

-def -xx abc -parm1 -1 -a --value value1 -xyz.

Where you can have

  • short options “-a” and “-1”
  • long option with two “-”, as in “--value”,
  • “option value” as in “-xx abc”
  • “option and concatenated value” as in “-xyz”; option -x, value yz

I was interested in the “Unix way”.

  • One Unix way is to have single character option names like -a -A -B -0. This is easy to program, but it means the end user needs to look up the option name every time, as the options are not usually memorable.
  • Other platforms (but not z/OS) have parsing support for long names like --userid value.
  • You can parse a string like ro,rw,name=value, where you have keyword=value using getsubopt.
  • I wrote a simple parser, and a table driven parser for when I had many options.

Defining the parameter string in JCL

The traditional way of defining a parameter string in batch is EXEC PGM=MYPROG,PARM='....' but the parameter is limited to 100 characters.

I tend to use

// SET P1=COLIN.PKIICSF.C 
// SET P2="optional"
//S1 EXEC PGM=MYPROG,PARM='parms &P1 &P2'  

You can get round the parameter length limitation using

//ISTEST   EXEC PGM=CGEN,REGION=0M,PARMDD=MYPARMS 
//MYPARMS DD * 
/ 
 -detail 0 
 -debug 0 
 -log "COLINZZZ" 
 -cert d

Where the ‘/’ on its own delimits the C run time options from my program’s options.

The values start in column 2 of the data. If a value starts in column 1, it is concatenated to the value in the previous line.

You can use JCL and System symbols

// EXPORT SYMLIST=(*) 
// SET LOG='LOG LOG' 
//ISTEST   EXEC PGM=CGEN,REGION=0M,PARMDD=MYPARMS 
//MYPARMS DD *,SYMBOLS=EXECSYS
/ 
 -log "COLINZZZ" 
 -log "&log"
 ...

This produced -log COLINZZZ -log “LOG LOG”

Parsing the data

C main programs have two parameters: a count of the number of parameters, and an array of null terminated strings.

You can process these

int main( int argc, char *argv??(??)) 
{ 
  int iArg; 
  for (iArg = 1;iArg< argc; iArg ++   ) 
  { 
    printf(".%s.\n",argv[iArg]); 
  } 
  return 0; 
} 

Running this job

//CPARMS   EXEC  CCPROC,PROG=PARMS 
//ISTEST   EXEC PGM=PARMS,REGION=0M,PARMDD=MYPARMS 
//MYPARMS DD * 
/ 
 -debug 0 
 -log "COLIN  ZZZ" 
 -cert 
 -ae colin@gmail.com 

gave

.-debug.                   
.0.                        
.-log.                     
.COLIN  ZZZ.               
.-cert.                    
.-ae.                      
.colin@gmail.com.          

and we can see the string “COLIN ZZZ” in double quotes was passed in as a single string.

Parsing with single character options

C has a routine getopt, for processing single character options like -a… and -1… (but not -name) for example

int opt; 
while ((opt = getopt(argc, argv, "ab:c:")) != -1) 
   { 
       switch (opt) { 
       case 'a': 
           printf("-a received\n"); 
           break; 
       case 'b': 
           printf("-b received \n"); 
           // optarg is a pointer, so print it with %p
           printf("optarg %p\n",(void *) optarg); 
           if (optarg) 
             printf("-b received value %s\n",optarg); 
           else 
             printf("-b optarg is 0\n"); 
           break; 
       case 'c': 
           printf("-c received\n"); 
           printf("optarg %p\n",(void *) optarg); 
           if (optarg) 
             printf("-c received value %s\n",optarg); 
           else 
             printf("-c optarg is 0\n"); 
           break; 
       default: /* '?' */ 
           printf("Unknown\n"); 
     } 
   } 

The string “ab:c:” tells the getopt function that

  • -a is expected with no value
  • -b “:” says a value is expected
  • -c “:” says a value is expected

I could only get this running in a Unix environment or in a BPXBATCH job. In batch, I did not get the values after the option.

When I used

//BPX EXEC PGM=BPXBATCH,REGION=0M,
// PARM='PGM /u/tmp/zos/parm.so -a -b 1 -cc1 '

the output included

-b received value b1
-c received value c1

This shows that both the separate form (-b followed by its value) and the concatenated form (“-cc1”) are acceptable forms of input.

Other platforms have a getopt_long function where you can pass in long names such as --value abc.

getsubopt to parse keyword=value

You can use getsubopt to process an argument string like “ro,rw,name=colinpaice”.

If you had an argument like “ro, rw, name=colinpaice” this is three strings and you would have to use getsubopt on each string!

You have code like

#include <stdio.h> 
#include <stdlib.h> 
int main( int argc, char *argv??(??)) 
{ 
   enum { 
       RO_OPT = 0, 
       RW_OPT, 
       NAME_OPT 
   }; 
   char *const token[] = { 
       [RO_OPT]   = "ro", 
       [RW_OPT]   = "rw", 
       [NAME_OPT] = "name", 
       NULL 
   }; 
   char *subopts; 
   char *value; 
   int errfnd = 0;   // set when a bad suboption is found 

   if (argc < 2) return 8;   // nothing to parse 
   subopts = argv[1]; 
   while (*subopts != '\0' && !errfnd) { 
     switch (getsubopt(&subopts, token, &value)) { 
       case RO_OPT: 
         printf("RO_OPT specified \n"); 
         break; 
       case RW_OPT: 
         printf("RW_OPT specified \n"); 
         break; 
       case NAME_OPT: 
         if (value == NULL) { 
           printf("Missing value for " 
                  "suboption '%s'\n", token[NAME_OPT]); 
           errfnd = 1; 
         } 
         else 
           printf("NAME_OPT value:%s\n",value);
         break; 
       default: 
         printf("Option not found %s\n",value); 
         errfnd = 1; 
         break; 
     }  // switch 
   } // while 
   return errfnd ? 8 : 0; 
}  

Within this is code

  • enum.. this defines constants RO_OPT = 0, RW_OPT = 1 etc
  • char *const token[] defines a mapping from keywords “ro”, “rw” etc to the constants defined above
  • getsubopt(&subopts, token, &value) processes the string, passes the mapping, and the field to receive the value

This works, but was not trivial to program.

It did not support name=”colin paice” with an embedded blank in it.

My basic command line parser(101)

I have code

for (iArg = 1;iArg< argc; iArg ++   ) 
{ 
  // -cert is a keyword with no value it is present or not
  if (strcmp(argv[iArg],"-cert") == 0) 
  { 
    function_code = GENCERT    ; 
    continue; 
  } 
  else 
  //  debug needs an option
  if (strcmp(argv[iArg],"-debug") == 0 
      && iArg +1 < argc) // we have a value 
      { 
        iArg  ++; 
        debug = atoi(argv[iArg]); 
        continue; 
      } 
  else 
  ...
  else 
    printf("Unknown parameter or problem near parameter %s\n", 
           argv[iArg]);
  }   // for outer - parameters 

This logic processes keywords with no parameters such as -cert, and keywords which have a value such as -debug.

The code if (strcmp(argv[iArg],"-debug") == 0 && iArg +1 < argc) checks to see if the keyword has been specified, and that there is a parameter following it (that is, we have not run off the end of the parameters).

Advanced – table – ize it

For a program with a large number of parameters I used a different approach. I created a table with the option name, and a pointer to the variable for that option.

For example

getStr lookUpStr[] = { 
    {"-debug", &debug     }, 
    {"-type",  &type       }, 
    {(char *) -1,  0} 
   }; 

You then check each parameter against the list. To add a new option – you just update the table, with the new option, and the variable.

int main( int argc, char *argv??(??)) 
{ 
   char * debug = "Not specified"; 
   char * type   = "Not specified"; 
   typedef struct getStr 
   { 
      char * name; 
      char ** value; 
   } getStr; 
   getStr lookUpStr[] = { 
       {"-debug", &debug     }, 
       {"-type",  &type       }, 
       {(char *) -1,  0} 
      }; 
  int iArg; 
  for (iArg = 1;iArg< argc; iArg ++   ) 
  { 
   int found = 0; 
   getStr * pGetStr =&lookUpStr[0];
   // iterate over the options with string values
   for (; pGetStr -> name != (char *)  -1; pGetStr ++) 
   { 
     // look for the argument in the table
     if (strcmp(pGetStr ->name, argv[iArg]) == 0) 
     { 
       found = 1; 
       iArg ++; 
       if (iArg < argc) // if there is a value following
                        // save the pointer to the data
        *( pGetStr -> value)= argv[iArg] ; 
       else             // argv[iArg] is not valid here, so use the name
         printf("Missing value for %s\n", pGetStr -> name);       
       break;  // skip the rest of the table
     }  // if (strcmp(pGetStr ->name, argv[iArg]) == 0) 
    } // for (; pGetStr -> name != (char *)  -1; pGetStr ++) 
   
   if (found == 0) 
   // iterate over the options with int values 
   ....
  } 
  printf("Debug %s\n",debug); 
  printf("Type  %s\n",type ); 
  return 0; 
}   

This can be extended so you have

getStr lookUpStr[] = { 
    {"-debug", &debug, "char" }, 
    {"-type",  &type ,"int"       }, 
    {(char *) -1,  0, 0} 
   }; 

and have logic like

if (strcmp(pGetStr ->name, argv[iArg]) == 0) 
     { 
       found = 1; 
       iArg ++; 
       if (iArg < argc) // if there are enough parameters
       {
        // this assumes value is now declared as void *
        // so it can point to a char * or to an int
        if (strcmp(pGetStr -> type, "char") == 0) 
          *(char **)( pGetStr -> value)= argv[iArg] ; 
        else 
        if (strcmp(pGetStr -> type, "int") == 0) 
          *(int *)( pGetStr -> value)= atoi(argv[iArg]) ;
      ...   
     }

You can go further and have a function pointer

getStr lookUpStr[] = { 
    {"-debug", &debug,myint }, 
    {"-loop", &loop  ,myint },  
    {"-type",  &type , mystring  }, 
    {"-type",  &type , myspecial  }, 
    {(char *) -1,  0, 0} 
   };

and you have a little function for each option. The function “myspecial(argv[iarg])” looked up values {“approved”, “rejected”…} etc and returned a number representation of the data.

This takes a bit more work to set up, but overall is cleaner and clearer.
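
The function pointer table can be sketched as below. This is an illustration under assumptions rather than the code I used: the type names optEntry and convertFn, the function parse_table, the NULL terminator (instead of (char *) -1), and the convention that each handler returns 0 for success are all invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Each handler converts the text and stores it through target.
   Returns 0 if the value was acceptable. */
typedef int (*convertFn)(const char *text, void *target);

typedef struct {
    const char *name;     /* option name such as "-debug" */
    void       *target;   /* where to store the converted value */
    convertFn   convert;  /* little function for this option */
} optEntry;

int myint(const char *text, void *target)
{
    *(int *)target = atoi(text);
    return 0;
}

int mystring(const char *text, void *target)
{
    *(const char **)target = text;   /* save the pointer, not a copy */
    return 0;
}

/* Map a keyword to a number: approved = 0, rejected = 1 */
int myspecial(const char *text, void *target)
{
    static const char *const names[] = {"approved", "rejected"};
    int i;
    for (i = 0; i < 2; i++)
        if (strcmp(text, names[i]) == 0) {
            *(int *)target = i;
            return 0;
        }
    return 8;
}

/* Walk the parameters, look each one up in the table, and call its
   handler with the value which follows. Returns 0 or 8. */
int parse_table(int argc, char *argv[], const optEntry *table)
{
    int iArg;
    for (iArg = 1; iArg < argc; iArg++) {
        const optEntry *p;
        for (p = table; p->name != NULL; p++)
            if (strcmp(p->name, argv[iArg]) == 0)
                break;
        if (p->name == NULL) {
            printf("Unknown parameter %s\n", argv[iArg]);
            return 8;
        }
        if (iArg + 1 >= argc) {
            printf("Missing value for %s\n", p->name);
            return 8;
        }
        iArg++;
        if (p->convert(argv[iArg], p->target) != 0) {
            printf("Invalid value %s for %s\n", argv[iArg], p->name);
            return 8;
        }
    }
    return 0;
}
```

A table entry then looks like {"-debug", &debug, myint}, and adding a new option is one new line in the table plus, if needed, one new little function.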

What’s the date in ‘n’ days time?

I needed to see if a certificate is due to expire within “n” days. How do I find this date? It turns out to be pretty easy using standard C functions.

                                                                          
#include <stdio.h> 
#include <time.h> 
int main( int argc, char *argv??(??)) 
{ 
....
    char expireDate[11]; 
    time_t t1, t3; 
    struct tm *t2; 
    t1 = time(NULL); 
    t2 = localtime(&t1); 
    t2 -> tm_mday += 40 ; // 40 days from now 
    t3 = mktime(t2); 
    int s; 
    s=  strftime( expireDate, 11, "%Y/%m/%d", t2  ); 
    printf("====size  %d================\n",s); 
    printf(".%10.10s\n",expireDate); 

This successfully printed the date 40 days from now. The only little problem I had was with strftime. The size of the output is 10 bytes, but the “11” specifies the maximum number of characters that can be copied into the array, including the terminating null. When I specified 10, the size of the data I was expecting, the output was wrong: “. 2023/06/1”, off by one character in the buffer and with a leading blank. (If the result does not fit, strftime returns 0 and the contents of the array are indeterminate.)

With the technique of changing the value within a tm structure, and letting mktime normalise it, you can get the date-time n seconds, m minutes or d days from now, either in the future or in the past.
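
The same technique can be wrapped in a small function. This is a sketch; the name date_plus_days and its parameters are invented for the example. It takes a copy of the structure returned by localtime, adjusts tm_mday, and relies on mktime to normalise values such as "day 41 of May".

```c
#include <time.h>

/* Format the date 'days' days after 'start' as YYYY/MM/DD.
   outSize must allow for the trailing null, so at least 11.
   Returns the number of characters written, or 0 if it failed. */
size_t date_plus_days(time_t start, int days, char *out, size_t outSize)
{
    struct tm *pNow = localtime(&start);
    struct tm when;
    if (pNow == NULL)
        return 0;
    when = *pNow;            /* copy, localtime reuses its storage */
    when.tm_mday += days;    /* may go past the end of the month */
    when.tm_isdst = -1;      /* let mktime work out daylight saving */
    if (mktime(&when) == (time_t)-1)   /* normalises the fields */
        return 0;
    return strftime(out, outSize, "%Y/%m/%d", &when);
}
```

Calling date_plus_days(time(NULL), 40, expireDate, sizeof expireDate) with char expireDate[11] gives the date 40 days from now, and a negative number of days gives a date in the past. With a 10 byte buffer the function returns 0, which is an easy way to catch the buffer size problem described above.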

Clever stuff !

Easy question – hard answer, how do I convert a hex string to a hex byte string in C?

I have a program which takes as input a hex string, and this needed to be converted to an internal format, specifically a DER encoded format (also known as a TLV: Tag, Length, Value).

This took me a good couple of hours to get right, and I thought the solution would be worth passing on.

The problem is: I have a C program and I pass in a parameter -serial abcd123456. I want to create the byte string 0x02llabcd123456 where ll is 05, the length of the data in bytes.

Read the parameter

for (iArg = 1;iArg< argc; iArg ++   ) 
{ 
   if (strcmp(argv[iArg],"-serial") == 0 
      && iArg +1 < argc) // we have a value 
   { 
      iArg ++; 
      char * pData = argv[iArg]; 
      int iLen = strlen(pData); 
      if ( iLen > 16 ) 
      { 
         printf("Serial is too long(%d) %s.\n",iLen,pData); 
         return 8; 
      } 
    ... process it.
}

The

if (strcmp(argv[iArg],"-serial") == 0 
      && iArg +1 < argc) // we have a value 

checks to see if -serial was specified, and there is a parameter following. It handles the case of passing “-serial” and no following parameter.

Convert it from hex string to internal hex

I looked at various ways of converting the character string to hex, and decided the C run time sscanf function was best. This is the opposite of printf. It takes a formatting string and converts from printable to internal format.

For example

sscanf(pData,"%x",&i);

would process the characters in the data pointed to by pData and convert them, treating them as hex data, to the unsigned integer i. The processing continues until a non valid hex character is met.

If the parameter was -serial AC, the output value would be 0x000000AC.

I initially tried this, but then I had to go along and ignore any leading zeros.

You can use

sscanf(pData,"%6hhx",&bs[0]);

to read up to 6 characters into the byte string bs. If the parameter was -serial AC, the output value would be 0xAC….

This is almost what I want: I want a left justified byte string. But I have a variable length, and cannot pass a length as part of the string.

I managed this using a combination of sprintf and sscanf.

The final-ish solution

 
int len = strlen(pData); // get the length of passed value
char sscan[20];   // used for sscanf string 
// we need to convert an EBCDIC string to hex, so 
//  "1" needs length of 1, 
//  "12" needs length of 1 
//  "123" needs length of 2 etc 
int lHex = (len + 1) /2; 
                                                       
// create a format string for sscanf with the length in it, 
// as it needs a hard coded length 
sprintf(&sscan[0],"%%%dhhx", len);
// if len = 4 this creates "%4hhx"
unsigned char tempOutput[16];
// Now use it 
sscanf(pData,&sscan[0],&tempOutput[0]); 

and I have tempOutput containing my data, of length lHex.
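
In standard C the hh modifier tells sscanf to store into a single unsigned char, so code which relies on a wide %hhx field filling several bytes may not behave the same way on other platforms. As an alternative sketch which does not depend on that, the conversion can be done two characters at a time. The function name hex_to_bytes is invented for the example, and treating an odd number of digits as having a leading zero ("123" becomes 0x01 0x23) is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Convert a string of hex characters to a left justified byte string.
   An odd number of digits is treated as having a leading zero.
   Returns the number of bytes stored, or -1 if the input is empty,
   has a non hex character, or does not fit in the output. */
int hex_to_bytes(const char *hex, unsigned char *out, size_t outSize)
{
    size_t len = strlen(hex);
    size_t nBytes = (len + 1) / 2;
    size_t iIn, iOut = 0;
    char pair[3];

    if (len == 0 || nBytes > outSize)
        return -1;
    for (iIn = 0; iIn < len; iIn++)
        if (!isxdigit((unsigned char)hex[iIn]))
            return -1;

    iIn = 0;
    while (iIn < len) {
        if (iOut == 0 && len % 2 == 1)
            pair[0] = '0';           /* pad the first byte */
        else
            pair[0] = hex[iIn++];
        pair[1] = hex[iIn++];
        pair[2] = '\0';
        /* strtoul and isxdigit use the run time's own code page,
           so nothing here assumes ASCII or EBCDIC */
        out[iOut++] = (unsigned char)strtoul(pair, NULL, 16);
    }
    return (int)nBytes;
}
```

For -serial abcd123456 this stores the 5 bytes ab cd 12 34 56 and returns 5, which is the lHex value calculated above.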

This worked until it didn’t work

This worked fine until a couple of hours later. If the hex value was 7F… or smaller it worked. If it was 80… or larger it did not work.

This is because of the way the DER format handles signed numbers.

The value 0x02036789ab says

  • 02 this is an integer field
  • 03 of length 03
  • with value 6789ab.

The value 0x0203Abcdef says

  • 02 this is an integer field
  • 03 of length 03
  • with negative value Abcdef – negative because the top bit of the number is a 1.

Special casing for negative numbers

I had to allow for this negative number effect.

For negative numbers, the output needs to be 0x020400abcdef which says

  • 02 this is an integer field
  • 04 of length 04
  • with value 00abcdef – positive because the top bit is zero.

pBuffer points to the output byte field.

 if (tempOutput[0] < 0x80) 
 { 
   memcpy(pBuffer+1,&tempOutput[0],lHex); // move the data
   *pBuffer = lHex; // char value of the length 
 } 
 else // we need to insert an extra byte so we do not get -ve 
 { 
   *(pBuffer+1) = 0x00; // insert a leading zero byte 
   memcpy(pBuffer+2,&tempOutput[0],lHex); // and the rest 
   *pBuffer = lHex +1 ; // char value of the length 
 } 

The solution is easy with hindsight.
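
Putting the tag, length and sign handling together, a sketch of building the whole INTEGER field might look like this. The function name der_integer is invented for the example. It only handles the short form of the length (fewer than 128 content bytes) and does not strip redundant leading zero bytes from the input, which a strict DER encoder would need to do.

```c
#include <string.h>

/* Build a DER INTEGER (tag 0x02) from an unsigned big endian value.
   If the top bit of the first byte is set, a 0x00 byte is inserted
   so the value is not treated as negative.
   Returns the total length of the field, or 0 if it does not fit. */
size_t der_integer(const unsigned char *value, size_t valueLen,
                   unsigned char *out, size_t outSize)
{
    size_t pad = (valueLen > 0 && (value[0] & 0x80)) ? 1 : 0;
    size_t contentLen = valueLen + pad;

    if (valueLen == 0 || contentLen > 127 || contentLen + 2 > outSize)
        return 0;
    out[0] = 0x02;                        /* tag: integer */
    out[1] = (unsigned char)contentLen;   /* short form length */
    if (pad)
        out[2] = 0x00;                    /* keep the value positive */
    memcpy(out + 2 + pad, value, valueLen);
    return contentLen + 2;
}
```

With the value 6789ab this produces 02 03 67 89 ab, and with abcdef it produces 02 04 00 ab cd ef, matching the two cases described above.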

Improving application performance – why, how ?

I’m working on a presentation on performance, for some university students, and I thought it would be worth blogging some of the content.

I had presented on what it was like working in industry, compared to working in a university environment. I explained what it is like working in a financial institution, where you have 10,000 transactions a second, transaction response time is measured in tens of milliseconds, and if you are down for a day you are out of business. After this they asked how you tune the applications and systems at this level of work.

Do you need to do performance tuning?

Like many questions about performance, the answer is “it depends”. It comes down to cost benefit analysis: how much CPU (or money) will you save if you do analysis and tuning? You could work for a month and save a couple of hundred pounds. You could work for a day and find CPU savings which mean you do not need to upgrade your systems, and so save lots of money.

It is not usually worth doing performance analysis on programs which run infrequently, or are of short duration.

Obvious statements

When I joined the performance team, the previous person in the role had left a month before, and the handover documentation was very limited. After a week or so making tentative steps into understanding the work, I came to realise the following (obvious once you think about it) statements:

  • A piece of work is either using CPU or is waiting.
  • To reduce the time a piece of work takes you can either reduce the CPU used, or reduce the waiting time.
  • To reduce the CPU you need to reduce the CPU used.
  • The best I/O is no I/O
  • Caching of expensive operations can save you a lot.

Scenario

In the description below I’ll cover the moderately simple case, and also the case where there are concurrent threads accessing data.

Concurrent activity

When you have more than one thread in your application you will need to worry about data consistency. There are locks and latches

  • Locks tend to be “long running” – from milliseconds to seconds. For example you lock a database record while updating it
  • Latches tend to be held across a block of code, for example manipulation of lists and updating pointers.

Storing data in memory

There are different ways of storing data in memory, from arrays, hash tables to binary trees. Some are easy to use, some have good performance.

Consider having a list of 10,000 names, which you have to maintain.

Array

An array is a contiguous block of memory with elements of the same size. To locate an element you calculate its offset as element number * size of element, counting from 0.

If the list is not sorted, you have to iterate over the array to find the element of interest.

If the list is sorted, you can do a binary search. For example, if the array has 1000 elements, first check element 500 and see if the value is higher or lower, then check element 250 or 750, and so on, halving the range each time.
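
As a sketch of the binary search just described, assuming the array holds pointers to names sorted in the same order that strcmp uses (the function name find_name is invented for the example):

```c
#include <string.h>

/* Binary search of a sorted array of names.
   Returns the index of the name, or -1 if it is not in the array. */
int find_name(const char *const names[], int count, const char *wanted)
{
    int low = 0;
    int high = count - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of what is left */
        int cmp = strcmp(wanted, names[mid]);
        if (cmp == 0)
            return mid;
        if (cmp < 0)
            high = mid - 1;   /* wanted is in the lower half */
        else
            low = mid + 1;    /* wanted is in the upper half */
    }
    return -1;
}
```

For 10,000 names this needs at most 14 comparisons, compared with an average of 5,000 for a scan of an unsorted array. The C run time function bsearch does the same job if you prefer not to write the loop.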

An array is easy to use, but the size is inflexible; to change the size of the array you have to allocate a new array, copy old to new, release old.

Single Linked list

This is a chain of elements, where each element points to the next. There is a pointer to the start of the chain, and something to say end of chain (often “next” is 0).

This is flexible, in that you can easily add elements, but to find an element you have to search along the chain and so this is not suitable for long chains.

You cannot easily delete an element from the chain, because you have to find the previous element in order to update its pointer.

If you have A->B->D->Q, you can add a new element G by setting G->Q, and then D->G. If there are multiple threads you need to do this under a latch.
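
A sketch of these two operations on a single linked list is below. The structure and function names are invented for the example, and the latch needed for multiple threads is not shown.

```c
#include <stddef.h>

typedef struct element {
    struct element *next;   /* 0 (NULL) marks the end of the chain */
    const char     *name;
} element;

/* Add newEl after prev. The order matters: point the new element
   at the rest of the chain first, then hook it in. */
void insert_after(element *prev, element *newEl)
{
    newEl->next = prev->next;
    prev->next  = newEl;
}

/* Remove el from the chain. There is no back pointer, so the chain
   has to be walked from the start to find whatever points to el.
   Returns 1 if the element was found and removed, else 0. */
int remove_element(element **head, element *el)
{
    element **link;
    for (link = head; *link != NULL; link = &(*link)->next) {
        if (*link == el) {
            *link = el->next;
            el->next = NULL;
            return 1;
        }
    }
    return 0;
}
```

Adding is two pointer updates wherever you are in the chain, but removing costs a scan from the start, which is the problem the back chain of a doubly linked list solves.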

Doubly linked lists

This is like a single linked list, but you have a back chain as well. This allows you to easily delete an element. To add an element you have to update 4 pointers.

This is a flexible list where you can add and remove elements, but you have to scan it sequentially to find the element of interest, and so it is not suitable for long chains.

If there are multiple threads you need to do this under a latch.

Hash tables

Hash tables are a combination of array and linked lists.

You allocate an array of suitable size, for example 4096. You hash the key to a value between 0 and 4095 and use this as the index into the array. The value of the array is a linked list of elements with the same hash value, which you scan to find the element of interest.

You need a hash table big enough that there are only a few (up to 10 to 50) elements in each linked list. The hash function needs to produce a wide spread of values. A hash function which returned only one value would give you one long linked list.
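
A minimal sketch of such a hash table is below. The names are invented for the example, the hash function is a simple multiply-and-add string hash chosen for illustration (not a recommendation), and there is no latch, so as written it is only suitable for a single thread.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4096

typedef struct entry {
    struct entry *next;   /* next element with the same hash value */
    const char   *key;
    void         *data;
} entry;

typedef struct {
    entry *bucket[TABLE_SIZE];   /* each slot is the head of a chain */
} hashTable;

/* Hash the key to a value between 0 and TABLE_SIZE - 1 */
unsigned int hash_key(const char *key)
{
    unsigned int h = 5381;
    while (*key != '\0')
        h = h * 33 + (unsigned char)*key++;
    return h % TABLE_SIZE;
}

/* Scan the chain for this hash value. Returns NULL if not found. */
entry *hash_find(const hashTable *t, const char *key)
{
    entry *e;
    for (e = t->bucket[hash_key(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Add an element at the head of its chain. The key is not copied,
   so it must stay valid. Returns NULL if there is no storage. */
entry *hash_add(hashTable *t, const char *key, void *data)
{
    unsigned int i = hash_key(key);
    entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->key  = key;
    e->data = data;
    e->next = t->bucket[i];
    t->bucket[i] = e;
    return e;
}
```

The table must start with every bucket set to 0, for example by declaring it static. With 10,000 names and 4096 buckets, a well spread hash gives chains of two or three elements on average.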

Binary trees

Binary trees are an efficient way of storing data. If there are any updates, you need to latch the tree while updates are made, which may slow down multi threaded programs.

Each node of a tree has 4 parts

  • The value of this node such as “COLIN PAICE”
  • A pointer to a node for values less than “COLIN PAICE”
  • A pointer to a node for values greater than “COLIN PAICE”
  • A pointer to the data record for this node.

If the tree is balanced the number of steps from the start of the tree to the element of interest is approximately the same for all elements.

If you add lots of elements you can get an unbalanced tree, where the tree looks like a flag pole rather than an apple tree. In this case you need to rebalance the tree.

You do not need to worry about the size of the tree because it will grow as more elements are added.

If you rebalance the tree, this will require a latch on the tree, and the rebalancing could be expensive.

There are C run time functions such as tsearch, which walks the tree and, if the element exists in the tree, returns the node. If the element does not exist in the tree, tsearch adds it to the tree, and returns the new node.

This is not trivial to code – (but is much easier than coding a tree yourself).

You need to latch the tree when using multiple threads, which can slow down your access.
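
A small sketch of using tsearch and tfind from <search.h> with string keys is below. The wrapper names tree_add and tree_find are invented for the example, and the latch for multiple threads is not shown.

```c
#include <search.h>
#include <string.h>

/* tsearch and tfind call this with two keys */
static int compare_keys(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Add name to the tree if it is not already there.
   Returns the key held in the tree (the existing one if the name was
   already present), or NULL if there was no storage for a new node. */
const char *tree_add(void **root, const char *name)
{
    /* the node returned starts with a pointer to the key */
    void *node = tsearch(name, root, compare_keys);
    return node == NULL ? NULL : *(const char **)node;
}

/* Look up name without adding it. Returns NULL if it is not there. */
const char *tree_find(void **root, const char *name)
{
    void *node = tfind(name, root, compare_keys);
    return node == NULL ? NULL : *(const char **)node;
}
```

The tree starts as void *root = NULL, and the address of root is passed on each call. The tree only stores the pointer you pass in, so the key (or the record containing it) must stay allocated for as long as it is in the tree.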

Optimising your code

Take the scenario where you write an application which is executed 1000 times a second.

int myfunc(char * name, int cost, int discount)
{
  int rc;
  printf("Values passed to myfunc %s cost %i discount %i\n",name,cost,discount);
  rc = dosomething();
  printf("exit from myfunc %i\n",rc);
  return rc;
}

Note: This is based on a real example. I went to a customer to help with a performance problem, and found the top user of CPU was printf(), printing out logging information. They commented this code out in all of their functions and it went 5 times faster.

You can make this go faster by having a flag you set to produce trace output, so

if (global.trace ) 
    printf("Values passed to myfunc %s cost %i discount %i\n",name,cost,discount);

You could do the same for the exit printf, but you may want to be more subtle, and use

if (global.traceNZonexit  && rc != 0)
   printf("exit from myfunc %i\n",rc);

This is useful when the return code is 0 most of the time. It is also useful if someone reports problems with the application, and you can say “there is a message access-denied” at the time of your problem.

Another example is code which writes to a file when there is an error.

FILE * hFile = 0;
for ( i = 0;i < 100;i ++)
{
    /* create a buffer with our data in it */
    lenData =  sprintf(buffer,"userid %s, parm %s\n", getid(), inputparm); 
    error = ….();
    if (error > 0)
    {
     hFile = fopen("outputfile","a");
     fwrite(buffer,1,lenData,hFile);
     fclose(hFile);
    }
…
}

This can be improved

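  • by moving the getid() out of the loop, as it does not change within the loop
  • by moving the lenData = sprintf.. inside the error block, so the buffer is only built when it is needed
  • by opening the file only when the first error occurs, and closing it after the loop
{
  ... 
  if (error > 0)
  {
     if (hFile == 0 )
     {  
        hFile = fopen("outputfile","a");
        pUserid = strdup(getid());  
     } 
     lenData =  sprintf(buffer,"userid %s, parm %s\n", pUserid, inputparm); 
     fwrite(buffer,1,lenData,hFile);     
  }
...
}
if (hFile != 0) 
   fclose(hFile);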

You can take this further, and have the file handle passed in to the function, so it is only opened once, rather than every time the function is invoked.

struct threadblock {FILE * hFile;
      …
    };
int main()
{
   struct threadblock threadBlock = {0};
   for(i=1;i<9999;i++)
   {
     myprog(&threadBlock..);
   }
   if (threadBlock.hFile != 0 ) fclose(threadBlock.hFile);
}
// subroutine
myprog(struct threadblock * pt....){
...

  if (error > 0)
  {
     if (pt -> hFile == 0 )
     {  
        pt -> hFile= fopen("outputfile","a");       
     } 
     fwrite(buffer,1,lenData,pt -> hFile);
  }
...
}
   

Note: If this is a long running “production” system you may want to open the file as part of application startup to ensure the file can be opened etc, rather than find this out two days later.

Migrating from cc to xlc is like playing twister

I needed to compile a file in Unix System Services; I took an old make file, changed cc to xlc expecting it to compile and had lots of problems.

It feels like the documentation was well written in the days of the cc and c89 compiler, and has had a different beast inserted into it.

As I started to write this blog post, I learned even more about compiling in Unix Services on z/OS!

Make file using cc

cparmsa= -Wc,"SSCOM,DEBUG,DEF(MVS),DEF(_OE_SOCKETS),UNDEF(_OPEN_DEFAULT),NOOE 
cparmsb= ,SO,SHOW,LIST(),XREF,ILP32,DLL,SKIPS(HIDE)" 
syslib= -I'/usr/include' -I'/usr/include/sys'  -I"//'TCPIP.SEZACMAC'" -I"//'TCPIP.SEZANMAC'" 
all: main 
parts =  tcps.o 
main: $(parts)
  cc -o tcps  $(parts) 
                                                                                                                            
%.o: %.c 
 cc  -c -o $@   $(syslib) $(cparmsa)$(cparmsb)    -V          $< 
 
clean: 
 rm  *.o 

The generated compile statement is

cc -c -o tcps.o -I'/usr/include' -I'/usr/include/sys' -I"//'TCPIP.SEZACMAC'" -I"//'TCPIP.SEZANMAC'" -Wc,"SSCOM,DEBUG,DEF(MVS),DEF(_OE_SOCKETS),UNDEF(_OPEN_DEFAULT),NOOE,SO,SHOW,LIST(),XREF,ILP32,DLL,SKIPS(HIDE)" -V tcps.c

Note the following

  • the -V option generates the listing. “-V produces all reports for the compiler, and binder, or prelinker, and directs them to stdout“. If you do not have -V you do not get a listing.
  • -Wc,list() says generate a listing with a name like tcps.lst based on the file name being compiled. If you use list(x.lst) it does not produce any output! This is contrary to what the documentation says. (Possibly a bug in the compiler when specifying NOOE.)
  • SHOW lists the included files
  • SKIPS(HIDE) omits the stuff which is not used – see below.

Make using xlc

I think the xlc compiler has bits from z/OS and bits from AIX (sensibly sharing code!). On AIX some parameters are passed using -q. You might use -qSHOWINC or -qNOSHOWINC instead of -Wc,SHOWINC

cparmsx= -Wc,"SO,SHOW,LIST(lst31),XREF,ILP32,DLL,SSCOM, 
cparmsy= DEBUG,DEF(MVS),DEF(_OE_SOCKETS),UNDEF(_OPEN_DEFAULT),NOOE" 
cparms3= -qshowinc -qso=./lst.yy  -qskips=hide -V 
syslib= -I'/usr/include' -I'/usr/include/sys'  -I"//'TCPIP.SEZACMAC'" -I"//'TCPIP.SEZANMAC'" 
all: main 
parts =  tcps.o 
main: $(parts) 
  cc -o tcps  $(parts) 
                                                                                                      
%.o: %.c 
 xlc -c -o $@   $(syslib) $(cparmsx)$(cparmsy) $(cparms3)     $< 
                                                                                                      
clean: 
 rm  *.o 

This generates a statement

xlc -c -o tcps.o -I'/usr/include' -I'/usr/include/sys' -I"//'TCPIP.SEZACMAC'" -I"//'TCPIP.SEZANMAC'" -Wc,"SO,SHOW,LIST(lst31),XREF,ILP32,DLL,SSCOM,DEBUG,DEF(MVS),DEF(_OE_SOCKETS),UNDEF(_OPEN_DEFAULT),NOOE" -qshowinc -qso=./lst.yy -qskips=hide tcps.c

Note the -q options. You need -qso=…. to get a listing.

Any -V option is ignored, and LIST(…) is not used.

Note: There is a buglet in the compiler, specifying nooe does not always produce a listing. The above xlc statement gets round this problem.

SKIPS(SHOW|HIDE)

The SKIPS(HIDE) also known as SKIPSRC shows you what is used, and suppresses text which is not used. I found this useful trying to find the combination of #define … to get the program to compile.

For example with SKIPS(SHOW)

170 |#if 0x42040000 >= 0X220A0000                               | 672     4      
171 |    #if defined (_NO_PROTO) &&  !defined(__cplusplus)      | 673     4      
172 |        #define __new210(ret,func,parms) ret func ()       | 674     4      
173 |    #else                                                  | 675     4      
174 |        #define __new210(ret,func,parms) ret func parms    | 676     4      
175 |    #endif                                                 | 677     4      
176 |#elif !defined(__cplusplus) && !defined(_NO_NEW_FUNC_CHECK)| 678     4      
177 |       #define __new210(ret,func,parms) \                  | 679     4      
178 |        extern struct __notSupportedBeforeV2R10__ func     | 680     4      
179 |   #else                                                   | 681     4      
180 |     #define __new210(ret,func,parms)                      | 682     4      
181 |#endif                                                     | 683     4      

With SKIPS(HIDE) the lines which were skipped (174, 177, 178 and 180) are not displayed:

170 |#if 0x42040000 >= 0X220A0000                              | 629     4 
171 |    #if defined (_NO_PROTO) &&  !defined(__cplusplus)     | 630     4 
172 |        #define __new210(ret,func,parms) ret func ()      | 631     4 
173 |     else                                                 | 632     4 
175 |    #endif                                                | 633     4 
176 |                                                          | 634     4 
179 |   #else                                                  | 635     4 
181 |#endif                                                    | 636     4 
182 |#endif                                                    | 637     4 

This shows

  • 170 The line number within the included file
  • 629 The line number within the file
  • 4 is the 4th included file. In the “I N C L U D E S” section it says 4 /usr/include/features.h
  • row 174 is missing; this is the #else text which was not included
  • rows 177, 178 and 180 are omitted.

This makes it much easier to browse through the includes to find why you have duplicate definitions and other problems.

Compiling the TCP/IP samples on z/OS

Communications Server (TCPIP) on z/OS provides some samples. I had problems getting these to compile, because the JCL in the documentation was a) wrong and b) about 20 years behind the times.

Samples

There are some samples in TCPIP.SEZAINST

  • TCPS: a server which listens on a port
  • TCPC: a client which connects to a server using IP address and port
  • UDPC: C socket UDP client
  • UDPS: C socket UDP server
  • MTCCLNT: C socket Multitasking client
  • MTCSRVR: C socket Multitasking server
  • MTCCSUB: C socket subtask MTCCSUB

The JCL I used is

//COLCOMPI   JOB 1,MSGCLASS=H,COND=(4,LE) 
//S1          JCLLIB ORDER=CBC.SCCNPRC 
// SET LOADLIB=COLIN.LOAD 
// SET LIBPRFX=CEE 
// SET SOURCE=COLIN.C.SOURCE(TCPSORIG) 
//COMPILE  EXEC PROC=EDCCB, 
//       LIBPRFX=&LIBPRFX, 
//       CPARM='OPTFILE(DD:SYSOPTF),LSEARCH(/usr/include/)', 
// BPARM='SIZE=(900K,124K),RENT,LIST,RMODE=ANY,AMODE=31' 
//COMPILE.SYSLIB DD 
//               DD 
//               DD DISP=SHR,DSN=TCPIP.SEZACMAC 
//*              DD DISP=SHR,DSN=TCPIP.SEZANMAC  for IOCTL 
//COMPILE.SYSOPTF DD * 
DEF(_OE_SOCKETS) 
DEF(MVS) 
LIST,SOURCE 
TEST 
RENT ILP32        LO 
INFO(PAR,USE) 
NOMARGINS EXPMAC   SHOWINC XREF 
LANGLVL(EXTENDED) sscom dll 
DEBUG 
/* 
//COMPILE.SYSIN    DD  DISP=SHR,DSN=&SOURCE 
//BIND.SYSLMOD DD DISP=SHR,DSN=&LOADLIB. 
//BIND.SYSLIB  DD DISP=SHR,DSN=TCPIP.SEZARNT1 
//             DD DISP=SHR,DSN=&LIBPRFX..SCEELKED 
//* BIND.GSK     DD DISP=SHR,DSN=SYS1.SIEALNKE 
//* BIND.CSS    DD DISP=SHR,DSN=SYS1.CSSLIB 
//BIND.SYSIN DD * 
  NAME  TCPS(R) 
//START1   EXEC PGM=TCPS,REGION=0M, 
// PARM='4000          ' 
//STEPLIB  DD DISP=SHR,DSN=&LOADLIB 
//SYSERR   DD SYSOUT=*,DCB=(LRECL=200) 
//SYSOUT   DD SYSOUT=*,DCB=(LRECL=200) 
//SYSPRINT DD SYSOUT=*,DCB=(LRECL=200) 

Change the source

The samples do not compile with the above JCL. I needed to remove some includes (commented out below):

#include <manifest.h> 
// #include <bsdtypes.h> 
#include <socket.h> 
#include <in.h> 
// #include <netdb.h> 
#include <stdio.h> 

With the original sample I got compiler messages

ERROR CCN3334 CEE.SCEEH.SYS.H(TYPES):66 Identifier dev_t has already been defined on line 98 of “TCPIP.SEZACMAC(BSDTYPES)”.
ERROR CCN3334 CEE.SCEEH.SYS.H(TYPES):77 Identifier gid_t has already been defined on line 101 of “TCPIP.SEZACMAC(BSDTYPES)”.
ERROR CCN3334 CEE.SCEEH.SYS.H(TYPES):162 Identifier uid_t has already been defined on line 100 of “TCPIP.SEZACMAC(BSDTYPES)”.
ERROR CCN3334 CEE.SCEEH.H(NETDB):87 Identifier in_addr has already been defined on line 158 of “TCPIP.SEZACMAC(IN)”.


INFORMATIONAL CCN3409 TCPIP.SEZAINST(TCPS):133 The static variable “ibmcopyr” is defined but never referenced.

I tried many combinations of #define but could not get it to compile, unless I removed the #includes.

Compile problems I stumbled upon

Identifier dev_t has already been defined on line ...                                                     
Identifier gid_t has already been defined on line ...                                                     
Identifier uid_t has already been defined on line ....

This was caused by the wrong libraries in SYSLIB. I needed

  • CEE.SCEEH.H
  • CEE.SCEEH.SYS.H
  • TCPIP.SEZACMAC
  • TCPIP.SEZANMAC

The compile problems were caused by CEE.SCEEH.SYS.H being missing.

Execution problems

I had a strange execution problem when I tried to use AT-TLS within the program.

EDC5000I No error occurred. (errno2=0x05620062)

The errno2 reason from TSO BPXMTEXT 05620062 was

BPXFSOPN 04/27/18
JRNoFileNoCreatFlag: A service tried to open a nonexistent file without O_CREAT

Action: The open service request cannot be processed. Correct the name or the open flags and retry the operation.

Which seems very strange. I have a feeling that this field is not properly initialised and that this value can be ignored.

Running assembler control block chains in C

I needed to extract some information from z/OS in my C program. There is not a callable interface for the data, so I had to chain through z/OS control blocks.

Once you have an example to copy it is pretty easy – it is just getting started which is the problem.

I have code (which starts with PSATOLD)

 #define PSA  540 
 char *TCB   = (char*)*(int*)(PSA); 
 char *TIO   = (char*)*(int*)(TCB + 12); 
 char *TIOE  = (char*)(TIO + 24) ; 
                                                            

  • At absolute address 540 (0x21C) is the address of the currently executing TCB.
  • (int *) (PSA) says treat this as an integer (4 byte) pointer.
  • * take the value of what this integer pointer points to. This is the address of the TCB
  • TCB + 12. Offset 12 (0x0c) in the TCB holds the address of the Task I/O Table (field TCBTIO)
  • (int *) says treat this as an integer (4 byte) pointer
  • * take the value of it to get to the TIOT
  • Offset 24 (0x18) is the location of the first TIOT entry in the control block

When I copied the code it originally had (char *) *(long *) PSA. This worked fine in 31 bit programs but not in a 64 bit program, because there a long is 64 bits, not 32, so it picked up 8 bytes instead of the 4 byte address. I had to use “int” to get it to work.

Another example, which prints the CPU TCB and SRB time used by each address space, is

// CVT Main anchor for many system wide control blocks
#define FLTCVT     16L
//  The first Address Space Control Block
#define CVTASCBH  564L
// the chain of ASCBs - next
#define ASCBFWDP    4L
//  offset to job info
#define ASCBEJST   64L
// SRB CPU time; offset assumed to be 200 (0xC8), verify against the IHAASCB mapping
#define ASCBSRBT  200L
// the ASID of this address space
#define ASCBASID   36L


__int64 lTCB, lSRB; // could have used long long 
short ASID;  // 0x0000
int i;       // loop guard in case the chain is broken
char *plStor = (char*)FLTCVT;
char *plCVT  = (char*)*(int*)plStor;
char *plASCB = (char*)*(int*)(plCVT+CVTASCBH); // first ASCB
for( i=0; i<1000 && plASCB != NULL;
       i++, plASCB = (char*)*(int*)(plASCB+ASCBFWDP) )
{
  lTCB = *(__int64*)(plASCB+ASCBEJST) >> 12; // microseconds
  lSRB = *(__int64*)(plASCB+ASCBSRBT) >> 12; // microseconds
  ASID = *(short*)(plASCB+ASCBASID);
  printf("ASID=%4.4x TCB=%lld; SRB=%lld\n", ASID, lTCB, lSRB);
}

Creating a C external function for Python, an easier way to compile

I wrote about my first steps in creating a C extension in Python. Now I’ve got more experience, I’ve found an easier way of compiling the program and creating a load module. It is not the official way – but it works, and is easier to do!

The traditional way of building a package is to use the setup.py technique. I’ve found just compiling it works just as well (and is slightly faster). You still need the setup.py for building Python source.

I set up a cp4.sh file

name=zos
pythonSide='/usr/lpp/IBM/cyp/v3r8/pyz/lib/python3.8/config-3.8/libpython3.8.x'
export _C89_CCMODE=1
p1=" -DNDEBUG -O3 -qarch=10 -qlanglvl=extc99 -q64"
p2="-Wc,DLL -D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS"
p2="-D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS"
p3="-D_OPEN_SYS_FILE_EXT -qstrict "
p4="-Wa,asa,goff -qgonumber -qenum=int"
p5="-I//'COLIN.MQ930.SCSQC370' -I. -I/u/tmp/zpymqi/env/include"
p6="-I/usr/lpp/IBM/cyp/v3r8/pyz/include/python3.8"
p7="-Wc,ASM,EXPMAC,SHOWINC,ASMLIB(//'SYS1.MACLIB'),NOINFO "
p8="-Wc,LIST(c.lst),SOURCE,NOWARN64,FLAG(W),XREF,AGG -Wa,LIST,RENT"
/bin/xlc $p1 $p2 $p3 $p4 $p5 $p6 $p7 $p8 -c $name.c -o $name.o -qexportall -qagg -qascii
l1="-Wl,LIST=ALL,MAP,XREF -q64"
l1="-Wl,LIST=ALL,MAP,DLL,XREF -q64"
/bin/xlc $name.o $pythonSide -o $name.so $l1 1>a 2>b
oedit a
oedit b

This shell script creates a zos.so load module in the current directory.

You need to copy the output load module (zos.so) to a directory on the PythonPath environment variable.

What do the parameters mean?

Many of the parameters I blindly copied from the setup.py script.

  • name=zos
    • This parametrizes the script, for example $name.c $name.o $name.so
  • pythonSide=’/usr/lpp/IBM/cyp/v3r8/pyz/lib/python3.8/config-3.8/libpython3.8.x’
    • This is the Python side deck, used for resolving references to the functions in the Python code
  • export _C89_CCMODE=1
    • This is needed to prevent the message “FSUM3008 Specify a file with the correct suffix (.c, .i, .s,.o, .x, .p, .I, or .a), or a corresponding data set name, instead of -o./zos.so.”
  • p1=” -DNDEBUG -O3 -qarch=10 -qlanglvl=extc99 -q64″
    • -O3 optimization level
    • -qarch=10 is the architectural level of the code to be produced.
    • -qlanglvl=extc99 says use the C99 language level with extensions. (For example defining variables in the middle of a program, rather than only at the top)
    • -q64 says make this a 64 bit program
  • p2=”-D_XOPEN_SOURCE_EXTENDED -D_POSIX_THREADS”
    • The C #defines to preset
  • p3=”-D_OPEN_SYS_FILE_EXT -qstrict ”
    • -qstrict Used to prevent optimizations from re-ordering instructions that could introduce rounding errors.
  • p4=”-Wa,asa,goff -qgonumber -qenum=int”
    • -Wa,asa,goff options for any assembler compiles (not used)
    • -qgonumber include C program line numbers in any dumps etc
    • -qenum=int use integer variables for enums
  • p5=”-I//’COLIN.MQ930.SCSQC370′ -I. -I/u/tmp/zpymqi/env/include”
    • Where to find #includes:
    • the MQ libraries,
    • the current working directory
    • the header files for my component
  • p6=”-I/usr/lpp/IBM/cyp/v3r8/pyz/include/python3.8″
    • Where to find #includes
  • p7=”-Wc,ASM,EXPMAC,SHOWINC,ASMLIB(//’SYS1.MACLIB’),NOINFO ”
    • Support the use of __asm__(…) to use inline assembler code.
    • Expand macros to show what is generated
    • List the data from #includes
    • If using __asm__(…), where to find assembler copy files and macros.
    • Do not report information messages
  • p8=”-Wc,LIST(c.lst),SOURCE,NOWARN64,FLAG(W),XREF,AGG -Wa,LIST,RENT”
    • For C compiles, produce a listing in c.lst,
    • include the C source
    • do not warn about problems with 64 bit/31 bit
    • display the cross references (where used)
    • display information about structures
    • For Assembler programs generate a list, and make it reentrant
  • /bin/xlc $p1 $p2 $p3 $p4 $p5 $p6 $p7 $p8 -c $name.c -o $name.o -qexportall
    • Compile $name.c into $name.o ( so zos.c into zos.o) export all entry points for DLL processing
  • l1=”-Wl,LIST=ALL,MAP,DLL,XREF -q64″
    • bind parameters -Wl, produce a report,
    • show the map of the module
    • show the cross reference
    • it is a 64 bit object
  • /bin/xlc $name.o $pythonSide -o $name.so $l1 1>a 2>b
    • take the zos.o, the Python side deck and bind them into the zos.so
    • pass the parameters defined in l1
    • write the binder output (map and cross reference) to file a and errors to file b
  • oedit a
    • This will have the map, cross reference and other output from the bind
  • oedit b
    • This will have any error messages – it should be empty

Notes:

  • -qarch=10 is the default
  • the -Wa are for when compiling assembler source eg xxxx.s
  • -qlanglvl=extc99. EXTENDED may be better than extc99.
  • it needs the -qascii to work with Python.

When is an error not an err?

If you step off the golden path of trying to read a file – you can quickly end up in trouble and the diagnostics do not help.

I had some simple code

FILE * hFile = fopen(...); 
recSize = fread(pBuffer ,1,bSize,hFile); 
if (recSize == 0)
{
  // into the bog!
 if (feof(hFile))printf("end of file\n");
 else if (ferror(hFile)) printf("ferror(hFile) occurred\n");
 else printf("Cannot occur condition\n");
 
}

When running a unit test of the error path of passing a bad file handle, I got the “Cannot occur condition” message, because ferror() returned “OK – no problem”.

The ferror() description is

General description: Tests for an error in reading from or writing to the specified stream. If an error occurs, the error indicator for the stream remains set until you close the stream, call rewind(), or call clearerr().
If a non-valid parameter is given to an I/O function, z/OS XL C/C++ does not turn the error flag on. This case differs from one where parameters are not valid in context with one another.

This gave me 0, so it was not able to detect my error. ( So what is the point of ferror()?)

If I looked at errno and used perror() I got

errno 113
EDC5113I Bad file descriptor. (errno2=0xC0220001)

You may think that I need to ignore ferror() and check errno != 0 instead. Good guess, but it may not be that simple.

The __errno2 (or errnojr – errno junior) description is

General description: The __errno2() function can be used when diagnosing application problems. This function enables z/OS XL C/C++ application programs to access additional diagnostic information, errno2 (errnojr), associated with errno. The errno2 may be set by the z/OS XL C/C++ runtime library, z/OS UNIX callable services or other callable services. The errno2 is intended for diagnostic display purposes only and it is not a programming interface. The __errno2() function is not portable.
Note: Not all functions set errno2 when errno is set. In the cases where errno2 is not set, the __errno2() function may return a residual value. You may use the __err2ad() function to clear errno2 to reduce the possibility of a residual value being returned.

If you are going to use __errno2 you should clear it using __err2ad() before invoking a function that may set it.

I could not find out whether errno is reset on a successful call or whether it may hold a residual value, so, to be sure, set it to 0 before every call to a C run time library function whose errno you are going to check.

Having got your errno value what do you do with it?

There are #define constants in errno.h such as

#define EIO 122 /* Input/output error */

You can use if ( errno == EIO ) …

Like many products there is no mapping of 122 to “EIO”, but you can use strerror(errno) to map the errno to the error string like EDC5113I Bad file descriptor. (errno2=0xC0220001). This also provides the errno2 string value.

Using ASCII stuff in a C program

This is another topic which looks simple on the surface but has hidden depths. (Where “hidden depths” is a euphemism for “looks buggy”). It started off as one page, but by the time I had discovered the unexpected behaviours, it became 10 times the size. The decision tree to say if text will be printed in ASCII or EBCDIC was three levels deep – but I give a solution.

Why do you want to use ASCII stuff in a C program?

Java, Python and Node.js run on z/OS, and they use ASCII for the character strings, so if you are writing JNI interfaces for Java, or C external functions for Python you are likely to use this.

The high level view

You can define ASCII character strings in C using:

char * pA = "\x41\x42\x43\x30" ;  // ABC0 as ASCII byte values
// or 
#pragma convert(819)
char * pASCII = "CODEPAGE 819 data" ;
#pragma convert(pop)

And EBCDIC strings (for example when using ASCII compile option)

#pragma convert("IBM-1047") 
char * pEBCDIC = "CODEPAGE 1047  data" ; 
#pragma convert(pop) 

You can define that the whole program is “ASCII” by using the -Wc,ASCII or -qascii options at compile time. This also gives you the ASCII versions of printf and other functions.

You can use requests like fopen(“myfile”…) and the code successfully handles the file name in ASCII. Under the covers I expect the code does __a2e_l() to convert from ASCII to EBCDIC, then uses the EBCDIC version of fopen().

The executable code is essentially the same – the character strings are in ASCII, and the “Flags” data is different (to reflect the different compile options). At bind time different stubs are included.

To get my program, compiled with the -qascii option, to print successfully to the OMVS terminal, when piped using |, or when redirected using 1>, I had to use the C run time function fcntl() to set the code page and tag of STDOUT and STDERR. See the fcntl() sections below.

You need to set both STDERR and STDOUT – as perror() prints to STDERR, and you may need this if you have errors in the C run time functions.

Some of the hidden depths!

I had a simple “Hello World” program which I compiled in USS with the -qascii option. Depending on how I ran it, I got “Hello World” or “çÁ%%?-ï?Ê%À” (Hello World in ASCII).

  • straight “./prog”. The output was in ASCII.
  • pipe “./prog | cat”. If I used the environment variable _TAG_REDIR_OUT=”TXT” the output was in EBCDIC – otherwise it came out in ASCII.
  • redirect to a file “./prog 1> aaa”. Depending on the environment variable _BPXK_AUTOCVT=”ON”, on whether the file existed or not, and, if the file existed, on whether it was tagged, the output could be in EBCDIC or ASCII!

So all in all – it looks a bit of a mess.

Background to outputting data

Initially I found printing of ASCII data was easy; then I found what I had written sometimes did not work, and after a week or so I had a clearer idea of what is involved, then I found a few areas which were even more complex. You may want to read this section once, read the rest of the blog post, then come back to this section.

A file can have

  • a tag – this is typically “this file is binary|text|unknown”
  • the code page of the data – or not specified.
  • you can use ls -T file.name to display the tag and code page of a file.

Knowing that information…

  • If a file has a tag=text and an ASCII code page,
    • if the _BPXK_AUTOCVT=”ON” environment flag is set it will display in EBCDIC (eg cat…)
    • else (_BPXK_AUTOCVT=”OFF”) it will display unreadable ASCII (eg cat…)
  • If a file has a tag=binary then converting to a different code page makes no sense. For example a .jpeg or load module. Converting a load module would change the instructions Store Half Word(x40) to a Load Positive (x20).
  • If a file is not tagged – it becomes a very fuzzy area. The output is dependent on the program running. An ASCII program would output as ASCII, and an EBCDIC program would output as EBCDIC.
  • ISPF edit is smart enough to detect the file is ASCII – and enable ISPF edit command “Source ASCII”.

Other strands to the complexity

  • Terminal type
    • If you are logged on using rlogin, you can use chcp to change the tag or code page of your terminal.
    • If you are logged in through TSO – using OMVS, you cannot use chcp. I can’t find a command to set the tag or code page, but you can set it programmatically.
  • Redirection
    • You can print to the terminal or redirect the output to a file for example ./runHello 1>output.file .
    • The file can be created with the appropriate tag and code page.
    • You can use the environment variables _TAG_REDIR_OUT=TXT|BIN and _TAG_REDIR_ERR=TXT|BIN to specify what the redirected output will be.
    • If you use _TAG_REDIR_OUT=TXT, the output is in EBCDIC.
      • you can use ./prog | cat to take the output of prog and pipe it through cat to display the lines in EBCDIC.
      • you can use ./prog |grep ale etc. For me this displayed the EBCDIC line with Locale in it.
    • You can have the situation where if you print to the terminal it is in ASCII, if you redirect, it looks like EBCDIC!
  • The tag and ccsid are set on the first write to the file. If you write some ASCII data, then set the tag and code page, the results are unpredictable. You should set the tag and ccsid before you try to print anything.

What is printed?

A rough set of rules to determine what gets printed are

  • If the output file has a non zero code page, and _BPXK_AUTOCVT=”ON” is set, then the output is the data converted from the code page to EBCDIC.
  • If the output file has a zero code page, then use the code page from the program. A program compiled with ASCII will have an ASCII code page, else it will default to 1047.

A program can set the tag, and code page using the fcntl() function.

Once the tag and code page for STDOUT and STDERR have been set, they remain set until reset. This leads to the confusing sequence of commands and output

  • run my ascii compiled “hello world” program – the data comes out as ASCII.
  • run python3 --version, which sets the STDOUT code page.
  • rerun my ascii compiled “hello world” program – the data comes out as EBCDIC! This is because the STDOUT code page was set by Python (and not reset).

I could not find a field to tell me which code page should be used for STDOUT and STDERR, so I suggest using 1047 unless you know any better.

Using fcntl() to display and set the code page of the output STDOUT and STDERR

To display the information for an open file you can use the fcntl() function. The stat() function also provides information on the open file. The two may not be entirely consistent when using a pipe or redirection.

#include <fcntl.h> 
...
struct f_cnvrt f; 
f.cvtcmd = QUERYCVT; 
f.pccsid = 0; 
f.fccsid = 0; 
int action = F_CONTROL_CVT; 
rc =fcntl( STDOUT_FILENO,action, &f ); 
...
switch(f.cvtcmd) 
{ 
  case QUERYCVT : printf("QUERYCVT - Query"); break; 
  case SETCVTOFF: printf("SETCVTOFF -Set off"); break; 
  case SETCVTON : printf("SETCVTON -Set on unconditionally"); break; 
  case SETAUTOCVTON : printf("SETAUTOCVTON - Set on conditionally"); break; 
  default:  printf("cvtcmd %d",f.cvtcmd); 
} 
printf(" pccsid=%hd fccsid=%hd\n",f.pccsid,f.fccsid); 

For my program compiled in EBCDIC the output was

SETCVTOFF -Set off pccsid=1047 fccsid=0

This shows the program was compiled as EBCDIC (1047), and the STDOUT ccsid was not set.

When the ascii compiled version was used, the output, in ASCII was

SETCVTON -Set on unconditionally pccsid=819 fccsid=819

This says the program was ASCII (819) and the STDOUT code page was ASCII (819).

When I reran the EBCDIC compiled version, the output was

SETCVTOFF -Set off pccsid=1047 fccsid=0

Setting the code page

printf("before text\n");
f.cvtcmd = SETCVTON ; 
f.fccsid = 0x0417  ;  // code page 1047
f.pccsid = 0x0000  ;  // program ccsid: 0x0333 = 819 (ASCII), 0 = take default 
rc =fcntl( STDOUT_FILENO,action, &f ); 
if ( rc != 0) perror("fcntl"); 
printf("After fcntl\n"); 

The first time this ran as an ASCII program, the “before text” was not readable, but the “After fcntl” was readable. The second time it was readable. For example, the second time:

SETCVTON -Set on unconditionally pccsid=819 fccsid=1047
before text
After fcntl

You may want to put some logic in your program

  • use fcntl and f.cvtcmd = QUERYCVT for STDOUT and STDERR
  • if fccsid == 0 then set it to 1047, using fcntl and f.cvtcmd = SETCVTON

Set the file tag

This is needed to set the file tag for use when output is piped or redirected.

void settag(int fileNumber) 
{ 
  int rc; 
  int action; 
  struct file_tag ft; 
  memset(&ft,0,sizeof(ft));   // arguments are (pointer, value, length) 
  ft.ft_ccsid = 0x0417; 
  ft.ft_txtflag = 1; 
  ft.ft_deferred = 0;  // if on then use the ccsid from the program! 
  action = F_SETTAG; 
  rc =fcntl( fileNumber,action, &ft); 
  if ( rc < 0) 
  { 
     perror("fcntl F_SETTAG"); 
     printf("F_SETTAG %i %d\n",rc,errno); 
  } 
}

What gets printed (without the fcntl() code)?

It depends on

  • if you are printing to the terminal – or redirecting the output.
  • if the STDOUT fccsid is set to 1047
  • If _BPXK_AUTOCVT=”ON”

Single program case normal program

If you have a “normal” C program and some EBCDIC character data, when you print it, you get output like “Hello World”.

Single program case ASCII program or data – fccsid = 0

If you have a C program, compiled with the ASCII option, and print some ASCII data, when you print it you get output like ø/ËËÁÀ-À/È/-

Single program case ASCII program or data – fccsid = 1047

If you have a C program, compiled with the ASCII option, and print some ASCII data, when you print it you get output like “Hello World”.

Program with both ASCII and EBCDIC data – compiled with ASCII

With a program with

#pragma convert("IBM-1047") 
 char * pEBCDIC ="CODEPAGE 1047  data"  ; 
#pragma convert(pop) 
printf("EBCDIC:%s\n",pEBCDIC); 

#pragma convert(819) 
char * pASCII = "CODEPAGE 819 data" ; 
#pragma convert(pop) 
printf("ASCII:%s\n",pASCII); 

When compiled without ascii it produces

EBCDIC:CODEPAGE 1047 data
ASCII:ä|àá& åá—–À/È/

When compiled with ascii it produces

EBCDIC:ÃÖÄÅ×ÁÇÅ@ñðô÷@@–£-
ASCII:CODEPAGE 819 data

Mixing programs compiled with/without the ASCII options

If you have a main program and function which have been compiled the same way – either both with ASCII compile option, or both without, and pass a string to the function to print, the output will be readable “Hello World”.

If the two programs have been compiled one with, and one without the ASCII options, an ASCII program will try to print the EBCDIC data, and an EBCDIC program will try to print the ASCII data.

If you know you are getting the “wrong” code page data, you can use the C run time functions __e2a_l or __a2e_l to convert the data.

Decoding/displaying the ASCII output

If you redirect the output to a file, for example ./cp2.so 1>a, you can display the contents of the file converted from ASCII using

  • the “source ascii” command in ISPF edit, or
  • oedit . , and use the “ea” (edit ascii) or “/” line command

The “source ascii” works with SDSF output, with the SE line command.

Several times I had a file with EBCDIC data but the file had been tagged as ASCII. I used chtag -tc IBM-1047 file.name to reset it.

Under the covers

At compile time

There is a predefined macro __CHARSET_LIB which is defined to 1 when the ASCII compiler option is in effect and to 0 when the ASCII compile option is not used.

There is C macro logic like the following, which defines which function to use

#if (__CHARSET_LIB == 1) 
#pragma map (printf, "@@A00118")
#else 
#pragma map (printf, "printf")
#endif

A program compiled with the ASCII option will use a different C run time module (@@A00118), compared with the non ASCII mode, which will use printf.

The #pragma map “renames” the function as input to the binder.

This is known as bimodal and is described in the documentation.

I want my program to be bimodal.

Bimodal is where your program can detect if it is running as ASCII or EBCDIC and make the appropriate decision. This can happen if you have code which is #included into the end user’s program.

This is documented here.

You can use

#define _AE_BIMODAL 1 
if (__isASCII()) // detect the run time
   __printf_a(format, string);  // use the ascii version
else
   __printf_e(format, string); // use the ebcdic version