Using ASCII stuff in a C program

This is another topic which looks simple on the surface but has hidden depths (where “hidden depths” is a euphemism for “looks buggy”). It started off as one page, but by the time I had discovered the unexpected behaviours, it had grown to 10 times the size. The decision tree for whether text will be printed in ASCII or EBCDIC was three levels deep, but I give a solution.

Why do you want to use ASCII stuff in a C program?

Java, Python and Node.js run on z/OS, and they use ASCII for the character strings, so if you are writing JNI interfaces for Java, or C external functions for Python you are likely to use this.

The high level view

You can define ASCII character strings in C using:

char pA[] = { 0x41, 0x42, 0x43, 0x30, 0x00 } ;  // "ABC0" in ASCII, null terminated
// or 
#pragma convert(819)
char * pASCII = "CODEPAGE 819 data" ;
#pragma convert(pop)

And EBCDIC strings (for example, when using the ASCII compile option) using:

#pragma convert("IBM-1047") 
char * pEBCDIC = "CODEPAGE 1047  data" ; 
#pragma convert(pop) 
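
To make the difference between the two encodings concrete, here is a small portable sketch (not from my original program) which spells out the bytes of "ABC0" in both code pages and formats them as hex. The helper name to_hex and the array names are made up for this illustration; the byte values are the standard ISO8859-1 (819) and IBM-1047 code points.

```c
#include <assert.h>
#include <stddef.h>

/* "ABC0" written byte by byte, so the values do not depend on the
   ASCII compile option or on any #pragma convert in effect. */
const unsigned char abc0_ascii[]  = { 0x41, 0x42, 0x43, 0x30, 0x00 };
const unsigned char abc0_ebcdic[] = { 0xC1, 0xC2, 0xC3, 0xF0, 0x00 };

/* Hypothetical helper: write the bytes of a null terminated string
   into out as upper case hex digits.
   out must have room for 2 * (length of s) + 1 characters. */
void to_hex(const unsigned char *s, char *out)
{
    static const char digits[] = "0123456789ABCDEF";

    assert(s != NULL && out != NULL);
    while (*s != 0) {
        *out++ = digits[*s >> 4];     /* high nibble */
        *out++ = digits[*s & 0x0F];   /* low nibble  */
        s++;
    }
    *out = '\0';
}
```

Calling to_hex() on each array and printing the result shows the same four characters stored as two completely different byte sequences, which is why printing one as if it were the other gives unreadable output.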

You can define that the whole program is “ASCII” by using the -Wc,ASCII or -qascii options at compile time. This also gives you ASCII versions of printf and other functions.

You can use requests like fopen("myfile"…) and the code successfully handles the file name in ASCII. Under the covers I expect the code uses __a2e_l() to convert from ASCII to EBCDIC, then uses the EBCDIC version of fopen().

The executable code is essentially the same – the character strings are in ASCII, and the “Flags” data is different (to reflect the different compile options). At bind time different stubs are included.

To get my program, compiled with the -qascii option, to print successfully to the OMVS terminal, when piped using |, or when redirected using 1>, I had to use the C run time function fcntl() to set the code page and tag of STDOUT and STDERR. See below.

You need to set both STDERR and STDOUT – as perror() prints to STDERR, and you may need this if you have errors in the C run time functions.

Some of the hidden depths!

I had a simple “Hello World” program which I compiled in USS with the -qascii option. Depending on how I ran it, I got “Hello World” or “çÁ%%?-ï?Ê%À” (Hello World in ASCII).

  • straight “./prog”. The output was in ASCII.
  • pipe “./prog | cat”. If I used the environment variable _TAG_REDIR_OUT="TXT" the output was in EBCDIC, otherwise it came out in ASCII.
  • redirect to a file “./prog 1> aaa”. The output could be in EBCDIC or ASCII, depending on whether the environment variable _BPXK_AUTOCVT="ON" was set, whether the file already existed, and, if it existed, whether it was tagged.

So all in all – it looks a bit of a mess.

Background to outputting data

Initially I found printing of ASCII data was easy; then I found what I had written sometimes did not work, and after a week or so I had a clearer idea of what is involved, then I found a few areas which were even more complex. You may want to read this section once, read the rest of the blog post, then come back to this section.

A file can have

  • a tag – this is typically “this file is binary|text|unknown”
  • the code page of the data – or not specified.
  • you can use ls -T file.name to display the tag and code page of a file.

Knowing that information…

  • If a file has a tag=text and an ASCII code page,
    • if the _BPXK_AUTOCVT=”ON” environment flag is set it will display in EBCDIC (eg cat…)
    • else (_BPXK_AUTOCVT=”OFF”) it will display unreadable ASCII (eg cat…)
  • If a file has a tag=binary then converting to a different code page makes no sense, for example for a .jpeg or a load module. Converting a load module from EBCDIC to ASCII would change every x40 byte (an EBCDIC blank) to x20 (an ASCII blank), so a Store Halfword instruction (x40) would become a Load Positive (x20).
  • If a file is not tagged, it becomes a very fuzzy area. The output is dependent on the program running. An ASCII program would output as ASCII, and an EBCDIC program would output as EBCDIC.
  • ISPF edit is smart enough to detect the file is ASCII – and enable ISPF edit command “Source ASCII”.

Other strands to the complexity

  • Terminal type
    • If you are logged on using rlogin, you can use chcp to change the tag or code page of your terminal.
    • If you are logged in through TSO – using OMVS, you cannot use chcp. I can’t find a command to set the tag or code page, but you can set it programmatically.
  • Redirection
    • You can print to the terminal or redirect the output to a file for example ./runHello 1>output.file .
    • The file can be created with the appropriate tag and code page.
    • You can use the environment variables _TAG_REDIR_OUT=TXT|BIN and _TAG_REDIR_ERR=TXT|BIN to specify what the redirected output will be.
    • If you use _TAG_REDIR_OUT=TXT, the output is in EBCDIC.
      • you can use ./prog | cat to take the output of prog and pipe it through cat to display the lines in EBCDIC.
      • you can use ./prog |grep ale etc. For me this displayed the EBCDIC line with Locale in it.
    • You can have the situation where if you print to the terminal it is in ASCII, if you redirect, it looks like EBCDIC!
  • The tag and ccsid are set on the first write to the file. If you write some ASCII data, then set the tag and code page, the results are unpredictable. You should set the tag and ccsid before you try to print anything.

What is printed?

A rough set of rules to determine what gets printed is

  • If the output file has a non zero code page, and _BPXK_AUTOCVT="ON" is set, the output is the data converted from that code page to EBCDIC.
  • If the output file has a zero code page, then the code page from the program is used. A program compiled with ASCII will have an ASCII code page, else it will default to 1047.
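
The two rules above can be written down as a small decision function. This is only a model of my rough rules, not of what z/OS really does internally. The function name and the enum are invented for the sketch, and the case the rules do not cover (file code page set, but automatic conversion off) is assumed to behave like “no conversion”.

```c
#include <assert.h>
#include <stdbool.h>

enum { CCSID_UNSET = 0, CCSID_ASCII = 819, CCSID_EBCDIC = 1047 };

/* Hypothetical model: return the code page in which the data is
   expected to arrive at the output.
     file_ccsid    - code page set on the output file (0 = not set)
     autocvt_on    - true when _BPXK_AUTOCVT=ON is in effect
     program_ccsid - 819 for a program compiled with ASCII, else 1047 */
int expected_output_ccsid(int file_ccsid, bool autocvt_on, int program_ccsid)
{
    assert(program_ccsid != CCSID_UNSET);

    /* Rule 1: the file has a code page and conversion is on,
       so the data is converted to EBCDIC. */
    if (file_ccsid != CCSID_UNSET && autocvt_on)
        return CCSID_EBCDIC;

    /* Rule 2: no code page on the file, so the program's code page
       is used. ASSUMPTION: the uncovered case (code page set but
       conversion off) is treated the same way. */
    return program_ccsid;
}
```

For example, an ASCII program writing to an untagged STDOUT gives 819 (unreadable on an EBCDIC terminal), while the same program writing to a STDOUT with a code page set and conversion on gives 1047.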

A program can set the tag, and code page using the fcntl() function.

Once the tag and code page for STDOUT and STDERR have been set, they remain set until reset. This leads to the confusing sequence of commands and output

  • run my ascii compiled “hello world” program – the data comes out as ASCII.
  • run python3 --version which sets the STDOUT code page.
  • rerun my ascii compiled “hello world” program – the data comes out as EBCDIC! This is because the STDOUT code page was set by Python (and not reset).

I could not find a field to tell me which code page should be used for STDOUT and STDERR, so I suggest using 1047 unless you know any better.

Using fcntl() to display and set the code page of the output STDOUT and STDERR

To display the information for an open file you can use the fcntl() function. The stat() function also provides information on the open file. The two may not be entirely consistent when the output is piped or redirected.

#include <fcntl.h> 
...
struct f_cnvrt f; 
f.cvtcmd = QUERYCVT; 
f.pccsid = 0; 
f.fccsid = 0; 
int action = F_CONTROL_CVT; 
rc =fcntl( STDOUT_FILENO,action, &f ); 
...
switch(f.cvtcmd) 
{ 
  case QUERYCVT : printf("QUERYCVT - Query"); break; 
  case SETCVTOFF: printf("SETCVTOFF -Set off"); break; 
  case SETCVTON : printf("SETCVTON -Set on unconditionally"); break; 
  case SETAUTOCVTON : printf("SETAUTOCVTON - Set on conditionally"); break; 
  default:  printf("cvtcmd %d",f.cvtcmd); 
} 
printf(" pccsid=%hd fccsid=%hd\n",f.pccsid,f.fccsid); 

For my program compiled in EBCDIC the output was

SETCVTOFF -Set off pccsid=1047 fccsid=0

This shows the program was compiled as EBCDIC (1047), and the STDOUT ccsid was not set.

When the ascii compiled version was used, the output, in ASCII was

SETCVTON -Set on unconditionally pccsid=819 fccsid=819

This says the program was ASCII (819) and the STDOUT code page was ASCII (819).

When I reran the ebcdic compiled version, the output was

SETCVTOFF -Set off pccsid=1047 fccsid=0

Setting the code page

printf("before text\n");
f.cvtcmd = SETCVTON ; 
f.fccsid = 0x0417  ;  // file code page: 0x0417 = 1047
f.pccsid = 0x0000  ;  // program code page: 0x0333 = 819 (ASCII), 0 = take the default
rc =fcntl( STDOUT_FILENO,action, &f ); 
if ( rc != 0) perror("fcntl"); 
printf("After fcntl\n"); 

The first time this ran as an ASCII program, the “before text” was not readable, but the “After fcntl” was readable. The second time both lines were readable. For example, the second time:

SETCVTON -Set on unconditionally pccsid=819 fccsid=1047
before text
After fcntl

You may want to put some logic in your program

  • use fcntl and f.cvtcmd = QUERYCVT for STDOUT and STDERR
  • if fccsid == 0 then set it to 1047, using fcntl and f.cvtcmd = SETCVTON
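
The logic above can be sketched as one helper, built from the same fcntl() calls shown earlier. The function name ensure_ccsid_1047 is made up. The #ifdef __MVS__ guard is my assumption for keeping the z/OS only code out of builds on other platforms, and the _OPEN_SYS_FILE_EXT feature test macro is what I believe the z/OS headers need for these fcntl() extensions, so check your own headers.

```c
#define _OPEN_SYS_FILE_EXT 1   /* ASSUMPTION: enables the z/OS fcntl() extensions */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: if the file behind fd has no code page set,
   switch conversion on with code page 1047.
   Returns 0 if it worked or there was nothing to do, -1 if fcntl() failed. */
int ensure_ccsid_1047(int fd)
{
    assert(fd >= 0);
#ifdef __MVS__                      /* z/OS only, a no-op elsewhere */
    struct f_cnvrt f;

    f.cvtcmd = QUERYCVT;            /* ask what is currently set */
    f.pccsid = 0;
    f.fccsid = 0;
    if (fcntl(fd, F_CONTROL_CVT, &f) == -1)
        return -1;

    if (f.fccsid == 0) {            /* no file code page yet */
        f.cvtcmd = SETCVTON;
        f.pccsid = 0;               /* 0 = take the program default */
        f.fccsid = 1047;
        if (fcntl(fd, F_CONTROL_CVT, &f) == -1)
            return -1;
    }
#endif
    return 0;
}
```

I would call this for both STDOUT_FILENO and STDERR_FILENO at the top of main(), before the first printf(), because the tag and code page are set on the first write.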

Set the file tag

This is needed to set the file tag for use when output is piped or redirected.

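#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

void settag(int fileNumber) 
{ 
  int rc; 
  int action; 
  int savedErrno; 
  struct file_tag ft; 
  memset(&ft, 0, sizeof(ft));  // clear the structure: memset(target, value, length) 
  ft.ft_ccsid = 0x0417;        // 0x0417 = code page 1047 
  ft.ft_txtflag = 1;           // the data is text 
  ft.ft_deferred = 0;  // if on then use the ccsid from the program! 
  action = F_SETTAG; 
  rc = fcntl(fileNumber, action, &ft); 
  if ( rc < 0) 
  { 
     savedErrno = errno;       // keep errno before another call can change it 
     perror("fcntl F_SETTAG"); 
     printf("F_SETTAG %i %d\n",rc,savedErrno); 
  } 
}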

What gets printed (without the fcntl() code)?

It depends on

  • if you are printing to the terminal – or redirecting the output.
  • if the STDOUT fccsid is set to 1047
  • If _BPXK_AUTOCVT=”ON”
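
A program can check the last of these for itself with the standard getenv() function. A minimal sketch follows; the function names are made up, and it treats only the value ON as “on”, because that is the only value I used in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true only when the value is exactly "ON".
   A missing variable (NULL) counts as off. */
bool autocvt_value_is_on(const char *value)
{
    return value != NULL && strcmp(value, "ON") == 0;
}

/* Hypothetical helper: look at the environment of the running program. */
bool autocvt_requested(void)
{
    return autocvt_value_is_on(getenv("_BPXK_AUTOCVT"));
}
```

Splitting the test of the value from the getenv() call makes the check easy to try out without changing your environment.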

Single program case normal program

If you have a “normal” C program and some EBCDIC character data, when you print it, you get output like “Hello World”.

Single program case ASCII program or data – fccsid = 0

If you have a C program, compiled with the ASCII option, and print some ASCII data, when you print it you get output like ø/ËËÁÀ-À/È/-

Single program case ASCII program or data – fccsid = 1047

If you have a C program, compiled with the ASCII option, and print some ASCII data, when you print it you get output like “Hello World”.

Program with both ASCII and EBCDIC data – compiled with ASCII

With a program with

#pragma convert("IBM-1047") 
 char * pEBCDIC ="CODEPAGE 1047  data"  ; 
#pragma convert(pop) 
printf("EBCDIC:%s\n",pEBCDIC); 

#pragma convert(819) 
char * pASCII = "CODEPAGE 819 data" ; 
#pragma convert(pop) 
printf("ASCII:%s\n",pASCII); 

When compiled without ascii it produces

EBCDIC:CODEPAGE 1047 data
ASCII:ä|àá& åá—–À/È/

When compiled with ascii it produces

EBCDIC:ÃÖÄÅ×ÁÇÅ@ñðô÷@@–£-
ASCII:CODEPAGE 819 data

Mixing programs compiled with/without the ASCII options

If you have a main program and function which have been compiled the same way – either both with ASCII compile option, or both without, and pass a string to the function to print, the output will be readable “Hello World”.

If the two programs have been compiled one with, and one without the ASCII options, an ASCII program will try to print the EBCDIC data, and an EBCDIC program will try to print the ASCII data.

If you know you are getting the “wrong” code page data, you can use the C run time functions __e2a_l or __a2e_l to convert the data.
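
To show what such a conversion does to the bytes, here is a portable sketch which converts IBM-1047 data to ASCII in place. It is not a replacement for __e2a_l: it only understands blank, digits and unaccented letters, and turns every other byte into “?”. The function names e2a_byte and e2a_buffer are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map one IBM-1047 byte to its ISO8859-1 value.
   Only null, blank, 0-9, A-Z and a-z are handled. */
unsigned char e2a_byte(unsigned char e)
{
    if (e == 0x00)              return 0x00;                                /* null  */
    if (e == 0x40)              return 0x20;                                /* blank */
    if (e >= 0xF0 && e <= 0xF9) return (unsigned char)(0x30 + (e - 0xF0));  /* 0-9 */
    if (e >= 0xC1 && e <= 0xC9) return (unsigned char)(0x41 + (e - 0xC1));  /* A-I */
    if (e >= 0xD1 && e <= 0xD9) return (unsigned char)(0x4A + (e - 0xD1));  /* J-R */
    if (e >= 0xE2 && e <= 0xE9) return (unsigned char)(0x53 + (e - 0xE2));  /* S-Z */
    if (e >= 0x81 && e <= 0x89) return (unsigned char)(0x61 + (e - 0x81));  /* a-i */
    if (e >= 0x91 && e <= 0x99) return (unsigned char)(0x6A + (e - 0x91));  /* j-r */
    if (e >= 0xA2 && e <= 0xA9) return (unsigned char)(0x73 + (e - 0xA2));  /* s-z */
    return 0x3F;                                                            /* '?' */
}

/* Hypothetical helper: convert len bytes in place. */
void e2a_buffer(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = e2a_byte(buf[i]);
}
```

The letter ranges are not contiguous in EBCDIC (A-I, J-R and S-Z have gaps between them), which is why the conversion cannot be done by adding a constant to each byte.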

Decoding/displaying the ASCII output

If you redirect the output to a file, for example ./cp2.so 1>a you can display the contents of the file converted from ASCII using

  • the “source ascii” command in ISPF edit, or
  • oedit . , and use the “ea” (edit ascii) or “/” line command

The “source ascii” works with SDSF output, with the SE line command.

Several times I had a file with EBCDIC data but the file had been tagged as ASCII. I used chtag -tc IBM-1047 file.name to reset it.

Under the covers

At compile time

There is a predefined macro __CHARSET_LIB which is defined to 1 when the ASCII compiler option is in effect and to 0 when the ASCII compile option is not used.

There is C preprocessor logic, along the lines of the following, which defines which function to use

#if (__CHARSET_LIB == 1) 
#pragma map (printf, "@@A00118")
#else 
#pragma map (printf, "printf")
#endif

A program compiled with the ASCII option will use a different C run time module (@@A00118), compared with the non ASCII mode, which will use printf.

The #pragma map “renames” the function as input to the binder.

This is known as bimodal and is described in the documentation.

I want my program to be bimodal.

Bimodal is where your program can detect if it is running as ASCII or EBCDIC and make the appropriate decision. This can happen if you have code which is #included into the end user’s program.

This is documented here.

You can use

#define _AE_BIMODAL 1 
if (__isASCII()) // detect the run time
   __printf_a(format, string);  // use the ascii version
else
   __printf_e(format, string); // use the ebcdic version
