There are many ways to fail to read a file in a C program.

What should have been a 10 minute task, reading a file from disk using a C program, took a week.

There are several options you can specify on the C fopen() function, and it was hard to find the combination that worked. I basically tried all combinations. Once I got it working, I tried it on other files, and these failed, so my little task turned into a big piece of research.

Depending on the fopen options, and on whether I used fgets or fread, I got different data when reading from a data set with two records in it!

AAAA – what I wanted.
AAAAx’15’ – with a hex 15 on the end.
AA
AAAAx’15’BBBBx’15’ – both records, each terminated by a line-end x’15’.
AAAABBBB – both records with no line ends.
x’00140000’x’00080000’AAAAx’00080000’BBBB.
- This has a Block Descriptor Word (BDW) (x’00140000’) saying the length of the block is x’14’.
- There is a Record Descriptor Word (RDW) (x’00080000’) of length x’0008’, which includes the 4 bytes for the RDW itself.
- There is the data for the first record, AAAA.
- There is a second RDW.
- There is the data for the second record, BBBB.
- The size of this block is 4 for the BDW, 4 for the first RDW, 4 for the AAAA, 4 for the second RDW, and 4 for the BBBB = 20 bytes = x’14’.
Nothing!
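
The BDW and RDW arithmetic above can be checked with a few lines of C. This is a minimal sketch, not part of the original tests: the function names be16 and walk_vb_block are made up for illustration, and it assumes the whole block is already in memory with 2-byte big-endian lengths at the start of each descriptor word, as in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read a 2-byte big-endian length, as found at the start of a BDW or RDW. */
static unsigned be16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/*
 * Walk one variable blocked (VB) block and print the data length of each
 * record. Returns the number of records, or -1 if the lengths do not add up.
 * walk_vb_block is a hypothetical name used only in this sketch.
 */
static int walk_vb_block(const unsigned char *block, size_t avail)
{
    assert(block != NULL);
    if (avail < 4)
        return -1;

    size_t blockLen = be16(block);          /* BDW: length of the whole block */
    if (blockLen < 4 || blockLen > avail)
        return -1;

    int records = 0;
    size_t pos = 4;                         /* skip the 4-byte BDW */
    while (pos < blockLen) {
        if (blockLen - pos < 4)
            return -1;
        size_t recLen = be16(block + pos);  /* RDW: includes its own 4 bytes */
        if (recLen < 4 || recLen > blockLen - pos)
            return -1;
        printf("record %d: %zu data bytes\n", records + 1, recLen - 4);
        pos += recLen;
        records++;
    }
    return records;
}
```

Given the 20 bytes from the example (with AAAA as x’C1C1C1C1’ and BBBB as x’C2C2C2C2’ in EBCDIC), the function should find two records of 4 data bytes each.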

Sections in this blog post

There are different sorts of data
Different ways of getting the data
Introduction to fopen()
The file name
The options
How much data is read at a time?
Reading the data
You can use the fread() function
You can use the fgets function
Results from reading data sets and files
To read a dataset
Read a normal(EBCDIC) OMVS file
Read an ASCII file in OMVS
To read a binary file
Read a binary normal(EBCDIC) OMVS file
Read a binary ASCII file in OMVS
Reading data sets from a shell script
How big a buffer do I need?
How can I tell the size of the buffer I need?

There are different sorts of data

z/OS uses data sets which have records. Records can be blocked for better disk utilisation. It is more efficient to write one block containing 40 records of 80 bytes than to write 40 blocks each of 80 bytes. You can treat the data as one big block (of size 3200 bytes), as 40 blocks of 80, or as 2 blocks of 1600, and so on. With good blocking you can get 10 times the amount of data onto a disk. (If you imagine each physical record on disk needs a 650 byte header you will see why blocking is good.)
You can have “text” files, which contain printable characters. They often use the new line character (x’15’) to indicate end of line.
You can have binary files which should be treated as a blob. In these, characters like x’15’ do not represent a new line. For example it might just be part of a sequence x’13141516’.
Unix (and so OMVS) uses byte-addressable storage.
- “Normal files” have data in EBCDIC.
- You can use enhanced ASCII, where files can hold data in ASCII (or another code page). The file metadata contains the type of data (binary or text) and the code page of the text data.
- With the environment variable _BPXK_AUTOCVT you can have automatic conversion, which converts a text file in ASCII to EBCDIC when it is read.

From this you can see that you need to use the correct way of accessing the data (records or byte addressable), and of using the data (text or binary).
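
To put rough numbers on the blocking example above, here is a small sketch. It assumes the imagined 650 byte overhead applies once per block written to disk; the function name disk_bytes and its parameters are made up for illustration, and real disk geometry is more complicated than this.

```c
#include <assert.h>

/*
 * Estimate the disk space used by 'records' records of 'recLen' bytes when
 * 'perBlock' records are written in each block and every block costs
 * 'overhead' extra bytes. disk_bytes is a hypothetical helper.
 */
static unsigned long disk_bytes(unsigned long records, unsigned long recLen,
                                unsigned long perBlock, unsigned long overhead)
{
    assert(perBlock > 0);
    unsigned long blocks = (records + perBlock - 1) / perBlock;  /* round up */
    return blocks * overhead + records * recLen;
}
```

With these assumptions, 40 unblocked 80 byte records need 40 * 650 + 3200 = 29200 bytes, while one block of 40 records needs 650 + 3200 = 3850 bytes.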

Different ways of getting the data

You can use

the fread function which reads the data, according to the file open options (binary|text, blocked)
the fgets function which returns text data (often terminated by a line end character).

Introduction to fopen()

The fopen() function takes a file name and information on how the file should be opened, and returns a handle. The handle can be used in an fread() function or to provide information about the file. The C runtime reference gives the syntax of fopen(), and there is a lot of information in the C Programming guide.

There is a lot of good information – but it didn’t help me open a file and read the contents.

The file name

The name can be, for example:

"DD:SYSIN" – a reference to a data set via JCL
"//'USER.PROCLIB(MYPROC)'" – a direct reference to a data set
"/etc/profile" – an OMVS file.

If you are using the Unix shell you need to use "//'datasetname'" with both sets of quotes.
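
The three name forms can be told apart by their prefix. The sketch below is only an illustration: the enum and the function classify_name are made up, and it recognises only the three forms listed above in exactly the spelling shown.

```c
#include <assert.h>
#include <string.h>

/* The three kinds of name described above (hypothetical enum). */
enum name_kind { NAME_DD, NAME_DATASET, NAME_UNIX_FILE };

/* Classify a name passed to fopen() by looking at its prefix. */
static enum name_kind classify_name(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "DD:", 3) == 0)
        return NAME_DD;             /* reference to a DD statement in JCL */
    if (strncmp(name, "//", 2) == 0)
        return NAME_DATASET;        /* direct reference to a data set */
    return NAME_UNIX_FILE;          /* anything else is treated as a file */
}
```

A check like this can be handy for logging which kind of open was attempted before calling fopen().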

The options

This options string has one or more parameters.

The first parameter defines the operation: read, write, read+write, etc. Any other parameters are in the format keyword=value.

The read operation can be

“rb” to read a binary file. In a binary file, the data is treated as a blob.
“rt” to read a text file. A text file can have automatic conversion; see the FILETAG C run-time option.
“r” to read a file. Without the “b”, the C standard treats the file as text.

The type can be

type=record – read a record of data from the disk, and give the record to the application
type=blocked. It is more efficient, in disk storage terms, to combine 10 records of 80 bytes into one 800 byte block (or even a 3200 byte block). When a blocked data set is read, the read request returns the first N bytes, then the next N bytes. When the end of the block is reached, the next block is retrieved from disk.
not specified, which implies byte access rather than record access, and use of the fgets function.

An example fopen

FILE * f2 = fopen("//'COLIN.VB'", "rt type=blocked");

How much data is read at a time?

Depending on the type of data, and the open options a read request may return data

up to the end of the record
up to and including the new-line character
up to the size of the buffer you give it.

If you do not give it a big enough buffer, you may have to issue several reads to get the whole logical record.

The system may also “compress” data, for example remove trailing blanks from the end of a line. If you were expecting an 80 byte record – you might only get 20 bytes – so you may need to check this, and fill in the missing blanks if needed.
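
Filling in the missing blanks can be sketched as below. The helper name pad_record is made up, and the sketch assumes any trailing new-line has already been removed and that the buffer has room for the full record length.

```c
#include <assert.h>
#include <string.h>

/*
 * Pad a record holding 'len' bytes with trailing blanks up to 'recLen'
 * bytes. 'record' must have room for recLen bytes. Returns the length
 * after padding. pad_record is a hypothetical helper.
 */
static size_t pad_record(char *record, size_t len, size_t recLen)
{
    assert(record != NULL);
    if (len < recLen)
        memset(record + len, ' ', recLen - len);
    return len < recLen ? recLen : len;
}
```

For example, if a read returned 20 bytes where an 80 byte record was expected, pad_record(buffer, 20, 80) would blank-fill bytes 20 to 79.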

Reading the data

You can use the fread() function

The fread() parameters are:

the address of a buffer to hold the data
the size of each element (the unit of the buffer)
the number of elements
the file handle.

It returns the number of complete elements read.

It is usually used like this:

#define lBuffer 1024

char buffer[lBuffer];
size_t len = fread(buffer, 1 ,lBuffer ,fHandle );

This says the unit of the buffer is 1 character, and there are 1024 of them.

If the record returned was 80 bytes long, then len would be 80.

If it was written as

size_t len = fread(buffer, lBuffer, 1 ,fHandle );

This says the unit of buffer is 1024 – and there is one of them. The returned “length” is 0, as there were no full 1024 size buffers used.
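
The difference between the two forms can be seen with an ordinary temporary file. This is a sketch under stated assumptions: it uses tmpfile() and a plain byte stream rather than a z/OS data set, and the function name fread_result and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define lBuffer 1024

/*
 * Write 'dataLen' bytes to a temporary file, then read them back with
 * fread(), either as lBuffer elements of 1 byte (wholeBuffer = 0) or as
 * one element of lBuffer bytes (wholeBuffer = 1).
 * Returns what fread() returned, or (size_t)-1 if the file could not be
 * created. fread_result is a hypothetical name.
 */
static size_t fread_result(size_t dataLen, int wholeBuffer)
{
    char buffer[lBuffer] = {0};
    assert(dataLen <= lBuffer);

    FILE *f = tmpfile();
    if (f == NULL)
        return (size_t)-1;
    fwrite(buffer, 1, dataLen, f);
    rewind(f);

    size_t n = wholeBuffer ? fread(buffer, lBuffer, 1, f)
                           : fread(buffer, 1, lBuffer, f);
    fclose(f);
    return n;
}
```

With 80 bytes in the file, the first form returns 80 and the second returns 0, because no complete 1024 byte element was read.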

It appears that a returned length of 0 means either end of file or a file error. You can tell if the read is trying to go past the end of the file (feof()) or was due to a file error (ferror()).

if (feof(fHandle)) …
else if (ferror(fHandle))
{ int myerror = errno;
  perror("Colins error");
  …
}
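
Putting fread(), feof() and ferror() together, a read loop might look like the sketch below. The function name count_bytes is made up, and the loop treats the input as a simple stream of bytes; with record I/O each fread() may return one record instead.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a whole stream with fread() and return the total number of bytes
 * read, or -1 if ferror() reports a problem.
 * count_bytes is a hypothetical name used only in this sketch.
 */
static long count_bytes(FILE *fHandle)
{
    char buffer[1024];
    long total = 0;
    assert(fHandle != NULL);

    for (;;) {
        size_t len = fread(buffer, 1, sizeof buffer, fHandle);
        total += (long)len;
        if (len < sizeof buffer) {
            if (ferror(fHandle)) {      /* a real I/O error */
                perror("fread failed");
                return -1;
            }
            if (feof(fHandle))          /* normal end of file */
                break;
        }
    }
    return total;
}
```

The short read is checked before deciding anything, because a length smaller than the buffer is not an error on its own.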

You can use the fgets function.

The fgets() parameters are:

the address of a buffer to hold the data
the size of the buffer
the file handle.

This returns a null terminated string in the buffer. You can use strlen(buffer) to get the length of it. An empty string would have length 0.

fgets() returns a pointer to the string, or NULL if there is an error or the end of the file is reached before any characters are read. If it returns NULL you can use feof() or ferror() as above.
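
A typical fgets() loop is sketched below. The function name count_lines is made up, and the 256 byte buffer is an arbitrary choice; a line longer than the buffer arrives in several pieces, and only the last piece ends in the new-line. In a program compiled for EBCDIC the '\n' constant is the x’15’ character mentioned earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read text with fgets(), print each complete line without its new-line,
 * and return the number of lines that ended in a new-line character.
 * Returns -1 if ferror() reports a problem.
 * count_lines is a hypothetical name used only in this sketch.
 */
static int count_lines(FILE *fHandle)
{
    char buffer[256];
    int lines = 0;
    assert(fHandle != NULL);

    while (fgets(buffer, sizeof buffer, fHandle) != NULL) {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len - 1] == '\n') {
            buffer[len - 1] = '\0';         /* drop the new-line */
            lines++;
            printf("line %d: \"%s\"\n", lines, buffer);
        }
    }
    return ferror(fHandle) ? -1 : lines;
}
```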

Results from reading data sets and files

I did many tests with different options, and configurations, and the results are displayed below.

The following are used in the tables:

Run.
- job. The program was run in JCL using BPXBATSL or via PGM=…
- unix shell – the program was run in the OMVS shell.
BPXAUTO. This is used in the OMVS shell. The environment variable _BPXK_AUTOCVT can have
- “ON” – do automatic conversion from ASCII to EBCDIC where applicable
- “OFF” – do not do automatic conversion from ASCII to EBCDIC
open. The options used.
- r read, rb read binary, rt read text
- “t=r” is shorthand for type=record
read. The C function used to read the data.
- fread – this is the common read function. It can read text files and binary files.
- fgets – this is used to get text from a file. The records usually have the new-line character at the end of the data.
- aread is a C program which uses fread, but the program is compiled with the ASCII option.
data
- E is the EBCDIC data I was expecting. For example, reading from an FB data set member gave me an 80 byte record with the data in it.
- E15 is EBCDIC data, but trailing blanks have been removed. The last character is an EBCDIC line end (x’15’).
- A0A. The data is returned in ASCII, and has a trailing line end character (x’0A’). You can convert this data from ASCII to EBCDIC using the C run time function __a2e_l(buffer,len).
- Abuffer – data up to the size of the buffer (or size - 1 for fgets), with the data in ASCII.
- Ebuffer – data up to the size of the buffer (or size - 1 for fgets), with the data in EBCDIC. This data may have newlines within the buffer.

To read a dataset

This worked with a data set and inline data within a JOB.

Run	BPXAUTO	open	read	data
Job	NA	rb,t=r	fread	E
		r|rt	fread	Ebuffer
		r	fgets	E15
		rt	fgets	E15
Unix shell	Any	rb,t=r	fread	E
		r|rt	fread	Ebuffer
		r	fgets	E15
		rt	fgets	E15

Read a normal(EBCDIC) OMVS file

Run	BPXAUTO	open	read	data
Job	OFF	rb,t=r	fread	E
	off	rt	fgets	E15
	off	*	fread	Ebuffer
Unix shell	off	r	fgets	E15
	off	rt	fgets	E15
	off	rb	fgets	E15
	off	*	fread	E15
	on	rb,t=r	fread	E
	on	rb	fgets	E15

Read an ASCII file in OMVS

Run	BPXAUTO	open	read	data
Job	off	r	agets	A0A
	off	rb	agets	A0A
	off	rt	agets	A0A
Unix shell	on	r	fgets	E15
	on	rb	fgets	E15
	on	rt	fgets	E15
	on	*	fread	buffer
	off	r	agets	A0A
	off	rb	agets	A0A
	off	rt	agets	A0A
	off	*	fread	Abuffer

To read a binary file

To read a data set

Run	BPXAUTO	open	read	data
Job	off	rb	fread	E
Unix shell	on	rb	fread	E

Read a binary normal(EBCDIC) OMVS file

Run	BPXAUTO	open	read	data
Job	off	rb	fread	E
Unix shell	on	rb	fread	E
	off	rb	fread	E

Read a binary ASCII file in OMVS

If you list the attributes of a binary file, for example with “ls -T ecrsa.p12”, it gives

b binary T=off ecrsa1024.p12

which shows it is a binary file.

Run	BPXAUTO	open	read	data
Job	off	rb,t=r	fread	E
Unix shell	on	rb,t=r	fread	E
	off	rb,t=r	fread	E

Reading data sets from a shell script

If you try to use fopen("DD:xxxx"…) from a shell script you will get

FOPEN: EDC5129I No such file or directory. (errno2=0x05620062)

If you use fopen("//'COLIN.VB'"…) and specify a fully qualified data set name, it will work.

fopen("//VB"…) will put the RACF userid in front of the name. For example, it will attempt to open "//'COLIN.VB'".

How big a buffer do I need?

If the buffer is not large enough to hold all of the data in the record, the buffer is filled up to its size. For example, using fgets and a 20 byte buffer, 19 characters were returned, and the 20th character was a null. The “new-line” character was present only when the end of the record was reached.
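
The behaviour with a small buffer can be reproduced with a short sketch. The function name reads_for_one_line is made up, and the example assumes a text stream where each line ends in a new-line character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count how many fgets() calls are needed to read one line when the
 * buffer holds only bufSize bytes (including the terminating null).
 * reads_for_one_line is a hypothetical name used only in this sketch.
 */
static int reads_for_one_line(FILE *fHandle, size_t bufSize)
{
    char buffer[64];
    int reads = 0;
    assert(fHandle != NULL && bufSize >= 2 && bufSize <= sizeof buffer);

    while (fgets(buffer, (int)bufSize, fHandle) != NULL) {
        size_t len = strlen(buffer);
        reads++;
        printf("read %d returned %zu characters\n", reads, len);
        if (len > 0 && buffer[len - 1] == '\n')
            break;          /* the new-line marks the end of the line */
    }
    return reads;
}
```

For a line of 30 characters plus a new-line and a 20 byte buffer, the first call returns 19 characters with no new-line, and the second call returns the remaining 11 characters followed by the new-line.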

How can I tell the size of the buffer I need?

There is data available which helps – but does not give the whole picture.

With the C fldata() function (retrieve file information) you can get information from the file handle, such as

the name of the dataset (as provided at open time)
record format – fixed, variable
dataset organisation (dsorg) – Partitioned, Temporary, HFS
open mode – text, binary, record
device type – disk, printer, hfs, terminal
blksize
maximum record length

With fstat() and fstat64() (get status information from a file handle) and lstat() and lstat64() (get status of a file name or symbolic link) you can get information about an OMVS file name (or file handle). This is the information available from the OMVS “ls” command. For example

file serial number
owner userid
group userid
size of the file
time last access
time last modified
time last file status changed
file tag
- ccsid: the value, or 0x0000 for untagged, or 0xffff for binary
- pure text flag.
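
As a minimal sketch of the fstat() route, the function below returns the file size for an open handle. The name file_size is made up, the sketch uses only the portable POSIX fields, and the z/OS-specific file tag information is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return the size in bytes of an open file, using fstat() on its file
 * descriptor, or -1 if fstat() fails.
 * file_size is a hypothetical name used only in this sketch.
 */
static long long file_size(FILE *fHandle)
{
    struct stat info;
    assert(fHandle != NULL);
    if (fstat(fileno(fHandle), &info) != 0)
        return -1;
    return (long long)info.st_size;
}
```

Other fields in the same structure, such as st_uid, st_gid and st_mtime, give the owner, group and time last modified.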

Example output

For data sets and files (along the top of the table)

VB a sequential data set Variable Blocked
FB a member of user.proclib which is Fixed Block
SYSIN inline data in a job
Loadlib a library containing load modules (binary file)
OMVS file for example zos.c
ASCII for example z.py which has been tagged as ASCII

Other values

V variable format data
F fixed format data
Blk it has blocked records
B “Indicates whether it was allocated with blocked records”. I do not know the difference between Blk and B
U Undefined format
PS It is a Physical Sequential
PO It is partitioned – it has members
PDSE it is a PDSE
HFS it is a file from OMVS

		VB	FB	SYSIN	loadlib	OMVS file	ASCII
fldata	recfm	V Blk B	F Blk B	F Blk B	U	U	U
	dsorg	PS	PO PDSMem	PS	PO PDSMem PDSE	HFS	HFS
	device	disk	disk	other	disk	HFS	HFS
	blocksize	6144	610	80	6400	0	0
	maxreclen	1024	80	80	6400	1024	1024
stat	ccsid	0	0	0	0	0	819
	file size	0	0	0	0	4780	824

To read a record, you should use the maxreclen size. This may not be the size of the data record but it is the best guess.

It looks like the maxreclen for Unix files is 1024. I guess this is the page size on disk.

Published by Colin Paice
