How to stop when blocked.

I hit an interesting challenge when working with SMF-real-time.

An application can connect to the SMF real time service, and it will be sent data as it is produced. Depending on a flag, the application can use:

  • Non blocking. The application gets a record if there is one available. If not, it gets a return code saying “No records available”.
  • Blocking. The application waits (potentially forever) for a record.

Which is the best one to use?

This is a general problem; it is not just SMF real time that has these challenges.

What are the options, generally?

Non blocking

You need to loop to see if there are records, and wait if there are none. The challenge is how long to wait. If you wait for 10 seconds, you may get a 10 second delay between the record being created and your application getting it. A shorter wait reduces this window, but the application then does more loops per minute, so the CPU cost increases.
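The trade-off can be sketched in Python; get_record here is a hypothetical non-blocking fetch that returns None when nothing is available:

```python
import time

def poll_loop(get_record, delay=0.1, max_polls=None):
    """Poll a non-blocking source; a smaller delay lowers latency
    but burns more CPU in empty polls."""
    polls = 0
    while max_polls is None or polls < max_polls:
        rec = get_record()
        if rec is not None:
            yield rec
        else:
            # latency window: a record created now waits up to `delay`
            time.sleep(delay)
        polls += 1

# simulate a source that only sometimes has data
data = iter([None, "rec1", None, "rec2"])
records = list(poll_loop(lambda: next(data, None), delay=0, max_polls=4))
print(records)  # ['rec1', 'rec2']
```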

Blocking

If you use blocking, there is no extra CPU spent looping round. The problem is how you stop your application cleanly when it is in a blocked wait, to allow cleaning up at the end of processing.

A third way

MQSeries processes application messages asynchronously. You can say wait for a message – but time out after a specified interval if no message has been received.

This method is very successful, but some applications abused it. They want their application to wake up every n seconds and check the application’s shutdown flag. If the flag is set, shut down.

The correct answer in this case is to have MQ post an Event Control Block (ECB), and have the application’s code post another ECB; the mainline code waits for either of the ECBs to be posted, and takes the appropriate action. However, the lazy way of sleeping, waking, checking and sleeping again is quick and easy to code.
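In Python terms the shutdown-flag pattern looks like the sketch below, using threading.Event as a rough stand-in for an ECB (this illustrates the pattern, it is not MQ’s actual API):

```python
import threading

shutdown = threading.Event()
work_done = []

def worker():
    while True:
        work_done.append("unit of work")   # do one unit of work
        if shutdown.wait(timeout=1.0):     # True once the flag is set
            break                          # shut down cleanly

t = threading.Thread(target=worker)
t.start()
shutdown.set()    # mainline asks the worker to stop
t.join()
```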

What are the options for SMF real time?

With the SMF real time code, while one thread is running – and in a blocked wait, another thread can issue a disconnect() request. This wakes up the waiting thread with a “you’ve been woken up because of a disconnect” return code.

The solution is to use a threading model.

The basic SMF get code

i = 0
while True:
    x, rc = f.get(wait=True)       # Blocking get
    if x == "":                    # Happens after disc()
        print("Get returned null string, exiting loop")
        break
    if rc != "OK":
        print("get returned", rc)
        print("Disconnect", f.disc())
    i += 1
    # it worked, do something with the record
    ...
    print(i, len(x))               # Process the data
    ...

Make the SMF get a blocking get with timeout

With every get request, create another thread to wake up and issue the disconnect, to cause the blocked get to wake up.

This may be expensive, as it creates and cancels a timer thread for every record.

from threading import Timer

def blocking_get_loop(f, timeout=10, event=None, max_records=None):
    i = 0
    while True:
        # run the timeout function after `timeout` seconds;
        # note Timer takes the function and its arguments separately
        t = Timer(timeout, timerpop, args=(f,))
        t.start()
        x, rc = f.get(wait=True)   # Blocking get
        t.cancel()

        ....


# This runs after the specified interval, then issues the disconnect.
def timerpop(f):
    f.disc()

Wait for a terminal interrupt

Vignesh S sent me some code which I’ve taken and modified.

The code that does the SMF get runs in a separate thread. This means the main thread can issue the disconnect and wake up the blocked request.

# Usage example: run until Enter or interrupt, or max_records
if __name__ == "__main__":
    blocking_smf(max_records=6)    # execute the code below

def blocking_smf(stream_name="IFASMF.INMEM", debug=0, max_records=None):
    f = pySMFRT(stream_name, debug=debug)
    f.conn(stream_name, debug=2)   # Explicit connect

    # Start the blocking loop in a separate thread
    get_thread = threading.Thread(target=blocking_get_loop, args=(f,),
                                  kwargs={"max_records": max_records})
    get_thread.start()

    try:
        # Main thread: wait for user input to stop
        input("Press Enter to stop...\n")
        print("Stopping...")
    except KeyboardInterrupt:
        print("Interrupted, stopping...")
    finally:
        f.disc()           # This unblocks the get() call
        get_thread.join()  # Wait for thread to exit

The key code is

get_thread = threading.Thread(target=blocking_get_loop, args=(f,),
                              kwargs={"max_records": max_records})
get_thread.start()

This creates a thread which executes the function blocking_get_loop, passing the positional parameter f (note args must be a tuple, hence (f,)) and the keyword argument max_records.
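A tiny runnable illustration of the args/kwargs plumbing (mainlogic here is a hypothetical stand-in for the real SMF processing loop):

```python
import threading

results = []

def mainlogic(f, max_records=None):
    # stand-in for the real SMF processing loop
    results.append((f, max_records))

get_thread = threading.Thread(target=mainlogic, args=("handle",),
                              kwargs={"max_records": 4})
get_thread.start()
get_thread.join()
print(results)  # [('handle', 4)]
```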

The code to do the gets and process the requests is the same as before, with the addition of the count of records processed.

def blocking_get_loop(f, max_records=None):
    i = 0
    while True:
        x, rc = f.get(wait=True)       # Blocking get
        if x == "":                    # Happens after disc()
            print("Get returned null string, exiting loop")
            break
        if rc != "OK":
            print("get returned", rc)
            print("Disconnect", f.disc())
        i += 1
        # it worked, do something with the record
        ...
        print(i, len(x))               # Process the data
        ...
        #
        if max_records and i >= max_records:
            print("Reached max records, stopping")
            print("Disconnect", f.disc())  # clean up if ended because of number of records
            break

If the requested number of records has been processed, or there has been an error, or unexpected data, then disconnect is called, and the function returns.

Handle a terminal interrupt

The code

try:
    # Main thread: wait for user input to stop
    input("Press Enter to stop...\n")
    print("Stopping...")
except KeyboardInterrupt:
    print("Interrupted, stopping...")
finally:
    f.disc()           # This unblocks the get() call
    get_thread.join()  # Wait for thread to exit

waits for input from the terminal.

This solution is not perfect: if the requested number of records is processed quickly, you still have to press Enter at the keyboard before the program ends.

Use an event with time out

One problem has been notifying the main task when the SMF get task has finished. You can use an event for this.

In the main logic have

def blocking_smf(stream_name="IFASMF.INMEM", debug=0, max_records=None):
    f = pySMFRT(stream_name, debug=debug)
    f.conn(stream_name, debug=2)   # Explicit connect
    myevent = threading.Event()

    # Start the blocking loop in a separate thread
    get_thread = threading.Thread(target=blocking_get_loop, args=(f,),
                                  kwargs={"max_records": max_records,
                                          "event": myevent})
    get_thread.start()
    # wait for the SMF get task to end - or the event wait to time out
    if myevent.wait(timeout=30) is False:
        print("We timed out")
        f.disc()           # wake up the blocking get

    get_thread.join()      # Wait for it to finish

In the SMF code

def blocking_get_loop(f, max_records=None, event=None):
    i = 0
    while True:
        # t = Timer(timeout, timerpop, args=(f,))
        x, rc = f.get(wait=True)       # Blocking get
        # t.cancel()
        if x == "":                    # Happens after disc()
            print("Get returned null string, exiting loop")
            break
        if rc != "OK":
            print("get returned", rc)
            print("Disconnect", f.disc())
        i += 1
        print(i, len(x))               # Process the data
        if max_records and i >= max_records:
            print("Reached max records, stopping")
            print("Disconnect", f.disc())
            break
    if event is not None:
        event.set()                    # wake up the main task


Getting from C enumerations to Python dicts.

I wanted to create Python dictionaries from C enumerations. For example, with System SSL there is a datatype x509_attribute_type.

This has a definition

typedef struct is defined as

typedef enum {
    x509_attr_unknown   = 0,
    x509_attr_name      = 1, /* 2.5.4.41 */
    x509_attr_surname   = 2, /* 2.5.4.4  */
    x509_attr_givenName = 3, /* 2.5.4.42 */
    x509_attr_initials  = 4, /* 2.5.4.43 */
    ...
} x509_attribute_type;

I wanted to create

x509_attribute_type = {
    "x509_attr_unknown": 0,
    "x509_attr_name": 1,
    "x509_attr_surname": 2,
    "x509_attr_givenName": 3,
    "x509_attr_initials": 4,
    ...
}

I’ve done this using ISPF macros, but thought it would be easier(!) to automate it.

There is a standard way for compilers to produce information for debuggers to understand the structure of programs. DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging. The data is stored internally using the Executable and Linkable Format (ELF).

Getting the DWARF file

To get the structures into the DWARF file, it looks like you have to use the structures; that is, if you just #include a file, by default the definitions are not stored in the DWARF file.

When I used

#include <gskcms.h>
...
x509_attribute_type aaa;
x509_name_type nt;
x509_string_type st;
x509_ecurve_type et;
int l = sizeof(aaa) + sizeof(nt) + sizeof(st) + sizeof(et);

I got the structures x509_attribute_type etc in the DWARF file.

Compiling the file

I used the USS xlc command with

xlc ...-Wc,debug(FORMAT(DWARF),level(9))... abc.c

or

/bin/xlclang -v -qlist=d.lst -qsource -qdebug=format=dwarf -g -c abc.c -o abc.o

This created a file abc.dbg

The .dbg file includes an eye catcher of ELF (in ASCII)

I downloaded the file in binary to Linux.

There are various packages which are meant to be able to process the file. The only one I got to work successfully was dwarfdump. The Linux version has many options to specify what data you want to select and how you want to report it. dwarfdump reported some errors, but I got most of the information out.

readelf displays some of the information in the file, but I could not get it to display the information about the variables.

What does the output from dwarfdump look like?

The format has changed slightly since I first used this a year or so ago. The data are not always aligned on the same columns, and values like <146> and 2426 (used as a locator id) are now hexadecimal offsets.

The older format

<1>< 2426>      DW_TAG_subprogram
                  DW_AT_type      <146>
                  DW_AT_name      printCert
                  DW_AT_external  yes
...
<2>< 2456>      DW_TAG_formal_parameter
                  DW_AT_name      id
                  DW_AT_type      <10387>
...

The newer format

< 1><0x0000132d>    DW_TAG_typedef
                      DW_AT_type         <0x00001349>
                      DW_AT_name         x509_attribute_type
                      DW_AT_decl_file    0x00000002
                      DW_AT_decl_line    0x000000a7
                      DW_AT_decl_column  0x00000003
< 1><0x00001349>    DW_TAG_enumeration_type
                      DW_AT_name         __3
                      DW_AT_byte_size    0x00000002
                      DW_AT_decl_file    0x00000002
                      DW_AT_decl_line    0x00000091
                      DW_AT_decl_column  0x0000000e
                      DW_AT_sibling      <0x00001553>
< 2><0x00001356>    DW_TAG_enumerator
                      DW_AT_name         x509_attr_unknown
                      DW_AT_const_value  0
< 2><0x0000136a>    DW_TAG_enumerator
                      DW_AT_name         x509_attr_name
                      DW_AT_const_value  1

Some of the fields are obvious… others are more cryptic, with varying levels of indirection.

  • <1><0x0000132d> is a high level object <1> with id <0x0000132d>
    • DW_TAG_typedef is a typedef
    • DW_AT_type <0x00001349> see <0x00001349> for the definition (below)
    • DW_AT_name x509_attribute_type is the name of the typedef
    • DW_AT_decl_file 0x00000002 there is a file definition #2… but I could not find it
    • DW_AT_decl_line 0x000000a7 the position within the file
    • DW_AT_decl_column 0x00000003
  • < 1><0x00001349> DW_TAG_enumeration_type. This is referred to by the previous element
    • DW_AT_name __3 this is an internally generated name
  • < 2><0x00001356> DW_TAG_enumerator This is part of the <1> <0x00001349> above. It is an enumerator.
    • DW_AT_name x509_attr_unknown this is the label of the value
    • DW_AT_const_value 0 with value 0
  • the next entry is the label x509_attr_name with value 1

Other interesting data

I have a function

int colin(char * cinput, gsk_buffer * binput )
{
...
}
and
typedef struct _gsk_data_buffer {
    gsk_size length;
    void * data;
} gsk_data_buffer, gsk_buffer;

Breaking this down into its parts, there is an entry in the DWARF output for “int”, “colin”, “*”, “char”, “cinput”, “*”, gsk_buffer (which has levels within it), “binput”

< 1><0x0000009a>    DW_TAG_base_type
                      DW_AT_name       int
                      DW_AT_encoding   DW_ATE_signed
                      DW_AT_byte_size  0x00000004

< 1><0x00000142>    DW_TAG_subprogram
                      DW_AT_type       <0x0000009a>
                      DW_AT_name       colin
                      DW_AT_external   yes(1)
...
< 2><0x0000015c>    DW_TAG_formal_parameter
                      DW_AT_name       cinput
                      DW_AT_type       <0x00001fa9>

< 2><0x00000173>    DW_TAG_formal_parameter
                      DW_AT_name       binput
                      DW_AT_type       <0x00001fb5>
:

< 2><0x0000018a>    DW_TAG_variable
                      DW_AT_name       __func__
                      DW_AT_type       <0x00001fc0>

for cinput

< 1><0x00001fa9>    DW_TAG_pointer_type
                      DW_AT_type           <0x0000006e>
                      DW_AT_address_class  0x0000000a
< 1><0x0000006e>    DW_TAG_base_type
                      DW_AT_name           unsigned char
                      DW_AT_encoding       DW_ATE_unsigned_char
                      DW_AT_byte_size      0x00000001

For binput

  • binput (1fbf) -> DW_TAG_pointer_type (1e2e) -> gsk_buffer (1e41) -> _gsk_data_buffer of length 8.
  • _gsk_data_buffer has two components
    • length -> gsk_size…
    • data (1faf) -> pointer_type (13c) -> unspecified type void

Processing the data in the file

Parse the data

For an entry like DW_AT_type <0x0000006e>, which refers to a key of <0x0000006e>, the definition could be before or after the current entry being processed.

I found it easiest to process the whole file (in Python), and build up a Python dictionary of each high level definition.

I could then process the dict one element at a time, and know that all the elements it refers to are in the dict.

There are definitions like

typedef struct _x509_tbs_certificate {
    x509_version version;
    gsk_buffer serialNumber;
    x509_algorithm_identifier signature;
    x509_name issuer;
    x509_validity validity;
    x509_name subject;
    x509_public_key_info subjectPublicKeyInfo;
    gsk_bitstring issuerUniqueId;
    gsk_bitstring subjectUniqueId;
    x509_extensions extensions;
    gsk_octet rsvd[16];
} x509_tbs_certificate;

Some elements, like x509_algorithm_identifier, have a complex structure which refers to other structures. I think the maximum depth for one of the structures was 6 levels.
If you are processing a structure you need to decide how many levels deep to process. For the enumerations I was just interested in the level < 1> and < 2> definitions and ignored anything below that depth.
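As a sketch of following those references, assuming the entries have been loaded into a dict keyed by the hexadecimal id (as in the parsing code in this post), a depth-limited resolver might look like this (resolve is a hypothetical helper, not part of any library):

```python
# hypothetical helper: follow DW_AT_type references to a maximum depth
def resolve(entries, key, depth=0, max_depth=6):
    if depth > max_depth or key not in entries:
        return None
    entry = dict(entries[key])        # shallow copy so we can annotate it
    ref = entry.get("DW_AT_type")
    if ref is not None:
        entry["resolved"] = resolve(entries, ref, depth + 1, max_depth)
    return entry

# tiny made-up example: a typedef pointing at an enumeration type
entries = {
    "<0x132d>": {"type": "DW_TAG_typedef", "DW_AT_name": "x509_attribute_type",
                 "DW_AT_type": "<0x1349>"},
    "<0x1349>": {"type": "DW_TAG_enumeration_type", "DW_AT_name": "__3"},
}
r = resolve(entries, "<0x132d>")
print(r["resolved"]["DW_AT_name"])  # __3
```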

For each < 1> element, there may be zero or more < 2> elements. I added each < 2> element to a Python list within the < 1> element.

You may decide to ignore entries such as which file, row or column a definition is in.

My Python code to parse the file is

fn = "./dwarf.txt"
with open(fn) as fp:
    # skip the stuff at the front
    for line in fp:
        if line[0:5] == "LOCAL":
            break
    all = {}  # return the data here
    for line in fp:
        # if the line starts with ".debug" we're done
        if line[0:6] == ".debug":
            break
        lhs = line[0:4]
        # do not process nested requests
        if line[0:1] == "<":
            if lhs in ["< 1>", "< 2>"]:
                keep = True
            else:
                keep = False
        else:
            if line[0:1] != " ":
                continue
            if keep is False:  # only within <1> and <2>
                continue
        # we now have records of interest < 1> and < 2> and records underneath them
        if line[0:4] == "< 1>":
            key = line[4:16].strip()  # "< 123>"
            kwds1 = {}
            all[key] = kwds1
            kwds1["l2"] = []   # empty list to which we add
            state = 1          # <1> element
            kwds1["type"] = line[20:-1]
        elif line[0:4] == "< 2>":
            kwds2 = {}
            kwds1["l2"].append(kwds2)
            kwds2["type"] = line[21:-1]
            state = 2          # <2> element
        else:
            tag = line[0:47].strip()
            value = line[47:-1].strip()
            if state == 1:
                kwds1[tag] = value
            else:
                kwds2[tag] = value

Process the data

I want to look for the enumeration definitions and process only those.

print("=================================")
for e in all:
    d = all[e]
    if d["type"] == "DW_TAG_typedef":   # only these
        print(d["DW_AT_name"], "{")
        at_type = d["DW_AT_type"]       # get the key of the value
        l = all[at_type]["l2"]          # and the <2> elements within it
        for ll in l:
            if "DW_AT_const_value" in ll:
                print(' "' + ll["DW_AT_name"] + '":', ll["DW_AT_const_value"])
        print("}")

This produced code like

x509_attribute_type = {
    "x509_attr_unknown": 0,
    "x509_attr_name": 1,
    "x509_attr_surname": 2,
    "x509_attr_givenName": 3,
    "x509_attr_initials": 4,
    ...
}

You can do more advanced things if you want to, for example create format strings for the Python struct module (struct — Interpret bytes as packed binary data) to map control blocks. With this you can pass in a dict of names and the struct definitions, and it converts the bytes into the specified types (int, char, byte etc.) with the correct big-endian/little-endian processing.
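As a sketch of the idea, using a made-up control block with an eye catcher, a half-word length, and a full-word count:

```python
from struct import pack, unpack, calcsize

# big-endian (z/OS) layout: 4-byte eye catcher, uint16 length, uint32 count
fmt = ">4sHxxI"              # xx pads the half word to a word boundary
raw = pack(fmt, b"EYEC", 12, 3)
print(calcsize(fmt))         # 12
eyec, length, count = unpack(fmt, raw)
print(eyec, length, count)   # b'EYEC' 12 3
```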

Python calling C functions

  1. You can have Python programs which are pure Python.
  2. You can call C programs that act like Python programs, using Python constructs within the C program
  3. You can call a C program from Python, and it processes parameters like a normal C program.

This blog post is about the third, calling a C program from Python, passing simple data types such as char, integers and strings.

I have based a lot of this on the well written pyzfile package by @daveyc.

The glue that makes it work is the ctypes package, a “foreign function library” package.

Before you start

The blog post is called “Python calling C functions”. I tried using a z/OS stub directly. This is not written in C, and I got:

CEE3595S DLL ... does not contain a CELQSTRT CSECT.

Which shows you must supply a C program.

The C program that I wrote, calls z/OS services. These must be defined like (or default to)

#pragma linkage(...,OS64_NOSTACK)     

Getting started

My C program has several functions including

int query() {
    return 0;
}

The compile instructions said exportall – so all functions are visible from outside of the load module.

You access this from Python using code like

lib_file = pathlib.Path(__file__).parent / "pySMFRealTime.so"
self.lib = ctypes.CDLL(str(lib_file))
...
result = self.lib.query()

Where

  • lib_file = pathlib.Path(__file__).parent / “pySMFRealTime.so” says get the file path of the .so module in the same directory as the current Python file.
  • self.lib = ctypes.CDLL(str(lib_file)) load the file and extract information.
  • result = self.lib.query() execute the query function, passing no parameters, and store any return code in the variable result
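The same CDLL pattern works for any shared library. A minimal runnable sketch on a POSIX system, using the C library’s strlen instead of the z/OS module:

```python
import ctypes
import ctypes.util

# load the C runtime instead of a z/OS .so module
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]   # declare the parameter types
libc.strlen.restype = ctypes.c_size_t      # and the return type
result = libc.strlen(b"hello")
print(result)  # 5
```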

Passing simple parameters

A more realistic program, passing parameters in and getting data back in the parameters, is

int conn(const char* resource_name, // input:  what we want to connect to
         char * pOut,               // output: where we return the handle
         int * rc,                  // output: return code
         int * rs,                  // output: reason code
         int * debug)               // input: pass in debug information
{
    int lName = strlen(resource_name);
    if (*debug >= 1)
    {
        printf("===resource_name\n");
        printHex(stdout, resource_name, 20);
    }
    ...
    return 0;
}

The Python code has

lib_file = pathlib.Path(__file__).parent / "pySMFRealTime.so"
self.lib = ctypes.CDLL(str(lib_file))
self.lib.conn.argtypes = [c_char_p,                     # the name of the stream
                          c_char_p,                     # the returned buffer
                          ctypes.POINTER(ctypes.c_int), # rc
                          ctypes.POINTER(ctypes.c_int), # rs
                          ctypes.POINTER(ctypes.c_int), # debug
                          ]
self.lib.conn.restype = c_int

The code to do the connection is

def conn(self, name: str,):
    token = ctypes.create_string_buffer(16)   # 16 byte handle
    rc = ctypes.c_int(0)
    rs = ctypes.c_int(0)
    debug = ctypes.c_int(self.debug)
    self.token = None
    retcode = self.lib.conn(name.encode("cp500"),
                            token,
                            rc,
                            rs,
                            debug)
    if retcode != 0:
        print("returned rc", rc, "reason", rs)
        print(">>>>>>>>>>>>>>>>> connect error ")
        return None
    print("returned rc", rc, "reason", rs)
    self.token = token
    return rc

The code does

  • def conn(self,name: str,): define the conn function and pass in the variable name which is a string
  • token = ctypes.create_string_buffer(16) # 16 byte handle create a 16 byte buffer and wrap it in ctypes stuff.
  • rc = ctypes.c_int(0), rs = ctypes.c_int(0), debug = ctypes.c_int(self.debug) create 3 integer variables.
  • self.token = None preset this
  • retcode = self.lib.conn( invoke the conn function
    • name.encode(“cp500”), convert the name from ASCII (all Python printable strings are in ASCII) to code page 500.
    • token, the 16 byte token defined above
    • rc, rs, debug) the three integer variables
  • if retcode != 0: print out error messages
  • print(“returned rc”,rc, “reason”,rs) print the return and reason code
  • self.token = token save the token for the next operation
  • return rc return to caller, with the return code.
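The buffer and integer wrappers behave like this (a standalone sketch):

```python
import ctypes

token = ctypes.create_string_buffer(16)  # 16 byte writable buffer
print(len(token.raw))                    # 16

rc = ctypes.c_int(0)
rc.value = 8       # as if the called C function wrote through the pointer
print(rc.value)    # 8 - read the updated value back with .value
```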

Once I got my head round the ctypes… it was easy.

The C program

There are some things you need to be aware of.

  • Python is compiled with the -qascii compiler option, so all strings etc are in ASCII. The code name.encode(“cp500”), converts it from ASCII to EBCDIC. The called C program sees the data as a valid EBCDIC string (null terminated).
  • If a character string with printable text is returned, either your C program converts it to ASCII, or your Python calling code needs to convert it.
  • Your C program can be compiled with -qascii – or as EBCDIC(no -qascii)
    • Because Python is compiled in ASCII, the printf routines are configured to print ASCII. If your program is compiled as ASCII, printf(“ABCD”) will print as ABCD. If your program is compiled as EBCDIC printf(“ABCD”) will print garbage – because the hex values for EBCDIC ABCD are not printable as ASCII characters.
    • If your program is compiled as ASCII you can define EBCDIC constants.
      • #pragma convert(“IBM-1047”)
      • char * pEyeCatcher = “EYEC”; // EBCDIC eye catcher for control block
      • #pragma convert(pop)

Python calling C functions – passing structures

I’ve written about how you can pass simple data from Python to a C function; see Python calling C functions.

This article explains how you can pass structures and point to buffers in the Python program. It extends Python calling C functions, and allows you to move logic from the C program to the Python program.

Using complex arguments

The examples in Python calling C functions used simple elements, such as integers or strings.

I have a C structure I need to pass to a C function. The example below passes in an eye catcher, some lengths, and a buffer for the C function to use.

The C structure

typedef struct querycb {
    char     Eyecatcher[4]; /* Eye catcher   offset  0 */
    uint16_t Length;        /* Length of the block  4 */
    char     Rsvd1[1];      /* Reserved             6 */
    uint8_t  Version;       /* Version number       7 */
    char     Flags[2];      /* Flags                8 */
    uint16_t Reserved8;     //                     10
    uint32_t Count;         // number returned     12
    uint32_t lBuffer;       // length of buffer    16
    uint32_t Reservedx;     //                     20
    void    *pBuffer;       //                     24
} querycb;

The Python code

import ctypes
from struct import pack

# create the variables
eyec = "EYEC".encode("cp500")  # char[4] eye catcher
l = 32                         # uint16_t
res1 = 0                       # char[1]
version = 1                    # uint8_t - same as a char
flags = 0                      # char[2]
res2 = 0                       # uint16_t
count = 0                      # uint32_t
lBuffer = 4000                 # uint32_t
res3 = 0                       # uint32_t
# pBuffer                      # void *
# allocate a buffer for the C program to use and put some data
# into it
pBuffer = ctypes.create_string_buffer(b'abcdefg', size=lBuffer)
# cast the pBuffer so it is a void *
pB = ctypes.cast(pBuffer, ctypes.c_void_p)
# use the struct.pack function. See "@4shbbhhiiiP" below
# @   use native byte order and alignment
# 4s  the 4 byte eye catcher
# h   half word - the length
# bb  two one-byte fields: res1 and version
# hh  two half words: flags and res2
# iii three integer fields: count, lBuffer and res3
# P   void * pointer
# Note pB is a ctype; we need the value of it, so pB.value
p = pack("@4shbbhhiiiP", eyec, l, res1, version, flags,
         res2, count, lBuffer, res3, pB.value)

# create first parm
p1 = ctypes.c_int(3)           # pass in the integer 3 as an example
# create second parm
p2 = ctypes.cast(p, ctypes.c_void_p)

# invoke the function
retcode = lib.conn(p1, p2)

The C program

int conn(int * p1, char * p2)
// int conn(int max,...)
{
    typedef struct querycb {
        char     Eyecatcher[4]; /* Eye catcher         0 */
        uint16_t Length;        /* Length of the block 4 */
        char     Rsvd1[1];      /* Reserved            6 */
        uint8_t  Version;       /* Version number      7 */
        char     Flags[2];      /* Flags               8 */
        uint16_t Reserved8;     //                    10
        uint32_t Count;         // number returned    12
        uint32_t lBuffer;       // length of buffer   16
        uint32_t Reservedx;     //                    20
        void    *pBuffer;       //                    24
    } querycb;

    querycb * pcb = (querycb *) p2;

    printf("P1 %i\n", *p1);
    printHex(stdout, p2, 32);
    printf("Now the structure\n");
    printHex(stdout, pcb->pBuffer, 32);
    return 0;
}

The output

P1 3
00000000 : D8D9D7C2 00200001 00000000 00000000 ..... .......... EYEC............
00000010 : 00000FA0 00000000 00000050 0901BCB0 ...........P.... ...........&....
Now the structure
00000000 : 61626364 65666700 00000000 00000000 abcdefg......... /...............
00000010 : 00000000 00000000 00000000 00000000 ................ ................

Where

  • EYEC is the passed in eye catcher
  • 00000FA0 is the length of 4000
  • 00000050 0901BCB0 is the 64-bit address of the buffer
  • abcdefg is the data used to initialise the buffer

Observations

It took me a couple of hours to get this to work. I found it hard to get the cast and the ctypes… functions to work successfully. There may be a better way of coding it; if so, please tell me. The code works, which is the objective – but there may be better, more correct, ways of doing it.

Benefits

By using this technique I was able to move the code which sets up the structure needed by the z/OS service out of my C program and into Python. My C program just parses the input parameters, sets up the linkage for the z/OS service, and invokes the service.

Of course I did not have the constants available from the C header file for the service, but that’s a different problem.

Python safely iterating

I was using a Python program to access a z/OS service, and found there were times when my code did not clean up and close the resource.

It took me an afternoon to find out how to do it. I found pyzfile by daveyc an excellent example of how to cover advanced Python topics.

pyzfile example

The documentation has

from pyzfile import *
try:
    with ZFile("//'USERID.CNTL(JCL)'", "rb,type=record", encoding='cp1047') as file:
        for rec in file:
            print(rec)
except ZFileError as e:
    print(e)

Breaking this down

Understanding the “with”

try:
    with ZFile("//'USERID.CNTL(JCL)'", "rb,type=record", encoding='cp1047') as file:
        ...
        do something with file
        ...
except ZFileError as e:
    print(e)

When the with ZFile(…) as file: is executed the code conceptually does

  • standard set up processing
  • open the file and return the handle
  • do processing using the file handle
  • whenever it leaves the with code section, perform the close activity

Note: This could have been done with

try:
    open the file
    ...
    do something
    ...
except:
    ...
finally:                 # do this every time
    if the file was opened:
        close the file

but this is not quite as tidy and compact as the with syntax.

In more detail…

  • The def __init__(self,..): method is invoked and passed the parameters. It saves parameters using statements like self.p1
  • The __enter__(self): is invoked passing the instance data(self). It seems to have no other parameters.
    • In the pyzfile, the code issues return self._open(). This invokes the function _open to open the data set.
  • When the with processing completes, it invokes the function __exit__(self, exc_type, exc_value, exc_traceback): This is invoked whether the code returned normally, or got an exception.
    • In the pyzfile, the code executes self.close(). So however the “with” processing ends, the close is always executed
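A minimal context manager showing the same shape (a sketch of the pattern, not the pyzfile code):

```python
class Resource:
    def __init__(self, name):
        self.name = name      # save the parameters
        self.closed = False

    def __enter__(self):
        # the open processing would go here
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # runs however the "with" block ends - normally or via an exception
        self.closed = True
        return False          # do not swallow any exception

with Resource("//'USERID.CNTL(JCL)'") as r:
    pass
print(r.closed)  # True
```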

Handling errors

I’ve seen that, when using the “with” clause, people tend to throw exceptions when problems are found.

For example, within the pyzfile code there is

class ZFileError(Exception):
    """ ZFile exception """
    def __init__(self, message: str, amrc: dict = None):
        self.message = message
        self.amrc = amrc
        if amrc is None:
            self.amrc = {}
        super().__init__(self.message)

    def __str__(self) -> str:
        return self.message

    def amrc(self):
        """
        Returns the amrc dict at the time of error.

        :return: The ``__amrc`` structure at the time of error.
        """
        return self.amrc

class ZFile:
    ...
    def _open(self):
        ...
        self.handle = open...
        if not self.handle:
            raise ZFileError(f"Error opening file '{self.filename}': "
                             f"{self.lib.zfile_strerror().decode('utf-8')}")
        return self

Understanding the “for”

The code above had

with ZFile("//'USERID.CNTL(JCL)'", "rb,type=record", encoding='cp1047') as file:
    for rec in file:
        print(rec)

How does the “for” work?

The key to this code is the pair of functions

##################################################
# Iterators
##################################################
def __iter__(self):
    return self

def __next__(self):
    ret = self.read()
    if not ret:
        raise StopIteration
    return ret

When the for statement is processed, it first calls __iter__ to obtain an iterator, then repeatedly calls the __next__ function. This does the work of getting the next record and returning it.

There is a lot of confusing documentation about iterators, iteration and iterables. Let’s see if my description helps clarify or just adds more confusion.

Something is iterable if you can do iteration on it, where iteration means taking each element in turn.

In Python a list is iterable:

for l in [0, 1, 2, 3]:
    print(l)

will iterate over the list and print each element in turn

0
1
2
3

Records in a file are a bit more abstract: you cannot see the whole file, but you can get the next record, and move through the file until there are no more records.

An iterator is the mechanism by which you iterate. Think of it as a function. The Python documentation is pretty clear.

Most people define

def __iter__(self):
    return self

For most people, just specify this. The PhD class may use something different.

The mechanism of “for” uses the __next__ function

def __next__(self):
    ret = self.read()
    if not ret:
        raise StopIteration
    return ret

This obtains the next element of data. If there are no more elements, it raises the StopIteration exception.

If you do not handle the StopIteration exception, then Python handles it for you and leaves the processing loop.
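Putting the pieces together, a minimal self-contained iterator with the same shape as the pyzfile code (RecordReader is an illustrative class, not part of pyzfile):

```python
class RecordReader:
    """Yields records one at a time, like reading a file."""
    def __init__(self, records):
        self._records = list(records)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._records):
            raise StopIteration      # tells "for" to leave the loop
        rec = self._records[self._pos]
        self._pos += 1
        return rec

print(list(RecordReader(["rec1", "rec2"])))  # ['rec1', 'rec2']
```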

Conclusion

With both of these techniques “with” and “for” I could extract records from a z/OS resource.

I’ve used “with” and “for” together with yield to hide implementation detail:

# create the function to read the file
def readfile(name):
    try:
        with ZFile(name, "rb,type=record,noseek") as file:
            for rec in file:
                yield rec
    except ZFileError as e:
        print(e)

# process the file using for ... readfile()
def reader(...):
    for line in readfile("//'IBMUSER.RMF'"):
        do something with the data

The Python interface to RACF is great.

The Python package pysear to work with RACF is great. The source is on github, and the documentation starts here. It is well documented, and there are good examples.

I’ve managed to do a lot of processing with very little of my own code.

One project I’ve been meaning to do for some time is to extract the contents of a RACF database, compare it with a different database, and show the differences. IBM provides a batch program and a very large Rexx exec. This has some bugs and is not very nice to use. There is a Rexx interface, which worked, but I found I was writing a lot of code. Then I found the pysear code.

Background

The data returned for userids (and other types of data) have segments.
You can display the base segment for a user.

tso lu colin

To display the base segment plus the tso segment

tso lu colin tso

Field names returned by pysear have the segment name as a prefix, for example base:max_incorrect_password_attempts.

My first query

What are the active classes in RACF?

See the example.

from sear import sear
import json
import sys
result = sear(
    {
        "operation": "extract",
        "admin_type": "racf-options"
    },
)
json_data = json.dumps(result.result   , indent=2)
print(json_data)

For error handling, see the error handling documentation.

This produces output like

{
  "profile": {
    "base": {
      "base:active_classes": [
        "DATASET",
        "USER", ...
      ],
      "base:add_creator_to_access_list": true,
      ...
      "base:max_incorrect_password_attempts": 3,
      ...
    }
  }
}

To process the active classes one at a time you need code like

for ac in result.result["profile"]["base"]["base:active_classes"]:
    print("Active class:",ac)

The returned attributes are called traits. See here for the traits for RACF options. The traits show

Trait                  base:max_incorrect_password_attempts
RACF Key               revoke
Data Types             String
Operators Allowed      "set", "delete"
Supported Operations   "alter", "extract"

For this attribute, because it is a single valued object, you can set it or delete it.

You can use this attribute for example

result = sear(
    {
        "operation": "alter",
        "admin_type": "racf-options",
        "traits": {
            "base:max_incorrect_password_attempts": 5,
        },
    },
)

The trait “base:active_classes” is a list of classes [“DATASET”, “USER”,…]

The trait is

Trait                  base:active_classes
RACF Key               classact
Data Types             string
Operators Allowed      "add", "remove"
Supported Operations   "alter", "extract"

Because it is a list, you can add or remove an element; you do not use set or delete, which would replace the whole list.
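
As a plain-Python analogy (this is not the sear API itself, just an illustration of the semantics): set replaces a scalar value outright, while add and remove edit individual elements of a list.

```python
# Scalar trait: "set" replaces the whole value.
options = {"base:max_incorrect_password_attempts": 3}
options["base:max_incorrect_password_attempts"] = 5   # set

# List trait: "add"/"remove" change individual elements,
# leaving the rest of the list intact.
active = ["DATASET", "USER"]
active.append("OPERCMDS")   # add
active.remove("DATASET")    # remove
print(active)               # ['USER', 'OPERCMDS']
```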

Some traits, such as use counts, have Operators Allowed of N/A. You can only extract and display the information.

My second query

What are the userids in RACF?

The traits are listed here, and code examples are here.

I used

from sear import sear
import json

# get all userids beginning with ZWE
users = sear(
    {
        "operation": "search",
        "admin_type": "user",
        "userid_filter": "ZWE",
    },
)
profiles = users.result["profiles"]
# Now process each profile in turn.
# Because these are userid profiles we need admin_type=user and userid=...
for profile in profiles:
    print("===PROFILE===", profile)
    user = sear(
        {
            "operation": "extract",
            "admin_type": "user",
            "userid": profile,
        },
    )
    segments = user.result["profile"]
    for segment in segments:   # eg base or omvs
        for w1, v1 in segments[segment].items():
            json_data = json.dumps(v1, indent=2)
            print(w1, json_data)

This gave

===PROFILE=== ZWESIUSR
base:auditor false
base:automatic_dataset_protection false
base:create_date "05/06/20"
base:default_group "ZWEADMIN"
base:group_connections [
  {
    ...
    "base:group_connection_group": "IZUADMIN",
    ...
    "base:group_connection_owner": "IBMUSER",
    ...
},
{
    ...
    "base:group_connection_group": "IZUUSER",
   ...
}
...
omvs:default_shell "/bin/sh"
omvs:home_directory "/apps/zowe/v10/home/zwesiusr"
omvs:uid 990017
===PROFILE=== ZWESVUSR
...

Notes on using search and extract

If you use "operation": "search" you need a ..._filter, such as "userid_filter". If you use extract you use the data type directly, such as "userid": ...
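
A hypothetical helper (build_user_request is my own name, not part of pysear) makes the difference concrete:

```python
def build_user_request(operation, userid):
    """Build a sear request dict for the user admin type.

    Hypothetical convenience wrapper: "search" takes "userid_filter",
    while "extract" takes "userid" directly.
    """
    req = {"operation": operation, "admin_type": "user"}
    if operation == "search":
        req["userid_filter"] = userid
    else:
        req["userid"] = userid
    return req

print(build_user_request("search", "ZWE"))
# {'operation': 'search', 'admin_type': 'user', 'userid_filter': 'ZWE'}
print(build_user_request("extract", "COLIN"))
# {'operation': 'extract', 'admin_type': 'user', 'userid': 'COLIN'}
```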

Processing resources

You can process RACF resources. For example, the OPERCMDS class provides profiles for MVS.DISPLAY commands.

The sear command needs a "class": ... value, for example

result = sear(
    {
        "operation": "search",
        "admin_type": "resource",
        "class": "OPERCMDS",
        "resource_filter": "MVS.**",
    },
)
result = sear(
    {
        "operation": "extract",
        "admin_type": "resource",
        "resource": "MVS.DISPLAY",
        "class": "Opercmds",
    },
)

The value of the class is converted to upper case.

Changing a profile

To change a profile, for example to do the equivalent of the PERMIT command:

from sear import sear
import json

result = sear(
    {   "operation": "alter",
        "admin_type": "permission",
        "resource": "MVS.DISPLAY.*",
        "userid": "ADCDG",
        "traits": {
          "base:access": "CONTROL"
        },
        "class": "OPERCMDS"

    },
)
json_data = json.dumps(result.result   , indent=2)
print(json_data)

The output was

{
  "commands": [
    {
      "command": "PERMIT MVS.DISPLAY.* CLASS(OPERCMDS)ACCESS (CONTROL) ID(ADCDG)",
      "messages": [
        "ICH06011I RACLISTED PROFILES FOR OPERCMDS WILL NOT REFLECT THE UPDATE(S) UNTIL A SETROPTS REFRESH IS ISSUED"
      ]
    }
  ],
  "return_codes": {
    "racf_reason_code": 0,
    "racf_return_code": 0,
    "saf_return_code": 0,
    "sear_return_code": 0
  }
}

Error handling

Return codes and error messages

There are two layers of error handling.

  • Invalid requests – problems detected by pysear
  • Non-zero return codes from the underlying RACF code.

If pysear detects a problem it returns it in

result.result.get("errors") 

For example, you specified an invalid parameter such as "userzzz": "MINE".

If you do not have this field, then the request was passed to the RACF service, which returns multiple values. See IRRSMO00 return and reason codes. There will be values for

  • SAF return code
  • RACF return code
  • RACF reason code
  • sear return code.

If the RACF return code is zero then the request was successful.

To make error handling easier – and have one error handler for all requests – I used


try:
    result = try_sear(search)
except Exception as ex:
    print("Exception:", ex)
    quit()

Where try_sear was

def try_sear(data):
    # execute the request
    result = sear(data)
    if result.result.get("errors") is not None:
        print("Request:", result.request)
        print("Error with request:", result.result["errors"])
        raise ValueError("errors")
    rcs = result.result["return_codes"]
    if rcs["racf_return_code"] != 0:
        print("SAF Return code", rcs["saf_return_code"],
              "RACF Return code", rcs["racf_return_code"],
              "RACF Reason code", rcs["racf_reason_code"],
              )
        raise ValueError("return codes")
    return result

Overall

This interface is very easy to use.
I use it to extract definitions from one RACF database, save them as JSON files. Repeat with a different (historical) RACF database, then compare the two JSON files to see the differences.

Note: the sear command only works with the active database, so I had to make the historical database active, run the commands, and switch back to the current database.
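
A minimal sketch of the comparison step, assuming the extracted profiles have been loaded back into dicts (diff_profiles is my own illustrative helper, not part of pysear):

```python
import json

def diff_profiles(old, new):
    """Report keys that were added, removed, or changed between two dicts."""
    changes = {}
    for key in old.keys() | new.keys():
        if key not in new:
            changes[key] = ("removed", old[key])
        elif key not in old:
            changes[key] = ("added", new[key])
        elif old[key] != new[key]:
            changes[key] = ("changed", old[key], new[key])
    return changes

# e.g. two saved JSON extracts of the same userid's base segment
old = json.loads('{"base:auditor": false, "omvs:uid": 990017}')
new = json.loads('{"base:auditor": true, "omvs:uid": 990017}')
print(diff_profiles(old, new))
# {'base:auditor': ('changed', False, True)}
```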

Processing a dataset is easy in Python.

I’ve been doing more with Python on z/OS, and have spent time using datasets. With the pyzfile package this is very easy! (Before this you had to copy a data set to a file in Unix Services).

You can do all the things you normally expect to do: open, close, read, write, locate, info etc.

from pyzfile import *
import sys

def readfile():
    try:
        with ZFile("//'COLIN.DCOLLECT.OUT'", "rb,type=record,noseek") as file:
            for kw, value in file.info().items():
                print(kw, ":", value)
            for rec in file:
                yield rec
    except ZFileError as e:
        print(e, file=sys.stderr)

def reader(nrecords):
    nth = 0
    for line in readfile():
        nth += 1
        if nth > nrecords:
            break
        if nth % 100 == 99:
            print("Progress:", nth + 1, file=sys.stderr)
        ## Do something..

It gave

access_method : QSAM
blksize : 27998
device : DISK
dsname : COLIN.DCOLLECT.OUT
dsorgConcat : False
dsorgHFS : False
dsorgHiper : False
dsorgMem : False
dsorgPDE : False
dsorgPDSdir : False
dsorgPDSmem : False
dsorgPO : False
dsorgPS : True
dsorgTemp : False
dsorgVSAM : False
maxreclen : 1020
mode : {'append': False, 'read': True, 'update': False, 'write': False}
noseek_to_seek : NOSWITCH
openmode : RECORD
recfmASA : False
recfmB : True
recfmBlk : True
recfmF : False
recfmM : False
recfmS : False
recfmU : False
recfmV : True
vsamRKP : 0
vsamRLS : NORLS
vsamkeylen : 0
vsamtype : NOTVSAM
Progress: 100
...

Where the fields are taken from fldata().

Great stuff!

Of course, once you’ve got a record, doing something with it may be harder because, for example, Python strings are not EBCDIC, and you’ll need to convert any character strings from EBCDIC to ASCII.
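
Python's standard codecs handle this conversion; for example, code page 037 is a common EBCDIC code page:

```python
# "Hello" encoded in EBCDIC code page 037
ebcdic = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])
print(ebcdic.decode("cp037"))             # Hello
# and back the other way
print("Hello".encode("cp037") == ebcdic)  # True
```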

Getting table data out of html – successfully

A couple of times I’ve wanted to get information from documentation into my program, for example from a documentation page containing a table of traits.

I want to extract

  • "cics:operator_class" : "set", "add", "remove", "delete"
  • "cics:operator_classes": N/A

then extract those which have N/A (or those with "set" etc.).

Background

From the picture you can see the HTML table is not simple, it has a coloured background, some text is in one font, and other text is in a different font.

The table is not like

<table>
<tr><td>"cics:operator_classes"</td>...<td>N/A</td></tr>
</table>

which would be relatively easy to parse.

It will be more like one long string containing

<td headers="ubase__tablebasesegment__entry__39 ubase__tablebasesegment__entry__2 " 
class="tdleft">
&nbsp;
</td>
<td headers="ubase__tablebasesegment__entry__39 ubase__tablebasesegment__entry__3 "
class="tdleft">
'Y'
</td>

where &nbsp; is a non-breaking space.
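
The standard html module shows what that entity decodes to:

```python
import html

# "&nbsp;" decodes to the non-breaking space character U+00A0,
# which is easy to mistake for an ordinary blank.
cell = html.unescape("&nbsp;")
print(repr(cell))        # '\xa0'
print(cell == "\u00a0")  # True
```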

Getting the source

Some browsers allow you to save the source of a page, and some do not.
I used Chrome to display and save the page.

You can use Python facilities to capture a web page.

My first attempt with Python

For me, the obvious approach was to use Python to process it. Unfortunately it complained about some of the HTML, so I spent some time using Linux utilities to remove the HTML causing problems. This got more and more complex, so I gave up. See Getting table data out of html – unsuccessfully.

Using Python again

I found Python has different parsers for HTML (and XML), and there was a better one than the one I had been using. The BeautifulSoup parser handled the complex HTML with no problems.

My entire program was (it is very short!)

from bs4 import BeautifulSoup

# read the data from the file
file = "/home/colin/Downloads/Dataset SEAR.html"
with open(file, "r") as myfile:
    data = myfile.read()

soup = BeautifulSoup(data, 'html.parser')
tables = soup.find_all(['table'])
for table in tables:
    tr = table.find_all("tr")
    for t in tr:
        line = list(t)
        if len(line) == 11:
            print(line[1].get_text().strip(), line[7].get_text().strip())
        else:
            print("len:", len(line), line)

This does the following

  • file = … with open … data = … reads the data from a file. You could always use a URL and read directly from the internet.
  • tables = soup.find_all(['table']) extracts the data within the specified tags, that is, all the data between <table…>…</table> tags.
  • for table in tables: processes each table in turn (it is lucky we do not have nested tables).
  • tr = table.find_all("tr") extracts all the rows within the current table.
  • for t in tr: processes each row.
  • line = list(t) returns all of the fields as a list.

the variable line has fields like

' ', 
<td><code class="language-plaintext highlighter-rouge">"tme:roles"</code></td>,
' ',
<td><code class="language-plaintext highlighter-rouge">roles</code></td>,
' ',
<td><code class="language-plaintext highlighter-rouge">string</code></td>,
' ',
<td>N/A</td>,
' ',
<td><code class="language-plaintext highlighter-rouge">"extract"</code></td>,
' '
  • print(line[1].get_text().strip(), …) takes the second element, extracts its value ignoring any tags ("tme:roles"), removes any leading or trailing blanks, and prints it.
  • print(…, line[7].get_text().strip()) takes the eighth element, extracts its value (N/A), removes any leading or trailing blanks, and prints it.

This produced a list like

  • "base:global_auditing" N/A
  • "base:security_label" "set" "delete"
  • "base:security_level" "set" "delete"

I was only interested in those with N/A, so I used

python3 ccpsear.py | grep N/A | sed 's.N/A.,.g' > mylist.py

which selected the lines with N/A, changed N/A to a comma, and created a file mylist.py.

Note: some tables use a non-breaking space to represent an empty cell. These sometimes caused problems, so I had code to handle them.

nonBreakSpace = u'\xa0'
for value in fields:
    if value == " ":
        continue
    if value == nonBreakSpace:
        continue

Which certificates does Python install (PIP) use on z/OS?

Using the Python pip install … command I was getting error messages

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1019)
...
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1019)

On Discord, someone said the ca-certificates package seems to be missing on z/OS. I give a possible solution below.

For how I fixed it, see Upload the certificate from Linux below.

I used Wireshark to monitor the web sites being used; z/OS could not validate the certificate sent to it. The certificate had a signing chain of three Certificate Authorities.
I tried capturing the certificates using openssl s_client …, but they didn’t work.

There is a pip option --trusted-host github.com --trusted-host 20.26.156.215 which says ignore certificate validation for the specified sites. This did not work.

The pip command worked on Linux, so it was clearly a problem with certificates on z/OS.

I had zopen installed on my z/OS, and could issue the command

openssl s_client -connect github.com:443

This gave

subject=CN=github.com                                                                                                        
issuer=C=GB, ST=Greater Manchester, L=Salford, O=..., CN=...Secure Server CA          
---                                                                                                                          
No client certificate CA names sent                                                                                          
Peer signing digest: SHA256                                                                                                  
Peer signature type: ecdsa_secp256r1_sha256                                                                                  
Peer Temp Key: X25519, 253 bits                                                                                              
---                                                                                                                          
SSL handshake has read 3480 bytes and written 1605 bytes                                                                     
Verification error: unable to get local issuer certificate                                                                    

This was much quicker than trying to wait for Pip to process the request.

Where does Python expect the certificates to be?

I executed a small Python program to display the paths used

COLIN:/u/colin: >python
Python 3.12.3 ....on zos
Type "help", "copyright", "credits" or "license" for more information.

>>> import _ssl
>>> print(_ssl.get_default_verify_paths())
>>> quit()

This gave

('SSL_CERT_FILE', '/usr/local/ssl/cert.pem', 'SSL_CERT_DIR', '/usr/local/ssl/certs')

This was unexpected because I have openssl certificates in /usr/ssl/certs.
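
The higher-level ssl module (which wraps _ssl) exposes the same information, including the environment variable names that can override the defaults:

```python
import ssl

paths = ssl.get_default_verify_paths()
# cafile/capath are what this Python build actually uses; setting the
# SSL_CERT_FILE / SSL_CERT_DIR environment variables overrides them.
print("cafile :", paths.openssl_cafile)
print("capath :", paths.openssl_capath)
print("env    :", paths.openssl_cafile_env, paths.openssl_capath_env)
```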

Upload the certificate from Linux

The Linux command

sudo apt reinstall ca-certificates

downloads the latest ca certificates into /etc/ssl/certs/ca-certificates.crt

I uploaded this to z/OS into /usr/local/ssl/cert.pem, for the Python code.

echo "aa" | openssl s_client -connect github.com:443 -verifyCAfile /etc/ssl/certs/ca-certificates.crt

worked. The certificate was verified.

I also uploaded it to /etc/ssl/certs/ca-certificates.crt for Python.

The openssl documentation

The openssl documentation discusses the location of the certificate store. The OPENSSLDIR value (displayed by openssl version -d) locates where the certificates are stored, and the documentation describes how to download trusted certificates into a single file. Specifying OPENSSLDIR did not help.

Zowe: Getting data from Zowe

As part of an effort to trace the https traffic from Zowe, I found there are trace points you can enable.

You can get a list of these from a request like "https://10.1.1.2:7558/application/loggers". In the browser it returns one long string like (my formatting)

{"levels":["OFF","ERROR","WARN","INFO","DEBUG","TRACE"],
"loggers":{"ROOT":{"configuredLevel":"INFO","effectiveLevel":"INFO"},
"_org":{"configuredLevel":null,"effectiveLevel":"INFO"},
"_org.springframework":{"configuredLevel":null,"effectiveLevel":"INFO"},
"_org.springframework.web":{"configuredLevel":null,"effectiveLevel":"INFO"},
...

Once you know the trace point, you can change it. See here.

Using https module

certs="--cert colinpaice.pem --cert-key colinpaice.key.pem"
verify="--verify no"
url="https://10.1.1.2:7558/application/loggers"
https GET ${url} $certs $verify

This displayed the data, nicely formatted. But if you pipe it, the next stage receives one long character string.

Using Python

#!/usr/bin/env python3

import ssl
import json
import sys
from http.client import HTTPConnection 
import requests
import urllib3
# trace the traffic flow
HTTPConnection.debuglevel = 1

my_header = {  'Accept' : 'application/json' }

urllib3.disable_warnings()
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

certificate="colinpaice.pem"
key="colinpaice.key.pem"
cpcert=(certificate,key)
jar = requests.cookies.RequestsCookieJar()

s = requests.Session()
geturl="https://10.1.1.2:7558/application/loggers"

res = s.get(geturl,headers=my_header,cookies=jar,cert=cpcert,verify=False)

if res.status_code != 200:
    print("error code",res.status_code)
    sys.exit(8)

headers = res.headers

for h in headers:
    print(h,headers[h])

cookies = res.cookies.get_dict()
for c in cookies:
    print("cookie",c,cookies[c])

js = json.loads(res.text)
print("type",js.keys())
print(js['levels'])
print(js['groups'])
loggers = js['loggers']
for ll in loggers:
    print(ll,loggers[ll])

This prints out one line per item.
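
The same kind of filtering can also be done inside Python rather than with grep; a sketch using an abbreviated copy of the JSON shown earlier:

```python
import json

# abbreviated sample of the /application/loggers response
text = '''{"levels":["OFF","ERROR","WARN","INFO","DEBUG","TRACE"],
"loggers":{"ROOT":{"configuredLevel":"INFO","effectiveLevel":"INFO"},
"org.apache.http":{"configuredLevel":"DEBUG","effectiveLevel":"DEBUG"}}}'''
js = json.loads(text)

# pick out only the loggers whose name mentions http
for name, levels in js["loggers"].items():
    if "http" in name:
        print(name, levels)
```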

The command

python3  zloggers.py |grep HTTP

gives

...
org.apache.http {'configuredLevel': 'DEBUG', 'effectiveLevel': 'DEBUG'}
org.apache.http.conn {'configuredLevel': None, 'effectiveLevel': 'DEBUG'}
org.apache.http.conn.ssl {'configuredLevel': None, 'effectiveLevel': 'DEBUG'}
...