Getting from C enumerations to Python dicts.

I wanted to create Python enumerations from C code. For example, System SSL has a data type x509_attribute_type.

This has a definition

typedef enum {                                                                    
    x509_attr_unknown                   = 0,                                      
    x509_attr_name                      = 1,   /* 2.5.4.41                   */   
    x509_attr_surname                   = 2,   /* 2.5.4.4                    */   
    x509_attr_givenName                 = 3,   /* 2.5.4.42                   */   
    x509_attr_initials                  = 4,   /* 2.5.4.43                   */   
    ... 
} x509_attribute_type;

I wanted to create

			
x509_attribute_type = {
  "x509_attr_unknown"   : 0,  
  "x509_attr_name"      : 1, 
  "x509_attr_surname"   : 2,   
  "x509_attr_givenName" : 3,   
  "x509_attr_initials"  : 4,
  ...  
}

		

I’ve done this using ISPF macros, but thought it would be easier(!) to automate it.

There is a standard way for compilers to produce information that lets debuggers understand the structure of programs. DWARF is a debugging information format used by many compilers and debuggers to support source level debugging. The data is stored using the Executable and Linkable Format (ELF).

Getting the DWARF file

It looks like a structure is only written to the DWARF file if the program actually uses it. If you just #include a header file, by default its definitions are not stored in the DWARF file.

When I used

			
#include <gskcms.h> 
...
x509_attribute_type aaa;
x509_name_type nt;
x509_string_type st; 
x509_ecurve_type et; 
int l = sizeof(aaa) + sizeof(nt) + sizeof(st) + sizeof(et);

		

I got the structures x509_attribute_type etc in the DWARF file.

Compiling the file

I used the USS xlc command with

xlc ...-Wc,debug(FORMAT(DWARF),level(9))... abc.c

or
                                                                   
/bin/xlclang  -v -qlist=d.lst -qsource -qdebug=format=dwarf -g -c abc.c -o abc.o

This created a file abc.dbg

The .dbg file includes an eye catcher of ELF (in ASCII).
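A quick way to confirm that the download really was binary is to check for the eye catcher once the file is on Linux. This is a minimal sketch: the function name is_elf is mine, and it relies only on the standard ELF magic number (byte 0x7f followed by "ELF" in ASCII) at the start of the file.

```python
ELF_MAGIC = b"\x7fELF"  # 0x7f followed by "ELF" in ASCII


def is_elf(path):
    """Return True if the file starts with the 4-byte ELF magic number."""
    with open(path, "rb") as fp:
        return fp.read(4) == ELF_MAGIC
```

For example, is_elf("abc.dbg") should return True. If it returns False, the file may have been transferred in text mode and converted on the way.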

I downloaded the file in binary to Linux.

There are various packages which are meant to be able to process the file. The only one I got to work successfully was dwarfdump. The Linux version has many options to specify what data you want to select and how you want to report it. dwarfdump reported some errors, but I got most of the information out.

readelf displays some of the information in the file, but I could not get it to display the information about the variables.

What does the output from dwarfdump look like?

The format has changed slightly since I first used this a year or so ago. The data are not always aligned on the same columns, and values like <146> and 2426 (used as a locator id) are now hexadecimal offsets.

The older format

<1>< 2426>      DW_TAG_subprogram
                DW_AT_type                  <146>
                DW_AT_name                  printCert
                DW_AT_external              yes
...
<2>< 2456>      DW_TAG_formal_parameter
                DW_AT_name                  id
                DW_AT_type                  <10387>
...

The newer format

< 1><0x0000132d>    DW_TAG_typedef
                      DW_AT_type                  <0x00001349>
                      DW_AT_name                  x509_attribute_type
                      DW_AT_decl_file             0x00000002
                      DW_AT_decl_line             0x000000a7
                      DW_AT_decl_column           0x00000003
< 1><0x00001349>    DW_TAG_enumeration_type
                      DW_AT_name                  __3
                      DW_AT_byte_size             0x00000002
                      DW_AT_decl_file             0x00000002
                      DW_AT_decl_line             0x00000091
                      DW_AT_decl_column           0x0000000e
                      DW_AT_sibling               <0x00001553>
< 2><0x00001356>      DW_TAG_enumerator
                        DW_AT_name                  x509_attr_unknown
                        DW_AT_const_value           0
< 2><0x0000136a>      DW_TAG_enumerator
                        DW_AT_name                  x509_attr_name
                        DW_AT_const_value           1

Some of the fields are obvious… others are more cryptic, with varying levels of indirection.

< 1><0x0000132d> is a high level object < 1> with id <0x0000132d>
- DW_TAG_typedef is a typedef
- DW_AT_type <0x00001349> see <0x00001349> for the definition (below)
- DW_AT_name x509_attribute_type is the name of the typedef
- DW_AT_decl_file 0x00000002 there is a file definition #2… but I could not find it
- DW_AT_decl_line 0x000000a7 the line within the file
- DW_AT_decl_column 0x00000003 the column within the line
< 1><0x00001349> DW_TAG_enumeration_type. This is referred to by the previous element
- DW_AT_name __3 this is an internally generated name
< 2><0x00001356> DW_TAG_enumerator This is part of the < 1><0x00001349> above. It is an enumerator.
- DW_AT_name x509_attr_unknown this is the label of the value
- DW_AT_const_value 0 with value 0
The next < 2> entry is the label x509_attr_name with value 1
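Because the header lines differ between the older and newer formats, a regular expression copes with both better than fixed columns do. The sketch below is my own illustration, based only on the two samples shown above: the names HEADER and parse_header are hypothetical, and other dwarfdump versions may lay the line out differently.

```python
import re

# Matches "<1>< 2426>      DW_TAG_subprogram" (older format)
# and "< 1><0x0000132d>    DW_TAG_typedef" (newer format).
HEADER = re.compile(r"^<\s*(\d+)><\s*(0x[0-9a-fA-F]+|\d+)>\s+(DW_TAG_\w+)")


def parse_header(line):
    """Return (level, offset, tag) for a header line, or None for any other line."""
    m = HEADER.match(line)
    if m is None:
        return None
    level, offset, tag = m.groups()
    # The newer format uses hexadecimal offsets, the older one decimal.
    base = 16 if offset.lower().startswith("0x") else 10
    return int(level), int(offset, base), tag


print(parse_header("<1>< 2426>      DW_TAG_subprogram"))
print(parse_header("< 2><0x00001356>      DW_TAG_enumerator"))
```

Attribute lines such as DW_AT_name do not match, so they return None and can be handled separately.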

Other interesting data

I have a function

			
int colin(char * cinput, gsk_buffer * binput )
{
  ...
}
and
typedef struct _gsk_data_buffer { 
    gsk_size            length; 
    void *              data; 
} gsk_data_buffer, gsk_buffer; 

		

Breaking this down into its parts, there is an entry in the DWARF output for “int”, “colin”, “*”, “char”, “cinput”, “*”, gsk_buffer (which has levels within it), “binput”

< 1><0x0000009a>    DW_TAG_base_type
                      DW_AT_name                  int
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_byte_size             0x00000004

< 1><0x00000142>    DW_TAG_subprogram
                      DW_AT_type                  <0x0000009a>
                      DW_AT_name                  colin
                      DW_AT_external              yes(1)
                      ...
< 2><0x0000015c>      DW_TAG_formal_parameter
                        DW_AT_name                  cinput
                        DW_AT_type                  <0x00001fa9>

< 2><0x00000173>      DW_TAG_formal_parameter
                        DW_AT_name                  binput
                        DW_AT_type                  <0x00001fb5>
:

< 2><0x0000018a>      DW_TAG_variable
                        DW_AT_name                  __func__
                        DW_AT_type                  <0x00001fc0>

for cinput

< 1><0x00001fa9>    DW_TAG_pointer_type
                      DW_AT_type                  <0x0000006e>
                      DW_AT_address_class         0x0000000a
< 1><0x0000006e>    DW_TAG_base_type
                      DW_AT_name                  unsigned char
                      DW_AT_encoding              DW_ATE_unsigned_char
                      DW_AT_byte_size             0x00000001

For binput

binput (1fbf) -> DW_TAG_pointer_type (1e2e) -> gsk_buffer (1e41) -> _gsk_data_buffer of length 8.
_gsk_data_buffer has two components
- length -> gsk_size…
- data (1faf) -> pointer_type (13c)-> unspecified type void

Processing the data in the file

Parse the data

An entry like DW_AT_type <0x0000006e> refers to the entry with key <0x0000006e>. That definition could be before or after the current entry being processed.

I found it easiest to process the whole file (in Python), and build up a Python dictionary of each high level definition.

I could then process the dict one element at a time, and know that all the elements it refers to are in the dict.
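Once everything is in a dict, following a chain of DW_AT_type references is a simple loop. This is a minimal sketch, assuming a dict keyed by the id string with the tag stored under "type". The function name describe_type and the max_hops guard are my own additions, and the sample data is the cinput chain shown earlier.

```python
def describe_type(entries, key, max_hops=10):
    """Follow DW_AT_type references from key and return the chain as text."""
    parts = []
    for _ in range(max_hops):  # guard against reference loops
        entry = entries.get(key)
        if entry is None:
            parts.append("?")  # the reference was not found in the dict
            break
        # Prefer the name if the entry has one, otherwise show the tag.
        parts.append(entry.get("DW_AT_name", entry["type"]))
        key = entry.get("DW_AT_type")
        if key is None:  # end of the chain
            break
    return " -> ".join(parts)


entries = {
    "<0x00001fa9>": {"type": "DW_TAG_pointer_type", "DW_AT_type": "<0x0000006e>"},
    "<0x0000006e>": {"type": "DW_TAG_base_type", "DW_AT_name": "unsigned char"},
}
print(describe_type(entries, "<0x00001fa9>"))  # DW_TAG_pointer_type -> unsigned char
```

It does not matter whether <0x0000006e> came before or after <0x00001fa9> in the file, because the lookup happens after the whole file has been read.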

There are definitions like

typedef struct _x509_tbs_certificate { 
    x509_version                version; 
    gsk_buffer                  serialNumber; 
    x509_algorithm_identifier   signature; 
    x509_name                   issuer; 
    x509_validity               validity; 
    x509_name                   subject; 
    x509_public_key_info        subjectPublicKeyInfo; 
    gsk_bitstring               issuerUniqueId; 
    gsk_bitstring               subjectUniqueId; 
    x509_extensions             extensions; 
    gsk_octet                   rsvd[16]; 
} x509_tbs_certificate;

Some elements like x509_algorithm_identifier have a complex structure, which refers to other structures. I think the maximum depth for one of the structures was 6 levels deep.
If you are processing a structure you need to decide how many levels deep you process. For the enumeration I was just interested in the level < 1> and < 2> definitions and ignored any below that depth.
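If you do want to expand structures, a depth limit keeps the output manageable. The sketch below is only an illustration of the idea: the function expand, the "l2" member list layout and the sample ids and names (outer, inner, leaf) are all hypothetical, not taken from gskcms.h.

```python
def expand(entries, key, max_depth, depth=0, label=""):
    """Return indented lines for an entry and the types of its members,
    stopping once max_depth levels have been followed."""
    entry = entries.get(key, {})
    name = entry.get("DW_AT_name", entry.get("type", "?"))
    lines = ["  " * depth + label + name]
    if depth < max_depth:
        for member in entry.get("l2", []):
            ref = member.get("DW_AT_type")
            if ref is not None:
                member_label = member.get("DW_AT_name", "?") + ": "
                lines += expand(entries, ref, max_depth, depth + 1, member_label)
    return lines


# Hypothetical data: a structure containing a structure containing an int.
entries = {
    "<0x10>": {"type": "DW_TAG_structure_type", "DW_AT_name": "outer",
               "l2": [{"DW_AT_name": "inner_field", "DW_AT_type": "<0x20>"}]},
    "<0x20>": {"type": "DW_TAG_structure_type", "DW_AT_name": "inner",
               "l2": [{"DW_AT_name": "leaf", "DW_AT_type": "<0x30>"}]},
    "<0x30>": {"type": "DW_TAG_base_type", "DW_AT_name": "int", "l2": []},
}
print("\n".join(expand(entries, "<0x10>", max_depth=1)))
```

With max_depth=1 only inner_field is listed. With max_depth=2 the leaf member inside it is shown as well.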

For each < 1> element, there may be zero or more < 2> elements. I added each < 2> element to a Python list within the < 1> element.

You may decide to ignore entries such as which file, row or column a definition is in.

My Python code to parse the file is

			
fn = "./dwarf.txt"
all = {}      # return the data here, keyed by the id of each < 1> entry
keep = False  # True while inside a < 1> or < 2> entry
state = 0     # 1 = in a < 1> element, 2 = in a < 2> element
with open(fn) as fp:
    # skip the stuff at the front
    for line in fp:
        if line[0:5] == "LOCAL":
            break
    for line in fp:
        # if the line starts with ".debug" we're done
        if line[0:6] == ".debug":
            break
        lhs = line[0:4]
        if line[0:1] == "<":
            # do not process nested entries (< 3> and deeper)
            keep = lhs in ["< 1>", "< 2>"]
        elif line[0:1] != " ":
            continue  # neither a header nor an attribute line
        if not keep:  # only within < 1> and < 2>
            continue
        # we now have records of interest: < 1> and < 2> and the records underneath them
        if lhs == "< 1>":
            key = line[4:16].strip()  # for example "<0x0000132d>"
            kwds1 = {}
            all[key] = kwds1
            kwds1["l2"] = []  # empty list to which we add the < 2> elements
            state = 1  # < 1> element
            kwds1["type"] = line[16:].strip()
        elif lhs == "< 2>":
            kwds2 = {}
            kwds1["l2"].append(kwds2)
            kwds2["type"] = line[16:].strip()
            state = 2  # < 2> element
        else:
            # attribute line: tag on the left, value on the right
            tag = line[0:47].strip()
            value = line[47:].strip()
            if state == 1:
                kwds1[tag] = value
            else:
                kwds2[tag] = value

		

Process the data

I want to look for the enumeration definitions and only process those.

			
print("=================================")
for e in all:
    d = all[e]
    if d["type"] == "DW_TAG_typedef":  # only these
        at_type = d.get("DW_AT_type")  # get the key of the value
        target = all.get(at_type)
        if target is None or target["type"] != "DW_TAG_enumeration_type":
            continue  # a typedef of something other than an enumeration
        print(d["DW_AT_name"], "= {")
        for ll in target["l2"]:  # the < 2> elements within it
            if "DW_AT_const_value" in ll:
                print('    "' + ll["DW_AT_name"] + '":', ll["DW_AT_const_value"] + ",")
        print("}")

		

This produced code like

			
x509_attribute_type = {
  "x509_attr_unknown"   : 0,  
  "x509_attr_name"      : 1, 
  "x509_attr_surname"   : 2,   
  "x509_attr_givenName" : 3,   
  "x509_attr_initials"  : 4,
  ...  
}
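If you prefer a real Python enumeration to a plain dict, the standard enum module can build one from the generated dict. This is a small sketch using only the five values shown above. The class name X509AttributeType is my own choice.

```python
from enum import IntEnum

x509_attribute_type = {
    "x509_attr_unknown": 0,
    "x509_attr_name": 1,
    "x509_attr_surname": 2,
    "x509_attr_givenName": 3,
    "x509_attr_initials": 4,
}

# The functional API accepts a mapping of member names to values.
X509AttributeType = IntEnum("X509AttributeType", x509_attribute_type)

print(X509AttributeType(3).name)                       # x509_attr_givenName
print(X509AttributeType["x509_attr_surname"].value)    # 2
```

This gives you the lookup in both directions: from a number returned by the C code to its name, and from a name to its number.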

		

You can do more advanced things if you want to, for example create definitions for Python's struct module (struct: Interpret bytes as packed binary data) to build control blocks. With this you can pass in a dict of names and the struct definitions, and it converts the bytes into the specified types (int, char, byte etc.) with the correct big-endian/little-endian processing.
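As a rough illustration of the struct idea, the sketch below unpacks a made-up control block into a dict of named fields. The field names, the layout and the helper unpack_block are hypothetical. The only real API used is struct.unpack and struct.calcsize, with ">" selecting big-endian byte order (as used on z/OS) and no padding.

```python
import struct

# Hypothetical control block layout: (field name, struct format code)
FIELDS = [("length", "I"), ("version", "H"), ("flags", "H")]


def unpack_block(data, fields, byte_order=">"):
    """Unpack the start of data into a dict of field name -> value."""
    fmt = byte_order + "".join(code for _, code in fields)
    values = struct.unpack(fmt, data[:struct.calcsize(fmt)])
    return dict(zip((name for name, _ in fields), values))


raw = bytes.fromhex("00000010" "0003" "8000")
print(unpack_block(raw, FIELDS))  # {'length': 16, 'version': 3, 'flags': 32768}
```

The list of fields could be generated from the DW_TAG_member entries in the DWARF output, using DW_AT_byte_size and DW_AT_encoding of each base type to pick the format code.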


Published by Colin Paice
