I wanted to create Python enumerations from C code. For example, with System SSL there is a data type x509_attribute_type.
This has a definition
typedef enum {
x509_attr_unknown = 0,
x509_attr_name = 1, /* 2.5.4.41 */
x509_attr_surname = 2, /* 2.5.4.4 */
x509_attr_givenName = 3, /* 2.5.4.42 */
x509_attr_initials = 4, /* 2.5.4.43 */
...
} x509_attribute_type;
I wanted to create
x509_attribute_type = { "x509_attr_unknown" : 0, "x509_attr_name" : 1, "x509_attr_surname" : 2, "x509_attr_givenName" : 3, "x509_attr_initials" : 4, ... }
I’ve done this using ISPF macros, but thought it would be easier(!) to automate it.
There is a standard way for compilers to produce information that lets debuggers understand the structure of programs. DWARF is a debugging information file format used by many compilers and debuggers to support source-level debugging. The data is stored internally using Executable and Linkable Format (ELF).
Getting the DWARF file
To get the structures into the DWARF file, it looks like you have to use the structure. That is, if you only #include a file, by default the definitions are not stored in the DWARF file.
When I used
#include <gskcms.h>
...
x509_attribute_type aaa;
x509_name_type nt;
x509_string_type st;
x509_ecurve_type et;
int l = sizeof(aaa) + sizeof(nt) + sizeof(st) + sizeof(et);
I got the structures x509_attribute_type etc in the DWARF file.
Compiling the file
I used the USS xlc command with
xlc ...-Wc,debug(FORMAT(DWARF),level(9))... abc.c
or
/bin/xlclang -v -qlist=d.lst -qsource -qdebug=format=dwarf -g -c abc.c -o abc.o
This created a file abc.dbg
The .dbg file includes an eye catcher of ELF (in ASCII)
I downloaded the file in binary to Linux.
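A quick way to confirm the binary download worked is to check the eye catcher. This is a minimal sketch, assuming the standard ELF magic number (byte 0x7f followed by "ELF" in ASCII); the function names are made up for illustration. If the file had been transferred in text mode, the bytes would have been translated and the check would fail.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of any ELF file


def looks_like_elf(data: bytes) -> bool:
    """Return True if the bytes start with the ELF magic number."""
    return data[:4] == ELF_MAGIC


def file_is_elf(path: str) -> bool:
    """Read the first four bytes of a file and check for the ELF magic."""
    with open(path, "rb") as fp:
        return looks_like_elf(fp.read(4))
```

For example, file_is_elf("abc.dbg") should return True for the downloaded file.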
There are various packages which are meant to be able to process the file. The only one I got to work successfully was dwarfdump. The Linux version has many options to specify what data you want to select and how you want to report it. dwarfdump reported some errors, but I got most of the information out.
readelf displays some of the information in the file, but I could not get it to display the information about the variables.
What does the output from dwarfdump look like?
The format has changed slightly since I first used this a year or so ago. The data are not always aligned on the same columns, and values like <146> and 2426 (used as a locator id) are now hexadecimal offsets.
The older format
<1>< 2426> DW_TAG_subprogram
DW_AT_type <146>
DW_AT_name printCert
DW_AT_external yes
...
<2>< 2456> DW_TAG_formal_parameter
DW_AT_name id
DW_AT_type <10387>
...
The newer format
< 1><0x0000132d> DW_TAG_typedef
DW_AT_type <0x00001349>
DW_AT_name x509_attribute_type
DW_AT_decl_file 0x00000002
DW_AT_decl_line 0x000000a7
DW_AT_decl_column 0x00000003
< 1><0x00001349> DW_TAG_enumeration_type
DW_AT_name __3
DW_AT_byte_size 0x00000002
DW_AT_decl_file 0x00000002
DW_AT_decl_line 0x00000091
DW_AT_decl_column 0x0000000e
DW_AT_sibling <0x00001553>
< 2><0x00001356> DW_TAG_enumerator
DW_AT_name x509_attr_unknown
DW_AT_const_value 0
< 2><0x0000136a> DW_TAG_enumerator
DW_AT_name x509_attr_name
DW_AT_const_value 1
Some of the fields are obvious… others are more cryptic, with varying levels of indirection.
- <1><0x0000132d> is a high level object <1> with id <0x0000132d>
- DW_TAG_typedef is a typedef
- DW_AT_type <0x00001349> see <0x00001349> for the definition (below)
- DW_AT_name x509_attribute_type is the name of the typedef
- DW_AT_decl_file 0x00000002 there is a file definition #2… but I could not find it
- DW_AT_decl_line 0x000000a7 the position within the file
- DW_AT_decl_column 0x00000003
- < 1><0x00001349> DW_TAG_enumeration_type. This is referred to by the previous element
- DW_AT_name __3 this is an internally generated name
- < 2><0x00001356> DW_TAG_enumerator This is part of the <1> <0x00001349> above. It is an enumerator.
- DW_AT_name x509_attr_unknown this is the label of the value
- DW_AT_const_value 0 with value 0
- the next is label x509_attr_name with value 1
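The header and attribute lines described above can be picked apart with a regular expression rather than fixed column positions, which helps when the columns move between dwarfdump versions. This is a sketch only: the pattern is based on the two sample layouts shown in this post, and parse_header and parse_attribute are names invented for the example.

```python
import re

# Matches entry header lines in either layout shown above:
#   "<1>< 2426> DW_TAG_subprogram"      (older, decimal id)
#   "< 1><0x0000132d> DW_TAG_typedef"   (newer, hexadecimal offset)
HEADER = re.compile(r"^<\s*(\d+)><\s*(0x[0-9a-fA-F]+|\d+)>\s+(DW_TAG_\w+)")


def parse_header(line):
    """Return (level, offset, tag) for a header line, or None."""
    m = HEADER.match(line)
    if m is None:
        return None
    text = m.group(2)
    offset = int(text, 16) if text.lower().startswith("0x") else int(text)
    return int(m.group(1)), offset, m.group(3)


def parse_attribute(line):
    """Return (attribute, value) for a DW_AT_ line, or None."""
    parts = line.split(None, 1)  # split on the first run of whitespace
    if len(parts) != 2 or not parts[0].startswith("DW_AT_"):
        return None
    return parts[0], parts[1].strip()
```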
Other interesting data
I have a function
int colin(char * cinput, gsk_buffer * binput ){ ...}
and
typedef struct _gsk_data_buffer {
gsk_size length;
void * data;
} gsk_data_buffer, gsk_buffer;
Breaking this down into its parts, there is an entry in the DWARF output for “int”, “colin”, “*”, “char”, “cinput”, “*”, gsk_buffer (which has levels within it), “binput”
< 1><0x0000009a> DW_TAG_base_type
DW_AT_name int
DW_AT_encoding DW_ATE_signed
DW_AT_byte_size 0x00000004
< 1><0x00000142> DW_TAG_subprogram
DW_AT_type <0x0000009a>
DW_AT_name colin
DW_AT_external yes(1)
...
< 2><0x0000015c> DW_TAG_formal_parameter
DW_AT_name cinput
DW_AT_type <0x00001fa9>
< 2><0x00000173> DW_TAG_formal_parameter
DW_AT_name binput
DW_AT_type <0x00001fb5>
:
< 2><0x0000018a> DW_TAG_variable
DW_AT_name __func__
DW_AT_type <0x00001fc0>
For cinput
< 1><0x00001fa9> DW_TAG_pointer_type
DW_AT_type <0x0000006e>
DW_AT_address_class 0x0000000a
< 1><0x0000006e> DW_TAG_base_type
DW_AT_name unsigned char
DW_AT_encoding DW_ATE_unsigned_char
DW_AT_byte_size 0x00000001
For binput
- binput (1fbf) -> DW_TAG_pointer_type (1e2e) -> gsk_buffer (1e41) -> _gsk_data_buffer of length 8.
- _gsk_data_buffer has two components
- length -> gsk_size…
- data (1faf) -> pointer_type (13c)-> unspecified type void
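Following these chains by hand gets tedious, so here is a small sketch of doing it in Python. The table is hand-built from the cinput entries quoted above (it is not real parser output), and type_chain is a hypothetical helper that follows each DW_AT_type reference until there are no more.

```python
# Hand-built table holding only the entries quoted above for cinput.
entries = {
    "<0x0000015c>": {"type": "DW_TAG_formal_parameter",
                     "DW_AT_name": "cinput",
                     "DW_AT_type": "<0x00001fa9>"},
    "<0x00001fa9>": {"type": "DW_TAG_pointer_type",
                     "DW_AT_type": "<0x0000006e>"},
    "<0x0000006e>": {"type": "DW_TAG_base_type",
                     "DW_AT_name": "unsigned char"},
}


def type_chain(entries, key):
    """Follow DW_AT_type references from key; return the names (or tags) met."""
    chain = []
    seen = set()  # guards against a reference loop
    while key in entries and key not in seen:
        seen.add(key)
        entry = entries[key]
        # Unnamed entries, such as pointers, are reported by their tag.
        chain.append(entry.get("DW_AT_name", entry["type"]))
        key = entry.get("DW_AT_type")
    return chain


print(" -> ".join(type_chain(entries, "<0x0000015c>")))
# prints: cinput -> DW_TAG_pointer_type -> unsigned char
```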
Processing the data in the file
Parse the data
An entry like DW_AT_type <0x0000006e> refers to a key of <0x0000006e>. The definition for this could be before or after the current entry being processed.
I found it easiest to process the whole file (in Python), and build up a Python dictionary of each high level definition.
I could then process the dict one element at a time, and know that all the elements it refers to are in the dict.
There are definitions like
typedef struct _x509_tbs_certificate {
x509_version version;
gsk_buffer serialNumber;
x509_algorithm_identifier signature;
x509_name issuer;
x509_validity validity;
x509_name subject;
x509_public_key_info subjectPublicKeyInfo;
gsk_bitstring issuerUniqueId;
gsk_bitstring subjectUniqueId;
x509_extensions extensions;
gsk_octet rsvd[16];
} x509_tbs_certificate;
Some elements, like x509_algorithm_identifier, have a complex structure which refers to other structures. I think the maximum depth for one of the structures was 6 levels deep.
If you are processing a structure you need to decide how many levels deep you process. For the enumeration I was just interested in the level < 1> and < 2> definitions and ignored any below that depth.
For each < 1> element, there may be zero or more < 2> elements. I added each < 2> element to a Python list within the < 1> element.
You may decide to ignore entries such as which file, row or column a definition is in.
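One way to control how many levels deep you go is a recursive walk with a depth limit. The sketch below uses a deliberately simplified, hypothetical table that maps each structure name to its members and their type names; it is not the layout produced by dwarfdump, and the member types are abbreviated for illustration.

```python
# Hypothetical, simplified table: structure name -> {member: type name}.
types = {
    "x509_tbs_certificate": {"version": "x509_version",
                             "serialNumber": "gsk_buffer"},
    "gsk_buffer": {"length": "gsk_size", "data": "void *"},
}


def expand(types, name, max_depth, depth=1):
    """Expand a structure into nested dicts, stopping after max_depth levels."""
    members = types.get(name)
    if members is None or depth > max_depth:
        return name  # unknown type, or too deep: keep just the type name
    return {member: expand(types, type_name, max_depth, depth + 1)
            for member, type_name in members.items()}


print(expand(types, "x509_tbs_certificate", max_depth=1))
print(expand(types, "x509_tbs_certificate", max_depth=2))
```

With max_depth=1 the serialNumber member is reported only as gsk_buffer; with max_depth=2 it is expanded into its length and data members.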
My Python code to parse the file is
fn = "./dwarf.txt"
with open(fn) as fp:
    # skip the stuff at the front
    for line in fp:
        if line[0:5] == "LOCAL":
            break
    all = {}  # return the data here
    keep = False  # nothing of interest seen yet
    for line in fp:
        # if the line starts with ".debug" we're done
        if line[0:6] == ".debug":
            break
        lhs = line[0:4]
        # do not process nested requests
        if line[0:1] == "<":
            if lhs in ["< 1>", "< 2>"]:
                keep = True
            else:
                keep = False
        else:
            if line[0:1] != " ":
                continue
        if keep is False:  # only within <1> and <2>
            continue
        # we now have records of interest < 1> and < 2> and records underneath them
        if line[0:4] == "< 1>":
            key = line[4:16].strip()  # "< 123>"
            kwds1 = {}
            all[key] = kwds1
            kwds1["l2"] = []  # empty list to which we add
            state = 1  # <1> element
            kwds1["type"] = line[20:-1]
        elif line[0:4] == "< 2>":
            kwds2 = {}
            kwds1["l2"].append(kwds2)
            kwds2["type"] = line[21:-1]
            state = 2  # <2> element
        else:
            tag = line[0:47].strip()
            value = line[47:-1].strip()
            if state == 1:
                kwds1[tag] = value
            else:
                kwds2[tag] = value
Process the data
I want to look for the enumeration definitions and process only those
print("=================================")
for e in all:
    d = all[e]
    if d["type"] == "DW_TAG_typedef":  # only these
        print(d["DW_AT_name"], "{")
        at_type = d["DW_AT_type"]  # get the key of the value
        l = all[at_type]["l2"]  # and the <2> elements within it
        for ll in l:
            if "DW_AT_const_value" in ll:
                print(' "' + ll["DW_AT_name"] + '":', ll["DW_AT_const_value"])
        print("}")
This produced code like
x509_attribute_type = { "x509_attr_unknown" : 0, "x509_attr_name" : 1, "x509_attr_surname" : 2, "x509_attr_givenName" : 3, "x509_attr_initials" : 4, ... }
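If you would rather have a real Python enumeration than a dict, the same <2> entries can be fed to enum.IntEnum. This is a sketch, not part of my original code: the sample list mimics the shape the parser stores in the "l2" list (values are strings), make_enum is a made-up helper, and it assumes the constant values are printed in decimal.

```python
from enum import IntEnum

# Same shape as the "l2" list built by the parser; values are strings.
l2 = [
    {"type": "DW_TAG_enumerator", "DW_AT_name": "x509_attr_unknown", "DW_AT_const_value": "0"},
    {"type": "DW_TAG_enumerator", "DW_AT_name": "x509_attr_name", "DW_AT_const_value": "1"},
    {"type": "DW_TAG_enumerator", "DW_AT_name": "x509_attr_surname", "DW_AT_const_value": "2"},
]


def make_enum(name, enumerators):
    """Build an IntEnum from a list of parsed DW_TAG_enumerator entries."""
    members = {e["DW_AT_name"]: int(e["DW_AT_const_value"])  # assumes decimal
               for e in enumerators if "DW_AT_const_value" in e}
    return IntEnum(name, members)


x509_attribute_type = make_enum("x509_attribute_type", l2)
print(x509_attribute_type(1).name)                       # prints: x509_attr_name
print(int(x509_attribute_type["x509_attr_surname"]))     # prints: 2
```

An IntEnum lets you convert in both directions, from a name to its value and from a value returned by the C code back to its name.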
You can do more advanced things if you want to, for example create format definitions for Python's struct module ("Interpret bytes as packed binary data") to build and decode control blocks. With this you can pass in the names and the struct definitions, and it converts the bytes into the specified types (int, char, byte etc.) with the correct big-endian or little-endian processing.
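As an illustration of the struct idea, here is a minimal sketch that decodes something shaped like gsk_buffer. It assumes a 4-byte unsigned length followed by a 4-byte pointer, big-endian, which is consistent with the structure being 8 bytes long; the field layout, the sample bytes and the unpack_buffer name are assumptions for the example.

```python
import struct

# ">" = big-endian, "I" = unsigned 4-byte integer; two fields make 8 bytes.
GSK_BUFFER = struct.Struct(">II")
FIELDS = ("length", "data")


def unpack_buffer(raw):
    """Convert 8 raw bytes into a dict of named fields."""
    return dict(zip(FIELDS, GSK_BUFFER.unpack(raw)))


# Made-up sample bytes: length 16, pointer 0x1f2e3d40.
raw = bytes.fromhex("00000010" "1f2e3d40")
fields = unpack_buffer(raw)
print(fields["length"], hex(fields["data"]))  # prints: 16 0x1f2e3d40
```

The format string could be generated from the DWARF entries, using DW_AT_byte_size and DW_AT_encoding to choose the struct format character for each member.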