Accessing lots of data efficiently using C binary tree functions.

A binary tree is an efficient way of storing data in memory (in a tree!) The C run time library has a set of functions that allow you to create and manage data in a binary tree, but it is not that easy to set up.

Core concepts

Each element, or node, in the tree has two children and a pointer to the record. Each child can be null, or the address of another child. The left child contains elements which are less than the parent, the right node contains elements which are greater than the parent node. To find an element in the tree, you start at the top, or the root node, and compare the value of the nodes value, with the element you are looking for and work down the tree until you find a matching record or you hit the bottom of the tree. You provide a compare function takes two records and returns:

  • -1 if the first record is greater than the second record
  • 0 if the two records are the same item
  • 1 if the first record is less than the second record.

The comparison could be a simple string compare, or comparing data in a structure for example

typedef struct {
char pString[5]; // to the message text
int count ;
 int date;
} MyNode;

int Node_compare(const void *MyNode1, const void *MyNode2) {
  MyNode * p1, *p2;
int rc;
 // check the dates
  rc = p1-> date - p2 -> date;
  if (rc == 0 )
 // check the string
rc = strcmp(p1->pString, p2->pString);
  // return the result
  return rc;

}

You need a root for the tree

void *ROOT = NULL;

Some C I do not understand

In some of the routines you need code like

char * buffer = * (char **) in;
//char * buffer = (char *) in;

With my knowledge of C, these two statements look the same. Only the first one works.

I have code

void print_Node(const void *ptr, ...) {
...
 MyNode p = *(MyNode**) ptr;

There may be a smarter way of referencing the passed data, but the definition of “MyNode p” and using “* (Mynode **)” on the passed in pointer, works.

Referencing structures

It took me a while to work out the correct type definitions to get it to work. For example

int Node_compare(const void *, const void *);
MyNode * pMyNode;
void * p;
p = (void *) pMyNode;
MyNode * q;
/* see if it is in the tree already */
q = *(MyNode **) tsearch(p, (void **) &ROOT, Node_Compare);

The various functions expect pointer defined with type (void *).

Find an element

To look for for an element you use the tfind (tree find) function. The parameters are

  • a pointer to the record (or string) partially initialised with the values used by the compare function.
  • the root of the tree
  • the compare function, described above.

It returns the record, or a null.

To use this function you have to create a record containing the values the compare function can use it. This record can be in automatic storage, or you can use malloc() to allocate storage for it.

typedef struct {
char pString[5]; // to the message text
int count ;
 int date;
} MyNode;
MyNode tempNode;
memcpy(&tempNode.String("WXYZ"),sizeof("WXYZ");
tempNode.date = ....

You do not need to initialise tempNode.count, as this is not used in the comparison.

Adding an element

There is no tadd() function as such, there is a tsearch which provides “look for this node, and add it if it was not found“.

The parameters are

  • a pointer to the record. 
  • the root of the tree
  • the compare function, described above.

it returns the address of a record. If it is the one you passed in – then it was added. It it returns a different node – an existing record was returned.

See below for a more detailed description.

Deleting a node

You pass the standard three parameters in.

Walking the tree.

I’ve typically used a binary tree to store lots of information, such as a list of error messages and the count of occurrences, then, at the end of the job, print them all out in sequence. This is called walking the tree.

You use

twalk(root, print_function);

This calls the print_function for every element in the tree.

Planning for the tree

If you plan to walk the tree and print the elements in order, then the compare function needs to be written so the data is in the right order.

For example, I want to report the number of error messages, and the count of the messages, by day.

With my structure

typedef struct {
char pString[5]; // to the message text
int count ;
 int date;
} MyNode;

If I use

int Node_compare(const void *MyNode1, const void *MyNode2) {
  MyNode * p1, *p2;
int rc;
 // check the dates
  rc = p1-> date - p2 -> date;
  if (rc == 0 )
 // check the string
rc = strcmp(p1->pString, p2->pString);
  return rc;

}

then the data will be sorted by date, then message within date order

If I use

int Node_compare(const void *MyNode1, const void *MyNode2) {
  MyNode * p1, *p2;
int rc;
 // check the string
rc = strcmp(p1->pString, p2->pString);
  if (rc == 0 )

  // check the dates
  rc = p1-> date - p2 -> date;
  
  return rc;

}

it will report in message sequence, then by date within message.

Adding nodes to the tree.

You need to malloc storage for each node you intend to add, because the node may be stored within the tree.

MyNode  * pTode;
/* allocate our Node - we cant use automatic storage as it may */
/* be added to the tree - so must not be deleted */
pNode = (MyNode *) malloc(sizeof(MyNode));
if (pNode == 0)
{
perror("Malloc for Node failed ");
return 8;
}

/* initialise it */
strcpy(&pNode->pString[0],"ABCD") ;
pNode -> date = date();
pNode->count = 1;

void * p;
p = (void *) pNode;

MyNode * qreturned;
/* see if it is in the tree already */
qreturned = *(MyNode **) tsearch(p, (void **) &ROOT,Node_compare);

if (qreturned == p)
{
/* it didnt exist before - thus it was added */
   /* possibly do something   */
     initialise the remained of the record
}

else /* it did exist before so we need to update the count field */
{
   qreturned->count += 1; Update the values
   /* release the storage we dont need */
free(pNode);
}

All data must be self contained within the node, or to permanently allocate storage (with malloc()). If it references something in automatic storage, then when the automatic storage is reused, your pointer will point to invalid data.

Print the tree

twalk(ROOT, print_Node); // ROOT not &ROOT because of no updates, and pass the function
//
// which invokes
//
void print_Node(const void *ptr, VISIT order, int level) {
 if (order == leaf || order == postorder) {
  MyNode p = *(MyNode**) ptr;
  printf("Msg %4.4s date %d count %d\n",
        p->pString, p->date,p-> count);
  // level is how far down the tree the node is. Root = 0
  }
}

Putting it together

I wanted to collect information from RACF records and be able to refer back to the records. I treated the records as char *

The essence of the code is

Compare

#define LRECORD 294
int compare207(const void *MyNode1, const void *MyNode2) {
  int rc = memcmp( ((char * ) Mynode1) + 1
             ((char * ) Mynode2) + 1,261)
return rc;
}

Add a record

void add207(char * pRACF) 
{
void * p = malloc(LRECORD);
 ...
memcpy(p ,pRACF,294);
qreturned = *(char **) tsearch(p, (void **) &ROOT207,
compare207);
if (qreturned == p)
{
// it was added
}
else /* it did exist before so we need to update the count field */
{
//it existed - so update the fields);
  // qreturned -> ....
}

Find a record

char * find207(char * pIn) 
{
char * * pOut;
pOut =*(char **) tfind( (const void *) pIn,
(void **)&ROOT207,
compare207);
return pOut;
}

Walk the tree and print a record

void tree207print(const void *in    , VISIT order, int level) { 
char * pBuffer = * (char **) in;

 if (order == leaf || order == postorder)
{
  printf("Data %s\n",pBbuffer -> .... );
 }
}
void twalk207() 
{
twalk(ROOT207, tree207print);
}

Leave a comment