I’m working on a presentation on performance, for some university students, and I thought it would be worth blogging some of the content.
I had presented on what it is like working in industry compared to working in a university environment. I explained what it is like working in a financial institution, where you have 10,000 transactions a second, transaction response times are measured in tens of milliseconds, and if you are down for a day you are out of business. After this they asked how you tune applications and systems at this level of work.
Do you need to do performance tuning?
Like many questions about performance the answer is “it depends”; it comes down to a cost benefit analysis. How much CPU (or money) will you save if you do the analysis and tuning? You could work for a month and save a couple of hundred pounds. You could work for a day and find CPU savings which mean you do not need to upgrade your systems, and so save a lot of money.
It is not usually worth doing performance analysis on programs which run infrequently, or are of short duration.
Obvious statements
When I joined the performance team, the previous person in the role had left a month before, and the handover documentation was very limited. After a week or so making tentative steps into understanding the work, I came to realise the following (obvious once you think about it) statements:
- A piece of work is either using CPU or is waiting.
- To reduce the time a piece of work takes you can either reduce the CPU used, or reduce the waiting time.
- To reduce the CPU you need to reduce the CPU used.
- The best I/O is no I/O
- Caching of expensive operations can save you a lot.
Scenario
In the description below I’ll cover the moderately simple case, and also the case where there are concurrent threads accessing data.
Concurrent activity
When you have more than one thread in your application you will need to worry about data consistency. There are locks and latches:
- Locks tend to be “long running” – from milliseconds to seconds. For example you lock a database record while updating it.
- Latches tend to be held across a block of code, for example manipulation of lists and updating pointers (see the sketch below).
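For illustration, here is a minimal sketch of a latch, assuming POSIX threads (the post itself does not name a threading library): a mutex held only across the few instructions that update the list pointers, so it is held for microseconds rather than milliseconds.

#include <pthread.h>

struct element { struct element * next; int value; };

static struct element  * listHead  = 0;
static pthread_mutex_t   listLatch = PTHREAD_MUTEX_INITIALIZER;

/* Add an element to the front of the list.                    */
/* The latch is held only across the pointer manipulation.     */
void addToList(struct element * pNew)
{
    pthread_mutex_lock(&listLatch);
    pNew->next = listHead;
    listHead   = pNew;
    pthread_mutex_unlock(&listLatch);
}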
Storing data in memory
There are different ways of storing data in memory, from arrays, hash tables to binary trees. Some are easy to use, some have good performance.
Consider having a list of 10,000 names, which you have to maintain.
Array
An array is a contiguous block of memory with elements of the same size. To locate an element you calculate the offset: element number * size of element.
If the list is not sorted, you have to iterate over the array to find the element of interest.
If the list is sorted, you can do a binary search; for example if the array has 1000 elements, first check element 500 and see if the value is higher or lower, then check element 250 or 750, and so on.
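A minimal sketch of the binary search (the fixed-length name array and the findName function are just assumptions for illustration):

#include <string.h>

#define NAME_LENGTH 64

/* Binary search a sorted array of names.                    */
/* Returns the index of the name, or -1 if it is not found.  */
int findName(char names[][NAME_LENGTH], int count, const char * wanted)
{
    int low  = 0;
    int high = count - 1;
    while (low <= high)
    {
        int middle = (low + high) / 2;
        int rc = strcmp(wanted, names[middle]);
        if (rc == 0) return middle;       /* found it            */
        if (rc <  0) high = middle - 1;   /* look in lower half  */
        else         low  = middle + 1;   /* look in upper half  */
    }
    return -1;                            /* not in the array    */
}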
An array is easy to use, but the size is inflexible; to change the size of the array you have to allocate a new array, copy old to new, release old.
Single Linked list
This is a chain of elements where each element points to the next. There is a pointer to the start of the chain, and something to say “end of chain” (often the “next” pointer is 0).
This is flexible, in that you can easily add elements, but to find an element you have to search along the chain and so this is not suitable for long chains.
You cannot easily delete an element from the chain.
If you have A->B->D->Q. You can add a new element G, by setting G->Q, and D->G. If there are multiple threads you need to do this under a latch.
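A minimal sketch of the insert (the element structure is illustrative, not from the post):

struct element
{
    struct element * next;
    char             name[64];
};

/* Insert pNew after element pAfter, for example insert G after D.     */
/* With multiple threads this needs to be done while holding a latch.  */
void insertAfter(struct element * pAfter, struct element * pNew)
{
    pNew->next   = pAfter->next;   /* G -> Q */
    pAfter->next = pNew;           /* D -> G */
}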
Doubly linked lists
This is like a single linked list, but you have a back chain as well. This allows you to easily delete an element. To add an element you have to update 4 pointers.
This is a flexible list where you can add and remove elements, but you have to scan it sequentially to find the element of interest, and so it is not suitable for long chains.
If there are multiple threads you need to do this under a latch.
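A sketch of why deletion is easy once you have the back chain (again the structure layout is illustrative):

struct element
{
    struct element * next;
    struct element * prev;
    char             name[64];
};

/* Remove pOld from the chain.  The back pointer means we do not have   */
/* to scan from the start of the chain to find the previous element.    */
/* With multiple threads this needs to be done while holding a latch.   */
void removeElement(struct element ** ppHead, struct element * pOld)
{
    if (pOld->prev != 0) pOld->prev->next = pOld->next;
    else                 *ppHead          = pOld->next;   /* it was the first element */
    if (pOld->next != 0) pOld->next->prev = pOld->prev;
}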
Hash tables
Hash tables are a combination of array and linked lists.
You allocate an array of suitable size, for example 4096. You hash the key to a value between 0 and 4095 and use this as the index into the array. The value of the array is a linked list of elements with the same hash value, which you scan to find the element of interest.
You need a hash table size such that there are only a few (say 10 to 50) elements in each linked list. The hash function needs to produce a wide spread of values; a hash function which always returned the same value would give you one long linked list.
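A minimal sketch of a lookup (the hash function and element layout are illustrative assumptions):

#include <string.h>

#define TABLE_SIZE 4096   /* a power of two, so the & below works as a modulus */

struct element
{
    struct element * next;      /* next element with the same hash value */
    char             name[64];
    void           * data;
};

static struct element * hashTable[TABLE_SIZE];

/* A simple string hash; the real function just needs a wide spread of values. */
static unsigned int hash(const char * key)
{
    unsigned int h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h & (TABLE_SIZE - 1);
}

/* Look the name up: index into the array, then scan the (short) chain. */
struct element * findElement(const char * name)
{
    struct element * p;
    for (p = hashTable[hash(name)]; p != 0; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return 0;   /* not found */
}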
Binary trees
Binary trees are an efficient way of storing data. If there are any updates, you need to latch the tree while updates are made, which may slow down multi threaded programs.
Each node of a tree has 4 parts
- The value of this node such as “COLIN PAICE”
- A pointer to a node for values less than “COLIN PAICE”
- A pointer to a node for values greater than “COLIN PAICE”
- A pointer to the data record for this node.
If the tree is balanced the number of steps from the start of the tree to the element of interest is approximately the same for all elements.
If you add lots of elements you can get an unbalanced tree, where the tree looks like a flag pole rather than an apple tree. In this case you need to rebalance the tree.
You do not need to worry about the size of the tree because it will grow as more elements are added.
If you rebalance the tree, this will require a latch on the tree, and the rebalancing could be expensive.
There are C run time functions such as tsearch which walk the tree: if the element exists in the tree, the node is returned; if it does not exist, it is added to the tree and the new node is returned.
This is not trivial to code (but is much easier than coding a tree yourself).
You need to latch the tree when using multiple threads, which can slow down your access.
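A sketch of how tsearch (from <search.h>) might be used; the names and the compare function are just for illustration:

#include <search.h>
#include <stdio.h>
#include <string.h>

static int compareNames(const void * a, const void * b)
{
    return strcmp((const char *)a, (const char *)b);
}

int main(void)
{
    void * root = 0;                       /* an empty tree */
    char * names[] = { "COLIN PAICE", "ANNE", "ZEBEDEE" };
    int    i;

    for (i = 0; i < 3; i++)
    {
        /* tsearch returns a pointer to the tree node; dereference it */
        /* to get the key.  If the name is already in the tree the    */
        /* existing node is returned, otherwise the name is added.    */
        char ** node = (char **)tsearch(names[i], &root, compareNames);
        printf("inserted or found %s\n", *node);
    }

    /* tfind only looks, it never adds */
    if (tfind("ANNE", &root, compareNames) != 0)
        printf("ANNE is in the tree\n");
    return 0;
}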
Optimising your code
Take the scenario where you write an application which is executed 1000 times a second.
int myfunc(char * name, int cost, int discount)
{
    printf("Values passed to myfunc %s cost %i discount %i\n", name, cost, discount);
    int rc = 0;
    rc = dosomething();
    printf("exit from myfunc %i\n", rc);
    return rc;
}
Note: This is based on a real example. I went to a customer to help with a performance problem, and found the top user of CPU was printf() – printing out logging information. They commented this code out in all of their functions and the application went 5 times faster.
You can make this go faster by having a flag you set to produce trace output, so
if (global.trace) printf("Values passed to myfunc %s cost %i discount %i\n", name, cost, discount);
You could do the same for the exit printf, but you may want to be more subtle, and use
if (global.traceNZonexit && rc != 0) printf("exit from myfunc %i\n", rc);
This is useful when the return code is 0 most of the time. It also helps if someone reports problems with the application – you can say “there is an access-denied message at the time of your problem”.
FILE * hFile = 0;
for (i = 0; i < 100; i++)
{
    /* create a buffer with our data in it */
    lenData = sprintf(buffer, "userid %s, parm %s\n", getid(), inputparm);
    error = ….();
    if (error > 0)
    {
        hFile = fopen("outputfile", "a");
        fwrite(buffer, 1, lenData, hFile);
        fclose(hFile);
    }
    …
}
This can be improved
- by moving the getid() out of the loop – it does not change within the loop
- by moving the lenData = sprintf… inside the error check, so the formatting is only done when it is needed
- by changing the error handling so the file is opened only once
{
    ...
    if (error > 0)
    {
        if (hFile == 0)
        {
            hFile = fopen("outputfile", "a");
            pUserid = strdup(getuserid());
        }
        fwrite(buffer, 1, lenData, hFile);
    }
    ...
}
if (hFile != 0) fclose(hFile);
You can take this further, and have the file handle passed in to the function, so it is only opened once, rather than every time the function is invoked.
main()
{
    struct { FILE * hFile; … } threadBlock;
    for (i = 1; i < 9999; i++)
        myprog(&threadBlock, …);
    if (threadBlock.hFile != 0) fclose(threadBlock.hFile);
}

// subroutine
myprog(threadblock * pt, …)
{
    ...
    if (error > 0)
    {
        if (pt->hFile == 0)
        {
            pt->hFile = fopen("outputfile", "a");
        }
        fwrite(buffer, 1, lenData, pt->hFile);
    }
    ...
}
Note: If this is a long running “production” system you may want to open the file as part of application startup to ensure the file can be opened etc, rather than finding this out two days later.
If programs run infrequently but during the peak period, or are of short duration but run at high frequency, optimization will still save cost.
Reïnitialize reused in-storage data structures only if needed, and only the part that was actually used.