These days many applications use a stack and a heap to manage their storage. C and COBOL programs on z/OS use the C run time facilities for this, and as Java uses the C run time facilities, it also uses the stack and heap.
If the stack and heap are not configured appropriately, it can lead to an increase in CPU usage. Since the introduction of 64 bit storage, tuning the heap pools and stack is no longer critical for avoiding storage shortages; you used to have to manage the stack and heap pool sizes carefully so you did not run out of storage.
The 5 second summary of what to check: the number of segments freed for the stack and heap should be zero. If the value is large, a lot of CPU is being used to manage the storage.
The topics below work up from kindergarten background, through intermediate, to advanced and PhD level, first for the stack and then for the heap.
Kindergarten background to the stack
When a C (main) program starts, it needs storage for the variables used in the program. For example
int ii;
for (ii=0;ii<3;ii++)
{}
char * p = malloc(1024);
The variables ii and p are variables within the function, and will be on the function's stack. p is a pointer.
The block of storage from the malloc(1024) will be obtained from the heap, and its address stored in p.
When the main program calls a function, the function needs storage for the variables it uses. This can be done in several ways:
- Each function uses a z/OS GETMAIN request on entry to allocate storage, and a z/OS FREEMAIN request on exit. These storage requests are expensive.
- The main program has a block of storage which functions can use. For example, the main program uses bytes 0 to 1500 of this block, and the first function needs 500 bytes, so it uses bytes 1501 to 2000. If this function calls another function, the lower level function uses storage from 2001 onwards. This is what usually happens; it is very efficient, and is known as a “stack”.
Intermediate level for the stack
It starts to get interesting when the initial block of storage allocated in the main program is not big enough.
There are several approaches to take when this occurs:
- Each function does a storage GETMAIN on entry, and FREEMAIN on exit. This is expensive.
- Allocate another big block of storage, so successive functions now use this block, just like in the kindergarten case. When functions return to the one that caused a new block to be allocated, either
  - this new block is freed. This is not as expensive as the previous case, or
  - this block is retained, and kept for future requests. This is the cheapest case. However, a large block has been allocated which may never be used again.
How big a block should it allocate?
When using a stack, the size of the block to allocate is the larger of the user specified size and the size required by the function. If the specified secondary size is 16KB, and a function needs 20KB of storage, then at least 20KB will be allocated.
How do I get the statistics?
For your C programs you can specify run time options in a #pragma runopts statement or, the easier way, through JCL: you specify C run time options through a //CEEOPTS DD statement. For example
//CEEOPTS DD *
STACK(2K,12K,ANYWHERE,FREE,2K,2K)
RPTSTG(ON)
Where
- STACK(…) specifies the size of the stack.
- RPTSTG(ON) says collect and display storage usage statistics.
There is a small overhead in collecting the data.
The output is like:
STACK statistics:
Initial size: 2048
Increment size: 12288
Maximum used by all concurrent threads: 16218808
Largest used by any thread: 16218808
Number of segments allocated: 2004
Number of segments freed: 2002
Interpreting the stack statistics
From the above data
- This shows the initial stack size was 2KB, with an increment of 12KB.
- 2004 stack segments were allocated, and 2002 were freed.
- Because the statement had STACK(2K,12K,ANYWHERE,FREE,2K,2K), when a secondary extension became free it was FREEMAINed back to z/OS.
When KEEP was used instead of FREE, the storage was not returned to z/OS.
The statistics then looked like
STACK statistics:
Initial size: 2048
Increment size: 12288
Maximum used by all concurrent threads: 16218808
Largest used by any thread: 16218808
Number of segments allocated: 1003
Number of segments freed: 0
What to check for and what to set
For most systems, the key setting is KEEP, so that freed segments are not released. You can see this a) from the definition, b) from “Number of segments freed” being 0.
If a request to allocate a new segment fails, the C run time can try releasing segments that are not in use. If this happens, the “segments freed” count will be incremented.
Check that “segments freed” is zero, and if not, investigate why not.
When a program runs for a long time, a small number of “segments allocated” is not a problem.
Making the initial size larger, closer to the “Largest used by any thread”, may improve storage utilisation. With smaller segments there is likely to be unused space which was too small for a function's request, causing the next segment to be used. So a better definition would be
STACK(16M,12K,ANYWHERE,KEEP,2K,2K)
Which gave
STACK statistics:
Initial size: 16777216
Increment size: 12288
Maximum used by all concurrent threads: 16193752
Largest used by any thread: 16193752
Number of segments allocated: 1
Number of segments freed: 0
Which shows that just one segment was allocated.
Kindergarten background to the heap
When there is a malloc() request in C, or a new … in Java, the storage may live on outside of the function. This storage is obtained from the heap.
The heap has blocks of storage which can be reused. The blocks may all be the same size, or of different sizes. It takes CPU time to scan the free blocks looking for the best one to reuse, and with more blocks it can use increasing amounts of CPU.
Heap pools avoid the cost of searching for the “right” block. They use pools of fixed size blocks. For example:
- there is a heap pool with 1KB fixed size blocks
- there is another heap pool with 16KB blocks
- there is another heap pool with 256 KB blocks.
If there is a malloc request for 600 bytes, a block will be taken from the 1KB heap pool.
If there is a malloc request for 32KB, a block would be used from the 256KB pool.
If there is a malloc request for 512KB, it will issue a GETMAIN request.
Intermediate level for the heap
If there is a request for a block of heap pool storage, and there is no free block, a large segment of storage can be obtained and divided up into blocks for the pool. If the heap pool has 1KB blocks, and a request for another block fails, it may issue a GETMAIN request for 100 * 1KB and then add 100 blocks of 1KB to the pool. As storage is freed, the blocks are added to the free list in the heap pool.
There is the same logic as for the stack, about returning storage.
- If KEEP is specified, then any storage that is released stays in the heap pool. This is the cheapest solution.
- If FREE is specified, then when all the blocks in an additional segment have been freed, the segment is freed back to z/OS. This is more expensive than KEEP, as you may get frequent GETMAIN and FREEMAIN requests.
How many heap pools do I need and of what size blocks?
There is usually a range of block sizes used in a heap. The C run time supports up to 12 cell sizes. Using a Liberty Web server, there was a range of storage requests, from under 8 bytes to 64KB.
With most requests there will frequently be wasted space. If you want a block which is 16 bytes long, but the pool with the smallest blocks has 1KB blocks, most of the storage is wasted.
The C run time gives you suggestions on the configuration of the heap pools, the initial size of the pool and the size of the blocks in the pool.
Defining a heap pool
How to define a heap pool is described below.
You specify the overall size of storage in the heap using the HEAP statement. For example, for a 16MB total heap size:
HEAP(16M,32768,ANYWHERE,FREE,8192,4096)
You then specify the pool sizes
HEAPPOOLS(ON,32,1,64,2,128,4,256,1,1024,7,4096,1,0)
Each pair of figures gives the maximum size of the blocks in a pool, and the percentage of the heap to allocate to that pool:
- 32,1 says maximum size of blocks in the pool is 32 bytes, allocate 1% of the heap size to this pool
- 64,2 says maximum size of blocks in the pool is 64 bytes, allocate 2% of the heap size to this pool
- 128,4 says maximum size of blocks in the pool is 128 bytes, allocate 4% of the heap size to this pool
- 256,1 says maximum size of blocks in the pool is 256 bytes, allocate 1% of the heap size to this pool
- 1024,7 says maximum size of blocks in the pool is 1024 bytes, allocate 7% of the heap size to this pool
- 4096,1 says maximum size of blocks in the pool is 4096 bytes, allocate 1% of the heap size to this pool
- 0 says end of definition.
Note, the percentages do not have to add up to 100%.
For example, with the CEEOPTS
HEAP(16M,32768,ANYWHERE,FREE,8192,4096)
HEAPPOOLS(ON,32,50,64,1,128,1,256,1,1024,7,4096,1,0)
After running my application, the data in //SYSOUT is
HEAPPOOLS Summary:
Specified Element Extent Cells Per Extents Maximum Cells In
Cell Size Size Percent Extent Allocated Cells Used Use
------------------------------------------------------------------------
32 40 50 209715 0 0 0
64 72 1 2330 1 1002 2
128 136 1 1233 0 0 0
256 264 1 635 0 0 0
1024 1032 7 1137 1 2 0
4096 4104 1 40 1 1 1
------------------------------------------------------------------------
For the cell size of 32, 50% of the pool was allocated to it.
Each block has an 8 byte header, so the total size of a 32 byte block is 40 bytes. The number of 40 byte units in 50% of 16MB is 8MB/40 = 209715, so these figures match up.
(Note with 64 bit heap pools, you just specify the absolute number you want – not a percentage of anything).
Within the program there was a loop doing malloc(50). This used the cell pool with 64 byte cells; 1002 blocks (cells) were used.
The output also has
Suggested Percentages for current Cell Sizes:
HEAPP(ON,32,1,64,1,128,1,256,1,1024,1,4096,1,0)
Suggested Cell Sizes:
HEAPP(ON,56,,280,,848,,2080,,4096,,0)
I found this confusing and not well documented. It is another of the topics that, once you understand it, makes sense.
Suggested Percentages for current Cell Sizes
The first “Suggested…” values are the suggested sizes for the pools if you do not change the size of the cells.
I had specified 50% for the 32 byte cell pool. As this cell pool was not used (0 allocated cells), it suggests making it 1%, so the suggestion is HEAPP(ON,32,1…
You could cut and paste this into your //CEEOPTS statement.
Suggested Cell Sizes
The C run time has a profile of all the sizes of blocks requested, and has suggested some better cell sizes. For example, as I had no requests for storage under 32 bytes, making the smallest cell bigger makes sense. For optimum storage usage, it suggests using cell sizes of 56, 280, 848, 2080 and 4096 bytes.
Note it does not give suggested numbers of blocks. I think this is poor design; because it knows the profile, it could make an attempt at suggesting the percentages as well.
If you want to try this definition, you need to add some values such as
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)
Then rerun your program, see what percentage figures it recommends, update the figures, and test again. Not the easiest way of working.
What to check for and what to set
There can be two sets of heap pools: one for 64 bit storage (HEAPPOOLS64), the other for 31 bit storage (HEAPPOOLS).
The default configuration should be KEEP, so any storage obtained is kept and not freed. This saves the cost of expensive GETMAINs and FREEMAINs.
If the address space is short of storage, the C run time can go round each heap pool and free up segments which are no longer in use.
The value “Number of segments freed” for each heap should be 0. If not, find out why (has the pool been specified incorrectly, or was there a storage shortage).
You can specify how big each pool is:
- for HEAPPOOLS, the HEAP size and the percentage to be allocated to each pool – so two numbers to change
- for HEAPPOOLS64, you specify the size of each pool directly.
The sizes you specify are not that sensitive, as the pools will grow to meet the demand. Allocating one large block is cheaper than allocating 50 smaller blocks – but for a server, this difference can be ignored.
With a 4MB heap specified
HEAP(4M,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)
the heap report was
HEAPPOOLS Summary:
Specified Element Extent Cells Per Extents Maximum Cells In
Cell Size Size Percent Extent Allocated Cells Used Use
------------------------------------------------------------------------
56 64 1 655 2 1002 2
280 288 1 145 1 1 0
848 856 1 48 1 1 0
2080 2088 1 20 1 1 1
4096 4104 1 10 0 0 0
------------------------------------------------------------------------
Suggested Percentages for current Cell Sizes:
HEAPP(ON,56,2,280,1,848,1,2080,1,4096,1,0)
With a small (16KB) heap specified
HEAP(16K,32768,ANYWHERE,FREE,8192,4096)
HEAPP(ON,56,1,280,1,848,1,2080,1,4096,1,0)
The output was
HEAPPOOLS Summary:
Specified Element Extent Cells Per Extents Maximum Cells In
Cell Size Size Percent Extent Allocated Cells Used Use
------------------------------------------------------------------------
56 64 1 4 251 1002 2
280 288 1 4 1 1 0
848 856 1 4 1 1 0
2080 2088 1 4 1 1 1
4096 4104 1 4 0 0 0
------------------------------------------------------------------------
Suggested Percentages for current Cell Sizes:
HEAPP(ON,56,90,280,2,848,6,2080,13,4096,1,0)
and we can see it had to allocate 251 extents to satisfy all the requests.
Once the system has “warmed up” there should not be a major difference in performance. I would allocate the heap to be big enough to start with, and avoid extensions.
With the C run time there are heaps as well as heap pools. My C run time report gave
64bit User HEAP statistics:
31bit User HEAP statistics:
24bit User HEAP statistics:
64bit Library HEAP statistics:
31bit Library HEAP statistics:
24bit Library HEAP statistics:
64bit I/O HEAP statistics:
31bit I/O HEAP statistics:
24bit I/O HEAP statistics:
You should check all of these, and make the initial size the same as the suggested size. This way the storage will be allocated at startup, and you avoid a request to expand the heap failing due to lack of storage during a busy period.
Advanced level for the heap
While the above discussion is suitable for many workloads, especially single threaded ones, it can get more complex when there are multiple threads using the heap pools.
If you have a “hot”, highly active, pool you can get contention when obtaining and releasing blocks from the heap pool. You can define multiple pools for an element size. For example
HEAPP(ON,(56,4),1,280,1,848,1,2080,1,4096,1,0)
The (56,4) says make 4 pools with a block size of 56 bytes.
The output has
HEAPPOOLS Summary:
Specified Element Extent Cells Per Extents Maximum Cells In
Cell Size Size Percent Extent Allocated Cells Used Use
------------------------------------------------------------------------
56 64 1 4 251 1002 2
56 64 1 4 0 0 0
56 64 1 4 0 0 0
56 64 1 4 0 0 0
280 288 1 4 1 1 0
848 856 1 4 1 1 0
2080 2088 1 4 1 1 1
4096 4104 1 4 0 0 0
------------------------------------------------------------------------
We can see there are now 4 pools with a cell size of 56 bytes. The documentation says “Multiple pools are allocated with the same cell size and a portion of the threads are assigned to allocate cells out of each of the pools.”
So if you have 16 threads, you might expect 4 threads to be allocated to each pool.
How do you know if you have a “hot” pool?
You cannot tell from the summary, as you just get the maximum cells used.
The report also has a count of requests for different storage ranges:
Pool 2 size: 160 Get Requests: 777707
Successful Get Heap requests: 81- 88 77934
Successful Get Heap requests: 89- 96 59912
Successful Get Heap requests: 97- 104 47233
Successful Get Heap requests: 105- 112 60263
Successful Get Heap requests: 113- 120 80064
Successful Get Heap requests: 121- 128 302815
Successful Get Heap requests: 129- 136 59762
Successful Get Heap requests: 137- 144 43744
Successful Get Heap requests: 145- 152 17307
Successful Get Heap requests: 153- 160 28673
Pool 3 size: 288 Get Requests: 65642
I used ISPF edit to process the report. By extracting the records containing “size:” you get the count of requests per pool.
Pool 1 size: 80 Get Requests: 462187
Pool 2 size: 160 Get Requests: 777707
Pool 3 size: 288 Get Requests: 65642
Pool 4 size: 792 Get Requests: 18293
Pool 5 size: 1520 Get Requests: 23861
Pool 6 size: 2728 Get Requests: 11677
Pool 7 size: 4400 Get Requests: 48943
Pool 8 size: 8360 Get Requests: 18646
Pool 9 size: 14376 Get Requests: 1916
Pool 10 size: 24120 Get Requests: 1961
Pool 11 size: 37880 Get Requests: 4833
Pool 12 size: 65536 Get Requests: 716
Requests greater than the largest cell size: 1652
It might be worth splitting Pool 2 and seeing if it makes a difference in CPU usage at peak time. If it gives a benefit, try Pool 1 as well.
You can also sort the “Successful Get Heap requests” counts to see which range has the most requests. I don’t know what you would use this information for, unless you were investigating why so much storage was being used.
PhD level for the heap
For high use applications on boxes with many CPUs you can get contention for storage at the hardware cache level.
Before a CPU can use storage, it has to get the 256 byte cache line into its processor cache. If two CPUs are fighting for storage in the same 256 bytes, throughput goes down.
By specifying
HEAPP(ALIGN….
each block is isolated in its own cache line. This can lead to an increase in virtual storage, but you should get improved throughput at the high end. It may make very little difference when there is little load, or on an LPAR with few engines.