Slab Allocator
Introduction
The slab allocator provides the following main features:
- Typed object allocation with minimal waste from internal fragmentation.
- General purpose allocation, similar sized allocations going to one of a select number of allocation pools.
- Garbage Collection. Concurrent GC being investigated.
Object types
Custom object allocation is achieved by creating a type definition. The type definition includes:
- Object size
- Optional GC mark function - Allows optimisation of object tracing, by allowing only pointers to other objects within the object to be marked.
- Optional GC finalize function - When an object is determined to be garbage, if a finalizer is defined, it is called with the object to do any per-object clean up.
Example:
static slab_type_t threads[1] = {SLAB_TYPE(sizeof(thread_t), thread_mark, thread_finalize)};
...
thread_t * thread = slab_calloc(threads);
Slabs
Slab types - slab_type_t
Objects have a type, which indicates the object size and any custom marking and finalization.
The type also keeps track of slabs that contain instances of this object type. The slabs are kept in a linked list, to be keep track of memory allocated to each type.
Generic objects
Generic allocation is done using a pool of sized types, with generic garbage marking and no finalizer. When allocating memory, the size of the allocation is matched up to the closest pool type, and the allocation completed using that pool type.
This is the only generic heap allocation, and thus limits heap allocated data to sizes that fit in a slab (which is limited to a single page.)
Thus, heap allocation is not currently suitable for arbitrarily large data, such as memory used for transfer buffers to/from block devices.
Slabs
Slabs are allocated on demand, when the type being allocated has no more free slabs. A slab is a single page of kernel heap data, and consists of:
- Header
- Pointer to the slab type information.
- Linked list to next/prev slab of this type.
- Pointer to the first free slot.
- Slot data - slots are the unit of allocation
Slots
Slots consist of:
- Slot header
- Garbage collection colour.
- Linked list to next/prev slot in the garbage collection colour list.
- Optional debug information, such as code location the slot was allocated from.
- Object data
Once the allocator has located a free slot, the object is removed from the free list, put onto the correct garbage collector list (currently white list, but could be grey in the case of concurrent GC), and a pointer to the object data returned as a result of the allocation.
Garbage collection
The garbage collector is currently a simple mark and sweep collector, with tri-colour marking of reachable heap objects from a set of root objects. The root objects comprise:
- Global objects annotated with GCROOT.
- CPU registers.
- Thread stack space.
Objects are one of three colours:
- White - Unmarked memory and hence also unscanned.
- Grey - Marked as in use, but not yet scanned.
- Black - Marked and scanned.
The garbage collector will not visit black objects again, but a black object may be made grey again as a result of concurrent updates to the object, necessitating rescanning. This would be to support incremental or concurrent garbage collection, which are currently not supported.
The collector is a conservative collector, interpreting root objects as pointers, and marking any pointers that point into the heap space. The result being that a non-pointer value that happens to point into the heap will result in the heap spaces "pointed" to by that non-pointer being marked as reachable and not garbage. While this may result in some leakage, in a higher half kernel, it is unlikely that non-pointer values will appear to be valid pointers to heap area.
Garbage collector algorithm
When it is determined that garbage collection is required, garbage collection operates using the following algorithms.
Garbage collection start
The garbage collector starts by stopping the world (this is a no-op for single CPU system), doing a little housekeeping (getting a starting timestamp, clearing memory stats) then queuing the garbage collection global roots for scanning (global memory annotated with GCROOT).
Garbage collection marking
This is the meat of the garbage collection. While there is memory queued to be scanned (as done when the GCROOT data is queued) or there are grey objects to be scanned, we do a garbage collection "step".
The top level marking algorithm is thus:
void slab_gc()
{
while(gclevel || grey) {
slab_gc_step();
}
}
Garbage collection step
Each step of garbage collection comprises the steps:
- While there is non-heap data to be scanned (indicated by
gclevel>0
), scan it. - While there are grey objects, and no non-heap data to be scanned (indicated by
gclevel==0
), scan the first grey object, and then it to the black list.
Partitioning in this way means the queue of non-heap data to be scanned is bounded by the number of non-heap memory ranges a custom object marking routine may generate.
For the generic marking, which looks only for pointers into the heap, marking will only ever generate new grey objects, which are queued on the end of the grey list.
As an example of a custom marker generating non-heap data to be scanned, consider the thread structure marking, which will queue non-heap memory for the thread kernel stack for marking (from the logical top (stack pointer) to the logical bottom of the stack.)
Thus, arbitrarily nested and complex data can be safely scanned, without fear of overflowing the kernel stack (in the case of recursive marking) or exceeding the bounds of a scan queue (in the case of a single scan queue), both of which were strategies used to track objects to be scanned, and caused failure of the garbage collection, in previous versions.
Garbage collection end
Eventually, once all live data has been scanned, no more grey non-heap or grey objects will be generated, marking will be finished.
At this point, we'll have two lists of heap objects:
- White list - these objects were never marked, and were therefore not reachable from the garbage collection roots, and contains all our garbage heap data.
- Black list - these objects were marked and scanned.
At this point, we need to stop the world again, and:
- Move the white list to the garbage list for sweeping.
- Move the black list back to the white list.
At this point, the mark phase of garbage collection is finished.
Garbage collection incremental sweep
The garbage heap data on the garbage list is by definition not referenced. We can therefore allow the kernel to continue its operation, and sweep the garbage list when required or convenient, thus reducing the synchronous impact of garbage collection.
Garbage is swept incrementally when:
- We are allocating a new object, and existing slab(s) are empty.
- We are otherwise idle, and therefore have no better work to do.
Incremental sweep on allocation
When allocating a new object, we search for free objects in existing slabs. If there are no free objects, we invoke an incremental sweep until we sweep an object of the type we're looking for, then retry the allocation.
On the assumption that the garbage list will be an indeterminate order of objects, we will likely only sweep a subset of the garbage objects until we find garbage object of the type we're looking for.
Incremental sweep in idle
In the idle thread, we sweep all remaining garbage data. This sweeping of garbage in idle time, when there is no other useful work to do anyway, effectively makes the sweeping of garbage data free. Along with idle time garbage marking, garbage collection becomes a cheap option for automatic memory management.
Security
Importantly, user provided data would not be scanned for pointers by the garbage collector, so it is impossible for user space data to poison the garbage collector and enact a denial-of-service attack on kernel heap memory.