TWCOS Kernel

Heap2
Login

Heap2 Allocator

Introduction

The heap2 allocator provides the following main features:

Object types

Custom object allocation is achieved by creating a type definition. The type definition includes:

Example:

static slab_type_t threads[1] = {SLAB_TYPE(sizeof(thread_t), thread_mark, thread_finalize)}; ... thread_t * thread = gc_alloc(threads);

Free list

Free objects of each type are maintained in a list per object type, allowing simple and quick reuse of existing free memory areas for each type.

If a free list for a type is empty, if there is garbage to be swept, then we sweep the garbage list until we've swept an object that will satisfy this allocation. If not such object is found, the heap extended if possible using the bump allocator to allocate space for the new object.

If the heap is exhausted, GC must happen immediately, before we retry the allocation.

If allocation continues to fail after GC, we throw an out of memory exception, and allocation (and probably the operation requiring the allocated memory) will fail.

The allocator will never return NULL.

Garbage collection

The garbage collector is currently a simple mark and sweep collector, with tri-colour marking of reachable heap objects from a set of root objects. The root objects comprise:

Objects are one of three colours:

The garbage collector will not visit black objects again, but a black object may be made grey again as a result of concurrent updates to the object, necessitating rescanning. This would be to support incremental or concurrent garbage collection, which are currently not supported.

The collector is a conservative collector, interpreting root objects as pointers, and marking any pointers that point into the heap space. The result being that a non-pointer value that happens to point into the heap will result in the heap spaces "pointed" to by that non-pointer being marked as reachable and not garbage. While this may result in some leakage, in a higher half kernel, it is unlikely that non-pointer values will appear to be valid pointers to heap area.

Garbage collector algorithm

When it is determined that garbage collection is required, garbage collection operates using the following algorithms.

Garbage collection start

The garbage collector starts by stopping the world (this is a no-op for single CPU system), doing a little housekeeping (getting a starting timestamp, clearing memory stats) then queuing the garbage collection global roots for scanning (global memory annotated with GCROOT).

Garbage collection marking

This is the meat of the garbage collection. While there is memory queued to be scanned (on a marking stack, as done when the GCROOT data is queued) or there are grey objects to be scanned, we do a garbage collection "step".

The top level marking algorithm is thus:

void slab_gc() { while(gclevel || grey) { slab_gc_step(); } }

Garbage collection step

Each step of garbage collection comprises the steps:

``` static void slabgcstep() { /* Mark non-heap items */ while(gclevel) { ... mark stack processing ... gclevel--; }

    /* Mark heap items */
    while(grey && !gclevel) {
            ... grey object processing ...
    }

}

```

Partitioning in this way means the queue of non-heap data to be scanned is bounded by the number of non-heap memory ranges a custom object marking routine may generate.

For the generic marking, which looks only for pointers into the heap, marking will only ever generate new grey objects, which are queued on the end of the grey list.

As an example of a custom marker generating non-heap data to be scanned, consider the thread structure marking, which will queue non-heap memory for the thread kernel stack for marking (from the logical top (stack pointer) to the logical bottom of the stack.)

Thus, arbitrarily nested and complex data can be safely scanned, without fear of overflowing the kernel stack (in the case of recursive marking) or exceeding the bounds of a scan queue (in the case of a single scan queue), both of which were strategies used to track objects to be scanned, and caused failure of the garbage collection, in previous versions.

Garbage collection end

Eventually, once all live data has been scanned, no more mark stack non-heap or grey objects will be generated, and marking will be finished.

At this point, we'll have two lists of heap objects:

At this point, we need to stop the world again, and: - Move the white list to the garbage list for deferred sweeping. - Move the black list back to the white list (this is contingent on full GC - generational GC may leave tenured objects on the black queue.).

At this point, the mark phase of garbage collection is finished.

Garbage collection incremental sweep

The garbage heap data on the garbage list is by definition not referenced. We can therefore allow the kernel to continue its operation, and sweep the garbage list when required or convenient, thus reducing the synchronous impact of garbage collection.

Garbage is swept incrementally when:

Incremental sweep on allocation

When allocating a new object, we search for free objects in existing slabs. If there are no free objects, we invoke an incremental sweep until we sweep an object of the type we're looking for, then retry the allocation.

On the assumption that the garbage list will be an indeterminate order of objects, we will likely only sweep a subset of the garbage objects until we find garbage object of the type we're looking for.

Incremental sweep in idle

In the idle thread, we sweep all remaining garbage data. This sweeping of garbage in idle time, when there is no other useful work to do anyway, effectively makes this sweeping of garbage data free. Along with idle time garbage marking, garbage collection becomes a cheap option for automatic memory management.

Security

Importantly, user provided data would not be scanned for pointers by the garbage collector, so it is impossible for user space data to poison the garbage collector and enact a denial-of-service attack on kernel heap memory.