# TCMalloc Overview

TCMalloc is Google's customized implementation of C's `malloc()` and C++'s
`operator new` used for memory allocation within our C and C++ code. This custom
memory allocation framework is an alternative to the one provided by the C
standard library (on Linux usually through `glibc`) and C++ standard library.
TCMalloc is designed to be more efficient at scale than other implementations.

Specifically, TCMalloc provides the following benefits:

*   Performance scales with highly parallel applications.
*   Optimizations brought about with recent C++14 and C++17 standard
    enhancements, and by diverging slightly from the standard where performance
    benefits warrant. (These are noted within the
    [TCMalloc Reference](reference.md).)
*   Extensions to allow performance improvements under certain architectures,
    and additional behavior such as metric gathering.

## TCMalloc Cache Operation Mode

TCMalloc may operate in one of two modes:

*   (default) per-CPU caching, where TCMalloc maintains memory caches local to
    individual logical cores. Per-CPU caching is enabled when running TCMalloc
    on any Linux kernel that supports restartable sequences (RSEQ). Support for
    RSEQ was merged in Linux 4.18.
*   per-thread caching, where TCMalloc maintains memory caches local to each
    application thread. If RSEQ is unavailable, TCMalloc reverts to using this
    legacy behavior.

NOTE: the "TC" in TCMalloc refers to Thread Caching, which was originally a
distinguishing feature of TCMalloc; the name remains as a legacy.

In both cases, these cache implementations allow TCMalloc to avoid requiring
locks for most memory allocations and deallocations.
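
To make the lock-avoidance idea concrete, here is a minimal sketch of the per-thread variant. It is a toy written for this overview, not TCMalloc's implementation: the names `ToyAlloc`, `ToyFree`, `kBlockSize`, and `kBatch` are hypothetical, it handles a single block size, and it never returns cached memory to the shared list. The per-CPU mode relies on restartable sequences and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical toy allocator for fixed-size blocks; not TCMalloc's code.
constexpr std::size_t kBlockSize = 64;  // the only size this toy serves
constexpr std::size_t kBatch = 8;       // blocks moved per refill

std::mutex central_mu;
std::vector<void*> central_free;  // shared free list, guarded by central_mu
int central_lock_count = 0;       // times the slow path ran, guarded by central_mu

// Private to each thread, so reading and writing it needs no lock.
// Simplification: blocks still cached when a thread exits are leaked.
thread_local std::vector<void*> local_cache;

void* ToyAlloc() {
  if (local_cache.empty()) {
    // Slow path: take the lock once and refill a whole batch.
    std::lock_guard<std::mutex> lock(central_mu);
    ++central_lock_count;
    for (std::size_t i = 0; i < kBatch; ++i) {
      if (!central_free.empty()) {
        local_cache.push_back(central_free.back());
        central_free.pop_back();
      } else {
        local_cache.push_back(::operator new(kBlockSize));
      }
    }
  }
  // Fast path: only thread-private state is touched.
  void* p = local_cache.back();
  local_cache.pop_back();
  return p;
}

void ToyFree(void* p) {
  assert(p != nullptr);
  local_cache.push_back(p);  // fast path: no lock
}
```

In this sketch, one lock acquisition serves `kBatch` allocations, and a block freed by a thread is handed back to the same thread on its next allocation without touching the shared list.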

## TCMalloc Features

TCMalloc provides APIs for dynamic memory allocation: `malloc()` using the C
API, and `::operator new` using the C++ API. TCMalloc, like most allocation
frameworks, manages this memory better than raw memory requests (such as through
`mmap()`) by providing several optimizations:

*   Performing allocations from the operating system in fixed-size chunks of
    memory (called "pages"). Keeping all of these chunks the same size allows
    TCMalloc to simplify bookkeeping.
*   Devoting separate pages (or runs of pages called "Spans" in TCMalloc) to
    specific object sizes. For example, all 16-byte objects are placed within a
    "Span" specifically allocated for objects of that size. Operations to get or
    release memory in such cases are much simpler.
*   Holding memory in *caches* to speed up access of commonly-used objects.
    Holding such caches even after deallocation also helps avoid costly system
    calls if such memory is later re-allocated.
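
The size-segregation idea in the list above can be sketched as follows. The page size and the table of sizes are assumptions chosen for illustration, and `RoundUpToSizeClass` and `ObjectsPerSpan` are hypothetical helpers; TCMalloc's real tables and span sizes differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical values for illustration only.
constexpr std::size_t kPageSize = 8192;
constexpr std::array<std::size_t, 6> kSizeClasses = {{8, 16, 32, 64, 128, 256}};

// Returns the smallest size in the toy table that fits `bytes`,
// or 0 if the request is too large for the table.
std::size_t RoundUpToSizeClass(std::size_t bytes) {
  for (std::size_t cls : kSizeClasses) {
    if (bytes <= cls) return cls;
  }
  return 0;
}

// Number of objects of one size that fit in a single-page span.
std::size_t ObjectsPerSpan(std::size_t size_class) {
  assert(size_class != 0);
  return kPageSize / size_class;
}
```

Under these assumptions, a 13-byte request is served from the 16-byte class, and a one-page span devoted to that class holds 512 objects. Because every object in the span has the same size, finding a free slot or returning one needs no per-object size bookkeeping.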

The cache size can also affect performance. The larger the cache, the less often
it overflows or is exhausted, and therefore the less often a lock is required to
get more memory. TCMalloc extensions allow you to modify this cache size, though
the default behavior should be preferred in most cases. For more information,
consult the [TCMalloc Tuning Guide](tuning.md).

Additionally, TCMalloc exposes telemetry about the state of the application's
heap via `MallocExtension`. This can be used for gathering profiles of the live
heap, as well as a snapshot taken near the heap's high-water mark (a peak heap
profile).

## The TCMalloc API

TCMalloc implements the C and C++ dynamic memory API endpoints from the C11,
C++11, C++14, and C++17 standards.

From C++, this includes:

*   The basic `::operator new`, `::operator delete`, and array variant
    functions.
*   C++14's sized `::operator delete`.
*   C++17's overaligned `::operator new` and `::operator delete` functions.
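
As a brief illustration of the sized and overaligned entry points listed above, the sketch below calls them directly through the standard C++17 interface. It uses only standard library declarations and works with any conforming allocator; the `CacheLine` type and the `Demo...` helpers are hypothetical names introduced for this example. Compile it as C++17 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical type that requires 64-byte alignment.
struct alignas(64) CacheLine {
  char bytes[64];
};

bool IsAligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// C++17 overaligned allocation, released with the sized and aligned delete.
bool DemoOveralignedNew() {
  void* p = ::operator new(sizeof(CacheLine), std::align_val_t{64});
  const bool ok = IsAligned(p, 64);
  ::operator delete(p, sizeof(CacheLine), std::align_val_t{64});
  return ok;
}

// C++14 sized deallocation: the caller passes the original size back.
void DemoSizedDelete() {
  void* p = ::operator new(48);
  assert(p != nullptr);
  ::operator delete(p, std::size_t{48});
}
```

In ordinary code these functions are rarely called by hand: a `new CacheLine` expression selects the overaligned form automatically, and the compiler can emit the sized delete when the object's size is known at the point of deletion.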

Unlike standard implementations, TCMalloc does not throw an exception when an
allocation fails; it crashes instead. This behavior can be used as a performance
optimization for move constructors that are not currently marked `noexcept`:
because an allocation failure terminates the program rather than throwing, such
move operations can be allowed to fail directly on allocation failure. In
[Abseil](https://abseil.io/docs/cpp/guides/base), this optimization is enabled
with `-DABSL_ALLOCATOR_NOTHROW`.

From C, this includes `malloc`, `calloc`, `realloc`, and `free`.

The TCMalloc API follows the behavior specified by C90 DR075 and
[DR445](http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_445),
which states:

> The alignment requirement still applies even if the size is too small for any
> object requiring the given alignment.

In other words, `malloc(1)` returns an `alignof(std::max_align_t)`-aligned pointer.
Based on the progress of
[N2293](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm), we may relax
this alignment in the future.
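
The alignment guarantee can be checked with a few lines of standard C++. This is a minimal sketch; `MallocOneIsMaxAligned` is a hypothetical helper, and the check reports what the allocator linked into the program does at the time it runs, which may change if the alignment is relaxed as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns true if a 1-byte allocation is aligned for std::max_align_t.
bool MallocOneIsMaxAligned() {
  void* p = std::malloc(1);
  if (p == nullptr) return false;  // allocation failed; nothing to check
  const bool aligned =
      reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
  std::free(p);
  return aligned;
}
```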

For more complete information, consult the [TCMalloc Reference](reference.md).