Improved thread local storage for non-trivial types.
* ~4x faster than `boost::thread_specific_ptr`.
* Similar speed to using `pthread_getspecific` directly, but only consumes a
  single `pthread_key_t` per `Tag` template param.
* Expands on the `thread_specific_ptr` API with `accessAllThreads` and extended
  custom deleter support.
The API of `ThreadLocalPtr` is very close to `boost::thread_specific_ptr`, with
the notable addition of the `accessAllThreads` method. There is also a
`ThreadLocal` class, which is a thin wrapper around `ThreadLocalPtr` that manages
allocation automatically (it creates a new object the first time it is dereferenced
from each thread).
`ThreadLocalPtr` simply gives you a place to put and access a pointer local to
each thread such that it will be destroyed appropriately.
folly::ThreadLocalPtr<Widget> w;
w.reset(new Widget(0), Widget::customDeleterA);
std::thread([&] {
  w.reset(new Widget(1), Widget::customDeleterB);
  w.get()->mangleWidget();
}).join(); // Widget(1) is destroyed with customDeleterB
} // Widget(0) is destroyed with customDeleterA
Note that `customDeleterB` will get called with
`TLPDestructionMode::THIS_THREAD` and `customDeleterA` will get called with
`TLPDestructionMode::ALL_THREADS`. This distinguishes a single thread exiting
from the entire `ThreadLocalPtr` getting destroyed, in which case some
cleanup work may be avoided.
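Concretely, a custom deleter matches the two-argument shape passed to `reset` and can branch on the mode. The sketch below is illustrative only: it uses a stand-in enum and a hypothetical `Widget`, not folly's real declarations (which live in `folly/ThreadLocal.h`):

``` Cpp
// Stand-in for folly's TLPDestructionMode, for illustration only.
enum class TLPDestructionMode { THIS_THREAD, ALL_THREADS };

struct Widget {
  static int perThreadCleanups;  // counts expensive thread-exit cleanups

  // A deleter receives the pointer plus the reason it is being destroyed.
  static void customDeleter(Widget* w, TLPDestructionMode mode) {
    if (mode == TLPDestructionMode::THIS_THREAD) {
      ++perThreadCleanups;  // e.g. flush this thread's buffered state
    }
    // On ALL_THREADS the whole ThreadLocalPtr is going away, so the
    // per-thread bookkeeping above can be skipped; the object is still freed.
    delete w;
  }
};
int Widget::perThreadCleanups = 0;
```

The point of the branch is that thread-exit cleanup often has to publish or flush per-thread state, while whole-object teardown can simply free memory.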
The `accessAllThreads` interface is provided to walk all the thread local child
objects of a parent. `accessAllThreads` initializes an accessor
which holds a global lock that blocks all creation and destruction of
`ThreadLocal` objects with the same `Tag` and can be used as an iterable
container. Typical use is for frequent-write, infrequent-read data access
patterns such as counters. Note that you must specify a unique `Tag` type so you
don't block other `ThreadLocal` object usage, and you should try to minimize the
lifetime of the accessor so the lock is held for as short a time as possible.
The following example is a simplification of `folly/ThreadCachedInt.h`. It
keeps track of a counter value and allows multiple threads to add to the count
without synchronization. In order to get the total count, `read()` iterates
through all the thread local values via `accessAllThreads()` and sums them up.
`class NewTag` is used to break the global mutex so that this class won't block
other `ThreadLocal` usage when `read()` is called.
Note that `read()` holds the global mutex, which blocks construction,
destruction, and `read()` for other `SimpleThreadCachedInt`s, but does not
block `add()`. Also, since it uses the unique `NewTag`, `SimpleThreadCachedInt`
does not affect other `ThreadLocal` usage.
class SimpleThreadCachedInt {
  class NewTag;  // Segments the global mutex
  ThreadLocal<int, NewTag> val_;

 public:
  // operator*() gives a reference to the thread local instance
  void add(int val) { *val_ += val; }

  int read() {
    int ret = 0;
    // accessAllThreads() acquires the global lock
    for (const auto& i : val_.accessAllThreads()) {
      ret += i;
    }  // Global lock is released on scope exit
    return ret;
  }
};
We keep a `__thread` array of pointers to objects (`ThreadEntry::elements`),
where each array has an index for each unique instance of the `ThreadLocalPtr`
object. Each `ThreadLocalPtr` object has a unique id that is an index into
these arrays, so we can fetch the correct object from thread local storage.
In order to prevent unbounded growth of the id space, and thus huge
`ThreadEntry::elements` arrays (for example due to continuous creation and
destruction of `ThreadLocalPtr` objects), we keep track of all active instances
by linking them together into a list. When an instance is destroyed, we remove
it from the chain and insert its id into `freeIds_` for reuse. These operations
require a global mutex, but only happen at construction and destruction time.
`accessAllThreads` also acquires this global mutex.
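The id-recycling part of this scheme can be sketched as a mutex-guarded free list (illustrative only; the `IdAllocator` name is not folly's):

``` Cpp
#include <mutex>
#include <vector>

class IdAllocator {
 public:
  int allocate() {
    std::lock_guard<std::mutex> g(lock_);
    if (!freeIds_.empty()) {
      int id = freeIds_.back();
      freeIds_.pop_back();
      return id;  // reuse keeps the per-thread elements arrays from growing
    }
    return nextId_++;
  }

  void release(int id) {
    std::lock_guard<std::mutex> g(lock_);
    freeIds_.push_back(id);
  }

 private:
  std::mutex lock_;  // only contended at construction/destruction time
  std::vector<int> freeIds_;
  int nextId_ = 0;
};
```

Since ids only change hands when a `ThreadLocalPtr` is constructed or destroyed, the mutex never appears on the per-access fast path.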
We use a single global `pthread_key_t` per `Tag` to manage object destruction
and memory cleanup upon thread exit, because only a finite number of
`pthread_key_t`s are available per machine.
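The underlying mechanism is standard POSIX: a `pthread_key_t` registered with a destructor callback that runs once per exiting thread whose slot is non-null. A minimal standalone demonstration of that hook (not folly code; the `int` payload stands in for a thread's whole elements array):

``` Cpp
#include <pthread.h>

#include <atomic>
#include <thread>

static std::atomic<int> destroyed{0};

// Runs at each thread's exit; this is where all of that thread's elements
// can be torn down while consuming only one key.
static void onThreadExit(void* p) {
  delete static_cast<int*>(p);  // stand-in for destroying the elements array
  destroyed.fetch_add(1);
}

int runTwoThreads() {
  static pthread_key_t key;
  static int rc = pthread_key_create(&key, onThreadExit);
  (void)rc;
  std::thread([] { pthread_setspecific(key, new int(1)); }).join();
  std::thread([] { pthread_setspecific(key, new int(2)); }).join();
  return destroyed.load();
}
```

By the time `join()` returns, the exiting thread has already run its key destructors, so both payloads have been cleaned up.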