folly::AccessSpreader< Atom > Struct Template Reference
#include <CacheLocality.h>
Classes

class CpuCache
    Caches the current CPU and refreshes the cache every so often.

Public Member Functions

template<> size_t current (size_t numStripes)

Static Public Member Functions

static size_t current (size_t numStripes)
static size_t cachedCurrent (size_t numStripes)
    Fallback implementation when thread-local storage isn't available.

Private Types

enum { kMaxCpus = 128 }
typedef uint8_t CompactStripe

Private Member Functions

template<> Getcpu::Func pickGetcpuFunc ()
template<> Getcpu::Func pickGetcpuFunc ()
template<> Getcpu::Func pickGetcpuFunc ()
template<> Getcpu::Func pickGetcpuFunc ()

Static Private Member Functions

static Getcpu::Func pickGetcpuFunc ()
    Returns the best getcpu implementation for Atom.
static int degenerateGetcpu (unsigned *cpu, unsigned *node, void *)
    Always claims to be on CPU zero, node zero.
static bool initialize ()

Static Private Attributes

static Getcpu::Func getcpuFunc
static CompactStripe widthAndCpuToStripe [kMaxCpus+1][kMaxCpus] = {}
static bool initialized = AccessSpreader<Atom>::initialize()
AccessSpreader arranges access to a striped data structure in such a way that concurrently executing threads are likely to be accessing different stripes. It does NOT guarantee uncontended access. Your underlying algorithm must be thread-safe without spreading; this is merely an optimization. AccessSpreader::current(n) is typically much faster than a cache miss (12 nanos on my dev box, tested fast in both 2.6 and 3.2 kernels).
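As a usage sketch, suppose we spread a counter across stripes. The StripedCounter class below is illustrative and not part of folly; only AccessSpreader::current() comes from this header, and the include path varies across folly versions.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

#include <folly/concurrency/CacheLocality.h> // include path varies by folly version

// Hypothetical striped counter: each increment goes to the stripe that
// AccessSpreader associates with the calling thread's current CPU, so threads
// on different cores rarely touch the same cache line.
class StripedCounter {
 public:
  static constexpr size_t kStripes = 8;

  void add(uint64_t n) {
    // Always returns a value < kStripes; nearby CPUs map to the same stripe.
    size_t stripe = folly::AccessSpreader<>::current(kStripes);
    stripes_[stripe].value.fetch_add(n, std::memory_order_relaxed);
  }

  uint64_t read() const {
    uint64_t sum = 0;
    for (auto const& s : stripes_) {
      sum += s.value.load(std::memory_order_relaxed);
    }
    return sum;
  }

 private:
  struct alignas(64) Padded { // pad to a cache line to avoid false sharing
    std::atomic<uint64_t> value{0};
  };
  std::array<Padded, kStripes> stripes_;
};
```

The stripe count is a tuning knob: more stripes means less write contention but more cache lines to visit when summing.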
If available (and not using the deterministic testing implementation) AccessSpreader uses the getcpu system call via VDSO and the precise locality information retrieved from sysfs by CacheLocality. This provides optimal anti-sharing at a fraction of the cost of a cache miss.
When there are not as many stripes as processors, we try to optimally place the cache sharing boundaries. This means that if you have 2 stripes and run on a dual-socket system, your 2 stripes will each get all of the cores from a single socket. If you have 16 stripes on a 16 core system plus hyperthreading (32 cpus), each core will get its own stripe and there will be no cache sharing at all.
AccessSpreader has a fallback mechanism for when __vdso_getcpu can't be loaded, or for use during deterministic testing. Using sched_getcpu or the getcpu syscall would negate the performance advantages of access spreading, so we use a thread-local value and a shared atomic counter to spread access out. On systems lacking both a fast getcpu() and TLS, we hash the thread id to spread accesses.
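A minimal sketch of that fallback idea, using illustrative names rather than folly's actual FallbackGetcpu/ThreadId code:

```cpp
#include <atomic>
#include <cstddef>

namespace fallback_sketch {

// Hand each thread a stable pseudo-CPU id the first time it asks.
inline unsigned pseudoCpu() {
  static std::atomic<unsigned> nextCpu{0};
  thread_local unsigned myCpu = nextCpu.fetch_add(1, std::memory_order_relaxed);
  return myCpu;
}

// Round-robin threads across stripes: no locality information is available in
// this mode, but threads still tend to land on different stripes, which is the
// point of spreading.
inline size_t spreadCurrent(size_t numStripes) {
  return pseudoCpu() % numStripes;
}

} // namespace fallback_sketch
```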
AccessSpreader is templated on the template type that is used to implement atomics, as a way to instantiate the underlying heuristics differently for production use and deterministic unit testing. See DeterministicScheduler for more. If you aren't using DeterministicScheduler, you can just use the default template parameter all of the time.
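Concretely, production code relies on the default parameter, while tests that need reproducibility instantiate with the deterministic atomic. The test-side header and call shown commented out below are assumptions about folly's test utilities, not a verified snippet.

```cpp
#include <cstddef>

#include <folly/concurrency/CacheLocality.h> // include path varies by folly version

size_t productionStripe() {
  // Production: the default Atom (std::atomic) is what you want.
  return folly::AccessSpreader<>::current(16);
}

// Deterministic testing (sketch): the same call through an instantiation that
// uses the deterministic atomic, so the spreading heuristic is reproducible.
// #include <folly/test/DeterministicSchedule.h>
// size_t testStripe =
//     folly::AccessSpreader<folly::test::DeterministicAtomic>::current(16);
```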
Definition at line 228 of file CacheLocality.h.
typedef uint8_t folly::AccessSpreader< Atom >::CompactStripe [private]
Definition at line 265 of file CacheLocality.h.
anonymous enum [private]
If there are more cpus than this, nothing will crash, but there might be unnecessary sharing.
Enumerator: kMaxCpus
Definition at line 263 of file CacheLocality.h.
size_t folly::AccessSpreader< Atom >::cachedCurrent ( size_t numStripes ) [inline, static]
Fallback implementation when thread-local storage isn't available.
Definition at line 255 of file CacheLocality.h.
Referenced by BENCHMARK(), and folly::AccessSpreader< Atom >::current().
size_t folly::AccessSpreader< CachedCurrentTag >::current ( size_t numStripes )
Definition at line 63 of file CacheLocalityBenchmark.cpp.
References folly::AccessSpreader< Atom >::cachedCurrent().
size_t folly::AccessSpreader< Atom >::current ( size_t numStripes ) [inline, static]
Returns the stripe associated with the current CPU. The returned value will be < numStripes.
Definition at line 231 of file CacheLocality.h.
Referenced by folly::detail::DigestBuilder< DigestT >::append(), BENCHMARK(), contentionAtWidth(), folly::CoreCachedSharedPtr< T, kNumSlots >::get(), folly::CoreCachedWeakPtr< T, kNumSlots >::get(), folly::AtomicCoreCachedSharedPtr< T, kNumSlots >::get(), and TEST().
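Because the result is always below numStripes, it can index a fixed-size array directly. A hypothetical helper (not folly code):

```cpp
#include <array>
#include <cstddef>
#include <mutex>

#include <folly/concurrency/CacheLocality.h> // include path varies by folly version

constexpr size_t kNumShards = 4;
std::array<std::mutex, kNumShards> shardLocks;

// Threads running on nearby CPUs get the same shard, so each lock is shared by
// fewer, more cache-friendly peers.
std::mutex& lockForThisCpu() {
  size_t idx = folly::AccessSpreader<>::current(kNumShards); // always < kNumShards
  return shardLocks[idx];
}
```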
int folly::AccessSpreader< Atom >::degenerateGetcpu ( unsigned * cpu, unsigned * node, void * ) [inline, static, private]
Always claims to be on CPU zero, node zero.
Definition at line 321 of file CacheLocality.h.
bool folly::AccessSpreader< Atom >::initialize ( ) [inline, static, private]
Definition at line 344 of file CacheLocality.h.
template<> Getcpu::Func pickGetcpuFunc ( ) [private]
Definition at line 50 of file CacheLocalityBenchmark.cpp.
template<> Getcpu::Func pickGetcpuFunc ( ) [private]
Definition at line 54 of file CacheLocalityBenchmark.cpp.
Getcpu::Func folly::AccessSpreader< Atom >::pickGetcpuFunc ( ) [inline, static, private]
Returns the best getcpu implementation for Atom.
Definition at line 315 of file CacheLocality.h.
References folly::FallbackGetcpu< ThreadId >::getcpu(), and folly::Getcpu::resolveVdsoFunc().
Referenced by folly::test::DeterministicMutex::unlock().
template<> Getcpu::Func pickGetcpuFunc ( ) [private]
Definition at line 429 of file DeterministicSchedule.cpp.
References folly::test::DeterministicSchedule::getcpu().
template<> Getcpu::Func pickGetcpuFunc ( ) [private]
Getcpu::Func folly::AccessSpreader< Atom >::getcpuFunc [static, private]
Points to the getcpu-like function we are using to obtain the current CPU. It should not be assumed that the returned CPU value is in range. We use a static for this so that we can prearrange a valid value in the pre-constructed state and avoid the need for a conditional on every subsequent invocation (not normally a big win, but 20% on some inner loops here).
Definition at line 269 of file CacheLocality.h.
Referenced by folly::AccessSpreader< Atom >::CpuCache::cpu().
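The pattern can be sketched as follows; the names are illustrative and not folly's own definitions. The pointer starts out aimed at a safe degenerate implementation, so the hot path is an unconditional indirect call with no "initialized yet?" branch.

```cpp
namespace sketch {

using GetcpuFunc = int (*)(unsigned* cpu, unsigned* node, void* unused);

// Safe pre-constructed behaviour: always claims CPU 0 / node 0.
inline int degenerateGetcpu(unsigned* cpu, unsigned* node, void*) {
  if (cpu != nullptr) { *cpu = 0; }
  if (node != nullptr) { *node = 0; }
  return 0;
}

// Valid before any initializer runs, so callers never check for null.
// (C++17 inline variable, used here for brevity.)
inline GetcpuFunc getcpuFunc = degenerateGetcpu;

inline unsigned currentCpu() {
  unsigned cpu;
  getcpuFunc(&cpu, nullptr, nullptr); // unconditional indirect call
  return cpu;
}

// A later initializer can swap in a faster implementation, e.g. one resolved
// from the vDSO, simply by assigning to getcpuFunc.

} // namespace sketch
```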
bool folly::AccessSpreader< Atom >::initialized [static, private]
Definition at line 312 of file CacheLocality.h.
CompactStripe folly::AccessSpreader< Atom >::widthAndCpuToStripe[kMaxCpus+1][kMaxCpus] = {} [static, private]
For each level of splitting up to kMaxCpus, maps the cpu (mod kMaxCpus) to the stripe. Rather than performing any inequalities or modulo on the actual number of cpus, we just fill in the entire array.
Definition at line 286 of file CacheLocality.h.
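A sketch of the table idea, with a naive fill that ignores cache topology; folly's real initialize() instead uses the locality information that CacheLocality retrieves from sysfs.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

constexpr size_t kMaxCpus = 128;
using CompactStripe = uint8_t;

// Row = desired stripe count (width), column = cpu mod kMaxCpus.
CompactStripe widthAndCpuToStripe[kMaxCpus + 1][kMaxCpus] = {};

// Naive fill: contiguous blocks of cpus per stripe.  numCpus must be >= 1.
void fillNaive(size_t numCpus) {
  for (size_t width = 1; width <= kMaxCpus; ++width) {
    for (size_t cpu = 0; cpu < kMaxCpus; ++cpu) {
      size_t effectiveCpu = cpu % numCpus; // wrap cpus beyond the real count
      widthAndCpuToStripe[width][cpu] =
          static_cast<CompactStripe>(effectiveCpu * width / numCpus); // < width
    }
  }
}

// Hot-path lookup: no division or comparison against the actual cpu count.
size_t stripeFor(size_t width, unsigned cpu) {
  return widthAndCpuToStripe[width][cpu % kMaxCpus];
}

} // namespace sketch
```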