libevent is an excellent cross-platform eventing library. Folly's async module provides C++ object wrappers for fd callbacks and event_base, as well as implementations of many common fd uses.
EventBase is the main libevent / epoll loop. Generally there is a single EventBase per thread, and once started, nothing else happens on the thread except fd callbacks. For example:
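A minimal sketch of that pattern, with the loop driven by a dedicated thread (error handling omitted):

```cpp
#include <thread>
#include <folly/io/async/EventBase.h>

int main() {
  folly::EventBase base;

  // The loop thread does nothing except run callbacks registered on `base`.
  std::thread loopThread([&] { base.loopForever(); });

  // ... hand work to `base`, e.g. via runInEventBaseThread() (see below) ...

  base.terminateLoopSoon();  // thread-safe: asks loopForever() to return
  loopThread.join();
  return 0;
}
```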
EventBase has built-in support for message passing between threads. To send a function to be run in the EventBase thread, use runInEventBaseThread().
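Continuing the sketch above, any thread can hand a function to the loop thread:

```cpp
// May be called from any thread; the lambda runs later, in the EventBase
// thread, so it can safely touch state owned by that thread.
base.runInEventBaseThread([] {
  // ... work that must happen on the EventBase thread ...
});
```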
There are various ways to run the loop. EventBase::loop() will return when there are no more registered events. EventBase::loopForever() will loop until EventBase::terminateLoopSoon() is called. EventBase::loopOnce() will only call epoll() a single time.
Other useful methods include EventBase::runAfterDelay() to run a callback after some delay, and EventBase::setMaxLatency(latency, callback) to run a callback if the loop is running very slowly, i.e., there are too many events in this loop, and some of the work should probably be moved to different threads.
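Continuing the same sketch, a delayed callback looks roughly like this (delays are in milliseconds; like other timeout operations, it is scheduled from the EventBase thread):

```cpp
base.runInEventBaseThread([&base] {
  // Schedule a callback roughly one second from now.
  base.runAfterDelay([] { /* timed work */ }, 1000);
});
```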
EventBase always calls all callbacks inline - that is, there is no explicit or implicit queuing. This keeps tail latencies low, but it means each callback must return quickly, and a callback can never delete the EventBase or EventHandler it is running on (see DelayedDestruction below).
EventHandler is the object wrapper for fds. Any class that wishes to receive fd callbacks should inherit from EventHandler; registerHandler(EventType) registers to receive events of a specific type.
The currently supported event types are READ, WRITE, READ_WRITE, and PERSIST. libevent's TIMEOUT and SIGNAL events are not exposed through EventHandler, because timeouts and signals are handled by the dedicated classes described below (AsyncTimeout and AsyncSignalHandler). The remaining unsupported libevent event type is EV_ET: currently all the EventHandler implementations are set up for level-triggered operation, and benchmarking hasn't shown that edge triggering provides much improvement.
Edge-triggered in this context means that libevent provides only a single callback when an event becomes active, as opposed to level-triggered, where the event keeps firing on every event_wait call as long as there is still data to read/write. Edge triggering adds extra code complexity, since the library would need to maintain a list of active fds similar to the one libevent currently maintains between events. The only advantage of edge triggering is that EPOLLONESHOT can be used to ensure an event only fires on a single event_base - but in this library we assume each event is registered on a single thread anyway.
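Putting the EventHandler pieces together, a rough sketch of a level-triggered read handler (the class and fd are illustrative; handlerReady() and registerHandler() are the EventHandler interface):

```cpp
#include <unistd.h>
#include <folly/io/async/EventBase.h>
#include <folly/io/async/EventHandler.h>
#include <folly/net/NetworkSocket.h>

class ReadHandler : public folly::EventHandler {
 public:
  ReadHandler(folly::EventBase* evb, int fd)
      // Recent folly wraps raw fds in folly::NetworkSocket; older versions
      // of the constructor take the int fd directly.
      : folly::EventHandler(evb, folly::NetworkSocket::fromFd(fd)), fd_(fd) {}

  void handlerReady(uint16_t events) noexcept override {
    if (events & folly::EventHandler::READ) {
      char buf[4096];
      ssize_t n = ::read(fd_, buf, sizeof(buf));
      (void)n;  // handle data / EOF / errors here
    }
  }

 private:
  int fd_;
};

// Usage, from the EventBase thread: level-triggered READ events that stay
// registered across callbacks thanks to PERSIST.
//   handler.registerHandler(folly::EventHandler::READ |
//                           folly::EventHandler::PERSIST);
```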
AsyncSocket is a nonblocking socket implementation. Writes are queued and written asynchronously, even before connect() is successful. The read API consists of two methods: getReadBuffer() and readDataAvailable(). When the READ event is signaled, libevent has no way of knowing how much data is available to read. On some systems (Linux), we could make another syscall to get the size of the kernel read buffer, but syscalls are slow. Instead, most users will just want to provide a fixed-size buffer in getReadBuffer(), probably using the IOBufQueue in folly/io. readDataAvailable() will then report exactly how much data was read.
AsyncSocket provides send timeouts, but not read timeouts - read timeouts are generally application specific, and should use one of the timer implementations described below.
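A sketch of the read API described above (the ReadCallback methods are folly's; the fixed char buffer stands in for the IOBufQueue-based buffering most real code would use):

```cpp
#include <cstdio>
#include <folly/io/async/AsyncSocket.h>

class PrintingReadCallback : public folly::AsyncSocket::ReadCallback {
 public:
  // Called when the socket is readable: hand back a buffer to fill.
  void getReadBuffer(void** bufReturn, size_t* lenReturn) override {
    *bufReturn = buf_;
    *lenReturn = sizeof(buf_);
  }

  // Called afterwards with the number of bytes actually read.
  void readDataAvailable(size_t len) noexcept override {
    ::fwrite(buf_, 1, len, stdout);
  }

  void readEOF() noexcept override {}
  void readErr(const folly::AsyncSocketException&) noexcept override {}

 private:
  char buf_[4096];
};

// Usage, from the EventBase thread (the address is illustrative):
//   auto sock = folly::AsyncSocket::newSocket(&base);
//   sock->connect(nullptr, folly::SocketAddress("127.0.0.1", 8080), 1000);
//   sock->setReadCB(&readCallback);
//   sock->write(&writeCallback, "ping", 4);  // queued even before connect
//                                            // completes
```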
AsyncSSLSocket is similar to AsyncSocket, but uses OpenSSL. It provides an additional HandshakeCallback to check the server's certificates.
AsyncUDPSocket (TODO: currently in fbthrift) is a socket that reads/writes UDP packets. Since there is little state to maintain, it is much simpler than AsyncSocket.
AsyncServerSocket is a listen()ing socket that accept()s fds and passes them to other event bases.
The general pattern is:
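A sketch of that pattern (acceptCb is assumed to implement AsyncServerSocket::AcceptCallback, and workerBase is the EventBase whose thread should receive the accepted fds):

```cpp
#include <memory>
#include <folly/io/async/AsyncServerSocket.h>
#include <folly/io/async/EventBase.h>

std::shared_ptr<folly::AsyncServerSocket> startServer(
    folly::EventBase& acceptBase,
    folly::EventBase& workerBase,
    folly::AsyncServerSocket::AcceptCallback& acceptCb,
    uint16_t port) {
  auto socket = folly::AsyncServerSocket::newSocket(&acceptBase);
  socket->bind(port);                                 // 0 picks any free port
  socket->addAcceptCallback(&acceptCb, &workerBase);  // accepted fds are
                                                      // handed to workerBase's
                                                      // thread
  socket->listen(1024);                               // backlog
  socket->startAccepting();
  return socket;  // keep it alive; acceptBase's loop drives accepting
}
```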
Generally there is a single accept() thread and multiple AcceptCallback objects. The Acceptee objects will then manage the individual AsyncSockets. While AsyncSockets can be moved between event bases, most users just tie them to a single event base to get better cache locality and to avoid locking.
Multiple ServerSockets can be made, but currently the Linux kernel has a lock on accept()ing from a port, preventing more than ~20k accepts / sec. There are various workarounds (SO_REUSEPORT), but generally clients should use connection pooling instead when possible.
Since AsyncServerSocket provides an fd, an AsyncSSLSocket or AsyncSocket can be made using the same codepath.
AsyncUDPServerSocket is similar to AsyncServerSocket, but for UDP messages - messages are read() on a single thread, and then fanned out to multiple worker threads.
NotificationQueue is used to send messages between threads in the same process. It is what backs EventBase::runInEventBaseThread(), so it is unlikely you'd want to use it directly instead of using runInEventBaseThread().
An eventfd (for kernels newer than 2.6.30) or a pipe (for older kernels) is added to the EventBase loop to wake up threads receiving messages. The queue itself is a spinlock-guarded list. Since we are almost always talking about a single sender thread and a single receiver (although the code works just fine for multiple producers and multiple consumers), the spinlock is almost always uncontended, and we haven't seen any perf issues with it in practice.
The eventfd or pipe is only notified if the thread isn't already awake, to avoid syscalls. A naive implementation that does one write per message, or worse, writes the whole message through the pipe, would be significantly slower.
If you need to send messages between processes, you would have to write the whole message to the pipe, and manage the pipe size. See AsyncPipe.
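If you do use the queue directly, the shape is roughly this (a sketch; the int payload and names are illustrative):

```cpp
#include <cstdio>
#include <folly/io/async/EventBase.h>
#include <folly/io/async/NotificationQueue.h>

class IntConsumer : public folly::NotificationQueue<int>::Consumer {
 public:
  // Runs in the receiving EventBase's thread for each message.
  void messageAvailable(int&& value) noexcept override {
    printf("got %d\n", value);
  }
};

void wireUp(folly::EventBase& receiverBase,
            folly::NotificationQueue<int>& queue,
            IntConsumer& consumer) {
  // Attach the consumer so messages are delivered in receiverBase's thread
  // (typically called from that thread).
  consumer.startConsuming(&receiverBase, &queue);
}

// Later, from any thread:
//   queue.putMessage(42);
```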
AsyncTimeout is an individual timeout callback that can be installed in the event loop. For code cleanliness and clarity, timeouts are separated from sockets. One fd is used per AsyncTimeout, which is a pretty serious restriction, so the two subclasses below were made to support multiple timeouts using a single fd.
HHWheelTimer is an implementation of a hashed hierarchical wheel timer. Any timeout time can be used, with O(1) insertion, deletion, and callback time. The wheel itself takes up some amount of space, and a wheel timer has to have a constant tick, consuming a constant amount of CPU.
An alternative to a wheel timer would be a heap of callbacks sorted by timeout time, but that would change the insertion and deletion cost to O(log n). In our experience, the average server has thousands to hundreds of thousands of open sockets, and the common case is to add and remove timeouts without them ever firing, assuming the server keeps up with its load. Therefore the heap's O(log n) insertion time overshadows the extra CPU consumed by the wheel timer's tick.
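A minimal sketch of using the wheel timer that each EventBase already owns (the callback class is illustrative):

```cpp
#include <chrono>
#include <folly/io/async/EventBase.h>
#include <folly/io/async/HHWheelTimer.h>

class IdleTimeout : public folly::HHWheelTimer::Callback {
 public:
  void timeoutExpired() noexcept override {
    // e.g. close an idle connection
  }
};

void armTimeout(folly::EventBase& base, IdleTimeout& cb) {
  // One HHWheelTimer per EventBase/thread is enough; call this from the
  // EventBase thread.
  base.timer().scheduleTimeout(&cb, std::chrono::milliseconds(5000));
  // cb.cancelTimeout() removes it in O(1) if the work finishes first.
}
```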
AsyncTimeoutSet (NOTE: currently in the proxygen codebase) assumes that all timeouts scheduled on the set use the same timeout time, which makes O(1) insertion easy: just schedule the new timeout at the tail of the list, along with the time it was actually added. When the current timeout fires, we look at the new head of the list and schedule AsyncTimeout to fire at the difference between the current time and the scheduled time (which probably isn't the same as the timeout time).
This requires all timeouts in an AsyncTimeoutSet to have the same timeout time, though, which in practice means many AsyncTimeoutSets are needed per application. Using HHWheelTimer instead can clean up the code quite a bit, because only a single HHWheelTimer is needed per thread, as opposed to one AsyncTimeoutSet per timeout time per thread.
AsyncSignalHandler is used to handle signals asynchronously. As with AsyncTimeout, for code clarity we don't reuse the same fd as a socket to receive signals.
AsyncPipe provides async reads/writes on a unix pipe, to send data between processes.
Since messages are frequently passed between threads with runInEventBaseThread(), ThreadLocals don't work for per-request data. Instead, RequestContext can be used; it is saved and restored as a request hops between threads, so request-scoped data (tracing information, for example) follows the request.
In this library, only runInEventBaseThread() saves and restores the request context, although other Facebook libraries that pass requests between threads (folly::future, fbthrift::ThreadManager, etc.) do so as well.
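A hedged sketch of carrying request-scoped data across that hop (TraceId is a made-up RequestData subclass; recent folly requires the hasCallback() override shown here):

```cpp
#include <cstdint>
#include <memory>
#include <folly/io/async/EventBase.h>
#include <folly/io/async/Request.h>

class TraceId : public folly::RequestData {
 public:
  explicit TraceId(uint64_t id) : id(id) {}
  bool hasCallback() override { return false; }  // no onSet/onUnset hooks
  uint64_t id;
};

void handleRequest(folly::EventBase& base) {
  // Establish a fresh RequestContext for this request.
  folly::RequestContextScopeGuard guard;
  folly::RequestContext::get()->setContextData(
      "trace_id", std::make_unique<TraceId>(12345));

  base.runInEventBaseThread([] {
    // runInEventBaseThread restored the same RequestContext here, even
    // though this lambda runs in a different thread.
    auto* trace = dynamic_cast<TraceId*>(
        folly::RequestContext::get()->getContextData("trace_id"));
    (void)trace;
  });
}
```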
Since EventBase callbacks already have the EventHandler and EventBase on the stack, calling delete on either of these objects from a callback would most likely result in a segfault. Instead, these objects inherit from DelayedDestruction, which provides reference counting for callbacks. Rather than delete, destroy() is called, which marks the object as ready to be destroyed. Each callback holds a DestructorGuard, which prevents destruction until all guards are gone from the stack; only then is the object actually deleted.
DelayedDestruction can be painful to use, since shared_ptrs and unique_ptrs need a special DelayedDestruction destructor type, and it's pretty easy to forget to add a DestructorGuard in code that calls callbacks. But it is well worth it: it avoids queuing callbacks, and P99 times improve as a result.
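A sketch of the usual shape (the Connection class is illustrative; DestructorGuard, destroy(), and the Destructor deleter are DelayedDestruction's API):

```cpp
#include <memory>
#include <folly/io/async/DelayedDestruction.h>

class Connection : public folly::DelayedDestruction {
 public:
  using UniquePtr =
      std::unique_ptr<Connection, folly::DelayedDestruction::Destructor>;

  void onReadable() {
    // Keeps *this alive for the rest of this callback, even if destroy()
    // is called from inside handleData().
    DestructorGuard dg(this);
    handleData();
  }

  void close() {
    destroy();  // never `delete`; deletion happens once all guards are gone
  }

 private:
  // Non-public destructor: only destroy() can lead to deletion.
  ~Connection() override = default;
  void handleData() {}
};

// Usage:
//   Connection::UniquePtr conn(new Connection());
//   conn->onReadable();
```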
Often an object requesting callbacks from other components (a timer, a socket connect, etc.) may be deallocated before it receives the callback. One way to avoid dereferencing a deallocated object from a callback is to derive the object from DelayedDestruction and add a delayed destruction guard to the callback context. If keeping the object around until all the requested callbacks fire is too expensive, or if the callback requestor can't have a private destructor (it's allocated on the stack, or as a member of a larger object), DestructorCheck can be used instead. DestructorCheck does not affect the object's lifetime; it only helps other components detect safely that the tracked object has been deallocated.
The object requesting the callback must derive from DestructorCheck. The callback context should contain a DestructorCheck::Safety object initialized with a reference to the requesting object. The Safety object can be captured by value in the callback lambda, or explicitly added to a predefined callback context class, and multiple Safety instances can be created for the same tracked object. When the callback is invoked, before dereferencing the requester object, the callback code should make sure that the destroyed() method of the corresponding Safety object returns false.
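A sketch (Requestor and onTimeout() are illustrative; Safety and destroyed() are DestructorCheck's API, and the callback is assumed to be scheduled from the EventBase thread):

```cpp
#include <folly/io/async/DestructorCheck.h>
#include <folly/io/async/EventBase.h>

class Requestor : public folly::DestructorCheck {
 public:
  void scheduleWork(folly::EventBase& base) {
    // Safety records whether *this has been destroyed; it is captured by
    // value in the lambda.
    folly::DestructorCheck::Safety safety(*this);
    base.runAfterDelay(
        [this, safety] {
          if (safety.destroyed()) {
            return;  // the Requestor is gone; don't touch *this
          }
          onTimeout();
        },
        1000);
  }

  void onTimeout() {}
};
```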
EventBaseManager is DANGEROUS. Since there is usually only a single EventBase per thread, why not manage the EventBase with a thread local? Sounds easy! But there are several catches: for example, the EventBase returned by EventBaseManager::get()->getEventBase() may not actually be running. A much safer option is to explicitly pass around an EventBase, or use an explicit pool of EventBases.
SSLContext provides SSL helper routines to load / verify certs. It is used with AsyncSSLSocket.
Facebook has a lot of experience running services. For background reading, see The C10k problem and Fast UNIX servers.
Some best practices we've found: for example, when chasing latency problems, scheduling delays between threads can be examined with the perf sched tool.