Modern C++: Writing a thread-safe Queue

The STL provides a high-performance queue class in std::queue<T>; however, the API is not very intuitive, or easy, to use and it is not safe to access from multiple threads.
In this article, we will show how to wrap it in a more convenient, and thread-safe, API.

The C++ Standard Template Library (STL) offers a template-class std::queue<T> that implements Queue semantics: the usual push/pop/empty methods to implement a FIFO queue for elements of type T.

The class also offers the ability to specify the underlying Container to use for storage at construction (std::deque<T> is used by default).

The requirements on the T class of queued elements are pretty light, the most obvious being that they must be “copyable” (although, if the class to be contained cannot be copied, one can use either references of pointers – with the usual caveat that it becomes a bit trickier to ensure they don’t go out of scope while still being referenced to from the queue).

The design (and implementation) of the API is meant for performance and small memory footprint; however, it makes for a somewhat awkward usage; most surprising is that pop() will not return the T element at the front of the queue, if any.

One has to use front() to actually get it, then pop() to remove; further, if there is no such element (i.e., the queue is empty) calling pop() may result in undefined behavior, depending from the underlying Container: in fact, the default std::deque will cause such undefined behavior:

Calling front on an empty container is undefined.

Further, access to the elements of the queue is not thread-safe; so implementing the classical Producer/Consumer pattern, with multiple “worker” threads would result in chaos and almost certainly undefined behavior (that would actually be “best case scenario”: at least one would know that things have gone horribly off the rails; the alternative, silent incorrect computation, is way worse – especially when it comes to distributed systems in Production).

This brief technical note shows first how to build a simple, yet effective, single-threaded Queue<T> class, with possibly slightly less efficient performance, but more intuitive API; and then how to extend it so that it is thread-safe.

Source code

All the source is available on BitBucket and can be cloned on your machine via:

git clone git@bitbucket.org:marco/samples.git

All my code uses CMake to build, and I warmly recommend the use of clang++ on both Linux and MacOS; however the code does not use any external library or facility, and should compile/run anywhere, so long as your compiler supports C++17.

A better queue

The first iteration of the Queue<T> class is available at the thread-unsafe tag:

git checkout thread-unsafe

A simplified listing for the class is shown here:

template<typename T>
class Queue {
  std::queue<T> queue_;

 public:
  Queue() = default;
  Queue(const Queue<T> &) = delete ;
  Queue& operator=(const Queue<T> &) = delete ;

  Queue(Queue<T>&& other) {
    queue_ = std::move(other.queue_);
  }

  virtual ~Queue() {  }

  bool empty() const { return queue_.empty(); }
  unsigned long size() const { return queue_.size(); }

  std::optional<T> pop() {
    if (queue_.empty()) {
      return {};
    }
    T tmp = queue_.front();
    queue_.pop();
    return tmp;
  }

  void push(const T &item) {
    queue_.push(item);
  }

  std::optional<T> peek() {
    if (queue_.empty()) {
      return {};
    }
    return queue_.front();
  }
};

As you can see, the copy and assignment constructors have been deleted, and we have implemented the move constructor, as it may be useful for certain use cases, when it is necessary to pass around the Queue’s ownership.

In this implementation, pop() does return the element at the front of the queue, if any, or an empty optional otherwise.

The std::optional<T> class has been added to C++17 (at long last, one might add) and finally brings to C++ a concept that has been part of Scala since its inception, and Java since JDK 8 (although a bit clumsy in its first iteration, and we had to wait until JDK 10 for a better API).

An optional value is a “better alternative” to returning a nullptr (for pointer types) or a sentinel value for value types (how many types did you return -1 to signal “eeehh, I don’t know”): it essentially signals to the caller that what they asked for is not there, and that is neither an error nor an aberration.

An optional<T> behaves as expected when converted to a boolean in an if clause, and uses the * indirection operator to dereference its value (if any); the typical pattern is:

Queue<MyData> queue;
// use the queue...

auto elem_maybe = queue.pop();
if (elem_maybe) {
  MyData elem = *elem_maybe;
  // use elem...
}

When used by a single thread (and, beware, this class is not thread-safe), one can also “guard” via the use of empty() and bypass the check:

if (!queue.empty()) {
  // We know that the optional is non-empty here
  auto elem = *queue.pop();
}

It no work so good

As far as Queues go, and provided this is used by only one thread at a time, this is a decent implementation of a Queue, and will work just fine, with only limited overhead with respect to the (better, but slightly awkward) std::queue.

However, when used by multiple threads (as demonstrated by the main() method in the sample code), weird and wonderful things will happen, none of them good.

By tweaking the values in the sleep duration calls, one can go from perfectly functioning code, to slightly baffling results (such as only 29 results being processed) to outright segfaults.

A thread-safe Queue

Given that the most common use case for a Queue is to decouple worker threads in the Producer/Consumer design pattern, it is worth considering how to improve the design of the Queue<T> class so that multi-threaded clients can use it.

The code here is tagged at thread-safe:

git checkout thread-safe

Interestingly enough, we can leave the API almost unchanged, and we only need to add a std::mutex to guard access to the “wrapped” std::queue:

template<typename T>
class ThreadsafeQueue {
  std::queue<T> queue_;
  mutable std::mutex mutex_;

  // Moved out of public interface to prevent races between this
  // and pop().
  bool empty() const {
    return queue_.empty();
  }

 public:
  ThreadsafeQueue() = default;
  ThreadsafeQueue(const ThreadsafeQueue<T> &) = delete ;
  ThreadsafeQueue& operator=(const ThreadsafeQueue<T> &) = delete ;

  ThreadsafeQueue(ThreadsafeQueue<T>&& other) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_ = std::move(other.queue_);
  }

  virtual ~ThreadsafeQueue() { }

  unsigned long size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

  std::optional<T> pop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      return {};
    }
    T tmp = queue_.front();
    queue_.pop();
    return tmp;
  }

  void push(const T &item) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push(item);
  }
};

The most obvious change is the addition of a std::mutex to guard all accesses to the underlying std::queue, using the std:lock_guard pattern, to ensure that the lock is released by the holding thread under all possible exit scenarios (including exception being raised).

A more subtle API change is the removal of the empty() method from public access: now clients can no longer query the Queue about its state, but can only infer its “emptiness state” by attempting to pop an element and obtaining an empty optional.

The reason for this is that under no circumstances the Queue can guarantee that matters won’t change between the time the client queries empty() and the time pop() is called, making the point entirely moot, and this code a potential source of intermittent (read: hard to pin) bugs:

if (!queue.empty()) {
  // What could possibly go wrong? A lot, it turns out.
  auto elem = *queue.pop();
}

Obviously, there is a perfectly legitimate use case for the use of empty() as a simple “informational” (and, necessarily, ephemeral) assessment of the state of the Queue, but personally I just prefer to take the gun away from the child, as it would require (a) adding in large, bold font a warning to the user and (b) relying on the user to actually heed that warning.

This is a more general approach to API design that holds, loosely: “Make interfaces easy to use right, and hard to use wrong” (originally attributed to Martin Fowler).

About Locking and Performance

There is a never-ending debate as to whether locks (and multi-threading in general) are pure evil and should be avoided at all costs; in particular, many folks feel that the gain to be had by using multi-core architectures is usually lost to the complexity and overhead to manage context switching, and locks acquisition (and release).

The answer, as so many things in distributed systems, is “it depends”; and, most certainly it is foolish (if not outright fraudulent) to make statements about performance and optimizations without hard data and objective measurements; especially in view of the overall system’s stated goals and priorities.

Be that as it may, if you believe that adopting a multi-threaded architecture would be beneficial to your system, and you have measured that using mutually exclusive locks (that’s actually what “mutex” stands for) does not negatively impact performance, this is a relatively efficient and simple-to-use implementation of a thread-safe queue.


2 thoughts on “Modern C++: Writing a thread-safe Queue

  1. Thank you so much sir, What could be a good resource for starting with Mordern C++/ Distributed Systems for HFT’s

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s