The Currency of Concurrency (Part 1)

I hear it all the time: Ruby doesn’t scale, it’s not concurrent, and you’d be foolish to use it in a high-volume system. It all started when Twitter migrated their Rails back-end to Scala in the late 2000s, back when scaling Ruby was a four-letter word. A few years later, LinkedIn made a similar move when they migrated their mobile back-end from Rails to Node.js for similar reasons. More recently, José Valim, a member of the Rails core team, created Elixir (Ruby-flavoured Erlang, if you will) after having a hard time scaling Rails in production. The future looked bleak for Ruby back then, at least as far as concurrency goes. But is that reputation deserved?

Back when I wasn’t using Ruby, I couldn’t have cared less. But now that I use it every day at MYOB, I thought it was time to do some digging and find out if there’s any truth to these claims. So I set out on a quest to explore the dark art of Ruby concurrency. I figured that if I was going to be using this stack for the foreseeable future, I might as well understand it better so I could make more effective use of it.

In this post, we’ll look at why concurrency matters, how concurrency works in Ruby, and what contributes to the perception that Ruby doesn’t scale. In the next posts, we’ll look at how far Ruby has come since then, the plethora of choices for scaling a Rack-based system, and how Ruby stacks up against the likes of Erlang/Elixir, Haskell, Go, Node.js and .NET.

Disclaimer: I’m relatively new to Ruby, and this is my humble attempt at understanding the underpinnings of Ruby concurrency. While there’s some literature out there on this topic, my observation is that it’s scattered all over the place. That motivated me to organize my research in a format I find palatable, which ultimately culminated in this blog series. That said, I welcome constructive feedback, errata and addenda, so feel free to go crazy (but not too crazy) in the comments section.

Why should I care?

We’ve seen it time and again. The website is snappy first thing in the morning, but as the day progresses it slowly grinds to a halt. When we log on to the system dashboard and look at the CPU and memory utilization, it’s actually not that bad. No wonder the autoscaling hasn’t kicked in. So what’s the deal here, we wonder.

The answer is: it’s complicated. Be that as it may, in my experience one of the most common culprits is blocking I/O. As software becomes more distributed, we like to go medieval on the little suckers we call microservices. For every request, just make five calls to services A through E; what could possibly go wrong? Except now we end up with lots of blocking requests and atrocious response times. Not to mention our customers aren’t happy campers.
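
To make this concrete, here’s a minimal sketch (the service URLs are made up) of how those five calls stack up when each one blocks the thread:

    require "net/http"
    require "uri"

    # Hypothetical endpoints standing in for services A through E
    urls = ("a".."e").map { |s| URI("https://service-#{s}.example.com/info") }

    # Each call blocks the thread until its response arrives, so the total
    # latency is roughly the sum of all five round trips
    responses = urls.map { |url| Net::HTTP.get(url) }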

To get around this, many systems resort to complex and costly solutions:

  • Asynchronous messaging using a queue or service bus. While there are perfectly valid reasons for using asynchronous messaging, using it purely to reduce blocking I/O is probably not worth the complexity.
  • Spinning up new servers to handle more requests even though the existing ones are not fully utilized but merely blocked.
  • Caching dynamic data to increase throughput, only to fall down the never-ending rabbit hole that is cache invalidation.

By understanding the better options at our disposal, we can:

  • Reduce the complexity of our software and deliver it faster by having fewer moving parts
  • Avoid maintenance headaches with a simpler architecture
  • Bring down hosting costs by making better use of system resources

What’s more, we’d like nothing better than to provide a good experience for our customers, and reducing latency is a big part of that. In 1982, a couple of IBM engineers penned a seminal paper entitled “The Economic Value of Rapid Response Time”, where they introduced what’s now known as the Doherty Threshold. Their research found that when a system responds in 400ms or less, users stay engaged, use it more and become more productive. More recently, Amazon and Google came to similar conclusions: Amazon found that every 100ms of latency cost them 1% in sales, while Google found that an extra half-second in search response time cut traffic by 20%. In other words, the higher the latency, the higher the opportunity cost.

MRI, Rails and the gold standard

Before delving deeper into the different models of concurrency in Ruby, I wanted to first learn some basic principles. That entailed understanding how multithreading works in Ruby and the history behind it. Here’s what I learned.

Historically, when someone says Ruby doesn’t scale, they’re probably referring to the thread-safety of MRI and Rails.

MRI, the reference and most widely used implementation of Ruby, hasn’t always supported native threads. Prior to version 1.9, MRI only supported green threads, which are essentially lightweight threads scheduled by the interpreter in user space. The problem with these little green suckers is that they all run on a single native thread, which means you can’t really run anything in parallel with them.

In 2008, MRI 1.9 was released with support for native threads. Everything was fluffy kittens and rainbow unicorns. That was until Ruby developers learned that 1.9 also implements what’s known as a global interpreter lock (or GIL, not to be confused with the Final Fantasy currency of the same name; MRI’s is technically called the Global VM Lock, or GVL). Its purpose is to ensure that only one thread executes Ruby code at a time within each process, not unlike Python’s GIL and V8’s isolates. This is often cited as a major limitation in scaling Ruby, as it still doesn’t enable “true parallelism”. What do you mean we have native threads but can’t run them in parallel, homeboy?
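
A quick way to see the GIL in action is to throw CPU-bound work at two threads and time it. Here’s an illustrative sketch: on MRI, expect the threaded version to take about as long as the sequential one, while on a GIL-free implementation like JRuby it should roughly halve:

    require "benchmark"

    # Pure-Ruby, CPU-bound busywork
    work = -> { 10_000_000.times { } }

    sequential = Benchmark.realtime { 2.times { work.call } }

    # On MRI the GIL serializes these threads; on JRuby they can use separate cores
    threaded = Benchmark.realtime do
      2.times.map { Thread.new(&work) }.each(&:join)
    end

    puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)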

Having said that, some people think the GIL is actually not a bad idea:

  • It greatly simplifies the interpreter’s implementation, and because individual VM operations are effectively atomic, developers are less likely to be bitten by thread-safety issues.
  • Single-threaded code can be faster, as it avoids the overhead of thread creation (assuming a dynamic thread pool), fine-grained locking and context switching, and keeps the memory footprint down.
  • It improves compatibility with C libraries that aren’t thread-safe.

On the other hand, other implementations of Ruby like JRuby and Rubinius are unencumbered by a GIL, which is why many consider them to enable “true parallelism”.

The GIL wasn’t the only issue; the chequered history of Rails thread-safety also contributed to the prevailing perception that Ruby scalability is an abomination. If MRI is the gold standard of Ruby, Rails is by far the most popular Rack-compliant web framework in Rubyland, so much so that many developers conflate Rails with Ruby. Before 2008, Rails concurrency was an oxymoron: every request was wrapped in a giant mutex because Rails and many libraries weren’t thread-safe. This changed in 2008 with the advent of Rails 2.2, which was declared thread-safe. Unfortunately, many widely used libraries, like the driver for MySQL (the default database) and Mongrel (the most popular web server at the time), were still not thread-safe, thereby perpetuating the myth that Ruby doesn’t scale.

Of threads and callback hell

It’s important to note that even with a GIL, MRI merely gives the illusion of thread-safety: the interpreter protects its own internals, but your code can still be preempted between operations, so you can’t count on it being thread-safe just because it runs on MRI. It doesn’t help that native thread constructs are not easy to work with, which leads to error-prone software riddled with race conditions and flakiness, not to mention unhappy spouses and kids.
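
Here’s the classic illustration. The GIL doesn’t make a compound operation like counter += 1 atomic: it’s a read, an increment and a write, and a thread can be preempted in between. Whether you actually observe lost updates depends on the interpreter and version, but the code is racy either way; wrapping the critical section in a Mutex makes it correct:

    counter = 0
    threads = 10.times.map do
      Thread.new { 100_000.times { counter += 1 } }
    end
    threads.each(&:join)
    puts counter # updates can be lost, so this may print less than 1_000_000

    # Guarding the read-modify-write with a mutex makes it atomic
    counter = 0
    mutex = Mutex.new
    threads = 10.times.map do
      Thread.new { 100_000.times { mutex.synchronize { counter += 1 } } }
    end
    threads.each(&:join)
    puts counter # always 1_000_000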

To get around this, some developers came up with high-level concurrency libraries inspired by models found in other languages. For instance, libraries like EventMachine, Celluloid and concurrent-ruby provide high-level concurrency patterns such as evented I/O, futures, promises, actors and channels that hide the underlying multithreading machinery. These idioms make concurrent code easier to write and reason about, which means fewer multithreading issues to deal with.
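
For a taste of what these libraries look like, here’s a minimal future using concurrent-ruby (the sleep stands in for a slow network call):

    require "concurrent" # gem install concurrent-ruby

    # Future.execute schedules the block on a background thread pool right away
    future = Concurrent::Future.execute do
      sleep 1 # stand-in for a slow I/O call
      "payload"
    end

    puts "doing other work while the future runs..."
    puts future.value # blocks only if the result isn't ready yet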

But that wasn’t enough. Even though the aforementioned patterns are definitely an improvement over low-level threads, callbacks can still make your code hard to read and test (callback hell, anyone?). That’s why fibers, Ruby’s coroutines introduced in 1.9, are another great option: they not only improve thread-safety but also increase the readability and testability of concurrent code. Fibers let you untangle the nasty callbacks so you can write asynchronous code that looks synchronous.
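
To sketch the idea (everything here is a toy, including the fake event loop): a callback-based operation can park the current fiber with Fiber.yield and have its callback resume the fiber with the result, so the calling code reads top to bottom:

    # Toy "event loop": completions are queued and delivered later on this thread
    completions = []

    # Wrap a callback-based operation so it reads like a synchronous call
    fetch = lambda do |url|
      fiber = Fiber.current
      # Pretend we kicked off some evented I/O; its callback resumes the fiber
      completions << -> { fiber.resume("response from #{url}") }
      Fiber.yield # park here until the event loop hands us the result
    end

    Fiber.new do
      puts fetch.call("https://example.com") # no nested callbacks in sight
    end.resume

    completions.each(&:call) # the event loop delivers the completion

This is essentially the trick libraries like em-synchrony pull off on top of EventMachine, minus all the real I/O plumbing.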

Show me the money

Which brings me to my next point: what if I told you that with EventMachine and fibers, Ruby (both JRuby and MRI) can be as concurrent as Node.js? How does Ruby stack up against the likes of Erlang/Elixir, Haskell, Go and .NET? In part 2, I’m going to go over some of the concurrency patterns in Rack, some semi-scientific benchmarks of these patterns, and when to choose one over the other. And maybe, just maybe, we’ll show the Node.js hipsters what Ruby is really made of. Or not. So stay tuned.

Cover photo by Beshef under the Creative Commons license