Making it Complex


Warning: Mostly a brain dump ahead.

For most of my career as a Software Engineer I've worked with Drupal, a decent and fairly widespread Content Management System. I started working with it during Google's Summer of Code and fell in love with it. Or perhaps I didn't fall in love with Drupal so much as with the Drupal community.

Falling Out of Love

However, as time has gone on and as I've worked with Drupal professionally, I've come to dislike it more and more. I've seen Drupal sites in every state imaginable, from squeaky clean to incredibly broken. Usually, they're incredibly broken.

So when I stumbled on Keeping it Simple, written by Sam Boyer, the piece resonated deeply with me. By Sam's reasoning, we need to simplify Drupal and reduce its complexity.

Unfortunately, the likelihood of reducing Drupal's complexity is slim to none. Here's why.

Shaky Foundations

Drupal is built atop the venerable PHP programming language, which has been alive and kicking since 1994. PHP is ubiquitous and powers a large swath of the web, including Facebook, WordPress (and its main deployment, wordpress.com), and Wikipedia, among many others.

Despite its ubiquity, PHP relies on a model of execution that's dangerously outdated: the forking model.

Fork Me

What happens when a browser makes a request to a webserver that's running PHP?
Let's break it down.

  1. Request is initiated by the client
  2. Webserver (typically Apache or Nginx) processes request
  3. Webserver notes that the request targets a PHP file and hands it to PHP (via an embedded interpreter or FastCGI), either creating a new PHP process or selecting one from a pool of already running processes
  4. PHP file is processed and code is executed
  5. Output is handed back to the webserver and execution is completed
  6. Output is sent back to the client by the webserver

It's a simple model, and on its surface it's just fine. But there is a problematic detail in all of this: the lifetime of a PHP web application is a single request. Even resources that logically span the entire lifetime of an application need to be set up and torn down with every single request.
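The per-request lifecycle above can be sketched in a few lines. The function names here (`bootstrap`, `handle_request`) are hypothetical stand-ins, not Drupal APIs; the point is that everything built during setup is discarded when the request ends.

```php
<?php
// Sketch of the per-request lifecycle (hypothetical names, not Drupal APIs).
// Everything built here is discarded when the request ends.

function bootstrap(): array {
    // Re-done on every request: parse configuration, open a database
    // connection, load code. None of it survives to the next request.
    return [
        'config' => ['site_name' => 'example'],
        'db'     => new PDO('sqlite::memory:'),
    ];
}

function handle_request(array $app, string $path): string {
    return "Rendered {$path} for {$app['config']['site_name']}";
}

// One full setup/teardown cycle per request:
$app      = bootstrap();            // setup
$response = handle_request($app, '/node/1');
unset($app);                        // teardown: connections, caches, all gone
echo $response, "\n";               // prints: Rendered /node/1 for example
```

A long-running process would pay the `bootstrap()` cost once; under the forking model it's paid on every request.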

This model becomes problematic for a number of different tasks.

  1. Caching
  2. Queue based processing
  3. Scheduled jobs
  4. Real-time applications
  5. Search
  6. Batch Processing

Caching in the traditional sense isn't done by Drupal; instead, output is stored in the site's database. There is no backend process that can consume items from a queue. By the same token, without background processes there isn't an easy way to run scheduled jobs. With PHP's large memory overhead and the inefficiency of per-request setup and teardown, real-time applications are difficult. For search, long-running and sophisticated cache-warming systems are necessary to maintain performance. And since there is no way to distribute a workload over time, Drupal resorts to an incredibly nasty hack to process jobs in bulk.
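Because no process outlives the request, the only shared place to cache is the database. Here's a minimal sketch in the spirit of Drupal's `cache_set()`/`cache_get()` pair; the table layout is simplified, not Drupal's actual schema. Note that every "cache" hit still costs a database round trip:

```php
<?php
// Minimal database-backed cache, in the spirit of Drupal's
// cache_set()/cache_get(). Simplified schema, not Drupal's actual one.

$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE cache (cid TEXT PRIMARY KEY, data TEXT, expire INTEGER)');

function cache_set(PDO $db, string $cid, $data, int $expire = 0): void {
    $stmt = $db->prepare('REPLACE INTO cache (cid, data, expire) VALUES (?, ?, ?)');
    $stmt->execute([$cid, serialize($data), $expire]);
}

function cache_get(PDO $db, string $cid) {
    $stmt = $db->prepare('SELECT data, expire FROM cache WHERE cid = ?');
    $stmt->execute([$cid]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if (!$row || ($row['expire'] > 0 && $row['expire'] < time())) {
        return FALSE;  // miss: the caller rebuilds, then pays another query to store
    }
    return unserialize($row['data']);
}

cache_set($db, 'page:/node/1', '<html>rendered page</html>');
echo cache_get($db, 'page:/node/1'), "\n";  // a "cache" hit is a DB round trip
```

A resident process could keep this in its own memory; with per-request processes, the database is the only place it can live.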

Yet, for each of these, Drupal contains a broken and poorly implemented system.

It isn't that these systems are bad to have in and of themselves. It's that they're built on top of a model which can't easily support them in the first place. It inherently increases the complexity of tasks that otherwise have known solutions. This means that for any site or application of reasonable complexity, Drupal needs to be married with an array of other systems.

If the batteries included are faulty ones, it's questionable as to why they are included in the first place.

Dichotomy

There's a certain dichotomy: Drupal exists for your small website that gets very little traffic, for which the batteries are included. At the same time, it exists to serve incredibly high-traffic sites for which the included batteries can't function, but can hopefully be swapped out.

Unfortunately, the model provided by PHP again prevents us from easily transitioning between these two use cases.

For instance, a high-traffic site that wants memcached for high-performance caching needs to set it up. While memcached isn't necessary for a small site, an in-memory cache certainly wouldn't hurt one either. Yet there isn't an easy way to create an embedded in-memory cache in Drupal.
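An embedded in-memory cache is trivial to write in PHP, but under the forking model it's nearly useless: it evaporates with the process at the end of each request. A sketch, with a hypothetical `memory_cache` helper:

```php
<?php
// An in-process cache via a static array (hypothetical helper, not a
// Drupal API). Trivial to write, but its contents live only as long as
// the process serving this request; the next request starts empty.

function memory_cache(string $key, $value = NULL) {
    static $cache = [];
    if ($value !== NULL) {
        $cache[$key] = $value;
    }
    return $cache[$key] ?? NULL;
}

memory_cache('expensive:result', 42);              // populate once...
echo memory_cache('expensive:result'), "\n";       // ...42, but only for this request
```

This is exactly the kind of zero-setup default a small site could use, if only the process stuck around to hold it.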

Instead, we're stuck maintaining systems that are used by default for small sites but inevitably need to be switched out for components that perform better yet are more difficult to set up.

Module Madness

Modularity is a great thing, and Drupal has a thriving ecosystem of modules. But this, too, is hobbled by how PHP operates.

In order to service a request, Drupal needs to load every single module just to determine what should run. While opcode caching offsets the performance penalty of this, memory consumption remains high. Modules load regardless of whether they're used, and unless there is a way to target their loading, this will continue.
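The reason every module loads can be sketched from how Drupal's hook system works: implementations are found by function-name convention (`{module}_{hook}`), so the only way to know which modules respond to a hook is to load all of their code first. This is a simplified sketch in the style of Drupal's `module_invoke_all()`; the module names and the `_sketch` suffix are hypothetical:

```php
<?php
// Sketch of hook dispatch in the style of Drupal's module_invoke_all().
// Hooks are found by function-name convention ({module}_{hook}), so every
// enabled module's code must be loaded before dispatch can happen.
// Module names here are hypothetical.

$enabled_modules = ['blog', 'forum', 'gallery'];  // imagine 100+ of these

// Stand-ins for each module's .module file being include'd at bootstrap:
function blog_init()    { return 'blog ready'; }
function gallery_init() { return 'gallery ready'; }
// forum implements no init hook, but its code was loaded anyway.

function module_invoke_all_sketch(array $modules, string $hook): array {
    $results = [];
    foreach ($modules as $module) {
        $function = $module . '_' . $hook;   // e.g. blog_init()
        if (function_exists($function)) {
            $results[] = $function();
        }
    }
    return $results;
}

print_r(module_invoke_all_sketch($enabled_modules, 'init'));
```

With 100+ modules, that "load everything, then ask who answers" step is paid on every single request.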

Bolt On Culture

Coupled with module madness, Drupal lives in a world of Bolt On Culture. Rather than creating useful and well-defined systems with which sites can be built, people seek out the module that will solve their problem in the shortest amount of time possible.

This leads to sites and applications that often have at least 100 modules enabled, and a system that becomes difficult to understand or reason about.
Instead of a system that is organized and orchestrated as a whole, it's code pulling in 10 different directions, each piece doing its own thing.

To a certain degree, it's a fact of life driven by budget pressures. But it has a high ongoing cost of maintenance.

Changing the Status Quo

As long as Drupal is tied to the PHP model of execution it will continue to suffer these issues. But with a massive legacy codebase, it's difficult to see how that change will happen.
