How big should a module be in Symfony? I’ve been re-thinking this issue in light of what I’ve been reading about RESTful architectures.

For a while now, Symfony has been moving toward enabling a RESTful architecture. As Fabien Potencier wrote a year ago:

The sfRequestRoute is the first step towards a RESTful architecture.

See also this talk he gave.

What is a RESTful architecture and what is it useful for? The best answer to both questions comes from the book RESTful Web Services by Leonard Richardson and Sam Ruby (published May of 2007). Though the term “REST” itself comes from Roy Fielding’s PhD dissertation, I think it is reasonable to say that the popularization of the term came from what those two men wrote on their blogs in 2006 and 2007, and then from the book they jointly produced.

Their writing arose at first in opposition to what had become known as “Web Services”. That style of architecture focused on delivering XML via an HTTP request and then interpreting the payload via protocols that had become horribly complex. For an excellent parody, read “The S stands for Simple” by Pete Lacey, which creates an imaginary conversation between a developer and an evangelist for the SOAP protocol. Lacey’s criticism is the same one that Richardson and Ruby are making:

Dev: Hrrm. And what happens if I move the service to a different endpoint? Do I get a 301 back?

SG: No. SOAP doesn’t really use HTTP response codes.

Dev: So, when you said SOAP uses HTTP, what you meant to say is SOAP tunnels over HTTP.

SG: Well, ‘tunnel’ is such an ugly word. We prefer to say SOAP is transport agnostic.

Dev: I’ll note that your entire industry is built around ambiguous, sometimes erroneous, and definitely not standardized specifications. In fact, the SOAP and WSDL specs are just W3C Notes, not even working drafts.

Against this bloated, overly-complicated style, Richardson and Ruby set out to remind people what made the Web such a big success in the first place (Preface, page xiii):

It may seem strange to claim that the Web’s potential for distributed programming has been overlooked. After all, this book competes for shelf space with any number of other books about web services. The problem is, most of today’s “web services” have nothing to do with the web. In opposition to the Web’s simplicity, they espouse a heavyweight architecture for distributed object access, similar to COM or CORBA. Today’s “web service” architectures reinvent or ignore every feature that makes the Web successful.

It doesn’t have to be that way. We know the technologies behind the Web can drive useful remote services, because those services exist and we use them every day. We know such services can scale to enormous size, because they already do. Consider the Google search engine. What is it but a remote service for querying a massive database and getting back a formatted response? We don’t normally think of web sites as “services,” because that’s programming talk and a web site’s ultimate client is a human, but services are what they are.

Every web application - every web site - is a service. You can harness this power for programmable applications if you work with the Web instead of against it, if you don’t bury its unique power under layers of abstraction. It’s time to put the “web” back in “web services”.

The features that make a web site easy for a web surfer to use also make a web service API easy for a programmer to use. To find the principles underlying the design of these services, we can just translate the principles for human-readable web sites into terms that make sense when the surfers are computer programmers.

They then summarize some of the “weaknesses” in the HTTP protocol, and they point out that these weaknesses are really strengths. They reference what Clay Shirky has written on this subject:

If it were April Fool’s Day, the Net’s only official holiday, and you wanted to design a ‘Novelty Protocol’ to slip by the Internet Engineering Task Force as a joke, it might look something like the Web:

The server would use neither a persistent connection nor a store-and-forward model, thus giving it all the worst features of both telnet and e-mail.

The server’s primary method of extensibility would require spawning external processes, thus ensuring both security risks and unpredictable load.

The server would have no built-in mechanism for gracefully apportioning resources, refusing or delaying heavy traffic, or load-balancing. It would, however, be relatively easy to crash.

Multiple files traveling together from one server to one client would each incur the entire overhead of a new session call.

The hypertext model would ignore all serious theoretical work on hypertext to date. In particular, all hypertext links would be one-directional, thus making it impossible to move or delete a piece of data without ensuring that some unknown number of pointers around the world would silently fail.

The tag set would be absurdly polluted and user-extensible with no central coordination and no consistency in implementation. As a bonus, many elements would perform conflicting functions as logical and visual layout elements.

HTTP and HTML are the Whoopee Cushion and Joy Buzzer of Internet protocols, only comprehensible as elaborate practical jokes. For anyone who has tried to accomplish anything serious on the Web, it’s pretty obvious that of the various implementations of a worldwide hypertext protocol, we have the worst one possible.

Except, of course, for all the others.

The problem with that list of deficiencies is that it is also a list of necessities — the Web has flourished in a way that no other networking protocol has except e-mail, not despite many of these qualities but because of them. The very weaknesses that make the Web so infuriating to serious practitioners also make it possible in the first place. In fact, had the Web been a strong and well-designed entity from its inception, it would have gone nowhere. As it enters its adolescence, showing both flashes of maturity and infuriating unreliability, it is worth recalling what the network was like before the Web.

My thoughts about a RESTful design in Symfony need to be qualified: I’m mostly talking about architecture for robot-consumed APIs. The architecture I’m thinking of is not necessarily one that you would need to adopt for a site that is meant for humans. Richardson and Ruby make a distinction between the “human web” and the “programmable web”, that is, sites that are meant to be seen by humans, versus sites that are mostly devoured by software robots that need to consume data to fulfil whatever function they’ve been programmed to do. From page 1:

When you - a human being - want to find a book on a certain topic, you probably point your web browser to the URI of an online bookstore: say, http://www.amazon.com/

You’re served a web page, a document in HTML format that your browser renders graphically. You visually scan the page for a search form, type your topic (say, “web services”) into a text box, and submit the form. At this point your web browser makes a second HTTP request, to a URI that incorporates your topic…

The web server at amazon.com responds by serving a second document in HTML format. This document contains a description of your search results, links to additional search options, and miscellaneous commercial enticements. Again, your browser renders the document in graphical form, and you look at it and decide what to do from there.

The web you use is full of data: book information, opinions, prices, arrival times, messages, photographs, and miscellaneous junk. It’s full of services: search engines, online stores, weblogs, wikis, calculators and games. Rather than installing all this data and all these programs on your own computer, you install one program - a web browser - and access the data and services through it.

The programmable web is just the same. The main difference is that instead of arranging its data in attractive HTML pages with banner ads and cute pastel logos, the programmable web usually serves stark, brutal XML documents. The programmable web is not necessarily for human consumption. Its data is intended as input to a software program that does something amazing.

On most of the Symfony projects that I’ve worked on so far, the tendency has been to have one module for one model. There have been some variations on that theme, but that has been the trend. For instance, to handle the users of a site, we might have a database table called sf_guard_user_profile, and then a model called sfGuardUserProfile, and a module called “user”, and in that module we might put every action that has anything to do with the model - look at all users, look at frequent users, look at a particular user, send a message to a user, update the user, show the user’s dashboard, show the user’s finances. On the MyBailiwick project we had huge modules - the user module had 40 public functions and maybe a dozen protected or private ones.

Building a RESTful API calls for a different design. Following the advice of Richardson and Ruby, we might end up with lots of very small modules, each of which has only 4 or 5 functions, corresponding to these actions:

read all (list all of a type)

read one (show one of a type)

write (create)

update

delete
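To make the shape of such a module concrete, here is a minimal sketch. It is written in Python rather than as a real Symfony module, and every name in it (`ResourceModule`, the in-memory dictionary, the integer ids) is an assumption made for illustration, not something taken from Symfony or from the book.

```python
class ResourceModule:
    """A deliberately small module: one resource type, five actions.

    Hypothetical sketch; records live in memory instead of a database.
    """

    def __init__(self):
        self._items = {}    # id -> record
        self._next_id = 1   # next id to hand out on create

    def list(self):
        """Read all: return every record of this type."""
        return list(self._items.values())

    def show(self, item_id):
        """Read one: return a single record (KeyError if missing)."""
        return self._items[item_id]

    def create(self, data):
        """Write: store a new record and return it with its id."""
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = {"id": item_id, **data}
        return self._items[item_id]

    def update(self, item_id, data):
        """Update: merge new fields into an existing record."""
        self._items[item_id].update(data)
        return self._items[item_id]

    def delete(self, item_id):
        """Delete: remove a record (KeyError if missing)."""
        del self._items[item_id]
```

Anything that does not fit one of these five actions would, on this approach, become a separate small module of its own rather than a sixth function here.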

Richardson and Ruby feel strongly that the method name (the action to be undertaken) should be one of the HTTP verbs:

GET

POST

PUT

DELETE

This is what Richardson and Ruby refer to as the “uniform interface” of HTTP. Unlike the ever-changing and confusing interfaces built with SOAP and WSDL, HTTP has a simple interface, made up of a small, fixed set of verbs, and that interface is not likely to change.
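One way to picture how the five actions line up with the four verbs is a small lookup table keyed on the HTTP method and on whether the URL names a whole collection or a single item. The sketch below is Python, not Symfony routing configuration; the action names, the `/users/42` URL shape, and the choice of POST for create and PUT for update are my assumptions.

```python
# (HTTP method, URL names a single item?) -> action name.
# Hypothetical mapping; a real API might choose differently.
ACTIONS = {
    ("GET", False): "list",      # read all:  GET /users
    ("GET", True): "show",       # read one:  GET /users/42
    ("POST", False): "create",   # write:     POST /users
    ("PUT", True): "update",     # update:    PUT /users/42
    ("DELETE", True): "delete",  # delete:    DELETE /users/42
}


def resolve_action(method, path):
    """Return (module, action, resource_id) for a request.

    Raises LookupError when the path or the method is not supported.
    """
    parts = [p for p in path.split("/") if p]
    if not 1 <= len(parts) <= 2:
        raise LookupError(f"no route for {path}")
    module = parts[0]
    resource_id = parts[1] if len(parts) == 2 else None
    key = (method.upper(), resource_id is not None)
    if key not in ACTIONS:
        raise LookupError(f"{method} is not allowed on {path}")
    return module, ACTIONS[key], resource_id
```

For example, `resolve_action("GET", "/users")` yields the `list` action of the `users` module, while a POST to `/users/42` is rejected because that combination is not in the table. The URL only ever names the resource; the verb alone decides what happens to it.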

In a well-designed RESTful architecture, the action does not go in the URL; it is expressed by the HTTP method of the request. In the course of the book, Richardson and Ruby set out to create an API for a site that allows people to upload and create their own maps. Here they talk about creating the service that will allow the site to create new user accounts (from page 149):

Expose a subset of the uniform interface

This is the first new step. I skipped it when designing read-only resources, because there was nothing to decide. By definition, read-only resources are the ones that expose no more than the HTTP methods GET, HEAD and OPTIONS. Now that I’m designing resources that can be created and modified at runtime, I also have PUT, POST and DELETE to consider.

Even so, this step is pretty simple because the uniform interface is always the same. If you find yourself wishing there were more HTTP methods, the first thing to do is go back to step two, and try to split up your data set so you have more kinds of resources. Only if this fails should you consider introducing an element of the RPC style by making a particular resource support overloaded POST.

To reiterate the example from Chapter 5: if you have resources for “readers”, and resources for “published columns,” and you start thinking “it sure would be nice if there was a SUBSCRIBE method in HTTP,” the best thing to do is to create a new kind of resource: the “subscription”. As HTTP resources, subscriptions are subject to HTTP’s uniform interface. If you decide to forgo the uniform interface and handle subscriptions through overloaded POST on your “reader” resources, defining the interface for those resources becomes much more difficult.

I can decide which bits of the uniform interface to expose by asking questions about intended usage:

1.) Will clients be creating new resources of this type? Of course they will. There’s no other way for users to get on the system.

2.) When the client creates a new resource of this type, who’s in charge of determining the new resource’s URI? Is it the client or the server? The client is in charge, since the URI is made up entirely of constant strings (https://maps.example.com/user/) and variables under the client’s control ({user-name}).

From those two questions I get my first result. To create a user account, a client will send a PUT request to the account’s URI. If the answer to the second question was “the server’s in charge of the final URI,” I’d expect my clients to create a user by sending a POST request to some “factory” or “parent” URI.
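The difference between those two ways of creating a resource can be sketched in a few lines. This is Python with an in-memory store, not code from the book; the class name `UserStore`, its method names, and the numeric ids handed out on POST are assumptions. The base URI is the one from the quoted passage, and 201 and 200 are the standard HTTP status codes for "Created" and "OK".

```python
from urllib.parse import quote


class UserStore:
    """Hypothetical in-memory store contrasting PUT and POST creation."""

    def __init__(self, base="https://maps.example.com"):
        self.base = base
        self.users = {}     # key in the URI -> account data
        self._next_id = 1   # used only when the server picks the URI

    def put_user(self, user_name, data):
        """PUT /user/{user-name}: the client chose the URI.

        Returns (status, uri): 201 if the account is new, 200 if the
        request replaced an existing account at that URI.
        """
        created = user_name not in self.users
        self.users[user_name] = data
        status = 201 if created else 200
        return status, f"{self.base}/user/{quote(user_name)}"

    def post_user(self, data):
        """POST /user: the server picks the URI and reports it back.

        In a real response the URI would travel in the Location header.
        """
        # Skip ids already taken, e.g. by a client-chosen name like "1".
        while str(self._next_id) in self.users:
            self._next_id += 1
        key = str(self._next_id)
        self._next_id += 1
        self.users[key] = data
        return 201, f"{self.base}/user/{key}"
```

With PUT the client already knows where the account will live, so sending the same request twice is harmless: the second call simply overwrites the first. With POST the client learns the new URI only from the response, and repeating the request would create a second account.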

Again, this advice is aimed at APIs that are designed to be consumed by computers, not humans. But I like the idea of designing my code so that it could function, without much change, for dual use: as an architecture for a human-readable website and also as an API for computers. This style of architecture would be a big change for me. Right now I’m working to refactor the code on WP Questions. As I look through the modules, I see that many of them could be broken apart into smaller modules, each with just 4 or 5 public functions: read all, read one, write, update, delete. Though I worry about ending up with too many modules, I like the idea of each module being simple and focused. And RESTful.