Saturday, October 31, 2009
NoSQL East 2009 Redux

I just attended NoSQL East down in Atlanta over the last two days. This was a fantastic conference, well-organized, with not only great content and speakers, but also a very well-educated audience. There was a telling moment in the first non-keynote talk where the speaker asked the audience "How many people have read the Dynamo paper?" and easily 95% of the audience put their hands up.

I'd divide the major focus areas into the following groups: straight-up key-value stores, massive-scale data mining, column-oriented datastores, and document-oriented databases, plus a handful of projects that don't fit neatly into any of those buckets.

As one of the speakers put it, which one you need depends on the shape of your data; these systems are all more or less isomorphic to one another, just as Turing-complete languages are all essentially equivalent. But you still want to choose the right tool for the job. One nice touch was that a number of the systems were introduced not by project committers, but rather by people who were using them to get stuff done. It was a very practical focus, much appreciated.

There were a couple of interesting tidbits from the conference, including a rollicking, confusing talk by John "My first network address was '12'" Day about network architecture and how TCP/IP had really gotten it all wrong. This guy was about 3 planes of existence above what I could follow at 9:15am--I had a distinct feeling he knew exactly what he was talking about, and was probably right, but there was no way he was making sense even to a very technical audience. Then there was the poor guy from Microsoft Research who had done some interesting distributed IDE work...for the .NET framework. In a sea of Mac and Linux laptops across the audience, he had a bunch of people who really couldn't make practical use of what he'd done.

Key-Value Stores

I've got to give a pretty big nod to Riak here for a straight-up key-value store; Justin Sheehy basically gave a talk about decentralized, scalable, lights-out system design that might well have been a good keynote, and it was clear that Riak was designed with high priority on those aspects. Major advantages over Voldemort in my mind are: ability to add/subtract nodes from running clusters, tunable (per-bucket, per-request!) quorum parameters (N,R,W), plus support for pluggable consistent hashing modules, partition distribution modules, and conflict resolution modules. For most of my use cases, with N=3, I'd probably do R=1,W=3 to optimize for read latency, as most of our writes would either be a user clicking "save" or would be the result of an asynchronous background process. On the other hand, if I wanted to build something like SQS on top of this (which I conjecture is possible), I'd probably do W=1,R=3 to optimize for the "post a message" latency.
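A handy rule of thumb when picking those quorum values: in the idealized model, a read is guaranteed to overlap the latest successful write whenever R + W > N. Here's that arithmetic as a trivial Java sketch -- this is just the math, not anything from Riak's API:

public class QuorumCheck {

  // true if every possible read quorum overlaps every possible write quorum
  static boolean readSeesLatestWrite(int n, int r, int w) {
    return r + w > n;
  }

  public static void main(String[] args) {
    int n = 3;
    System.out.println("R=1, W=3: " + readSeesLatestWrite(n, 1, 3)); // true, and reads stay cheap
    System.out.println("R=3, W=1: " + readSeesLatestWrite(n, 3, 1)); // true, and writes stay cheap
    System.out.println("R=1, W=1: " + readSeesLatestWrite(n, 1, 1)); // false: eventual consistency only
  }
}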

While I was there, I had a great talk with the Riak folks where we went over how they were going to expose conflict resolution up to the client via their HTTP/JSON interface. There's no out-of-the-box support for multiple-datacenter awareness, although it seemed possible to add via the pluggable partition distribution module; their inter-node communication leverages Erlang's native networking, though, meaning some kind of VPN/IPSec tunnel would be the main way to make this work across sites. I'm pretty sure that no matter what you use in this space, you'll probably end up running a VPN over the WAN to give the appearance of a seamless LAN cluster (although the Cassandra guys piped up in the conference IRC channel that Cassandra has multi-site support already).

I can't tell whether being written in Erlang is a plus or a minus. On the one hand, it won't plug in well to our JMX-based monitoring frameworks, and I'm pretty sure very few folks within our organization have ever built or deployed an Erlang system. On the other hand, Erlang is the perfect choice for highly concurrent, fault-tolerant programming, and would probably be right up the alley of several of our programmers (I'm proud to say that while we are primarily a Java shop, we have a lot of other language proficiencies in-house spanning Ruby, Python, Clojure, and Scheme; based on my brief escapade with Erlang last night, I'm confident that (1) a number of our folks could pick it up easily and (2) they would jump at the chance). This is probably a wash.

Very interesting in my mind is that some of the key-value stores have borrowed features from other design spaces; Riak allows some basic link structure in its values, including a version of targeted MapReduce that can follow links, which starts to make it feel like a graph-oriented database such as Neo4j. Similarly, Cassandra has support for column-oriented indices like HBase does. It's clear there's probably a project out there to scratch your particular itch.

Massive-scale data mining

We saw a couple of talks from folks who were using Hadoop to analyze very large data sets. Moreover, they were using frameworks like Pig and Cascading on top of Hadoop to do a lot of ad-hoc queries. Twitter, in fact, uses Pig for all of their interesting web analytics, and they've taken the analytics BAs who are used to writing ad-hoc SQL queries and trained them up in Pig with little trouble. This is probably somewhere on our horizon, although there are larger cats to skin at the moment.
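To appreciate what Pig is saving those analysts from, here is roughly what a trivial ad-hoc aggregation (hit counts per URL out of a web access log, say) looks like as a hand-written Hadoop MapReduce job; the class names and the log format are invented for illustration, and the equivalent Pig script is just a few lines of LOAD/GROUP/FOREACH:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HitCount {

  // emit (url, 1) for each log line, assuming the URL is the first field
  public static class HitMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String url = line.toString().split("\\s+")[0];
      ctx.write(new Text(url), ONE);
    }
  }

  // sum up the counts for each URL
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text url, Iterable<LongWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable c : counts) total += c.get();
      ctx.write(url, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "hit count");
    job.setJarByClass(HitCount.class);
    job.setMapperClass(HitMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}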

Column-oriented datastores

Cassandra and HBase were the major players here. Digg is moving all of their backend storage over to Cassandra after some successful pilots, and we saw a great talk by Mark Gunnels about how he is using HBase because it makes it "simple to get sh*t done". Apparently recent releases of HBase have made strides in fault tolerance (there is now an election algorithm for some of the special name nodes) and latency (performance optimization was a major focus of the 0.20 release). There's also an interesting article describing some wide-area replication schemes available for HBase that sound intriguing (although once you are beyond two datacenters, I am convinced you are better off with the VPN LAN solution if you want any hope of achieving eventual consistency).

Document-oriented databases

While the column-oriented datastores are good at organizing semi-structured data, the document-oriented guys are really all about organizing largely unstructured documents, and focus on doing some wild ad-hoc MapReduce queries. I still need to be convinced about the scaling, replication, and geographic distribution capabilities in this space, so it may be a while before we dip our toes in here.

Other

There was a very interesting talk about Tin ("the database so tiny they had to shorten its name"). It basically seemed to be a very clever approach for delivering stock data that leveraged a basic filesystem, with range queries and rewriting rules handled via Sinatra. It turns out web servers serving static files go pretty fast, if that's a suitable representation for your stuff (his primary use case seems to be delivering read-only data out to clients, where the data gets updated asynchronously in the background by the system). We actually have some datasets like that, so this is intriguing!

There was a talk from Neo4j, a graph-oriented database, which I just can't wrap my head around yet. I think I need to read up on some of the background research papers in this area. Certainly, the notion of a datastore based around the notion of link relations would likely be easy to expose as a REST HTTP interface, which is attractive. Our particular domain model can actually be nicely normalized, however (we are currently running off a traditional RDBMS after all), so I'm not sure we need the full semantic flexibility this offers.

There was also a talk about Redis; this is primarily an in-memory storage system with blazing-fast speeds, where the ability to write out to disk is really an afterthought. During the talk the speaker showed a screen snapshot of "top" running on a cloud-hosted virtual node with 60GB of memory. I cannot make up my mind whether this is ultimately just a badass memcached, or whether he's on the cusp of something: if you have enough memory to hold your whole dataset, and you are sufficiently fault-tolerant, why bother writing to disk at all? Especially if you can easily get a periodic snapshot out to disk in the background that can be properly backed up for disaster recovery.
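That last point is essentially what Redis's BGSAVE command does: the server writes a point-in-time snapshot to disk in the background while continuing to serve everything from memory. A minimal sketch of that workflow, assuming the Jedis client library and a local server (neither of which came from the talk):

import redis.clients.jedis.Jedis;

public class RedisSketch {
  public static void main(String[] args) {
    Jedis redis = new Jedis("localhost", 6379);  // assumes a Redis server on localhost

    // everything is served out of memory
    redis.set("favorite:1234", "raindrops on roses");
    System.out.println(redis.get("favorite:1234"));

    // ask the server to write a point-in-time snapshot to disk in the
    // background; reads and writes keep going while it runs
    redis.bgsave();

    redis.close();
  }
}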

Conclusion

As folks pointed out, "NoSQL" might better be written "NOSQL", for "Not Only SQL". I didn't sense a lot of MySQL hatred here; quite the contrary, many people were very complimentary about certain things it does *really* well, and the MySQL hammer is something that is staying firmly in their toolboxes. However, it is clear that there is a maturing community of very practically-minded folks looking for a new set of tools to drive their particular screws. Although to be sure (and this is something that I think some of the conference tweets corroborated), this also implies we all have a screw loose or two....

Friday, October 30, 2009
Object Calls == Message Passing in Erlang

I started playing around with Erlang last night as a result of learning about Basho's key-value store Riak at NoSQL East yesterday (more specifically, it was due to Justin Sheehy's talk, and two realizations: (1) this guy *gets* a lot of important *operational* design choices in this space, and (2) he decided to build his system in Erlang).

So I decided to read the online Erlang "course" and started working on the exercises. One of them was:

Write a function which starts N processes in a ring, and sends a message M times around all the processes in the ring. After the messages have been sent the processes should terminate gracefully.

[Figure: a unidirectional ring of nodes]

And so, summoning vaguely-remembered lectures by Bob Harper in "Fundamentals of Computer Science II" at CMU on doing object-oriented programming in Scheme, and remembering that the original Smalltalk OOP guys always said "send an object a message" rather than "invoke a method on an object", I set to work. [Editor's note: please feel free to post comments showing me better ways, I have known Erlang for all of about 12 hours at this point!]

Let's get the declarations out of the way. I need to define the entry point function which creates the ring of N nodes and sends the message around it M times, and I know I'm going to need a function representing a node in the ring, since I'm going to have to spawn processes for them.

-module(ring).
-export([ring_msg/3, ring_node/1, init_node/1]). % node functions must be exported so spawn/3 can find them

Ok, what job does a node in the ring have? Well, most of the time, when it receives a message, it just needs to pass it on to the next guy. So my node process is going to need to know about its next neighbor. Now in Erlang, what I would normally think of as an object can be modelled as a recursive function that passes its current state back into itself as an argument, and processes "method calls" by receiving messages. Interestingly, not all method calls actually have to send something back to the caller!

ring_node(Next) ->
  receive
    { pass, M, Msg } ->
      % Note that we got this message. 
      io:format("Node ~w~n",[Msg]),
      % Pass the message on around the ring.
      Next ! { pass, M, Msg },
      % If the count was down to zero, I can
      % exit, otherwise, I loop and wait for
      % the next incoming message.
      if M == 0 -> ok;
         true -> ring_node(Next)
      end
  end.

Ok, seems pretty straightforward. But if I had a ring of these set up, a message would just keep running around the ring. At least one node needs to be special, so that it can decrement the count M as the message comes through. It's pretty similar to the ring_node above, but is a little different.

init_node(Next) -> receive
    % message has been all the way around
    % the last time, so I can quit
    { pass, 0, _ } -> ok;
    % otherwise, log the message and pass
    % it on, decrementing the count
    { pass, M, Msg } ->
      io:format("Node ~w~n",[Msg]),
      Next ! { pass, M-1, Msg },
      init_node(Next)
  end.

Now an interesting thing here is that the init_node and the ring_node can both handle the "pass" message, and that when they send the message on, they don't actually care what their "Next" process is. It's like both of these "objects" implement the following interface:

public interface MessagePasser {
  void pass(int count, Object msg);
}

Ok, so now if we can create a ring with 1 init_node and (N-1) ring_nodes, we're all set if we inject the initial Msg into the init_node. So let's think about constructing a ring of nodes; if we have a node handy, we can pass that in as the initial argument (think "constructor") to a ring_node process to use as its Next node, then we just count down:

ring(Last, 0) -> Last;
ring(Last, N) -> 
  RN = spawn(ring, ring_node, [Last]),
  ring(RN, N-1).

Hmm, that's close, but that's a linked-list of nodes, not a ring. But we can't pass a node in as a constructor argument to the first node we create, because we don't have any yet! So it seems like we'll need to construct a linked-list of nodes, and then "close the loop" by stitching the front and the back together. Our init_node is already a special node, so maybe we can extend it this way:

init_node(Next) -> receive
    % acknowledge the request, update state
    { setNext, N, From } -> From ! ok, init_node(N);
...

In other words, the init_node can get a special message telling it to "update" its Next state. In some sense, we've just done this:

public interface InitNode extends MessagePasser {
  void setNext(MessagePasser N);
}

We want to acknowledge the request so our ring construction knows when that message has been processed -- we don't want to hand the ring back until it's all stitched together, and we can't guarantee ordering of message delivery unless we specifically wait for a response. So here's the full ring construction:

ring(N) when is_integer(N) ->
  % just pass in a placeholder for Next
  RN0 = spawn(ring, init_node, [nil]),
  ring(RN0, RN0, N-1);
% finished stitching, can return our
% init node to the caller
ring(Init) -> receive ok -> Init end.
ring(Init, Last, 0) -> Init ! { setNext, Last, self()}, ring(Init);
ring(Init, Last, N) ->
  RN = spawn(ring, ring_node, [Last]),
  ring(Init, RN, N-1).

Finally, the thing we're trying to do (including optimizing the degenerate cases):

ring_msg(0, _, _) -> ok;
ring_msg(_, 0, _) -> ok;
ring_msg(N, M, Msg) ->
  Init = ring(N), Init ! { pass, M, Msg }, ok.

Actually runs, too! It's pretty neat to see polymorphism fall out of processes simply accepting the same messages, and I've always loved the pattern matching in ML (both SML and OCaml variants!). Some pretty serious systems programs are getting written in this language; it's clear that the process-spawning methodology lends itself well to a SEDA-style approach, which is great for graceful degradation of service, and the purely functional style (no mutation) means you have no locks and no shared state, and hence safe concurrency (as long as you can model what you're doing properly).

Wednesday, August 26, 2009
Agile Architecture Anti-Patterns

I have been the solution architect for an enterprise-spanning project which is now getting geared up for its next phase. This project touches more than 10 different development teams across at least three company divisions and involves at least one external ISV. So, you know, a pretty small project.

As I reflect on the process of hashing out all the technical details, I realized I got stretched pretty thin. I really didn't have time to write much down in any formal fashion this time around, due to the time pressure on the project and the number of other things I was also trying to juggle. I have instead been spending a lot of time drawing stuff on whiteboards for various technical audiences. The things I did have time to write up were only the most general concepts and patterns - i.e. the really high-level architecture.

At least part of the reason I didn't have time to write anything else down was that I was spending all my time in technical design discussions with all the various groups, hashing out nitty-gritty integration details and then bringing full integration designs back to the development groups to run with. The last time through, this worked out well, because the developers in my home organization are used to agile development and being responsible for their own technical design, so they were able to take my whiteboard sketches, continue to consult with me, and produce solid, working code. It also worked out ok because the last phase was actually the first phase of this project, and I had enough lead time to work out all the details ahead of time.

This time around, the time pressure is pretty high, plus the work-ahead time I would have had in a traditional "up-front" architecture process was actually consumed by helping to get the first phase out the door. So I find I actually don't have enough time to do the same detailed design that I did the first time around before the development teams are scheduled to start. This would seem to be a major conundrum.

Antipattern 1: waterfall, up-front architecture doesn't pipeline well if the architecture phase extends concurrently all the way through the development phase.

Corollary: trying to waterfall development and QA doesn't pipeline well either for exactly the same reason (because developers have to keep working their code to fix bugs).

The other anti-pattern that I was running into was with development teams in some of the other divisions; I would work with the enterprise architects to hash out a pretty detailed design, but when we brought that to the development teams, there was a ton of convincing, resistance, and redesign that ended up happening. Now, I'm not sure exactly which anti-pattern we were running into, but it was one of the following:

Antipattern 2: up-front architecture doesn't work if the architects aren't familiar enough with the details of the systems/teams they are designing for, because the designs won't "fit right".

Antipattern 3: up-front architecture doesn't work if the architects haven't been able to build up sufficient technical cred to sell their designs to the developers.

For sure both of those applied to me with respect to the developers and technical leads of the groups in the other divisions.

I've been thinking a lot recently about the right way to approach this, and really enjoyed this essay on Agile Enterprise Architecture by Scott W. Adler. He proposes a much more lightweight, hands-on approach to architecture which can best be described as guiding the agile development teams to develop the right architecture themselves.

Fred Brooks writes in The Mythical Man-Month that a system's architecture must proceed from one or a small number of minds in order to be coherent (I can't find a copy right now, so that's paraphrased). At the same time, agile development techniques like Extreme Programming (XP) or Scrum suggest that the development teams ought to be responsible for evolving their architecture organically. I'm starting to think that both of these are true, and that they are not mutually exclusive, and it is related to another realization I've had (but that I have a hard time remembering):

A solution architect is responsible for seeing that the end-to-end solution hangs together technically. It is not necessary for the architect to produce the whole end-to-end design himself to achieve this.

Going forward, I think I'm going to take the following approach: figure out broadly which development teams are going to need to interact and what their coarse responsibilities are, then immediately get them involved in working out the solution. In true agile fashion, let them work out the details, and just play Product Owner: set high-level (technical) requirements and be on hand to participate in the design discussions. I think this best leverages the available technical design talent in the development teams, while making sure that someone is tracking it all and fitting it into a coherent mental model to make Mr. Brooks happy.

Friday, May 15, 2009
RSA Public Key Cryptography in Java

Public key cryptography is a well-known concept, but for some reason the JCE (Java Cryptography Extensions) documentation doesn't at all make it clear how to interoperate with common public key formats such as those produced by openssl. If you try to do a search on the web for how to make RSA public key cryptography work in Java, you quickly find a lot of people asking questions and not a lot of people answering them. In this post, I'm going to try to lay out very clearly how I got this working.

Just to set expectations, this is not a tutorial about how to use the cryptography APIs themselves in javax.crypto (look at the JCE tutorials from Sun for this); nor is this a primer about how public key cryptography works. This article is really about how to manage the keys with off-the-shelf utilities available to your friendly, neighborhood sysadmin and still make use of them from Java programs. Really, this boils down to "how do I get these darn keys loaded into a Java program where they can be used?" This is the article I wish I had when I started trying to muck around with this stuff....

Managing the keys

Openssl. This is the de-facto tool sysadmins use for managing public/private keys, X.509 certificates, etc. This is what we want to create/manage our keys with, so that they can be stored in formats that are common across most Un*x systems and utilities (like, say, C programs using the openssl library...). Java has this notion of its own keystore, and Sun will give you the keytool command with Java, but that doesn't do you much good outside of Java world.

Creating the keypair. We are going to create a keypair, saving it in openssl's preferred PEM format. PEM formats are ASCII and hence easy to email around as needed. However, we will need to save the keys in the binary DER format so Java can read them. Without further ado, here is the magical incantation for creating the keys we'll use:

# generate a 2048-bit RSA private key
$ openssl genrsa -out private_key.pem 2048

# convert the private key to PKCS#8 format (so Java can read it)
$ openssl pkcs8 -topk8 -inform PEM -outform DER -in private_key.pem \
    -out private_key.der -nocrypt

# output public key portion in DER format (so Java can read it)
$ openssl rsa -in private_key.pem -pubout -outform DER -out public_key.der

You keep private_key.pem around for reference, but you hand the DER versions to your Java programs.

Loading the keys into Java

Really, this boils down to knowing what type of KeySpec to use when reading in the keys. To read in the private key:

import java.io.*;
import java.security.*;
import java.security.spec.*;

public class PrivateKeyReader {

  public static PrivateKey get(String filename)
    throws Exception {
    
    File f = new File(filename);
    FileInputStream fis = new FileInputStream(f);
    DataInputStream dis = new DataInputStream(fis);
    byte[] keyBytes = new byte[(int)f.length()];
    dis.readFully(keyBytes);
    dis.close();

    PKCS8EncodedKeySpec spec =
      new PKCS8EncodedKeySpec(keyBytes);
    KeyFactory kf = KeyFactory.getInstance("RSA");
    return kf.generatePrivate(spec);
  }
}

And now, to read in the public key:

import java.io.*;
import java.security.*;
import java.security.spec.*;

public class PublicKeyReader {

  public static PublicKey get(String filename)
    throws Exception {
    
    File f = new File(filename);
    FileInputStream fis = new FileInputStream(f);
    DataInputStream dis = new DataInputStream(fis);
    byte[] keyBytes = new byte[(int)f.length()];
    dis.readFully(keyBytes);
    dis.close();

    X509EncodedKeySpec spec =
      new X509EncodedKeySpec(keyBytes);
    KeyFactory kf = KeyFactory.getInstance("RSA");
    return kf.generatePublic(spec);
  }
}

That's about it. The hard part was figuring out a compatible set of:

  1. openssl DER output options (particularly the PKCS#8 encoding)
  2. which type of KeySpec Java needed to use (strangely enough, the public key needs the "X509" keyspec, even though you would normally handle X.509 certificates with the openssl x509 command, not the openssl rsa command. Real intuitive.)

From here, signing and verifying work as described in the JCE documentation; the only other thing you need to know is that you can use the "SHA1withRSA" algorithm when you get your java.security.Signature instance for signing/verifying, and that you want the "RSA" algorithm when you get your javax.crypto.Cipher instance for encrypting/decrypting.
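To make that concrete, here's a minimal sign-and-verify round trip using the reader classes above; the file names match the openssl commands from earlier, and error handling is omitted:

import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignatureExample {
  public static void main(String[] args) throws Exception {
    byte[] data = "hello, world".getBytes("UTF-8");

    // sign with the private key
    PrivateKey priv = PrivateKeyReader.get("private_key.der");
    Signature signer = Signature.getInstance("SHA1withRSA");
    signer.initSign(priv);
    signer.update(data);
    byte[] sig = signer.sign();

    // verify with the corresponding public key
    PublicKey pub = PublicKeyReader.get("public_key.der");
    Signature verifier = Signature.getInstance("SHA1withRSA");
    verifier.initVerify(pub);
    verifier.update(data);
    System.out.println("signature verified? " + verifier.verify(sig));
  }
}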

Many happy security returns to you.

Tuesday, February 10, 2009
Downloading your Blogger archives

A friend was looking for a way to grab an archive of his Blogger posts into a CSV file he could do text mining on (and presumably, for a low-fi backup mechanism). I wrote this Python script for him, enjoy.

#!/usr/bin/env python
#
# Copyright (C) 2009 by Jon Moore
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import csv
import urllib2
import unicodedata
import xml.etree.ElementTree as etree

blog_feed = 'http://codeartisan.blogspot.com/feeds/posts/default'
output = 'posts.csv'

ATOM_NS = 'http://www.w3.org/2005/Atom'

def norm(s):
    if not s: return None
    return s.encode('ascii','ignore')

def main():
    f = open(output, 'wb')
    csv_wr = csv.writer(f)
    url = blog_feed + '?max-results=100'
    csv_wr.writerow(['id','published','updated','permalink','title','content'])
    while url:
        print "fetching", url
        feed = etree.fromstring(urllib2.urlopen(url).read())
        for entry in feed.findall("{%s}entry" % ATOM_NS):
            id = entry.find("{%s}id" % ATOM_NS).text
            published = entry.find("{%s}published" % ATOM_NS).text
            updated = entry.find("{%s}updated" % ATOM_NS).text
            title = norm(entry.find("{%s}title" % ATOM_NS).text)
            content = norm(entry.find("{%s}content" % ATOM_NS).text)
            perm_url = ''
            for link in entry.findall("{%s}link" % ATOM_NS):
                if (link.get('rel') == 'alternate'
                    and link.get('type') == 'text/html'):
                    perm_url = link.get('href')
                    break
            csv_wr.writerow([id,published,updated,perm_url,title,content])
            print "wrote",id
        url = None
        for link in feed.findall("{%s}link" % ATOM_NS):
            if link.get('rel') == 'next':
                url = link.get('href')
                break
    f.close()

if __name__ == "__main__":
    main()

Saturday, January 24, 2009
Business Cases and Cloud Computing

I just read a very interesting article by Gregory Ness on seekingalpha.com that talks about some of the technology trends behind cloud computing. One key quote:

Automation and control has been both a key driver and a barrier for the adoption of new technology as well as an enterprise’s ability to monetize past investments. Increasingly complex networks are requiring escalating rates of manual intervention. This dynamic will have more impact on IT spending over the next five years than the global recession, because automation is often the best answer to the productivity and expense challenge.

One other cited link is to an IDC study that includes the following graph:

[Graph from the IDC study: staffing accounts for roughly 60% of the total cost of ownership (TCO) for a server over a three-year lifetime.]

Note that staffing accounts for 60% of the cost of maintaining a server over its lifetime. Cloud infrastructure services like Amazon EC2 really only save an enterprise data center the hardware setup and software install costs, which represent a small slice of staff time for a given server. Actually administering the server once it is running is the bulk of the cost, and that won't go away on EC2 -- you'll still need operations staff to provision and image cloud infrastructure. EC2 makes sense if the economies of scale at AWS are such that they can achieve a lower operational cost for the other 40% than you can, or if there is a business / time-to-market value proposition in being able to provision hardware on EC2 more rapidly than you can acquire and install it yourself.

Given the huge economy of scale that the large cloud providers have -- tens of thousands of servers -- it is going to be hard to get your costs for that 40% lower than what they can achieve with their existing infrastructure automation and ability to purchase hardware in bulk, especially for a startup company whose hardware needs are initially modest. Let's guess that there's a 33% markup on cost for EC2, so that when you are getting charged $0.10 per CPU-hour, it's really only costing them $0.075. Let's also assume a 75% experience curve on infrastructure (meaning that once you have doubled the number of servers you have deployed, the marginal server costs only 75% of what it did at the halfway point).

By one estimate, Amazon has 30,000 servers. Now let's work backward (1/0.75 = 1.33): at 15,000 servers, their marginal cost was $0.075 * 1.33 ≈ $0.10. At 7,500 servers, their marginal cost was $0.10 * 1.33 ≈ $0.13. In other words, you'd have to be planning to deploy on the order of 15,000 servers to have any hope of getting your marginal cost under what they'll charge you retail.

(I think this is actually a conservative estimate: the experience/learning curve for infrastructure deployment is probably steeper than 75% due to existing hierarchical deployment patterns and a product (provisioned servers) that lends itself well to automation. Also, due to the high barrier to entry for cloud computing in terms of number of servers you need to be competitive, they can probably get away with charging an even higher markup).
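Here's that back-of-the-envelope calculation as a few lines of Java, in case you want to play with the assumptions (the $0.075 marginal cost, the 30,000-server fleet, and the 75% curve are all guesses from above):

public class ExperienceCurve {
  public static void main(String[] args) {
    double cost = 0.075;    // assumed marginal cost per CPU-hour at 30,000 servers
    int servers = 30000;    // estimated size of Amazon's fleet
    double curve = 0.75;    // each doubling cuts the marginal cost to 75%
    while (servers >= 7500) {
      System.out.printf("%6d servers: $%.3f per CPU-hour%n", servers, cost);
      cost /= curve;        // halving the fleet raises the marginal cost by 1/0.75
      servers /= 2;
    }
  }
}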

One corollary of this is that if you are currently running a data center with far fewer servers (i.e. the hardware is a sunk cost), you might actually be better off turning your data center off and leasing from Amazon. Now of course, there are some things (customer credit card data, extremely sensitive business information) that you just wouldn't be willing to host somewhere outside your own data center. But that's probably a very specific set of data--host that stuff and lease the rest in the cloud, particularly if you can get adequate SLAs from your cloud vendor.

So that deals with the 40% of the TCO for a server that isn't staffing. How do you cut costs on the other 60%?

You won't really be able to make a dent in that 60% until you get not just to fully automated infrastructure provisioning, but until you get to fully automated software deployment and provisioning. This is not possible until you get to standardized computing platforms with specific functionality that are scale-on-demand, like Akamai NetStorage, Amazon S3/EBS/SQS/SimpleDB, and Google AppEngine. These are known as "Platform-as-a-Service" (PaaS) offerings.

There's a similar experience curve argument here: you could spend internal development time to set up some kind of application deployment framework, but you'd essentially have to be willing to build and deploy within an order of magnitude as many different apps as the Google App Engine team does in order to get your costs under what Google will charge you. Unless you are in the business of directly competing with them in the PaaS market, you might as well buy from them and focus your energy on providing your unique business value, not software or hardware infrastructure. [Editor's note: this was something Matt Stevens said to me a while ago, and it wasn't until I went through the mental exercise of writing this article that I actually got it.]

Yesterday I implemented (not prototyped) a service in Google App Engine in about 6 hours that would cost around $400 per month (according to their recent pricing announcements) if projected usage were more than double what it is now. I estimate this would require at least 10 database servers just to host the data in a scalable, performant fashion, nevermind the REST data interface (webnodes) sitting in front of it. On Amazon EC2, that'd be $720 per month on your small instances (assuming those were even beefy enough), and per the experience curve argument above, it's probably way more than that in our data center. And that's not counting any of the reliability/load balancing infrastructure.

So my open question is: how, as a software developer, can you justify not building your app in one of these cloud frameworks?

Monday, January 12, 2009
Websites are also RESTful Web Services

I have been reading the Richardson and Ruby book RESTful Web Services and recently had an epiphany: if you design a RESTful web site it is also a RESTful web API. In this post I'll show exactly how that works and how you can use this to rapidly build a prototype of a modern web application.

First of all, let's start with a very simple application concept and build it from the ground up. Let's consider a simple application that lets you keep a list of favorite items. The resources we'll want to model are:

  • a favorite item
  • a list of a user's favorite items

We'll assign the following URLs:

  • /favorites/1234 for favorite item with primary key 1234
  • /favorites is the list of everyone's favorite items
  • /favorites/-/{owner}joe@example.com (or its URL-encoded equivalent) is the list of items belonging to our friend Joe (this is a URL format inspired by Google's GData protocol).

Now, we'll support the following HTTP methods:

  • You can GET and DELETE an individual item (we could allow PUT if we wanted to allow editing, but we'll keep the example simple)
  • You can create a new favorite item with a POST to /favorites.
  • You can GET the list of a user's favorites.

Ok. Now we want to rapidly prototype this so we know whether we have the resources modelled correctly. Fire up your favorite web application framework (Ruby on Rails, Django, Spring, etc.) and map those URLs to controllers. Most of these frameworks let you fill in implementations for the various HTTP methods. We'll make one minor simplification and allow "overloaded POST", where a parameter on a POST specifies PUT or DELETE semantics (e.g. "_method=DELETE"). We still implement the proper HTTP methods, but we let you invoke them via POST too, since browsers and some Javascript HTTP implementations can only do GET and POST.
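To sketch what that URL-to-controller mapping might look like, here's a rough cut in Spring MVC (one of the frameworks just mentioned); the class name, view names, and persistence details are all invented for illustration:

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.*;

@Controller
public class FavoritesController {

  // POST /favorites creates a new favorite item
  @RequestMapping(value = "/favorites", method = RequestMethod.POST)
  public String create(@RequestParam("itemname") String itemName) {
    // ... save the new favorite somewhere ...
    return "created";     // renders the "I created your item" page below
  }

  // GET /favorites/{id} shows a single item
  @RequestMapping(value = "/favorites/{id}", method = RequestMethod.GET)
  public String show(@PathVariable("id") long id) {
    return "favorite";
  }

  // DELETE /favorites/{id} removes an item; an overloaded POST carrying
  // _method=DELETE can be translated into a real DELETE by a servlet
  // filter in front of this (Spring ships one called HiddenHttpMethodFilter).
  @RequestMapping(value = "/favorites/{id}", method = RequestMethod.DELETE)
  public String delete(@PathVariable("id") long id) {
    // ... delete the favorite ...
    return "deleted";     // renders a confirmation page
  }
}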

Ok, now a funny thing happens: if you render an HTML response for everything, you can start playing with your API in your browser! In particular, when we render the list of items, we will naturally put the text of those items on the page, but we can also throw the following HTML snippet at the top of the page:

<p>Add a new favorite:</p>
<form action="/items" method="post">
  <input type="text" name="itemname"/>
  <input type="submit" value="Add"/>
</form>

We can also add the following form after each item's text:

<!-- use a specific item's URL for the action -->
<form action="/items/1234" method="post"/>
 <input type="hidden" name="_method" value="DELETE"/>
 <input type="submit" value="Delete"/>
</form>

which gives us this ugly beastie:

A Few of My Favorite Things:
Add a new favorite:
  • raindrops on roses
  • whiskers on kittens
  • bright copper kettles
  • warm woolen mittens
  • wild geese that fly with the moon on their wings

Now, one key item is that when you render the result page for adding an item, you can send a 201 (Created) response and say something like "item added", throwing in a link back to the list page. The whole HTML response might be nothing more than:

<html>
  <head></head>
  <body>
  <p>I created your item <a href="/favorites/2345">here</a>.</p>
  <p>All your items are <a href="/favorites/-/{owner}joe@example.com">here</a>.</p>
  </body>
</html>

We similarly want to render a confirmation page after a DELETE. This makes for an awkward user experience "add, ok, add, ok,..." but you'll notice that the back and forward buttons on your browser actually work without having to rePOST any exchanges.

[Side note: you could, instead of returning a success page, return a 302 that lands you back on the list page, which maybe gets you closer to what you wanted from a user experience, but this is precisely what will break your browser's back button and make you rePOST.]

Now you also have the interesting property that all the links on your site are safe (without side effects) GETs, and all the buttons are potentially destructive (write operations of one sort or another). I say only "potentially" because you might have a search form with action="get" to do a query, and not all of your POST/PUT/DELETEs will actually change anything.

At any rate, at this point, you have a functionally working website that someone could use, if somewhat awkwardly. Plus, if you have my frontend aesthetic design sensibilities, your users will have the pleasure of suppressing a gag reflex while using your site.

So let's spruce this up a little bit. Now, on the HTML page for the list of favorites, we can apply some Javascript. At first blush, we can hide the delete buttons until a mouseover, which cleans things up somewhat. But the real magic happens when we attach an AJAX handler to the delete buttons. Now the script can actually do the very same POST that the form would have done, and then check the HTTP status code, removing the item text from the DOM on success.

Suddenly, the user never leaves that list page, and we haven't had to change any of the rest of the API -- just the HTML representation of that list page. The AJAX call doesn't care if it gets HTML back (in this case), it just cares about the response code. Now we have the nice AJAXy experience we would expect, but oddly enough you still have a website that will work for people with Javascript disabled.

The last step towards finishing out your API is probably simply to make structured versions of your representations available (e.g. JSON or XML formats like Atom) with an optional parameter like "?format=json". Now all of your client-side functions can call URLs with the appropriate format on them and get well-structured data, and everyone else gets HTML.

Well, I guess that's the second to last step. You probably actually want to apply some graphic design and CSS to your site too...