Monday, December 01, 2008

The early reviews of Amazon's CloudFront are in. The question is whether Akamai will do business differently now. Is Amazon going to eat Akamai's business from the bottom up, much like Windows NT and Linux did (in different ways) for the UNIX workstation market? Akamai's CDN includes real-time communications like VoIP and conferencing. I wonder if Amazon wants into that space too.

Tuesday, November 18, 2008

The opening salvo in the coming war between Amazon and Akamai.

Wednesday, November 05, 2008

That went well, at least for pollster.com. Predicted:

Actual:

Saturday, September 27, 2008

Went hiking in the mountains today with the folks and my nephew. The colors are changing beautifully.


Saturday, September 20, 2008

Rereading an old networking textbook this afternoon, I ran across this passage:

There is frequently a trade-off between some sort of optimality and scalability. When hierarchy is introduced, information is hidden from some nodes in the network, hindering their ability to make perfectly optimal decisions. However, information hiding is essential to scalability, since it saves all nodes from having global knowledge. It is invariably true in large networks that scalability is a more pressing design goal than perfect optimality.

Computer Networks, A Systems Approach, 3rd ed., p. 318
Larry L. Peterson and Bruce S. Davie

This reminds me of an idea a friend and I have been kicking around lately: software can be optimal only within its own domain, where a domain amounts to a layer in some software stack. Kernels optimize, say, the mappings between virtual and physical pages. Compilers optimize register allocation. Application writers optimize data structures and algorithms. And that's about as good as it gets. As a counterexample, if an application writer were also to have control over and knowledge of register allocation, page tables, I/O buffering, and so on down the stack, she might be able to create a maximally optimal application, but it would come at the cost of the scalability of her own productivity. It would take her longer. So efficient use of programmer time requires layering.

At first glance I thought that layering also requires software reuse, but perhaps it doesn't. An ISV, for example, could ship an application for which it wrote a one-off operating system and compiler, with layering dictated by the fact that the application, operating system, and compiler teams were completely separate and communicated only by design specs. But that's inefficient as well. So the spirit of the law of layering requires software reuse, which in turn requires that each layer be as general as possible.

All of which reminds me of the argument David Clark made back in the 80s about layering in networks. I suppose there are equivalent papers about layering in software architecture as well, the most likely (canonical) one being Douglas McIlroy's Mass Produced Software Components.

Thursday, September 04, 2008

Sometimes -- okay, usually -- it seems the Republic Party (might as well return the favor for "Democrat Party") thinks drilling constitutes a comprehensive energy policy. And as if that wasn't apparent before, it sure was clear after Giuliani's speech last night in St. Paul:

John McCain will bring about the change that will create jobs and prosperity…let’s talk specifics…John McCain will lower taxes so our economy can grow. He will reduce government spending to strengthen our dollar. He will expand free trade so we can be even more competitive. He will lead us to energy independence so we can be free of foreign oil. And he’ll do it with an all-of-the-above approach, including nuclear power and off-shore drilling.

[ chants of “Drill, baby, drill”]

Giuliani laughs as the audience chants.

Thursday, August 28, 2008

In case you're wondering about the differences among select(2), poll(2), and the device-based polling approaches, they all do fundamentally the same thing -- notify the program when a socket is ready for reading or writing or when an error occurs on a socket. Select() came first in the history of UNIX, but it has a basic limitation on the number of sockets that can be monitored: the descriptor set is a fixed-size bitmap (FD_SETSIZE), which is 1024 in most cases unless you recompile. Poll() solves this problem; it requires the programmer to allocate the array of socket structures (struct pollfd) that she passes to the system call, so in theory you can monitor an arbitrary number of sockets. (Poll() is also superior to select() in that it gives the programmer finer control over the kinds of events to watch for.)

The problem with poll() only really shows up on servers that handle an enormous number of sockets. That problem is, simply, that the array of socket structures must be copied from user space to kernel space and back again every time you call poll(), and the program then has to scan the whole array to find the sockets that changed. So far as I recall, this performance bottleneck came to light in the early 2000s when people were doing research on the scalability of Linux, but that's just a vague memory. The first implementation of a fix could have been done in FreeBSD or Solaris.

At any rate, the user-kernel-user copy problem is the reason for what I call the device-based approaches (because they involve a file in /dev, such as /dev/poll on Solaris; Linux's epoll started out as /dev/epoll before it became a set of system calls). With epoll (as it's known on Linux), the program tells the operating system which particular sockets to monitor, and the operating system tells the program when a particular socket has changed. It only tells the program that that particular socket changed. It doesn't say, "Hey, here is the entire array of sockets you care about, and it's up to you to examine the structures to figure out which ones changed." So device-based polling is useful if you are polling a large number of sockets. Otherwise, plain old poll() should work just fine.
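
To make the contrast concrete, here is a rough sketch in C for Linux. The function names (count_readable, watch_fds, next_readable) are made up for illustration, and the error handling is minimal. The poll() version rebuilds and rescans the whole array on every call; the epoll version registers the descriptors once and then gets back only a descriptor that is ready.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* poll(): the caller hands the kernel the whole array on every call and
 * then walks the whole array again to see which entries changed.
 * Returns the number of readable descriptors, 0 on timeout, -1 on error. */
int count_readable(const int *fds, int nfds, int timeout_ms)
{
    assert(fds != NULL && nfds > 0);
    struct pollfd *pfds = calloc((size_t)nfds, sizeof *pfds);
    if (pfds == NULL)
        return -1;
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;        /* only interested in "readable" */
    }
    int readable = 0;
    int rc = poll(pfds, (nfds_t)nfds, timeout_ms);
    for (int i = 0; rc > 0 && i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            readable++;
    free(pfds);
    return rc < 0 ? -1 : readable;
}

/* epoll: register the descriptors with the kernel once.
 * Returns the epoll descriptor, or -1 on error. */
int watch_fds(const int *fds, int nfds)
{
    assert(fds != NULL && nfds > 0);
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Each wait reports one descriptor that is actually ready; nothing is
 * recopied or rescanned. Returns that descriptor, or -1 on timeout/error. */
int next_readable(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```

A server would call watch_fds() once at startup and next_readable() in its main loop, whereas count_readable() pays the copy-and-scan cost every time around.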

Friday, August 15, 2008

Thinking about domain-specific languages (DSLs) ... Generally, it is easier to keep track of the role of each argument to some function/method in languages with keyword parameters (e.g. Python, Ruby). Names are easier to remember than positions in a parameter list. In a language without keyword parameters, how do you make it easy to remember which parameter is what (putting aside for the moment the usefulness of IDEs in displaying the function/method signature for you)? Here's an example of how to do that in Java. Take this function:

public void validateState(PBXConference conference, int added, int connecting, int connected, int disconnecting, int disconnected) {
    assertEquals(conference.added(), added);
    assertEquals(conference.connecting(), connecting);
    assertEquals(conference.connected(), connected);
    assertEquals(conference.disconnecting(), disconnecting);
    assertEquals(conference.disconnected(), disconnected);
}

A client would invoke it like:
validateState(conference, 2, 1, 1, 0, 0);

but that sequence of numbers doesn't help the readability of the test. So instead, while it's a bit more verbose, we can change the function definition to:
public class ConferenceStateValidator {
    private PBXConference conference;

    private ConferenceStateValidator(PBXConference conference) {
        this.conference = conference;
    }

    public ConferenceStateValidator added(int n) {
        assertEquals(conference.added(), n);
        return this;
    }

    public ConferenceStateValidator connecting(int n) {
        assertEquals(conference.connecting(), n);
        return this;
    }

    public ConferenceStateValidator connected(int n) {
        assertEquals(conference.connected(), n);
        return this;
    }

    public ConferenceStateValidator disconnecting(int n) {
        assertEquals(conference.disconnecting(), n);
        return this;
    }

    public ConferenceStateValidator disconnected(int n) {
        assertEquals(conference.disconnected(), n);
        return this;
    }

    public static ConferenceStateValidator validateState(PBXConference conference) {
        return new ConferenceStateValidator(conference);
    }
}

And the client (assuming they've statically imported validateState) can do:
validateState(conference).added(2)
    .connected(1).connecting(1)
    .disconnected(0).disconnecting(0);

which is much cleaner.

Wednesday, August 13, 2008

From Mozilla Labs, an idea whose time is coming. I was struck by this bit though.

Our next step is to gather feedback on the prototype and the ideas behind it. We want to know if the concept has promise and is worth pursuing further. We’re particularly interested in feedback on how messaging might fit into the browsing experience and if there are other interfaces (or refinements to the two interfaces built into the prototype) that would make it easier for users to have online conversations.

We’re still considering what may come after that, but possible extensions to the Snowl prototype include:

  • support for additional message sources, e.g. Facebook, AIM, Google Talk, etc.;
  • an interface for writing and sending messages to enable true two-way conversations;

Since Facebook and Google Talk already support or are going to support XMPP, the only question is whether Snowl will support it too. Other chat services should just follow suit. In other words, there's no point in mentioning "additional message sources" when all of those sources use XMPP. Just mention XMPP!

Thursday, August 07, 2008

Maybe a couple of months ago some blogger whose feed is aggregated at Planet Intertwingly (I don't remember who) wrote a post summarizing his complaints about Erlang. One complaint was about extracting a value from a tuple. Say you assign a tuple to some variable, as in

1> X = {a, 10}.
{a,10}

and you want the value of the second element of the tuple. How do you do that? Conventionally,

2> {_, Y} = X.
{a,10}
3> Y.
10

Now the variable Y has the value 10. (In Erlang, the underscore is the anonymous variable.) If a tuple has a large number of fields or contains nested tuples, such tuple-unpacking statements are unwieldy. And that's what the guy objected to. It's too verbose and easy to botch.

Fortunately, pattern matching provides a simple way to work around this. Define a function that matches the tuple (particularly the first atom):

4> ValueOf = fun({a, Value}) -> Value end.
#Fun
5> ValueOf(X).
10

If you define one such function for each element in a tuple, you get accessor functions for the tuple and you don't have to keep writing long expressions with a lot of anonymous variables.

Saturday, August 02, 2008

Here's the second GPS email for Bob's hike:

SPOT Check OK. All is well !! Bob
ESN:0-7425741
Latitude:43.9801
Longitude:-121.8084
Nearest Location: Elk Lake, United States
Distance: 0 km(s)
Time:07/31/2008 22:35:38 (US/Mountain)
http://maps.google.com/maps?f=q&hl=en&geocode=&q=43.9801,-121.8084&ie=UTF8&z=12&om=1

A while ago I tweeted about the possibility of the growth of more continental/regional/local manufacturing in response to the growing costs of transportation. Twitter has some problems preventing me from getting the permalink; even so, it sort of goes without saying that that will happen. Anyway, the Times now has an article about the phenomenon.

Thursday, July 31, 2008

My stepdad Bob and stepsister Heather are hiking a section of the Pacific Crest Trail this month. Bob signed up for a location service that emails his friends his current location and a simple status message. I just got the first one. Here's what it looks like, including the link to the map:

SPOT Check OK. All is well !! Bob
ESN:0-7425741
Latitude:44.1548
Longitude:-121.8187
Nearest Location: Belknap Springs, United States
Distance: 19 km(s)
Time:07/31/2008 08:04:40 (US/Mountain)
http://maps.google.com/maps?f=q&hl=en&geocode=&q=44.1548,-121.8187&ie=UTF8&z=12&om=1

Wednesday, July 30, 2008

Why are there such long delays in anaconda when upgrading from Red Hat Enterprise Linux 4.x to 5.0? For a while there, restorecon(1) and find(1) were taking up most of the CPU, which I assume had something to do with restoring default SELinux security contexts on files, but other than that it's just been anaconda in the process table, sometimes with awkward, crickets-chirping silences.

Tuesday, June 03, 2008

Funny thing, I was just thinking that it's not possible to unit test C programs because, for example, a system call like read(2) just takes primitive arguments, which doesn't give you much opportunity for mocking. Then I had a small epiphany: use LD_PRELOAD to mock out external functions, including system calls.
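
Here is a minimal sketch of what such a shim might look like on Linux. The descriptor number MOCK_FD and the canned payload are invented for the example; a real test would pick whatever it needs. The shim defines read() with the same signature as the libc wrapper, returns canned bytes for the mocked descriptor, and passes everything else through to the kernel with syscall(2). A common alternative for the pass-through is to look up the real function with dlsym(RTLD_NEXT, "read").

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: the descriptor number the code under test reads from. */
#define MOCK_FD 999

/* Hypothetical canned data handed back in place of real input. */
static const char canned[] = "hello";

/* Same name and signature as libc's read(2) wrapper. When this object is
 * preloaded, the dynamic linker resolves calls to read() here first. */
ssize_t read(int fd, void *buf, size_t count)
{
    assert(buf != NULL);

    /* Anything other than the mocked descriptor goes to the kernel. */
    if (fd != MOCK_FD)
        return (ssize_t)syscall(SYS_read, fd, buf, count);

    /* Copy at most `count` bytes of the canned payload, like a short read. */
    size_t n = sizeof canned - 1;
    if (count < n)
        n = count;
    memcpy(buf, canned, n);
    return (ssize_t)n;
}
```

Assuming the file is saved as mockread.c (a placeholder name), it could be built with something like cc -shared -fPIC -o libmockread.so mockread.c and used by running the test binary as LD_PRELOAD=./libmockread.so ./test_binary. The program under test needs no changes, which is the appeal.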