Using Treemaps to Visualize Complex Information

I spent some time today digging into treemaps, a way to represent information visually as a series of nested rectangles whose colors are determined by an additional measurement. If that explanation sounds hopelessly abstruse, take a look at a world population treemap created using Honeycomb, enterprise treemapping software developed by the Hive Group:

World population treemap screenshot created by Honeycomb, the Hive Group's treemapping software

This section of the treemap shows the countries of Africa. The size of each rectangle shows a country's population relative to the other countries. The color indicates population density, ranging from dark green (most dense) to yellow (average) to dark orange (least dense). Hovering over a rectangle displays more information about that country.
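To make the size-and-color idea concrete, here's a minimal Python sketch of the simplest possible treemap layout, in which one rectangle is sliced into side-by-side strips. It is not how Honeycomb works internally (I don't know what layout algorithm it uses), and the region names and numbers are made-up placeholders, not real population data.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    population: float  # drives the size of the rectangle
    area_km2: float    # combined with population to get density (the color)


def slice_layout(regions, x, y, width, height):
    """Split one rectangle into vertical strips whose widths are
    proportional to each region's share of the total population."""
    total = sum(r.population for r in regions)
    rects = []
    offset = x
    for r in regions:
        strip_width = width * r.population / total
        rects.append((r.name, offset, y, strip_width, height))
        offset += strip_width
    return rects


def density_color(region, average_density):
    """Pick a color bucket by comparing a region's density to the average.
    Three buckets are an assumption; a real treemap uses a gradient."""
    density = region.population / region.area_km2
    if density > average_density:
        return "green"
    if density < average_density:
        return "orange"
    return "yellow"


# Hypothetical data, for illustration only.
regions = [Region("A", 50, 10), Region("B", 30, 10), Region("C", 20, 10)]
for name, rx, ry, rw, rh in slice_layout(regions, 0, 0, 100, 40):
    print(name, rx, ry, rw, rh)
```

Nesting comes from calling slice_layout again inside each strip with that region's children, usually alternating between vertical and horizontal slices.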

A treemap can be adjusted to make the size and color represent different things, such as geographic area instead of population. You also can zoom in to a section of the map, focusing on a specific continent instead of the entire world. The Honeycomb treemapping software offers additional customization, which comes in handy on a Digg treemap that displays the most popular links on the site organized by section.

By tweaking the Digg treemap, you can see the hottest stories based on the number of Diggs, number of Diggs per minute and number of comments. You also can filter out results by number of Diggs, number of Diggs per minute or the age of the links.

I don't know how hard it is to feed a treemap with data, but it seems like an idea that would be useful across many different types of information. As a web publisher, I'd like to see a treemap that compares the web traffic and RSS readership my sites receive with the ad revenue they generate. The Hive Group also offers sample applications that apply treemaps to the NewsIsFree news aggregator, Amazon.Com products, and iTunes singles. This was not a good day to be a Jonas Brother.

Sweden Declares War on My Web Server

Since 4 a.m. Friday, a computer at a Swedish IT company has made more than 1.5 million web requests to my web site URouLette, which links to random web pages stored in a MySQL database. They're coming in at a speed of 38 requests a second. My MySQL database server can't handle that many requests, so by Friday afternoon Workbench and a bunch of other sites slowed to a crawl as the web server began belching black smoke. A massive crash was imminent.

The last time somebody did this, I used the Linux utility iptables to reject all connections from the offending IP address, which solved the problem easy peasy lemon squeezy. This time around, iptables failed with a "Can't open dependencies file" error.

My new friend in Sweden appears to be building a database of web addresses by requesting a URouLette script that loads a random web page over and over. This is both obnoxious and dumb -- all links on URouLette come from the Open Directory Project and can be downloaded in one file. I've reduced the severity of the problem by sending the same link with every request -- the company's home page.

Flooding a web server with this many requests constitutes a denial of service attack. In the time I've composed this blog entry, another 100,000 requests have been made. Ironically, an employee of the company blogged recently that it was suffering its own attack, though on a much larger scale:

Tens of thousands of machines on the internet suddenly started trying to access a single host within the network. The IP they targeted has in fact never been publicly used as long as we've owned it (which is just a bit under two years) and it has never had any public services.

We have no clue whatsoever why someone would do this against us. We don’t have any particular services that anyone would gain anything by killing. We're just very puzzled.

Our "ISP", the guys we buy bandwidth and related services from, said they used up about 1 gigabit/sec worth of bandwidth and with our "mere" 10megabit/sec connection it was of course impossible to offer any services while this was going on.

This is a good time to mention that I never liked Bjorn Borg.

Fixing Page Not Found Errors on FeedBurner MyBrand Domains

Google has begun integrating FeedBurner, the service for publishing, tracking and promoting RSS feeds, into the rest of the Don't Be Evil Empire. As part of the move, FeedBurner users who are employing the MyBrand feature must make a change to the name service for their domain names.

MyBrand makes it possible to host your feeds on FeedBurner without losing any subscribers if you decide later to quit the service. I'm using it to host four feeds, including SportsFilter's RSS feed, on my own domains.

MyBrand domains used to point to feeds.feedburner.com, but they must be changed to a new subdomain of feedproxy.ghs.google.com. Each FeedBurner user is assigned a different subdomain. For SportsFilter, I updated it by revising one line in the BIND zone file for sportsfilter.com:

feeds IN CNAME subdomain.feedproxy.ghs.google.com.

The subdomain portion is based on your Google account.

This is supposed to be all that's required to make the move. Unfortunately, a giant honking bug in FeedBurner broke three of my four MyBrand domains this morning. Users received a 404 "Page Not Found" error when they tried to access my feeds. I found a workaround on Google's FeedBurner help site that explains how to fix the problem:

  • Log in to FeedBurner with your Google account.
  • Open the MyBrand page.
  • Remove the broken domain name and click Save.
  • Add the domain name back again and click Save.

Who Belongs in the Brat Pack?

March 1987 cover of Playgirl Magazine featuring Judd Nelson

I don't spend enough time tackling the big questions on Workbench, so I'd like to rectify that today by addressing a subject of great import among those of us who came of age in the '80s: Are James Spader and Robert Downey Jr. part of the Brat Pack?

The term Brat Pack was coined by journalist David Blum in the June 10, 1985, issue of New York magazine. His cover story Hollywood's Brat Pack describes a world, now lost, in which attractive young women fought for the right to engage in consequence-free heterosexual coitus with Judd Nelson.

If Rob Lowe seemed to be inviting all too much attention from the girls, Judd Nelson acted as though he wanted nothing to do with it. His fame, too, helped attract them -- they recognized his tough-guy looks from his role as the wrong-way kid in The Breakfast Club and sought his attention. But as Alice sat down in an empty chair next to him, Judd Nelson announced to anyone within earshot, including Alice, "There is a line. When someone crosses the line, I get angry. And when someone sits down at the table, they have crossed the line. You can let them get close" -- he looked around at Alice and the swarm of girls -- "but you can't let them sit down." ...

Everyone in Hollywood differs over who belongs to the Brat Pack. That is because they are basing their decision on such trivial matters as whose movie is the biggest hit, whose star is rising and whose is falling, whose face is on the cover of Rolling Stone and whose isn't. And occasionally, some poor, misguided fool bases his judgment on whose talent is the greatest.

Only a fool would attempt to judge Brat Pack members on the basis of acting talent. The editors of the Pack's Wikipedia entry have spent a great deal of time defining eligibility for membership:

Appearance in one, or both, of the ensemble casts of John Hughes' The Breakfast Club and Joel Schumacher's St. Elmo's Fire is often cited as a prerequisite for being a core Brat Pack member.[10][11][12] With this criterion, the most commonly cited members include Emilio Estevez, Anthony Michael Hall, Rob Lowe, Andrew McCarthy, Demi Moore, Judd Nelson, Molly Ringwald and Ally Sheedy.[5][6][13][14][15][16] Conspicuously absent from most lists is Mare Winningham, the only principal member of either cast who never starred in any other films with any other cast mates.

When there are nine citations in just three sentences, you know that a major bloodbath has taken place behind the scenes at Wikipedia. The victorious editors, clambering over the corpses of their opponents with their cold, dead hands still clutching keyboards, have taken a conservative position on membership that relegates Spader and Downey to "close contributor" status. Jamie Currie, the web's preeminent Brat Pack scholar, also uses Wikipedia's eight members and consigns Downey and John Cusack to "Possibly Pack" status.

This is a crime against the '80s. I recently caught the tail end of Less Than Zero, a 1987 film I've seen in random order over the years while channel surfing and reassembled in my brain. That movie has everything we've come to associate with the great works of the Brat Pack: a lily white cast, self-absorbed young protagonists who yearn for more interesting personal problems, big haired women in shoulder pads and absolutely no awards for acting.

Press photo from the 1987 movie Less Than Zero starring Jami Gertz, Robert Downey Jr. and Andrew McCarthy

I take the liberal view of Brat Pack membership. If you've starred in at least two films with a lead actor from Breakfast Club or St. Elmo's Fire and you were younger than 30 at the time -- the Harry Dean Stanton exclusion -- you ran with the Pack.

James Spader starred with McCarthy and Ringwald in Pretty in Pink, McCarthy in Less Than Zero and Mannequin and Lowe in Bad Influence. Robert Downey Jr. starred with Hall in Weird Science and Johnny Be Good, Hall and Nelson in Hail Caesar and Ringwald in The Pick-Up Artist. He also starred with Spader in Tuff Turf and Less Than Zero, which counts once we've admitted Spader into the group.

The decision to admit these actors has far-reaching consequences that become clear when you spend too much time on IMDB's People Working Together search page.

The appearance of three or more Brat Pack members in a film grants it first-order status alongside Breakfast Club and St. Elmo's Fire, so Less Than Zero, Pretty in Pink and Hail Caesar also can bestow membership upon their stars.

Jami Gertz starred with Downey, McCarthy and Spader in Less Than Zero, Hall and Ringwald in Sixteen Candles and Spader in Endless Love. Count her in.

John Cusack starred with Gertz, Hall and Ringwald in Sixteen Candles, Lowe and McCarthy in Class, Spader in True Colors and Bob Roberts and Moore in One Crazy Summer. He's way, way in.

Gertz and Cusack bring Sixteen Candles and Class to first-order status.

Charlie Sheen starred with Cusack, Gertz, Hall and Ringwald in Sixteen Candles, Estevez and Nelson in Never on Tuesday, Estevez and Moore in Wisdom, Estevez in Young Guns and Men at Work, Cusack in Eight Men Out and Being John Malkovich and Spader in Wall Street. He's the mayor of in.

Because Sheen and Estevez are brothers who often work together, sometimes on films that no one outside their immediate family went to see, there's potential for chaos here. The first-order clause must be modified to exclude movies in which two of the three Brat Pack stars are siblings. (My apologies, John and Joan Cusack.)

So if you starred in at least two films with a lead actor from Breakfast Club, St. Elmo's Fire, Less Than Zero, Pretty in Pink, Hail Caesar, Sixteen Candles or Class, and you were younger than 30 at the time, and the film did not star two siblings with fewer than two otherwise eligible members, you belong to an imaginary group of middle-aged actors whose association will circumscribe your career until your dying day.

Oddly, I have yet to find a way to admit Kevin Bacon.

Coming soon on Workbench: Who's the better Darrin Stephens?

Judd Nelson forever!

Fixing a 'Recompile with -fPIC' Error in MySQL

I run my web servers by compiling the most important components from source code, which makes it possible for me to add security fixes more quickly and fine-tune my installations of Apache, MySQL and PHP. While compiling the new release PHP 5.2.8 this weekend, the make process failed with this error:

/usr/bin/ld: /usr/mysql/lib/mysql/libz.a(compress.o): relocation R_X86_64_32 against 'a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/mysql/lib/mysql/libz.a: could not read symbols: Bad value

Naturally, I had absolutely no idea what this meant.

The file libz.a is part of the Zlib compression library, which apparently is included in MySQL 5.0. A Google search for the error message uncovered a bunch of people suffering the same problem I encountered when compiling programs on Linux. The best explanation I found was a Gentoo Linux page on how to fix -fPIC errors. Unfortunately, none of Gentoo's tips worked for me.

Through trial and error (and error and error), I finally solved the problem by compiling a new copy of Zlib and specifying that it create a Unix shared library by passing the -s option to its configure script.

Next, I added the option --with-zlib-dir=/usr/zlib when running configure to prepare PHP for installation. This didn't work until I figured out one last obstacle -- the Zlib option must be placed before the --with-mysql option. Otherwise, PHP tries to use the copy of Zlib included with MySQL.
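The two steps can be sketched as a small Python script that only assembles the command lines and checks their order; it doesn't run anything. The /usr/zlib directory and the -s flag come from the description above. The --prefix option for Zlib, the /usr/mysql path (borrowed from the linker error) and all of the function names are assumptions for illustration, not a record of the exact commands used.

```python
def zlib_configure_args(prefix="/usr/zlib"):
    """Arguments for Zlib's configure script.
    -s asks for a shared library; --prefix is an assumed install location."""
    return ["./configure", "-s", f"--prefix={prefix}"]


def php_configure_args(zlib_dir="/usr/zlib", mysql_dir="/usr/mysql"):
    """Arguments for PHP's configure script, with the Zlib option
    deliberately listed before the MySQL option."""
    return [
        "./configure",
        f"--with-zlib-dir={zlib_dir}",
        f"--with-mysql={mysql_dir}",
    ]


def zlib_precedes_mysql(args):
    """Return True if --with-zlib-dir appears before --with-mysql."""
    zlib_at = next(i for i, a in enumerate(args) if a.startswith("--with-zlib-dir"))
    mysql_at = next(i for i, a in enumerate(args) if a.startswith("--with-mysql"))
    return zlib_at < mysql_at


print(" ".join(zlib_configure_args()))
print(" ".join(php_configure_args()))
```

A check like zlib_precedes_mysql is overkill for a one-off build, but it documents the ordering quirk for the next time the build script gets edited.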

Everything now compiles and runs successfully. So until the next time I try to install something, I can return my ego to its upright position. My new Linux technique is unstoppable.

New Word: Cupertino

There's a new meaning for the word cupertino that has nothing to do with the city in California, according to the etymology site World Wide Words. A cupertino is any word that's produced when a lazy editor accepts spellcheck suggestions without reviewing them, as in this press release:

In August, nGenera announced version 8.1 of its Talisma Knowledgebase, saying the release added enchantments to its search functionality through an OEM agreement with enterprise search vendor Autonomy.

The name comes from Microsoft Word 97's suggestion that Cupertino is the proper spelling of cooperation. "European writers who omitted the hyphen from co-operation (the standard form in British English) found that their automated checkers were turning it into Cupertino," Michael Quinion writes.

In July, the Christian media site OneNewsNow turned the sprinter Tyson Gay into a human cupertino. In an attempt to reclaim the word "gay," for purposes as yet unknown, the site was automatically replacing it with "homosexual" in news stories. This resulted in several articles about the accomplishments of Tyson Homosexual, one of the fastest men alive. "He was ahead of American Tyson Homosexual from the get-go and beat Homosexual easily," one story states.
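A blanket find-and-replace is all it takes to produce this kind of mistake. Here's a minimal Python sketch of such a filter. It's a guess at the general technique, not OneNewsNow's actual code, and the function name is made up.

```python
import re


def naive_replace(text, old, new):
    """Replace every whole-word match of `old`, ignoring case and
    keeping an initial capital. Nothing here knows about surnames."""
    def swap(match):
        word = match.group(0)
        return new.capitalize() if word[0].isupper() else new

    pattern = rf"\b{re.escape(old)}\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)


print(naive_replace("Tyson Gay won the race.", "gay", "homosexual"))
# prints "Tyson Homosexual won the race."
```

The filter has no way to tell a word from a proper noun, which is exactly the review step a human editor is supposed to provide.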

Finding Updated Feeds with Simple Update Protocol

FriendFeed is working on Simple Update Protocol (SUP), a means of discovering when RSS and Atom feeds on a particular service have been updated without checking all of the individual feeds. Feeds indicate that their updates can be tracked with SUP by adding a new link tag, as in this example from an Atom feed:

<link rel="http://api.friendfeed.com/2008/03#sup" href="http://friendfeed.com/api/sup.json#53924729" type="application/json" />

The rel attribute identifies the link as a SUP link. The href attribute contains the URL of a JSON document that identifies updated feeds by their SUP-IDs; the fragment after the # (53924729 in this example) is the feed's own SUP-ID. There's also a type attribute that contains "application/json" to indicate the content type at the linked resource.
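Here's a minimal Python sketch of how a feed reader might pull the SUP information out of an Atom feed. It assumes the SUP-ID is the fragment after the # in the href, as in the example above, and that the link element is in the standard Atom namespace. The function name is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

SUP_REL = "http://api.friendfeed.com/2008/03#sup"
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def find_sup_link(feed_xml):
    """Return (sup_url, sup_id) for the first SUP link in an Atom feed,
    or None if the feed doesn't advertise one."""
    root = ET.fromstring(feed_xml)
    for link in root.iter(ATOM_NS + "link"):
        if link.get("rel") == SUP_REL:
            # Split "http://...sup.json#53924729" into URL and fragment.
            url, sup_id = urldefrag(link.get("href", ""))
            return url, sup_id
    return None


feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="http://api.friendfeed.com/2008/03#sup"
        href="http://friendfeed.com/api/sup.json#53924729"
        type="application/json" />
</feed>"""

print(find_sup_link(feed))
```

A crawler could then fetch the SUP URL once and look for the IDs of all the feeds it tracks on that service, instead of requesting each feed separately.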

Developer Paul Buchheit makes the case for the protocol on FriendFeed's blog. "[O]ur servers now download millions of feeds from over 43 services every hour," he writes. "One of the limitations of this approach is that it is difficult to get updates from services quickly without FriendFeed's crawler overloading other sites' servers with update checks."

My first take on the idea is that defining a relationship with a URI is too different from standard link relationships in HTML, which employ simple words like "previous", "next", and "alternate". When new relationships have been introduced, they have followed this convention, as Google did when it proposed nofollow.

Also, neither RSS 1.0 nor RSS 2.0 allows more than one link tag in a feed, so the SUP tag would be valid only in Atom feeds.

Both of these concerns could be addressed by identifying the SUP provider with a new namespace, as in this hypothetical example:

<rss xmlns:sup="http://friendfeed.com/api/sup/">
<channel>
<sup:provider href="http://friendfeed.com/api/sup.json#53924729" type="application/json" />
...

Six Apart has offered an alternate solution that seems more likely to work for large hosting sites and constant feed-checking services like FriendFeed. The company produces an update stream of Atom data indicating an update on any of the thousands of TypePad or Vox blogs.

Another potential solution would be to borrow the technique used by Radio UserLand blogs to identify a list of recently updated sites: Add a category tag to the feed with the value "rssUpdates" and a domain attribute with the URI of XML data containing the list:

<category domain="http://rpc.weblogs.com/shortChanges.xml">rssUpdates</category>

The XML data is in the weblog changes format used by Weblogs.Com.