Sharing Blog Posts on Your Facebook Profile

Over the past few months, I've gotten back into contact with more than a dozen old friends and coworkers through Facebook. After blogging for nine years, I prefer hanging out here on Workbench over social networking sites, but I'm beginning to feel like an anachronism. It's easier for people to keep up with their BFFs on sites like Facebook than to visit a bunch of personal blogs, even with the help of RSS and a feed reader. I recently began linking my posts on Facebook using Simplaris Blogcast, a Facebook application that posts the title and link of blog posts to your Facebook profile. You can manually post items from your blog, pull them automatically from an RSS feed or ping Simplaris with each new post.

For reasons unknown, Simplaris Blogcast stopped pulling items automatically from my feed a month ago. To get automatic posts working again, I've updated my weblog ping library for PHP so that it can ping Blogcast each time I post on Workbench.

Blogcast uses the same ping protocol as Weblogs.Com. Before you can use the Weblog-Pinger library in a PHP script, you must add Blogcast to your Facebook account and retrieve your ping info, which includes a ping URL containing a special ID unique to your account. In the example URL, the ID is 0dd8dfad5c842b600091ba. You'll need this ID when sending a ping, as in this example code:

$pinger = new Weblog_Pinger();
$pinger->ping_simplaris_blogcast($post_title, $post_link, "0dd8dfad5c842b600091ba");

Once Blogcast has successfully received a ping, the application setting Update Mode will have the Ping Automatic selection chosen.
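For anyone curious what a Weblogs.Com-style ping looks like on the wire, here's a minimal sketch in Java. It only builds the XML-RPC request body for a weblogUpdates.ping call, which takes the weblog's name and URL. The class name, method names and sample values are my own placeholders. How Blogcast reads the account ID isn't shown here, so the sketch leaves that to the ping URL you'd POST the body to.

```java
final class WeblogPing {
    /** Escapes the characters that are unsafe inside XML text. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the XML-RPC request body for a weblogUpdates.ping call. */
    static String buildPing(String name, String url) {
        return "<?xml version=\"1.0\"?>\n"
            + "<methodCall>\n"
            + "  <methodName>weblogUpdates.ping</methodName>\n"
            + "  <params>\n"
            + "    <param><value><string>" + escapeXml(name) + "</string></value></param>\n"
            + "    <param><value><string>" + escapeXml(url) + "</string></value></param>\n"
            + "  </params>\n"
            + "</methodCall>\n";
    }

    public static void main(String[] args) {
        // Placeholder title and link; a real ping would POST this body to the ping URL
        System.out.print(buildPing("Example Post", "http://example.com/post/1"));
    }
}
```

Escaping the title matters because post titles often contain ampersands, which would otherwise make the request invalid XML.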

The code's available under the open source GPL license. If it worked, this post will show up on my Facebook profile.

Obama's White House Adopts Atom Format

I became the first subscriber on Bloglines to the feed for the new White House web site, which launched at 12:00 p.m. as Barack Obama became the 44th president of the United States. As a syndication dork, I was interested to discover that the feed employs Atom as its format:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="">
  <title>White Blog Feed</title>
  <link href="" />
    <title>A National Day of Renewal and Reconciliation</title>
    <link href="" />
    <summary>President Barack Obama's first proclamation.</summary>

The Atom feed passes the Feed Validator, but there are four issues that trigger warning messages:

  • Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII" [help]
  • Missing atom:link with rel="self" [help]
  • Two entries with the same id: urn:uuid:ca4baafc-b6bc-45e5-9144-79c5289d9518 (4 occurrences) [help]
  • Two entries with the same value for atom:updated: 2009-01-20T17:01:00Z [help]

When he has the time, President Obama can address these issues pretty quickly.

First, the XML declaration should reflect the actual encoding transmitted by the White House server:

<?xml version="1.0" encoding="US-ASCII"?>

Alternatively, the feed should be published using the UTF-8 encoding.

Next, the feed's link element must include a rel="self" attribute indicating that it's the feed's own URL:

<link rel="self" href="" />

Finally, steps should be taken so that each feed entry has a unique ID. I recommend using the tag URI format, which for the White House could produce id elements like this:


The final number in the id element should be a unique number, such as the index number of a blog entry.
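Here's a rough Java sketch of that recommendation. The tag URI format is tag:authority,date:specific, so the build method glues those pieces together with an entry's index number on the end. The example.org authority, the "blog" prefix and the class name are all stand-ins I made up, not anything the White House uses. The second method flags IDs that appear more than once, which is the condition behind the validator's duplicate-id warning.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TagUri {
    /** Builds a tag URI of the form tag:authority,date:prefix.index. */
    static String build(String authority, String date, String prefix, long index) {
        return "tag:" + authority + "," + date + ":" + prefix + "." + index;
    }

    /** Returns the IDs that occur more than once, in the order they were first repeated. */
    static Set<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical authority and entry numbers, for illustration only
        List<String> ids = Arrays.asList(
            build("example.org", "2009", "blog", 1),
            build("example.org", "2009", "blog", 2),
            build("example.org", "2009", "blog", 2));
        System.out.println(ids);
        System.out.println("Duplicates: " + findDuplicates(ids));
    }
}
```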

The new White House site promises more feeds to come, but describes them as RSS feeds:

RSS is an acronym for Really Simple Syndication or Rich Site Summary. It is an XML-based method for distributing the latest news and information from a website that can be easily read by a variety of news readers or aggregators.

Either this is an error -- Atom feeds are not in RSS format, of course -- or Obama's effort towards national reconciliation includes the combatants in the RSS/Atom war.

Creating PHP Web Sites with Smarty

I recently relaunched SportsFilter using the site's original web design on top of new programming, replacing a ColdFusion site with one written in PHP. The project turned out to be the most difficult web application I've ever worked on. For months, I kept writing PHP code only to throw it all out and start over as it became a ginormous pile of spaghetti.

Back in July, SportsFilter began crashing frequently and neither I nor the hosting service were able to find the cause. I've never been an expert in ColdFusion, Microsoft IIS or Microsoft SQL Server, the platform we chose in 2002 when SportsFilter's founders paid Matt Haughey to develop a sports community weblog inspired by MetaFilter. Haughey puts a phenomenal amount of effort into the user interface of his sites, and web designer Kirk Franklin made a lot of improvements over the years to SportsFilter. Users liked the way the site worked and didn't want to lose that interface. After I cobbled together a site using the same code as the Drudge Retort, SportsFilter's longtime users kept grasping for a delicate way to tell me that my design sucked big rocks.

PHP's a handy language for simple web programming, but when you get into more complex projects or work in a team, it can be difficult to create something that's easy to maintain. The ability to embed PHP code in web pages also makes it hard to hand off pages to web designers who are not programmers.

I thought about switching to Ruby on Rails and bought some books towards that end, but I didn't want to watch SportsFilter regulars drift away while I spent a couple months learning a new programming language and web framework.

During the Festivus holidays, after the family gathered around a pole and aired our grievances, I found a way to recode SportsFilter while retaining the existing design. The Smarty template engine makes it much easier to create a PHP web site that enables programmers and web designers to work together without messing up each other's work.

Smarty works by letting web designers create templates for web pages that contain three things: HTML markup, functions that control how information is displayed, and simple foreach and if-else commands written in Smarty's template language instead of PHP. Here's the template that displays SportsFilter's RSS feed:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss xmlns:dc="" xmlns:atom="" version="2.0">
    <description>Sports community weblog with {$member_count} members.</description>
    <atom:link rel="self" href="" type="application/rss+xml" />
{foreach from=$entries item=entry}
      <pubDate>{$entry.timestamp|date_format:"%a, %d %b %Y %H:%M:%S %z"}</pubDate>
      <guid isPermaLink="false">,2002:weblog.{$entry.dex}</guid>

The Smarty code in this template is placed within "{" and "}" braces. The foreach loop pulls rows of weblog entries from the $entries array, storing each one in an $entry array. Elements of the array are displayed when you reference them in the template -- for example, referencing the element of $entry that holds the author's username displays that name.

The display of variables can be modified by functions that use the "|" pipe operator. The escape function, used in {$entry.title|escape:'html'}, formats characters to properly encode them for use in an XML format such as RSS. (It's actually formatting them as HTML, but that works for this purpose.)

Because Smarty was developed with web applications in mind, there are a lot of built-in functions that make the task easier. SportsFilter displays dates in a lot of different forms. In my old code, I stored each form of a date in a different variable. Here, I just store a date once as a Unix timestamp value and call Smarty's date_format function to determine how it is displayed.
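The store-once, format-at-display idea isn't specific to Smarty. As a comparison, here's a small Java sketch that keeps a date as a Unix timestamp and formats it two ways on demand. The class and method names are mine, and the first pattern is only a rough Java counterpart to the strftime-style string "%a, %d %b %Y %H:%M:%S %z" used in the template.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class FeedDates {
    // Rough counterpart to the strftime pattern "%a, %d %b %Y %H:%M:%S %z"
    private static final DateTimeFormatter RSS_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);

    // A shorter form, such as a page might show next to a post
    private static final DateTimeFormatter SHORT_DATE =
        DateTimeFormatter.ofPattern("MMM d, yyyy", Locale.US);

    /** Formats a Unix timestamp (seconds) as an RSS-style date in the given zone. */
    static String rssDate(long unixSeconds, ZoneId zone) {
        return RSS_DATE.format(Instant.ofEpochSecond(unixSeconds).atZone(zone));
    }

    /** Formats the same timestamp as a short, human-readable date. */
    static String shortDate(long unixSeconds, ZoneId zone) {
        return SHORT_DATE.format(Instant.ofEpochSecond(unixSeconds).atZone(zone));
    }

    public static void main(String[] args) {
        long timestamp = 0L; // the Unix epoch, used here only as a sample value
        System.out.println(rssDate(timestamp, ZoneOffset.UTC));
        System.out.println(shortDate(timestamp, ZoneOffset.UTC));
    }
}
```

Only the timestamp is stored. Each display form is a formatter applied when the page is built, so adding a new date style never touches the data.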

Smarty makes all session variables, cookies, and the request variables from form submissions available to templates. In SportsFilter, usernames are in $smarty.session.username and submitted comments are in $smarty.request.comment. There also are a few standard variables such as $smarty.now, the current time.

To use Smarty templates, you write a PHP script that stores the variables used by the template and then displays the template. Here's the script that displays the RSS feed:

// load libraries
// (the Smarty setup and display lines are assumed; the file and template names are hypothetical)
require_once('Smarty.class.php');
require_once('sportsfilter.php');
$smarty = new Smarty();
$spofi = new SportsFilter();

// load data
$entries = $spofi->get_recent_entries("", 15, "sports,");
$member_count = floor($spofi->get_member_count() / 1000) * 1000;

// make data available to templates
$smarty->assign('spofi', $spofi);
$smarty->assign('entries', $entries);
$smarty->assign('page_title', "SportsFilter");
$smarty->assign('member_count', $member_count);

// display output
header("Content-Type: text/xml; charset=ISO-8859-1");
$smarty->display('rss.tpl');

Smarty compiles web page templates into PHP code, so if something doesn't work like you expected, you can look under the hood. There's a lot more I could say about Smarty, but I'm starting to confuse myself.

There are two major chores involved in creating a web application in PHP: displaying content on web pages and reading or writing that content from a database. Smarty makes one of them considerably easier and more fun to program. I'm fighting the urge to rewrite every site I've ever created in PHP to use it. That would probably be overkill.

Peace Declared Between Myself and Sweden

As it turns out, Sweden did not intentionally declare war on my web server earlier this month. Programmer Daniel Stenberg explains how the international incident happened:

A few years ago I wrote up silly little perl script (let's call it that would fetch a page from a site that returns a "random URL off the internet." I needed a range of URLs for a test program of mine and just making up a thousand or so URLs is tricky. Thus I wrote this script that I would run and allow to get a range of URLs on each invoke and then run it again later and append to the log file. It wasn't a fancy script, but it solved my task.

The script was part of a project I got funded to work on, that was improving libcurl back in 2005/2006 so I thought adding and committing the script to CVS felt only natural and served a good purpose. To allow others to repeat what I did.

His script ended up on a publicly accessible web site that was misconfigured to execute the Perl script instead of displaying the code. So each time a web crawler requested the script, it ran again, making 2.6 million requests on URouLette in two days before it was shut down.

Stenberg's the lead developer of CURL and libcurl, open source software for downloading web documents that I've used for years in my own programming. I think it's cool to have helped the project in a serendipitous, though admittedly server-destroying, way.

To make it easier for programmers to scarf up URouLette links without international strife, I've added an RSS feed that contains 1,000 random links, generated once every 10 minutes. There are some character encoding issues with the feed, which I need to address the next time I revise the code that builds URouLette's database.

This does not change how I feel about Bjorn Borg.

Using Treemaps to Visualize Complex Information

I spent some time today digging into treemaps, a way to represent information visually as a series of nested rectangles whose colors are determined by an additional measurement. If that explanation sounds hopelessly obtuse, take a look at a world population treemap created using Honeycomb, enterprise treemapping software developed by the Hive Group:

World population treemap screenshot created by Honeycomb, the Hive Group's treemapping software

This section of the treemap shows the countries of Africa. The size of each rectangle shows its population relative to the other countries. The color indicates population density, ranging from dark green (most dense) to yellow (average) to dark orange (least dense). Hovering over a rectangle displays more information about it.
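To get a feel for how sizes turn into rectangles, here's a bare-bones Java sketch of the simplest possible layout: one row of strips whose widths are proportional to each value. This is not how Honeycomb works, which I haven't seen the inside of. Real treemap software generally uses more elaborate layouts and nests them, and the color measure is left out entirely. All names and numbers are made up for illustration.

```java
final class SliceTreemap {
    /** One rectangle of the layout: position and size in pixels. */
    static final class Rect {
        final double x;
        final double y;
        final double width;
        final double height;

        Rect(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Splits the bounds into side-by-side strips whose widths are proportional to the values. */
    static Rect[] layout(double[] values, Rect bounds) {
        double total = 0;
        for (double value : values) {
            total += value;
        }
        Rect[] rects = new Rect[values.length];
        double x = bounds.x;
        for (int i = 0; i < values.length; i++) {
            // each strip gets the same share of the width as its share of the total
            double width = (total == 0) ? 0 : bounds.width * values[i] / total;
            rects[i] = new Rect(x, bounds.y, width, bounds.height);
            x += width;
        }
        return rects;
    }

    public static void main(String[] args) {
        // Hypothetical populations, in millions, laid out in a 200x100 area
        Rect[] rects = layout(new double[] {50, 30, 20}, new Rect(0, 0, 200, 100));
        for (Rect r : rects) {
            System.out.println("x=" + r.x + " width=" + r.width + " height=" + r.height);
        }
    }
}
```

A nested treemap would call the same routine again inside each strip, alternating between horizontal and vertical splits.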

A treemap can be adjusted to make the size and color represent different things, such as geographic area instead of population. You also can zoom in to a section of the map, focusing on a specific continent instead of the entire world. The Honeycomb treemapping software offers additional customization, which comes in handy on a Digg treemap that displays the most popular links on the site organized by section.

By tweaking the Digg treemap, you can see the hottest stories based on the number of Diggs, number of Diggs per minute and number of comments. You also can filter out results by number of Diggs, number of Diggs per minute or the age of the links.

I don't know how hard it is to feed a treemap with data, but it seems like an idea that would be useful across many different types of information. As a web publisher, I'd like to see a treemap that compares the web traffic and RSS readership my sites receive with the ad revenue they generate. The Hive Group also offers sample applications that apply treemaps to the NewsIsFree news aggregator, Amazon.Com products, and iTunes singles. This was not a good day to be a Jonas Brother.

Finding Updated Feeds with Simple Update Protocol

FriendFeed is working on Simple Update Protocol (SUP), a means of discovering when RSS and Atom feeds on a particular service have been updated without checking all of the individual feeds. Feeds indicate that their updates can be tracked with SUP by adding a new link tag, as in this example from an Atom feed:

<link rel="" href="" type="application/json" />

The rel attribute identifies an ID for the feed, which is called its SUP-ID. The href attribute contains a URL that uses JSON to identify updated feeds by their SUP-IDs. There's also a type attribute that contains "application/json" to indicate the content type at the linked resource.
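The payoff for a feed reader is that it only has to fetch the feeds whose SUP-IDs show up in the update list. Here's a minimal Java sketch of that matching step. It assumes the JSON has already been downloaded and parsed into a plain list of SUP-IDs, which is a simplification on my part, and the IDs, URLs and class name are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class SupMatcher {
    /** Returns the URLs of subscribed feeds whose SUP-IDs appear in the update list. */
    static List<String> feedsToFetch(Map<String, String> feedUrlBySupId, List<String> updatedSupIds) {
        List<String> urls = new ArrayList<>();
        for (String supId : updatedSupIds) {
            String url = feedUrlBySupId.get(supId);
            // skip IDs we don't subscribe to, and don't queue the same feed twice
            if (url != null && !urls.contains(url)) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical subscriptions keyed by SUP-ID
        Map<String, String> subscriptions = new HashMap<>();
        subscriptions.put("a1b2", "http://example.com/one.xml");
        subscriptions.put("c3d4", "http://example.com/two.xml");
        System.out.println(feedsToFetch(subscriptions, Arrays.asList("c3d4", "zzzz")));
    }
}
```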

Developer Paul Buchheit makes the case for the protocol on FriendFeed's blog. "[O]ur servers now download millions of feeds from over 43 services every hour," he writes. "One of the limitations of this approach is that it is difficult to get updates from services quickly without FriendFeed's crawler overloading other sites' servers with update checks."

My first take on the idea is that defining a relationship with a URI is too different from standard link relationships in HTML, which employ simple words like "previous", "next", and "alternate". When new relationships have been introduced, they follow this convention, as Google did when it proposed nofollow.

Also, neither RSS 1.0 nor RSS 2.0 allows more than one link tag in a feed, so the SUP tag would be valid only in Atom feeds.

Both of these concerns could be addressed by identifying the SUP provider with a new namespace, as in this hypothetical example:

<rss xmlns:sup="">
<sup:provider href="" type="application/json" />

Six Apart has offered an alternate solution that seems more likely to work for large hosting sites and constant feed-checking services like FriendFeed. The company produces an update stream of Atom data indicating an update on any of the thousands of TypePad or Vox blogs.

Another potential solution would be to borrow the technique used by Radio UserLand blogs to identify a list of recently updated sites: Add a category tag to the feed with the value "rssUpdates" and a domain attribute with the URI of XML data containing the list:

<category domain="">rssUpdates</category>

The XML data is in the weblog changes format used by Weblogs.Com.

Customizing Apache Directory Listings with .htaccess

I was clearing off my desk today when I found an article I've been meaning to scan and send to somebody -- the story of how my friends almost elected a dalmatian and squirrel to the homecoming court of the University of North Texas in 1989. The alumni magazine wrote a feature on Hector the Eagle Dog and Agnes the Squirrel's campaign, which attracted national media and made a few of the human homecoming candidates very angry.

I can never tell when a file's too big to send in email without aggravating the recipient, so I upload files to my server and email the links instead. I decided to make this process easier by creating a clippings directory where uploaded files show up automatically.

The Apache web server can publish a listing of all files in a directory, as the official Apache site does in its images subdirectory. I wanted to make my clippings page look more like the rest of my weblog, so I found a tutorial on customizing directory listing pages.

First, I created an .htaccess file in the directory and turned directory indexing on with this command:

Options +Indexes

This command only works on servers that are configured to allow users to change options. For security reasons, I turn directory listings off by default, so they only appear when I specifically configure a directory to reveal its contents.

Next, I created header and footer web pages that contain the HTML markup to display above and below the directory listing. These files are identified by two more commands in .htaccess:

HeaderName header.html
ReadmeName footer.html

These web pages are located in the clippings directory. For the final step, I added a description of PDF documents and made sure that the header and footer files are not included in the listing:

AddDescription "PDF Document" .pdf
IndexIgnore header.html footer.html

There's a lot more that can be customized in an Apache directory listing, as the tutorial demonstrates, but for my project it seemed like overkill.

Update: Alternatively, I could've checked to see if the story was already online. Auugh.

Sharing Bookmarks and Feed Lists with XML

I'm working on a programming project that requires an XML format to represent bookmarks and other collections of URIs, but before I reinvent the wheel I'd like to see if there's an existing format that meets my goals. The format should be able to hold all of the following information:

There are several potential formats that could be put to use: XBEL, the outline formats OPML and XOXO and the syndication formats RSS and Atom. Each has drawbacks, as I'll go over in upcoming posts here on Workbench.

I'm starting with XBEL, because that's the best-supported format specifically designed to hold bookmarks. XBEL was created in 1998 by members of the Python community led by Fred L. Drake Jr. XBEL 1.0 continues to be the only release, though there's occasional talk on the XBEL-Specs mailing list about developing a new version.

XBEL was designed to represent browser bookmarks and has become the native format for storing them in the Konqueror and Galeon browsers. There are add-ons that extend XBEL support to more popular browsers -- one example is SyncPlaces, a Firefox add-on that can manually import and export XBEL bookmarks.

Here's what a bookmark looks like in XBEL data produced by SyncPlaces:

<bookmark id="row123" added="2008-11-25T17:30:22.352" modified="2008-11-25T17:30:22.522" href="">
  <info>
    <metadata owner="Mozilla" dateadded="1227634222352963" lastmodified="1227634222522963"/>
  </info>
  <desc>Rogers Cadenhead's personal weblog</desc>
</bookmark>

Bookmarks in XBEL can be grouped into folders, which themselves can contain more folders to create a hierarchy. The format's well-designed and can be extended by namespaces or the metadata element, which in the preceding example carries Firefox-specific information.
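Because XBEL is plain XML, reading it takes nothing more than a stock XML parser. Here's a small Java sketch that lists every bookmark in a document, however deeply it's nested in folders. It uses only the bookmark, title and href names from XBEL. The class name and sample data are my own, and the sample skips the DOCTYPE so the parser doesn't try to fetch a DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class XbelReader {
    /** Returns "title -> href" for every bookmark element, at any folder depth. */
    static List<String> listBookmarks(String xbel) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xbel.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XBEL data", e);
        }
        NodeList bookmarks = doc.getElementsByTagName("bookmark");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < bookmarks.getLength(); i++) {
            Element bookmark = (Element) bookmarks.item(i);
            // a bookmark's title is optional, so fall back to an empty string
            NodeList titles = bookmark.getElementsByTagName("title");
            String title = titles.getLength() > 0 ? titles.item(0).getTextContent() : "";
            result.add(title + " -> " + bookmark.getAttribute("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample with one folder and one bookmark
        String sample = "<xbel version=\"1.0\"><folder><title>Blogs</title>"
            + "<bookmark href=\"http://example.com/\"><title>Example</title></bookmark>"
            + "</folder></xbel>";
        System.out.println(listBookmarks(sample));
    }
}
```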

There are several drawbacks to using XBEL. The format predates social bookmarking and lacks support for tagging bookmarks or assigning them to categories like the ones employed by the Open Directory Project.

XBEL also predates the popularity of syndication, so there's no way to identify that bookmarks are RSS or Atom feeds. You also can't establish a relationship between a web site's home page and its feed. A few years ago on XBEL-Specs I floated the idea of adding type and rel attributes to bookmarks that function like they do in Atom, which would be all that's required to publish blogrolls and feed subscription lists with the format.

XBEL can't be used for web directories, feed lists or social bookmarks without extending the format. I think all three are strong enough use cases to be part of a bookmark format's core set of elements. If I choose XBEL, most of my project's functionality won't be supported by today's XBEL tools or client libraries, which is the primary reason to adopt an existing format.

New Comment Pages Added to Workbench

I spent a little time this morning improving the comment system on Workbench. There's now a comments page that shows the 50 most recent comments submitted to the weblog.

After you submit a comment here, the site will store your name and home page link in cookies for 180 days so you don't have to type them in again.

I've also added a line to the site's moderation policy: "Comments that have nothing to do with the subject of a post will be deleted." There's too much off-topic noise here. I'm not interested in seeing every single post I write turned into an opportunity to rant against liberals or the policies I follow on the Drudge Retort. If you have a beef about the Retort or the liberal slant of the site, take it up there.

Comments that I delete on this blog are published for a few days on a new deleted comments page. So if you've posted something here that gets got, you can retrieve the text and post it somewhere else.

Adding ReCaptcha to a Weblog

I've added a ReCaptcha component to the comment form on Workbench to deter spammers. The ReCaptcha system presents two hard-to-read words that must be typed in successfully for a comment to be saved. Here's what the component looks like:

Recaptcha box for spam detection

I tried as long as possible to avoid using captchas, but the amount of spam hitting this blog continues to grow, particularly from foreign IP addresses. Workbench has received 16,000 comments and more than 260,000 spam since it began accepting comments in 2002.

The ReCaptcha project serves a useful purpose, digitizing old books and newspaper articles by getting millions of people to identify words that OCR software couldn't recognize. Adding the component took around 10 minutes: I signed up for a ReCaptcha account, stored the PHP library recaptchalib.php on my web server, and added fewer than 20 lines of PHP to the page that takes comments.

The addition of captchas serves as official notice that my comment flak technique failed to deter spammers. I'm retiring that code.

SiteMeter Crashes Internet Explorer with 'Operation Aborted'

Last night several of my web sites, including the Drudge Retort, began crashing Internet Explorer with the error message "Internet Explorer cannot open the Internet site ... Operation aborted."

I've encountered this error before, and when it occurs out of the blue on a site you haven't changed, the culprit is usually a problem with third-party JavaScript code, as CNet's Clientside blog explains:

IE does this when you attempt to modify a DOM element before it is closed. This means that if you try and append a child element to another and that other element (like the document.body) is still loading, you'll get this error.

To find the error, remove JavaScript widgets one at a time from your site until the error disappears. The culprit here was SiteMeter, which made some recent changes to its code. I've pulled the SiteMeter code until they announce a fix.

Displaying Twitter Updates on a Web Page

I recently began using Twitter, a microblogging service for posting short, chat-like blog entries and reading what other users of the service are doing. The site has severe reliability problems, but it's still an entertaining way to get real-time updates from bloggers I read along with others I know who've been sucked into Twitter's maw.

I wrote some code to display my most recent Twitter update on my weblog, Workbench, in a sidebar at upper right. This afternoon, I've released the Twitter-RSS-to-HTML PHP script under an open source license. The script requires MagpieRSS for PHP, an open source PHP library that can parse RSS and Atom feeds.

MagpieRSS caches feed data, so at times when Twitter is glacially slow or can't be accessed, this script won't hurt the performance of your server.

The first release of the script only works with a Twitter user's RSS feed, which can be found in the "RSS" link at the bottom of a user's Twitter page. The only tough part about writing the script was creating regular expressions to turn URLs into hyperlinks and "@" references into links to Twitter user pages:

// turn URLs into hyperlinks
$tweet = preg_replace("/(http:\/\/)(.*?)\/([\w\.\/\&\=\?\-\,\:\;\#\_\~\%\+]*)/", "<a href=\"\\0\">Link</a>", $tweet);
// link to users in replies
$tweet = preg_replace("(@([a-zA-Z0-9]+))", "<a href=\"\\1\">\\0</a>", $tweet);
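The same two-pass idea carries over to other languages. Here's a loose Java sketch of it, not a port of the script: one pattern wraps URLs in hyperlinks, and a second turns "@" references into profile links. The patterns are simplified versions of my own, the profile base URL is passed in as a parameter, and the address used in main is a placeholder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TweetLinker {
    // Simplified URL pattern: scheme, then a run of characters commonly found in URLs
    private static final Pattern URL = Pattern.compile("https?://[\\w./&=?\\-,:;#~%+]+");

    // An "@" followed by a username made of letters, digits or underscores
    private static final Pattern MENTION = Pattern.compile("@(\\w+)");

    /** Turns URLs into "Link" hyperlinks and @username references into profile links. */
    static String linkify(String tweet, String profileBase) {
        String linked = URL.matcher(tweet).replaceAll("<a href=\"$0\">Link</a>");
        return MENTION.matcher(linked).replaceAll(
            "<a href=\"" + Matcher.quoteReplacement(profileBase) + "$1\">$0</a>");
    }

    public static void main(String[] args) {
        // The profile base URL here is a placeholder, not a real service address
        System.out.println(linkify("hi @bob see http://example.com/a", "https://twitter.example/"));
    }
}
```

Two rough edges are worth knowing about: a URL followed directly by a period or comma will swallow the punctuation, and a URL containing "@" would be mangled by the second pass.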

If you're reading this and wondering why anyone should bother with Twitter, I recommend reading the updates by Jay Rosen, a former university journalism chair who uses the service to share a running dialogue on the media. He punches above his weight in this 140-character-or-less medium.

Following Web Page Redirects with Java

CNET moved a bunch of its blogs to a different domain this weekend, including Beyond Binary, Coop's Corner, Geek Gestalt, One More Thing, Outside the Lines and The Social. I mention this because the change hosed Meme13, which treated all six as if they were newly discovered sites.

One of my ground rules for developing Meme13 is that I won't hand-edit the site to make it smarter. I need the application to recognize when existing sites in its database have moved.

Meme13 monitors sites using a Java application I wrote that downloads web pages with the Apache HTTPClient 3.0 class library. Web servers indicate that a page has moved by sending an HTTP redirect response of either "301 Moved Permanently," which indicates a permanent move, or "302 Found," which is intended for temporary changes. I wrote a Java method that can find the current location of a web page, even if it has been redirected one or more times:

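import java.io.IOException;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.HeadMethod;

public String checkFeedUrl(String feedUrl) {
    String response = feedUrl;
    HttpClient client = new HttpClient();
    HttpMethod method = new HeadMethod(feedUrl);
    // report redirects to this method instead of letting HttpClient follow them
    method.setFollowRedirects(false);
    try {
        // request feed
        int statusCode = client.executeMethod(method);
        if ((statusCode == 301) || (statusCode == 302)) {
            // feed has moved
            Header location = method.getResponseHeader("Location");
            if ((location != null) && !location.getValue().equals("")) {
                // recursively check URL until it's not redirected any more
                response = checkFeedUrl(location.getValue());
            }
        } else {
            response = feedUrl;
        }
    } catch (IOException ioe) {
        // on a network error, fall back to the URL we were given
        response = feedUrl;
    } finally {
        method.releaseConnection();
    }
    return response;
}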

The HeadMethod class requests a web page's headers instead of requesting the entire page, consuming far less bandwidth as it checks for redirects. My Java method looks for both kinds of redirects, because web publishers have a bad habit of using "302 Found" when they've moved a page permanently.

Setting the Link on a ShareThis Widget

I'm continuing to work on Meme13, a site that packages together the last 13 sites to show up on the Techmeme Leaderboard so they can be sampled as a feed or web site. The site has attracted around 25 RSS subscribers in its first month.

I've added a ShareThis widget on each entry that makes it easy to share content from Meme13 on sites like Digg and Facebook.

Normally, ShareThis links to the page the widget has been displayed on. That doesn't suit my purposes on Meme13, because I'm trying to promote the originators of the content. If someone reads the article about landing a startup job by Ryan Spoon on Meme13, the ShareThis widget should link to the article on Spoon's blog.

ShareThis has a JavaScript API that can be used to teach the widget new tricks. Here's the JavaScript code to set the widget's target link and display the widget:

<p><script language="javascript" type="text/javascript">
SHARETHIS.addEntry({
title:'<TMPL_VAR title>',
url:'<TMPL_VAR link ESCAPE="HTML">'
}, {button:true} );
</script></p>

The <TMPL_VAR title> and <TMPL_VAR link ESCAPE="HTML"> tags are part of the template language used by Planet Planet, the software that publishes Meme13. Here's how the same thing could be done in PHP:

<p><script language="javascript" type="text/javascript">
SHARETHIS.addEntry({
title:'<?php echo $site_title; ?>',
url:'<?php echo $site_link; ?>'
}, {button:true} );
</script></p>