I was reading Tim's MT 3.1 Dynamic Publishing Blues and it reminded me of a topic I read about almost two years ago -- Half baked and little fried -- about dynamic vs. static generation. Timothy's post is about the controversy over the recently introduced dynamic publishing in MovableType, which uses PHP as its engine. While Ben Trott explained the reasoning behind the decision, I'm not sure I agree with him.

In his analysis Ben describes several options and dismisses most of the Perl-related ones as inefficient. Leaving aside the speed of the interpreters themselves, I guess he is talking about the startup penalty of the perl interpreter, which is launched on every request. This cost is definitely there, but the solution reminds me of the advice to buy more hardware to solve a performance problem without first trying to optimize the algorithm being used. Let's look at it.

The stated problem of reducing latency and server load can be solved in several ways:

1. Expires and Last-Modified headers. The script can return these headers (along with ETag) to make the page cacheable. The tricky part is setting a proper Expires value. It should be long enough to minimize calls to the server and short enough to allow those calls when the content changes. This can be achieved by giving different values to different pages, or even by doing something similar to what Googlebot does when it visits frequently changing pages more often: extending the lifetime by 10% each time a page expires unchanged and cutting it by 50% each time the page turns out to have been modified may be a good start. Expires/Last-Modified also works well with static content (images, client-side scripts, and stylesheets). For these the Expires header can be set to a fairly large value; if a file is updated, it can be served under a new URL.
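The +10%/-50% rule can be sketched roughly as follows. This is Python rather than Perl, purely for brevity; the function names, the one-minute and one-week bounds, and the idea of keeping one lifetime per page are my assumptions for illustration, not something MT or Blooki is known to do.

```python
from email.utils import formatdate

MIN_TTL = 60                # assumed lower bound: one minute
MAX_TTL = 7 * 24 * 3600     # assumed upper bound: one week


def next_ttl(ttl, changed):
    """Return the next cache lifetime (in seconds) for a page.

    Grow the lifetime by 10% when the page expired unchanged,
    halve it when the page turned out to have been modified.
    """
    ttl = ttl * 0.5 if changed else ttl * 1.1
    return int(min(MAX_TTL, max(MIN_TTL, ttl)))


def cache_headers(mtime, ttl, now):
    """Build Last-Modified and Expires headers from epoch seconds."""
    return {
        "Last-Modified": formatdate(mtime, usegmt=True),
        "Expires": formatdate(now + ttl, usegmt=True),
    }
```

A page that keeps changing quickly drifts toward the lower bound, while a page that never changes slowly drifts toward the upper one.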

2. Handling of If-Modified-Since and If-None-Match. The script should return 304 if the content hasn't been modified. It doesn't even need to have a copy of the page; all it needs to know is that the page hasn't changed. This may be as simple as a single LastModified time per blog/site that invalidates all caches (good enough for rarely updated sites), or as complex as dependency tracking that knows exactly what information was used to generate a page.
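A minimal sketch of that check, again in Python. The function name and the plain dict of request headers are hypothetical, and the ETag comparison is simplified to exact string matching (it does not treat weak `W/` tags specially).

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 response is enough.

    request_headers: dict mapping request header names to values
    etag:            current entity tag of the page, e.g. '"v42"'
    last_modified:   modification time in seconds since the epoch
    """
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in inm.split(",")]
        return "*" in tags or etag in tags

    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims).timestamp()
        except (TypeError, ValueError):
            return False  # unparsable date: send the full page
        # HTTP dates have one-second resolution.
        return int(last_modified) <= since

    return False  # unconditional request
```

Note that nothing here needs the page body: the current ETag or modification time is all the script has to look up before answering.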

3. Local cache of generated pages. The dependency check still needs to be done, but the page may already be generated and can be served from a local cache (most likely the file system). At this point most people would point out that all this can be done by saving generated pages as static files and having a webserver serve them with a little bit of mod_rewrite-like magic. While that's true, there are still several things that need to be addressed:

  • Expires values need to be configured and they are likely to be static
  • Any custom headers need to be configured
  • No personalization is possible (new since the last visit and other similar things)
  • Authentication requests may need to be handled separately
  • No parametrized requests: pagination, searches, and the like.

Pages can be cached (they can even be compressed) along with their headers and served when necessary. While this may be a viable option in many cases, there is still a question of how this cache should be in/validated: dependencies can be checked on every request, or they can be checked when pages are added/updated/deleted.

4. Template fragment caching. Even if the script doesn't cache the entire page, it may still be feasible to cache some page fragments, especially the most time-consuming or most frequently used ones, such as recently updated items or a list of subcategories that is likely to be used across many pages. This requires tracking which fragment uses what information so that fragments can, again, be properly invalidated, but this may not be as complex as it seems.
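To show that the tracking need not be complex, here is a rough Python sketch. The class and method names are made up for illustration; each fragment simply declares the names of the things it was built from, and changing one of those drops every fragment that used it.

```python
class FragmentCache:
    """Cache rendered fragments and drop them when a dependency changes."""

    def __init__(self):
        self._text = {}    # fragment name -> rendered text
        self._users = {}   # dependency name -> set of fragment names

    def get(self, name, deps, render):
        """Return the cached fragment, calling render() only on a miss."""
        if name not in self._text:
            self._text[name] = render()
            for dep in deps:
                self._users.setdefault(dep, set()).add(name)
        return self._text[name]

    def invalidate(self, dep):
        """Forget every fragment that was built from `dep`."""
        for name in self._users.pop(dep, set()):
            self._text.pop(name, None)
```

For example, a "recent entries" fragment could be registered with `deps=["entries"]`, and saving or deleting an entry would call `invalidate("entries")`.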

5. File/memory cache. Even if template fragments are not cached, the information that is necessary for page generation can be cached in memory (applicable to mod_perl, daemon, and similar server solutions) or in files (this works well for filesystem-based solutions like Blooki, Blosxom, and other file I/O hungry solutions). It is not necessary to cache all the information; in most cases the modification date/time, title, and some meta information are enough.

6. Access optimization. This probably doesn't apply to MT, but it definitely applies to Blooki (which uses the filesystem to store its information). Even when information is not available in a cache, it's still possible to optimize the process of getting it. Blooki is super-lazy about getting the stuff it needs. First, it's driven by templates; if something is not requested by a template, it probably won't be processed. Second, it only reads directories at first, without even stat'ing the files in them. Then it only stats files if you ask for their modification times. And then it only reads their content if you ask for the title, meta, or other information.
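The same list / stat / read laziness can be illustrated with a short Python sketch. This is not Blooki's code: the class name, the attribute names, and the convention that the first line of a file is its title are all assumptions made for the example.

```python
import os
from functools import cached_property


class LazyEntry:
    """An entry that touches the disk only when an attribute is used."""

    def __init__(self, path):
        self.path = path            # known from the directory listing

    @cached_property
    def modified_on(self):
        # stat() runs only if someone asks for the modification time
        return os.stat(self.path).st_mtime

    @cached_property
    def content(self):
        # the file is read only if its content is actually needed
        with open(self.path, encoding="utf-8") as fh:
            return fh.read()

    @cached_property
    def title(self):
        # assumption for this sketch: the first line is the title
        return self.content.split("\n", 1)[0]


def list_entries(directory):
    """Read the directory only; no stat() and no file reads yet."""
    names = sorted(os.listdir(directory))
    return [LazyEntry(os.path.join(directory, n)) for n in names]
```

A template that only prints file names never triggers a `stat`, and one that only sorts by modification time never opens a file.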

7. Direct access. If nothing else helps, then the information has to be read and the page regenerated from scratch.

Now, back to the original question: was it worth it? Unfortunately, it's not clear from the description whether Perl is used at all when PHP-based rendering is enabled, but as far as I understand it is (please correct me if I'm wrong). Switching from Perl to PHP only addresses items 2 and 3; doing everything else still requires the Perl interpreter (and hence the startup penalty). Now, both 2 and 3 can be achieved quite effectively by using a local proxy/cache, which some users may already have; for those who don't, it is a one-time deal and is much easier than PHP engine integration. In my opinion the answer is clear.

Shelley has been writing about the upcoming Wordpress 1.3 release and one of the features that will be included in it -- pagination. I got curious about how difficult it would be to add something like this to Blooki.

First, I added two template variables (entries::prev and entries::next) that calculate the number of entries to display and the URL to use.

The rest was easy: update the template to provide default parameters for the number of entries listed and the starting position:

TemplateVar page::entries <<.
  $entries{
    start => $request{args}{start} || 1,      # start from 1 by default
    display => $request{args}{display} || 20, # display no more than 20 entries
    header => $request{isentrypage} ? $entry::prevnext : '', # show entry prev/next
    footer => q[$entries::prevnext], # show entries prev/next if necessary
  }
.

This code checks whether the start and display parameters are available in the query string and falls back to 1 and 20 otherwise; it also sets footer to the value of the $entries::prevnext template variable, which is defined as follows:

TemplateVar entries::prevnext <<.
<div class="prevnext">$get{join ' | ', grep {length}
  $entries::prev ? qq[<a href="$entries::prev{url}">&lt;&lt; Prev $entries::prev{display}</a>] : '',
  $entries::next ? qq[<a href="$entries::next{url}">Next $entries::next{display} &gt;&gt;</a>] : '', 
}</div>
.

This code generates prev/next links. As the parameters are not hardcoded anywhere, users can change the start and display values as they need (start can even be negative; in this case it will display entries from the end of the list, just like @foo[-2..-1] does).
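The arithmetic behind start, display, and the prev/next links can be sketched like this in Python. It is my guess at the general idea, not Blooki's actual implementation; the function name and the returned tuple are hypothetical.

```python
def page(entries, start=1, display=20):
    """Return (visible, prev_start, next_start) for a 1-based start.

    A negative start counts from the end of the list, so start=-2
    begins at the second-to-last entry. prev_start and next_start
    are None when there is nothing before or after the window.
    """
    total = len(entries)
    if start < 0:
        first = max(total + start, 0)   # count back from the end
    else:
        first = max(start, 1) - 1       # 1-based to 0-based
    first = min(first, total)
    last = min(first + display, total)

    prev_start = max(first - display, 0) + 1 if first > 0 else None
    next_start = last + 1 if last < total else None
    return entries[first:last], prev_start, next_start
```

With 50 entries and the defaults, the first page has no prev link and a next link pointing at start=21; with start=-2 only the last two entries are shown and there is no next link.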

Excited by the ease of the change, I also tried to do prev/next links on entries. This didn't require any coding at all; only a small template change:

TemplateVar entry::prevnext <<.
<div class="prevnext">$get{join ' | ', grep {length}
  $entries{display => 1, sort => "-modified_on", modified_on => "<$entry{modified_on}", q[<a href="$entry::permalink">&lt;&lt; $entry{title}</a>]},
  $entries{display => 1, sort => "+modified_on", modified_on => ">$entry{modified_on}", q[<a href="$entry::permalink">$entry{title} &gt;&gt;</a>]},
}</div>
.

This code displays prev/next links with the titles of the entries (if there is a previous/next entry). It's also easy to make it display prev/next entries within a particular category: just add the category filter category => $entry{category}:

TemplateVar entry::prevnext <<.
<div class="prevnext">$get{join ' | ', grep {length}
  $entries{display => 1, category => $entry{category}, sort => "-modified_on", modified_on => "<$entry{modified_on}", q[<a href="$entry::permalink">&lt;&lt; $entry{title}</a>]},
  $entries{display => 1, category => $entry{category}, sort => "+modified_on", modified_on => ">$entry{modified_on}", q[<a href="$entry::permalink">$entry{title} &gt;&gt;</a>]},
}</div>
.

Even though Blooki currently provides Textile, Tiki and ConvertBreaks filters, I've been thinking about adding Markdown, as several people have asked me about it. Markdown is written in Perl, so I didn't expect any trouble integrating it with Blooki. There were a couple of challenges though, both related to the fact that Markdown is not distributed as a Perl module; instead it's distributed as a Perl script.

First, since it can run as a script and as a plugin for MovableType and Blosxom, it checks whether either of those is available and, if not, waits for input to be processed. This is easily fixable by changing defined($blosxom::version) (line 184 in Markdown 1.0) to something like (caller(0))[7], which returns true if the script was called using use/require/do.

Second, the old IfModule directive in a config file only processed "real" modules/packages, and not files. So I extended it to accept files and Perl versions (as Markdown requires Perl 5.6 or later). Now the fragment that loads Markdown looks like the following:

  <IfModule 5.006 Markdown.pl> # Markdown requires Perl 5.006
    AddModule Markdown
  </IfModule>

This goes into Blooki 0.48.

I've been quietly working on this application for some time, but it looks like a good time to make it public. The package seems to be good enough to be useful to other people (one of the things that is missing is documentation). I'm dog-fooding Blooki to run this site: it's used to run a development weblog and I also plan on using it to power support forums and a project documentation website.
