Best In Class has just been overhauled: it has gone from a slowly cooked PHP/Wordpress solution to a hip, blazing, 250% faster Clojure-driven version. In this post I'll outline the major strategies used in the rewrite.
When I launched Best In Class last October, I knew that selling Clojure as my primary service was going to take more than words. So I launched a blog alongside the company, to demonstrate why I think Clojure is ideal for use in industry. The blog initially lived on Wordpress.com, but since I wasn't seeing any traffic spill over from that domain onto BestInClass.dk, I fused the blog with the main site on the .dk domain by running both sites on a single Wordpress installation. That gave me some quick wins at the time, chiefly a short time-to-launch, since setting up Wordpress and importing your old posts only takes a day or so. But the drawbacks finally caught up with me.
When a site is cooked, it does some kind of processing on every request, as a PHP site does. When a site is baked, it is pre-rendered into static files, which are then served by a web server. The site you're looking at is fully baked, with no dynamic content. Definitions borrowed from here & here.
Baking is the way to go for a number of reasons. First, I can back up and deploy the entire site using nothing but rsync; there is no database to update, back up or maintain. Second, it's about 250% faster, if not more, because even though PHP is quite fast, it can't beat serving static files. And finally, good luck cracking a .html file: a lot of security concerns disappear once no code is evaluated when a page is served.
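To make the baking idea concrete, here is a minimal sketch (not the site's actual code) of a bake step: every page is rendered exactly once and written to disk, after which the web server only ever serves files. The names `pages` and `bake!` are hypothetical, and the zero-argument render functions stand in for Enlive templates such as `page`.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical page renderers: each is a zero-arg fn returning an HTML string.
;; On the real site these would be Enlive templates.
(def pages
  {"index.html" (fn [] "<html><body><h1>Best In Class</h1></body></html>")
   "blog.html"  (fn [] "<html><body><ul class=\"content\"></ul></body></html>")})

(defn bake!
  "Renders every page once and writes it under out-dir as a static file.
  Returns the paths that were written."
  [out-dir pages]
  (doall
   (for [[path render] pages]
     (let [f (io/file out-dir path)]
       (io/make-parents f)   ; create out-dir and any subdirectories
       (spit f (render))     ; the only moment any rendering happens
       (.getPath f)))))
```

Deploying is then just a matter of pushing `out-dir` to the server, for instance with rsync.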
So the transition from cooking to baking began with me mentally running over the elements of each page, deciding what needed to be dynamic and what didn't. The old index.php was dynamic: it fetched a list of all the blog posts from the SQL database, then rendered the excerpts in a paginated style. But why not just paginate using JS and hardcode the excerpts? I'll go through the elements one by one, showing how I made them static.
The Clojure-driven index (blog.html) differs from index.php in that it doesn't load anything; instead, it is updated every time I publish a blog post. I have a fancy WYSIWYG backend where I write my posts, and as soon as I hit publish, a file is generated for that post, and its excerpt is stripped out and prepended to blog.html. This is very simple thanks to Enlive's clever templating system, with which I define what an excerpt looks like:
(defsnippet teaser "teaser.html" [:body :> any-node]
  [title link thumb excerpt]
  [:a.title-link]                       (do-> (set-attr :href link)
                                              (content title))
  [:a.thumb-link]                       (set-attr :href link)
  [:div.link-float-right :a.perma-link] (set-attr :href link)
  [:img.avatar]                         (set-attr :src (str thumb-prefix thumb))
  [:div.excerpt]                        (content (html-snippet excerpt)))
For that to work, you just have to provide an html file which contains elements to match the selectors.
Simple enough? It's just plain CSS selectors, written as Clojure vectors, working somewhat like Pure. Prepending an excerpt to the main index is then as simple as:
(->> ((template (File. "site/blog.html") [title link thumb excerpt]
        [:ul.content] (prepend (select (teaser title link (str "/" thumb) excerpt)
                                       [:ul :> any-node])))
      title url avatar (-> (.split (slurp "draft") "") first))
     (apply str)
     (spit "site/blog.html"))
So generating a blog post is a two-stage process. Every page on the site shares certain elements, e.g. the header, the menu and the footer. To avoid having to edit a ton of files every time I change one of them, they are all abstracted away into a template appropriately named 'page':
; Raw template for all pages, including the header, menu and footer
(deftemplate page "template.html"
  [title scripts styles body]
  [:title]         (content title)
  [:div#pages :a]  (clone-for [[href src] [["/index.html"     "/images/forside-lnk.png"]
                                           ["/services.html"  "/images/services-lnk.png"]
                                           ["/produkter.html" "/images/produkter-lnk.png"]
                                           ["/blog.html"      "/images/blog-lnk.png"]
                                           ["/kontakt.html"   "/images/kontakt-lnk.png"]]]
                     this-node (set-attr :href href)
                     [:img]    (set-attr :src src))
  [:script.header] (clone-for [src scripts] (set-attr :src src))
  [:link]          (clone-for [href styles] (set-attr :href href))
  [:div#content]   (substitute body))
You'll notice that this relies heavily on Enlive's clone-for, which behaves like a for loop except that it emits HTML. In the case of the menu, I supply hrefs and link icons, which are destructured and transformed into the menu you see at the very top of the page. Finally, div#content is replaced with whatever body is passed in. Four functions which will be your best friends when using Enlive are append, prepend, substitute and content. That final line is important, because it lets me pass in other transformations, e.g. snippets, as the content. For instance, to render my front page I write the HTML first:
<body>
  <div class="scrollable">
    <div class="items">
      <div>
        <img src="/images/slider/webudvikling.png" class="thumb"/>
        <img src="/images/slider/webudvikling-quote.png"/>
        <a class="clink" href="/services.html">Læs mere</a>
      </div>
      <div>
        <img src="/images/slider/appudvikling.png" class="thumb"/>
        <img src="/images/slider/appudvikling-quote.png"/>
        <a class="clink" href="/services.html#2">Læs mere</a>
      </div>
      <div>
        <img src="/images/slider/cljudvikling.png" class="thumb"/>
        <img src="/images/slider/cljudvikling-quote.png"/>
        <a class="clink" href="/services.html#2">Læs mere</a>
      </div>
    </div>
  </div>
</body>
And this is then loaded into a snippet:
(defsnippet frontpage "index.html" [:body :> any-node] [])
So generating the frontpage can now be done like so:
(page "Best In Class" scripts css-files (frontpage))
It couldn't be much simpler, and it makes for highly reusable and maintainable code. It's also a case of 'optimize once, win everywhere'.
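To see clone-for's "for loop that emits HTML" behavior in isolation, here is a small sketch assuming Enlive is on the classpath. It uses Enlive's REPL helper `sniptest`, which applies transformations to an HTML string and returns the result as a string. The menu data, the markup and the name `render-menu` are made up for illustration.

```clojure
;; Assumes Enlive (net.cgrand/enlive) is on the classpath.
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical menu data: [href label] pairs, standing in for the
;; [href src] pairs that `page` feeds to clone-for.
(def menu-items [["/index.html" "Home"]
                 ["/blog.html"  "Blog"]])

(defn render-menu
  "Clones the single <li> of the source markup once per menu item and
  returns the resulting HTML string."
  [items]
  (html/sniptest "<ul id=\"pages\"><li><a href=\"#\">x</a></li></ul>"
    [:li] (html/clone-for [[href label] items]
            [:a] (html/do-> (html/set-attr :href href)
                            (html/content label)))))
```

Calling `(render-menu menu-items)` yields one `<li>` per pair, each with its own href and label, which is the same mechanism `page` uses to build the top menu.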
Ah, but there is one gotcha. Comments are dynamic, right? Well, half of the job is. As you can probably deduce from the snippets above, appending a comment to a blog post is trivial, but receiving comments and moderating them are still dynamic tasks. For that reason, the backend of the site is driven by Moustache, Christophe Grand's micro web framework. Moustache has a few simple tasks.
Since we are serving multiple users, several of these tasks risk race conditions, so it's a good thing I decided to write the site in Clojure. When you submit a comment to the site (and I hope you do), that comment is sent to an in-memory queue. Leaving out the URL encoding/decoding, that looks like this:
(if (= captcha answer)
  (dosync
    (alter comment-queue conj {:url url, :name name, :email email,
                               :captcha (format "Answered %s to question #%s (%s)"
                                                captcha cid question)
                               :date (.toString date)
                               :comment comment})
    {:body "OK"})
  {:body "NOT OK"})
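Why does the ref matter? The sketch below (hypothetical names `comment-queue`, `enqueue-comment!` and `simulate!`, not the site's code) fires 100 submissions from concurrent futures, every tenth with a wrong captcha answer. Because each `conj` happens inside `dosync`/`alter`, no update is lost, and exactly the 90 correct submissions end up in the queue.

```clojure
(def comment-queue (ref []))

(defn enqueue-comment!
  "Queues new-comment if the captcha answer is right; returns a response map."
  [captcha answer new-comment]
  (if (= captcha answer)
    (do (dosync (alter comment-queue conj new-comment))
        {:body "OK"})
    {:body "NOT OK"}))

(defn simulate!
  "Fires 100 concurrent submissions (every 10th with a wrong answer) and
  returns how many comments ended up in the queue."
  []
  (let [requests (doall
                  (for [i (range 100)]
                    (future
                      (enqueue-comment! (if (zero? (mod i 10)) "wrong" "42")
                                        "42"
                                        {:name (str "reader-" i)}))))]
    (run! deref requests)   ; wait for every simulated request to finish
    (count @comment-queue)))
```

With a plain mutable collection shared between threads, some of those appends could be lost; the STM retries a conflicting transaction instead.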
Every minute, an agent empties that queue and persists its contents to a file on disk, in case the server crashes:
(defn backup-comments [a]
  ; empty the queue inside a transaction, keeping what it held
  (doseq [comment (dosync (let [comments @comment-queue]
                            (ref-set comment-queue [])
                            comments))]
    (append-spit "comment-queue" (with-out-str (prn comment))))
  (Thread/sleep 60000)                 ; wait a minute...
  (send-off *agent* backup-comments))  ; ...then schedule the next run
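The `append-spit` helper isn't shown in the post. As a hedged alternative, clojure.core's own `spit` accepts an `:append true` option. The sketch below (hypothetical names `drain!` and `backup-queue!`) separates the two steps: empty the ref in one transaction, then append each item as one EDN form per line, so the backup file can be read back with `read-string` line by line.

```clojure
(require '[clojure.java.io :as io])

(defn drain!
  "Atomically empties queue-ref and returns what it held."
  [queue-ref]
  (dosync
   (let [items @queue-ref]
     (ref-set queue-ref [])
     items)))

(defn backup-queue!
  "Drains queue-ref and appends each item to file f, one EDN form per line.
  Returns the number of items written."
  [queue-ref f]
  (let [items (drain! queue-ref)]
    (doseq [item items]
      (spit f (prn-str item) :append true))
    (count items)))
```

Keeping the file I/O outside `dosync` matters: a transaction may be retried, and side effects inside it could then run more than once.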
So the moderation panel is just a matter of checking which comments are in the queue and either deleting them or appending them to their post. Simple, right? But there's another gotcha: writing a file through a stream isn't atomic. While Enlive is busy writing the new version of a post, some poor reader may come by and see a half-written HTML file. Luckily, Unix systems provide a number of atomic filesystem operations, like 'mv', which is an atomic rename when source and target are on the same filesystem:
(defn append-to-post [{:keys [url name date comment]}]
  (let [url     (-> (str "site/" url) (.replaceAll "//" "/"))
        url2    (str url (hash url))   ; temporary file next to the post
        c-class (if (= name "Lau") "comment-lau" "comment")]
    ; render the post with the new comment appended into the temporary file...
    (->> ((template (-> url File. html-resource) [new-comment]
            [:div#debate] (append new-comment))
          (a-comment c-class name date comment))
         (apply str)
         (spit url2))
    ; ...then swap it into place in one atomic rename
    (sh "mv" url2 url)))
So there you have it: atomicity on the filesystem. All this does is sanitize the url and build a second file name with the url's hash value appended. Then it checks whether I (Lau) am the one posting and, if so, gives the comment a different CSS class. Finally, it writes the HTML to the temporary file and swaps the files.
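Shelling out to mv works, but the JDK offers the same write-then-rename trick directly. This is a sketch with a hypothetical helper name, `atomic-spit!`, using `java.nio.file.Files/move` with `ATOMIC_MOVE`. The temporary file is created in the target's own directory so that the move is a rename within one filesystem; on Linux and macOS this replaces an existing target, although the Javadoc leaves that behavior implementation-specific.

```clojure
(require '[clojure.java.io :as io])
(import '(java.nio.file Files CopyOption StandardCopyOption)
        '(java.nio.file.attribute FileAttribute))

(defn atomic-spit!
  "Writes s to a temp file in the same directory as path, then renames it
  over path in one step, so readers see either the old or the new file."
  [path s]
  (let [target (.toPath (io/file path))
        dir    (.getParent (.toAbsolutePath target))
        tmp    (Files/createTempFile dir "tmp-" ".part" (make-array FileAttribute 0))]
    (spit (.toFile tmp) s)                                   ; slow, non-atomic write
    (Files/move tmp target
                (into-array CopyOption [StandardCopyOption/ATOMIC_MOVE])) ; atomic swap
    path))
```

`append-to-post` could then finish with `(atomic-spit! url html)` instead of the separate `spit` and `sh` calls.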
The combination of Enlive, Moustache and Clojure is extremely powerful, so powerful that you can generate almost anything with it...
Yes, even an Atom feed becomes trivial to generate. Whenever I publish from the backend, not only is the post produced and the index modified, but the Atom feed is also updated. All it takes is a simple atom.xml file with the basic structure required by the RFC (RFC 4287). Then you write a template which emits the entries, i.e. the posts, and calling it is as simple as:
; note: '+08:00' is a quoted literal in the pattern, not the server's actual UTC offset
(atom-feed (.format (SimpleDateFormat. "yyyy-MM-dd'T'HH:mm:ss'+08:00'") (java.util.Date.))
           (take 10 (sort-by :updated #(compare %2 %1) data)))
The 'data' variable is just a sequence of hash-maps, one per post, built from a file-seq. This data is sorted by :updated in descending order (thanks Chris), and the top 10 posts are passed to the atom-feed template. If you're not seeing the full picture, wait until I put it on GitHub :)
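Atom timestamps are RFC 3339 date-times, and the offset in them should be the real one. Here is a sketch, assuming Java 8 or later, that formats a timestamp with `java.time` instead of a hard-coded offset, plus the descending "newest n" selection from above. The names `atom-timestamp` and `newest` are hypothetical, and `newest` assumes `:updated` holds strings that sort chronologically, as ISO-style timestamps in one zone do.

```clojure
(import '(java.time OffsetDateTime)
        '(java.time.format DateTimeFormatter))

(defn atom-timestamp
  "RFC 3339 timestamp carrying odt's actual UTC offset."
  [^OffsetDateTime odt]
  (.format odt DateTimeFormatter/ISO_OFFSET_DATE_TIME))

(defn newest
  "The n most recently updated posts, newest first."
  [n posts]
  (take n (sort-by :updated #(compare %2 %1) posts)))
```

A feed call could then read `(atom-feed (atom-timestamp (OffsetDateTime/now)) (newest 10 data))`.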
So there's just one thing missing: what about all of my old posts? 42 of them, to be exact. Well, the evil twin of Enlive's templating is its selectors, which are perfect for scraping, so I've written a small lib which swallows a Wordpress.xml export file and converts it to whatever you like. There are three main stages.
If you are the proud owner of a Wordpress blog, try exporting your site and looking at the comments and you'll see that they are neatly organized in
(defn extract-comments [post]
  (let [comments (select post [[:wp:comment
                                (has [[:wp:comment_approved (pred #(= "1" (text %)))]])
                                (but (has [[:wp:comment_type (pred #(= "pingback" (text %)))]]))]])]
    (sort-by :date compare
             (for [c comments]
               (loot c
                     [:author :email :date :comment]
                     [:wp:comment_author :wp:comment_author_email
                      :wp:comment_date :wp:comment_content])))))
Christophe, not wanting to shadow "not", named that operator "but", which makes for somewhat unclear reading. The loot function wasn't meant for primetime, but it kept coming in handy! It simply takes a chunk of parsed XML, a collection of names and a collection of selectors, and returns a map from each name to the content matched by the corresponding selector:
(defn loot [chunk names selectors]
  (knit names
        (map (fn [selector]
               (pick chunk (if (coll? selector) selector [selector content])))
             selectors)))
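For readers without Enlive at hand, here is the same comment filter sketched with clojure.xml, which is a swapped-in technique and not the author's lib. clojure.xml's default SAX parser is not namespace-aware, so prefixed WXR tags such as `<wp:comment>` arrive as keywords like `:wp:comment`. `knit` is written here as a plain `zipmap`, which is an assumption about what the unshown helper does; `child-text` and `approved-comments` are hypothetical names.

```clojure
(require '[clojure.xml :as xml])

(defn child-text
  "Text of the first child of node tagged tag, or nil if absent or empty."
  [node tag]
  (some->> (:content node)
           (filter #(= tag (:tag %)))
           first
           :content
           (apply str)))

(defn knit
  "Hypothetical stand-in for the author's knit: pairs names with values."
  [names values]
  (zipmap names values))

(defn approved-comments
  "Approved, non-pingback comments of a WXR <item> node, sorted by date."
  [item]
  (->> (:content item)
       (filter #(= :wp:comment (:tag %)))
       (filter #(= "1" (child-text % :wp:comment_approved)))
       (remove #(= "pingback" (child-text % :wp:comment_type)))
       (map (fn [c]
              (knit [:author :email :date :comment]
                    (map #(child-text c %)
                         [:wp:comment_author :wp:comment_author_email
                          :wp:comment_date :wp:comment_content]))))
       (sort-by :date)))
```

Sorting on the raw `:date` strings works because WordPress writes them in a year-first, fixed-width format.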
So with all the comments tucked away in a collection, we can now grab the main content:
(defn get-posts
  "Takes a Wordpress backup file as its first argument and a function of one
  argument as its second. The Wordpress file is parsed for post data, which is
  returned as hash-maps with the keys [:title :link :body :date :thumb :comments].
  :thumb is specific to users of the post-avatar plugin. After the data is
  retrieved, post-capture-hook is applied to each item; use it to sanitize,
  modify, etc."
  [file post-capture-hook]
  (let [posts (-> file
                  xml-resource
                  (select [[:item (has [[:wp:post_type (pred #(= "post" (text %)))]])]]))]
    (map post-capture-hook
         (for [{i :content} posts]
           (-> (loot i
                     [:title :link :body :date :thumb]
                     [:title :link :content:encoded :wp:post_date
                      [[:wp:postmeta (has [[:wp:meta_key (pred #(= "postuserpic" (text %)))]])]
                       [:wp:meta_value] content]])
               (assoc :comments (extract-comments i)))))))
There are no surprises, and it's great to see how little code you have to write to import from an entirely different CMS. You can see me picking out "postuserpic", a property unique to users of the 'post avatar' plugin; if you don't use that plugin on your blog, :thumb will be nil. The function first loops over all the posts, extracting the interesting details, then associates the comments with each entry, and finally maps the post-capture-hook over all the entries. The hook lets you do arbitrary post-capture formatting, like fixing dates et al. My post-capture hook pulls out the excerpts and fixes links; yours might do something else.
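As an example of the kind of hook that could be passed to `get-posts`, here is a hypothetical `normalize-date-hook` (not the author's hook). It assumes `:date` holds WordPress's post date in the form "yyyy-MM-dd HH:mm:ss" and rewrites it as an ISO-8601 local date-time, leaving posts without a date untouched.

```clojure
(import '(java.time LocalDateTime)
        '(java.time.format DateTimeFormatter))

;; Assumed input format of the :date value in the Wordpress export.
(def wp-date-format (DateTimeFormatter/ofPattern "yyyy-MM-dd HH:mm:ss"))

(defn normalize-date-hook
  "Post-capture hook: rewrites :date as ISO-8601 local date-time."
  [post]
  (if-let [d (:date post)]
    (assoc post :date (.format (LocalDateTime/parse d wp-date-format)
                               DateTimeFormatter/ISO_LOCAL_DATE_TIME))
    post))
```

Since a hook is just a function of one argument, several can be combined with `comp`, e.g. `(get-posts "wordpress.xml" (comp fix-links normalize-date-hook))`, where `fix-links` is a placeholder for your own link-fixing hook.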
So the last step is simply to launch the site on a webserver. The old links looked like so:
http://www.bestinclass.dk/index.php/2010/04/prototurtle-the-tale-of-the-bleeding-turtle/
And the new links like so:
http://www.bestinclass.dk/index.clj/2010/04/prototurtle-the-tale-of-the-bleeding-turtle.html
Nginx (pronounced "engine X") provides some fancy rewriting with regexes, so if you click an old link, you'll actually see it turn into the new one before your very eyes. This is how I do it:
if ($request_method ~* GET ) {
rewrite ^/(.*)/$ /$1;
}
if ($uri ~* /index.php) {
rewrite ^(.+)(\.php)(.+)$ http://www.bestinclass.dk$1.clj$3.html last;
break;
}
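The first rule applies to all GET requests and removes a trailing slash, if there is one. The second captures three groups (everything before .php, .php itself, and everything after it) and reassembles them around the domain address, swapping .php for .clj and appending .html. Because the replacement starts with http://, Nginx answers with a redirect, which is why you see the address change in your browser. It works perfectly, so hopefully all of the old links still work.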
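Regex rewrites are easy to get subtly wrong, so it can help to replay them outside Nginx. This sketch (hypothetical name `old->new-path`) applies the same two patterns to a request path with `clojure.string/replace`, whose string replacement understands `$1`-style group references. It ignores the GET-only condition and the domain prefix, and it assumes Java's regex engine agrees with Nginx's PCRE for these simple patterns.

```clojure
(require '[clojure.string :as str])

(defn old->new-path
  "Applies the two rewrite rules above to a request path."
  [path]
  (-> path
      (str/replace #"^/(.*)/$" "/$1")                       ; drop one trailing slash
      (str/replace #"^(.+)(\.php)(.+)$" "$1.clj$3.html")))  ; .php/... -> .clj/....html
```

Paths that don't contain .php, such as /blog.html, pass through unchanged.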
Another thing to keep in mind is that some of the typical caching rules for Nginx also cache HTML files, which would make for a very boring blog: readers would keep seeing stale pages without the latest posts and comments. So it makes sense to exclude HTML from caching and only cache the truly static files, like images, stylesheets and scripts.
Converting an entire Wordpress blog to a slick new baked solution is a piece of cake, so to speak. Thanks to the expressive power of Clojure, it takes just a few lines to produce a thread-safe web application. The new version of Best In Class is much (MUCH!) easier to manage, maintain and update. Time is scarce these days, but I'll try to bundle up the code and open source most of it, in case anybody else is looking to get off Wordpress and into the baking business :)
If you're in Europe in late June and would like to learn how to deploy Clojure in industry, be sure to check out Conj Labs.