Tom Petrocelli's take on technology. Tom is the author of the book "Data Protection and Information Lifecycle Management" and a natural technology curmudgeon. This blog represents only my own views and not those of my employer, Enterprise Strategy Group. Frankly, mine are more amusing.
Wednesday, October 11, 2006
Eating My Own Cooking
Over the past few months I've been doing a fair bit of writing about open source software and new information interfaces such as tag clouds and spouting to friends and colleagues about Web 2.0 and AJAX. All this gabbing on my part inspired me to actually write an application, something I haven't done in a long while. I was intrigued with the idea of a tag cloud program that would help me catalog and categorize (tag) my most important files.
Now, you might ask, "Why bother?" With all the desktop search programs out there you can find almost anything, right? Sort of. Many desktop search products do not support OpenOffice, my office suite of choice, or don't support it well. Search engines also assume that you know something of the content. If I'm not sure what I'm looking for, the search engine is limited in its usefulness. You either get nothing back or too much. Like any search engine, desktop search can only return files based on your keyword input. I might be looking for a marketing piece I wrote but not have appropriate keywords in my head.
A tag cloud, in contrast, classifies information by a category, usually called a tag. Most tagging systems allow for multidimensional tagging wherein one piece of information is classified by multiple tags. With a tag cloud I can classify a marketing brochure as "marketing", "brochure" and "sales literature". With these tags in place, I can find my brochure no matter how I'm thinking about it today.
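To make multidimensional tagging concrete, here is a minimal sketch in Python. It is an illustration only, not the code from my program: the `TagIndex` class, the `cloud_sizes` helper, the pixel sizes, and the file names are all made up for the example. It keeps a many-to-many mapping between files and tags, and scales each tag's display size by how many files carry it, which is the usual way a cloud is drawn.

```python
from collections import defaultdict


class TagIndex:
    """Many-to-many mapping between files and tags (hypothetical helper)."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._tags_by_file = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file; tags are case-insensitive."""
        for raw in tags:
            name = raw.strip().lower()
            if name:
                self._files_by_tag[name].add(path)
                self._tags_by_file[path].add(name)

    def files_for(self, tag):
        """Return every file carrying the given tag, sorted by path."""
        return sorted(self._files_by_tag.get(tag.strip().lower(), set()))

    def tag_counts(self):
        """Return {tag: number of files carrying it}."""
        return {name: len(files) for name, files in self._files_by_tag.items()}


def cloud_sizes(counts, min_px=12, max_px=32):
    """Scale tag counts linearly to font sizes (the pixel range is an assumption)."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    return {
        name: min_px if span == 0 else round(min_px + (n - lo) * (max_px - min_px) / span)
        for name, n in counts.items()
    }


index = TagIndex()
index.tag("docs/widget-brochure.odt", "marketing", "brochure", "sales literature")
index.tag("docs/q3-campaign-plan.odt", "marketing")
```

The brochure can now be reached through any of its three tags, and "marketing" would be drawn largest because two files carry it.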
Tag clouds are common on Web sites like Flickr and MySpace. It seemed reasonable that an open source system for files would exist. Despite extensive searching, I've not found one yet that runs on Windows XP. I ran across a couple of commercial ones but they were really extensions to search engines. They stick you with the keywords that the search engine gleans from file content but you can't assign your own tags. Some are extensions of file systems but who wants to install an entirely different file system just to tag a bunch of files?
All this is to say that I ended up building one. It's pretty primitive (this was a hobby project after all) but still useful. It also gave me a good sense of the good, the bad, and the ugly of AJAX architectures. That alone was worth it. There's a lot of rah-rah going on about AJAX, most of it well deserved, but there are some drawbacks. Still, it is the only way to go for web applications. With AJAX you can now achieve something close to a standard application interface with a web-based system. You also get a lot of services without coding, making multi-tier architectures easy. This also makes web-based applications more attractive as a replacement for standard enterprise applications, not just Internet services. Sweet!
Installing a WAMP stack also turned out to be a bit of a chore. WAMP stands for Windows/Apache/MySQL/PHP (or Perl), and provides an application server environment. This is the same as the LAMP stack but with Windows as the OS instead of Linux. The good part of the WAMP or LAMP stack is that once in place, you don't have to worry about basic Internet services. No need to write a process to listen for a TCP/IP connection or interpret HTTP. The Apache Web Server does it for you. It also provides for portability. Theoretically, one should be able to take the same server code, put it on any other box, and have it run. I say theoretically because I discovered there are small differences in component implementations. I started on a LAMP stack and had to make changes to my PHP code for it to run under Windows XP. Still, the changes were quite small.
The big hassle was getting the WAMP stack configured. Configuration is the Achilles heel of open source. It is a pain in the neck! Despite configuration scripts, books, and decent documentation, I had no choice but to hand edit several different configuration files and download updated libraries for several components. That was just to get the basic infrastructure up and running. No application code, just a web server capable of running PHP which, in turn, could access the MySQL database. I can see now why O'Reilly and other technical book publishers can have dozens of titles on how to set up and configure these open source parts. It also makes evident how Microsoft can still make money in this space. Once the environment was properly configured and operational, writing the code was swift and pretty easy. In no time at all I had my Tag Cloud program.
The Tag Cloud program is implemented as a typical three-tier system. There is a SQL database, implemented with MySQL, for persistent storage. The second tier is the application server code written in PHP and hosted on the Apache web server. This tier provides an indirect (read: more secure) interface to the database, does parameter checking, and formats the information heading back to the client.
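For a rough idea of what the database and middle tiers are doing, here is a sketch in Python. It is not the program's PHP code, and nearly everything in it is an assumption made for illustration: SQLite's in-memory database stands in for MySQL, and the table layout, the `TAG_PATTERN` rule, and the function names are hypothetical. What it shows is the shape of the thing: the client never touches SQL directly, every incoming tag is checked first, and values are passed as query parameters rather than pasted into the SQL text.

```python
import re
import sqlite3

# Assumed rule for a valid tag: lowercase letters, digits, space, underscore, hyphen.
TAG_PATTERN = re.compile(r"[a-z0-9][a-z0-9 _-]{0,63}")


def open_store():
    """Data tier: an in-memory SQLite database standing in for MySQL."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE file_tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag_id  INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (file_id, tag_id)
        );
    """)
    return db


def clean_tag(raw):
    """Parameter checking: normalize the tag and reject anything unexpected."""
    tag = raw.strip().lower()
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid tag: {raw!r}")
    return tag


def add_tag(db, path, raw_tag):
    """Middle tier: attach a tag to a file using parameterized statements."""
    tag = clean_tag(raw_tag)
    db.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
    db.execute(
        "INSERT OR IGNORE INTO file_tags (file_id, tag_id) "
        "SELECT f.id, t.id FROM files f, tags t WHERE f.path = ? AND t.name = ?",
        (path, tag),
    )


def files_with_tag(db, raw_tag):
    """Middle tier: return the paths of all files carrying the tag."""
    tag = clean_tag(raw_tag)
    rows = db.execute(
        "SELECT f.path FROM files f "
        "JOIN file_tags ft ON ft.file_id = f.id "
        "JOIN tags t ON t.id = ft.tag_id "
        "WHERE t.name = ? ORDER BY f.path",
        (tag,),
    )
    return [path for (path,) in rows]


db = open_store()
add_tag(db, "docs/widget-brochure.odt", "Marketing")
add_tag(db, "docs/widget-brochure.odt", "brochure")
```

A request carrying something like `'; DROP TABLE files; --` as a tag is rejected by `clean_tag` before it gets anywhere near the database.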
While returning pure XML makes it easier to integrate the server responses into other client applications, such as a Yahoo Widget, it also requires double the text processing. With pure XML output you need to generate the XML on the server and then interpret and format the XML into XHTML on the client. It is possible to do that fairly easily with XSLT and XPath statements but in the interactive AJAX environment, this adds a lot of complexity. I've also discovered that XSLT doesn't always work the same way in different browsers and I was hell-bent on this being cross-browser.
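The "double the text processing" point can be sketched like this. Again this is Python for illustration, not the program's code; the element names, the `<ul>` markup, and the function names are invented, and the alternative shown (the server handing back a ready-made XHTML fragment) is an assumption about what pure XML is being compared against. The first two functions are the pure XML route: build XML on the server, then parse it and rebuild it as XHTML on the client. The third does the formatting once, on the server.

```python
import xml.etree.ElementTree as ET
from html import escape


def server_xml(tags):
    """Pure XML route, step 1: the server serializes tag counts as XML."""
    root = ET.Element("tags")
    for name, count in tags.items():
        ET.SubElement(root, "tag", count=str(count)).text = name
    return ET.tostring(root, encoding="unicode")


def client_render(xml_text):
    """Pure XML route, step 2: the client parses the XML and rebuilds it as XHTML."""
    root = ET.fromstring(xml_text)
    items = [
        f'<li title="{escape(el.get("count"))} files">{escape(el.text)}</li>'
        for el in root.iter("tag")
    ]
    return "<ul>" + "".join(items) + "</ul>"


def server_xhtml(tags):
    """Fragment route: the server formats the XHTML once; the client just inserts it."""
    items = [
        f'<li title="{count} files">{escape(name)}</li>'
        for name, count in tags.items()
    ]
    return "<ul>" + "".join(items) + "</ul>"


tags = {"marketing": 2, "R&D": 1}
two_step = client_render(server_xml(tags))
one_step = server_xhtml(tags)
```

Both routes end up with the same markup. The difference is that the pure XML route needs a parser and a formatter on the client, which is where the cross-browser trouble lives, while its XML output is the easier one for another kind of client to consume.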
The system has all the advantages of web applications with an interactive interface. No page refreshes, no long waits, no interface acrobatics. It's easy to see why folks like Google are embracing this methodology. There's a lot I could do with this if I had more time to devote to programming but Hey! it's only a hobby.
At the very least, I have a very useful information management tool. Finding important files has become much easier. One of the nice aspects of this is that I only bother to tag important files, not everything. It's more efficient to bake bread when you have already separated the wheat from the chaff. It's also good to eat my own cooking and find that it's pretty good.