Tom Petrocelli's take on technology. Tom is the author of the book "Data Protection and Information Lifecycle Management" and a natural technology curmudgeon. This blog represents only my own views and not those of my employer, Enterprise Strategy Group. Frankly, mine are more amusing.

Wednesday, October 11, 2006

Eating My Own Cooking

Like so many analysts, I pontificate on a number of topics. It's one of the perks of the job. You get to shoot your mouth off without actually having to do the things you write or speak about. Every once in awhile though, I get the urge to do the things that I tell other people they should be doing. To eat my own cooking you might say.

Over the past few months I've been doing a fair bit of writing about open source software and new information interfaces such as tag clouds and spouting to friends and colleagues about Web 2.0 and AJAX. All this gabbing on my part inspired me to actually write an application, something I haven't done in a long while. I was intrigued with the idea of a tag cloud program that would help me catalog and categorize (tag) my most important files.

Now, you might ask, "Why bother?" With all the desktop search programs out there you can find almost anything, right? Sort of. Many desktop search products do not support OpenOffice, my office suite of choice, or don't support it well. Search engines also assume that you know the something of the content. If I'm not sure what I'm looking for, the search engine is limited in it's usefulness. You either get nothing back or too much. Like any search engine, desktop search can only return files based on your keyword input. I might be looking for a marketing piece I wrote but not have appropriate keywords in my head.

A tag cloud, in contrast, classifies information by a category, usually called a tag. Most tagging systems allow for multidimensional tagging wherein one piece of information is classified by multiple tags. With a tag cloud I can classify a marketing brochure as "marketing", "brochure" and "sales literature". With these tags in place, I can find my brochure no matter how I'm thinking about it today.

Tag clouds are common on Web sites like Flickr and MySpace. It seemed reasonable that an open source system for files would exist. Despite extensive searching, I've not found one yet that runs on Windows XP. I ran across a couple of commercial ones but they were really extensions to search engines. They stick you with the keywords that the search engine gleans from file content but you can't assign your own tags. Some are extensions of file systems but who wants to install an entirely different file system just to tag a bunch of files?

All this is to say that I ended up building one. It's pretty primitive (this was a hobby project after all) but still useful. It also gave me a good sense of the good, the bad, and the ugly of AJAX architectures. That alone was worth it. There's a lot of rah-rah going on about AJAX, most it well deserved, but there are some drawbacks. Still, it is the only way to go for web applications. With AJAX you can now achieve something close to a standard application interface with a web-based system. You also get a lot of services without coding, making mutli-tier architectures easy. This also makes web-based applications more attractive as a replacement for standard enterprise appliacations, not just Internet services. Sweet!

The downsides - the infrastructure is complex and you need to write code in multiple languages. The latter creates an error prone process. Most web scripting languages have a syntax that is similar in most ways but not all ways. They share the C legacy, as does C++, C#, and Java, but each implements the semantics in their own way. This carried forward to two of the most common languages in the web scripting world, PHP and JavaScript. In this environment, it is easy to make small mistakes in coding that slow down the programming process.

Installing a WAMP stack also turned out to be a bit of a chore. WAMP stands for Windows/Apache/MySQL/PHP (or Perl), and provides an application server environment. This is the same as the LAMP stack but with Windows as the OS instead of Linux. The good part of the WAMP or LAMP stack is that once in place, you don't have to worry about basic Internet services. No need to write a process to listen for a TCP/IP connection or interpret HTTP. The Apache Web Server does it for you. It also provides for portability. Theoretically, one should be able to take the same server code and put it on an any other box and have it run. I say theoretically because I discovered there are small differences in component implementations. I started on a LAMP stack and had to make changes to my PHP code for it to run under Windows XP. Still, the changes were quite small.

The big hassle was getting the WAMP stack configured. Configuration is the Achilles heel of open source. It is a pain in the neck! Despite configuration scripts, books,a nd decent documentation, I had no choice but to hand edit several different configuration files and download updated libraries for several components. That was just to get the basic infrastructure up and running. No application code, just a web server capable of running PHP which, in turn, could access the MySQL database. I can see now why O'Reilly and other technical book publishers can have dozens of titles on how to set up and configure these open source parts. It also makes evident how Microsoft can still make money in this space. Once the environment was properly configured and operational, writing the code was swift and pretty easy. In no time at all I had my Tag Cloud program.

The Tag Cloud program is implemented as a typical three tier system. There is a SQL database, implemented with MySQL, for persistent storage. The second tier is the application server code written in PHP and hosted on the Apache web server. This tier provides an indirect (read: more secure) interface to the database, does parameter checking, and formats the information heading back to the client.

As an aside, I originally thought to send XML to the client and wrote the server code that way. What I discovered was that it was quite cumbersome. Instead of simply displaying information returned from the server, I had to process XML trees and reformat them for display. This turned out to be quite slow given the amount of information returned and just tough to code right. Instead, I had the server return fragments of XHTML which were integrated into the client XHTML. The effect was the same but coding was much easier. In truth, PHP excels at text formating and JavaScript (the client coding language in AJAX) does not.

While returning pure XML makes it easier to integrate the server responses into other client applications, such as a Yahoo Widget, it also requires double the text processing. With pure XML output you need to generate the XML on the server and then interpret and format the XML into XHTML on the client. It is possible to do that with fairly easily with XSLT and XPath statements but in the interactive AJAX environment, this adds a lot of complexity. I've also discovered that XSLT doesn't always work the same way in different browsers and I was hell-bent on this being cross-browser.

The JavaScript client was an exercise in easy programming once the basic AJAX framework was in place. All that was required was two pieces of code. One was Nicholas Zakas' excellent cross-browser AJAX library, zXml. Unfortunately, I discovered too late that it also included cross-browser implementations of XSLT and XPath as well. Oh well. Maybe next time.

The second element was the HTTPRequest object wrapper class. HTTPRequest is the JavaScript object used to make requests of HTTP servers. It is implemented differently in different browsers and client application frameworks. zXml makes it much easier to have HTTPRequest work correctly in different browsers. managing multiple connections to the web server though was difficult. Since I wanted the AJAX code to be asychronous, I kept running into concurrency problems. The solution was wrapper for the HTTPRequest object to assist in managing connections to the web server and encapsulate some of the more redundant code that popped up along the way. Easy enough to do in JavaScript and it made the code less error prone too! After that it was all SMOP (a Simple Matter of Programming). Adding new functions is also easy as pie. I have a dozen ideas for improvements but all the core functions are working well.

The basic architecture is simple. A web page provides basic structure and acts as a container for the interactive elements. It's pretty simple XHTML. In fact, if you look at the source it would look like nothing. There are three DIV sections with named identifiers. These represent the three interactive panels. Depending on user interaction, the HTTPRequest helper objects are instantiated and make a request of the server. The server runs the request PHP code which returns XHTML fragments that are for display (such as the TagCloud itself) or represent errors. The wrapper objects place them in the appropriate display panels. Keep in mind, it is possible to write a completely different web page with small JavaScript coding changes or even just changes to the static XHTML.

The system has all the advantages of web applications with an interactive interface. No page refreshes, no long waits, no interface acrobatics. It's easy to see why folks like Google are embracing this methodology. There's a lot I could do with this if I had more time to devote to programming but Hey! it's only a hobby.

At the very least, I have a very useful information management tool. Finding important files has become much easier. One of the nice aspects of this is that I only bother to tag important files, not everything. It's more efficent to bake bread when you have already seperated the wheat from the chafe. It's also good to eat my own cooking and find that it's pretty good.

2 comments:

Christopher said...

Hi Tom,

Are you willing to share this great tool with others?

Tom Petrocelli said...

Maybe someday I will but right now, it's not ready for prime time. It has no user control and needs a bunch of improvements.

What is interesting is how well the model works. With the AJAX methodology, you get nice user interaction in a web based system. AJAX also avoids having fat applications like a lot of Java and ActiveX application.

Ultimately, AJAX web-based a more desirable approach to implementing real application.