Tom Petrocelli's take on technology. Tom is the author of the book "Data Protection and Information Lifecycle Management" and a natural technology curmudgeon. This blog represents only my own views and not those of my employer, Enterprise Strategy Group. Frankly, mine are more amusing.

Wednesday, February 17, 2010

Into the Matrix with Neo

Everyone needs a hobby. Lately, mine happens to be writing code. I used to be a software engineer, so I used to code for a living. Over time two things happened. One, it ceased to be fun (that's why we call it work, folks) and two, I didn't need to do it anymore. As my career transitioned into management and then executive management, I rarely got my fingernails dirty with real coding projects.

What's good about that is that coding could become fun again. So a couple of months ago I decided to start on a new coding project. I had two goals: learn some new technology and do something at least marginally useful. That has led to my latest project, a document management system built on the idea of relationships between documents.

Most document management centers on classifying documents in some fashion. Whether you use a hierarchical category system or a free-form tagging scheme, it's about putting documents in buckets. I wanted to add something else to the mix. Documents rarely stand on their own. They exist in relationship to other documents. Think social networking for your files.

Unlike people, documents don't know other documents, nor do they care if another document is having lunch at Spot Coffee. Still, documents belong to an ecosystem just like we humans do. They refer to other documents and are part of larger documents and collections of documents. They have their own relationships.

Modeling these relationships in a more traditional database is difficult. With an SQL RDBMS you end up with a lot of cross-reference tables and lots of joins. It's not what SQL or relational databases were designed for. Instead, I decided to use a graph database called Neo. Graph databases organize data as a series of nodes connected by explicit relationships. This allows you to build applications that focus on finding related objects. For example, what documents are referenced by this one? Or, which are the child documents of this one? These questions are more easily answered by a graph database.
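To make the node-and-relationship idea concrete, here is a minimal in-memory sketch in Python. It is not Neo's API and not the application's actual code. The class name, the relationship types (REFERENCES, HAS_CHILD), and the file names are all made up for illustration. It only shows why the two example questions become simple lookups once relationships are stored explicitly.

```python
from collections import defaultdict


class DocumentGraph:
    """Toy graph: documents are nodes, typed directed edges are relationships.

    Hypothetical illustration only; a real graph database adds persistence,
    transactions, and indexing on top of this basic shape.
    """

    def __init__(self):
        # Maps (source document, relationship type) -> list of target documents.
        self._outgoing = defaultdict(list)

    def relate(self, source, rel_type, target):
        """Record a directed, typed relationship from source to target."""
        self._outgoing[(source, rel_type)].append(target)

    def related(self, source, rel_type):
        """Return documents directly connected to source by rel_type."""
        return list(self._outgoing.get((source, rel_type), []))

    def descendants(self, source, rel_type):
        """Follow rel_type edges transitively (children, grandchildren, ...)."""
        found, stack = [], [source]
        while stack:
            for target in self._outgoing.get((stack.pop(), rel_type), []):
                if target not in found:  # also guards against cycles
                    found.append(target)
                    stack.append(target)
        return found


# Made-up documents and relationship types, purely for illustration.
graph = DocumentGraph()
graph.relate("proposal.doc", "REFERENCES", "budget.xls")
graph.relate("proposal.doc", "REFERENCES", "schedule.doc")
graph.relate("proposal.doc", "HAS_CHILD", "appendix-a.doc")
graph.relate("appendix-a.doc", "HAS_CHILD", "figure-1.png")

# "What documents are referenced by this one?"
print(graph.related("proposal.doc", "REFERENCES"))

# "Which are the child documents of this one?" (including nested children)
print(graph.descendants("proposal.doc", "HAS_CHILD"))
```

In a relational schema, the same two questions would typically need a cross-reference table per relationship type, plus a join (or a recursive query for the nested children) every time you ask.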

To date, graph databases have been used primarily for social networking applications. That makes sense, since managing data by relationships sits at the core of social networking. Graph databases have a lot of other potential uses, though. They would be great for modeling workflows, running simulations, and building ontologies, all hot areas of software.

Neo has a few warts. It's only a release candidate, so things are still changing. The most recent version, which brought Neo out of beta and to an RC, changed the names of several basic objects. That forced me to go back and recode certain key sections of the application. The online documentation is good at documenting the API but light on how to make things work right. Figuring out the transaction model, even though it's pretty simple, required digging into the class-level documentation and a bit of trial and error. Might be a book in there. Hmmm...

In the end, I won't have a commercial-grade application. My GUI design skills are too poor to make it look and behave the way I want it to. However, once my pet project is done, the application will at least be useful. I will have learned something interesting and it will have been fun. What more can one want out of a hobby?
