Wednesday, February 21, 2007

Managing Metadata

Everywhere I turn, I hear more about metadata. It seems that everyone is jumping on the metadata bandwagon. For those of you who have never heard the term, it is data about data. More precisely, metadata describes data, providing the context that allows us to make it useful.

Bringing Structure to Chaos

Organizing data is something we do naturally in our heads, but it is something computers are pretty poor at. It is natural for a human being to develop schema and categories for the endless streams of data that invade our consciousness every moment that we are awake. We can file things away for later use, discard them as unimportant, and connect them with other data to form relationships. It is an innate human capability.

Not so with computers. They are literal in nature and driven by the commands that we humans give them. No matter how smart we think computers are, compared to us organics, they are as dumb as rocks.

Metadata is an attempt to give computers a brain boost. By describing data, we are able to automate the categorization and presentation of data in order to make it more meaningful. In other words, we can build schema out of unstructured data. Databases do this by imposing a rigid structure on the data. This works fine for data that is naturally organized into neat little arrangements. For sloppy situations, say 90% of the data in our lives, databases are not so useful.

Metadata Is All Around Us

We are already swimming in metadata. All those music files clogging up our hard drives have important metadata associated with them. That's why your iPod can display the name, artist and other important information when you play a song, and why iTunes can build playlists automatically. Your digital camera places metadata into all of those pictures of your kids. Because of metadata, you can attach titles and other information to them and make that information available to all kinds of software. Internet services use metadata extensively to provide those cool tag clouds, relevant search responses, and social networking links.

Businesses have a keen need for metadata. With so many word processor documents, presentations, graphics, and spreadsheets strewn about corporate servers, there needs to be a good way to organize and manage them. Information Lifecycle Management assumes the ability to generate and use metadata. Advanced backup and recovery also uses metadata. Companies are trying to make sense out of the vast stores of unstructured data in their clutches. Whether the goal is to find, manage, or protect data, organizations are increasingly turning to metadata to do it.

Dragged Down By The Boat

So, we turn to metadata to keep us from drowning in data. Unfortunately, we are starting to find ourselves drowning in metadata too. A lot of metadata is unmanaged, and managing metadata sounds a lot like watching the watchers. If we don't start to do a better job of managing metadata, we are going to discover an ugly truth about it: it can quickly become meaningless. Just check out the tag cloud on an online service such as Technorati or Flickr. Those clouds are so huge that they are practically useless. I'm a big fan of tag clouds when they are done right. The ability to associate well thought out words and phrases with a piece of data makes it much easier to find what you want and to attach meaning to whatever the data represents.

The important phrase here is “well thought out”. A lot of metadata is impulsive. Like a three-year-old who says whatever silly thought pops into his or her head, a lot of tags are meaningless and transient. Whereas the purpose of metadata is to impart some extended meaning to the data, a lot of metadata does the opposite. It creates a confused jumble of words that sheds no light on the meaning of the data.

The solution is to start managing the metadata. That means (and I know I speak heresy here) rules. Rules about which words can be used in which circumstances. Rules about the number of tags associated with any piece of data. Rules about the rules, basically. It makes my stomach hurt, but it is necessary.
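To make rules like these a little more concrete, here is a minimal sketch in Python of how a company might check tags automatically before they are accepted. It assumes a hypothetical policy: a controlled vocabulary per kind of item (the "which words in which circumstances" rule) and a cap of three tags per item. The names VOCABULARY, MAX_TAGS, and validate_tags, and the sample tags themselves, are purely illustrative and not taken from any real product.

```python
# Hypothetical tagging policy (illustrative only): which tags are allowed for
# which kind of item, plus a cap on how many tags any one item may carry.
VOCABULARY = {
    "spreadsheet": {"finance", "budget", "forecast", "draft", "final"},
    "presentation": {"sales", "marketing", "training", "draft", "final"},
}
MAX_TAGS = 3


def validate_tags(item_type, tags, vocabulary=VOCABULARY, max_tags=MAX_TAGS):
    """Return a list of rule violations for one item's tags; an empty list means compliant."""
    allowed = vocabulary.get(item_type)
    if allowed is None:
        return [f"no vocabulary defined for item type {item_type!r}"]

    # Normalize case and surrounding whitespace so "Budget " and "budget" count as one tag.
    normalized = [t.strip().lower() for t in tags]
    unique = sorted(set(normalized))

    problems = []
    if len(unique) < len(normalized):
        problems.append("duplicate tags")
    if len(unique) > max_tags:
        problems.append(f"too many tags: {len(unique)} > {max_tags}")
    for tag in unique:
        if tag not in allowed:
            problems.append(f"tag not allowed for {item_type}: {tag!r}")
    return problems


if __name__ == "__main__":
    print(validate_tags("spreadsheet", ["Budget", "final"]))  # → []
    # → ['too many tags: 4 > 3', "tag not allowed for spreadsheet: 'cool'",
    #    "tag not allowed for spreadsheet: 'stuff'"]
    print(validate_tags("spreadsheet", ["budget", "cool", "stuff", "final"]))
```

The check only reports violations rather than silently fixing them, which leaves the final call to the (bureaucratic) person responsible for the guidelines.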

I don't expect this discipline from Internet services. It runs counter to the “one happy and equal family” attitude that draws people to these services. For companies, however, it is necessary as they implement metadata-driven solutions. Unfortunately, it means guidelines (tagged as “information”, “guidelines”, “metadata”, or something) and some person with a horrible, bureaucratic personality to enforce them. Think of it as a necessary evil, like lawmen in the old West.

For the most part, companies will probably start to manage metadata when it is already too late, when they are already drowning in the stuff. There is still an opportunity to avoid that scenario, though. Metadata-based management of unstructured data is still pretty new. Set up the rules and guidelines now. Enforce the rules, and regularly review the tags and how they are used. Eventually, there will be metadata analysis software to assist you, but in the meantime, put the framework in place. The opportunity is there to do it right from the beginning and avoid making a big mistake. Use metadata to create value from your data rather than to confuse things further.

