Tom Petrocelli's take on technology. Tom was an IT industry executive, analyst, and practitioner as well as the author of the book "Data Protection and Information Lifecycle Management" and many technical and market definition papers. He is also a natural technology curmudgeon.

Tuesday, March 02, 2010

Tiers of a Clown

I've been following the debate about automated storage tiering with amused interest. The various marketing operatives of data storage companies (and a few C-Level folks to boot) are all lining up into one of two camps – tiering is necessary or tiering is unnecessary. There have been dueling animations (very clever) from The Storage Anarchist and 3Par's Marc Farley as well as commentary from a host of industry bigwigs. I love the animations but then again, I always loved cartoons.


Automated storage tiering or automated tiered storage (or data lifecycle management, or whatever else it used to be described as) is using different types of physical storage for different classes of data, mostly to save money and maintain performance. The promise of storage tiering is that you can move less important, unchanging, or less frequently accessed data to cheaper, slower storage. You can keep the most important, frequently changing, and most accessed data in a really expensive array that combines high performance with heavy duty data protection features. For data that you don't need quite so often and that doesn't change, you can move it to something slower and not as rigorous. And so on until you finally move it to an archive system or delete it. This has been the bread and butter of folks like Compellent and has been picked up by most of the bigger storage companies since. The ultimate goal is high levels of efficiency in your data storage systems. The more important the data is, the more resources it can consume. Less important data consumes fewer resources and balance in the universe is maintained.

A great example of where one might use tiered storage is with a check image. For a short while a check image has to be available to online customers and tellers immediately. Then it has to be stored for seven years and only moderately available. Then it is deleted. Chances are good that after 90 days you won't care to see the actual image so moving it to slower storage is not much of a burden but it saves money.
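The check image example boils down to a simple age-based rule, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names ("fast", "slow", "delete") are made up for the sketch, the 90 day and seven year thresholds come from the example above, and seven years is approximated as 7 x 365 days. A real tiering product would have its own policy language and would likely look at more than age.

```python
from datetime import date

# Thresholds taken from the check image example; tier names are hypothetical.
FAST_TIER_DAYS = 90
RETENTION_DAYS = 7 * 365  # rough approximation of seven years (ignores leap days)


def tier_for_check_image(created: date, today: date) -> str:
    """Pick a storage tier for a check image based only on its age."""
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("created date is in the future")
    if age_days <= FAST_TIER_DAYS:
        return "fast"    # online customers and tellers need it immediately
    if age_days <= RETENTION_DAYS:
        return "slow"    # must be kept, but only moderately available
    return "delete"      # retention period is over
```

The code is the easy part. Picking the thresholds, and getting the business to agree on them, is where the work is.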

Three things about tiered storage are important to consider. These considerations are what fuel the debate. First, automating it is tough. You have to get the software right or you lose data and have diminished efficiency. The second consideration is the ever dropping cost of storage. As data storage continues to become even more stupid cheap, it raises the question of whether you need to be all that efficient in the first place. If a high performance array is inexpensive then everything can have high performance storage without moving data around to a bunch of arrays. Finally, it's hard to decide what data belongs on what resource. Do I base it on age? Class of data? How do I decide what data is what class? These are not technical problems. They are business problems, which are much harder to overcome. Wrangling with your organization is hard work. You have to put a lot of effort into deciding what goes where and hope that your vendor supports your criteria.

To me, the problem of storage tiering is that it is a good idea that can be tough to execute. It's like the old joke about teenage sex – everyone talks about it, no one really does it, those that do it don't do it well. I'm sure that lots of folks will say that they have products that allow folks to do this well. However, technology doesn't solve the organizational problem, which makes it hard for folks to want to implement it. That doesn't affect the bread and butter customers of top tier storage companies (sorry – couldn't resist), who tend to be huge companies. They have the business process resources to pull it off. It might explain why automated storage tiering is not generating a huge following in mid-sized and smaller companies. They have other things to do with their limited resources than try to squeeze a bit more efficiency out of their storage system. The ROI for them is simply not big enough. Heck, many are still struggling with the blocking and tackling of doing backups and security.

So, where do I weigh in on this debate? I agree with both sides. If that sounds a bit weasel-like then sorry. For some companies there are mission critical applications that would benefit from an automated tiered storage system. For others, it's hard to see how there would be benefit enough to warrant the time and effort. For me, the debate is a non-debate. It's not about whether automated storage tiering is beneficial or not. What matters is whether it's beneficial to you. If you think in terms of customers, instead of products and technology, it becomes clear. What applications do you have that need this approach? Does your organization need it at all? Can you decelerate the pace of your storage buying enough to justify the costs and time involved in implementing this? Will you be able to decide what data should go where and when?

In the end, it's a feature like all other features. If it has value for you then it's a winner. If it doesn't then find something that does. But watch the debate. It's quite entertaining.

Saturday, February 20, 2010

HP and Cisco Square Off

I love a good dust up in the computer industry. The latest WWF style bout comes to us courtesy of HP and Cisco. In one corner of the ring is Cisco who won't be renewing a system integrator contract with HP. In the other corner is HP who plans to bail from the Cisco Certified Channel and Global Service Alliance Partner programs.

Two giants of the industry beating on each other is so much better than David and Goliath match ups. Those make us feel sorry for the little guy and angry at the mean old big company beating down on the poor entrepreneur who's just trying to make the world a better place.

These bare knuckle brawls are both good and bad for the industry. First, they bring to light the farcical nature of big company alliances. Let's face it, they are marriages of convenience. These folks really want to be on top and there can only be one top dog. It's good that they occasionally remind us not to get too vested in them.

Competition is also good. It drives down prices and ratchets up innovation. When things get too cozy, the industry tends to stagnate. We need another round of wow! inducing products at woo hoo! low prices about now.

It is bad, though, for those caught in the wake of these two battleships as they try to sink each other. In this case, the indirect channel partners of both companies could become collateral damage. It may well become more difficult to integrate products from both companies, and channel partner customers are not likely to want to pay extra for that added effort. It's not their problem that HP and Cisco went from lovers to bitter rivals practically overnight. There will be costs that someone has to absorb and the channel looks like the dry sponge here. Hopefully a hardware price war will ensue that gives a little more margin to channel service providers.

This move is just one act in an ongoing drama in the computer industry. Over the past ten years we have seen the growth of the full service, full product line computer company. There are now only half a dozen (if that) companies that sell solutions to customers and they want to sell whole solutions. Servers, networking, storage, software, the whole system plus services. This is what is behind the Cisco Unified Computing initiative and HP's acquisition strategy. Everyone is trying to be IBM. Even software companies are getting into the act. Just look at Oracle and their purchase of Sun and other investments in hardware. It will get harder and harder for independent hardware companies to continue to exist unless they are making OEM equipment for one of the big, full service companies. A few will survive to provide niche products whose revenue stream is too small for the big guns to care about. A few others will get by on overservicing specialty markets. It's like grocery shopping. Most everyone buys from a big supermarket. Sure, you occasionally go out to the specialty market or “all local foods” shop but that's not for everyday purchases.

Next up: Exclusive channel partner programs. Want to sell our stuff? Then you can't sell anyone else's stuff. It's been done in the past and will likely happen again.

Wednesday, February 17, 2010

Into the Matrix with Neo

Everyone needs a hobby. Lately, mine happens to be writing code. I used to be a software engineer so I used to code for a living. Over time two things happened. One, it ceased to be fun (that's why we call it work, folks) and two, I didn't need to do it anymore. As my career transitioned into management and then executive management, I rarely got my fingernails dirty with real coding projects.

What's good about that is that coding could become fun again. So a couple of months ago I decided to start on a new coding project. I had two goals – learn some new technology and do something at least marginally useful. That has led to my latest project, a document management system built on the idea of relationships between documents.

Most document management centers around classifying documents in some fashion. Whether you use a hierarchical category system or free form tagging schema, it's about putting documents in buckets. I wanted to add something else to the mix. Documents rarely stand on their own. They exist in relationship to other documents. Think social networking for your files.

Unlike people, documents don't know other documents nor do they care if another document is having lunch at Spot Coffee. Documents do belong to an ecosystem just like we humans do. They refer to other documents and are part of larger documents and collections of documents. They have their own relationships.

Modeling these relationships in more traditional databases is difficult. Using an SQL RDBMS you end up with a lot of cross reference tables and lots of joins. It's not what SQL or relational databases were designed for. Instead, I decided to use a graphing database called Neo. Graphing databases organize data as a series of nodes connected by explicit relationships. This allows you to build applications that focus on finding like objects. For example, what documents are referenced by this one? Or, which are the child documents to this one? These questions are more easily answered by a graphing database.
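To make the nodes and relationships idea concrete, here is a minimal in-memory sketch in Python. It is not Neo's API and not the code from my project. The class name, the relationship names (REFERENCES, HAS_CHILD), and the file names are all made up for illustration. It only shows how the two questions above turn into direct lookups when relationships are stored explicitly.

```python
from collections import defaultdict


class DocumentGraph:
    """Toy graph: documents are nodes, named relationships are directed edges."""

    def __init__(self):
        # _edges[relationship][source] -> set of target documents
        self._edges = defaultdict(lambda: defaultdict(set))

    def relate(self, source, relationship, target):
        """Record that `source` has `relationship` to `target`."""
        self._edges[relationship][source].add(target)

    def related(self, source, relationship):
        """Return the documents `source` points at via `relationship`, sorted."""
        targets = self._edges.get(relationship, {}).get(source, set())
        return sorted(targets)


# Hypothetical documents and relationships, for illustration only.
graph = DocumentGraph()
graph.relate("proposal.doc", "REFERENCES", "budget.xls")
graph.relate("proposal.doc", "REFERENCES", "schedule.doc")
graph.relate("proposal.doc", "HAS_CHILD", "appendix-a.doc")

referenced = graph.related("proposal.doc", "REFERENCES")
children = graph.related("proposal.doc", "HAS_CHILD")
```

Each question is a lookup keyed by relationship and document. In a relational schema, each relationship type would typically need its own cross reference table and a join to answer the same thing.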

To date, graphing databases have primarily been used for social networking applications. That makes sense since managing data by relationships sits at the core of social networking. Graphing databases have a lot of other potential uses. They would be great for modeling workflows, running simulations, and building ontologies, all hot areas of software.

Neo has a few warts. It's still only a release candidate so things are still changing. The most recent version, bringing Neo out of Beta and to an RC, changed the names of several basic objects. That forced me to go back and recode certain key sections of the application. The online documentation is good at documenting the API but light on how to make things work right. Figuring out the transaction model, even though it's pretty simple, required digging into the class level documentation and a bit of trial and error. Might be a book in there. Hmmm...

In the end, I won't have a commercial grade application. My GUI design skills are too poor to make it look and behave the way I want it to. However, once my pet project is done, the application will at least be useful. I will have learned something interesting and it will have been fun. What more can one want out of a hobby?