Tom Petrocelli's take on technology. Tom is the author of the book "Data Protection and Information Lifecycle Management" and a natural technology curmudgeon. This blog represents only my own views and not those of my employer, Enterprise Strategy Group. Frankly, mine are more amusing.

Monday, March 22, 2010

Monkeys Flinging Poo

As anyone who reads this blog (thanks to both of you) knows, I've taken up writing code again. It's a hobby to keep me busy while I look for my next great adventure. The act of writing code is an act of creation. You make something. Software is especially satisfying since, in a sense, you make something out of nothing. Feels kind of god-like in that way. You start with nothing, say “let there be applications”, and it comes into being. I'll grant you, it's not as easy as that but neither was creation. The big bang, stellar and planetary formation, and evolution all took energy.

At the same time, I've been watching various members of the software industry throw patent lawsuits at each other. It's a bit like watching monkeys in the zoo fling poo at each other. Mildly amusing until some of the poo escapes the confines of the cage and hits a spectator. All of a sudden, it's not so funny. Well, it is kind of funny but not for the one who gets hit with the poo.

All of this legal poo flinging just doesn't feel right to most people. Yes, we want our creations protected. If someone tries to steal my work, I would become an angry god and want to throw thunderbolts (and poo probably). On the other hand, what is being patented is ephemeral. There is still a lot of rancor over Amazon's One-Click patent. The notion of patenting a single-click purchase seems absurd to most people. A lot of software patents are that absurd. The upshot for software companies is that they are expected to protect important assets but their own customers think they are greedy hatemongers when they do.

Worst of all is that customers get caught in the crossfire. They worry that they will lose their investment through no fault of their own. Will they have to change what is working for them in the future because of some crazy corporate rock throwing? In essence, they are afraid of being the spectator that gets hit when the monkeys go at each other.

Lawsuits are not good for companies either. In technology-based industries, even when you can claim victory in a lawsuit, it's almost always a Pyrrhic one. You don't so much win as lose less. Take Apple for instance. They are suing HTC for making a smartphone whose software, they feel, violates patents associated with the iPhone. It doesn't matter if, as a matter of law, they are right or wrong. The damage to their image is already done. Instead of appearing to be a technology company that wants to transform the world (“Think Different!”), they are revealed to be a company like any other - more concerned with money than with customers. Win or lose, they have already lost something. What did the Sun and NetApp lawsuits do besides make both look venal?

At the heart of the problem is the nature of software. It doesn't follow the same rules as other things that are awarded patents and copyrights. Software is not physical. You cannot hold it in your hand. Holding a CD or DVD is not the same. It's like holding an empty glass and claiming you are really holding the air. A physicist might agree but everyone else will think you're being silly.

Software is not literature, as much as we like to think of it as art. Digital music is still music and an ebook is still a book. Software is neither of these. It is a thing unto itself that follows its own rules. Code is more than mere instructions but less than art.

Software represents a new type of intellectual property. We need to recognize that. Copyright law doesn't adequately protect the software creator, which is why End User License Agreements stuffed into a PC game box read like the US Constitution. With the amendments and commentary. Patents don't work since there is no physical manifestation and software is hopelessly vague to define under patent law. Just read a couple of software patents and you will find yourself saying things like “Well, duh!” and “We've been doing that for 20 years now!”

IP law, especially in the US, has struggled for two generations with software. How do we protect our creations when they are unlike any other creations? How do we set up rules that people can easily follow? Patent and copyright wars are counterproductive. We need guideposts that avoid these conflicts.

I propose a hybrid of copyrights and patents. Patent law gives a short term monopoly to someone who devises something unique. That uniqueness is the code base. For the software industry to keep moving apace, it needs to be a really short term. A year or so, not seven or ten. That's just enough to give a company a head start.

After that, it should be protected more like copyrighted material. People shouldn't be able to just copy and distribute your product without permission. They can come up with something of their own but not take your product as their own. That forces them to invest something in their take on what you did. But not until you have time to grab a little market share.

I'll let the lawyers work out the details. They're good at that.

Like the aforementioned monkeys, the patent lawsuit winner is the one with less poo on them. They still end up with poo on them though. And no one wants to hang around and watch for fear of getting poo on themselves. In the end, you find yourself alone and covered in poo. Not the way to go.

Tuesday, March 02, 2010

Tiers of a Clown

I've been following the debate about automated storage tiering with amused interest. The various marketing operatives of data storage companies (and a few C-Level folks to boot) are all lining up into one of two camps – tiering is necessary or tiering is unnecessary. There have been dueling animations (very clever) from The Storage Anarchist and 3Par's Marc Farley as well as commentary from a host of industry bigwigs. I love the animations but then again, I always loved cartoons.


Automated storage tiering or automated tiered storage (or data lifecycle management, or whatever else it used to be described as) is using different types of physical storage for different classes of data, mostly to save money and maintain performance. The promise of storage tiering is that you can move less important, unchanging, or less frequently accessed data to cheaper, slower storage. You can keep the most important, frequently changing, and most accessed data in a really expensive array that combines high performance with heavy duty data protection features. For data that you don't need quite so often and doesn't change, you can move it to something slower and not as rigorous. And so on until you finally move it to an archive system or delete it. This has been the bread and butter of folks like Compellent and has been picked up by most of the bigger storage companies since. The ultimate goal is high levels of efficiency in your data storage systems. The more important the data is, the more resources it can consume. Less important data consumes fewer resources and balance in the universe is maintained.

A great example of where one might use tiered storage is with a check image. For a short while a check image has to be available to online customers and tellers immediately. Then it has to be stored for seven years and only moderately available. Then it is deleted. Chances are good that after 90 days you won't care to see the actual image so moving it to slower storage is not much of a burden but it saves money.
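For the code-minded, the check image lifecycle can be sketched as a simple age-based rule. This is only an illustration, not anybody's product: the tier names ("fast", "slow", "delete") and the function choose_tier are made up for the example, the 90-day cutoff is borrowed from the rule of thumb above, and seven years is approximated as 7 * 365 days.

```python
from datetime import date, timedelta

# Hypothetical thresholds based on the check image example in the text.
FAST_TIER_DAYS = 90        # assumption: images stay "hot" for about 90 days
RETENTION_DAYS = 7 * 365   # assumption: seven years approximated as 7 * 365 days


def choose_tier(created: date, today: date) -> str:
    """Return a (made-up) tier name for a check image based on its age."""
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("created date is in the future")
    if age_days <= FAST_TIER_DAYS:
        return "fast"      # expensive, high performance array
    if age_days <= RETENTION_DAYS:
        return "slow"      # cheaper, slower storage
    return "delete"        # retention period is over


if __name__ == "__main__":
    today = date(2010, 3, 2)
    for age in (10, 400, 3000):
        created = today - timedelta(days=age)
        print(age, "days old:", choose_tier(created, today))
```

The rule itself is the easy part. Getting the business to agree on the cutoffs is where the work is.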

Three things about tiered storage are important to consider. These considerations are what fuel the debate. First, automating it is tough. You have to get the software right or you lose data and have diminished efficiency. The second consideration is the ever dropping cost of storage. As data storage continues to become even more stupid cheap, it raises the question of whether you need to be all that efficient in the first place. If a high performance array is inexpensive then everything can have high performance storage without moving data around to a bunch of arrays. Finally, it's hard to decide what data belongs on what resource. Do I base it on age? Class of data? How do I decide what data is what class? These are not technical problems. They are business problems, which are much harder to overcome. Wrangling with your organization is hard work. You have to put a lot of effort into deciding what goes where and hope that your vendor supports your criteria.

To me, the problem of storage tiering is that it is a good idea that can be tough to execute. It's like the old joke about teenage sex – everyone talks about it, no one really does it, those that do it don't do it well. I'm sure that lots of folks will say that they have products that allow folks to do this well. However, technology doesn't solve the organizational problem, which makes it hard for folks to want to implement it. That doesn't affect the bread and butter customers of top tier storage companies (sorry – couldn't resist), who tend to be huge companies. They have the business process resources to pull it off. It might explain why automated storage tiering is not generating a huge following in mid-sized and smaller companies. They have other things to do with their limited resources than try to squeeze a bit more efficiency out of their storage system. The ROI for them is simply not big enough. Heck, many are still struggling with the blocking and tackling of doing backups and security.

So, where do I weigh in on this debate? I agree with both sides. If that sounds a bit weasel-like then sorry. For some companies there are mission critical applications that would benefit from an automated tiered storage system. For others, it's hard to see how there would be benefit enough to warrant the time and effort. For me, the debate is a non-debate. It's not about whether automated storage tiering is beneficial or not. What matters is whether it's beneficial to you. If you think in terms of customers, instead of products and technology, it becomes clear. What applications do you have that need this approach? Does your organization need it at all? Can you decelerate the pace of your storage buying enough to justify the costs and time involved in implementing this? Will you be able to decide what data should go where and when?

In the end, it's a feature like all other features. If it has value for you then it's a winner. If it doesn't then find something that does. But watch the debate. It's quite entertaining.