Tom Petrocelli's take on technology. Tom is the author of the book "Data Protection and Information Lifecycle Management" and a natural technology curmudgeon. This blog represents only my own views and not those of my employer, Enterprise Strategy Group. Frankly, mine are more amusing.

Wednesday, August 12, 2009

i4i Pokes Microsoft In the Eye

Microsoft just got a poke in the eye with a sharp stick. It was delivered by a company out of Toronto called i4i with the help of a judge in Texas. They have, in theory, halted sales of Microsoft Word 2003 and up and the Office bundles that contain them. How long this will actually hold, given appeals and such, is uncertain but the basis for the injunction is interesting.

i4i has a patent, US Patent 5,787,449, "Method and system for manipulating the architecture and the content of a document separately from each other," that describes how to finely format documents without embedding formatting codes in them. The i4i method is to create a map of formatting marks associated with locations in a document. On the surface, this may sound like a common method but on closer inspection that might not be so.

The patent application itself gives a rather good history of document formatting, from pre-printing-press days through to current electronic documents. You see, the most typical method for formatting electronic documents is to embed formatting codes into the document itself. That's how the .DOC, RTF, and lots of other document formats work. You want a word to be bold, you embed a code for start bold text and end bold text in the document. In the old WordStar days you actually saw the formats in the document. I guess I'm showing my age here.
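To make the embedded approach concrete, here is a minimal sketch in Python. The "^B" toggle marker and the parse_embedded function are made up for illustration (loosely in the spirit of the old WordStar codes) and are not any real file format.

```python
# Toy illustration of embedded formatting: the codes live inside the text itself.
# "^B" is a hypothetical toggle marker, not the actual code of any real format.
BOLD = "^B"

def parse_embedded(text):
    """Split text into (segment, is_bold) runs, toggling bold at each marker."""
    runs = []
    bold = False
    for i, piece in enumerate(text.split(BOLD)):
        if i > 0:
            bold = not bold  # every marker flips the bold state
        if piece:
            runs.append((piece, bold))
    return runs

doc = "You want a ^Bword^B to be bold."
runs = parse_embedded(doc)

# Getting the plain content back means stripping the codes out again.
plain = "".join(segment for segment, _ in runs)
```

The point to notice is that the content and the formatting cannot be read separately: any program that wants the plain text has to understand the codes first.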

The second most usual way to format an electronic document is to assign codes to parts of a document (such as a paragraph or header) which describe their structure. An external source file is then used to provide formatting for the document based on the structure. This is common on web sites since HTML describes the structure of a document but not its format. CSS describes the look of a document by defining the format of each type of content. So, in an HTML document, all <H1> tags define a header but not how the header looks. CSS defines how H1 headers look when displayed. These are further modified by embedding codes the old-fashioned way, such as inline CSS. Separating structure from formatting has the advantage of allowing you to present different views of the same content. This is one of the ways that websites are able to give you a special view formatted for printing rather than viewing.
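A rough sketch of that separation, again in Python rather than real HTML and CSS. The document, screen_style, print_style, and render names are all hypothetical. The structure tags stay in the document and each "stylesheet" is just a lookup from tag to a formatting function.

```python
# Structure lives in the document; appearance lives in a separate stylesheet.
# Everything here is a stand-in for HTML and CSS, not a real renderer.
document = [
    ("h1", "i4i Pokes Microsoft In the Eye"),
    ("p", "Microsoft just got a poke in the eye."),
]

# Two interchangeable "stylesheets": one for the screen, one for printing.
screen_style = {"h1": str.upper, "p": lambda s: s}
print_style = {"h1": lambda s: "== " + s + " ==", "p": lambda s: "    " + s}

def render(doc, style):
    """Format each element according to its structural type only."""
    return [style[kind](text) for kind, text in doc]

screen_view = render(document, screen_style)
print_view = render(document, print_style)
```

Swapping the style gives a different view of the same content, but every h1 gets the same treatment. That is the coarseness this approach tends to have.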

Both approaches have limitations. The first method tends to tie the document to a particular software package or API, limiting its openness. Like MS Word .DOC documents, the file might not look or print right when rendered in a different word processor or even a different version of MS Word. The second method tends to take a sledgehammer approach, coarsely limiting how the document is formatted. To get fine formatting you have to resort to kludges, such as using format types in only one place, or embedding codes the old-fashioned way and ruining portability.

What i4i came up with is a different method. It claims a system which creates a map, called a metacode map, which maps formatting to specific places in a document. It doesn't need to know anything about the structure of the content. In fact, it might have no structure at all other than what is forced on it by the formatting. The map is external to the actual document content allowing for different format files to be used with different content. This is apparently what Microsoft does in .DOCX, .XML, and .DOCM files. The i4i approach combines the fine formatting control of embedded formats with the portability and multi-view advantage of the external definition approach.
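Here is how I picture the map idea, as a minimal Python sketch. This is my own illustration of the description above, not the patent's claims and not how Word actually stores anything. The tuple layout, the HTML_MARKERS table, and apply_map are assumptions, and the sketch assumes the mapped ranges do not overlap.

```python
# The content carries no formatting codes and no structure at all.
content = "You want a word to be bold."

# Hypothetical map entries: (start offset, end offset, format name).
# The map is kept apart from the content it describes.
metacode_map = [(11, 15, "bold")]
other_map = [(0, 3, "italic"), (11, 15, "bold")]

# One possible output vocabulary; another renderer could use different markers.
HTML_MARKERS = {"bold": ("<b>", "</b>"), "italic": ("<i>", "</i>")}

def apply_map(text, fmt_map, markers):
    """Return a formatted copy of text; assumes non-overlapping ranges."""
    out = text
    # Work from the last range to the first so earlier offsets stay valid.
    for start, end, fmt in sorted(fmt_map, reverse=True):
        open_mark, close_mark = markers[fmt]
        out = out[:start] + open_mark + out[start:end] + close_mark + out[end:]
    return out

view_one = apply_map(content, metacode_map, HTML_MARKERS)
view_two = apply_map(content, other_map, HTML_MARKERS)
```

The same untouched content yields two different views depending on which map is applied, and each map can target any span of characters rather than a whole class of elements.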

Since I'm not an expert on Word file formats, I can't comment on whether they infringe on the i4i patent. The judge seems to think so or he would not have ordered an injunction against the sale of the product. That means the judge thinks that Microsoft is infringing and doing harm to the patent holder.

What is Microsoft to do? They could try and get the injunction overturned. Likely they will try and do that no matter what. They might try and invalidate the patent but one usually does that before the injunction is handed down so I'm guessing they haven't had a lot of luck with that.

They can license the patent from i4i. I can't imagine why they wouldn't do that in the first place. No matter what it costs, it can't be as bad as this. At the moment, i4i has no real incentive to license anything to them. They have Word at a standstill and US$200M in Microsoft money. It doesn't get any better. Heck, if I was Google, I would buy i4i just to get the patent and kick Microsoft while they are down.

They could also change Word. To stop infringing, they will need to adopt another file format that is not tied to the patent. There are open source formats, like the ones that OpenOffice.org uses, or they could fall back on an older format. In any case, if they can't overturn the patent, they will need to change Word or pay more money to i4i.

There is a bigger problem looming and not just for Microsoft. How many other folks do the same thing? It is a logical thing to do. That doesn't make it legally obvious, especially in 1998 when the patent was issued. Most software companies tend to encode content in XML and use something like an XML style sheet or CSS to format it. However, if Microsoft could come up with this method for Word, why not lots of others? i4i should be emboldened to go after more infringing companies now. Once you have slain one big giant, the others do not seem so intimidating. Smaller companies will feel like easy pickings after Microsoft.

My advice to Microsoft – change Word now. Use the same format as OpenOffice.org. It also helps you with your open source cred.

My advice to everyone else who writes document-centric software – check your products. i4i will now have a more solid patent. If you do something like this you might want to change it or come up with an alternative. Otherwise you have only yourself to blame.

2 comments:

Brian Dell said...

Using an "open source format" would not be any more legal, just as using a peer-to-peer file sharing service to download copyrighted materials would not be legal just because the P2P software happens to be open source.

In fact, it may be MORE dangerous for users to use OpenOffice, since the absence of a central entity controlling OpenOffice's development and distribution may simply mean i4i sues individual users.

Tom Petrocelli said...

Not true. The specific open source format would have to violate this patent. There is no evidence (yet) that that is so. I only suggest that Microsoft gains something if they adopt an open source format that doesn't violate the patent.

It is also not true that there is no central authority. Most of the bigger open source projects are sponsored by foundations or even companies that can reject features on IP grounds.

I suggest OpenOffice.org for these reasons. They have a tested format that is portable. Microsoft needs some credibility in the open source arena. It's a win for them.