CIO

Seek and Ye Shall (Hopefully) Find

The democratizing power of the Web has encouraged enterprises to place vast amounts of unstructured online information into the hands of employees, customers and others. But this development comes as a double-edged sword.

Enterprise search | For enterprises with rapidly expanding Web sites and portals, the need to make unstructured information - that is, data that hasn't been formatted, tagged or indexed for fast retrieval - more manageable is undeniable. "It's a well-known statistic in the search business that 80 percent of corporate data is unstructured versus 20 percent that's contained in databases and ERP systems," says Tammy Alairys, a partner with business and technology consultancy Accenture. Alairys leads Accenture's information management practice, which provides consulting services on enterprise content management (ECM), business intelligence (BI), and search and collaboration technologies.

It's the promise of getting that 80 percent of data into the hands of people who can put it to good use that's driving a growing interest in search technology. "There's an awful lot of information sitting out there that is very difficult to extract value from," says Alairys. For CIOs, search technology provides a fast and efficient way (and sometimes the only way) to locate and retrieve vital intelligence. Yet, as more enterprises turn to search tools, CIOs are discovering that the technology also comes with strings attached, particularly in the areas of usability and security.

With search engines becoming increasingly powerful and useful, search engine companies are discovering the truth behind the old axiom: Knowledge is power, and power is money. That's why the search engine business is suddenly red hot. Google is making serious enterprise moves. And an array of tech vendors - including giants such as IBM, Microsoft and Yahoo, along with smaller players such as Endeca Technologies, Verity, Vivisimo and X1 Technologies - are all hoping to snag at least a snippet of Google's success with their own search products. Heightened competition has also encouraged the companies to give away software, such as desktop search tools, in an effort to bring in more customers.

But despite the flurry of activity, Forrester Research's Laura Ramos believes that the enterprise search market stands at a crossroads, leading to consolidation with either ECM or BI software. "Its role in ECM is to help organize and retrieve the content under management," observes Ramos, who is a search technology analyst at Forrester. "In BI, it is the logical equivalent to data mining for text." Ramos feels that "search vendors must pick a path and strike the right deals or partnerships to move forward". For CIOs, consolidation promises to ease costs and simplify deployment and management by integrating search technology into a larger product.

Blame It on the Web

Why the heightened stature of search? The democratizing power of the Web has encouraged enterprises to place vast amounts of unstructured online information into the hands of employees, customers and others. But this development comes as a double-edged sword. "People find it very easy to create documents and content using computers, but they have a much harder time finding it later," says Hadley Reynolds, director of research for Delphi Group. (Of course, you could always control it all with a digital asset management system such as the one used by public TV station WGBH. See "From Tapes to Bits", page 92. But such systems are costly, complicated and ill-suited for day-to-day content such as e-mail messages.)

Basic search engines have no trouble finding information, but they aren't so good at placing results into context. That's why a simple search for "anthrax" will uncover links about vaccines, homeland security and a heavy metal band. And searchers are increasingly demanding. "If people don't find value in a search technology, they stop using it pretty quickly," says Alairys. To attract and retain satisfied searchers and enterprises, vendors are jockeying to prove that their tools go beyond simply matching keywords to links. "It's not good enough to give someone a search result," says Ramos. "You want to guide them through the self-service process."

Search engine developers rely on secret mixtures of algorithms, user interfaces and other technologies to create unique tools that generate fast, relevant searches. Verity aims to guide searchers with an add-on feature, the Content Classification Engine (CCE), that can be built into its Ultraseek search platform. The CCE tightly integrates searching and browsing functions. While viewing topics, users can conduct focused queries by searching within a subject area.

XML and Security

Web services, in the form of XML, can play a big role in streamlining and improving the accuracy of user searches. XML allows document designers to classify discrete pieces of data, essentially turning unstructured documents into structured documents. The advantage of structured data is that it lets users fine-tune their searches using concepts instead of keywords or phrases. For instance, it is helpful to limit a search for Jaguar to the category transportation>autos>UK, rather than mammals>cats. And once the user sees a result, he can use classifiers to browse within a category or subcategory for other results that may be conceptually similar, such as X-Type or XKE, even if they don't contain the keyword Jaguar.
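To make the Jaguar example concrete, here is a minimal sketch in Python of how category tags can narrow a keyword search and support browsing. The document records, the category paths and the function names (search, browse) are all hypothetical illustrations, not the workings of any vendor's product.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Doc:
    title: str
    category: Tuple[str, ...]  # classification path, e.g. ("transportation", "autos", "UK")
    text: str


# Hypothetical sample documents, invented for illustration only.
DOCS = [
    Doc("Jaguar X-Type review", ("transportation", "autos", "UK"),
        "The Jaguar X-Type is a compact saloon."),
    Doc("XKE restoration notes", ("transportation", "autos", "UK"),
        "Restoring a classic XKE roadster."),
    Doc("Jaguar habitat", ("mammals", "cats"),
        "The jaguar lives in Central and South American forests."),
]


def in_category(doc: Doc, prefix: Sequence[str]) -> bool:
    """True if the document's category path starts with the given prefix."""
    return doc.category[:len(prefix)] == tuple(prefix)


def search(docs: List[Doc], keyword: str, category: Sequence[str] = ()) -> List[Doc]:
    """Keyword match on the text, optionally limited to one category subtree."""
    kw = keyword.lower()
    return [d for d in docs if in_category(d, category) and kw in d.text.lower()]


def browse(docs: List[Doc], category: Sequence[str]) -> List[Doc]:
    """Everything filed under a category, whether or not it contains a keyword."""
    return [d for d in docs if in_category(d, category)]


uk_autos = ("transportation", "autos", "UK")
print([d.title for d in search(DOCS, "jaguar")])            # cars and cats together
print([d.title for d in search(DOCS, "jaguar", uk_autos)])  # cars only
print([d.title for d in browse(DOCS, uk_autos)])            # includes the XKE page
```

The plain keyword search mixes cars and cats; adding the category keeps only the car, and browsing the same category surfaces the XKE page even though its text never mentions Jaguar.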

But while XML promises to help untangle online information, it's no magic elixir. That's because the technology has yet to be widely applied.

"Industry must publish more content with XML structure so that engines can then extract it and find it," says Forrester's Ramos. Therefore, while an enterprise can ensure better searches of in-house databases with XML-tagged documents, external searching remains problematic.

On the other hand, there is much information - such as trade secrets and employee payroll records - that enterprises don't want everyone to access. Yet whenever enterprise databases are made public, there's always the chance that critical information can inadvertently slip out. Fortunately, virtually all enterprise search engine systems have controls that grant or restrict information access based on the user's name, title, division, location and other key criteria.
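As a rough sketch of that kind of control, the Python below trims a result list according to the searcher's division. The User and Document records, the allowed_divisions attribute and the PUBLIC marker are assumptions made for illustration; real products expose their own access-control models.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

PUBLIC = "*"  # hypothetical marker meaning "any division may see this"


@dataclass(frozen=True)
class User:
    name: str
    title: str
    division: str


@dataclass(frozen=True)
class Document:
    title: str
    # Divisions allowed to see the document. An empty set means nobody,
    # so a document saved without security attributes stays hidden.
    allowed_divisions: FrozenSet[str] = frozenset()


def can_view(user: User, doc: Document) -> bool:
    """Deny by default; allow only on an explicit division match or PUBLIC."""
    return PUBLIC in doc.allowed_divisions or user.division in doc.allowed_divisions


def visible_results(user: User, results: List[Document]) -> List[Document]:
    """Drop every search hit the user is not entitled to see."""
    return [doc for doc in results if can_view(user, doc)]


# Hypothetical sample data.
hits = [
    Document("Holiday schedule", frozenset({PUBLIC})),
    Document("Payroll records", frozenset({"HR"})),
    Document("Untagged draft"),  # no attributes set, so hidden from everyone
]
engineer = User("Pat", "Engineer", "R&D")
hr_manager = User("Sam", "Manager", "HR")

print([d.title for d in visible_results(engineer, hits)])
print([d.title for d in visible_results(hr_manager, hits)])
```

The sketch denies access unless a document says otherwise, so a file saved without attributes is hidden rather than exposed. A production filter would also weigh name, title, location and the other criteria mentioned above.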

Still, access controls aren't foolproof, and carelessness and misunderstandings can lead to potentially disastrous security gaps. Google's desktop search tool, for instance, was found last year to have a serious security flaw (which was quickly patched). Ultimately, it's up to the CIO to establish firm guidelines on access controls and the type of search software that may be installed on enterprise systems. "There are a lot of [inappropriate] things that can be done if documents are not stored in the right place or if they're not saved with the appropriate security attributes," says Alairys.

Multiple Layers

As snowballing information makes it increasingly difficult to conduct useful searches, a consensus is gradually forming that it may be impossible for any single vendor to provide a search solution that meets the needs of all enterprises and all searchers in all situations. A layered approach, leveraging the strengths of various search tools to build an aggregate solution, may be the best approach for enterprises that need to balance the search requirements of employees, customers, business partners and others. "The whole idea of search being an enterprise utility, like water coming out of the tap, is very much still a myth," says Delphi's Reynolds.

America Online, for example, bases its search feature on Google but enhances it with Vivisimo technology, which organizes results into categories to make them easier to browse. Many enterprises also utilize multiple search engines, such as law firms that give their staff attorneys access to Google for general searches and Lexis for prowling through legal citations. Yet deploying and managing several engines can lead to greater up-front and maintenance costs, as well as possible technology conflicts, if the engines aren't thoroughly tested under various real-world conditions. Many enterprises use Web services at the data level and a portal at the user interface level to tie engines together for different user search scenarios.

As search technology improves, the need to combine individual search tools may diminish. Artificial intelligence-based enhancements, such as natural language interfaces, will eventually enable users to ask questions like, "How old will Prime Minister John Howard be on July 26, 2006?" and receive a precise answer instead of links. (He will be 67 years old on that date, in case you're inclined to send a card.)

Search researchers have been promising "semantic searches" for years. The World Wide Web Consortium has an entire project dedicated to the effort (www.w3.org/2001/sw), but the complex technology has so far eluded the best efforts of the world's most skilled search experts and remains years away from fruition. For now, enterprises are simply wondering how to use the tools at hand to help people find the information they need. "The problem is," says Reynolds, "they can't turn to a search engine to find the answer."


Turning Search Technology on Its Head

Jim Jansen thinks that most of today's search engines have things backward. The assistant professor of information sciences and technology at Penn State University has developed a search engine assist system that actively helps people find the information they need.

The new approach is the direct opposite of conventional search technologies, which require people to actively look for guidance (if any is offered) somewhere on the browser screen. Jansen's Agent Improved Information Retrieval System (AI2RS) uses "implicit feedback" as revealed through the searcher's query patterns. Incorporated into a browser, the software monitors the user's search actions and automatically provides suggestions for structuring and refining queries.

Jansen says AI2RS shouldn't be confused with "Clippy", the much-derided Microsoft Office assistant that many people believe was designed with the sole purpose of irritating users. "We tried to learn from this experience," he says. Jansen notes that he and his co-researchers studied users' search habits and designed a system that interjects only when it knows the user is amenable to receiving assistance. "The goal is to help, not annoy," he says.

And help is what beleaguered Web searchers desperately need. "Research shows that 50 percent of all Web search results retrieved are not relevant," says Jansen. "This technology provides a 20 percent performance increase."

Jansen hopes to follow in the footsteps of earlier university-based search technology pioneers, such as Google (which began life at Stanford University) and Vivisimo (a Carnegie Mellon University spin-off). "We're definitely considering this technology's commercial potential," he says.

Easy Integration Dances

By Christopher Lindquist

Integration | The siren call of "dead simple integration" has lured legions of software vendors, venture capitalists and CIOs into the icy depths in the last couple of decades, but past failure hasn't stopped adventurous souls from dreaming of future success.

One of the latest attempts is an MIT technology project called DOME, which has been licensed to start-up software company Tambora. Tambora plans to commercialize the integration platform, which it has renamed BreadnButter, in hopes of bringing super-simple integration to the masses. The project started when MIT began working with Ford to create a way to ease data sharing and collaboration among employees who were designing new doors and windows for automobiles. Ultimately, they wanted a technology that would allow the collaborators themselves to quickly and easily integrate data that included some 3000 parameters, rather than involving the IT department.

After seven years and $US4 million, a roll-out is now happening inside Ford. But Bruce Anderson, acting CEO at Tambora, says that the technology is far more generally applicable and could be used to integrate many different types of data and applications. As proof, Tambora has already launched a demo site (www.tamborasoftware.com) in which users can quickly integrate Excel spreadsheets that update in real time whenever someone edits cells. The company announced plans to launch a hosted spreadsheet linking service in May, with additional features - such as the capability to run a Tambora integration server behind a corporate firewall, and removal of the need to upload spreadsheets to the server - coming later this year. Tambora will also offer a number of pre-packaged, pre-integrated spreadsheets that will be available free of charge on the site, with customization available for a fee. Anderson also says that spreadsheets are only the tip of the iceberg for Tambora; the company has plans to introduce other BreadnButter-based integration options at some point in the future.

The ROI of Open Source

Is open source cheaper than commercial?

It all depends on the job at hand

By Bernard Golden

SOFTWARE | The ROI of open source software is a contentious issue. Several recent studies, conducted by Yankee Group, JupiterResearch, Forrester Research and others, have focused on the ROI of upgrading a Windows installation versus switching to Linux and have concluded that it is less expensive to stick with Windows. But the reports miss a critical point: Switching from Windows to Linux is the worst-case ROI scenario. After all, the new platform requires training and perhaps hiring new personnel - always expensive propositions - versus merely paying for licences.

A more important question is, can open source generate real ROI elsewhere? Yes. Oregon State University (OSU), for example, has Web sites that visitors need to search, so the school bought a Google appliance for about $US125,000 per year. Two years later, OSU's IT department, aided by the Open Source Lab, replaced the appliance with an open source search product called Nutch (licence cost: $0). Nutch is not as easy to use as the Google software, so additional administration costs run to about $US10,000 yearly. The overall five-year payback, however, even when you consider additional hardware and engineering time, still produced an internal rate of return of 2300 percent.
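The arithmetic behind a figure like that can be sketched in a few lines of Python. The article gives the appliance cost and the extra administration cost but not the up-front hardware and engineering outlay, so the $US5,000 used below is purely a hypothetical placeholder, chosen only to show how a small initial outlay set against large yearly savings produces a four-digit IRR. It is not OSU's actual figure, and the five equal yearly savings are likewise an assumption.

```python
from typing import List


def npv(rate: float, cash_flows: List[float]) -> float:
    """Net present value; cash_flows[0] falls at time 0, the rest yearly."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: List[float], lo: float = 0.0, hi: float = 1000.0,
        tol: float = 1e-9) -> float:
    """Internal rate of return by bisection: the rate at which NPV is zero.

    Assumes an initial outlay followed by savings, so NPV falls as the
    rate rises and changes sign exactly once between lo and hi.
    """
    if npv(lo, cash_flows) <= 0 or npv(hi, cash_flows) >= 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


APPLIANCE_COST_PER_YEAR = 125_000   # from the article
NUTCH_ADMIN_PER_YEAR = 10_000       # from the article
ANNUAL_SAVING = APPLIANCE_COST_PER_YEAR - NUTCH_ADMIN_PER_YEAR

HYPOTHETICAL_UPFRONT_COST = 5_000   # assumption: not stated in the article
YEARS = 5

cash_flows = [-HYPOTHETICAL_UPFRONT_COST] + [ANNUAL_SAVING] * YEARS
rate = irr(cash_flows)
print(f"Annual saving: ${ANNUAL_SAVING:,}")
print(f"IRR under these assumptions: {rate * 100:.0f} percent")
```

Changing the placeholder outlay changes the answer sharply, which is why an IRR this large says more about how small the initial investment was than about the size of the yearly savings.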

As they say in the weight loss commercials, these results may not be typical. But other companies have also found success. ABB, for instance, is an $US18 billion Swiss industrial company. When its Power Technologies Products (PTPR) business needed to integrate new features into its software infrastructure, the PTPR Software Factory Group constructed a J2EE-based "Integration Framework" using a popular open source tool called JBoss. By using open source, ABB estimates it can save $US1.1 million in just its first five factories, with further savings to come as it rolls out to more of PTPR's 52 locations. Interestingly, the Integration Framework runs on Windows and uses SQL Server as its data store, belying the perception that moving to open source is a massive rip-and-replace operation.

The key to success is determining which projects make sense for open source. To get started, treat each product individually. Savvy organizations consider both commercial and open source options for projects, and choose the right product for the given situation. Then make sure you evaluate using the proper time horizon. A single heads-up comparison between a commercial product and its open source counterpart may not offer good open source ROI, because the costs of training and switching can outweigh the cost of a commercial licence. But if you extend the time horizon to the realistic life of the application, it may tip the balance toward open source. Finally, take the entire organization into account. While a specific open source project may not offer great ROI, the cost benefits of pioneer applications often materialize downstream in later projects that are able to adopt the open source package. Even if you purchase enterprise licences for your commercial products so that your marginal cost for a new application is effectively zero, keep in mind that someday, when those licences are up for renewal, that marginal cost may be much higher.

This column hasn't touched on any of the other reasons organizations use open source: flexibility, reduced operational costs through not needing to track licence compliance, and greater control of the organization's software stack, since there are no forced upgrades or product end-of-life announcements. Because ROI is so tangible, however, it is critical to address it explicitly. Just keep in mind that there is no single answer; you need to find the right choice for your organization and your application.

Bernard Golden is CEO of Navica, an open source consultancy, and is the author of Succeeding with Open Source (Addison-Wesley, 2004) and the forthcoming Open Source Best Practices.