Clients and Case Studies
Clients
The KAPS Group has done projects for an extremely wide range of client industries, from retail to manufacturing to finance to pharmaceuticals and more. This breadth of experience has resulted in a set of procedures that can be applied to any industry.
Communications
- Amdocs
- Text Analytics Software Evaluation
- Develop Auto Tagging Application
- Develop Sentiment Analysis Application
- Technology: SAS Enterprise Content Categorization
Computer-Manufacturing
- Intel
- Mini-POC
- Mini-Strategy
- Technology: SAS Enterprise Content Categorization
Consulting
- Battelle
- Taxonomy Development
- Text Analytics Software Evaluation
- Develop Info Portal
- Develop Expertise Location Application
- Technology: Inxight (SAP)
Financial-Development
- IFC
- Text Analytics Workshop
- Text Analytics Strategy
- Develop Content Structure Model
- IMF
- Text Analytics Software Evaluation
- Text Analytics and Metadata Strategy
- Develop Content Structure Model
- Technology: Expert AI
- Inter-American Development Bank
- Text Analytics Strategy
- Taxonomy-Vocabulary Assessment
- World Bank Group
- Develop Metadata Standard
- Develop Taxonomy
- Text Analytics Software Evaluation
- Develop Auto-Categorization and Data Extraction Application
- Technology: SAS / Teragram
Financial Services
- Connolly
- Text Analytics Strategy
- Develop Search Application
Foundations
- Kellogg
- Mini-POC
- Mini-Strategy
- Technology: SAS
- Robert Wood Johnson Foundation
- Text Analytics Software Evaluation
- Develop Auto-Categorization and Data Extraction
- Technology: Megaputer, then Expert AI
- Foundation Center
- Taxonomy and Search Strategy
- Text Analytics Strategy
- Text Analytics Evaluation
Government
- British Parliament
- Search Strategy
- Department of Transportation
- Text Analytics Software Evaluation
- Develop Auto-Categorization
- Develop Auto-Tagging Application
- Technology: SAS
- Federal Reserve Board
- Text Analytics Workshop
- Text Analytics Software Evaluation
- FDA
- Text Analytics Workshop
- Text Analytics Software Evaluation
- Text Analytics and Search Strategy
- Army Medical
- Taxonomy Strategy
- Text Analytics Strategy
Hotel Industry
- Hilton
- Taxonomy and Search Strategy
- Hyatt
- Taxonomy and Search Strategy
Insurance
- Northwestern Mutual
- Develop Taxonomy
Library
- Harvard Business Library
- Taxonomy Evaluation
Pharmaceutical
- Boehringer Ingelheim
- Text Analytics Software Evaluation
- Develop Auto-Categorization
- Technology: SAS
- Chiron (Novartis)
- Develop Taxonomy
- Text Analytics Evaluation
- Genentech
- Develop Taxonomy
- Develop Info Portal for Sales
Publisher
- Foundry-IDG
- Text Analytics Software Evaluation
- Develop Auto-Categorization and Data Extraction
- Develop Auto-Tagging Application
- Technology: Expert AI
- Financial Times
- Taxonomy and Search Strategy
Retail
- Home Depot
- Legal Expert Witness
- Reed Construction
- Develop Taxonomy
- Develop Data Application – Facts
- Develop Automatic Summary Application
- Technology: SAS / Teragram
Science – Technology
- American Institute of Physics
- Develop Taxonomy
- Tagging Strategy
- Association for Computing Machinery
- Develop Taxonomy
- Develop Data Extraction
World Bank Group
Enterprise Text Analytics Platform for Multiple Applications
This case study covers the World Bank, its affiliate the IFC, and its sister institution the IMF. The KAPS Group has done a number of projects for all three. All three had similar problems: poor enterprise search, fragmented collections of structured and unstructured information, labor-intensive information-gathering processes with attendant errors, and a lack of standards.
The initial engagement was with the World Bank and began with a text analytics evaluation project that recommended SAS Enterprise Content Categorization software. The evaluation included extensive interviews with the full range of potential stakeholders to generate comprehensive requirements. These included high-level context interviews to obtain an overview of their current information management environment and its major issues. We also conducted a number of information interviews that focused on how information was being used in various jobs in order to build targeted applications.
In addition to the interviews, we carried out an extensive content analysis including existing metadata using our text analytics tools, a taxonomy analysis, and several auto-categorization and data extraction pilots and prototypes. The goal of these activities was to create a foundation for future projects that the World Bank could build on.
Other projects included a shared drive reclamation effort with a search pilot, a new metadata standard, and tools to identify similar and duplicate content. We also developed a governance plan and conducted a number of experiments, such as a sentiment analysis pilot.
Key Points:
- The KAPS Group was brought in when IT realized that they needed help.
- Selecting the right software is critical to success.
- Text analytics is not a single application. It is a platform for applications.
- The best results come when technology and KM or business groups collaborate.
Reed Construction Data
Automatic Fact and Data Extraction Application
Reed Construction Data is a company that takes hundreds of thousands of construction proposals, which can range from 10 to thousands of pages in all different formats, and creates a 3-5 page standard summary of each. They had been doing it manually, but the cost and inconsistency were becoming too great. They wanted to create an automated Table of Contents and extract a broad range of key data. Their first attempt used a combination of two software packages, but they were unable to achieve the desired accuracy.
The basic issue was that they needed to extract facts, not just data. For example, it is easy to extract all the dates in a document, but they had several kinds of dates. Our solution was to recommend the SAS Enterprise Content Categorization software, which could use its categorization capability to, for example, disambiguate a Bid Date from all the other dates found throughout the documents. This was done by taking the words around each date as context and using them to determine whether it was a Bid Date.
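The idea of using the words around a date to decide what kind of date it is can be illustrated with a minimal Python sketch. This is not the SAS rule language or the rules we actually built; the cue words, the date format, the window size, and the function name find_bid_dates are all assumptions chosen for illustration.

```python
import re

# Hypothetical cue words; a real project would derive these from sample documents.
BID_CUES = {"bid", "bids", "due", "submission", "deadline"}

# Assumed date format (MM/DD/YYYY) for the purposes of this sketch only.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
WORD_PATTERN = re.compile(r"[a-z]+")


def find_bid_dates(text, window=40):
    """Return dates whose preceding context contains a bid-related cue word."""
    bid_dates = []
    for match in DATE_PATTERN.finditer(text):
        # Look only at the characters just before the date.
        start = max(0, match.start() - window)
        context = text[start:match.start()].lower()
        words = set(WORD_PATTERN.findall(context))
        if words & BID_CUES:
            bid_dates.append(match.group())
    return bid_dates


sample = (
    "Project announced 01/05/2015. Bids are due 02/14/2015 at the city office. "
    "Construction is expected to start 06/01/2015."
)
print(find_bid_dates(sample))  # ['02/14/2015']
```

All three dates are found by the pattern, but only the one preceded by bid-related wording is kept. That is the difference between extracting data and extracting a fact.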
Another issue was that their earlier rules mixed the logic of the rule and the text that the rule used. This led to extremely complex rules that were very difficult to understand and even more difficult to maintain. We developed a set of rules that separated the logic from the text, a technique that we have used ever since to great effect.
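A rough Python sketch of what separating logic from text can look like follows. The vocabulary terms, the labels, and the function name context_matches are hypothetical, and the actual rules were written in the vendor's rule language rather than Python.

```python
# Text: the vocabulary each rule depends on, kept as plain data.
# All labels and terms here are hypothetical examples.
VOCABULARY = {
    "bid_date": {"bid", "bids", "due", "deadline"},
    "start_date": {"start", "commence", "begin"},
}


# Logic: one generic rule that works for any vocabulary entry.
def context_matches(context, label, vocabulary=VOCABULARY):
    """Return True if the context contains any term listed for the label."""
    words = set(context.lower().split())
    return bool(words & vocabulary[label])


print(context_matches("Bids are due", "bid_date"))    # True
print(context_matches("Bids are due", "start_date"))  # False
```

With this split, adding or correcting a term means editing the vocabulary only, and the rule logic can be read and tested without wading through word lists.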
Key Points:
- KAPS was brought in after an earlier failure and we succeeded.
- Selecting the right software is critical to success.
- Fact extraction is much more difficult than simple data extraction and requires the ability to capture and categorize the context of each data element.
- Creating rules that separate logic and text is critical for understanding and maintenance.
Robert Wood Johnson Foundation (RWJF)
Auto-Categorization Using Content Structure Models
RWJF is a charitable foundation that focuses on healthcare and related areas. They had a large number of proposals that needed extensive subject tagging. RWJF had recently developed a new enterprise taxonomy and wanted to use auto-categorization to apply the taxonomy to both new and legacy content. Their initial project was not able to achieve the targeted accuracy with the software chosen, so they contracted with the KAPS Group. We started with two of our Mini-projects: a Mini-Strategy and a Mini-POC.
The Mini-POC utilized one of the KAPS Group’s secret-to-success techniques, developing a content structure model for their test content, and achieved a major success.
Following this success, we developed an overall text analytics strategy for the organization and then a text analytics software evaluation project that selected two top vendors, Megaputer and Expert AI. We utilized both.
The next step was to develop a complete auto-categorization capability for tagging documents, using their enterprise taxonomy. They had been tagging documents manually and, in addition to the labor costs, their tagging was extremely inconsistent. For example, we discovered that some documents had 1-2 subject tags and others, very similar, were tagged with 10 of the 12 subject categories. These tags were being used in an analytical application that no one was using.
After building a set of auto-categorization rules that achieved 80%+ accuracy, the first application was to re-tag all the documents in that repository with 1 primary tag and 1-2 secondary tags if their relevance was within 10 points of the primary tag. This re-tagging took only about 1 day. A second application refined the initial rules to cover any vocabulary differences between the initial document repository and the new, larger one. This also took very little effort and the retagging took a few days with little human involvement.
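The re-tagging rule described above (one primary tag, plus one or two secondary tags whose relevance is within 10 points of the primary) can be sketched in Python as follows. The 0-100 score scale, the reading of "within 10 points" as a difference of at most 10, the sample categories, and the function name select_tags are assumptions for illustration, not the production rules.

```python
def select_tags(scores, margin=10, max_secondary=2):
    """Pick one primary tag plus up to max_secondary tags scoring close to it."""
    if not scores:
        return None, []
    # Highest relevance first; the top-scoring category becomes the primary tag.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    primary, primary_score = ranked[0]
    # Keep runners-up only if they fall within the margin of the primary score.
    secondary = [
        tag for tag, score in ranked[1:]
        if primary_score - score <= margin
    ][:max_secondary]
    return primary, secondary


# Hypothetical relevance scores on an assumed 0-100 scale.
scores = {"Health Policy": 92, "Nursing": 85, "Obesity": 80, "Tobacco": 60}
print(select_tags(scores))  # ('Health Policy', ['Nursing'])
```

A cap like this is what keeps very similar documents from ending up with 1-2 tags in one case and 10 of 12 categories in another.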
Key Points:
- KAPS was brought in after an earlier failure and we succeeded.
- Selecting the right software is critical to success.
- Content structure models can improve accuracy by over 40%.
- Mini-POCs are a powerful way to prove the effectiveness of auto-categorization, develop a starting foundation, and promote the solution throughout the organization.
- A Mini-Strategy is a good and low-cost way to understand what text analytics can do for an organization.
- Once a text analytics foundation is built, subsequent applications can be done quickly and cheaply.