CompletePlanet

A BRIGHTPLANET CASE STUDY

Quite often, a company becomes its own customer: it has a problem that can be solved by one of its own products. BrightPlanet's CompletePlanet web site needed an enterprise solution to the problem of federating disparate searchable databases from all over the Internet. BrightPlanet's own Deep Query Manager/Publisher came to the rescue and now maintains CompletePlanet automatically, with minimal staff involvement.

CompletePlanet is both a developmental resource and a public service provided by BrightPlanet. As a developmental resource, it is where new functionality is often implemented first before finding its way into BrightPlanet's commercial products. As a public service, the site is available to all researchers looking for searchable databases focused on their area of research.

The Problem

CompletePlanet updates are complex and comprehensive. Tens of thousands of searchable databases (over 70,000 currently) must periodically be harvested, cataloged, and characterized. This cannot be performed manually as the process would take a massive staff and many months or years to complete per harvest.

The searchable databases are located all over the world and are in a constant state of flux. There are no standards for configuring a search form or accessing a searchable database. As a result, they can be, and are, built in almost every way imaginable. In addition, they change often, so settings that access a database today may not work tomorrow because its developers have changed their code.

BrightPlanet Solution

BrightPlanet's Deep Query Manager/Publisher (DQM/P) is an automated portal creation and maintenance application. The portal administrator needs to provide the initial subject taxonomy delineating the topic structure and the searchable databases from which to harvest. Once these are provided, the DQM/P will build the entire site quickly, usually in several days. From then on, the administrator's time is needed only to add new topic nodes or harvest sites and to start the harvest. Alternatively, the harvest can be scheduled to run automatically on a periodic basis.

Benefits

"The site almost completely takes care of itself", reports Susan Niemeck, BrightPlanet's staff ontologist responsible for the subject taxonomy and content quality. Building a site portal is a comprehensive multi-step process:

  1. The subject taxonomy must be created.
  2. The configuration needed to harvest each individual site must be determined.
  3. The content from the searchable databases must be harvested.
  4. That content must be analyzed and indexed.
  5. All content documents must finally be placed at the appropriate location in the subject tree so that site visitors can navigate the topic nodes to find their desired information.

"The tasks that are performed rarely, such as creating and maintaining the taxonomy, are not the problem", says Niemeck. "The real problem is the actual building and periodic updates of the site. It's all brute force work, a lot of it, and it can take months to perform manually. How do you do that when you want content to be updated and maintained 'fresh' on a weekly basis? It must be automated!"

Of the steps listed above, the first and second must be performed by a person. Everything from step three on is fully automated.
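To make the automated part of the process more concrete, here is a minimal Python sketch of steps four and five: indexing harvested documents and placing each one at a topic node. It is an illustration only, not the DQM/P's actual method. The taxonomy, the keyword sets, the function names, and the keyword-overlap placement rule are all assumptions made for the example, and the harvest itself (step three) is represented by a plain dictionary of already-fetched text.

```python
from collections import defaultdict

# Hypothetical taxonomy (step 1, supplied by a person):
# topic node -> keywords that describe it.
TAXONOMY = {
    "Science/Astronomy": {"planet", "telescope", "orbit"},
    "Health/Nutrition": {"diet", "vitamin", "protein"},
}


def tokenize(text):
    """Split text into lowercase terms with surrounding punctuation removed."""
    return {word.strip(".,;:!?").lower() for word in text.split()}


def build_index(documents):
    """Step 4: map each term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def place_documents(index, taxonomy):
    """Step 5: attach each document to the node whose keywords it matches most.

    Documents that match no keyword at all are left unplaced.
    """
    scores = defaultdict(lambda: defaultdict(int))
    for node, keywords in taxonomy.items():
        for keyword in keywords:
            for doc_id in index.get(keyword, ()):
                scores[doc_id][node] += 1
    return {doc_id: max(nodes, key=nodes.get) for doc_id, nodes in scores.items()}


# Stand-in for step 3: text already harvested from two searchable databases.
harvested = {
    "d1": "The telescope tracked the planet in its orbit.",
    "d2": "A diet rich in protein.",
}
placement = place_documents(build_index(harvested), TAXONOMY)
```

In this toy version the index doubles as the search structure, so the same data that places documents in the subject tree could also answer direct keyword queries.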

"The DQM/P makes the entire content searchable", indicates Niemeck. "There's no need to navigate the site if you already know and can describe what you're looking for. And the DQM/P lets you use natural language or extended Boolean operators and phrases to find your documents directly."

Niemeck particularly likes the scheduling options: "There is great flexibility in how you set up and run harvests. For example, you can batch 90% of the harvests and run them monthly. At the same time, you can take a site that changes more frequently and harvest it weekly. You can set high traffic sites to be harvested only during early morning hours. Regardless of the different harvest periods and times, the harvest updates are all seamlessly integrated into the DQM/P portal site. This flexibility allows you to harvest any permutation of sites and schedules."
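The scheduling rules Niemeck describes (monthly batches, weekly harvests for fast-changing sites, early-morning windows for high traffic sites) could be modelled roughly as follows. This is a sketch under stated assumptions, not the DQM/P's configuration format: the `HarvestSchedule` class, its fields, and the site names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HarvestSchedule:
    """Hypothetical per-site schedule; the field names are illustrative."""
    site: str
    interval: timedelta                   # how often the site is harvested
    last_run: Optional[datetime] = None   # None means never harvested
    allowed_hours: range = range(0, 24)   # hours of the day when harvesting may run


def is_due(schedule, now):
    """A site is due if the current hour is allowed and its interval has elapsed."""
    if now.hour not in schedule.allowed_hours:
        return False
    return schedule.last_run is None or now - schedule.last_run >= schedule.interval


def due_sites(schedules, now):
    """Return the names of all sites that should be harvested at this moment."""
    return [s.site for s in schedules if is_due(s, now)]


now = datetime(2024, 3, 1, 14, 0)
schedules = [
    # Part of the large monthly batch.
    HarvestSchedule("archive.example", timedelta(days=30), now - timedelta(days=31)),
    # Changes frequently, so it is harvested weekly.
    HarvestSchedule("news.example", timedelta(days=7), now - timedelta(days=8)),
    # High traffic site: harvest weekly, but only between 02:00 and 04:59.
    HarvestSchedule("busy.example", timedelta(days=7), now - timedelta(days=8),
                    allowed_hours=range(2, 5)),
]
afternoon = due_sites(schedules, now)
early_morning = due_sites(schedules, now.replace(hour=3))
```

At 14:00 the high traffic site is skipped even though its interval has elapsed; at 03:00 all three sites are due. Keeping the interval and the time window as separate fields is what lets any combination of sites and schedules be expressed.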

The DQM/P can even harvest from Web sites with no search facility. So if you want content from a large site that isn't searchable, you can harvest it into the DQM/P and then your site visitors can search that content from the portal.

 

[Screenshot: CompletePlanet]

 

The Deep Query Manager/Publisher performs the following functions:

  1. Automatic batch submission to all harvest sites
  2. Automatic self-configuration for changing database access methods
  3. Automatic, scheduled updates of the portal site
  4. Automatic characterization of sources, which determines what kind of information is in each searchable database
  5. Authoritative sources: results from authoritative sources are properly weighted so that they rise to the top and are found first by the user.
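
The weighting of authoritative sources in the last item can be pictured with a small Python sketch. The source does not describe how the DQM/P computes its weights, so the multiplicative scheme, the `rank_results` function, and the sample weights below are assumptions chosen only to show the idea.

```python
def rank_results(results, authority, default_weight=1.0):
    """Order results by relevance multiplied by the weight of their source.

    results:   list of (document, source, relevance) tuples
    authority: hypothetical mapping of source name -> weight; sources that
               are not listed fall back to default_weight
    """
    def weighted_score(result):
        _document, source, relevance = result
        return relevance * authority.get(source, default_weight)

    return sorted(results, key=weighted_score, reverse=True)


results = [
    ("Forum thread", "forum.example", 0.9),
    ("Agency report", "agency.example", 0.7),
]
# The agency is treated as authoritative, so its results are boosted.
ranked = rank_results(results, {"agency.example": 1.5})
top_document = ranked[0][0]
```

Here the agency report has the lower raw relevance, but its boosted score of roughly 1.05 puts it ahead of the forum thread at 0.9, so it is the first result the user sees.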
 