Two types of data matter for the large-scale categorization that is central to most computer operations, because they affect both how quickly information is assimilated and how readily it is recalled: structured data and unstructured data.
Structured vs. Unstructured Data
What is Big Data?
They say that curiosity is the cure for boredom, and if you’ve ever tried to find specific and detailed information using one of the popular search engines, you’ve probably spent your share of frustrated hours in front of the computer screen. It seems as though you should be able to find relevant websites and databases, but your search returns many different websites with the exact same information. What many people don’t know is that beyond their online presence on Facebook, YouTube, and Google, there is a vast wealth of knowledge waiting to be discovered. This information is commonly referred to as big data and is largely inaccessible to conventional web users.
Video: Discover the Deep Web in 90 Seconds
Curious about BrightPlanet and our solutions? Want to know more about the Deep Web and how to take advantage of the resources that live there? This video will help you understand how our technology works – and how it can work for you.
The Deep Web: Surfacing Hidden Value
Abstract:
Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. The reason is simple: Most of the Web’s information is buried far down on dynamically generated sites, and standard search engines never find it.
The Deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a “one at a time” laborious way to research. BrightPlanet’s harvest technology automates the process of making dozens of directed queries simultaneously using multiple-thread technology, and is the only current search technology that is capable of identifying, retrieving, qualifying, classifying, and organizing deep and surface content.
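The idea of issuing many directed queries at once can be illustrated with a minimal Python sketch. It is not BrightPlanet's harvest technology: the source names, the `FAKE_SOURCES` data, and the `query_source` and `harvest` functions are all hypothetical, and the "databases" are in-memory lists so the example runs without network access. It only shows how a thread pool lets one query go to several searchable sources at the same time instead of one at a time.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# Hypothetical stand-ins for Deep Web sources: searchable databases that
# only return content in response to a direct query.
FAKE_SOURCES: Dict[str, List[str]] = {
    "patents-db": ["solar cell patent", "battery patent", "solar tracker patent"],
    "court-records": ["solar farm zoning case", "battery recall ruling"],
    "journal-archive": ["solar cell efficiency study"],
}


def query_source(source: str, term: str) -> List[str]:
    """Simulate one directed query against one source (no network access)."""
    return [doc for doc in FAKE_SOURCES[source] if term in doc]


def harvest(
    sources: List[str],
    term: str,
    fetch: Callable[[str, str], List[str]] = query_source,
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Send the same query to every source concurrently and collect results."""
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending query back to the source it was sent to.
        futures = {pool.submit(fetch, source, term): source for source in sources}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    for source, docs in sorted(harvest(list(FAKE_SOURCES), "solar").items()):
        print(source, docs)
```

In a real harvest, `fetch` would be replaced by a function that submits the query to each site's own search form or API, which is where most of the practical difficulty lies.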
Industry Specific Articles & Insights
WHAT IS OSINT?
In short, Open Source Intelligence (OSINT) is the practice of using the Web to create intelligence. More formally, OSINT is an information-processing discipline that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence. In the U.S. Intelligence Community (agencies like the DoD), the term “open” refers to overt, publicly available sources, as opposed to covert or classified sources.
Many other “INTs” exist, including HUMINT, which is intelligence gathered from humans through communication and interviews, and GEOINT, which is geospatial intelligence gathered from satellites, aerial photography, and mapping/terrain data.
Best Practices in Deep Web Harvesting
Linked below are some helpful articles and best practices in Deep Web Harvesting.
Guide to Effective Searching of the Internet
Accessing the Deep Web – A Survey
The Big Promise of BigData
The Web Search Guide
BrightPlanet Unlocks the Deep Web
Deep Web: Advanced
Can you use Surface Web sites to find Deep Web content?
For all practical purposes, no. Surface Web search results are links based on “relevancy by popularity”, ranked by how often documents link to each other (PageRank). Thus, the first results you see are the ones that other documents reference most often, and not necessarily the most relevant or recent data. Popularity-ranked results are typically what you want when searching for a good place to eat, the name of a company that you just heard about, or the capital of South Dakota (Pierre).
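To make “relevancy by popularity” concrete, here is a minimal Python sketch that ranks pages purely by how many other pages link to them. It is a deliberate simplification and not the actual PageRank algorithm or any search engine's implementation; the `rank_by_inbound_links` function and the page names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rank_by_inbound_links(links: Dict[str, List[str]]) -> List[str]:
    """Order pages by how many other pages link to them (most linked first)."""
    inbound: Counter = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore self-links
                inbound[target] += 1
    # Sort by descending inbound count, then by name for a stable order.
    return sorted(inbound, key=lambda page: (-inbound[page], page))


if __name__ == "__main__":
    # Hypothetical link graph: each page maps to the pages it links to.
    links = {
        "popular-review-site": ["city-guide"],
        "city-guide": ["popular-review-site"],
        "blog": ["popular-review-site", "city-guide"],
        "new-research-report": ["popular-review-site"],
    }
    print(rank_by_inbound_links(links))
```

In this toy graph the newly published report has no inbound links, so it lands at the bottom of the ranking regardless of how relevant or recent its content is.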
Deep Web: a Primer
What is the Deep Web?
The Deep Web is a complex concept. It is essentially two categories of data.
The first is basically any information that is not easy to obtain through standard searching, which could be Twitter or Facebook posts, links buried many layers down in a dynamic page, or results that sit so far down the standard search results that typical users will never find them.
The second category is the larger of the two and represents a vast repository of information that is not accessible to standard search engines. It consists of content found in websites, databases, and other sources. Often it is only accessible through a custom query directed at individual websites, which cannot be accomplished by a simple “surface web” search.

