When it comes to the broad categorization central to most computer operations, data falls into two types that affect both how quickly it can be assimilated and how easily it can be recalled: structured data and unstructured data.
They say that curiosity is the cure for boredom, and if you’ve ever tried to find specific, detailed information using one of the popular search engines, you’ve probably spent your share of frustrated hours in front of the computer screen. It seems as though you should be able to find relevant websites and databases, but your search returns many different websites carrying the exact same information. What many people don’t know is that beyond the familiar online presence of Facebook, YouTube, and Google lies a vast wealth of knowledge waiting to be discovered. This information is commonly referred to as big data, and it is largely inaccessible to conventional web users.
Curious about BrightPlanet and our solutions? Want to know more about the Deep Web and how to take advantage of the resources that live there? This video will help you understand how our technology works – and how it can work for you.
THE DEEP WEB: SURFACING HIDDEN VALUE
Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. The reason is simple: Most of the Web’s information is buried far down on dynamically generated sites, and standard search engines never find it.
The Deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that produce results only dynamically, in response to a direct request. But a direct query is a laborious, one-at-a-time way to research. BrightPlanet’s harvest technology automates the process, making dozens of directed queries simultaneously using multi-threading, and is the only current search technology capable of identifying, retrieving, qualifying, classifying, and organizing both deep and surface content.
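The idea of fanning one query out to many sources at once can be sketched in a few lines. This is not BrightPlanet's implementation; it is a minimal illustration using Python's standard thread pool, with hypothetical in-memory "sources" standing in for the searchable databases a real harvester would query over HTTP.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for searchable Deep Web databases: each "source"
# answers a direct query with its matching records. A real harvester would
# instead submit the query to each site's own search interface.
SOURCES = {
    "patents_db": ["solar cell efficiency", "solar panel coating"],
    "court_records": ["solar easement dispute"],
    "trade_journal": ["wind turbine blades"],
}

def query_source(name, term):
    """One directed query against one source -- the laborious one-at-a-time step."""
    return [(name, record) for record in SOURCES[name] if term in record]

def harvest(term):
    """Fan the same query out to every source concurrently, then merge results."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(query_source, name, term) for name in SOURCES]
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

With the sample data above, `harvest("solar")` gathers three records from two different sources in a single pass, instead of requiring three separate manual searches.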
WHAT IS OSINT?
In short, Open Source Intelligence (OSINT) is the practice of using the Web to create intelligence. The longer definition of OSINT is an information processing discipline that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence. In the U.S. Intelligence Community (which includes organizations such as the DoD), the term “open” refers to overt, publicly available sources, as opposed to covert or classified sources.
Many other “INTs” exist, including HUMINT, which gathers intelligence from human sources through communication and interviews, and GEOINT, which is geospatial intelligence gathered from satellites, aerial photography, and mapping/terrain data.
Linked below are some helpful articles and best practices in Deep Web Harvesting.
What is the Deep Web?
The Deep Web is a complex concept, but it essentially comprises two categories of data.
The first is essentially any information that is hard to reach through standard searching: Twitter or Facebook posts, links buried many layers down in a dynamic page, or results ranked so far down the search results that typical users will never find them.
The second category is the larger of the two and represents a vast repository of information inaccessible to standard search engines. It comprises content held in websites, databases, and other sources, often reachable only through a custom query directed at an individual site, something a simple “surface web” search cannot accomplish.
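The "custom query" in this second category usually means constructing the request a site's own search form would send, since no static, crawlable link points at the result page. As a hedged sketch, the snippet below builds such a direct-query URL for a purely hypothetical database-backed site (`example.org` and its parameters are placeholders, not a real endpoint):

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a database-backed site that exposes its records
# only through its own search form. A surface crawler never finds these
# pages because no static link leads to them; a harvester must build the
# query itself.
BASE = "https://example.org/search"  # placeholder, not a real site

def build_query_url(base, **params):
    """Construct the direct-query URL a deep-web harvester would fetch."""
    # Sort parameters so the generated URL is deterministic.
    return f"{base}?{urlencode(sorted(params.items()))}"

url = build_query_url(BASE, q="maritime shipping records", year=2015)
```

Each distinct parameter combination addresses a different dynamically generated page, which is why this content is invisible to engines that only follow pre-existing links.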