2012 Top 5 Post Countdown: What is a Deep Web harvest?

We’ve made it to #3 on our 2012 Top 5 countdown of BrightPlanet’s most popular Deep Web University content: What is a Deep Web Harvest? The post was published on July 23 and explains what a Deep Web Harvester is, what it does, and why you need it.


Photo: jjandames

Posted in Deep Web and Big Data

What is a Deep Web harvest?

Far below the surface of the Internet lies the Deep Web. Google can’t take you there, nor can any other standard search engine. The Deep Web is full of valuable research, content and data sets too large for the average Internet user to analyze one click at a time. Deep Web content can be reached only by entering a query into a website’s search box, which queries the database behind that site. Without access to the Deep Web, a user will not find content on the Internet that isn’t explicitly linked to.
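As a minimal sketch of what "searching a website’s search box" amounts to, the Python snippet below builds the URL a search form might request for a given query. The site address `https://example.org/search` and the parameter name `q` are assumptions made for illustration; real Deep Web sites each use their own form fields, and many submit queries in other ways.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, param: str = "q") -> str:
    """Return the URL a simple GET search form would request for `query`.

    `base_url` and `param` are hypothetical; they stand in for whatever
    address and field name a particular site's search box uses.
    """
    return f"{base_url}?{urlencode({param: query})}"


if __name__ == "__main__":
    # The database behind the site answers this request; a crawler that
    # only follows links would never issue it.
    print(build_search_url("https://example.org/search", "clinical trial results"))
```

The point of the sketch is that the result page only exists once a query is submitted, which is why link-following search engines do not reach it.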

Continue reading »

Posted in Deep Web and Big Data, Financial Industry, Healthcare

Big Data Mining: Harvesting the Deep Web

Tracking online activity is a difficult business. People move more and more of their lives to the World Wide Web, so there is a wealth of information out there that people have exposed, whether intentionally or unintentionally. With this come all-new methods of tracking down wrongdoing: every day, people use online media to communicate about or coordinate illegal activities. But the Internet is a big place, and tracking down these cases, which means performing the necessary Big Data mining, is not as simple as typing a few keywords into Google or another search engine.

Continue reading »

Posted in Deep Web and Big Data

Deep Web: Advanced

Can you use Surface Web sites to find Deep Web content?

For all practical purposes, no. Surface Web search results are links based on “relevancy by popularity”, ranked by how often other documents link to them (PageRank). Thus, the first results you see are the ones that have been referenced most often by other documents, not necessarily the most relevant or recent data. This is typically the information you want when searching for a good place to eat, the name of a company that you just heard about, or the capital of South Dakota (Pierre).

Continue reading »
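To make “relevancy by popularity” concrete, here is a minimal Python sketch that orders pages by how many other pages link to them. It is a deliberate simplification: it only counts inbound links and is not the actual PageRank algorithm, and the page names in the example are made up for illustration.

```python
from collections import Counter


def rank_by_inbound_links(links: dict[str, list[str]]) -> list[str]:
    """Order pages by the number of distinct other pages linking to them.

    `links` maps each page to the pages it links to. Ties are broken
    alphabetically so the result is deterministic.
    """
    inbound = Counter()
    for source, targets in links.items():
        inbound[source] += 0  # make sure pages with no inbound links appear
        for target in set(targets):
            if target != source:  # ignore self-links
                inbound[target] += 1
    return sorted(inbound, key=lambda page: (-inbound[page], page))


if __name__ == "__main__":
    # Hypothetical link graph: "c" is referenced by three pages, "b" by two.
    example = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["c", "b"]}
    print(rank_by_inbound_links(example))
```

Note that nothing in this ordering looks at what a page says or how recent it is, which is the limitation the paragraph above describes.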

Posted in Deep Web and Big Data