Big Data Mining: Harvesting the Deep Web

Tracking online activity is a difficult business. As people move more and more of their lives onto the World Wide Web, they expose a wealth of information, whether intentionally or unintentionally. This creates entirely new ways of tracking down wrongdoing, because every day people use online media to communicate about or coordinate illegal activities. But the internet is a big place, and tracking down these cases, which means performing the necessary Big Data Mining, is not as simple as typing a few keywords into Google or another search engine.


Posted in Deep Web and Big Data

Deep Web: a Primer

What is the Deep Web?

The Deep Web is a complex concept, but it essentially comprises two categories of data.

The first is any information that is hard to obtain through standard searching, for example Twitter or Facebook posts, links buried many layers down in a dynamic page, or results that sit so far down the standard search results that typical users will never find them.

The second category is the larger of the two: a vast repository of information that standard search engines cannot access. It consists of content held in websites, databases, and other sources. Often it can be reached only through a custom query directed at an individual website, which a simple “surface web” search cannot accomplish.
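As a rough illustration of what a "custom query directed at individual websites" might look like, the Python sketch below builds one search URL per site, using each site's own endpoint and parameter name. The site names, endpoints, and parameter names are hypothetical placeholders invented for this example, and the sketch only constructs the URLs; it does not send any requests.

```python
from urllib.parse import urlencode

# Hypothetical per-site search endpoints and query parameter names.
# These are assumptions for illustration, not real services.
SITE_SEARCH_FORMS = {
    "records.example.org": ("https://records.example.org/search", "q"),
    "archive.example.com": ("https://archive.example.com/lookup", "term"),
}


def build_site_queries(keyword):
    """Return one query URL per site, using that site's own parameter name."""
    urls = {}
    for site, (endpoint, param) in SITE_SEARCH_FORMS.items():
        # urlencode escapes the keyword so it is safe to place in a URL.
        urls[site] = f"{endpoint}?{urlencode({param: keyword})}"
    return urls


queries = build_site_queries("wire fraud")
for site, url in queries.items():
    print(site, "->", url)
```

The point of the sketch is that each source needs its own query shape, which is why a single keyword search on the surface web does not reach this content.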


Posted in Deep Web and Big Data