Eleven years ago, BrightPlanet coined the term “Deep Web” to describe Internet databases not reached by surface web crawlers like Google, Bing, or Yahoo. However, there continues to be confusion between the Deep Web and the “Dark Web.” There are major differences between the two terms; Deep Web and Dark Web should NOT be used interchangeably.
The Dark Web
The Dark Web refers to any Web page that has been intentionally concealed, either to hide in plain sight or to reside within a separate but public layer of the standard Internet.
The Internet is built around Web pages that reference other Web pages. If a destination Web page has no inbound links, that page is effectively concealed: users and search engines that navigate by following links cannot find it. One example is a blog post that has not been published yet. It may exist on the public Internet, but unless you know the exact URL, you are unlikely ever to find it.
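The effect of missing inbound links can be illustrated with a small sketch. The link graph and page paths below are hypothetical and exist only for illustration; the crawler is a plain breadth-first traversal, not any particular search engine's method.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/index": ["/about", "/blog"],
    "/about": ["/index"],
    "/blog": ["/blog/published-post"],
    "/blog/published-post": [],
    "/blog/draft-post": ["/index"],  # exists, but no other page links to it
}


def crawl(start, links):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


reachable = crawl("/index", LINKS)
unreachable = sorted(set(LINKS) - reachable)
print(unreachable)  # ['/blog/draft-post']
```

The draft post links out to the home page, but because nothing links in to it, a crawler starting from the home page never discovers it.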
Other examples of Dark Web content and techniques include:
- Search boxes that will reveal a Web page or answer if a special keyword is searched. Try this on Google with the keyword “distance from sioux falls to new york”.
- Hiding a message inside a Web page comment, where it can be found only by someone who knows where to look. You can view the source of this page to see an example.
- Sub-domain names that are never linked to; for example, “internal.brightplanet.com”.
- Relying on special HTTP headers to be present to show a different version of a Web page.
- Images that are published but never actually referenced; for example, “/image/logo_back.gif”.
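As a rough illustration of the HTTP header technique in the list above, the sketch below picks a page version based on a request header. The header name `X-Preview-Key`, its expected value, and the page bodies are all made up for this example; a real site could use any header and any check.

```python
# Hypothetical page bodies and header values, chosen only for illustration.
PUBLIC_PAGE = "<h1>Public version</h1>"
CONCEALED_PAGE = "<h1>Concealed version</h1>"
SECRET_HEADER = "x-preview-key"   # assumed header name
SECRET_VALUE = "let-me-in"        # assumed header value


def select_page(headers):
    """Return the page body to serve for a request with the given headers."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get(SECRET_HEADER) == SECRET_VALUE:
        return CONCEALED_PAGE
    return PUBLIC_PAGE


# An ordinary browser or crawler sends no such header and sees the public page.
print(select_page({"User-Agent": "ExampleCrawler/1.0"}))
# A client that knows the header and its value sees the concealed page.
print(select_page({"X-Preview-Key": "let-me-in"}))
```

Both versions live at the same URL, so a visitor who does not know about the header has no sign that a second version exists.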
Virtual private networks are another aspect of the Dark Web: they exist within the public Web but often require additional software to access. TOR (The Onion Router) is a great example. Hidden within the public Web is an entire network of content that can be accessed only through the TOR network.
While personal freedom and privacy are admirable goals of the TOR network, the ability to traverse the Internet with complete anonymity nurtures a platform ripe for what is considered illegal activity in some countries, including:
- Controlled substance marketplaces
- Armories selling all kinds of weapons
- Child pornography
- Unauthorized leaks of sensitive information
- Money laundering
- Copyright infringement
- Credit card fraud and identity theft
Users must use an anonymizer to access websites on the TOR network. The Silk Road, an infamous online drug marketplace on the Dark Web, cannot be reached with a normal search engine or web browser.
The Deep Web
The Deep Web consists of dynamically generated Internet content that is accessible only by querying a search box on a Deep Web website.
Surface Web search engines (Google/Bing/Yahoo) can lead you to websites that hold Deep Web content, but not to the content itself. Think of searching for government grants: most researchers start by searching “government grants” in Google and find very few listings for individual grants. Google will direct researchers to the website www.grants.gov, but not to specific grants within the website’s database.
However, researchers can search thousands of grants at www.grants.gov by searching the database via the website search box. In this example, a Surface Web search engine (Google) led users to a Deep Web website (www.grants.gov) where a directed query to the search box brings back Deep Web content not found via Google search.
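The difference between the two kinds of access can be sketched in a few lines. The records, page text, and function names below are invented for illustration and do not reflect how www.grants.gov or any search engine is actually implemented.

```python
# Hypothetical database records; none of them has a static page of its own.
GRANTS = [
    {"id": 1, "title": "Rural broadband expansion grant"},
    {"id": 2, "title": "Small business research grant"},
    {"id": 3, "title": "Coastal habitat restoration grant"},
]

# The only static page a link-following crawler can fetch in this sketch.
# It offers a search box but contains no links to individual records.
STATIC_PAGES = {"/": "Welcome. Use the search box to find grants."}


def crawler_can_see(text):
    """True if `text` appears on any static page a crawler could index."""
    return any(text.lower() in body.lower() for body in STATIC_PAGES.values())


def search(query):
    """Build a result list on demand, as a site search box would."""
    terms = query.lower().split()
    return [
        grant["title"]
        for grant in GRANTS
        if all(term in grant["title"].lower() for term in terms)
    ]


print(crawler_can_see("broadband"))  # False
print(search("broadband"))           # ['Rural broadband expansion grant']
```

The record exists in both cases; it becomes visible only when a query is sent to the search function, which is the step a Surface Web crawler does not perform.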
Everything you can find on the Deep Web you could also find by manually clicking through websites one at a time, but BrightPlanet’s expertise gives intelligence agencies, governments, and private sector entities a scalable way to search Deep Web websites automatically. Customized filters and analytic tools within BrightPlanet harvesting technologies target specific information quickly, without needing Google to index the content first.
Exploiting Big Data from the Deep Web
Interested in the data on the Deep Web? Check out our whitepaper on creating intelligence from the Deep Web.