Welcome to the second post in BrightPlanet’s three-part series, which follows completely unformatted, unstructured web pages through the three-step process that turns data into actionable intelligence.
- Stage 1 – Harvesting
- Stage 2 (this post) – Normalization / Enrichment
- Stage 3 – Reporting and Analytics
In our last blog post, we covered the first stage, harvesting. That post describes how BrightPlanet harvested over 100,000 news articles from the top 50 newspapers using the Deep Web Harvester. In this post we’ll talk about the second stage, normalization.
The Big Data explosion is impossible to ignore. The problem is that 90 percent of new data is unstructured, which makes it challenging for analytics to create intelligence. Most of this unstructured data is hidden from regular web searches in what is called the Deep Web. U.S. intelligence agencies have exploited Big Data from the Deep Web for years; only recently has this technology become available commercially.
WHAT IS OSINT?
In short, Open Source Intelligence (OSINT) is the practice of using the Web to create intelligence. More formally, OSINT is an information-processing discipline that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence. In the U.S. Intelligence Community (agencies like the DoD), the term “open” refers to overt, publicly available sources, as opposed to covert or classified sources.
Many other “INTs” exist, including HUMINT, which gathers intelligence from human sources through communication and interviews, and GEOINT, which is geospatial intelligence gathered from satellites, aerial photography, and mapping/terrain data.