Open Source Investigations – OSINT II
Where to look? Let’s start with some basics. When explaining the process to a new investigator, I usually begin with the difference between the internet and the different types of information that travel upon it. It is here that I discuss, at a high level, the protocols that control how particular information travels: HTTP, FTP, NNTP, instant messaging, IRC and so on.
The internet is a network of interconnected networks or, in plain English, the wire, copper, fibre or radio waves (wireless). I then pose the question: ‘Can you surf the internet?’ The correct answer is: ‘You can’t surf copper wire, cable and so on. You can surf the world wide web, which runs over HTTP, only one of many protocols available on the internet.’
I use this question to differentiate between the internet and the data that travels upon it. Here I am trying to show the wider ‘where to look’, and at first glance this might seem blindingly obvious. However, without really understanding this distinction, an open source investigator can limit their options, restrict their searches to smaller areas, or fail to recognise the significance of what they discover.
It may be thought that search engines search the internet for information, but in fact they index the world wide web by following links and indexing the pages they find. It has to be said they do this efficiently and on an immense scale. They use proprietary search criteria which, in Google’s case, means asking some 200 questions about each page during the indexing process.
In addition they apply a proprietary ‘ranking’ structure, which in Google’s case is the PageRank algorithm invented by Larry Page and Sergey Brin. These ranking algorithms differ from search engine to search engine, and all deliver impressive results. But when you query a search engine, you are only searching a database of indexed results built according to criteria defined by that search engine. Sites, pages or other resources without direct links to them will not be indexed, and therefore you won’t find them.
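To make the point concrete, here is a toy sketch of link-following indexing. The mini ‘web’ is an invented in-memory site map (not a real crawler or any particular engine’s method), but it shows why a page with no inbound links never makes it into the index:

```python
# Toy illustration of how a search engine builds its index: it can only
# reach pages by following links from pages it already knows about.
# The "web" below is a hypothetical in-memory site map, not real data.

from collections import deque

# Hypothetical mini-web: page -> (text on the page, outgoing links)
WEB = {
    "home":   ("welcome to the site", ["news", "about"]),
    "news":   ("latest osint news", ["home"]),
    "about":  ("about this site", []),
    "orphan": ("a page nothing links to", []),  # no inbound links anywhere
}

def crawl(seed):
    """Breadth-first crawl from a seed page, indexing each page's words."""
    index = {}                      # word -> set of pages containing it
    seen, queue = {seed}, deque([seed])
    while queue:
        page = queue.popleft()
        text, links = WEB[page]
        for word in text.split():
            index.setdefault(word, set()).add(page)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("home")
print("news" in index.get("osint", set()))                  # True: reachable via links
print(any("orphan" in pages for pages in index.values()))   # False: never discovered
```

The ‘orphan’ page exists, but because nothing links to it the crawler never discovers it, exactly the blind spot described above.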
That isn’t the only limitation. Search engines return results based on your geographic location, or more specifically the regional version of the search engine you are using. Try searching for something on Google UK and then again using Google Australia; comparing the results will demonstrate what I mean. This, of course, is something you can control by selecting a regional version of the search engine.
The other important aspect of search engines is that they filter your results! Their motivation is good: they want to filter out explicit and offensive or unlawful material. It is critical, then, that the open source researcher remembers that they are simply searching an indexed database and that the results returned are a filtered dataset. The filters default to a ‘moderate’ level and, again, these can be changed.
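Both the regional bias and the content filter can often be controlled directly in the query URL. As a sketch only: the parameter names below (`q`, `gl`, `hl`, `safe`) are ones Google has historically supported; other engines use their own equivalents, so treat these as an assumption to verify against the engine you are using:

```python
# Sketch: controlling region and content filtering via query parameters.
# Parameter names (gl, hl, safe) are assumptions based on Google's
# historically documented URL parameters; other engines differ.

from urllib.parse import urlencode

def build_search_url(query, region="uk", language="en", safesearch="off"):
    params = {
        "q": query,          # the search terms
        "gl": region,        # country/region bias for results
        "hl": language,      # interface language
        "safe": safesearch,  # content filter, e.g. "active" or "off"
    }
    return "https://www.google.com/search?" + urlencode(params)

print(build_search_url("open source intelligence", region="au"))
```

Re-running the same query with `region="uk"` and `region="au"` is a quick way to repeat the comparison exercise described above without manually switching sites.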
Using more than one search engine is obviously a good thing to do, which leads us to meta search engines. These search the search engines: in other words, they aggregate the results of various other search engines to give you a broader result set. Using more than one search engine alongside a number of meta search engines will yield much better results for any given query.
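The aggregation step can be sketched in a few lines. The two ‘engines’ below are made-up stubs returning fixed URLs (a real meta search engine would query live APIs over HTTP), and the merging rule shown, summing reciprocal ranks, is just one simple, common fusion choice:

```python
# A meta search engine in miniature: query several engines, then merge
# and de-duplicate their ranked result lists. The two "engines" are
# hypothetical stubs standing in for real search APIs.

def engine_a(query):
    return ["example.com/1", "example.com/2", "example.com/3"]

def engine_b(query):
    return ["example.com/2", "example.org/9", "example.com/1"]

def meta_search(query, engines):
    """Merge results by summed reciprocal rank, de-duplicating URLs."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

results = meta_search("osint", [engine_a, engine_b])
print(results)  # example.com/2 ranks first: it scores well in both lists
```

Note how a URL that appears in both lists outranks one that tops only a single list, which is precisely the ‘broader result set’ benefit of aggregation.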
Add web directories, which are human edited, to the mix and you’ll quickly see that by using all three types of search an open source researcher can achieve a much wider range of results than from using Google on its own.
To conclude then, a good open source researcher will know that they have to use more than one search engine, a number of meta search engines and include web directories when appropriate.
In my next piece I want to look at how we record what we find to allow the end product to be distilled into intelligence, evidence and unused material. These days all three are relevant to any civil, corporate or law enforcement investigation. Happy hunting!
By Ray Massie – Operations Director, First Response