Keywords, keywords, keywords!

Keyword searching has been used to locate evidence from the very beginning of what we now refer to as data investigations, or digital forensics – that is, the forensic analysis of digital or computer-based data.

Keywords are simply words or phrases that you would logically expect to be closely related to the data you’re looking for. For example, in a case involving email harassment, you would expect that searching the suspect’s computer for the victim’s email address would be a good place to start – and you’d be right. What makes an email address like this a good choice of keyword is its ‘uniqueness’.

The more unique a keyword is to the matter at hand, the more likely you are to home in on the important evidence and reduce the number of false positives. False positives are hits that do contain the word you’re looking for but are unrelated to your case.

In this series of blog posts we’ll discuss how a thoughtful approach to keyword selection, along with a few simple operators, will often produce the desired results quickly. In most cases, investigations revolve around references to people, places or things. The first important step in selecting keywords to provide to your data analyst is therefore to understand roughly what it is you’re looking for.

This may seem like an obvious point, but it actually exposes one of the biggest weaknesses of using keywords to find evidence – you’ll only find what you know to look for. Relevant information relating to other activities may therefore go undiscovered. It’s for this reason that keywords should be used together with other techniques to provide more complete coverage.

You will probably be surprised by how often keywords throw up unexpected results – searching for a person’s name, like ‘sam’, will also return results like ‘sample’, ‘same’, ‘samsung’ and countless others. Searching for ‘cash’ will return hits for ‘lancashire’, ‘cashier’ and so on.
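To see why this happens, here is a minimal Python sketch of a plain, case-insensitive substring search. The sample text and the substring_hits helper are made up for illustration; real forensic tools will have their own search engines, but the matching behaviour is broadly similar.

```python
# Hypothetical sample text, invented purely for illustration.
text = (
    "Sam bought a Samsung phone. The sample was the same. "
    "Pay the cashier in Lancashire with cash."
)

def substring_hits(keyword, text):
    """Return every word in the text that contains the keyword, ignoring case."""
    needle = keyword.lower()
    # Split on whitespace, then strip trailing punctuation from each word.
    return [w.strip(".,;:!?") for w in text.split() if needle in w.lower()]

sam_hits = substring_hits("sam", text)
cash_hits = substring_hits("cash", text)

print(sam_hits)   # ['Sam', 'Samsung', 'sample', 'same']
print(cash_hits)  # ['cashier', 'Lancashire', 'cash']
```

Only one of the four ‘sam’ hits is actually the name; the rest are false positives.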

In our example case of email harassment, how can we use the concept of uniqueness to reduce the false positives and increase the chances of returned hits being relevant to us? The simplest way is to select a collection of words or phrases from the offending email and set that up as a ‘search phrase’. Keywords don’t have to be single words – they can be as long as you like (within reason) and can include spaces and punctuation.

So looking for ‘sam’ can be made far more focussed by capitalising the ‘s’ and adding a space at the end: ‘Sam ’. By doing this, we’re far more likely to find instances where the person’s name is used in a sentence – the downside is that we’ll miss the instance at the beginning of a letter: “Dear Sam,” because we’re looking for ‘Sam’ followed by a space and not a comma. Of course there’s nothing stopping you from having multiple keyword searches to cover all of the variations you might encounter, but that can potentially get very inefficient.
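The trade-off can be sketched in a few lines of Python. The sample text is invented, and the last approach assumes your search tool supports regular expressions with word boundaries, which not every tool does.

```python
import re

# Hypothetical sample text, invented purely for illustration.
text = (
    "Dear Sam, the sample arrived. I told Sam about the Samsung. "
    "It is all the same to Sam."
)

# 1. Case-sensitive literal search for 'Sam ' (capital S, trailing space).
#    Misses 'Sam,' and 'Sam.' because neither is followed by a space.
literal_hits = text.count("Sam ")

# 2. Several literal keywords, one per variation we can think of.
#    Works, but the list grows with every new punctuation mark.
variants = ["Sam ", "Sam,", "Sam."]
variant_hits = sum(text.count(v) for v in variants)

# 3. A single pattern matching 'Sam' as a whole word (assumes regex support).
#    \b marks a word boundary, so 'Samsung' and 'sample' are excluded.
word_hits = len(re.findall(r"\bSam\b", text))

print(literal_hits, variant_hits, word_hits)  # 1 3 3
```

The single literal keyword finds only one of the three genuine references, while the list of variants and the whole-word pattern both find all three.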

In my next instalment about keywords, we’ll cover some interesting ways to search for more than one thing at a time as well as thinking about close matches, like: “(john) within ten words of (smith)”. We’ll also look at how you can define a keyword search strategy to ensure you’re covering as much ground as possible.

By John Douglas, Technical Director, First Response

