Tuesday, January 18, 2011

Metadata and You

I first encountered the word metadata back when I started creating Web pages in the mid-1990s. Programs called "search engines", like WebCrawler and AltaVista, would automatically find your page and make it available to the world. They often had trouble describing your page adequately, so as a Web developer, adding "meta tags" was critical. Meta tags were a way of adding text that did not show up in the Web browser but allowed search engines to find information about your site.
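Search engines of that era commonly read meta tags such as "keywords" and "description" from a page's head section, text the browser never displays. Here is a minimal Python sketch of that idea using the standard library's html.parser. The page content and the MetaTagCollector class are made up for illustration; this is not how WebCrawler or AltaVista were actually built.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags, which browsers do not render."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Hypothetical 1990s-style page: only the <body> text would appear in a browser.
PAGE = """<html><head>
<title>My Home Page</title>
<meta name="keywords" content="metadata, web design, search engines">
<meta name="description" content="A personal page about building Web sites.">
</head><body><p>Welcome!</p></body></html>"""

collector = MetaTagCollector()
collector.feed(PAGE)
print(collector.meta["keywords"])  # metadata, web design, search engines
```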

Metadata is used today in many areas, from computer forensics to corporate espionage. I will give you an everyday example first. As a professor, I sometimes suspect that a file a student handed in is an exact copy of someone else's. The first thing I do is look at the file properties. In Office 2010, I go to the File tab and select "Info"; the properties are listed on the right side. There I can easily see the name of the person who created the file. In a computer lab, most people probably have the same user name, so that may not tell me anything. However, if you created the file at home and gave it to a friend, that is pretty damning evidence, since your friend has handed in a file with your name in it. Another clue is "date created", which tells me the day and time the file was created. Of course, I am not opening up my whole bag of tricks here, but these are two ways to investigate a file further.
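Office 2010 files such as .docx are ZIP packages, and the author, "last modified by", and date properties shown on the Info screen are stored in an XML part named docProps/core.xml. The minimal Python sketch below reads those fields using only the standard library. The function name and the sample values (jsmith, student42, the dates) are hypothetical, and older .doc files keep their properties in a different binary format that this sketch does not handle.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> element inside docProps/core.xml
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return author and date metadata from a .docx/.xlsx/.pptx path or file object."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for key, path in FIELDS.items():
        element = root.find(path, NS)
        props[key] = element.text if element is not None else None
    return props


# Demo: a hypothetical, stripped-down package holding only the core-properties part.
SAMPLE_CORE = (
    "<cp:coreProperties"
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:dcterms="http://purl.org/dc/terms/"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<dc:creator>jsmith</dc:creator>"
    "<cp:lastModifiedBy>student42</cp:lastModifiedBy>"
    '<dcterms:created xsi:type="dcterms:W3CDTF">2011-01-10T21:04:00Z</dcterms:created>'
    '<dcterms:modified xsi:type="dcterms:W3CDTF">2011-01-17T08:15:00Z</dcterms:modified>'
    "</cp:coreProperties>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", SAMPLE_CORE)
buffer.seek(0)

props = read_core_properties(buffer)
print(props["author"], props["created"])  # jsmith 2011-01-10T21:04:00Z
```

If the "author" says jsmith but the file was handed in by student42, that mismatch is the kind of clue described above.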

As for corporate espionage and hacking, the metadata in programs such as Word (and most of the rest of Office) often includes data like your user name, your company name, and a file path. If the file was created on a network drive, I now know the name of one of your company's internal servers and possibly your user name. This information is valuable to hackers!
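In the same kind of package, the company name is normally stored in docProps/app.xml as a Company element, and a path to a network share can turn up in places such as links to other files. This rough Python sketch, assuming the standard .docx layout, pulls out the Company value and scans the XML parts for UNC paths like \\server\share. The regular expression is a simple heuristic, not an Office API, and the function name, the server fs01, and "Example Corp" are all invented for the demo.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the "extended properties" part (docProps/app.xml), which holds Company.
EXT_NS = {"ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"}

# A UNC path such as \\fileserver\share\file.docx; group 1 captures the server name.
UNC_PATH = re.compile(r'\\\\([A-Za-z0-9_.-]+)\\[^\s"<>]*')


def find_network_leaks(source):
    """Report the Company property and any UNC paths stored in an Office package."""
    company, paths, servers = None, set(), set()
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            data = package.read(name)
            if name == "docProps/app.xml":
                element = ET.fromstring(data).find("ep:Company", EXT_NS)
                if element is not None:
                    company = element.text
            if name.endswith((".xml", ".rels")):
                # Crude text scan: paths can hide in links, fields and relationship targets.
                for match in UNC_PATH.finditer(data.decode("utf-8", errors="ignore")):
                    paths.add(match.group(0))
                    servers.add(match.group(1))
    return {"company": company, "servers": sorted(servers), "paths": sorted(paths)}


# Demo with a hypothetical package: a Company value plus a hyperlink to a network share.
APP_XML = (
    '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">'
    "<Company>Example Corp</Company></Properties>"
)
RELS_XML = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId5" TargetMode="External"'
    ' Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"'
    r' Target="file:///\\fs01\projects\budget.xlsx"/>'
    "</Relationships>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/app.xml", APP_XML)
    package.writestr("word/_rels/document.xml.rels", RELS_XML)
buffer.seek(0)

leaks = find_network_leaks(buffer)
print(leaks["company"], leaks["servers"])  # Example Corp ['fs01']
```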

If you are distributing a file from Office, also be aware that if your company uses tracked changes, revisions, comments, or hidden text, that information can be included in the file you distribute. If a staff member left a comment in the file, there is a good chance the recipient can find it. You can use the Prepare for Sharing options in Office 2010 to minimize this risk, though once again, most people do not realize these options exist.
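In a .docx file, comments normally live in word/comments.xml, tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml (each carrying a w:author attribute), and hidden text is ordinary formatting marked with w:vanish. The Python sketch below counts these and collects the author names, assuming Word's default part names. It is only a quick check in the spirit of Office's own inspection tools, not a replacement for them, and the function name and sample document are invented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

WNS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + WNS + "}"
OFF_VALUES = {"0", "false", "off"}


def inspect_docx(source):
    """Count comments, tracked changes and hidden text in a .docx, and list who made them."""
    report = {"comments": 0, "insertions": 0, "deletions": 0, "hidden_text": 0, "authors": set()}
    with zipfile.ZipFile(source) as package:
        # Assumes Word's default part names; a thorough tool would follow the relationships.
        if "word/comments.xml" in package.namelist():
            comments = ET.fromstring(package.read("word/comments.xml")).findall(W + "comment")
            report["comments"] = len(comments)
            report["authors"].update(c.get(W + "author") for c in comments if c.get(W + "author"))
        body = ET.fromstring(package.read("word/document.xml"))
    for key, tag in (("insertions", "ins"), ("deletions", "del")):
        changes = body.findall(".//" + W + tag)
        report[key] = len(changes)
        report["authors"].update(c.get(W + "author") for c in changes if c.get(W + "author"))
    # Hidden text is formatting: count <w:vanish/> elements unless w:val switches them off.
    for vanish in body.iter(W + "vanish"):
        if vanish.get(W + "val", "true").lower() not in OFF_VALUES:
            report["hidden_text"] += 1
    return report


# Demo: a hypothetical document with one comment, two tracked edits and a hidden sentence.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{WNS}"><w:body><w:p>'
    "<w:r><w:t>Our offer is </w:t></w:r>"
    '<w:del w:id="1" w:author="jsmith"><w:r><w:delText>$2M</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="mlee"><w:r><w:t>$1.5M</w:t></w:r></w:ins>'
    "<w:r><w:rPr><w:vanish/></w:rPr><w:t>We would accept $1M.</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
COMMENTS_XML = (
    f'<w:comments xmlns:w="{WNS}"><w:comment w:id="0" w:author="jsmith">'
    "<w:p><w:r><w:t>Remove this before sending.</w:t></w:r></w:p></w:comment></w:comments>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", DOCUMENT_XML)
    package.writestr("word/comments.xml", COMMENTS_XML)
buffer.seek(0)

report = inspect_docx(buffer)
print(report)
```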

Even programs like Photoshop can cause metadata issues. Let's say you have an image and you choose to blur out parts of it. Photoshop will save a thumbnail as part of the file, so the operating system can show users a preview more quickly. If that thumbnail still shows the image as it was before your edits, a smart hacker may be able to see your original image using some advanced techniques. Programs such as jStrip can help minimize this risk, but many people don't realize it is a risk.
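A JPEG file is a sequence of marked segments: Exif data, which can include a thumbnail, is usually stored in an APP1 segment, and Photoshop can add its own data, including a thumbnail, in APP13. The Python sketch below lists the APPn segments that contain an embedded JPEG and writes a copy without the APP1 to APP15 and comment segments. It only illustrates the general idea: it is not how jStrip works, it does not handle every JPEG variant, the function names are hypothetical, and the demo bytes are synthetic rather than a viewable image.

```python
import struct

SOS, COM = 0xDA, 0xFE
APP0, APP15 = 0xE0, 0xEF


def iter_segments(data):
    """Yield (marker, start, end) for each header segment up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes the 2 length bytes
        end = pos + 2 + length
        yield marker, pos, end
        if marker == SOS:  # compressed image data follows; stop parsing headers
            return
        pos = end


def find_embedded_previews(data):
    """Return the APPn segments whose payload contains another JPEG (e.g. a thumbnail)."""
    hits = []
    for marker, start, end in iter_segments(data):
        if APP0 <= marker <= APP15 and b"\xff\xd8\xff" in data[start + 4:end]:
            hits.append(f"APP{marker - APP0}")
    return hits


def strip_metadata(data):
    """Drop APP1-APP15 and COM segments, keeping JFIF (APP0) and the image itself."""
    out = bytearray(b"\xff\xd8")
    for marker, start, end in iter_segments(data):
        if marker == SOS:
            out += data[start:]  # SOS header, scan data and EOI copied verbatim
            return bytes(out)
        if marker == COM or APP0 < marker <= APP15:
            continue  # metadata segment: Exif, XMP, Photoshop data, comments...
        out += data[start:end]
    raise ValueError("no image data (SOS) found")


# Demo on a tiny synthetic file: JFIF header, an Exif segment carrying a fake
# embedded thumbnail, a comment, a table segment, then "image data".
def segment(marker, payload):
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


jfif = segment(0xE0, b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00")
exif = segment(0xE1, b"Exif\x00\x00" + b"\xff\xd8\xff\xdb(old thumbnail)\xff\xd9")
comment = segment(COM, b"before blur")
tables = segment(0xDB, bytes(65))
scan = segment(SOS, b"\x01\x01\x00\x00\x3f\x00") + b"\x12\x34\x56\xff\xd9"
photo = b"\xff\xd8" + jfif + exif + comment + tables + scan

print(find_embedded_previews(photo))  # ['APP1']
clean = strip_metadata(photo)
print(find_embedded_previews(clean))  # []
```

Dropping every APPn segment after APP0 also removes things you may want to keep, such as an embedded ICC color profile (APP2), so a more careful tool would keep a list of segments to preserve rather than strip them all.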

As with many other technology issues, most people seem to learn about this only after they have been burned by it.

