Data, Information and Knowledge

I still see far too many examples of content confusing the ideas of data and information. Sometimes it seems a writer is simply trying to avoid being redundant when using data and information in the same sentence to mean the same thing. Of course, they are different, and the result is unnecessary confusion.

I just wrote a paper for a European law journal on the topic, and I learned more about it than is healthy for one person. The piece will be out in August. Generally, I admire the effort the Europeans are making to get it right, though they are less concerned with data and information per se than they are with privacy and security. These things all intersect but in sometimes unpredictable ways. The more I think about things, the less I am sure of -- and the more questions I have.

The European Parliament is trying to figure out laws that protect individual rights to privacy, which necessarily affect what data is kept and what is not. That makes sense, and it sounds simple, but how do you do that? Does a person walking on a street have a right to privacy and thus a right to determine how you use a crowd photo?

What if a corporation like Google or a government takes the photo? Are we to prevent photos, based on the premise that someone someday might do something to a person in one of the photos based on the picture? From there it gets silly, but there are some concrete situations that are nothing to laugh at.

The Persistence of Memory

Take the case of a nurse in Connecticut who was arrested for possessing a small amount of pot. The case was dismissed when she agreed to take some drug education courses, according to an article in The New York Times. In the good old days, that would have been the end of it, because according to Connecticut law -- and the laws in many other states -- her record was wiped clean with the dismissal. Under Connecticut law, she can even testify under oath that she has never been arrested now that the record has been cleared.

That all makes good sense to me. It might not be factually correct, but these expungement laws are one of the fictions we create in modern life to keep the world spinning. However, with the Internet, there's no such thing as expungement, and a search still comes up with the original news article that -- while true when it was published -- is now false.

It matters, because this nurse can't find a job anymore, thanks to the simple expedient of prospective employers doing a rudimentary search on every new job applicant. What to do? She's suing the news organizations that published the story for libel, but the story was true when it was reported. Yikes!

The Internet and our modern world are full of examples like this. Society used to be able to conveniently forget small indiscretions, and we all got on with life. Now that's being taken away, without anyone even giving permission or any new law being adopted. The Internet is the de facto repository of all things digital about us -- but should it be? The Europeans take all this very seriously, and perhaps we should too.

Information Alchemy

It seems to me that the biggest issue we have with data and information today is not data security, even though lots of it gets stolen (I'm talking to you, People's Liberation Army Unit 61398). In fact, I think we've put too much emphasis on physically securing data and given too little thought to how it is transformed into information.

After all, data by itself is useless. An MP3 file is garbage without software to render it into a song, which is a kind of information. Ditto with your bank balance and the video you shot over the holiday, or the formula or source code for a new product.

Wouldn't we be better off focusing on data transformation? A new photo-sharing service, Snapchat, takes this approach by delivering photos that disintegrate after 10 seconds. That's far from ideal for most applications, but it's on the right track.

Generally, I think data ought to be handled like milk in a supermarket; it ought to have an outdate after which it automatically becomes archival. You might be able to access archival data, but transforming it back into its original information content would have to be restricted in some way.

Look, we can still access information about various flat Earth theories, but we all know this is archival and historic and no longer scientific. Some of us can still take it seriously if we want to, but we can't take it to the bank or whatever -- you know what I mean.
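
To make the milk-carton idea a little more concrete, here is a minimal Python sketch of a record that carries its own outdate. Everything in it is an assumption for illustration: the names DatedRecord, ArchivedError, render and the authorized flag are hypothetical, and decoding bytes to text merely stands in for whatever transformation turns data into information. It shows one way the idea could work and does not describe any existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class ArchivedError(PermissionError):
    """Raised when archival data is rendered without explicit authorization."""


@dataclass(frozen=True)
class DatedRecord:
    """Hypothetical record: raw data plus the moment it turns archival."""

    payload: bytes      # the raw data, kept intact even after the outdate
    outdate: datetime   # from this moment on, the record is archival

    def is_archival(self, now: datetime) -> bool:
        # A record becomes archival as soon as its outdate is reached.
        return now >= self.outdate

    def render(self, now: datetime, authorized: bool = False) -> str:
        """Transform data into information (here: decode bytes to text).

        Fresh records render for anyone. Archival records render only
        when the caller is explicitly authorized.
        """
        if self.is_archival(now) and not authorized:
            raise ArchivedError("record is past its outdate")
        return self.payload.decode("utf-8")


if __name__ == "__main__":
    created = datetime(2013, 1, 1, tzinfo=timezone.utc)
    record = DatedRecord(b"balance: 100", created + timedelta(days=30))

    # Before the outdate, the data renders normally.
    print(record.render(created + timedelta(days=1)))

    # After the outdate, rendering is refused unless authorized.
    try:
        record.render(created + timedelta(days=31))
    except ArchivedError as err:
        print("refused:", err)
    print(record.render(created + timedelta(days=31), authorized=True))
```

The point of the design is that the bytes never go away. What changes after the outdate is who may turn them back into information.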

We don't have anything like that for data yet -- something that says this does not yield the information it once did. On a parallel path, if we were better able to control the conversion of data to information so that only the data's owners could decrypt it, might we have less data theft and less of the intellectual property loss that goes with it?

If any of this makes sense, then it's not data security we should focus on as much as secure data conversion or transformation into information -- those are different issues with different approaches. When you think of it this way, the differences between data and information are starkly clear. It gives us all good reason to consciously choose the right words to convey our meaning.
