Visualizing Data Breaches
With all the security companies cropping up in recent years, I can't help but wonder: "is the Internet finally getting any better at security?"
In Search of Data
Obtaining breach statistics is a bit like a scavenger hunt: a tidbit of data from a news article here, a shard of content from a press release there. Many of the statistics cited aren't official, and there's often conflicting information. Using the Yahoo breach as an example, some sources cite 500 million exposed accounts, others cite one billion. Moreover, the virtual Venn diagram of coverage across security sites has both overlap and gaps.
In my perusing of various security sites I came across Vigilante.pw, a site that catalogs database breaches over time. With close to 2,500 entries, Vigilante.pw is one of the most comprehensive breach databases I've seen so far, and the good folks who run the site obliged my request to visualize the site's data set.
When you examine a spreadsheet of 2,475 incidents, patterns don't jump right out at you. However, when we look at aggregations in any common data visualization tool, we start to see some interesting insights, such as:
- Gaming sites are by far the most targeted assets, followed by shopping sites, and (surprise!) hacking sites.
- Social media sites seem to have bled the most content over time; about three quarters of a billion accounts!
- The vast majority of breaches lose under one million accounts. The 100 million+ behemoth breaches we see in the headlines can be counted on two hands, as there are actually fewer than ten.
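For readers who prefer code to a visualization tool, the three aggregations above can be sketched in a few lines of Python. This is only an illustration: the rows, the field names (`site`, `category`, `accounts`), and the bucket boundaries are invented here and are not taken from the Vigilante.pw data set.

```python
from collections import Counter, defaultdict

# Hypothetical rows: the sites, categories, and account counts below are
# made up for illustration and do not come from the real data set.
SAMPLE_BREACHES = [
    {"site": "game-one.test", "category": "Gaming", "accounts": 120_000},
    {"site": "game-two.test", "category": "Gaming", "accounts": 45_000},
    {"site": "game-three.test", "category": "Gaming", "accounts": 2_300_000},
    {"site": "shop-one.test", "category": "Shopping", "accounts": 800_000},
    {"site": "shop-two.test", "category": "Shopping", "accounts": 15_000},
    {"site": "forum.test", "category": "Hacking", "accounts": 9_000},
    {"site": "social.test", "category": "Social media", "accounts": 150_000_000},
]


def breaches_per_category(rows):
    """Count incidents per category, most frequently breached first."""
    return Counter(row["category"] for row in rows).most_common()


def accounts_per_category(rows):
    """Sum exposed accounts per category, largest total first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["accounts"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def size_buckets(rows):
    """Group incidents by how many accounts each one exposed."""
    def bucket(accounts):
        if accounts < 1_000_000:
            return "under 1M"
        if accounts < 100_000_000:
            return "1M to 100M"
        return "100M+"

    return Counter(bucket(row["accounts"]) for row in rows)


if __name__ == "__main__":
    print(breaches_per_category(SAMPLE_BREACHES))
    print(accounts_per_category(SAMPLE_BREACHES))
    print(size_buckets(SAMPLE_BREACHES))
```

Counting incidents and summing accounts are kept as separate functions because they answer different questions: a category can be breached often while losing relatively few accounts, and the reverse.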
Given all these breaches, I can't help but wonder: why are all these sites shedding confidential data on a wholesale scale? Perhaps there are issues with the way passwords themselves are stored:
I was less than surprised to see "no passwords" and "plaintext" gracing the worst offender top-10 list. And given that SHA-1 and MD5 are known to be generally unsafe for password storage, their attendance at this motley party comes as no shock, either. That leaves MyBB, osCommerce, vB, and IPB: an eCommerce software package and a collection of bulletin board systems. I'm not sure what, if anything, those solutions have in common. I suppose this is one of those times when data only prompts more questions.
So, are we getting "better" at security? It's hard to say. When examining the timeline above, we're clearly seeing not only more breaches, but breaches of greater magnitude. Statistically, that may be due in part to the fact that the Internet community is simply keeping better records of its modern-day breaches. In other words, the past may have been just as bad as today, if not worse, but lack of transparency will likely forever shield us from those historical details. I suppose only time (and solid record keeping) will tell.