Publication
EuroSys 2011
Conference paper

An experimental study on the measurement of data sensitivity


Abstract

Data-centric security proposes to leverage the business value of data to determine the level of overall IT security. It has gained much enthusiasm from the security community, but has not materialized into a practical security system. In this paper, we introduce our recent work towards fine-grained data-centric security, which estimates the sensitivity of enterprise data semi-automatically. Specifically, the categories of sensitive data and their relative sensitivities are initially determined by subject matter experts (SMEs). We then apply a suite of text analytics and classification tools to automatically discover sensitive information in enterprise data, such as personally identifiable information (PII) and confidential documents, and to estimate the sensitivity of individual data items. To validate the idea, we developed a proof-of-concept system that crawls all the files on a personal computer and estimates the sensitivity of individual files and the overall sensitivity level of the computer. We conducted a pilot test at a large IT company with its employees' laptops. The pilot scanned 28 different laptops, on which 2.2 million files stored in various file formats were analyzed. Specifically, the files were analyzed to determine whether they contain any of the pre-defined sensitive information, comprising 11 different PII types and 11 sensitive topics. In addition to the sensitivity estimation, we also conducted a risk survey to estimate the risk level of the laptops. We found that, surprisingly, 7% of the analyzed files belong to one of the eleven sensitive data categories defined by the SMEs of the company, and 37% of the files contain at least one piece of sensitive information such as an address or a person's name. The analysis also discovered that most laptops have similar overall sensitivity levels, but a few machines have exceptionally high sensitivity. Interestingly, those few highly sensitive laptops were also most at risk of data loss and of malware infection, according to user survey responses. Furthermore, the tool produces evidence of the discovered sensitive information, including the surrounding context in the document, so users can easily redact the sensitive information or move it to a more secure location. Thus, this system can be used as a privacy-enhancing tool as well as a security tool. © 2011 ACM.
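
The pipeline described in the abstract (SME-defined categories with relative sensitivities, automatic discovery of sensitive information, per-file and per-machine scores, and evidence with surrounding context) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two categories, their weights, the regular expressions standing in for the text analytics and classifiers, the rule that a file's score is the sum of the weights of the distinct categories found, and the rule that a machine's score is the sum over its files. Files are passed in as an in-memory mapping rather than crawled from disk.

```python
import re
from dataclasses import dataclass

# Hypothetical SME-assigned relative sensitivities (not the paper's values).
SME_WEIGHTS = {"email": 1.0, "ssn": 5.0}

# Illustrative regex detectors; the paper uses text analytics and classifiers.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Evidence:
    category: str  # which detector fired
    match: str     # the matched text
    context: str   # the match plus surrounding characters


def find_evidence(text, window=20):
    """Return every detector hit with up to `window` characters of context."""
    hits = []
    for category, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            start = max(0, m.start() - window)
            end = min(len(text), m.end() + window)
            hits.append(Evidence(category, m.group(), text[start:end]))
    return hits


def file_sensitivity(text):
    """Assumed rule: sum the weights of the distinct categories found."""
    found = {e.category for e in find_evidence(text)}
    return sum(SME_WEIGHTS[c] for c in found)


def machine_sensitivity(files):
    """Assumed rule: a machine's score is the sum of its file scores."""
    return sum(file_sensitivity(text) for text in files.values())


# Toy stand-in for a crawled laptop: file name -> extracted text.
files = {
    "a.txt": "Contact alice@example.com for details.",
    "b.txt": "SSN 123-45-6789 on file, mail bob@example.org",
    "c.txt": "Nothing sensitive here.",
}

for name, text in files.items():
    print(name, file_sensitivity(text))
print("machine", machine_sensitivity(files))
```

The `Evidence.context` field mirrors the abstract's point that the tool reports the surrounding context of each finding, which is what lets a user locate and redact it. A real system would replace the regular expressions with the trained classifiers and SME-defined topics the abstract mentions.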
