CSD sponsors an initiative to make computer and network operational data more accessible for use in defensive cybersecurity research and development. The primary goal of the Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) system is to bridge the gap between producers of security-relevant network operations data and the technology developers (both commercial and academic) and evaluators who can use this data to accelerate the design, production, and evaluation of next-generation cybersecurity solutions.
PREDICT was initiated to assist technology developers and evaluators, who often had to judge the efficacy of their technical solutions from anecdotal evidence or small-scale experiments rather than from more comprehensive real-world data. PREDICT now provides regularly updated network operations data sources relevant to cybersecurity technology development. It is intended to provide timely and detailed insight into cyberattack phenomena occurring across the Internet, as well as insight into the health of the Internet, including outage detection. Data in PREDICT is anonymized to ensure non-attribution.
The PREDICT website (https://www.predict.org/) contains an overview, general background information, and the data repository catalog. Basic categories of datasets include Internet traffic flow data, Internet topology data, Domain Name System (DNS) data, Border Gateway Protocol (BGP) data, Intrusion Detection System (IDS) and firewall data, and botnet behavior data. Future datasets will include data used to evaluate CSD research projects. Each category is described, along with descriptors for the fields of the individual datasets. Access to the PREDICT data repository is available to eligible researchers and technology developers from approved DHS locations upon approval of their user accounts. In addition, new sources of data are continually being sought.
Considerable effort has been devoted within the PREDICT community to ensuring the privacy of individuals and organizations with respect to the contents of the data repository. The DHS PREDICT Privacy Impact Assessment document represents a significant proactive analysis of the privacy concerns and the measures needed to address them.
An important part of making data available via PREDICT has been an exploration of ethics as applied to the human use and privacy aspects of cybersecurity research. As a result, the Menlo Report was created to propose a framework of ethical guidelines for computer and information security research, based on the principles set forth in the 1979 Belmont Report for ethical research in the biomedical and behavioral sciences.