The Department maintains a Data Asset Catalog that describes the data assets used to carry out its homeland security mission. As part of the Open Government Initiative, the catalog is being expanded to support the broader mission of data dissemination to the public in addition to information sharing.
Data Asset Catalog
- A data asset is a distinct, organized collection of structured, semi-structured, or unstructured values. Examples include a database, a website, a document repository, an Extensible Markup Language (XML) file, a geospatial image file, or a data service.
- A data asset may produce or store one or more datasets. For example, the National Emergency Management Information System (NEMIS) Emergency Support Module is a FEMA data asset, and the FEMA Disaster Declarations Summary and the FEMA Hazard Mitigation Program Summary are two datasets extracted from it.
Under DHS policy, as documented in the Enterprise Data Management Concept of Operations, each DHS Component is responsible for maintaining an accurate, up-to-date description of its data assets in the Data Asset Catalog. Progress on this effort has been tracked for the past three years on the Enterprise Data Management Scorecard, which is presented quarterly to the DHS CIO Council.
The DHS Data Asset Catalog currently includes approximately 900 of the Department's estimated 1,200 data assets. Most of the remaining 300 data assets will be added to the catalog in Fiscal Year 2011 through the continued support of the Data Management Working Group.
For each data asset, the catalog records its security classification, privacy sensitivity, and handling restrictions. These restrictions include For Official Use Only, Law Enforcement Sensitive, Sensitive Security Information, and other Controlled But Unclassified categories, as well as non-government restrictions such as data protected by trade agreements or data restricted to protect the intellectual property of our private-sector partners. Because of the Department's homeland security and national security missions, this categorization shows that only 5 percent of the 900 cataloged data assets (about 45) contain data releasable to the general public.
Institutionalizing data dissemination to the public and creating a culture of Open Government includes putting in place a process in which data asset owners within each Component review their data assets and identify candidate datasets that could be made available to the public through Data.gov. This review will be added to the Enterprise Data Management Scorecard in 2011.
In this process, data owners will specify the broadest allowable scope for disseminating each candidate dataset: the general public, private-sector partners, state and local governments, other federal agencies, or other DHS organizations. The resulting list of candidate datasets will then go through the Department's Open Government Initiative review process, which addresses legal, financial, privacy, and security concerns about releasability, so that data cleared for public release can be published.
Approximately one third of the 900 cataloged data assets have been reviewed so far, yielding 75 candidate datasets that are currently under review.
Dataset Identification
The Department nominated a set of high-value datasets as candidates for publication on Data.gov. These datasets were the initial focus and the starting point for building the DHS candidate pipeline. They also served as examples of the kind of Department information considered high value, helping guide the Components in identifying additional candidates.
Components and programs have self-nominated datasets they could contribute, and the Data Management Working Group has offered additional suggestions. The Department has also identified data it already publishes on Component websites that could be provided in a more open, usable format. In addition, the Department has established a site on the DHS intranet where employees can view the pipeline and suggest additional datasets.
Most importantly, the general public submits suggestions through the Data.gov public forum, and the Data.gov project management office forwards them to DHS. When a suggested dataset is deemed too sensitive for release, Components try to determine whether the data can be modified so that it becomes releasable while remaining useful.
For each candidate dataset, the Department identifies potential sources and collects high-level information to determine whether the dataset is eligible for release. This summary answers four basic questions:
- What is the data in the submission?
- How is it generated?
- How can the data be used?
- What data types will be in the dataset?
The Department then works with the organization that maintains the source system to determine the level of effort required to produce the dataset.
The Department is also working with the Components, the Data Management Working Group, and the Open Government Working Group to establish Data.gov candidate submission and review processes within the Components.