Secure Data Enclave — SDE

A secure virtual desktop research environment.

The Secure Data Enclave (SDE) provides Columbia researchers with a secure, remotely accessible, virtual Windows 10 desktop environment to store and collaboratively analyze PII and PHI data as an alternative to traditional cold room computing environments.

Using a web browser, researchers can work on sensitive data and collaborate with other members of their project simultaneously. Researchers can access only the data explicitly placed in the virtual environment. The environment is destroyed after use and is restricted so that it can reach only other systems in the SDE. Data can be transferred to and from the system only by a designated Data Security Officer, who is assigned to each individual project.

Data Compliance

The SDE is certified by CUMC Security as HIPAA compliant and approved for the storage and analysis of PII and PHI data. Users can reference the CUMC Security RSAM registration ID number, 3868, in their IRB applications to confirm this certification.

Additionally, the SDE has been approved for use with popular restricted datasets, including the Bureau of Labor Statistics National Longitudinal Survey of Youth (NLSY) datasets, University of North Carolina Longitudinal Study of Adolescent Health (Add Health) datasets, and European Commission Eurostat restricted economic datasets.

Computing Resources

Researchers on the SDE have access to a virtual Windows 10 desktop system with 4 cores of an E5-2680 CPU and 16 GB of RAM. Storage is divided between shared project storage and individual user storage. A standard project is given 60 GB of data storage and 25 GB of collaborative group work space. Each user gets 20 GB of working directory space, a 2 GB Home Directory for code files, and an additional 2 GB to stage files that need to be moved off the SDE.
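As a rough illustration of how these allocations add up, the sketch below estimates the total storage footprint of a standard project. The quota figures come from the paragraph above; the function name and the assumption that every account, including the DSO, receives the full per-user allocation are hypothetical and not confirmed by RCS.

```python
# Quota figures are taken from the text above. How they combine into a single
# total is an assumption made for illustration only.
PROJECT_DATA_GB = 60   # project data storage
GROUP_SPACE_GB = 25    # collaborative group work space
USER_WORKING_GB = 20   # per-user working directory
USER_HOME_GB = 2       # per-user Home Directory for code files
USER_STAGING_GB = 2    # per-user staging area for files leaving the SDE


def project_storage_gb(num_users: int) -> int:
    """Estimate total storage in GB for a standard project (hypothetical helper)."""
    if num_users < 1:
        raise ValueError("a project needs at least one user")
    shared = PROJECT_DATA_GB + GROUP_SPACE_GB
    per_user = USER_WORKING_GB + USER_HOME_GB + USER_STAGING_GB
    return shared + num_users * per_user


if __name__ == "__main__":
    # Standard offering of 3 accounts, assuming each gets the per-user quota.
    print(project_storage_gb(3))
```

Under these assumptions, each additional user adds 24 GB to the estimate, which may help when deciding whether to request a resource review.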

If increases in CPU, RAM, or storage resources are necessary, contact sde-support@columbia.edu to open a review by RCS. Such changes may incur additional fees to acquire and provide those resources.

Software Support

The Research Computing Services (RCS) team handles software installation and updates, but user licenses must be provided by the project. Currently, the SDE supports many statistical and analysis packages, including Stata, R, STAN, and QGIS. Other programs used in the past, depending on licensing availability, include SPSS and SAS.

Projects

The standard offering is 3 accounts: Principal Investigator (PI), Research Assistant (RA), and Data Security Officer (DSO). For reasons of both security and system access volume, we ask project applicants to limit project members to as few users as necessary. Additional researchers can be accommodated if needed, but there may be additional costs.

Users must have a UNI and VPN access to use the SDE. Outside collaborators can obtain these through appropriate department-level HR status. CUIT’s RCS team manages SDE accounts for Columbia-affiliated users.

Cost

The SDE is priced at $526 per project, per year, with discounts available for bulk purchases. Contact sde-support@columbia.edu for details.

Project Prerequisites

  1. Provide proof of IRB approval for using the SDE. Users can reference the CUMC Security RSAM registration ID number, 3868, which confirms the SDE's certification by CUMC Security for HIPAA compliance. This makes the SDE viable for most PII and PHI data, though the final decision rests with the data provider.
  2. Provide proof of Data Provider Approval for using the SDE. Typically this is in the form of a data agreement. If no such formal approval exists, written approval from the data provider should be obtained.
  3. Complete the SDE Application and sign the SDE User Agreement.
  4. Provide payment information. Provide a chartstring and approval for the yearly project fee, which will be confirmed with the user before processing.
  5. Undergo user/DSO training. Training covers the basics of accessing the SDE and conducting analysis on it, as well as an overview of the data security measures in place and how they are enforced.
  6. DSO uploads data to SDE. After training, the DSO is permitted to upload their project's data in a manner compliant with the agreement of the data provider.
  7. Begin data analysis on SDE. Once training is complete and the data is uploaded, users may begin their work on the SDE.

Representative Projects

Restricted Datasets

  • Add Health

The SDE features several projects using the University of North Carolina’s Adolescent Health (Add Health) restricted datasets. Projects have focused on a variety of areas, including the relationship between genetic factors and social outcomes. Examples include projects testing phenotype distinctions and social mobility among second-generation Latinos.

  • NLSY

The SDE has featured several projects using the Bureau of Labor Statistics National Longitudinal Survey of Youth, covering a range of research topics such as the degree of transmission of economic advantage.

Government Datasets

  • VAT Tax Project

In agreement with a city government in South America, Columbia researchers are analyzing the economic and fiscal impacts of special tax treatments in Value Added Tax (VAT) systems using government-provided tax data.

  • State Health Data

In a multi-part study leveraging state-level Department of Health data, researchers at Columbia are using the SDE to compare and analyze the effects of prenatal care on subsequent fertility.

Proprietary Data

Researchers who collect sensitive PII and PHI data may use the SDE to securely analyze that data within their projects, provided the data meets the research design criteria approved by the IRB and is used within the parameters of the consent granted by the subjects. Several projects have moved their survey and even lab findings to the SDE for analysis by their group.

Data Stewardship

Some researchers have found the SDE useful as a way to securely provide curated sensitive data to students and other junior researchers.