iDASH CLOUD

The iDASH program offers a HIPAA-compliant private cloud environment for research on sensitive PHI data. Cloud platforms are considered an evolution of cluster technology, with a heavy emphasis on resource virtualization and automation to reduce operational costs and facilitate broader usage. Private clouds such as iDASH cannot compete in sheer computational power with the offerings of commercial giants such as Amazon EC2, which focus on popular resource packages (in terms of CPUs, memory size, disk space, and networking). Instead, we focus on scientific workloads that require (i) very large memory; (ii) high core counts; (iii) low-latency, high-bandwidth networking; or (iv) high-volume storage. Additionally, because we handle human subjects data, we have ensured that HIPAA and FISMA requirements are met. The iDASH CLOUD consists of over $2M in hardware and software at the San Diego Supercomputer Center, supported by a team of system administrators, software developers, software quality assurance experts, and security officers. The associated data repositories allow authorized users to control access to data that resides in private or public “communities” (i.e., collaborative spaces for groups of users). A legal framework based on Data Use Agreements between the participating institutions ensures the proper operation of this environment and safeguards data access.

Resources and Services Provided

 

Computing

  • 800 cores, 7.5 TB RAM, 600 TB disk space, including 10 TB of SSD for high I/O
  • Full redundancy
  • 10 Gbps network connectivity
  • Largest possible VM: 32 CPUs, 256 GB of RAM
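
As a rough illustration of the VM ceiling listed above, the following Python sketch checks whether a requested configuration fits within the largest possible VM. Only the two limits come from the list; the function name and the idea of a pre-submission check are assumptions made for illustration and do not describe an actual iDASH interface.

```python
# Limits taken from the resource list above; everything else is hypothetical.
MAX_VM_CPUS = 32
MAX_VM_RAM_GB = 256


def fits_largest_vm(cpus: int, ram_gb: int) -> bool:
    """Return True if the requested size fits within the largest possible VM."""
    if cpus <= 0 or ram_gb <= 0:
        raise ValueError("cpus and ram_gb must be positive")
    return cpus <= MAX_VM_CPUS and ram_gb <= MAX_VM_RAM_GB


if __name__ == "__main__":
    print(fits_largest_vm(16, 128))  # within both limits
    print(fits_largest_vm(64, 128))  # exceeds the CPU limit
```

A request that exceeds either limit would need to be split across several VMs or discussed with the iDASH team.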

 

Storage and Data

  • 300 TB storage, accessible via NFS
  • 10 TB of public data (human reference genome, dbSNP, 1000 Genomes)

 

Security

  • Firewalls
  • VPN
  • Two-factor RSA authentication
  • Intrusion Detection and Protection (IDP)
  • Fully hardened virtual machines (VMs)

 

Virtual Machine Blueprints

Our catalog of VM blueprints has been developed from publicly available software, application packages (native or Dockerized), and public VM images, all modified to meet our security standards. The blueprints include:

  • Bare Ubuntu 12.04
  • Bare CentOS 6.4
  • GenomeVIP (WashU)
  • PCAWG variant calling (Sanger Institute)
  • bcbio-nextgen (Blue Collar Bioinformatics)
  • TOM: The Oncogenomic Machine (UC San Diego)

 

Specific Applications

The FlightDeck portal is a web-based interface that allows you to run specific workflows directly from your web browser. The feel is similar to Illumina BaseSpace, but in a HIPAA-compliant space and not necessarily limited to sequencing data. Current applications include:

  • Whole human genome alignment (FASTQ -> BAM)
  • Somatic variant calling with VarScan or MuTect
  • Sanger PCAWG processing of whole genomes
  • GATK Joint Indel Realignment (licensed to UCSD only)

 

Research as a Service

A number of scientists are already using the iDASH resources for their research and can assist with your specific research needs:

  • Cancer Genomics
  • Transcriptome Analysis
  • Biostatistics
  • Human and Bacterial Genetics
  • EHR data mining and analysis
  • Clinical Data Capture (REDCap)
  • Data storage, integrity and security

________________________________________________________________

Differences from Commercial Clouds

  • Point-to-point connections with major academic centers increase data transfer efficiency and safety
  • Prescreened, limited set of known and monitored users for additional security/safety
  • Physically located in one place: the San Diego Supercomputer Center
  • No cost for moving data. You pay only for computing (and for storage, if you choose to use it).
  • Comprehensive solution for genomics research
    • Curated collection of VM blueprints for genomics research
    • Dedicated Web-based applications via FlightDeck
    • Access to world-class research services in clinical and translational data analysis
 

Example Users and Applications

  • The ICGC/TCGA PanCancer Analysis Work-Group (pancancer.info)
  • The CLL consortium (PI: Tom Kipps) cll.ucsd.edu
  • The ELLA research study (PI: Patty Thomson)
  • The Kawasaki Disease Research Center (PI: Jane Burns)
  • The ATHENA breast cancer research network
  • The Inflammatory Bowel Disease XXX
  • The Alzheimer XXX
  • HIV/AIDS-Associated Neurocognitive Disorders (PI: Stephen Spector)
 
________      
HIPAA CLOUD Agreement

 

Access to PII data must first be approved by the UCSD IRB and then go through the UCSD/CTRI change management process, with approval by the UCSD/CTRI CTO, CSA, and ISO.
The CTRI/DBMI organization categorizes data as PHI, PII, or non-PHI information, with appropriate network controls in place for each classification. The classifications are defined as follows:

PHI - Information about health status, provision of health care, or payments for health care that can be linked to a specific individual.
PII - Information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context.

All databases containing sensitive data must be encrypted at rest and in transit, with encryption required on laptops and tablets. This is the responsibility of the owner of the laptop and/or tablet. PII data is treated the same as PHI data. PHI and PII data storage on mobile devices is strictly prohibited.
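
The handling rules in this section can be summarized in a short Python sketch. It encodes only what is stated above: PII is treated the same as PHI, sensitive data must be encrypted at rest and in transit, and PHI/PII may not be stored on mobile devices. The enum and function names are hypothetical and exist purely for illustration; the policy text remains the authoritative source.

```python
from enum import Enum


class DataClass(Enum):
    """The three classifications named in this agreement."""
    PHI = "PHI"
    PII = "PII"
    NON_PHI = "non-PHI"


def is_sensitive(data_class: DataClass) -> bool:
    # PII is treated the same as PHI, so both count as sensitive.
    return data_class in (DataClass.PHI, DataClass.PII)


def encryption_required(data_class: DataClass) -> bool:
    # Sensitive data must be encrypted at rest and in transit.
    # The text states no such rule for non-PHI data, so this returns False there.
    return is_sensitive(data_class)


def mobile_storage_allowed(data_class: DataClass) -> bool:
    # PHI and PII storage on mobile devices is strictly prohibited.
    return not is_sensitive(data_class)


if __name__ == "__main__":
    for dc in DataClass:
        print(dc.value, encryption_required(dc), mobile_storage_allowed(dc))
```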

Access to datasets containing PII or PHI requires the completion of a Data Use Agreement. In addition, the following must be adhered to:

  • Two-factor authentication may be required for accessing PHI or PII information stored on CTRI networks.
  • Users are required to have a UCSD AD account prior to receiving access to CTRI/IDASH systems containing PHI/PII.
  • Passwords must conform to the UCSD organizational Password Policy (https://blink.ucsd.edu/_files/password-standards.pdf)
  • Access to datasets containing PII or PHI requires the completion of a Data Use Agreement (https://idash.ucsd.edu/procedures)
  • Data cannot be removed from iDASH/CTRI repositories without explicit written authorization.
  • Accounts will be created for individuals only. Account sharing and group accounts are strictly forbidden. Password sharing is also strictly forbidden. Anonymous guest access is not allowed under any circumstance.
  • User accounts may be subject to additional monitoring or auditing at the discretion of the IT Manager or executive team, or as required by applicable regulations or third-party agreements
  • The Remote Access Policy requires acquiring a UCSD, Medical Center, or IDASH VPN account, depending on the application/data to be accessed.
  • Industry best practices state that username and password combinations must never be sent as plain text. Therefore, authentication credentials must be encrypted during transmission across any network, whether the transmission occurs internal to the CTRI/IDASH network or across a public network such as the Internet.
  • Any system connecting to the network can have a serious impact on the security of the entire network. A vulnerability, virus, or other malware may be inadvertently introduced in this manner. For this reason, users must strictly adhere to the standards of the UCSD BFB-IS-3 Electronic Information Security Policy (http://policy.ucop.edu/doc/7000543/BFB-IS-3) with regard to antivirus software and patch levels on their machines. Users may not be permitted network access if these standards are not met.
  • End-users and employees are responsible for promptly reporting the theft, loss or unauthorized disclosure of UCSD CTRI/IDASH proprietary information
  • You may access, use, or share UCSD CTRI/IDASH proprietary information only in accordance with the above policies, and only to the extent it is authorized under the DUA and IRB and is necessary to fulfill your assigned job duties
  • UCSD CTRI/IDASH reserves the right to audit networks and systems on a periodic basis to ensure compliance with this policy, through various methods, including but not limited to, business tool reports, internal and external audits, and feedback to the policy owner
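
To make the rule on credential transmission concrete, here is a minimal Python sketch of a client-side guard that refuses to send credentials to an endpoint that does not use an encrypted transport. The function name, the set of accepted URL schemes, and the example hostnames are assumptions for illustration only; they are not part of the iDASH/CTRI policy or tooling.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of URL schemes that imply an encrypted channel.
ENCRYPTED_SCHEMES = {"https", "sftp", "ftps", "ldaps"}


def check_credential_transport(url: str) -> None:
    """Raise ValueError if credentials would be sent over an unencrypted channel."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ENCRYPTED_SCHEMES:
        raise ValueError(
            f"refusing to send credentials over {scheme or 'an unspecified scheme'}: {url}"
        )


if __name__ == "__main__":
    # Placeholder hostnames, not real iDASH endpoints.
    check_credential_transport("https://portal.example.org/login")  # passes silently
    try:
        check_credential_transport("http://portal.example.org/login")
    except ValueError as err:
        print(err)
```

The check applies equally to internal and public networks, mirroring the requirement that encryption is needed in both cases.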

Depending on the number of security violations and the specific information involved, disciplinary action for violating these policies may consist of a letter of warning, revocation of access to UCSD CTRI/IDASH information systems, and/or suspension or removal from UCSD CTRI/IDASH systems, and could result in criminal prosecution. An employee found to have violated this policy may be subject to disciplinary action, up to and including termination of employment.

Acknowledgements


iDASH is a member of the NIH/NHLBI National Centers for Biomedical Computing (U54HL108460). It is the result of a collaboration between the UC San Diego Department of Biomedical Informatics (Dr. Ohno-Machado) and the Clinical and Translational Research Institute (Dr. Firestein).

 

Contact

For general questions, contact