2016 Summer Internship

iDASH is pleased to announce the availability of summer research internships for undergraduates, graduate students, and postgraduates.


Program Structure

Each intern is paired with one primary mentor and, in some cases, secondary mentors. Interns may work independently or on small teams for their projects. Interns are generally expected to reside in San Diego during the internship, and are required to attend a kickoff meeting at the beginning of the program* and the Internship Symposium at the end of the program to present the results of their work. During the internship period, iDASH will host weekly lunch seminars to be attended by all interns and selected program mentors and leaders. Interns are also expected to schedule regular 1:1 meetings with their mentor(s) throughout the internship period. Monthly social activities will also be hosted for interns; in past summers these have included potluck lunches, day hikes, and a surfing session.



February 1: Application period opens

March 25: Applications will no longer be accepted

April 8: Notification of acceptance or waitlist status

April 15: Response required from intern to guarantee participation

May 31/June 13*: Program begins with kickoff meeting

July 15: Midterm evaluations

August 5: Final presentations

August 5/19*: Program concludes












* Students who are unavailable due to their school calendar for the June 2nd kickoff meeting will start on June 16th and finish on August 22nd.


Participating Faculty and Project Information

Olivier Harismendy

The role of inherited variation in cancer somatic landscape

The role of germline or inherited variation in cancer has been studied in selected families and led to the identification of genetic variants that are dominant and responsible for cancer syndromes. Similarly, rare recessive variants with lower penetrance are responsible for the increased risk of breast and ovarian cancer (BRCA1/2). More common variants in the population have also been identified through GWAS, which has revealed multiple SNPs associated with a modest increase in cancer risk. Despite these advances, multiple variants of intermediate allelic frequency in the population, or carried by patients with undocumented family history, remain variants of unknown significance (VUS) and can still play a role in tumor development. In addition, the contribution of variants located outside of the coding region has been underexplored and can now be reexamined in light of recent maps of the regulatory landscape. The long-term goal of this research is to use germline genetic variation in cancer prevention and care to better stage patients or predict their response to treatment.

We propose to identify the germline variants in UCSD Cancer Center patients (targeted gene panel) as well as in the public TCGA/ICGC datasets (whole genomes). We will then test these variants, alone or in combination, to identify the ones that impact cancer onset, the tumor somatic landscape, or the tissue-specific regulatory network. The project will involve processing of high-throughput sequencing data, population genetics, and statistical analysis in a HIPAA-compliant cloud-computing environment.


Jina Huh

Project 1

In this project, we will develop a tool to help patients learn from other patients’ illness trajectories. The tool will help patient users understand how other patients like themselves have developed or overcome their health problems over time, based on various behavioral (e.g., exercise) and medical (e.g., metformin) treatments. The information will be presented in ways that positively influence patient users and help them make informed decisions. The work involved in this project includes modeling patient profiles and developing a patient-friendly interface that helps patients make informed choices in health management.

Project 2

A critical goal of Patient Powered Research Networks (PPRN) and the NIH Precision Medicine Initiative Cohort (PMI) is patients’ continued participation in their programs. The personal health data that patients share in these networks, outside of electronic health records, is a critical component in delivering personalized healthcare. The challenge, however, is sustaining patients' participation in these networks long enough to collect meaningful longitudinal data. We will investigate the behavioral and health factors that influence sustained participation in these networks. Our project will help the PMI and PPRN initiatives gather individual health data and help patients receive personalized healthcare.

Xiaoqian Jiang

Project 1

Structural representation of EHR data: traditional methods of modeling EHR data are symbolic and often not structured. That is, they do not consider the correlations between medical attributes and therefore lose information. We seek novel representations that are more semantic (inspired by deep learning) to build advanced predictive models.

Project 2

Next-generation security technology based on the Intel SGX architecture: privacy and security are important concerns in biomedical research, which often involves sensitive data. A reasonable solution should protect data storage, communication, and analysis in centralized or distributed environments. This is a hard and challenging task to accomplish. The new SGX architecture is a hardware-supported architecture for protecting sensitive information, and it sheds light on novel solutions to the privacy challenge. We will develop novel models to tackle real-world biomedical tasks while respecting privacy and security.

Hyeoneui Kim

Project 1

Bridging TICS to EHR: Giving patients granular control over sharing their health data for research through a Tiered Informed Consent System (TICS) is becoming a reality. While a TICS presents health data items in a simplified way to help patients understand them, a seemingly simple health data item is often related to multiple data fields in an Electronic Health Record (EHR) system. This project will develop a simple yet robust NLP pipeline that maps data items in the locally developed TICS, called iCONCUR, to the relevant data fields in the EHR, so that iCONCUR can be implemented accurately and completely.

Project 2

Pain is a common yet challenging problem as many factors influence one's pain experience. This project is to develop a conceptual predictive model of acute pain by analyzing literature and existing pain theories.

Shuang Wang

Protecting genomic data privacy in research studies.

In this project, we will develop practical methods to support privacy-preserving genomic data analysis. The development of such privacy technology may increase public trust in research. The privacy technology to be developed will also contribute to the sharing of genomic data in ways that meet the needs of those in biomedical research.

Olivier Harismendy & Hyeoneui Kim

Sharing genetic test results

Across UC hospitals, an increasing number of patients are undergoing some kind of molecular testing. These tests include germline information (neonatal, familial disease risk, preconception counseling) as well as somatic events (cancer). They are ordered either through an internal laboratory (at UCSD, the Center for Advanced Laboratory Medicine, CALM) or through third-party providers. Increasingly these tests include a large number of genes and, in a few cases, exome-wide or genome-wide information. It is important to track and record such data systematically to increase their value for the practice of precision medicine and for patient-oriented outcomes research. However, the data is currently deposited in a highly heterogeneous format in the Electronic Medical Record (EMR). Most tests are ordered outside the EMR, and their results are generally captured post hoc as a PDF file instead of in a machine-readable format (XML, JSON). Even the provider’s name is not easy to determine, since the results of the test are only one of many documents uploaded through the “media” tab. Consequently, the information is difficult for investigators to access and query. Useful queries can be as simple as identifying which patients underwent genetic testing and as refined as identifying the patients who carry a specific mutation in a specific gene. None of those queries are possible today. Our goal is to develop the procedure, including data formatting and SOP, to allow UC institutions to share this information through UC-ReX or through pSCANNER.

The University of California Research eXchange (UC-ReX) Data Explorer is a secure online system that enables cross-campus queries of clinical aggregate data. Search criteria can include demographics, diagnosis and procedure codes (ICD-9 and CPT), the top 150+ lab orders, and a proof of concept for four medications. The sources of information for the Data Explorer are de-identified data sets that are extracted from each institution's clinical data warehouse (CDW), transformed into a common data representation, and stored in a de-identified manner in a separate, dedicated data repository at each institution. pSCANNER is a PCORI-funded initiative to enable research on patient outcomes in a distributed manner. Both pSCANNER and the CDW use the OMOP 4.0 data model. OMOP specifies tables for laboratory diagnostic tests in the form of a lab_result table with dedicated fields such as laboratory ID, date of test, or value range. It can be customized to include fields specific to genetic tests.


Here we propose to pilot the inclusion of genetic test results in the CDW and pSCANNER and to implement the procedure at other UC institutions for sharing the aggregated information. 

Aim 1: Import genetic test information into the UCSD CDW. Using examples from a private test (Foundation Medicine) and an institutional test (UCSD CALM), we will evaluate the optimal data model for importing the information into the CDW. We will capture three levels of information with increasing granularity: 1) name of the laboratory and test, 2) presence and identity of a mutated gene, 3) identity of the mutation and its interpretation (benign vs. pathogenic). Additionally, metadata about the test, such as date ordered, date returned, and physician’s name, will also be parsed and formatted. We will follow the specification of the OMOP data model used by the CDW as well as recommendations from the HL7 guidelines for clinical genomics testing.
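
As a rough illustration of the three levels of information described in Aim 1, a machine-readable test record might be sketched as below. This is only a sketch under stated assumptions: the class names, field names, and helper functions are hypothetical and do not come from OMOP, HL7, or any existing iDASH code, and the two queries simply mirror the examples given above (which patients were tested, and which carry a specific mutation in a specific gene).

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Set
import json


@dataclass
class Variant:
    """Levels 2 and 3: the mutated gene, the mutation, and its interpretation."""
    gene: str
    mutation: str
    interpretation: str  # e.g. "benign" or "pathogenic"


@dataclass
class GeneticTestResult:
    """One genetic test report. All field names here are hypothetical."""
    patient_id: str
    laboratory: str      # level 1: name of the laboratory
    test_name: str       # level 1: name of the test
    date_ordered: date   # metadata
    date_returned: date  # metadata
    physician: str       # metadata
    variants: List[Variant] = field(default_factory=list)

    def mutated_genes(self) -> Set[str]:
        """Level 2: the set of genes reported as mutated."""
        return {v.gene for v in self.variants}


def patients_tested(results: List[GeneticTestResult]) -> Set[str]:
    """Simple query: which patients underwent any genetic testing."""
    return {r.patient_id for r in results}


def patients_with_mutation(results: List[GeneticTestResult],
                           gene: str, mutation: str) -> Set[str]:
    """Refined query: which patients carry a given mutation in a given gene."""
    return {
        r.patient_id
        for r in results
        if any(v.gene == gene and v.mutation == mutation for v in r.variants)
    }


def to_json(result: GeneticTestResult) -> str:
    """Serialize a record to JSON; dates are written as ISO strings."""
    return json.dumps(asdict(result), default=str, sort_keys=True)
```

Keeping the variants as a nested list means a record with only level 1 information (laboratory and test name) is still valid, and finer levels can be filled in when the source report provides them.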

Aim 2: Extend the deployment to include genetic test results from UCLA (Clinical Genomics Center) and UCD (Foundation Medicine). We will work with both institutions to 1) list the different sources of genetic test results and 2) apply the solution established in Aim 1 to their own CDW records.


Previous Internships

iDASH has hosted dozens of undergraduate and graduate students over the past five summers. To read internship alumni bios and see videos from their Symposium presentations, please use the links in the dropdown menu above.

View presentations from the 2015 Internship Symposium here.



Application period ended March 25, 2016.