Matching Items (37)
Description

Coronary artery disease (CAD) is one of the most commonly diagnosed heart diseases globally, affecting about 5% of adults over the age of twenty [1]. Lifestyle changes can positively impact the risk of developing CAD and are especially important for individuals with high genetic risk [1]. In this study, we sought to predict the likelihood of developing CAD using genetic, demographic, and clinical variables. Leveraging genetic and clinical data from the UK Biobank on over 500,000 individuals, we separated the 500 individuals most genetically similar to a target individual from another 500 genetically dissimilar individuals; this process was repeated for 10 target individuals as a proof of concept. CAD-related variables, including age, relevant clinical factors, and a polygenic risk score, were then used to train models predicting CAD status within the genetically similar and genetically dissimilar groups, in order to determine which group yields the more accurate prediction. Genetic similarity to each target individual was computed using the Mahalanobis distance. To reduce heterogeneity between sexes and races, the study was restricted to British Caucasian males. The models trained on the more similar individuals demonstrated better predictive performance: the area under the receiver operating characteristic curve (AUC) was significantly higher for the 'similar' than for the 'dissimilar' groups, indicating better predictive capability (AUC = 0.67 vs. 0.65, respectively; p-value < 0.05). These findings support the potential of precision prevention strategies, since predictive models of disease for a given target individual should be built from individuals more similar to that target, even within an otherwise homogeneous group (e.g., British Caucasians). Although intuitive, such practices are not done routinely. Further validation and exploration of additional predictors are warranted to enhance the predictive accuracy and applicability of the model.
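As a rough illustration of the similarity step described above (not the thesis's actual code), the sketch below ranks individuals by Mahalanobis distance to a target in a feature matrix. The variable names, the use of synthetic data, and the choice of features are assumptions for the sketch only.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def rank_by_genetic_similarity(X, target, k=500):
    """Return indices of the k most and k least similar rows of X
    to `target`, using Mahalanobis distance.

    X      : (n_individuals, n_features) array, e.g. genetic features
    target : (n_features,) array for the target individual
    """
    # The inverse covariance of the feature matrix defines the metric.
    VI = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.array([mahalanobis(row, target, VI) for row in X])
    order = np.argsort(d)              # ascending distance
    return order[:k], order[-k:]      # most similar, most dissimilar

# Synthetic stand-in for biobank-scale features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
similar_idx, dissimilar_idx = rank_by_genetic_similarity(X, X[0], k=500)
```

Models for CAD status would then be trained separately on the two index sets and compared by AUC, as in the study.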
Contributors: Pandari, Sadhana (Author) / Ghassamzadeh, Hassan (Thesis director) / Scotch, Matthew (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)
Created: 2024-05
Description


Background: Creation and reuse of reliable clinical code sets could accelerate the use of EHR data for research. To support that vision, there is an imperative need for methodologically driven, transparent, and automatic approaches to create error-free clinical code sets. Objectives: Propose and evaluate an automatic, generalizable, and knowledge-based approach that uses as its starting point a correct and complete knowledge base of ingredients (e.g., the US Drug Enforcement Administration Controlled Substance repository list includes fentanyl as an opioid) to create medication code sets (e.g., Abstral is an opioid medication with fentanyl as an ingredient). Methods: Algorithms were written to convert lists of ingredients into medication code sets in which all medications are codified in the RxNorm terminology, are active medications, and have at least one ingredient from the ingredient list. The generalizability and accuracy of the methods were demonstrated by applying them to the discovery of opioid and antidepressant medications. Results: Errors (39 (1.73%) and 13 (6.28%)), obsolete drugs (172 (7.61%) and 0 (0%)), and missing medications (1,587 (41.26%) and 1,456 (87.55%)) were found in publicly available opioid and antidepressant medication code sets, respectively. Conclusion: The proposed knowledge-based algorithms for discovering correct, complete, and up-to-date ingredient-based medication code sets proved to be accurate and reusable. The resulting algorithms and code sets have been made publicly available for others to use.
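To make the ingredient-to-code-set idea concrete, here is a minimal sketch (not the authors' published algorithm) that expands one ingredient into RxNorm clinical and branded drug concepts via the public RxNav REST API. The endpoint paths and response layout reflect the RxNav documentation as best recalled and should be treated as assumptions to verify.

```python
import requests

RXNAV = "https://rxnav.nlm.nih.gov/REST"

def ingredient_to_medications(ingredient_name):
    """Sketch: map one ingredient (e.g. 'fentanyl') to RxNorm drug
    concepts (SCD = clinical drug, SBD = branded drug)."""
    # 1. Resolve the ingredient name to one or more RxCUIs.
    r = requests.get(f"{RXNAV}/rxcui.json", params={"name": ingredient_name})
    cuis = r.json().get("idGroup", {}).get("rxnormId", [])
    meds = {}
    for cui in cuis:
        # 2. Walk to related drug-level concepts by term type (assumed
        #    endpoint: /rxcui/{cui}/related.json?tty=SCD+SBD).
        r = requests.get(f"{RXNAV}/rxcui/{cui}/related.json",
                         params={"tty": "SCD SBD"})
        for group in r.json().get("relatedGroup", {}).get("conceptGroup", []):
            for c in group.get("conceptProperties", []) or []:
                meds[c["rxcui"]] = c["name"]
    return meds

# Taking the union over a full ingredient list would yield the candidate
# code set; the published method additionally filters to active drugs.
print(list(ingredient_to_medications("fentanyl").items())[:5])
```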

Contributors: Mendoza, Daniel (Author) / Grando, Adela (Thesis director) / Scotch, Matthew (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor) / School of Life Sciences (Contributor)
Created: 2023-05
Description

A genome-wide association study (GWAS) of treatment outcomes for citalopram and escitalopram, two frontline SSRI treatments for Major Depressive Disorder, was conducted with 529 subjects on an imputed dataset. While no variants of genome-wide significance were identified, several potentially interesting variants warrant further exploration. These findings have the potential to elucidate novel mechanisms underlying drug response to SSRIs. This work will be continued with machine learning and deep learning analyses to capture non-linear effects, and by engaging a biologist or geneticist to provide more specialized knowledge for interpreting the results.
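For readers unfamiliar with the mechanics, the single-variant test behind a GWAS like this is typically a regression of outcome on genotype dosage plus covariates. The sketch below is a generic illustration with statsmodels, not the study's actual pipeline; the data shapes, covariates, and binary outcome coding are assumptions.

```python
import numpy as np
import statsmodels.api as sm

def gwas_logistic(genotypes, phenotype, covariates):
    """Per-variant logistic regression: phenotype ~ dosage + covariates.

    genotypes : (n_subjects, n_variants) dosage matrix (0..2)
    phenotype : (n_subjects,) binary outcome, e.g. treatment response
    covariates: (n_subjects, n_covs), e.g. age, sex, ancestry PCs
    """
    pvals = np.empty(genotypes.shape[1])
    for j in range(genotypes.shape[1]):
        X = sm.add_constant(np.column_stack([genotypes[:, j], covariates]))
        fit = sm.Logit(phenotype, X).fit(disp=0)
        pvals[j] = fit.pvalues[1]  # p-value for the dosage term
    return pvals

# Synthetic example; 5e-8 is the conventional genome-wide threshold.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(529, 100)).astype(float)
y = rng.integers(0, 2, size=529)
covs = rng.normal(size=(529, 3))
print((gwas_logistic(G, y, covs) < 5e-8).sum(), "genome-wide significant hits")
```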
Contributors: Leiter-Weintraub, Ethan (Author) / Dinu, Valentin (Thesis director) / Scotch, Matthew (Committee member) / Barrett, The Honors College (Contributor) / Dean, W.P. Carey School of Business (Contributor) / College of Health Solutions (Contributor) / School of Life Sciences (Contributor)
Created: 2024-05
Description

Translating research has been a goal of the Department of Health and Human Services since 1999. Through two years of iteration and interviews with our community members, we have collected insights into the barriers to accomplishing this goal. Liberating Science is a think tank of researchers and scientists who seek to create a more transparent process to accelerate innovation, starting with behavioral health research.
Contributors: Raghani, Pooja Sioux (Author) / Hekler, Eric (Thesis director) / Buman, Matthew (Committee member) / Pruthi, Virgilia Kaur (Committee member) / Barrett, The Honors College (Contributor) / Department of Chemistry and Biochemistry (Contributor) / Biomedical Informatics Program (Contributor)
Created: 2014-05
Description

Methane (CH4) is important in the environment both as a greenhouse gas and in the degradation of organic matter. During the last 200 years, the atmospheric concentration of CH4 has tripled. Methanogens are methane-producing microbes from the domain Archaea that complete the final step in breaking down organic matter to generate methane, through a process called methanogenesis. They contribute about 74% of the CH4 present in the Earth's atmosphere, producing 1 billion tons of methane annually. The purpose of this work is to generate preliminary metabolic reconstruction models of two methanogens: Methanoregula boonei 6A8 and Methanosphaerula palustris E1-9c. M. boonei and M. palustris are part of the Methanomicrobiales order and perform hydrogenotrophic methanogenesis, meaning that they reduce CO2 to CH4 using H2 as their major electron donor. Metabolic models are frameworks for understanding a cell as a system, and they provide the means to assess changes in gene regulation in response to various environmental and physiological constraints. The Pathway Tools software (v16) was used to generate these draft models. The models were manually curated using literature searches, the KEGG database, and homology methods against Methanosarcina acetivorans, the closest methanogen strain with a nearly complete metabolic reconstruction. These preliminary models attempt to complete the pathways required for amino acid biosynthesis, methanogenesis, and major cofactors related to methanogenesis. The M. boonei reconstruction currently includes 99 pathways and has 82% of its reactions completed, while the M. palustris reconstruction includes 102 pathways and has 89% of its reactions completed.
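The homology-based curation step can be sketched roughly as annotation transfer: for each draft-model gene, take the best BLAST hit in the reference methanogen and inherit its reaction assignments if the hit is strong enough. The BLAST tabular format (-outfmt 6) is standard; the thresholds and the reference mapping below are hypothetical.

```python
import csv

def transfer_annotations(blast_tsv, ref_gene_to_reactions,
                         min_identity=40.0, max_evalue=1e-10):
    """Sketch of homology-based curation: inherit reaction annotations
    from the best reference hit (e.g., in Methanosarcina acetivorans).

    blast_tsv            : BLAST tabular output (-outfmt 6), draft vs. reference
    ref_gene_to_reactions: dict, reference gene -> set of reaction IDs
    """
    best = {}  # draft gene -> (bitscore, reference gene)
    with open(blast_tsv) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            query, subject, identity = row[0], row[1], float(row[2])
            evalue, bitscore = float(row[10]), float(row[11])
            if identity < min_identity or evalue > max_evalue:
                continue  # discard weak hits before transfer
            if query not in best or bitscore > best[query][0]:
                best[query] = (bitscore, subject)
    return {q: ref_gene_to_reactions.get(s, set())
            for q, (_, s) in best.items()}
```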
Contributors: Mahendra, Divya (Author) / Cadillo-Quiroz, Hinsby (Thesis director) / Wang, Xuan (Committee member) / Stout, Valerie (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / School of Life Sciences (Contributor) / Biomedical Informatics Program (Contributor)
Created: 2014-05
Description

Influenza remains a constant concern for public health agencies across the nation and worldwide. Current methods of surveillance suffice, but they fall short of their true potential. Incorporating evolutionary data and analysis through studies such as phylogeography could reveal geographic sources of variation, and identifying and targeting such sources in public health initiatives could increase the effectiveness of influenza treatments. As it stands, there is a lack of evolutionary data available for such use, particularly in the Southwest. Our study focused on the sequencing and phylogeography of southwestern influenza A samples from the Mayo Clinic. We fully sequenced two neuraminidase genes and combined them with archived sequence data from the Influenza Research Database. Using RAxML, we identified the clade containing our sequences and performed a phylogeographic analysis using ZooPhy. The resultant data were analyzed using programs such as SPREAD and Tracer. Our results show that the southwestern sequences emerged from California and that the ancestral root of the clade came from New York. Our Bayesian maximum clade credibility (MCC) tree and SPREAD analyses implicate California as a source of influenza variation in the United States. This study demonstrates that phylogeography is a viable tool for incorporating evolutionary data into existing forms of influenza surveillance.
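The clade-identification step can be illustrated with Biopython: load the RAxML tree, find the most recent common ancestor of the study sequences, and list the geographic labels of the tips in that clade. The file name, the tip names, and the convention that location is the last underscore-separated token are assumptions for this sketch, not details from the study.

```python
from Bio import Phylo

# Hypothetical inputs: a RAxML best tree and the names of the two
# study neuraminidase sequences.
tree = Phylo.read("RAxML_bestTree.na.nwk", "newick")
ours = ["Mayo_NA1_Arizona", "Mayo_NA2_Arizona"]

# Smallest clade containing both study sequences.
clade = tree.common_ancestor(ours)

# Assume each tip name ends with its location, e.g. "A_H3N2_California".
locations = [tip.name.split("_")[-1] for tip in clade.get_terminals()]
print(sorted(set(locations)))
```

Ancestral-state reconstruction over those tip locations (as done in ZooPhy/BEAST) is what ultimately points to a geographic source.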
Contributors: Turnock, Adam Ryan (Author) / Scotch, Matthew (Thesis director) / Halden, Rolf (Committee member) / Pycke, Benny (Committee member) / Barrett, The Honors College (Contributor) / School of Life Sciences (Contributor)
Created: 2013-05
Description

This study focused on the connection between the EnvZ/OmpR two-component regulatory system and the iron homeostasis system in Escherichia coli, specifically how a mutant form of EnvZ11/OmpR is able to reduce the expression of fepA::lacZ, a reporter gene fusion in E. coli. FepA is one of several outer membrane siderophore receptors that allow extracellular siderophores bound to iron to enter the cells to power various biological processes. Previous studies have shown that E. coli cells expressing a mutant allele of envZ, called envZ11, displayed altered expression of various iron genes, including down-regulation of fepA::lacZ. The wild-type EnvZ/OmpR system is not considered to regulate iron genes, but because these envZ11 strains showed downregulated fepA::lacZ, this study was undertaken to understand the connection and the mechanisms of this downregulation. A large number of Lac+ revertants were obtained from the B32-2483 strain (envZ11 and fepA::lacZ), and 7 Lac+ revertants whose reversion mutations did not directly correct the envZ11 allele were further characterized. With P1 phage transduction genetic mapping, which involved moving a kanamycin resistance marker linked to fepA::lacZ, two Lac+ revertants were found to have their reversion mutations in the fepA promoter region, while the other five revertants had mutations mapping outside the fepA region. The two promoter-region revertants underwent DNA sequencing and were found to carry two different single base pair mutations in two different locations of the fepA promoter region. Each lies in the Fur repressor binding region, but one may also have affected the Shine-Dalgarno region involved in translation initiation. All 7 revertants underwent beta-galactosidase assays to measure fepA::lacZ expression. The two revertants with mutations in the fepA promoter region had significantly increased fepA activity, with the revertant carrying the Shine-Dalgarno mutation showing the most elevated fepA expression. The other five revertants, which did not map in the fepA region, had fepA expression elevated to the level found in the wild-type EnvZ/OmpR background. The data suggest that the negative effect of envZ11 can be overcome by multiple mechanisms, including directly correcting the envZ11 allele or changing the fepA promoter region.
Contributors: Kalinkin, Victor Arkady (Co-author) / Misra, Rajeev (Co-author, Thesis director) / Mason, Hugh (Committee member) / Foy, Joseph (Committee member) / Biomedical Informatics Program (Contributor) / School of Life Sciences (Contributor) / W. P. Carey School of Business (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description

No two cancers are alike. Cancer is a dynamic and heterogeneous disease: heterogeneity arises among patients with the same cancer type, among cancer cells within the same individual's tumor, and even among cells within the same sub-clone over time. The recent application of next-generation sequencing and precision medicine techniques is the driving force behind uncovering the complexity of cancer and determining best clinical practice. The core concept of precision medicine is to move away from crowd-based, best-for-most treatment and to take individual variability into account when optimizing prevention and treatment strategies. Next-generation sequencing is the method used to sift through the entire 3 billion letters of each patient's DNA genetic code in a massively parallel fashion.

The deluge of next-generation sequencing data has shifted the bottleneck of cancer research from multi-omics data collection to integrative analysis and data interpretation. In this dissertation, I attempt to address two distinct but interdependent challenges. The first is to design specific computational algorithms and tools that can process and extract useful information from the raw data in an efficient, robust, and reproducible manner. The second is to develop high-level computational methods and data frameworks for integrating and interpreting these data. Specifically, Chapter 2 presents a tool called Snipea (SNV Integration, Prioritization, Ensemble, and Annotation) that further identifies, prioritizes, and annotates somatic SNVs (single nucleotide variants) called by multiple variant callers. Chapter 3 describes a novel alignment-based algorithm to accurately and losslessly classify sequencing reads from xenograft models. Chapter 4 describes a direct and biologically motivated framework, and associated methods, for identifying putative aberrations underlying survival differences in GBM patients by integrating whole-genome sequencing, exome sequencing, RNA sequencing, methylation array, and clinical data. Lastly, Chapter 5 explores longitudinal and intratumor heterogeneity studies to reveal the temporal and spatial context of tumor evolution. The long-term goal is to help patients with cancer, particularly those who are in front of us today: genome-based analysis of a patient's tumor can identify genomic alterations unique to that tumor that are candidate therapeutic targets for decreasing therapy resistance and improving clinical outcome.
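The xenograft read-classification idea in Chapter 3 can be sketched as a score comparison: align every read to both the graft (human) and host (mouse) references and assign it to whichever reference yields the higher alignment score, keeping ties ambiguous. This sketch uses pysam and the standard AS (alignment score) tag; it is a simplified stand-in, not the dissertation's lossless algorithm, and the BAM paths are hypothetical.

```python
import pysam

def best_scores(bam_path):
    """Map read name -> best alignment score (AS tag) in one BAM."""
    scores = {}
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_unmapped or not read.has_tag("AS"):
                continue
            s = read.get_tag("AS")
            if s > scores.get(read.query_name, float("-inf")):
                scores[read.query_name] = s
    return scores

def classify(human_bam, mouse_bam):
    """Assign each read to 'human', 'mouse', or 'ambiguous'."""
    h, m = best_scores(human_bam), best_scores(mouse_bam)
    labels = {}
    for name in h.keys() | m.keys():
        hs = h.get(name, float("-inf"))
        ms = m.get(name, float("-inf"))
        labels[name] = ("human" if hs > ms
                        else "mouse" if ms > hs else "ambiguous")
    return labels
```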
Contributors: Peng, Sen (Author) / Dinu, Valentin (Thesis advisor) / Scotch, Matthew (Committee member) / Wallstrom, Garrick (Committee member) / Arizona State University (Publisher)
Created: 2015
Description

Colorectal cancer is the second-highest cause of cancer-related deaths in the United States, with approximately 50,000 estimated deaths in 2015. Advanced-stage colorectal cancer has a poor five-year survival rate of 10%, whereas diagnosis in the early stages of development has shown a much more favorable five-year survival rate of 90%. Early diagnosis of colorectal cancer is achievable if colorectal polyps, a possible precursor to cancer, are detected and removed before developing into malignancy.

The preferred method for polyp detection and removal is optical colonoscopy. A colonoscopic procedure consists of two phases: (1) an insertion phase, during which a flexible endoscope (a flexible tube with a tiny video camera at the tip) is advanced via the anus and then gradually to the end of the colon, called the cecum, and (2) a withdrawal phase, during which the endoscope is gradually withdrawn while colonoscopists examine the colon wall to find and remove polyps. Colonoscopy is an effective procedure and has led to a significant decline in the incidence and mortality of colon cancer. However, despite many screening and therapeutic advantages, 1 out of every 4 polyps and 1 out of every 13 colon cancers are missed during colonoscopy.

Many factors contribute to missed polyps and cancers, including poor colon preparation, inadequate navigational skills, and fatigue. Poor colon preparation leaves a substantial portion of the colon covered with fecal content, hindering a careful examination. Inadequate navigational skills can prevent a colonoscopist from examining hard-to-reach regions of the colon that may contain a polyp. Fatigue can manifest itself in a colonoscopist's performance by decreasing diligence and vigilance during procedures. Lack of vigilance may prevent a colonoscopist from detecting polyps that appear only briefly in the colonoscopy video. Lack of diligence may result in a hasty examination of the colon that is likely to miss polyps and lesions.

To reduce polyp and cancer miss rates, this research presents a quality assurance system with three components. The first component is an automatic polyp detection system that highlights regions with suspected polyps in colonoscopy videos. The goal is to encourage more vigilance during procedures. The suggested polyp detection system consists of several novel modules: (1) a new patch descriptor that characterizes image appearance around boundaries more accurately and more efficiently than widely used patch descriptors such as HoG, LBP, and Daisy; (2) a 2-stage classification framework that is able to enhance low-level image features prior to classification; unlike the traditional approach to image classification, where a single patch undergoes the processing pipeline, our system fuses the information extracted from a pair of patches for more accurate edge classification; (3) a new vote accumulation scheme that robustly localizes objects with curvy boundaries in fragmented edge maps; our voting scheme produces a probabilistic output for each polyp candidate but, unlike existing methods (e.g., the Hough transform), does not require any predefined parametric model of the object of interest; and (4) a unique three-way image representation coupled with convolutional neural networks (CNNs) for classifying the polyp candidates. Our image representation efficiently captures a variety of features such as color, texture, shape, and temporal information and significantly improves the performance of the subsequent CNNs for candidate classification. This contrasts with existing methods that rely mainly on a subset of the above image features for polyp detection. Furthermore, this research is the first to investigate the use of CNNs for polyp detection in colonoscopy videos.
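A rough sketch of the candidate-classification idea: stack several complementary views of a polyp-candidate patch (e.g., color intensity, an edge/shape map, and a temporal-difference map) into one multi-channel tensor and feed it to a small CNN. The toy architecture below is illustrative only and is not the dissertation's actual network.

```python
import torch
import torch.nn as nn

class CandidateCNN(nn.Module):
    """Toy classifier over a 3-channel patch representation
    (e.g., color intensity, edge/shape map, temporal difference)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 8 * 8, 2)  # polyp vs. non-polyp

    def forward(self, x):  # x: (N, 3, 32, 32)
        z = self.features(x)
        return self.head(z.flatten(1))

# Hypothetical batch: 4 candidate patches, 3 stacked views each.
patches = torch.randn(4, 3, 32, 32)
logits = CandidateCNN()(patches)
print(logits.shape)  # torch.Size([4, 2])
```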

The second component of our quality assurance system is an automatic image quality assessment for colonoscopy. The goal is to encourage more diligence during procedures by warning against hasty and low-quality colon examination. We detect a low-quality colon examination by identifying a number of consecutive non-informative frames in videos. We base our methodology for detecting non-informative frames on two key observations: (1) non-informative frames most often show an unrecognizable scene with few details and blurry edges, so their information can be locally compressed in a few Discrete Cosine Transform (DCT) coefficients, whereas informative images contain many more details whose information content cannot be summarized by a small subset of DCT coefficients; (2) information content is spread all over the image in informative frames, whereas in non-informative frames, depending on image artifacts and degradation factors, details may appear in only a few regions. We use the former observation in designing our global features and the latter in designing our local image features. We demonstrated that the suggested new features are superior to existing features based on wavelet and Fourier transforms.
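The global feature can be approximated as an energy-concentration measure: take the 2-D DCT of a grayscale frame and ask what fraction of the total spectral energy the largest k coefficients capture; high concentration suggests a blurry, non-informative frame. This is a hedged reading of the description above, with an arbitrary k and no claim about the dissertation's exact features.

```python
import numpy as np
from scipy.fft import dctn

def dct_energy_concentration(frame, k=64):
    """Fraction of spectral energy captured by the k largest
    DCT coefficients of a grayscale frame (2-D array)."""
    coefs = dctn(frame.astype(float), norm="ortho")
    energy = np.sort((coefs ** 2).ravel())[::-1]  # descending
    return energy[:k].sum() / energy.sum()

# A flat (blurry) frame concentrates nearly all energy in a few
# coefficients; a detailed frame spreads it out.
rng = np.random.default_rng(2)
blurry = np.full((240, 320), 128.0) + rng.normal(0, 1, (240, 320))
detailed = rng.uniform(0, 255, (240, 320))
print(dct_energy_concentration(blurry), dct_energy_concentration(detailed))
```

A local variant would apply the same measure per image block and examine how the per-block values are distributed across the frame.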

The third component of our quality assurance system is a 3D visualization system. The goal is to provide colonoscopists with feedback about the regions of the colon that have remained unexamined during colonoscopy, thereby helping them improve their navigational skills. The suggested system is based on a new 3D reconstruction algorithm that combines depth and position information. We propose to use a depth camera and a tracking sensor to obtain this depth and position information; our system contrasts with existing works, in which depth and position are unreliably estimated from the colonoscopy frames themselves. We conducted a use-case experiment demonstrating that the suggested 3D visualization system can determine the unseen regions of the navigated environment. However, due to technology limitations, we were not able to evaluate the 3D visualization system using a phantom model of the colon.
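The combination of depth and position can be sketched as standard pinhole back-projection: convert each depth pixel to a 3-D point in camera coordinates using the intrinsics, then map it into the world frame with the tracker's pose. The intrinsic values and pose variables below are placeholders, not parameters from the dissertation's hardware.

```python
import numpy as np

def depth_to_world(depth, fx, fy, cx, cy, R, t):
    """Back-project a depth map (meters) into world coordinates.

    depth       : (H, W) depth image from the depth camera
    fx,fy,cx,cy : pinhole intrinsics
    R, t        : camera-to-world rotation (3,3) and translation (3,)
                  supplied by the tracking sensor
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx   # standard pinhole back-projection
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts_cam @ R.T + t  # (H*W, 3) world-frame point cloud

# Placeholder example: identity pose, synthetic 5 cm depth.
cloud = depth_to_world(np.full((120, 160), 0.05), 150, 150, 80, 60,
                       np.eye(3), np.zeros(3))
print(cloud.shape)
```

Accumulating such clouds over a procedure and comparing their coverage against a colon surface model is what would reveal unexamined regions.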
Contributors: Tajbakhsh, Nima (Author) / Liang, Jianming (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2015
Description

Genomic structural variation (SV) is defined as gross alterations in the genome, broadly classified as insertions/duplications, deletions, inversions, and translocations. DNA sequencing ushered structural variant discovery beyond laboratory detection techniques to high-resolution informatics approaches. Bioinformatics tools for computational discovery of SVs, however, are still missing variants in the complex cancer genome. This study aimed to define the genomic context leading to tool failure and to design a novel algorithm addressing this context. Methods: The study tested the widely held but unproven hypothesis that tools fail to detect variants lying in repeat regions. The publicly available 1000 Genomes dataset with experimentally validated variants was tested with the SVDetect tool for the presence of true positive (TP) versus false negative (FN) SVs, with the expectation that FNs would be overrepresented in repeat regions. Further, the novel algorithm, designed to informatically capture the biological etiology of translocations (non-allelic homologous recombination and the 3-D placement of chromosomes in cells), was tested using a simulated dataset. Translocations were created in known translocation hotspots, and the novel-algorithm tool was compared with SVDetect and BreakDancer. Results: 53% of false negative (FN) deletions were within repeat structure compared to 81% of true positive (TP) deletions. Similarly, 33% of FN insertions versus 42% of TP, 26% of FN duplications versus 57% of TP, and 54% of FN novel sequences versus 62% of TP were within repeats. Repeat structure was not driving the tools' inability to detect variants and could not be used as context. The novel algorithm with a redefined context, when tested against SVDetect and BreakDancer, was able to detect 10/10 simulated translocations with the 30X-coverage, 100%-allele-frequency dataset, while SVDetect captured 4/10 and BreakDancer detected 6/10. For the 15X-coverage dataset with 100% allele frequency, the novel algorithm was able to detect all ten translocations, albeit with fewer supporting reads; BreakDancer detected 4/10 and SVDetect detected 2/10. Conclusion: This study showed that the presence of repetitive elements within a structural variant did not in general influence a tool's ability to capture it. The context-based algorithm proved better than current tools even with half the genome coverage of the accepted protocol, and it provides an important first step for novel translocation discovery in the cancer genome.
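The repeat-context test can be illustrated with a simple interval-overlap calculation: given variant intervals and RepeatMasker-style repeat intervals per chromosome, compute what fraction of variants intersect a repeat, separately for TP and FN calls. The data structures below are hypothetical stand-ins for BED-like inputs, not the study's actual files.

```python
from bisect import bisect_right

def overlap_fraction(variants, repeats):
    """Fraction of variant intervals overlapping any repeat interval.

    variants: list of (chrom, start, end)
    repeats : dict, chrom -> sorted, non-overlapping list of (start, end)
    """
    def hits_repeat(chrom, start, end):
        ivs = repeats.get(chrom, [])
        # Rightmost repeat starting at or before the variant's end.
        i = bisect_right([s for s, _ in ivs], end) - 1
        return i >= 0 and ivs[i][1] > start

    n_hit = sum(hits_repeat(*v) for v in variants)
    return n_hit / len(variants) if variants else 0.0

# Toy comparison in the spirit of the TP-vs-FN analysis above.
repeats = {"chr1": [(100, 200), (500, 800)]}
tp = [("chr1", 150, 160), ("chr1", 900, 950)]
fn = [("chr1", 300, 350)]
print(overlap_fraction(tp, repeats), overlap_fraction(fn, repeats))
```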
Contributors: Shetty, Sheetal (Author) / Dinu, Valentin (Thesis advisor) / Bussey, Kimberly (Committee member) / Scotch, Matthew (Committee member) / Wallstrom, Garrick (Committee member) / Arizona State University (Publisher)
Created: 2014