Matching Items (121)

Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval
Description
Background
Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords.
Results
In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes.
Conclusions
We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features leads to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yield better results.
Contributors: Yuan, Lei (Author) / Woodard, Alexander (Author) / Ji, Shuiwang (Author) / Jiang, Yuan (Author) / Zhou, Zhi-Hua (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / Ira A. Fulton School of Engineering (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)
Created: 2012-05-23

Description
Background
Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keywords annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
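The three pooling functions named above can be illustrated with a minimal sketch. It assumes each image is represented by a list of sparse code vectors, one per local patch, and that max and average pooling operate on absolute values; both the function names and that choice are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence

# One sparse code vector per local image patch (hypothetical layout).
Codes = Sequence[Sequence[float]]


def _columns(codes: Codes) -> List[tuple]:
    """Group the k-th coefficient of every patch together."""
    if not codes:
        raise ValueError("at least one patch code is required")
    return list(zip(*codes))


def max_pooling(codes: Codes) -> List[float]:
    """Largest absolute coefficient per dictionary atom."""
    return [max(abs(v) for v in col) for col in _columns(codes)]


def average_pooling(codes: Codes) -> List[float]:
    """Mean absolute coefficient per dictionary atom."""
    return [sum(abs(v) for v in col) / len(col) for col in _columns(codes)]


def sqrt_pooling(codes: Codes) -> List[float]:
    """Square root of the mean squared coefficient per dictionary atom."""
    return [math.sqrt(sum(v * v for v in col) / len(col)) for col in _columns(codes)]


# Two patches, three dictionary atoms.
codes = [[0.0, 3.0, 0.0], [4.0, 0.0, 0.0]]
image_feature_max = max_pooling(codes)
image_feature_avg = average_pooling(codes)
image_feature_sqrt = sqrt_pooling(codes)
```

Each function maps a variable number of patch codes to one fixed-length image feature, which is what allows a standard classifier to be trained on top.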
Conclusion
In our experiments, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
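Undersampling with majority vote can be sketched as follows. The sketch assumes the positive class is the minority and uses a nearest-centroid classifier as a stand-in base learner; the classifier choice, the function names, and the number of models are all hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], int]


def train_centroid_classifier(pos: Sequence[Vector], neg: Sequence[Vector]) -> Model:
    """Stand-in base learner: predict the class whose centroid is closer."""
    def centroid(rows: Sequence[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(pos), centroid(neg)

    def predict(x: Vector) -> int:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos < d_neg else 0

    return predict


def undersampled_ensemble(pos: Sequence[Vector], neg: Sequence[Vector],
                          n_models: int = 5, seed: int = 0) -> List[Model]:
    """Train each model on all positives plus an equally sized random
    subset of the (majority) negatives."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))
    models = []
    for _ in range(n_models):
        neg_subset = rng.sample(list(neg), k)
        models.append(train_centroid_classifier(pos, neg_subset))
    return models


def majority_vote(models: Sequence[Model], x: Vector) -> int:
    """Label x positive when more than half of the models say so."""
    votes = sum(m(x) for m in models)
    return 1 if 2 * votes > len(models) else 0


pos = [[1.0, 1.0], [0.9, 1.1]]
neg = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1], [0.0, 0.2], [0.1, 0.1], [-0.1, -0.1]]
models = undersampled_ensemble(pos, neg, n_models=5, seed=0)
label_near_pos = majority_vote(models, [1.0, 0.9])
label_near_neg = majority_vote(models, [0.0, 0.05])
```

Using an odd number of models avoids ties in the vote; each model sees a balanced training set, while the ensemble as a whole still draws on many of the majority-class examples.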
Contributors: Sun, Qian (Author) / Muckatira, Sherin (Author) / Yuan, Lei (Author) / Ji, Shuiwang (Author) / Newfeld, Stuart (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Ira A. Fulton School of Engineering (Contributor)
Created: 2013-12-03
Description
The Population Receptive Field (pRF) model is widely used to predict the location (retinotopy) and size of receptive fields in visual space. Doing so allows for the creation of a mapping from locations in the visual field to the associated groups of neurons in the cortical region (within the visual cortex of the brain). However, using the pRF model is very time consuming. Past research has focused on the creation of Convolutional Neural Networks (CNNs) to mimic the pRF model in a fraction of the time, and they have worked well under highly controlled conditions. However, these models have not been thoroughly tested on real human data. This thesis focused on adapting one of these CNNs to accurately predict the retinotopy of a real human subject using a dataset from the Human Connectome Project. The results show promise towards creating a fully functioning CNN, but they also expose new challenges that must be overcome before the model could be used to predict the retinotopy of new human subjects.
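For orientation, the quantity a pRF model estimates can be sketched with the conventional isotropic 2D Gaussian formulation, in which a receptive field has a center (x0, y0) and a size sigma, and the predicted response to a stimulus frame is the overlap between the stimulus aperture and that Gaussian. This is an assumed, simplified forward model for illustration only; it omits the hemodynamic response and the fitting procedure, and the function and parameter names are hypothetical rather than taken from the thesis.

```python
import math
from typing import Sequence


def prf_response(aperture: Sequence[Sequence[float]],
                 xs: Sequence[float], ys: Sequence[float],
                 x0: float, y0: float, sigma: float) -> float:
    """Overlap between a stimulus aperture and an isotropic Gaussian pRF.

    aperture[i][j] is the stimulus contrast at visual-field position
    (xs[j], ys[i]); (x0, y0) is the pRF center and sigma its size.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            weight = math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
            total += aperture[i][j] * weight
    return total


xs = [-1.0, 0.0, 1.0]
ys = [-1.0, 0.0, 1.0]
center_only = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
right_only = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
r_center = prf_response(center_only, xs, ys, x0=0.0, y0=0.0, sigma=1.0)
r_right = prf_response(right_only, xs, ys, x0=0.0, y0=0.0, sigma=1.0)
```

Fitting (x0, y0, sigma) for every cortical location by searching over such predictions is the time-consuming step that a CNN surrogate aims to shortcut.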
Contributors: Burgard, Braeden (Author) / Wang, Yalin (Thesis director) / Ta, Duyan (Committee member) / Barrett, The Honors College (Contributor) / School of International Letters and Cultures (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2022-05
Description
Twitter has become a very popular social media site that is used daily by many people and organizations. This paper focuses on the financial aspect of Twitter and presents a process for mining data about specific companies' stock prices. This was done by writing a program to collect tweets about the stocks of the thirty companies in the Dow Jones.
Contributors: Larson, Grant Elliott (Author) / Davulcu, Hasan (Thesis director) / Ye, Jieping (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2014-05

Description
Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions. Due to its capability of migrating knowledge from related domains, transfer learning has been shown to be effective for cross-domain learning problems. In this dissertation, I carry out research along this direction with a particular focus on designing efficient and effective algorithms for BioImaging and Bilingual applications. Specifically, I propose deep transfer learning algorithms which combine transfer learning and deep learning to improve image annotation performance. First, I propose to generate the deep features for the Drosophila embryo images via pretrained deep models and build linear classifiers on top of the deep features. Second, I propose to fine-tune the pretrained model with a small number of labeled images. The time complexity and performance of the deep transfer learning methodologies are investigated. Promising results have demonstrated the knowledge transfer ability of the proposed deep transfer algorithms. Moreover, I propose a novel Robust Principal Component Analysis (RPCA) approach to process the noisy images in advance. In addition, I present a two-stage re-weighting framework for general domain adaptation problems. The distribution of the source domain is mapped towards the target domain in the first stage, and an adaptive learning model is proposed in the second stage to incorporate label information from the target domain if it is available. The proposed model is then applied to tackle the cross-lingual spam detection problem at LinkedIn’s website. Our experimental results on real data demonstrate the efficiency and effectiveness of the proposed algorithms.
Contributors: Sun, Qian (Author) / Ye, Jieping (Committee member) / Xue, Guoliang (Committee member) / Liu, Huan (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)
Created: 2015

Description
One of the most remarkable outcomes resulting from the evolution of the web into Web 2.0 has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration and the exchange of ideas, purpose, and objectives through the formation of associations, links, and relations. An understanding of the Blogosphere can greatly help serve the needs of the ever-growing number of these users, as well as producers, service providers, and advertisers, by facilitating the categorization and navigation of this vast environment. This work explores a novel method to leverage the collective wisdom from the infused label space for blog search and discovery. The work demonstrates that the wisdom space can provide a unique and desirable framework in which to discover the highly sought-after background information that could aid in the building of classifiers. This work incorporates this insight into the construction of a better clustering of blogs, which boosts the performance of classifiers for identifying more relevant labels for blogs, and offers a mechanism that can be incorporated into replacing spurious labels and mislabels in a multi-labeled space.
Contributors: Galan, Magdiel F (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2015

Description
Software-as-a-Service (SaaS) has received significant attention in recent years as major computer companies such as Google, Microsoft, Amazon, and Salesforce are adopting this new approach to develop software and systems. Cloud computing is a computing infrastructure that enables rapid delivery of computing resources as a utility in a dynamic, scalable, and virtualized manner. Computer simulations are widely used to analyze the behavior of software and to test it before full implementation. Simulation can further benefit SaaS applications in a cost-effective way by taking advantage of cloud properties such as customizability, configurability, and multi-tenancy.
This research introduces Modeling, Simulation and Analysis for Software-as-a-Service in Cloud. The research covers the following topics: service modeling, policy specification, code generation, dynamic simulation, timing, event and log analysis. Moreover, the framework integrates current advantages of cloud: configurability, multi-tenancy, scalability and recoverability.
The following chapters are provided in the architecture:
Multi-Tenancy Simulation Software-as-a-Service.
Policy Specification for MTA simulation environment.
Model Driven PaaS Based SaaS modeling.
Dynamic analysis and dynamic calibration for timing analysis.
Event-driven Service-Oriented Simulation Framework.
LTBD: A Triage Solution for SaaS.
Contributors: Li, Wu (Author) / Tsai, Wei-Tek (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created: 2015

Description
Understanding the complexity of the temporal and spatial characteristics of gene expression over brain development is one of the crucial research topics in neuroscience. An accurate description of the locations and expression status of relevant genes requires extensive experimental resources. The Allen Developing Mouse Brain Atlas provides a large number of in situ hybridization (ISH) images of gene expression over seven different mouse brain developmental stages. Studying mouse brain models helps us understand gene expression in human brains. The atlas covers thousands of genes, which are currently annotated manually by biologists. Due to the high labor cost of manual annotation, an efficient approach to automated gene expression annotation on mouse brain images is needed. In this thesis, a novel, efficient approach based on a machine learning framework is proposed. Features are extracted from raw brain images, and both binary classification and multi-class classification models are built with supervised learning methods. To generate features, one of the most widely adopted methods in current research is the bag-of-words (BoW) algorithm. However, neither the efficiency nor the accuracy of BoW is outstanding when dealing with large-scale data. Thus, an augmented sparse coding method, called Stochastic Coordinate Coding, is adopted to generate high-level features in this thesis. In addition, a new multi-label classification model is proposed. The label hierarchy is built based on the given brain ontology structure. Experiments have been conducted on the atlas, and the results show that this approach is efficient and classifies the images with relatively high accuracy.
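As a point of reference for the bag-of-words (BoW) baseline mentioned above, the encoding step can be sketched as a nearest-codeword histogram. The sketch assumes a codebook has already been learned (for example by clustering local descriptors); the function name and the toy data are hypothetical, and the thesis's own Stochastic Coordinate Coding method is not shown here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Assign each local descriptor to its nearest codeword and return
    the normalized count of assignments per codeword."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.1, 1.2], [0.8, 0.9]]
histogram = bow_histogram(descriptors, codebook)
```

BoW assigns every descriptor to exactly one codeword, whereas sparse coding lets a descriptor be reconstructed from a few codewords with real-valued weights, which is the distinction the abstract draws.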
Contributors: Zhao, Xinlin (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Thesis advisor) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2016

Description
In brain imaging studies, 3D surface-based algorithms may offer advantages over volume-based methods, due to their sub-voxel accuracy in representing subtle subregional changes and the solid mathematical foundations on which global shape analyses can be achieved on complicated topological structures, such as the convoluted cortical surfaces. On the other hand, given the enormous amount of data being generated daily, it is still challenging to develop effective and efficient surface-based methods to analyze brain shape morphometry. There are two major problems in surface-based shape analysis research: correspondence and similarity. This dissertation covers both topics by proposing novel surface registration and indexing algorithms based on conformal geometry for brain morphometry analysis.
First, I propose a surface fluid registration system, which extends the traditional image fluid registration to surfaces. With surface conformal parameterization, the complexity of the proposed registration formula has been greatly reduced, compared to prior methods. Inverse consistency is also incorporated to drive a symmetric correspondence between surfaces. After registration, the multivariate tensor-based morphometry (mTBM) is computed to measure local shape deformations. The algorithm was applied to study hippocampal atrophy associated with Alzheimer's disease (AD).
Next, I propose a ventricular surface registration algorithm based on hyperbolic Ricci flow, which computes a global conformal parameterization for each ventricular surface without introducing any singularity. Furthermore, in the parameter space, unique hyperbolic geodesic curves are introduced to guide consistent correspondences across subjects, a technique called geodesic curve lifting. The tensor-based morphometry (TBM) statistic is computed from the registration to measure shape changes. This algorithm was applied to study ventricular enlargement in mild cognitive impairment (MCI) converters.
Finally, a new shape index, the hyperbolic Wasserstein distance, is introduced. This algorithm computes the Wasserstein distance between general topological surfaces as a shape similarity measure of different surfaces. It is based on hyperbolic Ricci flow, the hyperbolic harmonic map, and the optimal mass transportation map, which is extended to hyperbolic space. This method fills a gap in the Wasserstein distance study, where prior work only dealt with images or genus-0 closed surfaces. The algorithm was applied in an AD vs. control cortical shape classification study and achieved a promising accuracy rate.
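The tensor-based morphometry (TBM) statistic mentioned above is conventionally derived from the Jacobian of the registration map. As a minimal sketch under that assumption, the Jacobian determinant of the affine map between one triangle and its registered counterpart (both expressed in 2D parameter coordinates) equals the local area ratio. The function name and the flat 2D setting are illustrative assumptions, not the dissertation's implementation.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def triangle_jacobian_det(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Determinant of the Jacobian of the affine map taking triangle src
    to triangle dst; values above 1 indicate local expansion."""
    (x0, y0), (x1, y1), (x2, y2) = src
    (u0, v0), (u1, v1), (u2, v2) = dst
    # Edge matrices S and T have the two edge vectors as columns; J = T * S^-1,
    # so det(J) = det(T) / det(S).
    det_s = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    det_t = (u1 - u0) * (v2 - v0) - (u2 - u0) * (v1 - v0)
    if det_s == 0:
        raise ValueError("source triangle is degenerate")
    return det_t / det_s


src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
expansion = triangle_jacobian_det(src, dst)
unchanged = triangle_jacobian_det(src, src)
```

Computing this value per triangle yields a scalar map over the surface that can then be compared statistically across subject groups.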
Contributors: Shi, Jie, Ph.D (Author) / Wang, Yalin (Thesis advisor) / Caselli, Richard (Committee member) / Li, Baoxin (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created: 2016

Description
The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized in multiple visual computing tasks to bridge the so-called "semantic gap" between extractable low-level feature representations and high-level semantic understanding of the visual objects.
Despite years of research, there are still some unsolved problems in semantic attribute learning. First, real-world applications usually involve hundreds of attributes, and acquiring a sufficient amount of labeled data for model learning requires great effort. Second, existing attribute learning work for visual objects focuses primarily on images, with semantic analysis of videos left largely unexplored.
In this dissertation I conduct innovative research and propose novel approaches to tackling the aforementioned problems. In particular, I propose robust and accurate learning frameworks on both attribute ranking and prediction by exploring the correlation among multiple attributes and utilizing various types of label information. Furthermore, I propose a video-based skill coaching framework by extending attribute learning to the video domain for robust motion skill analysis. Experiments on various types of applications and datasets and comparisons with multiple state-of-the-art baseline approaches confirm that my proposed approaches can achieve significant performance improvements for the general attribute learning problem.
Contributors: Chen, Lin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2016