Matching Items (177)
Description
This project aspires to develop an AI capable of playing on a variety of maps in a Risk-like board game. While AI has been successfully applied to many other board games, such as Chess and Go, most research is confined to a single board and is inflexible to topological changes. Further, almost all of these games are played on a rectangular grid. In contrast, this project develops an AI player, referred to as GG-net, to play the online strategy game Warzone, which is based on the classic board game Risk. Warzone is played on a wide variety of irregularly shaped maps. Prior research has struggled to create an effective AI for Risk-like games due to the immense branching factor. The most successful attempts tended to rely on manually restricting the set of actions the AI considered while also engineering useful features for the AI to consider. GG-net uses no human knowledge, relying instead on a genetic algorithm combined with a graph neural network. Together, these methods allow GG-net to perform competitively across a multitude of maps. GG-net outperformed the built-in rule-based AI by 413 Elo (representing an 80.7% chance of winning) and an approach based on AlphaZero using graph neural networks by 304 Elo (representing a 74.2% chance of winning). This same advantage holds across both seen and unseen maps. GG-net appears to be a strong opponent on both small and medium maps; however, on large maps with hundreds of territories, inefficiencies in GG-net become more significant and it struggles against the rule-based approach. Overall, GG-net was able to successfully learn the game and generalize across maps of a similar size, although further work is required for it to become more successful on large maps.
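As a rough illustration of the training approach named above, the sketch below shows a generic genetic algorithm evolving a flattened vector of network parameters scored by match outcomes. The parameter sizes and the `play_match` fitness function are hypothetical placeholders, not GG-net's actual graph neural network or Warzone evaluation setup.

```python
import random

# Toy genetic algorithm of the general kind GG-net combines with a graph neural
# network. The network is reduced to a flat parameter vector and the fitness
# function is faked so the sketch runs on its own.

PARAM_COUNT = 128      # assumed size of the flattened network weights
POP_SIZE = 20
GENERATIONS = 10
MUTATION_STD = 0.1

def play_match(params) -> float:
    """Stand-in fitness: would be the win rate against an opponent AI."""
    return sum(p * random.uniform(-1, 1) for p in params)  # placeholder score

def mutate(params):
    return [p + random.gauss(0, MUTATION_STD) for p in params]

def crossover(a, b):
    return [ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)]

population = [[random.gauss(0, 1) for _ in range(PARAM_COUNT)] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    scored = sorted(population, key=play_match, reverse=True)
    elites = scored[: POP_SIZE // 4]                 # keep the best quarter
    children = [mutate(crossover(random.choice(elites), random.choice(elites)))
                for _ in range(POP_SIZE - len(elites))]
    population = elites + children                   # next generation

best = max(population, key=play_match)
print("best fitness:", play_match(best))
```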
Contributors: Bauer, Andrew (Author) / Yang, Yezhou (Thesis director) / Harrison, Blake (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2022-05
Description

Molecular pathology makes use of estimates of tumor content (tumor percentage) for pre-analytic and analytic purposes in molecular oncology testing, such as massive parallel sequencing or next-generation sequencing (NGS). Assessment of sample acceptability, accurate quantitation of variants, assessment of copy number changes (among other applications), and determination of specimen viability for testing (since many assays require a minimum tumor content to report variants at the limit of detection) may all be improved with more accurate and reproducible estimates of tumor content. Currently, tumor percentages of samples submitted for molecular testing are estimated by visual examination of Hematoxylin and Eosin (H&E) stained tissue slides under the microscope by pathologists. These estimations can be automated, expedited, and rendered more accurate by applying machine learning methods to digital whole slide images (WSI).
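As a hedged sketch of how such automation might look, the example below tiles a whole slide image into patches, applies a placeholder tumor/non-tumor patch classifier, and reports the tumor fraction. The patch size, background filter, and `classify_patch` are assumptions for illustration, not the pipeline developed in this thesis.

```python
import numpy as np

PATCH = 256  # assumed patch size in pixels

def classify_patch(patch: np.ndarray) -> bool:
    """Placeholder for a trained tumor/non-tumor patch classifier."""
    return patch.mean() < 128  # fake rule so the sketch runs

def estimate_tumor_percentage(wsi: np.ndarray) -> float:
    """Tile the slide, classify each informative patch, return % tumor patches."""
    h, w = wsi.shape[:2]
    tumor, total = 0, 0
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patch = wsi[y:y + PATCH, x:x + PATCH]
            if patch.std() < 5:        # skip near-empty background tiles
                continue
            total += 1
            tumor += classify_patch(patch)
    return 100.0 * tumor / max(total, 1)

# Example on a random stand-in image:
print(estimate_tumor_percentage(np.random.randint(0, 256, (1024, 1024), dtype=np.uint8)))
```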

Contributors: Cirelli, Claire (Author) / Yang, Yezhou (Thesis director) / Yalim, Jason (Committee member) / Velu, Priya (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2022-05
Description

Machine learning has a near-infinite number of applications whose potential has yet to be fully harnessed and realized. This thesis outlines two areas in which machine learning can be utilized and demonstrates the execution of one methodology in each. The first area is self-play in video games, where a neural model is researched and described that teaches a computer to complete a level of Super Mario World (1990) on its own. The neural model in question was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, written by Kenneth O. Stanley and Risto Miikkulainen of the University of Texas at Austin. The model actually described is from YouTuber SethBling of the California Institute of Technology. The second area is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm utilizes Python and the Volatility framework to detect malicious software in an infected system.

Contributors: Ballecer, Joshua (Author) / Yang, Yezhou (Thesis director) / Luo, Yiran (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2023-05
Description

Currently, autonomous vehicles are being evaluated by how well they interact with humans without evaluating how well humans interact with them. Since people are not going to unanimously switch over to using autonomous vehicles, attention must be given to how well these new vehicles signal intent to human drivers from the driver’s point of view. Ineffective communication will lead to unnecessary discomfort among drivers caused by an underlying uncertainty about what an autonomous vehicle is or isn’t about to do. Recent studies suggest that humans tend to fixate on areas of higher uncertainty, so scenarios that have a higher number of vehicle fixations can be reasoned to be more uncertain. We provide a framework for measuring human uncertainty and use the framework to measure the effect of empathetic vs. non-empathetic agents. We used a simulated driving environment to create recorded scenarios and manipulated the autonomous vehicle to include either an empathetic or non-empathetic agent. The driving interaction is composed of two vehicles approaching an uncontrolled intersection. These scenarios were played to twelve participants while their gaze was recorded to track what the participants were fixating on. The overall intent was to provide an analytical framework as a tool for evaluating autonomous driving features; in this case, we chose to evaluate how effective it was for vehicles to have empathetic behaviors included in the autonomous vehicle decision making. A t-test analysis of the gaze data indicated that empathy did not, in fact, reduce uncertainty, although additional testing of this hypothesis will be needed due to the small sample size.
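A minimal sketch of the kind of t-test mentioned above is shown below, comparing per-participant fixation counts between the two conditions with `scipy.stats.ttest_ind`. The counts are made up; only the statistical procedure, not the study's data or result, is illustrated.

```python
from scipy import stats

# Hypothetical per-participant fixation counts on the autonomous vehicle.
empathetic_fixations     = [14, 11, 9, 13, 10, 12]
non_empathetic_fixations = [15, 12, 10, 14, 11, 13]

# Two-sample t-test on the group means.
t_stat, p_value = stats.ttest_ind(empathetic_fixations, non_empathetic_fixations)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A large p-value would be consistent with finding no measurable reduction in
# uncertainty (fixations) from the empathetic agent in a small sample.
```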

Contributors: Greenhagen, Tanner Patrick (Author) / Yang, Yezhou (Thesis director) / Jammula, Varun C (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description
This dissertation addresses access management problems that occur in both emergency and outpatient clinics with the objective of allocating the available resources to improve performance measures while considering the trade-offs. Two main settings are considered: allocation of the limited booking horizon to patients of different priorities by using time windows in an outpatient setting, where patient willingness-to-wait (WtW) behavior is estimated with statistical analyses of data, and allocation of hospital beds to admitted Emergency Department (ED) patients. For each chapter, a different approach based on the problem context is developed, and the performance is analyzed by implementing analytical and simulation models. Real hospital data is used in the analyses to provide evidence that the methodologies introduced are beneficial in addressing real-life problems and that real improvements are achievable by using the suggested policies.

This dissertation starts by studying an outpatient clinic context to develop an effective resource allocation mechanism that can improve patient access to clinic appointments. I begin by identifying patient behavior in terms of willingness-to-wait for an outpatient appointment. Two statistical models are developed to estimate the patient WtW distribution using data on booked appointments and appointment requests. Several analyses are conducted on simulated data to observe the effectiveness and accuracy of the estimations.

Then, this dissertation introduces a time-windows-based policy that utilizes patient behavior to improve access by using appointment delay as a lever. The policy improves patient access by allocating the available capacity to patients of different priorities, dividing the booking horizon into time intervals that can be used by each priority group and thereby strategically delaying lower-priority patients.
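A minimal sketch of such a time-windows rule follows; the horizon length and window boundaries are assumptions chosen only to illustrate how lower-priority patients can be restricted to later slots, not the dissertation's calibrated policy.

```python
BOOKING_HORIZON = 30  # days (assumed)

# Assumed windows: priority 1 (highest) may book soonest; lower priorities are
# strategically pushed toward later portions of the booking horizon.
WINDOWS = {1: (0, 30), 2: (7, 30), 3: (14, 30)}

def can_book(priority: int, days_until_appointment: int) -> bool:
    """Return True if a patient of this priority may take a slot this far out."""
    start, end = WINDOWS[priority]
    return start <= days_until_appointment < min(end, BOOKING_HORIZON)

print(can_book(1, 2))   # True: highest priority gets near-term slots
print(can_book(3, 2))   # False: lowest priority is delayed to later slots
```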

Finally, patient routing between the ED and inpatient units to improve patient access to hospital beds is studied. The strategy that captures the trade-off between patient safety and quality of care is characterized as a threshold-type policy. Through simulation experiments developed with real data collected from a hospital, the achievable improvement from implementing such a strategy that considers the safety-quality-of-care trade-off is illustrated.
Contributors: Kilinc, Derya (Author) / Gel, Esma (Thesis advisor) / Pasupathy, Kalyan (Committee member) / Sefair, Jorge (Committee member) / Sir, Mustafa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Reasoning with commonsense knowledge is an integral component of human behavior. It is due to this capability that people know that a weak person may not be able to lift someone. It has been a long-standing goal of the Artificial Intelligence community to simulate such commonsense reasoning abilities in machines. Over the years, many advances have been made and various challenges have been proposed to test these abilities. The Winograd Schema Challenge (WSC) is one such Natural Language Understanding (NLU) task, which was also proposed as an alternative to the Turing Test. It is made up of textual question-answering problems that require resolution of a pronoun to its correct antecedent.

In this thesis, two approaches to developing NLU systems that solve the Winograd Schema Challenge are demonstrated. To this end, a semantic parser is presented, various kinds of commonsense knowledge are identified, techniques to extract commonsense knowledge are developed, and two commonsense reasoning algorithms are presented. The usefulness of the developed tools and techniques is shown by applying them to solve the challenge.
Contributors: Sharma, Arpita (Author) / Baral, Chitta (Thesis advisor) / Lee, Joohyung (Committee member) / Papotti, Paolo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Feature embeddings differ from raw features in the sense that the former obey certain properties, such as a notion of similarity/dissimilarity in their embedding space. word2vec is a preeminent example in this direction, where similarity in the embedding space is measured in terms of cosine similarity. Such language embedding models have seen numerous applications in both the language and vision communities, as they capture the information in the modality (the English language) efficiently. Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks: (1) Image Hashing and (2) Zero-Shot Learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using several distance metrics, such as Hamming, Euclidean, and cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task-specific embeddings were learnt, which can be used for Image Retrieval and Classification separately.

Image Hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work focuses on designing a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in several applications like Image Retrieval and Image Classification. Further, this framework requires no labelled data, making it very inexpensive. Results show that the proposed approach surpasses the state-of-the-art approaches by a significant margin.
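The sketch below illustrates the general hashing idea only: binarize projected image features with a sign threshold and retrieve by Hamming distance. The projection here is random rather than learned from hash-tags, and all dimensions are assumed, so this is not the proposed framework itself.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, CODE_BITS = 512, 64                      # assumed dimensions
W = rng.normal(size=(FEAT_DIM, CODE_BITS))         # stand-in for learned weights

def to_binary_code(features: np.ndarray) -> np.ndarray:
    """Project features and threshold at zero to obtain bit codes."""
    return (features @ W > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.count_nonzero(a != b))

database = to_binary_code(rng.normal(size=(1000, FEAT_DIM)))   # fake gallery
query = to_binary_code(rng.normal(size=(1, FEAT_DIM)))[0]
nearest = min(range(len(database)), key=lambda i: hamming(database[i], query))
print("closest database image:", nearest)
```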

Zero-shot classification is the task of classifying the test sample into a new class which was not seen during training. This is possible by establishing a relationship between the training and the testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains using the handcrafted attribute vectors and word vectors but doesn’t require the expensive attribute vectors during test time. More specifically, an intermediate space is learnt between the word vector space and the image feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore.
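A hedged sketch of the zero-shot pipeline described above follows, with random matrices standing in for the learned mappings, assumed feature dimensions, and example class names; prediction assigns the nearest class word vector to the projected image feature.

```python
import numpy as np

rng = np.random.default_rng(1)
IMG_DIM, INTER_DIM, WORD_DIM = 2048, 85, 300       # assumed dimensions

img_to_inter = rng.normal(size=(IMG_DIM, INTER_DIM))    # stand-in learned map
inter_to_word = rng.normal(size=(INTER_DIM, WORD_DIM))  # stand-in learned map
class_word_vectors = {c: rng.normal(size=WORD_DIM) for c in ["zebra", "whale", "bat"]}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_zero_shot(image_feature: np.ndarray) -> str:
    """Map image feature -> intermediate space -> word space, pick nearest class."""
    projected = image_feature @ img_to_inter @ inter_to_word
    return max(class_word_vectors, key=lambda c: cosine(projected, class_word_vectors[c]))

print(classify_zero_shot(rng.normal(size=IMG_DIM)))  # prediction for an unseen class
```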
Contributors: Gattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
The ubiquity of single-camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated from feature-detection-based algorithms as opposed to the regular grid sparse patterns used by previous work. This work focuses on using these feature-based sparse patterns to generate additional depth information by interpolating regions between clusters of samples that are in close proximity to each other. These interpolated sparse depths are used to enforce additional constraints on the network’s predictions. In addition to the improved depth prediction performance observed from incorporating the sparse sample information in the network compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall.

This thesis also introduces a new metric, titled Edge, to quantify model performance in regions of an image that show the highest change in ground-truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation like Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) quantify model performance across the entire image and do not focus on specific regions of an image that are hard to predict. To this end, the proposed Edge metric focuses specifically on these hard-to-predict regions. The experiments also show that using the Edge metric as a small addition to existing loss functions like L1 loss in current state-of-the-art methods leads to vastly improved performance in these hard-to-predict regions, while also improving performance across the board in every other metric.
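One plausible reading of such an edge-focused metric is sketched below: restrict a standard error measure to the pixels where ground-truth depth changes most along x or y. The percentile-based selection rule and the use of mean absolute error are assumptions; the thesis's exact definition may differ.

```python
import numpy as np

def edge_metric(pred: np.ndarray, gt: np.ndarray, percentile: float = 90.0) -> float:
    """MAE restricted to the pixels with the largest ground-truth depth change."""
    dy, dx = np.gradient(gt)                              # depth change along y and x
    change = np.maximum(np.abs(dx), np.abs(dy))
    mask = change >= np.percentile(change, percentile)    # high-change (edge) regions
    return float(np.abs(pred[mask] - gt[mask]).mean())

# Synthetic example: a noisy prediction of a random ground-truth depth map.
gt = np.random.rand(240, 320) * 10.0
pred = gt + np.random.randn(240, 320) * 0.1
print(edge_metric(pred, gt))
```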
Contributors: Rai, Anshul (Author) / Yang, Yezhou (Thesis advisor) / Zhang, Wenlong (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
A massive volume of data is generated at an unprecedented rate in the information age. The growth of data significantly exceeds the computing and storage capacities of the existing digital infrastructure. In the past decade, many methods have been invented for data compression, compressive sensing and reconstruction, and compressed learning (learning directly upon compressed data) to overcome the data-explosion challenge. While prior works are predominantly model-based, focus on small models, and are not suitable for task-oriented sensing or hardware acceleration, the number of available models for compression-related tasks has escalated by orders of magnitude in the past decade. Motivated by this significant growth and the success of big data, this dissertation proposes to revolutionize both compressive sensing reconstruction (CSR) and compressed learning (CL) methods from the data-driven perspective. In this dissertation, a series of topics on data-driven CSR are discussed. Individual data-driven models are proposed for the CSR of bio-signals, images, and videos with an improved trade-off between compression ratio and recovery fidelity. Specifically, a scalable Laplacian pyramid reconstructive adversarial network (LAPRAN) is proposed for single-image CSR. LAPRAN progressively reconstructs images following the concept of the Laplacian pyramid through the concatenation of multiple reconstructive adversarial networks (RANs). For the CSR of videos, CSVideoNet is proposed to improve the spatial-temporal resolution of reconstructed videos. Apart from CSR, data-driven CL is discussed in the dissertation. A CL framework is proposed to extract features directly from compressed data for image classification, object detection, and semantic/instance segmentation. In addition, the spectral bias of neural networks is analyzed from the frequency perspective, leading to a learning-based frequency selection method for identifying the trivial frequency components that can be removed without accuracy loss. Compared with conventional spatial downsampling approaches, the proposed frequency-domain learning method can achieve higher accuracy with reduced input data size. The methodologies proposed in this dissertation are not restricted to the above-mentioned applications. The dissertation also discusses other potential applications and directions for future research.
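The frequency-selection idea can be illustrated with a simple, non-learned sketch that keeps only a fixed block of low-frequency DCT coefficients and reconstructs the image; the dissertation instead learns which components are trivial for a downstream task, so the mask below is purely an assumption.

```python
import numpy as np
from scipy import fft

def keep_low_frequencies(image: np.ndarray, keep: int = 32) -> np.ndarray:
    """Zero out all but a keep x keep block of 2-D DCT coefficients, then invert."""
    coeffs = fft.dctn(image, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0                 # retained frequency components
    return fft.idctn(coeffs * mask, norm="ortho")

img = np.random.rand(128, 128)
recon = keep_low_frequencies(img)
print("reconstruction error:", float(np.abs(img - recon).mean()))
# A learned selection would instead drop whichever components a downstream model
# can spare without accuracy loss, shrinking the input fed to that model.
```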
Contributors: Xu, Kai (Author) / Ren, Fengbo (Thesis advisor) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
This work solves the problem of incorrect rotations while using handheld devices. Two new methods which improve upon previous works are explored. The first method uses an infrared camera to capture and detect the user’s face position and orient the display accordingly. The second method utilizes gyroscopic and accelerometer data as input to a machine learning model to classify correct and incorrect rotations. Experiments show that these new methods achieve an overall success rate of 67% for the first and 92% for the second, which reaches a new high for this performance category. The paper also discusses logistical and legal reasons for implementing this feature in an end-user product from a business perspective. Lastly, the monetary incentive behind a feature like irRotate in a consumer device is discussed and related patents are explored.
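A minimal sketch of the second method's classification step is shown below, training a generic classifier on synthetic gyroscope/accelerometer features. The feature layout, labels, and model choice are assumptions for illustration, not the dataset or model used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
# Each row: [gyro_x, gyro_y, gyro_z, accel_x, accel_y, accel_z] (assumed layout).
X = rng.normal(size=(500, 6))
y = (X[:, 4] > 0).astype(int)        # fake label: 1 = rotation would be correct

# Train on the first 400 samples, evaluate on the held-out 100.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```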
Contributors: Tallman, Riley (Author) / Yang, Yezhou (Thesis advisor) / Liang, Jianming (Committee member) / Chen, Yinong (Committee member) / Arizona State University (Publisher)
Created: 2020