Matching Items (38)

Description
This dissertation addresses access management problems that occur in both emergency and outpatient clinics with the objective of allocating the available resources to improve performance measures by considering the trade-offs. Two main settings are considered for estimating patient willingness-to-wait (WtW) behavior for outpatient appointments with statistical analyses of data: allocation of the limited booking horizon to patients of different priorities by using time windows in an outpatient setting considering patient behavior, and allocation of hospital beds to admitted Emergency Department (ED) patients. For each chapter, a different approach based on the problem context is developed and the performance is analyzed by implementing analytical and simulation models. Real hospital data is used in the analyses to provide evidence that the methodologies introduced are beneficial in addressing real life problems, and real improvements can be achievable by using the policies that are suggested.
This dissertation starts with studying an outpatient clinic context to develop an effective resource allocation mechanism that can improve patient access to clinic appointments. I first start with identifying patient behavior in terms of willingness-to-wait to an outpatient appointment. Two statistical models are developed to estimate patient WtW distribution by using data on booked appointments and appointment requests. Several analyses are conducted on simulated data to observe effectiveness and accuracy of the estimations.
Then, this dissertation introduces a time windows based policy that utilizes patient behavior to improve access by using appointment delay as a lever. The policy improves patient access by allocating the available capacity to the patients from different priorities by dividing the booking horizon into time intervals that can be used by each priority group which strategically delay lower priority patients.
Finally, the patient routing between ED and inpatient units to improve the patient access to hospital beds is studied. The strategy that captures the trade-off between patient safety and quality of care is characterized as a threshold type. Through the simulation experiments developed by real data collected from a hospital, the achievable improvement of implementing such a strategy that considers the safety-quality of care trade-off is illustrated.
This dissertation starts with studying an outpatient clinic context to develop an effective resource allocation mechanism that can improve patient access to clinic appointments. I first start with identifying patient behavior in terms of willingness-to-wait to an outpatient appointment. Two statistical models are developed to estimate patient WtW distribution by using data on booked appointments and appointment requests. Several analyses are conducted on simulated data to observe effectiveness and accuracy of the estimations.
Then, this dissertation introduces a time windows based policy that utilizes patient behavior to improve access by using appointment delay as a lever. The policy improves patient access by allocating the available capacity to the patients from different priorities by dividing the booking horizon into time intervals that can be used by each priority group which strategically delay lower priority patients.
Finally, the patient routing between ED and inpatient units to improve the patient access to hospital beds is studied. The strategy that captures the trade-off between patient safety and quality of care is characterized as a threshold type. Through the simulation experiments developed by real data collected from a hospital, the achievable improvement of implementing such a strategy that considers the safety-quality of care trade-off is illustrated.
ContributorsKilinc, Derya (Author) / Gel, Esma (Thesis advisor) / Pasupathy, Kalyan (Committee member) / Sefair, Jorge (Committee member) / Sir, Mustafa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created2019

Description
In conventional supervised learning tasks, information retrieval from extensive collections of data happens automatically at low cost, whereas in many real-world problems obtaining labeled data can be hard, time-consuming, and expensive. Consider healthcare systems, for example, where unlabeled medical images are abundant while labeling requires a considerable amount of knowledge from experienced physicians. Active learning addresses this challenge with an iterative process to select instances from the unlabeled data to annotate and improve the supervised learner. At each step, the query of examples to be labeled can be considered as a dilemma between exploitation of the supervised learner's current knowledge and exploration of the unlabeled input features.
Motivated by the need for efficient active learning strategies, this dissertation proposes new algorithms for batch-mode, pool-based active learning. The research considers the following questions: how can unsupervised knowledge of the input features (exploration) improve learning when incorporated with supervised learning (exploitation)? How to characterize exploration in active learning when data is high-dimensional? Finally, how to adaptively make a balance between exploration and exploitation?
The first contribution proposes a new active learning algorithm, Cluster-based Stochastic Query-by-Forest (CSQBF), which provides a batch-mode strategy that accelerates learning with added value from exploration and improved exploitation scores. CSQBF balances exploration and exploitation using a probabilistic scoring criterion based on classification probabilities from a tree-based ensemble model within each data cluster.
The second contribution introduces two more query strategies, Double Margin Active Learning (DMAL) and Cluster Agnostic Active Learning (CAAL), that combine consistent exploration and exploitation modules into a coherent and unified measure for label query. Instead of assuming a fixed clustering structure, CAAL and DMAL adopt a soft-clustering strategy which provides a new approach to formalize exploration in active learning.
The third contribution addresses the challenge of dynamically making a balance between exploration and exploitation criteria throughout the active learning process. Two adaptive algorithms are proposed based on feedback-driven bandit optimization frameworks that elegantly handle this issue by learning the relationship between exploration-exploitation trade-off and an active learner's performance.
Motivated by the need for efficient active learning strategies, this dissertation proposes new algorithms for batch-mode, pool-based active learning. The research considers the following questions: how can unsupervised knowledge of the input features (exploration) improve learning when incorporated with supervised learning (exploitation)? How to characterize exploration in active learning when data is high-dimensional? Finally, how to adaptively make a balance between exploration and exploitation?
The first contribution proposes a new active learning algorithm, Cluster-based Stochastic Query-by-Forest (CSQBF), which provides a batch-mode strategy that accelerates learning with added value from exploration and improved exploitation scores. CSQBF balances exploration and exploitation using a probabilistic scoring criterion based on classification probabilities from a tree-based ensemble model within each data cluster.
The second contribution introduces two more query strategies, Double Margin Active Learning (DMAL) and Cluster Agnostic Active Learning (CAAL), that combine consistent exploration and exploitation modules into a coherent and unified measure for label query. Instead of assuming a fixed clustering structure, CAAL and DMAL adopt a soft-clustering strategy which provides a new approach to formalize exploration in active learning.
The third contribution addresses the challenge of dynamically making a balance between exploration and exploitation criteria throughout the active learning process. Two adaptive algorithms are proposed based on feedback-driven bandit optimization frameworks that elegantly handle this issue by learning the relationship between exploration-exploitation trade-off and an active learner's performance.
ContributorsShams, Ghazal (Author) / Runger, George C. (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Escobedo, Adolfo (Committee member) / Pedrielli, Giulia (Committee member) / Arizona State University (Publisher)
Created2020

Description
The Cognitive Decision Support (CDS) model is proposed. The model is widely applicable and scales to realistic, complex decision problems based on adaptive learning. The utility of a decision is discussed and four types of decisions associated with CDS model are identified. The CDS model is designed to learn decision utilities. Data enrichment is introduced to promote the effectiveness of learning. Grouping is introduced for large-scale decision learning. Introspection and adjustment are presented for adaptive learning. Triage recommendation is incorporated to indicate the trustworthiness of suggested decisions.
The CDS model and methodologies are integrated into an architecture using concepts from cognitive computing. The proposed architecture is implemented with an example use case to inventory management.
Reinforcement learning (RL) is discussed as an alternative, generalized adaptive learning engine for the CDS system to handle the complexity of many problems with unknown environments. An adaptive state dimension with context that can increase with newly available information is discussed. Several enhanced components for RL which are critical for complex use cases are integrated. Deep Q networks are embedded with the adaptive learning methodologies and applied to an example supply chain management problem on capacity planning.
A new approach using Ito stochastic processes is proposed as a more generalized method to generate non-stationary demands in various patterns that can be used in decision problems. The proposed method generates demands with varying non-stationary patterns, including trend, cyclical, seasonal, and irregular patterns. Conventional approaches are identified as special cases of the proposed method. Demands are illustrated in realistic settings for various decision models. Various statistical criteria are applied to filter the generated demands. The method is applied to a real-world example.
The CDS model and methodologies are integrated into an architecture using concepts from cognitive computing. The proposed architecture is implemented with an example use case to inventory management.
Reinforcement learning (RL) is discussed as an alternative, generalized adaptive learning engine for the CDS system to handle the complexity of many problems with unknown environments. An adaptive state dimension with context that can increase with newly available information is discussed. Several enhanced components for RL which are critical for complex use cases are integrated. Deep Q networks are embedded with the adaptive learning methodologies and applied to an example supply chain management problem on capacity planning.
A new approach using Ito stochastic processes is proposed as a more generalized method to generate non-stationary demands in various patterns that can be used in decision problems. The proposed method generates demands with varying non-stationary patterns, including trend, cyclical, seasonal, and irregular patterns. Conventional approaches are identified as special cases of the proposed method. Demands are illustrated in realistic settings for various decision models. Various statistical criteria are applied to filter the generated demands. The method is applied to a real-world example.
ContributorsKee, Seho (Author) / Runger, George C. (Thesis advisor) / Escobedo, Adolfo (Committee member) / Gel, Esma (Committee member) / Janakiram, Mani (Committee member) / Rogers, Dale (Committee member) / Arizona State University (Publisher)
Created2020

Description
Graphs are commonly used visualization tools in a variety of fields. Algorithms have been proposed that claim to improve the readability of graphs by reducing edge crossings, adjusting edge length, or some other means. However, little research has been done to determine which of these algorithms best suit human perception for particular graph properties. This thesis explores four different graph properties: average local clustering coefficient (ALCC), global clustering coefficient (GCC), number of triangles (NT), and diameter. For each of these properties, three different graph layouts are applied to represent three different approaches to graph visualization: multidimensional scaling (MDS), force directed (FD), and tsNET. In a series of studies conducted through the crowdsourcing platform Amazon Mechanical Turk, participants are tasked with discriminating between two graphs in order to determine their just noticeable differences (JNDs) for the four graph properties and three layout algorithm pairs. These results are analyzed using previously established methods presented by Rensink et al. and Kay and Heer.The average JNDs are analyzed using a linear model that determines whether the property-layout pair seems to follow Weber's Law, and the individual JNDs are run through a log-linear model to determine whether it is possible to model the individual variance of the participant's JNDs. The models are evaluated using the R2 score to determine if they adequately explain the data and compared using the Mann-Whitney pairwise U-test to determine whether the layout has a significant effect on the perception of the graph property. These tests indicate that the data collected in the studies can not always be modelled well with either the linear model or log-linear model, which suggests that some properties may not follow Weber's Law. Additionally, the layout algorithm is not found to have a significant impact on the perception of some of these properties.
ContributorsClayton, Benjamin (Author) / Maciejewski, Ross (Thesis advisor) / Kobourov, Stephen (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created2019

Description
Breeding seeds to include desirable traits (increased yield, drought/temperature resistance, etc.) is a growing and important method of establishing food security. However, besides breeder intuition, few decision-making tools exist that can provide the breeders with credible evidence to make decisions on which seeds to progress to further stages of development. This thesis attempts to create a chance-constrained knapsack optimization model, which the breeder can use to make better decisions about seed progression and help reduce the levels of risk in their selections. The model’s objective is to select seed varieties out of a larger pool of varieties and maximize the average yield of the “knapsack” based on meeting some risk criteria. Two models are created for different cases. First is the risk reduction model which seeks to reduce the risk of getting a bad yield but still maximize the total yield. The second model considers the possibility of adverse environmental effects and seeks to mitigate the negative effects it could have on the total yield. In practice, breeders can use these models to better quantify uncertainty in selecting seed varieties
ContributorsOzcan, Ozkan Meric (Author) / Armbruster, Dieter (Thesis advisor) / Gel, Esma (Thesis advisor) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created2019

Description
Monitoring a system for deviations from standard or reference behavior is essential for many data-driven tasks. Whether it is monitoring sensor data or the interactions between system elements, such as edges in a path or transactions in a network, the goal is to detect significant changes from a reference. As technological advancements allow for more data to be collected from systems, monitoring approaches should evolve to accommodate the greater collection of high-dimensional data and complex system settings. This dissertation introduces system-level models for monitoring tasks characterized by changes in a subset of system components, utilizing component-level information and relationships. A change may only affect a portion of the data or system (partial change). The first three parts of this dissertation present applications and methods for detecting partial changes. The first part introduces a methodology for partial change detection in a simple, univariate setting. Changes are detected with posterior probabilities and statistical mixture models which allow only a fraction of data to change. The second and third parts of this dissertation center around monitoring more complex multivariate systems modeled through networks. The goal is to detect partial changes in the underlying network attributes and topology. The contributions of the second and third parts are two non-parametric system-level monitoring techniques that consider relationships between network elements. The algorithm Supervised Network Monitoring (SNetM) leverages Graph Neural Networks and transforms the problem into supervised learning. The other algorithm Supervised Network Monitoring for Partial Temporal Inhomogeneity (SNetMP) generates a network embedding, and then transforms the problem to supervised learning. At the end, both SNetM and SNetMP construct measures and transform them to pseudo-probabilities to be monitored for changes. The last topic addresses predicting and monitoring system-level delays on paths in a transportation/delivery system. For each item, the risk of delay is quantified. Machine learning is used to build a system-level model for delay risk, given the information available (such as environmental conditions) on the edges of a path, which integrates edge models. The outputs can then be used in a system-wide monitoring framework, and items most at risk are identified for potential corrective actions.
ContributorsKasaei Roodsari, Maziar (Author) / Runger, George (Thesis advisor) / Escobedo, Adolfo (Committee member) / Pan, Rong (Committee member) / Shinde, Amit (Committee member) / Arizona State University (Publisher)
Created2021

Description
Matching or stratification is commonly used in observational studies to remove bias due to confounding variables. Analyzing matched data sets requires specific methods which handle dependency among observations within a stratum. Also, modern studies often include hundreds or thousands of variables. Traditional methods for matched data sets are challenged in high-dimensional settings, mixed type variables (numerical and categorical), nonlinear andinteraction effects. Furthermore, machine learning research for such structured data is quite limited. This dissertation addresses this important gap and proposes machine learning models for identifying informative variables from high-dimensional matched data sets. The first part of this dissertation proposes a machine learning model to identify informative variables from high-dimensional matched case-control data sets. The outcome of interest in this study design is binary (case or control), and each stratum is assumed to have one unit from each outcome level. The proposed method which is referred to as Matched Forest (MF) is effective for large number of variables and identifying interaction effects. The second part of this dissertation proposes three enhancements of MF algorithm. First, a regularization framework is proposed to improve variable selection performance in excessively high-dimensional settings. Second, a classification method is proposed to classify unlabeled pairs of data. Third, two metrics are proposed to estimate the effects of important variables identified by MF. The third part proposes a machine learning model based on Neural Networks to identify important variables from a more generalized matched case-control data set where each stratum has one unit from case outcome level and more than one unit from control outcome level. This method which is referred to as Matched Neural Network (MNN) performs better than current algorithms to identify variables with interaction effects. Lastly, a generalized machine learning model is proposed to identify informative variables from high-dimensional matched data sets where the outcome has more than two levels. This method outperforms existing algorithms in the literature in identifying variables with complex nonlinear and interaction effects.
ContributorsShomal Zadeh, Nooshin (Author) / Runger, George (Thesis advisor) / Montgomery, Douglas (Committee member) / Shinde, Shilpa (Committee member) / Escobedo, Adolfo (Committee member) / Arizona State University (Publisher)
Created2021

Description
Researchers and practitioners have widely studied road network traffic data in different areas such as urban planning, traffic prediction and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale high-quality urban traffic data requires tremendous efforts because participating vehicles must install Global Positioning System(GPS) receivers and administrators must continuously monitor these devices. There have been some urban traffic simulators trying to generate such data with different features. However, they suffer from two critical issues (1) Scalability: most of them only offer single-machine solution which is not adequate to produce large-scale data. Some simulators can generate traffic in parallel but do not well balance the load among machines in a cluster. (2) Granularity: many simulators do not consider microscopic traffic situations including traffic lights, lane changing, car following. This paper proposed GeoSparkSim, a scalable traffic simulator which extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation. The proposed system seamlessly integrates with a Spark-based spatial data management system, GeoSpark, to deliver a holistic approach that allows data scientists to simulate, analyze and visualize large-scale urban traffic data. To implement microscopic traffic models, GeoSparkSim employs a simulation-aware vehicle partitioning method to partition vehicles among different machines such that each machine has a balanced workload. The experimental analysis shows that GeoSparkSim can simulate the movements of 200 thousand cars over an extensive road network (250 thousand road junctions and 300 thousand road segments).
ContributorsFu, Zishan (Author) / Sarwat, Mohamed (Thesis advisor) / Pedrielli, Giulia (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created2019

Description
The shift in focus of manufacturing systems to high-mix and low-volume production poses a challenge to both efficient scheduling of manufacturing operations and effective assessment of production capacity. This thesis considers the problem of scheduling a set of jobs that require machine and worker resources to complete their manufacturing operations. Although planners in manufacturing contexts typically focus solely on machines, schedules that only consider machining requirements may be problematic during implementation because machines need skilled workers and cannot run unsupervised. The model used in this research will be beneficial to these environments as planners would be able to determine more realistic assignments and operation sequences to minimize the total time required to complete all jobs. This thesis presents a mathematical formulation for concurrent scheduling of machines and workers that can optimally schedule a set of jobs while accounting for changeover times between operations. The mathematical formulation is based on disjunctive constraints that capture the conflict between operations when trying to schedule them to be performed by the same machine or worker. An additional formulation extends the previous one to consider how cross-training may impact the production capacity and, for a given budget, provide training recommendations for specific workers and operations to reduce the makespan. If training a worker is advantageous to increase production capacity, the model recommends the best time window to complete it such that overlaps with work assignments are avoided. It is assumed that workers can perform tasks involving the recently acquired skills as soon as training is complete. As an alternative to the mixed-integer programming formulations, this thesis provides a math-heuristic approach that fixes the order of some operations based on Largest Processing Time (LPT) and Shortest Processing Time (SPT) procedures, while allowing the exact formulation to find the optimal schedule for the remaining operations. Computational experiments include the use of the solution for the no-training problem as a starting feasible solution to the training problem. Although the models provided are general, the manufacturing of Printed Circuit Boards are used as a case study.
ContributorsAdams, Katherine Bahia (Author) / Sefair, Jorge (Thesis advisor) / Askin, Ronald (Thesis advisor) / Webster, Scott (Committee member) / Arizona State University (Publisher)
Created2019

Description
To maintain long term success, a manufacturing company should be managed and operated under the guidance of properly designed capacity, production and logistics plans that are formulated in coordination with its manufacturing footprint, so that its managerial goals on both strategic and tactical levels can be fulfilled. In particular, sufficient flexibility and efficiency should be ensured so that future customer demand can be met at a profit. This dissertation is motivated by an automobile manufacturer's mid-term and long-term decision problems, but applies to any multi-plant, multi-product manufacturer with evolving product portfolios and significant fixed and variable production costs. Via introducing the concepts of effective capacity and product-specific flexibility, two mixed integer programming (MIP) models are proposed to help manufacturers shape their mid-term capacity plans and long-term product allocation plans. With fixed tooling flexibility, production and logistics considerations are integrated into a mid-term capacity planning model to develop well-informed and balanced tactical plans, which utilize various capacity adjustment options to coordinate production, inventory, and shipping schedules throughout the planning horizon so that overall operational and capacity adjustment costs are minimized. For long-term product allocation planning, strategic tooling configuration plans that empower the production of multi-generation products at minimal configuration and operational costs are established for all plants throughout the planning horizon considering product-specific commonality and compatibility. New product introductions and demand uncertainty over the planning horizon are incorporated. As a result, potential production sites for each product and corresponding process flexibility are determined. An efficient heuristic method is developed and shown to perform well in solution quality and computational requirements.
ContributorsYao, Xufeng (Author) / Askin, Ronald (Thesis advisor) / Sefair, Jorge (Thesis advisor) / Escobedo, Adolfo (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created2021