Matching Items (25)
Filtering by

Clear all filters

Description

Recent advancements in machine learning methods have allowed companies to develop advanced computer vision aided production lines that take advantage of the raw and labeled data captured by high-definition cameras mounted at vantage points in their factory floor. We experiment with two different methods of developing one such system to

Recent advancements in machine learning methods have allowed companies to develop advanced computer vision aided production lines that take advantage of the raw and labeled data captured by high-definition cameras mounted at vantage points in their factory floor. We experiment with two different methods of developing one such system to automatically track key components on a production line. By tracking the state of these key components using object detection we can accurately determine and report production line metrics like part arrival and start/stop times for key factory processes. We began by collecting and labeling raw image data from the cameras overlooking the factory floor. Using that data we trained two dedicated object detection models. Our training utilized transfer learning to start from a Faster R-CNN ResNet model trained on Microsoft’s COCO dataset. The first model we developed is a binary classifier that detects the state of a single object while the second model is a multiclass classifier that detects the state of two distinct objects on the factory floor. Both models achieved over 95% classification and localization accuracy on our test datasets. Having two additional classes did not affect the classification or localization accuracy of the multiclass model compared to the binary model.

ContributorsPaulson, Hunter (Author) / Ju, Feng (Thesis director) / Balasubramanian, Ramkumar (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)
Created2022-05
Description
Machine learning algorithms have a wide variety of applications and use cases. They are robust in the sense that they can continue to learn and improve long after they have been deployed without much programmer supervision. One key area that machine learning has been used for is in

Machine learning algorithms have a wide variety of applications and use cases. They are robust in the sense that they can continue to learn and improve long after they have been deployed without much programmer supervision. One key area that machine learning has been used for is in the detection and classification of objects in images and videos. This so-called computer vision has typically been used by companies to extract user information from the images and videos that they post. Meta (formerly known as Facebook) had been using such algorithms to automatically tag users in pictures that were uploaded to the Facebook website up until November 2021 [1]. Although these algorithms have been used to exploit user’s privacy, they can also be used to help ensure this privacy. For this creative project, I developed a machine learning model that could detect faces in a given picture and identify the area of the picture that these faces took up. Training a model from scratch can take millions of images of data and hundreds of hours on powerful GPUs. Since I didn’t have access to those resources, I began with a pre-trained model known as VGG16 by Karen Simonyan & Andrew Zisserman. From there, I took 90 pictures of myself and annotated where in the image my face was located. Since 90 pictures wouldn’t be enough data for this algorithm, I used an image augmentation algorithm to randomly crop, flip, change brightness, change gamma, and recolor the images to expand the dataset. In total, I used 5400 images to train the algorithm. The machine learning model had a loss value that hovered around 0.1 thanks to the VGG16 model. It was able to accurately detect my face and also adapt whenever I moved my face horizontally and vertically across a camera. However, the model struggled to draw a bounding box whenever I moved my face forward or backward in the camera shot.
ContributorsGutierrez, Ariel (Author) / Osburn, Steven (Thesis director) / Panchoo, Anthony (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor)
Created2024-05
Description
Tracking moving objects with code isn’t a new concept. There are many computer vision libraries that have functions that can track changes in position very accurately. This allows for computers to be able to provide data about situations that aren’t able to be observed in a reasonable amount of time.

Tracking moving objects with code isn’t a new concept. There are many computer vision libraries that have functions that can track changes in position very accurately. This allows for computers to be able to provide data about situations that aren’t able to be observed in a reasonable amount of time. For example, tracking hundreds of moving cars over a day would take a lot of time if done by hand, but with code, one can get that data quicker. This thesis aims to provide a clear, simple, and effective application to track moving objects in a given video, trace their paths, and color-code these paths to see which ones are the most congested. This is to provide an efficient and deployable algorithm to track moving objects. This research was in collaboration with Moog Inc, an aerospace and defense company, to develop an algorithm that would analyze a video of a parking lot and determine the empty parking spaces and the common traffic paths that cars take while in a parking lot. Moog Inc. provides an Optimized Development Environment (ODE) to develop the application. Since the hardware is efficient on power and has a small form factor, the applications that are run on it are very easily deployable and portable, which makes it useful for any environment. The process of tracking cars in a video is somewhat straightforward as well. It consists of filtering the video, drawing rectangles around each region (car), tracing their paths (movements) and applying a heatmap to that path. Since it isn’t too computationally intensive, it can work well on the ODE. Since the ODE is small and has a portable form factor, this algorithm can be deployed fairly easily. The heatmap generation was effective in showing the densities of certain paths that cars traveled through. There are also various colormaps that can be used, to provide a clearer idea of the paths. There were attempts to optimize this algorithm by processing every other frame instead, but ultimately the tradeoff between efficiency and accuracy was deemed to be unfavorable. There were still some limitations that this approach had, as initially the algorithm would draw paths between areas that weren’t traversed by cars. While this was fixed in the final result, there are still some slight inaccuracies within the roads. There are also ethical concerns with the use of this software, as Moog Inc. does a lot of work in defense and this software could be used in wartime scenarios. However, this software can be applied to various other scenarios like tracking wildlife in an area to study their habits, or tracking particles to see their density in a given environment. Since the algorithm is ran on a low-powered environment, it can be deployed and tested in many different scenarios without being costly.
ContributorsChandra, Rohan (Author) / Chavez Echeagaray, Maria Elena (Thesis director) / Rieckmann, Tyron (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created2024-05
Description
Today's world is seeing a rapid technological advancement in various fields, having access to faster computers and better sensing devices. With such advancements, the task of recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring and animation. Traditional

Today's world is seeing a rapid technological advancement in various fields, having access to faster computers and better sensing devices. With such advancements, the task of recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring and animation. Traditional approaches to dynamical modeling have included linear and nonlinear methods with their respective drawbacks. An alternative idea I propose is the use of descriptors of the shape of the dynamical attractor as a feature representation for quantification of nature of dynamics. The framework has two main advantages over traditional approaches: a) representation of the dynamical system is derived directly from the observational data, without any inherent assumptions, and b) the proposed features show stability under different time-series lengths where traditional dynamical invariants fail.

Approximately 1\% of the total world population are stroke survivors, making it the most common neurological disorder. This increasing demand for rehabilitation facilities has been seen as a significant healthcare problem worldwide. The laborious and expensive process of visual monitoring by physical therapists has motivated my research to invent novel strategies to supplement therapy received in hospital in a home-setting. In this direction, I propose a general framework for tuning component-level kinematic features using therapists’ overall impressions of movement quality, in the context of a Home-based Adaptive Mixed Reality Rehabilitation (HAMRR) system.

The rapid technological advancements in computing and sensing has resulted in large amounts of data which requires powerful tools to analyze. In the recent past, topological data analysis methods have been investigated in various communities, and the work by Carlsson establishes that persistent homology can be used as a powerful topological data analysis approach for effectively analyzing large datasets. I have explored suitable topological data analysis methods and propose a framework for human activity analysis utilizing the same for applications such as action recognition.
ContributorsVenkataraman, Vinay (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappol, Antonia (Committee member) / Krishnamurthi, Narayanan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2016
Description
The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized

The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized in multiple visual computing tasks to bridge the so-called "semantic gap" between extractable low-level feature representations and high-level semantic understanding of the visual objects.

Despite years of research, there are still some unsolved problems on semantic attribute learning. First, real-world applications usually involve hundreds of attributes which requires great effort to acquire sufficient amount of labeled data for model learning. Second, existing attribute learning work for visual objects focuses primarily on images, with semantic analysis on videos left largely unexplored.

In this dissertation I conduct innovative research and propose novel approaches to tackling the aforementioned problems. In particular, I propose robust and accurate learning frameworks on both attribute ranking and prediction by exploring the correlation among multiple attributes and utilizing various types of label information. Furthermore, I propose a video-based skill coaching framework by extending attribute learning to the video domain for robust motion skill analysis. Experiments on various types of applications and datasets and comparisons with multiple state-of-the-art baseline approaches confirm that my proposed approaches can achieve significant performance improvements for the general attribute learning problem.
ContributorsChen, Lin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2016
Description
As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received lot of attention in the recent past. Such architectures have the attractive feature of combining a two-step process of acquisition and compression of pixel measurements in a

As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received lot of attention in the recent past. Such architectures have the attractive feature of combining a two-step process of acquisition and compression of pixel measurements in a conventional camera, into a single step. A popular variant is the single-pixel camera that obtains measurements of the scene using a pseudo-random measurement matrix. Advances in compressive sensing (CS) theory in the past decade have supplied the tools that, in theory, allow near-perfect reconstruction of an image from these measurements even for sub-Nyquist sampling rates. However, current state-of-the-art reconstruction algorithms suffer from two drawbacks -- They are (1) computationally very expensive and (2) incapable of yielding high fidelity reconstructions for high compression ratios. In computer vision, the final goal is usually to perform an inference task using the images acquired and not signal recovery. With this motivation, this thesis considers the possibility of inference directly from compressed measurements, thereby obviating the need to use expensive reconstruction algorithms. It is often the case that non-linear features are used for inference tasks in computer vision. However, currently, it is unclear how to extract such features from compressed measurements. Instead, using the theoretical basis provided by the Johnson-Lindenstrauss lemma, discriminative features using smashed correlation filters are derived and it is shown that it is indeed possible to perform reconstruction-free inference at high compression ratios with only a marginal loss in accuracy. As a specific inference problem in computer vision, face recognition is considered, mainly beyond the visible spectrum such as in the short wave infra-red region (SWIR), where sensors are expensive.
ContributorsLohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2015
Description
Diabetic retinopathy (DR) is a common cause of blindness occurring due to prolonged presence of diabetes. The risk of developing DR or having the disease progress is increasing over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of

Diabetic retinopathy (DR) is a common cause of blindness occurring due to prolonged presence of diabetes. The risk of developing DR or having the disease progress is increasing over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of blindness among American adults. Recent studies have shown that diagnosis based on digital retinal imaging has potential benefits over traditional face-to-face evaluation. Yet there is a dearth of computer-based systems that can match the level of performance achieved by ophthalmologists. This thesis takes a fresh perspective in developing a computer-based system aimed at improving diagnosis of DR images. These images are categorized into three classes according to their severity level. The proposed approach explores effective methods to classify new images and retrieve clinically-relevant images from a database with prior diagnosis information associated with them. Retrieval provides a novel way to utilize the vast knowledge in the archives of previously-diagnosed DR images and thereby improve a clinician's performance while classification can safely reduce the burden on DR screening programs and possibly achieve higher detection accuracy than human experts. To solve the three-class retrieval and classification problem, the approach uses a multi-class multiple-instance medical image retrieval framework that makes use of spectrally tuned color correlogram and steerable Gaussian filter response features. The results show better retrieval and classification performances than prior-art methods and are also observed to be of clinical and visual relevance.
ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Frakes, David (Committee member) / Arizona State University (Publisher)
Created2012
Description
Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many

Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many of such sparse learning methods focus on designing or application of some learning techniques for certain feature space without much explicit consideration on possible interaction between the underlying semantics of the visual data and the employed learning technique. Rich semantic information in most visual data, if properly incorporated into algorithm design, should help achieving improved performance while delivering intuitive interpretation of the algorithmic outcomes. My study addresses the problem of how to explicitly consider the semantic information of the visual data in the sparse learning algorithms. In this work, we identify four problems which are of great importance and broad interest to the community. Specifically, a novel approach is proposed to incorporate label information to learn a dictionary which is not only reconstructive but also discriminative; considering the formation process of face images, a novel image decomposition approach for an ensemble of correlated images is proposed, where a subspace is built from the decomposition and applied to face recognition; based on the observation that, the foreground (or salient) objects are sparse in input domain and the background is sparse in frequency domain, a novel and efficient spatio-temporal saliency detection algorithm is proposed to identify the salient regions in video; and a novel hidden Markov model learning approach is proposed by utilizing a sparse set of pairwise comparisons among the data, which is easier to obtain and more meaningful, consistent than tradition labels, in many scenarios, e.g., evaluating motion skills in surgical simulations. In those four problems, different types of semantic information are modeled and incorporated in designing sparse learning algorithms for the corresponding visual computing tasks. Several real world applications are selected to demonstrate the effectiveness of the proposed methods, including, face recognition, spatio-temporal saliency detection, abnormality detection, spatio-temporal interest point detection, motion analysis and emotion recognition. In those applications, data of different modalities are involved, ranging from audio signal, image to video. Experiments on large scale real world data with comparisons to state-of-art methods confirm the proposed approaches deliver salient advantages, showing adding those semantic information dramatically improve the performances of the general sparse learning methods.
ContributorsZhang, Qiang (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2014
Description
In the sport of competitive water skiing, the skill of a human boat driver can affect athletic performance. Driver influence is not necessarily inhibitive to skiers, however, it reduces the fairness and credibility of the sport overall. In response to the stated problem, this thesis proposes a vision-based real-time control

In the sport of competitive water skiing, the skill of a human boat driver can affect athletic performance. Driver influence is not necessarily inhibitive to skiers, however, it reduces the fairness and credibility of the sport overall. In response to the stated problem, this thesis proposes a vision-based real-time control system designed specifically for tournament waterski boats. The challenges addressed in this thesis include: one, the segmentation of floating objects in frame sequences captured by a moving camera, two, the identification of segmented objects which fit a predefined model, and three, the accurate and fast estimation of camera position and orientation from coplanar point correspondences. This thesis discusses current ideas and proposes new methods for the three challenges mentioned. In the end, a working prototype is produced.
ContributorsWalker, Collin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Claveau, David (Committee member) / Arizona State University (Publisher)
Created2014
Description
Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical

Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical rotation of head movement in marmoset monkeys.

Experiments were conducted in a sound-attenuated acoustic chamber. Head movement of marmoset monkey was studied under various auditory and visual stimulation conditions. With increasing complexity, these conditions are (1) idle, (2) sound-alone, (3) sound and visual signals, and (4) alert signal by opening and closing of the chamber door. All of these conditions were tested with either house light on or off. Infra-red camera with a frame rate of 90 Hz was used to capture of the head movement of monkeys. To assist the signal detection, two circular markers were attached to the top of monkey head. The data analysis used an image-based marker detection scheme. Images were processed using the Computation Vision Toolbox in Matlab. The markers and their positions were detected using blob detection techniques. Based on the frame-by-frame information of marker positions, the angular position, velocity and acceleration were extracted in horizontal and vertical planes. Adaptive Otsu Thresholding, Kalman filtering and bound setting for marker properties were used to overcome a number of challenges encountered during this analysis, such as finding image segmentation threshold, continuously tracking markers during large head movement, and false alarm detection.

The results show that the blob detection method together with Kalman filtering yielded better performances than other image based techniques like optical flow and SURF features .The median of the maximal head turn in the horizontal plane was in the range of 20 to 70 degrees and the median of the maximal velocity in horizontal plane was in the range of a few hundreds of degrees per second. In comparison, the natural alert signal - door opening and closing - evoked the faster head turns than other stimulus conditions. These results suggest that behaviorally relevant stimulus such as alert signals evoke faster head-turn responses in marmoset monkeys.
ContributorsSimhadri, Sravanthi (Author) / Zhou, Yi (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2014