
Artificial intelligence (AI) systems are increasingly deployed in safety-critical, real-world environments involving human participants, where failures can put human safety at risk. The unpredictable nature of real-world conditions and the inherent variability of human behavior often lead to situations that were not anticipated during the system's design or verification phases. Moreover, the inclusion of AI components such as large language models (LLMs), often regarded as "black boxes", adds complexity to these systems and heightens the likelihood of encountering unforeseen challenging scenarios, or "unknown-unknowns".
Unknown-unknowns present a significant challenge because their causes and impacts on the system are often not identified, or not known to the human-in-the-loop, at the time of the error. Such errors often precipitate a chain of events over time: errors lead to faults, faults may escalate into hazards, and hazards ultimately result in accidents or safety violations that harm the human participants. To address these challenges, this thesis considers a conformal inference-based detection framework for identifying unknown-unknowns. The framework relearns operational models using physics-guided surrogate models. Incorporating physics into the framework ensures that unknown-unknowns are detected preemptively, before they cause any harm or safety violation. Unlike traditional rare-class detection and anomaly detection methods, this approach does not rely on predefined error traces or definitions, since unknown-unknowns are, by definition, entirely new scenarios not present during training or validation.
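As a rough illustration of the detection idea (not the thesis' actual models: the surrogate, nonconformity score, and threshold below are hypothetical stand-ins), a conformal detector scores how much an observed transition deviates from a physics-guided surrogate's prediction and flags it when that score is extreme relative to nominal calibration runs:

```python
import numpy as np

def nonconformity(actual_next_state, predicted_next_state):
    # Deviation of the observed dynamics from the surrogate model's prediction.
    return np.linalg.norm(actual_next_state - predicted_next_state)

def calibration_scores(surrogate, calib_transitions):
    # Nonconformity scores on nominal (safe) calibration transitions (state, action, next_state).
    return np.array([nonconformity(s_next, surrogate(s, a))
                     for s, a, s_next in calib_transitions])

def conformal_p_value(score, calib_scores):
    # Fraction of calibration scores at least as large as the new score (smoothed).
    return (np.sum(calib_scores >= score) + 1) / (len(calib_scores) + 1)

def flags_unknown_unknown(surrogate, s, a, s_next, calib_scores, alpha=0.05):
    # Flag a potential unknown-unknown before it escalates into a hazard.
    score = nonconformity(s_next, surrogate(s, a))
    return conformal_p_value(score, calib_scores) < alpha
```

Because the threshold comes from a calibration distribution rather than labeled error traces, nothing about the specific failure mode needs to be known in advance, which is what allows this style of detector to target unknown-unknowns at all.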
Lastly, to distinguish unknown-unknowns from traditional anomalies or rare events, this thesis proposes categorizing them into two main subclasses: those stemming from predictive model errors and those arising from other physical dynamics and human interactions. This thesis also investigates the effects of LLM-based agents in real-world scenarios and their role in introducing further unknown-unknowns into the overall system.
This research aims to make interactions between AI-enabled assistive devices and humans safer in the real world, which is essential for the widespread deployment of such systems. By addressing the problem of unknown-unknowns in these safety-critical systems, this research contributes to increased trust and acceptance in diverse sectors such as healthcare, daily planning, transportation, and industrial automation.
Relational databases are widely used to store structured data across domains such as finance, healthcare, and education. However, accessing this data typically requires writing complex Structured Query Language (SQL) queries, a task that is often challenging for non-experts. This research aims to simplify the process of retrieving information from such databases by developing automated approaches that enable users to interact with structured data without needing SQL proficiency.
A significant aspect of this work focuses on multi-table question answering (QA), where information is distributed across multiple relational tables. Unlike single-table scenarios, multi-table QA involves additional challenges such as identifying relevant tables, understanding table relationships, and executing correct joins. To address these complexities, I propose an approach that enhances the retrieval and reasoning mechanisms needed to accurately interpret and connect information across relational schemas, making multi-table QA more accessible and effective.
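To make the multi-table challenges concrete, here is a deliberately simplified sketch; the schema, lexical retriever, and join heuristic are invented for illustration and are not the approach proposed in this work. It walks through the three steps involved: retrieving candidate tables, planning a join over foreign keys, and composing the SQL.

```python
# Hypothetical sketch of the multi-table QA steps: table retrieval,
# join planning over foreign keys, and SQL composition.

SCHEMA = {
    "patients":   {"columns": ["patient_id", "name", "birth_date"]},
    "admissions": {"columns": ["admission_id", "patient_id", "ward", "admit_date"]},
}
FOREIGN_KEYS = [("admissions.patient_id", "patients.patient_id")]

def retrieve_tables(question, schema, top_k=2):
    # Naive lexical-overlap retrieval; a real system would use learned representations.
    q_tokens = set(question.lower().split())
    scored = []
    for name, meta in schema.items():
        overlap = len(q_tokens & set(c.lower() for c in meta["columns"] + [name]))
        scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

def join_clause(tables, foreign_keys):
    # Connect the retrieved tables through declared foreign-key relationships.
    for left, right in foreign_keys:
        lt, rt = left.split(".")[0], right.split(".")[0]
        if lt in tables and rt in tables:
            return f"{lt} JOIN {rt} ON {left} = {right}"
    return tables[0]

question = "Which ward was each patient admitted to?"
tables = retrieve_tables(question, SCHEMA)
sql = f"SELECT patients.name, admissions.ward FROM {join_clause(tables, FOREIGN_KEYS)}"
print(sql)
# SELECT patients.name, admissions.ward FROM admissions JOIN patients
#   ON admissions.patient_id = patients.patient_id
```

Even in this toy form, the sketch shows where multi-table QA goes beyond the single-table case: errors in table selection or join planning propagate directly into the final query.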
Another key challenge addressed in this research is the trustworthiness of QA systems powered by large language models (LLMs). While LLMs are capable of generating fluent and coherent responses, they often hallucinate or produce factually incorrect outputs, undermining user trust. To mitigate this, I explore a method known as attribution, which ensures that answers are grounded in verifiable evidence. Specifically, I introduce a technique for cell-level attribution in single-table QA settings, where each piece of data supporting an answer is precisely traced back to its originating cell in the table. This enhances transparency and makes the reasoning behind answers auditable and trustworthy.
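The following toy example, with an invented table and a naive string-matching heuristic that is far simpler than the proposed technique, shows what cell-level attribution produces: each value in an answer is mapped back to the (row, column) cell that supports it.

```python
# Toy illustration of cell-level attribution: trace each answer value
# back to the (row index, column name) cell that supports it.

table = {
    "country":    ["France", "Japan", "Brazil"],
    "capital":    ["Paris", "Tokyo", "Brasília"],
    "population": ["67M", "125M", "214M"],
}

def attribute_answer(answer_values, table):
    # For each value appearing in the answer, collect the supporting cell(s).
    attributions = {}
    for value in answer_values:
        for column, cells in table.items():
            for row_idx, cell in enumerate(cells):
                if value.lower() == cell.lower():
                    attributions.setdefault(value, []).append((row_idx, column))
    return attributions

# Answer to "What is the capital of Japan and its population?"
answer = ["Tokyo", "125M"]
print(attribute_answer(answer, table))
# {'Tokyo': [(1, 'capital')], '125M': [(1, 'population')]}
```

The point of the attribution map is auditability: a user can check the cited cells directly instead of trusting the model's fluent but possibly hallucinated prose.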
By addressing both the structural complexity of multi-table databases and the reliability issues in single-table reasoning, this research contributes toward building QA systems that are not only accurate but also interpretable. These improvements are particularly critical in high-stakes domains such as healthcare, law, and business, where decisions must be based on traceable and dependable data-driven insights.

Despite their remarkable capabilities, Large Language Models (LLMs) exhibit concerning vulnerabilities to minor input perturbations, posing risks in safety-critical applications. This thesis systematically analyzes how semantic-preserving prompt perturbations at the character, word, and sentence levels affect LLM safety behavior. It investigates whether even slight rephrasings can cause an otherwise aligned model to flip from a safe (refusal or benign) response to an unsafe one, or vice versa. The study also identifies key perturbation characteristics that drive such flips and examines how flip likelihood varies across different categories of harmful content.
To answer these questions, the study evaluates multiple advanced LLMs (LLaMA2, LLaMA3, Mistral) on the CatQA dataset of harmful queries. Each query is perturbed by applying controlled character-level typos, word-level substitutions, and sentence-level paraphrases while preserving semantic meaning. Model outputs are assessed using the automated safety evaluation tool LlamaGuard.
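For concreteness, a minimal sketch of the three perturbation levels is shown below; the operators, word list, and paraphraser are illustrative placeholders, and the controlled perturbation procedures and LlamaGuard-based evaluation used in the study are not reproduced here.

```python
import random

random.seed(0)

def char_level_typo(prompt):
    # Character-level noise: swap two adjacent characters in the prompt.
    chars = list(prompt)
    if len(chars) < 2:
        return prompt
    i = random.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def word_level_substitution(prompt, synonyms):
    # Word-level change: replace one word with a meaning-preserving synonym.
    words = prompt.split()
    for i, w in enumerate(words):
        if w.lower() in synonyms:
            words[i] = synonyms[w.lower()]
            break
    return " ".join(words)

def sentence_level_paraphrase(prompt, paraphraser):
    # Sentence-level rewrite delegated to an external paraphrasing model
    # (hypothetical callable, not defined here).
    return paraphraser(prompt)

prompt = "Explain how to secure a home network."
print(char_level_typo(prompt))
print(word_level_substitution(prompt, {"secure": "protect"}))
```

Each perturbed query would then be sent to the target model and its response scored for safety, so that flips between safe and unsafe behavior can be attributed to a specific perturbation level.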
Empirical results show that sentence-level paraphrasing consistently improves safety compliance, whereas fine-grained character-level noise often degrades it due to tokenization weaknesses. Word-level changes yield the most inconsistent behavior, with random insertions particularly likely to elicit unsafe outputs. Among the models tested, LLaMA3 was the most vulnerable, exhibiting the highest rate of unsafe responses under perturbation.
Overall, these findings underscore the importance of perturbation-aware and context-aware safety evaluation and offer practical insights for improving LLM alignment in real-world deployments.