Detection of Data Exposure in Software Services Using Large Language Models

Goswami, Sourabh

Description

Data exposure is a critical concern in software services engineering: leaks of confidential or sensitive information are frequent, yet developers often struggle to prevent them because the causes are diverse, including hardcoded secrets, storage misconfigurations, insecure logging, improper data transmission, and unsafe deserialization. Modern software development practices can exacerbate these risks. Existing detection methods, which often rely on regular expressions or static pattern analysis, frequently generate many false positives or lack the deep contextual understanding required for reliable detection across varied programming languages. To address these limitations, this thesis presents an approach to data exposure detection in software services that uses a Large Language Model, adapted with low-rank adapters (LoRA) for efficient specialization and with few-shot learning to resolve ambiguous cases. Unlike detection systems that rely solely on static patterns or simple heuristics, the presented framework performs deep contextual code analysis, enabling highly accurate identification of exposures. The evaluation shows that it significantly surpasses existing tools and alternative techniques in both precision and recall. Key capabilities include efficient model adaptation through LoRA, ambiguity resolution using few-shot learning based on optimized thresholds, and precise line-level localization of identified exposures, which yields more reliable results and facilitates faster remediation. A feedback loop for continuous learning further distinguishes the framework as an accurate, scalable, and intelligent solution for detecting data exposures in complex software service environments.
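
To make two ideas from the abstract concrete, the following is a minimal illustrative sketch in Python. It is not taken from the thesis. The first part shows the kind of regular-expression baseline the abstract contrasts against, with line-level reporting, and how it can flag a harmless placeholder. The second part shows one plausible way a confidence score could be routed to an "ambiguous" bucket for few-shot refinement. The pattern, the names `scan` and `route`, and the threshold values 0.3 and 0.7 are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical baseline rule: a secret-like name assigned a quoted literal.
# Real pattern-based scanners use many more rules than this one.
SECRET_PATTERN = re.compile(
    r"""(?i)\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*["']([^"']+)["']"""
)


@dataclass
class Finding:
    line_no: int  # 1-based line number of the flagged line
    key: str      # the secret-like name that matched
    text: str     # the flagged line, stripped of surrounding whitespace


def scan(source: str) -> list[Finding]:
    """Flag lines that match the pattern, reporting their line numbers."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append(Finding(line_no, match.group(1).lower(), line.strip()))
    return findings


def route(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Route a model confidence score; the thresholds are illustrative only."""
    if score >= high:
        return "exposure"
    if score <= low:
        return "safe"
    return "ambiguous"  # candidates for a few-shot refinement step


SAMPLE = '''\
import os
password = "s3cr3t-value"
token = os.environ["API_TOKEN"]
password = "<your-password-here>"
'''

if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding.line_no, finding.key, finding.text)
    print(route(0.5))
```

In this sample the pattern flags the hardcoded value and also the documentation placeholder on the last line, while correctly skipping the environment lookup. The placeholder is the sort of false positive that a context-free pattern cannot rule out.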

Details

Contributors

Goswami, Sourabh (Author)
Yau, Stephen S. (Thesis advisor)
Ahn, Gail Joon (Committee member)
Baek, Jaejong (Committee member)
Arizona State University (Publisher)

Date Created

2025

Language

en

Note

Partial requirement for: M.S., Arizona State University, 2025
Field of study: Computer Science

Additional Information

Extent

66 pages

Genre

Open Access

Peer-reviewed
