In traditional von Neumann architectures, such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs), the performance gap (about 1000x) between the processor and memory, known as the Memory Wall, has drastically limited the throughput of data-intensive applications. Additionally, the high energy cost of each data transfer (about 40x that of an arithmetic operation) over a memory bus of limited bandwidth, known as the Power Wall, has severely diminished the energy efficiency of CPUs and GPUs. These performance and energy-efficiency limitations have been exacerbated by the meteoric rise of applications such as Artificial Intelligence (AI) and Machine Learning (ML) algorithms, databases, and bioinformatics, which often process gigabytes to terabytes of data.
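To make the Power Wall concrete, the back-of-envelope Python sketch below estimates where the energy goes in a streaming kernel under the ~40x transfer-versus-compute ratio cited above. The absolute per-operation energy, the operand counts, and the reuse fraction are illustrative assumptions, not measurements from this dissertation.

```python
# Back-of-envelope model of the Power Wall (illustrative assumptions only).
E_ALU_PJ = 1.0               # assumed energy of one arithmetic op (pJ)
E_XFER_PJ = 40.0 * E_ALU_PJ  # moving one operand over the bus (~40x, per text)

def energy_pj(num_ops: int, operands_per_op: int, reuse: float) -> float:
    """Total energy for a kernel: compute plus off-chip operand traffic.

    reuse = fraction of operands served from on-chip caches
    (0.0 means every operand is fetched from DRAM, the streaming worst case).
    """
    transfers = num_ops * operands_per_op * (1.0 - reuse)
    return num_ops * E_ALU_PJ + transfers * E_XFER_PJ

# A streaming kernel (e.g., a large dot product) with little data reuse:
n = 1_000_000
total = energy_pj(n, operands_per_op=2, reuse=0.1)
compute = n * E_ALU_PJ
print(f"compute share of energy: {compute / total:.1%}")  # ~1.4%
```

Under these assumptions, well over 90% of the energy is spent moving data rather than computing on it, which is precisely the overhead that processing in/near memory aims to eliminate.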
In the last few years, several research proposals and industry prototypes have demonstrated that processing in/near memory (PIM) using Dynamic Random Access Memory (DRAM) is an effective computing paradigm, improving the throughput and energy efficiency of data-intensive applications by orders of magnitude compared with CPUs and GPUs. Notwithstanding the plethora of proposed PIM architectures, they suffer from challenges such as intrusive modifications to the DRAM design, a lack of flexibility to support multiple applications, the difficulty of interfacing with a host processor or a controller, and programming and compilation frameworks that are either primitive or absent altogether.
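To illustrate the host-interface and programmability challenges, the sketch below shows one hypothetical shape a host-side PIM offload API might take. The `PimDevice` class, its methods, and the operation set are inventions for illustration; they do not correspond to any specific PIM product or to this dissertation's framework.

```python
from __future__ import annotations
from enum import Enum

class PimOp(Enum):  # operations a hypothetical memory-side unit might expose
    ADD = 0
    MUL = 1
    DOT = 2

class PimDevice:
    """Toy model of a host-side driver for memory-resident compute.

    A real PIM interface must encode commands within standard DRAM
    transactions (or a side channel), which is one of the adoption
    barriers noted above; here we simply simulate the result.
    """
    def __init__(self) -> None:
        self.banks: dict[int, list[float]] = {}

    def write(self, bank: int, data: list[float]) -> None:
        self.banks[bank] = list(data)   # host -> DRAM bank transfer

    def execute(self, op: PimOp, a: int, b: int) -> float | list[float]:
        x, y = self.banks[a], self.banks[b]
        if op is PimOp.DOT:             # reduction performed next to the data
            return sum(p * q for p, q in zip(x, y))
        f = (lambda p, q: p + q) if op is PimOp.ADD else (lambda p, q: p * q)
        return [f(p, q) for p, q in zip(x, y)]

dev = PimDevice()
dev.write(0, [1.0, 2.0, 3.0])
dev.write(1, [4.0, 5.0, 6.0])
print(dev.execute(PimOp.DOT, 0, 1))     # 32.0, with no operand read-back
```

The point of the sketch is that only the small result crosses the bus; designing such an interface without disturbing standard DRAM protocols is exactly the kind of challenge the dissertation addresses.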
This dissertation tackles the memory and power bottlenecks of CPUs/GPUs and the barriers to the widespread adoption of PIM architectures in edge and server computing systems. It proposes PIM designs based on runtime-configurable Neuron Processing Elements (NPEs) integrated into DRAM without altering its standard operation or protocols, while meeting strict area and power constraints. Each NPE is built from programmable artificial neurons (ANs) with local storage; this radically different approach to logic design makes an NPE significantly smaller than an equivalent CMOS design. NPEs implement multiple operations across varying data formats and precisions, and runtime reconfiguration is achieved with minimal overhead. The resulting integration provides a unified platform for diverse data-intensive workloads, achieving up to a 10x throughput increase and 10–100x energy-efficiency gains over CPUs, GPUs, and prior PIM solutions.
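The following minimal Python sketch illustrates the general idea of a programmable artificial neuron as a reconfigurable logic element: rewriting its locally stored weights and threshold at runtime changes the function the same unit computes. The threshold-logic formulation, weight values, and Boolean examples are illustrative assumptions; the dissertation's actual AN circuit, data formats, and configuration mechanism are not reproduced here.

```python
from itertools import product

class ArtificialNeuron:
    """Threshold-style artificial neuron with locally stored configuration.

    Computes step(w . x - theta). Rewriting (w, theta) at runtime changes
    the Boolean function the same unit implements, which is the essence of
    a runtime-configurable logic element.
    """
    def __init__(self, weights, theta):
        self.configure(weights, theta)

    def configure(self, weights, theta):
        # In hardware this would update a small local register file;
        # here it is just an attribute write.
        self.w, self.theta = list(weights), theta

    def fire(self, x):
        return int(sum(wi * xi for wi, xi in zip(self.w, x)) >= self.theta)

def truth_table(neuron, n=3):
    return [neuron.fire(bits) for bits in product((0, 1), repeat=n)]

an = ArtificialNeuron([1, 1, 1], theta=3)  # acts as a 3-input AND
print(truth_table(an))                     # [0, 0, 0, 0, 0, 0, 0, 1]

an.configure([1, 1, 1], theta=2)           # same unit, now 3-input majority
print(truth_table(an))                     # [0, 0, 0, 1, 0, 1, 1, 1]

an.configure([1, 1, 1], theta=1)           # same unit, now 3-input OR
print(truth_table(an))                     # [0, 1, 1, 1, 1, 1, 1, 1]
```

A single-bit AND/OR/majority is of course far simpler than the multi-format arithmetic the NPEs support, but it captures why neuron-based logic can be reconfigured by rewriting a handful of stored values rather than by rerouting fixed gates.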