APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

**Jackson Melchert** 

# **Processing Element Design Space**



# Key Challenges and Our Solutions

| <b>Challenge 1</b><br>A naïve exploration of the design space leads to too many candidate PEs | <b>Challenge 2</b><br>Risk of overspecialization towards the applications chosen for analysis | <b>Challenge 3</b><br>Evaluating the efficacy of PE designs on applications requires a functioning compiler |
|---|---|---|
| Generate candidate PEs by finding frequently occurring operations in the applications themselves | Merge frequent operations into a baseline PE with a general-purpose instruction set | Leverage the Peak DSL to automatically synthesize rewrite rules for the compiler for each PE |

**APEX (Automated PE Exploration)** encompasses application analysis, PE specification, and CGRA hardware and compiler generation, creating an end-to-end flow for PE design space exploration (DSE).
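
The second solution in the table above, merging frequent operations into one datapath, can be illustrated with a minimal Python sketch. It assumes each subgraph is reduced to a list of operation names and that only one subgraph executes in the PE at a time, so functional units can be shared. The unit names and area costs are hypothetical, and the sketch ignores multiplexers and wiring, which a real merge would also have to account for.

```python
from collections import Counter

# Hypothetical per-unit area costs (arbitrary units), for illustration only.
UNIT_AREA = {"add": 1.0, "mul": 4.0, "shift": 0.5}


def merge_units(subgraphs):
    """Return the functional units a merged PE needs.

    Only one subgraph executes at a time, so for each unit type the merged
    PE needs the maximum count used by any single subgraph, not the sum.
    """
    merged = Counter()
    for ops in subgraphs:
        merged |= Counter(ops)  # Counter union keeps the per-key maximum
    return merged


def area(units):
    """Total area of a multiset of functional units."""
    return sum(UNIT_AREA[op] * n for op, n in units.items())


mul_add = ["mul", "add"]             # hypothetical mined subgraph
add_shift = ["add", "add", "shift"]  # hypothetical mined subgraph

merged = merge_units([mul_add, add_shift])
unshared = Counter(mul_add) + Counter(add_shift)

print(dict(merged), area(merged))
print(dict(unshared), area(unshared))
```

In this toy case sharing saves one adder: the merged PE needs one multiplier, two adders, and one shifter, whereas implementing the two subgraphs side by side would need three adders.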

## **APEX:** Application-Driven PE Exploration



[1] Mining Graph Patterns. Hong Cheng, Xifeng Yan, and Jiawei Han. Springer, 2010.

#### Stanford University

[2] Efficient Datapath Merging for Partially Reconfigurable Architectures. Nahri Moreano, Edson Borin, Cid C. de Souza, and Guido Araujo. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2005.

# Subgraph Mining
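
As a rough illustration of the idea, the sketch below counts the simplest possible subgraphs, producer-consumer pairs of operations, in a small compute graph and keeps those that occur at least `min_support` times. The graph, the function name `frequent_op_pairs`, and the threshold are hypothetical; a real frequent subgraph miner grows patterns beyond two nodes, which this sketch does not attempt.

```python
from collections import Counter


def frequent_op_pairs(nodes, edges, min_support):
    """Count two-node patterns (producer op -> consumer op).

    nodes: {node_id: op_name}; edges: list of (src_id, dst_id).
    Returns the patterns that occur at least min_support times.
    """
    counts = Counter((nodes[src], nodes[dst]) for src, dst in edges)
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Hypothetical compute graph: three multiplies feeding a chain of adds.
nodes = {0: "mul", 1: "add", 2: "mul", 3: "add", 4: "mul", 5: "add", 6: "shift"}
edges = [(0, 1), (2, 3), (4, 5), (1, 3), (3, 5), (5, 6)]

patterns = frequent_op_pairs(nodes, edges, min_support=2)
print(patterns)
```

Here `mul -> add` occurs three times and `add -> add` twice, so both are kept as candidates, while `add -> shift` occurs once and is dropped.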




# Finding Non-Overlapping Occurrences
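
Occurrences of the same subgraph can share nodes, so the raw occurrence count can overstate how many times a pattern could actually be used at once. A minimal sketch of one way to count non-overlapping occurrences follows, assuming each occurrence is given as a set of node ids. It uses a simple greedy pass, which yields a maximal set but not necessarily the largest possible one. The function name and data are hypothetical.

```python
def non_overlapping(occurrences):
    """Greedily pick occurrences (sets of node ids) that share no node."""
    used = set()
    chosen = []
    # Sort for a deterministic result regardless of input order.
    for occ in sorted(occurrences, key=lambda o: sorted(o)):
        if used.isdisjoint(occ):
            chosen.append(occ)
            used |= occ
    return chosen


# Hypothetical add -> add occurrences along a chain of adds 1 -> 3 -> 5 -> 7.
occurrences = [frozenset({1, 3}), frozenset({3, 5}), frozenset({5, 7})]

picked = non_overlapping(occurrences)
print(len(occurrences), "occurrences,", len(picked), "non-overlapping")
```

The chain contains three occurrences of the pattern, but the middle one shares a node with each neighbour, so only two can be selected together.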




## Automatic Rewrite Rule Synthesis Using SMT
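
To give a feel for what a synthesized rewrite rule is, the sketch below searches the instruction space of a toy PE for an instruction that behaves exactly like a given compiler IR operation. It deliberately replaces the SMT solver with exhaustive checking at a 4-bit width, which is only feasible for toy sizes. The PE model, its opcodes, and the function names are all hypothetical and are not taken from the Peak DSL.

```python
from itertools import product

WIDTH = 4                      # toy bit width so exhaustive checking is cheap
MASK = (1 << WIDTH) - 1
OPCODES = ["add", "sub", "shl"]


def pe(instr, a, b):
    """Hypothetical PE: instr = (opcode, use_const, const)."""
    op, use_const, const = instr
    rhs = const if use_const else b
    if op == "add":
        return (a + rhs) & MASK
    if op == "sub":
        return (a - rhs) & MASK
    if op == "shl":
        return (a << (rhs % WIDTH)) & MASK
    raise ValueError(f"unknown opcode: {op}")


def synthesize_rule(ir_op):
    """Return a PE instruction equal to ir_op on every input, or None."""
    inputs = list(product(range(1 << WIDTH), repeat=2))
    for op, use_const in product(OPCODES, [False, True]):
        consts = range(1 << WIDTH) if use_const else [0]
        for const in consts:
            instr = (op, use_const, const)
            if all(pe(instr, a, b) == ir_op(a, b) for a, b in inputs):
                return instr
    return None


def ir_double(a, b):
    return (2 * a) & MASK


def ir_mul(a, b):
    return (a * b) & MASK


print(synthesize_rule(ir_double))  # instruction implementing "2 * a"
print(synthesize_rule(ir_mul))     # None: this toy PE has no multiplier
```

A solver-based approach answers the same question, "is there an instruction that matches this operation for all inputs?", without enumerating every input, which is what makes it practical at realistic bit widths.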




Synthesizing Instruction Selection Rewrite Rules from RTL Using SMT. Ross Daly, Caleb Donovick, Jackson Melchert, Rajsekhar Setaluri, Nestan Tsiskaridze Bullock, Priyanka Raina, Clark Barrett, and Pat Hanrahan. Formal Methods in Computer-Aided Design (FMCAD), 2022.

# **APEX Design Space Exploration Framework**



### Demo

- In this demo, we take one or two example applications and generate a PE specialized for them
- First, we run two applications through the Halide-to-Hardware compiler:
  - `aha map apps/harris`
  - `aha map apps/gaussian`

- Next, we load them into the APEX tool and mine them for frequent subgraphs:
  - `bash apex_demo.sh mine`
- The visualizations of the application compute graph and the mined subgraphs are in `/aha/APEX/pdf/`

- Finally, we generate a customized PE that includes whichever mined subgraphs we want:
  - `bash apex_demo.sh specialize`
- The visualization of the resulting PE is in `/APEX/arch_graph.pdf`
- The Verilog of the resulting PE is in `/APEX/outputs/verilog/PE.v`

# **Evaluation - Baseline PE**

- One ALU
- One multiplier
- Two registers for integer operands
- Bit registers and LUT for bitwise operations
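
The components listed above can be pictured with a small behavioral sketch of one PE instruction. This is a hypothetical model and not the actual baseline PE: the opcode names and the three-input LUT are assumptions, and the operand and bit registers are omitted. The 16-bit word and 1-bit signal widths follow the 16b and 1b connection boxes on this slide.

```python
MASK16 = 0xFFFF


def baseline_pe(op, a, b, bits=(0, 0, 0), lut_table=0):
    """Return (16-bit result, 1-bit result) for one hypothetical instruction.

    op selects the ALU or multiplier operation on the 16-bit operands a, b.
    bits are three 1-bit inputs; lut_table is the LUT's 8-entry truth table.
    """
    if op == "add":
        word = (a + b) & MASK16
    elif op == "sub":
        word = (a - b) & MASK16
    elif op == "mul":
        word = (a * b) & MASK16
    else:
        raise ValueError(f"unknown op: {op}")
    # The three input bits form an index into the truth table.
    index = bits[0] | (bits[1] << 1) | (bits[2] << 2)
    bit = (lut_table >> index) & 1
    return word, bit


# Truth table 0x80 has only entry 7 set, i.e. a three-input AND.
print(baseline_pe("mul", 300, 300, bits=(1, 1, 1), lut_table=0x80))
```

Each instruction uses only one of the units in this model, which is the inefficiency that specialization targets: a PE built around frequent subgraphs can perform several dependent operations in one step.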

[Figure: baseline PE, with inputs from 16b connection boxes (CBs) and from 1b CBs]



# **Image Processing Specialization**

- APEX generates a specialized PE (PE Spec) for each application
- Each PE Spec contains the most common operations from its application




## Architecture of Camera Pipeline PE Spec



# **Image Processing Specialization**

- APEX also generates PE IP, a PE specialized across all image processing applications



## Architecture of PE IP



# **Image Processing Specialization**

- CGRA with PE IP provides significant area and energy improvements over the baseline PE
- CGRA with PE Spec provides significant improvement over PE IP





## **APEX Summary**

- Developed APEX: a framework for automated design space exploration of CGRA PEs
  - Allows for application domain-driven specialization of CGRAs using subgraph mining and merging
  - Includes automated hardware and compiler generation
  - Generates specialized CGRAs that are more area- and energy-efficient than a baseline CGRA