# µARCHDB EVENT VIEWER FOR GEMMINI

*Kevin He\*, Victor Hu\*, Nicolas Castaneda, Ryan Ma\**

kevinjhe@berkeley.edu, victorhu3@berkeley.edu, nicolas.a.castaneda@berkeley.edu, ryan.ma3011@berkeley.edu

# 1. INTRODUCTION

Gemmini is a systolic-array-based DNN accelerator hardware generator. The Gemmini accelerator code, not including the Rocket core with which it interfaces, comprises 15,664 lines of Chisel HDL split across 55 files and many more modules [\[2\]](#page-2-0). For open-source developers and users, understanding Gemmini's microarchitecture and codebase is a major obstacle to debugging and optimization. We present a Gemmini implementation of our hardware-based event tagging and tracking tool, with visualization for debugging, education, and performance analysis.

# 2. MOTIVATION

Gemmini is a highly complex system with custom RoCC instructions based on the RISC-V ISA. It has a DMA; load, store, and execute controllers; a scratchpad; an accumulator; a mesh systolic array; a transposer; a reservation station; an Im2Col unit; and scaling arithmetic units. It also has hardware controllers that decode CISC matmul and convolution commands into RISC instructions, letting the programmer choose between coarse-grained and fine-grained commands. This complexity makes debugging exceptionally difficult. Instructions are both received from the CPU and generated dynamically by Gemmini's internal FSMs. Each instruction has 64+64+32 bits of encoding space and can spawn many loads and stores, with data sent to different functional units depending on the configuration.

Currently, debugging and developing on Gemmini requires an engineer to parse simulation waveforms, which in turn demands highly detailed knowledge of the microarchitecture to track each signal. There is also no Gemmini instruction disassembler comparable to what Spike provides for RISC-V instructions, which forces developers to hand-splice addresses and correlate function codes with instruction dumps to track the instruction flow. There is no easy way to visualize where instructions and data go inside Gemmini. Lastly, Gemmini's complex instruction handling makes it difficult to follow how instructions are decoded. For all but the most experienced Gemmini developers, these barriers slow down code optimization and any future modifications or improvements to the microarchitecture.

# 3. PRIOR WORK

We were inspired by the Gem5 O3 pipeline viewer, which visualizes out-of-order CPU instruction execution [\[4\]](#page-2-1). We use the Konata instruction pipeline visualizer for our frontend; it was also designed for Gem5 processors [\[5\]](#page-2-2). Fields et al. [\[1\]](#page-2-3) proposed a similar hardware token-passing method for tracking event dependencies, which they used to identify critical instruction paths. The authors used the instruction criticality data to reduce resource contention in instruction scheduling.

\*EE290 Project Group Members

We previously worked on annotating the Rocket and Sodor in-order RISC-V cores, both of which have a simpler instruction flow than Gemmini.

# 4. IMPLEMENTATION DETAILS

Our microarchitectural event tracking tool is composed of three parts: Chisel tagging code, a Python graph processing script, and the frontend pipeline visualizer.



Figure 1: Tool Workflow

## 4.1. Chisel GenEvent

The Chisel tagging code comprises a GenEvent object, the event annotations, and related token-passing hardware in the Gemmini microarchitecture. The GenEvent object creates a small snippet of hardware that includes a Chisel printf, a unique ID generator, and a cycle counter. A GenEvent can be called like a function in the Gemmini RTL and can be conditioned with Chisel's *when*, *elsewhen*, and *otherwise* clauses. On each cycle in which the GenEvent is "called" (enabled), it prints a JSON string with the event name, a unique ID, the parent event ID, the cycle count, and an optional data field. A parent event ID can optionally be passed into the GenEvent; the event's ID and its parent's ID together define an edge in the event graph. The GenEvent outputs its unique event ID, which can be passed through the hardware to the next GenEvent location in the microarchitecture. The event name field labels the location or microarchitectural event being captured. The data field can be used to print the values of wires or registers in the design, such as tags, IDs, or addresses. GenEvent also lets developers specify the event ID instead of generating a unique one. This is useful when the microarchitecture already contains IDs that uniquely identify event paths. For example, the *rob\_id* in Gemmini tracks commands throughout their execution, and we use it in our implementation where the Chisel logic is not easily modified to support tag/token passing. Figure 8 in the Appendix lists the GenEvent and EventTag Chisel code.

Figure 2: Gemmini Annotation Locations
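To make the record format concrete, the following Python sketch parses a single GenEvent line of the shape printed by the code in Figure 8. It is an illustration rather than an excerpt from iris.py: the function name `parse_event` and the choice to return `None` for unrelated simulator output are assumptions.

```python
import json


def parse_event(line):
    """Parse one GenEvent log line into a dict, or return None.

    Simulator logs also contain unrelated output, so any line that is not
    a JSON object with the expected keys is skipped.
    """
    line = line.strip()
    if not line.startswith("{"):
        return None
    try:
        raw = json.loads(line)
        parent = raw["parents"]
        return {
            "id": int(raw["id"], 16),
            # Root events print the literal string "None" as their parent.
            "parent": None if parent == "None" else int(parent, 16),
            # int() also accepts surrounding whitespace, in case the
            # simulator pads the decimal cycle count.
            "cycle": int(raw["cycle"]),
            "event_name": raw["event_name"],
            "data": int(raw["data"], 16),
        }
    except (ValueError, KeyError):
        return None


child = parse_event(
    '{"id": "0x0000001d00043276", "parents": "0x0000000700043271", '
    '"cycle": "275062", "event_name": "A_SP->MESH", "data": "0x0"}'
)
root = parse_event(
    '{"id": "0x14", "parents": "None", "cycle": "12", '
    '"event_name": "CMD", "data": "0x0"}'
)
noise = parse_event("some unrelated simulator output")
```

The `(id, parent)` pair of each parsed record is what later becomes an edge in the event graph.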

### *4.1.1. Gemmini Microarchitecture*

We added GenEvent annotations to critical control-flow and datapath locations in Gemmini. In *Controller.scala*, we annotated the IO where Gemmini RoCC commands are received from the CPU; the FSMs that turn them into fine-grained instructions; reservation station issue; load, execute, and store controller issue from the reservation station; and, finally, instruction retirement from the three controllers. In *ExecuteController.scala*, we annotated systolic array flushes and fires, scratchpad and accumulator reads and writes for the A, B, and D matrices, mesh data inputs, and Im2Col requests.

To enable event tag passing between different areas of the RTL, we at times had to modify Chisel bundles and data structures to store and pass tags (see Figure 9 of the Appendix for an example). For instance, in Figure 10 of the Appendix, scratchpad reads generate a different event for each matrix and scratchpad bank number. The generated event tags are enqueued onto mesh\_cntl\_signals\_q and eventually reach the code in Figure 11 of the Appendix, where the mesh input fire event differs depending on the matrix and on where the data originated, with the parent event set to the pipeline\_tag. This tag passing is what allows the event graph to be reconstructed later. The code for annotated Gemmini can be found at [https://github.com/kevinhe5/gemmini/tree/pipeline](https://github.com/kevinhe5/gemmini/tree/pipeline).

## 4.2. Python Graph Processing

The purpose of the Python script is to take the Verilator or VCS log generated during RTL simulation, build a graph of events, and parse it into an output file compatible with the pipeline visualizer. We named the script iris.py.

The script is adaptable to other microarchitectures and takes a JSON configuration as input. Figure 12 shows the config file used for Gemmini. The config specifies event names, start stages, split stages, and end stages. Event names correspond to the eventName parameter of the Chisel GenEvents. Start stages specify where an instruction can originate; for Gemmini, these are CMD in *Controller.scala*, plus LOOP\_CONV and LOOP\_MATMUL, which refer to the LoopConv and LoopMatmul CISC instruction FSMs.

End stages specify where an instruction can retire. In a CPU pipeline, this could be the writeback stage. In Gemmini, instructions can end up in multiple locations. For example, some config instructions only configure the LoopConv and LoopMatmul modules. Controller config instructions are sent to the execute, store, and load controllers, where they set internal controller state for future execution; these instructions retire almost immediately. Move-in, move-out, preload, and compute instructions have much longer latency and retire in the LD\_RET, EX\_RET, and ST\_RET stages, when the respective controller sends a completed signal to the main controller.
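A stage config of this kind can be sanity-checked before processing a log. The sketch below is a hypothetical helper, not part of iris.py as described here: it assumes that every start, split, and end stage is expected to appear in `event_names` and that `event_types` has one entry per event name. Neither rule is stated in the paper, and iris.py may not enforce them. The small config used in the example is invented for illustration.

```python
import json

STAGE_KEYS = ("start_stages", "split_stages", "end_stages")


def check_schema(config):
    """Return a list of human-readable problems found in a stage config."""
    problems = []
    for key in ("event_names",) + STAGE_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    names = config.get("event_names", [])
    known = set(names)
    # Assumed rule: stages must refer to declared event names.
    for key in STAGE_KEYS:
        for stage in config.get(key, []):
            if stage not in known:
                problems.append(f"{key}: '{stage}' is not in event_names")
    # Assumed rule: event_types, when present, parallels event_names.
    types = config.get("event_types")
    if types is not None and len(types) != len(names):
        problems.append("event_types and event_names differ in length")
    return problems


# Invented three-stage config, for illustration only.
good = json.loads("""
{
  "event_names": ["CMD", "LD_ISSUE", "LD_RET"],
  "event_types": ["inst_bytes", "bytes", "bytes"],
  "start_stages": ["CMD"],
  "split_stages": ["CMD"],
  "end_stages": ["LD_RET"]
}
""")
bad = dict(good, end_stages=["LD_RETIRE"])

print(check_schema(good))
print(check_schema(bad))
```

A check like this catches misspelled stage names early, before they silently produce paths that never start or never terminate.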

We also decode Gemmini instructions in iris.py. A RoCC command contains a 32-bit instruction and two 64-bit register values, rs1 and rs2. These bits set the number of rows and columns, strides, addresses, and various other configuration parameters in Gemmini. Our GenEvent prints the raw instruction encoding, which iris.py masks to recover the instruction type and its parameters for display in the event viewer. The script extracts the fields defined for each instruction type, which lets the viewer show fine-grained instruction flows.
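The masking step can be sketched as follows. The 32-bit field positions follow the standard RoCC instruction format (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode). Everything else is an assumption made for illustration: the helper names, the example values, and in particular `HYPOTHETICAL_RS2_LAYOUT`, which stands in for whatever per-instruction operand layout the real decoder uses. How the 160 bits are packed into the GenEvent data field is also not specified in the text, so the sketch takes the three values separately.

```python
def bits(value, hi, lo):
    """Return the inclusive bit range value[hi:lo] as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_rocc(inst):
    """Split a 32-bit RoCC instruction word into its standard fields."""
    return {
        "funct7": bits(inst, 31, 25),  # selects the accelerator command
        "rs2": bits(inst, 24, 20),
        "rs1": bits(inst, 19, 15),
        "xd": bits(inst, 14, 14),
        "xs1": bits(inst, 13, 13),
        "xs2": bits(inst, 12, 12),
        "rd": bits(inst, 11, 7),
        "opcode": bits(inst, 6, 0),
    }


# Hypothetical layout of a 64-bit operand: name -> (hi, lo) bit range.
HYPOTHETICAL_RS2_LAYOUT = {
    "local_addr": (31, 0),
    "cols": (47, 32),
    "rows": (63, 48),
}


def decode_operand(value, layout):
    """Apply a field layout to a 64-bit register value."""
    return {name: bits(value, hi, lo) for name, (hi, lo) in layout.items()}


# Example word: funct7=2, rs2=x11, rs1=x10, xs1=xs2=1, custom-3 opcode 0x7B.
inst = (2 << 25) | (11 << 20) | (10 << 15) | (1 << 13) | (1 << 12) | 0x7B
rs2_value = (16 << 48) | (16 << 32) | 0x20

fields = decode_rocc(inst)
operand = decode_operand(rs2_value, HYPOTHETICAL_RS2_LAYOUT)
print(fields)
print(operand)
```

Keeping layouts in a table keyed by instruction type, rather than hard-coding shifts per instruction, makes it straightforward to add a new instruction by adding one entry.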

We construct the event graph with NetworkX [\[3\]](#page-2-4). Since each event output by a GenEvent carries its own event ID and a parent ID, we can draw an edge between the event and its parent. However, we must first pre-process the event log from the RTL simulator. Figure 13 shows a small portion of VCS output for scratchpad read events and the related scratchpad rows being sent to the mesh. Because GenEvents let users specify their own event IDs, and allow an event's ID to equal its parent's ID, we must first uniquify the event IDs. Events that share an ID are sorted by cycle count and each is given a unique ID, preserving the edge between adjacent events. For example, as an instruction passes from the reservation station to the execute controller to the spatial array mesh and then retires, the ROB ID is used as the event ID for all of these events. Because each event occurs after the previous one, they have different cycle counter values, so their uniquified IDs form edges in execution order.
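The uniquification step described above can be sketched in plain Python; dictionaries are used instead of NetworkX only to keep the example dependency-free. The function name `build_edges` is hypothetical, and two behaviors are assumptions rather than details from the paper: an event whose parent ID is shared by several events is attached to the most recent of them, and the sketch does not handle an ID (such as a ROB ID) being recycled by a later, unrelated instruction.

```python
from collections import defaultdict


def build_edges(events):
    """Uniquify events and link them into parent -> child edges.

    events: dicts with 'id', 'parent' (or None), 'cycle', 'event_name'.
    Returns (ordered, edges); the position of an event in `ordered` is
    its new unique ID, and edges holds (parent_index, child_index) pairs.
    """
    # Sorting by cycle gives every event a unique index in time order.
    ordered = sorted(events, key=lambda e: e["cycle"])
    seen = defaultdict(list)  # original ID -> indices seen so far
    edges = []
    for index, event in enumerate(ordered):
        same_id = seen[event["id"]]
        if same_id:
            # Reused ID: chain to the previous event that carried it.
            edges.append((same_id[-1], index))
        elif event["parent"] is not None and seen[event["parent"]]:
            # Fresh ID: attach to the latest event carrying the parent ID.
            edges.append((seen[event["parent"]][-1], index))
        seen[event["id"]].append(index)
    return ordered, edges


# Invented log: one command whose ROB ID 0x14 is reused by three events.
events = [
    {"id": 0x100, "parent": None, "cycle": 10, "event_name": "CMD"},
    {"id": 0x14, "parent": 0x100, "cycle": 12, "event_name": "ROB_ISSUE"},
    {"id": 0x14, "parent": 0x14, "cycle": 15, "event_name": "EX_ISSUE"},
    {"id": 0x200, "parent": 0x14, "cycle": 17, "event_name": "SP_RD_A0"},
    {"id": 0x14, "parent": 0x14, "cycle": 30, "event_name": "EX_RET"},
]
ordered, edges = build_edges(events)
print(edges)
```

With NetworkX, the resulting pairs could be loaded with `DiGraph.add_edges_from(edges)`, keeping the event dicts as node attributes.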

Once the graph is constructed, we perform a depth-first search through it to extract the instruction paths. Each event is a graph node containing the corresponding cycle counter value and data field. From these paths we generate the Konata log, where each path occupies one line in the viewer.
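A minimal sketch of the path extraction, assuming the event graph is acyclic and given as a list of `(parent, child)` index pairs: an iterative depth-first search enumerates every root-to-leaf path. The function name and the example edges are hypothetical, and the way iris.py uses the configured split stages to decide where paths branch is not described here, so the sketch simply splits at every fork.

```python
def extract_paths(edges, roots):
    """Return every root-to-leaf path as a list of node indices."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    paths = []
    for root in roots:
        stack = [(root, [root])]
        while stack:
            node, path = stack.pop()
            kids = children.get(node, [])
            if not kids:
                # A node with no children terminates one path.
                paths.append(path)
            # Push in reverse so the first child is explored first.
            for kid in reversed(kids):
                stack.append((kid, path + [kid]))
    return paths


# Node 2 forks: one branch ends at node 3, the other at node 4.
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
paths = extract_paths(edges, roots=[0])
print(paths)
```

Each returned path shares its prefix with its siblings, which matches the idea that one received command can fan out into several rows in the viewer.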

The code for iris.py can be found at [https://github.com/ncastaneda02/uarchdb/tree/Gemmini](https://github.com/ncastaneda02/uarchdb/tree/Gemmini). The JSON config file for Gemmini is gemmini.json. An example Konata log for Gemmini can be found in gemmini.log.

## 4.3. Pipeline Viewer

We created a custom fork of the Konata instruction pipeline visualizer. This fork contains some quality-of-life changes, such as freezing the vertical height when zooming, which makes the experience similar to other waveform viewers. Konata ingests the Kanata Log Format file generated in the previous step. The fork can be found at <https://github.com/victorhu3/Konata>.
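For readers unfamiliar with the log that Konata ingests, the sketch below writes extracted paths as tab-separated Kanata commands. It assumes the version 0004 command set as commonly documented for Konata (`C=` and `C` for cycles, `I` to declare an instruction, `L` for its label, `S` and `E` to start and end a stage, `R` to retire); the exact field meanings should be checked against Konata's documentation. The function name, the single-lane layout, and the example cycles are assumptions, not the output of iris.py.

```python
def to_kanata(paths):
    """Render paths as Kanata log text.

    paths: list of (label, [(stage_name, cycle), ...]) tuples, with the
    stages of each path in increasing cycle order.
    """
    by_cycle = {}

    def emit(cycle, command):
        by_cycle.setdefault(cycle, []).append(command)

    # Number the rows in order of first appearance.
    ordered = sorted(paths, key=lambda p: p[1][0][1])
    for uid, (label, stages) in enumerate(ordered):
        first_cycle = stages[0][1]
        emit(first_cycle, f"I\t{uid}\t{uid}\t0")      # declare the row
        emit(first_cycle, f"L\t{uid}\t0\t{label}")    # left-pane label
        previous = None
        for name, cycle in stages:
            if previous is not None:
                emit(cycle, f"E\t{uid}\t0\t{previous}")
            emit(cycle, f"S\t{uid}\t0\t{name}")
            previous = name
        # Close the final stage one cycle after it starts, then retire.
        last_cycle = stages[-1][1]
        emit(last_cycle + 1, f"E\t{uid}\t0\t{previous}")
        emit(last_cycle + 1, f"R\t{uid}\t{uid}\t0")

    lines = ["Kanata\t0004"]
    current = None
    for cycle in sorted(by_cycle):
        # The first cycle is absolute; later ones are deltas.
        if current is None:
            lines.append(f"C=\t{cycle}")
        else:
            lines.append(f"C\t{cycle - current}")
        lines.extend(by_cycle[cycle])
        current = cycle
    return "\n".join(lines) + "\n"


log_text = to_kanata(
    [("k_MVIN", [("CMD", 100), ("LD_ISSUE", 103), ("LD_RET", 110)])]
)
print(log_text)
```

Commands are grouped by cycle before writing because the format advances time with relative `C` commands, so events from different paths must be interleaved in time order.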

# 5. RESULTS

Refer to Figure 5 of the Appendix for an example visualization of a weight-stationary tiled matmul workflow. The blue bars represent instructions sitting in the ROB, and the green bars show when they reach their respective controllers. Figure 7 is an example of a convolution workflow. In Figure 6, the instruction decoding feature is overlaid.

The pipeline viewer lets the user see interesting interactions in Gemmini. In Figure 3, an instruction waits in the reservation station and cannot be issued to the execute controller until the load instruction returns. Konata thus visualizes dependencies that would otherwise have been hard to identify in a waveform.



Figure 3: Execute Stalling

Using the Python script, we were also able to generate a top-down visualization of the event graph with GraphViz. The root of each tree is an instruction issued by the host processor.



Figure 4: GraphViz Output

Since we are ultimately constructing a graph, our GenEvent tagging solution lets designers choose the level of abstraction at which they annotate the microarchitecture. We mostly painted in broad strokes, annotating the largest controllers and the most common memory transactions; however, a more experienced Gemmini developer should have no difficulty annotating the finest details of Gemmini's signals and dependencies. NetworkX scales to very large graphs, and we observed no noticeable delay when running iris.py on our event paths.

# 6. CONCLUSION

In this paper, we introduced a tool to annotate and visualize events for Gemmini. We hope that this tool will speed up debugging, encourage open-source development, and provide valuable insights into the Gemmini microarchitecture.

# 7. CONTRIBUTIONS

Kevin He annotated Gemmini with GenEvent tags and worked with Nico to design the iris.py graph parsing script.

Victor Hu helped annotate Gemmini, improved the iris.py script, and worked on quality of life improvements of the Konata pipeline viewer.

Ryan Ma wrote the Gemmini instruction disassembler in iris.py.

We thank Nico Castaneda, Jerry Zhao, and Dima Nikiforov for their help and advice on this project.

# References

- <span id="page-2-3"></span>[1] B. Fields, S. Rubin, and R. Bodik. "Focusing processor policies via critical-path prediction". In: *Proceedings 28th Annual International Symposium on Computer Architecture*. 2001, pp. 74–85. DOI: [10.1109/ISCA.2001.937434](https://doi.org/10.1109/ISCA.2001.937434).
- <span id="page-2-0"></span>[2] Hasan Genc et al. "Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration". In: *Proceedings of the 58th Annual Design Automation Conference (DAC)*. 2021.
- <span id="page-2-4"></span>[3] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. "Exploring Network Structure, Dynamics, and Function using NetworkX". In: *Proceedings of the 7th Python in Science Conference*. Ed. by Gaël Varoquaux, Travis Vaught, and Jarrod Millman. 2008, pp. 11–15. URL: [http://conference.scipy.org/proceedings/SciPy2008/paper\_2/](http://conference.scipy.org/proceedings/SciPy2008/paper_2/).
- <span id="page-2-1"></span>[4] *O3 Pipeline Viewer*. URL: [https://www.gem5.org/documentation/general\_docs/cpu\_models/visualization/](https://www.gem5.org/documentation/general_docs/cpu_models/visualization/).
- <span id="page-2-2"></span>[5] Ryota Shioya. *Konata: An instruction pipeline visualizer for Onikiri2-Kanata/GEM5-o3pipeview formats.* URL: [https://github.com/shioyadan/Konata](https://github.com/shioyadan/Konata).

# 8. APPENDIX

## 8.1. Reproducibility

For development of the tool, we used the GemminiRocketConfig config and the tiled\_matmul\_ws\_At-baremetal test in Chipyard: *make run-binary CONFIG=GemminiRocketConfig BINARY=/tools/designs/kevinhe/chipyard6/generators/gemmini/software/gemmini-rocc-tests/build/bareMetalC/tiled\_matmul\_ws\_At-baremetal*

To run the Python processing script, ensure that the pandas, numpy, and networkx packages are installed, then run: *python3 iris.py --log\_file [path/to/VCS\_out\_file] --schema\_file gemmini.json --output\_file gemmini.log --verbose --gemmini*

This writes the Konata log to gemmini.log. Once Konata is launched, the log file can be dragged and dropped into the window for visualization.



Figure 5: tiled\_matmul\_ws\_At Konata View

| Konata row label (instruction and decoded fields) |
|---|
| 0: s268632 (t0: r0): k_FLUSH |
| 1: s274569 (t0: r1): k_CONFIG CONFIG_EX Output stationary: WEIGHT STATIONARY, Activation: NO ACTIVATION, stride: 0x104, scalar: 0x10000, righ… |
| 2: s274574 (t0: r2): k_CONFIG CONFIG_ST |
| 3: s274598 (t0: r3): k_CONFIG CONFIG_LD Spad stride: 0x101, scale: 0x100000, mem stride: 0x13 |
| 4: s274605 (t0: r4): k_CONFIG CONFIG_LD Spad stride: 0x109, scale: 0x100000, mem stride: 0x11 |
| 5: s274612 (t0: r5): k_CONFIG CONFIG_LD Spad stride: 0x111, scale: 0x100000, mem stride: 0x44 |
| 6: s274799 (t0: r6): k_LOOP_WS_CONFIG_BOUNDS Padding: I: 0xd, J: 0xd, K: 0xf0000, Addresses: I: 0x2, J: 0x2, K: 0x20000 |
| 7: s274800 (t0: r7): k_LOOP_WS_CONFIG_ADDRS_AB A addr: 0x800034a0, B addr: 0x80003360 |
| 8: s274822 (t0: r8): k_LOOP_WS_CONFIG_ADDRS_DC D addr: 0x0, C addr: 0x80003210 |
| 9: s274826 (t0: r9): k_LOOP_WS_CONFIG_STRIDES_AB A stride: 19, B stride: 17 |
| 10: s274835 (t0: r10): k_LOOP_WS_CONFIG_STRIDES_DC D stride: 17, C stride: 17 |
| 11: s274881 (t0: r11): k_LOOP_WS Activation: NO ACTIVATION, Low D: 0, Full C: 0, Ex Accumulate: 0, B Transpose: 0, A Transpose: 1 |
| 12: s274886 (t0: r12): k_MVIN2 DRAM addr: 0x80003360, Scratchpad addr: 0x1fc0, 8128 cols loaded, 8128 rows loaded |
| 13: s274887 (t0: r13): k_MVIN DRAM addr: 0x800034a0, Scratchpad addr: 0x0, 0 cols loaded, 0 rows loaded |
| 14: s274888 (t0: r14): k_PRELOAD D Scratchpad addr: 0x1fc0, 8128 cols, 8128 rows, C Scratchpad addr: 0x80000000, 0 cols, 0 rows |
| 15: s274889 (t0: r15): k_COMPUTE_PRELOADED A Scratchpad addr: 0x0, 0 cols, 0 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 16: s274890 (t0: r16): k_PRELOAD D Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows, C Scratchpad addr: 0x80000020, 0 cols, 32 rows |
| 17: s274891 (t0: r17): k_COMPUTE_ACCUMULATE A Scratchpad addr: 0x10, 16 cols, 16 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 18: s274892 (t0: r18): k_PRELOAD D Scratchpad addr: 0x1fd0, 8144 cols, 8144 rows, C Scratchpad addr: 0x80000010, 0 cols, 16 rows |
| 19: s274893 (t0: r19): k_COMPUTE_PRELOADED A Scratchpad addr: 0x0, 0 cols, 0 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 20: s274894 (t0: r20): k_PRELOAD D Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows, C Scratchpad addr: 0x80000030, 0 cols, 48 rows |
| 21: s274895 (t0: r21): k_COMPUTE_ACCUMULATE A Scratchpad addr: 0x10, 16 cols, 16 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 22: s274896 (t0: r22): k_MVIN DRAM addr: 0x800035d0, Scratchpad addr: 0x20, 32 cols loaded, 32 rows loaded |
| 23: s274897 (t0: r23): k_MVIN2 DRAM addr: 0x80003470, Scratchpad addr: 0x1fe0, 8160 cols loaded, 8160 rows loaded |
| 24: s274898 (t0: r24): k_PRELOAD D Scratchpad addr: 0x1fe0, 8160 cols, 8160 rows, C Scratchpad addr: 0xc0000000, 0 cols, 0 rows |
| 25: s274899 (t0: r25): k_COMPUTE_PRELOADED A Scratchpad addr: 0x20, 32 cols, 32 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 26: s274900 (t0: r26): k_PRELOAD D Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows, C Scratchpad addr: 0xc0000020, 0 cols, 32 rows |
| 27: s274901 (t0: r27): k_COMPUTE_ACCUMULATE A Scratchpad addr: 0x30, 48 cols, 48 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 28: s274902 (t0: r28): k_PRELOAD D Scratchpad addr: 0x1ff0, 8176 cols, 8176 rows, C Scratchpad addr: 0xc0000010, 0 cols, 16 rows |
| 29: s274903 (t0: r29): k_COMPUTE_PRELOADED A Scratchpad addr: 0x20, 32 cols, 32 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 30: s274904 (t0: r30): k_MVOUT DRAM addr: 0x80003210, Scratchpad addr: 0x80000000, 0 cols loaded, 0 rows loaded |
| 31: s274905 (t0: r31): k_PRELOAD D Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows, C Scratchpad addr: 0xc0000030, 0 cols, 48 rows |
| 32: s274906 (t0: r32): k_COMPUTE_ACCUMULATE A Scratchpad addr: 0x30, 48 cols, 48 rows, B Scratchpad addr: 0xe0007fff, 32767 cols, 32767 rows |
| 33: s274907 (t0: r-1): k_MVOUT DRAM addr: 0x80003320, Scratchpad addr: 0x80000020, 32 cols loaded, 32 rows loaded |

Figure 6: Konata View



Figure 7: conv-baremetal Konata View

## 8.2. Code Snippets

```
object GenEvent {
  var instance_ctr: Int = 0
  def apply(eventName: String, data: UInt, parent: Option[EventTag], id: Option[UInt] = None): EventTag = {
    var new_id = Wire(UInt(64.W))
    val id_ctr = RegInit(0.U(32.W))
    id_ctr := id_ctr + 1.U
    new_id := Cat(instance_ctr.asUInt(32.W), id_ctr)
    if (parent.isDefined) {
      if (id.isDefined) {
        printf(cf"{\"id\": \"0x${id.get}%x\", \"parents\": \"0x${parent.get.id}%x\", \"cycle\": \"$id_ctr\", \"event_name\": \"$eventName\", \"data\": \"0x$data%x\"}\n")
      } else {
        printf(cf"{\"id\": \"0x$new_id%x\", \"parents\": \"0x${parent.get.id}%x\", \"cycle\": \"$id_ctr\", \"event_name\": \"$eventName\", \"data\": \"0x$data%x\"}\n")
      }
    } else {
      if (id.isDefined) {
        printf(cf"{\"id\": \"0x${id.get}%x\", \"parents\": \"None\", \"cycle\": \"$id_ctr\", \"event_name\": \"$eventName\", \"data\": \"0x$data%x\"}\n")
      } else {
        printf(cf"{\"id\": \"0x$new_id%x\", \"parents\": \"None\", \"cycle\": \"$id_ctr\", \"event_name\": \"$eventName\", \"data\": \"0x$data%x\"}\n")
      }
    }
    instance_ctr += 1
    return EventTag(new_id)
  }
}

class EventTag extends Bundle {
  val id = UInt(64.W)
}

object EventTag {
  def apply(id: UInt): EventTag = {
    val tag = Wire(new EventTag)
    tag.id := id
    return tag
  }
}
```
Figure 8: GenEvent and EventTag Chisel code

```
class ComputeCntlSignals extends Bundle {
  ...
  // For pipeline viewer
  val pipeline_tag_a = new EventTag
  val pipeline_tag_b = new EventTag
  val pipeline_tag_d = new EventTag
}
```
Figure 9: EventTag fields added to the ComputeCntlSignals bundle for tag passing


```
when(io.srams.read(i).req.fire) {
  when (read_a && a_ready) {
    mesh_cntl_signals_q.io.enq.bits.pipeline_tag_a := GenEvent(s"SP_RD_A$i", io.srams.read(i).req.bits.addr, Some(EventTag(cmd.bits(preload_cmd_place).rob_id.bits)))
  }
  when (read_b && b_ready) {
    mesh_cntl_signals_q.io.enq.bits.pipeline_tag_b := GenEvent(s"SP_RD_B$i", io.srams.read(i).req.bits.addr, Some(EventTag(cmd.bits(preload_cmd_place).rob_id.bits)))
  }
  when (read_d && d_ready) {
    mesh_cntl_signals_q.io.enq.bits.pipeline_tag_d := GenEvent(s"SP_RD_D$i", io.srams.read(i).req.bits.addr, Some(EventTag(cmd.bits(preload_cmd_place).rob_id.bits)))
  }
}
```
Figure 10: GenEvent tagging code for Scratchpad reads in the ExecuteController.scala. For these GenEvents, the parent tag is the rob\_id and its own event id is a uniquely generated tag

```
// For pipeline viewer
when(mesh.io.a.fire) {
  when(cntl.a_garbage) {
  }.elsewhen(cntl.a_unpadded_cols === 0.U) {
    GenEvent("A_0_PAD", 0.U, Some(EventTag(cntl.rob_id.bits)))
  }.elsewhen(cntl.im2colling) {
    GenEvent("A_IM2COL", 0.U, Some(cntl.pipeline_tag_a))
  }.elsewhen(cntl.a_read_from_acc) {
    GenEvent("A_ACC->MESH", cntl.a_bank_acc, Some(cntl.pipeline_tag_a))
  }.otherwise {
    GenEvent("A_SP->MESH", cntl.a_bank, Some(cntl.pipeline_tag_a))
  }
}
```
Figure 11: GenEvent tags for mesh input fires corresponding to Scratchpad read GenEvents in the previous figure. The GenEvent IDs from the previous figure are passed as the parent ID in the above GenEvents to construct a causal relationship where after a row is read from the Scratchpad, it is sent into the mesh.

```
{
  "event_names": ["CMD", "LOOP_CONV", "LOOP_MM", "ROB_ISSUE", "LD_ISSUE", "ST_ISSUE", "EX_ISSUE", "ST_RET",
                  "LD_RET", "EX_RET", "MESH_FIRE", "A_GARBAGE", "B_GARBAGE", "D_GARBAGE", "A_0_PAD", "B_0_PAD", "D_0_PAD",
                  "A_ACC->MESH", "B_ACC->MESH", "D_ACC->MESH", "A_SP->MESH", "B_SP->MESH", "D_SP->MESH", "ACC_WR_0",
                  "ACC_WR_1", "SP_RD_A0", "SP_RD_D1", "LOOP_MM_CMD"],
  "event_types": ["inst_bytes", "bytes", "bytes", "inst_bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes",
                  "bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes",
                  "bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes", "bytes"],
  "start_stages": ["CMD", "LOOP_CONV_CMD", "LOOP_MM_CMD"],
  "split_stages": ["CMD", "LOOP_CONV", "MM_LOOP"],
  "end_stages": ["ST_RET", "LD_RET", "EX_RET", "ST_ISSUE", "LD_ISSUE", "LOOP_MM", "LOOP_CONV", "A_ACC->MESH",
                 "B_ACC->MESH", "D_ACC->MESH", "A_SP->MESH", "B_SP->MESH", "D_SP->MESH", "A_0_PAD",
                 "B_0_PAD", "D_0_PAD", "ACC_WR_0", "ACC_WR_1"]
}
```
Figure 12: iris.py Config JSON for Gemmini Stages

| id | parents | cycle | event_name | data |
|----|---------|-------|------------|------|
| 0x14 | 0x0000000000000014 | 275062 | MESH_FIRE | 0x15200000042018140 |
| 0x0000001d00043276 | 0x0000000700043271 | 275062 | A_SP->MESH | 0x0 |
| … | … | 275063 | SP_RD_D1 | 0xfde |
| 0x0000002300043277 | 0x0000000000043272 | 275063 | D_SP->MESH | 0x1 |
| … | … | 275064 | SP_RD_A0 | 0x002 |
| 0x0000000000043278 | 0x0000000000000014 | 275064 | SP_RD_D1 | 0xfdd |
| 0x0000001d00043278 | 0x0000000700043273 | 275064 | A_SP->MESH | 0x0 |
| … | … | 275065 | SP_RD_A0 | 0x003 |
| … | … | 275067 | SP_RD_D1 | 0xfdc |
| 0x000000070004327c | 0x0000000000000014 | 275068 | SP_RD_A0 | 0x004 |
| 0x000000000004327c | 0x0000000000000014 | 275068 | SP_RD_D1 | 0xfdb |

Figure 13: Snippet from VCS output from GenEvents