Controller Diagram for MatMult (Options: SyncMem, Retimed)

Instrumentation Annotations (ArgIns:32 64 64, ArgIOs: )

x107: Sequenced (OuterControl)
25666 cycles/iter
(25666 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:111:11 - Accel {

x3095: Sequenced (OuterControl)
25655 cycles/iter
(25655 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:112:23 - Foreach(P by bp){k =>
Counter: x339

x3094: Pipelined (OuterControl)
25650 cycles/iter
(25650 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:115:58 - 'MAINPIPE.Foreach(M by bm par pM, N by bn par pN){(i,j) =>
Counter: x354

x675: ForkJoin (OuterControl)
9073 cycles/iter
(9073 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:118:20 - Parallel {
NBuf Connections

x369 (BankedSRAM "tileB_0")

x368 (BankedSRAM "tileB_0")

x370 (BankedSRAM "tileB_0")

x367 (BankedSRAM "tileB_0")

x364 (BankedSRAM "tileA_0")

x363 (BankedSRAM "tileA_0")

x366 (BankedSRAM "tileA_0")

x365 (BankedSRAM "tileA_0")

x446: ForkJoin (OuterControl)
5425 cycles/iter
(5425 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:118:20 - Parallel {

x407: Streaming (OuterControl)
1752 cycles/iter
(1752 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)
Counter: x372

x388: Sequenced (InnerControl)
18 cycles/iter
(288 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=15, II=1

Stream Info

Set(x375)----->

x406: Pipelined (InnerControl)
65 cycles/iter
(1040 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=7, II=1


Counter: x391
Stream Info

----->Set(x376)

x445: Streaming (OuterControl)
5420 cycles/iter
(5420 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)
Counter: x409

x425: Sequenced (InnerControl)
18 cycles/iter
(1152 total cycles, 64 total iters)
[64 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=15, II=1

Stream Info

Set(x412)----->

x444: Pipelined (InnerControl)
71 cycles/iter
(4592 total cycles, 64 total iters)
[64 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=7, II=1


Counter: x428
Stream Info

----->Set(x413)

x522: ForkJoin (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:118:20 - Parallel {

x483: Streaming (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)
Counter: x448

x464: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=15, II=1

Stream Info

Set(x451)----->

x482: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=7, II=1


Counter: x467
Stream Info

----->Set(x452)

x521: Streaming (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)
Counter: x485

x501: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=15, II=1

Stream Info

Set(x488)----->

x520: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=7, II=1


Counter: x504
Stream Info

----->Set(x489)

x598: ForkJoin (OuterControl)
9068 cycles/iter
(9068 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:118:20 - Parallel {

x559: Streaming (OuterControl)
5411 cycles/iter
(5411 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)
Counter: x524

x540: Sequenced (InnerControl)
18 cycles/iter
(288 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=15, II=1

Stream Info

Set(x527)----->

x558: Pipelined (InnerControl)
65 cycles/iter
(1040 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=7, II=1


Counter: x543
Stream Info

----->Set(x528)

x597: Streaming (OuterControl)
9063 cycles/iter
(9063 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)
Counter: x561

x577: Sequenced (InnerControl)
18 cycles/iter
(1152 total cycles, 64 total iters)
[64 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=15, II=1

Stream Info

Set(x564)----->

x596: Pipelined (InnerControl)
71 cycles/iter
(4592 total cycles, 64 total iters)
[64 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=7, II=1


Counter: x580
Stream Info

----->Set(x565)

x674: ForkJoin (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:118:20 - Parallel {

x635: Streaming (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)
Counter: x600

x616: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=15, II=1

Stream Info

Set(x603)----->

x634: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:119:19 - tileA load A(i::i+bm, k::k+bp)

Latency=7, II=1


Counter: x619
Stream Info

----->Set(x604)

x673: Streaming (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)
Counter: x637

x653: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=15, II=1

Stream Info

Set(x640)----->

x672: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:120:19 - tileB load B(k::k+bp, j::j+bn)

Latency=7, II=1


Counter: x656
Stream Info

----->Set(x641)

x2924: ForkJoin (OuterControl)
14108 cycles/iter
(14108 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:122:50 - Foreach(bm by 1 par pm, bn by 1 par pn){ (ii,jj) =>
NBuf Connections

x369 (BankedSRAM "tileB_0")

x368 (BankedSRAM "tileB_0")

x370 (BankedSRAM "tileB_0")

x367 (BankedSRAM "tileB_0")

x364 (BankedSRAM "tileA_0")

x346 (BankedSRAM "tileC_4")

x363 (BankedSRAM "tileA_0")

x342 (BankedSRAM "tileC_0")

x343 (BankedSRAM "tileC_1")

x349 (BankedSRAM "tileC_7")

x366 (BankedSRAM "tileA_0")

x365 (BankedSRAM "tileA_0")

x1246: Pipelined (OuterControl)
14103 cycles/iter
(14103 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:122:50 - Foreach(bm by 1 par pm, bn by 1 par pn){ (ii,jj) =>
Counter: x684

x1148: ForkJoin (OuterControl)
53 cycles/iter
(13568 total cycles, 256 total iters)
[256 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}
NBuf Connections

x702 (FF "prod_0")

x700 (FF "prod_0")

x698 (FF "prod_0")

x696 (FF "prod_0")

x820: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x708
x929: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x709
x1038: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x710
x1147: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x711
x1245: ForkJoin (OuterControl)
18 cycles/iter
(4608 total cycles, 256 total iters)
[256 iters/parent execution]


?:0:0 -
NBuf Connections

x702 (FF "prod_0")

x700 (FF "prod_0")

x698 (FF "prod_0")

x696 (FF "prod_0")

x1172: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1196: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1220: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1244: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1805: Pipelined (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:122:50 - Foreach(bm by 1 par pm, bn by 1 par pn){ (ii,jj) =>
Counter: x685

x1707: ForkJoin (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}
NBuf Connections

x1261 (FF "prod_0")

x1259 (FF "prod_0")

x1257 (FF "prod_0")

x1255 (FF "prod_0")

x1379: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1267
x1488: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1268
x1597: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1269
x1706: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1270
x1804: ForkJoin (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -
NBuf Connections

x1261 (FF "prod_0")

x1259 (FF "prod_0")

x1257 (FF "prod_0")

x1255 (FF "prod_0")

x1731: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1755: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1779: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x1803: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2364: Pipelined (OuterControl)
14103 cycles/iter
(14103 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:122:50 - Foreach(bm by 1 par pm, bn by 1 par pn){ (ii,jj) =>
Counter: x686

x2266: ForkJoin (OuterControl)
53 cycles/iter
(13568 total cycles, 256 total iters)
[256 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}
NBuf Connections

x1820 (FF "prod_0")

x1818 (FF "prod_0")

x1816 (FF "prod_0")

x1814 (FF "prod_0")

x1938: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1826
x2047: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1827
x2156: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1828
x2265: Pipelined (InnerControl)
48 cycles/iter
(12288 total cycles, 256 total iters)
[1 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x1829
x2363: ForkJoin (OuterControl)
18 cycles/iter
(4608 total cycles, 256 total iters)
[256 iters/parent execution]


?:0:0 -
NBuf Connections

x1820 (FF "prod_0")

x1818 (FF "prod_0")

x1816 (FF "prod_0")

x1814 (FF "prod_0")

x2290: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2314: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2338: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2362: Sequenced (InnerControl)
13 cycles/iter
(3328 total cycles, 256 total iters)
[1 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2923: Pipelined (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:122:50 - Foreach(bm by 1 par pm, bn by 1 par pn){ (ii,jj) =>
Counter: x687

x2825: ForkJoin (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}
NBuf Connections

x2379 (FF "prod_0")

x2377 (FF "prod_0")

x2375 (FF "prod_0")

x2373 (FF "prod_0")

x2497: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x2385
x2606: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x2386
x2715: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x2387
x2824: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:123:92 - val prod = Reduce(Reg[T])(bp by 1 par ip){kk => tileA(ii, kk) * tileB(kk, jj) }{_+_}

Latency=29, II=1


Counter: x2388
x2922: ForkJoin (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -
NBuf Connections

x2379 (FF "prod_0")

x2377 (FF "prod_0")

x2375 (FF "prod_0")

x2373 (FF "prod_0")

x2849: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2873: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2897: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x2921: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


?:0:0 -

Latency=10, II=4

x3093: ForkJoin (OuterControl)
2460 cycles/iter
(2460 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC
NBuf Connections

x349 (BankedSRAM "tileC_7")

x346 (BankedSRAM "tileC_4")

x343 (BankedSRAM "tileC_1")

x342 (BankedSRAM "tileC_0")

x2972: Streaming (OuterControl)
1375 cycles/iter
(1375 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC
Counter: x2929

x2949: Sequenced (InnerControl)
18 cycles/iter
(288 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=15, II=1

Stream Info

Set(x2935)----->

x2967: Pipelined (InnerControl)
75 cycles/iter
(1200 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=8, II=1


Counter: x2951
Stream Info

Set(x2936)----->

x2971: Sequenced (InnerControl)
2 cycles/iter
(32 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=1, II=1

Stream Info

----->Set(x2937)

x3012: Streaming (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC
Counter: x2930

x2989: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=15, II=1

Stream Info

Set(x2975)----->

x3007: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=8, II=1


Counter: x2991
Stream Info

Set(x2976)----->

x3011: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=1, II=1

Stream Info

----->Set(x2977)

x3052: Streaming (OuterControl)
2455 cycles/iter
(2455 total cycles, 1 total iters)
[1 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC
Counter: x2931

x3029: Sequenced (InnerControl)
18 cycles/iter
(288 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=15, II=1

Stream Info

Set(x3015)----->

x3047: Pipelined (InnerControl)
142 cycles/iter
(2279 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=8, II=1


Counter: x3031
Stream Info

Set(x3016)----->

x3051: Sequenced (InnerControl)
2 cycles/iter
(32 total cycles, 16 total iters)
[16 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=1, II=1

Stream Info

----->Set(x3017)

x3092: Streaming (OuterControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC
Counter: x2932

x3069: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=15, II=1

Stream Info

Set(x3055)----->

x3087: Pipelined (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=8, II=1


Counter: x3071
Stream Info

Set(x3056)----->

x3091: Sequenced (InnerControl)
0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]


HelloSpatial.scala:127:31 - C(i::i+bm, j::j+bn) store tileC

Latency=1, II=1

Stream Info

----->Set(x3057)
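
Two relationships in the tree above can be checked by hand.

First, every cycles/iter figure equals total cycles divided by total iters, rounded down. For example, x444 reports 4592 / 64 = 71.75 as 71. Controllers that never ran report 0.

Second, the outer pipeline x3094 executed only one iteration. Its three stages (x675 loads, x2924 computes, x3093 stores) therefore had nothing to overlap with, so its 25650 cycles should be close to the sum of theirs.

The Python sketch below copies a few values from the report. The floor-division rule is inferred from these numbers and is not documented by the report itself.

```python
# Values copied from the controller tree: controller -> (total cycles, total iters).
REPORTED = {
    "x406": (1040, 16),
    "x444": (4592, 64),
    "x3047": (2279, 16),
    "x1148": (13568, 256),
    "x522": (0, 0),  # a copy that recorded no iterations
}


def cycles_per_iter(total_cycles: int, total_iters: int) -> int:
    """Per-iteration cycles, assuming the report floors total / iters
    and prints 0 for controllers that never ran."""
    if total_iters == 0:
        return 0
    return total_cycles // total_iters


# Outer pipeline x3094 ran a single iteration; total cycles of its stages.
X3094_TOTAL = 25650
X3094_STAGES = {"x675": 9073, "x2924": 14108, "x3093": 2460}


def single_iteration_overhead(parent_total: int, stage_totals: dict) -> int:
    """With one iteration the stages cannot overlap, so anything the parent
    spent beyond the sum of its stages is control overhead."""
    return parent_total - sum(stage_totals.values())


if __name__ == "__main__":
    for name, (cycles, iters) in REPORTED.items():
        print(f"{name}: {cycles_per_iter(cycles, iters)} cycles/iter")
    print("x3094 overhead:", single_iteration_overhead(X3094_TOTAL, X3094_STAGES))
```

Only about 9 of x3094's 25650 cycles fall outside its three stages.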

NBuf Mems

x369 (BankedSRAM "tileB_0")
lca = x3094
nBufs = 2
volume = 4096 (dims List(64, 64) + pads List(0, 0))
nBufs*volume = 8192
nBanks = List(8), a = List(1, 4), p = List(4, 2)
has XBarR, has XBarW

x368 (BankedSRAM "tileB_0")
lca = x3094
nBufs = 2
volume = 4096 (dims List(64, 64) + pads List(0, 0))
nBufs*volume = 8192
nBanks = List(8), a = List(1, 4), p = List(4, 2)
has XBarR, has XBarW

x370 (BankedSRAM "tileB_0")
lca = x3094
nBufs = 2
volume = 4096 (dims List(64, 64) + pads List(0, 0))
nBufs*volume = 8192
nBanks = List(8), a = List(1, 4), p = List(4, 2)
has XBarR, has XBarW

x367 (BankedSRAM "tileB_0")
lca = x3094
nBufs = 2
volume = 4096 (dims List(64, 64) + pads List(0, 0))
nBufs*volume = 8192
nBanks = List(8), a = List(1, 4), p = List(4, 2)
has XBarR, has XBarW

x364 (BankedSRAM "tileA_0")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 4), a = List(1, 1), p = List(2, 4)
has XBarR, has XBarW

x346 (BankedSRAM "tileC_4")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x363 (BankedSRAM "tileA_0")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 4), a = List(1, 1), p = List(2, 4)
has XBarR, has XBarW

x342 (BankedSRAM "tileC_0")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x343 (BankedSRAM "tileC_1")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x349 (BankedSRAM "tileC_7")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x366 (BankedSRAM "tileA_0")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 4), a = List(1, 1), p = List(2, 4)
has XBarR, has XBarW

x365 (BankedSRAM "tileA_0")
lca = x3094
nBufs = 2
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBufs*volume = 2048
nBanks = List(2, 4), a = List(1, 1), p = List(2, 4)
has XBarR, has XBarW

x702 (FF "prod_0")
lca = x1246
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x700 (FF "prod_0")
lca = x1246
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x698 (FF "prod_0")
lca = x1246
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x696 (FF "prod_0")
lca = x1246
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2379 (FF "prod_0")
lca = x2923
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2377 (FF "prod_0")
lca = x2923
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2375 (FF "prod_0")
lca = x2923
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2373 (FF "prod_0")
lca = x2923
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1820 (FF "prod_0")
lca = x2364
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1818 (FF "prod_0")
lca = x2364
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1816 (FF "prod_0")
lca = x2364
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1814 (FF "prod_0")
lca = x2364
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1261 (FF "prod_0")
lca = x1805
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1259 (FF "prod_0")
lca = x1805
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1257 (FF "prod_0")
lca = x1805
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1255 (FF "prod_0")
lca = x1805
nBufs = 2
volume = 1 (dims List() + pads List())
nBufs*volume = 2
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW
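
The volume and nBufs*volume lines can be reproduced from dims and pads. The sketch below recomputes them and totals the N-buffered tile SRAMs listed above. It rests on two assumptions:

- volume is the product of each dimension plus its padding.
- All figures count elements rather than bytes, since the report does not state the element width.

```python
from math import prod

# (name, nBufs, dims, pads) copied from the NBuf Mems list above.
NBUF_SRAMS = (
    [("tileB_0", 2, (64, 64), (0, 0))] * 4     # x367, x368, x369, x370
    + [("tileA_0", 2, (16, 64), (0, 0))] * 4   # x363, x364, x365, x366
    + [("tileC", 2, (16, 64), (0, 0))] * 4     # x342, x343, x346, x349
)


def padded_volume(dims, pads):
    """Elements in one buffer: product of (dim + pad) over all dimensions.
    A scalar FF has no dimensions, so the empty product gives 1."""
    return prod(d + p for d, p in zip(dims, pads))


def nbuf_footprint(n_bufs, dims, pads):
    """An N-buffered memory holds n_bufs full copies of the padded array."""
    return n_bufs * padded_volume(dims, pads)


if __name__ == "__main__":
    print(nbuf_footprint(2, (64, 64), (0, 0)))  # tileB_0
    print(nbuf_footprint(2, (16, 64), (0, 0)))  # tileA_0 and the N-buffered tileC_*
    print(nbuf_footprint(2, (), ()))            # prod_0 FF
    total = sum(nbuf_footprint(n, d, p) for _, n, d, p in NBUF_SRAMS)
    print("N-buffered SRAM elements:", total)
```

Double buffering is what doubles every footprint here. The four 64x64 tileB buffers alone account for 32768 of the 49152 elements.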

Single-Buffered Mems

x348 (BankedSRAM "tileC_6")
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x347 (BankedSRAM "tileC_5")
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x345 (BankedSRAM "tileC_3")
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x344 (BankedSRAM "tileC_2")
volume = 1024 (dims List(16, 64) + pads List(0, 0))
nBanks = List(2, 2), a = List(1, 1), p = List(2, 2)
has XBarR, has XBarW

x1815 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2374 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x697 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1262 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x703 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1821 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2380 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1819 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2378 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1260 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x701 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x699 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1258 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1817 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x2376 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

x1256 (FF "prod_1")
volume = 1 (dims List() + pads List())
nBanks = List(1), a = List(), p = List()
has XBarR, has XBarW

Instrumentation Guide

Pipelined (OuterControl)

Sample Waveform:

parent_en   ____|```````````````````````````````````````````````````````````````````````````````````````````````````````````````````|__
parent_done ________________________________________________________________________________________________________________________|__
                5 cycles / iter
                <----->
child0_en   ____|`````|___|`````|__________|`````|__________|`````|__________|`````|___________________________________________________
child1_en   ______________|`````````````|__|`````````````|__|`````````````|__|`````````````|__|`````````````|__________________________
child2_en   _______________________________|``|_____________|``|_____________|``|_____________|``|_____________|``|____________________
child3_en   ________________________________________________|```````|________|```````|________|```````|________|```````|__|```````|____
child0_done __________|_________|________________|________________|________________|___________________________________________________
child1_done ____________________________|________________|________________|________________|________________|__________________________
child2_done __________________________________|________________|________________|________________|________________|____________________
child3_done ________________________________________________________|________________|________________|________________|__________|____
                                                                    ^                ^                ^                ^          ^
                                                                    |________________|________________|________________|__________|
                                                                    5 iters/parent execution

Sample Tree:

Parent - 64 cycles/iter
(115 total cycles, 1 total iters)
[# iters/parent execution]

Counter:

Child0 - 5 cycles/iter
(25 total cycles, 5 total iters)
[5 iters/parent execution]

Counter:
Child1 - 13 cycles/iter
(65 total cycles, 5 total iters)
[5 iters/parent execution]

Counter:
Child2 - 2 cycles/iter
(10 total cycles, 5 total iters)
[5 iters/parent execution]

Counter:
Child3 - 7 cycles/iter
(35 total cycles, 5 total iters)
[5 iters/parent execution]

Counter:
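
Reading the sample waveform, this pipeline appears to advance in lockstep. On step k, child s works on iteration k - s, and the next step begins only after every active child has finished. The sketch below models that behavior under two assumptions: the lockstep schedule just described, and no handshake gaps between steps. It is an idealized model, not the instrumentation's formula.

- For the sample children (5, 13, 2 and 7 cycles, 5 iterations) it gives 84 cycles. The report shows 115 total; the difference is control overhead the sketch does not model.
- With a single iteration it reduces to the plain sum of the stages, which is the situation of x3094 in the tree above.

```python
def lockstep_pipeline_cycles(stage_cycles, iters):
    """Idealized total cycles of a Pipelined outer controller that advances
    in lockstep. On step k, stage s works on iteration k - s if that
    iteration exists; a step lasts as long as its slowest active stage.
    Handshake overhead between steps is not modeled."""
    if iters <= 0 or not stage_cycles:
        return 0
    n_stages = len(stage_cycles)
    total = 0
    # Fill, steady state and drain together take iters + n_stages - 1 steps.
    for step in range(iters + n_stages - 1):
        active = [c for s, c in enumerate(stage_cycles) if 0 <= step - s < iters]
        total += max(active)
    return total


if __name__ == "__main__":
    # Sample tree above: children of 5, 13, 2 and 7 cycles, 5 iterations.
    print(lockstep_pipeline_cycles([5, 13, 2, 7], 5))
    # One iteration (like x3094): no overlap, so the stages simply add up.
    print(lockstep_pipeline_cycles([9073, 14108, 2460], 1))
```

Once the pipeline is full, each step costs as much as the slowest child (13 cycles here). The slowest stage therefore sets the throughput.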
Sequenced (OuterControl)

Sample Waveform:

parent_en   ____|```````````````````````````````````````````````````````````````````````````````````````````|_
parent_done ________________________________________________________________________________________________|_
                5 cycles / iter
                <----->
child0_en   ____|`````|________________________|`````|________________________|`````|_________________________
child1_en   _____________|`````````````|_______________|`````````````|________________|`````````````|_________
child2_en   ______________________________|``|__________________________|``|__________________________|``|____
child0_done __________|______________________________|______________________________|_________________________
child1_done ____________________________|____________________________|______________________________|_________
child2_done _________________________________|_____________________________|_____________________________|____
                                             ^                             ^                             ^
                                             |_____________________________|_____________________________|
                                             3 iters/parent execution

Sample Tree:

Parent - 91 cycles/iter
(91 total cycles, 1 total iters)
[# iters/parent execution]

Counter:

Child0 - 5 cycles/iter
(15 total cycles, 3 total iters)
[3 iters/parent execution]

Counter:
Child1 - 13 cycles/iter
(39 total cycles, 3 total iters)
[3 iters/parent execution]

Counter:
Child2 - 2 cycles/iter
(6 total cycles, 3 total iters)
[3 iters/parent execution]

Counter:
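
In a Sequenced controller the children never overlap. Each one starts only after its predecessor's done pulse, and the next parent iteration waits for the last child.

The sketch below lays out that schedule, assuming zero handoff cost. For the sample (children of 5, 13 and 2 cycles, 3 iterations) it gives 60 busy cycles. Roughly 31 of the reported 91 are therefore control overhead that the sketch does not model.

```python
def sequenced_timeline(child_cycles, iters):
    """(iteration, child, start, end) for a Sequenced controller with no
    handoff overhead: each child starts the cycle the previous one ends,
    and the next iteration starts after the last child finishes."""
    t = 0
    events = []
    for it in range(iters):
        for child, cycles in enumerate(child_cycles):
            events.append((it, child, t, t + cycles))
            t += cycles
    return events


if __name__ == "__main__":
    events = sequenced_timeline([5, 13, 2], iters=3)
    for event in events[:3]:
        print(event)
    print("busy cycles:", events[-1][3])
```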
ForkJoin (OuterControl)

Sample Waveform:

parent_en   ____|````````````````|____|````````````````|____|````````````````|_____
parent_done _____________________|_____________________|_____________________|_____
                5 cycles / iter
                <----->
child0_en   ____|`````|________________|`````|_______________|`````|__________
child1_en   ____|`````````````|________|`````````````|_______|`````````````|__
child2_en   ____|``|___________________|``|__________________|``|_____________
child0_done __________|______________________|_____________________|__________
child1_done __________________|______________________|_____________________|__
child2_done _______|______________________|_____________________|_____________
            ^
            |
            1 iters/parent execution

Sample Tree:

Parent - 16 cycles/iter
(48 total cycles, 3 total iters)
[# iters/parent execution]

Counter:

Child0 - 5 cycles/iter
(15 total cycles, 3 total iters)
[1 iters/parent execution]

Counter:
Child1 - 13 cycles/iter
(39 total cycles, 3 total iters)
[1 iters/parent execution]

Counter:
Child2 - 2 cycles/iter
(6 total cycles, 3 total iters)
[1 iters/parent execution]

Counter:
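
All ForkJoin children are enabled together, and the parent waits for the last done pulse. An iteration therefore takes at least as long as its slowest child.

The sketch below computes that bound and how long each faster child sits idle. For the sample it gives 13 cycles per iteration, against the reported 16.

```python
def forkjoin_iteration(child_cycles):
    """All ForkJoin children start together; the parent iteration ends when
    the slowest child is done. Returns (minimum cycles per iteration,
    idle cycles each child spends waiting for the slowest one)."""
    longest = max(child_cycles)
    idle = [longest - c for c in child_cycles]
    return longest, idle


if __name__ == "__main__":
    per_iter, idle = forkjoin_iteration([5, 13, 2])
    print(per_iter, idle)
```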
Fork (OuterControl)

Sample Waveform:

parent_en   ____|````````````````|____|````````````````|____|````````|_____________
parent_done _____________________|_____________________|_____________|_____________
                                                             5 cycles / iter
                                                             <----->
child0_en   _________________________________________________|`````|__________
child1_en   ____|`````````````|________|`````````````|________________________
child2_en   __________________________________________________________________
child0_done _______________________________________________________|__________
child1_done __________________|______________________|________________________
child2_done __________________________________________________________________
            ^
            |
            1 iters/parent execution

Sample Tree:

Parent - 13 cycles/iter
(40 total cycles, 3 total iters)
[# iters/parent execution]

Counter:

Child0 - 5 cycles/iter
(5 total cycles, 1 total iters)
[1 iters/parent execution]

Counter:
Child1 - 13 cycles/iter
(26 total cycles, 2 total iters)
[1 iters/parent execution]

Counter:
Child2 - 0 cycles/iter
(0 total cycles, 0 total iters)
[0 iters/parent execution]

Counter:
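
The Fork sample behaves like a switch: each parent iteration enables exactly one child. Child1 runs on the first two iterations, child0 on the third, and child2 never.

The sketch below attributes cycles the same way, assuming each selected child takes its reported cycles/iter every time it runs. It reproduces the sample's child totals of 5, 26 and 0, for 31 busy cycles out of the reported 40.

```python
from collections import Counter


def fork_profile(selected, child_cycles):
    """A Fork enables exactly one child per parent iteration.
    `selected` lists the child index chosen on each iteration; returns the
    total cycles attributed to each child and the overall busy time."""
    runs = Counter(selected)
    totals = [runs[i] * cycles for i, cycles in enumerate(child_cycles)]
    return totals, sum(totals)


if __name__ == "__main__":
    # Sample: child1, child1, then child0. child2 reports 0 cycles because it never ran.
    totals, busy = fork_profile([1, 1, 0], [5, 13, 0])
    print(totals, busy)
```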
Streaming (OuterControl)

TBD (complicated to show)
Think of this as a ForkJoin in which a child is enabled only while all of its input streams are valid and all of its output streams are ready. The Streaming controller's counters are duplicated so that each child has its own copy, which advances independently of its siblings.
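
The gating rule can be illustrated with a generic cycle-level model. This is not Spatial's generated hardware. The two-stage setup, the FIFO depth and the consumer_every throttle are hypothetical parameters, chosen only to show two things:

- A child fires only when its input is valid and its output is ready.
- A slow consumer therefore back-pressures the producer.

```python
from collections import deque


def stream_two_stages(n_items, depth, consumer_every):
    """Cycle-level model of two Streaming children linked by a FIFO.

    The producer is enabled only when the FIFO has room (its output is
    ready); the consumer only when the FIFO holds data (its input is valid)
    and, to model a slower stage, only on cycles divisible by consumer_every.
    Both conditions are sampled at the start of the cycle. Each stage keeps
    its own count (produced / consumed), mirroring the per-child counter
    copies. Returns (cycles taken, peak FIFO occupancy).
    """
    if depth < 1 or consumer_every < 1:
        raise ValueError("depth and consumer_every must be at least 1")
    fifo = deque()
    produced = consumed = cycle = peak = 0
    while consumed < n_items:
        valid = len(fifo) > 0
        ready = len(fifo) < depth
        if valid and cycle % consumer_every == 0:
            fifo.popleft()
            consumed += 1
        if ready and produced < n_items:
            fifo.append(produced)
            produced += 1
        peak = max(peak, len(fifo))
        cycle += 1
    return cycle, peak


if __name__ == "__main__":
    print(stream_two_stages(4, depth=2, consumer_every=1))
    print(stream_two_stages(3, depth=2, consumer_every=2))
```

When the consumer is the bottleneck, a deeper FIFO only absorbs more items. It does not shorten the run: depth 1 and depth 2 both take 7 cycles for 3 items with consumer_every=2.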
*** (InnerControl)

Sample Waveform:

ctrl_en            ____|```````````````````````````|____|```````````````````````````|___________
                             27 cycles / iter
                        <------------------------->
ctrl_datapath      ____|`|____|`|____|`|________________|`|____|`|____|`|_______________________
                       II = 5
                       <---->
datapath (retimed) _____\______\______\__________________\______\______\________________________
                   ________\______\______\__________________\______\______\_____________________
                   ___________\______\______\__________________\______\______\__________________
                   ______________\______\______\__________________\______\______\_______________
                   _________________\______\______\__________________\______\______\____________
                       Latency = 12
                       <------------>
ctrl_done          _________________________________|________________________________|__________

Sample Tree:

Ctrl - 27 cycles/iter
(54 total cycles, 2 total iters)
[# iters/parent execution]

Counter:

Latency=12, II=5


cycles/iter ~= (counter iterations) * II + Latency
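
Here "counter iterations" means how many times the controller's counter issues per execution. In the sample waveform, ctrl_datapath shows three issues per ctrl_en window, so the estimate is 3 * 5 + 12 = 27 cycles/iter. Two executions then give the reported 54 total cycles.

A one-line sketch of the estimate:

```python
def inner_cycles_per_iter(counter_iters, ii, latency):
    """Approximate cycles per execution of an inner controller: a new
    counter iteration is issued every II cycles, plus the datapath latency."""
    return counter_iters * ii + latency


if __name__ == "__main__":
    # Sample: three issues per ctrl_en window, II = 5, Latency = 12.
    per_iter = inner_cycles_per_iter(3, ii=5, latency=12)
    print(per_iter, 2 * per_iter)
```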