Strengths and limitations of the methods

In our CPU tests, the impact of scale and power-law distribution on the instruction mix, memory footprint, and speedup is clearly visible. However, we still cannot explain some of the variation in the instruction mix across different graphs. Because the performance of a graph algorithm depends heavily on the structure of the graph, considering only scale and power-law may not be enough to fully characterize each kernel. We therefore suggest that future research test more graphs that vary additional factors, such as the average degree and the number of edges.
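As a sketch of how these additional factors might be recorded for each test graph, the C++ below computes the vertex count, edge count, average degree, and a simple skew ratio (maximum out-degree divided by average out-degree) from a directed edge list. The `GraphStats` struct, the `compute_stats` function, and the skew ratio itself are hypothetical illustrations, not part of our test setup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-graph summary of structural factors (directed edges, out-degree).
struct GraphStats {
    std::size_t num_vertices;
    std::size_t num_edges;
    double average_degree;   // num_edges / num_vertices
    std::size_t max_degree;  // largest out-degree of any vertex
    double skew;             // max_degree / average_degree; tends to be large for power-law graphs
};

GraphStats compute_stats(std::size_t num_vertices,
                         const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    // Count the out-degree of every vertex.
    std::vector<std::size_t> out_degree(num_vertices, 0);
    for (const auto& e : edges) {
        assert(e.first < num_vertices && e.second < num_vertices);
        ++out_degree[e.first];
    }

    GraphStats s{};
    s.num_vertices = num_vertices;
    s.num_edges = edges.size();
    s.average_degree =
        num_vertices > 0 ? static_cast<double>(edges.size()) / num_vertices : 0.0;
    s.max_degree =
        out_degree.empty() ? 0 : *std::max_element(out_degree.begin(), out_degree.end());
    s.skew = s.average_degree > 0.0 ? s.max_degree / s.average_degree : 0.0;
    return s;
}
```

For a star-like graph where vertex 0 points to four other vertices, the average degree stays at 1.0 while the skew ratio reaches 4.0, which is the kind of structural difference that scale alone does not capture.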

In our tests on the HLS simulator, the extremely long simulation time prevented us from collecting data for larger-scale graphs. As a result, we cannot observe the impact of scale and power-law on HLS performance. In future research, we suggest running these simulations on a powerful server or running the tests on a real FPGA board, either of which would greatly shorten data collection.

Significance of our programming model and results

Our pragma tests clearly show the impact of the memory bottleneck on the performance of our algorithms on the FPGA. Because graph algorithms access memory irregularly, we cannot simply partition the arrays to provide the throughput that unrolled loops require. We therefore believe that loop unrolling may not be the best strategy for improving performance on an FPGA, and that our HLS kernels should instead use loop pipelining combined with fewer memory accesses. The difficulty with pipelining in the GAS programming model is that we cannot pipeline the gather loop, because its number of iterations for each node (the node's degree) is not known in advance. Since the gather step is the most computation-heavy step of each algorithm, our pipelining strategy is still not optimal. In future research, we may therefore consider alternative programming models and select the one that best improves performance.
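The C++ sketch below contrasts two ways of writing a gather step. It assumes a simple gather that sums the values of each vertex's in-neighbors; the function and array names (`gather_vertex_centric`, `gather_edge_centric`, `row_ptr`, `col_idx`, `src`, `dst`) are hypothetical and not taken from our kernels. The first version has the nested, degree-dependent loop described above. The second is an edge-centric variant, a different model from vertex-centric GAS, that iterates once over a flat edge list, so the main loop has a single trip count (the number of edges) that an HLS tool can pipeline.

```cpp
#include <cassert>

// Vertex-centric gather over an in-edge CSR layout:
// the in-neighbors of v are col_idx[row_ptr[v]] .. col_idx[row_ptr[v + 1] - 1].
void gather_vertex_centric(int num_vertices, const int* row_ptr, const int* col_idx,
                           const float* value, float* acc) {
    for (int v = 0; v < num_vertices; ++v) {
        float sum = 0.0f;
        // Trip count (row_ptr[v + 1] - row_ptr[v]) is the degree of v, which varies
        // per vertex, so this nest cannot be flattened into one fixed-length pipeline.
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
            sum += value[col_idx[e]];  // irregular read: address depends on col_idx
        }
        acc[v] = sum;
    }
}

// Edge-centric gather: one flat loop over all edges (src[e] -> dst[e]).
void gather_edge_centric(int num_vertices, int num_edges, const int* src, const int* dst,
                         const float* value, float* acc) {
    for (int v = 0; v < num_vertices; ++v) {
        acc[v] = 0.0f;
    }
    for (int e = 0; e < num_edges; ++e) {
        // In Vitis HLS one would place "#pragma HLS PIPELINE II=1" here.
        assert(dst[e] >= 0 && dst[e] < num_vertices);
        acc[dst[e]] += value[src[e]];  // still an irregular read of value[]
    }
}
```

Both versions still read `value[]` at data-dependent addresses, so flattening removes the variable trip count but not the irregular memory access. The read-modify-write on `acc[]` can also keep a tool from reaching an initiation interval of 1 when consecutive edges share a destination.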

Future goals

  • Test more graphs that vary in factors such as average degree.
  • Run tests on a real FPGA board to avoid long simulation times.
  • Implement a more HLS-friendly programming model for pipelining.