FPGA Results

For our final design, we created an FPGA implementation of a growth transform network. The FPGA uses a fixed-point (integer) representation of data, so we converted the floating-point MATLAB results to fixed point in order to compare accuracy. The FPGA output differed from the MATLAB reference by an average of 0.3%, due to the limited precision of the fixed-point representation. This error is very reasonable and does not affect the spiking behavior. Therefore, we successfully met our objective of creating an accurate growth transform network in hardware.

We found that the largest network our design can represent on the FPGA is 256 neurons (or 512, including mirrored neurons). This is due to limitations in BRAM (block random access memory): the FPGA has only 2088 Kb available. This is a severe limitation and very small compared to MATLAB's ability to simulate neural networks with tens of thousands of neurons. In the future, however, we could use DRAM (dynamic RAM) attached to the FPGA to increase storage and potentially avoid this issue.

The FPGA consumes an estimated 129 mW of power, which is roughly 1000 times less power than a standard computer (in the range of hundreds of watts). This is a dramatic decrease and is a major benefit of our hardware implementation.

However, our hardware implementation is orders of magnitude slower than MATLAB, as seen in the chart below. This confused us initially, but we found two potential causes. The first is that our algorithm relies on a matrix multiplication, which runs in cubic time. As the network size increases, the computation time for the matrix multiplication grows rapidly, contributing to poor timing results.

FPGA timing vs. MATLAB timing for neural networks of various sizes (hardware implementation)

However, the larger problem is the difference between the clock frequencies of the FPGA (50 MHz) and the computer (3.6 GHz) on which we ran MATLAB. The computer's clock is roughly 100 times faster than the FPGA's, meaning it can theoretically execute roughly 100 instructions for every one of the FPGA's. We attempted to account for this by adjusting the timing results accordingly, as seen below. With these relative timing results, we find that the FPGA is actually faster than MATLAB.

MATLAB vs. FPGA computation time, adjusted for clock differences. MATLAB runs on the lab computer at a different clock frequency than the FPGA, so this graph estimates the computation times of both implementations if they used the same clock (in this case, the FPGA's 50 MHz clock). Note that this graph is only an estimate; in reality, performance cannot be scaled in this way. But it does demonstrate that the FPGA could be faster than MATLAB if its clock were increased, and that the clock difference between the two contributes heavily to the timing results.

We successfully created a hardware implementation which has low power consumption and promising timing results. However, there is certainly room for improvement in our design and future research. For example, creation of an application-specific integrated circuit (ASIC) would provide better speed and lower power usage.

Raspberry Pi Results

We additionally created a C implementation of the network which consumes less power, as it runs on a Raspberry Pi 3. It is slower than MATLAB, but demonstrates that the growth transform network is portable to multiple platforms.

Computational time for determining the SVM hyperplane over 100 training iterations as a function of the number of neurons in the network. Three implementations are represented: C implementation on the RPi, C implementation on a MacBook Pro laptop, and AIMLab’s MATLAB implementation on a MacBook Pro laptop. The C implementation is slower than the MATLAB implementation, even when executed on the same machine.

The accuracy of the neural network increases as the training period increases, but levels off after approximately 64 training loops. There is a tradeoff between time spent training and confidence in the model’s ability to classify data correctly.

Accuracy of correctly predicting presence of a heart condition as the SVM trains for various lengths. Accuracy does not improve appreciably beyond 64 iterations.