For our final design, we created an FPGA implementation of a growth transform network. The FPGA uses a fixed-point (integer) representation of data, so we converted the floating-point MATLAB results for comparison. The FPGA output showed an average error of 0.3% relative to the floating-point reference, caused by the limited precision of the fixed-point representation. This error is small and does not affect the spiking behavior, so we successfully met our objective of creating an accurate growth transform network in hardware.
We found that the maximum network size our design can represent on the FPGA is 256 neurons (512, including mirrored neurons). The limiting factor is BRAM (block random-access memory): the FPGA provides only 2088 Kb. This is a severe limitation, and very small compared to MATLAB's ability to simulate neural networks with tens of thousands of neurons. In the future, however, we could use DRAM (dynamic RAM) available to the FPGA to increase storage and potentially avoid this issue.
The FPGA consumes an estimated 129 mW of power, which is roughly 1000 times less power than a standard computer (in the range of hundreds of watts). This is a dramatic decrease and is a major benefit of our hardware implementation.
However, our hardware implementation is orders of magnitude slower than MATLAB, as seen in the chart below. This surprised us initially, but we identified two likely causes. The first is the matrix multiplication at the core of our algorithm, which runs in cubic time: as the network size increases, the computation time for the matrix multiplication grows rapidly and dominates the timing results.
The larger problem, however, is the difference in clock frequency between the FPGA (50 MHz) and the computer (3.6 GHz) used for our MATLAB benchmarks. The computer's clock is roughly 72 times faster than the FPGA's, meaning it can theoretically execute roughly 72 clock cycles for every one of the FPGA's. We accounted for this by normalizing the timing results accordingly, as seen below. With these relative timing results, we find that the FPGA is actually faster than MATLAB.
We successfully created a hardware implementation which has low power consumption and promising timing results. However, there is certainly room for improvement in our design and future research. For example, creation of an application-specific integrated circuit (ASIC) would provide better speed and lower power usage.
Raspberry Pi Results
We additionally created a C implementation of the network that runs on a Raspberry Pi 3, which also offers lower power consumption than a desktop computer. It is slower than MATLAB, but demonstrates that the growth transform network is portable across platforms.
The accuracy of the neural network increases with the length of the training period, but levels off after approximately 64 training loops. There is therefore a tradeoff between time spent training and confidence in the model's ability to classify data correctly.