We’ve chosen to implement the Lucas-Kanade optical flow estimation algorithm because our review of previous work in this area suggested that this algorithm strikes a desirable balance between accuracy and speed of computation. The L-K algorithm takes a gradient-based approach to optical flow estimation. The algorithm relies on the following assumptions:

- Brightness Constancy
- Small Motion
- Small regions move together

Based on the first two assumptions, the following equation must be satisfied, at a given pixel within the input image:

Vx and Vy are the x and y components of the optical flow vector at that pixel, respectively. This is what we are trying to estimate, so naturally they are unknowns at this point. Ixand Iy, the partial derivatives with respect to x and y at the given pixel, together make up the spatial gradient. It, the partial derivative with respect to time, makes up the temporal gradient. Here, we have two unknowns, but just one equation. To solve this equation, we use the third assumption to claim that the U and V we are trying to solve for are approximately the same for all of the pixels within a small window around the pixel we are trying to solve for. Now, we can create the following system of equations:

Here, n is the number of pixels within the small window. In our design we use an 11×11 window, which contains 121 pixels. After taking the temporal and spatial gradient at each pixel in the window, we have 121 equations, but still just two unknowns. Because the system of equations is now overdetermined (more equations than unknowns), we can use the least squares approach to solve for Vx and Vy within our window. Rewriting our equations in matrix form, we have:

Now we solve for V:

After solving this for this system within a window, we have successfully computed the optical flow vector for the pixel at the center of the window. To compute the optical flow vector at the next pixel, we slide the window to the right by 1 pixel and repeat the procedure, doing this over and over again until we have covered the entire image.

This algorithm is very parallelizable, which makes it a prime candidate for hardware acceleration. For example, all of the partial derivatives within a window can be calculated at the same (there are no dependencies which prevent this). Additionally, the calculation can be divided into several separate stages: partial derivative calculation, matrix transpose and multiplication, and matrix inversion. This makes it a good fit for a pipelined architecture, which would allow us to carry out computations for multiple different windows at the same time. Further details of our implementation are given below.