Match Factor

Our first algorithm was based on an idea from a paper titled “Collisions, Combinations and Contagions: A Novel Approach to Privacy Preservation in Contact Tracing.” The paper proposes a match factor: a numerical value that represents the extent to which an interaction could be deemed a valid exposure, should one individual test positive. In practice, a threshold for the match factor would be set according to the characteristics of the virus in question, using the distance and time that signal a true exposure (e.g. ~15 minutes at 6 ft. apart for COVID-19).

Mathematically, the match factor for Wi-Fi footprint data is fairly simple: the closer two individuals are to one another, the more access points their respective Wi-Fi scans should have in common. Hence, the match factor should increase as two people move closer together.

Equation for the match factor produced in our first algorithm.

Modified Match Factor (Individual)

In addition to this general match factor, we implemented a modified match factor, which applies a weighting to account for situations in which very few access points are in range of the scan (e.g. an open space or outdoors). This weighting is essential, since the number of access points surrounding two individuals should not alter the inherent value of the match factor representing their interaction. The modified match factor is also described in the research paper discussed above.

Equation for the modified match factor, which incorporates the value of the simple match factor.

Modified Match Factor (Universal)

The results of our first two implementations of the match factor were not very consistent, which led us to pursue our own modifications to the suggested formulas. We decided to create a version of the match factor that does not require designating a primary and a secondary user. When these roles are assigned, the formulas for the previous match factors produce different results depending on who is designated as primary. Instead, we believed that a formula producing a single, universal match factor for the interaction as a whole would be more logical, since an exposure depends on the positioning of both individuals equally.

This equation for the universal match factor represents the intersection of access points between users A and B divided by the total number of access points seen across both users.
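
As a rough illustration of the ratio described above, the following Python sketch computes the shared access points divided by the access points seen across both scans. It assumes that "total number of access points seen across both users" means the set union (which makes the value a Jaccard index); if the total is instead the sum of both scan sizes, the denominator would need to change. The function name and the access point identifiers are hypothetical.

```python
def universal_match_factor(aps_a, aps_b):
    """Shared access points divided by all access points seen by either user.

    Assumption: the denominator is the set union of both scans, so an
    access point seen by both users is counted once.
    """
    set_a, set_b = set(aps_a), set(aps_b)
    union = set_a | set_b
    if not union:
        # Neither scan saw any access point; treat as no evidence of proximity.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical scans: identifiers of the access points each user saw.
scan_a = ["ap1", "ap2", "ap3"]
scan_b = ["ap2", "ap3", "ap4"]

print(universal_match_factor(scan_a, scan_b))  # 2 shared / 4 total
print(universal_match_factor(scan_b, scan_a))  # same value with roles swapped
```

Because set intersection and union are both symmetric, swapping the two users does not change the result, which is the property this version of the match factor is meant to have.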

Cosine Similarity

Another method that we used to produce a numerical value representing the level of concern for an interaction is a cosine similarity algorithm. Cosine similarity is defined as “a measure of similarity between two non-zero vectors of an inner product space.” Specifically, it measures the cosine of the angle between the two vectors, which indicates how similar they are in direction. The measure has many applications; our approach was to treat the RSSI values gathered by two users as two vectors, allowing us to find how similar the signal strengths were across the access points seen by these individuals. Accordingly, the higher the similarity, the more likely it is that the two individuals have reached a threshold level of exposure. Using the formula for the dot product and isolating the cosine term, we applied this logic to the respective access points scanned by the individuals in question.

Dot product formula, leading to cosine similarity algorithm.
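
Isolating the cosine in the dot product formula gives cos θ = (A · B) / (|A| |B|). The Python sketch below applies this to two Wi-Fi scans. It is only an illustration under stated assumptions: each scan is a dictionary mapping an access point identifier to an RSSI value in dBm, and the vectors are built only from access points that appear in both scans. How access points seen by only one user were handled is not specified above, so that choice, along with the function name and sample values, is hypothetical.

```python
import math


def rssi_cosine_similarity(scan_a, scan_b):
    """Cosine similarity of RSSI values over access points seen by both users.

    Assumption: access points seen by only one user are ignored.
    """
    shared = sorted(set(scan_a) & set(scan_b))
    if not shared:
        # No common access points, so there is nothing to compare.
        return 0.0

    vec_a = [scan_a[ap] for ap in shared]
    vec_b = [scan_b[ap] for ap in shared]

    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        # Cosine similarity is undefined for a zero vector.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical scans: access point identifier -> RSSI in dBm.
user_a = {"ap1": -42, "ap2": -67, "ap3": -80}
user_b = {"ap1": -45, "ap2": -70, "ap4": -60}

print(rssi_cosine_similarity(user_a, user_b))
```

One caveat of this sketch: RSSI values are all negative, so the vectors point in broadly the same direction and the result tends to sit close to 1. Any threshold would need to be calibrated against real scans.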