Optimization Algorithms | Accessibility Audit of St. Louis Metrolink via Systems Engineering Methods

Problem Formulation

The county of St. Louis City is comprised of 104 census tracts, which were used as the geographical unit for this analysis. Initially, we represented each tract by the latitude and longitude of its geometric center, but this dataset introduced limitations: for instance, the center of a non-convex tract is not guaranteed to reside within the tract. Additionally, these points aren’t tied to population density and wouldn’t necessarily indicate the highest points of need within each tract. Thus, population was introduced as a new measure of center. A 1km population density grid was used to calculate a center for each tract weighted on population.

Census tracts with their geometric and population centers as reference points

Socioeconomic Weights

We constructed three additional variables to use throughout our project as a way to examine equity. We call these variables socioeconomic “weights” because they all sum to one over all 104 census tracts and are used to weight various averages throughout the project. The weights were constructed using the following three equations:

The distribution of the three weights across St. Louis City is shown below.

Socioeconomic weights mapped across St Louis City by census tract

We acknowledge that these weights are not the only way to measure equity across the county. There are many other ways to conceptualize equity, as well as many other ways to create transit, race, and income weights specifically. The weights were one of our biggest considerations throughout the project, and we edited their formulation many times. A better conceptualization would have many more weights constructed in many different ways, but for the sake of this project, we stuck with these three. The frameworks described in the following sections are compatible with any new weights that we may construct in the future.

K-Means

Our first method was a modification of the K-Means clustering algorithm. Our modified K-Means approach uses weighted averages based on our socioeconomic weights. We found that using any of the three weights led to very similar results, as they were heavily correlated, so for our final method, we used only our race weight. After clustering the census tracts into 14 clusters, our resulting plan was constructed by placing one metro stop at the geometric center of each cluster.

Linear Programming

As a second method for algorithmic metro stop placement, we formulated a linear programming problem. Linear programming was of particular interest due to its flexibility via penalty terms, which allowed us to induce desired characteristics in the resulting maps. The linear programming problem was formulated as shown below.

Modularity Maximization

For our third optimization method, we wanted to introduce a new data structure. To accomplish this, we created a graph representing the county of St. Louis City. Each node in the graph represented a census tract, and nodes were connected by an edge if they were physically adjacent. Edge weights were computed for the resulting dual graph as walking distance in meters between each node pair. This was done using the Distance Matrix Python API by Google Maps. Then, we inverted these values so that edge weights would represent the inverse of the walking distance between the geometric centers of two census tracts.

Modularity Maximization is a community detection algorithm that divides nodes in a graph into clusters. The output of this algorithm is the initial graph partitioned into a given number of communities. Like our K-Means method, our final plan is constructed by placing one metro stop at the center of each community.