Ensemble Analysis

One major point of interest within this project was not only to explore frameworks for designing equitable metro stop placement, but also to better understand the effect this placement has on equity outcomes. Although we suspected otherwise, it could be the case that equity outcomes are actually insensitive to the placement of metro stops. To contextualize the impact of metro stop placement as a system design choice, we turned to Ensemble Analysis. Introduced to us by Dr. Wormleighton in ESE 3090: Design and Analysis of Social Choice Systems, Ensemble Analysis is a method for assessing the distribution of outcomes across a large possibility space. In our case, the placement of metro stops constitutes an effectively infinite possibility space. To sample from this space, we ran nearly a million trials in which 14 stops were placed according to a uniform random distribution within a bounding box around St. Louis. For each trial, the DNS metric was computed, and the resulting distribution of the metric across the sample was then visualized and assessed.
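The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the bounding box coordinates and the demand points are hypothetical, and because the DNS metric is defined elsewhere in the report, the function stand_in_metric substitutes the mean distance (in miles) from each demand point to its nearest stop.

```python
import math
import random

# Hypothetical bounding box in degrees; NOT the box used in the project.
LAT_MIN, LAT_MAX = 38.53, 38.77
LON_MIN, LON_MAX = -90.32, -90.17
N_STOPS = 14  # number of stops per trial, as in the text


def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def stand_in_metric(stops, demand_points):
    """ASSUMED stand-in for DNS: mean distance from each point to its nearest stop."""
    nearest = [min(miles_between(p, s) for s in stops) for p in demand_points]
    return sum(nearest) / len(nearest)


def random_plan(rng, fixed_lon=None):
    """Place N_STOPS uniformly at random in the box.

    If fixed_lon is given, every stop is pinned to that longitude, so stops
    vary only along a North-South line.
    """
    plan = []
    for _ in range(N_STOPS):
        lat = rng.uniform(LAT_MIN, LAT_MAX)
        lon = rng.uniform(LON_MIN, LON_MAX) if fixed_lon is None else fixed_lon
        plan.append((lat, lon))
    return plan


def run_ensemble(n_trials, demand_points, seed=0, fixed_lon=None):
    """Return one metric value per random plan; a histogram of these is the ensemble."""
    rng = random.Random(seed)
    return [stand_in_metric(random_plan(rng, fixed_lon), demand_points)
            for _ in range(n_trials)]


if __name__ == "__main__":
    # Hypothetical demand points (e.g. tract centroids), drawn at random here.
    demo_rng = random.Random(42)
    demand = [(demo_rng.uniform(LAT_MIN, LAT_MAX), demo_rng.uniform(LON_MIN, LON_MAX))
              for _ in range(200)]
    values = run_ensemble(1000, demand)
    print(f"min={min(values):.2f} mi, max={max(values):.2f} mi")
```

Seeding the generator keeps a run reproducible, which matters when a plan is later compared against the same ensemble. The optional fixed_lon argument corresponds to the fixed-longitude variant described in the following paragraph.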

Since the process described above characterized the most general possibility space, that of all of St. Louis, we also performed two more Ensemble Analyses that accounted for other, more specific situations. First, we fixed the longitude of the randomly placed stops at the middle of the city and allowed random placement only along the resulting North-South line. For the final Ensemble Analysis, we sought a way of characterizing the possibility space of maps generated from network-based methods, like modularity maximization, since these are qualitatively different from our other spatial methods. To generate a network ensemble, we used the package gerrychain, which was developed to study congressional gerrymandering. Using the gerrychain package, we began with the grouped graph object created by modularity maximization and performed 'recombination,' in which a Markov chain is used to iteratively and randomly recombine the groupings of nodes within the network. Each regrouping of nodes was treated as a new network plan and was evaluated with the DNS metric to produce a network ensemble.
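To make the recombination step concrete, the following is a dependency-free sketch of the general idea: merge two adjacent groups, draw a random spanning tree of the merged region, and cut a tree edge that leaves both sides near the ideal size. It is not gerrychain's implementation and does not use its API. The toy grid graph, the node-count balance rule, the tolerance, and all function names are assumptions made for illustration; the project's chain started from the modularity maximization grouping instead.

```python
import random
from collections import defaultdict


def grid_graph(rows, cols):
    """Adjacency dict for a rows x cols grid; a toy stand-in for the real network."""
    adj = defaultdict(set)
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < rows and c2 < cols:
                    adj[(r, c)].add((r2, c2))
                    adj[(r2, c2)].add((r, c))
    return adj


def random_spanning_tree(nodes, adj, rng):
    """Kruskal's algorithm over a shuffled edge list, restricted to `nodes`."""
    edges = sorted((u, v) for u in nodes for v in adj[u] if v in nodes and u < v)
    rng.shuffle(edges)
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = defaultdict(set)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[u].add(v)
            tree[v].add(u)
    return tree


def recom_step(assignment, adj, rng, tol=1, max_tries=50):
    """Merge two adjacent groups and re-split them along a balanced tree cut."""
    ideal = len(assignment) / len(set(assignment.values()))
    crossing = sorted((u, v) for u in adj for v in adj[u]
                      if u < v and assignment[u] != assignment[v])
    u, v = rng.choice(crossing)
    a, b = assignment[u], assignment[v]
    merged = sorted(n for n in assignment if assignment[n] in (a, b))
    for _ in range(max_tries):
        tree = random_spanning_tree(set(merged), adj, rng)
        root = merged[0]
        parent, order = {root: None}, [root]
        for n in order:  # breadth-first walk; `order` grows as we iterate
            for m in sorted(tree[n]):
                if m not in parent:
                    parent[m] = n
                    order.append(m)
        size = {n: 1 for n in merged}
        for n in reversed(order[1:]):  # accumulate subtree sizes bottom-up
            size[parent[n]] += size[n]
        balanced = [n for n in order[1:]
                    if abs(size[n] - ideal) <= tol
                    and abs(len(merged) - size[n] - ideal) <= tol]
        if balanced:
            top = rng.choice(balanced)
            below = {top}
            for n in order:  # parents precede children in `order`
                if parent[n] in below:
                    below.add(n)
            new = dict(assignment)
            for n in merged:
                new[n] = a if n in below else b
            return new
    return assignment  # no balanced cut found; keep the current plan


def run_chain(initial, adj, steps, seed=0):
    """Return the initial plan followed by `steps` recombined plans."""
    rng = random.Random(seed)
    plans = [initial]
    for _ in range(steps):
        plans.append(recom_step(plans[-1], adj, rng))
    return plans
```

Because each side of the cut is a connected piece of a spanning tree, every new group stays connected. In an ensemble run, each plan returned by run_chain would then be scored with the evaluation metric.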

Ensemble Analysis Results

The distributions for the three previously described Ensemble Analyses are shown in the histograms below. In each case, the x-axis represents the DNS metric (in miles) and the y-axis represents the frequency of occurrence.

Figure 1
Figure 2
Figure 3

The results of the ensemble analysis show that the placement of metro stops can have profound equity implications. The distribution in Fig. 1 spans values of approximately one to two miles. The difference between having to walk one mile or two to access the metro is meaningful, and could determine whether the metro is a viable option for someone to reach their place of work. Further, we can compare the results of the evaluation metric to the histograms to assess where the Bi-State and algorithmic plans fall within the broader possibility space. The Bi-State proposed plan had a DNS value of about 2.6, which places it at the extreme upper end of the distribution in Fig. 1. On the other hand, the Linear Programming plan had a DNS of roughly 1.2, which was improved upon by an increment of about 0.02 by K-Means and again by Modularity Maximization, which performed the best. This places all three algorithmic methods at the extreme low end of the distribution in Fig. 1. When the longitude was fixed at the middle of the city, as shown in Fig. 2, the DNS metric was more concentrated and ranged from about 1.4 to 2.0. The network ensemble, shown in Fig. 3, was even more concentrated, spanning values from roughly 0.75 to 1.0. Each of the distributions had an approximate bell shape, with upper tails of varying strength.
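One simple way to express where a plan falls within an ensemble is its empirical percentile, meaning the share of sampled plans whose metric value is at or below the plan's value. The sketch below is illustrative only: the function name is hypothetical, and toy_ensemble is a made-up list standing in for the sampled values, not project data.

```python
from bisect import bisect_right


def percentile_in_ensemble(value, ensemble_values):
    """Percent of ensemble values that are less than or equal to `value`."""
    ordered = sorted(ensemble_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Toy ensemble values in miles (hypothetical, for illustration only).
toy_ensemble = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

# A plan with a metric above every sampled value sits at the 100th percentile.
print(percentile_in_ensemble(2.6, toy_ensemble))
```

Since a lower DNS value indicates a better outcome, a low percentile marks a plan that outperforms most of the randomly sampled plans.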

The most significant finding from the evaluation metric and ensemble analysis is that relatively small design choices can dramatically improve equity outcomes. Between the performance of the Bi-State plan at 2.6 and Modularity Maximization at 0.79, there is considerable room in which equity improvements can be made while still respecting practical constraints. This is evidenced by the Linear Programming results, which reasonably approximate a realistic map but still yield significant improvements over the Bi-State plan with regard to the evaluation metric. The distribution in Fig. 2 shows that enforcing a North-South line already shifts the DNS metric upward (toward less equitable outcomes), unavoidably eliminating the most equitable outcomes, but still leaves room for improvement. Conversely, the distribution in Fig. 3 shows that any network-based map will have highly equitable outcomes within a fairly narrow range, which reflects the way network-based maps implicitly span a highly dispersed, and therefore impractical, swath of the city.

Vector Overlay Analysis

The city of St. Louis has published four datasets identifying regions of transit need through its open data portal. These districts (Neighborhood Revitalization Strategy Areas, Special Business Districts, Community Improvement Districts, and Transportation Development Districts) are mapped below. Each generated plan was overlaid on top of this map, with a half-mile buffer plotted around each stop. Percent coverage over development areas was then computed for each stop.
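The coverage computation can be approximated without a GIS library by sampling a grid of points inside each stop's buffer and testing them against the district polygons. This is a sketch under assumptions rather than the method used in the project: it interprets percent coverage as the share of a stop's half-mile buffer that falls inside any development district, it assumes coordinates are already projected to a planar system measured in miles, and the function names are hypothetical. A GIS library would compute the exact intersection area instead.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def buffer_coverage(stop, districts, radius=0.5, steps=200):
    """Approximate percent of the circular buffer lying inside any district.

    `stop` is (x, y) in miles; `districts` is a list of polygons. A
    steps x steps grid of cell centers is laid over the buffer's bounding
    square, and only cell centers inside the circle are counted.
    """
    cx, cy = stop
    cell = 2 * radius / steps
    in_buffer = covered = 0
    for i in range(steps):
        x = cx - radius + (i + 0.5) * cell
        for j in range(steps):
            y = cy - radius + (j + 0.5) * cell
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                in_buffer += 1
                if any(point_in_polygon(x, y, d) for d in districts):
                    covered += 1
    return 100.0 * covered / in_buffer


if __name__ == "__main__":
    # Hypothetical 10 x 10 mile district and three hypothetical stops.
    district = [(0, 0), (10, 0), (10, 10), (0, 10)]
    for stop in [(5, 5), (0, 5), (50, 50)]:
        print(stop, round(buffer_coverage(stop, [district]), 1))
```

Testing each sample point against the union of districts (via any) avoids double counting where districts of different types overlap.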

St. Louis Development Districts
Results of Overlay Analysis

The vector overlay for the plans generated using geometric centers is mapped below, with the numerical results displayed in the table that follows.

Visual overlay results for geometric centers.
Numerical overlay results for geometric centers.

Similarly, the vector overlay for the plans generated using population centers is mapped below, with the numerical results displayed in the table that follows.

Visual overlay results for population centers.
Numerical overlay results for population centers.

Buffer overlay analysis shows that, under both center settings, weighted K-means and linear programming perform the best. This is unsurprising, since this metric is purely equity-based and weighted K-means is considered to be an ideal plan from an equity perspective. It was gratifying that linear programming did well; linear programming is the most logistically realistic of the plans because it was constrained with the spatial layout in mind, and this analysis showed that sufficient coverage is possible under a realistic line layout. It was also unsurprising that modularity maximization did not do well by this metric, as the metric is based on coverage of fixed areas distributed across the city and this distance-based algorithm produced a grid-like plan. This metric served as a way to evaluate a plan based purely on whether or not it served areas that have been explicitly labeled by the city of St. Louis as having developmental need. These datasets are not a comprehensive summary of areas that have a need for transit in St. Louis. That cannot be achieved without a sophisticated knowledge of the city or the use of socioeconomic datasets with a much higher spatial resolution than census tracts. Additionally, as it is a simple overlay analysis, it rewards plans that are spatially compressed. The proposed metro plan was able to perform well by the numeric coverage metric, but the maps show that this is because all of its stops were compressed into one area that had two large development districts, while the other plans spanned more of the city, covering much more area and overlapping with many different types of development districts. Overall, this metric can be useful for internally visualizing and validating how well the city of St. Louis is meeting the needs of districts that its own government has identified.