### Step 1: Data Collection and Filtering

Based on pick_transactions and loadout table, we carefully reviewed each field and selected a list of fields containing necessary information and generated a target table shown below.

Since the database is so massive, we limited our test data with 4 hub warehouse  – Mira Loma, CA, Allentown, PA, Geneva, IL and Sunnyvale, TX – and used shipment information from a specific week.

Throughout the procedure, we used California and Pennsylvania pairing dataset with clusters shipping from 1/2/17 to 1/6/17. Following is the SQL code to generate the target table.

### Step 2: Distance Computation

After properly filtering fields, we used Google Maps Distance Matrix API to compute distance between warehouse and destinations as well as between destinations. Google Maps Distance Matrix API takes in a list of origin addresses and another list of destination addresses and compute distance and duration of a shortest path. A snippet of API response is shown below.

### Step 3: Optimization and Load Consolidation

Loads that are meant to go to the same general area can be consolidated on to one truck to save driving distance. Because any classification algorithm could be used to verify whether two destinations are in “the same general area,” we notated the classification of any given point x as Class(x); two points x and y would be notated the same if Class(x) = Class(y). We tried using two different classification algorithms in this project: we first classifying points by state, and we also used a k-means clustering algorithm with k=4. A bin-packing formulation can be used to represent this problem:

### Step 4: Python program with Backhauls and Optimize Truck Routing

Once these loads have been consolidated, each new packed truckload is ran through a routing algorithm which determines the optimal route in which each of the orders should be delivered. This program was created using the Vehicle Routing Problem from Google OR-Tools package. A distance matrix is created, calculating the distance between each of the delivery destinations and the the home warehouse. This matrix is then ran through the solver and prints out the route for each vehicle and the total distance it takes.

Finally after the orders leaving the origin and the orders leaving the destination have both been consolidated, the two lists are run through the backhaul problem in order to pair the origin trucks with consolidated loads leaving from the destination cluster.

Following is a snippet of our python code: