Step 1: MySQL Database

For this project, our advisor Alex Zhang provided us MySQL Database containing order, trucking, and shipment information from Lineage Logistics’ warehouse management system (WMS). This WMS is not controlled by Lineage Logistics.

Database Tables and Description

  • pick_transactions: Contains order and shipment data for each transaction
  • loadout: Contains out bounding trucking data for each transaction

The database contains over 2 million rows from more than 100 facilities across the US. Instead of computing distances and routes for all of the warehouses and destinations in 50 states, we simplified our problem by choosing hub warehouses and assign a cluster to each state then pair them.

Step 2: Hub Warehouses

Based on the data, we chose 4 hub warehouse which has the most transactions:

  • Mira Loma, CA
  • Allentown, PA
  • Geneva, IL
  • Sunnyvale, TX

Why are we using Hub Warehouses?

Because of how Lineage’s network is set up, there are a few very well-trafficked warehouses. By using these warehouses as hubs that trucks can use to get backhauls, we can extremely quickly find a route for any given truck to use as a backhaul when delivering goods.

Step 3: State Clusters

We ran a K-means unsupervised clustering algorithm on the 4000 destination data points. We investigated using four “hub warehouses,” and to test out whether the four “hubs” were feasible, we used k=4 when clustering. This allowed us to see whether or not the hubs accurately represented the clusters already present in the data.

The Python methods we used to run the k-means algorithm used Elkan’s algorithm to determine the clusters. Because only addresses were stored, we used the Google Maps API to get locations.

Following figure shows the clusters and hub warehouse (marked with red x) associated with each cluster.

Based on clustering, following table shows which states associate with each clusters.

Cluster #

Home Warehouse

Associated States

0

Mira Loma, CA

AZ, BC, CA, ID, MT, MX, NV, OR, TI, UT, WA

1

Geneva, IL

AL, GA, IA, IL, IN, KY, MI, MN, MO, MS, ND, OH, SD, TN, WI

2

Allentown, PA

CT, DE, FL, MA, MD, ME, NC, NH, NJ, NY, OH, PA, SC, VA, VT, WV

3

Sunnyvale, TX

AR, CO, KS, LA, MO, NE, NM, OK, TE, TX, WY

Step 4: Backhauls and State-cluster pairs

After properly filtering fields, we used Google Maps Distance Matrix API to compute distance between warehouse and destinations as well as between destinations.

In terms of backhauls, now we are pairing up two warehouse-destination. Following figures demonstrate the concept of backhauls using California and Pennsylvania pair, which not only decreases total driving distances but also the number of trucks used.

Base Case

  • Used 3 trucks
  • Total distance = 2*Dist 1+2*Dist 2+2*Dist 3

Backhauls

  • Used 2 trucks
  • Total distance = Dist 1+Dist 2+Dist 3+Dist 4+2*Dist 5