Discussion and Conclusions

The objective of our project was to build a proof-of-concept system that could predict selected future weather variables at a level of accuracy on par with existing physical models. We chose machine learning for its ability to fit regressions without explicit knowledge of the physical rules underlying the data. Our models were built using freely available data and personal computers.


After tuning, our models outperformed the forecasts that meteorologists produce from current physical models. The improvement was large enough that we are confident machine learning can reach at least parity with existing methods at a fraction of the cost.


On their own, our results have limited practical value. However, they open significant avenues for future research and demonstrate that machine learning holds substantial potential as a weather prediction method. These results are distinctive because existing research focuses mainly on classifying large-scale weather systems such as hurricanes. Our research brings data-driven weather forecasting into the realm of the everyday, showing potential for producing the temperature and humidity predictions that people check on their phones daily.


A particular strength of our methodology is modularity. Although we chose 1- and 7-day forecast intervals and maximum and minimum dry bulb temperature for St. Louis as our targets, these targets are entirely configurable. Our software can easily be modified to predict any continuous weather variable, and it can predict for multiple locations as long as the appropriate training data are uploaded to the database. Existing physical models do not inherently offer this flexibility.
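To make the idea of configurable targets concrete, the following is a minimal sketch of how prediction targets could be enumerated from a variable list, a location list, and a list of forecast horizons. It is not our actual implementation: the `ForecastTarget` schema, the `build_targets` helper, and the column names such as `max_dry_bulb_temp` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ForecastTarget:
    """One prediction target (hypothetical schema, for illustration only)."""
    variable: str      # a continuous weather column, e.g. max dry bulb temperature
    location: str      # a location whose training data exist in the database
    horizon_days: int  # how many days ahead to predict


def build_targets(variables, locations, horizons):
    """Enumerate every (variable, location, horizon) combination to train a model for."""
    return [
        ForecastTarget(variable, location, horizon)
        for variable, location, horizon in product(variables, locations, horizons)
    ]


# The configuration described in the text: two variables, one location, two horizons.
# Column names are hypothetical placeholders.
targets = build_targets(
    variables=["max_dry_bulb_temp", "min_dry_bulb_temp"],
    locations=["St. Louis"],
    horizons=[1, 7],
)

for target in targets:
    print(target)
```

Under this assumed design, predicting a new variable or adding a second city only means extending one of the input lists, provided the matching training data have been loaded.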


The weaknesses of our technique are primarily technical and could be addressed by further research. Areas for further development include:


  1. Obtaining more detailed baseline forecast accuracy data, so that performance can be compared on error spread as well as absolute error
  2. Improving the software's scalability with larger datasets; several sections of the algorithm currently risk consuming too much memory
  3. Further testing of model efficacy across different locations and time periods
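As an illustration of the first item, the sketch below computes two summary statistics that could be reported for both our models and a baseline forecast: the mean absolute error, and the spread of the errors. It assumes that "spread" means the population standard deviation of the signed errors; that definition, the `error_summary` name, and the toy values are illustrative assumptions, not project data or our actual evaluation code.

```python
from statistics import mean, pstdev


def error_summary(predicted, observed):
    """Return (mean absolute error, spread), where spread is assumed to be
    the population standard deviation of the signed errors."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal in length")
    errors = [p - o for p, o in zip(predicted, observed)]
    mae = mean(abs(e) for e in errors)
    spread = pstdev(errors)
    return mae, spread


# Toy values for illustration only (not results from this project).
model_mae, model_spread = error_summary([70, 72, 68], [71, 70, 68])
print(f"MAE = {model_mae:.2f}, spread = {model_spread:.2f}")
```

Two forecasts with the same mean absolute error can differ in spread, which is why baseline data detailed enough to compute both would allow a fuller comparison.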


We conclude that this proof-of-concept project was successful and indicates potential for both future research and practical value.