Discussion and Deliverables | Data-Driven Weekly Forecasting of Dow Jones Stocks Using Machine Learning

Discussion

Buy and Hold outperformed our algorithm in total return over the period, but it was much more volatile. Because a buy-and-hold position is 100% exposed to the market, it had no protection against market movements and was susceptible to very large swings in either direction. Our algorithm generally trended upward but never generated significantly positive returns. Once the large difference in volatility is taken into account, however, our algorithm performed better on a return-per-unit-of-risk basis, as measured by the Sharpe ratio.
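To make the risk-adjusted comparison concrete, here is a minimal Python sketch of an annualized Sharpe ratio computed from daily returns. The report does not state exactly how its Sharpe ratio was computed, so the daily frequency, the constant risk-free rate (0 by default), and the 252-trading-day annualization factor are assumptions, and the two return series are hypothetical, not our results.

```python
import math
import statistics


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a list of daily simple returns.

    Assumptions (not stated in the report): daily returns, a constant daily
    risk-free rate, and 252 trading days per year for annualization.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical series, not our backtest results.
steady = [0.001, 0.002, 0.001, 0.002]
volatile = [0.03, -0.02, 0.04, -0.03]
# The volatile series has the higher mean return but the lower Sharpe ratio.
print(sharpe_ratio(steady), sharpe_ratio(volatile))
```

This mirrors the situation described above: a strategy with smaller but steadier returns can score higher per unit of risk than one with larger, more volatile returns.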
As noted above, our algorithm generally trends upward, and when it does trend downward, it does not do so for a prolonged period. It is therefore much more reliable at generating consistent positive day-to-day returns, though not large ones. For an investor whose financial needs favor that kind of steady return, our algorithm is a better choice than buy and hold.
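One way to quantify this kind of consistency is to measure the share of days with a positive return and the longest run of consecutive losing days. The sketch below is a hypothetical illustration rather than a metric used in our analysis; `consistency_stats` is an illustrative name and the returns are made up.

```python
def consistency_stats(daily_returns):
    """Return (share of positive days, longest run of consecutive losing days).

    A day with a return of exactly zero counts as neither positive nor losing
    and ends any losing streak.
    """
    if not daily_returns:
        raise ValueError("need at least one daily return")
    positive_share = sum(1 for r in daily_returns if r > 0) / len(daily_returns)
    longest = current = 0
    for r in daily_returns:
        current = current + 1 if r < 0 else 0  # extend or reset the losing streak
        longest = max(longest, current)
    return positive_share, longest


# Hypothetical daily returns for illustration only.
share, streak = consistency_stats([0.004, -0.001, 0.002, -0.003, -0.002, 0.001])
print(f"positive days: {share:.0%}, longest losing streak: {streak} days")
```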
The linear regression strategy performed abysmally, but this was expected. As mentioned under “Results”, the overall accuracy of the linear regression method was right around 50%, regardless of whether the 1-day or 5-day method was used. With a total return of -1.09%, linear regression was greatly outperformed by the CatBoost method.
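Because a linear regression outputs a numeric price change rather than an up/down call, one natural way to score it is by how often the sign of the prediction matches the sign of the actual move. A minimal sketch, assuming that definition of accuracy (the function name and sample numbers are hypothetical). On a two-outcome problem, roughly 50% is what random guessing would achieve, which is why that figure is so weak.

```python
def directional_accuracy(predicted_changes, actual_changes):
    """Fraction of periods in which predicted and actual changes agree in direction.

    A change > 0 counts as "up"; anything else counts as "not up".
    """
    if len(predicted_changes) != len(actual_changes) or not actual_changes:
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted_changes, actual_changes))
    return hits / len(actual_changes)


# Hypothetical 1-day regression predictions vs. realized price changes.
print(directional_accuracy([0.5, -0.2, 0.1, -0.3], [1.0, 0.4, -0.2, -0.1]))  # 0.5
```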

Overall, our algorithm proved accurate in predicting upward movements of stocks. As a result, our strategy of trading only on upward predictions produced a positive return over our testing period. Although a simple buy-and-hold strategy on the same stocks outperformed it in total return, our strategy carried significantly less risk: it used only a fraction of our total capital each day (as opposed to the whole amount used in buy and hold) and was generally less susceptible to large market movements. In fact, per unit of risk, our strategy returned more than buy and hold, which makes it appealing to risk-averse investors who prefer a low-risk strategy with positive returns.
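The trading rule described above can be sketched as a simple daily backtest. This is a hypothetical reconstruction, not our actual code: the report says only that we traded on up predictions with a fraction of our capital, so the equal weighting across predicted-up stocks, the `capital_fraction` parameter, and the 0% return on idle cash are assumptions.

```python
def strategy_daily_returns(predictions, stock_returns, capital_fraction=0.1):
    """Daily portfolio returns for a hypothetical "trade on up predictions" rule.

    predictions[d][s] is True when the model predicts stock s rises on day d;
    stock_returns[d][s] is that stock's realized simple return on day d.
    Only `capital_fraction` of capital is invested each day, split equally
    across predicted-up stocks; the rest is held as cash earning 0%.
    """
    daily = []
    for preds, rets in zip(predictions, stock_returns):
        picks = [r for p, r in zip(preds, rets) if p]
        if not picks:
            daily.append(0.0)  # no up predictions: stay entirely in cash
        else:
            daily.append(capital_fraction * sum(picks) / len(picks))
    return daily


def total_return(daily_returns):
    """Compound daily returns into a total return over the whole period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1 + r
    return growth - 1


# Two hypothetical stocks over three days.
preds = [[True, False], [False, False], [True, True]]
rets = [[0.02, -0.05], [0.01, 0.03], [0.01, -0.03]]
daily = strategy_daily_returns(preds, rets, capital_fraction=0.5)
print(daily, total_return(daily))
```

Investing only part of the capital scales each day's gain or loss down proportionally, which is the mechanism behind the lower volatility discussed above.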

Deliverables

Data cleaning Excel platform:
- VBA program that quickly creates labels and the training and testing data CSV files
Data analysis Excel platform:
- VBA program that automatically generates accuracy results and creates the aggregate results graph
Algorithm:
- Modified CatBoost code that generates directional movement predictions for stocks
Proposed trading strategy:
- Trade on the model's up predictions
Written final report
Website
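Our labeling and file creation was done in VBA in Excel. The following Python sketch mirrors the idea under our own assumptions rather than reproducing that program: a label is 1 when the closing price `horizon` days ahead (1 or 5) is higher than the current close, the split is chronological so the testing set comes after the training set, and the CSV layout and all function names are hypothetical.

```python
import csv
import io


def make_labels(closes, horizon=1):
    """Label day t as 1 if the close `horizon` days later is higher, else 0.

    The last `horizon` days have no future close and receive no label.
    """
    return [int(closes[t + horizon] > closes[t]) for t in range(len(closes) - horizon)]


def chronological_split(rows, n_test):
    """Keep time order: the last `n_test` rows become the testing set."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must leave at least one training row")
    return rows[:-n_test], rows[-n_test:]


def to_csv(rows, header):
    """Render rows as CSV text (in practice this would be written to a file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


closes = [10.0, 11.0, 10.5, 12.0, 11.8, 12.5]  # hypothetical closing prices
labels_1d = make_labels(closes, horizon=1)  # [1, 0, 1, 0, 1]
labels_5d = make_labels(closes, horizon=5)  # [1]
rows = [[t, closes[t], y] for t, y in enumerate(labels_1d)]
train, test = chronological_split(rows, n_test=2)
print(to_csv(test, ["day", "close", "label"]))
```

Splitting by time rather than at random keeps future prices out of the training data, which matters for any forecasting evaluation.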

Next Steps & Improvements

Next Steps:

- Implementation – The only foolproof way to determine the validity of our model is to implement the strategy in the real market or in a paper trading simulator. Given more time, we would put our model to that test.
- The Model on Individual Stocks Outside the DJIA – We believe the model may work better for some stocks than others, because of the factors that affect individual stocks and the way the algorithm learns. It would be interesting (and perhaps profitable) to test the CatBoost method on individual stocks outside the DJIA to identify the model’s strengths and weaknesses on other inputs. This would require adjusting our current list of factors.

Improvements:

- Data Collection – We pulled 677 data points for each stock, covering just under 2 years. In another iteration of this project, we would pull more data to obtain a larger testing sample. The difficulty is that, because the market itself changes over time, data from 7-10 years ago may not follow the same patterns as data from the last 2-4 years. This raises the question of where to draw the line between data that benefits the model and data that incorrectly skews our results.
- Testing Data Variability – In our results, we note that our model is most successful when predicting upward movement. However, our only testing set covered a period in which the Dow Jones’s prices increased. We believe this may have affected our results negatively; with a testing set that better represented all market conditions, we might have obtained better results.
- Analysis of Results – A larger testing set would have let us analyze our results more thoroughly. The selling point of our model and trading strategy is that it requires little capital and has low volatility. Because we had only enough data to test on a single 3-month period, it is difficult to demonstrate the real, long-term benefits of our strategy with respect to volatility and low-risk trading. More testing data would address this issue.
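If a future testing set does span both rising and falling markets, one way to check whether performance depends on market conditions is to break accuracy out by regime. A minimal sketch, assuming the regime is defined by the sign of the index return on each prediction day (our illustrative choice, not something specified in this report); the function name and data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_regime(predicted_up, actual_up, index_returns):
    """Directional accuracy split by whether the index rose on each day.

    index_returns[i] is the index's return on the day of prediction i;
    a return > 0 is treated as an "up market" day.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for pred, actual, market in zip(predicted_up, actual_up, index_returns):
        regime = "up market" if market > 0 else "down/flat market"
        counts[regime] += 1
        hits[regime] += int(pred == actual)
    return {regime: hits[regime] / counts[regime] for regime in counts}


# Hypothetical predictions covering both market conditions.
print(accuracy_by_regime([True, True, False, True],
                         [True, True, False, False],
                         [0.01, 0.02, -0.01, -0.02]))
```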