This section gives an overview of the project’s system for predicting stock movements with both the Linear Regression method and the CatBoost method. Topics covered below:
- System Overview
- Feature Selection
- Data Acquisition
- Trading Strategy
System Overview
This diagram outlines our process for calculating predictions. The upper section (3, 4a, and 4b) corresponds to the CatBoost method, and the lower section (5, 6, and 7) corresponds to the Linear Regression method.
Feature Selection
Data Acquisition
The project’s data, including dates and all the features described under Feature Selection, was pulled from Bloomberg Terminals, which are computers connected to a large network of historical financial data. Because of download limits on the machines, this process took many days. By the end, we had all available data from January 3, 2017 through September 6, 2019. Because some data was unavailable, we were able to acquire data for only 24 of the 30 DJIA stocks. The stocks excluded from this project were Boeing, Dow Chemical, Goldman Sachs, Home Depot, JPMorgan Chase (JPM), and McDonald’s.
After all data was pulled from the Terminals, we used a VBA macro to automatically generate 3 CSV files per stock analyzed: training data, testing data, and labels. The training data consisted of the values of all the features on each date for the stock. The labels were 1s and 0s, where a 1 indicates that the stock price was higher 5 days after that date and a 0 indicates that it was lower. The testing data was in the same format as the training data but did not include labels, because the goal of the algorithm is to predict labels for the testing data.
We trained the model on the training data and labels and then output the predicted labels for the testing data. These predicted labels were fed back into a master Excel file, and another VBA macro was run to produce analytics (such as the accuracy breakdown). If implemented day-to-day (for investing purposes), the output would simply be a 1 or a 0 for each stock, which determines whether to invest or not.
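
The labeling rule described above can be sketched in a few lines of Python. This is only an illustration, not the VBA macro the project used: the function name `make_labels`, the sample prices, and the choice to label an unchanged price as 0 are all assumptions, and "5 days" is read here as 5 rows of the price series.

```python
from typing import List, Optional

HORIZON = 5  # look-ahead in rows (days), as described in the text


def make_labels(prices: List[float], horizon: int = HORIZON) -> List[Optional[int]]:
    """Label each date 1 if the price `horizon` rows later is higher, else 0.

    The final `horizon` rows have no future price yet, so they get None.
    Assumption: an unchanged price is labeled 0 (the source does not say).
    """
    labels: List[Optional[int]] = []
    for i, price in enumerate(prices):
        j = i + horizon
        if j >= len(prices):
            labels.append(None)  # future price not available
        else:
            labels.append(1 if prices[j] > price else 0)
    return labels


# Made-up closing prices, for illustration only.
closes = [100.0, 101.0, 99.0, 98.0, 102.0, 103.0, 100.5, 99.0, 97.0]
labels = make_labels(closes)
```

Rows that receive `None` correspond to dates whose outcome is not yet known, which is the situation for the most recent dates in any day-to-day use.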
Trading Strategy
The results of the CatBoost method can be found on the CatBoost page. In short, we found that the machine learning algorithm was correct more often when predicting upward movement than when predicting downward movement. Our trading strategy is designed to capitalize on this asymmetry. When the method predicts an upward movement, we invest a fixed amount into all 24 stocks. When the predicted day arrives, 5 days later, we sell. An analysis of the profitability of this method can be found on the Results page.
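
As a rough sketch of the two ideas in this section, the Python below computes accuracy separately for "up" and "down" predictions and then simulates the buy-and-sell rule for a single stock. It is one reading of the strategy under stated assumptions, not the project's actual analysis: the function names, the `stake` amount, and all sample numbers are hypothetical, and fees and slippage are ignored.

```python
from typing import Dict, List, Optional


def accuracy_by_direction(preds: List[int], actual: List[int]) -> Dict[int, Optional[float]]:
    """Accuracy among predictions of 1 (up) and among predictions of 0 (down)."""
    hits = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for p, a in zip(preds, actual):
        counts[p] += 1
        hits[p] += int(p == a)
    return {d: (hits[d] / counts[d] if counts[d] else None) for d in (0, 1)}


def simulate(prices: List[float], preds: List[int],
             stake: float = 1000.0, horizon: int = 5) -> float:
    """Total profit from buying `stake` worth on each predicted-up date
    and selling `horizon` rows later. Down predictions are skipped."""
    profit = 0.0
    for i, pred in enumerate(preds):
        exit_idx = i + horizon
        if pred != 1 or exit_idx >= len(prices):
            continue  # no trade, or no exit price available yet
        profit += stake * (prices[exit_idx] / prices[i] - 1.0)
    return profit


# Made-up data, for illustration only.
breakdown = accuracy_by_direction([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 1])
closes = [100.0, 101.0, 99.0, 98.0, 102.0, 110.0, 100.5, 99.0, 97.0]
total = simulate(closes, [1, 0, 1, 0, 0, 0, 0, 0, 0])
```

Acting only on "up" predictions means the strategy's outcome depends on the accuracy of that class alone, which is why the breakdown by predicted direction matters more here than overall accuracy.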