This section gives an overview of the project’s system for predicting stock movements with both the Linear Regression method and the CatBoost method. It covers the following topics:

  • System Overview
  • Feature Selection
  • Data Acquisition
  • Trading Strategy

System Overview

This diagram outlines our process for calculating predictions. The upper section (3, 4a, and 4b) corresponds to the CatBoost method, and the lower section (5, 6, and 7) corresponds to the Linear Regression method.

Feature Selection

Close Price

Close price is a stock’s price at the end of a trading day. This feature is needed to determine the 5-day label and directly reflects the stock’s value.

Open Price

Open price is a stock’s price at the beginning of a trading day. It is also used to define the 5-day label and, like the close price, directly reflects the stock’s value.
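The exact rule that combines open and close prices into the 5-day label is not spelled out here, so the minimal sketch below takes the entry and exit price series as separate arguments: passing the close series for both gives a close-to-close label, while passing opens as the entry series gives an open-to-close variant. It is an illustration, not the project’s code; the function name, counting the horizon in trading-day rows, and labeling an unchanged price as 0 are all assumptions.

```python
from typing import List, Optional, Sequence


def five_day_labels(
    entry_prices: Sequence[float],
    exit_prices: Sequence[float],
    horizon: int = 5,
) -> List[Optional[int]]:
    """Label date t with 1 if exit_prices[t + horizon] > entry_prices[t], else 0.

    Assumptions (hypothetical, not from the project): `horizon` counts rows,
    i.e. trading days; an unchanged price counts as 0; the last `horizon`
    dates have no future price yet and are labeled None.
    """
    if len(entry_prices) != len(exit_prices):
        raise ValueError("entry and exit series must have the same length")
    labels: List[Optional[int]] = []
    for t, entry in enumerate(entry_prices):
        future = t + horizon
        if future < len(exit_prices):
            labels.append(1 if exit_prices[future] > entry else 0)
        else:
            labels.append(None)  # no price 5 days ahead yet
    return labels


# Example: close-to-close labels on a short, made-up price series.
closes = [10.0, 11.0, 12.0, 11.0, 13.0, 12.0, 14.0, 9.0]
print(five_day_labels(closes, closes))  # [1, 1, 0, None, None, None, None, None]
```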

Adjusted Beta

Adjusted Beta measures a stock’s systematic risk: how strongly its returns move with the overall market (it does not capture company-specific risk). Its value is calculated relative to the S&P 500, whose total return has also been included as a feature.
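For reference, raw beta is the slope of a regression of a stock’s returns on the market’s returns, and an adjusted beta shrinks that estimate toward the market beta of 1.0. The sketch below assumes the Blume-style weights commonly cited for Bloomberg’s Adjusted Beta (0.67 × raw beta + 0.33 × 1.0). The project pulled this value directly from the Terminal, so the function names and the weights here are illustrative assumptions.

```python
from typing import Sequence


def raw_beta(stock_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Slope of an OLS regression of stock returns on market returns.

    Computed as cov(stock, market) / var(market); the 1/(n-1) factors cancel.
    """
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two return series of equal length (at least 2 points)")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


def adjusted_beta(raw: float, weight: float = 0.67) -> float:
    """Blume-style adjustment: shrink the raw beta toward the market beta of 1.0.

    The 0.67 / 0.33 split is an assumption (the commonly cited Bloomberg weights).
    """
    return weight * raw + (1.0 - weight) * 1.0


# Example: a made-up stock that moves exactly twice as much as the index.
market = [0.01, -0.02, 0.03, 0.0, 0.015]
stock = [2 * r for r in market]
beta = raw_beta(stock, market)
print(round(beta, 6), round(adjusted_beta(beta), 6))  # 2.0 1.67
```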

Price-to-Earnings Ratio

The P/E Ratio is a stock’s current share price divided by its earnings per share. This feature lets us compare a company’s current valuation against its own historical record and indicates whether the stock looks expensive or cheap relative to its earnings.

Price-to-Sales Ratio

The P/S Ratio is a company’s market capitalization divided by its revenue. A lower ratio indicates that the market places less value on each dollar of revenue the company generates.

Price-to-Book Ratio

The P/B Ratio compares a company’s market value to its book value (total assets minus total liabilities). A low P/B Ratio can indicate an undervalued stock, which is an important factor when judging performance outlook.

Price-to-Cash Flow Ratio

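The P/CF Ratio divides a stock’s price by its operating cash flow per share. It is useful for valuing stocks that have positive cash flow but are not profitable because of large non-cash charges (such as depreciation and amortization), which can make the P/E Ratio negative or meaningless.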
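The four valuation ratios above reduce to simple divisions. The sketch below computes them from a hypothetical `Fundamentals` record; the field names are illustrative rather than Bloomberg field codes, and in the project the ratios were downloaded from the Terminal rather than computed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Fundamentals:
    """Hypothetical per-stock inputs; field names are illustrative, not Bloomberg field codes."""

    share_price: float
    shares_outstanding: float
    earnings_per_share: float
    revenue: float  # over the same reporting period as the earnings
    book_value: float  # shareholders' equity: total assets minus total liabilities
    operating_cash_flow: float


def valuation_ratios(f: Fundamentals) -> Dict[str, float]:
    """Return the four price multiples described above."""
    market_cap = f.share_price * f.shares_outstanding
    return {
        "P/E": f.share_price / f.earnings_per_share,
        "P/S": market_cap / f.revenue,
        "P/B": market_cap / f.book_value,
        "P/CF": market_cap / f.operating_cash_flow,
    }


example = Fundamentals(
    share_price=50.0,
    shares_outstanding=20.0,
    earnings_per_share=2.5,
    revenue=500.0,
    book_value=400.0,
    operating_cash_flow=250.0,
)
print(valuation_ratios(example))
# {'P/E': 20.0, 'P/S': 2.0, 'P/B': 2.5, 'P/CF': 4.0}
```

Because P/E divides by earnings per share, it turns negative for a loss-making company, whereas P/CF stays positive as long as operating cash flow is positive, which is the situation the P/CF paragraph describes.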

S&P Total Return

The S&P 500’s total return (price change plus reinvested dividends) gives the other ratios a market-wide reference point for individual stocks. This matters because many of those ratios essentially measure how the market values a company. The feature is also valuable because it captures overall market trends: no stock moves in a vacuum.
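Total return differs from price return by folding in dividends. A minimal sketch of the per-period calculation, (P_t + D_t) / P_{t-1} - 1, and of compounding it over a window follows; the function names are hypothetical and the example prices and dividends are made up.

```python
from typing import List, Sequence


def period_total_returns(prices: Sequence[float], dividends: Sequence[float]) -> List[float]:
    """Total return for each period t >= 1: (P_t + D_t) / P_{t-1} - 1.

    `dividends[t]` is the dividend (in index points, for an index) paid during period t.
    """
    if len(prices) != len(dividends):
        raise ValueError("prices and dividends must have the same length")
    return [
        (prices[t] + dividends[t]) / prices[t - 1] - 1.0
        for t in range(1, len(prices))
    ]


def cumulative_return(period_returns: Sequence[float]) -> float:
    """Compound per-period returns into a single return over the whole window."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


prices = [100.0, 102.0, 101.0]
dividends = [0.0, 1.0, 0.0]
rets = period_total_returns(prices, dividends)
print([round(r, 4) for r in rets], round(cumulative_return(rets), 4))
# [0.03, -0.0098] 0.0199
```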

Data Acquisition

The project’s data, including dates and all of the features listed above (see Feature Selection), was pulled from Bloomberg Terminals, computers connected to a large network of historical financial data. Because of download limits on the machines, this process took many days. In the end, we had all available data from January 3, 2017, to September 6, 2019. Because some data were unavailable, we could acquire data for only 24 of the 30 DJIA stocks. The six excluded stocks were Boeing, Dow Chemical, Goldman Sachs, Home Depot, JPMorgan Chase (JPM), and McDonald’s.


After all data was pulled from the Terminals, we used a VBA macro to automatically generate three CSV files per stock: training data, testing data, and labels. The training data consisted of the values of every feature on each date for that stock. The labels are 1s and 0s: a 1 indicates that the stock price went up over the 5 days following that date, and a 0 indicates that it went down. The testing data was in the same format as the training data but did not include labels, because the goal of the algorithm is to predict labels for the testing dates. We trained the model on the training data and labels and then output predicted labels for the testing data. These predicted labels were fed back into a master Excel file, where another VBA macro compared them with the actual movements to produce analytics such as the accuracy breakdown. If implemented day-to-day for investing, the output would simply be a 1 or a 0 for each stock, indicating whether or not to invest in it.
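One way to produce a direction-specific accuracy breakdown is to score the "up" calls and the "down" calls separately. The sketch below is a hypothetical Python stand-in for the analytics macro, whose exact metrics are not described here: `up_correct` is the share of 1-predictions that were followed by a rise, and `down_correct` is the share of 0-predictions that were followed by a fall.

```python
from typing import Dict, Sequence


def accuracy_breakdown(predicted: Sequence[int], actual: Sequence[int]) -> Dict[str, float]:
    """Overall accuracy plus how often each predicted direction turned out right."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(predicted, actual))
    up_outcomes = [a for p, a in pairs if p == 1]    # what happened after "up" calls
    down_outcomes = [a for p, a in pairs if p == 0]  # what happened after "down" calls
    nan = float("nan")  # reported when the model never made that call
    return {
        "overall": sum(p == a for p, a in pairs) / len(pairs),
        "up_correct": up_outcomes.count(1) / len(up_outcomes) if up_outcomes else nan,
        "down_correct": down_outcomes.count(0) / len(down_outcomes) if down_outcomes else nan,
    }


# Made-up predictions and outcomes for six dates.
predicted = [1, 1, 1, 0, 0, 1]
actual = [1, 1, 0, 0, 1, 1]
print(accuracy_breakdown(predicted, actual))
# {'overall': 0.6666666666666666, 'up_correct': 0.75, 'down_correct': 0.5}
```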

Trading Strategy

The results of the CatBoost method can be found on the CatBoost page. In short, we found that the machine learning algorithm was correct more often when predicting upward movement than when predicting downward movement. Our trading strategy is designed to capitalize on this. Whenever the method predicts an upward movement for one of the 24 stocks, we invest a fixed amount in that stock, and we sell it when the predicted day (five days later) arrives. An analysis of the profitability of this strategy can be found on the Results page.
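The strategy can be backtested in a few lines. The sketch below reads it as: on each date the model outputs 1 for a stock, buy a fixed dollar amount of that stock at that date’s price and sell five rows (trading days) later. Using a single price series for both trades, allowing fractional shares, and ignoring transaction costs are simplifying assumptions, and the function names are hypothetical; the project’s actual profitability analysis is on the Results page.

```python
from typing import Dict, Sequence, Tuple


def stock_profit(
    predictions: Sequence[int],
    prices: Sequence[float],
    stake: float = 1000.0,
    horizon: int = 5,
) -> float:
    """Profit from buying `stake` dollars on each date predicted 1 and selling `horizon` rows later."""
    if len(predictions) != len(prices):
        raise ValueError("predictions and prices must have the same length")
    profit = 0.0
    for t, pred in enumerate(predictions):
        sell_at = t + horizon
        if pred == 1 and sell_at < len(prices):  # skip trades with no exit price yet
            shares = stake / prices[t]  # fractional shares assumed
            profit += shares * prices[sell_at] - stake
    return profit


def portfolio_profit(
    book: Dict[str, Tuple[Sequence[int], Sequence[float]]],
    stake: float = 1000.0,
    horizon: int = 5,
) -> float:
    """Sum the per-stock profits across every ticker in `book`."""
    return sum(stock_profit(preds, px, stake, horizon) for preds, px in book.values())


# Made-up tickers and prices.
px = [10.0, 10.0, 10.0, 10.0, 10.0, 12.0, 8.0]
book = {
    "AAA": ([1, 0, 0, 0, 0, 0, 0], px),  # bought at 10, sold at 12
    "BBB": ([0, 1, 0, 0, 0, 0, 0], px),  # bought at 10, sold at 8
}
print(portfolio_profit(book, stake=100.0))  # 0.0
```

Because every trade uses the same fixed stake, the profit from each stock scales linearly with `stake`, so results for different position sizes can be compared by simple rescaling.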