General overview:  CatBoost is an open-source machine learning algorithm that uses gradient boosting on decision trees. Gradient boosting is one of the most powerful techniques for building predictive models. The idea behind gradient boosting is to take a weak learner (such as a shallow decision tree), which on its own is a poor hypothesis, and gradually convert it into a very good hypothesis. Boosting works by separating observations into easy and difficult ones, and "boosting" (training additional weak learners on) the difficult observations until they, too, are predicted well. At a high level, gradient boosting has three components: a loss function to be optimized, a weak learner to make predictions, and an additive model that combines weak learners to minimize the loss function. We focus less on the optimization of these functions and choose to use CatBoost as a black-box algorithm; instead, we concentrate on careful feature selection and large-scale data accumulation to improve our predictive accuracy. We take the view that the market is constantly changing over time, so rather than making generalizations and searching for very specific market indicators, we prefer a more general algorithm with a larger feature set and a sufficiently large training set.
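To make those three components concrete, here is a minimal sketch (our own illustration, not CatBoost's internal implementation) of the additive loop: a squared-error loss, a shallow scikit-learn decision tree as the weak learner, and an additive model that accumulates the trees with a fixed step size. All parameter values are arbitrary.

# Minimal gradient boosting sketch: loss function + weak learner + additive model.
# Illustrative only; CatBoost performs this loop (and much more) internally.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=50, alpha=0.1):
    prediction = np.full(len(y), y.mean())  # F_0: a constant base predictor
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction                                 # negative gradient of squared-error loss
        h = DecisionTreeRegressor(max_depth=3).fit(X, residuals)   # weak learner h_t
        prediction = prediction + alpha * h.predict(X)             # F_t = F_{t-1} + alpha * h_t
        trees.append(h)
    return trees

Each round, the new tree corrects the errors of the model built so far, which is exactly the "boosting of difficult observations" described above.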


Below is an example of our data along with a snapshot of CatBoost predictions. Each of the columns below, "Close Price" through "P/CF", is a feature. The highlighted "Result" column contains CatBoost's up/down predictions. These predictions are the key to our trading strategy and results.
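For reference, a typical way to produce the highlighted "Result" column with the CatBoost Python package might look like the sketch below. The column names mirror the snapshot above, but the file name, hyperparameter values, and label encoding are illustrative assumptions rather than our exact configuration.

# Hedged sketch of generating up/down predictions with CatBoost.
# "features.csv" and all hyperparameter values are placeholders.
import pandas as pd
from catboost import CatBoostClassifier

data = pd.read_csv("features.csv")            # columns "Close Price", ..., "P/CF", "Result"
X = data.drop(columns=["Result"])             # feature columns
y = data["Result"]                            # 1 = up, 0 = down (assumed encoding)

model = CatBoostClassifier(iterations=500, learning_rate=0.05, depth=6,
                           loss_function="Logloss", verbose=False)
model.fit(X, y)

data["Prediction"] = model.predict(X)         # up/down prediction for each row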

CatBoost Math

Overview

CatBoost is a gradient boosting machine learning algorithm. This means that its prediction model is constructed by combining multiple iterations of weak predictors. The error between these weak learners' predictions and the actual results is measured by a loss function, L. The algorithm's goal is to find a predicting function, F, that minimizes the expected loss. These predicting functions, F_t (t = 0, 1, ...), are built by iteratively adding weak predictors; they are calculated recursively by the equation

F_t = F_{t-1} + α h_t

(where α is a predetermined step size and h_t is a base predictor, with F_0 serving as the base case). h_t is defined as the base predictor that most reduces the expected loss when added to the current model:

h_t = argmin_{h ∈ H} E[ L(y, F_{t-1}(x) + h(x)) ]

Minimizing the loss is done through gradient descent. In practice, a least-squares approximation is used to find h_t (where g_t is the derivative of the loss function with respect to the current prediction, evaluated at F_{t-1}(x)):

h_t = argmin_{h ∈ H} E[ (−g_t(x, y) − h(x))² ]
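As a concrete example (our illustration of the formula above), take the squared-error loss L(y, s) = ½ (y − s)². Then

g_t(x, y) = ∂L(y, s)/∂s |_{s = F_{t-1}(x)} = F_{t-1}(x) − y,

so −g_t(x, y) = y − F_{t-1}(x): for this loss, each new base predictor h_t is simply fit, in the least-squares sense, to the current residuals.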

CatBoost utilizes binary decision trees as base predictors. Decision trees are models that recursively split the feature space into several regions according to the values of the features. The deciding factor between two regions is a binary variable that tests whether a feature's value exceeds a threshold, t. These regions are called tree nodes, and each final node (leaf) is assigned a value that estimates the response within its region. Mathematically, a decision tree h can be represented by the following (where the R_j are the regions, i.e., the leaves of the tree, and b_j is the value assigned to region R_j):

h(x) = Σ_j b_j · 1{x ∈ R_j}
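As a toy illustration of a tree as a piecewise-constant function over regions (the feature names, thresholds, and leaf values below are made up for illustration, not taken from our model):

# A hand-written depth-2 binary decision tree: each path of threshold tests
# defines a region R_j, and every input falls into exactly one leaf value b_j.
def toy_tree(close_price, p_cf):
    if close_price > 100.0:        # binary split: does the feature exceed threshold t?
        if p_cf > 8.0:
            return 0.8             # b_1 for region R_1
        return 0.3                 # b_2 for region R_2
    if p_cf > 8.0:
        return -0.2                # b_3 for region R_3
    return -0.7                    # b_4 for region R_4

print(toy_tree(close_price=95.0, p_cf=10.5))   # falls in region R_3, outputs -0.2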

Further explanation of tree creation and ordered boosting can be found in our final report, which is linked on the Home page.

 

Alternatively, more in-depth documentation of CatBoost can be found on the official website: here