CNN for Trading

Tags: Deep Learning, QR, PyTorch
Description: Project 1 of the THU course DL & Finance
RichardS0268 • Updated Oct 5, 2023

Abstract

In this experiment, we re-implement the main idea of the paper (Re-)Imag(in)ing Price Trends by Jiang, Jingwen, Kelly, Bryan T., and Xiu, Dacheng, using A-share market data. For the dataset, we use daily OHLC data to compute labels (ret5, ret20) and to generate images. Each image is a two-dimensional array; displayed in greyscale, it shows the candlestick chart of the previous 5 or 20 days, together with the corresponding technical indicators and volume bars if required. For the models, we rigorously follow the architecture and hyperparameters of the original paper, implementing the first two CNNs proposed by the authors, i.e. the 5-day and 20-day CNN models. Combined with the two kinds of labels (ret5 and ret20), the project contains four kinds of models in total: I5R5, I5R20, I20R5, and I20R20. For the training scheme, we mainly follow the settings of the original paper; however, due to limited computational resources, we introduce a sample rate to decrease the size of the dataset. Besides, we design a rolling training scheme seeking to obtain a set of stable and reliable models, leaving room for further extension. In the end, we apply our models to generate factors (predictions of future returns) and backtest them on the A-share market.
The project code is available at https://github.com/RichardS0268/CNN-for-Trading.
Although the final performance of the models and backtest simulations may not be satisfying, because we do not carefully tune parameters away from the settings of the original paper, our code is extensible for further research and implementation.

1. Dataset

Following the idea of the original paper, we generate the OHLC charts from the raw csv file, which is provided by the TA and can be downloaded from tabularDf.zip. Each image in the dataset is represented by a matrix of pixel values (0 or 255 for black or white pixels). Each day's price data occupies a width of 3 pixels: the open price occupies the first pixel, followed by the low and high prices, and the close price occupies the last pixel. To fix the input images' size, we rescale prices within the same lookback window to [0, 1] by

$$\tilde{p} = \frac{p - \min \text{Price}}{\max \text{Price} - \min \text{Price}},$$

where max Price and min Price are the maximal and minimal values over all prices (open, high, low, close) in the lookback window. We multiply the rescaled prices by the image height to obtain the white pixels' coordinates, then map these coordinates to the matrix by setting the corresponding pixel values to 255 and leaving the rest 0.
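To make this mapping concrete, here is a minimal, hedged sketch of the rescaling-and-drawing step; the function name `draw_ohlc_image` and its interface are ours for illustration and do not reflect the project's actual dataset.py API.

```python
import numpy as np

def draw_ohlc_image(ohlc: np.ndarray, height: int) -> np.ndarray:
    """Sketch: map one lookback window of OHLC rows to a binary image.

    ohlc: array of shape (n_days, 4) with columns (open, high, low, close).
    Returns a (height, 3 * n_days) uint8 matrix with pixel values 0 or 255.
    """
    p_min, p_max = ohlc.min(), ohlc.max()
    # Rescale every price in the window to [0, 1], then to row indices.
    rows = np.round((ohlc - p_min) / (p_max - p_min) * (height - 1)).astype(int)

    img = np.zeros((height, 3 * len(ohlc)), dtype=np.uint8)
    for day, (o, h, l, c) in enumerate(rows):
        x = 3 * day
        img[o, x] = 255                            # open tick (left pixel)
        img[min(l, h):max(l, h) + 1, x + 1] = 255  # high-low bar (middle pixel)
        img[c, x + 2] = 255                        # close tick (right pixel)
    return img[::-1]  # flip so higher prices sit at the top of the chart
```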
Examples of images are shown below in (1) and (3).
Imageset entries: (1) I5, without indicators, without volume; (2) I5, with indicators, with volume; (3) I20, without indicators, without volume; (4) I20, with indicators, with volume.
As for indicators, each day's indicator occupies 3 pixels in width (drawn as a line) in the image, and indicator values are also rescaled to fit the fixed figure size. As for volume, when displayed, the image is split into two parts with a gap of 1 pixel: the upper part displays the OHLC chart while the lower part displays volume bars, and each day's volume occupies only 1 pixel in width (the middle one). In our experiments we use only one indicator, the moving average (MA). Other indicators can be displayed on the image by simply adding calculation functions in dataset.py and defining their parameters in the config files; a hedged sketch of such a function is given after the table below. Image size allocations are shown below.
|               | I5 without volume | I5 with volume | I20 without volume | I20 with volume |
| ------------- | ----------------- | -------------- | ------------------ | --------------- |
| Chart Width   | 15                | 15             | 60                 | 60              |
| Chart Height  | 32                | 25             | 64                 | 51              |
| Volume Height | —                 | 6              | —                  | 12              |
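For instance, a calculation function for the moving average used here might look like the following hypothetical sketch; the name, signature, and the assumption that indicators are computed from a close-price series are ours, not the project's actual interface.

```python
import pandas as pd

def moving_average(close: pd.Series, window: int = 20) -> pd.Series:
    """Hypothetical indicator function: simple moving average of close prices.

    Its values would be rescaled with the same min/max as the window's prices
    and drawn as a line across the image's day columns.
    """
    return close.rolling(window).mean()
```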
Due to limited computational resources, we define a sample rate when preparing the datasets. There are two nested loops during image generation: the outer one iterates over stocks and the inner one over trading dates. The sample rate is applied in the inner loop: each iteration is skipped with probability equal to the sample rate. In this way we decrease the size of the datasets. Randomly sampling iterations of the inner loop is reasonable because, for the same stock, the images of adjacent days contain a lot of duplicated information.
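A minimal, hedged sketch of this sampled double loop; the stock codes, dates, and sample rate value are placeholders, not the project's actual variables.

```python
import random

SAMPLE_RATE = 0.7  # hypothetical value: probability that an inner iteration is skipped

stocks = ["000001.SZ", "600000.SH"]                         # placeholder universe
trading_dates = [f"2010-01-{d:02d}" for d in range(4, 30)]  # placeholder dates

kept = []
for stock in stocks:              # outer loop: stocks
    for date in trading_dates:    # inner loop: trading dates
        if random.random() < SAMPLE_RATE:
            continue              # skip this date with probability SAMPLE_RATE
        kept.append((stock, date))  # the project would generate an image here

print(f"kept {len(kept)} of {len(stocks) * len(trading_dates)} image dates")
```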
The fixed image size and the price rescaling result in a uniform input format and can thus be viewed as a kind of normalization. We expect the model to learn general patterns from these normalized images, which also makes the random sampling process reasonable.

2. Model

The model architectures are exactly the same as in the original paper.
[Figure: CNN architectures from the original paper]
Detailed parameters of the CNNs are shown below.
[Figure: CNN 5d parameters]
[Figure: CNN 20d parameters]

3. Training the CNN

3.1 Rolling Training

For the division into training, validation, and test sets, we design a rolling training scheme. Our data spans 2010~2019, so we group every three consecutive years into a subset: the first two years' data are used for training and validation, and the last year's data for testing. The scheme is shown below.
[Figure: Rolling Training Scheme]
Different subsets are assigned via config files. For each T&V set, the training and validation sets are split by a VALID_RATIO (default 0.3, i.e. 30% of the T&V set is used for validation). T stands for the test set. Training and validation sets are shuffled while test sets are not.
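As a hedged illustration of the scheme, the sketch below enumerates rolling windows assuming a one-year roll over 2010~2019 with two T&V years followed by one test year; in the project the assignment is actually driven by the config files.

```python
VALID_RATIO = 0.3  # default share of each T&V set held out for validation

def rolling_splits(first_year=2010, last_year=2019, tv_years=2):
    """Yield (train_valid_years, test_year) pairs, rolling forward one year."""
    for start in range(first_year, last_year - tv_years + 1):
        yield list(range(start, start + tv_years)), start + tv_years

for tv, test in rolling_splits():
    print(f"T&V: {tv[0]}-{tv[-1]} | Test: {test}")
```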

3.2 Workflow

We follow the procedure in the original paper, which treats the prediction task as a binary classification problem. In particular, the label for an image is defined as $y = 1$ if the subsequent return is positive and $y = 0$ otherwise. The loss function is the cross-entropy loss

$$L(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right],$$

where $\hat{y}$ is the softmax output from the final step of the CNN. We use nn.BCELoss() in our code.
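For illustration, here is a minimal example of this loss computed with nn.BCELoss(); the probabilities and labels below are made up.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # expects probabilities in [0, 1]

y_hat = torch.tensor([0.8, 0.3, 0.6])  # model outputs (made-up probabilities)
y = torch.tensor([1.0, 0.0, 1.0])      # binary labels: 1 = positive return

# Averages -(y*log(y_hat) + (1-y)*log(1-y_hat)) over the batch.
loss = criterion(y_hat, y)
print(loss.item())
```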
We also implement the other techniques and settings mentioned in the paper (Section 3.3, "Training the CNN"), including:
  • Xavier initialization for the weights in each layer
  • Adam optimizer with weight decay = 0.01 and learning rate = 1e-5
  • Batch normalization between each convolution and its non-linear activation
  • 50% dropout in the FC layer
  • Early stopping: halt training once the validation loss fails to decrease for 16 consecutive epochs
Note that all these settings can be modified in the config files; a minimal sketch of how they fit together is given below.
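This is a hedged sketch of these settings in PyTorch; the tiny model below is a stand-in rather than the paper's CNN, and the validation loss is a random placeholder.

```python
import torch
import torch.nn as nn

def init_weights(m: nn.Module) -> None:
    # Xavier initialization for convolutional and fully connected layers.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(             # stand-in model, not the paper's CNN
    nn.Conv2d(1, 64, kernel_size=(5, 3), padding=(2, 1)),
    nn.BatchNorm2d(64),            # BN between convolution and activation
    nn.LeakyReLU(),
    nn.Flatten(),
    nn.Dropout(0.5),               # 50% dropout before the FC layer
    nn.Linear(64 * 32 * 15, 1),    # I5 image: 32 x 15 pixels
    nn.Sigmoid(),
)
model.apply(init_weights)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=0.01)

# Early stopping: halt once validation loss fails to decrease for 16 epochs.
best_val, bad_epochs, patience = float("inf"), 0, 16
for epoch in range(1000):
    # ... one epoch of training with `optimizer` would go here ...
    val_loss = torch.rand(1).item()  # placeholder for the real validation loss
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```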
Taking I20R5 (2010~2012 train & valid, 2012~2013 test) as an example, the training and validation loss and accuracy during the training process are shown below.
[Figure: I20R5 training and validation loss/accuracy curves]
The performance of this I20R5 model on the test set is shown below:
  • Test Loss: 0.727250
  • Test Accuracy of down: 60% (32705/54358)
  • Test Accuracy of up: 42% (24304/56772)
  • Test Accuracy (Overall): 51% (57009/111130)
To run other experiments:
```
python main.py "**.yml"
python test.py "**.yml"
```
