Autoencoder Asset Pricing Models

Tags: Asset Pricing, APT, Autoencoder, PyTorch
Description: Project 2 of the THU course DL & Finance
Repository: RichardS0268/Autoencoder-Asset-Pricing-Models

Abstract

In this project, we reproduce most of the content of the paper by Gu, Kelly, and Xiu (GKX, 2019). Our work strictly follows the processing method of the original paper, and we confirm some details that are not mentioned in it. Although the final results of our experiments differ slightly from GKX's, mainly because some hyperparameters are not given in the paper, our work provides an extensible framework that will be helpful for further research. In particular, we use class inheritance to implement all models, which not only accords with the theory of factor models, but also makes the implementation more elegant and efficient.

1. Introduction

Starting from the most basic CAPM, researchers try to explain asset returns through factor models, which can be uniformly expressed by the following formula:

$$r_{i,t} = \beta_{i,t-1}' f_t + u_{i,t}$$
In order to improve the explanatory power of the model, researchers have made many extensions and improvements. From the CAPM to the Fama-French three-factor and five-factor models, researchers have tried to improve explanatory power by increasing the number of observable factors. However, this kind of model still cannot explain asset returns well. Kelly, Pruitt, and Su (KPS, 2019) showed through empirical experiments that the factors (latent variables) used to explain asset returns are actually proxies for unobservable variables based on assets' characteristics. Moreover, an asset's exposure to these latent variables should vary over time. In other words, characteristics appear to predict returns because they help pinpoint compensated aggregate risk exposures. KPS propose a method that uses asset characteristics as instrumental variables to estimate factors and asset exposures. The new method is called instrumental PCA (IPCA) and takes the following form:

$$r_{i,t} = z_{i,t-1}' \Gamma f_t + u_{i,t}$$
In this form, KPS split the factor loading $\beta_{i,t-1}$ into a time-variant part, the characteristics $z_{i,t-1}$, and a time-invariant part, the mapping $\Gamma$.
Although IPCA uses asset characteristics as instrumental variables, $\Gamma$, which does not change over time, is essentially a linear mapping and does not combine these features and their covariates well. Therefore, GKX proposed the autoencoder model in their paper. Their main motivation is to use neural networks to fit the beta and the factor in the factor pricing model respectively. They introduce nonlinear relations among the features and their covariates through activation functions (ReLU) in the neural networks. In addition, GKX give the corresponding derivation in the paper, proving that mathematically both direct PCA decomposition of the asset return matrix and IPCA using $z_{i,t-1}$ as instrumental variables are special cases of their autoencoder. We do not provide a detailed proof of this equivalence in our report, but the optimization and calculation of the various models are derived in the Approach & Experiment sections.
In terms of data, we used 60 years of historical data on US equities to conduct experiments. Due to computational resource constraints, we did not analyze the performance of the pricing models on individual stocks, but instead constructed 94 managed portfolios based on the 94 characteristics. Constructing 94 portfolios, on the one hand, reduces the amount of computation and makes the model inputs more regular; on the other hand, some data noise is indirectly filtered out by the long-short construction, which makes the results more valuable for research. More details of the data processing are given in the Approach & Experiment sections.

2. Related Work

We mainly refer to the articles by GKX and KPS. The evaluation techniques for the various machine learning methods come from another article by GKX, Empirical Asset Pricing via Machine Learning. In that article, GKX focus on the effectiveness of various machine learning methods for return prediction. The paper we reproduce, on the other hand, focuses on using machine learning methods to explain realized returns. In addition, another predecessor related to this paper is Kozak et al. (2018), who propose an approach to factor analysis for asset pricing using principal components of "anomaly"-sorted portfolios. Their approach is similar to how we build 94 portfolios from 94 characteristics.
The rest of our report is organized as follows. In section 3, we explain the models and methods mentioned in GKX's paper in detail. In section 4, we explain the design and results of the replicated experiments. Finally, in section 5, we summarize this project and propose some directions for future research.

3. Approach

3.1 Base Model

All models fall into the general factor-model form above. When implementing them, we therefore use class inheritance: the modelBase class represents the general model, and all the remaining models are its subclasses. The main structure of modelBase is shown in the following figure.
Fig 1. modelBase functions
Different subclasses override the methods of modelBase that calculate beta, the factors, and so on. This implementation style is very consistent with the factor pricing model, and it makes the code more concise and efficient.
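As an illustration, below is a minimal sketch of what such a base class could look like. The method names (fit, cal_beta, cal_factor, predict) and signatures are illustrative, not the exact ones used in our code.

```python
import numpy as np

class modelBase:
    """Common interface of all factor pricing models; subclasses override
    the estimation of betas and factors."""

    def __init__(self, name, K):
        self.name = name  # e.g. 'FF', 'PCA', 'IPCA', 'CA0'
        self.K = K        # number of factors

    def fit(self, R, Z=None):
        """Estimate the model on returns R (T x N) and optional characteristics Z."""
        raise NotImplementedError

    def cal_beta(self, t):
        """Return the N x K loading matrix beta_{t-1}."""
        raise NotImplementedError

    def cal_factor(self, t):
        """Return the K-dimensional factor realization f_t."""
        raise NotImplementedError

    def predict(self, t):
        """Model-implied returns beta_{t-1} f_t, used for the R^2 evaluations."""
        return self.cal_beta(t) @ self.cal_factor(t)
```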

3.2 FF model

The first models we reproduce are those with observable factors. The advantage of these models is that the factors are interpretable: every factor has a clear economic meaning and calculation formula. The disadvantage is that their explanatory power is not high, the assets' own characteristics are not used, and the factor exposures are estimated by simple linear regression. The model can be expressed as follows:

$$r_{i,t} = \beta_i' f_t + u_{i,t}$$
Here $f_t$ contains $K$ observable factors, with $K$ ranging from 1 to 6; the factors added to the model in turn are the market, SMB, HML, CMA, RMW, and UMD factors. It is worth noting that the model is actually the CAPM when only the market factor is used. When the market, SMB, and HML factors are included, the model is the Fama-French three-factor model; when the market, SMB, HML, CMA, and RMW factors are included, it is the Fama-French five-factor model. The last factor, UMD, is the momentum factor. The data source for these models is Kenneth French's website, which provides data for all six factors since July 1963.
The FF models are estimated by OLS multiple regression without an intercept term, so that the explanatory power of the models can be analyzed later.
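A minimal NumPy sketch of such a no-intercept regression follows; the function name and interface are illustrative.

```python
import numpy as np

def fit_ff_betas(R, F):
    """OLS regression of portfolio excess returns on observable factors,
    without an intercept term.

    R: T x N matrix of portfolio excess returns
    F: T x K matrix of observable factors (market, SMB, HML, ...)
    Returns the K x N matrix of factor loadings.
    """
    # With no intercept column, lstsq solves min ||F @ B - R||^2 directly.
    B, *_ = np.linalg.lstsq(F, R, rcond=None)
    return B  # fitted returns are F @ B; residuals are R - F @ B
```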

3.3 PCA model

Unlike the FF model, which has observable factors, the PCA model and the following models assume that the factors are latent variables. The PCA model is as follows:

$$r_t = \beta f_t + u_t$$

It is called the PCA model because it assumes that $\beta$ does not change over time and tries to solve the following optimization problem:

$$\min_{\beta,\{f_t\}} \sum_t \left\| r_t - \beta f_t \right\|^2$$

This problem can be solved by the PCA method. Using least squares, for the latent factor model the estimator of the factor is

$$\hat{f}_t = (\beta'\beta)^{-1}\beta' r_t,$$

the residual is

$$\hat{u}_t = r_t - \beta\hat{f}_t = \left(I - \beta(\beta'\beta)^{-1}\beta'\right) r_t,$$

and the sum of squared residuals is

$$\sum_t \hat{u}_t'\hat{u}_t = \sum_t r_t' r_t - \sum_t r_t'\beta(\beta'\beta)^{-1}\beta' r_t,$$

thus minimizing the squared residuals is equivalent to maximizing $\sum_t r_t'\beta(\beta'\beta)^{-1}\beta' r_t$. Note that

$$\sum_t r_t'\beta(\beta'\beta)^{-1}\beta' r_t = \operatorname{tr}\!\left(\beta(\beta'\beta)^{-1}\beta' \sum_t r_t r_t'\right).$$

Assume the normalization $\beta'\beta = I_K$ (which is possible because $\beta$ is time invariant); then the maximization problem is equivalent to

$$\max_{\beta'\beta = I_K} \operatorname{tr}\!\left(\beta' \Big(\sum_t r_t r_t'\Big) \beta\right).$$

By the Rayleigh theorem, the optimal $\hat{\beta}$ consists of the eigenvectors corresponding to the first K eigenvalues of $\sum_t r_t r_t'$. The PCA model implemented in our code follows the above procedure step by step.
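A minimal NumPy sketch of that computation is given below; the function and variable names are illustrative.

```python
import numpy as np

def fit_pca(R, K):
    """Static PCA factor model.

    R: T x N matrix of (excess) returns
    Returns beta (N x K) and the implied factors F (T x K).
    """
    S = R.T @ R                          # N x N second-moment matrix, sum_t r_t r_t'
    eigval, eigvec = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    beta = eigvec[:, ::-1][:, :K]        # eigenvectors of the K largest eigenvalues
    F = R @ beta                         # with beta'beta = I, f_t = beta' r_t
    return beta, F
```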

3.4 IPCA model

For the IPCA model, the form is the same as the PCA model, but we no longer assume that $\beta$ is time invariant. Define $\beta_t = Z_t \Gamma$, where $Z_t$ is the $N \times L$ matrix of characteristics observed at the end of month $t$; then

$$r_{t+1} = Z_t \Gamma f_{t+1} + u_{t+1}.$$

When $Z_t$ is the identity matrix, the estimation reduces to the PCA case: the loadings are the eigenvectors corresponding to the first K eigenvalues of $\sum_t r_{t+1} r_{t+1}'$. When $Z_t$ is not the identity, which is usually the case, we can follow the alternating procedure proposed by KPS, which iteratively updates $\Gamma$ and the factors. Assume we have an estimate $\hat{\Gamma}$; then for each month

$$\hat{f}_{t+1} = \left(\hat{\Gamma}' Z_t' Z_t \hat{\Gamma}\right)^{-1} \hat{\Gamma}' Z_t' r_{t+1},$$

which is simply the cross-sectional OLS of $r_{t+1}$ on the conditional betas $Z_t\hat{\Gamma}$. Then we can update $\hat{\Gamma}$ based on $\{\hat{f}_{t+1}\}$ and $\{r_{t+1}\}$. Consider the least-squares objective

$$\min_{\Gamma} \sum_t \left\| r_{t+1} - Z_t \Gamma \hat{f}_{t+1} \right\|^2 .$$

The first-order condition with respect to $\Gamma$ is

$$\sum_t Z_t' \left( r_{t+1} - Z_t \Gamma \hat{f}_{t+1} \right) \hat{f}_{t+1}' = 0 .$$

Applying the vec operator to both sides, with $\operatorname{vec}(Z_t \Gamma \hat{f}_{t+1}) = (\hat{f}_{t+1}' \otimes Z_t)\operatorname{vec}(\Gamma)$, the estimation of $\Gamma$ is

$$\operatorname{vec}(\hat{\Gamma}) = \left( \sum_t \hat{f}_{t+1}\hat{f}_{t+1}' \otimes Z_t' Z_t \right)^{-1} \sum_t \left( \hat{f}_{t+1} \otimes Z_t' r_{t+1} \right).$$

The two steps are repeated until convergence. The IPCA model implemented in our code follows the above procedure step by step.
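A compact NumPy sketch of this alternating least squares loop follows; the initialization, iteration count, and interface are illustrative.

```python
import numpy as np

def fit_ipca(R, Z, K, n_iter=100):
    """Alternating least squares for IPCA, a sketch of the KPS procedure.

    R: list of T return vectors r_{t+1}, each of length N
    Z: list of T characteristic matrices Z_t, each N x L
    Returns Gamma (L x K) and the estimated factors F (T x K).
    """
    L = Z[0].shape[1]
    Gamma = np.random.default_rng(0).normal(scale=0.01, size=(L, K))  # initial guess

    for _ in range(n_iter):
        # Step 1: given Gamma, the conditional betas are Z_t @ Gamma and each
        # factor is the cross-sectional OLS of r_{t+1} on those betas.
        F = []
        for Zt, rt in zip(Z, R):
            B = Zt @ Gamma
            F.append(np.linalg.solve(B.T @ B, B.T @ rt))
        # Step 2: given the factors, update Gamma via the vectorized first-order
        # condition: vec(Gamma) = (sum_t f f' (x) Z'Z)^(-1) sum_t (f (x) Z'r).
        A = np.zeros((K * L, K * L))
        b = np.zeros(K * L)
        for Zt, rt, ft in zip(Z, R, F):
            A += np.kron(np.outer(ft, ft), Zt.T @ Zt)
            b += np.kron(ft, Zt.T @ rt)
        Gamma = np.linalg.solve(A, b).reshape((L, K), order="F")
    return Gamma, np.asarray(F)
```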

3.5 Conditional Autoencoder

From the PCA model to the IPCA model, we introduce the 94 characteristics as instruments. However, we only use a linear combination of these instrumental variables and their covariates. In the Conditional Autoencoder (CA) model, we try to fit a nonlinear combination of them by using activation functions in the networks. The overall architecture of the CA model is as follows. There are two networks in the CA model, representing the beta and the factor in the factor pricing model respectively. The inputs of the beta network are the characteristics (instrumental variables) and the inputs of the factor network are either stock-level returns or portfolio returns. In our implementation, we use the 94 managed portfolios rather than individual stocks. That is, for each month the beta network inputs have dimension 94×94 (94 portfolios, each with 94 characteristics) and the factor network inputs have dimension 94×1 (the 94 portfolio returns).
Fig 2. Architecture of CAs
We construct four types of CA models following the original paper, i.e. CA0, CA1, CA2, and CA3. They differ in the structure of the beta network. For CA0, the beta network is just a linear layer, making it similar (but not identical) to IPCA. For CA1, there is one hidden layer with 32 neurons in the beta network. CA2 adds a second hidden layer with 16 neurons, and CA3 adds a third hidden layer with 8 neurons.
Note that CA0 through CA3 all keep a one-layer linear specification for the factor network. In these cases, the only variation in the factor network is the number of neurons, which ranges from 1 to 6, corresponding to the number of factors in the model. The intuition behind the one-layer factor network is to maintain the interpretability of the factors to some extent: each factor is a linear combination of the long-short portfolio returns built from the stock-level characteristics.
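As an illustration, here is a minimal PyTorch sketch of a CA1-style model. The layer sizes follow the description above, while the class name, dropout rate, and batch-norm placement are illustrative choices rather than our exact configuration.

```python
import torch
import torch.nn as nn

class CA1(nn.Module):
    """Sketch of a conditional autoencoder with one hidden layer in the beta network.

    The beta network maps each portfolio's 94 characteristics to K loadings;
    the factor network is a single linear layer mapping the 94 portfolio
    returns to K factors. The fitted return is beta @ f.
    """
    def __init__(self, n_chars=94, n_ports=94, K=5, hidden=32, dropout=0.1):
        super().__init__()
        self.beta_net = nn.Sequential(
            nn.Linear(n_chars, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, K),
        )
        self.factor_net = nn.Linear(n_ports, K, bias=False)  # kept linear for interpretability

    def forward(self, z, r):
        # z: (n_ports, n_chars) characteristics at t-1; r: (n_ports,) returns at t
        beta = self.beta_net(z)   # (n_ports, K)
        f = self.factor_net(r)    # (K,)
        return beta @ f           # fitted portfolio returns, (n_ports,)
```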

4. Experiment

4.1 Data

The data we use is downloaded from Xiu's website. The dataset contains monthly individual stock returns from the Center for Research in Security Prices (CRSP) for all firms listed on the three major exchanges: NYSE, AMEX, and NASDAQ. We use the Treasury bill rate as a proxy for the risk-free rate, from which we calculate individual excess returns. The sample begins in March 1957 and ends in December 2016, totaling 60 years. For convenience of implementation and data transmission, we modify the format of the original data and save some preprocessed data so that it can be used directly when the code is reused. All data-processing operations are in data_prepare.py.
As GKX point out in the paper, the stock-level characteristics (94 in total) are released to the public with a delay. To avoid a forward-looking bias, they match realized returns at month $t$ with the most recent monthly characteristics at the end of month $t-1$, the most recent quarterly data as of $t-4$, and the most recent annual data as of $t-6$. This matching has already been done in the data downloaded from Xiu's website.
The distributions of some characteristics are highly skewed and leptokurtic. We follow the method of the original paper and rank-normalize all characteristics into the interval (-1, 1) for each month. We then form 94 managed portfolios based on the 94 characteristics. For example, to construct the 'mvel1' portfolio, for each month we rank stocks according to 'mvel1' and record the permnos of the top 10% and the bottom 10% of stocks. We calculate the averages of the 94 characteristics over the top 10% and the bottom 10% of stocks respectively, denoted by $c^{top}$ and $c^{bottom}$ (94×1 vectors). The characteristics of the 'mvel1' portfolio are then $c^{top} - c^{bottom}$.
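A rough pandas sketch of this construction for a single month follows; the column names, the long-short difference for the portfolio return, and the 10% cutoff handling are illustrative assumptions, not the exact code in data_prepare.py.

```python
import pandas as pd

def build_managed_portfolio(df, char, q=0.1):
    """Construct one characteristic-managed portfolio for a single month.

    df: DataFrame for one month with a 'ret' column plus the 94 characteristics,
        already rank-normalized into (-1, 1); indexed by permno.
    char: the sorting characteristic, e.g. 'mvel1'.
    Returns the long-short portfolio return and its 94-dim characteristic vector.
    """
    char_cols = [c for c in df.columns if c != "ret"]
    top = df[df[char] >= df[char].quantile(1 - q)]     # top 10% by the characteristic
    bottom = df[df[char] <= df[char].quantile(q)]      # bottom 10%
    port_ret = top["ret"].mean() - bottom["ret"].mean()
    port_char = top[char_cols].mean() - bottom[char_cols].mean()
    return port_ret, port_char
```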

4.2 Training, Validation, and Testing

We implement a uniform way of training and testing. We divide the data into three parts: a train set, a validation set, and a test set. The initial split is a train set of 18 years (1957-1974), a validation set of 12 years (1975-1986), and the remaining 30 years (1987-2016) as the out-of-sample test set. We use a rolling training scheme to refit the models once a year. Each time we refit, we roll the train and validation sets forward by one year, maintaining their sizes. Note that here we differ slightly from the original paper, in which the train set grows by one year at each refit. We compare the two methods and find no significant difference in results, but our method has a lower computational cost.
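The rolling split can be generated as in the following sketch; the function name and return format are illustrative.

```python
def rolling_windows(first_year=1957, train_len=18, valid_len=12, last_test_year=2016):
    """Fixed-length train/valid windows rolled forward one year at a time,
    with a single out-of-sample test year per refit."""
    windows = []
    test_year = first_year + train_len + valid_len   # 1987 for the initial split
    while test_year <= last_test_year:
        train = (test_year - train_len - valid_len, test_year - valid_len - 1)
        valid = (test_year - valid_len, test_year - 1)
        windows.append({"train": train, "valid": valid, "test": test_year})
        test_year += 1
    return windows
```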
We apply the techniques mentioned in the original paper, including the Adam optimizer, batch normalization, etc. As for the penalization, we do not implement the LASSO (L1) penalization used by GKX because it is not directly supported by PyTorch optimizers; we use L2 penalization (weight decay) instead. Besides, we apply dropout layers to avoid overfitting and use an early stopping mechanism. We use the validation set for hyperparameter tuning (learning rate, regularization coefficients, dropout rate, etc.). A training process example is shown below.
Training Log: CA0, K=3, test_year=1995
Since GKX do not provide the hyperparameters they used in the paper, we cannot expect our results to coincide exactly with theirs.
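For reference, a condensed sketch of this kind of training loop is given below; the hyperparameter values, the data-loader interface (yielding characteristics and returns), and the helper names are illustrative rather than our exact settings.

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, valid_loader, loss_fn,
                              lr=1e-3, weight_decay=1e-4, max_epochs=200, patience=5):
    """Adam with L2 penalization (weight_decay) and early stopping on validation loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for z, r in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(z, r), r)   # fit the realized returns
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            valid_loss = sum(loss_fn(model(z, r), r).item() for z, r in valid_loader)
        if valid_loss < best_loss:
            best_loss, best_state, bad_epochs = valid_loss, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:        # stop if validation loss keeps worsening
                break
    model.load_state_dict(best_state)
    return model
```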

4.3 Statistical Performance Evaluation

We evaluate out-of-sample model performance using the total and predictive $R^2$. The total $R^2$ quantifies the explanatory power of contemporaneous factor realizations, and thus assesses the model's description of individual stock riskiness:

$$R^2_{total} = 1 - \frac{\sum_{i,t}\left(r_{i,t} - \hat{\beta}_{i,t-1}'\hat{f}_t\right)^2}{\sum_{i,t} r_{i,t}^2}$$
The predictive $R^2$ assesses the accuracy of model-based predictions of future individual excess stock returns, which quantifies a model's ability to explain panel variation in risk compensation:

$$R^2_{pred} = 1 - \frac{\sum_{i,t}\left(r_{i,t} - \hat{\beta}_{i,t-1}'\hat{\lambda}_{t-1}\right)^2}{\sum_{i,t} r_{i,t}^2}$$

where $\hat{\lambda}_{t-1}$ is the prevailing sample average of $\hat{f}$ up to month $t-1$. Note that all factor pricing models are estimated by minimizing the realized residuals, that is, by trying to maximize the total $R^2$. There is a time lag when we use our explanatory models to make predictions, and there are no constraints on the predictive $R^2$ during estimation. So we cannot expect the predictive $R^2$ to be high, and even negative values are possible.
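A minimal NumPy sketch of these two statistics follows; the array shapes and function name are illustrative, and in practice the prevailing factor mean would also use the pre-test sample.

```python
import numpy as np

def total_and_pred_r2(R, beta, F):
    """Out-of-sample total and predictive R^2.

    R:    T x N realized excess returns
    beta: T x N x K loadings (beta_{i,t-1})
    F:    T x K estimated factors
    """
    fitted = np.einsum("tnk,tk->tn", beta, F)            # beta_{i,t-1}' f_t
    total_r2 = 1 - ((R - fitted) ** 2).sum() / (R ** 2).sum()

    # Predictive R^2: replace f_t with the prevailing sample mean of factors up to t-1
    lam = np.array([F[:t].mean(axis=0) if t > 0 else np.zeros(F.shape[1])
                    for t in range(len(F))])
    predicted = np.einsum("tnk,tk->tn", beta, lam)
    pred_r2 = 1 - ((R - predicted) ** 2).sum() / (R ** 2).sum()
    return total_r2, pred_r2
```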
Furthermore, we follow the method of the original paper to evaluate characteristic importance. By setting a given characteristic to zero during model inference (without changing the training process), we calculate the reduction in total $R^2$, which reflects that characteristic's importance. The heatmap of the reduction in total $R^2$ is shown below.
[Heatmap: reduction in total R² by characteristic]
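The importance measure itself can be sketched as follows; the model and evaluation interfaces (total_r2_fn, array layout) are hypothetical placeholders.

```python
import numpy as np

def char_importance(model, Z_test, R_test, total_r2_fn, char_names):
    """Zero out one characteristic at a time at inference (training unchanged)
    and record the resulting drop in total R^2."""
    base_r2 = total_r2_fn(model, Z_test, R_test)
    importance = {}
    for j, name in enumerate(char_names):
        Z_zeroed = Z_test.copy()
        Z_zeroed[..., j] = 0.0                 # kill the j-th characteristic everywhere
        importance[name] = base_r2 - total_r2_fn(model, Z_zeroed, R_test)
    return importance
```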

4.4 Risk Premia vs. Mispricing

We also follow GKX in carrying out the mispricing analysis. Notice that all the models implemented above have no intercepts. Therefore, we can directly test whether the zero-intercept no-arbitrage restriction is satisfied in the data. If it is, the time-series average of model residuals for each portfolio, that is, the pricing error, should be statistically indistinguishable from zero (two-sided test). The unconditional pricing errors are defined as

$$\hat{\alpha}_i = \frac{1}{T}\sum_{t}\left(r_{i,t} - \hat{\beta}_{i,t-1}'\hat{f}_t\right).$$
[Scatter plots of portfolio average returns vs. pricing errors (alphas) for the different models]
Take K=5 for example: the x axis represents the portfolios' average returns, while the y axis represents the portfolios' alphas (pricing errors). Blue dots represent insignificant alphas and red dots represent significant ones. The flatter the distribution of points, the better the explanatory power of the model. It can be observed that, compared with the FF_5 model, PCA and the CAs explain the portfolios' returns better.
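The significance classification can be computed with a simple per-portfolio t-test on the out-of-sample residuals, as in this sketch; the interface is illustrative.

```python
import numpy as np

def pricing_error_tstats(residuals):
    """For each portfolio, the alpha is the time-series mean of model residuals;
    its significance is assessed with a simple t-statistic.

    residuals: T x N out-of-sample residuals r_{i,t} - beta_{i,t-1}' f_t
    Returns alphas (N,) and t-statistics (N,).
    """
    T = residuals.shape[0]
    alpha = residuals.mean(axis=0)
    se = residuals.std(axis=0, ddof=1) / np.sqrt(T)
    return alpha, alpha / se
```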

5. Conclusion

In this project, we reproduced most of the content of the GKX paper. Due to the limitation of computing resources, we only carried out the managed-portfolio part. For implementation details not mentioned in the original paper, we determined them through online searches and experimentation. Since GKX do not give some of the training hyperparameters in the paper, and we did not search for optimal parameters due to limited computational resources, our final results differ slightly from those in the original paper. For the code implementation, we used classes and inheritance. This not only fits the form of the factor pricing model well, but also makes the implementation elegant and efficient, and it provides a reference implementation framework for further research on stock-level data.

References

Gu, S., Kelly, B., and Xiu, D. Autoencoder Asset Pricing Models.
Gu, S., Kelly, B., and Xiu, D. Empirical Asset Pricing via Machine Learning.
Kelly, B., Pruitt, S., and Su, Y. (2019). Characteristics Are Covariances: A Unified Model of Risk and Return.
Kozak, S., Nagel, S., and Santosh, S. (2018). Interpreting Factor Models.
