An R package for Bayesian estimation of the probability of informed trading
bayespin
bayespin implements the statistical methods for estimating the
probability of informed trading (PIN) with a Bayesian approach as proposed by
Grammig et al. (2015). It aims to simplify the use of this rather complicated
estimation procedure and to offer researchers an API that is easy to integrate,
stable, and fast.
The estimation method of Grammig et al. (2015) offers some advantages in comparison to the original model of Easley et al. (1996) and other Bayesian approaches found in the literature:
- It uses only the number of trades per day instead of the numbers of seller- and buyer-initiated trades used by other approaches. This makes data easier to collect, also for historical periods, and leads to less bias when trade initiation would otherwise have to be estimated with the Lee and Ready (1991) algorithm or similar procedures.
- The Bayesian estimation of the PIN measure is found to be more stable, especially for the very large trading volumes that occur regularly on modern markets.
- Especially in settings where the rate of informed trading, $\mu$, and/or the probability of an information event, $\alpha$, is very small, Bayesian estimation of the underlying finite mixture distribution leads to more robust parameter estimates.
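Why the trade count alone is enough: in the model of Easley et al. (1996), buys and sells are independent Poisson counts, each arriving at the uninformed rate $\epsilon$, and an information event adds the informed rate $\mu$ to one side. The total number of trades on a day without an event is therefore Poisson with rate $2\epsilon T$, and on a day with an event (good or bad news) Poisson with rate $(2\epsilon + \mu) T$, where $T$ is the length of the trading day in the time unit of the rates. The daily trade count is thus a two-component Poisson mixture with weights $1 - \alpha$ and $\alpha$, and the bad-news probability $\delta$ drops out. The base-R sketch below evaluates this log-likelihood; `ekop_count_loglik()` is a hypothetical helper written for illustration, not a function exported by bayespin.

```r
# Hypothetical helper (not exported by bayespin): log-likelihood of daily
# trade counts under the two-component Poisson mixture implied by the
# Easley et al. (1996) model. Rates are per time unit; T is the number of
# time units in a trading day.
ekop_count_loglik <- function(trades, alpha, epsilon, mu, T) {
  # Days without an information event: Poisson(2 * epsilon * T).
  log_no_event <- log1p(-alpha) + dpois(trades, 2 * epsilon * T, log = TRUE)
  # Days with an information event (good or bad news): Poisson((2 * epsilon + mu) * T).
  log_event <- log(alpha) + dpois(trades, (2 * epsilon + mu) * T, log = TRUE)
  # Numerically stable log-sum-exp over the two components, summed over days.
  m <- pmax(log_no_event, log_event)
  sum(m + log(exp(log_no_event - m) + exp(log_event - m)))
}

# Example: evaluate the log-likelihood on simulated daily counts.
set.seed(42)
trading_minutes <- 60 * 6.5
event_day <- runif(250) < 0.3
counts <- rpois(250, ifelse(event_day, (2 * 0.3 + 0.1) * trading_minutes,
                            2 * 0.3 * trading_minutes))
ekop_count_loglik(counts, alpha = 0.3, epsilon = 0.3, mu = 0.1,
                  T = trading_minutes)
```

Working on the log scale with a log-sum-exp keeps the evaluation stable for the large counts typical of liquid stocks, where the raw Poisson probabilities become extremely small.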
The package makes use of high-performance C++ algorithms for MCMC sampling of
finite mixture distributions offered by the `finmix` package. Model estimation
with a simple K-means relabeling takes around 4-6 seconds.
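To illustrate what such an MCMC sampler does for this model, here is a minimal data-augmentation Gibbs sampler for the two-component Poisson mixture of daily trade counts, written in base R. It is a sketch under stated assumptions, not the finmix algorithm: the Gamma(1, 0.01) priors on the component rates and the Beta(4, 4) prior on the event weight are illustrative choices, and label switching is handled here by ordering the two rates instead of the K-means relabeling used by the package. `gibbs_pin_sketch()` is a hypothetical name.

```r
# Hypothetical sketch (not the finmix/bayespin implementation): Gibbs sampler
# for a two-component Poisson mixture of daily trade counts.
# Assumed priors: lambda_k ~ Gamma(a0, b0), event weight ~ Beta(e0, e0).
gibbs_pin_sketch <- function(trades, T, n_iter = 2000, burn_in = 500,
                             a0 = 1, b0 = 0.01, e0 = 4) {
  n <- length(trades)
  # Initial allocation: days above the median count start in the event component.
  s <- ifelse(trades > median(trades), 2L, 1L)
  draws <- matrix(NA_real_, nrow = n_iter - burn_in, ncol = 4,
                  dimnames = list(NULL, c("alpha", "epsilon", "mu", "pin")))
  for (it in seq_len(n_iter)) {
    # Conjugate updates of the component rates and the event weight.
    n_k <- tabulate(s, nbins = 2)
    sum_k <- c(sum(trades[s == 1L]), sum(trades[s == 2L]))
    lambda <- rgamma(2, shape = a0 + sum_k, rate = b0 + n_k)
    eta <- rbeta(1, e0 + n_k[2], e0 + n_k[1])
    # Relabel by ordering: component 2 (information event) has the higher rate.
    if (lambda[1] > lambda[2]) {
      lambda <- rev(lambda)
      eta <- 1 - eta
    }
    # Sample each day's allocation given the rates and the weight.
    log_odds <- log(eta) - log1p(-eta) +
      dpois(trades, lambda[2], log = TRUE) -
      dpois(trades, lambda[1], log = TRUE)
    s <- ifelse(runif(n) < plogis(log_odds), 2L, 1L)
    # Map mixture parameters to EKOP parameters and store draws after burn-in.
    if (it > burn_in) {
      alpha <- eta
      epsilon <- lambda[1] / (2 * T)
      mu <- (lambda[2] - lambda[1]) / T
      draws[it - burn_in, ] <- c(alpha, epsilon, mu,
                                 alpha * mu / (alpha * mu + 2 * epsilon))
    }
  }
  # Posterior means of the stored draws.
  colMeans(draws)
}

# Example on simulated counts with alpha = 0.3, epsilon = 0.3, mu = 0.1.
set.seed(42)
trading_minutes <- 60 * 6.5
event_day <- runif(1000) < 0.3
counts <- rpois(1000, ifelse(event_day, (2 * 0.3 + 0.1) * trading_minutes,
                             2 * 0.3 * trading_minutes))
gibbs_pin_sketch(counts, T = trading_minutes)
```

Ordering the rates is the simplest identifiability constraint for two components; it works well when the components are clearly separated, but with heavily overlapping components a clustering-based relabeling of the draws, as the package does, is usually more robust.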
In addition to the Bayesian estimation approach from Grammig et al. (2015), the
bayespin package also implements several other methods to estimate the
probability of informed trading:
- The original maximum likelihood procedure of the model by Easley et al. (1996).
- The maximum likelihood procedure of the model by Easley et al. (1996) using a variation of the likelihood function proposed in Easley et al. (2002).
- The maximum likelihood procedure of the model by Jackson (2007) that also uses solely the number of trades per trading day (this is similar to Grammig et al. (2015)).
These models were implemented to ease their use for researchers and to enable comparisons between different models and estimation approaches.
The package can be installed directly from GitHub using the function
`install_github()` from the `devtools` package. The package passed all checks
from `R CMD check` on all major platforms and should therefore be installable on
macOS, Windows, and Linux. Make sure that the appropriate developer tools for
your platform are installed, as a C++ compiler is needed to build the source code.
Note that installing the dependencies can take some time, as bayespin
depends on the `finmix` package and needs to compile the C++ code therein.
For macOS, the Xcode Command Line Tools are needed. You should have installed
these when installing R. See the MacOSX-FAQ for more information on how to
install source packages on macOS.
For Windows, Rtools is needed. Install it if you have not done so yet.
To start, simulate data, estimate the model by Grammig et al. (2015), and compare the results to maximum likelihood estimates of the original model of Easley et al. (1996):
```r
library(bayespin)

# Set the random seed so results can be replicated.
set.seed(42)
# Simulate trades data from the model by Easley et al. (1996).
trades_data <- simulate_ekop(size = 1000, alpha = .3, epsilon = .3,
                             delta = .5, mu = .1, T = 60*6.5)
# Show first lines of data.
head(trades_data)
```

```
  MisBuy MisSell Buy Sell Trades
1      0       0 175  167    342
2      0       0 163  161    324
3      0       0 163  147    310
4      0       0 141  172    313
5      0       0 176  156    332
6      0       0 154  163    317
```

```r
# Estimate the model of Grammig et al. (2015).
bayesian_pin <- estimate_pin(trades_data$Trades)
# Show results.
bayesian_pin
```

```
          alpha   epsilon         mu        pin
MAP   0.3392890 0.2998388 0.09379760 0.05039492
BML   0.3356769 0.2999046 0.09406135 0.05000800
IEAVG 0.3402887 0.2998041 0.09369827 0.05049063
```

```r
# Estimate the original model by Easley et al. (1996).
ml_pin <- estimate_mlekop(trades_data, methodLik = "approx",
                          fnLik = "compute_ekop_orig_lik", opt_out = FALSE)
# Show results.
ml_pin
```

```
       alpha   epsilon    delta         mu        pin
ML 0.3211642 0.3000834 0.486291 0.09695398 0.04932346
```
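The `pin` column follows from the other parameters. In the model of Easley et al. (1996), PIN is the expected share of informed trades in total order flow, $\mathrm{PIN} = \alpha\mu / (\alpha\mu + 2\epsilon)$. The short check below recomputes the true PIN of the simulation and the `pin` values of the MAP and ML rows from the printed parameter estimates; `pin_from_params()` is a hypothetical helper for illustration, not part of the bayespin API.

```r
# Hypothetical helper: PIN as the expected share of informed trades,
# alpha * mu / (alpha * mu + 2 * epsilon).
pin_from_params <- function(alpha, epsilon, mu) {
  alpha * mu / (alpha * mu + 2 * epsilon)
}

# True parameters passed to simulate_ekop(): 0.03 / 0.63, roughly 0.0476.
pin_from_params(alpha = 0.3, epsilon = 0.3, mu = 0.1)
# MAP row returned by estimate_pin().
pin_from_params(alpha = 0.3392890, epsilon = 0.2998388, mu = 0.09379760)
# ML row returned by estimate_mlekop().
pin_from_params(alpha = 0.3211642, epsilon = 0.3000834, mu = 0.09695398)
```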
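We can see that the original model by Easley et al. (1996) yields better
parameter estimates: its estimates of $\alpha$, $\mu$, and PIN are closer to the
values used in the simulation ($\alpha = 0.3$, $\mu = 0.1$, and hence a PIN of
about 0.048). This is not surprising, as it uses more information, namely the
split into buyer- and seller-initiated trades. Things become interesting when the
buyer- and seller-initiated trades suffer from misclassification (see the
simulation function `simulate_ekop_mis()`).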
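Misclassifying the direction of a trade moves it between the buy and sell columns but leaves the daily total unchanged, so an estimator that uses only the number of trades is unaffected by it. The base-R sketch below illustrates this. `simulate_misclassified()` is a hypothetical helper, and the symmetric per-trade misclassification probability `p_mis` is an assumption for illustration, not necessarily the mechanism implemented in `simulate_ekop_mis()`.

```r
# Hypothetical sketch (not bayespin's simulate_ekop_mis()): simulate EKOP
# buys and sells, then misclassify each trade's direction with probability p_mis.
simulate_misclassified <- function(n_days, alpha, epsilon, delta, mu, T, p_mis) {
  event <- runif(n_days) < alpha                # information event on this day?
  bad_news <- event & (runif(n_days) < delta)   # bad news given an event
  good_news <- event & !bad_news
  # Informed traders buy on good news and sell on bad news.
  buys <- rpois(n_days, (epsilon + mu * good_news) * T)
  sells <- rpois(n_days, (epsilon + mu * bad_news) * T)
  # Each true buy is recorded as a sell with probability p_mis, and vice versa.
  buy_to_sell <- rbinom(n_days, buys, p_mis)
  sell_to_buy <- rbinom(n_days, sells, p_mis)
  data.frame(
    Buy = buys, Sell = sells,
    RecBuy = buys - buy_to_sell + sell_to_buy,
    RecSell = sells - sell_to_buy + buy_to_sell,
    Trades = buys + sells
  )
}

set.seed(42)
mis_data <- simulate_misclassified(1000, alpha = 0.3, epsilon = 0.3,
                                   delta = 0.5, mu = 0.1, T = 60 * 6.5,
                                   p_mis = 0.2)
# Recorded buys and sells are distorted, but their sum equals the true trade count.
all(mis_data$RecBuy + mis_data$RecSell == mis_data$Trades)
```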
- Grammig, J., Theissen, E., Zehnder, L.S., 2015. Bayesian Estimation of the Probability of Informed Trading. Conference on Financial Econometrics & Empirical Asset Pricing 2016, Lancaster University.
- Easley, D., Kiefer, N., O’Hara, M., Paperman, J., 1996. Liquidity, information, and infrequently traded stocks. Journal of Finance 51, 1405–1436.
- Jackson, D., 2007. Inferring trader behavior from transaction data: A trade count model. Journal of Computational and Graphical Statistics 12, 55-79.
- Lee, C., Ready, M. J., 1991. Inferring trade direction from intraday data. The Journal of Finance 46, 733-746.
This package has been worked on for years and is still not fully implemented. As it is maintained by a single author, please be patient with issues.