A Method to Identify Anomalies in Stock Market Trading Based on Probabilistic Machine Learning

Journal: Journal of Autonomous Intelligence
DOI: 10.32629/jai.v2i2.44

Paulo Andre Lima de Castro, Anderson R.B. Teodoro

Autonomous Computational Systems Lab, Technological Institute of Aeronautics (ITA), Sao Jose dos Campos-SP, Brazil
pauloac@ita.br, marcelsoaresribeiro@gmail.com


Financial operations involve significant resources and can directly or indirectly affect the lives of virtually everyone. To ensure efficiency and transparency in this context, it is essential to identify financial crimes and punish those responsible. However, the large number of operations makes analysis performed exclusively by humans unfeasible, so the application of automated data analysis techniques is essential. In this scenario, this work presents a method that identifies anomalies that may be associated with stock market operations prohibited by law. Specifically, we seek patterns related to insider trading, a type of operation that can cause large losses for investors. Information made publicly available by the SEC and CVM about real cases on the BOVESPA, NYSE, and NASDAQ stock exchanges is used as the training base. The method creates several candidate variables and identifies the relevant ones. Classifiers based on decision trees and Bayesian networks are then built from these variables, evaluated, and selected. The computational cost of these tasks can be significant, and it grows quickly with the amount of data analyzed. For this reason, the method uses machine learning algorithms distributed over a computational cluster. To perform these tasks, we use the Weka framework with modules that allow the processing load to be distributed across a Hadoop cluster. Running learning algorithms on large amounts of data in a computational cluster has been an active area of research, and this work contributes to data analysis in the specific context of financial operations. The results show the feasibility of the approach, although their quality is limited by the exclusive use of publicly available data.
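The abstract does not state which criterion was used to pick the relevant variables from the candidates. As a hedged illustration only, the sketch below ranks discrete candidate variables by information gain, one common relevance criterion (Weka offers it as the InfoGainAttributeEval attribute evaluator). It is not presented as the authors' procedure. The variable names `trade_before_announcement`, `volume_spike` and `weekday`, and the six toy records, are hypothetical and were invented for illustration. They are not drawn from the SEC or CVM cases.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Entropy reduction obtained by partitioning the labels on a discrete variable."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each value of the variable.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_variables(rows, labels, names):
    """Return (name, gain) pairs, most informative candidate variable first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, information_gain(column, labels)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: three discrete candidate variables per operation.
    names = ["trade_before_announcement", "volume_spike", "weekday"]
    rows = [
        ("yes", "high", "mon"),
        ("yes", "high", "tue"),
        ("yes", "low", "wed"),
        ("no", "low", "mon"),
        ("no", "high", "tue"),
        ("no", "low", "wed"),
    ]
    labels = ["insider", "insider", "insider", "normal", "normal", "normal"]
    for name, gain in rank_variables(rows, labels, names):
        print(f"{name}: {gain:.4f}")
```

In this toy data, the hypothetical `weekday` column carries no information about the label. Variables like that would be candidates for removal before the decision-tree and Bayesian-network classifiers are trained. Information gain tends to favor variables with many distinct values. When candidates differ widely in cardinality, a normalized measure such as gain ratio may be the better choice.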


Keywords: machine learning, financial artificial intelligence


Funding: CNPq (Brazil)


Copyright © 2019 Paulo Andre Lima de Castro, Anderson R.B. Teodoro

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.