Compositional Grounded Language for Agent Communication in Reinforcement Learning Environment

Journal: Journal of Autonomous Intelligence DOI: 10.32629/jai.v2i3.56

K. Lannelongue, M. de Milly, R. Marcucci, S. Selevarangame, A. Supizet, A. Grincourt

Undergraduate Students, ECE Paris School of Engineering, France


In a context of constantly evolving technologies for scientific, economic and social purposes, Artificial Intelligence (AI) and the Internet of Things (IoT) have made significant progress over the past few years. As Human-Machine interactions become necessary and task automation becomes unavoidable, it is important that electronic devices (computers, cars, sensors…) can communicate with humans as well as they communicate with each other. The emergence of automated training and neural networks marked the beginning of a new conversational capability for machines, illustrated by chatbots. Nonetheless, this technology is not sufficient, as chatbots often give inappropriate or unrelated answers, especially when the subject changes. To improve on it, the problem of defining a communication language constructed from scratch is addressed, with the intention of giving machines the possibility to create a new exchange channel adapted to their needs. Each machine is equipped with a sound-emitting system that accompanies the accomplishment of each individual or collective goal, and the convergence toward a common "language" is analyzed, as it is thought to have happened for humans in the past. By constraining the language to satisfy two main properties of human language, being grounded and being compositional, a rapidly converging evolution of syntactic communication is obtained, opening the way to a meaningful language between machines.
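Compositionality, one of the two constraints named in the abstract, can be made measurable. A common proxy in the emergent-communication literature, assumed here and not necessarily the measure used by the authors, is topographic similarity: the correlation between pairwise distances in meaning space and pairwise distances in message space. The sketch below assumes hypothetical meanings made of (color, landmark) pairs and fixed-length two-symbol messages; the vocabularies, the Hamming distances and the use of a Pearson correlation (the Spearman variant is more usual) are all illustrative assumptions.

```python
import itertools
import math
from typing import Dict, List, Tuple

Meaning = Tuple[str, ...]
Message = Tuple[str, ...]


def hamming(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def topographic_similarity(language: Dict[Meaning, Message]) -> float:
    """Correlation between meaning distances and message distances over all pairs."""
    meaning_d, message_d = [], []
    for m1, m2 in itertools.combinations(sorted(language), 2):
        meaning_d.append(hamming(m1, m2))
        message_d.append(hamming(language[m1], language[m2]))
    return pearson(meaning_d, message_d)


# Hypothetical meaning space: (color, landmark) pairs.
colors = ["red", "green", "blue"]
landmarks = ["circle", "square", "triangle"]
meanings = list(itertools.product(colors, landmarks))

# Compositional language: one symbol per attribute value, concatenated.
color_sym = {"red": "a", "green": "b", "blue": "c"}
landmark_sym = {"circle": "x", "square": "y", "triangle": "z"}
compositional = {(c, l): (color_sym[c], landmark_sym[l]) for c, l in meanings}

# Holistic language: same symbols, codes assigned without regard to structure.
holistic_codes = [("a", "x"), ("c", "z"), ("b", "y"), ("b", "z"), ("a", "y"),
                  ("c", "x"), ("c", "y"), ("b", "x"), ("a", "z")]
holistic = dict(zip(meanings, holistic_codes))

print(topographic_similarity(compositional))  # 1.0
print(topographic_similarity(holistic))       # strictly below 1.0
```

The compositional code preserves every distance, so its score is exactly 1; the holistic code reuses the same symbols but scrambles their assignment, so meanings that share an attribute no longer share a symbol and the score drops. In a real experiment the message dictionary would be read off the trained agents' utterances rather than written by hand.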


Machine Learning, Reinforcement Learning, Natural Language Processing


[1] S.J. Russell, P. Norvig. Artificial intelligence: A modern approach, 2nd edn. Prentice Hall, NJ, 2003.
[2] J. Bratman, M. Shvartsman, R.L. Lewis, et al. A new approach to exploring language emergence as boundedly optimal control in the face of environmental and cognitive constraints. Proc. 10th Intern. Conf. on Cognitive Modeling, pp.7–12, 2010.
[3] M.P. Deisenroth, G. Neumann, J. Peters. A survey on policy search for robotics. Foundations and Trends in Robotics 2013; 2(1-2): 1-142.
[4] M. Cotsaftis. Toward global complex systems control – the autonomous intelligence challenge. J. of Autonomous Intelligence 2019; 2(1): 11-27.
[5] D.J.C. MacKay. Information theory, inference, and learning algorithms. Cambridge University Press 2003.
[6] R.S. Sutton, A.G. Barto. Reinforcement learning: An introduction. The MIT Press 1998.
[7] R. Hafner, M. Riedmiller. Reinforcement learning in feedback control. Machine Learning 2011; 84(1-2): 137-169.
[8] J. Kober, J.A. Bagnell, J. Peters. Reinforcement learning in robotics: A survey. Intern. J. Robotics Research 2013; 32(11): 1238–1274.
[9] R. Coulom. Reinforcement learning using neural networks, with applications to motor control. PhD thesis, Institut National Polytechnique de Grenoble, 2002.
[10] S. Gu, E. Holly, T. Lillicrap, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. arXiv:1610.00633v2 [cs.RO], 2016.
[11] C.M. Bishop. Pattern recognition and machine learning. Information Science and Statistics. Springer-Verlag, 2006.
[12] K. Doya. Reinforcement learning in continuous time and space. Neural Computation 2000; 12(1): 219-245.
[13] R.J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 1992; 8: 229-256.
[14] E. Theodorou, J. Buchli, S. Schaal. A generalized path integral control approach to reinforcement learning. J. Machine Learning Research 2010; 11: 3137-3181.
[15] S. Amari. Natural gradient works efficiently in learning. Neural Computation 1998; 10: 251-276.
[16] A. Lazaridou, A. Peysakhovich, M. Baroni. Multi-agent cooperation and the emergence of (natural) language. arXiv:1612.07182, 2016.
[17] A. Lazaridou, N.T. Pham, M. Baroni. Towards multi-agent communication-based language learning. arXiv:1605.07133, 2016.
[18] B.M. Lake, T.D. Ullman, J.B. Tenenbaum, et al. Building machines that learn and think like people. arXiv:1604.00289 [cs.AI], 2016.
[19] P.M. Nadkarni, L. Ohno-Machado, W.W. Chapman. Natural language processing: An introduction. J Am Med Inform Assoc. 2011; 18(5): 544–551.
[20] D. Bahdanau, K. Cho, Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
[21] G. Durrett, T. Berg-Kirkpatrick, D. Klein. Learning-based single-document summarization with compression and anaphoricity constraints. arXiv:1603.08887, 2016.
[22] B. Dhingra, L. Li, X. Li, et al. End-to-end reinforcement learning of dialogue agents for information access. arXiv:1609.00777 [cs.CL], 2016.
[23] A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2014.
[24] N. Kalchbrenner, E. Grefenstette, P. Blunsom. A convolutional neural network for modelling sentences. Proc. 52nd Annual Meeting of the Association for Computational Linguistics 2014; 1: 655-665.
[25] L. Busoniu, R. Babuska, B. De Schutter, et al. Reinforcement learning and dynamic programming using function approximators. Taylor & Francis, CRC Press, 2010.
[26] L.P. Kaelbling, M.L. Littman, et al. Reinforcement learning: A survey. J. Artificial Intelligence Research 1996; 4: 237–285.
[27] L. Bottou. From machine learning to machine reasoning. Machine Learning 2014; 94(2): 133–149.
[28] J. Weston, S. Chopra, A. Bordes. Memory networks. Proc. ICLR 2015, arXiv:1410.3916, 2015.
[29] R. Jackendoff. Foundations of language. Oxford Univ. Press, 2003.
[30] R.C. Berwick, N. Chomsky. Why only us: Language and evolution. Cambridge, MA, MIT Press, 2016.
[31] S.I. Reynolds. Reinforcement learning with exploration. PhD Thesis, School of Computer Science, The University of Birmingham, UK, 2002.
[32] L. Steels. What triggers the emergence of grammar? Proc. 2nd Intern. Symp. on the Emergence and Evolution of Linguistic Communication (EELC’05), pp.143–150, 2005.
[33] R. Socher, A. Perelygin, J.Y. Wu, et al. Recursive deep models for semantic compositionality over a sentiment treebank. Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp.1631–1642, 2013.
[34] S.J. Gershman, E.J. Horvitz, J.B. Tenenbaum. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 2015; 349: 273–278.
[35] T. Mikolov. Statistical language models based on neural networks. PhD Thesis, Brno University of Technology, 2012.
[36] J.N. Foerster, Y.M. Assael, N. de Freitas, et al. Learning to communicate with deep multi-agent reinforcement learning. Proc. Annual Conference on Neural Information Processing Systems, pp.2137-2145, 2016.
[37] S. Kirby, T. Griffiths, K. Smith. Iterated learning and the evolution of language. Current Opinion in Neurobiology 2014; 28: 108–114.
[38] I. Sutskever, O. Vinyals, Q.V. Le. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.D. Weinberger, eds., Vol.27, Curran Associates, Inc., pp.3104–3112, 2014.
[39] K. Beuls, L. Steels. Agent-based models of strategies for the emergence and evolution of grammatical agreement. PLoS ONE 2013; 8(3): e58960.
[40] T. Mikolov, I. Sutskever, K. Chen, et al. Distributed representations of words and phrases and their compositionality. arXiv:1310.4546 [cs.CL], 2013.
[41] I. Mordatch, P. Abbeel. Emergence of grounded compositional language in multi-agent populations. arXiv:1703.04908, 2018.
[42] M.A. Nowak, J.B. Plotkin, V.A.A. Jansen. The evolution of syntactic communication. Nature 2000; 404(6777): 495-498.
[43] M.L. Littman. Markov games as a framework for multi-agent reinforcement learning. Proc. XIth Intern. Conf. on Machine Learning, pp.157–163, 1994.
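[44] C.D. Manning, J. Bauer, M. Surdeanu, et al. The Stanford CoreNLP natural language processing toolkit. Proc. 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp.55–60, 2014.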
[45] The physical state of agent $i$ is $X_i(t) = \mathrm{col}[p, \dot p, v, c]_i(t)$, with $c$ the (fixed) color of the agent, and its action is $a_i = \mathrm{col}[u_p, u_v, v]$. It obeys the differential equations $\dot X_i(t) = \mathrm{col}[\dot p,\ \gamma \dot p + u_p + f(\{X_i(t)\}),\ u_v,\ 0]$, where $f(\cdot)$ represents the physical interactions between agents and $\gamma \dot p$ is a damping term, with $\gamma \in [0,1]$ adjustable, introduced to ease numerical computation.
[46] E. Jang, S. Gu, B. Poole. Categorical reparameterization with Gumbel-softmax. arXiv:1611.01144 [stat.ML], 2016.
[47] C.J. Maddison, D. Tarlow, T. Minka. A* sampling. Advances in Neural Information Processing Systems, pp.3086–3094, 2014.
[48] Y.W. Teh. Dirichlet process. Encyclopedia of Machine Learning. Springer, pp.280–287, 2011.
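Note [45] defines the agents' physical model. The sketch below steps that model forward in time for a population of agents. Several choices are assumptions not fixed by the note: positions, velocities and gaze directions are taken as 2-D vectors; the velocity update follows the discrete form $\dot p \leftarrow \gamma \dot p + (u_p + f)\,\Delta t$ of [41], so that $\gamma \in [0,1]$ acts as a damping (velocity retention) factor; the third action component is not used; and `soft_repulsion` is a hypothetical stand-in for the unspecified interaction force $f(\cdot)$.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float]


@dataclass(frozen=True)
class AgentState:
    p: Vec                         # position
    p_dot: Vec                     # velocity dp/dt
    v: Vec                         # gaze direction
    c: Tuple[float, float, float]  # color: fixed, never updated (dc/dt = 0)


@dataclass(frozen=True)
class Action:
    u_p: Vec  # movement control force
    u_v: Vec  # gaze control (dv/dt = u_v)


ForceFn = Callable[[int, Sequence[AgentState]], Vec]


def zero_force(i: int, states: Sequence[AgentState]) -> Vec:
    """No inter-agent interaction."""
    return (0.0, 0.0)


def soft_repulsion(i: int, states: Sequence[AgentState],
                   k: float = 1.0, radius: float = 0.5) -> Vec:
    """Hypothetical f(.): push agents apart when closer than `radius`."""
    fx = fy = 0.0
    xi, yi = states[i].p
    for j, other in enumerate(states):
        if j == i:
            continue
        dx, dy = xi - other.p[0], yi - other.p[1]
        d = math.hypot(dx, dy)
        if 0.0 < d < radius:
            scale = k * (radius - d) / d  # magnitude k*(radius - d) along the unit vector
            fx += scale * dx
            fy += scale * dy
    return (fx, fy)


def step(states: Sequence[AgentState], actions: Sequence[Action],
         gamma: float = 0.5, dt: float = 0.1,
         f: ForceFn = zero_force) -> List[AgentState]:
    """Advance every agent by one time step dt (semi-implicit Euler)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    new_states = []
    for i, (x, a) in enumerate(zip(states, actions)):
        fx, fy = f(i, states)  # forces use the pre-step states of all agents
        # Velocity: previous velocity damped by gamma, plus control and interaction forces.
        vx = gamma * x.p_dot[0] + (a.u_p[0] + fx) * dt
        vy = gamma * x.p_dot[1] + (a.u_p[1] + fy) * dt
        # Position: integrate the updated velocity.
        p = (x.p[0] + vx * dt, x.p[1] + vy * dt)
        # Gaze: dv/dt = u_v.
        v = (x.v[0] + a.u_v[0] * dt, x.v[1] + a.u_v[1] * dt)
        new_states.append(replace(x, p=p, p_dot=(vx, vy), v=v))
    return new_states
```

Because the interaction forces are evaluated on the pre-step states of all agents, the result does not depend on the order in which agents are listed. Integrating position with the already-updated velocity (semi-implicit Euler) is a common choice for this kind of damped point-mass model, as it tends to be more stable than the fully explicit scheme at the same step size.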

Copyright © 2019 K. Lannelongue, M. de Milly, R. Marcucci, S. Selevarangame, A. Supizet, A. Grincourt

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.