# Record number of 5 papers are accepted to ICRA 2018

[2018.01.12]

Following five papers are accepted to the **IEEE International Conference on Robotics and Automation (ICRA 2018)**:

**Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning**by Kyungjae Lee, Sungjoon Choi, and Songhwai Oh**Abstract**: In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed. The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP. The full mathematical analysis of the proposed sparse MDP is provided. We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method which solves a sparse MDP and then prove the convergence and optimality of sparse value iteration using the Banach fixed point theorem. The proposed sparse MDP is compared to soft MDPs which utilize causal entropy regularization.We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In experiments, we apply sparse MDPs to reinforcement learning problems. The proposed method outperforms existing methods in terms of the convergence speed and performance.**Video**

**Text2Action: Generative Adversarial Synthesis from Language to Action**by Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, and Songhwai Oh**Abstract**: In this paper, we propose a generative model which learns the relationship between language and human action in order to generate a human action sequence given a sentence describing human behavior. The proposed generative model is a generative adversarial network (GAN), which is based on the sequence to sequence (SEQ2SEQ) model. Using the proposed generative network, we can synthesize various actions for a robot or a virtual agent using a text encoder recurrent neural network (RNN) and an action decoder RNN. The proposed generative network is trained from 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence. Results show that the proposed generative network correctly models the relationship between language and action and can generate a diverse set of actions from the same sentence.**Video**

**Learning-Based Model Predictive Control under Signal Temporal Logic Specifications**by Kyunghoon Cho and Songhwai Oh**Abstract**: This paper presents a control strategy synthesis method for dynamical systems with differential constraints while satisfying a set of given rules in consideration of their importances. A special attention is given to situations where all rules cannot be met in order to fulfill a given task. Such dilemmas compel us to make a decision on the degree of satisfaction of each rule including which rule should be maintained or not. In this work, we propose a learning-based model predictive control method in order to solve this problem, where a key insight is to combine a learning method and traditional control scheme so that the designed controller behaves close to human experts. A rule is represented as a signal temporal logic (STL) formula. A robustness slackness, a margin to the satisfaction of the rule, is learned from expert’s demonstrations using Gaussian process regression. The learned margin is used in a model predictive control procedure, which helps to decide how much to obey each rule, even ignoring specific rules. In track driving simulation, we show that the proposed method generates human-like behavior and efficiently handles dilemmas as human teachers do.**Video**

**Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling**by Sungjoon Choi, Kyungjae Lee, Sungbin Lim, and Songhwai Oh**Abstract**: In this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors. The proposed uncertainty acquisition can be done with a single forward path without Monte Carlo sampling and is suitable for real-time robotics applications. Then, we show that it can be decomposed into explained variance and unexplained variance where the connections between aleatoric and epistemic uncertainties are addressed. The properties of the proposed uncertainty measure are analyzed through three different synthetic examples, absence of data, heavy measurement noise, and composition of functions scenarios. We show that each case can be distinguished using the proposed uncertainty measure and presented an uncertainty-aware learning from demonstration method for autonomous driving using this property. The proposed uncertainty-aware learning from demonstration method outperforms other compared methods in terms of safety using a complex real-world driving dataset.**Video**

**A Nonparametric Motion Flow Model for Human Robot Cooperation**by Sungjoon Choi, Kyungjae Lee, H. Andy Park, and Songhwai Oh**Abstract**: In this paper, we present a novel nonparametric motion flow model that effectively describes a motion trajectory of a human and its application to human robot cooperation. To this end, motion flow similarity measure which considers both spatial and temporal properties of a trajectory is proposed by utilizing the mean and variance functions of a Gaussian process. We also present a human robot cooperation method using the proposed motion flow model. Given a set of interacting trajectories of two workers, the underlying reward function of cooperating behaviors is optimized by using the learned motion description as an input to the reward function where a stochastic trajectory optimization method is used to control a robot. The presented human robot cooperation method is compared with the state-of-the-art algorithm, which utilizes a mixture of interaction primitives (MIP), in terms of the RMS error between generated and target trajectories. While the proposed method shows comparable performance with the MIP when the full observation of human demonstrations is given, it shows superior performance with respect to given partial trajectory information.**Video**