Three papers from RLLAB are accepted to IROS 2020

[2020.07.01]

The following papers have been accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020):

  • No-Regret Shannon Entropy Regularized Neural Contextual Bandit Online Learning for Robotic Grasping by Kyungjae Lee, Jaegu Choy, Yunho Choi, Hogun Kee, and Songhwai Oh
    • Abstract: In this paper, we propose a novel contextual bandit algorithm, called Shannon entropy regularized neural contextual bandits (SERN), that employs a neural network as a reward estimator and uses Shannon entropy regularization to encourage exploration. In many learning-based algorithms for robotic grasping, the lack of real-world data hampers the generalization performance of a model and makes it difficult to apply a trained model to real-world problems. To handle this issue, the proposed method exploits the benefits of online learning. It trains a neural network to predict the success probability of a given grasp pose based on a depth image, which is called the grasp quality. The policy is defined as a softmax distribution over the grasp quality, which is induced by the Shannon entropy regularization. Because of the softmax distribution, the proposed method explores diverse grasp poses, while grasp poses that are promising according to the estimated grasp quality are explored more frequently. We also theoretically show that SERN has a no-regret property. We empirically demonstrate that SERN outperforms epsilon-greedy in terms of sample efficiency.
  • MixGAIL: Autonomous Driving Using Demonstrations with Mixed Qualities by Gunmin Lee, Dohyung Kim, Wooseok Oh, Kyungjae Lee, and Songhwai Oh
    • Abstract: In this paper, we consider autonomous driving of a vehicle using imitation learning. Generative adversarial imitation learning (GAIL) is a widely used algorithm for imitation learning. This algorithm leverages positive demonstrations to imitate the behavior of an expert. We propose a novel method, called mixed generative adversarial imitation learning (MixGAIL), which incorporates both expert demonstrations and negative demonstrations, such as vehicle collisions. To this end, the proposed method uses an occupancy measure and a constraint function. The occupancy measure is used to follow expert demonstrations and provides positive feedback. The constraint function, on the other hand, is applied to negative demonstrations and provides negative feedback. Experimental results show that the proposed algorithm converges faster than the baseline methods. In addition, hardware experiments using a real-world RC car show outstanding performance and faster convergence compared with existing methods.
  • Pedestrian Intention Prediction for Autonomous Driving Using a Multiple Stakeholder Perspective Model by Kyungdo Kim, Yoon Kyung Lee, Hyemin Ahn, Sowon Hahn, and Songhwai Oh
    • Abstract: This paper proposes a multiple stakeholder perspective model (MSPM) which predicts the future trajectory of a pedestrian observed from the vehicle’s point of view. The motivation of the MSPM is that a human driver draws on the experience of being a pedestrian when he or she encounters a pedestrian crossing the street. For the vehicle-pedestrian interaction, the estimation of the pedestrian’s intention is a key factor. However, even though this interaction is commonly initiated by both the human (pedestrian) and the agent (driver), current research focuses on developing a neural network trained on data from the driver’s perspective only. In this paper, we apply the MSPM to pedestrian intention prediction. The model combines the driver (stakeholder 1) and the pedestrian (stakeholder 2) by separating the information based on the perspective. The dataset from the pedestrian’s perspective has been collected in a virtual reality experiment, and a network that can reflect the perspectives of both the pedestrian and the driver is proposed. Our model achieves the best performance on an existing pedestrian intention dataset, while reducing the trajectory prediction error by an average of 4.48% in the short-term (0.5s) and middle-term (1.0s) prediction, and by 11.14% in the long-term prediction (1.5s), compared to the previous state of the art.
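
The first abstract above describes a policy defined as a softmax distribution over the estimated grasp quality. As a rough illustration only, and not the authors' implementation, the sketch below samples a grasp candidate from such a distribution. The function names, the temperature parameter alpha, and the example quality values are all assumptions made for this sketch; in the paper the qualities would come from a neural network evaluated on a depth image.

```python
import math
import random


def softmax_policy(qualities, alpha=0.1):
    """Return softmax probabilities over candidate grasps.

    qualities: estimated success probabilities, one per candidate grasp
               (hypothetical values here, not network outputs).
    alpha:     assumed temperature; larger values give more exploration.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(qualities)
    weights = [math.exp((q - m) / alpha) for q in qualities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_grasp(qualities, alpha=0.1, rng=random):
    """Sample the index of a grasp candidate from the softmax policy."""
    probs = softmax_policy(qualities, alpha)
    return rng.choices(range(len(qualities)), weights=probs, k=1)[0]


# Hypothetical grasp-quality estimates for three candidate poses.
qualities = [0.2, 0.5, 0.9]
probs = softmax_policy(qualities, alpha=0.1)
chosen = sample_grasp(qualities, alpha=0.1)
```

With these assumed values, the candidate with the highest estimated quality receives the largest probability, but the other candidates are still sampled occasionally, which matches the exploration behavior described in the abstract. An epsilon-greedy policy, by contrast, would explore all non-greedy candidates uniformly regardless of their estimated quality.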