The aim of this work is to develop an approach that enables Unmanned Aerial Systems (UAS) to efficiently learn to navigate in large-scale urban environments and transfer their acquired expertise to novel environments. To achieve this, we propose a meta curriculum training scheme. First, meta-training allows the agent to learn a master policy that generalizes across tasks. The resulting model is then fine-tuned on the downstream tasks. We organize the training curriculum hierarchically, so that the agent is guided from coarse to fine towards the target task. In addition, we introduce Incremental Self-Adaptive Reinforcement learning (ISAR), an algorithm that combines the ideas of incremental learning and meta reinforcement learning (MRL). In contrast to traditional reinforcement learning (RL), which focuses on acquiring a policy for a specific task, MRL aims to learn a policy that transfers quickly to novel tasks. However, the training process of MRL is time-consuming, whereas our proposed ISAR algorithm achieves faster convergence than conventional MRL algorithms. We evaluate the proposed methodologies in simulated environments and demonstrate that this training philosophy, in conjunction with the ISAR algorithm, significantly improves both the convergence speed for navigation in large-scale cities and the adaptation proficiency in novel environments.
Two-stage learning framework: The navigation task consists of two phases: meta-training and curriculum fine-tuning. Meta-training allows the agent to learn a master navigation policy. The hierarchically structured curriculum then adapts the meta-policy to the target task. This meta-policy can further be transferred to novel environments.
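The two phases above can be sketched as a minimal, self-contained loop: a master policy is trained across a distribution of tasks, then adapted to a single downstream task. All names (`Policy`, `task_gradient`, the toy 1-D loss) are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of the two-stage framework: meta-training across a
# set of navigation tasks, then fine-tuning on the target task.
import random

class Policy:
    """Toy 1-D policy parameter, standing in for the policy network."""
    def __init__(self, weight=0.0):
        self.weight = weight

    def update(self, gradient, lr):
        self.weight -= lr * gradient

def task_gradient(policy, task):
    # Placeholder loss: squared distance of the weight to the task optimum.
    return 2.0 * (policy.weight - task)

def meta_train(tasks, meta_lr=0.1, steps=100):
    """Phase 1: learn a master policy that sits close to all task optima."""
    policy = Policy()
    for _ in range(steps):
        task = random.choice(tasks)
        policy.update(task_gradient(policy, task), meta_lr)
    return policy

def fine_tune(policy, target_task, lr=0.1, steps=50):
    """Phase 2: adapt the meta-policy to the downstream target task."""
    for _ in range(steps):
        policy.update(task_gradient(policy, target_task), lr)
    return policy

meta_policy = meta_train(tasks=[-1.0, 0.0, 1.0, 2.0])
adapted = fine_tune(meta_policy, target_task=1.5)
```

Because the meta-policy already lies near the task optima, the fine-tuning phase starts from a good initialization, which is the intuition behind the fast transfer described above.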
We first set the agent’s altitude to 300 meters for meta-training. Then, we fine-tune the meta-policy with a hierarchically structured training curriculum. We use ResNet-18 to extract the features of the current state and the target state. The combined feature is then fed into the policy network to generate the navigation policy and value. Our training strategy outperforms a standard agent trained from scratch at 15 meters.
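The coarse-to-fine curriculum can be made concrete as a decreasing altitude schedule from the 300 m meta-training altitude down to the 15 m target. The halving factor and number of stages here are our assumption for illustration; the paper's exact schedule may differ.

```python
# Assumed coarse-to-fine altitude schedule: start high (coarse view of the
# city), halve the altitude each stage, and finish at the target altitude.
def altitude_curriculum(start_m=300.0, target_m=15.0, factor=0.5):
    """Return a decreasing sequence of training altitudes in meters."""
    altitude = start_m
    stages = []
    while altitude > target_m:
        stages.append(altitude)
        altitude *= factor
    stages.append(target_m)  # always end exactly at the target altitude
    return stages

print(altitude_curriculum())  # [300.0, 150.0, 75.0, 37.5, 18.75, 15.0]
```

Each stage fine-tunes the policy from the previous one, so the agent never faces the hardest (lowest-altitude) task cold.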
Illustration of the ISAR framework with an adaptive-update step size of 2. During exploration, we compute two types of losses: the interaction loss, and the adaptive loss over each trajectory segment, which is used to update the adaptation policy. The results show that our ISAR algorithm achieves a significant improvement in convergence speed over SAVN (traditional MRL).
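The segment-wise adaptive update can be sketched as follows: per-step interaction losses are accumulated, and after every segment of two steps an adaptive loss over that segment updates the adaptation parameter. The placeholder losses (negative reward, segment mean) and the parameter name `theta` are our assumptions; ISAR's actual loss definitions are in the paper.

```python
# Minimal sketch of an ISAR-style inner loop with adaptive-update
# step size = 2: adapt after every two interaction steps.
def isar_episode(env_rewards, theta=0.0, adapt_lr=0.1, step_size=2):
    """Run one episode, adapting theta every `step_size` steps."""
    segment = []
    for t, reward in enumerate(env_rewards, start=1):
        # Interaction loss on a single step (placeholder: negative reward).
        segment.append(-reward)
        if t % step_size == 0:
            # Adaptive loss over the trajectory segment (placeholder: mean).
            adaptive_loss = sum(segment) / len(segment)
            theta -= adapt_lr * adaptive_loss  # self-adaptive update
            segment = []
    return theta

theta = isar_episode([1.0, 0.5, -0.2, 0.8])
```

Updating mid-episode, rather than only after the full trajectory, is what makes the adaptation incremental and is the intuition behind the faster convergence claimed over SAVN.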
The transfer-learning process to unseen environments. We first conduct meta-training in scene A, then fine-tune the meta-policy to navigate in scene B. The meta-trained agent transfers rapidly to unseen environments, but has limitations when transferring from urban to wild environments.
Training results for navigation in AirSim urban environments, shown as six example navigation episodes. The UAS starts from random locations and reaches the targets, which are marked by a white car.
We first conduct meta-training in scene 1 with 25 meta-tasks. Then, we transfer the meta-policy to scene 2 through fine-tuning. This video shows the fine-tuned policy navigating in scene 2.
A real-world application of our end-to-end visual navigation algorithm in an indoor environment. (Supplementary work, not included in this paper.)