Understanding scenes that represent real-world environments is a challenging problem at the intersection of Computer Vision and Deep Learning, and a necessary prerequisite for Embodied AI. Embodied AI is an emerging field within Machine Learning that addresses the challenges of successfully deploying edge devices such as drones and robots. In this setting, an agent must estimate the semantics of its environment while also navigating it efficiently, in order to solve a variety of tasks that may involve other agents. The Embodiment Hypothesis states that intelligence emerges from an agent's interaction with, and perception of, the environment it is embodied in.

This project establishes an empirical study to validate this hypothesis for Deep Reinforcement Learning (DRL) agents trained on environments derived from both simulations and real-world data. We use DRL to model a Partially Observable Markov Decision Process (POMDP) that describes optimal policies for joint navigation-perception problems. The Embodiment Hypothesis implies that learning this problem end-to-end should outperform conventional methods for training computer vision models. To address the limitations of existing DRL algorithms on this task, we propose a novel family of networks that learn to solve embodied tasks from sparse representations of the perceived data. Our aim is to enable a new paradigm of efficient production systems in which drones navigate complex environments and visually monitor infrastructure, for applications such as autonomous fire risk assessment and asset monitoring.
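To make the POMDP framing concrete, the sketch below shows a recurrent DRL policy that maintains a belief state over partial visual observations, which is the standard way to act under partial observability. This is a minimal PyTorch illustration, not the project's actual architecture: the class name, layer sizes, 64x64 observation shape, and four-action space are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative POMDP policy: a CNN encodes each partial visual
    observation and a GRU aggregates them into a belief state, from
    which action logits and a value estimate are produced."""
    def __init__(self, n_actions: int, hidden: int = 128):
        super().__init__()
        # Encoder for an assumed 3x64x64 egocentric camera frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),   # 64 -> 15
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 15 -> 6
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, hidden), nn.ReLU(),
        )
        # Recurrence carries information across steps, standing in for a
        # belief over the environment state the agent cannot observe directly.
        self.rnn = nn.GRUCell(hidden, hidden)
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs, belief):
        z = self.encoder(obs)         # partial observation -> features
        belief = self.rnn(z, belief)  # update belief with new evidence
        return self.policy_head(belief), self.value_head(belief), belief

# One acting step under partial observability (dummy inputs):
policy = RecurrentPolicy(n_actions=4)
belief = torch.zeros(1, 128)            # initial belief state
obs = torch.rand(1, 3, 64, 64)          # one camera frame
logits, value, belief = policy(obs, belief)
action = torch.distributions.Categorical(logits=logits).sample()
```

The belief state carried by the GRU is what distinguishes this from a memoryless policy: in a POMDP a single frame does not determine the environment state, so the agent must integrate observations over time before acting.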

Industry Partner(s): EthicalAi Inc

Academic Institution: The University of Guelph

Academic Researcher: Lei, Lei

Focus Areas: 5G/NextGen Networks, Cities, Energy, Environment & Climate, Transportation

Platforms: Cloud, GPU