My work focuses on safe and capable autonomy, particularly for navigation tasks.

Basant Sharma, Prajyot Jadhav, Pranjal Paul, K. Madhava Krishna, Arun Kumar Singh
Navigating unknown spaces with a single RGB camera is notoriously difficult because depth estimates from vision foundation models are often too noisy for reliable zero-shot collision checking. MonoMPC sidesteps this issue: instead of checking collisions directly against the noisy depth, we feed it as contextual input to a learned collision model that predicts obstacle clearance distributions. Pairing this model with a risk-aware MPC planner yields faster and safer navigation in highly cluttered spaces.
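To make the risk-aware idea concrete, here is a minimal sketch and not the MonoMPC implementation. It assumes the learned model returns a Gaussian clearance estimate (mean, standard deviation) per waypoint, which is an illustrative choice; the function names, the safety margin, and the risk budget are all hypothetical.

```python
import math


def collision_probability(mean, std, margin=0.0):
    """P(clearance < margin), assuming clearance ~ N(mean, std^2) (an assumption)."""
    if std <= 0.0:
        # Degenerate case: the prediction is treated as deterministic.
        return 1.0 if mean < margin else 0.0
    z = (margin - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def trajectory_risk(clearances, margin=0.0):
    """Worst per-waypoint collision probability along one candidate trajectory."""
    return max(collision_probability(m, s, margin) for m, s in clearances)


def pick_trajectory(candidates, max_risk=0.05, margin=0.2):
    """Index of the cheapest candidate within the risk budget.

    Falls back to the least risky candidate if none satisfies the budget.
    Each candidate is a dict with a "cost" and a list of (mean, std) "clearances".
    """
    scored = [
        (trajectory_risk(c["clearances"], margin), c["cost"], i)
        for i, c in enumerate(candidates)
    ]
    feasible = [s for s in scored if s[0] <= max_risk]
    if feasible:
        return min(feasible, key=lambda s: s[1])[2]
    return min(scored, key=lambda s: s[0])[2]
```

The point of the sketch is that a trajectory is accepted or rejected based on the probability of violating a clearance margin, so uncertain predictions near obstacles are penalized even when the mean clearance looks acceptable.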
Pranjal Paul, Vineeth Bhat, Tejas Salian, Mohd. Omama, Krishna Murthy J., Naveen Arulselvan, K. Madhava Krishna
What if an autonomous agent could localize across a 5 km region using just a thousand points instead of millions? We developed a city-scale localization method that replaces dense maps and GPS with a sparse topometric map of open-vocabulary landmarks. By using foundation models to identify semantic landmarks and integrating them into a recursive Bayesian framework, our system achieves robust global localization across diverse environments. The result is an efficient pipeline that reduces map density by 500x compared to standard SLAM methods such as LOAM and Fast-LIO-SAM, without sacrificing localization reliability.
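The recursive Bayesian update can be illustrated with a toy discrete filter over map nodes. This is a minimal sketch under stated assumptions, not the pipeline described above: the map is a hypothetical dictionary from node to landmark labels, the motion model spreads belief evenly over neighboring nodes, and the `p_hit` / `p_miss` likelihoods are made-up values.

```python
def predict(belief, neighbors, stay_prob=0.2):
    """Motion update: keep some mass in place, spread the rest over neighbors."""
    new_belief = {node: 0.0 for node in belief}
    for node, p in belief.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # Isolated node: all of its mass stays where it is.
            new_belief[node] += p
            continue
        new_belief[node] += stay_prob * p
        share = (1.0 - stay_prob) * p / len(nbrs)
        for n in nbrs:
            new_belief[n] += share
    return new_belief


def update(belief, landmarks, observed, p_hit=0.9, p_miss=0.1):
    """Measurement update: reweight nodes by how well they explain the observed label."""
    weighted = {
        node: p * (p_hit if observed in landmarks[node] else p_miss)
        for node, p in belief.items()
    }
    total = sum(weighted.values())
    return {node: w / total for node, w in weighted.items()}


# Hypothetical three-node map with open-vocabulary landmark labels.
landmarks = {"A": {"pharmacy"}, "B": {"bakery"}, "C": {"pharmacy", "bus stop"}}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

belief = {node: 1.0 / len(landmarks) for node in landmarks}  # uniform prior
belief = update(belief, landmarks, "pharmacy")   # ambiguous: matches A and C
belief = predict(belief, neighbors)              # agent moves one step
belief = update(belief, landmarks, "bus stop")   # disambiguates toward C
best_node = max(belief, key=belief.get)
```

A single landmark is often ambiguous, so the filter keeps several hypotheses alive and lets later observations concentrate the belief on one node.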
Pranjal Paul, Anant Garg, Tushar Choudhary, Arun Kumar Singh, K. Madhava Krishna
Most neural planners treat safety as a post-hoc operation, forcing downstream optimizers to do the heavy lifting of making trajectories feasible for controllers, while perception models act merely as priors. To make perception planner-aware, we formulate the planner as a differentiable optimization layer and task the perception module with predicting planner-oriented entities, such as goal location, from language commands. This allows gradients to flow end-to-end, so that language-driven goal predictions are not just semantically correct but kinematically feasible by design. This tight integration of perception and planning anticipated architectures that are now standard in Vision-Language-Action (VLA) models.
Interested in collaborating? Get in touch.