Publications

My work focuses on safe and capable autonomy, particularly for navigation tasks.

MonoMPC: Monocular Vision Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control

Basant Sharma, Prajyot Jadhav, Pranjal Paul, K. Madhava Krishna, Arun Kumar Singh

RA-L 2026 · Risk-Aware MPC · Depth Foundation Model · Monocular Vision-based Navigation

Navigating unknown spaces with a single RGB camera is notoriously difficult because depth estimates from vision foundation models are often too noisy for reliable zero-shot collision checking. MonoMPC sidesteps this issue: instead of using the noisy depth for direct collision checking, we feed it as contextual input to a learned collision model that predicts obstacle clearance distributions. Paired with a risk-aware MPC planner, this model enables faster and safer navigation in highly cluttered spaces.
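As a rough illustration of how a predicted clearance distribution can feed a risk-aware planner, the sketch below scores candidate trajectories by their probability of violating a safety margin. It is not the formulation from the paper: the Gaussian clearance assumption, the `margin` and `max_risk` values, and the names `collision_risk` and `pick_trajectory` are all hypothetical, chosen only to make the idea concrete.

```python
import math

def collision_risk(mean_clearance, std_clearance, margin=0.3):
    """P(clearance < margin), assuming (hypothetically) a Gaussian clearance distribution."""
    z = (margin - mean_clearance) / (std_clearance * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def pick_trajectory(candidates, max_risk=0.05):
    """Return the name of the lowest-cost candidate whose worst-step risk is <= max_risk.

    Each candidate is a dict with 'name', 'cost', and 'clearance', a list of
    (mean, std) clearance predictions, one per step. Returns None if no
    candidate satisfies the risk bound.
    """
    safe = []
    for cand in candidates:
        worst = max(collision_risk(m, s) for m, s in cand["clearance"])
        if worst <= max_risk:
            safe.append((cand["cost"], cand["name"]))
    return min(safe)[1] if safe else None

# Toy data: the direct path is cheaper but passes close to an obstacle.
candidates = [
    {"name": "direct", "cost": 1.0, "clearance": [(0.5, 0.2), (0.35, 0.2)]},
    {"name": "detour", "cost": 1.6, "clearance": [(1.2, 0.2), (1.0, 0.2)]},
]
chosen = pick_trajectory(candidates)
```

With these toy numbers the cheaper "direct" candidate is rejected because its predicted clearance sits too close to the margin, so the planner falls back to the costlier "detour".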

SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation

Pranjal Paul, Vineeth Bhat, Tejas Salian, Mohd. Omama, Krishna Murthy J., Naveen Arulselvan, K. Madhava Krishna

IROS 2025 · Semantic Mapping · CLIP · Particle Filter Localization · KITTI

Paper ↗ · Project Page ↗ · Video ↗

What if an autonomous agent could localize across a 5 km region using just a thousand points instead of millions? To answer this question, we developed a city-scale localization method that replaces dense maps and GPS with a sparse topometric map of open-vocabulary landmarks. By leveraging foundation models to identify semantic landmarks and integrating them into a recursive Bayesian framework, our system achieves robust global localization across diverse environments. The result is a highly efficient pipeline that reduces map density by 500x compared to standard SLAM methods like LOAM and Fast-LIO-SAM, without sacrificing localization reliability.
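To make the recursive Bayesian idea concrete, here is a toy sketch that localizes on a short chain of landmark-labelled map nodes. It uses a discrete histogram filter as a simpler stand-in for a particle filter, and everything in it (the 1-D chain map, the labels, `p_move`, `p_hit`) is an illustrative assumption rather than the SparseLoc pipeline.

```python
def predict(belief, p_move=0.8):
    """Motion update on a 1-D chain: advance one node with p_move, else stay put."""
    n = len(belief)
    new = [0.0] * n
    for i, b in enumerate(belief):
        new[min(i + 1, n - 1)] += p_move * b  # last node absorbs forward motion
        new[i] += (1.0 - p_move) * b
    return new

def update(belief, node_labels, observed, p_hit=0.9):
    """Measurement update: reweight each node by how well its label matches."""
    weighted = [
        b * (p_hit if label == observed else 1.0 - p_hit)
        for b, label in zip(belief, node_labels)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical sparse map: one semantic landmark label per node.
node_labels = ["bakery", "fountain", "bakery", "bus stop", "fountain"]
observations = ["bakery", "bus stop", "fountain"]

# Uniform prior: the agent has no idea where it starts (global localization).
belief = [1.0 / len(node_labels)] * len(node_labels)

for step, observed in enumerate(observations):
    if step > 0:
        belief = predict(belief)  # the agent moved between observations
    belief = update(belief, node_labels, observed)

best_node = max(range(len(belief)), key=belief.__getitem__)
```

Although "bakery" and "fountain" each appear twice in the map, the sequence of observations disambiguates them, and the belief concentrates on the final node.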

LeGo-Drive: Language-enhanced Goal-Oriented Closed-Loop End-to-End Autonomous Driving

Pranjal Paul, Anant Garg, Tushar Choudhary, Arun Kumar Singh, K. Madhava Krishna

IROS · RSS Workshop 2024 · Autonomous Driving · Vision-Language · Trajectory Optimization · Planning

Paper ↗ · Project Page ↗ · Video ↗

Most neural planners treat safety as a post-hoc operation, forcing downstream optimizers to do the heavy lifting to make trajectories feasible for controllers, while perception models act merely as priors. To make perception inherently planner-aware, we integrate planning directly into perception: we formulate the planner as a differentiable optimization layer and task the perception module with predicting planner-oriented entities, such as the goal location, from language commands. This allows gradients to flow end-to-end, ensuring that language-driven goal predictions are not just semantically meaningful but also kinematically feasible by design. This tight integration of perception and planning anticipated the architectures that are now standard in Vision-Language-Action (VLA) models.

Interested in collaborating? Get in touch.