Obstacles Encountered by Machine Learning Systems in Self-Driving Cars

The future of transportation is rapidly evolving, with autonomous vehicles (AVs) set to play a significant role in our cities and beyond. As the AV industry expands, so too does the need for sophisticated machine learning (ML) platforms that can handle the complex challenges posed by this technology.

One of the key requirements for AVs is continuous learning and tighter integration with simulation as the number of cars and geolocations increases. This is because the ML system for AVs is multimodal and multitask, with separate models trained for tasks such as pedestrian detection, vehicle detection, and spurious object detection, and each of them has to keep up with newly collected data. The future of mobility includes the rapid scaling of fully autonomous taxis in major cities and the extension of AVs to other domains such as trucking and local deliveries.

However, designing an ML platform for AVs presents several unique challenges.

Data Mining Challenges

ML for autonomous vehicles needs large and diverse datasets, often exceeding 100 terabytes, that cover as wide a range as possible of driving scenarios, weather conditions, traffic patterns, and rare events. Managing data quality, handling imbalance between rare and frequent events, and running automated validation pipelines are all critical to training models reliably, without bias or blind spots.
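One common way to handle the imbalance between rare and frequent events is to sample training examples with a probability inversely proportional to how often their class appears. The following is a minimal sketch of that idea, not a description of any particular AV platform; the function name and the scenario tags ("highway", "jaywalker") are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per example, proportional to 1 / class frequency.

    The weights are normalized to sum to 1, so every class ends up with
    the same total probability of being sampled.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[label] for label in labels]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical scenario tags for five logged driving clips.
labels = ["highway", "highway", "highway", "highway", "jaywalker"]
weights = inverse_frequency_weights(labels)
```

The resulting weights can be passed to a weighted sampler such as random.choices(labels, weights=weights, k=batch_size), so that the single rare clip is drawn about as often as the four common ones combined.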

Heterogeneous Models

Multiple types of ML models must be integrated, such as convolutional neural networks (CNNs) for spatial perception and Transformers for capturing long-range dependencies and temporal context. Combining these heterogeneous models helps improve object detection, scene understanding, and decision-making in complex driving situations, but it also increases system complexity and computational demands.

Training Efficiency

Training these massive models on huge datasets requires scalable storage and high-performance computing infrastructure. Efficient distributed training pipelines with automated data validation become essential to manage computational costs and rapid iteration cycles. Additionally, frequent model retraining is needed to incorporate new data and adapt to changing environments.
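Automated data validation can be as simple as rejecting malformed records before they reach the training job. The sketch below assumes each record is a dictionary; the field names (timestamp, image_path, label) are hypothetical and stand in for whatever schema a real pipeline uses.

```python
def validate_record(record, required=("timestamp", "image_path", "label")):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    # Every required field must be present and non-empty.
    for field in required:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # A timestamp, when present, must be a non-negative number.
    ts = record.get("timestamp")
    if ts is not None and (not isinstance(ts, (int, float)) or ts < 0):
        problems.append("invalid timestamp")
    return problems


good = {"timestamp": 12.5, "image_path": "clip_0001.png", "label": "pedestrian"}
bad = {"timestamp": -1, "image_path": "", "label": "vehicle"}
good_problems = validate_record(good)
bad_problems = validate_record(bad)
```

Running such checks as a pipeline stage means a bad batch is caught once, cheaply, instead of wasting accelerator hours on a training run that has to be repeated.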

Onboard Inference Optimization

Autonomous vehicles operate under strict latency and power constraints. Optimizing inference requires compressing models, reducing computational complexity, and leveraging hardware acceleration for real-time perception and control tasks. Models must be resilient to adverse weather, sensor noise, and unusual situations while maintaining safety-critical performance.
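Quantization is one widely used form of model compression: weights are stored as small integers plus a scale factor instead of as floats. The sketch below shows symmetric per-tensor int8 quantization on a plain list of weights. It is an illustration of the technique under simplified assumptions, not the method of any specific AV stack, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most half the scale factor, which is the accuracy cost traded for a smaller model and cheaper integer arithmetic on the vehicle's hardware.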

Ad-Hoc Training Requirements

In-field conditions may necessitate on-demand or incremental training to adapt quickly to novel scenarios, map updates, or sensor changes. This implies that the platform must support flexible, possibly edge-based, training and model updates without a full offline retraining cycle. Managing data privacy and security during such updates is also important.
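The core idea of incremental training is to start from the existing parameters and update them with only the newly collected samples. The toy sketch below does this for a linear model with stochastic gradient descent on squared error. A real perception model would be far larger, and the names and data here are hypothetical.

```python
def predict(weights, bias, features):
    """Linear model prediction."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(weights, bias, samples):
    """Mean squared error over (features, target) pairs."""
    errors = [(predict(weights, bias, x) - y) ** 2 for x, y in samples]
    return sum(errors) / len(errors)


def incremental_update(weights, bias, new_samples, lr=0.1):
    """One SGD pass over new samples, starting from the existing parameters."""
    w, b = list(weights), bias
    for features, target in new_samples:
        err = predict(w, b, features) - target
        w = [wi - lr * err * xi for wi, xi in zip(w, features)]
        b -= lr * err
    return w, b


# Existing (hypothetical) parameters and two newly collected samples.
old_w, old_b = [0.0, 0.0], 0.0
new_samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
new_w, new_b = incremental_update(old_w, old_b, new_samples)
```

Because the update reuses the old parameters rather than retraining from scratch, its cost scales with the amount of new data, which is what makes on-demand or edge-based updates plausible.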

In summary, the ML platform must address huge data scale and quality, integrate diverse advanced model architectures, maintain efficient and scalable training workflows, optimize lightweight and reliable inference on embedded vehicle hardware, and support dynamic model updates to safely enable autonomous driving.

To meet these challenges, the ML platform should provide solutions for efficient model development and compute resource utilization, efficient input data processing pipelines that keep accelerators from being starved of data, and optimization solutions that allow models to be deployed on the car without losing accuracy. Despite the desire to build and train an end-to-end model for AVs, this may not be feasible in the near future because of limits on onboard inference latency and the requirement that each stage be explainable. The role of an ML platform is to simplify the interface to the underlying ML infrastructure, allowing ML practitioners to focus on algorithm design and enhancement.
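A standard way to keep accelerators from waiting on input data is to prefetch: a background thread loads the next batches while the current one is being processed. The following is a minimal sketch using only the Python standard library; the prefetch function and its buffer_size parameter are hypothetical names, and production frameworks provide their own, more robust equivalents.

```python
import queue
import threading

_DONE = object()  # Marker placed on the queue when the source is exhausted.


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads ahead."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for batch in batches:
                buf.put(batch)  # Blocks when the buffer is full.
        finally:
            buf.put(_DONE)  # Always signal the end, even on error.

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


# Stand-in for a slow data loader: here just the numbers 0 to 4.
loaded = list(prefetch(iter(range(5))))
```

The bounded queue caps memory use while still letting loading overlap with computation. This sketch assumes the consumer reads the stream to the end; if it stops early, the daemon thread simply stays blocked until the program exits.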

Autonomous vehicles are already running on the streets of metropolitan cities, providing robotaxi services. ML is used in various stages of the autonomous driving stack, including perception, behavior prediction, motion planning, control, mapping, and routing. The ML algorithms in AVs must be accurate enough to meet high safety requirements, and even a 1% improvement in accuracy could significantly improve driving safety.

As the AV industry continues to grow, the demands on ML platforms will only increase. The platform must be able to adapt and evolve to meet these challenges, ensuring the safe and efficient deployment of autonomous vehicles on our roads.