At DriveX, we’ve developed an AI-powered web app (the kind you don’t need to download, but can open with a link on your browser) that helps glass repair companies instantly determine if a windshield needs repair or replacement. Customers simply take a photo via a link in their browser, and our AI analyses the damage to provide a recommendation.
To delve deeper into the development of this technology, I sat down with our CTO, Kentti, on a crisp November morning to ask him some challenging questions.
Okay, but what do you do as the CTO of DriveX?
My job is to create and implement the technical vision for DriveX’s deep-tech product. This involves staying up-to-date with the latest technologies and tools in the fields of computer vision and AI. The goal is to understand how to combine these technologies with traditional programming to develop a service that fills a gap in the market, offering a solution that delivers significant value to customers through cost savings or increased revenue.
If I were to use a metaphor, my role is like conducting an orchestra. The musicians represent various applications, servers, and networks, and my job is to ensure they all work together in harmony.
What’s the favourite part of your job?
My favourite part of the job is when a newly developed technology is used for the first time in a pilot project. That’s when we measure the product’s success against business KPIs. Seeing how users benefit is incredibly motivating. As an ongoing process, it’s also quite rewarding to act as the “doctor” for the system by eliminating pieces of code which limit scaling any systems to support DriveX’s rapid growth in usage.
How would you describe the AI development process at DriveX?
The process is centred around enabling a small team to create the maximum amount of valuable data as quickly as possible, avoiding months of manual work. In 2024, we’re fortunate to have tools that allow us to generate training data with machines. This means we can run thousands of processes simultaneously, each completing in seconds, simulating the work of a thousand people. This efficiency drives our training processes forward.
Phases:
- Planning, understanding the problem
- Selection of AI model type and architecture
- Collection of training data (+ labelling, if necessary)
- Model training
- Measuring model prediction performance against test data
How do you gather training data?
Training data can be gathered in two ways. First is quite time-intensive, where data operations specialists gather, label, and categorise the data manually. The other option is to source the data automatically, either from our databases or by using tools to render 3D models (of cars, for example) in simulated environments. For computer vision, the latter approach is feasible due to recent innovations by the developers of graphical rendering enginesc – synthetic data in 2024 looks very photo realistic.
What happens after the last phase?
If we identify weaknesses during performance measurement, and describe how to solve each individually. Where additional training data is needed, we incorporate new labels into the training dataset (e.g., if the model could be better at detecting windshield outlines on minibuses, we source additional minibus windshield labels). This additional training cycle (steps 3, 4, and 5 from above) continues for each model even after it is deployed live. AI development is like managing a living organism: as the system runs, we keep analysing its shortcomings and add targeted training data to produce specific enhancements.
How was the AI for windshield repair/replacement decisions developed?
First, we analysed how windshield repair companies operate today. We needed to understand the criteria for deciding between repair and replacement. This revealed that factors like the type, size, and location of damage are crucial.
Windshield repair algorithm development required three models working together to reach a decision. Each intermediate decision, such as determining the distance between damages, identifying cracks versus chips, or assessing proximity to the windshield edge, required separate models.
While our machine learning engineers worked on these models, our full stack engineers were integrating them to the DriveX SmartScan car inspection flow. The latter involves coding the triggers for executing AI inference in real-time and building the UX around acquiring full-windshield images and close-ups of the damage from users.
Our windshield repair algorithm is a blend of traditional software engineering and computer vision engineering.
What was the biggest challenge in AI for windshield repairs?
The biggest challenge was gathering enough high-quality training data. For example, to train a model differentiating between chips and cracks – types of damages – we sourced 50,000 detailed annotations outlining variants of both shapes of damages. This task required either a massive manual effort or a highly effective synthetic data generator. We pursued both.
One technique we used to speed up manual labelling is pseudo-labeling. By combining manually labelled bounding boxes with Meta’s SegmentAnything we set up a semi automatic segmentation labelling pipeline which doubled the rate of creating training data.
We also generated our own synthetic labels. We rendered 3D models of cars in various environments like forests, parking lots, or streets — where our customers inspect their cars. Then, we rendered graphical sprites, such as chips and cracks, onto the windshields.
If you weren’t building DriveX, what other AI projects would interest you?
I’d love to develop a chatbot for mental health first aid. It would engage with young people, asking insightful questions to help them see the positive side of life when they’re struggling. With a shortage of psychologists, such a chatbot could reduce waiting times and gather information about issues from people waiting on call centre queues.
There are new AI solutions and companies launching every day. How to stay competitive?
There’s quite little guidance online about how to use all of the tools that make machine learning engineers exponentially more productive. This is where we focus our innovation: finding entirely new ways to train models efficiently on large datasets by a small team. Speed to market with their products is, I believe, a key competitive advantage that separates leading AI companies from the rest.