Spotlight: TCS Increases Automotive Software Testing Speeds by 2x Using NVIDIA Generative AI

Generative AI is transforming every aspect of the automotive industry, including software development, testing, user experience, personalization, and safety. With the automotive industry shifting from a mechanically driven approach to a software-driven one, generative AI is unlocking a world of possibilities.

Tata Consultancy Services (TCS) focuses on two major segments for leveraging generative AI in automotive: 

Building features to enhance customer experience

Accelerating the software engineering lifecycle

Building features to enhance customer experience

Generative AI is the key to realizing fully autonomous vehicles (AVs) by enhancing AI-based algorithms for better decision-making. It can generate and synthesize datasets that cover scenarios underrepresented in limited real-world data, for both training and testing. The technology is also instrumental in delivering vehicle personalization and richer user experiences, encompassing capabilities such as advanced search, language translation, in-car personal assistants, and intuitive recommendations for video and audio entertainment.

Accelerating the software engineering lifecycle

The goal of a software-defined vehicle (SDV) is to provide more flexibility and enrich the user experience, enabling customers to upgrade and update vehicle features based on their convenience. This has increased vehicle complexity, resulting in millions of lines of code. There is high demand for enabling feature-as-a-service models, where automotive features need to be developed and deployed within a few weeks. 

Current processes and tools make this timeline nearly impossible. Here, generative AI has the potential to act as a companion to engineers, accelerating the software engineering lifecycle, including requirement analysis, design, development, and validation.

With these focus areas, TCS has built the Automotive Gen-AI Suite, leveraging TCS-patented algorithms developed with NVIDIA technologies. Figure 1 shows the architecture for text-based use cases in an off-board environment. Large language models (LLMs) are trained with the NVIDIA NeMo framework, fine-tuned on automotive domain-specific datasets, and deployed with NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform.

This post explores one use case, unit-level test case generation, including the approach and the key performance indicators (KPIs) used to measure success.

Figure 1. The architecture of the TCS Automotive Gen-AI solution leverages a variety of NVIDIA technologies

Test case generation from unstructured requirements

Creating test cases from unstructured system requirements is one of the most time-consuming steps within the software engineering lifecycle.

Figure 2. Source text and target text for test case generation

Currently, creating test case repositories for the various automotive domains is mostly done manually, which is time-consuming and costly. The training and development of these test cases can take weeks.

To solve this industry-wide problem, TCS is using NVIDIA technologies to automatically generate test cases from unstructured text-based requirements.

LLMs can speed up the process and reduce costs, requiring only minimal intervention during validation. They can generate scenarios and corresponding test cases, which experts then validate for accuracy and coverage.

To generate test cases tailored to specific requirements, TCS carefully curates the dataset through an iterative process: TCS analyzes the output of a pretrained model and selects data, such as cases with lower accuracy or coverage, for further refinement. Using the NVIDIA NeMo framework, TCS fine-tunes the model on automotive-specific data with parameter-efficient fine-tuning (PEFT) techniques such as low-rank adaptation (LoRA). Prompt selection is also a critical step, with prompt tuning incorporated to optimize KPIs.
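The core idea behind LoRA can be sketched in a few lines: the pretrained weight matrix stays frozen, and training updates only a low-rank pair of matrices whose product is added to it, scaled by alpha/rank. The NumPy sketch below is purely illustrative (the dimensions and values are made up, not TCS's actual NeMo configuration), but it uses rank 32 to match the pipeline described later in this post:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 64, 64, 32, 32  # rank 32, as in the pipeline described below

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero-initialized, so training starts at W

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B receive gradients.
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((1, d_in))
# With B = 0, the adapted layer is identical to the base layer.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only A and B are trained, the number of trainable parameters drops from d_out × d_in to r × (d_out + d_in), which is what makes PEFT practical on domain-specific datasets.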

For deployment, TCS uses NVIDIA NIM microservices on NVIDIA DGX H100 systems. The preprocessed input prompt is fed into a fine-tuned NeMo-based model, which has been trained with automotive knowledge. The base model used for fine-tuning is the Llama 3 8B Instruct model. After postprocessing, the output consists of test cases that help customers enhance their capabilities, serving as a companion tool.

Figure 3 illustrates the overall approach, from input requirements to output test cases. Input is preprocessed with techniques such as few-shot learning and prompt chaining. Context awareness for each use case is achieved using reference documents from the customer during the preprocessing steps, employing retrieval-augmented generation (RAG) techniques.
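The preprocessing described above, retrieving relevant context from customer reference documents and combining it with few-shot examples, can be sketched as follows. This is a minimal illustration with hypothetical requirement snippets and naive keyword-overlap retrieval; a production RAG pipeline would use vector embeddings and a proper retriever:

```python
# Hypothetical reference snippets standing in for customer documents.
reference_docs = [
    "The wiper shall activate within 200 ms of rain sensor detection.",
    "The headlamp shall switch to low beam when an oncoming vehicle is detected.",
    "The door lock shall engage automatically above 10 km/h.",
]

few_shot_example = (
    "Requirement: The horn shall sound while the horn button is pressed.\n"
    "Test case: Press the horn button; verify the horn sounds; release; verify silence."
)

def retrieve(query, docs, k=1):
    # Naive keyword-overlap scoring; real pipelines use embedding similarity.
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(requirement):
    # Assemble retrieved context and a few-shot example ahead of the requirement.
    context = "\n".join(retrieve(requirement, reference_docs))
    return (
        f"Context:\n{context}\n\n"
        f"Example:\n{few_shot_example}\n\n"
        f"Requirement: {requirement}\nTest case:"
    )

prompt = build_prompt("The wiper shall stop within 500 ms after the rain sensor clears.")
```

The assembled prompt gives the model both domain context and the expected output format before it ever sees the new requirement.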

Figure 3. Approach for the test case generation from requirements

The training pipeline includes domain- and task-specific curated data blocks, which, after cleaning, are fed into the fine-tuning block. This block uses PEFT with LoRA (rank 32) and the NVIDIA NeMo framework (NeMo 24.05 container) with a fused Adam optimizer. The model is trained for between 1 and 100 steps to prevent overfitting. TCS deploys the fine-tuned model using the NVIDIA NIM (Meta/Llama3-8b-instruct: 1.0.0) container. The generated output is then postprocessed and integrated with LangChain to produce the required output test cases.

Optimizing with the NVIDIA AI Enterprise software platform

TCS has leveraged NVIDIA NeMo to build these state-of-the-art models. The base LLMs were fine-tuned on TCS's automotive-specific curated datasets using the LoRA technique. The fine-tuning was performed in the NeMo framework training container to improve GPU utilization.

Using NVIDIA NIM-based optimization, TCS achieved low latency (close to real time) and high throughput on NVIDIA DGX systems. Post-training quantization with NVIDIA TensorRT-LLM, included in NIM microservices, helps reduce GPU utilization and latency. NIM also provides inferencing APIs, which can be invoked directly from application services.

Benchmarks to identify the best model

A comparison study of GPU utilization, training parameters, and output accuracy for the TCS test case generation pipeline was conducted across different LLMs to select the most appropriate one.

The TCS test case generation pipeline using NeMo starts with an input, which can be a specifications document from the customer, or a prompt based on these specs.

These inputs are fed to the LLM NIM microservices fine-tuned with auto-specific data.

The generated output is verified for incorrect and duplicate test cases. If needed, a new prompt is used to correct or generate more test cases, and the process is repeated.
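The verification step above, dropping duplicate and malformed test cases before deciding whether to re-prompt, can be sketched as a simple filter. The test case format here (one `TC<n>:` line per case, with an action and a "verify" step) is a made-up illustration, not TCS's actual output schema:

```python
import re

# Raw model output: one hypothetical test case per line (illustrative format).
raw_output = """\
TC1: Set speed to 12 km/h; verify the door lock engages.
TC2: Set speed to 12 km/h; verify the door lock engages.
TC3: verify something
TC4: Set speed to 8 km/h; verify the door lock stays disengaged.
"""

def is_valid(case):
    # Minimal sanity check: a test case needs both an action and a verification step.
    return ";" in case and "verify" in case.lower()

def filter_cases(text):
    seen, kept = set(), []
    for line in text.strip().splitlines():
        # Compare case bodies with the TC label stripped, so renumbered
        # duplicates are still caught.
        body = re.sub(r"^TC\d+:\s*", "", line).strip().lower()
        if body in seen or not is_valid(line):
            continue  # dropped cases feed the corrective re-prompt
        seen.add(body)
        kept.append(line)
    return kept

cases = filter_cases(raw_output)
# TC2 is a duplicate of TC1 and TC3 is malformed, so only TC1 and TC4 survive.
```

The cases that fail this filter are exactly the ones the pipeline re-prompts about in the next iteration.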

Accuracy and coverage are used for comparison.

In terms of latency, NIM-based inference is on average around 2.5x to 3x faster than comparable open-source direct-inference setups, with similar accuracy.

Figure 4 shows a comparison of pretrained models to help users identify the best model for a given use case and select the most suitable base model for the requirement. This comparison includes not just accuracy and number of test cases but also decision coverage, condition coverage, and Modified Condition Decision Coverage (MCDC).

Figure 4. Comparison of pretrained models based on accuracy and average number of test cases

*Considering requirements of test case scenario only

Decision coverage

Decision coverage evaluates decision points within a model, such as switch blocks or flow states, by calculating the percentage of simulation paths traversed during testing. Full coverage is achieved when all possible paths through these decision points are executed at least once. 

Condition coverage

Condition coverage examines the logical combinations of inputs and state flow transitions. Full coverage is obtained when each input and transition condition in the model is tested to be both true and false at least once during the simulation.

MCDC

MCDC assesses the independence of logical inputs and transition conditions in a model. Full coverage is achieved when a change in one input or condition, independent of others, directly causes a change in the model’s output or triggers a transition.
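The three coverage criteria defined above can be made concrete with a small sketch. For a decision such as `a and (b or c)`, the code below checks, over a set of test vectors, whether the decision takes both outcomes (decision coverage), whether every condition takes both truth values (condition coverage), and whether each condition has a pair of tests that differ only in that condition and flip the outcome (MCDC). This is a toy formulation for a single boolean decision, not a model-level coverage tool:

```python
def decision(a, b, c):
    # Example decision: lock engages if ignition is on AND (speed high OR manual lock).
    return a and (b or c)

def coverage(tests):
    # Decision coverage: the decision evaluates to both True and False.
    decision_cov = {decision(*t) for t in tests} == {True, False}

    # Condition coverage: every condition takes both truth values.
    condition_cov = all({t[i] for t in tests} == {True, False} for i in range(3))

    # MCDC: for each condition, some pair of tests differs only in that
    # condition and produces different outcomes, proving its independent effect.
    def independent(i):
        for t in tests:
            flipped = tuple(not v if j == i else v for j, v in enumerate(t))
            if flipped in tests and decision(*t) != decision(*flipped):
                return True
        return False

    mcdc_cov = all(independent(i) for i in range(3))
    return decision_cov, condition_cov, mcdc_cov

# A minimal four-test set achieving full MCDC for a AND (b OR c).
tests = {(True, True, False), (False, True, False),
         (True, False, True), (True, False, False)}
# coverage(tests) reports full decision, condition, and MCDC coverage.
```

Note that a single test vector, or even a pair covering both decision outcomes, generally fails MCDC; for n conditions at least n + 1 well-chosen tests are needed, which is why MCDC is the strictest of the three criteria.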

Table 1 provides insight into why the Llama 3 8B Instruct model fine-tuned and deployed with NVIDIA NIM outperforms in this case on the accuracy, decision, and MCDC criteria. Note that online model inferencing and fine-tuning were out of scope for this work because the customer data is sensitive; all training and inference was conducted offline on TCS on-premises NVIDIA DGX H100 systems.

Comparison considering Llama 3 8B Instruct as the base model (pretrained versus fine-tuned)*

Model                                       Accuracy  Decision  Condition  MCDC
Llama 3 8B Instruct NVIDIA NIM pretrained   87%       84.5%     88.56%     71.22%
Llama 3 8B Instruct NVIDIA NIM fine-tuned   91%       85.1%     87.89%     73.11%

Table 1. Comparison of pretrained and fine-tuned Llama 3 8B Instruct model

*Considering requirements of test case scenario only

Using LLMs can reduce the cost and time of training for and developing automotive software. With minimal manual intervention, LLMs help interpret the key requirements and write automated test cases. They can also generate scenarios and test cases, which experts can then validate for accuracy and coverage.

Conclusion

With its expertise in the generative AI and automotive domains, TCS has developed a highly efficient automotive test case generation pipeline using NVIDIA DGX H100 systems and software including NVIDIA NIM and NVIDIA NeMo. The model, fine-tuned with the NVIDIA NeMo framework and served with the faster inference NIM makes possible, achieved higher accuracy and coverage than existing models, with low latency. TCS has observed a ~2x acceleration in its overall test case generation pipeline.

TCS is also using NeMo and NIM to advance conversational LLMs, visual LLMs for context understanding, and image-based generative adversarial network models. TCS will also use NVIDIA Blueprints to explore multimodal capabilities and further refine the software engineering lifecycle.
