AI and functional safety – the role of metrics

The integration of Artificial Intelligence (AI) into safety-critical systems requires rigorous adherence to established functional safety standards and careful consideration of emerging AI-specific guidelines. Key established standards in the functional safety domain include IEC 61508 (general), ISO 26262 (automotive) and ISO 13849 (machinery). Standardization in the area of AI, on the other hand, is ongoing. For instance, several groups under ISO/IEC JTC 1/SC 42 deal with various aspects of AI, such as AI system life cycle processes, guidance for AI applications, a framework for AI systems using Machine Learning (ML), trustworthiness in AI and AI use cases. Among these, ISO/IEC TR 5469:2024 is a technical report that describes the properties, related risk factors, available methods and processes relating to the use of AI within safety-related functions, the use of non-AI safety functions to ensure safety for AI-controlled equipment, and the use of AI in the design and development of safety-related functions. Example use cases for AI-based functional safety systems include, but are not limited to:

  • Automotive: autonomous driving systems handling tasks such as object detection and traffic sign recognition.
  • Healthcare: diagnostic systems used to analyse images and clinical data.
  • Industrial automation: predictive maintenance that uses historical sensor data to predict when machines might fail, preventing accidents and ensuring continuous, safe operation in industrial settings.
  • Energy: smart grid management systems that manage energy distribution and predict load to prevent outages.

Quality assurance metrics

To ensure these AI-enabled systems operate reliably and safely within their intended environments, comprehensive quality assurance metrics are essential. These metrics help manage the inherent complexity of AI-enabled functional safety systems, facilitating predictable and understandable behaviour in dynamic and potentially unpredictable environments. In the following, some of the most important quality metrics for both the data model and the neural networks themselves are outlined. Further, some commonly known functional safety metrics are also discussed.

Data model metrics:

These metrics focus on ensuring data quality, diversity and integrity, which are critical for training and validating AI models. They include, among others:

  • Completeness: Coverage of all relevant scenarios the AI may encounter.
  • Consistency: Uniform data formatting and labelling.
  • Accuracy: Reflecting true and validated outcomes.
  • Timeliness: Relevance of data in dynamic environments.
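
As a rough illustration, three of the data metrics above can be turned into simple ratios over a labelled dataset. The sketch below is only one possible interpretation: the record fields (`scenario`, `label`, `timestamp`), the scenario names and the 90-day freshness window are hypothetical choices, not taken from any standard. Accuracy is left out because it requires validated ground truth to compare against.

```python
from datetime import datetime, timedelta


def data_quality_report(records, required_scenarios, allowed_labels, now, max_age):
    """Return simple ratio-based data quality indicators (illustrative only)."""
    if not records or not required_scenarios:
        raise ValueError("records and required_scenarios must be non-empty")

    n = len(records)
    # Completeness: share of required scenarios that appear at least once.
    seen = {r["scenario"] for r in records}
    completeness = len(seen & set(required_scenarios)) / len(required_scenarios)
    # Consistency: share of records whose label uses the agreed vocabulary.
    consistency = sum(1 for r in records if r["label"] in allowed_labels) / n
    # Timeliness: share of records recorded within the freshness window.
    timeliness = sum(1 for r in records if now - r["timestamp"] <= max_age) / n

    return {
        "completeness": completeness,
        "consistency": consistency,
        "timeliness": timeliness,
    }


# Hypothetical predictive-maintenance records.
now = datetime(2024, 4, 1)
records = [
    {"scenario": "normal", "label": "ok", "timestamp": datetime(2024, 3, 30)},
    {"scenario": "bearing_wear", "label": "fault", "timestamp": datetime(2024, 3, 15)},
    {"scenario": "normal", "label": "OK ", "timestamp": datetime(2023, 1, 10)},  # bad label, stale
    {"scenario": "overheating", "label": "fault", "timestamp": datetime(2024, 3, 28)},
    {"scenario": "normal", "label": "ok", "timestamp": datetime(2024, 2, 1)},
]
report = data_quality_report(
    records,
    required_scenarios={"normal", "bearing_wear", "overheating", "sensor_dropout"},
    allowed_labels={"ok", "fault"},
    now=now,
    max_age=timedelta(days=90),
)
print(report)
```

In this made-up dataset the missing `sensor_dropout` scenario lowers completeness, while the single mislabelled, outdated record lowers both consistency and timeliness.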

Neural Network (NN) model metrics:

These metrics are used to evaluate the performance and robustness of AI models, as described in ISO/IEC TR 29119-11 and ISO/IEC 24029-1. Accuracy, precision and recall are foundational metrics for assessing how well an AI model performs against known values, which is particularly crucial in safety-critical applications where decisions may directly affect human safety. The F1-score is especially useful when class distributions are uneven, as is typical in safety applications. Further, adversarial robustness and generalization metrics evaluate the model’s ability to handle unexpected or novel inputs and deliberate attempts to mislead it, which is critical for maintaining functionality under all conditions (e.g. in fail-operational systems).

  • Accuracy: Measures the proportion of true results (both true positives and true negatives) among the total number of cases examined. For safety systems, high accuracy is essential to ensure the system behaves as expected in all situations.
  • Precision: Indicates the proportion of positive identifications that were correct. Precision is vital in scenarios where the cost of a false positive is high, such as in automated driving systems where unnecessary braking can cause accidents.
  • Recall: Measures the proportion of actual positives that were identified correctly. In safety-critical contexts, a high recall rate is crucial to ensure all potential threats or failures are recognized, minimizing the risk of missing a dangerous condition.
  • F1-score: Offers a balance between precision and recall by considering both false positives and false negatives. It is especially useful in uneven class distribution scenarios, typical in safety applications where ‘failures’ or ‘events’ are rare compared to normal operation.
  • Adversarial Robustness: Assesses the AI’s capability to withstand attacks designed to fool the model. This is crucial for safety systems where malicious attempts could lead to harmful decisions. Enhancing adversarial robustness involves techniques like adversarial training, where the model is trained on a mixture of normal and maliciously perturbed data.
  • Generalization: Measures how well the AI model performs on a new, unseen dataset or under conditions that differ from those during the training phase. Good generalization is imperative for functional safety as it ensures the AI system can operate safely across all expected operational environments without overfitting to the training data.
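
The first four metrics above follow directly from the confusion-matrix counts. The following minimal sketch uses hypothetical counts for a fault detector in which faults are rare; the function name and the numbers are illustrative assumptions only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical evaluation: 1000 samples, of which only 10 are real faults.
m = classification_metrics(tp=6, fp=2, tn=988, fn=4)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 99.4 %, yet the recall is only 60 %: four of the ten real faults are missed. This is why accuracy alone can be misleading when ‘failures’ are rare, and why recall and F1-score deserve particular attention in safety applications.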

By identifying the most relevant metrics for each application, developers can concentrate on optimizing critical aspects of AI systems to enhance safety and reliability, particularly in worst-case scenarios.

Functional safety metrics:

These metrics are critical for assessing the system’s ability to function correctly in response to failures or malfunctions, ensuring overall system safety. Commonly known metrics include:

  • Safety Integrity Level (SIL): Establishing required risk reduction levels.
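  • Failure Metrics (PFH, PFD): Estimating the average frequency of a dangerous failure per hour (PFH) and the probability of a dangerous failure on demand (PFD).
  • Reliability Metrics (MTTF, MTBF): Calculating the mean time to failure (MTTF) and the mean time between failures (MTBF).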
  • Diagnostic Coverage (DC): Assessing the effectiveness of diagnostic systems.
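
To show how these quantities relate to each other, the sketch below derives them from component failure rates. It assumes constant failure rates and a single-channel (1oo1) architecture, and it uses commonly cited simplified approximations (PFDavg ≈ λDU · T / 2 and PFH ≈ λDU) that neglect repair time and common-cause effects. The function name, parameter names and numeric values are hypothetical; a real assessment must follow the calculation rules of the applicable standard.

```python
def functional_safety_metrics(lambda_s, lambda_dd, lambda_du,
                              proof_test_interval_h, mttr_h):
    """Simplified 1oo1 estimates from constant failure rates (per hour).

    lambda_s  : safe failure rate
    lambda_dd : dangerous detected failure rate
    lambda_du : dangerous undetected failure rate
    """
    lambda_d = lambda_dd + lambda_du
    lambda_total = lambda_s + lambda_d
    if lambda_d <= 0 or lambda_total <= 0:
        raise ValueError("failure rates must be positive")

    # Diagnostic coverage: share of dangerous failures caught by diagnostics.
    dc = lambda_dd / lambda_d
    # Mean time to failure for a constant total failure rate.
    mttf_h = 1.0 / lambda_total
    # Mean time between failures adds the mean time to repair.
    mtbf_h = mttf_h + mttr_h
    # Simplified average probability of dangerous failure on demand (assumption).
    pfd_avg = lambda_du * proof_test_interval_h / 2.0
    # Simplified average frequency of dangerous failure per hour (assumption).
    pfh = lambda_du

    return {"dc": dc, "mttf_h": mttf_h, "mtbf_h": mtbf_h,
            "pfd_avg": pfd_avg, "pfh": pfh}


# Hypothetical component data: failure rates per hour, yearly proof test.
r = functional_safety_metrics(
    lambda_s=5e-7, lambda_dd=9e-7, lambda_du=1e-7,
    proof_test_interval_h=8760, mttr_h=8,
)
for name, value in r.items():
    print(f"{name}: {value:.6g}")
```

The resulting PFD or PFH value is then compared against the target ranges that the applicable standard defines for the required SIL.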

In summary, the metrics discussed above (not a comprehensive list) are integral to the quality measures needed for AI-based functional safety systems, aiming to minimize risks and uphold safety across operational levels. Integrating these metrics with broader functional safety measures allows for a unified approach and a comprehensive framework to monitor and manage AI-enabled safety systems effectively throughout their lifecycle.

An interesting example of the application of such metrics, and of related research on quality and safety aspects, can be found in the case study below.

Case Study: Quality Assurance Metrics and Dataset Accuracy in AI-Based Predictive Maintenance Systems

The accuracy and reliability of datasets are essential in AI-based systems, especially those deployed in safety-critical environments such as machinery predictive maintenance that is also used for safety diagnostics. My study, “Exploring the Impact of Dataset Accuracy on Machinery Functional Safety,” highlights how inaccuracies in data (arising from sensor noise, labelling errors, missing data, and outliers) can significantly degrade the performance of predictive maintenance systems. These challenges underscore the necessity for stringent quality assurance metrics in dataset preparation and maintenance. The study provides experimental evidence of how data quality affects system reliability and offers recommendations for enhancing data handling practices to improve the safety and efficiency of machinery in manufacturing settings.

By incorporating these metrics and recommendations, organizations can significantly minimize the risk of unplanned downtime and ensure that their machinery operates safely and efficiently, even in high-risk environments. This proactive approach to data quality is not just about maintaining system performance—it’s about safeguarding against potential failures that could have dire consequences for both safety and productivity.

Padma Iyenghar (innotec) presents this work at ENASE 2024; it is one of her two accepted papers at the conference. Padma also chairs a session on system and software quality at ENASE 2024.
