Why software testing needs explainable AI – TechBeacon
Applications that use artificial intelligence and machine learning techniques present unique challenges to testers. These systems are largely black boxes that use multiple algorithms—sometimes hundreds of them—to process data in a series of layers and return a result.
While testing any application can be a complex endeavor, at a fundamental level it involves making sure that the results returned are those expected for a given input. And with AI/ML systems, that's a problem: the software returns an answer, but testers have no way to independently determine whether it's the correct one, because they don't necessarily know what the right answer should be for a given set of inputs.
In fact, some application results may be laughable. E-commerce recommendation engines often get individual recommendations wrong, but as long as they collectively induce shoppers to add items to their carts, they are considered a business success. So how do you determine whether your ML application achieves the needed level of success before deployment?
So the definition of a correct answer depends not only on the application, but also on how accurate it's required to be. If the answer has to be exact, testing is straightforward; if it only has to be close, how close is close enough? And will it always be close enough?
That’s ultimately the black hole for testers. If you don’t have a working statistical definition of accuracy that’s based on the needs of the problem domain, you can’t tell objectively whether or not a result is correct.
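To make that concrete, a tester can turn the domain's accuracy requirement into an explicit gate. The sketch below is hypothetical: `predict()` stands in for a real model, and the 95%-within-tolerance threshold is an example of a requirement the problem domain would have to supply.

```python
import random

def evaluate_accuracy(predict, test_cases, tolerance):
    """Fraction of predictions that fall within the domain's tolerance."""
    hits = sum(1 for x, expected in test_cases
               if abs(predict(x) - expected) <= tolerance)
    return hits / len(test_cases)

# Hypothetical stand-in for an ML model's prediction function.
def predict(x):
    return 2.0 * x + random.uniform(-0.1, 0.1)  # noisy linear model

random.seed(42)
test_cases = [(x, 2.0 * x) for x in range(100)]

# The pass criterion (95% of results within +/-0.15) comes from the
# problem domain, not from the model itself.
accuracy = evaluate_accuracy(predict, test_cases, tolerance=0.15)
assert accuracy >= 0.95, f"accuracy {accuracy:.2%} below domain requirement"
```

The point is not the arithmetic but the discipline: the tolerance and the pass rate are written down before testing, so "close enough" stops being a judgment call.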
It gets worse from there. Testers may have no idea whether an answer is right or wrong, even for a binary result. Under certain circumstances it might be possible to go back to the training data and find a similar case, but in many circumstances there is still no obvious way to validate results.
Does it matter? Yes, and probably more so than in traditional business applications. The vast majority of results in a traditional business application can be easily classified as correct or incorrect. Testers don't need to know how the underlying algorithms operate, although it would be useful if they did.
ML applications aren't that transparent. A result may seem correct, but bias or misrepresented training data could make it wrong. Wrong answers can also result from an ill-suited ML model that occasionally or systematically produces less-than-optimal answers. That's where explainable AI (XAI) can help.
XAI is a way of allowing an AI or ML application to explain why it came up with a particular result. By providing a defined path from input to output, XAI can allow a tester to understand the logic between inputs and outputs that may be otherwise impenetrable.
XAI is a young field, and most commercial AI/ML applications have not yet adopted it; the techniques behind the term remain vaguely defined. While application users can gain confidence from a rationale that points to a result, an explanation also helps development and testing teams validate the algorithms and training data and make sure that the results accurately reflect the problem domain.
A fascinating example of an early XAI effort comes from Pepper, the SoftBank robot that responds to tactile stimulation. Pepper has been programmed to talk through its instructions as it is executing them. Talking through the instructions is a form of XAI, in that it enables users to understand why the robot is performing specific sequences of activities. Pepper will also identify contradictions or ambiguities through this process and knows when to ask for additional clarification.
Imagine how such a program feature can assist testers. Using test data, the tester can obtain a result, then ask the application how it obtained that result, working through the process of manipulating the input data so that the tester can document why the result is valid.
But that’s just scratching the surface; XAI has to serve multiple constituents. For developers, it can help validate the technical approach and algorithms used. For testers, it helps confirm correctness and quality. For end users, it is a way of establishing trust in the application.
So how does XAI work? There is a long way to go here, but a couple of techniques show promise. XAI rests on the principles of transparency, interpretability, and explainability.
Several techniques can help AI/ML applications with explainability. These tend to use quantitative measures as stand-ins for a qualitative explanation of a particular result.
Two common techniques are Shapley values and integrated gradients. Both offer quantitative measures that assess what each set of data or features contributes to a particular result.
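For a small number of features, Shapley values can even be computed exactly, which makes the idea easy to see. The sketch below is illustrative, not from the article: the linear scoring model is made up, and real tools approximate this computation because the exact version is exponential in the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for a small feature set.

    v(S) is the model output with features in S taken from x and the
    rest from the baseline. Cost grows exponentially with the feature
    count, which is why practical libraries approximate instead.
    """
    n = len(x)

    def v(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for s in combinations(others, size):
                # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (v(set(s) | {i}) - v(set(s)))
        phi.append(total)
    return phi

# Hypothetical scoring model: a simple linear function, for which the
# Shapley value of each feature reduces to weight * (x - baseline).
model = lambda z: 3.0 * z[0] + 1.0 * z[1] - 2.0 * z[2]
phi = shapley_values(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# phi == [3.0, 2.0, -6.0]; the values sum to model(x) - model(baseline),
# so every bit of the output is attributed to some feature.
```

The per-feature numbers are exactly the "quantitative measure of what each feature contributes to a particular result" the technique promises, and the sum-to-output property is what makes them auditable.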
Similarly, the contrastive explanations method is an after-the-fact computation that tries to isolate individual results in terms of why one result occurred over a competing one. In other words, why did it return this result and not that one? 
Once again, this is a quantitative measure that rates the likelihood of one result over another. The numbers give you the relative positioning of the strength of the input on the result.
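A crude illustration of that idea: for two linear class scorers, the per-feature difference in contributions shows which inputs argue for one result over the other. The weights, feature values, and class names below are hypothetical, and this is a simplification of contrastive explanation methods, not an implementation of any particular one.

```python
def contrastive_contributions(weights_a, weights_b, x):
    """Per-feature contribution to 'score A minus score B'.

    Positive values push the decision toward class A over class B;
    negative values argue for B. This answers the contrastive
    question 'why this result and not that one?' feature by feature.
    """
    return [(wa - wb) * xi for wa, wb, xi in zip(weights_a, weights_b, x)]

# Hypothetical two-class linear scorer over three features.
weights_cat = [2.0, 0.5, -1.0]
weights_dog = [0.5, 1.5, -1.0]
x = [1.0, 1.0, 2.0]

deltas = contrastive_contributions(weights_cat, weights_dog, x)
# deltas == [1.5, -1.0, 0.0]: feature 0 favors 'cat', feature 1
# favors 'dog', and feature 2 is neutral for this comparison.
```

For a tester, a readout like this localizes a suspicious decision: if the feature driving the winning class is one that shouldn't matter, that points at the training data or model rather than the test.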
Ultimately, because AI/ML applications rely on data, and explainability has to manipulate that data with quantitative methods, we have no way beyond data science to provide explanations. The problem is that numerical weights may find a role in interpretability, but they are still a ways from true explainability.
AI/ML development teams need to understand and apply techniques such as these, for their own benefit and for the benefit of testers and users. In particular, without an explanation of the result at some level, it can be impossible for testers to determine whether the result returned is correct.
To assure the quality and integrity of AI/ML applications, testers have to have a means of determining how results are derived. XAI is a start, but it's going to take some time to fully realize this technique.