View from India: Chaos engineering - simulating failure to achieve success
A confluence of problem solving, learning methodologies and precise deliverable outcomes has forged a new business opportunity which can be described as chaos engineering.
Chaos engineering, which simulates failure in production rather than in a test environment, is an emerging domain whose concept can be traced to the various stages of testing. Broadly speaking, testing is aided by data mining, which leads to validation, law enforcement and augmentation. Testing through log files helps predict failures, but that’s not enough: we need a prevention strategy. Chaos engineering helps prevent failures. Such prevention is crucial because when systems fail, rebuilding them is time-consuming and expensive.
“When complex large-scale distributed systems execute step-by-step rapid iterations, many things can go wrong,” said Madhusudan Shekar, principal evangelist, Amazon, speaking at the STeP-IN Summit 2018, the 15th international conference on software testing. “And the dimensionality of the problem will only increase with repeat iterations. This is where chaos engineering fits in. It is a disciplined approach that tackles the uncertainties in the system. It allows you to break stuff, only to make it better.”
With its origins at Netflix, chaos engineering reaches into every aspect of production through tools such as Simian Army and Gremlin. It validates systems by testing in production: introducing random and unpredictable behaviour helps detect weaknesses or failures in the system. Properties that handle unexpected disturbances are thereby built from the bottom up.
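The idea of injecting random failures and checking that the system still serves its users can be illustrated with a small sketch. This is not how Simian Army or Gremlin work internally; it is a toy example in which every name (`with_chaos`, `fetch_recommendations`, `homepage`, `run_experiment`) is hypothetical, the failure rate is an arbitrary assumption, and the "production system" is a single function with a fallback.

```python
import random


class InjectedFailure(Exception):
    """Raised by the chaos wrapper in place of a real outage."""


def with_chaos(func, failure_rate, rng):
    """Wrap func so that a random fraction of calls fail (hypothetical helper)."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_recommendations(user_id):
    """Hypothetical downstream dependency that normally always succeeds."""
    return [f"title-{user_id}-{n}" for n in range(3)]


def homepage(user_id, fetch):
    """Hypothetical caller that should degrade gracefully when fetch fails."""
    try:
        return fetch(user_id)
    except InjectedFailure:
        # Fallback: serve a generic list instead of an error page.
        return ["popular-1", "popular-2", "popular-3"]


def run_experiment(requests=1000, failure_rate=0.2, seed=42):
    """Inject failures and count how many requests were still served."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    chaotic_fetch = with_chaos(fetch_recommendations, failure_rate, rng)
    served = 0
    fallbacks = 0
    for user_id in range(requests):
        page = homepage(user_id, chaotic_fetch)
        if page:
            served += 1
        if page[0].startswith("popular"):
            fallbacks += 1
    return {"requests": requests, "served": served, "fallbacks": fallbacks}


if __name__ == "__main__":
    print(run_experiment())
```

If `served` equals `requests` while `fallbacks` is above zero, the caller absorbed the injected failures. If the fallback were missing, the same experiment would surface the weakness as an unhandled exception, which is the kind of discovery the approach is meant to produce.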
Chaos engineering can prevent revenue loss and improve performance and maintenance, benefits that are passed on to the customer. However, it can work only when IT professionals, designers and developers across all operations coordinate.
Chaos engineering helps large tech companies understand their distributed systems and microservice architectures. Even traditional domains like banking and finance are beginning to leverage it. Although there’s awareness about chaos engineering, it needs to be introduced as a course of study in engineering programmes for it to become mainstream.
One thing is certain: in the present scenario, ‘the system’ requires chaos engineering. Looking back, the system relied on IT to address various operations including sales and employee performance. As long as IT took care of key variables in the organisation, it was fine.
With time, key variables have become incidental as the spotlight is on the consumer. ‘Customer first’ is now ‘customer imperative’. In order to achieve user-centric, customer-driven outcomes, the software-to-production downtime should be decreased and this happens through testing. Consequently, testing is changing the rules of the game for incumbent IT players that largely include call centres, engineering services, business process outsourcing firms and infrastructure management and software companies.
“The dimensionality of testing takes into account parameters like functionality, performance, user context and experience. Testing should also address parameters like scalability, security and adaptability to work in different time zones across geographies. That’s how software is built into the system so that the development and operations (devops) cycle is not just deployable but offers multiple deployments through test measures,” felt Shekar. Automation, machine learning and algorithms are some of the tools used for testing, but for large systems, predictive testing alone is insufficient. Because preventative measures are also required, software testing now includes chaos engineering.
Software testing is paramount to achieving digital transformation. “Digital transformation is software-led transformation. The value exchange between the company and customer is a business that happens by identifying the needs of the customer and the software is built to address the requirement. In order to accelerate a value exchange between the company and customer, software testing is deployed,” reasoned Ingo Philipp of Tricentis.
Testing, however, has been around for a while. For its part, the STeP-IN forum brought together a community of testers 15 years ago. “At that time, independent user groups were formed in companies. With time, the outlook towards product management has changed as there’s exposure in processes. The tester industry, too, has matured,” said Vivek Mathur, general manager, Mindtree and co-president, STeP-IN, Bangalore Chapter. “The only way to keep pace is by working on projects that have cutting-edge technology or be part of organisations that will allow professionals to learn from one another,” added Mathur.