We have a lot of great webinars and virtual events here at Applitools. I’m hoping posts like this give you a high-level summary of the key points with plenty of room for you to form your own impressions.

Dave Piacente

Curious if the software robots are here to take our jobs? Or maybe you’re not a fan of the AI hype train? During a recent session, The Future of AI-Based Test Automation, CTO Adam Carmi discussed—in practical terms—the current and future state of AI-based test automation, why it matters, and what you can do today to level up your automation practice.

  • He describes how AI can be used to overcome common everyday challenges in end-to-end test automation, how the need for skilled testers will only increase, and how AI-based tooling can help supercharge any automated testing practice.
  • He also puts his money where his mouth is, using concrete examples (e.g., visual validation and self-healing locators) to demonstrate how AI-driven tooling that already exists today can mitigate the never-ending maintenance overhead of tests.
  • He also discusses the role that AI will play in the future, including the development of autonomous testing platforms. These platforms will be able to automatically explore applications, add validations, and fill gaps in test coverage. (Spoiler alert: Applitools is building one, and Adam shows a bit of a teaser for it using a real-time, in-browser REPL that automates the browser with natural language, similar to ChatGPT.)

You can watch the full recording and find the session materials here, and I’ve included a quick breakdown with timestamps for ease of reference.

  • Challenges with automating end-to-end tests using traditional approaches (02:34-10:22)
  • How AI can be used to overcome these challenges (10:23-44:56)
  • The role of AI in the future of test automation (e.g., autonomous testing) (44:57-58:56)
  • The role of testers in the future (58:57-1:01:47)
  • Q&A session with the speaker (1:01:48-1:12:30)

Want to see more? Don't miss Future of Testing: AI in Automation.


In our previous article, we learned what autonomous testing is all about: autonomous testing is when a tool can learn an app’s behaviors and automatically execute tests against them. It then provides results to humans who can determine what’s good and what’s bad. Fully autonomous testing solutions are not yet available today, but we can get part of the way there with readily available tools. Let’s learn how to build our own semi-autonomous solution using Playwright and Applitools Eyes!

The solution sketch

Let’s say you have a website with multiple pages. It would be really nice if a test suite could visit each page and make sure it looks okay: no missing buttons, no overlapping text, and no other kinds of visual bugs. The tests wouldn’t be very sophisticated, but they’d quickly catch a lot of problems. They’d be like smoke tests.

There’s a straightforward way to do this. Most websites have a sitemap file that lists the links for all the pages. We could use a browser automation tool like Selenium, Cypress, or Playwright to visit each of those pages, and we could use a tool like Applitools Eyes to capture visual snapshots of each page. Every time we run the suite, it would automatically discover new pages and avoid removed pages. Existing pages would be checked for visual differences. We wouldn’t need to explicitly code what to check – the snapshots would implicitly check everything on the pages! With the Applitools Ultrafast Grid, we could even test these pages against different browsers, devices, and viewport sizes.

This kind of test suite is technically “autonomous” because we, as human testers, don’t need to explicitly write the tests. The sitemap provides the pages, and visual testing covers the assertions.
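
The discovery half of this sketch can be illustrated without a browser at all. The snippet below is a minimal sketch, assuming the sitemap XML has already been downloaded as text. The helper name `extractSitemapLinks` and the `example.com` URLs are hypothetical, and a regular expression stands in for a real XML parser. The script later in this article uses Playwright to read the same `loc` elements instead.

```typescript
// Hypothetical helper: pull every <loc> URL out of sitemap XML text.
// A regular expression is enough for this sketch; a real project might prefer an XML parser.
function extractSitemapLinks(sitemapXml: string): string[] {
  const links: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Made-up two-page sitemap on a placeholder domain.
const sampleSitemap = `
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
</urlset>`;

const discoveredLinks = extractSitemapLinks(sampleSitemap);
console.log(discoveredLinks);
```

Because the list of pages is recomputed from the sitemap on every run, new pages are picked up and removed pages drop out without any change to the test code.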

The website to test

Last year at Applitools, we replaced our old tutorial website with a new website that uses Docusaurus, a very popular documentation framework based on React. We also rewrote several of the guides for our most popular SDKs. Presently, the site has about 60 pages of varying length. Since many of the pages host similar content, we use components to avoid duplication in text and in code. However, that means any change could inadvertently break multiple pages.
Our tutorial site is currently hosted at

The tutorial site also has a sitemap file at

<urlset xmlns="" xmlns:news="" xmlns:xhtml="" xmlns:image="" xmlns:video="">


It would be very helpful to test this tutorial site with the semi-autonomous testing tool we just sketched out.

Writing the autonomous testing code

To follow along with this article, you can find the project in its GitHub repository. In the package.json file, you’ll find all the packages needed for this project:

  • Playwright
  • Applitools Eyes SDK for Playwright
  • ts-node

Since I’ll be developing my code in TypeScript instead of raw JavaScript, the ts-node package will allow me to run TypeScript files directly.

{
  "name": "auto-website-testing",
  "version": "1.0.0",
  "description": "A semi-autonomous testing project that visually tests all the pages in a website's sitemap",
  "main": "index.js",
  "scripts": {},
  "repository": {
    "type": "git",
    "url": "git+"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "bugs": {
    "url": ""
  },
  "homepage": "",
  "devDependencies": {
    "@applitools/eyes-playwright": "^1.13.0",
    "@playwright/test": "^1.29.1",
    "ts-node": "^10.9.1"
  }
}

The main file in our project is autonomous.ts. This isn’t a typical Playwright test with describe blocks and test functions. Instead, autonomous.ts is just a plain old script. The first thing we need to do is read in some environment variables. For test inputs, we’ll need the base URL of the target website. We’ll need the name of the website for logging and reporting, and we’ll also need a level of test concurrency for the Applitools Ultrafast Grid, which will handle our visual snapshots. If you’re on a free Applitools account, you’ll be limited to one, but I’ll be using a bit more in this example.

import { chromium } from '@playwright/test';
import { BatchInfo, Configuration, VisualGridRunner, BrowserType, Eyes, Target } from '@applitools/eyes-playwright';

(async () => {

     // Read environment variables
     const BASE_URL = process.env.BASE_URL;
     const SITE_NAME = process.env.SITE_NAME;
     const TEST_CONCURRENCY = Number(process.env.TEST_CONCURRENCY) || 1;

Once we read those in, we want to validate the environment variables to make sure their values are actually given. Then we’ll figure out the sitemap URL, which is simply the base URL plus the standard sitemap.xml file name.

 // Validate environment variables
 if (!BASE_URL) {
    throw new Error('ERROR: BASE_URL environment variable is not defined');
 }
 if (!SITE_NAME) {
    throw new Error('ERROR: SITE_NAME environment variable is not defined');
 }

 // Parse the base and sitemap URLs
 const baseUrl = BASE_URL.replace(/\/+$/, '');
 const sitemapUrl = baseUrl + '/sitemap.xml';
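
The reading, validating, and URL-building steps can also be gathered into one small function, which makes them easy to exercise on their own. This is only a sketch: the `Settings` interface and the `readSettings` helper are hypothetical names that do not appear in the project, and the fallback to a concurrency of 1 for non-numeric or non-positive values is an assumption layered on top of the script's behavior.

```typescript
// Hypothetical shape for the validated inputs of the script.
interface Settings {
  baseUrl: string;
  sitemapUrl: string;
  siteName: string;
  testConcurrency: number;
}

// Hypothetical helper: validate the inputs and derive the sitemap URL.
function readSettings(env: Record<string, string | undefined>): Settings {
  const rawBaseUrl = env['BASE_URL']?.trim();
  const siteName = env['SITE_NAME']?.trim();
  if (!rawBaseUrl) {
    throw new Error('ERROR: BASE_URL environment variable is not defined');
  }
  if (!siteName) {
    throw new Error('ERROR: SITE_NAME environment variable is not defined');
  }

  // Assumption: anything that is not a positive integer falls back to 1.
  const parsed = Number(env['TEST_CONCURRENCY']);
  const testConcurrency = Number.isInteger(parsed) && parsed > 0 ? parsed : 1;

  // Strip trailing slashes so the sitemap path is not doubled up.
  const baseUrl = rawBaseUrl.replace(/\/+$/, '');
  return { baseUrl, sitemapUrl: `${baseUrl}/sitemap.xml`, siteName, testConcurrency };
}

// Placeholder values; in the script these would come from process.env.
const exampleSettings = readSettings({ BASE_URL: 'https://example.com/', SITE_NAME: 'Example Site' });
console.log(exampleSettings.sitemapUrl); // https://example.com/sitemap.xml
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the sketch testable without touching real environment variables.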

Next, we will set up Applitools to be able to do visual testing. These are all fairly standard Applitools SDK objects. If you take one of our Applitools SDK tutorials, you’ll see things just like this. Basically, we’ll need a visual grid runner to connect to the Ultrafast Grid for rendering our snapshots. We’ll create a batch named after our site so that its results are grouped together in reporting, and we’ll use a configuration object to specify settings like the batch and the browsers to test.

 // Create Applitools objects
 const runner = new VisualGridRunner({ testConcurrency: TEST_CONCURRENCY });
 const batch = new BatchInfo({name: SITE_NAME});
 const config = new Configuration();
 const widthAndHeight = {width: 1600, height: 1200};
 const snapshotPromises: Promise<any>[] = [];

With the configuration, we’re going to set the batch and we’re going to add one browser to test. I want to test Chrome with this particular viewport size. If we wanted to, we could also test other browsers in the Ultrafast Grid such as Firefox, Safari, and Edge Chromium. Even if you don’t have those browsers installed on your local machine, it’s all going to be done in the Applitools cloud. For this example, we’ll use one browser.

 // Set Applitools configuration
 config.setBatch(batch);
 config.addBrowser(1600, 1200, BrowserType.CHROME);
 // config.addBrowser(1600, 1200, BrowserType.FIREFOX);
 // config.addBrowser(1600, 1200, BrowserType.SAFARI);
 // config.addBrowser(1600, 1200, BrowserType.EDGE_CHROMIUM);

Now comes the fun part: getting that sitemap file. First, we’ll launch a browser through Playwright. We’ll just use Chromium. Then we’ll create a new browser context from that browser with a standard viewport width and height. Finally, we’ll get a page object from that context because, with Playwright, all interactions happen through a page object.

 // Set up a browser
 const browser = await chromium.launch();

 // Set up a sitemap context and page
 const sitemapContext = await browser.newContext({viewport: widthAndHeight});
 const sitemapPage = await sitemapContext.newPage();

Once we’ve got that page, we can visit the sitemap and find all of the page links inside it. Even though it’s XML, Playwright can still parse it just like a regular webpage. Once we’ve got that list of page links, we’ll close that session, since this part of the Playwright work is done.

 // Get the sitemap
 await sitemapPage.goto(sitemapUrl);
 const pageLinks = await sitemapPage.locator("loc").allTextContents();

 // Close the sitemap session now that we have the links
 await sitemapContext.close();

Now that we have the list of all pages from the sitemap, we can visit each one and capture a snapshot. To do that, we’ll iterate over the list with a for loop. For each page, we’ll create a promise so that the snapshots can be captured asynchronously in the background. Inside each promise, we’ll create a new browser context, create a new page object from that context, and then start an Applitools Eyes session. This is what enables us to capture those visual snapshots.

 // Capture a snapshot for each page
 for (const link of pageLinks) {
     snapshotPromises.push((async () => {

       // Open a new page
       const linkContext = await browser.newContext({viewport: widthAndHeight});
       const linkPage = await linkContext.newPage();

       // Open Eyes
       const eyes = new Eyes(runner, config);
       await eyes.open(
           linkPage,
           SITE_NAME,
           link.replace(baseUrl, ''),
           widthAndHeight
       );

Taking the snapshot is pretty basic. We just visit the page and call eyes.check to take a picture of the whole window. Once we do that, we can close our session so that Applitools knows that’s the only snapshot we’re taking. Firing these off in separate promises means that they run asynchronously in the background, letting Applitools Eyes crunch through all of the visual validations without us waiting for them one at a time. Once we’ve fired off all of the visual snapshots, we can wait for those promises to join at the end.

       // Take the snapshot
       await linkPage.goto(link);
       await eyes.check(link, Target.window().fully());
       console.log(`Checked ${link}`);

       // Close Eyes
       await eyes.close(false);
       console.log(`Closed ${link}`);
     })());
 }
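
One detail worth a closer look is how each test gets its name: the script strips the base URL from the link, so the remaining path identifies the page. The sketch below shows that derivation in isolation. The `testNameFor` helper is hypothetical and not part of the project, and naming the home page "/" when nothing is left after stripping is an assumption made here so that no test ends up with an empty name.

```typescript
// Hypothetical helper: derive a readable test name from a page link.
function testNameFor(link: string, baseUrl: string): string {
  // Links outside the base URL are left untouched.
  const path = link.startsWith(baseUrl) ? link.slice(baseUrl.length) : link;
  // Assumption: an empty remainder means the home page, so name it "/".
  return path === '' ? '/' : path;
}

// Placeholder URLs for illustration only.
console.log(testNameFor('https://example.com/docs/intro', 'https://example.com')); // /docs/intro
console.log(testNameFor('https://example.com', 'https://example.com')); // /
```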


And finally, once that’s complete, we can close the browser and be done with testing.

 // Close all Eyes
 console.log('Waiting for all snapshots to complete...');
 await Promise.all(snapshotPromises);

 // Close the browser
 await browser.close();

})();


That’s all there is to our autonomous testing script. It’s pretty concise – only about 80 lines long. It doesn’t take a whole lot of logic to write autonomous tests.

Running the tests

Let’s run the script to test it out. The website I’m going to use in this example is the Applitools tutorial site, which has about 60 pages. You can use any site you want as long as you set your environment variables. If you want to run this, you will need an Applitools account, which you can register for free with your email or GitHub account. Once you register an account, you’ll take your Applitools API key and set that as an environment variable.

We need to run our tests in the terminal. If this is your first time, you’ll need to run npm install to install the npm packages as well as npx playwright install to install the Playwright browsers. Once you’ve run those installers and set your environment variables, you just need to run npx ts-node autonomous.ts to run the script.

The script fetches the sitemap file, parses out all of the links, and fires off promises to capture a visual snapshot of each page asynchronously. A “Checked” message for a link means its snapshot has been initiated. As the snapshots complete one by one in the Applitools Ultrafast Grid, a “Closed” message appears for each link, which means visual testing for that page has finished.

While the test is running, we can see results in the Applitools Eyes dashboard as a batch.

If I look at the tests as they pop in, we can see that some of them are still running while others have passed.

If I want to see what the visual comparisons look like, I can compare them side by side. If there are no visual differences, the test for this page will pass.

You can see that the snapshot captures everything on the page: the title bar, the sidebar table of contents, the main body, and the footer. It will even scroll all the way to the bottom to make sure it gets the full page’s worth of content.

No matter how long the page is, everything is being compared visually. It’s pretty cool to see just how many tests our autonomous testing solution can uncover. As I said before, the tutorial site right now has about 60 pages, which means that’s 60 tests that I didn’t have to explicitly write. Playwright plus Applitools Eyes took care of it all for me.

Improving the tool

This example of a semi-autonomous testing tool is rather basic. There are plenty of ways we could improve it:

  • Decoupling our tests: We could write separate scripts for fetching sitemap links and taking the visual snapshots. That would let us provide links to visit by ways other than a sitemap file. We could also add an intermediate step to filter links from a sitemap file.
  • Expanding test coverage: We could add settings to test different browsers, devices, and viewports. This would provide autonomous cross-browser and cross-device testing coverage for multiple screen sizes.
  • Targeting specific regions: We could exclude parts of pages like navigation bars and sidebars. This could allow us to target specific portions of a page in our tests while ignoring content that is dynamic or that we may want to test separately.
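
As one illustration of the first suggestion, the intermediate filtering step could be a small pure function that sits between fetching the links and taking the snapshots. This is a sketch under assumptions rather than part of the project: the `LinkFilter` shape and the `filterLinks` name are hypothetical, and include and exclude lists of regular expressions are just one possible way to express a filter.

```typescript
// Hypothetical filter description: keep links matching any include pattern
// (or all links if none are given), then drop links matching any exclude pattern.
interface LinkFilter {
  include?: RegExp[];
  exclude?: RegExp[];
}

// Note: patterns should not use the "g" flag, because RegExp.test is stateful with it.
function filterLinks(links: string[], filter: LinkFilter): string[] {
  const include = filter.include ?? [];
  const exclude = filter.exclude ?? [];
  return links.filter((link) => {
    const included = include.length === 0 || include.some((pattern) => pattern.test(link));
    const excluded = exclude.some((pattern) => pattern.test(link));
    return included && !excluded;
  });
}

// Placeholder links for illustration only.
const allLinks = [
  'https://example.com/',
  'https://example.com/docs/intro',
  'https://example.com/blog/post',
  'https://example.com/docs/legacy/old',
];
const docsOnly = filterLinks(allLinks, { include: [/\/docs\//], exclude: [/\/legacy\//] });
console.log(docsOnly);
```

With a step like this in place, the snapshot script would only need a list of links as input, regardless of whether they came from a sitemap, a crawler, or a hand-written file.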

Even in its basic form, before any of these suggestions are implemented, this tool still provides a lot of value for a little bit of work. You can clone it from the GitHub repository and try it yourself. Let us know @Applitools how you use it!
Read more about Applitools Eyes.

Applitools is working on a fully autonomous testing solution that will be available soon. Reach out to our team to see a demo and learn more!


Autonomous testing vs traditional testing

Autonomous testing, where automated tools figure out what to test for us, is going to be the next great wave in software quality. In this article, we’ll dive deeply into what autonomous testing is, how it will work, and what our workflows as testers will look like with autonomous tools. Truly autonomous testing tools, like self-driving cars, aren’t available yet, but they’re coming soon. Let’s get ready for them!

What is autonomous testing?

So, what exactly is autonomous testing? It’s the next evolutionary step in efficiency.

A brief step back in time

Let’s take a step back in time. Historically, all testing was done manually. Humans needed to poke and prod software products themselves to check if they worked properly. Teams of testers would fill repositories with test case procedures and then grind through them en masse during release cycles. Testing moved at the speed of humans: it got done when it got done.

Then, as an industry, we began to automate our tests. Many of the tests we ran were rote and repetitive, so we decided to make machines run them for us. We scripted our tests in languages like Java, JavaScript, and Python. We developed a plethora of tools to interact with products, like Selenium for web browsers and Postman for APIs. Eventually, we executed tests as part of Continuous Integration systems so that we could get fast feedback perpetually.

With automation, things were great… mostly. We still needed to take time to develop the tests, but we could run them whenever we wanted. We could also run them in parallel to speed up turnaround time. Suites that took days to complete manually could be completed in hours, if not minutes, with automation.

Unfortunately, test automation is hard. Test automation is full-blown software development. Testers needed to become developers to do it right. Flakiness became a huge pain point. Sometimes, tests missed obvious problems that humans would easily catch, like missing buttons or poor styling. Many teams tried to automate their tests and simply failed because the bar was too high.

What we want is the speed and helpfulness of automation without the challenges in developing it. It would be great if a tool could look at an app, figure out its behaviors, and just go test it. That’s essentially what autonomous testing will do.

Ironically, traditional test automation isn’t fully automated. It still needs human testers to write the steps. Autonomous testing tools will truly be automated because the tool will figure out the test steps for us.

The car analogy

Cars are a great analogy for understanding the differences between manual, automated, and autonomous testing.

Manual testing is like a car with a manual transmission. As the driver, you need to mash the clutch and shift through gears to make the car move. You essentially become one with the vehicle.

Many classic cars, like this vintage Volkswagen Beetle, relied on manual transmissions. Beetle gear shifters were four-on-the-floor with a push-down lockout for reverse.

Automated testing is like a car with an automatic transmission. The car still needs to shift gears, but the transmission does it automatically for you based on how fast the car is going. As the driver, you still need to pay attention to the road, but driving is easier because you have one less concern. You could even put the car on cruise control!
Autonomous testing is like a self-driving car. Now, all you need to do is plug in the destination and let the car take you there. It’s a big step forward, and it enables you, now as a passenger, to focus on other things. Like with self-driving cars, we haven’t quite achieved full autonomous testing yet. It’s still a technology we hope to build in the very near future.

How will it work?

This probably goes without saying, but autonomous testing tools will need to leverage artificial intelligence and machine learning in order to learn an app’s context well enough to test it. For example, if we want to test a mobile app, then at the bare minimum, a tool needs to learn how phones work. It needs to know how to recognize different kinds of elements on a screen. It needs to know that buttons require tapping while input fields require text. Those kinds of things are universal for all mobile apps. At a higher level, it needs to figure out workflows for the app, like how humans would use it. Certain things like search bars and shopping carts may be the same in different apps, but domain specific workflows will be unique. Each business has its own special sauce.

This means that autonomous testing tools won’t be ready to use “out of the box.” Their learning models will come with general training on how apps typically work, but then they’ll need to do more training and learning on the apps they are targeted to test. For example, if I want my tool to test the Uber app, then the tool should already know how to check fields and navigate maps, but it will need to spend time learning how ridesharing works. Autonomous tools will, in a sense, need to learn how to learn. And there are three ways this kind of learning could happen.

Random trial and error

The first way is random trial and error. This is machine learning’s brute force approach. The tool could literally just hammer the app, trying to find all possible paths – whether or not they make sense. This approach would take the most time and yield the poorest quality of results, but it could get the job done. It’s like a Roomba, bonking around your bedroom until it vacuums the whole carpet.
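
To make the brute-force idea concrete, here is a toy sketch of random exploration. Everything in it is an assumption for illustration: the app is modeled as a plain map from each screen to the screens reachable in one action, and `exploreRandomly` is a hypothetical function, not how any real tool works. The random source is injectable so the behavior can be reproduced.

```typescript
// Toy model: each screen maps to the screens reachable from it in one action.
type AppGraph = { [screen: string]: string[] | undefined };

// Hypothetical explorer: wander at random for a fixed number of steps
// and report every screen that was reached along the way.
function exploreRandomly(
  app: AppGraph,
  start: string,
  maxSteps: number,
  random: () => number = Math.random,
): Set<string> {
  const visited = new Set<string>([start]);
  let current = start;
  for (let step = 0; step < maxSteps; step++) {
    const nextScreens = app[current] ?? [];
    if (nextScreens.length === 0) {
      current = start; // Dead end: go back to the first screen and keep trying.
      continue;
    }
    const index = Math.floor(random() * nextScreens.length);
    current = nextScreens[index];
    visited.add(current);
  }
  return visited;
}

// Made-up shopping app with five screens.
const toyApp: AppGraph = {
  home: ['search', 'cart'],
  search: ['results'],
  results: [],
  cart: ['checkout'],
  checkout: [],
};

const reached = exploreRandomly(toyApp, 'home', 50);
console.log(`Reached ${reached.size} of ${Object.keys(toyApp).length} screens`);
```

How many screens get reached depends entirely on luck and on the step budget, which is the weakness described above: coverage comes slowly and with no sense of which paths actually matter.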

Coaching from a user

The second way for the tool to learn an app is coaching from a user. Instead of attempting to be completely self-reliant, a tool could watch a user do a few basic workflows. Then, it could use what it learned to extend those workflows and find new behaviors. Another way this could work would be for the tool to provide a recommendation system. The tool could try to find behaviors worth testing, suggest those behaviors to the human tester, and the human tester could accept or reject those suggestions. The tool could then learn from that feedback: accepted tests signal good directions, while rejected tests could be avoided in the future.

Essentially, the tool would become a centaur: humans and AI working together and learning from each other. The human lends expertise to the AI to guide its learning, while the AI provides suggestions that go deeper than the human can see at a glance. Both become stronger through symbiosis.

Learning from observability data

A third way for an autonomous testing tool to learn app context acts like a centaur on steroids: learning from observability data. Observability refers to all the data gathered about an app’s real-time operations. It includes aspects like logging, system performance, and events. Essentially, if done right, observability can capture all the behaviors that users exercise in the app. An autonomous testing tool could learn all those behaviors very quickly by mining that data, probably much faster than watching users one at a time.

What will a workflow look like?

So, let’s say we have an autonomous testing tool that has successfully learned our app. How do we, as developers and testers, use this tool as part of our jobs? What would our day-to-day workflows look like? How would things be different? Here’s what I predict.

Setting baseline behaviors

When a team adopts an autonomous testing tool for their app, the tool will go through that learning phase for the app. It will spend some time playing with the app and learning from users to figure out how it works. Then, it can report these behaviors to the team as suggestions for testing, and the team can pick which of those tests to keep and which to skip. That set then becomes a rough “baseline” for test coverage. The tool will then set itself to run those tests as appropriate, such as part of CI. If it knows what steps to take to exercise the target behaviors, then it can put together scripts for those interactions. Under the hood, it could use tools like Selenium or Appium.

Detecting changes in behaviors

Meanwhile, developers keep on developing. Whenever developers make a code change, the tool can do a few things. First, it can run the automated tests it has. Second, it can go exploring for new behaviors. If it finds any differences, it can report them to the humans, who can decide if they are good or bad. For example, if a developer intentionally added a new page, then the change is probably good, and the team would want the testing tool to figure out its behaviors and add them to the suite. However, if one of the new behaviors yields an ugly error, then that change is probably bad, and the team could flag that as a failure. The tool could then automatically add a test checking against that error, and the developers could start working on a fix.
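
At its simplest, this kind of change detection is a comparison between the behaviors in the baseline and the behaviors found in the latest run. The sketch below assumes behaviors can be identified by plain strings, which is a large simplification, and the `BehaviorDiff` and `diffBehaviors` names are hypothetical.

```typescript
// Hypothetical report of what changed between two explorations of an app.
interface BehaviorDiff {
  added: string[];   // Found now, but not in the baseline.
  removed: string[]; // In the baseline, but no longer found.
}

function diffBehaviors(baseline: string[], current: string[]): BehaviorDiff {
  const baselineSet = new Set(baseline);
  const currentSet = new Set(current);
  return {
    added: current.filter((behavior) => !baselineSet.has(behavior)),
    removed: baseline.filter((behavior) => !currentSet.has(behavior)),
  };
}

// Made-up behavior names for illustration only.
const baselineBehaviors = ['log in', 'search products', 'add to cart'];
const currentBehaviors = ['log in', 'search products', 'view wishlist'];

const behaviorChanges = diffBehaviors(baselineBehaviors, currentBehaviors);
console.log(behaviorChanges);
```

The tool only reports the differences. Deciding whether an added or removed behavior is good or bad remains a human judgment, as described above.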

Defining new roles

In this sense, the autonomous testing tool automates the test automation. It fundamentally changes how we think of test automation. With traditional automation, humans own the responsibility of figuring out interactions, while machines own the responsibility of making verifications. Humans need to develop the tests and code them. The machines then grind out PASS or FAIL. With autonomous testing, those roles switch. The machines figure out interactions by learning app behaviors and exercising them, while humans review those results to determine if they were desirable (PASS) or undesirable (FAIL). Automation is no longer a collection of rote procedures but a sophisticated change detection tool.

What can we do today?

Although full-blown autonomous testing is not yet possible, we can achieve semi-autonomous testing today with readily available testing tools and a little bit of ingenuity. In the next article, we’ll learn how to build a test project using Playwright and Applitools Eyes that autonomously performs visual and some functional assertions on every page in a website’s sitemap file using Visual AI.


Get all the latest test automation videos you need right here. All of them feature test automation experts sharing their knowledge and their stories.

Get all the latest test automation videos you need in one place.

It’s summertime (at least where I am in the US), and this year has been a hot one. Summer is a great season to take a step back, to reflect, and hopefully to relax. The testing world moves so quickly sometimes, and while we’re all doing our jobs it can be hard to find the time to just pause, take a deep breath, and look around you at everything that’s new and growing.

Here at Applitools, we want to help you out with that. While you’ve hopefully been enjoying the nice weather, you may not have had a chance to see every video or event that you might have wanted to, or you may have missed some new developments you’d be interested in. So we’ve rounded up a few of our best test automation videos of the summer so far in one place.

All speakers are brilliant testing experts and we’re excited to share their talks with you – you’ll definitely want to check them all out below.

ICYMI: A few months back we also rounded up our top videos from the first half of 2022.

The State of UI/UX Testing: 2022 Results

Earlier this year, Applitools set out to conduct an industrywide survey on the state of testing in the UI/UX space. We surveyed over 800 testers, developers, designers, and digital thought leaders on the state of testing user interfaces and experiences in modern frontend development. Recently, our own Dan Giordano held a webinar to go over the results in detail. Take a look below – and don’t forget to download your free copy of the report.

Front-End Test Fest 2022

Front-End Test Fest 2022 was an amazing event, featuring leading speakers and testing experts sharing their knowledge on a wide range of topics. If you missed it, a great way to get started is with the thought-provoking opening keynote for the event given by Andrew Knight, AKA the Automation Panda. In this talk, titled The State of the Union for Front End Testing, Andrew explores seven major trends in front end testing to help unlock the best approaches, tools and frameworks you can use.

For more on Front-End Test Fest 2022 and to see all the talks, you can read this dedicated recap post or just head straight to our video library for the event.

Cypress Versus Playwright: Let the Code Speak

There are a lot of opinions out there on the best framework for test automation – why not let the code decide? In the latest installment in our popular versus series, Andrew Knight backs Playwright and goes head to head with Cypress expert Filip Hric. Round for round, Filip and Andy implemented small coding challenges in JavaScript, and the live audience voted on the best solution. Who won the battle? You’ll have to watch to find out.

Just kidding, actually – at Applitools we want to make gaining testing knowledge easy, so why would we limit you to just one way of finding the answer? Filip Hric summarizes the code battle (including the final score) in a great recap blog post right here.

Can’t get enough of Cypress vs Playwright? Us either. That’s why we’re hosting a rematch to give these two heavyweights another chance to go head to head. Register today to be a part of the Cypress vs Playwright Rematch Event on September 8th!

Coded vs. Codeless Testing Tools—And the Space In Between

There are a lot of testing debates out there, and coded vs codeless testing tools is one of the big ones. How can you know which is better, and when to use one or the other? Watch this panel discussion to see leading automation experts discuss the current landscape of coded and codeless tools. Learn what’s trending, common pitfalls with each approach, how a hybrid approach could work, and more.

Your panel for this event includes our own Anand Bagmar and Andrew Knight, along with Mush Honda, Chief Quality Architect, and Coty Rosenblath, CTO, both from Katalon.

Autonomous Testing, Test Cloud Infrastructure, and Emerging Trends in Software Testing

Looking to get a handle on where testing is heading in the future? Hear from our Co-Founder and CEO, Gil Sever, as he sits down for a Q&A with QA Financial to discuss the future of testing. Learn about the ways autonomous testing is transforming the market, advancements in the cloud and AI, and the ups and downs of where testing could go in the next few years. Gil also shares insights he’s learned from our latest State of UI/UX Testing survey.

Test Automation Stories from Our Customers

We know that every day you and countless others are innovating in the test automation space, encountering challenges and discovering – or inventing – impressive solutions. Our hope is that hearing how others have solved a similar problem will help you understand that you’re not alone in facing these obstacles, and that their stories will give you a better understanding of your own challenges and spark new ways of thinking.

Automating Manufacturing Quality Control with Visual AI

We all know about web and mobile regression testing, but did you know that Visual AI is solving problems in the manufacturing space as well? Jerome Rieul, Test Automation Architect, explains how a major Swiss luxury brand uses Visual AI to detect changes in CAD drawings and surface issues before they hit production lines. A great example of an out-of-the-box application of technology leading to fantastic results.

Simplifying Test Automation with Codeless Tools and Visual AI

Test automation can be hard, and many shops struggle to do it effectively. One way to lower the learning curve is to take advantage of a codeless test automation tool – and that doesn’t mean you have to forego advanced and time-saving capabilities like Visual AI. In this webinar Applitools’ Nikhil Nigam shares how Visual AI can integrate seamlessly with codeless tools like Selenium IDE, Katalon Studio, and Tosca to supercharge verifications and meet industrial-grade needs. (And for more on codeless testing tools, don’t forget to watch our lively panel discussion!)

How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months

Starting up test automation from scratch can be a daunting challenge – but it’s one that countless testing teams across the world have faced before you. In this informative talk, Greg Sypolt, VP of Quality Engineering, and Sneha Viswalingam, Director of Quality Engineering, both from EVERFI, share their journey. Learn about the tools they used, how they approached the project, and the time and productivity savings they achieved.

More to Come!

This is just a selection of our favorite test automation videos that we’ve shared with the community this summer. We’re continuously sharing more too – keep an eye on our upcoming events page to see what we have in store next.

What were your favorite videos? Check out our full video library here, and you can let us know your own favorites @Applitools.

"Full" test automation is approaching. We are riding the crest of the next great wave: autonomous testing. It will fundamentally change testing.

The word “automation” has become a buzzword in pop culture. It conjures things like self-driving cars, robotic assistants, and factory assembly lines. Most people don’t think about automation for software testing. In fact, many non-software folks are surprised to hear that what I do is “automation.”

The word “automation” also carries a connotation of “full” automation with zero human intervention. Unfortunately, most of our automated technologies just aren’t there yet. For example, a few luxury cars out there can parallel-park themselves, and Teslas have some cool autopilot capabilities, but fully-autonomous vehicles do not yet exist. Self-driving cars need several more years to perfect and even more time to become commonplace on our roads.

Software testing is no different. Even when test execution is automated, test development is still very manual. Ironic, isn’t it? Well, I think the day of “full” test automation is quickly approaching. We are riding the crest of the next great wave: autonomous testing. It’ll arrive long before cars can drive themselves. Like previous waves, it will fundamentally change how we, as testers, approach our craft.

Let’s look at the past two waves to understand this more deeply. You can watch the keynote address I delivered at Future of Testing: Frameworks 2022, or you can keep reading below.

Test Automation’s Next Great Wave

Before Automation

In their most basic form, tests are manual. A human manually exercises the behavior of the software product’s features and determines if outcomes are expected or erroneous. There’s nothing wrong with manual testing. Many teams still do this effectively today. Heck, I always try a test manually before automating it. Manual tests may be scripted in that they follow a precise, predefined procedure, or they may be exploratory in that the tester relies instead on their sensibilities to exercise the target behaviors.

Testers typically write scripted tests as a list of steps with interactions and verifications. They store these tests in test case management repositories. Most of these tests are inherently “end-to-end:” they require the full product to be up and running, and they expect testers to attempt a complete workflow. In fact, testers are implicitly incentivized to include multiple related behaviors per test in order to gain as much coverage with as little manual effort as possible. As a result, test cases can become very looooooooooooong, and different tests frequently share common steps.

Large software products exhibit countless behaviors. A single product could have thousands of test cases owned and operated by multiple testers. Unfortunately, at this scale, testing is slooooooooow. Whenever developers add new features, testers need to not only add new tests but also rerun old tests to make sure nothing broke. Software is shockingly fragile. A team could take days, weeks, or even months to adequately test a new release. I know – I once worked at a company with a 6-month-long regression testing phase.

Slow test cycles forced teams to practice Waterfall software development. Rather than waste time manually rerunning all tests for every little change, it was more efficient to bundle many changes together into a big release to test all at once. Teams would often pipeline development phases: While developers were writing code for the features going into release X+1, testers would be testing the features for release X. If testing cycles were long, testers might repeat tests a few times throughout the cycle. If testing cycles were short, then testers would reduce the number of tests to run to a subset most aligned with the new features. Test planning was just as much work as test execution and reporting due to the difficulty in judging risk-based tradeoffs.

A Waterfall release schedule showing overlapping cycles of Design, Development, Testing and Release.
Typical Waterfall release overlapping

Slow manual testing was the bane of software development. It lengthened time to market and allowed bugs to fester. Anything that could shorten testing time would make teams more productive.

The First Wave: Manual Test Conversion

That’s when the first wave of test automation hit: manual test conversion. What if we could implement our manual test procedures as software scripts so they could run automatically? Instead of a human running the tests slowly, a computer could run them much faster. Testers could also organize scripts into suites to run a bunch of tests at one time. That’s it – that was the revolution. Let software test software!

During this wave, the main focus of automation was execution. Teams wanted to directly convert their existing manual tests into automated scripts to speed them up and run them more frequently. Both coded and codeless automation tools hit the market. However, teams typically stuck with the same Waterfall-minded processes. Automation didn’t fundamentally change how teams developed software; it just made testing better. For example, during this wave, running automated tests after a nightly build was in vogue. When teams would plan their testing efforts, they would pick a few high-value tests to automate and run more frequently than the rest of the manual tests.

A table showing "interaction" in one column and "verification" in another, with sample test steps.
An example of a typical manual test that would have likely been converted to an automated test during this wave.

Unfortunately, while this type of automation offered big improvements over pure manual testing, it had problems. First, testers still needed to manually trigger the tests and report results. On a typical day, a tester would launch a bunch of scripts while manually running other tests on the side. Second, test scripts were typically very fragile. Neither the tooling nor the understanding needed for good automation had matured yet. Large end-to-end tests and long development cycles also increased the risk of breakage. Many teams gave up attempting test automation due to the maintenance nightmare.

The first wave of test automation was analogous to cars switching from manual to automatic transmissions. Automation made the task of driving a test easier, but it still required the driver (or the tester) to start and stop the test.

The Second Wave: CI/CD

The second test automation wave was far more impactful than the first. After automating the execution of tests, focus shifted to automating the triggering of tests. If tests are automated, then they can run without any human intervention. Therefore, they could be launched at any time without human intervention, too. What if tests could run automatically after every new build? What if every code change could trigger a new build that could then be covered with tests immediately? Teams could catch bugs as soon as they happen. This was the dawn of Continuous Integration, or “CI” for short.

Continuous Integration revolutionized software development. Long Waterfall phases for coding and testing weren’t just passé – they were unnecessary. Bite-sized changes could be independently tested, verified, and potentially deployed. Agile and DevOps practices quickly replaced the Waterfall model because they enabled faster releases, and Continuous Integration enabled Agile and DevOps. As some would say, “Just make the DevOps happen!”

The types of tests teams automated changed, too. Long end-to-end tests that covered “grand tours” with multiple behaviors were great for manual testing but not suitable for automation. Teams started automating short, atomic tests focused on individual behaviors. Small tests were faster and more reliable. One failure pinpointed one problematic behavior.

Developers also became more engaged in testing. They started automating both unit tests and feature tests to be run in CI pipelines. The lines separating developers and testers blurred.

Teams adopted the Testing Pyramid as an ideal model for test count proportions. Smaller tests were seen as “good” because they were easy to write, fast to execute, less susceptible to flakiness, and caught problems quickly. Larger tests, while still important for verifying workflows, needed more investment to build, run, and maintain. So, teams targeted more small tests and fewer large tests. You may personally agree or disagree with the Testing Pyramid, but that was the rationale behind it.

The Testing Pyramid, showing a large amount of unit tests at the base, integration tests in the middle and end-to-end tests at the top.
The Classic Testing Pyramid

While the first automation wave worked within established software lifecycle models, the second wave fundamentally changed them. The CI revolution enabled tests to run continuously, shrinking the feedback loop and maximizing the value that automated tests could deliver. It gave rise to the SDET, or Software Development Engineer in Test, who had to manage tests, automation, and CI systems. SDETs carried more responsibilities than the automation engineers of the first wave.

If we return to our car analogy, the second wave was like adding cruise control. Once the driver gets on the highway, the car can just cruise on its own without much intervention.

Unfortunately, while the second wave enabled teams to multiply the value they can get out of testing and automation, it came with a cost. Test automation became full-blown software development in its own right. It entailed tools, frameworks, and design patterns. The continuous integration servers became production environments for automated tests. While some teams rose to the challenge, many others struggled to keep up. The industry did not move forward together in lock-step. Test automation success became a gradient of maturity levels. For some teams, success seemed impossible to reach.

Attempts at Improvement

Now, these two test automation waves I described do not denote precise playbooks every team followed. Rather, they describe the general industry trends regarding test automation advancement. Different teams may have caught these waves at different times, too.

Currently, as an industry, I think we are riding the tail end of the second wave, rising up to meet the crest of a third. Continuous Integration, Agile, and DevOps are all established practices. The next innovation will not come from them.

Over the past years, a number of nifty test automation features have hit the scene, such as screen recorders and smart locators. I’m going to be blunt: those are not the next wave, they’re just attempts to fix aspects of the previous waves.

  1. Screen recorders and visual step builders have been around forever, it seems. Although they can help folks who are new to automation or don’t know how to code, they produce very fragile scripts. Whenever the app under test changes its behavior, testers need to re-record tests.
  2. Self-healing locators don’t deliver much value on their own. When a locator breaks, it’s most likely due to a developer changing the behavior on a given page. Behavior changes require test step changes. There’s a good chance the target element would be changed or removed. Besides, even if the target element keeps its original purpose, updating its locator is a super small effort.
  3. Visual locators – ones that find elements based on image matching instead of textual queries – also don’t deliver much value on their own. They’re different but not necessarily “better.” The one advantage they do offer is finding elements that are hard to locate with traditional locators, like a canvas or gaming objects. Again, the challenge is handling behavior change, not element change.

You may agree or disagree with my opinions on the usefulness of these tools, but the fact is that they all share a common weakness: they are vulnerable to behavioral changes. Human testers must still intervene as development churns.

These tools are akin to a car that can park itself but can’t fully drive itself. They’re helpful to some folks but fall short of the ultimate dream of full automation.

The Third Wave: Autonomous Testing

The first two waves covered automation for execution and scheduling. Now, the bottleneck is test design and development. Humans still need to manually create tests. What if we automated that?

Consider what testing is: Testing equals interaction plus verification. That’s it! You do something, and you make sure it works correctly. It’s true for all types of tests: unit tests, integration tests, end-to-end tests, functional, performance, load; whatever! Testing is interaction plus verification.

At its core, testing is interaction plus verification
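
To make that formula concrete, here is a minimal sketch in Python. The `ShoppingCart` class is a hypothetical stand-in for a product under test, not something from a real app or library; the point is only the shape of the test, which is one block of interaction followed by one block of verification.

```python
class ShoppingCart:
    """Hypothetical stand-in for the product under test (prices in cents)."""

    def __init__(self):
        self.items = {}

    def add(self, name, price_cents, quantity=1):
        # Keep the latest price and accumulate the quantity per item name.
        _, current_quantity = self.items.get(name, (price_cents, 0))
        self.items[name] = (price_cents, current_quantity + quantity)

    def total_cents(self):
        return sum(price * quantity for price, quantity in self.items.values())


def test_adding_items_updates_total():
    cart = ShoppingCart()

    # Interaction: do something with the product.
    cart.add("notebook", 450, quantity=2)
    cart.add("pen", 125)

    # Verification: make sure it worked correctly.
    assert cart.total_cents() == 1025
```

Whether the test targets a unit, an API, or a full web page, a human still has to spell out both halves by hand. That dictation is the part the third wave aims to automate.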

During the first two waves, humans had to dictate those interactions and verifications precisely. What we want – and what I predict the third wave will be – is autonomous testing, in which that dictation will be automated. This is where artificial intelligence can help us. In fact, it’s already helping us.

Applitools has already mastered automated validation for visual interfaces. Traditionally, a tester would need to write several lines of code to functionally validate behaviors on a web page. They would need to check for elements’ existence, scrape their texts, and make assertions on their properties. There might be multiple assertions to make – and other facets of the page left unchecked. Visuals like color and position would be very difficult to check. Applitools Eyes can replace almost all of those traditional assertions with single-line snapshots. Whenever it detects a meaningful change, it notifies the tester. Insignificant changes are ignored to reduce noise.

Automated visual testing like this fundamentally simplifies functional verification. It should not be seen as an optional extension or something nice to have. It automates the dictation of verification. It is a new type of functional testing.
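
The difference between element-by-element assertions and a single snapshot check can be sketched in plain Python. To be clear about the assumptions: this does not use the Applitools Eyes SDK, and real visual AI compares rendered screenshots rather than dictionaries. The page model, the `traditional_checks` and `snapshot_diff` functions, and the two-pixel tolerance are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical rendered state of a login page: element name -> properties.
BASELINE = {
    "title": {"text": "Log in", "color": "#222222", "x": 40, "y": 20},
    "username": {"text": "", "color": "#000000", "x": 40, "y": 80},
    "submit": {"text": "Sign in", "color": "#0a7f3f", "x": 40, "y": 140},
}


def traditional_checks(page):
    """Element-by-element assertions: only what the tester remembered to check."""
    assert "title" in page
    assert page["title"]["text"] == "Log in"
    assert "submit" in page
    assert page["submit"]["text"] == "Sign in"


def snapshot_diff(page, baseline, tolerance_px=2):
    """Compare the whole page to a baseline and list every meaningful difference."""
    differences = []
    for name in sorted(set(baseline) | set(page)):
        if name not in page:
            differences.append(f"{name}: missing")
        elif name not in baseline:
            differences.append(f"{name}: unexpected")
        else:
            for prop, expected in baseline[name].items():
                actual = page[name].get(prop)
                # Treat tiny position shifts as insignificant noise.
                if (
                    prop in ("x", "y")
                    and actual is not None
                    and abs(actual - expected) <= tolerance_px
                ):
                    continue
                if actual != expected:
                    differences.append(f"{name}.{prop}: {expected!r} -> {actual!r}")
    return differences


# A new build nudges the title by one pixel and changes the button color.
new_build = {name: dict(props) for name, props in BASELINE.items()}
new_build["title"]["x"] = 41
new_build["submit"]["color"] = "#ff0000"

traditional_checks(new_build)              # passes: color was never asserted
print(snapshot_diff(new_build, BASELINE))  # reports only the color change
```

The hand-written assertions pass because nobody thought to check the button color, while the single snapshot comparison flags it and ignores the one-pixel shift.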

The remaining problem to solve is dictation of interaction. Essentially, we need to train AI to figure out proper app behaviors on its own. Point it at an app, let it play around, and see what behaviors it identifies. Pair those interactions with visual snapshot validation, and BOOM – you have autonomous testing. It’s testing without coding. It’s like a fully-self-driving car!

Some companies already offer tools that attempt to discover behaviors and formulate test cases. Applitools is also working on this. However, it’s a tough problem to crack.

Even with significant training and refinement, AI agents still have what I call “banana peel moments:” times when they make surprisingly awful mistakes that a human would never make. Picture this: you’re walking down the street when you accidentally slip on a banana peel. Your foot slides out from beneath you, and you hit your butt on the ground so hard it hurts. Everyone around you laughs at both your misfortune and your clumsiness. You never saw it coming!

Banana peel moments are common AI hazards. Back in 2011, IBM created a supercomputer named Watson to compete on Jeopardy, and it handily defeated two of the greatest human Jeopardy champions at that time. However, I remember watching some of the promo videos at the time explaining how hard it was to train Watson how to give the right answers. In one clip, it showed Watson answering “banana” to some arbitrary question. Oops! Banana? Really?

IBM Watson is shown defeating other contestants with the correct answer of Bram Stoker in Final Jeopardy.
Watson (center) competing against Ken Jennings (left) and Brad Rutter (right) on Jeopardy in 2011. (Image source:

While Watson’s blunder was comical, other mistakes can be deadly. Remember those self-driving cars? Tesla autopilot mistakes have killed at least a dozen people since 2016. Autonomous testing isn’t a life-or-death situation like driving, but testing mistakes could be a big risk for companies looking to de-risk their software releases. What if autonomous tests miss critical application behaviors that turn out to crash once deployed to production? Companies could lose lots of money, not to mention their reputations.

So, how can we give AI for testing the right training to avoid these banana peel moments? I think the answer is simple: set up AI for testing to work together with human testers. Instead of making AI responsible for churning out perfect test cases, design the AI to be a “coach” or an “advisor.” AI can explore an app and suggest behaviors to cover, and the human tester can pair that information with their own expertise to decide what to test. Then, the AI can take that feedback from the human tester to learn better for next time. This type of feedback loop can help AI agents not only learn better testing practices generally but also learn how to test the target app specifically. It teaches application context.
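
Here is a toy sketch of that feedback loop in Python. It is purely illustrative and does not describe how Applitools or any other product works: the `SuggestionCoach` class, its simple per-category scoring, and the sample behaviors are all assumptions made up for this example, and a real system would learn from far richer signals.

```python
from collections import defaultdict


class SuggestionCoach:
    """Hypothetical advisor that suggests behaviors and learns from a tester's decisions."""

    def __init__(self):
        # Learned preference per behavior category; 0.0 means no opinion yet.
        self.weights = defaultdict(float)

    def suggest(self, discovered, limit=3):
        """Rank discovered (behavior, category) pairs by learned preference."""
        ranked = sorted(discovered, key=lambda item: self.weights[item[1]], reverse=True)
        return ranked[:limit]

    def record_feedback(self, category, accepted):
        """Nudge a category up when the tester accepts a suggestion, down when rejected."""
        self.weights[category] += 1.0 if accepted else -1.0


coach = SuggestionCoach()
discovered = [
    ("open footer link", "navigation"),
    ("log in with valid user", "authentication"),
    ("add item to cart", "checkout"),
    ("open about page", "navigation"),
]

# Round 1: the tester reviews suggestions and applies their own expertise.
coach.record_feedback("navigation", accepted=False)
coach.record_feedback("checkout", accepted=True)

# Round 2: the coach now prioritizes what this team cares about for this app.
print(coach.suggest(discovered, limit=2))
```

The tester stays in charge of deciding what is worth testing, and each decision feeds back into what the coach proposes next time.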

AI and humans working together is not just a theory. It’s already happened! Back in the 90s, IBM built a supercomputer named Deep Blue to play chess. In 1996, it lost 4-2 to grandmaster and World Chess Champion Garry Kasparov. One year later, after upgrades and improvements, it defeated Kasparov 3.5-2.5. It was the first time a computer beat a world champion at chess. After his defeat, Kasparov had an idea: What if human players could use a computer to help them play chess? Then, one year later, he set up the first “advanced chess” tournament. To this day, “centaurs,” or humans using computers, can play at nearly the same level as grandmasters.

Garry Kasparov staring at a chessboard across the table from an operator playing for the Deep Blue AI.
Garry Kasparov playing chess against Deep Blue. (Image source:

I believe the next great wave for test automation belongs to testers who become centaurs – and to those who enable that transformation. AI can learn app behaviors to suggest test cases that testers accept or reject as part of their testing plan. Then, AI can autonomously run approved tests. Whenever changes or failures are detected, the autonomous tests yield helpful results to testers like visual comparisons to figure out what is wrong. Testers will never be completely removed from testing, but the grindwork they’ll need to do will be minimized. Self-driving cars still have passengers who set their destinations.

This wave will also be easier to catch than the first two waves. Testing and automation were historically a do-it-yourself effort. You had to design, automate, and execute tests all on your own. Many teams struggled to make it successful. However, with autonomous testing and coaching capabilities, AI testing technologies will eliminate the hardest parts of automation. Teams can focus on what they want to test more than how to implement testing. They won’t stumble over flaky tests. They won’t need to spend hours debugging why a particular XPath won’t work. They won’t need to wonder what elements they should and shouldn’t verify on a page. Any time behaviors change, they rerun the AI agents to relearn how the app works. Autonomous testing will revolutionize functional software testing by lowering the cost of entry for automation.

Catching the Next Wave

If you are plugged into software testing communities, you’ll hear from multiple testing leaders about their thoughts on the direction of our discipline. You’ll learn about trends, tools, and frameworks. You’ll see new design patterns challenge old ones. Something I want you to think about in the back of your mind is this: How can these things be adapted to autonomous testing? Will these tools and practices complement autonomous testing, or will they be replaced? The wave is coming, and it’s coming soon. Be ready to catch it when it crests.

The post Autonomous Testing: Test Automation’s Next Great Wave appeared first on Automated Visual Testing | Applitools.
