Wizard of Oz experiment works

Avyay
Aug 9, 2021
It's the year 1770 at the Schönbrunn Palace. A middle-aged man, Kempelen, has returned to the palace just six months after having promised the monarch of Austria an illusion that would “topple all illusions”. He opens a box filled with wires and cogs and allows everyone to inspect it. On top of it is a life-sized model of a human head and torso, with a black beard and grey eyes, dressed in Ottoman robes and a turban, "the traditional costume of an oriental sorcerer". In front of the model is a chessboard complete with an ivory set. Kempelen then tells everyone what he has built: an automaton that can play chess against any human and even perform the knight’s tour (a puzzle in which a knight is moved around the board so that it visits every square exactly once).

This was the first public display of the machine that came to be known as the “Turk”, which would go on to face the likes of Napoleon Bonaparte, François-André Danican Philidor and even Benjamin Franklin. There was, however, one big issue with the machine, and it is the reason we didn’t study it in high school history or computer science: it was a fake.

It had a small compartment inside which an operator would sit, controlling the hands of the Turk and playing the games.

Many philosophers, illusionists and thinkers suspected as much and said so publicly during Kempelen’s lifetime, while he was still touring Europe with the Turk. Kempelen ultimately died without having sold the machine, which later made its way to America, but I digress. The concept to take away here is that of a fake artificial intelligence.

The Turk is a good overture to the main subject of this article. If you have read “The Wonderful Wizard of Oz” or watched one of its countless musical adaptations or movies, you’ll know the premise: a young girl, Dorothy, is swept away by a tornado into the magical land of Oz. While the overall theme is one of self-sufficiency, i.e. all the characters seek external “magic” to give them qualities they already possess but fail to recognize, the part we will be looking at here is the wizard.

Dorothy learns of a Wizard of Oz who can send her back to her own world, away from the magical yet dangerous land of Oz. The wizard, however, turns out to be a fake who uses technology and illusion to appear as a real wizard. This is the premise of Wizard of Oz experiments: a user interacts with a system they believe is autonomous but that is actually controlled by a human. The technique has had several applications. Companies have used it to test vending machines by having an actual person push out the requested item rather than automating the machine. It has also been used to test bots on Slack and Discord, where users believe they are interacting with a bot but are actually talking to a human typing responses from a given set of instructions.
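The "given set of instructions" in the chat bot case can be as simple as a lookup table the human operator consults before replying. Here is a minimal sketch in Python; the `SCRIPT` contents, the `FALLBACK` message and the `suggest_reply` helper are all made up for illustration and are not taken from any real Slack or Discord study.

```python
from typing import Dict

# Hypothetical operator script: keyword -> canned reply to send.
SCRIPT: Dict[str, str] = {
    "hello": "Hi! I'm HelpBot. What can I do for you?",
    "refund": "I can help with that. What is your order number?",
}

# Sent when nothing in the script matches (also an assumption).
FALLBACK = "Sorry, I didn't catch that. Could you rephrase?"


def suggest_reply(message: str, script: Dict[str, str] = SCRIPT) -> str:
    """Return the first scripted reply whose keyword appears in the message."""
    text = message.lower()
    for keyword, reply in script.items():
        if keyword in text:
            return reply
    return FALLBACK
```

The operator still types (or pastes) the reply by hand; the script only keeps the answers consistent so that the "bot" feels like one system rather than one person improvising.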

But why do companies use it? The Wizard of Oz methodology lets you test how people react to a system before you even start thinking about development. It suits a novel concept you're not sure will work for your users, or a project that will take a lot of time and money to develop, where you want to learn more before committing those resources and standard prototyping tools can't test the idea.

Wizard of Oz is a flexible technique that allows concepts to be tested and adjusted without time-consuming code changes, interruptions to daily testing, or the full cost of development.

Integrating the technique is pretty simple. You build a prototype that delivers the expected end result, only without any automation or built-in handling of specific use cases. A user enters their information, and a person manually performs the processing needed to deliver the output. The user then submits feedback while assuming the application is completely automated.
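To make that loop concrete, here is a rough sketch in Python of what such a prototype could look like. Everything in it is an assumption made for illustration: the `WizardOfOzSession` class, the `wizard` callback and the `console_wizard` helper are hypothetical names, and a real study would put the operator on a separate screen rather than in the same console.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class WizardOfOzSession:
    """What the participant talks to; answers come from a hidden human."""

    # Hypothetical hook: how the hidden operator supplies an answer.
    wizard: Callable[[str], str]
    transcript: List[Dict[str, Optional[str]]] = field(default_factory=list)

    def handle(self, user_input: str) -> str:
        # The participant believes this step is automated.
        reply = self.wizard(user_input)
        self.transcript.append(
            {"user": user_input, "system": reply, "feedback": None}
        )
        return reply

    def record_feedback(self, feedback: str) -> None:
        # Attach the participant's feedback to the most recent exchange.
        if not self.transcript:
            raise ValueError("no exchange to attach feedback to")
        self.transcript[-1]["feedback"] = feedback


def console_wizard(user_input: str) -> str:
    # Simplest possible operator interface: type the answer by hand.
    return input(f"[wizard] participant asked {user_input!r}; your reply: ")
```

A session would be created with `WizardOfOzSession(wizard=console_wizard)`. Keeping the operator behind a single callback means the manual step can later be swapped for real automation without changing what the participant sees, and the transcript gives you the input, output and feedback for each exchange in one place.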

In practice, the experiment is performed at varying levels of fidelity: low, medium and high. In essence, as fidelity increases the prototype becomes more expensive to build, and the insights it yields shift from utility (is the idea worth building?) to usability (how should it work?). These three levels cover most applications. Low-fidelity prototypes mostly reveal whether the idea itself is flawed. As an example, consider an application that tells you how many restaurants are near you. Feedback on this would often point out how futile it is, given that Google Maps does the same and more for a user. High-fidelity prototypes yield results on specific use cases of the application. Consider an application that sends you a notification to remind you to exercise or meditate. Users may say they prefer receiving the notifications on their smartwatches or laptops instead of their phones.

The scope for Wizard of Oz experiments, however, has narrowed over the past few years. With the advent of models such as GPT-3, with which you can build a chatbot within a couple of minutes, it is often easier to test with the bots themselves. Still, the technique remains a useful way to test the concept of an application, new software or even hardware. Integrating it into the workflow of a project helps developers better understand the needs of users and cuts down on reworking the application's workflow after release.