Sierra AI: A Wilco Experiment in Educative AI

Ever since the introduction of ChatGPT late last year, the potential of generative AI has boggled the minds of tech and business leaders around the world. While the tech world had been aware of LLMs for years, OpenAI’s creation brought them to the masses and started to turn on “idea lightbulbs” above creative heads.

For us at Wilco, the introduction of API access to GPT 3.5 presented an opportunity to explore the impact generative AI can have on online learning. We wanted to test how accurately GPT could generate step-by-step tutorials and developer challenges, and how creatively it could generate narratives to wrap those educative pieces.

Before diving into the experiment, let’s talk a bit about Wilco. We started our journey in 2021, aiming to transform the way developers acquire skills. Instead of hands-off tutorials and tests in a sterile environment, we simulate real workplaces with real production environments and lifelike challenges. This way, developers learn by doing, as they would on the job, but faster and more efficiently. After all, no company will introduce bugs into production just to teach its devs.

On our journey, we have also discovered another significant use case: demonstrating products and educating existing users about new features through self-guided, hands-on exploration.

In LLMs, we saw the potential to scale our catalog much faster than before. Moreover, we decided to test whether we could flip the current paradigm of learning experiences on its head. Prior to the appearance of GPT and its competitors, people seeking to learn something new had to rely on the right content having been created beforehand. What would happen if educative content were generated on demand?

The Experiment: Sierra AI

The core of our experiment is a mini-product we have launched: Sierra AI. In a nutshell, it’s an AI agent that we built and trained to generate Wilco quests. While GPT can produce creative output quite easily, our aim was to ensure that its output is consistent with existing quests in structure and narrative, and that it responds accurately to a wide variety of requests.

The constraints we wanted the LLM to operate within ended up requiring several rounds of training, testing, and fine-tuning before we arrived at a result that could be used for a public-facing MVP.

We launched Sierra on [date] and had [number] of quests generated within the first [number of days].

Generated quests were not automatically published to the catalog, so they would not interfere with the experience of existing Wilco users. Instead, the creator could go over their creation, edit it, playtest it, and share it with others.

All the while, we reviewed what GPT 4 generated and validated its output ourselves to see how accurate it was.

Narrative Results: A Resounding Success

It will probably come as no surprise that GPT was excellent at generating realistic narratives from a very general world-building prompt. These even included moderately successful attempts at humor and an adequate variety of invented scenarios that conformed to the style of existing Wilco quests.

Narrative scenarios for a quest designed to teach [thing] using a prompt [prompt] included, among others:

[Scenario 1]
[Scenario 2]
[Scenario 3]

None of the generated quests contained a problematic narrative or anything else we would be unwilling to show our users.

Factual Results: It’s Complicated

By now, we all know about LLMs’ potential for hallucination. Sierra AI was no different: while its instructions were mostly correct, the percentage of quests with no factual mistakes at all was relatively low, at [percentage]. As it stands, we would not be comfortable letting it generate quests on its own for our users.

But during the test, we discovered something else: Sierra was incredible at helping our business clients with quest templates for their products and needs. These templates, while not 100% correct, were creative enough and required only minimal modifications to become production-grade.

Educative AI: The Dream vs. the Reality

Did we think for a minute that educative AI would enable Wilco to rapidly close the content gap with the largest learning course providers in the world? Yes, we did. But the technology isn’t there yet. Even with rapid improvements, without an agreed-upon source of truth, LLMs might never become trustworthy enough to produce professional content without human oversight.

That said, for us, Sierra is a resounding success. Its ability to spark prospects' imagination and get them through the often-tedious ideation stage in minutes is invaluable for our business processes. And that is more than enough for a first step.