Mar 6, 2025 | Read time 4 min

How to build a conversational agent in less time than Cupid’s arrow takes to strike

Farah Gouda, Data Engineer

What happens when you set out to build a fully functioning AI love guru with very little turnaround time? You get Eros – a delightfully unhelpful chatbot, with a knack for disastrous matchmaking and over-the-top romantic advice.

In this post, I’ll walk you through how we brought Eros to life faster than a whirlwind romance – and, more importantly, how you can do it too.

Breaking it down into three components

At its core, a conversational AI follows a three-step process:

1) Speech to text (ASR): Capturing user input and converting it into text.

2) Language processing (LLM): Determining the AI’s response based on the input and predefined system behavior.

3) Text to speech (TTS): Converting the response back into speech for the user.

For our project, we designed a Valentine’s-themed chatbot named Eros, an AI persona trained to deliver over-the-top romantic advice and hilariously mismatched love pairings.
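Before digging into each step, here is the whole loop as a minimal sketch. The `asr`, `llm`, and `tts` arguments stand in for the real components and are passed as plain callables — a simplification for illustration, not the actual implementation:

```python
def run_turn(audio_file, asr, llm, tts) -> bytes:
    """One conversational turn: speech in, speech out.

    asr, llm, and tts are the three components described above,
    passed in as plain callables (a simplification of real SDK calls).
    """
    user_text = asr(audio_file)     # 1) speech to text
    reply_text = llm(user_text)     # 2) decide what to say
    return tts(reply_text)          # 3) text back to speech
```

Each of the three steps below just fills in one of these callables.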

Step 1: Automatic speech recognition (ASR)

The user interacts with the agent – this could be through their phone, a website, or another medium. They might say something like, "Hello, my name is Farah."

This spoken input is processed by our ASR system, which transcribes it into text. The quality of this transcription depends on the ASR model’s accuracy, which can be influenced by factors like background noise, accents, and speaking speed. 

At this stage, we’ve successfully converted speech into a format the LLM can understand.
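As one concrete option (the post doesn't prescribe a provider), here is a sketch of that transcription call against an OpenAI-style SDK client; the `whisper-1` model name is just one choice among many ASR services:

```python
def transcribe(audio_file, client) -> str:
    """Convert a recorded audio file handle into text.

    `client` is assumed to be an OpenAI-style SDK client exposing
    audio.transcriptions.create; any ASR provider works the same way.
    """
    result = client.audio.transcriptions.create(
        model="whisper-1",  # illustrative model choice
        file=audio_file,
    )
    return result.text
```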

Step 2: Large language model (LLM) processing

Once we have the transcribed text, it moves into the LLM. The LLM requires two key inputs:

  1. The user query (prompt): The text transcribed by ASR (e.g., "Hello, my name is Farah.")

  2. System context: A set of predefined instructions that guide the LLM’s behavior.

The system context is typically structured as a YAML file or embedded directly into the API call. For Eros, we defined instructions like:

"Your name is Eros. You are super cheesy, you give horrible relationship advice, and you make very incompatible matches."

These elements are passed to the LLM through an API call, which can be made to the LLM of your choice. The LLM processes the input and generates a response based on its trained parameters and the provided context.
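A sketch of that call, assuming an OpenAI-style chat API — the model name is illustrative, and any provider that accepts a system/user message structure works the same way:

```python
# Eros's system context, taken straight from the persona definition.
EROS_CONTEXT = (
    "Your name is Eros. You are super cheesy, you give horrible "
    "relationship advice, and you make very incompatible matches."
)

def build_messages(user_text: str) -> list:
    """Combine the fixed system context with the transcribed user query."""
    return [
        {"role": "system", "content": EROS_CONTEXT},
        {"role": "user", "content": user_text},
    ]

def ask_eros(user_text: str, client) -> str:
    """Send both inputs to the LLM and return its reply.

    `client` is assumed to be an OpenAI-style SDK client; the model
    name below is illustrative, not what the project actually used.
    """
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(user_text),
    )
    return completion.choices[0].message.content
```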

So for Eros, the response might be:

"Hello, I'm the god of love! Let me tell you... you and your worst enemy? A perfect match!"

For this project, we used the simplest LLM setup – no function calls or retrieval-augmented generation (RAG).

  • Function calling allows an LLM to execute predefined functions in the codebase or interact with external APIs to retrieve live data, such as real-time weather updates. This is useful for AI systems that need to perform actions like booking reservations or fetching internal data. For example, a customer service bot could use function calling to check a user’s account balance.

  • RAG, on the other hand, enables the LLM to retrieve information from external sources like databases or documents before generating a response. A customer service bot, for instance, might use RAG to access a user’s recent order history.

Since Eros needed no external data or actions, it relied solely on static context without these additional enhancements.
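Eros skipped function calling, but for reference, the account-balance example above would look roughly like this as a tool definition (JSON-schema style, as used by OpenAI-compatible chat APIs; the function name and parameters are hypothetical):

```python
# Illustrative only -- Eros did not use function calling. A tool
# definition like this is passed alongside the messages so the LLM
# can ask the codebase to run the function with structured arguments.
CHECK_BALANCE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_account_balance",  # hypothetical function
        "description": "Look up a customer's current account balance.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_id": {"type": "string"},
            },
            "required": ["account_id"],
        },
    },
}
```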

Step 3: Text-to-speech (TTS)

Once the LLM generates a response, the final step is converting it back into speech using TTS.

There are multiple TTS providers available, each offering different voice styles, tones, and accents. The choice of TTS can significantly influence the personality of the agent. For Eros, we selected a voice that exaggerated its ridiculously bad matchmaking skills – something dramatic and over-the-top.

The generated response is then played back to the user, completing the conversational loop.
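That final step can be sketched the same way as the others, again assuming an OpenAI-style SDK client — the model and voice names are illustrative, and any TTS provider fits here:

```python
def speak(text: str, client, voice: str = "fable") -> bytes:
    """Turn the LLM's reply into playable audio bytes.

    `client` is assumed to be an OpenAI-style SDK client; the model
    and voice names are illustrative -- pick whichever provider and
    voice best matches your agent's personality.
    """
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text,
    )
    return response.read()
```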

The takeaway: conversational AI doesn’t have to be complex

What’s exciting about this project is how quickly you can go from concept to execution using just three core technologies. 

Whether you’re designing an enterprise-wide customer service bot or a playful AI matchmaker that absolutely shouldn’t be trusted, the fundamentals remain the same.

Ready to hear and see Eros in action? Trust me when I say: take his advice with a grain of salt!
