This document dives into the high-level architecture of AI Town and its different layers. We'll start with a brief overview and then go in-depth on each component. The overview should be sufficient for forking AI Town and changing game or agent behavior. Read on to the deep dives if you're interested or running up against the engine's limitations.
This doc assumes the reader has a working knowledge of Convex. If you're new to Convex, check out the Convex tutorial to get started.
AI Town is split into a few layers:
- The server-side game logic in convex/aiTown: This layer defines what state AI Town maintains, how it evolves over time, and how it reacts to user input. Both humans and agents submit inputs that the game engine processes.
- The client-side game UI in src/: AI Town uses pixi-react to render the game state to the browser for human consumption.
- The game engine in convex/engine: To make it easy to hack on the game rules, we've separated out the game engine from the AI Town-specific game rules. The game engine is responsible for saving and loading game state from the database, coordinating feeding inputs into the engine, and actually running the game engine in Convex functions.
- The agent in convex/agent: Agents run as part of the game loop, and can kick off asynchronous Convex functions to do longer processing, such as talking to LLMs. Those functions can save state in separate tables, or submit inputs to the game engine to modify game state. Internally, our agents use a combination of simple rule-based systems and talking to an LLM.
So, if you'd like to tweak agent behavior but keep the same game mechanics, check out convex/agent for the async work, and convex/aiTown/agent.ts for the game loop logic.
If you would like to add new gameplay elements (that both humans and agents can interact with), add the feature to convex/aiTown, render it in the UI in src/, and respond to it in convex/aiTown/agent.ts.
If parts of your game are more latency sensitive, you can move them out of the engine into regular Convex tables, queries, and mutations, logging only the key bits into game state. See "Message data model" below for an example.
AI Town's data model has a few concepts:
- Worlds (convex/aiTown/world.ts) represent a map with many players interacting together.
- Players (convex/aiTown/player.ts) are the core characters in the game. Players have human readable names and descriptions, and they may be associated with a human user. At any point in time, a player may be pathfinding towards some destination and has a current location.
- Conversations (convex/aiTown/conversations.ts) are created by a player and end at some point in time.
- Conversation memberships (convex/aiTown/conversationMembership.ts) indicate that a player is a member of a conversation. Players may only be in one conversation at any point in time, and conversations currently have exactly two members. Memberships may be in one of three states:
  - invited: The player has been invited to the conversation but hasn't accepted yet.
  - walkingOver: The player has accepted the invite to the conversation but is too far away to talk. The player will automatically join the conversation when they get close enough.
  - participating: The player is actively participating in the conversation.
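As a rough illustration of the membership rules above, here is a minimal sketch in TypeScript. The type names, field names, and the membershipsAreValid helper are hypothetical and much simpler than the real definitions under convex/aiTown; only the three state names come from the description above.

```typescript
// Hypothetical, simplified shapes for illustration only; the real
// definitions live under convex/aiTown and carry more fields.
type MembershipState = "invited" | "walkingOver" | "participating";

interface SketchMembership {
  playerId: string;
  state: MembershipState;
}

interface SketchConversation {
  id: string;
  creatorId: string;
  members: SketchMembership[];
}

// Checks the two rules stated above: every conversation has exactly two
// members, and no player is a member of more than one conversation.
function membershipsAreValid(conversations: SketchConversation[]): boolean {
  const seenPlayers: Record<string, boolean> = {};
  for (const conversation of conversations) {
    if (conversation.members.length !== 2) return false;
    for (const member of conversation.members) {
      if (seenPlayers[member.playerId]) return false;
      seenPlayers[member.playerId] = true;
    }
  }
  return true;
}
```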
There are three main categories of tables:
- Engine tables (convex/engine/schema.ts) for maintaining engine-internal state.
- Game tables (convex/aiTown/schema.ts) for game state. To keep game state small and efficient to read and write, we store AI Town's data model across a few tables. See convex/aiTown/schema.ts for an overview.
- Agent tables (convex/agent/schema.ts) for agent state. Agents can freely read and write to these tables within their actions.
AI Town modifies its data model by processing inputs. Inputs are submitted by players and agents and processed by the game engine. We specify inputs in the inputs object in convex/aiTown/inputs.ts. Use the inputHandler function to construct an input handler, specifying a Convex validator for arguments for end-to-end type-safety.
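To illustrate the idea of pairing a validator with a handler, here is a self-contained sketch. It does not use the real inputHandler signature or Convex validators: sketchInputHandler, ArgsCheck, checkJoinArgs, and dispatchJoin are hypothetical stand-ins, and the hand-rolled check plays the role a Convex validator would play.

```typescript
// Hypothetical stand-in for a Convex validator: a runtime check that also
// carries the static type of the arguments it accepts.
type ArgsCheck<Args> = (raw: unknown) => Args;

interface SketchHandler<Args, Result> {
  check: ArgsCheck<Args>;
  run: (args: Args) => Result;
}

// Pairs a validator with a handler so the handler's argument type is
// inferred from the validator rather than declared twice.
function sketchInputHandler<Args, Result>(
  check: ArgsCheck<Args>,
  run: (args: Args) => Result,
): SketchHandler<Args, Result> {
  return { check, run };
}

const checkJoinArgs: ArgsCheck<{ playerName: string }> = (raw) => {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Expected an object");
  }
  const playerName = (raw as { playerName?: unknown }).playerName;
  if (typeof playerName !== "string") {
    throw new Error("playerName must be a string");
  }
  return { playerName };
};

// A hypothetical inputs object with a single entry.
const sketchInputs = {
  join: sketchInputHandler(checkJoinArgs, (args) => `${args.playerName} joined`),
};

// Validates untyped arguments, then dispatches to the typed handler.
function dispatchJoin(raw: unknown): string {
  const handler = sketchInputs.join;
  return handler.run(handler.check(raw));
}
```

The point of the pattern is that malformed arguments are rejected before the handler runs, while the handler body itself is statically typed.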
- Joining (join) and leaving (leave) the game.
- Moving a player to a particular location (moveTo): Movement in AI Town is similar to RTS games, where the players specify where they want to go, and the engine figures out how to get there.
- Starting a conversation (startConversation), accepting an invite (acceptInvite), rejecting an invite (rejectInvite), and leaving a conversation (leaveConversation). To track typing indicators, you use startTyping and finishSendingMessage. These are imported from game/conversations.ts.
- Agent inputs are imported from aiTown/agentInputs.ts for things like remembering conversations, deciding what to do, etc.
Each input's implementation checks invariants and updates game state as needed. For example, the moveTo input checks that the player isn't participating in a conversation, throwing an error telling them to leave the conversation first if so, and then updates their pathfinding state with the desired destination.
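The moveTo behavior described above might look roughly like the following. This is a sketch under simplifying assumptions: SketchPlayer, its inConversation flag, and sketchMoveTo are hypothetical, and the real input works against the full game state rather than a single object.

```typescript
interface SketchPoint {
  x: number;
  y: number;
}

// Hypothetical, pared-down player: just enough state for this example.
interface SketchPlayer {
  id: string;
  inConversation: boolean; // true while participating in a conversation
  destination: SketchPoint | null; // pathfinding target, if any
}

// Checks the invariant first, then updates pathfinding state.
function sketchMoveTo(player: SketchPlayer, destination: SketchPoint): void {
  if (player.inConversation) {
    throw new Error(`Player ${player.id} must leave their conversation before moving`);
  }
  player.destination = destination;
}
```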
Besides processing player inputs, the game state can also change in the background as the simulation runs time forward. For example, if a player has decided to move along a path, their position will gradually update as time moves forward. Similarly, if two players collide with each other, they'll notice and replan their paths, trying to avoid obstacles.
We track chat messages in separate tables not affiliated with the game engine. This is for a few reasons:
- The core simulation doesn't need to know about messages, so keeping them out keeps game state small.
- Messages are updated very frequently (when streamed out from OpenAI) and benefit from lower input latency, so they're not a great fit for the engine. See "Design goals and limitations" below.
Messages (convex/schema.ts) are in a conversation and indicate an author and message text. Each conversation has a typing state in the conversations table that indicates that a player is currently typing. Players can still send messages while another player is typing, but having the indicator helps agents (and humans) not talk over each other.
The separate tables are queried and modified with regular Convex queries and mutations that don't directly go through the simulation.
Given the description of AI Town's game behavior in the previous section, the AbstractGame class in convex/engine/abstractGame.ts implements running the simulation.
The game engine has a few responsibilities:
- Coordinating incoming player inputs, feeding them into the simulation, and sending their return values (or errors) to the client.
- Running the simulation forward in time.
- Saving and loading game state from the database.
- Managing execution of the game behavior, efficiently using Convex resources and minimizing input latency.
AI Town's game behavior is implemented in the Game subclass.
Users submit inputs through the insertInput function, which inserts them into an inputs table, assigning a monotonically increasing unique input number and stamping the input with the time the server received it. The engine then processes inputs, writing their results back to the inputs row. Interested clients can subscribe to an input's status with the inputStatus query.
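The input lifecycle above can be sketched with an in-memory stand-in for the inputs table. Everything here is hypothetical (SketchInputTable, its row shape, and the status variants); the real engine stores rows in Convex and exposes insertInput and inputStatus as Convex functions.

```typescript
type SketchInputStatus =
  | { kind: "pending" }
  | { kind: "ok"; value: unknown }
  | { kind: "error"; message: string };

interface SketchInputRow {
  inputNumber: number; // monotonically increasing and unique
  inputName: string; // for example "moveTo"
  args: unknown;
  received: number; // server timestamp in milliseconds
  result: SketchInputStatus;
}

class SketchInputTable {
  private rows: SketchInputRow[] = [];
  private nextNumber = 0;

  // Assigns the next input number and stamps the receive time.
  insertInput(inputName: string, args: unknown, now: number): number {
    const inputNumber = this.nextNumber++;
    this.rows.push({ inputNumber, inputName, args, received: now, result: { kind: "pending" } });
    return inputNumber;
  }

  // Runs every pending input and writes its return value or error back
  // onto the same row.
  processPending(handleInput: (inputName: string, args: unknown) => unknown): void {
    for (const row of this.rows) {
      if (row.result.kind !== "pending") continue;
      try {
        row.result = { kind: "ok", value: handleInput(row.inputName, row.args) };
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        row.result = { kind: "error", message };
      }
    }
  }

  // What a client would read back to learn an input's outcome.
  inputStatus(inputNumber: number): SketchInputStatus | null {
    for (const row of this.rows) {
      if (row.inputNumber === inputNumber) return row.result;
    }
    return null;
  }
}
```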
Game provides an abstract method handleInput that AiTown implements with its specific behavior.
The Game class specifies how it simulates time forward with the tick method:
- tick(now) runs the simulation forward until the given timestamp.
- Ticks are run at a high frequency, configurable with tickDuration (milliseconds). Since AI Town has smooth motion for player movement, it runs at 60 ticks per second.
- It's generally a good idea to break up game logic into separate systems that can be ticked forward independently. For example, AI Town's tick method advances pathfinding with Player.tickPathfinding, player positions with Player.tickPosition, conversations with Conversation.tick, and agent logic with Agent.tick.
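Splitting the tick into independent systems could look something like this sketch. SketchSystem, SketchTickedGame, and walkerSystem are hypothetical names, and the one-dimensional walker is only a toy stand-in for position updates.

```typescript
// Hypothetical interface for a subsystem that can be ticked on its own.
interface SketchSystem {
  label: string;
  tick(now: number): void;
}

class SketchTickedGame {
  private systems: SketchSystem[];

  constructor(systems: SketchSystem[]) {
    this.systems = systems;
  }

  // Advances every system to `now`, always in the same order.
  tick(now: number): void {
    for (const system of this.systems) system.tick(now);
  }
}

// Toy state: a walker moving along one axis at a constant speed.
interface SketchWalker {
  position: number;
  target: number;
  speed: number; // units per second
  lastTick: number; // timestamp of the previous tick in milliseconds
}

// Moves the walker toward its target by at most speed * elapsed time.
function walkerSystem(walker: SketchWalker): SketchSystem {
  return {
    label: "position",
    tick(now: number): void {
      const elapsedSeconds = (now - walker.lastTick) / 1000;
      const maxStep = walker.speed * elapsedSeconds;
      const remaining = walker.target - walker.position;
      walker.position += Math.max(-maxStep, Math.min(maxStep, remaining));
      walker.lastTick = now;
    },
  };
}
```

Because each system only sees a timestamp, it can be tested in isolation by calling tick with hand-picked times.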
To avoid running a Convex mutation 60 times per second (which would be expensive and slow), the engine batches up many ticks into a step. AI Town runs steps only once per second. Here's how a step works:
- Load the game state into memory.
- Decide how long to run.
- Execute many ticks for our time interval, alternating between feeding in inputs with handleInput and advancing the simulation with tick.
- Write the updated game state back to the database.
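The tick-batching part of a step can be sketched as a loop that alternates between applying inputs and ticking. The function name runSketchStep and the TimedInput shape are hypothetical, and loading and saving state are left out.

```typescript
// Hypothetical input record: when the server received it and what it does.
interface TimedInput {
  received: number; // server timestamp in milliseconds
  apply: () => void;
}

// Runs ticks from startTs up to endTs. Before each tick, feeds in any
// inputs received up to that tick's timestamp. `pending` must be sorted
// by `received`. Returns the number of ticks executed.
function runSketchStep(
  startTs: number,
  endTs: number,
  tickDuration: number,
  pending: TimedInput[],
  tick: (now: number) => void,
): number {
  let currentTs = startTs;
  let nextInput = 0;
  let numTicks = 0;
  while (currentTs < endTs) {
    currentTs = Math.min(currentTs + tickDuration, endTs);
    while (nextInput < pending.length) {
      const input = pending[nextInput];
      if (!input || input.received > currentTs) break;
      input.apply();
      nextInput++;
    }
    tick(currentTs);
    numTicks++;
  }
  return numTicks;
}
```

With a 16 ms tick and a one-second step, this loop runs 63 ticks inside a single function call, the last one shortened so it lands exactly on the end of the step.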
One core invariant is that the game engine is fully "single-threaded" per world, so there are never two runs of an engine's step overlapping in time. Not having to think about race conditions or concurrency makes writing game engine code a lot easier.
However, preserving this invariant is a little tricky. If the engine is idle for a minute and an input comes in, we want to run the engine immediately but then cancel its run after the minute's up. If we're not careful, a race condition may cause us to run multiple copies of the engine if an input comes in just as an idle timeout is expiring!
Our approach is to store a generation number with the engine that monotonically increases over time. All scheduled runs of the engine contain their expected generation number as an argument. Then, if we'd like to cancel a future run of the engine, we can bump the generation number by one, and then we're guaranteed that the subsequent run will fail immediately as it'll notice that the engine's generation number does not match its expected one.
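A minimal sketch of the generation-number check follows, with hypothetical names (SketchEngineRow, runScheduledStep, kickEngine). A cancelled run is reported here by returning false; scheduling itself is not modeled.

```typescript
// Hypothetical engine row: only the fields this example needs.
interface SketchEngineRow {
  generationNumber: number;
  stepsRun: number;
}

// A scheduled run carries the generation number it expects to find.
// If the engine has moved on, the run does nothing.
function runScheduledStep(engine: SketchEngineRow, expectedGeneration: number): boolean {
  if (engine.generationNumber !== expectedGeneration) {
    return false; // stale run: it was cancelled by a later bump
  }
  engine.stepsRun += 1;
  return true;
}

// Invalidates any outstanding scheduled run and returns the generation
// number that a replacement run should be scheduled with.
function kickEngine(engine: SketchEngineRow): number {
  engine.generationNumber += 1;
  return engine.generationNumber;
}
```

In the idle-timeout scenario above, the input's arrival would bump the generation number, so the run that was scheduled for the end of the idle minute finds a mismatch and exits without overlapping the new run.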
The World, Player, Conversation, and Agent classes coordinate loading data into memory from the database, modifying it according to the game rules, and serializing it to write back out to the database. Here's the flow:
1. The Convex scheduler calls the convex/aiTown/main.ts:runStep action.
2. The runStep action calls convex/aiTown/game.ts:loadWorld to load the current game state. This query calls Game.load, which loads all of a world's game state from the appropriate tables, and returns a GameState object, which contains serialized versions of all of the players, agents, etc.
3. The runStep action passes the GameState to the Game constructor, which parses the serialized versions of all our game objects using their constructors. For example, new Player(serializedPlayer) parses the database representation into the in-memory Player class.
4. The engine runs the simulation, modifying the in-memory game objects.
5. At the end of a step, the framework calls Game.saveStep, which computes a diff of the game state since the beginning of the step and passes the diff to the convex/aiTown/game.ts:saveWorld mutation.
6. The saveWorld mutation applies the diff to the database, notices if any deleted objects need to be archived, updates the participatedTogether graph, and kicks off any scheduled jobs to run.
7. Since the engine is the only mutator of game state, it continues to run steps for some amount of time without repeating steps 1 to 3 again.
Just as we assume that the game engine is "single threaded", we also assume that the game engine exclusively owns the tables that store game engine state. Only the game engine should programmatically modify these tables, so components outside the engine can only mutate them by sending inputs.
If we only write updates out to the database at the end of a step, and steps only run once per second, continuous quantities like position will only update every second. This defeats the whole purpose of having high-frequency ticks: player positions will jump around and look choppy.
To solve this, we track the historical values of quantities like position within a step, storing the value at the end of each tick. Then, the client receives both the current value and the past step's worth of history, and it can "replay" the history to make the motion smooth.
The game tracks these quantities at the end of each tick by feeding them to a HistoricalObject. This object efficiently tracks its changes over time and serializes them into a buffer that clients can use for replaying its history. There are a few limitations on HistoricalObject:
- Historical objects can only have numeric (floating point) values and can't have nested objects or optional fields.
- Historical objects must declare which fields they'd like to track.
We store each player's "location" (i.e. its position, orientation, and speed) in a HistoricalObject and write it to the worlds document at the end of a step when computing a diff.
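To make the idea concrete, here is a simplified sketch of per-tick history tracking. It is not the real HistoricalObject: SketchHistoricalObject is a hypothetical class, and it serializes to JSON text for readability rather than to a compact buffer. It does mirror the two limitations listed above: values are plain numbers, and tracked fields are declared up front.

```typescript
interface FieldSample {
  time: number; // tick timestamp in milliseconds
  value: number;
}

class SketchHistoricalObject {
  private trackedFields: string[];
  private current: Record<string, number> = {};
  private samples: Record<string, FieldSample[]> = {};

  constructor(trackedFields: string[], initial: Record<string, number>) {
    this.trackedFields = trackedFields;
    for (const field of trackedFields) {
      const value = initial[field];
      if (value === undefined) throw new Error(`Missing tracked field: ${field}`);
      this.current[field] = value;
      this.samples[field] = [];
    }
  }

  // Called at the end of each tick. Only fields whose value changed get
  // a new sample, which keeps the history small when nothing moves.
  update(time: number, next: Record<string, number>): void {
    for (const field of this.trackedFields) {
      const value = next[field];
      if (value === undefined) throw new Error(`Missing tracked field: ${field}`);
      if (value !== this.current[field]) {
        this.samples[field].push({ time, value });
        this.current[field] = value;
      }
    }
  }

  // Serializes the recorded history for the client.
  pack(): string {
    return JSON.stringify(this.samples);
  }
}
```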
One guiding principle for AI Town's architecture is to keep the usage as close to "regular Convex" usage as possible. So, game state is stored in regular tables, and the UI just uses regular useQuery hooks to load that state and render it in the UI.
The one exception is for historical tables, which feed in the latest state into a useHistoricalValue hook that parses the history buffer and replays time forward for smooth motion. To keep replayed time synchronized across multiple historical buffers, we provide a useHistoricalTime hook for the top of your app that keeps track of the current time and returns it for you to pass down into components.
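Replaying a history on the client amounts to looking up the value a field had at a given replay time. The sketch below shows one simple way to do that, a step-wise lookup with no interpolation. It is an assumption made for illustration, not a description of what useHistoricalValue does internally; ReplaySample and replayValueAt are hypothetical names.

```typescript
interface ReplaySample {
  time: number; // tick timestamp in milliseconds
  value: number;
}

// Returns the value in effect at `time`: the value of the last sample at
// or before `time`, or `initial` if no sample has happened yet.
// `samples` must be sorted by time.
function replayValueAt(initial: number, samples: ReplaySample[], time: number): number {
  let value = initial;
  for (const sample of samples) {
    if (sample.time > time) break;
    value = sample.value;
  }
  return value;
}
```

Passing the same replay time to every lookup is what keeps several histories in sync with each other.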
We also provide a useSendInput hook that wraps useMutation and automatically sends inputs to the server and waits for the engine to process them and return their outcome.
Agents will execute any game state changes, and schedule operations to do anything that requires a long-lived request or accessing non-game tables. The flow generally is:
- Logic in Agent.tick can read and modify game state as time progresses, such as waiting until the agent is near another player to start talking.
- When there is something that needs to talk to an LLM or read/write external data, it calls startOperation with a reference to a Convex function: generally an internalAction.
- This function can read state from game tables and other tables via internalQuery functions.
- It executes long-running tasks, and can write data via internalMutations. Game state should not be written directly, but rather submitted via inputs (described in a previous section).
- Inputs are submitted from actions with ctx.runMutation(api.game.main.sendInput, {...}) or from mutations via insertInput. They are referenced by their name as a string, like moveTo.
- Inputs are defined with inputHandler and are given an instance of the AiTown game to modify, similar to the game loop. In fact, these are called as part of the game loop before tickAgent.
- When an operation is done, it deletes the inProgressOperation. This ensures an agent is only trying to do one thing at a time.
- Agent.tick can then observe the new game state and continue to make decisions.
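The one-operation-at-a-time rule can be sketched as follows. The names are hypothetical (SketchAgentState, startSketchOperation, finishSketchOperation), and the schedule callback stands in for kicking off a Convex function; the real startOperation and the input that clears inProgressOperation have their own signatures.

```typescript
interface SketchOperation {
  operationId: string;
  name: string;
  started: number; // timestamp in milliseconds
}

// Hypothetical agent state: only what this example needs.
interface SketchAgentState {
  id: string;
  inProgressOperation: SketchOperation | null;
  nextOperationNumber: number;
}

// Starts an operation unless one is already in flight. Returns whether
// the operation was started.
function startSketchOperation(
  agent: SketchAgentState,
  name: string,
  now: number,
  schedule: (operation: SketchOperation) => void,
): boolean {
  if (agent.inProgressOperation !== null) return false;
  const operation: SketchOperation = {
    operationId: `${agent.id}:${agent.nextOperationNumber++}`,
    name,
    started: now,
  };
  agent.inProgressOperation = operation;
  schedule(operation);
  return true;
}

// Clears the in-progress marker, but only if it still refers to the
// operation that is reporting completion.
function finishSketchOperation(agent: SketchAgentState, operationId: string): void {
  const current = agent.inProgressOperation;
  if (current !== null && current.operationId === operationId) {
    agent.inProgressOperation = null;
  }
}
```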
The agent code calls into the conversation layer which implements the prompt engineering for injecting personality and memories into the GPT responses. It has functions for starting a conversation (startConversation), continuing after the first message (continueConversation), and politely leaving a conversation (leaveConversation). Each function loads structured data from the database, queries the memory layer for the agent's opinion about the player they're talking with, and then calls into the OpenAI client (convex/util/openai.ts).
After each conversation, GPT summarizes its message history, and we compute an embedding of the summary text and write it into Convex's vector database. Then, when starting a new conversation with, say, Danny, we embed "What do you think about Danny?", find the three most similar memories, and fetch their summary texts to inject into the conversation prompt.
To avoid computing the same embedding over and over again, we cache embeddings by a hash of their text in a Convex table.
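The retrieval and caching ideas above can be sketched in memory. This is an illustration under stated assumptions, not the real implementation: the brute-force cosine ranking stands in for Convex's vector search, toyTextHash is a small non-cryptographic hash chosen for brevity (it can collide), and all names here are hypothetical.

```typescript
interface SketchMemory {
  summary: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks memories by similarity to the query embedding and keeps the top k.
function mostSimilarMemories(query: number[], memories: SketchMemory[], k: number): SketchMemory[] {
  return memories
    .map((memory) => ({ memory, score: cosineSimilarity(query, memory.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.memory);
}

// Toy djb2-style string hash, used only as a cache key in this sketch.
function toyTextHash(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// Caches embeddings by a hash of their text so repeated prompts skip the
// embedding call.
class SketchEmbeddingCache {
  misses = 0;
  private byHash: Record<string, number[]> = {};
  private embed: (text: string) => number[];

  constructor(embed: (text: string) => number[]) {
    this.embed = embed;
  }

  get(text: string): number[] {
    const key = toyTextHash(text);
    const cached = this.byHash[key];
    if (cached !== undefined) return cached;
    this.misses += 1;
    const embedding = this.embed(text);
    this.byHash[key] = embedding;
    return embedding;
  }
}
```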
AI Town's game engine has a few design goals:
- Try to be as close to a regular Convex app as possible. Use regular client hooks (like useQuery) when possible, and store game state in regular tables.
- Be as similar to existing engines as possible, so it's easy to change the behavior. We chose a tick() based model for simulation since it's commonly used elsewhere and intuitive.
- Decouple agent behavior from the game engine. It's nice to allow human players and AI agents to do all the same things in the game.
These design goals imply some inherent limitations:
- All data is loaded into memory each step. The active game state loaded by the game should be small enough to fit into memory and load and save frequently. Try to keep game state to less than a few dozen kilobytes: Games that require tens of thousands of objects interacting together may not be a good fit.
- All inputs are fed through the database in the inputs table, so applications that require very large or frequent inputs may not be a good fit.
- Input latency will be around one RTT (time for the input to make it to the server and the response to come back) plus half the step size (for expected server input delay when the input's waiting for the next step). Historical values add another half step size of input latency since their values are viewed slightly in the past. As configured, this will roughly be around 1.5s of input latency, which won't be a good fit for competitive games. You can configure the step size to be smaller (e.g. 250ms) which will decrease input latency at the cost of adding more Convex function calls and database bandwidth.
- The game engine is designed to be single threaded. JavaScript operating over plain objects in-memory can be surprisingly fast, but if your simulation is very computationally expensive, it may not be a good fit on AI Town's engine today.
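The input latency estimate in the list above is simple arithmetic, sketched here as a helper. The function name and the round-trip time used in the usage note are assumptions made for illustration; actual round-trip times depend on the deployment and the network.

```typescript
// Expected input latency: one round trip, plus half a step waiting for
// the next step to start, plus half a step because historical values are
// viewed slightly in the past.
function expectedInputLatencyMs(roundTripMs: number, stepMs: number): number {
  const serverInputDelay = stepMs / 2;
  const historicalDelay = stepMs / 2;
  return roundTripMs + serverInputDelay + historicalDelay;
}
```

For example, with an assumed round trip of 100 ms, expectedInputLatencyMs(100, 1000) gives 1100 ms for a one-second step, and expectedInputLatencyMs(100, 250) gives 350 ms for a 250 ms step.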