LieGraph is a social deduction game built on the LangGraph framework and powered by AI agents. In this "Who is the Undercover" style game, AI players reason, strategize, and interact in natural language to uncover the hidden undercover agent.

LieGraph features autonomous AI agents, each with a unique personality and strategic thinking capabilities. A dynamic identity inference system continuously analyzes conversation history and voting patterns to deduce both an agent's own identity and those of others. A probabilistic belief system tracks self‑belief confidence and constructs suspicion matrices among players. Strategic reasoning enables deception detection, alliance formation, and long‑term planning, with all critical speech and voting decisions implemented through LLM‑driven structured tools.

LieGraph also integrates metrics and evaluation mechanisms, automatically recording quality indicators such as win‑loss balance, identity recognition accuracy, and speech diversity, and generating JSON reports to support prompt‑engineering evaluation and historical analysis. The entire game flow is orchestrated via LangGraph's StateGraph, covering everything from role assignment to final victory determination.
Each AI agent possesses a distinct personality and strategic mindset, participating in the game without human intervention. They adapt their speech and voting logic based on the evolving situation—some analyze cautiously, while others proactively guide—simulating the interactive dynamics of real players.
AI continuously analyzes conversation history and voting patterns, deducing others' identities while also verifying its own role. For example, civilians examine speech consistency to identify undercover agents, while undercover agents observe others' descriptions for weaknesses and adjust their disguise strategies.
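One way to picture the self‑verification step is as a Bayesian update: each round, the agent weighs how well others' descriptions fit its own secret word and revises its confidence in being a civilian. The helper below is an illustrative sketch with made‑up numbers, not LieGraph's actual implementation.

```python
def update_self_belief(p_civilian: float, evidence_likelihoods: tuple[float, float]) -> float:
    """Bayesian update of the agent's confidence in being a civilian.

    evidence_likelihoods = (P(observation | civilian), P(observation | undercover)).
    Hypothetical helper for illustration only.
    """
    p_obs_civ, p_obs_und = evidence_likelihoods
    numerator = p_obs_civ * p_civilian
    denominator = numerator + p_obs_und * (1.0 - p_civilian)
    return numerator / denominator if denominator > 0 else p_civilian

# Example: this round's descriptions matched the agent's word poorly,
# an observation twice as likely if the agent is the undercover.
belief = 0.8  # prior confidence in being a civilian
belief = update_self_belief(belief, (0.3, 0.6))
print(round(belief, 3))  # 0.8*0.3 / (0.8*0.3 + 0.2*0.6) ≈ 0.667
```

A run of poorly matching rounds drives the confidence down, which is exactly the signal an undercover agent would use to tighten its disguise.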
All communication is conducted through natural language. AI responses align with their character settings, avoiding mechanical or unnatural expressions. Undercover agents deliberately use vague descriptions, whereas civilians convey information precisely—making the entire dialogue feel authentic to a real‑world game setting.
AI builds a "suspicion matrix," quantifying trust in other players with probabilities, while also recording confidence in its own role. These data points are updated after each round of speech and voting, serving as the basis for subsequent decisions.
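A minimal sketch of such a suspicion matrix update (hypothetical names and weighting rule, not the project's actual data model): each agent holds a probability distribution over who the undercover is, scales it by per‑round evidence, and renormalizes.

```python
def update_suspicion(suspicion: dict[str, float], evidence: dict[str, float]) -> dict[str, float]:
    """Multiply each player's suspicion by an evidence weight, then renormalize
    so the values remain a probability distribution. Illustrative only."""
    updated = {p: suspicion[p] * evidence.get(p, 1.0) for p in suspicion}
    total = sum(updated.values())
    return {p: v / total for p, v in updated.items()}

# Alice's view of the other players, uniform to start.
suspicion = {"Bob": 1/3, "Carol": 1/3, "Dave": 1/3}
# Round 1: Bob's description was suspiciously vague (weight > 1),
# while Carol's matched the civilian word closely (weight < 1).
suspicion = update_suspicion(suspicion, {"Bob": 2.0, "Carol": 0.5})
print({p: round(v, 3) for p, v in suspicion.items()})
# {'Bob': 0.571, 'Carol': 0.143, 'Dave': 0.286}
```

Repeating this after every speech and voting round yields exactly the kind of evolving trust quantification described above.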
Key data are automatically logged during gameplay, including win‑rate balance between civilians and undercover agents, identity recognition accuracy, speech diversity, etc. A JSON report is generated after each game, facilitating future prompt optimization or rule adjustments.
The project uses uv for Python package management. Create a .env file in the project root and configure your LLM provider details:

LLM_PROVIDER=openai
OPENAI_API_KEY="your-openai-api-key"
OPENAI_MODEL="gpt-4o-mini"
# Alternatively, for DeepSeek:
LLM_PROVIDER=deepseek
DEEPSEEK_API_KEY="your-deepseek-api-key"
DEEPSEEK_MODEL="deepseek-chat"
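As a sketch of how such settings might be consumed (a hypothetical helper; the actual wiring lives in the project's source), the provider named in LLM_PROVIDER determines which key and model variables are read:

```python
def resolve_llm_settings(env: dict[str, str]) -> dict[str, str]:
    """Pick the API key and model matching LLM_PROVIDER.
    Illustrative helper, not LieGraph's actual configuration code."""
    provider = env.get("LLM_PROVIDER", "openai").upper()
    return {
        "provider": provider.lower(),
        "api_key": env.get(f"{provider}_API_KEY", ""),
        "model": env.get(f"{provider}_MODEL", ""),
    }

settings = resolve_llm_settings({
    "LLM_PROVIDER": "deepseek",
    "DEEPSEEK_API_KEY": "sk-example",
    "DEEPSEEK_MODEL": "deepseek-chat",
})
print(settings["model"])  # deepseek-chat
```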
git clone https://github.com/leslieo2/LieGraph.git
cd LieGraph
# If uv is not installed
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
# Install frontend dependencies
cd ui-web/frontend
npm install
# Start the backend (from project root)
langgraph dev --config langgraph.json --port 8124 --allow-blocking
# Start the frontend (from ui-web/frontend directory)
npm start
Open http://localhost:3000 in your browser to begin playing.

The entire game lifecycle, from role assignment to final victory determination, is managed by LangGraph's StateGraph.
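The lifecycle can be sketched in framework‑agnostic Python (a hypothetical stand‑in: the real project expresses these steps as StateGraph nodes and conditional edges, with LLM‑driven speech and voting in place of the random stubs below):

```python
import random

def play_game(players: list[str], undercover_count: int = 1, seed: int = 0) -> str:
    """Sketch of the round flow: assign roles, alternate speech/vote rounds,
    eliminate the most-voted player, and check the win condition.
    Speech and voting are randomized stand-ins for the LLM-driven tools."""
    rng = random.Random(seed)
    undercover = set(rng.sample(players, undercover_count))
    alive = list(players)
    while True:
        # Win checks: all undercover eliminated -> civilians win;
        # undercover reach parity with the rest -> undercover win.
        remaining_undercover = [p for p in alive if p in undercover]
        if not remaining_undercover:
            return "civilians"
        if len(remaining_undercover) * 2 >= len(alive):
            return "undercover"
        # Speech round (stubbed), then each player votes for someone else.
        votes = {p: rng.choice([q for q in alive if q != p]) for p in alive}
        tally: dict[str, int] = {}
        for target in votes.values():
            tally[target] = tally.get(target, 0) + 1
        alive.remove(max(tally, key=lambda p: tally[p]))

print(play_game(["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"]))
```

In the actual graph, each of these steps (role assignment, speech, voting, elimination, win check) would map to a node, with conditional edges looping rounds until a side wins.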
Edit config.yaml to adjust game parameters:
game:
  player_count: 6                  # Number of players
  vocabulary:
    - ["Shakespeare", "Dumas"]     # English word pair
    - ["太阳", "月亮"]              # Chinese word pair ("sun", "moon")
  player_names:
    - "Alice"
    - "Bob"                        # More names can be added
The game includes a built‑in metrics collector (src/game/metrics.py) that records core data such as win‑rate balance, identity recognition accuracy, and speech diversity. Data is automatically saved to:

- logs/metrics/{game_id}.json (one file per game)
- logs/metrics/overall.json (aggregated across games)

You can also retrieve metrics programmatically:
from src.game.dependencies import build_dependencies
deps = build_dependencies()
collector = deps.metrics
audit = collector.get_overall_metrics()
score = collector.compute_quality_score() # Deterministic score
# For LLM‑assisted scoring: collector.compute_quality_score(method="llm", llm=client)