Architecture
dimos/agents/mcp/mcp_client.py) is a Module with:
- human_input: In[str]: receives text from humancli, WebInput, or agent-send
- agent: Out[BaseMessage]: publishes agent responses (text, tool calls, images)
- agent_idle: Out[bool]: signals when the agent is waiting for input
The default model is gpt-4o, which requires the OPENAI_API_KEY environment variable to be set. On startup, the module discovers all @skill-annotated methods across deployed modules via RPC and exposes them as LangChain tools.
Skills
Skills are methods decorated with @skill on any Module. The agent discovers them automatically at startup.
- Parameters must be JSON-serializable primitives (str, int, float, bool, list, dict).
- Docstrings become the tool description the LLM sees. Write them clearly so the agent has sufficient context.
- The function must return a string or image, which the agent uses to decide what to do next.
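
The rules above can be illustrated with a minimal, self-contained sketch. The skill decorator, DemoModule class, and discover_skills helper below are hypothetical stand-ins written for this example, not the dimos implementation. They only show how a decorated method can be found and how its docstring can serve as the tool description.

```python
import inspect
from typing import Callable


def skill(func: Callable) -> Callable:
    """Hypothetical stand-in for @skill: it only tags the function."""
    func._is_skill = True
    return func


class DemoModule:
    """Hypothetical class standing in for a dimos Module."""

    @skill
    def wait(self, seconds: float) -> str:
        """Pause for the given number of seconds, then report back."""
        # A real skill would pause here; this sketch only reports.
        # The parameter is a JSON-serializable primitive and the return
        # value is a string the agent can read to pick its next step.
        return f"Waited {seconds} seconds"

    def helper(self) -> None:
        """Not decorated, so discovery skips it."""


def discover_skills(module: object) -> dict[str, str]:
    """Map each tagged method name to its docstring (the tool description)."""
    tools = {}
    for name, method in inspect.getmembers(module, inspect.ismethod):
        if getattr(method, "_is_skill", False):
            tools[name] = inspect.getdoc(method) or ""
    return tools


tools = discover_skills(DemoModule())
print(tools)
```

Only wait appears in the result, paired with its docstring, which is why a vague docstring leaves the agent with little to work from.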
Built-in Skills
| Skill | Module | Description |
|---|---|---|
| relative_move(forward, left, degrees) | UnitreeSkillContainer | Move robot relative to current position |
| execute_sport_command(command_name) | UnitreeSkillContainer | Unitree sport commands (sit, stand, flip, etc.) |
| wait(seconds) | UnitreeSkillContainer | Pause execution |
| observe() | GO2Connection | Capture and return current camera frame |
| navigate_with_text(query) | NavigationSkillContainer | Navigate to a location by description |
| tag_location(name) | NavigationSkillContainer | Tag current position for later recall |
| stop_navigation() | NavigationSkillContainer | Cancel current navigation goal |
| follow_person(query) | PersonFollowSkill | Visual servoing to follow a described person |
| stop_following() | PersonFollowSkill | Stop person following |
| speak(text) | SpeakSkill | Text-to-speech through robot speakers |
| where_am_i() | GoogleMapsSkillContainer | Current street/area from GPS |
| get_gps_position_for_queries(queries) | GoogleMapsSkillContainer | Look up GPS coordinates |
| set_gps_travel_points(points) | GPSNavSkill | Navigate via GPS waypoints |
| map_query(query) | OsmSkill | Search OpenStreetMap with VLM |
MCP
All agentic blueprints use two modules: McpServer and McpClient.
- McpServer exposes the methods annotated with @skill as MCP tools. Any external client can connect to the server to use the MCP tools.
- McpClient has a LangGraph LLM which calls MCP tools from McpServer.
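
For an external client, the MCP specification frames a tool invocation as a JSON-RPC 2.0 request with the method tools/call. The sketch below only builds that message. The transport and address that McpServer listens on are not stated here, so nothing is sent. It also assumes that the tool name matches the skill method name and that the argument names match the signature listed in the Built-in Skills table. The build_tool_call helper is hypothetical.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument values are illustrative; their units are not stated in this document.
payload = build_tool_call(
    1, "relative_move", {"forward": 1.0, "left": 0.0, "degrees": 90.0}
)
print(payload)
```

A client would normally send a tools/list request first to learn which tools and argument schemas the server actually advertises, rather than hard-coding them as done here.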
Input Methods
| Method | How it works |
|---|---|
| humancli | Standalone terminal: type messages, see responses |
| dimos agent-send "text" | One-shot CLI command via LCM |
| WebInput | Web interface at localhost:7779 with optional Whisper STT |
Models
| Config | Model | Notes |
|---|---|---|
| Default | gpt-4o | Best quality, requires OPENAI_API_KEY |
| ollama:llama3.1 | Local Ollama | Requires ollama serve running |
| Custom | Any LangChain-compatible | Set via McpClient.blueprint(model="...") |
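
The requirements in the table can be checked before launching an agent. The following is a minimal sketch, not part of dimos: preflight_note is a hypothetical helper, and treating every model string that starts with ollama: as a local Ollama model is an assumption based on the single example in the table.

```python
from typing import Mapping, Optional


def preflight_note(model: str, env: Mapping[str, str]) -> Optional[str]:
    """Return a human-readable note if a documented requirement may be unmet."""
    if model == "gpt-4o":
        if not env.get("OPENAI_API_KEY"):
            return "gpt-4o requires OPENAI_API_KEY to be set"
        return None
    if model.startswith("ollama:"):
        # A running server cannot be confirmed from the environment alone,
        # so this branch always returns a reminder.
        return "make sure `ollama serve` is running"
    # Custom models: requirements depend on the provider, so say nothing.
    return None


print(preflight_note("gpt-4o", {}))
print(preflight_note("ollama:llama3.1", {}))
```

In practice, os.environ would be passed as env. Taking the mapping as a parameter keeps the check easy to test.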
