Autonomous AI Agents
Sophia provides two autonomous agents which work to complete the request via a control loop which iteratively (re-)plans and calls the functions available to the agent.
At a high level they share the same internal state, agent memory, human-in-the loop, functional calling history etc.
The key difference is:
- The XML agent returns the desired function call(s) in a custom XML format
- The dynamic agent returns Python code. This is executed in a WebAssembly sandbox which has the LLM functions bound to the global scope.
The dynamic agent has the advantage of being able to perform multiple function calls and perform validation logic, which can significantly reduce the time and cost of running the agent depending on the tasks.
This custom prompt and parsing allows function calling on any sufficiently capable LLM. However, given the reasoning capabilities required for optimal plan generation and function selection, the best results will be from using the most capable frontier models. Currently, we prefer and recommend using Claude 3.5 Sonnet for the 'hard' LLM which is used by the autonomous agent control loop.
AgentContext
Built-in functions
The autonomous agents always have four function available:
Agent_requestFeedback
Agent_completed
Agent_saveMemory
Agent_deleteMemory
The requestFeedback
function in the system prompt lets the LLM know it can request a decision or additional details.
If you would like to have input at a particular step, then in your prompt ask the model to request feedback at that point.
Human-in-the-loop
Having a human in the loop is essential for any agent to handle a number of cases:
- Budget control - Multiple iterations of calls to frontier LLMs can quickly add up.
- Guidance - Keeping the AI on track after a few control loop iterations.
- Agent initiated feedback - Provide details/decisions asked for by the AI
- Error handling - Transient errors, configuration errors, balance exceeded errors and more can be fixed and then resume the agent. With some errors you may be able to give the agent guidance for a change in plan, or to research a solution.
- Verification - Manually verify function calls that could result in data loss, unwanted modifications etc.
Currently, if the Slack function is available, a message will be sent when a human-in-the-loop event occurs.
When the budget control or control loop iteration thresholds have been reached, then the console will require an Enter
keypress.
More configuration will be provided soon.
Considering that LLM sometimes generates inaccurate responses and non-executable codes, we develop a Self-Check mechanism to enable LLM’s autonomous evaluation and response improvement, which is inspired by recent approaches (Shinn et al., 2023; Dhuliawala et al., 2023). The mechanism comprises two checking operations: LLM rethinking and code verification. The former prompts LLM with a set of queries for self-assessment, ensuring compliance with specified criteria. The latter verifies the executability of discriminator functions with actual state inputs, providing feedback until all functions are executable.
Reasoning and agent research
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
AGENT-ORIENTED PLANNING IN MULTI-AGENT SYSTEMS https://www.youtube.com/watch?v=WigY-WigKf0
PROOF OF THOUGHT : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent
Finetuning research
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models