Agents API Reference

cs_copilot.agents

Cs_copilot Agents Package

This package provides factories, a global registry, and a team builder for creating and managing AI agents specialized in cheminformatics tasks.

Public API:

Agent Creation (Recommended):

- create_agent(agent_type, model, **kwargs) - Create agents by type
- list_available_agent_types() - List all available agent types

Team Coordination:

- get_cs_copilot_agent_team(model, **kwargs) - Multi-agent team with intelligent coordination

Utilities:

- get_last_agent_reply(agent) - Extract last message from agent

Available Agent Types (5-Agent Architecture):

Core Agents:

- "chembl_downloader" - Download and process bioactivity data from ChEMBL database
- "gtm_agent" - Unified GTM operations (build, load, density, activity, project) with smart caching
- "chemoinformatician" - Comprehensive chemoinformatics (chemotype, clustering, SAR, similarity, QSAR)
- "report_generator" - Universal presentation layer for all analysis types
- "autoencoder" - Molecular generation via LSTM autoencoders (standalone + GTM-guided)

Testing/Evaluation:

- "robustness_evaluation" - Analyze robustness test results and metrics

The team assembled by get_cs_copilot_agent_team also creates "peptide_wae" (peptide sequence generation with a Wasserstein autoencoder) and "synplanner" (retrosynthetic planning for target molecules), so more types are in use than the five core agents listed here.

Agent Capabilities Breakdown:

Chemoinformatician (Most Versatile):

- Chemotype/Scaffold Analysis: Extract and analyze molecular frameworks
- Clustering: Group molecules by structural similarity (k-means, hierarchical, DBSCAN)
- SAR Analysis: Structure-Activity Relationships, activity cliffs, matched molecular pairs
- Similarity/Diversity: Molecular similarity, diversity metrics, nearest neighbors
- QSAR Modeling: Extensible framework for predictive modeling (tools to be added)

AgentConfig dataclass

Configuration for creating an agent.

Source code in src/cs_copilot/agents/factories.py
@dataclass
class AgentConfig:
    """Configuration for creating an agent."""

    name: str
    description: str
    tools: List[Any] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)
    session_state: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        """Validate the agent configuration."""
        if not self.name:
            raise ValueError("Agent name cannot be empty")
        if not self.description:
            raise ValueError("Agent description cannot be empty")
        if not isinstance(self.tools, list):
            raise TypeError("Tools must be a list")
        if not isinstance(self.instructions, list):
            raise TypeError("Instructions must be a list")

validate()

Validate the agent configuration.

Source code in src/cs_copilot/agents/factories.py
def validate(self) -> None:
    """Validate the agent configuration."""
    if not self.name:
        raise ValueError("Agent name cannot be empty")
    if not self.description:
        raise ValueError("Agent description cannot be empty")
    if not isinstance(self.tools, list):
        raise TypeError("Tools must be a list")
    if not isinstance(self.instructions, list):
        raise TypeError("Instructions must be a list")
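
The checks above can be exercised in isolation. The following is a minimal sketch that uses a stand-in copy of the dataclass so it runs without cs_copilot installed; the helper validation_error is hypothetical and exists only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentConfig:
    """Stand-in copy of the dataclass shown above, so the snippet runs alone."""

    name: str
    description: str
    tools: List[Any] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)
    session_state: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        if not self.name:
            raise ValueError("Agent name cannot be empty")
        if not self.description:
            raise ValueError("Agent description cannot be empty")
        if not isinstance(self.tools, list):
            raise TypeError("Tools must be a list")
        if not isinstance(self.instructions, list):
            raise TypeError("Instructions must be a list")


def validation_error(config: AgentConfig) -> str:
    """Hypothetical helper: return the error text, or "" if the config is valid."""
    try:
        config.validate()
    except (ValueError, TypeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return ""


ok = AgentConfig(name="demo", description="A demo agent")
empty_name = AgentConfig(name="", description="A demo agent")
# Dataclasses do not enforce annotations, so a tuple is accepted at construction.
tuple_tools = AgentConfig(name="demo", description="A demo agent", tools=("a",))

print(repr(validation_error(ok)))
print(validation_error(empty_name))
print(validation_error(tuple_tools))
```

Note that constructing an AgentConfig does not run validate(); in the source shown on this page it is called from BaseAgentFactory.create_agent, and session_state is not checked at all.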

AgentCreationError

Bases: Exception

Exception raised when agent creation fails.

Source code in src/cs_copilot/agents/factories.py
class AgentCreationError(Exception):
    """Exception raised when agent creation fails."""

    pass
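
Because the factory re-raises with "raise ... from e", the original exception should remain reachable through __cause__. A minimal sketch of catching the error and inspecting its cause, using a stand-in exception class and a hypothetical build_agent function rather than the real factory:

```python
class AgentCreationError(Exception):
    """Stand-in for the exception class shown above."""


def build_agent(name: str) -> str:
    """Hypothetical builder mirroring the wrap-and-re-raise pattern."""
    try:
        if not name:
            raise ValueError("Agent name cannot be empty")
        return f"agent:{name}"
    except Exception as e:
        # Chain the original error so callers can still inspect it.
        raise AgentCreationError(f"Failed to create agent: {e}") from e


try:
    build_agent("")
except AgentCreationError as err:
    message = str(err)
    cause = err.__cause__

print(message)
print(type(cause).__name__)
```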

BaseAgentFactory

Bases: ABC

Base class for creating agents with common configuration and error handling.

Source code in src/cs_copilot/agents/factories.py
class BaseAgentFactory(ABC):
    """Base class for creating agents with common configuration and error handling."""

    def __init__(self, logger: Optional[logging.Logger] = None):
        self.logger = logger or logging.getLogger(__name__)

    @abstractmethod
    def get_agent_config(self) -> AgentConfig:
        """Return the configuration for this agent type."""
        pass

    def create_agent(
        self,
        model: Model,
        markdown: bool = True,
        debug_mode: bool = False,
        enable_mlflow_tracking: bool = True,
        **kwargs,
    ) -> Agent:
        """Create an agent with error handling and validation.

        Args:
            model: Model to use for the agent
            markdown: Whether to enable markdown formatting
            debug_mode: Whether to enable debug mode
            enable_mlflow_tracking: Whether to enable MLflow tracking for this agent
            **kwargs: Additional keyword arguments for agent creation

        Returns:
            Created agent instance
        """
        try:
            config = self.get_agent_config()
            config.validate()

            # Log agent creation
            self.logger.info(f"Creating agent: {config.name}")

            # Create agent with common parameters
            agent_kwargs = {
                "model": model,
                "name": config.name,
                "description": config.description,
                "tools": config.tools,
                "markdown": markdown,
                "debug_mode": debug_mode,
                "enable_agentic_state": True,
                "add_session_state_to_context": True,
            }

            # Add optional parameters if they exist
            if config.instructions:
                agent_kwargs["instructions"] = config.instructions
            if config.session_state:
                agent_kwargs["session_state"] = config.session_state

            # Add any additional kwargs passed in
            agent_kwargs.update(kwargs)

            agent = Agent(**agent_kwargs)

            # Wrap agent methods with MLflow tracking if enabled
            if enable_mlflow_tracking:
                agent = self._wrap_agent_with_tracking(agent, config)

            self.logger.info(f"Successfully created agent: {config.name}")
            return agent

        except Exception as e:
            self.logger.error(
                f"Failed to create agent {config.name if 'config' in locals() else 'unknown'}: {str(e)}"
            )
            raise AgentCreationError(f"Failed to create agent: {str(e)}") from e

    def _wrap_agent_with_tracking(self, agent: Agent, config: AgentConfig) -> Agent:
        """Wrap agent execution methods with MLflow tracking.

        Args:
            agent: Agent instance to wrap
            config: Agent configuration

        Returns:
            Agent with wrapped methods
        """
        try:
            from cs_copilot.tracking import get_tracker
            from cs_copilot.tracking.utils import build_prompt_signature

            tracker = get_tracker()

            if not tracker.is_enabled():
                return agent

            # Get the agent type from the factory
            agent_type = getattr(self.__class__, "agent_type", None)

            def build_prompt_template() -> Optional[str]:
                sections = []
                if config.description:
                    sections.append(str(config.description).strip())
                if config.instructions:
                    normalized = [
                        str(item).strip() for item in config.instructions if item is not None
                    ]
                    instructions_text = "\n".join(normalized).strip()
                    if instructions_text:
                        sections.append(instructions_text)
                template = "\n\n".join([section for section in sections if section])
                return template.strip() if template else None

            def build_prompt_name() -> str:
                base_name = agent_type or config.name
                safe_name = str(base_name).replace(" ", "_").lower()
                return f"cs_copilot.{safe_name}"

            prompt_template = build_prompt_template()
            prompt_signature = build_prompt_signature(prompt_template)
            prompt_registry_name = build_prompt_name()

            def register_prompt_in_registry():
                if not prompt_template:
                    return
                commit_message = None
                if prompt_signature:
                    commit_message = f"cs_copilot auto update ({prompt_signature.version})"
                prompt_obj = tracker.register_prompt_version(
                    name=prompt_registry_name,
                    template=prompt_template,
                    commit_message=commit_message,
                    tags={
                        "agent_name": agent.name,
                        "agent_type": agent_type or "unknown",
                        "component": "cs_copilot",
                    },
                )
                if prompt_obj:
                    version = getattr(prompt_obj, "version", None)
                    tracker.log_params(
                        {
                            "prompt_registry_name": prompt_registry_name,
                            "prompt_registry_version": str(version) if version is not None else "",
                            "prompt_registry_uri": (
                                f"prompts:/{prompt_registry_name}/{version}"
                                if version is not None
                                else ""
                            ),
                        }
                    )

            # Wrap run() method
            original_run = agent.run

            def tracked_run(*args, **kwargs):
                # Extract prompt from args
                prompt = args[0] if args else kwargs.get("message", "")

                with tracker.track_agent_run(
                    agent_name=agent.name, prompt=str(prompt), agent_type=agent_type
                ):
                    # Log agent configuration
                    tracker.log_params(
                        {
                            "agent_name": agent.name,
                            "agent_type": agent_type or "unknown",
                            "num_tools": len(config.tools),
                            "tools": ",".join([t.__class__.__name__ for t in config.tools]),
                        }
                    )
                    register_prompt_in_registry()

                    result = original_run(*args, **kwargs)

                    # Log result metrics if available
                    if hasattr(result, "content") and result.content:
                        from cs_copilot.tracking.utils import count_tokens

                        tracker.log_metrics(
                            {"output_tokens_estimate": float(count_tokens(result.content))}
                        )

                    return result

            agent.run = tracked_run

            # Wrap arun() method (async version)
            original_arun = agent.arun

            async def tracked_arun(*args, **kwargs):
                # Extract prompt from args
                prompt = args[0] if args else kwargs.get("message", "")

                with tracker.track_agent_run(
                    agent_name=agent.name, prompt=str(prompt), agent_type=agent_type
                ):
                    # Log agent configuration
                    tracker.log_params(
                        {
                            "agent_name": agent.name,
                            "agent_type": agent_type or "unknown",
                            "num_tools": len(config.tools),
                            "tools": ",".join([t.__class__.__name__ for t in config.tools]),
                        }
                    )
                    register_prompt_in_registry()

                    result = await original_arun(*args, **kwargs)

                    # Log result metrics if available
                    if hasattr(result, "content") and result.content:
                        from cs_copilot.tracking.utils import count_tokens

                        tracker.log_metrics(
                            {"output_tokens_estimate": float(count_tokens(result.content))}
                        )

                    return result

            agent.arun = tracked_arun

            self.logger.debug(f"MLflow tracking enabled for agent: {agent.name}")

        except ImportError:
            self.logger.warning(
                "MLflow tracking module not available. Agent will run without tracking."
            )
        except Exception as e:
            self.logger.warning(f"Failed to enable MLflow tracking for agent: {e}")

        return agent
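
The tracking wrapper above replaces run() and arun() on a single agent instance with closures that call the original bound methods. A minimal sketch of that pattern follows, with a plain list standing in for the MLflow tracker; FakeAgent and wrap_with_tracking are hypothetical names used only here.

```python
import asyncio
from typing import Any, List


class FakeAgent:
    """Hypothetical stand-in for an agent exposing run() and arun()."""

    name = "Demo Agent"

    def run(self, message: str) -> str:
        return f"echo: {message}"

    async def arun(self, message: str) -> str:
        return f"echo: {message}"


def wrap_with_tracking(agent: FakeAgent, events: List[str]) -> FakeAgent:
    """Replace run/arun on this instance with versions that record each call."""
    # Capture the bound methods before they are overwritten.
    original_run = agent.run
    original_arun = agent.arun

    def tracked_run(*args: Any, **kwargs: Any) -> Any:
        # Same prompt extraction as the source: first positional arg, else "message".
        prompt = args[0] if args else kwargs.get("message", "")
        events.append(f"run:{prompt}")
        return original_run(*args, **kwargs)

    async def tracked_arun(*args: Any, **kwargs: Any) -> Any:
        prompt = args[0] if args else kwargs.get("message", "")
        events.append(f"arun:{prompt}")
        return await original_arun(*args, **kwargs)

    # Instance attributes shadow the class methods for this object only.
    agent.run = tracked_run
    agent.arun = tracked_arun
    return agent


events: List[str] = []
agent = wrap_with_tracking(FakeAgent(), events)
sync_result = agent.run("hello")
async_result = asyncio.run(agent.arun(message="hi"))
untouched_result = FakeAgent().run("plain")  # other instances are not wrapped

print(events)
```

Since the patch is applied per instance, agents created with enable_mlflow_tracking=False keep their original methods.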

get_agent_config() abstractmethod

Return the configuration for this agent type.

Source code in src/cs_copilot/agents/factories.py
@abstractmethod
def get_agent_config(self) -> AgentConfig:
    """Return the configuration for this agent type."""
    pass
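
A concrete factory only needs to implement get_agent_config(). The sketch below uses trimmed stand-ins for AgentConfig and BaseAgentFactory so it runs on its own; EchoAgentFactory is hypothetical and not part of cs_copilot. The agent_type class attribute is included because the tracking wrapper reads it with getattr(self.__class__, "agent_type", None).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AgentConfig:
    """Trimmed stand-in for the real dataclass."""

    name: str
    description: str
    tools: List[Any] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


class BaseAgentFactory(ABC):
    """Stand-in exposing only the abstract hook."""

    @abstractmethod
    def get_agent_config(self) -> AgentConfig:
        """Return the configuration for this agent type."""


class EchoAgentFactory(BaseAgentFactory):
    """Hypothetical factory used only for illustration."""

    agent_type = "echo"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="Echo Agent",
            description="Repeats the user's message back.",
            instructions=["Reply with the input text verbatim."],
        )


config = EchoAgentFactory().get_agent_config()

try:
    BaseAgentFactory()  # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False

print(config.name, instantiated)
```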

create_agent(model, markdown=True, debug_mode=False, enable_mlflow_tracking=True, **kwargs)

Create an agent with error handling and validation.

Parameters:

Name | Type | Description | Default
model | Model | Model to use for the agent | required
markdown | bool | Whether to enable markdown formatting | True
debug_mode | bool | Whether to enable debug mode | False
enable_mlflow_tracking | bool | Whether to enable MLflow tracking for this agent | True
**kwargs | | Additional keyword arguments for agent creation | {}

Returns:

Type | Description
Agent | Created agent instance

Source code in src/cs_copilot/agents/factories.py
def create_agent(
    self,
    model: Model,
    markdown: bool = True,
    debug_mode: bool = False,
    enable_mlflow_tracking: bool = True,
    **kwargs,
) -> Agent:
    """Create an agent with error handling and validation.

    Args:
        model: Model to use for the agent
        markdown: Whether to enable markdown formatting
        debug_mode: Whether to enable debug mode
        enable_mlflow_tracking: Whether to enable MLflow tracking for this agent
        **kwargs: Additional keyword arguments for agent creation

    Returns:
        Created agent instance
    """
    try:
        config = self.get_agent_config()
        config.validate()

        # Log agent creation
        self.logger.info(f"Creating agent: {config.name}")

        # Create agent with common parameters
        agent_kwargs = {
            "model": model,
            "name": config.name,
            "description": config.description,
            "tools": config.tools,
            "markdown": markdown,
            "debug_mode": debug_mode,
            "enable_agentic_state": True,
            "add_session_state_to_context": True,
        }

        # Add optional parameters if they exist
        if config.instructions:
            agent_kwargs["instructions"] = config.instructions
        if config.session_state:
            agent_kwargs["session_state"] = config.session_state

        # Add any additional kwargs passed in
        agent_kwargs.update(kwargs)

        agent = Agent(**agent_kwargs)

        # Wrap agent methods with MLflow tracking if enabled
        if enable_mlflow_tracking:
            agent = self._wrap_agent_with_tracking(agent, config)

        self.logger.info(f"Successfully created agent: {config.name}")
        return agent

    except Exception as e:
        self.logger.error(
            f"Failed to create agent {config.name if 'config' in locals() else 'unknown'}: {str(e)}"
        )
        raise AgentCreationError(f"Failed to create agent: {str(e)}") from e
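
One consequence of the merge order above is that caller-supplied **kwargs are applied last, so they override values derived from the config, while empty instructions and session_state are omitted entirely. A minimal sketch of just that merging step, with the model argument left out and a SimpleNamespace standing in for AgentConfig (build_agent_kwargs is a hypothetical name):

```python
from types import SimpleNamespace
from typing import Any, Dict


def build_agent_kwargs(
    config: Any, markdown: bool = True, debug_mode: bool = False, **kwargs: Any
) -> Dict[str, Any]:
    """Mirror the merge order of create_agent (model omitted for brevity)."""
    agent_kwargs: Dict[str, Any] = {
        "name": config.name,
        "description": config.description,
        "tools": config.tools,
        "markdown": markdown,
        "debug_mode": debug_mode,
        "enable_agentic_state": True,
        "add_session_state_to_context": True,
    }
    # Optional fields are only forwarded when non-empty.
    if config.instructions:
        agent_kwargs["instructions"] = config.instructions
    if config.session_state:
        agent_kwargs["session_state"] = config.session_state
    # Caller kwargs win over everything set above.
    agent_kwargs.update(kwargs)
    return agent_kwargs


config = SimpleNamespace(
    name="Demo", description="A demo agent", tools=[], instructions=[], session_state={}
)
defaults = build_agent_kwargs(config)
overridden = build_agent_kwargs(config, name="Renamed", enable_agentic_state=False)

print(sorted(defaults))
print(overridden["name"], overridden["enable_agentic_state"])
```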

create_agent(agent_type, model, **kwargs)

Create an agent by type using the global registry.

Parameters:

Name | Type | Description | Default
agent_type | str | The type of agent to create | required
model | Model | The language model to use | required
**kwargs | | Additional arguments passed to the agent factory | {}

Returns:

Name | Type | Description
Agent | Agent | The created agent instance

Raises:

Type | Description
ValueError | If agent_type is not registered
AgentCreationError | If agent creation fails

Source code in src/cs_copilot/agents/registry.py
def create_agent(agent_type: str, model: Model, **kwargs) -> Agent:
    """
    Create an agent by type using the global registry.

    Args:
        agent_type: The type of agent to create
        model: The language model to use
        **kwargs: Additional arguments passed to the agent factory

    Returns:
        Agent: The created agent instance

    Raises:
        ValueError: If agent_type is not registered
        AgentCreationError: If agent creation fails
    """
    return _agent_registry.create_agent(agent_type, model, **kwargs)

get_registry()

Get the global agent registry instance.

Source code in src/cs_copilot/agents/registry.py
def get_registry() -> AgentRegistry:
    """Get the global agent registry instance."""
    return _agent_registry

list_available_agent_types()

List all available agent types.

Source code in src/cs_copilot/agents/registry.py
def list_available_agent_types() -> List[str]:
    """List all available agent types."""
    return _agent_registry.list_agent_types()
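
The module-level functions above delegate to a single registry object. The implementation of AgentRegistry is not shown on this page, so the following is only a sketch of how such a registry could behave, assuming a plain dict from type key to factory callable; MiniRegistry and its register method are hypothetical, and strings stand in for real agents and models.

```python
from typing import Any, Callable, Dict, List


class MiniRegistry:
    """Hypothetical registry; not the real AgentRegistry."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., str]] = {}

    def register(self, agent_type: str, factory: Callable[..., str]) -> None:
        self._factories[agent_type] = factory

    def list_agent_types(self) -> List[str]:
        return sorted(self._factories)

    def create_agent(self, agent_type: str, model: str, **kwargs: Any) -> str:
        if agent_type not in self._factories:
            available = ", ".join(self.list_agent_types())
            raise ValueError(f"Unknown agent type '{agent_type}'. Available: {available}")
        return self._factories[agent_type](model, **kwargs)


_registry = MiniRegistry()
_registry.register("gtm_agent", lambda model, **kw: f"GTM Agent on {model}")
_registry.register("chembl_downloader", lambda model, **kw: f"ChEMBL Downloader on {model}")


def create_agent(agent_type: str, model: str, **kwargs: Any) -> str:
    """Module-level convenience wrapper around the shared registry."""
    return _registry.create_agent(agent_type, model, **kwargs)


def list_available_agent_types() -> List[str]:
    return _registry.list_agent_types()


print(list_available_agent_types())
print(create_agent("gtm_agent", "demo-model"))
```

Checking list_available_agent_types() before calling create_agent avoids the ValueError documented for unregistered types.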

get_cs_copilot_agent_team(model, *, markdown=True, debug_mode=False, show_members_responses=True, enable_memory=True, db_file=None, enable_mlflow_tracking=True)

Create a coordinated team of cs_copilot agents using Agno.

Parameters:

Name | Type | Description | Default
model | Model | Agno Model instance used for team coordination and member agents | required
markdown | bool | Format output in markdown | True
debug_mode | bool | Enable debug logs | False
show_members_responses | bool | Print member responses during coordination | True
enable_memory | bool | Enable persistent memory (default: True). Set to False for isolated testing to prevent state leakage between runs. | True
db_file | str | Custom database file path. If not provided, uses CS_COPILOT_MEMORY_DB. Use unique paths for session isolation in testing. | None
enable_mlflow_tracking | bool | Enable MLflow tracking for agents (default: True). Set to False to disable tracking. | True

Returns:

Name | Type | Description
Team | Team | Configured Cs_copilot team

Raises:

Type | Description
AgentCreationError | If one or more agents fail to initialize

Source code in src/cs_copilot/agents/teams.py
def get_cs_copilot_agent_team(
    model: Model,  # Agno Model instance, e.g. OpenAIChat(...) or Claude(...)
    *,
    markdown: bool = True,
    debug_mode: bool = False,
    show_members_responses: bool = True,
    enable_memory: bool = True,
    db_file: str = None,
    enable_mlflow_tracking: bool = True,
) -> Team:
    """
    Create a coordinated team of cs_copilot agents using Agno.

    Args:
        model: Agno Model instance used for team coordination and member agents
        markdown: Format output in markdown
        debug_mode: Enable debug logs
        show_members_responses: Print member responses during coordination
        enable_memory: Enable persistent memory (default: True). Set to False for
                      isolated testing to prevent state leakage between runs.
        db_file: Custom database file path. If not provided, uses CS_COPILOT_MEMORY_DB.
                Use unique paths for session isolation in testing.
        enable_mlflow_tracking: Enable MLflow tracking for agents (default: True).
                               Set to False to disable tracking.

    Returns:
        Team: Configured Cs_copilot team

    Raises:
        AgentCreationError: If one or more agents fail to initialize
    """
    logger = logging.getLogger(__name__)
    logger.info("Creating Cs_copilot Agent Team")

    # ✅ Single DB handles session storage + user memories in v2.1.x
    # For testing, either disable memory or use unique DB files per session
    db = None
    if enable_memory:
        db = SqliteDb(
            db_file=db_file
            or CS_COPILOT_MEMORY_DB
            # NOTE: CS_COPILOT_MEMORY_TABLE is not required by SqliteDb.
            # Agno manages its own tables for sessions/memories. Kept import for compat.
        )

    # Common agent parameters supplied by the factory
    agent_params = {
        "markdown": markdown,
        "debug_mode": debug_mode,
        "enable_mlflow_tracking": enable_mlflow_tracking,
    }

    # ============================================================================
    # AGENT ARCHITECTURE (5 core agents plus Peptide WAE and SynPlanner)
    # ============================================================================
    # Consolidation history:
    #   MERGED: GTM Optimization + Loading + Density + Activity → GTM Agent
    #   GENERALIZED: GTM Chemotype Analysis → Chemoinformatician (method-agnostic)
    #   MERGED: Autoencoder + Autoencoder GTM Sampling → Autoencoder (mode-based)
    #   ADDED: Report Generator (presentation layer)
    #   REMOVED: Robustness Evaluator (not included in main team, invoked separately)
    # ============================================================================

    # (type_key, human_name)
    agents_config: List[Tuple[str, str]] = [
        ("chembl_downloader", "ChEMBL Downloader"),
        (
            "gtm_agent",
            "GTM Agent",
        ),  # Unified GTM operations (build, load, density, activity, project)
        (
            "chemoinformatician",
            "Chemoinformatician",
        ),  # Comprehensive chemoinformatics (chemotype, clustering, SAR, similarity, QSAR)
        ("report_generator", "Report Generator"),  # Universal presentation layer
        ("autoencoder", "Autoencoder"),  # SMILES molecule generation (LSTM autoencoder)
        ("peptide_wae", "Peptide WAE"),  # Peptide sequence generation (Wasserstein autoencoder)
        ("synplanner", "SynPlanner"),  # Retrosynthetic planning for target molecules
        # Note: Robustness Evaluator excluded from main team (invoked separately for testing)
    ]

    agents = []
    failures = []

    for agent_type, agent_name in agents_config:
        try:
            logger.info("Creating %s agent", agent_name)
            agent = create_agent(agent_type, model=model, **agent_params)
            agents.append(agent)
            logger.info("Successfully created %s agent", agent_name)
        except Exception as e:
            logger.exception("Failed to create %s agent", agent_name)
            failures.append(f"{agent_name}: {e!s}")

    if failures:
        msg = "Agent initialization failures:\n  - " + "\n  - ".join(failures)
        raise AgentCreationError(msg)

    team = Team(
        name="Cs_copilot Team",
        members=agents,
        model=model,
        # ✅ Attach DB directly to the team (persists sessions/history/memories)
        # If enable_memory=False, db=None prevents any persistence
        db=db,
        # Team-level capabilities (disabled when enable_memory=False)
        enable_agentic_memory=enable_memory,  # let the team manage memories
        enable_user_memories=False,  # Disable cross-session user memories for session isolation
        add_history_to_context=enable_memory,  # include recent history in prompts
        num_history_runs=5 if enable_memory else 0,  # 🔧 LIMIT context to last 5 runs
        share_member_interactions=True,  # share member messages across team
        store_history_messages=enable_memory,  # persist message history to DB
        store_tool_messages=enable_memory,  # persist tool results
        store_media=enable_memory,  # persist any media if used
        # Session state (always enabled for within-session data passing)
        add_session_state_to_context=True,
        enable_agentic_state=True,
        # Prompting
        description=(
            "You are an intelligent coordinator orchestrating a team of specialized cheminformatics agents. "
            "Your role is to understand user requests, select the appropriate agent(s) or workflows, "
            "and chain multiple agents when needed to complete complex analyses.\n\n"
            "• ChEMBL Downloader: Download bioactivity data from ChEMBL database\n"
            "• GTM Agent: All GTM operations (build/load/density/activity/project) with smart caching\n"
            "• Chemoinformatician: Downstream analysis (scaffold, SAR, similarity, clustering) - works with GTM output\n"
            "• Report Generator: Universal presentation layer for all analysis types\n"
            "• Autoencoder: Small molecule generation via LSTM autoencoders (SMILES, standalone + GTM-guided)\n"
            "• Peptide WAE: Peptide sequence generation + GTM on latent space + DBAASP antimicrobial activity landscapes\n"
            "• SynPlanner: Retrosynthetic planning for target molecules\n\n"
            "**Molecule vs Peptide Routing**:\n"
            "  - 'peptide', 'amino acid', 'AMP', 'antimicrobial peptide' → Peptide WAE agent\n"
            "  - 'SMILES', 'molecule', 'compound', 'small molecule' → Autoencoder agent\n"
            "  - DBAASP/antimicrobial landscapes → Peptide WAE agent (has GTM tools)\n"
            "  - Unqualified 'generate' → Autoencoder (small molecules)\n\n"
            "When coordinating: (1) Assess if a predefined workflow covers the request, (2) Select and chain "
            "specialized agents for multi-step tasks (GTM → Chemoinformatician → Report Generator is common), "
            "(3) For analysis requests, automatically add Report Generator unless user explicitly requests raw data only, "
            "(4) For ambiguous opening requests, apply the INITIAL CLARIFICATION FLOW (peptides vs molecules, then exploratory vs generative), (5) Synthesize insights from agent outputs into coherent analyses."
        ),
        instructions=AGENT_TEAM_INSTRUCTIONS,
        # UX & observability
        markdown=markdown,
        debug_mode=debug_mode,
        stream_member_events=True,  # stream events from members (Team API)
        show_members_responses=show_members_responses,
    )

    logger.info("Successfully created Cs_copilot Agent Team")
    return team
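
The team builder attempts to create every member before failing, then reports all failures in a single AgentCreationError. A minimal sketch of that aggregation loop, with a stand-in exception class and a hypothetical flaky_create function in place of the real create_agent:

```python
from typing import Callable, List, Tuple


class AgentCreationError(Exception):
    """Stand-in for the exception class documented on this page."""


def create_all(
    agents_config: List[Tuple[str, str]], create: Callable[[str], str]
) -> List[str]:
    """Try every agent, collect failures, and raise once at the end."""
    agents: List[str] = []
    failures: List[str] = []
    for agent_type, agent_name in agents_config:
        try:
            agents.append(create(agent_type))
        except Exception as e:
            failures.append(f"{agent_name}: {e!s}")
    if failures:
        msg = "Agent initialization failures:\n  - " + "\n  - ".join(failures)
        raise AgentCreationError(msg)
    return agents


def flaky_create(agent_type: str) -> str:
    """Hypothetical creator that fails for two of the types."""
    if agent_type in {"gtm_agent", "autoencoder"}:
        raise RuntimeError(f"missing dependency for {agent_type}")
    return agent_type.upper()


config = [
    ("chembl_downloader", "ChEMBL Downloader"),
    ("gtm_agent", "GTM Agent"),
    ("autoencoder", "Autoencoder"),
]

try:
    create_all(config, flaky_create)
    report = ""
except AgentCreationError as err:
    report = str(err)

print(report)
```

Reporting every failure at once means a missing dependency for one agent does not hide problems with the others. For isolated tests, the parameters above suggest calling get_cs_copilot_agent_team with enable_memory=False or a unique db_file per session.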

get_last_agent_reply(agent)

Extract the content of the last message from an agent's session.

Source code in src/cs_copilot/agents/utils.py
def get_last_agent_reply(agent: Agent) -> str:
    """Extract the content of the last message from an agent's session."""
    return copy.deepcopy(agent.get_messages_for_session()[-1].to_dict()["content"])
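
The function indexes the last message directly, so if get_messages_for_session() returns an empty list the [-1] lookup raises IndexError. A minimal sketch of a defensive variant follows, assuming the session messages come back as a list; FakeMessage, FakeAgent, and get_last_agent_reply_or_default are hypothetical stand-ins, not part of cs_copilot or Agno.

```python
import copy
from typing import Any, Dict, List


class FakeMessage:
    """Hypothetical message exposing the to_dict() shape used above."""

    def __init__(self, role: str, content: str) -> None:
        self.role = role
        self.content = content

    def to_dict(self) -> Dict[str, Any]:
        return {"role": self.role, "content": self.content}


class FakeAgent:
    """Hypothetical agent that returns a fixed message list."""

    def __init__(self, messages: List[FakeMessage]) -> None:
        self._messages = messages

    def get_messages_for_session(self) -> List[FakeMessage]:
        return self._messages


def get_last_agent_reply_or_default(agent: FakeAgent, default: str = "") -> str:
    """Return the last message content, or a default for an empty session."""
    messages = agent.get_messages_for_session()
    if not messages:
        return default
    return copy.deepcopy(messages[-1].to_dict()["content"])


chatty = FakeAgent([FakeMessage("user", "hi"), FakeMessage("assistant", "hello")])
silent = FakeAgent([])

print(get_last_agent_reply_or_default(chatty))
print(repr(get_last_agent_reply_or_default(silent, default="(no reply)")))
```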

config

Configuration module for cs_copilot agents. Contains path constants and database configuration settings. Agent instructions and prompts are now in prompts.py.

factories

Agent factory classes for creating specialized cs_copilot agents. Contains the base factory class and all specialized factory implementations.

AgentConfig dataclass

Configuration for creating an agent.

Source code in src/cs_copilot/agents/factories.py
@dataclass
class AgentConfig:
    """Configuration for creating an agent."""

    name: str
    description: str
    tools: List[Any] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)
    session_state: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        """Validate the agent configuration."""
        if not self.name:
            raise ValueError("Agent name cannot be empty")
        if not self.description:
            raise ValueError("Agent description cannot be empty")
        if not isinstance(self.tools, list):
            raise TypeError("Tools must be a list")
        if not isinstance(self.instructions, list):
            raise TypeError("Instructions must be a list")

validate()

Validate the agent configuration.

Source code in src/cs_copilot/agents/factories.py
def validate(self) -> None:
    """Validate the agent configuration."""
    if not self.name:
        raise ValueError("Agent name cannot be empty")
    if not self.description:
        raise ValueError("Agent description cannot be empty")
    if not isinstance(self.tools, list):
        raise TypeError("Tools must be a list")
    if not isinstance(self.instructions, list):
        raise TypeError("Instructions must be a list")

AgentCreationError

Bases: Exception

Exception raised when agent creation fails.

Source code in src/cs_copilot/agents/factories.py
class AgentCreationError(Exception):
    """Exception raised when agent creation fails."""

    pass

BaseAgentFactory

Bases: ABC

Base class for creating agents with common configuration and error handling.

Source code in src/cs_copilot/agents/factories.py
class BaseAgentFactory(ABC):
    """Base class for creating agents with common configuration and error handling."""

    def __init__(self, logger: Optional[logging.Logger] = None):
        self.logger = logger or logging.getLogger(__name__)

    @abstractmethod
    def get_agent_config(self) -> AgentConfig:
        """Return the configuration for this agent type."""
        pass

    def create_agent(
        self,
        model: Model,
        markdown: bool = True,
        debug_mode: bool = False,
        enable_mlflow_tracking: bool = True,
        **kwargs,
    ) -> Agent:
        """Create an agent with error handling and validation.

        Args:
            model: Model to use for the agent
            markdown: Whether to enable markdown formatting
            debug_mode: Whether to enable debug mode
            enable_mlflow_tracking: Whether to enable MLflow tracking for this agent
            **kwargs: Additional keyword arguments for agent creation

        Returns:
            Created agent instance
        """
        try:
            config = self.get_agent_config()
            config.validate()

            # Log agent creation
            self.logger.info(f"Creating agent: {config.name}")

            # Create agent with common parameters
            agent_kwargs = {
                "model": model,
                "name": config.name,
                "description": config.description,
                "tools": config.tools,
                "markdown": markdown,
                "debug_mode": debug_mode,
                "enable_agentic_state": True,
                "add_session_state_to_context": True,
            }

            # Add optional parameters if they exist
            if config.instructions:
                agent_kwargs["instructions"] = config.instructions
            if config.session_state:
                agent_kwargs["session_state"] = config.session_state

            # Add any additional kwargs passed in
            agent_kwargs.update(kwargs)

            agent = Agent(**agent_kwargs)

            # Wrap agent methods with MLflow tracking if enabled
            if enable_mlflow_tracking:
                agent = self._wrap_agent_with_tracking(agent, config)

            self.logger.info(f"Successfully created agent: {config.name}")
            return agent

        except Exception as e:
            self.logger.error(
                f"Failed to create agent {config.name if 'config' in locals() else 'unknown'}: {str(e)}"
            )
            raise AgentCreationError(f"Failed to create agent: {str(e)}") from e

    def _wrap_agent_with_tracking(self, agent: Agent, config: AgentConfig) -> Agent:
        """Wrap agent execution methods with MLflow tracking.

        Args:
            agent: Agent instance to wrap
            config: Agent configuration

        Returns:
            Agent with wrapped methods
        """
        try:
            from cs_copilot.tracking import get_tracker
            from cs_copilot.tracking.utils import build_prompt_signature

            tracker = get_tracker()

            if not tracker.is_enabled():
                return agent

            # Get the agent type from the factory
            agent_type = getattr(self.__class__, "agent_type", None)

            def build_prompt_template() -> Optional[str]:
                sections = []
                if config.description:
                    sections.append(str(config.description).strip())
                if config.instructions:
                    normalized = [
                        str(item).strip() for item in config.instructions if item is not None
                    ]
                    instructions_text = "\n".join(normalized).strip()
                    if instructions_text:
                        sections.append(instructions_text)
                template = "\n\n".join([section for section in sections if section])
                return template.strip() if template else None

            def build_prompt_name() -> str:
                base_name = agent_type or config.name
                safe_name = str(base_name).replace(" ", "_").lower()
                return f"cs_copilot.{safe_name}"

            prompt_template = build_prompt_template()
            prompt_signature = build_prompt_signature(prompt_template)
            prompt_registry_name = build_prompt_name()

            def register_prompt_in_registry():
                if not prompt_template:
                    return
                commit_message = None
                if prompt_signature:
                    commit_message = f"cs_copilot auto update ({prompt_signature.version})"
                prompt_obj = tracker.register_prompt_version(
                    name=prompt_registry_name,
                    template=prompt_template,
                    commit_message=commit_message,
                    tags={
                        "agent_name": agent.name,
                        "agent_type": agent_type or "unknown",
                        "component": "cs_copilot",
                    },
                )
                if prompt_obj:
                    version = getattr(prompt_obj, "version", None)
                    tracker.log_params(
                        {
                            "prompt_registry_name": prompt_registry_name,
                            "prompt_registry_version": str(version) if version is not None else "",
                            "prompt_registry_uri": (
                                f"prompts:/{prompt_registry_name}/{version}"
                                if version is not None
                                else ""
                            ),
                        }
                    )

            # Wrap run() method
            original_run = agent.run

            def tracked_run(*args, **kwargs):
                # Extract prompt from args
                prompt = args[0] if args else kwargs.get("message", "")

                with tracker.track_agent_run(
                    agent_name=agent.name, prompt=str(prompt), agent_type=agent_type
                ):
                    # Log agent configuration
                    tracker.log_params(
                        {
                            "agent_name": agent.name,
                            "agent_type": agent_type or "unknown",
                            "num_tools": len(config.tools),
                            "tools": ",".join([t.__class__.__name__ for t in config.tools]),
                        }
                    )
                    register_prompt_in_registry()

                    result = original_run(*args, **kwargs)

                    # Log result metrics if available
                    if hasattr(result, "content") and result.content:
                        from cs_copilot.tracking.utils import count_tokens

                        tracker.log_metrics(
                            {"output_tokens_estimate": float(count_tokens(result.content))}
                        )

                    return result

            agent.run = tracked_run

            # Wrap arun() method (async version)
            original_arun = agent.arun

            async def tracked_arun(*args, **kwargs):
                # Extract prompt from args
                prompt = args[0] if args else kwargs.get("message", "")

                with tracker.track_agent_run(
                    agent_name=agent.name, prompt=str(prompt), agent_type=agent_type
                ):
                    # Log agent configuration
                    tracker.log_params(
                        {
                            "agent_name": agent.name,
                            "agent_type": agent_type or "unknown",
                            "num_tools": len(config.tools),
                            "tools": ",".join([t.__class__.__name__ for t in config.tools]),
                        }
                    )
                    register_prompt_in_registry()

                    result = await original_arun(*args, **kwargs)

                    # Log result metrics if available
                    if hasattr(result, "content") and result.content:
                        from cs_copilot.tracking.utils import count_tokens

                        tracker.log_metrics(
                            {"output_tokens_estimate": float(count_tokens(result.content))}
                        )

                    return result

            agent.arun = tracked_arun

            self.logger.debug(f"MLflow tracking enabled for agent: {agent.name}")

        except ImportError:
            self.logger.warning(
                "MLflow tracking module not available. Agent will run without tracking."
            )
        except Exception as e:
            self.logger.warning(f"Failed to enable MLflow tracking for agent: {e}")

        return agent
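
The wrapping logic above can be sketched in isolation. It captures the bound `run` and `arun` methods, then assigns closures on the instance that open a tracking context around the original call. `FakeAgent` and `FakeTracker` below are hypothetical stand-ins invented for illustration; they are not the cs_copilot tracker or the real agent class, and the parameter logging and prompt registration steps are left out.

```python
import asyncio
from contextlib import contextmanager


class FakeTracker:
    """Hypothetical stand-in for the tracker returned by get_tracker()."""

    def __init__(self):
        self.events = []

    @contextmanager
    def track_agent_run(self, agent_name, prompt):
        self.events.append(("start", agent_name, prompt))
        try:
            yield
        finally:
            self.events.append(("end", agent_name, prompt))


class FakeAgent:
    """Hypothetical stand-in for an agent exposing run() and arun()."""

    name = "demo_agent"

    def run(self, message):
        return f"sync:{message}"

    async def arun(self, message):
        return f"async:{message}"


def wrap_with_tracking(agent, tracker):
    # Capture the bound methods before they are shadowed.
    original_run = agent.run
    original_arun = agent.arun

    def extract_prompt(args, kwargs):
        # Same lookup as the listing: first positional arg, else "message".
        return str(args[0] if args else kwargs.get("message", ""))

    def tracked_run(*args, **kwargs):
        with tracker.track_agent_run(agent.name, extract_prompt(args, kwargs)):
            return original_run(*args, **kwargs)

    async def tracked_arun(*args, **kwargs):
        with tracker.track_agent_run(agent.name, extract_prompt(args, kwargs)):
            return await original_arun(*args, **kwargs)

    # Instance attributes shadow the class methods for this agent only.
    agent.run = tracked_run
    agent.arun = tracked_arun
    return agent


tracker = FakeTracker()
agent = wrap_with_tracking(FakeAgent(), tracker)
sync_result = agent.run("hello")
async_result = asyncio.run(agent.arun(message="hi"))
```

Because the closures are assigned on the instance, other agents of the same class are unaffected, and the tracking context closes even if the wrapped call raises.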
get_agent_config() abstractmethod

Return the configuration for this agent type.

Source code in src/cs_copilot/agents/factories.py
@abstractmethod
def get_agent_config(self) -> AgentConfig:
    """Return the configuration for this agent type."""
    pass
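
A concrete factory only has to implement this one hook. The sketch below shows the shape of such a subclass under simplified assumptions: `BaseFactory` and `EchoFactory` are hypothetical names, and `AgentConfig` is a local copy of the fields shown in the dataclass listing rather than an import from cs_copilot.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentConfig:
    """Local copy of the fields shown in the AgentConfig listing."""

    name: str
    description: str
    tools: List[Any] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)
    session_state: Dict[str, Any] = field(default_factory=dict)


class BaseFactory(ABC):
    """Hypothetical stand-in for BaseAgentFactory (abstract hook only)."""

    @abstractmethod
    def get_agent_config(self) -> AgentConfig:
        """Return the configuration for this agent type."""


class EchoFactory(BaseFactory):
    """Hypothetical concrete factory, used only for illustration."""

    agent_type = "echo"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="echo_agent",
            description="Repeats the user's message.",
            instructions=["Step 1: Repeat the message verbatim."],
        )


config = EchoFactory().get_agent_config()

try:
    BaseFactory()  # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False
```

Returning a fresh `AgentConfig` on every call, as the factories on this page do, keeps mutable defaults such as `session_state` from being shared between agents.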
create_agent(model, markdown=True, debug_mode=False, enable_mlflow_tracking=True, **kwargs)

Create an agent with error handling and validation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | Model to use for the agent | required |
| markdown | bool | Whether to enable markdown formatting | True |
| debug_mode | bool | Whether to enable debug mode | False |
| enable_mlflow_tracking | bool | Whether to enable MLflow tracking for this agent | True |
| **kwargs | | Additional keyword arguments for agent creation | {} |

Returns:

| Type | Description |
|---|---|
| Agent | Created agent instance |

Source code in src/cs_copilot/agents/factories.py
def create_agent(
    self,
    model: Model,
    markdown: bool = True,
    debug_mode: bool = False,
    enable_mlflow_tracking: bool = True,
    **kwargs,
) -> Agent:
    """Create an agent with error handling and validation.

    Args:
        model: Model to use for the agent
        markdown: Whether to enable markdown formatting
        debug_mode: Whether to enable debug mode
        enable_mlflow_tracking: Whether to enable MLflow tracking for this agent
        **kwargs: Additional keyword arguments for agent creation

    Returns:
        Created agent instance
    """
    try:
        config = self.get_agent_config()
        config.validate()

        # Log agent creation
        self.logger.info(f"Creating agent: {config.name}")

        # Create agent with common parameters
        agent_kwargs = {
            "model": model,
            "name": config.name,
            "description": config.description,
            "tools": config.tools,
            "markdown": markdown,
            "debug_mode": debug_mode,
            "enable_agentic_state": True,
            "add_session_state_to_context": True,
        }

        # Add optional parameters if they exist
        if config.instructions:
            agent_kwargs["instructions"] = config.instructions
        if config.session_state:
            agent_kwargs["session_state"] = config.session_state

        # Add any additional kwargs passed in
        agent_kwargs.update(kwargs)

        agent = Agent(**agent_kwargs)

        # Wrap agent methods with MLflow tracking if enabled
        if enable_mlflow_tracking:
            agent = self._wrap_agent_with_tracking(agent, config)

        self.logger.info(f"Successfully created agent: {config.name}")
        return agent

    except Exception as e:
        self.logger.error(
            f"Failed to create agent {config.name if 'config' in locals() else 'unknown'}: {str(e)}"
        )
        raise AgentCreationError(f"Failed to create agent: {str(e)}") from e
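
One consequence of the listing above is worth spelling out: `agent_kwargs.update(kwargs)` runs last, so any keyword the caller passes replaces the value derived from the config. The sketch below reproduces only the dictionary assembly, with a reduced set of keys; `build_agent_kwargs` is a hypothetical helper, and no `Agent` is constructed.

```python
from types import SimpleNamespace


def build_agent_kwargs(config, model, markdown=True, debug_mode=False, **kwargs):
    """Mirror the dictionary assembly in create_agent (illustrative only)."""
    agent_kwargs = {
        "model": model,
        "name": config.name,
        "description": config.description,
        "tools": config.tools,
        "markdown": markdown,
        "debug_mode": debug_mode,
    }
    # Empty lists and dicts are falsy, so these keys are omitted when unset.
    if config.instructions:
        agent_kwargs["instructions"] = config.instructions
    if config.session_state:
        agent_kwargs["session_state"] = config.session_state
    # Applied last: caller-supplied keys replace anything set above.
    agent_kwargs.update(kwargs)
    return agent_kwargs


config = SimpleNamespace(
    name="demo_agent",
    description="demo",
    tools=[],
    instructions=[],
    session_state={"dataset_path": None},
)
merged = build_agent_kwargs(config, model="placeholder-model", name="renamed_agent")
```

Here `merged["name"]` is `"renamed_agent"`, and `"instructions"` is absent because the config list was empty.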

ChEMBLDownloaderFactory

Bases: BaseAgentFactory

Factory for creating ChEMBL downloader agents.

Source code in src/cs_copilot/agents/factories.py
class ChEMBLDownloaderFactory(BaseAgentFactory):
    """Factory for creating ChEMBL downloader agents."""

    agent_type = "chembl_downloader"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="chembl_agent",
            description="""
            You are a specialized agent for downloading and processing bioactivity data from the ChEMBL database.
            You support multiple backends: local SQL databases (SQLite, PostgreSQL, or MySQL — used when configured) and the ChEMBL REST API.
            The backend is selected automatically — you do not need to worry about which one is active.
            Your role is to query ChEMBL based on user requests (e.g., protein targets, compound types),
            retrieve relevant bioactivity data, validate data quality, and prepare structured datasets
            for downstream cheminformatics analysis.
            """,
            tools=[
                ChemblToolkit(),
                PointerPandasTools(),
                # SessionToolkit(),
            ],
            instructions=CHEMBL_INSTRUCTIONS,
            session_state={
                "data_file_paths": {
                    "dataset_path": None,
                }
            },
        )

ChemoinformaticianFactory

Bases: BaseAgentFactory

Factory for creating comprehensive chemoinformatics analysis agents.

This agent is a versatile chemoinformatician capable of:

- Chemotype Analysis: Scaffold extraction, chemotype profiling, structural diversity
- Clustering: Molecular clustering using various methods (k-means, hierarchical, DBSCAN)
- SAR Analysis: Structure-Activity Relationship analysis, activity cliffs, matched molecular pairs
- Similarity Analysis: Molecular similarity, diversity metrics, nearest neighbor searches

GTM-Integrated Design:

- Primary use case: Downstream analysis after GTM agents (nodes as clusters)
- Also works with ANY data source: t-SNE clusters, user CSVs, ChEMBL families
- Standardized input: DataFrame with 'smiles' + optional 'cluster_id' + optional 'activity'
- Produces structured data output (DataFrames, dicts) - NO report generation
- Report generation handled by separate ReportGeneratorAgent

Tools:

- ChemicalSimilarityToolkit: Fingerprints, similarity metrics, scaffold extraction
- PointerPandasTools: DataFrame operations with S3 support
- GTMToolkit: Access to GTM data (source_mols, node projections)

Source code in src/cs_copilot/agents/factories.py
class ChemoinformaticianFactory(BaseAgentFactory):
    """Factory for creating comprehensive chemoinformatics analysis agents.

    This agent is a versatile chemoinformatician capable of:
    - **Chemotype Analysis**: Scaffold extraction, chemotype profiling, structural diversity
    - **Clustering**: Molecular clustering using various methods (k-means, hierarchical, DBSCAN)
    - **SAR Analysis**: Structure-Activity Relationship analysis, activity cliffs, matched molecular pairs
    - **Similarity Analysis**: Molecular similarity, diversity metrics, nearest neighbor searches

    GTM-Integrated Design:
    - Primary use case: Downstream analysis after GTM agents (nodes as clusters)
    - Also works with ANY data source: t-SNE clusters, user CSVs, ChEMBL families
    - Standardized input: DataFrame with 'smiles' + optional 'cluster_id' + optional 'activity'
    - Produces structured data output (DataFrames, dicts) - NO report generation
    - Report generation handled by separate ReportGeneratorAgent

    Tools:
    - ChemicalSimilarityToolkit: Fingerprints, similarity metrics, scaffold extraction
    - PointerPandasTools: DataFrame operations with S3 support
    - GTMToolkit: Access to GTM data (source_mols, node projections)
    """

    agent_type = "chemoinformatician"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="chemoinformatician_agent",
            description="""
            You are an expert chemoinformatician specialized in computational chemistry and molecular analysis.
            Primary use case: Downstream analysis after GTM operations (analyzing molecules in GTM nodes/clusters).

            **Core Competencies**:

            1. **Chemotype & Scaffold Analysis**:
               - Murcko scaffold decomposition and profiling
               - Scaffold frequency per cluster/node
               - Structural diversity metrics

            2. **Clustering & Chemical Space Analysis**:
               - Works with GTM nodes (primary), or any clustering method
               - Cluster characterization and comparison
               - Chemical space coverage analysis

            3. **SAR Analysis (Structure-Activity Relationships)**:
               - Activity cliff detection
               - Matched molecular pair (MMP) analysis
               - Potency distribution across clusters/scaffolds

            4. **Similarity & Diversity**:
               - Tanimoto/Dice similarity calculations
               - Diversity analysis (Shannon entropy, coverage)
               - Nearest neighbor searches

            **Input Format**:
            - Standardized DataFrame with 'smiles' column
            - Optional 'cluster_id' (from GTM node_index or other clustering)
            - Optional 'activity' (for SAR analysis)
            - Use `normalize_for_analysis` tool to standardize input from any source

            **Output**:
            - Structured data (DataFrames, dicts) saved to session_state
            - NO visualizations (handled by Report Generator)
            """,
            tools=[
                ChemicalSimilarityToolkit(),
                PointerPandasTools(),
                GTMToolkit(),  # Enable GTM data access for downstream analysis
                # Future: QSARToolkit, ClusteringToolkit, DescriptorToolkit
            ],
            instructions=CHEMOINFORMATICIAN_INSTRUCTIONS,
            session_state={
                # Normalized input data for analysis
                "analysis_input": None,  # DataFrame with standardized columns (smiles, cluster_id?, activity?)
                # Chemotype/Scaffold Analysis
                "chemotype_analysis": {
                    "scaffolds_per_cluster": None,
                    "similarity_matrix": None,
                    "summary_stats": None,
                    "metadata": {},
                    "output_paths": {
                        "scaffolds_csv": None,
                        "similarity_csv": None,
                    },
                },
                # Clustering Analysis
                "clustering_results": {
                    "cluster_assignments": None,  # DataFrame with cluster_id column
                    "cluster_metrics": None,  # Silhouette, Davies-Bouldin, etc.
                    "cluster_centroids": None,
                    "method": None,  # 'gtm', 'kmeans', 'dbscan', 'hierarchical', etc.
                },
                # SAR Analysis
                "sar_analysis": {
                    "activity_cliffs": None,  # Detected activity cliffs
                    "mmps": None,  # Matched molecular pairs
                    "series_analysis": None,  # Chemical series breakdown
                    "potency_trends": None,
                },
                # Similarity/Diversity
                "similarity_analysis": {
                    "similarity_matrix": None,
                    "diversity_metrics": None,
                    "nearest_neighbors": None,
                },
                # General data paths
                "analysis_outputs": {
                    "primary_data_csv": None,
                    "supplementary_data": [],
                },
            },
        )
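
The agent description mentions Tanimoto/Dice similarity and Shannon entropy as a diversity measure. The sketch below shows the textbook definitions of those three quantities in plain Python, assuming fingerprints are represented as sets of on-bit indices and scaffolds as strings. It does not show how `ChemicalSimilarityToolkit` computes them, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
import math
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient of two sets of on-bit indices."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a label distribution, e.g. scaffolds."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy fingerprints: 2 shared bits, 5 distinct bits in total.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
tanimoto_ab = tanimoto(fp_a, fp_b)  # 2 / 5
dice_ab = dice(fp_a, fp_b)          # 4 / 7

# Two scaffolds, equally frequent: entropy is exactly 1 bit.
scaffolds = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "c1ccncc1"]
entropy_bits = shannon_entropy(scaffolds)
```

Higher entropy means molecules are spread more evenly across scaffolds; a cluster dominated by one scaffold scores near zero.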

AutoencoderFactory

Bases: BaseAgentFactory

Factory for creating autoencoder-based molecular generation agents.

Supports two modes:

- Standalone: Encode/decode SMILES, sample from latent space, interpolate, explore neighborhoods
- GTM-guided: Combine GTM maps with autoencoders for targeted molecular generation from specific map regions (by density, activity, or coordinates)

Enhanced with GTM cache awareness to avoid redundant GTM loading when working with GTM Agent in the same session.

Source code in src/cs_copilot/agents/factories.py
class AutoencoderFactory(BaseAgentFactory):
    """Factory for creating autoencoder-based molecular generation agents.

    Supports two modes:
    - **Standalone**: Encode/decode SMILES, sample from latent space, interpolate, explore neighborhoods
    - **GTM-guided**: Combine GTM maps with autoencoders for targeted molecular generation from
      specific map regions (by density, activity, or coordinates)

    Enhanced with GTM cache awareness to avoid redundant GTM loading when working with GTM Agent
    in the same session.
    """

    agent_type = "autoencoder"
    aliases = ["autoencoder_gtm_sampling"]

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="autoencoder_agent",
            description="""
            You are a scientific assistant specialized in molecular generation and analysis using LSTM
            autoencoders. You operate in two modes:

            **Standalone mode**: Encode molecules to latent representations, generate novel structures
            by sampling from latent space, interpolate between molecules, and explore chemical space
            neighborhoods to understand structure-property relationships.

            **GTM-guided mode**: Combine Generative Topographic Mapping (GTM) with autoencoders for
            targeted molecular generation. Sample molecules from specific regions of GTM maps
            (by density, activity, or coordinates), encode them to latent space, and generate novel
            molecules by exploring neighborhoods around regions of interest.

            **Cache-Aware**: Automatically reuses GTM models cached by GTM Agent in session_state,
            eliminating redundant loading for multi-step workflows (e.g., GTM density → sampling).
            """,
            tools=[
                AutoencoderToolkit(),
                GTMToolkit(),
                ChemicalSimilarityToolkit(),
                PointerPandasTools(),
            ],
            instructions=AUTOENCODER_INSTRUCTIONS,
            session_state={
                "data_file_paths": {
                    "dataset_path": None,
                },
            },
        )
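
Interpolating between molecules, as described for standalone mode, amounts to walking between two latent vectors and decoding each intermediate point. The sketch below covers only the vector arithmetic and assumes straight-line (linear) interpolation; `AutoencoderToolkit` may use a different scheme, and the encode and decode steps are not shown.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points from z_start to z_end (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        path.append([a + (b - a) * t for a, b in zip(z_start, z_end)])
    return path


# Toy 2-dimensional latent vectors; real latent spaces are much larger.
path = interpolate([0.0, 2.0], [1.0, 0.0], steps=3)
```

Each point in `path` would then be passed to the decoder to obtain a candidate structure.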

GTMAgentFactory

Bases: BaseAgentFactory

Factory for creating unified GTM agents (consolidates optimization, loading, density, activity, projection).

This factory creates a single agent that handles all GTM-related operations via mode-based dispatch:

- optimize: Build and optimize new GTM maps
- load: Load existing GTM models from S3/local/HuggingFace
- density: Analyze compound distributions and neighborhood preservation
- activity: Create activity-density landscapes for SAR analysis
- project: Project external datasets onto existing GTM maps

Features smart caching to avoid redundant GTM loading across operations.

Source code in src/cs_copilot/agents/factories.py
class GTMAgentFactory(BaseAgentFactory):
    """Factory for creating unified GTM agents (consolidates optimization, loading, density, activity, projection).

    This factory creates a single agent that handles all GTM-related operations via mode-based dispatch:
    - optimize: Build and optimize new GTM maps
    - load: Load existing GTM models from S3/local/HuggingFace
    - density: Analyze compound distributions and neighborhood preservation
    - activity: Create activity-density landscapes for SAR analysis
    - project: Project external datasets onto existing GTM maps

    Features smart caching to avoid redundant GTM loading across operations.
    """

    agent_type = "gtm_agent"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="gtm_agent",
            description="""
            You are a unified scientific assistant for all GTM (Generative Topographic Mapping) operations.
            Your role is to handle building, loading, and analyzing GTM-based maps of chemical space.

            Capabilities:
            - **Optimize**: Build and optimize new GTM maps from chemical datasets
            - **Load**: Retrieve existing GTM models from storage (S3, local, HuggingFace)
            - **Density**: Analyze compound distributions and neighborhood preservation on GTM maps
            - **Activity**: Create activity-density landscapes for structure-activity relationship (SAR) exploration
            - **Project**: Map external datasets onto existing GTM maps for comparative analysis

            Key Features:
            - Smart caching: Automatically reuses loaded GTM models across operations within the same session
            - Mode-based dispatch: Detects operation type from user requests and executes appropriate workflow
            - Session state integration: Shares GTM data with other agents
            """,
            tools=[
                GTMToolkit(),
                PointerPandasTools(),
                save_gtm_plot,
            ],
            instructions=GTM_AGENT_INSTRUCTIONS,
            session_state={
                "gtm_cache": {
                    "model": None,
                    "dataset": None,
                    "metadata": {},
                },
                "gtm_file_paths": {
                    "gtm_path": None,
                    "dataset_path": None,
                    "gtm_plot_path": None,
                },
                "analysis_results": {
                    "density_csv": None,
                    "activity_csv": None,
                    "projection_csv": None,
                    "plots": [],
                },
                "landscape_files": {  # Backward compatibility
                    "landscape_data_csv": None,
                    "landscape_plot": None,
                },
            },
        )
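
The smart caching described above can be pictured as a lookup against the `gtm_cache` entry in `session_state` before any load is attempted. The sketch below is one possible shape for that check, not the toolkit's implementation: `get_or_load_gtm`, the `loader` callable, and the `source_path` metadata key are all hypothetical.

```python
def get_or_load_gtm(session_state, gtm_path, loader):
    """Return (model, cache_hit), loading only when the cache cannot serve it."""
    cache = session_state.setdefault(
        "gtm_cache", {"model": None, "dataset": None, "metadata": {}}
    )
    # Reuse the cached model only if it came from the same path.
    if cache["model"] is not None and cache["metadata"].get("source_path") == gtm_path:
        return cache["model"], True
    model = loader(gtm_path)
    cache["model"] = model
    cache["metadata"] = {"source_path": gtm_path}
    return model, False


calls = []


def fake_loader(path):
    """Hypothetical loader that records each call instead of reading a file."""
    calls.append(path)
    return {"loaded_from": path}


state = {"gtm_cache": {"model": None, "dataset": None, "metadata": {}}}
first, first_hit = get_or_load_gtm(state, "maps/a.pkl", fake_loader)
second, second_hit = get_or_load_gtm(state, "maps/a.pkl", fake_loader)
```

The second call returns the same object without invoking the loader. Keying on the source path prevents a map loaded for one dataset from being served for another.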

ReportGeneratorFactory

Bases: BaseAgentFactory

Factory for creating report generation agents.

This agent handles ALL report generation and visualization across different analysis types:

- Chemotype analysis reports
- GTM density reports
- GTM activity/SAR reports
- Autoencoder generation reports
- Combined/custom reports

Separation of Concerns: Analysis agents produce structured data, Report Generator handles presentation.

This architecture enables:

- Consistent formatting across all report types
- Reusable visualization patterns
- Easy updates to report styles (change in one place)
- Clean separation: data processing vs visualization/formatting

Source code in src/cs_copilot/agents/factories.py
class ReportGeneratorFactory(BaseAgentFactory):
    """Factory for creating report generation agents.

    This agent handles ALL report generation and visualization across different analysis types:
    - Chemotype analysis reports
    - GTM density reports
    - GTM activity/SAR reports
    - Autoencoder generation reports
    - Combined/custom reports

    **Separation of Concerns**: Analysis agents produce structured data, Report Generator handles presentation.

    This architecture enables:
    - Consistent formatting across all report types
    - Reusable visualization patterns
    - Easy updates to report styles (change in one place)
    - Clean separation: data processing vs visualization/formatting
    """

    agent_type = "report_generator"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="report_generator_agent",
            description="""
            You are a specialized agent for generating reports and visualizations from analysis results.
            Your role is to create well-formatted, comprehensive reports that present scientific findings
            in a clear, actionable manner.

            Capabilities:
            - **Multi-format reports**: Generate markdown, HTML, or text reports
            - **Visualization creation**: Produce publication-quality plots and charts
            - **Template-based formatting**: Consistent structure across different report types
            - **Flexible input handling**: Works with results from any analysis agent

            Report Types Supported:
            - Chemotype analysis: Scaffold distributions, similarity heatmaps, cluster comparisons
            - GTM density: Density overlays, neighborhood preservation, coverage analysis
            - GTM activity/SAR: Activity landscapes, potency hotspots, structure-activity insights
            - Autoencoder generation: Generated molecules, diversity metrics, similarity analyses
            - Combined reports: Multi-analysis integration with comparative visualizations

            Key Features:
            - **Analysis-agnostic**: Reads structured data from session_state (any analysis type)
            - **Consistent formatting**: Uniform markdown structure, color schemes, plot styles
            - **Embedded visualizations**: Inline plots in reports for easy consumption
            - **Actionable insights**: Highlights key findings and provides recommendations

            This separation enables analysis agents to focus on data processing while Report Generator
            handles all presentation concerns.
            """,
            tools=[
                PointerPandasTools(),
                save_gtm_plot,  # For GTM-specific visualizations
                # Plotting libraries (matplotlib, seaborn) available via Python environment
            ],
            instructions=REPORT_GENERATOR_INSTRUCTIONS,
            session_state={
                "report_outputs": {
                    "report_path": None,
                    "plots": [],
                    "report_type": None,
                },
            },
        )
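
The analysis-agnostic design means the report step only needs to know which `session_state` section to read. The sketch below illustrates that idea with a hypothetical `render_markdown_report` helper that turns the populated entries of one section into a markdown list. The real agent is driven by `REPORT_GENERATOR_INSTRUCTIONS` and its tools, not by this function.

```python
def render_markdown_report(session_state, section, title):
    """Render the non-empty entries of one session_state section as markdown."""
    results = session_state.get(section) or {}
    lines = [f"# {title}", ""]
    # Entries still set to None have not been produced by an analysis agent.
    populated = {key: value for key, value in results.items() if value is not None}
    if not populated:
        lines.append("_No results available._")
    for key, value in populated.items():
        lines.append(f"- **{key}**: {value}")
    return "\n".join(lines)


state = {
    "similarity_analysis": {
        "similarity_matrix": None,
        "diversity_metrics": {"shannon_entropy": 1.0},
        "nearest_neighbors": None,
    }
}
report = render_markdown_report(state, "similarity_analysis", "Similarity report")
empty_report = render_markdown_report(state, "sar_analysis", "SAR report")
```

Sections that no agent has filled in yield a report with an explicit "no results" line rather than an error.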

RobustnessEvaluationFactory

Bases: BaseAgentFactory

Factory for creating robustness test evaluation agents.

Source code in src/cs_copilot/agents/factories.py
class RobustnessEvaluationFactory(BaseAgentFactory):
    """Factory for creating robustness test evaluation agents."""

    agent_type = "robustness_evaluation"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="robustness_evaluator_agent",
            description="""
            You are a specialized agent for analyzing robustness test results. Your role is to load
            test results from S3 or local storage, analyze metrics and score distributions, identify
            patterns and issues in failing prompts, and generate actionable recommendations for
            improving system robustness across prompt variations.
            """,
            tools=[
                PointerPandasTools(),
                RobustnessAnalysisToolkit(),
            ],
            instructions=ROBUSTNESS_EVALUATION_INSTRUCTIONS,
            session_state={
                "loaded_results": {},
                "analysis_outputs": {
                    "summary_report": None,
                    "comparison_report": None,
                    "recommendations": None,
                },
            },
        )

SynPlannerFactory

Bases: BaseAgentFactory

Factory for creating retrosynthetic planning agents powered by SynPlanner.

This agent wraps the official SynPlanner package to perform retrosynthetic analysis on target molecules. It accepts SMILES strings or molecule names, resolves them to canonical SMILES (via PubChem / RDKit), runs the MCTS-based retrosynthesis search, and returns structured route descriptions with optional SVG/PNG visualizations.

Source code in src/cs_copilot/agents/factories.py
class SynPlannerFactory(BaseAgentFactory):
    """Factory for creating retrosynthetic planning agents powered by SynPlanner.

    This agent wraps the official SynPlanner package to perform retrosynthetic
    analysis on target molecules.  It accepts SMILES strings or molecule names,
    resolves them to canonical SMILES (via PubChem / RDKit), runs the MCTS-based
    retrosynthesis search, and returns structured route descriptions with
    optional SVG/PNG visualizations.
    """

    agent_type = "synplanner"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="synplanner_agent",
            description=(
                "You are a retrosynthetic planning assistant powered by SynPlanner. "
                "Given a target molecule (as a SMILES string or common name), you "
                "identify the canonical structure, run the SynPlanner retrosynthesis "
                "engine, and present the best synthetic routes with step-by-step "
                "descriptions and visualizations."
            ),
            tools=[SynPlannerToolkit()],
            instructions=SYNPLANNER_INSTRUCTIONS,
        )

PeptideWAEFactory

Bases: BaseAgentFactory

Factory for creating peptide WAE-based sequence generation agents.

This agent uses a Wasserstein Autoencoder (WAE) trained on peptide data to encode, decode, sample, and interpolate amino acid sequences. The WAE can generate any peptides; activity landscape data comes from DBAASP (antimicrobial peptides specifically).

Key capabilities:

- Encoding: Convert peptide sequences to 100-dimensional latent vectors
- Decoding: Generate peptide sequences from latent vectors
- Sampling: Generate novel peptides from Gaussian prior
- Interpolation: Smooth transitions between peptides in latent space
- Neighborhood exploration: Generate peptide analogs
- GTM integration: Train GTMs on latent space, create activity landscapes
- Activity landscapes: Use DBAASP data (specific to antimicrobial peptides)

Input format: Space-separated single-letter amino acid codes
Example: "M L L L L L A L A L L A L L L A L L L"

Source code in src/cs_copilot/agents/factories.py
class PeptideWAEFactory(BaseAgentFactory):
    """Factory for creating peptide WAE-based sequence generation agents.

    This agent uses a Wasserstein Autoencoder (WAE) trained on peptide data
    to encode, decode, sample, and interpolate amino acid sequences. The WAE
    can generate any peptides; activity landscape data comes from DBAASP
    (antimicrobial peptides specifically).

    Key capabilities:
    - **Encoding**: Convert peptide sequences to 100-dimensional latent vectors
    - **Decoding**: Generate peptide sequences from latent vectors
    - **Sampling**: Generate novel peptides from Gaussian prior
    - **Interpolation**: Smooth transitions between peptides in latent space
    - **Neighborhood exploration**: Generate peptide analogs
    - **GTM integration**: Train GTMs on latent space, create activity landscapes
    - **Activity landscapes**: Use DBAASP data (specific to antimicrobial peptides)

    Input format: Space-separated single-letter amino acid codes
    Example: "M L L L L L A L A L L A L L L A L L L"
    """

    agent_type = "peptide_wae"

    def get_agent_config(self) -> AgentConfig:
        return AgentConfig(
            name="peptide_wae_agent",
            description="""
            You are a scientific assistant specialized in peptide sequence generation and analysis
            using Wasserstein Autoencoders (WAE). You work with amino acid sequences represented
            as space-separated single-letter codes (e.g., "M L L L L L A L A L L A L L L").

            **Core Capabilities**:
            - **Encode peptides**: Convert peptide sequences to 100-dimensional latent representations
            - **Decode latent vectors**: Generate peptide sequences from latent space
            - **Sample new peptides**: Generate novel peptides from Gaussian prior
            - **Interpolate**: Create smooth transitions between peptides in latent space
            - **Explore neighborhoods**: Generate peptide analogs with controlled diversity
            - **GTM on latent space**: Train Generative Topographic Maps on WAE latent vectors
            - **Activity landscapes**: Create per-organism antimicrobial activity landscapes from DBAASP data

            **Key Parameters**:
            - Max sequence length: 25 amino acids
            - Latent dimension: 100
            - Supported amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, U, V, W, Y, Z

            **Use Cases**:
            - Generate novel peptide candidates (any peptides)
            - Generate novel antimicrobial peptide candidates
            - Explore peptide chemical space around active sequences
            - Interpolate between peptides to understand structure-activity relationships
            - Test sequence reconstruction for model quality assessment
            - Build GTM maps of peptide latent space for visualization
            - Analyze antimicrobial activity patterns using DBAASP data on GTM landscapes
            - Sample peptides from specific GTM regions and decode to sequences

            **Note**: Activity landscapes use DBAASP data and are specific to antimicrobial peptides.
            """,
            tools=[
                PeptideWAEToolkit(),
                GTMToolkit(),
                PointerPandasTools(),
                save_gtm_plot,
            ],
            instructions=PEPTIDE_WAE_INSTRUCTIONS,
        )
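
The input format and key parameters in the description above can be checked before a sequence reaches the toolkit. The following is a minimal sketch, not part of cs_copilot: the helper name `to_wae_input` and both constants are hypothetical, and only the alphabet, the 25-residue limit, and the space-separated layout come from the text above.

```python
# Values copied from the agent description; names are hypothetical.
SUPPORTED_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTUVWYZ")
MAX_SEQUENCE_LENGTH = 25


def to_wae_input(sequence: str) -> str:
    """Return a peptide as space-separated single-letter codes."""
    # Accept both compact ("MKL") and already spaced ("M K L") input.
    residues = [ch for ch in sequence.upper() if not ch.isspace()]
    if not residues:
        raise ValueError("Empty peptide sequence")
    if len(residues) > MAX_SEQUENCE_LENGTH:
        raise ValueError(
            f"Sequence has {len(residues)} residues; maximum is {MAX_SEQUENCE_LENGTH}"
        )
    unknown = sorted(set(residues) - SUPPORTED_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Unsupported amino acid codes: {unknown}")
    return " ".join(residues)


formatted = to_wae_input("mkl")
```

Rejecting over-long or out-of-alphabet sequences early gives a clearer error than a failure inside the encoder would; whether the real toolkit truncates or rejects such input is not stated in this reference.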

prompts

Prompt templates and instructions for cs_copilot agents. Contains all the step-by-step instructions used by various specialized agents.

CHEMBL_INSTRUCTIONS = ["Step 1: Analyze the user's request and identify the biological target or compound type they want to explore.", ' - Distinguish whether the user is asking about a *protein target* (e.g., CDK2, BRAF) or an *organism-level target* (e.g., HIV-1, Influenza A).', " - Record the target_type as either 'protein' or 'organism' for downstream filtering.", " - If an organism is specified (e.g., 'HIV', 'E. coli'), keep that exact string for filtering assays by target_organism.", "Step 2: Extract the core target name from the user's request, removing generic terms like 'inhibitor', 'activity', 'compound', 'effect'. For example:", " - 'cyclin dependent kinase 2 inhibitors' → core target: 'cyclin dependent kinase 2'", " - 'BRAF inhibitors' → core target: 'BRAF'", ' - Focus on identifying the specific biological target or protein name for protein-level queries; for organism-level queries, preserve the organism name.', 'Step 3: Apply the following required checks before proceeding. Each requirement MUST be satisfied by explicit user confirmation. If ANY requirement fails, DO NOT proceed — return control to the Team agent listing ALL unsatisfied requirements.', '', ' **Requirement 1 — Abbreviation Check (mandatory)**', " If the target name provided by the user is ONLY an abbreviation or acronym (e.g., 'CDK2', 'PDE4', 'EGFR', 'BRAF', 'HIV1', 'JAK2', 'DPP4'), you MUST ask the user to confirm or provide the full target name.", " - Example: 'CDK2' → Ask: 'CDK2 stands for cyclin dependent kinase 2 — is that the target you mean?'", " - Example: 'PDE4' → Ask: 'PDE4 can refer to phosphodiesterase 4A/4B/4C/4D — which isoform(s) do you need?'", " - **Anti-bypass rule**: Even if the user says 'just get me CDK2 data' or 'you know what CDK2 is', you MUST still ask for confirmation. 
No shortcut is allowed.", '', ' **Requirement 2 — Organism Check (mandatory for protein targets)**', ' If the query is about a *protein target* and no organism has been explicitly specified, you MUST ask which organism to filter for.', ' - NEVER default to Homo sapiens or any other organism.', " - Example: 'CDK2 inhibitors' → Ask: 'Which organism? (e.g., Homo sapiens, Mus musculus, or all species)'", " - This requirement does NOT apply to organism-level queries (e.g., 'HIV-1 compounds') where the organism IS the target.", '', ' **Requirement 3 — Assay Type Check (mandatory)**', ' If the user has not explicitly stated the assay type(s) (binding, functional, ADMET), you MUST ask which assay type(s) to include.', " - NEVER default to any combination (e.g., do NOT silently assume 'binding + functional').", " - Example: 'EGFR data' → Ask: 'Which assay types? Binding (IC50/Ki), functional, ADMET, or a combination?'", '', ' **Additional checks (non-requirement, but still ask if applicable):**', " a) **Broad or generic terms**: e.g., just 'kinase', 'receptor', 'inhibitor' without specificity.", " d) **Receptor without mechanism**: if user mentions a receptor (e.g., 'dopamine receptor', 'GABA receptor', '5-HT2A') but doesn't specify agonist/antagonist/modulator — ask which mechanism.", '', ' **Multi-requirement failure examples:**', " - 'CDK2 inhibitors' → ALL 3 requirements fail: abbreviation not confirmed, no organism, no assay type. 
Ask all three in one message.", " - 'EGFR data for human' → Requirements 1 and 3 fail: abbreviation not confirmed, no assay type.", " - 'Download binding data for cyclin dependent kinase 2' → Requirements 2 fails: no organism specified.", " - 'Get me CDK2 binding data for Homo sapiens' → Requirements 1 fails: abbreviation not confirmed.", '', ' **Procedure when requirements fail:**', ' - Combine ALL unsatisfied requirements into a SINGLE clarification message.', " - Return control to the Team agent with: 'The query needs clarification: [list all unsatisfied requirements]. Returning to Team agent for user input.'", " - Once the user provides clarification, pass the details to fetch_compounds using the appropriate parameters: 'query' for target name, 'organism' for species filter, 'assay_types' for data type, or 'mechanism' for agonist/antagonist/modulator.", ' - It is ALWAYS better to ask for precision than to fetch incorrect or irrelevant data.', 'Step 4: Use the `convert_to_chembl_query` tool with the identified core target to generate multiple keyword variations for ChEMBL search.', ' - The tool will generate abbreviations, shortened forms, and full names (typically 3-5 keywords)', ' - The tool handles greek character replacement and ensures keywords are suitable for ChEMBL assay description searches', " - Example: For 'cyclin dependent kinase 2', the tool will generate: 'cdk2, kinase 2, cyclin dependent kinase 2'", ' - When the query is organism-level, include the organism name as one of the keywords to ensure assays for that organism are retrieved.', " - Determine assay type preferences: map 'binding' → B, 'functional' → F, 'ADMET' → A. The user MUST have explicitly specified assay type(s) before reaching this step (enforced by GATE 3 above). NEVER apply a default.", "Step 5: Use the `fetch_compounds` tool with multiple keywords (comma-separated, e.g., 'cdk2, kinase 2, cyclin dependent kinase 2') to download bioactivity data from ChEMBL. 
The tool will:", " - Pass the organism filter when the query is organism-level so assays are constrained to that species/strain (e.g., organism='HIV-1').", " - Pass the assay_types filter (e.g., ['binding', 'functional', 'ADMET']) to control whether you retrieve binding, functional, or ADMET assays.", " - Pass the mechanism filter if the user specified a mechanism of action (e.g., mechanism='agonist' for agonist assays, mechanism='antagonist' for antagonist assays). This filters assays by their description to keep only those matching the specified mechanism.", ' - Search for assays related to each keyword separately', ' - Retrieve activity data for all found assays', ' - Merge all results and automatically remove duplicates', 'Step 6: After successful data fetch, verify the dataset quality:', ' - Check that SMILES structures were successfully mapped', ' - Verify the dataset contains expected columns (activity_id, molecule_chembl_id, canonical_smiles, standard_value, etc.)', ' - Confirm the data covers the intended biological target', ' - Confirm the assay_type column contains the requested assay categories (B=Binding, F=Functional, A=ADMET)', ' - Note the number of duplicates that were removed during merging', 'Step 7: Use the `describe_dataset` tool to generate comprehensive statistics for the downloaded dataset.', 'Step 8: Report key metrics to the user:', ' - Total number of compounds and activities', ' - Range of activity values (IC50, Ki, etc.)', ' - Data quality indicators (missing values, duplicates)', ' - Target coverage and assay diversity', 'Step 9: If data fetch fails, troubleshoot systematically:', ' - Check if the query terms are too specific (try broader terms)', ' - Verify ChEMBL connectivity using ping functionality (works for all SQL and REST backends)', ' - Consider alternative search strategies (different resource types: activity, molecule, assay)', ' - Handle rate limiting by implementing appropriate delays', 'Step 10: When working with 
dataframes, use inplace operations to modify dataframes (e.g., `df.drop(..., inplace=True)`) to avoid printing entire dataframes to the console, which can cause context window issues. Avoid operations like `df.assign()` that return new dataframes and may be printed.', 'Step 11: Prepare a comma-separated .csv file with all fetched data including molecules, their activities, and parent dataset information in the respective columns.', "Step 12: Save the dataframe to a .csv file. The `fetch_compounds` tool automatically stores the dataset path in session_state['data_file_paths']['dataset_path'].", 'Step 13: Confirm the dataset is properly saved to S3 storage with a descriptive filename.', 'Step 14: Provide the user with the exact filename and path for future reference.'] + HANDLING_NEW_FILES_INSTRUCTIONS module-attribute
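
The three mandatory checks in Step 3 are enforced through the prompt, not through code. As an illustration only, the same logic could be sketched as a function that lists every unsatisfied requirement so they can be asked in a single message. All names here are hypothetical, and `looks_like_abbreviation` is a crude heuristic invented for the sketch; only the requirement rules and the assay-type mapping come from the instructions above.

```python
from typing import List, Optional

# Mapping stated in Step 4 of the instructions.
ASSAY_TYPE_CODES = {"binding": "B", "functional": "F", "ADMET": "A"}


def looks_like_abbreviation(target: str) -> bool:
    """Hypothetical heuristic: one short token with an uppercase letter or digit."""
    token = target.strip()
    return (
        " " not in token
        and len(token) <= 6
        and any(ch.isupper() or ch.isdigit() for ch in token)
    )


def unsatisfied_requirements(
    target: str,
    target_type: str,
    organism: Optional[str] = None,
    assay_types: Optional[List[str]] = None,
    abbreviation_confirmed: bool = False,
) -> List[str]:
    """Return all unmet requirements so they can be combined into one question."""
    missing = []
    if looks_like_abbreviation(target) and not abbreviation_confirmed:
        missing.append("Requirement 1: confirm the full target name")
    # Requirement 2 applies to protein targets only, never to organism-level queries.
    if target_type == "protein" and not organism:
        missing.append("Requirement 2: specify the organism")
    if not assay_types:
        missing.append("Requirement 3: specify the assay type(s)")
    return missing


# 'CDK2 inhibitors' with nothing else specified fails all three checks.
all_missing = unsatisfied_requirements("CDK2", "protein")
```

Returning a list instead of stopping at the first failure mirrors the "combine ALL unsatisfied requirements into a SINGLE clarification message" rule.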

Expert chemoinformatician capable of:

- Chemotype/scaffold analysis
- Clustering and chemical space mapping
- SAR analysis
- Similarity and diversity analysis
- QSAR modeling (extensible)

Method-agnostic, modular, and extensible design.

GTM_AGENT_INSTRUCTIONS = ['Step 1: Determine the operation mode based on user request and context:', " - **optimize mode**: User asks to 'build', 'create', 'optimize', or 'train' a GTM map", " - **load mode**: User asks to 'load', 'retrieve', or 'use existing' GTM model", " - **density mode**: User asks about 'density', 'distribution', 'neighborhood preservation', or 'analyze GTM map'", " - **activity mode**: User asks about 'activity landscape', 'SAR', 'potency zones', or 'active regions'", " - **project mode**: User asks to 'project', 'map new data', or 'apply GTM to external dataset'", " - If unclear, default to load mode and check for cached GTM in session_state['gtm_cache']", 'Step 2: Check for cached GTM before loading from files:', " - If session_state['gtm_cache'] exists and is not None:", " - Verify cache validity: check metadata['dataset_shape'] matches current dataset if applicable", ' - If valid, reuse cached GTM model and dataset (skip loading)', ' - If invalid (dataset changed), proceed to load/optimize as needed', ' - If no cache exists, proceed with mode-specific loading', 'Step 3: Execute mode-specific workflow:', '', '**OPTIMIZE MODE**:', " 1. Load chemical data from session_state['data_file_paths']['dataset_path'] or user-provided path", ' 2. Verify SMILES column exists using available tools', ' 3. Run gtm_optimization with appropriate k_hit values (try multiple if not specified)', ' 4. For each k_hit: fit GTM, save with save_gtm_and_data, evaluate smoothness', ' 5. Select best GTM map (smoothest or user-specified criteria)', ' 6. **Cache the result**:', " - session_state['gtm_cache'] = {", " 'model': gtm_model_object,", " 'dataset': preprocessed_dataframe,", " 'metadata': {", " 'path': gtm_file_path,", " 'created_at': timestamp,", " 'dataset_shape': df.shape,", " 'source': 'optimize',", " 'optimization_metrics': {...}", ' }', ' }', " 7. Update session_state['gtm_file_paths'] = {'gtm_path': ..., 'dataset_path': ..., 'gtm_plot_path': ...}", ' 8. 
Generate and save GTM plot using save_gtm_plot', '', '**LOAD MODE**:', ' 1. Resolve GTM model path (priority order):', ' - User-provided explicit path', " - session_state['gtm_file_paths']['gtm_path']", ' - S3 assets bucket (via path resolver)', ' - Default model repository', ' - HuggingFace mirror (last resort)', ' 2. Load GTM using load_gtm_model_only(gtm_file)', ' 3. Determine associated dataset:', ' - If user provides dataset path → use it', ' - If dataset file next to GTM → use it', " - If session_state['data_file_paths']['dataset_path'] exists → use it", ' - Otherwise, ask user which dataset to use', ' 4. When dataset available, call load_and_prep_data(dataset, gtm_model) to build projections', " 5. **Cache the result** (same structure as optimize mode, source='load')", " 6. Update session_state['gtm_file_paths']", '', '**DENSITY MODE**:', " 1. **Check cache first**: If session_state['gtm_cache'] exists, reuse it (skip loading)", ' 2. If no cache, load GTM and dataset via load mode workflow above', ' 3. Call load_gtm_get_density_matrix(dataset_file, gtm_file) to get density and neighborhood tables', " 4. Analyze density table ['x', 'y', 'nodes', 'filtered_density']:", ' - Calculate max/min/mean/median density', ' - Identify top 5 densest nodes and top 5 sparsest nodes', ' - Describe spatial patterns (compass/quadrant terms)', " 5. Analyze neighborhood preservation table ['x', 'y', 'nodes', 'density', 'neighborhood score']:", ' - Report preservation quality metrics', ' - Identify well-preserved vs poorly-preserved regions', ' 6. Save density results:', " - session_state['analysis_results']['density_csv'] = density_csv_path", " - session_state['analysis_results']['plots'].append(density_plot_path)", ' 7. Generate visualization with density overlay using save_gtm_plot', ' 8. Provide 3-bullet executive summary', '', '**ACTIVITY MODE**:', " 1. **Check cache first**: If session_state['gtm_cache'] exists, reuse it", ' 2. 
If no cache, load GTM and dataset via load mode workflow', f' 3. Call create_activity_landscapes(dataset, gtm_model, node_threshold={DEFAULT_NODE_THRESHOLD}, chart_width={DEFAULT_CHART_WIDTH}, chart_height={DEFAULT_CHART_HEIGHT})', ' 4. The tool returns file prefix and creates CSV + PNG/HTML files', ' 5. Save paths to session_state:', " - session_state['landscape_files']['landscape_data_csv'] = csv_path", " - session_state['landscape_files']['landscape_plot'] = plot_path", " - session_state['analysis_results']['activity_csv'] = csv_path # Also save here for consistency", " 6. Load landscape CSV and analyze ['x', 'y', 'nodes', 'filtered_reg_density']:", ' - Global stats: max, min, mean, median of reg_density', ' - Identify top 5 active nodes and top 5 inactive nodes', " - Describe spatial trends (compass directions, e.g., 'dense band across center')", ' 7. Cross-layer analysis:', ' - Do density hotspots coincide with potent areas?', ' - Flag anomalies (dense but low-quality, sparse but high-activity)', ' - Identify gaps/unreliable regions (zero density, NaNs)', ' 8. Provide 3-bullet SAR takeaway with actionable recommendations', ' 9. Show activity landscape plot in output (markdown format, blue gradient: dark=high activity, light=low)', '', '**PROJECT MODE**:', " 1. **Check cache first**: If session_state['gtm_cache'] exists, reuse GTM model", ' 2. If no cache, load GTM via load mode workflow', ' 3. Get external dataset path from user or session_state', ' 4. Call project_data_on_gtm(external_dataset, gtm_model):', ' - Tool validates SMILES, checks compatibility', ' - Returns preprocessed CSV with GTM projections', ' 5. Analyze projection results:', ' - Compare distribution of external data vs original training data', ' - Identify covered vs novel regions', ' - Calculate distribution statistics', ' 6. Generate comparative visualization using save_gtm_plot(preprocessed_csv, gtm_model)', ' 7. 
Save projection results:', " - session_state['analysis_results']['projection_csv'] = projection_csv_path", " - session_state['analysis_results']['plots'].append(projection_plot_path)", ' 8. Provide summary of projection quality and coverage', 'Step 4: Final output formatting:', ' - Return concise summary of operation performed', ' - Include key metrics and file paths', ' - For plots, show using markdown format: ![Caption](path)', ' - Highlight any warnings or anomalies discovered', ' - Confirm session_state updates for downstream agents', 'Step 5: Error handling:', ' - If GTM loading fails, check path resolver and suggest alternatives', ' - If dataset incompatible, explain mismatch (e.g., wrong SMILES column)', ' - If cache invalid, automatically reload from files', ' - For optimization failures, suggest trying different k_hit values', 'Step 6: Latent-space GTM operations (for peptide WAE latent vectors):', ' - The GTM can also operate on pre-computed latent vectors from WAE models (not just SMILES descriptors)', " - When user mentions 'peptide GTM', 'latent space GTM', or 'WAE GTM', delegate to the Peptide WAE agent", ' - The Peptide WAE agent has GTM tools and handles the full peptide+GTM workflow', ' - For SMILES-based GTM: use standard descriptor workflow (this agent)', ' - For peptide latent-space GTM: route to Peptide WAE agent'] + HANDLING_NEW_FILES_INSTRUCTIONS module-attribute
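
The cache layout and validity rule from Steps 2 and 3 can be sketched in plain Python. This is an assumption-laden illustration: the helper names `make_gtm_cache` and `cache_is_valid` are hypothetical, and the model and dataset are placeholder objects. Only the dictionary keys and the "dataset_shape must match" rule are taken from the instructions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


def make_gtm_cache(
    model: Any, dataset: Any, dataset_shape: Tuple[int, int], path: str, source: str
) -> Dict[str, Any]:
    """Build a cache entry with the keys listed under OPTIMIZE MODE, item 6."""
    return {
        "model": model,
        "dataset": dataset,
        "metadata": {
            "path": path,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "dataset_shape": dataset_shape,
            "source": source,  # 'optimize' or 'load'
        },
    }


def cache_is_valid(
    session_state: Dict[str, Any], current_shape: Optional[Tuple[int, int]] = None
) -> bool:
    """Apply the Step 2 check: the cache exists and its shape matches the dataset."""
    cache = session_state.get("gtm_cache")
    if cache is None:
        return False
    if current_shape is None:
        return True  # nothing to compare against ("if applicable")
    return cache["metadata"].get("dataset_shape") == current_shape


session_state: Dict[str, Any] = {}
session_state["gtm_cache"] = make_gtm_cache(
    model=object(), dataset=object(), dataset_shape=(500, 12),
    path="gtm/model.pkl", source="optimize",  # placeholder path
)
```

Comparing only the shape is a cheap check; it will not notice a dataset whose rows changed but whose dimensions did not.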

Universal presentation layer for all analysis types. Generates markdown reports and visualizations from structured analysis results.

registry

Agent registry system for managing and creating agents dynamically. Provides the main public API for agent creation.

AgentRegistry

Registry for managing agent factories and configurations.

Source code in src/cs_copilot/agents/registry.py
class AgentRegistry:
    """Registry for managing agent factories and configurations."""

    def __init__(self):
        self._factories: Dict[str, BaseAgentFactory] = {}
        self._aliases: Dict[str, str] = {}  # Alias -> canonical agent_type mapping
        self.logger = logging.getLogger(__name__)

    def register(
        self, agent_type: str, factory: BaseAgentFactory, aliases: List[str] = None
    ) -> None:
        """Register an agent factory with optional aliases.

        Args:
            agent_type: Canonical agent type name
            factory: Factory instance
            aliases: Optional list of alias names that redirect to this agent
        """
        if agent_type in self._factories:
            self.logger.warning(f"Overriding existing factory for agent type: {agent_type}")
        self._factories[agent_type] = factory
        self.logger.info(f"Registered factory for agent type: {agent_type}")

        # Register aliases
        if aliases:
            for alias in aliases:
                self._aliases[alias] = agent_type
                self.logger.info(f"Registered alias '{alias}' -> '{agent_type}'")

    def create_agent(self, agent_type: str, model: Model, **kwargs) -> Agent:
        """Create an agent by type (supports aliases).

        Args:
            agent_type: Agent type or alias
            model: LLM model instance
            **kwargs: Additional arguments for agent creation

        Returns:
            Agent instance

        Raises:
            ValueError: If agent_type/alias is not registered
        """
        # Resolve alias if provided
        resolved_type = self._aliases.get(agent_type, agent_type)

        if resolved_type not in self._factories:
            available_types = list(self._factories.keys())
            available_aliases = list(self._aliases.keys())
            raise ValueError(
                f"Unknown agent type: {agent_type}. "
                f"Available types: {available_types}. "
                f"Available aliases: {available_aliases}"
            )

        factory = self._factories[resolved_type]
        return factory.create_agent(model, **kwargs)

    def list_agent_types(self) -> List[str]:
        """List all registered agent types."""
        return list(self._factories.keys())

    def auto_register(self) -> None:
        """Automatically discover and register all available factories."""
        for _, cls in inspect.getmembers(factory_module, inspect.isclass):
            if (
                issubclass(cls, BaseAgentFactory)
                and cls is not BaseAgentFactory
                and getattr(cls, "agent_type", None)
            ):
                # Get optional aliases from factory class
                aliases = getattr(cls, "aliases", None)
                self.register(cls.agent_type, cls(), aliases=aliases)
register(agent_type, factory, aliases=None)

Register an agent factory with optional aliases.

Parameters:

Name | Type | Description | Default
agent_type | str | Canonical agent type name | required
factory | BaseAgentFactory | Factory instance | required
aliases | List[str] | Optional list of alias names that redirect to this agent | None
Source code in src/cs_copilot/agents/registry.py
def register(
    self, agent_type: str, factory: BaseAgentFactory, aliases: List[str] = None
) -> None:
    """Register an agent factory with optional aliases.

    Args:
        agent_type: Canonical agent type name
        factory: Factory instance
        aliases: Optional list of alias names that redirect to this agent
    """
    if agent_type in self._factories:
        self.logger.warning(f"Overriding existing factory for agent type: {agent_type}")
    self._factories[agent_type] = factory
    self.logger.info(f"Registered factory for agent type: {agent_type}")

    # Register aliases
    if aliases:
        for alias in aliases:
            self._aliases[alias] = agent_type
            self.logger.info(f"Registered alias '{alias}' -> '{agent_type}'")
create_agent(agent_type, model, **kwargs)

Create an agent by type (supports aliases).

Parameters:

Name | Type | Description | Default
agent_type | str | Agent type or alias | required
model | Model | LLM model instance | required
**kwargs | | Additional arguments for agent creation | {}

Returns:

Type | Description
Agent | Agent instance

Raises:

Type | Description
ValueError | If agent_type/alias is not registered

Source code in src/cs_copilot/agents/registry.py
def create_agent(self, agent_type: str, model: Model, **kwargs) -> Agent:
    """Create an agent by type (supports aliases).

    Args:
        agent_type: Agent type or alias
        model: LLM model instance
        **kwargs: Additional arguments for agent creation

    Returns:
        Agent instance

    Raises:
        ValueError: If agent_type/alias is not registered
    """
    # Resolve alias if provided
    resolved_type = self._aliases.get(agent_type, agent_type)

    if resolved_type not in self._factories:
        available_types = list(self._factories.keys())
        available_aliases = list(self._aliases.keys())
        raise ValueError(
            f"Unknown agent type: {agent_type}. "
            f"Available types: {available_types}. "
            f"Available aliases: {available_aliases}"
        )

    factory = self._factories[resolved_type]
    return factory.create_agent(model, **kwargs)
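
The alias lookup in `create_agent` can be illustrated with a reduced stand-in. The class below is a simplified sketch, not the real `AgentRegistry`: `MiniRegistry`, its `resolve` method, and the alias `"gtm"` are invented for the example, while the lookup order (alias table first, then the factory table) follows the source above.

```python
from typing import Dict, List, Optional


class MiniRegistry:
    """Simplified stand-in for AgentRegistry; factories are arbitrary objects here."""

    def __init__(self) -> None:
        self._factories: Dict[str, object] = {}
        self._aliases: Dict[str, str] = {}

    def register(
        self, agent_type: str, factory: object, aliases: Optional[List[str]] = None
    ) -> None:
        self._factories[agent_type] = factory
        for alias in aliases or []:
            self._aliases[alias] = agent_type

    def resolve(self, name: str) -> str:
        # Same order as create_agent: try the alias table, fall back to the name itself.
        resolved = self._aliases.get(name, name)
        if resolved not in self._factories:
            raise ValueError(f"Unknown agent type: {name}")
        return resolved


registry = MiniRegistry()
registry.register("gtm_agent", factory=object(), aliases=["gtm"])  # "gtm" is a made-up alias
canonical = registry.resolve("gtm")
```

Because the alias table is consulted first, an alias that happens to equal a canonical type name would shadow that type. Aliases also do not appear in `list_agent_types()`, which returns only the keys of the factory table.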
list_agent_types()

List all registered agent types.

Source code in src/cs_copilot/agents/registry.py
def list_agent_types(self) -> List[str]:
    """List all registered agent types."""
    return list(self._factories.keys())
auto_register()

Automatically discover and register all available factories.

Source code in src/cs_copilot/agents/registry.py
def auto_register(self) -> None:
    """Automatically discover and register all available factories."""
    for _, cls in inspect.getmembers(factory_module, inspect.isclass):
        if (
            issubclass(cls, BaseAgentFactory)
            and cls is not BaseAgentFactory
            and getattr(cls, "agent_type", None)
        ):
            # Get optional aliases from factory class
            aliases = getattr(cls, "aliases", None)
            self.register(cls.agent_type, cls(), aliases=aliases)
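
The discovery filter in `auto_register` relies on `inspect.getmembers` with the `inspect.isclass` predicate. The sketch below reproduces that filter against a throwaway module object. Every class here is a hypothetical stand-in; in particular, whether the real `BaseAgentFactory` defines `agent_type = None` by default is an assumption.

```python
import inspect
import types
from typing import Dict, List, Optional


class BaseFactory:
    agent_type = None  # assumed default; subclasses override it


class GtmFactory(BaseFactory):
    agent_type = "gtm_agent"
    aliases = ["gtm"]  # made-up alias


class HelperFactory(BaseFactory):
    pass  # no agent_type of its own, so discovery skips it


# Build a throwaway module to scan, standing in for factory_module.
factory_module = types.ModuleType("demo_factories")
for _cls in (BaseFactory, GtmFactory, HelperFactory):
    setattr(factory_module, _cls.__name__, _cls)


def discover(module: types.ModuleType) -> Dict[str, Optional[List[str]]]:
    """Map each discovered agent_type to its optional alias list."""
    found: Dict[str, Optional[List[str]]] = {}
    for _, cls in inspect.getmembers(module, inspect.isclass):
        if (
            issubclass(cls, BaseFactory)
            and cls is not BaseFactory
            and getattr(cls, "agent_type", None)
        ):
            found[cls.agent_type] = getattr(cls, "aliases", None)
    return found


discovered = discover(factory_module)
```

The truthiness test on `agent_type` is what keeps abstract or helper subclasses out of the registry: a subclass must set a non-empty `agent_type` to be registered.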

create_agent(agent_type, model, **kwargs)

Create an agent by type using the global registry.

Parameters:

Name | Type | Description | Default
agent_type | str | The type of agent to create | required
model | Model | The language model to use | required
**kwargs | | Additional arguments passed to the agent factory | {}

Returns:

Name | Type | Description
Agent | Agent | The created agent instance

Raises:

Type | Description
ValueError | If agent_type is not registered
AgentCreationError | If agent creation fails

Source code in src/cs_copilot/agents/registry.py
def create_agent(agent_type: str, model: Model, **kwargs) -> Agent:
    """
    Create an agent by type using the global registry.

    Args:
        agent_type: The type of agent to create
        model: The language model to use
        **kwargs: Additional arguments passed to the agent factory

    Returns:
        Agent: The created agent instance

    Raises:
        ValueError: If agent_type is not registered
        AgentCreationError: If agent creation fails
    """
    return _agent_registry.create_agent(agent_type, model, **kwargs)

list_available_agent_types()

List all available agent types.

Source code in src/cs_copilot/agents/registry.py
def list_available_agent_types() -> List[str]:
    """List all available agent types."""
    return _agent_registry.list_agent_types()

get_registry()

Get the global agent registry instance.

Source code in src/cs_copilot/agents/registry.py
def get_registry() -> AgentRegistry:
    """Get the global agent registry instance."""
    return _agent_registry

teams

Team coordination functionality for multi-agent workflows.

get_cs_copilot_agent_team(model, *, markdown=True, debug_mode=False, show_members_responses=True, enable_memory=True, db_file=None, enable_mlflow_tracking=True)

Create a coordinated team of cs_copilot agents using Agno.

Parameters:

Name | Type | Description | Default
model | Model | Agno Model instance used for team coordination and member agents | required
markdown | bool | Format output in markdown | True
debug_mode | bool | Enable debug logs | False
show_members_responses | bool | Print member responses during coordination | True
enable_memory | bool | Enable persistent memory (default: True). Set to False for isolated testing to prevent state leakage between runs. | True
db_file | str | Custom database file path. If not provided, uses CS_COPILOT_MEMORY_DB. Use unique paths for session isolation in testing. | None
enable_mlflow_tracking | bool | Enable MLflow tracking for agents (default: True). Set to False to disable tracking. | True

Returns:

Name | Type | Description
Team | Team | Configured Cs_copilot team

Raises:

Type | Description
AgentCreationError | If one or more agents fail to initialize
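
The `db_file` description recommends unique paths for session isolation in tests. One possible way to produce such paths, using only the standard library, is sketched below. The helper `unique_db_file` is hypothetical and not part of cs_copilot; the team call is shown as a comment because it needs the package and a model instance.

```python
import tempfile
import uuid
from pathlib import Path


def unique_db_file(prefix: str = "cs_copilot_test") -> str:
    """Return a fresh SQLite file path inside a newly created temporary directory."""
    directory = Path(tempfile.mkdtemp(prefix=f"{prefix}_"))
    return str(directory / f"{uuid.uuid4().hex}.db")


db_a = unique_db_file()
db_b = unique_db_file()

# Hypothetical usage, one database per test session:
# team = get_cs_copilot_agent_team(model, db_file=db_a)
```

The alternative named in the parameter table, `enable_memory=False`, avoids the database entirely and is simpler when a test does not need history or memories at all.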

Source code in src/cs_copilot/agents/teams.py
def get_cs_copilot_agent_team(
    model: Model,  # Agno Model instance, e.g. OpenAIChat(...) or Claude(...)
    *,
    markdown: bool = True,
    debug_mode: bool = False,
    show_members_responses: bool = True,
    enable_memory: bool = True,
    db_file: str = None,
    enable_mlflow_tracking: bool = True,
) -> Team:
    """
    Create a coordinated team of cs_copilot agents using Agno.

    Args:
        model: Agno Model instance used for team coordination and member agents
        markdown: Format output in markdown
        debug_mode: Enable debug logs
        show_members_responses: Print member responses during coordination
        enable_memory: Enable persistent memory (default: True). Set to False for
                      isolated testing to prevent state leakage between runs.
        db_file: Custom database file path. If not provided, uses CS_COPILOT_MEMORY_DB.
                Use unique paths for session isolation in testing.
        enable_mlflow_tracking: Enable MLflow tracking for agents (default: True).
                               Set to False to disable tracking.

    Returns:
        Team: Configured Cs_copilot team

    Raises:
        AgentCreationError: If one or more agents fail to initialize
    """
    logger = logging.getLogger(__name__)
    logger.info("Creating Cs_copilot Agent Team")

    # ✅ Single DB handles session storage + user memories in v2.1.x
    # For testing, either disable memory or use unique DB files per session
    db = None
    if enable_memory:
        db = SqliteDb(
            db_file=db_file
            or CS_COPILOT_MEMORY_DB
            # NOTE: CS_COPILOT_MEMORY_TABLE is not required by SqliteDb.
            # Agno manages its own tables for sessions/memories. Kept import for compat.
        )

    # Common agent parameters supplied by the factory
    agent_params = {
        "markdown": markdown,
        "debug_mode": debug_mode,
        "enable_mlflow_tracking": enable_mlflow_tracking,
    }

    # ============================================================================
    # 5-AGENT CORE ARCHITECTURE (Peptide WAE and SynPlanner are also added to the team below)
    # ============================================================================
    # Consolidation history:
    #   MERGED: GTM Optimization + Loading + Density + Activity → GTM Agent
    #   GENERALIZED: GTM Chemotype Analysis → Chemoinformatician (method-agnostic)
    #   MERGED: Autoencoder + Autoencoder GTM Sampling → Autoencoder (mode-based)
    #   ADDED: Report Generator (presentation layer)
    #   REMOVED: Robustness Evaluator (not included in main team, invoked separately)
    # ============================================================================

    # (type_key, human_name)
    agents_config: List[Tuple[str, str]] = [
        ("chembl_downloader", "ChEMBL Downloader"),
        (
            "gtm_agent",
            "GTM Agent",
        ),  # Unified GTM operations (build, load, density, activity, project)
        (
            "chemoinformatician",
            "Chemoinformatician",
        ),  # Comprehensive chemoinformatics (chemotype, clustering, SAR, similarity, QSAR)
        ("report_generator", "Report Generator"),  # Universal presentation layer
        ("autoencoder", "Autoencoder"),  # SMILES molecule generation (LSTM autoencoder)
        ("peptide_wae", "Peptide WAE"),  # Peptide sequence generation (Wasserstein autoencoder)
        ("synplanner", "SynPlanner"),  # Retrosynthetic planning for target molecules
        # Note: Robustness Evaluator excluded from main team (invoked separately for testing)
    ]

    agents = []
    failures = []

    for agent_type, agent_name in agents_config:
        try:
            logger.info("Creating %s agent", agent_name)
            agent = create_agent(agent_type, model=model, **agent_params)
            agents.append(agent)
            logger.info("Successfully created %s agent", agent_name)
        except Exception as e:
            logger.exception("Failed to create %s agent", agent_name)
            failures.append(f"{agent_name}: {e!s}")

    if failures:
        msg = "Agent initialization failures:\n  - " + "\n  - ".join(failures)
        raise AgentCreationError(msg)

    team = Team(
        name="Cs_copilot Team",
        members=agents,
        model=model,
        # ✅ Attach DB directly to the team (persists sessions/history/memories)
        # If enable_memory=False, db=None prevents any persistence
        db=db,
        # Team-level capabilities (disabled when enable_memory=False)
        enable_agentic_memory=enable_memory,  # let the team manage memories
        enable_user_memories=False,  # Disable cross-session user memories for session isolation
        add_history_to_context=enable_memory,  # include recent history in prompts
        num_history_runs=5 if enable_memory else 0,  # 🔧 LIMIT context to last 5 runs
        share_member_interactions=True,  # share member messages across team
        store_history_messages=enable_memory,  # persist message history to DB
        store_tool_messages=enable_memory,  # persist tool results
        store_media=enable_memory,  # persist any media if used
        # Session state (always enabled for within-session data passing)
        add_session_state_to_context=True,
        enable_agentic_state=True,
        # Prompting
        description=(
            "You are an intelligent coordinator orchestrating a team of specialized cheminformatics agents. "
            "Your role is to understand user requests, select the appropriate agent(s) or workflows, "
            "and chain multiple agents when needed to complete complex analyses.\n\n"
            "• ChEMBL Downloader: Download bioactivity data from ChEMBL database\n"
            "• GTM Agent: All GTM operations (build/load/density/activity/project) with smart caching\n"
            "• Chemoinformatician: Downstream analysis (scaffold, SAR, similarity, clustering) - works with GTM output\n"
            "• Report Generator: Universal presentation layer for all analysis types\n"
            "• Autoencoder: Small molecule generation via LSTM autoencoders (SMILES, standalone + GTM-guided)\n"
            "• Peptide WAE: Peptide sequence generation + GTM on latent space + DBAASP antimicrobial activity landscapes\n"
            "• SynPlanner: Retrosynthetic planning for target molecules\n\n"
            "**Molecule vs Peptide Routing**:\n"
            "  - 'peptide', 'amino acid', 'AMP', 'antimicrobial peptide' → Peptide WAE agent\n"
            "  - 'SMILES', 'molecule', 'compound', 'small molecule' → Autoencoder agent\n"
            "  - DBAASP/antimicrobial landscapes → Peptide WAE agent (has GTM tools)\n"
            "  - Unqualified 'generate' → Autoencoder (small molecules)\n\n"
            "When coordinating: (1) Assess if a predefined workflow covers the request, (2) Select and chain "
            "specialized agents for multi-step tasks (GTM → Chemoinformatician → Report Generator is common), "
            "(3) For analysis requests, automatically add Report Generator unless user explicitly requests raw data only, "
            "(4) For ambiguous opening requests, apply the INITIAL CLARIFICATION FLOW "
            "(peptides vs molecules, then exploratory vs generative), "
            "(5) Synthesize insights from agent outputs into coherent analyses."
        ),
        instructions=AGENT_TEAM_INSTRUCTIONS,
        # UX & observability
        markdown=markdown,
        debug_mode=debug_mode,
        stream_member_events=True,  # stream events from members (Team API)
        show_members_responses=show_members_responses,
    )

    logger.info("Successfully created Cs_copilot Agent Team")
    return team
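
The listing above ties most persistence-related `Team` arguments to the single `enable_memory` flag, while the session-state arguments stay on regardless. The sketch below restates that mapping as a plain dictionary so the two modes can be compared side by side. `memory_team_kwargs` is a hypothetical helper, not part of the package, and setting `db` to `None` when memory is disabled is an assumption based on the comment in the listing rather than on code shown here.

```python
from typing import Any, Dict, Optional


def memory_team_kwargs(enable_memory: bool, db: Optional[Any] = None) -> Dict[str, Any]:
    """Hypothetical helper: gather the memory-related Team keyword arguments.

    The keys and values mirror the Team(...) call in the listing above.
    """
    return {
        # Assumption: db is None when memory is disabled, as the listing's comment suggests.
        "db": db if enable_memory else None,
        # Persistence and history follow enable_memory.
        "enable_agentic_memory": enable_memory,
        "enable_user_memories": False,  # always off, for session isolation
        "add_history_to_context": enable_memory,
        "num_history_runs": 5 if enable_memory else 0,
        "store_history_messages": enable_memory,
        "store_tool_messages": enable_memory,
        "store_media": enable_memory,
        # Session state is always on, for within-session data passing.
        "add_session_state_to_context": True,
        "enable_agentic_state": True,
    }


if __name__ == "__main__":
    # Print both modes for a side-by-side comparison.
    for flag in (True, False):
        print(flag, memory_team_kwargs(flag, db="example-db"))
```

With `enable_memory=False`, every persistence flag is off and `num_history_runs` is 0, but agents can still pass data to each other through session state within a single session.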

utils

Utility functions for agent operations.

get_last_agent_reply(agent)

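Return a deep copy of the content of the last message in an agent's session. The function indexes the final message directly, so it expects the session to contain at least one message.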

Source code in src/cs_copilot/agents/utils.py
def get_last_agent_reply(agent: Agent) -> str:
    """Extract the content of the last message from an agent's session."""
    return copy.deepcopy(agent.get_messages_for_session()[-1].to_dict()["content"])
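
Because `get_last_agent_reply` indexes `[-1]` directly, callers that may see an empty session might prefer a guarded variant. The following is a minimal sketch under stated assumptions: `safe_last_reply`, `StubMessage`, and `StubAgent` are hypothetical names introduced only for illustration, and the stubs imitate just the two methods used in the listing (`get_messages_for_session()` and `to_dict()`).

```python
import copy
from typing import Any, List, Optional


class StubMessage:
    """Hypothetical stand-in for a session message."""

    def __init__(self, content: Any) -> None:
        self.content = content

    def to_dict(self) -> dict:
        return {"content": self.content}


class StubAgent:
    """Hypothetical stand-in exposing only get_messages_for_session()."""

    def __init__(self, messages: List[StubMessage]) -> None:
        self._messages = list(messages)

    def get_messages_for_session(self) -> List[StubMessage]:
        return self._messages


def safe_last_reply(agent: Any, default: Optional[Any] = None) -> Any:
    """Return a copy of the last message's content, or `default` if there is none."""
    messages = agent.get_messages_for_session()
    if not messages:
        # An empty session would make messages[-1] raise IndexError.
        return default
    return copy.deepcopy(messages[-1].to_dict().get("content", default))


if __name__ == "__main__":
    agent = StubAgent([StubMessage("first"), StubMessage("second")])
    print(safe_last_reply(agent))
    print(safe_last_reply(StubAgent([]), default=""))
```

Returning a default instead of raising is a design choice for interactive callers. Code that treats a missing reply as a genuine error may be better served by the original behaviour.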