Agents API Reference
cs_copilot.agents
Cs_copilot Agents Package
This package provides a comprehensive system for creating and managing AI agents specialized in cheminformatics tasks.
Public API:
Agent Creation (Recommended):
- create_agent(agent_type, model, **kwargs) - Create agents by type
- list_available_agent_types() - List all available agent types
Team Coordination:
- get_cs_copilot_agent_team(model, **kwargs) - Multi-agent team with intelligent coordination
Utilities:
- get_last_agent_reply(agent) - Extract last message from agent
Available Agent Types (5-Agent Architecture):
Core Agents:
- "chembl_downloader" - Download and process bioactivity data from ChEMBL database
- "gtm_agent" - Unified GTM operations (build, load, density, activity, project) with smart caching
- "chemoinformatician" - Comprehensive chemoinformatics (chemotype, clustering, SAR, similarity, QSAR)
- "report_generator" - Universal presentation layer for all analysis types
- "autoencoder" - Molecular generation via LSTM autoencoders (standalone + GTM-guided)
Testing/Evaluation:
- "robustness_evaluation" - Analyze robustness test results and metrics
Agent Capabilities Breakdown:
Chemoinformatician (Most Versatile):
- Chemotype/Scaffold Analysis: Extract and analyze molecular frameworks
- Clustering: Group molecules by structural similarity (k-means, hierarchical, DBSCAN)
- SAR Analysis: Structure-Activity Relationships, activity cliffs, matched molecular pairs
- Similarity/Diversity: Molecular similarity, diversity metrics, nearest neighbors
- QSAR Modeling: Extensible framework for predictive modeling (tools to be added)
AgentConfig
dataclass
Configuration for creating an agent.
Source code in src/cs_copilot/agents/factories.py
validate()
Validate the agent configuration.
Source code in src/cs_copilot/agents/factories.py
AgentCreationError
BaseAgentFactory
Bases: ABC
Base class for creating agents with common configuration and error handling.
Source code in src/cs_copilot/agents/factories.py
get_agent_config()
abstractmethod
create_agent(model, markdown=True, debug_mode=False, enable_mlflow_tracking=True, **kwargs)
Create an agent with error handling and validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | Model to use for the agent | required |
| markdown | bool | Whether to enable markdown formatting | True |
| debug_mode | bool | Whether to enable debug mode | False |
| enable_mlflow_tracking | bool | Whether to enable MLflow tracking for this agent | True |
| **kwargs | | Additional keyword arguments for agent creation | {} |
Returns:
| Type | Description |
|---|---|
| Agent | Created agent instance |
Source code in src/cs_copilot/agents/factories.py
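The pattern described above, an abstract get_agent_config() combined with a create_agent() that validates the configuration and wraps failures, can be sketched roughly as follows. This is an illustrative stand-in and not the code in factories.py: the Demo* class names, the config fields (name, instructions), and the dict returned in place of a real Agent are all assumptions made so the example runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class DemoCreationError(Exception):
    """Hypothetical stand-in for AgentCreationError."""


@dataclass
class DemoConfig:
    """Hypothetical config; the real AgentConfig fields are not listed in this reference."""

    name: str
    instructions: list = field(default_factory=list)

    def validate(self) -> None:
        # Reject configurations that cannot identify an agent.
        if not self.name.strip():
            raise ValueError("agent name must be non-empty")


class DemoFactory(ABC):
    @abstractmethod
    def get_agent_config(self) -> DemoConfig:
        """Subclasses describe the agent they build."""

    def create_agent(self, model, markdown=True, debug_mode=False, **kwargs):
        try:
            config = self.get_agent_config()
            config.validate()
            # A real factory would construct an Agent here; a dict keeps the sketch self-contained.
            return {"name": config.name, "model": model, "markdown": markdown,
                    "debug_mode": debug_mode, **kwargs}
        except Exception as exc:
            # Surface every failure as one error type, keeping the original cause attached.
            raise DemoCreationError(f"could not create agent: {exc}") from exc


class EchoFactory(DemoFactory):
    def get_agent_config(self) -> DemoConfig:
        return DemoConfig(name="echo", instructions=["Repeat the input."])


class BrokenFactory(DemoFactory):
    def get_agent_config(self) -> DemoConfig:
        return DemoConfig(name="  ")  # fails validate()
```

In this sketch a subclass only supplies the configuration, while the base class owns validation and error wrapping, so callers need to handle a single exception type.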
create_agent(agent_type, model, **kwargs)
Create an agent by type using the global registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agent_type | str | The type of agent to create | required |
| model | Model | The language model to use | required |
| **kwargs | | Additional arguments passed to the agent factory | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| Agent | Agent | The created agent instance |
Raises:
| Type | Description |
|---|---|
| ValueError | If agent_type is not registered |
| AgentCreationError | If agent creation fails |
Source code in src/cs_copilot/agents/registry.py
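A hedged usage sketch for create_agent: it turns the documented ValueError for an unregistered type into a message that lists the valid types, and lets AgentCreationError propagate unchanged. The helper name create_agent_safely and its injectable create_agent/list_types parameters are hypothetical and exist only so the example can run without cs_copilot installed. It also assumes list_available_agent_types() returns an iterable of strings, which this reference does not state.

```python
def create_agent_safely(agent_type, model, *, create_agent=None, list_types=None, **kwargs):
    """Create an agent, reporting unknown types together with the valid ones."""
    if create_agent is None or list_types is None:
        # Lazy import: only needed when the real package functions are used.
        from cs_copilot import agents

        create_agent = create_agent or agents.create_agent
        list_types = list_types or agents.list_available_agent_types
    try:
        return create_agent(agent_type, model, **kwargs)
    except ValueError as exc:
        # Documented failure mode: agent_type is not registered.
        available = ", ".join(sorted(list_types()))
        raise ValueError(
            f"Unknown agent type {agent_type!r}; available: {available}"
        ) from exc
```

With the package installed, a call such as create_agent_safely("gtm_agent", model) would use the real functions; model must be a model instance accepted by create_agent.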
get_registry()
list_available_agent_types()
get_cs_copilot_agent_team(model, *, markdown=True, debug_mode=False, show_members_responses=True, enable_memory=True, db_file=None, enable_mlflow_tracking=True)
Create a coordinated team of cs_copilot agents using Agno.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Model | Agno Model instance used for team coordination and member agents | required |
| markdown | bool | Format output in markdown | True |
| debug_mode | bool | Enable debug logs | False |
| show_members_responses | bool | Print member responses during coordination | True |
| enable_memory | bool | Enable persistent memory (default: True). Set to False for isolated testing to prevent state leakage between runs. | True |
| db_file | str | Custom database file path. If not provided, uses CS_COPILOT_MEMORY_DB. Use unique paths for session isolation in testing. | None |
| enable_mlflow_tracking | bool | Enable MLflow tracking for agents (default: True). Set to False to disable tracking. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| Team | Team | Configured Cs_copilot team |
Raises:
| Type | Description |
|---|---|
| AgentCreationError | If one or more agents fail to initialize |
Source code in src/cs_copilot/agents/teams.py
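The db_file description above recommends unique paths for session isolation in tests. One way to do that is sketched below, under stated assumptions: the helper names, the file name pattern, and the .db suffix are hypothetical. Only the keyword names come from the signature documented above.

```python
import tempfile
import uuid
from pathlib import Path


def isolated_team_kwargs(base_dir=None):
    """Build keyword arguments that give each test run its own memory database."""
    base = Path(base_dir) if base_dir else Path(tempfile.gettempdir())
    # The file name pattern and .db suffix are assumptions made for this sketch.
    db_path = base / f"cs_copilot_test_{uuid.uuid4().hex}.db"
    return {
        "enable_memory": True,
        "db_file": str(db_path),
        "enable_mlflow_tracking": False,
        "show_members_responses": False,
    }


def build_isolated_team(model):
    """Create a team with a unique db_file; requires cs_copilot to be installed."""
    from cs_copilot.agents import get_cs_copilot_agent_team  # imported lazily

    return get_cs_copilot_agent_team(model, **isolated_team_kwargs())
```

Passing enable_memory=False instead is the simpler option when a test needs no persistence at all. A unique db_file suits tests that exercise memory but must not share state.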
get_last_agent_reply(agent)
Extract the content of the last message from an agent's session.
config
Configuration module for cs_copilot agents. Contains path constants and database configuration settings. Agent instructions and prompts are now in prompts.py.
factories
Agent factory classes for creating specialized cs_copilot agents. Contains the base factory class and all specialized factory implementations.
ChEMBLDownloaderFactory
Bases: BaseAgentFactory
Factory for creating ChEMBL downloader agents.
Source code in src/cs_copilot/agents/factories.py
ChemoinformaticianFactory
Bases: BaseAgentFactory
Factory for creating comprehensive chemoinformatics analysis agents.
This agent is a versatile chemoinformatician capable of:
- Chemotype Analysis: Scaffold extraction, chemotype profiling, structural diversity
- Clustering: Molecular clustering using various methods (k-means, hierarchical, DBSCAN)
- SAR Analysis: Structure-Activity Relationship analysis, activity cliffs, matched molecular pairs
- Similarity Analysis: Molecular similarity, diversity metrics, nearest neighbor searches
GTM-Integrated Design:
- Primary use case: Downstream analysis after GTM agents (nodes as clusters)
- Also works with ANY data source: t-SNE clusters, user CSVs, ChEMBL families
- Standardized input: DataFrame with 'smiles' + optional 'cluster_id' + optional 'activity'
- Produces structured data output (DataFrames, dicts) - NO report generation
- Report generation handled by separate ReportGeneratorAgent
Tools:
- ChemicalSimilarityToolkit: Fingerprints, similarity metrics, scaffold extraction
- PointerPandasTools: DataFrame operations with S3 support
- GTMToolkit: Access to GTM data (source_mols, node projections)
Source code in src/cs_copilot/agents/factories.py
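The standardized input described above (a required 'smiles' column plus optional 'cluster_id' and 'activity') can be checked before a table is handed to the agent. The helper below is a minimal sketch with a hypothetical name and is not part of cs_copilot. It works on any iterable of column names, so with pandas it could be called as check_input_columns(df.columns).

```python
REQUIRED_COLUMNS = ("smiles",)
OPTIONAL_COLUMNS = ("cluster_id", "activity")


def check_input_columns(columns):
    """Return which optional columns are present; raise if 'smiles' is missing."""
    present = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    # Map each optional column to a presence flag so callers can branch on it.
    return {name: name in present for name in OPTIONAL_COLUMNS}
```

The returned flags indicate, for example, whether cluster-level or activity-based analyses are possible for a given table.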
AutoencoderFactory
Bases: BaseAgentFactory
Factory for creating autoencoder-based molecular generation agents.
Supports two modes:
- Standalone: Encode/decode SMILES, sample from latent space, interpolate, explore neighborhoods
- GTM-guided: Combine GTM maps with autoencoders for targeted molecular generation from specific map regions (by density, activity, or coordinates)
Enhanced with GTM cache awareness to avoid redundant GTM loading when working with GTM Agent in the same session.
Source code in src/cs_copilot/agents/factories.py
GTMAgentFactory
Bases: BaseAgentFactory
Factory for creating unified GTM agents (consolidates optimization, loading, density, activity, projection).
This factory creates a single agent that handles all GTM-related operations via mode-based dispatch:
- optimize: Build and optimize new GTM maps
- load: Load existing GTM models from S3/local/HuggingFace
- density: Analyze compound distributions and neighborhood preservation
- activity: Create activity-density landscapes for SAR analysis
- project: Project external datasets onto existing GTM maps
Features smart caching to avoid redundant GTM loading across operations.
Source code in src/cs_copilot/agents/factories.py
ReportGeneratorFactory
Bases: BaseAgentFactory
Factory for creating report generation agents.
This agent handles ALL report generation and visualization across different analysis types:
- Chemotype analysis reports
- GTM density reports
- GTM activity/SAR reports
- Autoencoder generation reports
- Combined/custom reports
Separation of Concerns: Analysis agents produce structured data, Report Generator handles presentation.
This architecture enables:
- Consistent formatting across all report types
- Reusable visualization patterns
- Easy updates to report styles (change in one place)
- Clean separation: data processing vs visualization/formatting
Source code in src/cs_copilot/agents/factories.py
RobustnessEvaluationFactory
Bases: BaseAgentFactory
Factory for creating robustness test evaluation agents.
Source code in src/cs_copilot/agents/factories.py
SynPlannerFactory
Bases: BaseAgentFactory
Factory for creating retrosynthetic planning agents powered by SynPlanner.
This agent wraps the official SynPlanner package to perform retrosynthetic analysis on target molecules. It accepts SMILES strings or molecule names, resolves them to canonical SMILES (via PubChem / RDKit), runs the MCTS-based retrosynthesis search, and returns structured route descriptions with optional SVG/PNG visualizations.
Source code in src/cs_copilot/agents/factories.py
PeptideWAEFactory
Bases: BaseAgentFactory
Factory for creating peptide WAE-based sequence generation agents.
This agent uses a Wasserstein Autoencoder (WAE) trained on peptide data to encode, decode, sample, and interpolate amino acid sequences. The WAE can generate any peptides; activity landscape data comes from DBAASP (antimicrobial peptides specifically).
Key capabilities:
- Encoding: Convert peptide sequences to 100-dimensional latent vectors
- Decoding: Generate peptide sequences from latent vectors
- Sampling: Generate novel peptides from Gaussian prior
- Interpolation: Smooth transitions between peptides in latent space
- Neighborhood exploration: Generate peptide analogs
- GTM integration: Train GTMs on latent space, create activity landscapes
- Activity landscapes: Use DBAASP data (specific to antimicrobial peptides)
Input format: Space-separated single-letter amino acid codes
Example: "M L L L L L A L A L L A L L L A L L L"
Source code in src/cs_copilot/agents/factories.py
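Peptide sequences are often written without spaces, so a small conversion step may be needed to reach the space-separated input format described above. The helper below is a sketch with a hypothetical name. Restricting input to the 20 canonical amino acid codes is an assumption of this example, since the alphabet the WAE actually accepts is not stated in this reference.

```python
# Assumption: only the 20 canonical residues are accepted by this sketch.
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_wae_input(sequence):
    """Convert 'MLLA' or 'M L L A' into space-separated single-letter codes."""
    residues = [ch for ch in sequence.upper() if not ch.isspace()]
    if not residues:
        raise ValueError("empty peptide sequence")
    unknown = sorted(set(residues) - CANONICAL_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unsupported residue code(s): {', '.join(unknown)}")
    return " ".join(residues)
```

Input that is already space-separated passes through unchanged apart from upper-casing, so the helper can be applied unconditionally.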
prompts
Prompt templates and instructions for cs_copilot agents. Contains all the step-by-step instructions used by various specialized agents.
CHEMBL_INSTRUCTIONS = ["Step 1: Analyze the user's request and identify the biological target or compound type they want to explore.", ' - Distinguish whether the user is asking about a *protein target* (e.g., CDK2, BRAF) or an *organism-level target* (e.g., HIV-1, Influenza A).', " - Record the target_type as either 'protein' or 'organism' for downstream filtering.", " - If an organism is specified (e.g., 'HIV', 'E. coli'), keep that exact string for filtering assays by target_organism.", "Step 2: Extract the core target name from the user's request, removing generic terms like 'inhibitor', 'activity', 'compound', 'effect'. For example:", " - 'cyclin dependent kinase 2 inhibitors' → core target: 'cyclin dependent kinase 2'", " - 'BRAF inhibitors' → core target: 'BRAF'", ' - Focus on identifying the specific biological target or protein name for protein-level queries; for organism-level queries, preserve the organism name.', 'Step 3: Apply the following required checks before proceeding. Each requirement MUST be satisfied by explicit user confirmation. If ANY requirement fails, DO NOT proceed — return control to the Team agent listing ALL unsatisfied requirements.', '', ' **Requirement 1 — Abbreviation Check (mandatory)**', " If the target name provided by the user is ONLY an abbreviation or acronym (e.g., 'CDK2', 'PDE4', 'EGFR', 'BRAF', 'HIV1', 'JAK2', 'DPP4'), you MUST ask the user to confirm or provide the full target name.", " - Example: 'CDK2' → Ask: 'CDK2 stands for cyclin dependent kinase 2 — is that the target you mean?'", " - Example: 'PDE4' → Ask: 'PDE4 can refer to phosphodiesterase 4A/4B/4C/4D — which isoform(s) do you need?'", " - **Anti-bypass rule**: Even if the user says 'just get me CDK2 data' or 'you know what CDK2 is', you MUST still ask for confirmation. 
No shortcut is allowed.", '', ' **Requirement 2 — Organism Check (mandatory for protein targets)**', ' If the query is about a *protein target* and no organism has been explicitly specified, you MUST ask which organism to filter for.', ' - NEVER default to Homo sapiens or any other organism.', " - Example: 'CDK2 inhibitors' → Ask: 'Which organism? (e.g., Homo sapiens, Mus musculus, or all species)'", " - This requirement does NOT apply to organism-level queries (e.g., 'HIV-1 compounds') where the organism IS the target.", '', ' **Requirement 3 — Assay Type Check (mandatory)**', ' If the user has not explicitly stated the assay type(s) (binding, functional, ADMET), you MUST ask which assay type(s) to include.', " - NEVER default to any combination (e.g., do NOT silently assume 'binding + functional').", " - Example: 'EGFR data' → Ask: 'Which assay types? Binding (IC50/Ki), functional, ADMET, or a combination?'", '', ' **Additional checks (non-requirement, but still ask if applicable):**', " a) **Broad or generic terms**: e.g., just 'kinase', 'receptor', 'inhibitor' without specificity.", " d) **Receptor without mechanism**: if user mentions a receptor (e.g., 'dopamine receptor', 'GABA receptor', '5-HT2A') but doesn't specify agonist/antagonist/modulator — ask which mechanism.", '', ' **Multi-requirement failure examples:**', " - 'CDK2 inhibitors' → ALL 3 requirements fail: abbreviation not confirmed, no organism, no assay type. 
Ask all three in one message.", " - 'EGFR data for human' → Requirements 1 and 3 fail: abbreviation not confirmed, no assay type.", " - 'Download binding data for cyclin dependent kinase 2' → Requirements 2 fails: no organism specified.", " - 'Get me CDK2 binding data for Homo sapiens' → Requirements 1 fails: abbreviation not confirmed.", '', ' **Procedure when requirements fail:**', ' - Combine ALL unsatisfied requirements into a SINGLE clarification message.', " - Return control to the Team agent with: 'The query needs clarification: [list all unsatisfied requirements]. Returning to Team agent for user input.'", " - Once the user provides clarification, pass the details to fetch_compounds using the appropriate parameters: 'query' for target name, 'organism' for species filter, 'assay_types' for data type, or 'mechanism' for agonist/antagonist/modulator.", ' - It is ALWAYS better to ask for precision than to fetch incorrect or irrelevant data.', 'Step 4: Use the `convert_to_chembl_query` tool with the identified core target to generate multiple keyword variations for ChEMBL search.', ' - The tool will generate abbreviations, shortened forms, and full names (typically 3-5 keywords)', ' - The tool handles greek character replacement and ensures keywords are suitable for ChEMBL assay description searches', " - Example: For 'cyclin dependent kinase 2', the tool will generate: 'cdk2, kinase 2, cyclin dependent kinase 2'", ' - When the query is organism-level, include the organism name as one of the keywords to ensure assays for that organism are retrieved.', " - Determine assay type preferences: map 'binding' → B, 'functional' → F, 'ADMET' → A. The user MUST have explicitly specified assay type(s) before reaching this step (enforced by GATE 3 above). NEVER apply a default.", "Step 5: Use the `fetch_compounds` tool with multiple keywords (comma-separated, e.g., 'cdk2, kinase 2, cyclin dependent kinase 2') to download bioactivity data from ChEMBL. 
The tool will:", " - Pass the organism filter when the query is organism-level so assays are constrained to that species/strain (e.g., organism='HIV-1').", " - Pass the assay_types filter (e.g., ['binding', 'functional', 'ADMET']) to control whether you retrieve binding, functional, or ADMET assays.", " - Pass the mechanism filter if the user specified a mechanism of action (e.g., mechanism='agonist' for agonist assays, mechanism='antagonist' for antagonist assays). This filters assays by their description to keep only those matching the specified mechanism.", ' - Search for assays related to each keyword separately', ' - Retrieve activity data for all found assays', ' - Merge all results and automatically remove duplicates', 'Step 6: After successful data fetch, verify the dataset quality:', ' - Check that SMILES structures were successfully mapped', ' - Verify the dataset contains expected columns (activity_id, molecule_chembl_id, canonical_smiles, standard_value, etc.)', ' - Confirm the data covers the intended biological target', ' - Confirm the assay_type column contains the requested assay categories (B=Binding, F=Functional, A=ADMET)', ' - Note the number of duplicates that were removed during merging', 'Step 7: Use the `describe_dataset` tool to generate comprehensive statistics for the downloaded dataset.', 'Step 8: Report key metrics to the user:', ' - Total number of compounds and activities', ' - Range of activity values (IC50, Ki, etc.)', ' - Data quality indicators (missing values, duplicates)', ' - Target coverage and assay diversity', 'Step 9: If data fetch fails, troubleshoot systematically:', ' - Check if the query terms are too specific (try broader terms)', ' - Verify ChEMBL connectivity using ping functionality (works for all SQL and REST backends)', ' - Consider alternative search strategies (different resource types: activity, molecule, assay)', ' - Handle rate limiting by implementing appropriate delays', 'Step 10: When working with 
dataframes, use inplace operations to modify dataframes (e.g., `df.drop(..., inplace=True)`) to avoid printing entire dataframes to the console, which can cause context window issues. Avoid operations like `df.assign()` that return new dataframes and may be printed.', 'Step 11: Prepare a comma-separated .csv file with all fetched data including molecules, their activities, and parent dataset information in the respective columns.', "Step 12: Save the dataframe to a .csv file. The `fetch_compounds` tool automatically stores the dataset path in session_state['data_file_paths']['dataset_path'].", 'Step 13: Confirm the dataset is properly saved to S3 storage with a descriptive filename.', 'Step 14: Provide the user with the exact filename and path for future reference.'] + HANDLING_NEW_FILES_INSTRUCTIONS
module-attribute
Expert chemoinformatician capable of:
- Chemotype/scaffold analysis
- Clustering and chemical space mapping
- SAR analysis
- Similarity and diversity analysis
- QSAR modeling (extensible)
Method-agnostic, modular, and extensible design.
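Referring back to CHEMBL_INSTRUCTIONS above: Step 4 maps 'binding' → B, 'functional' → F and 'ADMET' → A, and the instructions insist that no default assay type is ever applied. A minimal sketch of that rule follows. The function and constant names are hypothetical, and this is not how fetch_compounds handles the filter internally.

```python
ASSAY_TYPE_CODES = {"binding": "B", "functional": "F", "admet": "A"}


def assay_type_codes(assay_types):
    """Map user-facing assay type names to ChEMBL assay_type codes, without defaults."""
    if not assay_types:
        # Mirrors the instruction that assay types must be given explicitly.
        raise ValueError("assay type(s) must be specified explicitly; no default is applied")
    codes = []
    for name in assay_types:
        key = name.strip().lower()
        if key not in ASSAY_TYPE_CODES:
            raise ValueError(f"unknown assay type: {name!r}")
        code = ASSAY_TYPE_CODES[key]
        if code not in codes:  # keep order, drop repeats
            codes.append(code)
    return codes
```

Raising on an empty selection, instead of silently choosing a combination such as binding plus functional, keeps the clarification step with the user.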
GTM_AGENT_INSTRUCTIONS = ['Step 1: Determine the operation mode based on user request and context:', " - **optimize mode**: User asks to 'build', 'create', 'optimize', or 'train' a GTM map", " - **load mode**: User asks to 'load', 'retrieve', or 'use existing' GTM model", " - **density mode**: User asks about 'density', 'distribution', 'neighborhood preservation', or 'analyze GTM map'", " - **activity mode**: User asks about 'activity landscape', 'SAR', 'potency zones', or 'active regions'", " - **project mode**: User asks to 'project', 'map new data', or 'apply GTM to external dataset'", " - If unclear, default to load mode and check for cached GTM in session_state['gtm_cache']", 'Step 2: Check for cached GTM before loading from files:', " - If session_state['gtm_cache'] exists and is not None:", " - Verify cache validity: check metadata['dataset_shape'] matches current dataset if applicable", ' - If valid, reuse cached GTM model and dataset (skip loading)', ' - If invalid (dataset changed), proceed to load/optimize as needed', ' - If no cache exists, proceed with mode-specific loading', 'Step 3: Execute mode-specific workflow:', '', '**OPTIMIZE MODE**:', " 1. Load chemical data from session_state['data_file_paths']['dataset_path'] or user-provided path", ' 2. Verify SMILES column exists using available tools', ' 3. Run gtm_optimization with appropriate k_hit values (try multiple if not specified)', ' 4. For each k_hit: fit GTM, save with save_gtm_and_data, evaluate smoothness', ' 5. Select best GTM map (smoothest or user-specified criteria)', ' 6. **Cache the result**:', " - session_state['gtm_cache'] = {", " 'model': gtm_model_object,", " 'dataset': preprocessed_dataframe,", " 'metadata': {", " 'path': gtm_file_path,", " 'created_at': timestamp,", " 'dataset_shape': df.shape,", " 'source': 'optimize',", " 'optimization_metrics': {...}", ' }', ' }', " 7. Update session_state['gtm_file_paths'] = {'gtm_path': ..., 'dataset_path': ..., 'gtm_plot_path': ...}", ' 8. 
Generate and save GTM plot using save_gtm_plot', '', '**LOAD MODE**:', ' 1. Resolve GTM model path (priority order):', ' - User-provided explicit path', " - session_state['gtm_file_paths']['gtm_path']", ' - S3 assets bucket (via path resolver)', ' - Default model repository', ' - HuggingFace mirror (last resort)', ' 2. Load GTM using load_gtm_model_only(gtm_file)', ' 3. Determine associated dataset:', ' - If user provides dataset path → use it', ' - If dataset file next to GTM → use it', " - If session_state['data_file_paths']['dataset_path'] exists → use it", ' - Otherwise, ask user which dataset to use', ' 4. When dataset available, call load_and_prep_data(dataset, gtm_model) to build projections', " 5. **Cache the result** (same structure as optimize mode, source='load')", " 6. Update session_state['gtm_file_paths']", '', '**DENSITY MODE**:', " 1. **Check cache first**: If session_state['gtm_cache'] exists, reuse it (skip loading)", ' 2. If no cache, load GTM and dataset via load mode workflow above', ' 3. Call load_gtm_get_density_matrix(dataset_file, gtm_file) to get density and neighborhood tables', " 4. Analyze density table ['x', 'y', 'nodes', 'filtered_density']:", ' - Calculate max/min/mean/median density', ' - Identify top 5 densest nodes and top 5 sparsest nodes', ' - Describe spatial patterns (compass/quadrant terms)', " 5. Analyze neighborhood preservation table ['x', 'y', 'nodes', 'density', 'neighborhood score']:", ' - Report preservation quality metrics', ' - Identify well-preserved vs poorly-preserved regions', ' 6. Save density results:', " - session_state['analysis_results']['density_csv'] = density_csv_path", " - session_state['analysis_results']['plots'].append(density_plot_path)", ' 7. Generate visualization with density overlay using save_gtm_plot', ' 8. Provide 3-bullet executive summary', '', '**ACTIVITY MODE**:', " 1. **Check cache first**: If session_state['gtm_cache'] exists, reuse it", ' 2. 
If no cache, load GTM and dataset via load mode workflow', f' 3. Call create_activity_landscapes(dataset, gtm_model, node_threshold={DEFAULT_NODE_THRESHOLD}, chart_width={DEFAULT_CHART_WIDTH}, chart_height={DEFAULT_CHART_HEIGHT})', ' 4. The tool returns file prefix and creates CSV + PNG/HTML files', ' 5. Save paths to session_state:', " - session_state['landscape_files']['landscape_data_csv'] = csv_path", " - session_state['landscape_files']['landscape_plot'] = plot_path", " - session_state['analysis_results']['activity_csv'] = csv_path # Also save here for consistency", " 6. Load landscape CSV and analyze ['x', 'y', 'nodes', 'filtered_reg_density']:", ' - Global stats: max, min, mean, median of reg_density', ' - Identify top 5 active nodes and top 5 inactive nodes', " - Describe spatial trends (compass directions, e.g., 'dense band across center')", ' 7. Cross-layer analysis:', ' - Do density hotspots coincide with potent areas?', ' - Flag anomalies (dense but low-quality, sparse but high-activity)', ' - Identify gaps/unreliable regions (zero density, NaNs)', ' 8. Provide 3-bullet SAR takeaway with actionable recommendations', ' 9. Show activity landscape plot in output (markdown format, blue gradient: dark=high activity, light=low)', '', '**PROJECT MODE**:', " 1. **Check cache first**: If session_state['gtm_cache'] exists, reuse GTM model", ' 2. If no cache, load GTM via load mode workflow', ' 3. Get external dataset path from user or session_state', ' 4. Call project_data_on_gtm(external_dataset, gtm_model):', ' - Tool validates SMILES, checks compatibility', ' - Returns preprocessed CSV with GTM projections', ' 5. Analyze projection results:', ' - Compare distribution of external data vs original training data', ' - Identify covered vs novel regions', ' - Calculate distribution statistics', ' 6. Generate comparative visualization using save_gtm_plot(preprocessed_csv, gtm_model)', ' 7. 
Save projection results:', " - session_state['analysis_results']['projection_csv'] = projection_csv_path", " - session_state['analysis_results']['plots'].append(projection_plot_path)", ' 8. Provide summary of projection quality and coverage', 'Step 4: Final output formatting:', ' - Return concise summary of operation performed', ' - Include key metrics and file paths', ' - For plots, show using markdown format: ', ' - Highlight any warnings or anomalies discovered', ' - Confirm session_state updates for downstream agents', 'Step 5: Error handling:', ' - If GTM loading fails, check path resolver and suggest alternatives', ' - If dataset incompatible, explain mismatch (e.g., wrong SMILES column)', ' - If cache invalid, automatically reload from files', ' - For optimization failures, suggest trying different k_hit values', 'Step 6: Latent-space GTM operations (for peptide WAE latent vectors):', ' - The GTM can also operate on pre-computed latent vectors from WAE models (not just SMILES descriptors)', " - When user mentions 'peptide GTM', 'latent space GTM', or 'WAE GTM', delegate to the Peptide WAE agent", ' - The Peptide WAE agent has GTM tools and handles the full peptide+GTM workflow', ' - For SMILES-based GTM: use standard descriptor workflow (this agent)', ' - For peptide latent-space GTM: route to Peptide WAE agent'] + HANDLING_NEW_FILES_INSTRUCTIONS
module-attribute
Universal presentation layer for all analysis types. Generates markdown reports and visualizations from structured analysis results.
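Step 2 of GTM_AGENT_INSTRUCTIONS above describes a cache check: reuse session_state['gtm_cache'] when it exists and its metadata['dataset_shape'] matches the current dataset. A sketch of that check is shown below. It assumes session_state behaves like a plain dict, and the function name is hypothetical. The agent itself follows the prompt and does not call a helper like this.

```python
def is_gtm_cache_valid(session_state, current_shape=None):
    """Return True when a cached GTM can be reused for the current dataset."""
    cache = session_state.get("gtm_cache")
    if cache is None:
        return False
    if current_shape is None:
        # No dataset to compare against ("if applicable" in Step 2): reuse the cache.
        return True
    cached_shape = cache.get("metadata", {}).get("dataset_shape")
    if cached_shape is None:
        return False
    # Compare as tuples so that list and tuple shapes are treated alike.
    return tuple(cached_shape) == tuple(current_shape)
```

For example, a cache stored with 'dataset_shape': (1200, 8) would be rejected for a dataset of shape (1350, 8), and the agent would then reload or re-optimize as the instructions describe.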
registry
Agent registry system for managing and creating agents dynamically. Provides the main public API for agent creation.
AgentRegistry
Registry for managing agent factories and configurations.
Source code in src/cs_copilot/agents/registry.py
register(agent_type, factory, aliases=None)
Register an agent factory with optional aliases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agent_type | str | Canonical agent type name | required |
| factory | BaseAgentFactory | Factory instance | required |
| aliases | List[str] | Optional list of alias names that redirect to this agent | None |
Source code in src/cs_copilot/agents/registry.py
create_agent(agent_type, model, **kwargs)
Create an agent by type (supports aliases).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agent_type | str | Agent type or alias | required |
| model | Model | LLM model instance | required |
| **kwargs | | Additional arguments for agent creation | {} |
Returns:
| Type | Description |
|---|---|
| Agent | Agent instance |
Raises:
| Type | Description |
|---|---|
| ValueError | If agent_type/alias is not registered |
Source code in src/cs_copilot/agents/registry.py
list_agent_types()
auto_register()
Automatically discover and register all available factories.
Source code in src/cs_copilot/agents/registry.py
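The registry behaviour documented above (canonical names, aliases that redirect to them, and a ValueError for unknown names) can be illustrated with a small stand-in. DemoRegistry is hypothetical and much simpler than AgentRegistry: here a "factory" is any callable taking (model, **kwargs), whereas the real registry stores BaseAgentFactory instances. The "gtm" alias in the usage lines is invented for the example.

```python
class DemoRegistry:
    """Toy registry mapping canonical agent types and aliases to factories."""

    def __init__(self):
        self._factories = {}
        self._aliases = {}

    def register(self, agent_type, factory, aliases=None):
        self._factories[agent_type] = factory
        for alias in aliases or []:
            # Aliases only store a pointer to the canonical name.
            self._aliases[alias] = agent_type

    def resolve(self, name):
        canonical = self._aliases.get(name, name)
        if canonical not in self._factories:
            raise ValueError(f"agent type {name!r} is not registered")
        return canonical

    def create_agent(self, agent_type, model, **kwargs):
        factory = self._factories[self.resolve(agent_type)]
        return factory(model, **kwargs)

    def list_agent_types(self):
        # Only canonical names are listed; aliases are omitted in this sketch.
        return sorted(self._factories)


registry = DemoRegistry()
registry.register("gtm_agent", lambda model, **kw: ("gtm_agent", model), aliases=["gtm"])
agent = registry.create_agent("gtm", model="demo-model")  # resolved through the alias
```

Keeping aliases in a separate mapping means the listing of agent types stays free of duplicates while older or shorter names remain usable.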
teams
Team coordination functionality for multi-agent workflows.