Core Concepts
Understanding the fundamental concepts of AgenticAnts will help you make the most of the platform.
Platform Architecture
AgenticAnts is built around these core principles:
1. LLMOps Framework
AgenticAnts implements LLMOps (Large Language Model Operations), the discipline of managing large language models all the way from development to production.
2. Three Pillars Approach
We provide comprehensive LLMOps coverage through three integrated domains:
- FinOps: Cost optimization and financial management
- SRE: Reliability engineering and performance
- Security Posture: Security and compliance
3. Agent-Centric Observability
Everything in AgenticAnts is centered around AI agents - autonomous systems that make decisions and take actions.
4. Credit-Based Economics
Flexible, usage-based pricing that scales with your needs
5. OpenTelemetry Standard
Built on industry standards for maximum compatibility
Key Concepts
LLMOps Framework
LLMOps encompasses the entire lifecycle of LLM operations:
- Model Lifecycle Management - Selection, versioning, deployment, and retirement
- Prompt Operations - Prompt engineering, versioning, and optimization
- Performance Optimization - Latency, throughput, and cost optimization
- Model Governance - Policies, compliance, and risk management
- Versioning & Deployment - CI/CD pipelines and rollback strategies
Agents
An agent is an autonomous AI system that:
- Receives inputs (user queries, events, data)
- Makes decisions using LLMs and logic
- Takes actions (API calls, tool usage, responses)
- Learns and adapts over time
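The receive/decide/act loop above can be sketched in a few lines of plain JavaScript. Everything here is illustrative: `runAgent`, the keyword-based `decision`, and the tool names are hypothetical stand-ins (a real agent would delegate the decision step to an LLM).

```javascript
// A minimal agent loop: receive input, decide, act.
// The keyword check stands in for an LLM-driven decision.
function runAgent(input, tools) {
  // 1. Receive input and decide which tool applies
  const decision = input.includes('refund') ? 'billing' : 'general';
  // 2. Take an action by invoking the chosen tool
  const output = tools[decision](input);
  // 3. Return the result; a real agent would also record feedback to adapt
  return { decision, output };
}

const tools = {
  billing: (q) => `Routing to billing: ${q}`,
  general: (q) => `Answering directly: ${q}`
};

console.log(runAgent('I want a refund', tools).decision); // 'billing'
```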
Traces
A trace represents a complete execution path of an agent or application:
Trace: Customer Support Request
├─ Span: Query Classification
├─ Span: Retrieve Customer Data
│  └─ Span: Database Query
├─ Span: LLM Processing
│  ├─ Span: Token Generation
│  └─ Span: Response Formatting
└─ Span: Response Delivery
Spans
A span represents a single unit of work within a trace:
- Function call
- API request
- LLM inference
- Database query
- Tool execution
Metrics
Metrics are numerical measurements collected over time:
- Latency (p50, p95, p99)
- Throughput (requests/second)
- Error rates
- Token usage
- Cost per operation
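The percentile latencies listed above (p50, p95, p99) can be computed from raw samples with the nearest-rank method. This is a generic sketch of the statistic itself, not AgenticAnts' internal aggregation:

```javascript
// Nearest-rank percentile: sort the samples, take the value at
// rank ceil(p/100 * n). p50 is the median; p95/p99 capture tail latency.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const latenciesMs = [120, 85, 95, 410, 130, 90, 100, 2050, 110, 105];
console.log(percentile(latenciesMs, 50)); // typical request
console.log(percentile(latenciesMs, 99)); // worst-case tail
```

Note how a single slow outlier (2050 ms) barely moves p50 but dominates p99; this is why dashboards track all three.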
Events
Events are discrete occurrences in your system:
- Agent started
- Error occurred
- Threshold exceeded
- User feedback received
The Three Pillars
FinOps - AI Cost Optimization
Control and optimize your AI spending:
Key Features:
- Token usage tracking
- Cost attribution (per customer, per agent, per operation)
- Budget management and alerts
- Contract optimization recommendations
- ROI analytics
Use Cases:
- "How much does our customer support agent cost per query?"
- "Which customers are driving the most AI costs?"
- "What's the ROI of our AI investments?"
SRE - AI Reliability Engineering
Ensure your AI systems are reliable and performant:
Key Features:
- End-to-end tracing
- Performance monitoring
- Automated alerting
- Incident response
- SLA tracking
Use Cases:
- "Why is our agent slow for certain queries?"
- "What caused the spike in errors yesterday?"
- "Are we meeting our SLA targets?"
Security Posture - AI Security Control
Secure your AI operations and maintain compliance:
Key Features:
- PII detection and redaction
- Security guardrails
- Compliance reporting
- Audit trails
- RBAC and access control
Use Cases:
- "Are we exposing any PII in our agent responses?"
- "Can we prove GDPR compliance for our AI systems?"
- "Who accessed sensitive agent data?"
Credit System
AgenticAnts uses a credit-based pricing model for flexible, usage-based billing.
How Credits Work
Credits are consumed based on platform usage:
| Operation | Credit Cost |
|---|---|
| Trace ingestion (per 1000) | 1 credit |
| Span ingestion (per 1000) | 0.1 credit |
| Metric data point (per 1000) | 0.05 credit |
| Data storage (per GB/month) | 5 credits |
| API request (per 1000) | 0.5 credit |
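The rates in the table make monthly cost easy to estimate. The sketch below is a back-of-the-envelope calculator using those published rates; `estimateCredits` and its field names are illustrative, not a billing API:

```javascript
// Credit rates, mirroring the pricing table above.
const RATES = {
  tracesPer1000: 1,
  spansPer1000: 0.1,
  metricsPer1000: 0.05,
  storagePerGB: 5,       // per GB per month
  apiRequestsPer1000: 0.5
};

function estimateCredits(usage) {
  return (
    (usage.traces / 1000) * RATES.tracesPer1000 +
    (usage.spans / 1000) * RATES.spansPer1000 +
    (usage.metricDataPoints / 1000) * RATES.metricsPer1000 +
    usage.storageGB * RATES.storagePerGB +
    (usage.apiRequests / 1000) * RATES.apiRequestsPer1000
  );
}

// Example month: 500k traces, 5M spans, 20M metric points, 50 GB, 1M API calls
const credits = estimateCredits({
  traces: 500_000,
  spans: 5_000_000,
  metricDataPoints: 20_000_000,
  storageGB: 50,
  apiRequests: 1_000_000
});
console.log(credits); // 500 + 500 + 1000 + 250 + 500 = 2750
```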
Credit Allocation
Credits can be used flexibly across:
- Observability (traces, metrics, logs)
- Agents (monitoring, analytics)
- Policies (evaluation, enforcement)
- Projects (multi-project organizations)
Observability Model
AgenticAnts provides comprehensive observability for AI systems:
Collection Layer
Your Application
↓
AgenticAnts SDK / OpenTelemetry
↓
Ingestion Pipeline
↓
Storage & Indexing
Data Types
- Traces: Complete execution paths
- Metrics: Time-series measurements
- Logs: Discrete events and messages
- Metadata: Context and tags
Query Layer
Storage & Indexing
↓
Query Engine
↓
├─ Dashboard UI
├─ REST API
├─ GraphQL API
└─ Webhooks
Learn more about observability →
Data Model
Hierarchy
Organization
└─ Projects
   └─ Environments
      └─ Agents
         └─ Traces
            └─ Spans
               └─ Events
Relationships
- Organizations contain multiple Projects
- Projects have multiple Environments (prod, staging, dev)
- Environments host multiple Agents
- Agents generate Traces
- Traces contain Spans
- Spans can have Events
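The containment rules above can be modeled as plain nested objects. This sketch (the `org` shape and `countSpans` helper are hypothetical, not SDK types) shows how a query such as "how many spans did this organization produce?" is just a walk down the hierarchy:

```javascript
// One organization → project → environment → agent, with two traces.
const org = {
  name: 'acme',
  projects: [{
    name: 'support-bot',
    environments: [{
      name: 'prod',
      agents: [{
        name: 'customer-support-agent',
        traces: [
          { id: 't1', spans: [{ id: 's1', events: [] }, { id: 's2', events: [] }] },
          { id: 't2', spans: [{ id: 's3', events: [{ type: 'error' }] }] }
        ]
      }]
    }]
  }]
};

// Walk Organization → Projects → Environments → Agents → Traces → Spans.
function countSpans(org) {
  let n = 0;
  for (const project of org.projects)
    for (const env of project.environments)
      for (const agent of env.agents)
        for (const trace of agent.traces) n += trace.spans.length;
  return n;
}

console.log(countSpans(org)); // 3
```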
Best Practices
1. Structured Instrumentation
// Good: Structured and consistent
await ants.trace.create({
  name: 'customer-support-agent',
  input: query,
  metadata: {
    customerId: '123',
    channel: 'web',
    priority: 'high'
  }
})

// Avoid: Unstructured or missing context
await ants.trace.create({
  name: 'agent',
  input: query
})
2. Meaningful Names
// Good: Descriptive and hierarchical
'customer-support.classify-query'
'customer-support.retrieve-context'
'customer-support.generate-response'
// Avoid: Vague or inconsistent
'function1'
'process'
'handler'
3. Rich Metadata
Include relevant context:
{
  metadata: {
    // Business context
    customerId: '123',
    orderId: 'ORD-456',

    // Technical context
    modelName: 'gpt-4',
    temperature: 0.7,

    // Operational context
    region: 'us-east-1',
    version: '1.2.3'
  }
}
4. Error Handling
Always capture errors:
try {
  const result = await agent.run(input)
  await trace.complete({ output: result })
} catch (error) {
  await trace.error({
    error: error.message,
    stack: error.stack,
    severity: 'error'
  })
  throw error
}
Common Patterns
Pattern 1: Multi-Agent Systems
const mainTrace = await ants.trace.create({
  name: 'multi-agent-workflow'
})

// Coordinator agent plans the work
const coordinatorSpan = mainTrace.span('coordinator-agent')
const plan = await coordinator.plan(query)
coordinatorSpan.end()

// Worker agents execute tasks in parallel
const results = await Promise.all(
  plan.tasks.map(task =>
    workerAgent.execute(task, mainTrace)
  )
)

mainTrace.complete({ output: results })
Pattern 2: RAG Systems
const trace = await ants.trace.create({
  name: 'rag-query-system'
})

// Retrieval phase
const retrievalSpan = trace.span('document-retrieval')
const docs = await vectorDB.search(query)
retrievalSpan.end({ documents: docs.length })

// Generation phase
const generationSpan = trace.span('llm-generation')
const response = await llm.generate({ query, context: docs })
generationSpan.end({ tokens: response.usage.total })

trace.complete({ output: response.text })
Pattern 3: Tool-Using Agents
const trace = await ants.trace.create({
  name: 'tool-using-agent'
})

// Agent decides which tools to use
const planSpan = trace.span('plan-tools')
const toolPlan = await agent.plan(query)
planSpan.end()

// Execute each tool in its own span
for (const tool of toolPlan.tools) {
  const toolSpan = trace.span(`tool:${tool.name}`)
  const result = await executeTool(tool)
  toolSpan.end({ result })
}

trace.complete()
Next Steps
Explore each concept in detail: