Core Concepts

Understanding the fundamental concepts of AgenticAnts will help you make the most of the platform.

Platform Architecture

AgenticAnts is built around these core principles:

1. LLMOps Framework

AgenticAnts implements LLMOps (Large Language Model Operations), the discipline of managing LLM-based systems from development through production.

2. Three Pillars Approach

We provide comprehensive LLMOps coverage through three integrated domains:

  • FinOps: Cost optimization and financial management
  • SRE: Reliability engineering and performance
  • Security Posture: Security and compliance

3. Agent-Centric Observability

Everything in AgenticAnts is centered around AI agents - autonomous systems that make decisions and take actions.

4. Credit-Based Economics

Flexible, usage-based pricing that scales with your needs.

5. OpenTelemetry Standard

Built on the OpenTelemetry standard for compatibility with existing observability tooling.

Key Concepts

LLMOps Framework

LLMOps encompasses the entire lifecycle of LLM operations:

  • Model Lifecycle Management - Selection, versioning, deployment, and retirement
  • Prompt Operations - Prompt engineering, versioning, and optimization
  • Performance Optimization - Latency, throughput, and cost optimization
  • Model Governance - Policies, compliance, and risk management
  • Versioning & Deployment - CI/CD pipelines and rollback strategies

Learn more about LLMOps →

Agents

An agent is an autonomous AI system that:

  • Receives inputs (user queries, events, data)
  • Makes decisions using LLMs and logic
  • Takes actions (API calls, tool usage, responses)
  • Learns and adapts over time

Learn more about AI agents →

Traces

A trace represents a complete execution path of an agent or application:

Trace: Customer Support Request
├─ Span: Query Classification
├─ Span: Retrieve Customer Data
│  └─ Span: Database Query
├─ Span: LLM Processing
│  ├─ Span: Token Generation
│  └─ Span: Response Formatting
└─ Span: Response Delivery

Learn more about tracing →

Spans

A span represents a single unit of work within a trace, such as:

  • Function call
  • API request
  • LLM inference
  • Database query
  • Tool execution
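
The relationship between a trace and its spans can be sketched as a small tree. This is an illustrative in-memory model only, not the AgenticAnts SDK: `createSpan`, `addChild`, and `countSpans` are hypothetical helper names chosen for this example.

```javascript
// Illustrative model only: these helpers are hypothetical, not SDK calls.
function createSpan(name) {
  return { name, children: [] };
}

// Attach a child span to a parent and return the child so it can be nested further.
function addChild(parent, name) {
  const child = createSpan(name);
  parent.children.push(child);
  return child;
}

// Count every span beneath a node (the node itself is not counted).
function countSpans(node) {
  return node.children.reduce(
    (total, child) => total + 1 + countSpans(child),
    0
  );
}

// Rebuild the "Customer Support Request" trace from the Traces section.
const trace = createSpan('Customer Support Request');
addChild(trace, 'Query Classification');
const retrieve = addChild(trace, 'Retrieve Customer Data');
addChild(retrieve, 'Database Query');
const llm = addChild(trace, 'LLM Processing');
addChild(llm, 'Token Generation');
addChild(llm, 'Response Formatting');
addChild(trace, 'Response Delivery');

console.log(countSpans(trace)); // 7
```

Each span is a unit of work, and nesting records which unit triggered which.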

Metrics

Metrics are numerical measurements collected over time:

  • Latency (p50, p95, p99)
  • Throughput (requests/second)
  • Error rates
  • Token usage
  • Cost per operation
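
To make the latency percentiles concrete, here is a minimal sketch that computes p50, p95, and p99 with the nearest-rank method. The sample latencies are invented, and the platform may use a different percentile definition (for example, interpolation), so treat this as an illustration rather than the platform's algorithm.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical request latencies in milliseconds.
const latenciesMs = [120, 95, 310, 101, 98, 250, 99, 105, 1200, 110];

const summary = {
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
};

console.log(summary); // { p50: 105, p95: 1200, p99: 1200 }
```

With only ten samples, a single slow request (1200 ms) dominates both p95 and p99, which is why tail percentiles need a reasonably large sample window to be meaningful.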

Events

Events are discrete occurrences in your system, such as:

  • Agent started
  • Error occurred
  • Threshold exceeded
  • User feedback received

The Three Pillars

FinOps - AI Cost Optimization

Control and optimize your AI spending:

Key Features:

  • Token usage tracking
  • Cost attribution (per customer, per agent, per operation)
  • Budget management and alerts
  • Contract optimization recommendations
  • ROI analytics

Use Cases:

  • "How much does our customer support agent cost per query?"
  • "Which customers are driving the most AI costs?"
  • "What's the ROI of our AI investments?"
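
As a rough illustration of per-customer cost attribution, the sketch below totals token costs from a list of call records. The prices, record shape, and function names are all assumptions made for this example (the prices are deliberately round numbers, not real provider rates).

```javascript
// Hypothetical prices in USD per 1,000 tokens; real prices depend on
// your model provider and contract.
const PRICE_PER_1K_TOKENS = { input: 0.5, output: 1.5 };

// Hypothetical usage records, one per agent call.
const calls = [
  { customerId: 'acme', inputTokens: 2000, outputTokens: 1000 },
  { customerId: 'globex', inputTokens: 4000, outputTokens: 2000 },
  { customerId: 'acme', inputTokens: 1000, outputTokens: 1000 },
];

// Cost of a single call from its input and output token counts.
function callCost(call) {
  return (
    (call.inputTokens / 1000) * PRICE_PER_1K_TOKENS.input +
    (call.outputTokens / 1000) * PRICE_PER_1K_TOKENS.output
  );
}

// Sum call costs per customer.
function costByCustomer(records) {
  const totals = {};
  for (const record of records) {
    totals[record.customerId] =
      (totals[record.customerId] ?? 0) + callCost(record);
  }
  return totals;
}

console.log(costByCustomer(calls)); // { acme: 4.5, globex: 5 }
```

Dividing a customer's total by their number of calls gives a cost-per-query figure, which is the kind of question the use cases above are asking.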

Explore FinOps →

SRE - AI Reliability Engineering

Ensure your AI systems are reliable and performant:

Key Features:

  • End-to-end tracing
  • Performance monitoring
  • Automated alerting
  • Incident response
  • SLA tracking

Use Cases:

  • "Why is our agent slow for certain queries?"
  • "What caused the spike in errors yesterday?"
  • "Are we meeting our SLA targets?"
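
A question like "Are we meeting our SLA targets?" reduces to comparing measured values against thresholds. The following sketch assumes two hypothetical targets (an error-rate ceiling and a p95 latency ceiling) and a made-up measurement window; the field names are illustrative, not part of the platform.

```javascript
// Hypothetical SLA targets; choose values that match your own agreements.
const sla = { maxErrorRate: 0.01, maxP95LatencyMs: 2000 };

// Compare one measurement window against the targets.
function evaluateSla(window, targets) {
  const errorRate =
    window.totalRequests === 0
      ? 0
      : window.failedRequests / window.totalRequests;
  return {
    errorRate,
    errorRateOk: errorRate <= targets.maxErrorRate,
    latencyOk: window.p95LatencyMs <= targets.maxP95LatencyMs,
  };
}

// Hypothetical measurements for the last hour.
const lastHour = { totalRequests: 10000, failedRequests: 50, p95LatencyMs: 2400 };

const report = evaluateSla(lastHour, sla);
console.log(report);
```

In this example the error rate (0.5%) is within target, but the p95 latency of 2400 ms exceeds the 2000 ms ceiling, so the window would count as an SLA miss on latency.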

Explore SRE →

Security Posture - AI Security Control

Secure your AI operations and maintain compliance:

Key Features:

  • PII detection and redaction
  • Security guardrails
  • Compliance reporting
  • Audit trails
  • RBAC and access control

Use Cases:

  • "Are we exposing any PII in our agent responses?"
  • "Can we prove GDPR compliance for our AI systems?"
  • "Who accessed sensitive agent data?"
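
To show what PII redaction means in practice, here is a deliberately simple regex-based sketch. It is not how AgenticAnts detects PII (the source does not describe that), and the two patterns below will miss many real-world formats; production detection typically needs far more than regular expressions.

```javascript
// Deliberately simple patterns for illustration; they will miss many formats.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const US_PHONE = /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g;

// Replace matched email addresses and phone numbers with placeholders.
function redactPii(text) {
  return text.replace(EMAIL, '[EMAIL]').replace(US_PHONE, '[PHONE]');
}

const response = 'Contact jane.doe@example.com or call 555-123-4567.';
console.log(redactPii(response));
// Contact [EMAIL] or call [PHONE].
```

Redacting before data is stored or logged means the original values never reach the trace.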

Explore Security Posture →

Credit System

AgenticAnts uses a credit-based pricing model for flexible, usage-based billing.

How Credits Work

Credits are consumed based on platform usage:

Operation                       Credit Cost
Trace ingestion (per 1000)      1 credit
Span ingestion (per 1000)       0.1 credit
Metric data point (per 1000)    0.05 credit
Data storage (per GB/month)     5 credits
API request (per 1000)          0.5 credit
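
As a worked example, the rates in the table above can be turned into a monthly estimate. This sketch assumes usage is billed pro rata rather than rounded up to whole batches of 1000, which the source does not specify; the usage figures and the `estimateCredits` helper are hypothetical.

```javascript
// Rates from the table above: `credits` are charged per `per` units.
const RATES = {
  traces: { per: 1000, credits: 1 },
  spans: { per: 1000, credits: 0.1 },
  metricPoints: { per: 1000, credits: 0.05 },
  storageGb: { per: 1, credits: 5 },
  apiRequests: { per: 1000, credits: 0.5 },
};

// Assumption: pro rata billing (no rounding up to whole batches).
function estimateCredits(usage) {
  let total = 0;
  for (const [key, amount] of Object.entries(usage)) {
    const rate = RATES[key];
    if (!rate) throw new Error(`unknown usage type: ${key}`);
    total += (amount / rate.per) * rate.credits;
  }
  // Round to two decimals to hide floating-point noise.
  return Math.round(total * 100) / 100;
}

// Hypothetical monthly usage.
const monthlyUsage = {
  traces: 50000,        // 50 credits
  spans: 400000,        // 40 credits
  metricPoints: 1000000, // 50 credits
  storageGb: 10,        // 50 credits
  apiRequests: 20000,   // 10 credits
};

console.log(estimateCredits(monthlyUsage)); // 200
```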

Credit Allocation

Credits can be used flexibly across:

  • Observability (traces, metrics, logs)
  • Agents (monitoring, analytics)
  • Policies (evaluation, enforcement)
  • Projects (multi-project organizations)

Learn more about credits →

Observability Model

AgenticAnts provides comprehensive observability for AI systems:

Collection Layer

Your Application
       ↓
AgenticAnts SDK / OpenTelemetry
       ↓
Ingestion Pipeline
       ↓
Storage & Indexing

Data Types

  1. Traces: Complete execution paths
  2. Metrics: Time-series measurements
  3. Logs: Discrete events and messages
  4. Metadata: Context and tags

Query Layer

Storage & Indexing
       ↓
Query Engine
├─ Dashboard UI
├─ REST API
├─ GraphQL API
└─ Webhooks

Learn more about observability →

Data Model

Hierarchy

Organization
└─ Projects
   └─ Environments
      └─ Agents
         └─ Traces
            └─ Spans
               └─ Events

Relationships

  • Organizations contain multiple Projects
  • Projects have multiple Environments (prod, staging, dev)
  • Environments host multiple Agents
  • Agents generate Traces
  • Traces contain Spans
  • Spans can have Events
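
The hierarchy can be pictured as nested data. The object below is a hypothetical sample that mirrors the levels described above; the property names and values are invented for illustration and do not reflect the platform's actual schema.

```javascript
// Hypothetical sample data mirroring the hierarchy; names are made up.
const organization = {
  name: 'example-org',
  projects: [
    {
      name: 'support',
      environments: [
        {
          name: 'prod',
          agents: [
            {
              name: 'customer-support-agent',
              traces: [
                {
                  spans: [
                    { name: 'classify-query', events: [] },
                    { name: 'generate-response', events: [{ name: 'Error occurred' }] },
                  ],
                },
              ],
            },
          ],
        },
        { name: 'staging', agents: [] },
      ],
    },
  ],
};

// Walk every level of the hierarchy to count spans across the organization.
function countOrgSpans(org) {
  let count = 0;
  for (const project of org.projects) {
    for (const env of project.environments) {
      for (const agent of env.agents) {
        for (const trace of agent.traces) {
          count += trace.spans.length;
        }
      }
    }
  }
  return count;
}

console.log(countOrgSpans(organization)); // 2
```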

Best Practices

1. Structured Instrumentation

// Good: Structured and consistent
await ants.trace.create({
  name: 'customer-support-agent',
  input: query,
  metadata: {
    customerId: '123',
    channel: 'web',
    priority: 'high'
  }
})
 
// Avoid: Unstructured or missing context
await ants.trace.create({
  name: 'agent',
  input: query
})

2. Meaningful Names

// Good: Descriptive and hierarchical
'customer-support.classify-query'
'customer-support.retrieve-context'
'customer-support.generate-response'
 
// Avoid: Vague or inconsistent
'function1'
'process'
'handler'
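
One way to keep names consistent is to check them against a convention before use. The sketch below assumes a `<component>.<action>` convention with lowercase kebab-case parts, inferred from the "Good" examples above; the platform itself may not enforce any naming rule, and `isValidSpanName` is a hypothetical helper.

```javascript
// Assumed convention: two or more dot-separated parts, each lowercase kebab-case.
const PART = '[a-z][a-z0-9]*(?:-[a-z0-9]+)*';
const SPAN_NAME = new RegExp(`^${PART}(?:\\.${PART})+$`);

// Return true when the name follows the assumed convention.
function isValidSpanName(name) {
  return SPAN_NAME.test(name);
}

console.log(isValidSpanName('customer-support.classify-query')); // true
console.log(isValidSpanName('function1')); // false
console.log(isValidSpanName('handler')); // false
```

A check like this could run in a unit test or lint step so that vague names are caught before they reach production dashboards.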

3. Rich Metadata

Include relevant context:

{
  metadata: {
    // Business context
    customerId: '123',
    orderId: 'ORD-456',
    
    // Technical context
    modelName: 'gpt-4',
    temperature: 0.7,
    
    // Operational context
    region: 'us-east-1',
    version: '1.2.3'
  }
}

4. Error Handling

Always capture errors:

try {
  const result = await agent.run(input)
  await trace.complete({ output: result })
} catch (error) {
  await trace.error({
    error: error.message,
    stack: error.stack,
    severity: 'error'
  })
  throw error
}
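
To avoid repeating this try/catch around every agent call, the pattern can be factored into a helper. This is a sketch under assumptions: `runWithTrace` and `createFakeTrace` are hypothetical names, and the only thing assumed about the trace object is that it exposes `complete()` and `error()` as used in the snippet above.

```javascript
// Run an async function and record its result or error on the trace.
// `trace` is any object exposing complete() and error().
async function runWithTrace(trace, fn) {
  try {
    const result = await fn();
    await trace.complete({ output: result });
    return result;
  } catch (error) {
    // Normalize non-Error throwables so message and stack are always defined.
    const err = error instanceof Error ? error : new Error(String(error));
    await trace.error({
      error: err.message,
      stack: err.stack,
      severity: 'error',
    });
    throw error; // rethrow so callers still see the failure
  }
}

// Stand-in trace that records calls in memory, for demonstration only.
function createFakeTrace() {
  const calls = [];
  return {
    calls,
    complete: async (payload) => { calls.push(['complete', payload]); },
    error: async (payload) => { calls.push(['error', payload]); },
  };
}
```

A call such as `await runWithTrace(trace, () => agent.run(input))` would then replace the inline try/catch while preserving the rethrow.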

Common Patterns

Pattern 1: Multi-Agent Systems

const mainTrace = await ants.trace.create({
  name: 'multi-agent-workflow'
})
 
// Coordinator agent
const coordinatorSpan = mainTrace.span('coordinator-agent')
const plan = await coordinator.plan(query)
coordinatorSpan.end()
 
// Worker agents (parallel)
const results = await Promise.all(
  plan.tasks.map(task => 
    workerAgent.execute(task, mainTrace)
  )
)
 
await mainTrace.complete({ output: results })
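
Note that `Promise.all` rejects as soon as any worker fails, which discards the results of the workers that succeeded. If partial results are useful, one option is `Promise.allSettled`. The sketch below is an alternative, not part of the original pattern: `runWorkers` is a hypothetical helper, and `execute` stands in for something like `workerAgent.execute`.

```javascript
// Run all tasks in parallel and separate successes from failures.
async function runWorkers(tasks, execute) {
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    tasks.map(async (task) => execute(task))
  );
  const results = [];
  const failures = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === 'fulfilled') {
      results.push(outcome.value);
    } else {
      failures.push({ task: tasks[index], reason: String(outcome.reason) });
    }
  });
  return { results, failures };
}

// Demonstration with a stand-in worker that fails on task "b".
async function demoExecute(task) {
  if (task === 'b') throw new Error('boom');
  return task.toUpperCase();
}

runWorkers(['a', 'b', 'c'], demoExecute).then((summary) => {
  console.log(summary.results); // [ 'A', 'C' ]
  console.log(summary.failures); // [ { task: 'b', reason: 'Error: boom' } ]
});
```

The failures list can then be recorded on the trace alongside the successful results, so a single failing worker is visible without hiding the rest of the workflow.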

Pattern 2: RAG Systems

const trace = await ants.trace.create({
  name: 'rag-query-system'
})
 
// Retrieval phase
const retrievalSpan = trace.span('document-retrieval')
const docs = await vectorDB.search(query)
retrievalSpan.end({ documents: docs.length })
 
// Generation phase
const generationSpan = trace.span('llm-generation')
const response = await llm.generate({ query, context: docs })
generationSpan.end({ tokens: response.usage.total })
 
await trace.complete({ output: response.text })

Pattern 3: Tool-Using Agents

const trace = await ants.trace.create({
  name: 'tool-using-agent'
})
 
// Agent decides which tools to use
const planSpan = trace.span('plan-tools')
const toolPlan = await agent.plan(query)
planSpan.end()
 
// Execute tools
for (const tool of toolPlan.tools) {
  const toolSpan = trace.span(`tool:${tool.name}`)
  const result = await executeTool(tool)
  toolSpan.end({ result })
}
 
await trace.complete()

Next Steps

Explore each concept in detail: