Why This Stack?

a10y is not a monolithic product. It is a composition of best-of-breed open source components — each chosen for a specific reason. This page explains the trade-offs behind those choices.

OpenObserve alone is not enough

OpenObserve excels at what it does: storing and querying logs, metrics, and traces with extreme storage efficiency. But storing observability data and managing alerts are fundamentally different problems.

Capability | OpenObserve | Keep
Log / metric / trace storage | Excellent | Out of scope
SQL / PromQL query | Native | N/A
Threshold-based alerting | Basic | Advanced (multi-source)
Alert deduplication | No | Built-in
Cross-source correlation | No | AI-powered grouping
Recovery alert handling | No | Tracks resolved vs. active
Remediation workflows | No | Declarative YAML
Multi-tool integration | Limited | 110+ bidirectional integrations

OpenObserve answers "what happened?" — Keep answers "what is the problem, and what remains?"

Without Keep, operators must correlate alerts from raw telemetry in their heads, track by hand which issues have been resolved, and trigger remediation manually. This is the gap that blocks reaching TM Forum (TMF) autonomy level L3 and above.
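To make the gap concrete, the sketch below shows two of the jobs listed in the table: collapsing repeated firings of the same problem into one issue, and tracking whether each issue is still active or has recovered. It is a minimal illustration under stated assumptions, not Keep's data model or API: the `RawAlert` shape, the device-plus-name fingerprint rule, and the `AlertTracker` class are all hypothetical.

```python
from dataclasses import dataclass
from hashlib import sha256


@dataclass(frozen=True)
class RawAlert:
    # Hypothetical alert shape; real sources (OpenObserve, SNMP traps, syslog) differ.
    source: str
    device: str
    name: str
    status: str  # "firing" or "resolved"


@dataclass
class TrackedIssue:
    key: str
    device: str
    name: str
    occurrences: int = 0
    active: bool = True


def fingerprint(alert: RawAlert) -> str:
    # Assumed identity rule: same device + same alert name is the same issue,
    # whichever source reported it and whatever its status.
    return sha256(f"{alert.device}|{alert.name}".encode()).hexdigest()[:12]


class AlertTracker:
    """Collapses duplicate firings and tracks active vs. resolved issues."""

    def __init__(self) -> None:
        self.issues: dict[str, TrackedIssue] = {}

    def ingest(self, alert: RawAlert) -> TrackedIssue:
        key = fingerprint(alert)
        issue = self.issues.setdefault(key, TrackedIssue(key, alert.device, alert.name))
        if alert.status == "resolved":
            issue.active = False     # recovery alert closes the issue
        else:
            issue.occurrences += 1   # duplicate firing is counted, not re-raised
            issue.active = True      # a new firing reopens a resolved issue
        return issue

    def active(self) -> list[TrackedIssue]:
        return [i for i in self.issues.values() if i.active]


tracker = AlertTracker()
for raw in [
    RawAlert("openobserve", "rtr-1", "LinkDown", "firing"),
    RawAlert("snmp", "rtr-1", "LinkDown", "firing"),  # same issue, second source
    RawAlert("openobserve", "rtr-2", "HighCPU", "firing"),
    RawAlert("openobserve", "rtr-2", "HighCPU", "resolved"),
]:
    tracker.ingest(raw)

print([(i.device, i.name, i.occurrences) for i in tracker.active()])
# → [('rtr-1', 'LinkDown', 2)]
```

Leaving the source out of the fingerprint is what lets the SNMP trap and the OpenObserve alert for `rtr-1` merge into one issue; a production system would need a richer identity rule than this.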

Why not Datadog?

Datadog is a powerful, mature platform. But for autonomous network operations in telecom, it introduces fundamental constraints.

Dimension | Datadog | a10y (OpenObserve + Keep)
Deployment | SaaS only | Self-hosted / air-gapped
Data sovereignty | Data leaves your network | All data stays on-premises
Cost model | Per-host + per-GB ingestion | Infrastructure cost only
Telecom-scale log volume | Expensive at high volume | ~140x lower storage cost (OpenObserve's published figure, measured against Elasticsearch)
Custom AI / LLM integration | Locked to Datadog AI | Bring your own model
Closed-loop automation | Workflow Automation (limited) | Keep workflows + correlation-engine
Network protocol support | Agent-based, IT-centric | Syslog, SNMP, gNMI, custom VRL (Vector Remap Language) parsing
Source code access | Proprietary | Full OSS

Datadog is built for IT/cloud monitoring. a10y is built for telecom autonomous operations.

Telecom operators require data sovereignty, air-gap deployability, telecom-native protocol support, and cost predictability at petabyte-scale log volumes. These are not optional — they are regulatory and operational requirements.

a10y alone does not achieve autonomy

a10y provides the cognitive core — the ability to observe, understand, and act on network events. But autonomous operations require more than intelligence. They require an operational platform.

Aether Platform: the operational foundation that makes autonomy possible
- aether-ide: unified operator interface (topology, dashboards, AI chat)
- aether-term: CLI-first interface for headless / SSH-only environments
- active-inventory: live topology graph (blast radius, impact analysis)
- Helm charts: production-grade deployment, scaling, lifecycle management

Aether Platform integrates a10y, the cognitive core (observe, understand, correlate, act):
- correlation-engine: LLM + statistical AI for causal reasoning
- OpenObserve + Vector: telemetry ingestion, storage, query
- Keep: alert lifecycle, correlation, remediation
- Qdrant + NATS: knowledge memory + event backbone

What Aether Platform adds

Operator Experience

a10y produces insights and actions. Aether Platform presents them in context — topology views, incident timelines, and natural-language interaction. Without this layer, operators must navigate multiple dashboards and mentally stitch information together.

Topological Awareness

active-inventory maintains a live graph of network devices, links, and services. This is critical for blast radius estimation ("if this router fails, which services are affected?") and impact-aware remediation ("reroute traffic before replacing the card"). a10y's correlation-engine uses this context, but it doesn't own it.
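A blast-radius query can be sketched as failure propagation over a dependency graph. The minimal Python version below assumes a hypothetical `PROVIDERS` map in which a node stays up as long as at least one of its providers is up; the node names and the `blast_radius` / `affected_services` helpers are illustrative, not active-inventory's API.

```python
# Hypothetical topology: each node lists the upstream nodes that can serve it.
# Providers are treated as redundant alternatives: a node goes down only when
# ALL of its providers are down. This is not active-inventory's real schema.
PROVIDERS: dict[str, list[str]] = {
    "link-a": ["rtr-core-1"],
    "link-b": ["rtr-core-1"],
    "rtr-edge-1": ["link-a"],
    "rtr-edge-2": ["link-b"],
    "svc-voice": ["rtr-edge-1"],
    "svc-internet": ["rtr-edge-1", "rtr-edge-2"],  # dual-homed service
}


def blast_radius(providers: dict[str, list[str]], failed: str) -> set[str]:
    """Return every node that loses service if `failed` goes down."""
    down = {failed}
    changed = True
    while changed:  # propagate failures until nothing new goes down
        changed = False
        for node, ups in providers.items():
            if node not in down and ups and all(u in down for u in ups):
                down.add(node)
                changed = True
    return down - {failed}


def affected_services(providers: dict[str, list[str]], failed: str) -> list[str]:
    # Filter the blast radius down to customer-facing services (assumed "svc-" prefix).
    return sorted(n for n in blast_radius(providers, failed) if n.startswith("svc-"))


print(affected_services(PROVIDERS, "rtr-core-1"))  # → ['svc-internet', 'svc-voice']
print(affected_services(PROVIDERS, "rtr-edge-2"))  # → []
```

The redundancy check matters: losing `rtr-edge-2` impacts nothing, because `svc-internet` is still served through `rtr-edge-1`. Pure reachability would have flagged it as affected.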

Production Readiness

Helm charts handle Kubernetes deployment, resource management, secret injection, and upgrade strategies. Moving from "it works on my laptop" to "it runs in a carrier network" requires infrastructure engineering that is separate from the intelligence layer.

Closed Loop Completion

True closed-loop automation requires a feedback path: Act → Verify → Adjust. Aether Platform closes this loop by connecting remediation actions back to observability data, confirming recovery, and escalating when automated fixes fail.
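The Act → Verify → Adjust loop can be sketched as follows. This is a hypothetical illustration, not Aether Platform's interface: `closed_loop`, the remediation callables, and the simulated `link` state are all assumptions. Each remediation is tried in order, health is re-checked after every action, and if none restores service the attempts are handed to an escalation hook.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    recovered: bool
    attempts: list[str]
    escalated: bool


def closed_loop(
    remediations: list[Callable[[], str]],
    is_healthy: Callable[[], bool],
    escalate: Callable[[list[str]], None],
) -> LoopResult:
    """Act -> Verify -> Adjust, escalating to a human when automation runs out."""
    attempts: list[str] = []
    for act in remediations:          # ordered from least to most invasive
        attempts.append(act())        # Act
        if is_healthy():              # Verify against fresh observability data
            return LoopResult(True, attempts, False)
        # Adjust: the fix did not hold, so move on to the next remediation
    escalate(attempts)                # every automated fix failed
    return LoopResult(False, attempts, True)


# Hypothetical simulated device and remediation steps, for illustration only.
link = {"up": False}


def clear_counters() -> str:
    return "clear-counters"           # does not fix the fault


def bounce_interface() -> str:
    link["up"] = True                 # this one restores the link
    return "bounce-interface"


pages: list[list[str]] = []
ok = closed_loop([clear_counters, bounce_interface], lambda: link["up"], pages.append)
print(ok)
# → LoopResult(recovered=True, attempts=['clear-counters', 'bounce-interface'], escalated=False)
```

In a real deployment, `is_healthy` would query telemetry after a settling delay rather than read an in-memory flag, so that a transient recovery is not mistaken for a fix.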

a10y is the brain. Aether Platform is the body.

Intelligence without operational context is just analysis. Autonomous operations require both — the ability to understand what is happening and the infrastructure to act on that understanding safely, at scale, in production.

TMF Autonomy Levels — Where Each Stack Lands

The TM Forum defines six levels of network autonomy (L0–L5). Different tooling choices land you at different levels. Here is where each approach realistically places you.

L1: Manual with basic scripts
Stack: no integrated tooling
Operators rely on CLI access, ad-hoc scripts, and tribal knowledge. Alert fatigue is the norm, and every incident is a fire drill. This is where many networks still are, not by choice but by inertia.

L2: Monitoring-centric
Stack: OpenObserve, Datadog, or Grafana (standalone)
Dashboards exist and alerts fire, but correlation is manual, deduplication is nonexistent, and remediation means "someone gets paged and SSHs into the box." You can see the problem; you just can't do anything about it automatically.

L3: AI-assisted operations
Stack: Aether Platform + a10y
AI correlates alerts, suggests root causes, and recommends actions. Operators make final decisions with full context (topology, history, and causal analysis) at their fingertips. Human-in-the-loop, but the loop is intelligent. This is where Aether Platform + a10y delivers value today.

L4: Autonomous operations
Stack: a10y vision (closed-loop automation)
The system detects, understands, acts, and verifies autonomously. Operators supervise rather than operate. Remediation workflows execute automatically with blast-radius-aware safeguards, and humans intervene only for novel situations. This is the destination a10y is engineered toward.

Most networks are stuck at L1–L2. The jump to L3 requires more than better dashboards — it requires alert intelligence (Keep), topological context (active-inventory), and AI reasoning (correlation-engine) working together.

The jump from L3 to L4 requires trust — trust built through transparent AI decisions, verifiable outcomes, and gradual expansion of autonomous scope. a10y is designed to earn that trust incrementally.
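One way to picture the L3 to L4 transition, and the incremental trust described above, is a policy gate in front of every proposed action. The sketch below is hypothetical (`ProposedAction`, `AutonomyPolicy`, and the scope names are illustrative, not part of a10y): an action executes on its own only if its category has been explicitly approved and its blast radius fits within a budget; everything else falls back to a human-approved recommendation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProposedAction:
    scope: str         # hypothetical action category, e.g. "restart-process"
    blast_radius: int  # impacted services, e.g. from a topology query


@dataclass
class AutonomyPolicy:
    approved_scopes: set[str] = field(default_factory=set)
    max_blast_radius: int = 0

    def decide(self, action: ProposedAction) -> str:
        within_scope = action.scope in self.approved_scopes
        within_radius = action.blast_radius <= self.max_blast_radius
        if within_scope and within_radius:
            return "execute"    # L4-style: act autonomously, then verify
        return "recommend"      # L3-style: a human approves before anything runs

    def expand(self, scope: str) -> None:
        # Widening autonomous scope is an explicit, auditable step.
        self.approved_scopes.add(scope)


policy = AutonomyPolicy(max_blast_radius=1)
restart = ProposedAction("restart-process", blast_radius=0)
reroute = ProposedAction("reroute-traffic", blast_radius=3)

print(policy.decide(restart))  # prints "recommend": the scope is not yet trusted
policy.expand("restart-process")
print(policy.decide(restart))  # prints "execute"
policy.expand("reroute-traffic")
print(policy.decide(reroute))  # prints "recommend": blast radius 3 exceeds the budget of 1
```

Keeping the two conditions separate means that trusting a category of action never implies trusting it at any scale.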

Summary

Question | Answer
Why not OpenObserve alone? | It stores telemetry but cannot deduplicate, correlate, or remediate alerts. Keep fills this gap. OpenObserve alone keeps you at L2.
Why not Datadog? | SaaS-only, no data sovereignty, cost-prohibitive at telecom scale, locked-in AI, limited telecom protocol support. Also L2, with a bigger bill.
Why not a10y alone? | a10y is the cognitive core. Aether Platform provides the operator interface, topology awareness, and production infrastructure needed to reach L3 and progress toward L4.