Why This Stack?

a10y is not a monolithic product. It is a composition of best-of-breed open source components — each chosen for a specific reason. This page explains the trade-offs behind those choices.

OpenObserve alone is not enough

OpenObserve excels at what it does: storing and querying logs, metrics, and traces with extreme storage efficiency. But observability data and alert management are fundamentally different problems.

Capability	OpenObserve	Keep
Log / Metric / Trace storage	Excellent	Out of scope
SQL / PromQL query	Native	N/A
Threshold-based alerting	Basic	Advanced (multi-source)
Alert deduplication	No	Built-in
Cross-source correlation	No	AI-powered grouping
Recovery alert handling	No	Tracks resolved vs. active
Remediation workflows	No	Declarative YAML
Multi-tool integration	Limited	110+ bidirectional

OpenObserve answers "what happened?" — Keep answers "what is the problem, and what remains?"

Without Keep, operators must mentally correlate alerts from raw telemetry, track which issues have been resolved, and manually trigger remediation. That manual work is what keeps an OpenObserve-only deployment from reaching TMF L3 or higher.
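
The deduplication and recovery tracking rows in the table above can be pictured with a small sketch. This is an illustration of the idea only, not Keep's implementation: the `Alert` fields, the `fingerprint` function, and the `AlertTracker` class are hypothetical names invented for this example.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert shape; real alert sources carry many more fields."""
    source: str   # e.g. "openobserve"
    device: str   # e.g. "router-1"
    name: str     # e.g. "link_down"
    status: str   # "firing" or "resolved"


def fingerprint(alert: Alert) -> str:
    """Identify 'the same problem' by device and alert name, ignoring source and status."""
    key = f"{alert.device}|{alert.name}"
    return hashlib.sha256(key.encode()).hexdigest()[:12]


class AlertTracker:
    """Keeps one entry per fingerprint and records which problems are still active."""

    def __init__(self):
        self.active = {}      # fingerprint -> first Alert seen for that problem
        self.duplicates = 0   # repeated reports of an already-active problem

    def ingest(self, alert: Alert) -> None:
        fp = fingerprint(alert)
        if alert.status == "resolved":
            # A recovery alert closes the matching active entry, if any.
            self.active.pop(fp, None)
        elif fp in self.active:
            # Same problem reported again, possibly by another source.
            self.duplicates += 1
        else:
            self.active[fp] = alert


tracker = AlertTracker()
for a in [
    Alert("openobserve", "router-1", "link_down", "firing"),
    Alert("snmp-trap", "router-1", "link_down", "firing"),    # duplicate report
    Alert("openobserve", "router-2", "high_cpu", "firing"),
    Alert("openobserve", "router-1", "link_down", "resolved"),  # recovery
]:
    tracker.ingest(a)

remaining = sorted(a.device for a in tracker.active.values())
print(remaining, tracker.duplicates)
```

After the four alerts are ingested, only the `router-2` problem remains active and one duplicate has been suppressed. That is the "what remains?" view that raw telemetry does not give you directly.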

Why not Datadog?

Datadog is a powerful, mature platform. But for autonomous network operations in telecom, it introduces fundamental constraints.

Aspect	Datadog	a10y (OpenObserve + Keep)
Deployment	SaaS only	Self-hosted / Air-gapped
Data sovereignty	Data leaves your network	All data stays on-premise
Cost model	Per-host + per-GB ingestion	Infrastructure cost only
Telecom-scale log volume	Expensive at high volume	~140x lower storage cost (OpenObserve's published figure, measured against Elasticsearch)
Custom AI / LLM integration	Locked to Datadog AI	Bring your own model
Closed-loop automation	Workflow Automation (limited)	Keep workflows + correlation-engine
Network protocol support	Agent-based, IT-centric	Syslog, SNMP, gNMI, custom VRL
Source code access	Proprietary	Full OSS

Datadog is built for IT/cloud monitoring. a10y is built for telecom autonomous operations.

Telecom operators require data sovereignty, air-gap deployability, telecom-native protocol support, and cost predictability at petabyte-scale log volumes. These are not optional — they are regulatory and operational requirements.

a10y alone does not achieve autonomy

a10y provides the cognitive core — the ability to observe, understand, and act on network events. But autonomous operations require more than intelligence. They require an operational platform.

Aether Platform: the operational foundation that makes autonomy possible

Component	Role
aether-ide	Unified operator interface: topology, dashboards, AI chat
aether-term	CLI-first interface for headless / SSH-only environments
active-inventory	Live topology graph: blast radius, impact analysis
Helm charts	Production-grade deployment, scaling, lifecycle management

Aether Platform integrates a10y.

a10y: the cognitive core (observe, understand, correlate, act)

Component	Role
correlation-engine	LLM + statistical AI for causal reasoning
OpenObserve + Vector	Telemetry ingestion, storage, query
Keep	Alert lifecycle, correlation, remediation
Qdrant + NATS	Knowledge memory + event backbone

What Aether Platform adds

Operator Experience

a10y produces insights and actions. Aether Platform presents them in context — topology views, incident timelines, and natural-language interaction. Without this layer, operators must navigate multiple dashboards and mentally stitch information together.

Topological Awareness

active-inventory maintains a live graph of network devices, links, and services. This is critical for blast radius estimation ("if this router fails, which services are affected?") and impact-aware remediation ("reroute traffic before replacing the card"). a10y's correlation-engine uses this context, but it doesn't own it.
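
Blast radius estimation on such a graph amounts to a reachability query. The sketch below assumes a hypothetical `DEPENDENTS` mapping with made-up device and service names; it does not reflect active-inventory's actual data model or API.

```python
from collections import deque

# Hypothetical dependency edges: each key is depended on by the items in its list.
DEPENDENTS = {
    "router-a": ["link-a-b", "link-a-c"],
    "link-a-b": ["vpn-gold"],
    "link-a-c": ["iptv", "vpn-gold"],
    "router-d": ["link-d-e"],
    "link-d-e": ["voice"],
}


def blast_radius(failed, dependents):
    """Return every node downstream of the failed node, using breadth-first search."""
    visited = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in visited:   # guards against cycles and repeat visits
                visited.add(child)
                queue.append(child)
    return visited - {failed}


print(sorted(blast_radius("router-a", DEPENDENTS)))
# prints ['iptv', 'link-a-b', 'link-a-c', 'vpn-gold']
```

The `voice` service is absent from the result because it hangs off `router-d`, so a remediation that touches only `router-a` can proceed without considering it.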

Production Readiness

Helm charts handle Kubernetes deployment, resource management, secret injection, and upgrade strategies. Moving from "it works on my laptop" to "it runs in a carrier network" requires infrastructure engineering that is separate from the intelligence layer.

Closed Loop Completion

True closed-loop automation requires a feedback path: Act → Verify → Adjust. Aether Platform closes this loop by connecting remediation actions back to observability data, confirming recovery, and escalating when automated fixes fail.
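
The Act → Verify → Adjust path can be sketched as a small control loop. This is a simplified illustration under stated assumptions: `closed_loop`, the two remediation functions, and the in-memory `state` dictionary are hypothetical stand-ins, and a real verification step would query observability data rather than a local variable.

```python
from typing import Callable, Sequence


def closed_loop(
    actions: Sequence[Callable[[], None]],
    verify: Callable[[], bool],
    escalate: Callable[[str], None],
) -> bool:
    """Try each remediation in order; stop at the first one that verification confirms."""
    for act in actions:
        act()          # Act: apply one remediation step
        if verify():   # Verify: check whether the service has recovered
            return True
        # Adjust: fall through to the next action in the list
    escalate(f"{len(actions)} automated fix(es) failed; operator needed")
    return False


# Simulated environment for the example.
state = {"link_up": False}
log = []


def restart_interface():
    log.append("restart_interface")   # has no effect in this simulation


def reroute_traffic():
    log.append("reroute_traffic")
    state["link_up"] = True


recovered = closed_loop(
    actions=[restart_interface, reroute_traffic],
    verify=lambda: state["link_up"],
    escalate=log.append,
)
print(recovered, log)
# prints True ['restart_interface', 'reroute_traffic']
```

Ordering the actions from least to most invasive keeps the loop conservative: a cheaper fix is always tried and verified before a more disruptive one, and escalation happens only when every automated option has been exhausted.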

a10y is the brain. Aether Platform is the body.

Intelligence without operational context is just analysis. Autonomous operations require both — the ability to understand what is happening and the infrastructure to act on that understanding safely, at scale, in production.

TMF Autonomy Levels — Where Each Stack Lands

The TM Forum defines six levels of network autonomy (L0–L5). Different tooling choices land you at different levels. Here is where each approach discussed on this page realistically places you, from L1 through L4.

Level	Mode	Stack	What it looks like
L1	Manual with basic scripts	No integrated tooling	Operators rely on CLI access, ad-hoc scripts, and tribal knowledge. Alert fatigue is the norm. Every incident is a fire drill. This is where many networks still are, not by choice but by inertia.
L2	Monitoring-centric	OpenObserve / Datadog / Grafana (standalone)	Dashboards exist. Alerts fire. But correlation is manual, deduplication is nonexistent, and remediation means "someone gets paged and SSHs into the box." You can see the problem; you just can't do anything about it automatically.
L3	AI-assisted operations	Aether Platform + a10y	AI correlates alerts, suggests root causes, and recommends actions. Operators make final decisions with full context: topology, history, and causal analysis at their fingertips. Human-in-the-loop, but the loop is intelligent. This is where Aether Platform + a10y delivers value today.
L4	Autonomous operations	a10y vision: closed-loop automation	The system detects, understands, acts, and verifies autonomously. Operators supervise rather than operate. Remediation workflows execute automatically with blast-radius-aware safeguards. Humans intervene only for novel situations. This is the destination a10y is engineered toward.

Most networks are stuck at L1–L2. The jump to L3 requires more than better dashboards — it requires alert intelligence (Keep), topological context (active-inventory), and AI reasoning (correlation-engine) working together.

The jump from L3 to L4 requires trust — trust built through transparent AI decisions, verifiable outcomes, and gradual expansion of autonomous scope. a10y is designed to earn that trust incrementally.
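
One way to picture a gradual expansion of autonomous scope is a policy gate that decides, per action, whether to execute automatically or ask an operator. The sketch below is an assumption-laden illustration: `AutonomyPolicy`, `decide`, and the threshold values are hypothetical and are not taken from a10y.

```python
from dataclasses import dataclass


@dataclass
class AutonomyPolicy:
    """Hypothetical limits that an operator widens as trust in the system grows."""
    max_affected_services: int   # largest blast radius allowed without approval
    min_confidence: float        # lowest diagnosis confidence allowed without approval


def decide(policy: AutonomyPolicy, affected_services: int, confidence: float) -> str:
    """Return 'auto' when the action fits the policy, otherwise 'needs_approval'."""
    within_radius = affected_services <= policy.max_affected_services
    confident = confidence >= policy.min_confidence
    return "auto" if within_radius and confident else "needs_approval"


cautious = AutonomyPolicy(max_affected_services=1, min_confidence=0.95)
expanded = AutonomyPolicy(max_affected_services=5, min_confidence=0.90)

print(decide(cautious, affected_services=3, confidence=0.97))  # prints needs_approval
print(decide(expanded, affected_services=3, confidence=0.97))  # prints auto
```

The same remediation moves from human-approved to automatic only because the policy changed, not the system. Each widening of scope is therefore an explicit, reviewable operator decision.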

Summary

Question	Answer
Why not OpenObserve alone?	It stores telemetry but cannot deduplicate, correlate, or remediate alerts. Keep fills this gap. OpenObserve alone keeps you at L2.
Why not Datadog?	SaaS-only, no data sovereignty, cost-prohibitive at telecom scale, locked AI, limited telecom protocol support. Also L2 — with a bigger bill.
Why not a10y alone?	a10y is the cognitive core. Aether Platform provides the operator interface, topology awareness, and production infrastructure needed to reach L3 and progress toward L4.