How a10y works in practice — from fault detection to autonomous recovery.
A physical fiber cut triggers cascading alarms across transport, IP, and mobile core layers. a10y correlates them to a single root cause in seconds.
Transport layer: optical power loss alarm on interface ge-0/0/1
IP layer: BGP neighbor down, OSPF adjacency lost on 3 links
Mobile core (free5GC): AMF registration failures spike, UPF path unreachable
47 alerts in 90 seconds across 12 devices
Keep deduplicates 47 alerts → 8 unique alarm types
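The deduplication step can be sketched in a few lines. This is a conceptual illustration, not Keep's actual schema or API: raw alerts are grouped by a fingerprint of their alarm type, so dozens of raw events collapse into a handful of unique types, each carrying the set of affected devices.

```python
from collections import defaultdict

def deduplicate(alerts):
    """Group raw alerts by alarm type; return one entry per unique type
    with the set of affected devices and an occurrence count."""
    groups = defaultdict(lambda: {"devices": set(), "count": 0})
    for alert in alerts:
        group = groups[alert["type"]]
        group["devices"].add(alert["device"])
        group["count"] += 1
    return dict(groups)

# Illustrative alert storm: many raw events, few unique types.
storm = (
    [{"device": f"router-{i}", "type": "bgp_neighbor_down"} for i in range(12)]
    + [{"device": "olt-1", "type": "optical_power_loss"}] * 3
    + [{"device": "amf-1", "type": "amf_registration_failure"}] * 5
)

unique = deduplicate(storm)
print(len(storm), "raw alerts ->", len(unique), "unique alarm types")
# 20 raw alerts -> 3 unique alarm types
```

The same idea scales from this toy storm to the 47-alert storm above: the number of unique alarm types, not the number of raw events, is what the correlation stage has to reason about.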
Qdrant finds similar incident from 3 months ago (fiber cut on same span)
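Conceptually, the Qdrant lookup is a nearest-neighbor search over embeddings of past incidents. The sketch below shows the underlying similarity computation with invented three-dimensional embeddings and incident names; a real deployment would use an embedding model and the Qdrant client rather than hand-rolled vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical incident embeddings (real ones would come from an
# embedding model applied to each incident's alarm signature).
past_incidents = {
    "INC-1021 fiber cut, span X":      [0.9, 0.1, 0.8],
    "INC-0877 BGP flap, peering edge": [0.1, 0.9, 0.2],
    "INC-0534 amplifier degradation":  [0.7, 0.2, 0.3],
}
current = [0.88, 0.12, 0.79]  # embedding of today's alert storm

best = max(past_incidents, key=lambda k: cosine(current, past_incidents[k]))
print("most similar past incident:", best)
# most similar past incident: INC-1021 fiber cut, span X
```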
Correlation: all affected devices share a common fiber path
Queries active-inventory: "what is the physical topology between these devices?"
Identifies fiber span X as single point of failure
RCA: fiber cut on span X → transport down → IP reroute failed (no alternate path) → mobile core unreachable
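The correlation step above reduces to a set intersection: map each affected device to the fiber spans its traffic traverses, then intersect. The topology data here is invented for illustration; in a10y it would come from the active-inventory query.

```python
# Hypothetical device-to-fiber-span mapping (in a10y this would be
# answered by active-inventory's physical topology query).
device_spans = {
    "router-a": {"span-X", "span-Y"},
    "router-b": {"span-X"},
    "upf-1":    {"span-X", "span-Z"},
}

affected = ["router-a", "router-b", "upf-1"]
shared = set.intersection(*(device_spans[d] for d in affected))
print("common fiber span(s):", shared)
# common fiber span(s): {'span-X'}
```

A single shared span across every affected device is exactly the single-point-of-failure signature that points the RCA at a physical cut rather than independent faults.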
Action plan: reroute traffic via backup path, notify NOC for physical repair
Activates backup MPLS path via NETCONF
Creates incident ticket with RCA summary
Notifies NOC: "Fiber cut on span X, backup path activated, physical repair needed"
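The backup-path activation is a standard NETCONF `<edit-config>` RPC (RFC 6241). The envelope below is the real NETCONF structure; the MPLS payload inside `<config>` is an invented, vendor-neutral stand-in, since the actual YANG model depends on the device.

```xml
<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target><running/></target>
    <config>
      <!-- Vendor-specific payload; this MPLS LSP stanza is
           illustrative only, not a real YANG model. -->
      <mpls xmlns="http://example.com/yang/mpls">
        <lsp>
          <name>backup-path-spanX</name>
          <admin-state>up</admin-state>
        </lsp>
      </mpls>
    </config>
  </edit-config>
</rpc>
```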
BGP sessions re-established on backup path
AMF registration success rate returns to 99.9%
Alert storm resolved — 47 alerts → 0 active
Time to RCA: 12 seconds. Time to recovery: 3 minutes.
Gradual latency increase detected by perfSONAR between research sites. Statistical AI identifies the anomaly before any threshold alert fires.
perfSONAR OWAMP tests: one-way delay increasing 2ms/hour on path A→B
No threshold alarm has fired yet (one-way delay is still well below the 50 ms SLA limit)

Statistical anomaly detection flags the trend as abnormal
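What "flags the trend as abnormal" means in practice: fit a slope to the recent delay samples and alert on sustained drift, even while every individual reading is comfortably inside the SLA. The readings and the drift sensitivity below are illustrative, not perfSONAR output.

```python
def slope_ms_per_hour(samples):
    """Least-squares slope of (hour, one-way delay in ms) samples."""
    n = len(samples)
    mean_x = sum(t for t, _ in samples) / n
    mean_y = sum(d for _, d in samples) / n
    num = sum((t - mean_x) * (d - mean_y) for t, d in samples)
    den = sum((t - mean_x) ** 2 for t, _ in samples)
    return num / den

# Hypothetical hourly OWAMP one-way delay readings (ms): drifting upward
# even though every value is far below the 50 ms SLA threshold.
readings = [(0, 14.1), (1, 16.0), (2, 17.9), (3, 20.2), (4, 22.1)]

drift = slope_ms_per_hour(readings)
print(f"delay trend: {drift:.1f} ms/hour")
if drift > 0.5:  # illustrative sensitivity, far below any static threshold
    print("statistical anomaly: sustained upward drift")
```

This is why the anomaly is caught hours before a static threshold would fire: the slope is abnormal long before any single sample is.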
Topology query: path A→B traverses 4 hops, optical amplifiers on span 2
Qdrant: similar pattern seen last year — optical amplifier degradation
Correlation with SNMP: optical power on span 2 decreasing (still in spec)
Prediction: at current rate, SLA breach in ~18 hours
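The ~18-hour prediction is simple linear extrapolation: remaining headroom to the SLA limit divided by the observed drift rate. The current reading below is an assumed value chosen to match the scenario's numbers.

```python
SLA_MS = 50.0             # one-way delay SLA limit from the scenario
current_delay_ms = 14.0   # assumed latest OWAMP reading (illustrative)
drift_ms_per_hour = 2.0   # observed trend: ~2 ms/hour

hours_to_breach = (SLA_MS - current_delay_ms) / drift_ms_per_hour
print(f"predicted SLA breach in ~{hours_to_breach:.0f} hours")
# predicted SLA breach in ~18 hours
```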
Root cause: optical amplifier aging on span 2 (pre-failure state)
Recommendation: proactive maintenance window, switch to protection path
Creates maintenance ticket with predicted failure window
Schedules protection path switchover for next maintenance window
Alerts optical team: "Amplifier on span 2 showing degradation, replace within 18h"
After switchover: latency on path A→B returns to baseline
perfSONAR tests confirm SLA compliance
Issue resolved before any user impact. Zero SLA violations.
An operator's typical workflow — from morning overview to incident investigation, all from the aether-ide portal.
Topology view shows all network nodes. Green = healthy, yellow = warning, red = critical. Two nodes are yellow.
Drills into one of the yellow nodes: OpenObserve, filtered to that device's logs and metrics, shows an elevated error rate on one interface.
Opens Keep dashboard. 3 correlated alerts for this device — CRC errors, input drops, and optical power warning. Keep has already grouped them.
"Have we seen this pattern before?" Qdrant dashboard shows 2 similar past incidents — both resolved by cleaning the fiber connector.
correlation-engine suggests dispatching a field tech with the RCA summary. Operator approves → Keep workflow creates the dispatch ticket.
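The approval step is a human-in-the-loop gate: the suggested action is held until the operator confirms it, and only then does the workflow execute. The sketch below shows the pattern generically; it is not Keep's actual workflow API.

```python
from dataclasses import dataclass

@dataclass
class SuggestedAction:
    """A remediation the correlation engine proposes but does not execute."""
    summary: str
    rca: str
    approved: bool = False

def execute_if_approved(action, dispatch):
    """Run the dispatch callback only after an operator has approved."""
    if not action.approved:
        return "pending operator approval"
    return dispatch(action)

action = SuggestedAction(
    summary="Dispatch field tech to clean fiber connector",
    rca="CRC errors + input drops + optical power warning on one interface",
)

create_ticket = lambda a: f"ticket created: {a.summary}"
print(execute_if_approved(action, create_ticket))  # held: not yet approved
action.approved = True  # operator clicks approve in the portal
print(execute_if_approved(action, create_ticket))  # now the ticket is cut
```

Keeping the execution behind an explicit approval is what makes the workflow safe to automate: the engine does the correlation and drafting, the operator keeps the final decision.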