NOC contact: noc@peon.tech
Escalation: CTO (ricky@peon.tech)
Monitoring: grafana.peon.tech (Prometheus + Grafana)
| Level | Name | Response Time | Examples |
|-------|------|---------------|----------|
| P1 | Critical | 15 min | Total site outage, SEACOM down, PBX all-tenant outage |
| P2 | High | 1 hour | Single site down, >50% capacity loss, PBX single-tenant outage |
| P3 | Medium | 4 hours | Degraded performance, single device failure, minor alarm |
| P4 | Low | Next business day | Planned maintenance, minor config drift, hardware pre-failure |
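
For scripting against this matrix (for example, when stamping a ticket with its response target), the table can be expressed as a small lookup. This is a minimal sketch: the function name `response_target` is hypothetical and not part of any existing NOC tooling.

```sh
# Map a severity level to the response-time target from the table above.
# `response_target` is an illustrative name, not an existing script.
response_target() {
    case "$1" in
        P1) echo "15 min" ;;
        P2) echo "1 hour" ;;
        P3) echo "4 hours" ;;
        P4) echo "Next business day" ;;
        *)  echo "unknown severity: $1" >&2; return 1 ;;
    esac
}

response_target P2
```

An unknown level returns a non-zero status, so a calling script can refuse to open a ticket without a valid severity.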
¶ Standard Operating Procedures
- Confirm via monitoring — Grafana alert + manual ping check
- Check upstream BGP sessions: `show bgp summary` on rtr-01/rtr-02
- Check physical layer: console-01 OOB access (Opengear IM7200)
- Escalate to carrier if upstream circuit issue confirmed
- Notify affected tenants via portal within 15 minutes
- Open P1 ticket in Vikunja → label: `urgent, telecom`
- Shift handover note required if incident spans shifts
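
The manual ping check in the first step above can be sketched as a helper that reports a host as down only when every probe fails, which avoids escalating on a single lost packet. The function name `confirm_down` and the default of three probes are assumptions for illustration, not an existing NOC script.

```sh
# Return 0 (confirmed down) only if every probe fails;
# return 1 as soon as any reply arrives.
# `confirm_down` and the default of 3 probes are illustrative assumptions.
confirm_down() {
    host=$1
    attempts=${2:-3}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ping -c 1 "$host" >/dev/null 2>&1; then
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Example use during triage (not executed here):
#   confirm_down rtr-01 && echo "rtr-01 unreachable: continue with BGP and OOB checks"
```

A reply on any probe ends the check early, so a healthy host costs one ping rather than three.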
- Check Homer 7 MOS dashboard — identify affected calls/tenants
- SBC health: `kamctl status` on pbx-sbc-01/02
- Core health: FreePBX admin panel (HTTPS/443 on pbx-core-01)
- RTPEngine: `rtpengine-ctl list` on pbx-media-01
- Kamailio blocklist check if spike in failed auth: `kamctl htable.dump ipban`
- For tenant-specific issues: check VRF routing table in FG fw-01
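
The SBC and RTPEngine checks above can be collected into one pass. The sketch below assumes the components run as Docker containers named after their hosts, as the `docker exec` examples under Common Commands suggest for pbx-core-01 and pbx-sbc-01; that naming is an assumption for pbx-sbc-02 and pbx-media-01. It defaults to a dry run that only prints the commands; the `RUN` variable and the `pbx_health` name are hypothetical.

```sh
# RUN=echo (the default) prints each command instead of executing it.
# Export RUN as an empty string to run the commands for real.
RUN=${RUN-echo}

# Run the SBC and media health checks from the procedure above.
# Container names are assumed to match the host names.
pbx_health() {
    for sbc in pbx-sbc-01 pbx-sbc-02; do
        $RUN docker exec "$sbc" kamctl status
    done
    $RUN docker exec pbx-media-01 rtpengine-ctl list
}

pbx_health
```

Keeping the dry run as the default makes it safe to paste during an incident and review the commands before executing them.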
- Check WLB-SEACOM-001 optical power levels (target: > -8 dBm)
- Contact SEACOM NOC: noc@seacom.com / NOC ticket portal
- Fail over internet transit to MTC/Telecom Namibia secondary paths
- Update BGP local-pref on rtr-01 to prefer alternate upstreams
- Log in Vikunja — carrier ticket reference number in comments
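
The optical power target in the first step is a comparison of negative decimal values, which is easy to get backwards under pressure. A minimal sketch of the check follows; the function name `optical_power_ok` and the sample reading of -6.5 dBm are hypothetical, and the -8 dBm floor is the target stated above.

```sh
# Succeed when the received optical power (dBm) is strictly above the floor.
# $1 = measured power in dBm, $2 = floor in dBm (defaults to -8, per the target above).
# awk is used because POSIX shell arithmetic does not handle decimals.
optical_power_ok() {
    awk -v p="$1" -v limit="${2:--8}" 'BEGIN { exit !(p + 0 > limit + 0) }'
}

# -6.5 dBm is a made-up sample reading.
if optical_power_ok -6.5; then
    echo "within target"
else
    echo "below target: contact SEACOM NOC"
fi
```

A reading of exactly -8 dBm fails the check, since the target is stated as strictly greater than -8 dBm.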
- Root cause: Liquid Telecom fibre cut on SEACOM feeder
- Resolution: Liquid Telecom escalation → fibre repair confirmed
- Duration: 4h 12m
- Root cause: Salt spray entering junction box JB-LDZ-047 at km 47.3
- OTDR finding: 3.2 dB degradation (spec: < 0.5 dB)
- Resolution: IP68 marine-grade enclosure fitted, 4 fibre pairs re-spliced
- Post-repair OTDR: 0.3 dB ✅
- Alarm: 03:14 SAST — rtr-02 at 71°C (threshold: 75°C)
- Status: Fan tray PN ASR1001X-FAN sourced from TechSol Namibia
- Maintenance window: March 12, 01:00 SAST. SEACOM NOC notified.
- Current: Traffic running on rtr-01 only — redundancy lost until repair.
| Tool | Purpose | Access |
|------|---------|--------|
| Prometheus | Metrics collection (SNMP, node exporters) | Internal (WDH DC2) |
| Grafana | Dashboards and alerting | grafana.peon.tech |
| Homer 7 | SIP/VoIP capture, MOS tracking | pbx-monitor-01:9080 |
| Vikunja | Incident tickets and task tracking | todo.peon.tech |
| Opengear IM7200 | Out-of-band console access (SWK DC1) | Management VLAN 100 |
| Metric | Value |
|--------|-------|
| Primary DC | SWK DC1 (Swakopmund) |
| DR site | WDH DC2 (Windhoek) |
| Replication | Veeam B&R continuous replication |
| Target RTO | 4 hours |
| Last DR test | March 20, 2026 — achieved 3h 22m ✅ |
| Next DR test | September 2026 |
| PBX DR | pbx-dr-01 at WDH DC2 — runbook being drafted |
Lessons from last DR test:
- db-dr-01 replication was 47 minutes stale → Veeam schedule adjusted to 15-minute RPO
- HAProxy VIP failover required manual intervention → automated with keepalived
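
The figures from the DR test reduce to two comparisons: replica age against the 15-minute RPO, and recovery time against the 4-hour RTO. The sketch below works through both using the numbers above; the function names `rpo_ok` and `to_minutes` are hypothetical, and how the replica age is obtained from Veeam is left out.

```sh
# Succeed when the replica age (minutes) is within the RPO (default 15 minutes).
rpo_ok() {
    age_min=$1
    rpo_min=${2:-15}
    [ "$age_min" -le "$rpo_min" ]
}

# Convert hours and minutes to total minutes.
to_minutes() {
    echo $(( $1 * 60 + $2 ))
}

# Replica age found during the last DR test: 47 minutes.
rpo_ok 47 || echo "replica older than RPO"

# Achieved recovery 3h 22m (202 min) against the 4h target (240 min).
[ "$(to_minutes 3 22)" -le "$(to_minutes 4 0)" ] && echo "RTO met"
```

With the test figures, the 47-minute replica fails the RPO check while the 202-minute recovery fits inside the 240-minute RTO.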
¶ Common Commands
# Check all services on a host
cd /opt/<service> && docker compose ps
# Full restart (picks up .env changes)
cd /opt/<service> && docker compose down && docker compose up -d
# Caddy status
cd /opt/caddy && docker compose ps && docker compose logs --tail=20
# Check BGP on router (via console-01)
ssh admin@10.10.0.4 # console-01 (Opengear)
# then connect to rtr-01 console
# VoIP: check active calls
docker exec pbx-core-01 asterisk -rx "core show channels"
# VoIP: Kamailio stats
docker exec pbx-sbc-01 kamctl stats
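
The per-service check above can be repeated across a host in one pass. This is a sketch that assumes every service lives in its own directory under /opt with a Compose file at its top level, following the /opt/<service> layout used above; the function name `compose_status_all` is hypothetical.

```sh
# Run `docker compose ps` in every service directory under a root (default /opt).
# Directories without a Compose file are skipped.
compose_status_all() {
    root=${1:-/opt}
    for dir in "$root"/*/; do
        for f in compose.yaml compose.yml docker-compose.yaml docker-compose.yml; do
            if [ -f "$dir$f" ]; then
                echo "== $dir =="
                (cd "$dir" && docker compose ps)
                break
            fi
        done
    done
}

# Example (not executed here):
#   compose_status_all /opt
```

The subshell around `cd` keeps the caller's working directory unchanged between services.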