OpsCopilot is a governed Kubernetes copilot. The Philips initiative needs a governed BDD/Cucumber triage system — every architectural pattern transfers directly.
Deterministic where correctness matters. Agentic only where judgment adds value. LLMs invoked only for failed or ambiguous tests.
Azure DevOps (cloud) ──── nightly pipeline ──── POST /ingest ──────────────►
payload: { run_id · results[] · branch · commit_sha · duration_ms }
│ HTTPS · mTLS · auth token
▼
┌─────────────────────── Nutanix Cluster (on-prem) ─────────────────────────┐
│ │
│ Ingestion API · Go · port 8080 │
│ validate schema · dedup run_id · normalize results │
│ enqueue Δ-files → Embedding Worker (Qwen3-8B, CPU) │
│ │ │ │
│ ▼ ▼ │
│ PostgreSQL (Test Store) OpenSearch (RAG Index) │
│ ───────────────────────── ───────────────────────────────── │
│ test_runs · results Gherkin features · step defs │
│ owners · deferrals feature docs · prior tickets │
│ tickets · cost_events embeddings: Qwen3-8B vectors │
│ │ │ │
│ └─────────────────┬──────────────────┘ │
│ ▼ │
│ Bounded Agent Graph · LangGraph · Python │
│ ────────────────────────────────────────────────────────────────────── │
│ [1] classify rule engine → flaky / env / logic / unknown │
│ [2] query SQL · zero LLM calls → trend · owner · deferral state │
│ [3] retrieve BM25 + vector hybrid → similar Gherkins · past issues │
│ [4] analyze Qwen3-8B · CPU · on-prem → root cause · evidence bundle │
│ [5] gate policy check → confidence ≥ threshold │
│ [6] act write plane · Go → notify · draft ticket · create │
│ │ │
│ Langfuse (LLM traces) · Audit Store · Cost Ledger · all on-prem │
│ │ │
│ step [4] only: LLM inference call ─────────────────────────────────────┼──► AWS Bedrock (cloud)
│ no embeddings · no raw test data · inference tokens only │ strong model · ambiguous cases
└────────────────────────────────────────────────────────────────────────────┘
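The gate in step [5] can be sketched as a pure function that sits between analysis and the write plane. This is a minimal illustration, not the project's implementation: the `Analysis` fields, the 0.8 threshold, the evidence requirement, and the `"act"` / `"hold"` outcomes are all assumptions, since the diagram only states "confidence ≥ threshold".

```python
from dataclasses import dataclass

# Hypothetical value; the source only says "confidence >= threshold".
CONFIDENCE_THRESHOLD = 0.8


@dataclass(frozen=True)
class Analysis:
    """Assumed shape of the evidence bundle produced by step [4]."""
    test_id: str
    root_cause: str
    confidence: float          # 0.0 to 1.0
    evidence: tuple[str, ...]  # e.g. log excerpts, similar past tickets


def gate(analysis: Analysis, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "act" if the write plane may proceed, otherwise "hold"."""
    # Assumption: an analysis with no supporting evidence never passes.
    if not analysis.evidence:
        return "hold"
    if analysis.confidence >= threshold:
        return "act"
    return "hold"
```

Keeping the gate deterministic means the decision to write can be audited without re-running the model.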
Clean passes never touch the LLM. Only failed/ambiguous tests trigger steps 3–6.
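A rough sketch of that routing rule follows. The status labels and the dictionary shape of a result are assumptions made for illustration; the source does not define the result schema.

```python
# Assumed status labels; the real result schema is not given in the source.
TRIAGE_STATUSES = {"failed", "ambiguous"}


def route(results):
    """Split results into (stored_only, triage).

    stored_only: clean passes, written to the test store and nothing more.
    triage:      failed or ambiguous results, eligible for steps 3 to 6.
    """
    stored_only, triage = [], []
    for result in results:
        if result["status"] in TRIAGE_STATUSES:
            triage.append(result)
        else:
            stored_only.append(result)
    return stored_only, triage
```

With a run of mostly green tests, only the small `triage` list ever reaches retrieval or the LLM.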
40 tests from one root cause → one incident, not 40 noisy tickets.
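One way to picture this deduplication is grouping analyzed failures by a root-cause key. The sketch below assumes each finding already carries a `root_cause` string to group on; how that signature is derived is not specified in the source.

```python
from collections import defaultdict


def group_into_incidents(findings):
    """Collapse findings that share a root cause into one incident each.

    findings: iterable of dicts with "test_id" and "root_cause" keys
              (an assumed shape, for illustration only).
    Returns a dict mapping root_cause -> list of affected test ids.
    """
    incidents = defaultdict(list)
    for finding in findings:
        incidents[finding["root_cause"]].append(finding["test_id"])
    return dict(incidents)
```

Forty findings with the same `root_cause` value produce a single key whose list holds all forty test ids, so one ticket can reference every affected test.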
Validate accuracy and cost on historical builds before any write action is enabled.
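A replay harness for this step might look like the following sketch. Everything here is hypothetical: the record shape, the `triage_fn` signature (returning a predicted cause and a cost), and the 0.9 accuracy bar are placeholders, not figures from the source.

```python
def replay(history, triage_fn, min_accuracy=0.9):
    """Run triage over labeled historical failures without writing anything.

    history:   iterable of dicts with "failure" and "expected_cause" keys.
    triage_fn: callable taking a failure and returning (predicted_cause, cost).
    Returns accuracy, total cost, and whether writes may be enabled.
    """
    total = 0
    correct = 0
    cost = 0.0
    for record in history:
        predicted, call_cost = triage_fn(record["failure"])
        total += 1
        if predicted == record["expected_cause"]:
            correct += 1
        cost += call_cost
    accuracy = correct / total if total else 0.0
    return {
        "accuracy": accuracy,
        "cost": cost,
        # Writes stay disabled on an empty history or a missed accuracy bar.
        "writes_enabled": total > 0 and accuracy >= min_accuracy,
    }
```

The harness only reads history and reports numbers; enabling write actions remains a separate, explicit decision.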
Nutanix headroom is already available. Qwen3 runs entirely on-prem, so the full Gherkin corpus is embedded at $0 cost. Azure DevOps, OpenSearch, and Langfuse need no new infrastructure. No procurement cycle, and no embeddings or raw test data leave the cluster.
Read-only. No write authority yet.
After replay validates accuracy.
Compiler pipeline, not chatbot feature.
Built a staged automation platform for PICiX performance-test environment setup — reducing a painful manual workflow to a repeatable, observable pipeline with improved reliability and configurability.
Worked in the R&D Systems Engineering and Integration team deploying and configuring the largest-scale PICiX setup in the organisation — the same infra stack proposed for this initiative.
When DNS/DHCP access was too risky to grant, worked with the networking team and solved the bottleneck through static MAC assignment at VM creation — the same least-privilege mindset behind the safe-mutation design here.
Added license assignment and pre-population functionality to the in-house config tool inside the actual PICiX codebase: raised the PR and handled review comments. Shipped inside the existing SDLC, not around it.