MatrixObserve.
Matrix Systems Innovations Ltd, Cardiff CF10
Abstract
MatrixObserve joins logs, traces, real user monitoring (RUM) and synthetic check results into a single graph on the WebMatrix data model and runs an inference layer over the graph at incident time. The output is a one-paragraph explanation in English (symptom, proximate cause, upstream cause) rather than a wall of correlated lines. We describe the join, the inference discipline, the public monthly eval, and the conditions under which the inference layer is deliberately silent.
One model. Four telemetry sources.
Modern observability has a join problem, not a collection problem. The agents emit. The collectors collect. The data lands. The dashboards render. What goes missing is the relationship between the line in the log, the span in the trace, the user beacon in RUM, and the synthetic check that went red two regions away. That relationship is the one that, if a person could read it quickly, would tell them what happened.
MatrixObserve takes the position that the join is the product. Logs, traces, RUM and synthetic checks all land on the same data model, with the same envelope and the same time base. The join is structural, not heuristic: a span is linked to the log lines emitted under its context; a RUM beacon is linked to the trace whose route it was issued from; a synthetic check is linked to the route and region it was probing. The graph is, at the moment a page fires, already joined.
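A minimal sketch of what a structural join could look like, assuming each record carries the identifiers needed to link it. The record shapes, field names and node-id prefixes here are hypothetical illustrations, not the WebMatrix data model. Links are made only on exact key matches, never on time-window guesses.

```python
from collections import defaultdict
from dataclasses import dataclass

# All record shapes and field names below are hypothetical.

@dataclass(frozen=True)
class Span:
    span_id: str
    trace_id: str
    route: str

@dataclass(frozen=True)
class LogLine:
    log_id: str
    span_id: str   # span context the line was emitted under

@dataclass(frozen=True)
class RumBeacon:
    beacon_id: str
    trace_id: str  # trace whose route the beacon was issued from

@dataclass(frozen=True)
class SyntheticCheck:
    check_id: str
    route: str
    region: str

def build_graph(spans, logs, beacons, checks):
    """Return an adjacency map: node id -> set of linked node ids."""
    edges = defaultdict(set)

    def link(a, b):
        # Edges are undirected, so record both directions.
        edges[a].add(b)
        edges[b].add(a)

    span_ids = {s.span_id for s in spans}
    trace_ids = {s.trace_id for s in spans}

    for s in spans:
        link(f"span:{s.span_id}", f"trace:{s.trace_id}")
        link(f"span:{s.span_id}", f"route:{s.route}")
    for line in logs:
        # Link a log line only if its span is actually present.
        if line.span_id in span_ids:
            link(f"log:{line.log_id}", f"span:{line.span_id}")
    for b in beacons:
        if b.trace_id in trace_ids:
            link(f"rum:{b.beacon_id}", f"trace:{b.trace_id}")
    for c in checks:
        link(f"check:{c.check_id}", f"route:{c.route}")
        link(f"check:{c.check_id}", f"region:{c.region}")
    return dict(edges)

# Hypothetical sample data.
graph = build_graph(
    spans=[Span("s1", "t1", "/checkout")],
    logs=[LogLine("l1", "s1")],
    beacons=[RumBeacon("b1", "t1")],
    checks=[SyntheticCheck("c1", "/checkout", "eu-west")],
)
```

In this sketch the synthetic check and the span meet at the shared route node, which is how a red check in one region can sit next to the spans serving the same route.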
From dumped lines to one paragraph.
An incident at 03:14 used to look like four browser tabs: logs, traces, RUM, the synthetic dashboard. The on-call's job was to do, by eye, the join that the platforms had failed to do. MatrixObserve does the join. The inference layer reads the joined graph and produces a paragraph. The paragraph names the symptom (what the user saw), the proximate cause (the span that failed and how), and the upstream cause (the dependency, region or rollout that explains the proximate cause).
The paragraph is not a substitute for the graph. The graph is below the paragraph; the on-call can click into any phrase and land in the exact span, log line or beacon the inference layer cited. The paragraph is what gets posted in the channel; the graph is what gets posted in the post-mortem.
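One way to picture the relationship between paragraph and graph is a structure in which every phrase carries the id of the node it cites. The sketch below assumes that shape. The class names, fields and sentence template are hypothetical and say nothing about how the inference layer actually produces its findings.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Finding:
    text: str     # phrase that appears in the paragraph
    node_id: str  # graph node the phrase cites (span, log line or beacon)

@dataclass(frozen=True)
class Explanation:
    symptom: Finding
    proximate_cause: Finding
    upstream_cause: Finding

    def paragraph(self) -> str:
        """Render the three findings as one short paragraph."""
        return (
            f"Users saw {self.symptom.text}. "
            f"The proximate cause was {self.proximate_cause.text}. "
            f"The upstream cause was {self.upstream_cause.text}."
        )

    def citations(self) -> Dict[str, str]:
        """Map each phrase to its cited node, so a click can land on that node."""
        findings = (self.symptom, self.proximate_cause, self.upstream_cause)
        return {f.text: f.node_id for f in findings}

# Hypothetical incident, for illustration only.
explanation = Explanation(
    symptom=Finding("failed checkouts", "rum:b1"),
    proximate_cause=Finding("a timeout in the payment span", "span:s1"),
    upstream_cause=Finding("a rollout in one region", "check:c1"),
)
print(explanation.paragraph())
```

Keeping the node id beside each phrase means the paragraph can be posted on its own while every claim in it stays traceable to the graph.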
How MatrixObserve gets stood up.
For most customers MatrixObserve is a four-hour stand-up. Point your existing OTLP collectors at the WebMatrix endpoint; install the RUM beacon (one script tag, signed); point up to six synthetic checks at the routes you care most about. The graph is populated in about an hour. The inference layer becomes useful at about the four-hour mark, once the routes have produced enough span volume that the graph has structure to reason over.
We do not require ripping out the existing observability platform. MatrixObserve runs alongside; the same OTLP traffic can fan out to both. The customers who eventually retire the old platform do so because they stop opening it, not because we asked them to.
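The fan-out itself is a simple pattern: the same batch goes to every backend, and a failure in one does not block the other. In practice this would usually be configured in the collector, where a pipeline can list more than one exporter, rather than written by hand. The sketch below only illustrates the behaviour. The exporter signature and backend names are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical exporter signature: takes one serialized batch, raises on failure.
Exporter = Callable[[bytes], None]

def fan_out(batch: bytes, exporters: Dict[str, Exporter]) -> Dict[str, bool]:
    """Send the same batch to every backend and report which deliveries succeeded."""
    delivered = {}
    for name, export in exporters.items():
        try:
            export(batch)
            delivered[name] = True
        except Exception:
            # A failure in one backend must not block delivery to the others.
            delivered[name] = False
    return delivered
```

Isolating each delivery is the property that lets a new backend run alongside the existing platform without putting the existing one at risk.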
The eval is monthly and public.
The inference layer is evaluated against a public incident set every month, with the eval set, the rubric and the per-incident scoresheet published on the same day the eval runs. Two of the eval incidents in the last six months are ones MatrixObserve got materially wrong: in each, it produced a paragraph that misattributed the upstream cause. Both are in the eval log, with the reason, the model version at the time, and the change shipped in response. We do not curate the eval to make the platform look better than it is.
An engineering call against your trace volume.
The most useful conversation about MatrixObserve is held against your existing OTLP traffic, your incident history, and one current open question. Forty-five minutes; written note same day. The note tells you what we think MatrixObserve would have done at your last three P1s and where, honestly, it would not have helped.