cloud phone fleet monitoring dashboard guide

May 07, 2026

cloud phone fleet monitoring in 2026 turns a black-box subscription into glass-box infrastructure. instead of finding out a phone is degraded when an engineer files a ticket, you see it on a dashboard and act before users notice. cloudf.one ships a fleet dashboard with the metrics that matter, plus the export hooks to mirror them into Datadog, Grafana, or your in-house observability stack. this guide walks through what to monitor, what alerts to set up, and how to drill from a red tile down to the device that needs attention.

if you are building admin discipline more broadly, remember that team seats, RBAC, and audit logs all assume the fleet itself is healthy. this article is about how you keep it that way.

what to monitor

four metric families: availability, utilization, performance, and errors.

a useful default dashboard fits all four families on one screen.

[SCREENSHOT: fleet monitoring dashboard with 4 quadrants showing availability, utilization, performance, errors over the last 24 hours]

the availability tile

shows fleet-wide health.

green means >90% of devices are available or in use. yellow means 70-90%. red means <70%: dig in immediately.

[SCREENSHOT: availability tile with stacked bar chart, green/yellow/red device counts, click-through link to device list]

drill into any state to see the device list. for offline devices, the dashboard shows last-seen timestamp, region, model, and a quick action to reboot or escalate to support.

the utilization tile

shows how busy your fleet is.

[SCREENSHOT: utilization tile with line chart over 24h, per-team breakdown, peak hour indicator]

watch for two patterns: utilization pinned above the 90% alert line, which means work is queueing and you need more capacity, and utilization sitting near zero, which means you are paying for devices nobody uses.

most healthy fleets sit at 50-75% peak utilization with room to grow.

the performance tile

shows latency at three percentiles: p50, p95, and p99.

[SCREENSHOT: performance tile with three line charts at p50, p95, p99 over 24h, target lines drawn]

healthy targets in 2026: p95 comfortably under the 20s alert line from the table below, p99 not far above it, and p50 well below both.

if p95 or p99 is rising, the fleet is degrading. usually it means devices are queueing because availability is too low, or because a region is under network load.

the errors tile

shows failed events by type.

[SCREENSHOT: errors tile with stacked bar chart showing each error type per hour over 24h, with red threshold line]

healthy fleets have a baseline error rate (<1% of operations). an order-of-magnitude jump is a real problem. cloudf.one alerts on anomalies automatically; you can also configure your own thresholds via the webhook automation pattern.
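
a minimal sketch of that pattern: the receiver below keeps a rolling window of operation outcomes from webhook events and pings your ops channel when the failure rate clears a custom bar. the /cloudfone-events path, the payload field names, and the ops-chat URL are illustrative assumptions, not the documented webhook schema.

from collections import deque
from time import time
from flask import Flask, request
import requests

app = Flask(__name__)
WINDOW_SECONDS = 600        # 10-minute rolling window
RATE_THRESHOLD = 0.05       # 5% of operations, mirroring the alert table below
OPS_CHAT_URL = "https://hooks.example.com/ops"  # your Slack/Discord/Telegram bridge

events = deque()  # (timestamp, failed) pairs

@app.route("/cloudfone-events", methods=["POST"])
def handle_event():
    event = request.get_json(force=True)
    now = time()
    events.append((now, event.get("status") == "failed"))  # assumed field name
    while events and events[0][0] < now - WINDOW_SECONDS:   # trim the window
        events.popleft()
    failures = sum(1 for _, failed in events if failed)
    rate = failures / len(events)
    if len(events) >= 20 and rate > RATE_THRESHOLD:
        requests.post(OPS_CHAT_URL, json={
            "text": f"cloudf.one failure rate {rate:.1%} "
                    f"({failures}/{len(events)} ops in the last {WINDOW_SECONDS // 60} min)"
        })
    return "", 204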

drill-down workflows

three common drill-downs.

drill 1: offline devices

dashboard shows 8 offline devices. click the offline tile, see the list, sort by last-seen.

[SCREENSHOT: offline device list with model, region, last-seen, action buttons]

three states to separate: devices that dropped in the last few minutes and usually come back on their own, devices stuck offline that a reboot recovers, and devices offline for hours that need the support escalation.

drill 2: slow region

p95 spiked from 12s to 28s in the last hour. click the perf tile, switch breakdown to per-region, see SG region is the culprit.

[SCREENSHOT: per-region performance breakdown, SG region in red, EU and US green]

two next steps: check whether SG availability has dropped (queueing devices drive latency up), and shift latency-sensitive work to the EU or US regions while it recovers.

drill 3: error storm

errors tile shows 200+ failed locks in 10 minutes from a single API token. drill into errors, group by api_token, identify the offender.

[SCREENSHOT: errors grouped by api_token, one token responsible for 200+ failures, action menu with rate-limit, revoke, contact-owner options]

three actions, straight from the action menu: rate-limit the token, revoke it, or contact the owner to get the client fixed.

alerts to wire up

four alerts every team should have on, all routable to your Slack/Discord/Telegram channels.

alert                            threshold           severity
availability < 80%               5 min sustained     high
utilization > 90%                30 min sustained    medium
p95 latency > 20s                15 min sustained    medium
error rate > 5% of operations    10 min sustained    high

route via the webhook automation, surface in your existing ops chat, page on-call for the high-severity ones.
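
a minimal routing sketch, assuming the alert webhook delivers "alert", "severity", and "value" fields (the field names and both URLs are placeholders, not the documented payload): every alert lands in the ops channel, and high-severity alerts also page on-call.

from flask import Flask, request
import requests

app = Flask(__name__)
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"   # ops channel
PAGERDUTY_URL = "https://events.pagerduty.com/v2/enqueue"        # on-call paging

@app.route("/cloudfone-alerts", methods=["POST"])
def route_alert():
    alert = request.get_json(force=True)
    text = f"[{alert.get('severity', '?').upper()}] {alert.get('alert')}: {alert.get('value')}"
    requests.post(SLACK_WEBHOOK, json={"text": text})            # every alert to ops chat
    if alert.get("severity") == "high":                          # availability, error rate
        requests.post(PAGERDUTY_URL, json={
            "routing_key": "YOUR_PD_ROUTING_KEY",
            "event_action": "trigger",
            "payload": {"summary": text, "source": "cloudf.one", "severity": "critical"},
        })
    return "", 204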

exporting metrics to Datadog or Grafana

three export options.

option 1: native integrations

cloudf.one ships native Datadog and Grafana Cloud integrations as of 2026. one-click in the integrations page, paste API key, metrics flow within 5 minutes.

[SCREENSHOT: integrations page with Datadog, Grafana, Prometheus, New Relic tiles, click to configure]

option 2: Prometheus endpoint

GET /metrics returns Prometheus-formatted metrics. point your scraper at it. works with any Prometheus-compatible system (Mimir, Cortex, VictoriaMetrics).

scrape_configs:
  - job_name: cloudfone
    scheme: https
    metrics_path: /metrics
    scrape_interval: 60s
    static_configs:
      - targets: ['api.cloudf.one']
    # Prometheus does not expand environment variables in this file;
    # substitute the token before loading, or use bearer_token_file instead
    bearer_token: $CLOUDFONE_TOKEN

option 3: webhook + custom processor

for stacks that do not support Prometheus, use webhook events as the source. process and forward to your stack of choice.
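
a sketch of that processor, assuming each webhook event carries "metric", "value", "device_id", and "timestamp" fields (assumptions, as is the InfluxDB backend used here purely as an example target): reshape the event into whatever your stack ingests and forward it.

from flask import Flask, request
import requests

app = Flask(__name__)
# InfluxDB 2.x write endpoint, shown only as an example backend
INFLUX_URL = "http://influxdb.internal:8086/api/v2/write?org=ops&bucket=cloudfone&precision=s"
INFLUX_TOKEN = "YOUR_INFLUX_TOKEN"

@app.route("/cloudfone-metrics", methods=["POST"])
def forward_metric():
    event = request.get_json(force=True)
    # line protocol: measurement,tag=value field=value timestamp
    line = (
        f"cloudfone_{event['metric']},device_id={event['device_id']} "
        f"value={float(event['value'])} {int(event['timestamp'])}"
    )
    requests.post(INFLUX_URL, headers={"Authorization": f"Token {INFLUX_TOKEN}"}, data=line)
    return "", 204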

most teams use option 1 if they are on Datadog or Grafana, option 2 otherwise.

SLO setup

once you have monitoring, set service-level objectives that match your contract. examples: keep monthly availability above the 90% green threshold, keep p95 latency under the 20s alert line, keep the error rate under the 1% baseline.

these become the burn-rate alerts that fire only when you are actually trending toward SLA breach, not on every transient blip.
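
to make these declarative, the SLO-as-code API covered in the FAQ below accepts a YAML definition over POST /slos. a minimal sketch, assuming a name/objective/target/window shape (the field names are illustrative; only the endpoint and bearer auth come from this guide):

import os
import requests
import yaml  # pip install pyyaml

SLO_YAML = """
slos:
  - name: fleet-availability
    objective: availability_percent
    target: 90          # the green threshold on the availability tile
    window: 30d
  - name: lock-latency-p95
    objective: p95_latency_seconds
    target: 20          # the p95 alert line
    window: 30d
"""

resp = requests.post(
    "https://api.cloudf.one/slos",                      # POST /slos per the FAQ below
    headers={"Authorization": f"Bearer {os.environ['CLOUDFONE_TOKEN']}"},
    json=yaml.safe_load(SLO_YAML),                      # sent as JSON; adjust if the API wants raw YAML
)
resp.raise_for_status()
print(resp.json())   # burn-rate alerts come back over the same webhook channel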

the daily 5-minute monitoring routine

every morning, 5 minutes.

  1. open the dashboard, scan all four tiles
  2. check overnight errors and any auto-paged alerts
  3. peek at projected month-end usage (utilization vs capacity)
  4. close any acknowledged alerts in your ops channel
  5. file an action item for any drift you saw

teams that do this catch 80% of issues before users do.

frequently asked questions

how far back does the fleet dashboard data go?

12 months on paid plans, with 1-minute granularity for the last 24 hours and 5-minute granularity beyond. exports go to your stack for longer retention.

can I customize the dashboard layout?

partially. you can rearrange the four tiles, add up to 4 custom widgets, and save the layout per user. full custom dashboards live in your Datadog/Grafana export.

what is the latency on alerts firing?

real-time alerts trigger within 30 seconds of the threshold breach. webhook delivery is typically under 5 seconds after that.

does cloudf.one offer SLO-as-code?

yes via the API. you can define SLOs in YAML, push via POST /slos, and get burn-rate alerts back through the same webhook channel.

can multiple admins share dashboard alerts?

yes. configure alerts at the account level (everyone with audit:view permission gets them) or scope to specific roles via the alert configuration page.

ready to make your fleet observable? start a cloudf.one trial, open the fleet dashboard, and watch the metrics flow in real time as you lock your first device.