cloud phone proof-of-concept success criteria 2026
cloud phone POC success criteria in 2026 should be written before the POC starts and signed by every stakeholder who will weigh in on the buy decision. teams that skip this step end up arguing about whether the POC “passed” weeks after it ended, with each evaluator remembering different goals. teams that write criteria first run a 14-day evaluation that produces a clear yes or no on the last day.
this guide gives you the criteria template, the threshold values that work in 2026, and the decision matrix that translates measurements into a buy decision. if you have not yet planned the POC itself, the POC framework is the prerequisite.
why criteria must be written first
three failure modes when criteria are not pre-defined.
- goal drift. the vendor’s strengths quietly become the criteria. their weak points get redefined as “out of scope”.
- stakeholder disagreement. security thinks audit log is critical, engineering thinks throughput is critical, finance thinks TCO is critical. without alignment, the meeting at day 14 becomes a debate, not a decision.
- vendor influence. the vendor’s SE shapes how success is framed during weekly check-ins. by day 10, you are evaluating their best feature, not your hardest problem.
write criteria on day zero, get sign-off, then run the test.
the five categories
every POC should produce a measurable score in each of five categories.
| category | weight | example threshold |
|---|---|---|
| reliability | 25% | <2% flake rate over 1000 test runs |
| integration ease | 20% | full CI integration in <8 hours engineer time |
| scale behavior | 20% | 8-way parallel runs, no contention errors |
| security and compliance | 20% | SOC 2 valid, audit log complete, wipe verified |
| TCO at projected scale | 15% | within 10% of vendor quote, no surprise line items |
adjust weights to fit your context. regulated teams weight security higher. high-throughput teams weight reliability and scale higher.
category 1: reliability
the easiest thing to measure, and the easiest to fudge if you do not run enough trials.
| metric | passing | excellent |
|---|---|---|
| lock success rate | >98% | >99.5% |
| ADB connect success rate | >97% | >99% |
| test flake rate | <5% | <2% |
| device-stuck-locked rate after job kill | <1% | 0% |
| screenshot API success rate | >99% | >99.9% |
measure across at least 1000 trials. fewer than that and you are testing the vendor’s lucky day. distribute the trials across the 14 days; do not run all 1000 on day 7.
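the tallying can be scripted. a minimal sketch, assuming you keep a per-trial log with fields like the ones below (the field names are illustrative, not a vendor API):

```python
from dataclasses import dataclass

@dataclass
class Trial:
    day: int             # POC day (1-14) the trial ran on
    locked: bool         # device lock succeeded
    adb_connected: bool  # ADB session established
    flaked: bool         # test failed once, then passed on retry

def reliability_scorecard(trials: list[Trial]) -> dict[str, float]:
    """Compute the category-1 rates as percentages over the full trial log."""
    n = len(trials)
    if n < 1000:
        raise ValueError(f"need >= 1000 trials, got {n}")
    return {
        "lock_success_pct": 100 * sum(t.locked for t in trials) / n,
        "adb_success_pct": 100 * sum(t.adb_connected for t in trials) / n,
        "flake_pct": 100 * sum(t.flaked for t in trials) / n,
        "days_covered": len({t.day for t in trials}),  # should approach 14
    }
```

the `days_covered` field is the cheap check that trials were actually spread across the POC instead of batched on one day.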
category 2: integration ease
the speed at which a competent engineer can wire the platform into your stack predicts how painful day-to-day operations will be.
| metric | passing | excellent |
|---|---|---|
| CI integration (one job) | <8 engineer hours | <2 hours |
| webhook receiver setup | <4 hours | <1 hour |
| RBAC mirroring from existing IdP | <8 hours | <2 hours |
| custom dashboard with vendor API | <16 hours | <4 hours |
| docs quality (5-point rubric) | 3+ | 5 |
if integration takes 40 hours instead of 8, you are paying that cost every time the vendor changes the API or you onboard a new team. it compounds.
category 3: scale behavior
scale tells you whether the platform will survive your eventual growth, not just today’s load.
| metric | passing | excellent |
|---|---|---|
| parallel device locks | 4 simultaneously | 16+ |
| latency at 95th percentile (lock to ADB ready) | <30s | <10s |
| API rate limit headroom | 2x current load | 10x |
| webhook delivery latency | <5s | <1s |
| error rate during burst (50 locks in 60s) | <2% | 0% |
run an explicit burst test. the platform’s behavior under burst is more predictive than steady-state numbers. spec a “scale day” in your POC plan.
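a burst test is easy to script. a sketch of the shape, with `lock_device` as a stand-in stub for whatever the vendor's real lock call is (the simulated latency and failure rate are placeholders):

```python
import asyncio
import random
import time

async def lock_device(device_id: str) -> None:
    """Stand-in for the vendor's lock call; swap in the real API client."""
    await asyncio.sleep(random.uniform(0.05, 0.3))  # simulated latency
    if random.random() < 0.01:                      # simulated 1% failure
        raise RuntimeError(f"lock failed for {device_id}")

async def burst_test(n_locks: int = 50, window_s: float = 60.0):
    """Fire n_locks lock requests spread over window_s seconds.
    Returns (error_rate_pct, p95_latency_s)."""
    latencies: list[float] = []
    errors = 0

    async def one(i: int) -> None:
        nonlocal errors
        await asyncio.sleep(i * window_s / n_locks)  # spread across the window
        start = time.monotonic()
        try:
            await lock_device(f"device-{i}")
            latencies.append(time.monotonic() - start)
        except RuntimeError:
            errors += 1

    await asyncio.gather(*(one(i) for i in range(n_locks)))
    latencies.sort()
    p95 = latencies[int(0.95 * len(latencies)) - 1] if latencies else float("inf")
    return 100 * errors / n_locks, p95
```

run it with `asyncio.run(burst_test())` on scale day and compare the two numbers directly against the table above.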
category 4: security and compliance
most items are binary pass/fail. a single fail blocks an immediate buy.
| item | required |
|---|---|
| SOC 2 Type II report current | yes |
| ISO 27001 or equivalent | yes |
| SSO via SAML or OIDC | yes |
| RBAC with custom roles | yes |
| immutable audit log | yes |
| audit log export to SIEM | yes |
| device wipe verified between sessions | yes |
| data residency commitment for your region | yes |
| TLS 1.2+ in transit | yes |
| encryption at rest | yes |
a vendor that fails any of these on day 14 is not yet enterprise-ready. that does not necessarily kill the deal, but it does mean a longer evaluation and stronger contractual remedies.
category 5: TCO at projected scale
actual usage from days 1-12 lets you build a realistic projection. the criterion is consistency between the projection and the vendor quote.
| metric | passing | excellent |
|---|---|---|
| invoice prediction accuracy | within 15% | within 5% |
| hidden line items found | <2 | 0 |
| projected 3-yr TCO vs vendor quote | within 20% | within 10% |
| price increase cap on renewal | <15% | <8% |
| support tier upgrade required for SLA needs | optional | not needed |
build the projection using the TCO worksheet and the actual usage data from your POC.
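the projection math itself is a one-liner worth writing down so everyone agrees on it. a sketch, assuming a simple linear extrapolation from POC spend to production scale (the `scale_factor` is your planned production load divided by POC load, e.g. 10x devices):

```python
def tco_check(poc_spend: float, poc_days: int, scale_factor: float,
              vendor_annual_quote: float, years: int = 3) -> dict:
    """Project POC spend to production scale and compare with the quote."""
    projected_annual = poc_spend / poc_days * 365 * scale_factor
    projected_tco = projected_annual * years
    quoted_tco = vendor_annual_quote * years
    deviation_pct = 100 * abs(projected_tco - quoted_tco) / quoted_tco
    return {
        "projected_3yr_tco": projected_tco,
        "quoted_3yr_tco": quoted_tco,
        "deviation_pct": deviation_pct,
        "passing": deviation_pct <= 20,    # "within 20%" from the table
        "excellent": deviation_pct <= 10,  # "within 10%" from the table
    }
```

linear extrapolation is a simplification; if the vendor has volume tiers, apply the tiered pricing to the projected usage instead of multiplying the POC rate.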
the decision matrix
each category produces a score from 1 to 5. the weighted sum yields the buy decision.
| weighted score | decision |
|---|---|
| > 4.5 | strong buy |
| 4.0-4.5 | buy with negotiated improvements |
| 3.5-4.0 | extended pilot recommended |
| 3.0-3.5 | go back to RFP, expand vendor list |
| < 3.0 | no |
if any category scores below 3.0, that is a hard constraint. for example: TCO scores 4.5 but security scores 2.0 means do not buy regardless of weighted total.
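the matrix plus the hard constraint can be encoded so the day-14 meeting runs one function instead of a debate. a sketch using the default weights from the table above:

```python
WEIGHTS = {"reliability": 0.25, "integration": 0.20, "scale": 0.20,
           "security": 0.20, "tco": 0.15}

def decide(scores: dict[str, float]) -> str:
    """Translate 1-5 category scores into the buy decision."""
    # hard constraint: any category below 3.0 is a no, regardless of total
    if min(scores.values()) < 3.0:
        return "no (hard constraint: a category scored below 3.0)"
    total = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    if total > 4.5:
        return "strong buy"
    if total >= 4.0:
        return "buy with negotiated improvements"
    if total >= 3.5:
        return "extended pilot"
    if total >= 3.0:
        return "back to RFP"
    # unreachable once the hard constraint holds; kept to mirror the table
    return "no"
```

the TCO-4.5-but-security-2.0 example from above falls into the hard-constraint branch before the weighted sum is even computed.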
stakeholder sign-off
before the POC starts, collect three signatures.
- engineering lead: weights and thresholds for reliability, integration, scale
- security lead: weights and thresholds for security/compliance
- finance lead: weights and thresholds for TCO
after the POC ends, the same three signatures on the final scorecard. if any of the three refuse to sign, do not proceed to contract.
what to do with edge cases
three patterns come up often.
vendor passes 4 of 5 categories
if the failure is in TCO, negotiate. if the failure is in integration ease, build the integration anyway and factor the cost. if the failure is in security, do not buy until remediated. if the failure is in reliability or scale, no amount of negotiation fixes it. walk.
two vendors both pass
run the POC framework’s tiebreakers. audit log depth, support response times, exit terms. if still tied, pick the cheaper vendor and lock the better contract terms.
one vendor passes only because of an exceptional sales engineer
red flag. the SE will not be assigned to your account post-sale. ask "what happens if our SE rotates?" if the answer is hand-wavy, downgrade your scores by 10%.
the right vendor does not exist on the shortlist
happens 10% of the time. expand the search, run a fast checklist scan on 5 more vendors, and shortlist the top 2. accept the timeline cost.
frequently asked questions
can I share the success criteria with the vendor before the POC starts?
share the categories and weights, not the specific thresholds. otherwise the vendor will optimize their POC environment to hit your numbers narrowly.
what if my stakeholders disagree on weights?
force the conversation before the POC. a 30-minute alignment meeting saves a 4-week debate at the end.
should success criteria differ between vendors evaluated in parallel?
no. same criteria, same weights, same thresholds. anything else makes the comparison invalid.
how do I handle a vendor that says “we are working on that” for a critical criterion?
ask for a date. if they commit in writing to delivery before contract start, accept conditionally with a contractual exit if the date is missed. if they wave the question off, fail the criterion.
are these criteria valid for emulator-only vendors too?
mostly yes, with minor adjustments. drop “device wipe verified between sessions” since emulators reset by definition. add “concurrency limit per region” since emulators scale differently.
ready to write your criteria first and run the POC second? open a cloudf.one trial so you have a benchmark vendor running while you draft the scorecard.