AI assists. The evaluator decides.

Turn a messy program description into a clear, reviewable evaluation plan.

From a plain-language program description, EvalSmart drafts evaluation questions, indicators, and qualitative inquiry protocols, and flags the evidence gaps. Every item is tagged as stated, inferred, or missing. AI drafts; the evaluator decides.

In plain terms: EvalSmart helps you figure out what to measure, how to measure it, what questions to ask stakeholders, and what information is missing — before you launch an evaluation.

  • Mixed-methods by design
  • Provenance-tracked, no fabrication
  • Two human gates before anything is final

What it is

Evaluation methodology, encoded — not a data-analysis black box.

EvalSmart designs the evaluation: the questions worth asking, the indicators and validated instruments to measure them, the qualitative inquiry that explains the numbers, and the gaps to resolve before measurement begins. The value is the encoded methodology — provenance discipline, real standards, and human review — not the language model underneath.

A full run takes a few minutes and produces two documents: a comprehensive plan and a one-page executive summary.

What EvalSmart is — and isn't

  • Designs the evaluation plan
  • Tags every item by its source
  • Grounds standards in real frameworks
  • Doesn't ingest or analyze your datasets
  • Doesn't invent statistics or citations
  • Doesn't present anything as final without review

Who it's for

Built for people who need an evaluation plan — not another blank template.

If you're staring at program goals and a deadline, wondering how to turn them into something measurable, EvalSmart gives you a structured, rigorous first draft to react to.

Medical education & accreditation

Teams preparing evaluation plans for clerkships, residencies, and LCME/ACGME review.

Nonprofits scaling across sites

Programs that need to show consistent, equitable outcomes as they grow across affiliates.

Grant-funded programs

Projects that need credible outcomes, indicators, and an evidence plan for funders.

K-12 & higher-ed assessment

Teams designing assessment and continuous quality improvement (CQI) systems.

Evaluation consultants

Practitioners who want a structured first draft before the stakeholder-review stage.

Program & impact leads

Anyone turning "we believe this works" into a plan that can actually prove it.

How it works

One description in. A complete, reviewable plan out.

A four-stage pipeline. Each stage is checked for completeness and consistency before moving on, and two explicit human review points keep you in control of what gets produced.

Stage 1

Intake

Program description becomes a structured Case File — components, population, outcomes, assumptions.

⛬ Human Gate 1

Stage 2a · runs in parallel with 2b

Quant branch

Indicator matrix with validated instruments, an analysis plan, and measurement considerations.

Stage 2b · runs in parallel with 2a

Qual branch

Inquiry plan: questions, interview guides, sampling, a thematic analysis approach, and trustworthiness strategies.

Stage 3

Report

A single package integrating both branches, stamped DRAFT — REQUIRES HUMAN REVIEW.

⛬ Human Gate 2

Render

Two views

A comprehensive audit-grade plan and a simplified executive view, organized by function.

The anti-fabrication keystone

Every item is tagged by where it came from.

This is the discipline that makes an AI-drafted plan trustworthy. Nothing is quietly invented — each element is one of three things, and the evaluator can see which at a glance.

Stated

Present in the program description. Drawn directly from what you provided — nothing added.

Inferred

A professional suggestion from the AI, flagged for you to verify before relying on it.

Gap

Needed for a credible evaluation but absent — surfaced, never invented to fill the hole.

Why you can trust it

Why you can trust an AI-drafted plan.

An evaluation plan is only useful if you can stand behind it. EvalSmart is built so you can — here's what that means in practice.

Nothing is quietly invented

Every item is labeled by where it came from: stated in your description, suggested for you to confirm, or flagged as a gap. You're never handed a confident-sounding guess dressed up as fact.

A human always signs off

You review and edit before anything is final. The plan is yours to own and defend — not something a machine hands you to rubber-stamp.


Grounded in real frameworks

Standards and references point to the field's actual sources, or are clearly flagged for you to confirm. Nothing is made up to sound authoritative.

Consistent and easy to defend

The plan is laid out so a colleague, funder, or reviewer can see exactly where every recommendation came from, and every run follows the same structure, so plans are easy to compare and defend.

Fields & standards

Tuned to the standards of your field.

EvalSmart speaks the language and frameworks of your field: undergraduate and graduate medical education today, with support for education and nonprofit settings expanding. Each field's standards are kept current and reviewed on a regular cadence, so your plan reflects what's actually expected now.

Field | Standards | Status
Undergraduate medical education | LCME (7 elements, verified) | Available in prototype
Graduate medical education (residency) | ACGME Core Competencies + Milestones | Internal Medicine milestone mapping available
K-12 / education / NGO | ESSA / state frameworks | Vocabulary supported; standards require review


References come only from official, authoritative sources — never second-hand copies — and anything that can't be verified is flagged for you rather than guessed.

Sample output

What a finished keystone looks like.

EvalSmart organizes the plan into keystones — each pairing a quantitative measure with the qualitative inquiry that explains it. Below is one keystone from a real sample run on an Internal Medicine core clerkship.

STATUS: DRAFT — REQUIRES HUMAN REVIEW
Measurement Quality · JCSEE: Accuracy · LCME 9.4 Assessment System · LCME 8.7 Comparability

Measure — quant

Rater-level and site-level variance components in clinical evaluation scores, decomposing student vs. rater vs. site variance.

Explain — qual

How do supervising attendings and residents interpret and apply the rating instrument — what mental models and reference points drive the scores they assign?

Variance decomposition shows that sites differ; rubric-interpretation interviews explain why.

Every run also surfaces decisions and prerequisites (e.g., obtain an IRB determination before collecting qualitative data) and a ranked list of top gaps (e.g., no baseline data; demographic linkage blocked by FERPA/IRB), so the team knows exactly what to resolve before measurement begins.
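For readers curious about the arithmetic behind a variance-decomposition keystone, here is a minimal Python sketch. It is an illustration, not EvalSmart's own method: it splits scores into a between-site and a within-site share using the law of total variance, a descriptive two-level simplification of the student vs. rater vs. site decomposition described above. A full analysis would typically estimate variance components with a mixed-effects model instead. The function name `decompose_by_site` and the sample scores are hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def decompose_by_site(scores_by_site: dict[str, list[float]]) -> dict[str, float]:
    """Split total score variance into between-site and within-site parts.

    Hypothetical helper for illustration. Uses population (ddof=0)
    variances so that between + within equals the total exactly.
    Assumes every site has at least one score.
    """
    all_scores = [s for scores in scores_by_site.values() for s in scores]
    n_total = len(all_scores)
    grand_mean = fmean(all_scores)
    total = sum((s - grand_mean) ** 2 for s in all_scores) / n_total

    between = 0.0
    within = 0.0
    for scores in scores_by_site.values():
        site_mean = fmean(scores)
        weight = len(scores) / n_total  # each site counts by its share of scores
        # Spread of site means around the grand mean.
        between += weight * (site_mean - grand_mean) ** 2
        # Spread of scores around their own site mean.
        within += weight * sum((s - site_mean) ** 2 for s in scores) / len(scores)

    return {
        "total": total,
        "between_site": between,
        "within_site": within,
        "site_share": between / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical scores for two sites, for illustration only.
    result = decompose_by_site({"A": [3.0, 4.0, 5.0], "B": [5.0, 6.0, 7.0]})
    print(round(result["site_share"], 2))  # 0.6
```

With these two hypothetical sites, the difference in site means accounts for 60% of the total variance. The numbers can say that sites differ; the rubric-interpretation interviews are what would explain why.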

In practice

Built for multi-site reality.

In a case study evaluating a family-stability program scaling across 11 nonprofit affiliates in 16 cities, EvalSmart's keystone was a variance decomposition: of the differences in outcomes across affiliates, how much traces to the participant, the advocate, and the site? That turns "outcomes look different in different cities" into a measurable, answerable question — paired with an implementation-fidelity indicator and an equity disaggregation.

It grounded the design in the field's real frameworks (the Title IV-E Prevention Services Clearinghouse, Strengthening Families, and validated protective-factor instruments) and cited them without claiming certification or fabricating details.

EvalSmart lets an impact team move faster from "we believe this works" to a rigorous, equity-centered, multi-site evidence plan — while keeping every claim traceable and every standard real.

Who built it

Fangning Wang, Ph.D.

Research & Evaluation · User Assessment · Developer of EvalSmart

A research and evaluation professional with Ph.D. training in Evaluation and Measurement and applied experience designing assessment tools, surveys, interview protocols, logic models, indicator matrices, dashboards, and stakeholder-facing reports for higher education, education, workforce, and nonprofit programs. Skilled in mixed-methods research, user and participant needs assessment, data visualization, and translating complex findings into accessible recommendations for non-specialist audiences.

EvalSmart grew out of that practice — a service-oriented approach to building assessment capacity, maintaining reusable evidence resources, and using responsible AI-assisted workflows to support research, analysis, and documentation, all while preserving human judgment, source validation, and methodological rigor.

Simple ways to start

Two low-friction ways to begin.

Start small with a fixed-price preview, or bring EvalSmart into a working session with your team.

Try EvalSmart
$50 · one preview

Send a program description and receive a draft evaluation-plan preview: core evaluation questions, a sample indicator matrix, a qualitative inquiry angle, and your top evidence gaps.

Best for early-stage programs, grant drafts, course/clerkship reviews, or teams who need a structured starting point. Not a full evaluation report or data analysis.

Start with a $50 EvalSmart preview →

Evaluation consultation
$75 / hour

For teams that need help clarifying evaluation questions, preparing for accreditation, reviewing existing measures, or turning program goals into a usable evaluation plan.

Book a consultation →

Every preview is checked by a human before delivery and still carries the REQUIRES HUMAN REVIEW stamp: a fast, rigorous starting point, not a finished evaluation.

Not sure where to start?

If you're not sure whether EvalSmart fits your program, accreditation timeline, or evaluation function, send a short note. I'll help you decide whether a preview or consultation is the right starting point.

Email Fangning →