Configure Your Package

Session focus, e.g. "Focus on applying NNT to antibiotic prescribing decisions in urgent care"
Cognitive level(s) this session targets:

  • Remember: Recall EBM terminology & study designs
  • Understand: Explain evidence hierarchies & bias types
  • Apply: Use EBM frameworks in clinical scenarios
  • Analyze: Critically appraise AI outputs & literature
  • Evaluate: Judge evidence quality & applicability
  • Create: Design EBM workflows & teaching activities
Competency domains:

  • Medical Knowledge
  • Interpersonal & Communication Skills
  • Patient Care
  • Professionalism
  • Practice-Based Learning & Improvement
  • Society & Community
  • Health Literacy & Education

A3.01–A3.19 instructional objective categories:

  • Patient Care
  • Medical Knowledge
  • Interprofessional Collaborative Practice
  • Professionalism
  • Professional & Legal Aspects
  • Healthcare Finance & Systems
  • Clinical & Technical Skills
  • Clinical Reasoning & Problem-Solving


Configure the options on the left and click Generate. Each package includes a complete session plan, learner activities, AI-integrated exercises, critical appraisal questions, and faculty notes.

EBM Course Builder — Premium Feature

Subscribe to generate your own packages. Below is a complete sample so you can see exactly what you get.

Challenge Mode · AI Evidence Challenge

Critical Appraisal of AI-Generated Diagnostic Evidence: D-dimer in PE Diagnosis

PA Students, Didactic Phase · Small Group · Bias & Diagnostic Accuracy

Learning Objectives:

  1. Appraise an AI-generated clinical evidence summary for logical consistency, source fidelity, and appropriate uncertainty
  2. Identify at least 4 planted weaknesses in an AI-generated recommendation, including unsupported conclusions and citation misrepresentation
  3. Evaluate the applicability of diagnostic accuracy evidence to specific patient populations and clinical contexts
  4. Construct a corrected, evidence-based recommendation that addresses the identified flaws
Intentionally Flawed — For Critical Appraisal Exercise

AI-Generated Clinical Recommendation:

"D-dimer has excellent diagnostic accuracy for ruling out pulmonary embolism in all adult patients presenting to the emergency department. A negative D-dimer result (below 0.5 mg/L) effectively excludes PE with a sensitivity of 98%, making anticoagulation unnecessary when the test is negative. Multiple RCTs have demonstrated that D-dimer-guided management is safe across all patient populations, including elderly patients and pregnant women. The Wells Score adds minimal clinical utility when D-dimer testing is readily available. Based on systematic review evidence, clinicians should use D-dimer as a first-line standalone test without pre-test probability assessment."

Planted Weaknesses (Summary):

  • Claims "all adult patients" — ignores high pretest probability patients where D-dimer is not appropriate
  • "Multiple RCTs" — D-dimer studies are predominantly cohort designs, not RCTs
  • Age-adjusted cutoffs (patient age × 10 µg/L in patients >50) omitted entirely
  • Pregnant patients require different reference ranges — overgeneralization is dangerous
  • Wells Score dismissed without evidence — contradicts major clinical guidelines (ACEP, AHA)
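The age-adjusted cutoff mentioned above is simple arithmetic, and faculty may find it useful to show learners the calculation explicitly. A minimal sketch (units in µg/L FEU; assumes the conventional 500 µg/L, i.e. 0.5 mg/L, threshold applies at or below age 50):

```python
def d_dimer_cutoff_ug_l(age_years: int) -> int:
    """Age-adjusted D-dimer cutoff in µg/L (FEU): age x 10 for patients
    over 50, otherwise the conventional 500 µg/L (0.5 mg/L) threshold."""
    return age_years * 10 if age_years > 50 else 500

# A 78-year-old with a D-dimer of 700 µg/L is "positive" at the
# conventional cutoff (500) but below the age-adjusted cutoff (780).
assert d_dimer_cutoff_ug_l(78) == 780
assert d_dimer_cutoff_ug_l(40) == 500
```

This single line of logic is exactly what the flawed summary omits, and it makes a concrete discussion point for why the fixed 0.5 mg/L threshold over-calls older patients.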
Critical Appraisal Questions:

  1. Can you trace each cited claim to a specific, retrievable source? What happens when you check?
  2. Does the recommendation apply to the patient populations described, or does it overgeneralize?
  3. Were the study designs cited appropriate for the claims made (e.g., RCT vs. cohort for diagnostic studies)?
  4. What subgroups or exceptions were omitted that would change clinical management?
  5. Does the AI output acknowledge uncertainty or present findings with inappropriate confidence?
  6. Would following this recommendation harm any specific patient populations?
  7. What additional evidence or context would you need before acting on this recommendation?
Debrief Questions:

  1. What specific language in the AI summary made it sound credible, even though the reasoning was flawed?
  2. How would you explain to a colleague why this AI recommendation could lead to patient harm?
  3. At what point in your clinical workflow is AI evidence retrieval most — and least — trustworthy?
  4. Design a "verification habit" you would use in practice when AI presents diagnostic evidence.

Planted Weaknesses — Instructor Guide:

  • Unsupported generalization: "All adult patients" — D-dimer should only be used in low-to-moderate pretest probability per Wells Criteria. Students should cite Wells PE Score evidence.
  • Study design misrepresentation: No RCTs exist for D-dimer as a standalone PE rule-out strategy. The Christopher Study (NEJM 2006) was a management cohort study, not an RCT.
  • Omitted subgroups: Age-adjusted cutoffs (Righini 2014, JAMA) increase specificity by ~5% in patients >50 — omission can lead to unnecessary downstream imaging and workup in older patients.
  • Polished wording hiding shallow reasoning: "Excellent diagnostic accuracy" without sensitivity/specificity breakdown is a signal of AI output quality failure.
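To drive home why "sensitivity of 98%" cannot justify standalone use without a pretest probability assessment, instructors can walk learners through the likelihood-ratio arithmetic. A minimal sketch using the 98% sensitivity quoted in the flawed summary and an assumed, illustrative specificity of 40% (not a figure from this package):

```python
def post_test_prob_negative(pretest: float, sens: float, spec: float) -> float:
    """Post-test probability of disease after a NEGATIVE test result,
    using the negative likelihood ratio LR- = (1 - sens) / spec."""
    lr_neg = (1 - sens) / spec
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr_neg
    return post_odds / (1 + post_odds)

# Illustrative values only: sensitivity 0.98 (as quoted in the flawed
# summary) and an assumed specificity of 0.40.
low = post_test_prob_negative(0.05, 0.98, 0.40)   # low pretest probability
high = post_test_prob_negative(0.60, 0.98, 0.40)  # high pretest probability
print(f"low pretest:  {low:.1%}")    # ~0.3% residual risk
print(f"high pretest: {high:.1%}")   # ~7% residual risk
```

The same negative test leaves roughly a 7% residual probability of PE in a high pretest probability patient — far too high to safely withhold further workup — which is precisely why the Wells Score cannot be dismissed.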

This is a sample package. Subscribe to generate your own for any EBM topic, learner level, and teaching format.

Get Full Access — $19/mo

Includes all premium tools: EBM Builder, OSCE Builder, Assignment Builder, and more.