Data Pipeline Troubleshooter
The Data Pipeline Troubleshooter helps you diagnose and resolve data pipeline failures systematically. Instead of hunting through logs and guessing at causes, you describe the symptoms and pipeline architecture, and receive a structured root cause analysis with a prioritized list of checks, diagnostic queries, and fixes.
This prompt is used by data engineers debugging nightly ETL jobs, analytics engineers investigating dbt model failures, platform teams resolving Airflow or Prefect task errors, and analysts tracing why a dashboard shows stale or incorrect data. It covers the full pipeline spectrum: ingestion, transformation, loading, scheduling, and monitoring.
This prompt outperforms a generic "help me debug my pipeline" request because it follows an incident response methodology: symptom documentation, hypothesis generation ranked by probability, targeted diagnostic steps for each hypothesis, and a remediation plan that addresses both the immediate fix and the underlying prevention. It also produces a post-mortem template so you can document the failure for your team, turning every incident into institutional knowledge.
The Prompt
Help me diagnose and fix a data pipeline issue using a structured troubleshooting approach:

**Pipeline Overview**:
- Pipeline name/purpose: [DESCRIBE, e.g., "Nightly ETL that syncs Salesforce data to our Snowflake warehouse for reporting"]
- Orchestrator: [Airflow / Prefect / dbt Cloud / cron / Dagster / custom / other]
- Source(s): [e.g., "Salesforce API, PostgreSQL production database"]
- Destination: [e.g., "Snowflake, schema analytics_prod"]
- Transformation layer: [dbt / custom SQL / Python scripts / Spark / other]
- Schedule: [e.g., "Runs daily at 2:00 AM UTC, typically completes in 45 minutes"]

**Symptom Description**:
- What went wrong: [DESCRIBE THE FAILURE, e.g., "The pipeline completed without errors, but the revenue dashboard shows $0 for yesterday. Previous days are correct."]
- When it started: [e.g., "First noticed today, March 28. Last confirmed good run was March 26."]
- Error messages (if any):
```
[PASTE ERROR LOGS, STACK TRACES, OR SCHEDULER OUTPUT HERE]
```
- What changed recently: [DEPLOYMENTS, SCHEMA CHANGES, NEW DATA SOURCES, INFRASTRUCTURE CHANGES, e.g., "We deployed a new version of the Salesforce connector yesterday" or "Nothing that I know of"]
- Impact: [WHAT IS AFFECTED, e.g., "All revenue reports are incorrect. Finance team cannot close monthly books."]

**Troubleshoot this issue following these steps:**

### Step 1: Hypothesis Generation
Based on the symptoms, generate 3-5 ranked hypotheses for the root cause. For each hypothesis:
- State the suspected cause in one sentence
- Explain why this matches the symptoms
- Assign a likelihood (high / medium / low) based on the information provided
- List what evidence would confirm or rule out this hypothesis

### Step 2: Diagnostic Plan
For the top 3 hypotheses, provide specific diagnostic steps:
- Exact queries to run (SQL, CLI commands, API calls) to verify or eliminate each hypothesis
- What to look for in log files (specific patterns, timestamps, error codes)
- Comparison checks (e.g., "count rows in source vs. destination for the affected date range")
- Order the diagnostics from fastest/cheapest to slowest, so I can eliminate hypotheses efficiently

### Step 3: Root Cause Confirmation
Once I share the results of the diagnostic steps, confirm the root cause and explain:
- The exact chain of events that caused the failure
- Why existing monitoring did not catch it (or how it was caught)
- Whether this is a one-time incident or a systemic risk

### Step 4: Remediation Plan
Provide two levels of fix:
- **Immediate fix**: Steps to resolve the current failure and restore correct data. Include rollback procedures if the fix carries risk.
- **Permanent fix**: Changes to prevent this class of failure from recurring. This may include schema validation, data contracts, alerting rules, or pipeline design changes.

### Step 5: Post-Mortem Template
Generate a brief post-mortem document with:
- Incident timeline (using the information gathered)
- Root cause summary (2-3 sentences)
- Impact assessment (duration, data affected, downstream consumers)
- Action items with owners (leave owner blank for me to fill in)
- Monitoring gaps to address
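To make the "comparison check" in Step 2 concrete, here is a minimal sketch of a source-vs-destination row-count diff. It uses an in-memory SQLite database and hypothetical `src_orders`/`dst_orders` tables purely for illustration; in a real incident you would run the equivalent `GROUP BY` queries against your actual source system and warehouse and compare the results.

```python
import sqlite3

def rowcount_by_day(conn, table, date_col, start, end):
    """Count rows per day in [start, end] for a table (dates as ISO strings)."""
    cur = conn.execute(
        f"SELECT {date_col}, COUNT(*) FROM {table} "
        f"WHERE {date_col} BETWEEN ? AND ? GROUP BY {date_col}",
        (start, end),
    )
    return dict(cur.fetchall())

def find_gaps(source_counts, dest_counts):
    """Return days where the destination row count diverges from the source."""
    gaps = {}
    for day, src_n in source_counts.items():
        dst_n = dest_counts.get(day, 0)
        if dst_n != src_n:
            gaps[day] = (src_n, dst_n)
    return gaps

# Simulated source and destination tables for the sketch
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_date TEXT)")
conn.execute("CREATE TABLE dst_orders (order_date TEXT)")
conn.executemany("INSERT INTO src_orders VALUES (?)",
                 [("2024-03-26",)] * 10 + [("2024-03-27",)] * 12)
conn.executemany("INSERT INTO dst_orders VALUES (?)",
                 [("2024-03-26",)] * 10)  # March 27 was never loaded

src = rowcount_by_day(conn, "src_orders", "order_date", "2024-03-26", "2024-03-27")
dst = rowcount_by_day(conn, "dst_orders", "order_date", "2024-03-26", "2024-03-27")
print(find_gaps(src, dst))  # {'2024-03-27': (12, 0)}
```

A diff like `(12, 0)` immediately distinguishes "the load silently skipped a day" from "the data arrived but a transformation zeroed it out", which is exactly the kind of evidence Step 2 asks for.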
Usage Tips
- Paste actual error messages, not paraphrases: "It failed with a timeout" is vague. The exact stack trace or error code lets the troubleshooter pinpoint the failing component immediately.
- Include what changed recently, even if you think it is unrelated: Schema migrations, dependency updates, infrastructure changes, and even upstream source system upgrades are frequent hidden causes. Mention them all and let the analysis determine relevance.
- Run diagnostic queries and share results iteratively: This prompt works best as a conversation. Share the results from Step 2, and the troubleshooter will narrow the diagnosis in Step 3. Trying to guess the root cause without running diagnostics wastes time.
- Save the post-mortem for your team wiki: The Step 5 output is formatted for direct use. Over time, your collection of post-mortems becomes a searchable knowledge base that accelerates future debugging.
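One monitoring gap that post-mortems repeatedly surface is a missing freshness check: the pipeline "succeeded" but never loaded the newest partition. A minimal sketch of such a check follows, with a hypothetical one-day lag threshold; a real implementation would pull `latest_loaded` from a `MAX(load_date)` query against the warehouse and route the result to your alerting system.

```python
from datetime import date

def freshness_alert(latest_loaded: date, today: date, max_lag_days: int = 1):
    """Flag a table as stale when its newest loaded date lags behind schedule."""
    lag = (today - latest_loaded).days
    return {"lag_days": lag, "stale": lag > max_lag_days}

# A nightly pipeline should have yesterday's data in place by morning
print(freshness_alert(date(2024, 3, 26), date(2024, 3, 28)))
# {'lag_days': 2, 'stale': True}
```

Wiring a check like this into the scheduler as a final task turns the "pipeline completed but the dashboard shows $0" symptom into an alert instead of a surprise.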