Data Analysis | Beginner | User Prompt

Data Cleaning Checklist Generator

March 28, 2026

The Data Cleaning Checklist Generator produces a structured, dataset-specific cleaning plan so you catch quality issues before they corrupt your analysis. Rather than applying a generic checklist, it tailors its recommendations to your specific columns, data types, and domain.

Data analysts, data scientists, and anyone preparing data for reporting or machine learning use this template at the start of a new analysis project. It is especially useful when working with messy external data (CSV exports, API dumps, third-party datasets) where quality problems are common but unpredictable.

You describe your dataset's structure, and the AI generates checks organized into five categories (completeness, duplicates, consistency, outliers, validity). Each check names the columns to inspect, the condition to look for, a suggested fix, and a priority, making it actionable rather than theoretical.
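
As a rough illustration of the "describe your dataset's structure" step, the sketch below renders a column list into the one-line-per-column form the prompt's Columns and Types field uses (`name (type) - description`). The `Column` tuple and `format_columns` helper are hypothetical names introduced for this example; they are not part of the template.

```python
from typing import NamedTuple, Sequence


class Column(NamedTuple):
    """One dataset column as it will be described in the prompt (hypothetical type)."""
    name: str
    dtype: str
    description: str


def format_columns(columns: Sequence[Column]) -> str:
    """Render one 'name (type) - description' line per column."""
    return "\n".join(f"{c.name} ({c.dtype}) - {c.description}" for c in columns)


# A few of the example columns from the prompt template.
columns = [
    Column("order_id", "string", "unique order identifier"),
    Column("order_date", "datetime", "when order was placed"),
    Column("total_amount", "float", "order total in USD"),
]

columns_block = format_columns(columns)
print(columns_block)
```

The resulting text can be pasted between the code fences of the Columns and Types field.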

The Prompt

Generate a data cleaning checklist tailored to the following dataset:

**Dataset Description**: [DESCRIBE YOUR DATASET, e.g., "E-commerce order data exported from Shopify, covering Jan 2024 to Dec 2024"]
**Number of Rows (approximate)**: [ROW COUNT, e.g., 50,000]
**Columns and Types**:
```
[LIST YOUR COLUMNS, e.g.,
order_id (string) - unique order identifier
customer_email (string) - customer email address
order_date (datetime) - when order was placed
total_amount (float) - order total in USD
status (string) - pending, completed, refunded, cancelled
product_name (string) - name of purchased product
quantity (integer) - number of items]
```
**Known Issues (if any)**: [ANY ISSUES YOU ALREADY KNOW ABOUT, e.g., "Some order dates are in 2019, which is before we launched"]

Generate a cleaning checklist organized into these categories:

### 1. Completeness Checks
For each column, identify: expected missing value rate, how to detect nulls/blanks, and recommended handling (drop, impute, flag).

### 2. Duplicate Detection
What constitutes a duplicate in this dataset? Suggest exact-match and fuzzy-match checks on the most likely duplicate columns.

### 3. Consistency Checks
Identify columns that should follow specific formats (dates, emails, phone numbers, currencies) and how to validate them. Flag columns whose values should agree with each other (e.g., quantity * unit price should equal the order total).

### 4. Outlier Detection
For numeric columns, suggest statistical methods to identify outliers (IQR, z-score, domain-specific ranges) and what thresholds to use.

### 5. Validity Checks
Identify columns with expected value ranges or sets (e.g., status should only be one of 4 values) and how to find invalid entries.

For each check, provide:
- What to look for (specific condition)
- Which column(s) to inspect
- Suggested fix or handling approach
- Priority (HIGH/MEDIUM/LOW)

End with a "Quick Stats" section: suggest 5 summary statistics to compute first (before cleaning) to understand the baseline data quality.

Usage Tips

  • Include all your columns: Even if some seem clean (like an auto-increment ID), list them. The AI may catch cross-column validation issues you did not consider.
  • Mention known issues: If you already spotted problems (wrong date ranges, suspicious email formats), mention them. The checklist will include those and suggest related checks you may have missed.
  • Use this before any analysis: Running through this checklist typically takes about 30 minutes, but it can save days of debugging faulty conclusions caused by dirty data.
  • Export the checklist as a task list: Copy the output into your project management tool as a task list, check off items as you complete them, and document what you fixed.
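
The baseline pass that the prompt's "Quick Stats" section asks for can be sketched in plain Python before any cleaning begins. This is a minimal sketch, assuming the rows are already loaded as dictionaries keyed by the example column names. The sample rows and helper names are hypothetical, and the 1.5 * IQR fence is a common convention rather than something the template prescribes.

```python
import statistics
from collections import Counter

# Allowed status values, taken from the example column list in the prompt.
ALLOWED_STATUSES = {"pending", "completed", "refunded", "cancelled"}

# Hypothetical sample data: (order_id, customer_email, total_amount, status).
_raw = [
    ("A1", "a@example.com", 20.0, "completed"),
    ("A2", "", 25.0, "pending"),
    ("A2", "b@example.com", 25.0, "pending"),
    ("A3", "c@example.com", 22.0, "shipped"),
    ("A4", None, 950.0, "refunded"),
    ("A5", "d@example.com", 24.0, "completed"),
    ("A6", "e@example.com", 21.0, "cancelled"),
    ("A7", "f@example.com", 23.0, "completed"),
]
_fields = ("order_id", "customer_email", "total_amount", "status")
rows = [dict(zip(_fields, r)) for r in _raw]


def missing_rate(rows, column):
    """Completeness: share of rows where the column is None or blank."""
    missing = sum(
        1 for r in rows
        if r.get(column) is None or str(r[column]).strip() == ""
    )
    return missing / len(rows)


def duplicate_keys(rows, column):
    """Duplicates: values of `column` that appear more than once (exact match)."""
    counts = Counter(r[column] for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def invalid_values(rows, column, allowed):
    """Validity: count of each value that falls outside the allowed set."""
    return Counter(r[column] for r in rows if r[column] not in allowed)


def iqr_outliers(values, k=1.5):
    """Outliers: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print("missing email rate:", missing_rate(rows, "customer_email"))
print("duplicate order ids:", duplicate_keys(rows, "order_id"))
print("invalid statuses:", dict(invalid_values(rows, "status", ALLOWED_STATUSES)))
print("amount outliers:", iqr_outliers([r["total_amount"] for r in rows]))
```

Recording these numbers before cleaning gives you a baseline to compare against once the checklist items are done.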

Tags: analyst, analysis, quality-improvement
