
Data Cleaning

Identifying and correcting errors, inconsistencies, and quality issues in datasets. For AI training data, cleaning includes removing duplicates, fixing formatting, filtering low-quality examples, and handling missing values. Data quality directly impacts model performance.
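The four cleaning steps named above can be sketched in a few lines of Python. This is a minimal sketch that assumes a hypothetical dataset of dicts with `text` and `label` fields. The function names, the `min_words` threshold, and the `"unknown"` placeholder are illustrative choices, not a standard recipe. Production pipelines typically add near-duplicate detection and model-based quality scoring on top of rules like these.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    # Fix formatting: collapse runs of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def clean(records: list[dict], min_words: int = 3) -> list[dict]:
    """Hypothetical cleaner: missing values, formatting, quality filter, exact dedup."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for rec in records:
        text: Optional[str] = rec.get("text")
        # Missing values (required field): drop records that have no text at all.
        if text is None:
            continue
        text = normalize(text)
        # Low-quality filter: discard very short examples (illustrative threshold).
        if len(text.split()) < min_words:
            continue
        # Exact deduplication on the normalized, case-folded text.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        # Missing values (optional field): fill a missing label with a placeholder.
        label = rec.get("label")
        if label is None:
            label = "unknown"
        cleaned.append({"text": text, "label": label})
    return cleaned


records = [
    {"text": "  The cat   sat on the mat. ", "label": "animal"},
    {"text": "the cat sat on the mat.", "label": "animal"},  # duplicate after normalizing
    {"text": "ok", "label": "misc"},                         # too short
    {"text": None, "label": "misc"},                         # missing text
    {"text": "Paris is the capital of France.", "label": None},
]
print(clean(records))
```

Running it keeps two records: the first cat sentence with its whitespace fixed, and the Paris sentence with its label filled as `"unknown"`. Whether to drop or fill a missing value is a per-field decision; here a missing `text` makes the example useless, while a missing `label` is assumed to be recoverable later.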

Related terms

Data Pipeline, Training Data, Deduplication

Related tools

Subscription
FiftyOne

FiftyOne is the most powerful data platform for multimodal AI and CV developers. See how it can supercharge your AI workflow.

Data Management
Free Trial
ThoughtSpot

Transform insights into action with the ThoughtSpot Agentic Analytics Platform—AI agents, automated insights, and embedded intelligence.

Analytics
Subscription
Tamr

Tamr's real-time AI-native MDM platform unifies, cleans, and enriches records to power AI initiatives, decision-making, and operations with trustworthy data.

Data Management