Why Your AI Isn’t Working: Lessons From Cleaning Up 20 Years Of Dirty Data

We’ve all heard the phrase “data is the new oil.” But what if your “oil” is buried under decades of inconsistent labels, duplicated records, and legacy systems that don’t talk to each other?
That’s exactly what we found when a national technology staffing firm came to us for help. They had 20 years of recruitment data, Fortune 500 clients, and a cutting-edge tech stack, yet they still couldn’t answer basic questions like “Which job boards actually bring in hires?” or “How long does our process really take?”
I’m going to walk you through what we uncovered, how we tackled it, and, just as importantly, how you can apply what we learned to your next big tech or data project.
Meet the Client: A Company Ready to Scale Smarter
This company wasn’t behind the curve. On paper, their infrastructure looked advanced.
But when we started asking simple strategic questions, their systems came back with thin, unreliable answers. That’s when we knew: they were data-rich, insight-poor.
The biggest offenders?
- LinkedIn was entered more than 10 different ways (“LinkedIn”, “linkedin”, “Linked In”, and so on)
- 84% of candidate profiles were never updated
- 100,000+ duplicate profiles existed in the system
- Location values such as “United States” also appeared as “USA”, “us”, and “United States of America”
Takeaway for your next project: Audit your data inputs before upgrading your tech. Build a small team to assess how many variations exist for core fields like source, location, and skills. Create a short report highlighting top inconsistencies and how they affect reporting.
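To make the audit step concrete, here is a minimal Python sketch that groups the raw values of one field by a normalized key (lowercased, with spaces and punctuation stripped) and reports which keys are spelled more than one way. The function names and sample values are hypothetical. This kind of normalization only catches spelling and casing variants; synonyms like “USA” versus “United States” need an explicit alias table.

```python
import re
from collections import Counter, defaultdict


def normalize_key(value: str) -> str:
    """Lowercase and drop everything except letters and digits, so 'Linked In' matches 'linkedin'."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def audit_variants(values):
    """Group raw values by normalized key; return only keys spelled more than one way."""
    groups = defaultdict(Counter)
    for raw in values:
        if raw is None or not raw.strip():
            continue  # blanks are a completeness problem, not a consistency one
        groups[normalize_key(raw)][raw] += 1
    inconsistent = {key: spellings for key, spellings in groups.items() if len(spellings) > 1}
    # Worst offenders (most distinct spellings) first
    return dict(sorted(inconsistent.items(), key=lambda kv: len(kv[1]), reverse=True))


# Hypothetical sample of a "source" column
sources = ["LinkedIn", "linkedin", "Linked In", "LinkedIn", "Indeed", "indeed ", "Referral"]
for key, spellings in audit_variants(sources).items():
    print(f"{key}: {len(spellings)} spellings {dict(spellings)}")
```

Running this over each core field (source, location, skills) gives the raw material for the short inconsistency report described above.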
A recent Gartner study found that poor data quality costs businesses an average of $12.9M per year.
The Problem: When Bad Data Becomes a Bottleneck
Their messy data wasn’t just annoying; it was expensive.
- Recruiters wasted hours managing duplicates
- No clear ROI on expensive job boards
- They couldn’t build the AI features their clients were asking for
The CEO was feature-obsessed, constantly pitching innovation to clients, but the foundation wasn’t there. He had the ambition, the market, and the team, but he didn’t have trustworthy data to build on.
Takeaway for your next project: Run a quick test: have your team answer three business-critical questions using existing data. If they can’t do it quickly or the answers vary wildly, it’s time to invest in a data health assessment.
The Solution: A 5-Month Path from Dirty Data to Confident Decisions
You can’t fix 20 years of dirty data overnight. We designed a 5-month, multi-phase project that focused on cleaning AND future-proofing.
Month 1: Deep Diagnosis
- Audited every table and field
- Measured completeness, consistency, duplication
- Mapped all 160+ data sources
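Completeness and duplication can be measured with very little code. The sketch below is a simplified assumption of how such a profile might be computed, not the method used on this project: completeness is the share of records with a non-blank value in a field, and the duplicate rate is the share of records whose normalized key (here, a hypothetical `email` field) appears more than once.

```python
from collections import Counter


def is_filled(value) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Share of records (0.0 to 1.0) with a non-blank value, per field."""
    n = len(records)
    return {f: (sum(is_filled(r.get(f)) for r in records) / n if n else 0.0) for f in fields}


def duplicate_rate(records, key_field):
    """Share of all records whose normalized key value occurs more than once."""
    keys = [str(r[key_field]).strip().lower() for r in records if is_filled(r.get(key_field))]
    counts = Counter(keys)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(records) if records else 0.0


# Hypothetical candidate records
candidates = [
    {"email": "ana@example.com", "source": "LinkedIn", "city": "Austin"},
    {"email": "ANA@example.com ", "source": "linkedin", "city": ""},
    {"email": "raj@example.com", "source": None, "city": "Denver"},
    {"email": "", "source": "Indeed", "city": None},
]
print(completeness(candidates, ["email", "source", "city"]))
print(duplicate_rate(candidates, "email"))
```

Tracking these numbers per field before and after cleanup is what makes a claim like “completeness improved from 40% to 95%” checkable.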
Months 2–4: Strategic Cleanup
- Standardized sources and taxonomy (e.g., all “JavaScript” variations became one)
- Consolidated 160 source variations down to 20
- Built a skills framework and de-duplication engine
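Standardization and de-duplication often start with two simple pieces: an alias table that maps every known spelling to one canonical value, and a matching rule that collapses records sharing a key. The sketch below assumes a hypothetical `SOURCE_ALIASES` table and matches duplicates on exact normalized email only; a production de-duplication engine would typically add fuzzy matching on names, phone numbers, and work history.

```python
import re
from datetime import date

# Hypothetical alias table: normalized spelling -> canonical source name
SOURCE_ALIASES = {
    "linkedin": "LinkedIn",
    "linkedinrecruiter": "LinkedIn",
    "indeed": "Indeed",
    "indeedcom": "Indeed",
    "referral": "Referral",
    "employeereferral": "Referral",
}


def normalize_key(value: str) -> str:
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def canonical_source(raw):
    """Map a raw label to its canonical source; flag unknown labels instead of guessing."""
    if raw is None or not raw.strip():
        return "Unknown"
    return SOURCE_ALIASES.get(normalize_key(raw), "Unmapped")


def dedupe(profiles):
    """Keep one profile per normalized email, preferring the most recently updated one."""
    newest = {}
    without_email = []
    for profile in profiles:
        email = (profile.get("email") or "").strip().lower()
        if not email:
            without_email.append(profile)  # no safe key; leave for manual or fuzzy review
            continue
        kept = newest.get(email)
        if kept is None or profile["updated"] > kept["updated"]:
            newest[email] = profile
    return list(newest.values()) + without_email


# Hypothetical profiles: two records for the same person plus one with no email
profiles = [
    {"email": "ana@example.com", "updated": date(2019, 3, 1), "source": "Linked In"},
    {"email": " ANA@example.com", "updated": date(2023, 6, 9), "source": "linkedin"},
    {"email": "raj@example.com", "updated": date(2021, 1, 5), "source": "Indeed.com"},
    {"email": None, "updated": date(2020, 2, 2), "source": "Craigslist"},
]
deduped = dedupe(profiles)
for p in deduped:
    p["source"] = canonical_source(p["source"])
print(deduped)
```

Labels that aren’t in the alias table come back as “Unmapped” rather than being guessed, so they can be reviewed and added to the table deliberately. The same pattern applies to skill tags, such as mapping “JS” and “Javascript” to “JavaScript”.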
Month 5: Automation Layer
- Introduced machine learning pipelines
- Added real-time checks to keep dirty data from re-entering the system
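Real-time checks can be as simple as a validation function that runs before a record is saved. In this sketch, the allowed sources, statuses, and required fields are hypothetical placeholders for whatever rules a data owner documents; the point is that a record with an unrecognized value is flagged at entry instead of being cleaned up years later.

```python
import re

# Hypothetical rule set; in practice these come from the documented data standards
ALLOWED_SOURCES = {"LinkedIn", "Indeed", "Referral", "Company Website"}
ALLOWED_STATUSES = {"new", "screening", "interview", "offer", "hired", "rejected"}
REQUIRED_FIELDS = ("name", "email", "source", "status")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check, not full RFC 5322


def validate_candidate(record):
    """Return a list of rule violations; an empty list means the record may be saved."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            errors.append(f"missing required field: {field}")
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email.strip()):
        errors.append(f"malformed email: {email!r}")
    source = record.get("source")
    if source and source not in ALLOWED_SOURCES:
        errors.append(f"unrecognized source: {source!r}")
    status = record.get("status")
    if status and status not in ALLOWED_STATUSES:
        errors.append(f"unknown status: {status!r}")
    return errors


print(validate_candidate({"name": "Ana", "email": "ana@example.com", "source": "linked in", "status": "new"}))
```

Whether a failing record is rejected outright or routed back through an alias table for review is a policy choice: rejecting is stricter, while routing keeps intake forms from blocking recruiters.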
Takeaway for your next project: If you’re cleaning data, document your rules and build automation to enforce them. Assign a data owner, and schedule a quarterly review to validate key fields like candidate source, status, and skill tags.
A Harvard Business Review study notes that only 3% of companies’ data meets basic quality standards. Ongoing governance is essential.
The Results: Clean Data, Confident Strategy, And A Competitive Advantage
Once the cleanup was complete, the difference was immediate:
- 12,000+ duplicate profiles eliminated
- Data completeness improved from 40% to 95%
- Inconsistencies reduced by 85%
But the real value came after our work together: they could finally answer the questions that mattered.
- Which sources actually lead to hires?
- Which candidate profiles perform best?
- How long does each funnel stage really take?
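Once sources and stages are standardized, these questions become short aggregations. The sketch below assumes a hypothetical cleaned event log of `(candidate_id, source, stage, date_entered)` tuples; it counts hires and the hire rate per source, and computes the median number of days candidates spend in each stage before moving to the next one.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical cleaned event log: (candidate_id, canonical source, stage, date the stage was entered)
events = [
    ("c1", "LinkedIn", "applied", date(2024, 1, 2)),
    ("c1", "LinkedIn", "interview", date(2024, 1, 9)),
    ("c1", "LinkedIn", "hired", date(2024, 1, 30)),
    ("c2", "Indeed", "applied", date(2024, 1, 3)),
    ("c2", "Indeed", "interview", date(2024, 1, 20)),
    ("c3", "Referral", "applied", date(2024, 1, 5)),
    ("c3", "Referral", "interview", date(2024, 1, 8)),
    ("c3", "Referral", "hired", date(2024, 1, 22)),
    ("c4", "LinkedIn", "applied", date(2024, 1, 6)),
]


def hires_by_source(events):
    """Return {source: (hires, candidates, hire_rate)} for each source."""
    candidates = defaultdict(set)
    hires = defaultdict(set)
    for cid, source, stage, _ in events:
        candidates[source].add(cid)
        if stage == "hired":
            hires[source].add(cid)
    return {s: (len(hires[s]), len(c), len(hires[s]) / len(c)) for s, c in candidates.items()}


def median_days_per_stage(events):
    """Median days spent in each stage, from entering it to entering the next stage."""
    timelines = defaultdict(list)
    for cid, _, stage, entered in events:
        timelines[cid].append((entered, stage))
    durations = defaultdict(list)
    for steps in timelines.values():
        steps.sort()  # chronological order per candidate
        for (start, stage), (end, _) in zip(steps, steps[1:]):
            durations[stage].append((end - start).days)
    return {stage: median(days) for stage, days in durations.items()}


print(hires_by_source(events))
print(median_days_per_stage(events))
```

The median is used rather than the mean so a handful of stalled candidates doesn’t distort the typical stage duration. Candidates who haven’t yet left a stage contribute nothing to that stage’s figure, which is a simplification worth revisiting on live data.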
Their CEO could now credibly say: “Our AI uses 20 years of successful placement data to match the right candidates to your roles.”
Takeaway for your next project: Once cleanup is done, build a dashboard with 3–5 key recruiting KPIs and train your team to monitor it weekly. Schedule a post-mortem six weeks later to evaluate what new decisions were made using clean data.
This wasn’t just a one-off cleanup. It was a strategic reset.
They didn’t just fix their database. They transformed how they think about and use data across their business. From intake forms to performance reviews, everything became an opportunity to learn.
And that’s the big shift: data isn’t just an asset; it’s an advantage.