TL;DR: Claude Code is amazing, but it sometimes confidently says things are impossible when they're not. The GitHits MCP server helped Claude find undocumented DuckDB C++ APIs by searching actual code instead of docs. That let me build proper predicate pushdowns for my web archive extension.
Claude Code has been amazing 🚀
I've been using Claude Code Pro since November and Max since December. It's genuinely changed how I work. An AI that reads your codebase and writes production code? The future is definitely here.
But there are gaps.
The problem: undocumented C++ APIs 😤
I was building duckdb-web-archive, a DuckDB extension for querying the Wayback Machine and Common Crawl directly from SQL. The goal: implement proper predicate pushdowns.
What are predicate pushdowns?
When you write:
SELECT url, timestamp FROM wayback_machine('example.com/*') WHERE timestamp > '2020-01-01' LIMIT 10;
Without pushdowns, the extension fetches all the data and filters it locally. With pushdowns, the WHERE and LIMIT clauses get sent to the API itself (&from=2020, &limit=10). Way less data. Way faster.
I needed to implement:
- Filter pushdown: WHERE → API parameters
- Limit pushdown: LIMIT → CDX service
- Projection pushdown: only fetch requested columns via &fl=
- Distinct pushdown: DISTINCT ON → the API's &collapse=
The problem? DuckDB's internal C++ APIs for these aren't documented. Things like TableFunctionSet, FunctionData, TableFilterSet, and the bind/init/scan lifecycle with filter propagation. None of it is in the docs.
Context7 wasn't enough 📚
I googled for documentation MCP servers. The first recommendation was Context7. Makes sense: it fetches up-to-date docs for libraries.
Didn't help. Everything it found was too high-level. The internal C++ extension APIs I needed exist in DuckDB's source code. First-party extensions use them. No public documentation though.
Claude kept telling me what I wanted wasn't possible. Multiple times. Very confidently.
But I knew it was possible. DuckDB's own extensions like httpfs and postgres_scanner were doing exactly this with remote data sources. The capability existed. It just wasn't documented anywhere Claude could find.
So I kept searching.
Enter GitHits 🎯
A colleague mentioned GitHits, a code search tool that finds real examples from all of GitHub. What caught my attention: it's built in the same city I'm from. Small world.
Joined the waitlist. Got approved pretty quickly.
The moment I enabled GitHits MCP in Claude Code, everything changed.
GitHits doesn't just search documentation. It searches actual code. Millions of repos. Real implementations. When I asked about predicate pushdowns in DuckDB extensions, Claude could suddenly find examples from:
- DuckDB's own extensions (postgres_scanner, httpfs, etc.)
- Community extensions solving similar problems
- Real code with exact function signatures and patterns
Night and day. Claude went from "this isn't possible" to showing me exactly how to:
- Implement TableFunction::pushdown_complex_filter
- Use TableFilterSet to extract pushed filters
- Wire up FunctionData for stateful scanning
- Handle bind/init/scan lifecycle correctly
The results 🎉
Completed the extension with full predicate pushdown support. Wayback Machine CDX API. Common Crawl Index Server. Both working.
A query that would've fetched thousands of records now makes just 6 HTTP requests. All filtering pushed to the source.
I was so impressed I created a PR to help others: duckdb/extension-template#158. It adds docs for building DuckDB extensions with AI assistants.
One user commented:
"The documentation here is super helpful, even without LLM agent considerations. I believe I read about some sort of effort to document the extension API in a future release, but can't think of where."
That's exactly what GitHits enabled: finding knowledge that's scattered across codebases and isn't consolidated anywhere.
Why this matters 💡
Claude Code is powerful, but its knowledge has a cutoff, and it doesn't know every internal API. Context7 is sometimes helpful, but in my experience it can't find more complex examples, and real engineering often involves undocumented internals: patterns that only exist in source code.
GitHits fills that gap. All of open source as a searchable knowledge base. Canonical examples. Battle-tested implementations. Code that actually works.
If you're using Claude Code seriously, especially with less-documented libraries, add GitHits to your toolkit. Claude's reasoning plus GitHits' code search is a powerful combo.
duckdb-web-archive is available through DuckDB community extensions. Try it with
INSTALL web_archive FROM community;