Parsewise, a YC P25 company founded by Greg and Max, launched on Hacker News with an API that extracts structured data from buckets of unstructured documents. The pitch sounds familiar: many startups promise to turn PDFs into JSON. What sets Parsewise apart is that it traces every output value down to word-level citations across multiple documents, and that it rejects the sampling approach of RAG in favor of exhaustive search.

The founders frame the problem in a way that resonates with anyone who has tried to get Claude to produce a clean CSV from a folder of insurance policies. The system limitations are well known: file count limits, input type restrictions, cost, latency. But Parsewise identifies a deeper friction that most document AI products ignore. “We focused more on the ‘human harness’ rather than the model harness,” Greg wrote in the launch post, “leaning into the actual friction we saw in uptake, which is around verifiability.”

That friction is the time and cognitive load required to trust the output. A model can extract 100 fields correctly and get one field wrong. Without traceability, the user cannot tell which field is the wrong one and must re-read every source document to verify. Parsewise’s bet is that the bottleneck is not extraction accuracy but verification speed.

How it works

Parsewise takes a bucket of data — hundreds or thousands of PDFs, Excel files, transcribed phone calls, emails — and outputs schema-compliant data. Every value carries a citation down to the word level, and those citations can span multiple documents. The system uses a tiered model architecture: vision LLMs for parsing, small models for large-scale exhaustive search, and larger models for cross-document resolution and inconsistency flagging.
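To make "a citation down to the word level, spanning multiple documents" concrete, here is a minimal sketch of what such an output record could look like. The class names, field names, and sample values are all assumptions made for illustration; the launch post does not describe Parsewise's actual response schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    """A word-level pointer into one source document (hypothetical shape)."""
    document: str    # source file name
    page: int        # 1-based page number
    word_start: int  # index of the first cited word on that page
    word_end: int    # index one past the last cited word


@dataclass
class ExtractedValue:
    """One schema field plus every citation that supports it."""
    field_name: str
    value: str
    citations: list[Citation] = field(default_factory=list)

    def source_documents(self) -> set[str]:
        """Distinct documents a reviewer would open to verify this value."""
        return {c.document for c in self.citations}


# Made-up sample: one value backed by passages in two different files.
premium = ExtractedValue(
    field_name="annual_premium",
    value="1200.00",
    citations=[
        Citation("policy.pdf", page=3, word_start=41, word_end=43),
        Citation("renewal_email.txt", page=1, word_start=12, word_end=14),
    ],
)
```

The point of a shape like this is that a reviewer can jump straight to the cited words instead of re-reading whole files, which is the verification cost the founders say they are targeting.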

The “exhaustive search” claim is the technical differentiator. Unlike RAG, which retrieves only the chunks ranked most relevant by embedding similarity, Parsewise aims to find every relevant value for a given query. Greg explained in the HN thread that the company deliberately avoids embeddings and vector similarity for specialist domains. On the Databricks OfficeQA benchmark, which covers 90,000 pages of US Treasury documents, embedding-based approaches fail because nearly all of the content maps to a small region of the embedding space, with only small variations across years and expense categories. Near-identical vectors make it hard for similarity ranking to separate the passage that answers a query from its neighbors.
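The difference between top-k retrieval and an exhaustive scan can be shown with a toy example. This is only a sketch: token-overlap (Jaccard) scoring stands in for embedding similarity, a substring test stands in for the small-model relevance check, and the corpus lines are invented. None of it reflects Parsewise's implementation.

```python
import re

# Invented corpus: near-identical lines that differ only by year and amount.
CHUNKS = [
    "FY2001 travel expense total 100",
    "FY2002 travel expense total 110",
    "FY2003 travel expense total 120",
    "FY2003 office rent total 900",
]


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap score, used here as a stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def top_k(query, chunks, k):
    """RAG-style retrieval: keep only the k highest-scoring chunks."""
    ranked = sorted(chunks, key=lambda c: jaccard(query, c), reverse=True)
    return ranked[:k]


def exhaustive(chunks, is_relevant):
    """Check every chunk and keep all that pass the relevance test."""
    return [c for c in chunks if is_relevant(c)]


query = "travel expense total"
sampled = top_k(query, CHUNKS, k=2)
complete = exhaustive(CHUNKS, lambda c: "travel expense" in c.lower())
missed = [c for c in complete if c not in sampled]
```

All three travel lines score the same against the query, so the cut at k=2 drops one of them for no principled reason, while the exhaustive pass returns all three. That is the failure mode described above, at the cost of one relevance check per chunk.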

Parsewise claims state-of-the-art results on OfficeQA, beating Claude Fable while using Gemini models for visual reasoning. The benchmark tests grounded reasoning across dense financial documents, and the claim is significant because OfficeQA was designed to measure exactly the cross-document reasoning that most extraction tools cannot do.

The lineage architecture

The core abstraction is the “self-improving agent definition.” Users configure acceptable sources, resolution logic for combining values across documents, and rules for flagging uncertainty. The definitions are domain-specific and evolve over time. Greg described a workflow where manager review ratings and feedback comments update definitions, with side-by-side before-and-after comparisons on existing data before committing changes.
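A rough sketch of what one such definition might contain, under stated assumptions: the `FieldDefinition` and `Candidate` types, the "prefer the most trusted source type" resolution rule, and the sample values are all hypothetical. The source only says that definitions cover acceptable sources, resolution logic, and uncertainty flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One value found in one document (hypothetical shape)."""
    value: str
    document: str
    source_type: str  # e.g. "policy", "email", "call_transcript"


@dataclass(frozen=True)
class FieldDefinition:
    """Per-field rules a domain expert would configure and refine."""
    name: str
    acceptable_sources: tuple[str, ...]  # ordered from most to least trusted
    flag_on_disagreement: bool = True


def resolve(definition, candidates):
    """Return (value, flags) for one field."""
    allowed = [c for c in candidates
               if c.source_type in definition.acceptable_sources]
    if not allowed:
        return None, ["no acceptable source"]
    # Assumed resolution rule: take the value from the most trusted source type.
    rank = {s: i for i, s in enumerate(definition.acceptable_sources)}
    best = min(allowed, key=lambda c: rank[c.source_type])
    flags = []
    distinct = {c.value for c in allowed}
    if definition.flag_on_disagreement and len(distinct) > 1:
        flags.append(f"conflicting values: {sorted(distinct)}")
    return best.value, flags


definition = FieldDefinition("deductible", ("policy", "email"))
candidates = [
    Candidate("500", "a.pdf", "policy"),
    Candidate("750", "b.eml", "email"),
    Candidate("900", "c.txt", "call_transcript"),  # not an acceptable source
]
value, flags = resolve(definition, candidates)
```

Keeping the rules as data rather than prompt text is what would make a before-and-after comparison possible: run the old and new definition over the same candidates and diff the results before committing.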

This is where Parsewise’s Palantir lineage shows. Greg built classical ETL and AI workflows at Palantir. Max did complex data analysis in financial services at Bain. The product reflects an understanding that enterprise data extraction is not a one-shot prompt engineering problem but an ongoing process of definition refinement, exception handling, and human-in-the-loop validation.

The model and cloud agnosticism is pragmatic. Parsewise can run in private networks, which matters for regulated industries where full automation is prohibited. The founders note that many of their customers work in exactly those sectors.

What it means for AI builders

Parsewise’s launch arrives at a moment when the document AI space is crowded but shallow. Most products solve the first step — parsing a single document into fields — and stop. Parsewise solves the second step: reasoning across documents and making the reasoning auditable.

The exhaustive search approach has tradeoffs. At scale, cost and latency become issues. When asked about e-discovery use cases with 120GB of data, Greg acknowledged that “at that scale cost and latency may actually become an issue, so probably better to consider some sort of indexing or keyword searching.” The product is not a universal document database. It is a precision extraction tool for situations where every value must be sourced and every resolution must be explainable.
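A back-of-envelope calculation shows why scale matters for an approach that reads everything. Every input below is an illustrative placeholder, not a figure from Parsewise or any model vendor: the bytes-per-token ratio is a rough rule of thumb for English text, the price is hypothetical, and real document collections contain far fewer text bytes than their file size suggests.

```python
def full_pass_cost(corpus_bytes, bytes_per_token, price_per_million_tokens, passes=1):
    """Estimated cost of reading every token in the corpus `passes` times."""
    tokens = corpus_bytes / bytes_per_token
    return tokens / 1_000_000 * price_per_million_tokens * passes


# Placeholder inputs (assumptions, not measurements).
CORPUS_BYTES = 120 * 10**9        # the 120GB case from the thread, treated as raw text
BYTES_PER_TOKEN = 4               # rough rule of thumb for English text
PRICE_PER_MILLION_TOKENS = 0.10   # hypothetical small-model input price

one_pass = full_pass_cost(CORPUS_BYTES, BYTES_PER_TOKEN, PRICE_PER_MILLION_TOKENS)
ten_passes = full_pass_cost(CORPUS_BYTES, BYTES_PER_TOKEN,
                            PRICE_PER_MILLION_TOKENS, passes=10)
```

With these placeholders a single pass comes to about 3,000 price units, and the figure grows linearly with both corpus size and the number of passes. An index or keyword search pays a one-time cost instead, which is why Greg points to those options at that scale.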

For AI builders, the lesson is in the architecture. Most teams reach for the most capable model and a RAG pipeline. Parsewise reaches for the smallest model that can do the search, the most capable model for the judgment calls, and a human interface that optimizes for trust per click. The “human harness” over “model harness” framing is a direct challenge to the prevailing wisdom that better models reduce the need for verification infrastructure.
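The tiering idea can be sketched in a few lines. This is an assumption-heavy illustration, not Parsewise's pipeline: plain Python functions stand in for the small and large model calls, and the sample chunks are invented.

```python
def run_tiered(chunks, cheap_filter, expensive_resolver):
    """Scan every chunk with the cheap tier; call the expensive tier once on the hits."""
    hits = [chunk for chunk in chunks if cheap_filter(chunk)]
    calls = {"cheap": len(chunks), "expensive": 0}
    if not hits:
        return None, calls
    calls["expensive"] = 1
    return expensive_resolver(hits), calls


# Stand-ins for model calls (assumptions): a keyword test and a "last hit wins" rule.
def mentions_deductible(chunk):
    return "deductible" in chunk.lower()


def latest_hit(hits):
    return hits[-1]


chunks = [
    "Policy issued to a sample customer.",
    "Deductible: 500 per claim.",
    "Endorsement: deductible raised to 750.",
    "Contact the broker with questions.",
]
answer, calls = run_tiered(chunks, mentions_deductible, latest_hit)
```

The cheap tier runs once per chunk and the expensive tier runs once per query, so most of the volume lands on the least costly model while the judgment call still goes to the most capable one.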

The question Parsewise leaves open is whether the agent-definition layer can become a product in its own right, or whether it remains a bespoke configuration burden that limits the addressable market. Greg’s HN answers suggest that each domain and organization ends up with highly customized definitions, and that the amount of work ranges from a few hours to a few minutes per day. That is honest but not scalable.

Still, Parsewise has identified a real gap. The core problem in document AI is not model quality. It is trust infrastructure. Parsewise builds the infrastructure, and the rest of the industry will have to follow.