Parsewise (YC P25) bets that document AI's bottleneck is trust, not extraction

Parsewise traces every output value to word-level citations across multiple documents, rejecting RAG's sampling for exhaustive search.

The product focuses on verification speed over extraction accuracy, addressing the cognitive load of trusting AI output.

Parsewise uses a tiered model architecture with vision LLMs for parsing and small models for exhaustive search, avoiding embeddings for specialist domains.

Tessera Newsroom · 4 min read · July 2, 2026

Source: Launch HN: Parsewise (YC P25) – Reason Across Documents with an API (news.ycombinator.com)

Parsewise, a YC P25 company founded by Greg and Max, launched on Hacker News with an API that extracts structured data from buckets of unstructured documents. The pitch sounds familiar — many startups promise to turn PDFs into JSON. What makes Parsewise worth attention is what it does differently: it traces every output value down to word-level citations across multiple documents, and it rejects the sampling approach of RAG in favor of exhaustive search.

The founders frame the problem in a way that resonates with anyone who has tried to get Claude to produce a clean CSV from a folder of insurance policies. The system limitations are well known: file count limits, input type restrictions, cost, latency. But Parsewise identifies a deeper friction that most document AI products ignore. “We focused more on the ‘human harness’ rather than the model harness,” Greg wrote in the launch post, “leaning into the actual friction we saw in uptake, which is around verifiability.”

That friction is the time and cognitive load required to trust the output. A model can extract 100 fields correctly and one field wrong. Without traceability, the user must re-read every source document to verify. Parsewise’s bet is that the bottleneck is not extraction accuracy but verification speed.

How it works

Parsewise takes a bucket of data — hundreds or thousands of PDFs, Excel files, transcribed phone calls, emails — and outputs schema-compliant data. Every value carries a citation down to the word level, and those citations can span multiple documents. The system uses a tiered model architecture: vision LLMs for parsing, small models for large-scale exhaustive search, and larger models for cross-document resolution and inconsistency flagging.
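A word-level citation that spans several documents can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, and sample values are assumptions made for this article, not Parsewise's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WordCitation:
    """A contiguous run of words in one source document (hypothetical shape)."""
    document_id: str
    page: int
    word_start: int  # index of the first cited word on the page
    word_end: int    # index one past the last cited word


@dataclass
class CitedValue:
    """One schema field plus every source span that supports it."""
    field_name: str
    value: str
    citations: list[WordCitation] = field(default_factory=list)

    def source_documents(self) -> set[str]:
        # A single value may be backed by spans in several documents.
        return {c.document_id for c in self.citations}


def uncited(values: list[CitedValue]) -> list[str]:
    """Names of fields a reviewer could not trace back to any source."""
    return [v.field_name for v in values if not v.citations]


# Made-up example data: one value cited in two documents, one with no citation.
premium = CitedValue(
    "annual_premium",
    "1200.00",
    [
        WordCitation("policy.pdf", page=3, word_start=41, word_end=43),
        WordCitation("renewal_email.txt", page=1, word_start=12, word_end=14),
    ],
)
insurer = CitedValue("insurer", "Acme Mutual")
```

With a shape like this, a review screen can jump from a value straight to its supporting words, and anything returned by `uncited` is a natural candidate for manual checking.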

The “exhaustive search” claim is the technical differentiator. Unlike RAG, which samples the most relevant chunks via embedding similarity, Parsewise aims to find all relevant values for a given query. Greg explained in the HN thread that the company deliberately avoids embeddings and vector similarity for specialist domains. On the Databricks OfficeQA benchmark (90,000 pages of US Treasury documents), embedding-based approaches fail because all of the content maps to a narrow region of embedding space, with only small variations across years and expense categories.
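The difference between sampling and exhaustive search shows up even in a toy setting. The sketch below is a simplification under stated assumptions: word overlap stands in for embedding similarity, a plain predicate stands in for a small model that reads each chunk, and the chunk texts are invented. None of it is Parsewise's implementation.

```python
from typing import Callable

Chunk = tuple[str, str]  # (document_id, text)


def top_k_retrieve(chunks: list[Chunk], query: str, k: int) -> list[Chunk]:
    """RAG-style sampling: keep only the k highest-scoring chunks."""
    query_terms = set(query.lower().split())

    def overlap(chunk: Chunk) -> int:
        # Toy stand-in for embedding similarity: count shared words.
        return len(query_terms & set(chunk[1].lower().split()))

    return sorted(chunks, key=overlap, reverse=True)[:k]


def exhaustive_search(chunks: list[Chunk], is_relevant: Callable[[str], bool]) -> list[Chunk]:
    """Check every chunk; nothing is dropped by a ranking cutoff."""
    return [c for c in chunks if is_relevant(c[1])]


# Invented near-duplicate lines that differ only by year and amount.
chunks = [
    ("fy2019.pdf", "defense outlays totaled 100"),
    ("fy2020.pdf", "defense outlays totaled 110"),
    ("fy2021.pdf", "defense outlays totaled 120"),
    ("fy2021.pdf", "interest outlays totaled 30"),
]

sampled = top_k_retrieve(chunks, "defense outlays", k=2)
complete = exhaustive_search(chunks, lambda text: "defense" in text)
```

All three defense lines score identically, so the top-2 cutoff silently drops one year, while the exhaustive pass returns all three. The price is one relevance check per chunk.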

Parsewise claims state-of-the-art results on OfficeQA, beating Claude Fable while using Gemini models for visual reasoning. The benchmark tests grounded reasoning across dense financial documents, and the claim matters because OfficeQA was designed to measure exactly the cross-document reasoning that most extraction tools cannot do.

The lineage architecture

The core abstraction is the “self-improving agent definition.” Users configure acceptable sources, resolution logic for combining values across documents, and rules for flagging uncertainty. The definitions are domain-specific and evolve over time. Greg described a workflow where manager review ratings and feedback comments update definitions, with side-by-side before-and-after comparisons on existing data before committing changes.
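One way to picture such a definition is as a small configuration object that can be re-run against existing data before a change is committed. The sketch below is hypothetical throughout: `AgentDefinition`, `prefer`, `extract`, and `compare` are names invented for illustration, and the flagging rule (flag when accepted sources disagree) is an assumed example of a rule, not one the founders described.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    value: str
    document_id: str
    source_type: str  # e.g. "policy" or "email"


@dataclass
class AgentDefinition:
    field_name: str
    acceptable_sources: set[str]
    resolve: Callable[[list[Candidate]], str]  # combines values across documents


def prefer(source_order: list[str]) -> Callable[[list[Candidate]], str]:
    """Resolution logic: take the value from the highest-priority source type."""
    def resolve(candidates: list[Candidate]) -> str:
        def rank(c: Candidate) -> int:
            # Source types missing from the order sort last.
            return source_order.index(c.source_type) if c.source_type in source_order else len(source_order)
        return min(candidates, key=rank).value
    return resolve


def extract(defn: AgentDefinition, candidates: list[Candidate]) -> dict[str, Optional[object]]:
    allowed = [c for c in candidates if c.source_type in defn.acceptable_sources]
    if not allowed:
        return {"value": None, "flagged": True}
    # Assumed flagging rule: accepted sources disagree on the value.
    disagree = len({c.value for c in allowed}) > 1
    return {"value": defn.resolve(allowed), "flagged": disagree}


def compare(old: AgentDefinition, new: AgentDefinition, candidates: list[Candidate]):
    """Before-and-after results for the same data under two definitions."""
    return extract(old, candidates), extract(new, candidates)


candidates = [
    Candidate("2024-01-01", "policy.pdf", "policy"),
    Candidate("2024-02-01", "broker.eml", "email"),
]
old = AgentDefinition("effective_date", {"policy", "email"}, prefer(["email", "policy"]))
new = AgentDefinition("effective_date", {"policy"}, prefer(["policy"]))
before, after = compare(old, new, candidates)
```

Here the revised definition stops trusting emails, so the same documents yield a different value and the disagreement flag clears. Seeing both results together is the kind of comparison a reviewer would want before accepting the change.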

This is where Parsewise’s Palantir lineage shows. Greg built classical ETL and AI workflows at Palantir. Max did complex data analysis in financial services at Bain. The product reflects an understanding that enterprise data extraction is not a one-shot prompt engineering problem but an ongoing process of definition refinement, exception handling, and human-in-the-loop validation.

The model and cloud agnosticism is pragmatic. Parsewise can run in private networks, which matters for regulated industries where full automation is prohibited. The founders note that many of their customers work in exactly those sectors.

What it means for AI builders

Parsewise’s launch arrives at a moment when the document AI space is crowded but shallow. Most products solve the first step — parsing a single document into fields — and stop. Parsewise solves the second step: reasoning across documents and making the reasoning auditable.

The exhaustive search approach has tradeoffs. At scale, cost and latency become issues. When asked about e-discovery use cases with 120GB of data, Greg acknowledged that “at that scale cost and latency may actually become an issue, so probably better to consider some sort of indexing or keyword searching.” The product is not a universal document database. It is a precision extraction tool for situations where every value must be sourced and every resolution must be explainable.

For AI builders, the lesson is in the architecture. Most teams reach for the most capable model and a RAG pipeline. Parsewise reaches for the smallest model that can do the search, the most capable model for the judgment calls, and a human interface that optimizes for trust per click. The “human harness” over “model harness” framing is a direct challenge to the prevailing wisdom that better models reduce the need for verification infrastructure.
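The cost logic of that split can be sketched in a few lines. This is a minimal illustration, assuming the small and large models are plain callables supplied by the caller; `tiered_extract`, the stub functions, and the sample pages are all made up here and say nothing about how Parsewise actually routes work.

```python
from typing import Callable, Optional


def tiered_extract(
    pages: list[str],
    small_filter: Callable[[str], bool],
    large_resolve: Callable[[list[str]], str],
) -> tuple[Optional[str], dict[str, int]]:
    """Run the cheap check on every page, the expensive step once on survivors."""
    calls = {"small": 0, "large": 0}
    survivors = []
    for page in pages:
        calls["small"] += 1  # one cheap call per page: the exhaustive part
        if small_filter(page):
            survivors.append(page)
    if not survivors:
        return None, calls  # nothing relevant, so the large model is never called
    calls["large"] += 1  # one expensive call for the judgment step
    return large_resolve(survivors), calls


# Stub "models" and invented pages, for illustration only.
pages = [
    "cover letter",
    "premium: 100",
    "terms and conditions",
    "premium: 100 (restated)",
    "signature page",
]
answer, calls = tiered_extract(
    pages,
    small_filter=lambda page: "premium" in page,
    large_resolve=lambda hits: f"{len(hits)} supporting pages",
)
```

The call counts make the tradeoff visible: cheap calls grow with the number of pages, while expensive calls stay flat. That is also why very large corpora bring cost and latency back into the picture.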

The question Parsewise leaves open is whether the agent-definition layer can become a product in its own right, or whether it remains a bespoke configuration burden that limits the addressable market. Greg’s HN answers suggest that each domain and organization ends up with highly customized definitions, and that the amount of work ranges from a few hours to a few minutes per day. That is honest but not scalable.

Still, Parsewise has identified a real gap. Document AI's core problem is not model quality. It is trust infrastructure. Parsewise builds the infrastructure, and the rest of the industry will have to follow.
