JetBrains has published detailed instructions for deploying Mellum, its proprietary code completion LLM, as a self-hosted service that organizations can run inside their own infrastructure. The move turns what was originally a cloud-only feature of JetBrains AI Assistant into a product that enterprises can deploy in air-gapped environments, on their own GPUs, behind their own firewalls.

That is the real news here. Not that JetBrains has a code model. The company announced Mellum in October 2024 as the engine behind AI Assistant’s cloud completion. What changed is the delivery model. Mellum now ships as a containerized service that installs via Helm charts, requires an access token from JetBrains, and demands serious hardware: for production use, Nvidia L40, H100, or H200 GPUs, plus 2 CPU cores and 16 GB of RAM per node.

The performance numbers in the documentation tell a clear story about where JetBrains is positioning this. On a single Nvidia L40, Mellum handles 5 requests per second with p50 latency under 250 milliseconds and p90 under 1000 milliseconds. That supports up to 750 seats. On an H200, the system scales to 11 requests per second and supports up to 1750 seats. The latency targets are aggressive. Most developers will not notice a median wait of 250 milliseconds for a completion suggestion, although the p90 bound allows the slowest tenth of requests to take considerably longer.
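A rough way to read those figures is as a seats-per-throughput ratio. The sketch below uses only the L40 and H200 numbers quoted above. The helper names are illustrative, and the assumption that capacity scales linearly with the number of GPUs is ours, not something the documentation is quoted as promising.

```python
import math

# Figures quoted above from the JetBrains documentation.
GPU_PROFILES = {
    "L40": {"rps": 5, "max_seats": 750},
    "H200": {"rps": 11, "max_seats": 1750},
}


def seats_per_rps(gpu: str) -> float:
    """Seats supported per request-per-second of sustained throughput."""
    profile = GPU_PROFILES[gpu]
    return profile["max_seats"] / profile["rps"]


def gpus_needed(seats: int, gpu: str) -> int:
    """GPUs required for a seat count.

    Assumption: capacity scales linearly with GPU count.
    """
    if seats <= 0:
        raise ValueError("seats must be positive")
    return math.ceil(seats / GPU_PROFILES[gpu]["max_seats"])


if __name__ == "__main__":
    for name in GPU_PROFILES:
        print(name, round(seats_per_rps(name), 1), gpus_needed(2000, name))
```

Both cards work out to roughly 150 to 160 seats per request per second, which suggests the sizing assumes each developer triggers a completion request only once every couple of minutes on average.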

But the hardware requirements also reveal a tradeoff. Mellum is not a lightweight model you can run on a laptop GPU or a consumer card. The minimum supported GPU is an Nvidia L4, and JetBrains explicitly labels that configuration as “for testing / debug purposes” with a maximum of 1 request per second. Production deployments require datacenter GPUs. That puts Mellum in a different category from the small code models that developers run locally on their own machines.

JetBrains is not competing on the low end. It is competing on control.

The company’s blog post from October 2024 claimed that Mellum reduced completion latency to one third of its previous time, achieved an acceptance rate of about 40 percent, and cut the cancel rate by a factor of three to four. Total completions shown more than doubled. Those are strong metrics for a code completion model. But the blog post also noted that Mellum was trained solely on publicly available, permissively licensed code. That is a meaningful differentiator in an era where every major AI company faces lawsuits over training data.

The enterprise pitch is straightforward. A company that handles sensitive code, works in regulated industries, or operates in air-gapped environments cannot send code to OpenAI, Anthropic, or even a cloud API from JetBrains. Mellum lets them run the model inside their own Kubernetes cluster. The data never leaves. The completions are generated locally. The security team can audit the deployment.

This is not a new idea. Private deployments of code completion models exist from Tabnine, Sourcegraph Cody, and others. But JetBrains has something those vendors lack: an installed base of millions of developers who already use IntelliJ-based IDEs. JetBrains does not need to convince developers to switch editors. It needs to convince their employers to pay for AI Enterprise licenses that include Mellum.

The pricing model matters here. JetBrains AI Pro, which includes cloud-based Mellum completion, costs $10 per month per user. The AI Enterprise tier, which includes self-hosted Mellum, costs more and requires a minimum number of seats. JetBrains does not publish exact Enterprise pricing. But the hardware requirements mean that any organization running Mellum on-premises will also pay for GPU compute, storage, and operational overhead. The total cost of ownership is not trivial.
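The shape of that calculation can be sketched in a few lines. Every input here is a caller-supplied placeholder: Enterprise pricing is not public, and GPU and operations costs vary by organization, so the function illustrates the arithmetic rather than estimating any real bill.

```python
def monthly_cost_per_seat(
    license_per_seat: float,
    gpu_node_monthly: float,
    gpu_nodes: int,
    ops_monthly: float,
    seats: int,
) -> float:
    """License fee plus infrastructure cost spread evenly across seats.

    All arguments are hypothetical inputs chosen by the caller.
    """
    if seats <= 0:
        raise ValueError("seats must be positive")
    infrastructure = gpu_node_monthly * gpu_nodes + ops_monthly
    return license_per_seat + infrastructure / seats


if __name__ == "__main__":
    # Placeholder values for illustration only, not real prices.
    print(monthly_cost_per_seat(20.0, 1500.0, 1, 500.0, 500))
```

The infrastructure term shrinks as the seat count grows, which is why the economics favor larger organizations that can fill a GPU's capacity.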

Still, for a company with 500 developers writing Java, Kotlin, or Python, the calculation might work. JetBrains claims Mellum supports up to 750 seats on a single L40 instance. That is roughly one GPU for a mid-size engineering organization. The latency numbers suggest developers will not notice the difference between on-premises and cloud. The security team gets air-gap compliance. The legal team gets assurance that code is not being sent to third-party APIs.

The deeper implication is about the structure of the AI coding tools market. GitHub Copilot, the market leader, runs entirely in the cloud. So does Amazon Q Developer, formerly CodeWhisperer. So does Google’s Gemini Code Assist, formerly Duet AI. These products are built on the assumption that developers will accept sending code to a remote server. That assumption holds for many teams. It does not hold for defense contractors, financial services firms, healthcare organizations, or any company that treats source code as a trade secret.

JetBrains is betting that a meaningful fraction of the enterprise market will pay a premium for on-premises AI. The bet is not unreasonable. The same dynamic played out in the IDE market itself. JetBrains built its business selling on-premises development tools to enterprises that would not touch cloud IDEs. Mellum extends that strategy to AI.

There is also a competitive angle against Microsoft. JetBrains and Microsoft compete in the IDE space. IntelliJ IDEA and VS Code go head to head. GitHub Copilot is a Microsoft product that works best in VS Code. JetBrains cannot afford to let Copilot become the default AI assistant for all developers, including those who use IntelliJ. Mellum gives JetBrains a proprietary AI capability that is tightly integrated with its own IDEs and cannot run inside VS Code.

The integration depth matters. Mellum is not a generic model that JetBrains plugs into its IDEs via an API. The model is designed to leverage the IDE’s internal representation of code, including project structure, type information, and static analysis results. That is something that a cloud-only model cannot do as efficiently, because it would require sending the entire project context over the network with every request. Mellum, running inside the organization’s own network, receives that context over a short internal hop rather than across the public internet.

JetBrains claims Mellum supports Java, Kotlin, Python, Go, and PHP. Those are the languages most commonly used in JetBrains IDEs. The company has an Early Access Program for additional languages. But the model is not a general-purpose coding assistant. It is a completion model. It suggests the next few tokens. It does not answer questions, refactor code, or generate entire functions from natural language prompts. Those capabilities remain in the cloud-based AI Assistant, which can use OpenAI, Anthropic, or other providers.

That division of labor makes sense. Completion is the highest-frequency AI interaction in an IDE. It needs to be fast. It benefits from local execution. Everything else can tolerate cloud latency. JetBrains is splitting the difference.

The open question is whether Mellum’s performance justifies the infrastructure investment. A 40 percent acceptance rate is solid. But developers are accustomed to free or cheap code completion from Copilot, which costs $10 per month and requires no GPU hardware. Convincing a CTO to buy GPUs and pay for JetBrains AI Enterprise licenses requires a demonstrable improvement in developer productivity. JetBrains has not published controlled studies showing that Mellum outperforms Copilot on equivalent tasks.

What JetBrains has published is a deployment guide that reads like enterprise infrastructure documentation. Helm charts. JWT keys. Node selectors. Image pull secrets. The audience is not individual developers. The audience is the DevOps team at a large company that already runs Kubernetes, already manages GPU clusters, and already pays for JetBrains licenses.
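To give a sense of what those terms mean in practice, here is a sketch of a generic Kubernetes pod spec fragment, built as a Python dictionary. It is not JetBrains' Helm chart or its values schema. The image name, the `gpu-type` label key, and the container name are hypothetical. The `nodeSelector`, `imagePullSecrets`, and `nvidia.com/gpu` fields are standard Kubernetes conventions, and the CPU and memory figures come from the requirements quoted earlier. JWT keys and the JetBrains access token are left out because their format is not described here.

```python
import json


def inference_pod_spec(image: str, gpu_label_value: str, pull_secret: str) -> dict:
    """Build a generic PodSpec fragment for one GPU-backed inference container."""
    return {
        # Pin the pod to nodes carrying a matching label (hypothetical label key).
        "nodeSelector": {"gpu-type": gpu_label_value},
        # Credentials for pulling the image from a private registry.
        "imagePullSecrets": [{"name": pull_secret}],
        "containers": [
            {
                "name": "completion-server",  # hypothetical container name
                "image": image,
                "resources": {
                    # 2 CPU cores and 16 GB of RAM per node, as quoted above.
                    "requests": {"cpu": "2", "memory": "16Gi"},
                    # Resource name exposed by the NVIDIA device plugin.
                    "limits": {"nvidia.com/gpu": 1},
                },
            }
        ],
    }


if __name__ == "__main__":
    spec = inference_pod_spec(
        "registry.example.com/completion-server:latest",  # hypothetical image
        "l40",
        "registry-credentials",
    )
    print(json.dumps(spec, indent=2))
```

None of this is unusual for a team that already schedules GPU workloads on Kubernetes.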

That is the bet. JetBrains is betting that the enterprise market for AI coding tools will fragment along security and compliance lines, and that a self-hosted model with deep IDE integration will capture the high end. The company is not trying to win the consumer or startup market. It is trying to win the Fortune 500.

The hardware requirements will limit adoption to organizations that already have GPU infrastructure. But for those organizations, Mellum offers something that no cloud-only competitor can match: the guarantee that code never leaves the building.