Software / T-2026-2699

A 13M-parameter LLM trained on a single GPU: what one developer's project reveals about the state of AI research

A 13M-parameter transformer trained on a single T4 GPU in one day produces text with surface grammar but no semantic coherence.

The project is a pedagogical tool demonstrating the training pipeline, not a functional model for deployment.

Training useful models needs compute that grows faster than model size, and a single RTX 4090 caps out at about 4B parameters.

FareedKhan's train-llm-from-scratch repo shows that the barrier to entry for training language models has collapsed, but the ceiling on what a single GPU can do remains low.

Tessera Newsroom · 6 min read · May 31, 2026

Source FareedKhan-dev/train-llm-from-scratch (github.com)

A developer named FareedKhan has published a GitHub repository that walks through training a 13 million-parameter language model from scratch on a single GPU, using the Pile dataset and a transformer architecture based on the “Attention Is All You Need” paper. The project is not a research breakthrough. It is a tutorial. But it surfaces something worth examining: the gap between what is now trivially possible for an individual with a consumer GPU and what the frontier labs spend billions of dollars to achieve.

The repository is straightforward. Clone it, install dependencies, download a few compressed JSONL files from the Pile (the uncopyrighted subset hosted on HuggingFace), preprocess them into HDF5 format, and run train_transformer.py. The default configuration produces a model with a vocabulary of 50,304 tokens, a context length of 128 tokens, an embedding dimension of 128, 8 attention heads, and a single transformer block. That is a 13 million-parameter model. FareedKhan reports that training completes within a day on a T4 GPU — the kind available for free on Google Colab or Kaggle.
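
A back-of-envelope count shows where those 13 million parameters sit. The sketch below assumes a conventional GPT-style layout (learned position embeddings, linear layers with biases, an MLP four times wider than the embedding, two LayerNorms per block plus a final one, and an output projection not tied to the token embedding). That layout and the function name are our assumptions, not code read from the repository.

```python
def transformer_param_count(vocab: int, context: int, d_model: int,
                            n_layers: int, mlp_mult: int = 4) -> dict:
    """Parameter count for an assumed GPT-style decoder (hypothetical helper)."""
    def linear(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias vector

    def layernorm(d: int) -> int:
        return 2 * d  # learned scale + shift

    token_embedding = vocab * d_model
    position_embedding = context * d_model
    # Q, K and V projections plus the attention output projection.
    attention = 3 * linear(d_model, d_model) + linear(d_model, d_model)
    # Two-layer MLP: expand to mlp_mult * d_model, then project back.
    mlp = linear(d_model, mlp_mult * d_model) + linear(mlp_mult * d_model, d_model)
    block = attention + mlp + 2 * layernorm(d_model)
    # Untied output projection from d_model back to vocabulary logits.
    output_head = linear(d_model, vocab)

    counts = {
        "token_embedding": token_embedding,
        "position_embedding": position_embedding,
        "blocks": n_layers * block,
        "final_layernorm": layernorm(d_model),
        "output_head": output_head,
    }
    counts["total"] = sum(counts.values())
    return counts

# The repository's default configuration as described in the article.
counts = transformer_param_count(vocab=50_304, context=128, d_model=128, n_layers=1)
for name, n in counts.items():
    print(f"{name:>18}: {n:>12,}")
```

Under these assumptions the total is 13,143,040, and about 98 percent of it sits in the token embedding and the output projection; the single transformer block contributes fewer than 200,000. The 8 attention heads do not affect the total, because they split the 128-dimensional projections rather than adding new ones.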

The output of the trained model is revealing. FareedKhan includes a sample in the repository’s README: “In ***1978, The park was returned to the factory-plate that the public share to the lower of the electronic fence that follow from the Station’s cities.” The text has recognizable structure — capital letters at sentence starts, punctuation, a date — but it is semantically incoherent. It is not a functional language model. It is a proof of concept that a single person with a single GPU and a weekend can replicate the mechanical steps of training a transformer.

That is the real news here. Not the output quality. The fact that the pipeline exists at all.

The repository includes a detailed table mapping GPU models to their maximum practical LLM training size. An RTX 3090 with 24 GB of VRAM can train up to about 4 billion parameters. An A100 with 40 GB can handle 6 to 8 billion. A Quadro RTX 8000 with 48 GB tops out at 8 to 10 billion. These are the repository's estimates of the practical ceiling for training a dense transformer on a single GPU, set by the memory needed to hold optimizer states, gradients, and activations alongside the model weights.
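
The ceiling follows from a simple division: usable VRAM over bytes per parameter. The per-parameter costs below are illustrative assumptions, not the repository's method: 16 bytes is the usual figure for mixed-precision Adam (half-precision weights and gradients, fp32 master weights and two fp32 moment buffers), while about 6 bytes assumes bf16 weights and gradients with 8-bit optimizer moments. The helper name is hypothetical, and advertised GB are treated as GiB.

```python
GIB = 1024 ** 3  # bytes in one GiB

# Illustrative bytes of GPU memory per parameter for weights, gradients and
# optimizer state. These setups are assumptions, not the repository's method.
SETUPS = {
    # half-precision weights (2) + grads (2) + fp32 master weights (4) + fp32 Adam moments (4 + 4)
    "mixed-precision Adam": 16,
    # bf16 weights (2) + bf16 grads (2) + 8-bit Adam moments (1 + 1), ignoring small overheads
    "bf16 + 8-bit Adam": 6,
}

def max_params(vram_gib: float, bytes_per_param: float,
               activation_fraction: float = 0.0) -> float:
    """Largest parameter count whose weights, gradients and optimizer state fit
    in VRAM after reserving activation_fraction of the memory for activations."""
    usable_bytes = vram_gib * GIB * (1.0 - activation_fraction)
    return usable_bytes / bytes_per_param

for vram in (24, 40, 48):  # RTX 3090/4090, A100 40 GB, Quadro RTX 8000
    estimates = ", ".join(
        f"{name}: {max_params(vram, b) / 1e9:.1f}B" for name, b in SETUPS.items()
    )
    print(f"{vram} GiB -> {estimates}")
```

With about 6 bytes per parameter and nothing reserved for activations, the estimates come out near the table's figures (about 4.3, 7.2 and 8.6 billion); at the 16 bytes typical of mixed-precision Adam, a 24 GB card holds only about 1.6 billion. Which assumptions the repository itself makes is not checked here.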

FareedKhan explicitly frames the project as a follow-up to a prior attempt. He previously trained a 2.3 million-parameter model on the Tiny Shakespeare dataset. The output was essentially gibberish (“Sey solmenter ! tis tonguerered if”), and he wondered what would happen if he made the architecture smaller and the training data more diverse. The parameter count nonetheless rose to 13 million, because the 50,304-token vocabulary alone accounts for most of those parameters through the embedding and output layers. Trained on the Pile, the new model produces output that at least respects the surface grammar of English. That is the improvement.

This is the state of individual-scale LLM training in 2026. A 13 million-parameter model trained on a slice of the Pile produces text that looks like language but communicates nothing. A 2 billion-parameter model, which requires a high-end consumer GPU, might produce something closer to coherent paragraphs. But the gap between that and a 70 billion-parameter model capable of sustained reasoning is not linear. Training compute grows faster than the parameter count, because a larger model also needs more training tokens, and data and engineering effort rise with it.
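
How fast the compute grows can be sketched with two common rules of thumb from the scaling-law literature, both assumptions here rather than figures from the repository: training costs roughly 6 × N × D FLOPs for N parameters and D tokens, and a roughly compute-optimal run uses about 20 tokens per parameter (the Chinchilla heuristic). The helper names are hypothetical.

```python
# Rules of thumb (assumptions, not figures from the repository):
#   C ~= 6 * N * D  FLOPs for N parameters trained on D tokens
#   D ~= 20 * N     tokens for a roughly compute-optimal run
TOKENS_PER_PARAM = 20

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate forward-plus-backward FLOPs for training on n_tokens."""
    return 6.0 * n_params * n_tokens

def compute_optimal_flops(n_params: float) -> float:
    """Training FLOPs when the token budget grows in step with model size."""
    return train_flops(n_params, TOKENS_PER_PARAM * n_params)

for n in (13e6, 2e9, 70e9):
    print(f"{n / 1e9:8.3f}B params -> {compute_optimal_flops(n):.1e} FLOPs")

param_ratio = 70e9 / 2e9
compute_ratio = compute_optimal_flops(70e9) / compute_optimal_flops(2e9)
print(f"{param_ratio:.0f}x parameters -> {compute_ratio:.0f}x compute")  # 35x parameters -> 1225x compute
```

Because D scales with N in this sketch, compute scales with N squared: a 35-fold step in parameters costs about 1,225 times the compute. That is steep, but polynomial rather than exponential.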

What the repository does not address is worth noting. The training pipeline uses a single transformer block. Modern frontier models use dozens, and the largest more than a hundred. The context length is 128 tokens. Frontier models operate at 128,000 or more. The tokenizer is r50k_base, the same one used in early GPT models. The training data is a small fraction of the 825 GB Pile. FareedKhan recommends using 5 to 10 percent of it, which is roughly 40 to 80 GB. That is enough to demonstrate the training loop. It is not enough to produce a model that can answer a question.
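
The default vocabulary of 50,304 is 47 entries larger than r50k_base's 50,257 tokens. A likely explanation, not confirmed in the repository, is the convention popularized by Karpathy's nanoGPT of padding the GPT-2 vocabulary up to the next multiple of 64 so the embedding and output matrices have GPU-friendly shapes. A minimal sketch of that rounding, with a hypothetical helper name:

```python
R50K_BASE_VOCAB = 50_257  # token count of the r50k_base (GPT-2) BPE encoding

def pad_vocab(n_tokens: int, multiple: int = 64) -> int:
    """Round a vocabulary size up to the next multiple of `multiple`.

    The extra embedding rows are never produced by the tokenizer; they only
    make the matrix dimensions friendlier to GPU matrix-multiply kernels.
    """
    return -(-n_tokens // multiple) * multiple  # integer ceiling division

print(pad_vocab(R50K_BASE_VOCAB))  # 50304
```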

The project is a teaching tool. It shows how to implement multi-head attention, the MLP sublayer, the transformer block, and the final linear projection. It shows how to batch data, how to handle the loss function, and how to save and load checkpoints. For someone learning how transformers work under the hood, this is valuable. The code is clean and modular, with separate files for each component under src/models/.
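
The core of what the repository teaches is causal multi-head self-attention. The NumPy sketch below is not the repository's code (that lives under src/models/); the function and weight names are hypothetical, biases are omitted for brevity, and the sizes simply mirror the default config of 128-dimensional embeddings, 8 heads and a 128-token context.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_multi_head_attention(x, w_qkv, w_out, n_heads):
    """x: (T, d_model); w_qkv: (d_model, 3 * d_model); w_out: (d_model, d_model)."""
    T, d_model = x.shape
    d_head = d_model // n_heads
    # One projection, then split into query, key and value matrices.
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)

    # Reshape (T, d_model) -> (n_heads, T, d_head) so each head attends independently.
    def split_heads(m):
        return m.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    # Scaled dot-product scores, shape (n_heads, T, T).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Causal mask: position t may only attend to positions <= t.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, then merge the heads back to (T, d_model).
    out = (weights @ v).transpose(1, 0, 2).reshape(T, d_model)
    return out @ w_out

rng = np.random.default_rng(0)
T, d_model, n_heads = 128, 128, 8
x = rng.standard_normal((T, d_model))
w_qkv = rng.standard_normal((d_model, 3 * d_model)) * 0.02
w_out = rng.standard_normal((d_model, d_model)) * 0.02
y = causal_multi_head_attention(x, w_qkv, w_out, n_heads)
print(y.shape)  # (128, 128)
```

A quick check of the causal mask: perturbing only the last token's embedding should change only the last row of the output.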

But the commentary around the project — and projects like it — tends to overshoot. The headline “train your own LLM from scratch” implies a degree of capability that the output does not support. What you can train on a single GPU is a toy. It is a useful toy. It is a pedagogical toy. It is not a model you would deploy.

The hardware table in the repository is the most honest part of the project. It lists 25 GPU models and their limits. The takeaway is clear: if you want to train a model that approaches the quality of a 2024-era frontier model, you need multiple A100s or H100s. A single RTX 4090 caps out at around 4 billion parameters. That is roughly the size of the smallest models that produce coherent long-form text.

The project also surfaces a tension in the open-source AI ecosystem. The tools for training models are now widely available. The knowledge required to use them is increasingly documented. But the compute required to train a model that is actually useful for anything beyond demonstration has not gotten cheaper. A T4 GPU costs about $0.35 per hour on a cloud provider. Training a 13 million-parameter model for a day costs less than $10. Training a 7 billion-parameter model from scratch costs thousands. Training a 70 billion-parameter model costs millions.
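
The T4 figure is simple rental arithmetic. Extending it to a 7 billion-parameter run needs more assumptions. The sketch below (hypothetical helper names) applies the common 6 × N × D FLOPs estimate with about 20 training tokens per parameter, and plugs in placeholder values of 125 TFLOP/s sustained per GPU (roughly 40 percent of an A100's 312 TFLOP/s dense bf16 peak) and $2 per GPU-hour. Real quotes and measured throughput should replace these before the output is relied on.

```python
def rental_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of renting GPUs for a given number of GPU-hours."""
    return gpu_hours * usd_per_gpu_hour

def gpu_hours_for(train_flops: float, sustained_flops_per_s: float) -> float:
    """GPU-hours needed to execute train_flops at a sustained (not peak) rate."""
    return train_flops / sustained_flops_per_s / 3600.0

# The article's T4 figure: one day at about $0.35 per hour.
t4_day = rental_cost_usd(24, 0.35)
print(f"13M model, one T4-day: ${t4_day:.2f}")  # $8.40

# Hypothetical 7B run; every input below is a placeholder assumption.
n_params = 7e9
flops = 6 * n_params * (20 * n_params)      # 6 * N * D with D = 20 * N
hours = gpu_hours_for(flops, 125e12)        # assumed 125 TFLOP/s sustained
cost = rental_cost_usd(hours, 2.0)          # assumed $2 per GPU-hour
print(f"7B model: ~{hours:,.0f} GPU-hours, ~${cost:,.0f}")
```

Under those placeholders the 7B run comes to roughly 13,000 GPU-hours and about $26,000. A 70B run under the same rules is 100 times larger, since compute here grows with the square of the parameter count, which puts it in the millions.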

That gap is structural. It is not going to close with better software alone. The repository is a reminder that the democratization of AI training has reached the point where anyone can run the pipeline, but the output quality is fundamentally constrained by the hardware budget. The frontier is not a matter of access to code. It is a matter of access to compute.

FareedKhan notes in the README that he is looking for a PhD position in AI. The repository is a reasonable portfolio piece. It demonstrates that he can implement a transformer, manage a training pipeline, and communicate the results. It is not a research contribution. It is a signal of competence in a field where the barrier to entry for basic experimentation has fallen close to zero.

The question the project raises for the industry is whether the next wave of AI research will come from individuals training 13 million-parameter models on Colab GPUs, or whether the compute wall will push all meaningful capability research into institutions with data center budgets. The repository does not answer that question. But it does show that the floor for participation is lower than it has ever been. The ceiling is another matter entirely.
