Question 1

How does LoRA work?

Accepted Answer

LoRA freezes the original pre-trained weights and adds trainable low-rank matrices alongside selected layers, typically the query and value projection matrices in transformer attention. For a frozen weight matrix W of shape d × k, the weight update ΔW is parameterized as the product BA of two smaller matrices, B (d × r) and A (r × k), where the rank r is much smaller than d and k. Only A and B are updated during fine-tuning, so the number of trainable parameters per adapted matrix drops from d·k to r·(d + k).
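A minimal sketch of that forward pass in plain Python may make the mechanics concrete. It is an illustration, not a reference implementation. The dimensions, the helper names (`matmul`, `add`, `scale`, `lora_forward`), the `alpha / r` scaling factor and the zero initialization of B are assumptions not stated in the answer above; the last two follow the convention of the original LoRA paper.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    """Multiply every entry of X by the scalar s."""
    return [[s * x for x in row] for row in X]

random.seed(0)
d, k, r = 6, 4, 2  # output dim, input dim, rank (illustrative values)
alpha = 4          # scaling hyperparameter (assumed, not from the answer above)

# Frozen pre-trained weight, d x k. It is never updated during fine-tuning.
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
# Trainable low-rank factors: A is r x k, B is d x r.
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]  # zero init, so delta W = B A is zero at the start

def lora_forward(x_col):
    """Compute W x + (alpha / r) * B (A x) for a k x 1 column vector x."""
    base = matmul(W, x_col)
    update = scale(matmul(B, matmul(A, x_col)), alpha / r)
    return add(base, update)

x = [[1.0], [2.0], [3.0], [4.0]]
y0 = lora_forward(x)
```

Because B starts at zero in this sketch, the adapted layer initially reproduces the frozen layer's output exactly. Training would then adjust only A and B.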

Question 2

What is the typical rank value and how is it chosen?

Accepted Answer

Common rank values range from 1 to 64, with 8 or 16 being typical starting points. The rank controls the expressiveness of the adaptation: higher ranks allow more capacity but increase parameters and risk overfitting. In practice, many tasks achieve strong performance with surprisingly low ranks (e.g., r=8), and the optimal rank can be determined through validation experiments.
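To give a rough sense of how the rank affects adapter size, the sketch below counts trainable parameters for a single adapted weight matrix. The 4096 × 4096 shape is a hypothetical example, and `lora_param_count` is an illustrative helper, not part of any library.

```python
def lora_param_count(d, k, r):
    """Trainable parameters in one LoRA adapter: B is d x r, A is r x k."""
    return r * (d + k)

d = k = 4096   # hypothetical projection matrix shape
full = d * k   # parameters updated by full fine-tuning of this one matrix

for r in (1, 4, 8, 16, 64):
    n = lora_param_count(d, k, r)
    print(f"r={r:>2}: {n:>7,} trainable params ({100 * n / full:.2f}% of the full matrix)")
```

The count grows linearly with r, so doubling the rank doubles the adapter size. This is one reason to start at a low rank and raise it only if validation results call for it.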

Question 3

How does LoRA compare to full fine-tuning?

Accepted Answer

LoRA often matches or closely approaches the performance of full fine-tuning on downstream tasks while using orders of magnitude fewer trainable parameters. Because the original weights stay frozen, the base model can always be recovered by removing the adapter, which also helps limit catastrophic forgetting of its capabilities, although the adapted model can still drift on tasks outside the fine-tuning data. However, for tasks that require a very large shift from the pre-training distribution, full fine-tuning may still outperform LoRA, though the gap is often small.

LoRA (Low-Rank Adaptation)
