🎓 Knowledge Distillation
Transfer knowledge from large models to compact ones while preserving performance
Introduction to Knowledge Distillation
🎯 What is Knowledge Distillation?
Knowledge distillation transfers the knowledge of a large, complex teacher model to a smaller, efficient student model. The student learns not just from hard labels, but from the teacher's soft probability distributions.
💡
Key Concept
Soft targets contain more information than hard labels: they reveal class similarities and uncertainties.
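This idea is usually formalized as a combined training loss; the form below follows Hinton et al.'s original distillation formulation, with the temperature T and mixing weight α as tunable hyperparameters:

```latex
% Student logits z_s, teacher logits z_t, ground-truth label y,
% temperature T > 1, mixing weight \alpha \in [0, 1], softmax \sigma
\mathcal{L}_{\text{KD}} =
  \alpha \, T^{2} \,
  \mathrm{KL}\!\bigl(\sigma(z_t / T) \,\big\|\, \sigma(z_s / T)\bigr)
  + (1 - \alpha) \, \mathrm{CE}\bigl(y, \sigma(z_s)\bigr)
```

The T² factor keeps the gradient magnitude of the soft term comparable to that of the hard cross-entropy term as T grows.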
👥 Teacher-Student Framework
👨🏫
Teacher Model
- Large, complex architecture
- High accuracy (95%+)
- Slow inference
- Pre-trained and frozen
🎓
Student Model
- Small, efficient architecture
- Near-teacher accuracy (93-94%)
- Fast inference (10x+)
- Trained with distillation
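The "trained with distillation" step minimizes a weighted sum of two terms: a soft term that matches the frozen teacher's temperature-softened outputs, and a hard term on the true labels. A minimal stdlib-only sketch; the temperature `T=4.0` and weight `alpha=0.7` are illustrative assumptions, not values from this module:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = [x / T for x in logits]
    m = max(z)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, hard_label,
                      T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(label, student).

    T and alpha are hypothetical defaults chosen for illustration.
    """
    p_t = softmax(teacher_logits, T)   # teacher's soft targets (frozen)
    p_s = softmax(student_logits, T)   # student's soft predictions
    # KL divergence, scaled by T^2 to balance gradients against the hard term
    soft = T * T * sum(pt * (math.log(pt) - math.log(ps))
                       for pt, ps in zip(p_t, p_s))
    # Standard cross-entropy against the hard label (computed at T = 1)
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard
```

In practice only the student's parameters receive gradients from this loss; the teacher is run in inference mode to produce the soft targets.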
📊 Hard vs Soft Targets
Hard Labels (Traditional)
- Cat: 1.0
- Dog: 0.0

❌ No information about class relationships

Soft Targets (Distillation)
- Cat: 0.85
- Dog: 0.10
- Tiger: 0.03

✓ Reveals class similarities (cats vs dogs vs tigers)
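The temperature controls how much of this similarity structure the soft targets expose. A short sketch; the teacher logits for `[cat, dog, tiger]` below are hypothetical values chosen to mimic the table above:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 flattens the distribution."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for [cat, dog, tiger] on a cat image
logits = [8.0, 3.0, 2.0]

hard_like = softmax(logits, T=1.0)  # ~[0.99, 0.007, 0.002]: nearly one-hot
soft = softmax(logits, T=4.0)       # ~[0.66, 0.19, 0.15]: similarities visible
```

Raising T never changes which class ranks first; it only redistributes probability mass toward the classes the teacher considers similar, which is the extra signal the student trains on.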
✨ Benefits
- 📦 Model Compression: 10x smaller models with minimal accuracy loss
- ⚡ Faster Inference: 10-100x speedups for edge deployment
- 🎯 Better Generalization: students often generalize better than the same architecture trained directly on hard labels
- 💰 Cost Reduction: lower infrastructure and serving costs