Knowledge Base
Explore 5 core concepts in AI/ML research.
Transformer
Architecture
A deep learning model architecture relying on self-attention mechanisms.
Definition
The Transformer architecture processes input sequences in parallel using self-attention, allowing it to capture long-range dependencies more effectively than RNNs. It consists of encoder and decoder stacks, each containing multi-head attention and feed-forward layers.
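The self-attention step described above can be sketched minimally as scaled dot-product attention, where each token's query is compared against every token's key to produce a weighted sum of values. This is an illustrative single-head sketch using NumPy; the matrix names and dimensions are assumptions for the example, not from any particular implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token into query, key, and value vectors
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product: every token attends to every token in parallel,
    # which is how long-range dependencies are captured in one step
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

# Hypothetical toy input: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8)
```

Multi-head attention runs several such heads with independent projection matrices and concatenates their outputs, letting each head specialize in different relationships between tokens.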
Related Concepts
Self-Attention
Positional Encoding
Multi-Head Attention
Key Papers
Attention Is All You Need
BERT
Examples: GPT-4, Claude