Knowledge Base
Explore 5 core concepts in AI/ML research.
Transformer
Architecture
A deep learning model architecture relying on self-attention mechanisms.
Definition
The Transformer architecture processes input sequences in parallel using self-attention, allowing it to capture long-range dependencies more effectively than RNNs. It consists of encoder and decoder stacks, each containing multi-head attention and feed-forward layers.
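The self-attention step described above can be sketched minimally as scaled dot-product attention, where each token's query is compared against every token's key to produce a weighted sum of values. This is an illustrative single-head sketch using NumPy; the matrix names and dimensions are assumptions for the example, not from any particular implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token into query, key, and value vectors
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product: every token attends to every token in parallel,
    # which is how long-range dependencies are captured in one step
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

# Hypothetical toy input: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8)
```

Multi-head attention runs several such heads with independent projection matrices and concatenates their outputs, letting each head specialize in different relationships between tokens.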
Related Concepts
Self-Attention
Positional Encoding
Multi-Head Attention
Key Papers
Attention Is All You Need
BERT
Examples: GPT-4, Claude