Historical Context: Seq2Seq Paper and "NMT by Jointly Learning to Align & Translate" Paper:
* The Seq2Seq model, introduced in the "Sequence to Sequence Learning with Neural Networks" paper by Sutskever et al., revolutionized NLP with end-to-end learning. For example, translating "Bonjour" to "Hello" without handcrafted features.
* "Neural Machine Translation by Jointly Learning to Align and Translate" (Bahdanau et al.) improved Seq2Seq with an attention mechanism. For instance, aligning "Bonjour" and "Hello" more accurately based on context.

Introduction to Transformers (Paper: "Attention Is All You Need"):
* Transformers were introduced in the "Attention Is All You Need" paper.
* They replaced RNN-based models with attention mechanisms, making them highly parallelizable and efficient.

Why Transformers:
* Transformers capture long-range dependencies effectively.
* They use self-attention to process tokens in parallel and capture global context efficiently (see the sketch after this list).
...
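To make the self-attention point in the last bullet concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. It is not code from the paper; the function name `self_attention`, the weight matrices, and the tiny example inputs are illustrative assumptions.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative, not from the paper).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Self-attention over token embeddings X of shape (seq_len, d_model)."""
    Q = X @ Wq          # queries: what each token is looking for
    K = X @ Wk          # keys: what each token offers
    V = X @ Wv          # values: the content each token carries
    d_k = K.shape[-1]
    # Every token attends to every other token in one matrix multiply,
    # which is why the computation parallelizes across the whole sequence.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                                # weighted mix of values

# Tiny usage example: 3 tokens, 4-dimensional embeddings (random for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (3, 4)
```

Because the attention weights are computed for all token pairs at once, there is no sequential recurrence as in an RNN, which is what makes the global context both parallelizable and directly accessible at every position.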