LG Aimers: AI Expert Course, Session 4
Module 6. 『Deep Learning』
ㅇ Instructor: Professor Jaegul Choo, KAIST
ㅇ Learning Objectives
You will learn the basic concepts of deep learning, a branch of neural networks, and the training principles of its representative models.
You will learn about deep learning models and their training principles for image and language-model learning.
Part 6. Self-Supervised Learning and Large-Scale Pre-Trained Models
-What is Self-Supervised Learning?
• Given unlabeled data, hide part of the data and train the model to predict the hidden part from the remaining data.
-Transfer Learning from Self-Supervised Pre-trained Model
• A model pre-trained with a particular self-supervised learning task can be fine-tuned to improve accuracy on a given target task.
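A minimal fine-tuning sketch of this idea, assuming the Hugging Face transformers library and PyTorch are available; the model name, toy texts, and labels below are illustrative choices, not part of the lecture:

```python
# Fine-tuning sketch: load a self-supervised pre-trained encoder and train it on
# a small labeled target task (assumption: transformers + PyTorch are installed;
# the toy data here is illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]      # toy target-task examples
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss on top of the pre-trained encoder
outputs.loss.backward()                  # gradients flow into the pre-trained weights
optimizer.step()                         # all parameters are updated for the target task
```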
-BERT
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
• Learn through masked language modeling (MLM) and next-sentence prediction (NSP) tasks
• Use large-scale data and large-scale model
-Pre-Training Tasks of BERT
Masked Language Model (MLM)
• Mask some percentage of the input tokens at random, and then predict those masked tokens (see the masking sketch after this list).
Next Sentence Prediction (NSP)
• Predict whether Sentence B is the actual sentence that follows Sentence A or a randomly sampled sentence.
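A minimal sketch of the MLM masking step, assuming PyTorch; the mask-token id, vocabulary size, and random input ids are stand-ins, and the real BERT recipe also leaves some selected tokens unchanged or replaces them with random ones:

```python
# Sketch of BERT-style masked-token selection (simplified: ~15% of positions are
# replaced by [MASK]; ids and inputs below are stand-ins, not real tokenizer output).
import torch

MASK_ID, VOCAB = 103, 30000                     # assumed ids for a 30k WordPiece vocabulary
input_ids = torch.randint(0, VOCAB, (1, 16))    # stand-in for a tokenized sentence

mask = torch.rand(input_ids.shape) < 0.15       # choose ~15% of positions at random
labels = input_ids.clone()
labels[~mask] = -100                            # loss is computed only on masked positions
corrupted = input_ids.clone()
corrupted[mask] = MASK_ID                       # the model sees [MASK] and must recover the original

# A masked LM is then trained with cross-entropy between its predictions at the
# masked positions and `labels` (ignore_index=-100).
```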
-Further Details of BERT
1. Model Architecture
• BERT BASE: L = 12, H = 768, A = 12
• BERT LARGE: L = 24, H = 1024, A = 16
(L: number of Transformer layers, H: hidden size, A: number of self-attention heads)
2. Input Representation
• WordPiece embeddings with a 30,000-token vocabulary
• Learned positional embedding
• [CLS] – classification embedding prepended to every input
• Packed sentence-pair input, with [SEP] separating the two sentences
• Segment embeddings distinguishing sentence A from sentence B (illustrated in the sketch after this list)
3. Pre-training Tasks
• Masked LM
• Next Sentence Prediction
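A small sketch of the packed input format above, assuming the Hugging Face transformers library; the example sentence pair is arbitrary and the printed tokens are only the roughly expected output:

```python
# Sketch of BERT's packed-sentence input: [CLS] / [SEP] special tokens plus
# segment (token type) ids (assumption: transformers is installed).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("The cat sat.", "It was tired.")

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# roughly: ['[CLS]', 'the', 'cat', 'sat', '.', '[SEP]', 'it', 'was', 'tired', '.', '[SEP]']
print(enc["token_type_ids"])
# segment embedding ids: 0 for sentence A tokens, 1 for sentence B tokens
```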
-GPT-1/2/3: Generative Pre-Trained Transformer
Generative Pre-Training Task
• In other words, this task is called Language Modeling.
• From another perspective, this task is called auto-regressive modeling, in the sense that the output predicted at the current time step is used as the input at the next time step.
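A minimal decoding sketch of this auto-regressive loop, assuming PyTorch; `lm` is a hypothetical stand-in for any model that maps a token prefix to next-token logits:

```python
# Auto-regressive generation sketch: the token predicted at step t is appended to
# the input for step t+1. `lm` is a stand-in callable, not a specific lecture model.
import torch

def generate(lm, prefix_ids, steps=10):
    ids = list(prefix_ids)
    for _ in range(steps):
        logits = lm(torch.tensor([ids]))        # (1, len(ids), vocab) next-token scores
        next_id = int(logits[0, -1].argmax())   # greedy choice of the next token
        ids.append(next_id)                     # the prediction becomes the next input
    return ids
```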
-GPT-2: Language Models are Unsupervised Multi-task Learners
Just a really big transformer-based language model
• Trained on 40 GB of text
• A great deal of effort went into securing a high-quality dataset: webpages were taken from Reddit links with at least 3 karma (upvotes).
• Language models can perform downstream tasks in a zero-shot setting, without any parameter or architecture modification.
-GPT-3: Language Models are Few-Shot Learners
• Scaling up language models greatly improves task-agnostic, few-shot performance
• An autoregressive language model with 175 billion parameters, evaluated in the few-shot setting
• 96 attention layers, a batch size of 3.2M tokens, 175B parameters
-Few-Shot Learning Example of GPT-3
• Prompt: the prefix given to the model
• Zero-shot: Predict the answer given only a natural language description of the task
• One-shot: See a single example of the task in addition to the task description
• Few-shot: See a few examples of the task
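An illustrative set of prompt formats in the style of the GPT-3 paper's translation example; the exact strings below are for illustration only:

```python
# Zero-/one-/few-shot prompts: the model receives only this text as a prefix,
# with no parameter updates (demonstration pairs are illustrative).
task = "Translate English to French:"

zero_shot = f"{task}\ncheese =>"                              # task description only
one_shot = f"{task}\nsea otter => loutre de mer\ncheese =>"   # plus one demonstration
few_shot = (f"{task}\n"
            "sea otter => loutre de mer\n"
            "peppermint => menthe poivrée\n"
            "plush giraffe => girafe en peluche\n"
            "cheese =>")                                      # plus a few demonstrations
```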
-Summary
• Models are getting bigger and bigger.
• Owing to self-supervised learning techniques, the language generation capability is getting better and better.
• We are getting closer to artificial general intelligence.
-ELECTRA
Efficiently Learning an Encoder that Classifies Token Replacements Accurately
• Learn to distinguish real input tokens from plausible but synthetically generated replacements
• Pre-training text encoders as discriminators rather than generators
• The discriminator is the main network used in pre-training.
• Replaced token detection pre-training vs. masked language model pre-training
• Outperforms MLM-based methods such as BERT and XLNet given the same model size, data, and compute
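A minimal sketch of the replaced-token-detection objective, assuming PyTorch; the token ids and discriminator logits below are stand-in tensors rather than outputs of a real generator/discriminator pair:

```python
# Replaced-token-detection sketch: a small generator proposes plausible replacements
# at masked positions, and the discriminator labels every token as original (0) or
# replaced (1). All tensors here are stand-ins for illustration.
import torch
import torch.nn.functional as F

original = torch.tensor([[12, 845, 67, 90, 233]])
corrupted = original.clone()
corrupted[0, 2] = 5012                          # generator swapped in a plausible token

labels = (corrupted != original).float()        # per-token binary targets
logits = torch.randn(corrupted.shape, requires_grad=True)  # stand-in discriminator outputs
loss = F.binary_cross_entropy_with_logits(logits, labels)
# The loss is defined over ALL input tokens, not just the ~15% masked positions,
# which is one source of ELECTRA's sample efficiency over MLM pre-training.
```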
-ALBERT
Is having better NLP models as easy as having larger models?
• Obstacles
Memory Limitation
Training Speed
• Solutions
Factorized Embedding Parameterization (sketched below)
Cross-layer Parameter Sharing
Sentence Order Prediction (for performance)
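A back-of-the-envelope sketch of why factorized embedding parameterization saves memory, using the vocabulary and hidden sizes quoted above; E = 128 is ALBERT's typical embedding size:

```python
# Factorized embedding parameterization: replace one large V x H embedding matrix
# with a small V x E lookup followed by an E x H projection.
V, H, E = 30000, 768, 128        # vocab size, hidden size, reduced embedding size

bert_style = V * H               # single V x H embedding matrix
albert_style = V * E + E * H     # V x E lookup + E x H projection

print(bert_style)                # 23,040,000 embedding parameters
print(albert_style)              #  3,938,304 embedding parameters (~6x fewer)
```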