LG Aimers 4기 Training Neural Networks

LG Aimers 4기 Training Neural Networks

2024. 1. 19. 08:01ㆍ코딩 도구/LG Aimers

LG Aimers: AI전문가과정 4차

Module 6. 『딥러닝(Deep Learning)』

ㅇ 교수 : KAIST 주재걸 교수
ㅇ 학습목표
Neural Networks의 한 종류인 딥러닝(Deep Learning)에 대한 기본 개념과 대표적인 모형들의 학습원리를 배우게 됩니다.
이미지와 언어모델 학습을 위한 딥러닝 모델과 학습원리를 배우게 됩니다.

Part 2. Training Neural Networks

-Training Neural Networks via Gradient Descent

Given the optimization problem, min 𝑊
𝐿 (𝑊) , where 𝑊 is the neural network parameters, we optimize 𝑊 using gradient descent approach:
𝑊 ≔ 𝑊 − 𝛼 * 𝑑𝐿(𝑊)/d𝑊

-Poor Convergence Case of Naïve Gradient Descent

Suppose loss function is steep vertically but shallow horizontally:

Q: What is the trajectory along which we converge towards the minimum with SGD?

Very slow progress along flat direction, jitter along steep one

-Backpropagation to Compute Gradient in Neural Networks

First, given an input data item, compute the loss function value via Forward Propagation

Afterwards, compute the gradient with respect to each neural network parameter via Backpropagation

Finally, update the parameters using gradient descent algorithm

-Computational Graph of Logistic Regression

Computational Graph of Logistic Regression

-Forward Propagation of Logistic Regression

-Backpropagation of Logistic Regression

-Sigmoid Activation

• Maps real numbers in (−∞, ∞) into a range
of [0, 1]
• gives a probabilistic interpretation

-Problems of Sigmoid Activation

• Saturated neurons kills the gradients
• The gradient value 𝜎 𝑥 ≤1/4 which decreases the gradient during backpropagation, i.e., causing a gradient
vanishing problem

-Tanh Activation

Tanh
• tanh 𝑥 = 2 × sigmoid 𝑥 − 1
• Squashes numbers to range [-1, 1]

Strength
• Zero-centered (average is 0) Weakness
• Still kills gradients when saturated, i.e., still
causing a gradient vanishing problem

-ReLU Activation

ReLU (Rectified Linear Unit)
• 𝑓(𝑥) = max(0, 𝑥)

Strength
• Does not saturate (in + region)
• Very computationally efficient
• Converge much faster than sigmoid/tanh
Weakness
• Not zero-centered output
• Gradient is completely zero for 𝑥 < 0

Batch Normalization
-Motivation of Batch Normalization
• Saturated gradients when random initialization is done
• The parameters are not updated → Hard to optimize (in red region)

-Definition of Batch Normalization
“You want unit Gaussian activations? just make them so.”
• We consider a batch of activations at some layer to make each dimension unit Gaussian

저작자표시 비영리 변경금지 (새창열림)

'코딩 도구 > LG Aimers' 카테고리의 다른 글

LG Aimers 4기 Seq2Seq , Natural Language Understanding and Generation (2)	2024.01.21
LG Aimers 4기 Convolutional Neural Networks and Image Classification (5)	2024.01.20
LG Aimers 4기 Deep Neural Networks (2)	2024.01.18
LG Aimers 4기 Phase 1 온라인 교육 후기 (0)	2024.01.17
LG Aimers 4기 인과추론의 다양한 연구 방향 (0)	2024.01.17

MK 실험실

MK 실험실

태그

최근글

댓글

공지사항

아카이브

LG Aimers: AI전문가과정 4차

Module 6. 『딥러닝(Deep Learning)』

'코딩 도구 > LG Aimers' 카테고리의 다른 글

관련글

티스토리툴바