2024. 1. 14. 09:03 · Coding Tools/LG Aimers
LG Aimers: AI Expert Course, Cohort 4
Module 4. 『Supervised Learning (Classification/Regression)』
ㅇ Instructor: Professor 강제원, Ewha Womans University
ㅇ Learning objectives
You will understand the basic concepts of supervised learning, a branch of machine learning, and the goals and differences of regression and classification, and learn, through various models and methods (linear and nonlinear regression, classification, ensemble methods, kernel methods, etc.), when and why each model should be used and how to improve model performance.
-Ensemble Learning
A simple extension of algorithms we have already used or developed.
(A way to improve performance on supervised learning tasks.)
-Running into the Confusion Matrix and ROC Curve again after a long time (I had studied them in an algorithms course), I felt I need to review them, and that I should have studied them harder for my university exams.
-Ensemble Methods
• Predict the class label for unseen data by aggregating a set of predictions from different classifiers (experts) learned from the training data
• Make a decision by voting
-Build Ensemble Classifiers
Basic idea: build different experts, and let them vote (a minimal voting sketch follows after this list).
• Bagging and boosting
Advantages:
• Improve predictive performance
• Other types of classifiers can be directly included
• Easy to implement
• Not much parameter tuning required
Disadvantage:
• Not a compact representation
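To make the "different experts + voting" idea concrete, here is a minimal sketch of my own (not code from the lecture), assuming scikit-learn classifiers and a toy dataset; the chosen models and hyperparameters are purely illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Three different "experts" combined by hard (majority) voting
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(max_depth=5)),
                ("svm", SVC())],
    voting="hard",
).fit(X_tr, y_tr)

print("ensemble test accuracy:", ensemble.score(X_te, y_te))

Note that heterogeneous classifiers can be plugged in directly, which is exactly the "other types of classifiers can be directly included" advantage above.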
-Bagging
• Bootstrapping + aggregating (for more robust performance; lower variance)
• Train several models in parallel
A classifier 𝐶𝑖 is learned for each 𝑆𝑖 in sample set 𝑆
• Bagging works because it reduces variance by voting/averaging (robust to overfitting)
A learning algorithm is unstable if small changes to the training set cause large changes in the learned classifier; bagging helps most with such unstable learners.
Usually, the more classifiers the better
-Bootstrapping
Generate multiple datasets 𝑆𝑖 from a dataset 𝑆
• Each 𝑆𝑖 has 𝑛 samples chosen at random with replacement (𝑛 may be smaller than the size of the original set)
• Repeat 𝑀 times
→ generate 𝑀 datasets, each of size 𝑛.
→ Train 𝑀 models
-Aggregating
• Committee prediction: combine the 𝑀 models' outputs by voting/averaging (a minimal bagging sketch follows below)
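A minimal bagging sketch, assuming scikit-learn decision trees as the base learner and a synthetic dataset (my own illustration, not the lecture's code): bootstrap 𝑀 datasets of size 𝑛 with replacement, train one model per dataset, and aggregate by majority vote.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

M = 25          # number of bootstrap datasets / models
n = len(X)      # each bootstrap sample also has n points, drawn with replacement
models = []
for _ in range(M):
    idx = rng.integers(0, n, size=n)                    # bootstrapping: sample S_i from S
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregating: committee prediction by majority vote over the M models
all_preds = np.stack([m.predict(X) for m in models])    # shape (M, n)
committee = (all_preds.mean(axis=0) >= 0.5).astype(int) # majority vote for binary labels
print("training accuracy of the bagged committee:", (committee == y).mean())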
-Boosting
Cascading of weak classifiers
• Train multiple models in sequence
• Assign a larger weight to the points misclassified by the previous base classifier when training the next classifier in the sequence (this helps to lower bias)
• Adaboost
Advantages:
• Simple and easy to implement
• Flexible: can be combined with any learning algorithm
• No prior knowledge needed about the weak learner
• Versatile: can be applied to a wide variety of problems
• Non-parametric
-Adaboost
AdaBoost, short for Adaptive Boosting, by Y. Freund and R. Schapire (1996)
• 𝑀 sequential base classifiers: ℎ1, … , ℎ𝑚, … , ℎ𝑀
• Trained on weighted form of the training set
• Weight depends on the performance of the previous classifier
• Combined to give the final classifier (a hand-rolled sketch follows below)
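A hand-rolled AdaBoost sketch under the usual textbook assumptions (binary labels in {-1, +1}, decision stumps as the weak learners, the standard weighted-error and alpha updates); it is illustrative only, not the lecture's implementation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=400, n_features=10, random_state=1)
y = 2 * y01 - 1                      # relabel {0, 1} -> {-1, +1}

M = 20
n = len(X)
w = np.full(n, 1.0 / n)              # start with uniform sample weights
stumps, alphas = [], []

for m in range(M):
    # Train on the weighted form of the training set
    h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = h.predict(X)
    err = w[pred != y].sum() / w.sum()              # weighted error of h_m
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12)) # classifier weight from its performance
    w = w * np.exp(-alpha * y * pred)               # up-weight misclassified points
    w /= w.sum()
    stumps.append(h)
    alphas.append(alpha)

# Final classifier: sign of the alpha-weighted committee vote
F = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
print("training accuracy:", (np.sign(F) == y).mean())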
- Bagging and Boosting
Improving decision trees
• By bagging -> random forest (inherently a bagging method)
• By boosting -> gradient boosting machine (GBM), a generalization of AdaBoost
• Very popular machine learning algorithms
• One of the leading methods for winning Kaggle competitions (a comparison sketch follows below)
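A quick comparison sketch, assuming the scikit-learn implementations stand in for the random forest and GBM mentioned above; the dataset and hyperparameters are arbitrary illustrative choices.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging of randomized trees vs. boosting of shallow trees
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)

print("random forest test accuracy:", rf.score(X_te, y_te))
print("GBM test accuracy:", gbm.score(X_te, y_te))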
-Supervised learning (SL)
Limitations and future research topics
• SL is the baseline for many recent AI tasks, owing to large-scale labeled datasets
• Nevertheless, it relies on the size of the dataset; what if we do not have sufficient data samples?
• Data augmentation (computer-synthesized data, data generated by unsupervised learning, etc.)
• Learning from insufficient labels (weak supervision, etc.)
• Furthermore, what if the data properties are different between datasets?
• Domain adaptation, transfer learning, etc.
Quiz
What answers are correct? Select all that apply.
A. The AdaBoost algorithm considers the failures of previous classifiers
Correct.
The AdaBoost algorithm considers the failures of previous classifiers when choosing (or weighting) samples in the dataset.
B. In many computer vision and language processing methods that apply deep learning, supervised learning does not really play an important role
False.
In fact, supervised learning is the baseline for such recent state-of-the-art studies.