Scaling Law, Architecture for Stability and Layer Stacking

September 11, 2024 sorta-informative

Scaling Law: Scaling laws are one of the most important findings in LLMs (and in neural networks in general) [1]. With scaling laws you can make almost every important decision about model training. For example, you can choose the model size, the number of training steps [2], hyperparameters such as learning rate and batch size [3], learning rate schedules [4], the mixture of training datasets [5], etc. So if you are serious about …

Preliminary Explorations on UL2 and Second-order Optimizers

June 4, 2024 sorta-informative

In the field of large language models, the most important recipes for cooking a model are not open to the public. The model architecture itself is quite well known because many state-of-the-art models are now open-weight, and in many cases it turns out to be a boringly simple vanilla transformer. Datasets and training objectives, however, are not well known, and many LLM builders deliberately obfuscate the details of these two. And, …

Building a Machine Learning Pipeline

August 14, 2019 sorta-informative

Around the time deep learning started to become popular, one advantage cited for it was that the feature-extraction algorithm (feature extractor) is learned from data …

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

October 16, 2018 sorta-informative

Pre-training and transfer learning: It is very well known that pre-training a model, or using a pre-trained model as a module, can have a large effect on performance. …

Topics That Might Be Interesting Going Forward

November 3, 2017 thoughts

Devising ways to improve existing architectures, such as designing an RNN cell to replace LSTM, is certainly important work, but by itself, things that were previously impossible or …

Feature Extraction and Deep Learning

May 8, 2017 sorta-informative

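https://sinews.siam.org/Details-Page/deep-deep-trouble Just as people doing neural network research went through a long winter, it seems that in image processing the people who studied what are now called traditional methods are deeply troubled. …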

Batch Normalization 2

April 21, 2017 sorta-informative

The motivation behind batch normalization is that, in a neural network, the output of one layer is affected by the outputs of the preceding layers, so in deep neural networks this kind of “ …

Deep Learning and Tabular Data

April 21, 2017 thoughts

It is often said that deep learning does not do well (?) on tabular data, the traditional target of statistical modeling. In fact, this has to do with deep learning on images or text …

Reinforcement Learning and Agent-Based Models

April 21, 2017 thoughts

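https://deepmind.com/blog/understanding-agent-cooperation/ This is the DeepMind study that was recently reported as showing that AI has a competitive drive or displayed aggression. In fact, the core of the study is training two agents with reinforcement learning and …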

Interpreting Deep Learning Models

April 21, 2017 thoughts

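Deep learning is often criticized for lacking theoretical grounding, being hard to interpret, and so on. This seems to have been the case not only in statistics but also (in the past) in the machine learning community …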

Kim Seonghyeon

Posts

Clair Obscur: Expedition 33

Scaling Law, Architecture for Stability and Layer Stacking

Preliminary Explorations on UL2 and Second-order Optimizers

Constitutional AI

On Image and Text Generation Models

On Lossy Compression of Language

OCR Retrospective

Tel Aviv and ECCV 2022 Travelogue 7

Tel Aviv and ECCV 2022 Travelogue 6