language models

Clair Obscur: Expedition 33

May 5, 2025

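I played a good game for the first time in a while. It is hardly a new thought, but games strike me as an extremely powerful medium for storytelling. Because the story is interleaved with gameplay, the narrative density may not be high, but 2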

Scaling Law, Architecture for Stability and Layer Stacking

Sep 9, 2024

Scaling Law: Scaling law is one of the most important findings in LLMs (and in neural networks in general). You can make almost all of the important decisions about model training with scaling laws. For example, you can choose the model size, the number of training steps, hyperparameters such as learning rate and batch size, the learning rate schedule, the mixture of training datasets, etc. So if you are serious about

Preliminary Explorations on UL2 and Second-order Optimizers

Jun 6, 2024

In the field of large language models, the most important recipes for cooking a model are not open to the public. The model architecture itself is quite well known because many state-of-the-art models are now open-weight, and in many cases it turns out to be a boringly simple vanilla transformer. The datasets and training objectives, however, are not well known, and many LLM builders deliberately obfuscate the details of these two. And,

Constitutional AI

Jul 7, 2023

Helpful & Harmless Agent: What do helpfulness and harmlessness, the terms that so often come up when people talk about the alignment of AI models, actually mean? This is defined

On Image and Text Generation Models

Feb 2, 2023

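Even back when image generation meant StyleGAN, generating illustrations and the like was a topic with an otaku following. The Danbooru dataset in question, too, was a dataset that had already been built by that point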

On the Lossy Compression of Language

Feb 2, 2023

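https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web Describing an LM as merely predicting the next word, or as merely memorizing its training data, is language meant to disparage; it is not an appropriate way to discuss what LMs actually are or what their real limitations are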

OCR Retrospective

Jan 1, 2023

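Title cover image source: https://www.behance.net/gallery/6146939/OCR-A-Poster/modules/152114859 As I wrap up OCR, the topic I was immersed in for four years, I am leaving a retrospective, as I usually do. Rather than small musings along the lines of "what if I had done it this way,"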

Tel Aviv and ECCV 2022 Travelogue 7

Nov 11, 2022

Tel Aviv and ECCV 2022 Travelogue 1, Tel Aviv and ECCV 2022 Travelogue 2, Tel Aviv and ECCV 2022 Travelogue 3, Tel Aviv and ECCV 2022 Travelogue 4, Tel Aviv and ECCV 2022 Travelogue 5, Tel Aviv and ECCV 2022 Travelogue 6, Tel Aviv

Tel Aviv and ECCV 2022 Travelogue 6

Nov 11, 2022

Tel Aviv and ECCV 2022 Travelogue 1, Tel Aviv and ECCV 2022 Travelogue 2, Tel Aviv and ECCV 2022 Travelogue 3, Tel Aviv and ECCV 2022 Travelogue 4, Tel Aviv and ECCV 2022 Travelogue 5, Tel Aviv and ECCV 2022 Travelogue 6, Tel Aviv

Kim Seonghyeon
