https://www.youtube.com/watch?v=huH0fKmw0H0
Uses RMSNorm, SwiGLU, and grouped query attention
Learning rate: disclosed
Optimizer: disclosed
Weight decay: disclosed
RMSNorm: accurate but slow
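For reference, here is a minimal pure-Python sketch of the three building blocks named in these notes. It is not the OpenELM or CoreNet implementation. The function names (`rms_norm`, `swiglu_ffn`, `kv_head_for`), the `eps` default, and the weight layout (a list of rows) are all assumptions made for illustration only.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a per-element gain.
    Unlike LayerNorm, no mean is subtracted and there is no bias term."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps is an assumed default
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (a.k.a. Swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def kv_head_for(q_head, n_q_heads, n_kv_heads):
    """Grouped query attention: consecutive query heads share one key/value head.
    Returns the index of the key/value head used by query head q_head."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size
```

With 8 query heads and 2 key/value heads, for example, `kv_head_for` maps query heads 0 to 3 onto key/value head 0 and query heads 4 to 7 onto key/value head 1, which is where the reduction in key/value cache size comes from.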
https://arxiv.org/abs/2404.14619
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release Open
https://machinelearning.apple.com/research/openelm
https://github.com/apple/corenet
GitHub - apple/corenet: CoreNet: A library for training deep neural networks
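It's good that they released all of this, though it feels a bit like Apple promotion.
Still, disclosing this much for a large-scale model is a good thing.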