Artificial Intelligence

Parameter-Efficient Transfer Learning for NLP
https://arxiv.org/abs/1902.00751
Abstract: Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer…
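The adapter idea from this paper can be sketched in a few lines: a small bottleneck module (down-project, nonlinearity, up-project) is added inside each layer with a residual connection, and only these few parameters are trained per task. A minimal NumPy sketch follows; the dimensions, names, and the zero up-projection init (which makes the adapter start as a near-identity, as the paper recommends) are illustrative, not the paper's exact configuration.

```python
import numpy as np

def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, plus a residual
    connection so the module can start out as (almost) the identity."""
    z = np.maximum(0.0, h @ W_down)   # down-projection + nonlinearity
    return h + z @ W_up               # up-projection + residual

d, r = 16, 4                          # hidden size, bottleneck size (r << d)
rng = np.random.default_rng(0)
h = rng.normal(size=(2, d))           # a batch of hidden states
W_down = rng.normal(size=(d, r)) * 0.01
W_up = np.zeros((r, d))               # zero init: adapter output == input at start
out = adapter(h, W_down, W_up)        # identical to h before any training
```

With only `d*r + r*d` trainable parameters per adapter, the frozen backbone is shared across all downstream tasks, which is the paper's parameter-efficiency argument.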
LLM Critics Help Catch LLM Bugs
https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
A technical report; honestly, not particularly recommended.
Robust Speech Recognition via Large-Scale Weak Supervision
https://arxiv.org/abs/2212.04356
Summary: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard…
MotionBooth: Motion-Aware Customized Text-to-Video Generation
https://arxiv.org/abs/2406.17758
In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video m…
https://huggingface.co/papers/2406.1775…
What are Diffusion Models?
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references).] [Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen.] [Updated on 2022-08…
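The core of the diffusion models covered in this post is the closed-form forward process: given a noise schedule, a noisy sample at any timestep t can be drawn directly as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. A minimal NumPy sketch with an illustrative linear schedule (the T, β range, and variable names here are the common DDPM defaults, chosen for illustration):

```python
import numpy as np

def q_sample(x0, t, alphas_cumprod, noise):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t = prod_{s<=t} (1 - beta_s)

x0 = np.ones(4)
# At the final step nearly all signal is gone: a_bar_T is tiny,
# so x_T is almost pure noise (here noise=0, so x_T is almost zero).
xT = q_sample(x0, T - 1, alphas_cumprod, np.zeros(4))
```

The reverse (generative) process then learns to undo this noising step by step, which is the part the blog post spends most of its derivations on.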
Revisiting Feature Prediction for Learning Visual Representations from Video
https://arxiv.org/abs/2404.08471
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, t…
Adding Conditional Control to Text-to-Image Diffusion Models
https://arxiv.org/abs/2302.05543
Abstract: We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pr…
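A key mechanism in ControlNet is the "zero convolution": the control branch is attached to the locked backbone through convolutions whose weights and biases are initialized to zero, so at the start of training the combined model behaves exactly like the original diffusion model. A minimal NumPy sketch of that connection (a 1x1 convolution written as a matrix product; shapes and names are illustrative, not the paper's):

```python
import numpy as np

def zero_conv(x, W, b):
    """1x1 'zero convolution': weights and bias start at zero, so the
    control branch contributes nothing until training updates them."""
    return x @ W + b

d = 8
W = np.zeros((d, d))                       # zero-initialized weights
b = np.zeros(d)                            # zero-initialized bias
rng = np.random.default_rng(1)
h = rng.normal(size=(3, d))                # locked backbone features
control = rng.normal(size=(3, d))          # features from the control branch
out = h + zero_conv(control, W, b)         # == h before any training step
```

Because the injected signal is exactly zero at initialization, training cannot catastrophically perturb the pretrained backbone, and the control signal is blended in gradually as gradients move `W` and `b` away from zero.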
DiT: Self-supervised Pre-training for Document Image Transformer
https://arxiv.org/abs/2203.02378
Summary: Image Transformer has recently achieved significant progress for natural image understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-supervised…