Artificial Intelligence

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
https://arxiv.org/abs/2411.13476
Abstract: Extending context window sizes allows large language models (LLMs) to process longer sequences and handle more complex tasks. Rotary Positional Embedding (RoPE) has become the de facto standard due to its relative positional encoding properties…

ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model
Job hunting is hard... though I suppose it is much the same for everyone.
https://humanaigc.github.io/chat-anyone/
Abstract: Real-time interactive video-chat portrait generation has drawn attention as a next-generation technology trend, driven by the recent rapid progress of text- and voice-based chat. Existing methods, however, focus mainly on generating head movements in real time and struggle to produce body motion that naturally matches the head movement. In addition, talking…

MusicInfuser: Making Video Diffusion Listen and Dance
https://susunghong.github.io/MusicInfuser/
MusicInfuser infuses listening capability into the text-to-video model (Mochi) and produces dancing videos while preserving prompt adherence. This paper presents an approach for generating high-quality dance videos synchronized to a specified music track…

Transformers without Normalization
https://arxiv.org/abs/2503.10622?_bhlid=1a87c33b8185a942533ee1886e23e7f6c2d5f90d
Abstract: Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique…

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Job hunting has not been easy; writing nothing but applications, I let far too much time pass since the last post...
https://nvlabs.github.io/Sana/Sprint/
Abstract: This paper targets ultra-fast text-to-image (T2I, Text-to-Image) generation… To stabilize continuous-time consistency distillation, two key challenges are addressed: training instabilities and excessively large gradient norms that occur when scaling up the model size…

Large Language Diffusion Models
https://ml-gsai.github.io/LLaDA-demo/
https://arxiv.org/abs/2502.09992
Abstract: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised…

Layer Normalization
https://arxiv.org/abs/1607.06450
Abstract: Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input…
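The excerpt above stops right at the batch-normalization comparison. As a rough illustration of the idea the paper builds on, here is a minimal NumPy sketch of layer normalization: each sample is normalized over its own feature dimension and then rescaled with a learned gain and bias. The shapes and the `eps` value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Layer normalization sketch: normalize each sample over its feature
    dimension, then rescale with a learned gain and bias.
    x: (batch, features); gain, bias: (features,)."""
    mu = x.mean(axis=-1, keepdims=True)      # per-sample mean
    sigma = x.std(axis=-1, keepdims=True)    # per-sample standard deviation
    return gain * (x - mu) / (sigma + eps) + bias

# Toy usage: the statistics are computed per sample, so the result does not
# depend on the batch size or on the other samples in the batch.
x = np.random.randn(4, 8)
gain, bias = np.ones(8), np.zeros(8)
y = layer_norm(x, gain, bias)
print(y.mean(axis=-1))  # ~0 for every sample
print(y.std(axis=-1))   # ~1 for every sample
```

Unlike batch normalization, nothing here depends on the mini-batch, which is the property the paper exploits for recurrent networks and small or variable batch sizes.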
DeepSeek-V3 Technical Report
https://arxiv.org/abs/2412.19437
Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA)…
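The "671B total / 37B activated" figure in the excerpt comes from sparse expert routing: each token is sent to only a few experts, so only those experts' parameters participate in that token's forward pass. The sketch below is a generic top-k MoE layer in NumPy, not DeepSeek-V3's actual router (the report describes shared experts and an auxiliary-loss-free load-balancing scheme on top of this); the sizes, the softmax gating, and `k=2` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, w_gate, experts, k=2):
    """Generic top-k mixture-of-experts layer (illustrative only).
    x: (tokens, d); w_gate: (d, n_experts); experts: list of (d, d) weights.
    Each token is routed to its k highest-scoring experts, so only a
    fraction of the total expert parameters is used per token."""
    scores = softmax(x @ w_gate)                 # (tokens, n_experts) routing weights
    topk = np.argsort(-scores, axis=-1)[:, :k]   # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in topk[t]:
            out[t] += scores[t, e] * (x[t] @ experts[e])
    return out

d, n_experts, tokens = 16, 8, 4
x = rng.normal(size=(tokens, d))
w_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_layer(x, w_gate, experts, k=2)
# With k=2 of 8 experts, roughly a quarter of the expert parameters are
# touched per token -- the same idea behind 37B active out of 671B total.
print(y.shape)  # (4, 16)
```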