Artificial Intelligence

NeRFactor: neural factorization of shape and reflectance under an unknown illumination https://dl.acm.org/doi/10.1145/3478513.3480496 ACM Transactions on Graphics: Vol 40, No… We address the problem of recovering the shape and spatially-varying reflectance of an object from multi-view images (and their camera poses) of an object illuminated by one unknown lighting condition. This enables the rend…
Titans: Learning to Memorize at Test Time https://arxiv.org/abs/2501.00663 Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to… I recommend checking the comments for this paper's recommendation rating. Overview: For more than a decade, recurrent models (recurrent mode…
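Titans addresses the trade-off above between a fixed-size recurrent memory and attention. Its answer is a neural long-term memory that keeps learning at test time.

The following is a minimal sketch based on my own simplified reading of that update rule. It is not the authors' code. It rests on these assumptions:

- The memory is a single linear matrix `M`. The paper uses a deeper network with data-dependent rates.
- The names `memory_step`, `retrieve`, and `zeros` are hypothetical.
- The constants `eta`, `theta`, and `alpha` are hypothetical.

Each step does three things:

- It takes the associative loss ||M k − v||².
- It updates a momentum-style "surprise" term S_t = η S_{t−1} − θ ∇ℓ.
- It applies a forgetting decay M_t = (1 − α) M_{t−1} + S_t.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_value, d_key)


def zeros(rows: int, cols: int) -> Matrix:
    """Hypothetical helper: an all-zero matrix."""
    return [[0.0] * cols for _ in range(rows)]


def retrieve(memory: Matrix, query: Vector) -> Vector:
    """Read from a linear memory: y = M q."""
    return [sum(m_ij * q_j for m_ij, q_j in zip(row, query)) for row in memory]


def memory_step(
    memory: Matrix,
    surprise: Matrix,
    key: Vector,
    value: Vector,
    eta: float = 0.9,     # hypothetical decay applied to past surprise
    theta: float = 0.1,   # hypothetical step size on the current gradient
    alpha: float = 0.01,  # hypothetical forgetting rate
) -> Tuple[Matrix, Matrix]:
    """One simplified test-time update of a linear associative memory."""
    error = [p - v for p, v in zip(retrieve(memory, key), value)]  # M k - v
    # Gradient of ||M k - v||^2 with respect to M is 2 (M k - v) k^T.
    grad = [[2.0 * e * k_j for k_j in key] for e in error]
    # Surprise: decayed past surprise minus the scaled current gradient.
    new_surprise = [
        [eta * s - theta * g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(surprise, grad)
    ]
    # Forget a little of the old memory, then add the surprise update.
    new_memory = [
        [(1.0 - alpha) * m + s for m, s in zip(m_row, s_row)]
        for m_row, s_row in zip(memory, new_surprise)
    ]
    return new_memory, new_surprise


if __name__ == "__main__":
    mem, sur = zeros(2, 2), zeros(2, 2)
    k, v = [1.0, 0.0], [0.5, -0.5]  # a unit-norm key keeps these step sizes stable
    for _ in range(200):
        mem, sur = memory_step(mem, sur, k, v)
    print(retrieve(mem, k))  # close to v, slightly shrunk by the forgetting term
```

Because `alpha` keeps decaying old content, the recalled value settles slightly below `v` rather than exactly on it. A forgetting term trades exact recall for room to store new associations.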
MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation https://arxiv.org/abs/2303.09975 There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets make achieving performances equivalent to those in natural images challenging. Convol… https://github.com/MI…
Rho-1: Not All Tokens Are What You Need https://arxiv.org/abs/2404.07965 Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language model, reveal… Abstract: Existing language model pre-training methods have applied the next-token prediction loss equally to every training token. …
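Rho-1 challenges the uniform next-token loss with its Selective Language Modeling (SLM) objective. As I understand the paper, SLM scores each token by its excess loss: the training model's loss minus that of a reference model trained on high-quality data. It then keeps only the top-scoring fraction of tokens in the loss.

Below is a minimal pure-Python sketch over precomputed per-token losses. The function names and the default `keep_ratio` of 0.6 are illustrative assumptions, not the paper's code or settings.

```python
import math
from typing import List, Sequence


def select_tokens(
    train_losses: Sequence[float],
    ref_losses: Sequence[float],
    keep_ratio: float = 0.6,  # illustrative fraction of tokens to keep
) -> List[int]:
    """Indices of the tokens with the largest excess loss (train minus reference)."""
    if len(train_losses) != len(ref_losses):
        raise ValueError("train_losses and ref_losses must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    excess = [t - r for t, r in zip(train_losses, ref_losses)]
    k = max(1, math.ceil(keep_ratio * len(excess))) if excess else 0
    # Rank tokens by excess loss, highest first, and keep the top k.
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    return sorted(ranked[:k])


def selective_lm_loss(
    train_losses: Sequence[float],
    ref_losses: Sequence[float],
    keep_ratio: float = 0.6,
) -> float:
    """Mean training loss over the selected tokens; unselected tokens contribute nothing."""
    selected = select_tokens(train_losses, ref_losses, keep_ratio)
    if not selected:
        return 0.0
    return sum(train_losses[i] for i in selected) / len(selected)


if __name__ == "__main__":
    train = [2.0, 1.0, 3.0, 0.5]  # hypothetical per-token losses of the model being trained
    ref = [1.0, 1.0, 1.0, 1.0]    # hypothetical per-token losses of the reference model
    print(select_tokens(train, ref, keep_ratio=0.5))      # [0, 2]
    print(selective_lm_loss(train, ref, keep_ratio=0.5))  # 2.5
```

In a real training loop the selection would more likely be a mask over the per-token cross-entropy tensor, so that gradients flow only through the kept tokens.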
mochi-1-preview https://huggingface.co/genmo/mochi-1-preview A state of the art video generation model by Genmo. Overview: Mochi 1 preview is an open state-of-the-art video generation mod… https://github.com/genmoai/mochi
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms https://arxiv.org/abs/2410.18967 Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language… Abstract: Platform diversity, …
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation https://arxiv.org/abs/2410.07718 Recent advances in latent diffusion-based generative models for portrait image animation, such as Hallo, have achieved impressive results in short-duration video synthesis. In this paper, we present updates to Hallo, introducing several design enhancements… Abstract: Recently, Hallo-like latent diff…
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering https://arxiv.org/abs/2410.07095 We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML eng… Abstract: We … AI agents … machine learning en…