Artificial Intelligence

CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Calibration
https://charactergen.github.io/
In this paper, we present CharacterGen, a framework developed to efficiently generate 3D characters. CharacterGen introduces a streamlined generation pipeline along with an image-conditioned multi-view diffusion model. This model effectively calibrates input …

CLIP: Learning Transferable Visual Models From Natural Language Supervision
https://arxiv.org/abs/2103.00020
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept …

MusicLM: Generating Music From Text
https://arxiv.org/abs/2301.11325
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling …

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
https://arxiv.org/abs/2311.10709
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design …

Segment Anything
https://arxiv.org/abs/2304.02643
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed …

ConvNets Match Vision Transformers at Scale
https://arxiv.org/abs/2310.16764
Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to datasets on the web-scale. We challenge this belief by evaluating a performant ConvNet architecture …

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized …
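The QLoRA excerpt above names the core mechanism: gradients flow through a frozen base model stored in 4-bit NF4 precision into small trainable Low-Rank Adapters. Below is a minimal sketch of that setup using the Hugging Face transformers, peft, and bitsandbytes libraries rather than the paper's own code; the model name and LoRA hyperparameters are illustrative placeholders.

```python
# QLoRA-style setup: load a causal LM in 4-bit NF4 and attach Low-Rank
# Adapters so that only the adapter weights receive gradient updates.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat from the paper
    bnb_4bit_use_double_quant=True,        # double quantization of the constants
    bnb_4bit_compute_dtype=torch.bfloat16, # matmuls run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                 # placeholder; the paper scales to 65B
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # adapters on the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the LoRA weights are trainable
```

The resulting model can be passed to an ordinary fine-tuning loop or trainer; the frozen 4-bit weights keep memory low while the bf16 adapters preserve fine-tuning quality.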
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
https://arxiv.org/abs/2304.01373
How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order …
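Because the Pythia suite publishes intermediate checkpoints for every model in the family, these training-dynamics questions can be probed directly. A small sketch, assuming the EleutherAI checkpoints on the Hugging Face Hub, where each saved training step is exposed as a git revision:

```python
# Load one Pythia model at several points during training; checkpoints are
# published as Hub revisions named "step<N>" (e.g. step1000 ... step143000).
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m"       # smallest model in the suite
tokenizer = AutoTokenizer.from_pretrained(model_name)

for step in ["step1000", "step64000", "step143000"]:  # early, mid, final
    model = GPTNeoXForCausalLM.from_pretrained(model_name, revision=step)
    # ... probe this checkpoint here, e.g. measure memorization or bias
```

Since all 16 models saw the public training data in the same order, the same loop re-run with a larger model name isolates the effect of scale from the effect of data ordering.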