
Artificial Intelligence

WavMark: Watermarking for Audio Generation
https://arxiv.org/abs/2308.12770
Abstract (excerpt): Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks …
Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis (Chinese model)
https://huggingface.co/Kwai-Kolors/Kolors-IP-Adapter-Plus
Summary (excerpt): The Kwai-Kolors/Kolors-IP-Adapter-Plus repository provides IP-Adapter-Plus weights and inference code built on the Kolors base model, its main improvement being a stronger image feature extractor. We present Kolors, a text-to-image …
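The model card above is about conditioning generation on a reference image through an IP-Adapter. As a rough illustration of that pattern only, here is a minimal sketch using the generic diffusers IP-Adapter API with Stable Diffusion 1.5 weights as a stand-in; the Kolors repository ships its own pipeline and Kolors-specific IP-Adapter-Plus weights, which this sketch does not use, and the checkpoint IDs and reference-image path are assumptions for illustration.

```python
# Hedged sketch: generic IP-Adapter usage via diffusers, with SD 1.5 weights as
# a stand-in for Kolors. Shows the image-prompt-conditioning pattern only.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach an IP-Adapter so a reference image steers generation alongside text.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)          # how strongly the image prompt weighs in

ref = load_image("reference.png")       # hypothetical local reference image
out = pipe(prompt="a photorealistic portrait", ip_adapter_image=ref,
           num_inference_steps=30).images[0]
out.save("ip_adapter_demo.png")
```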
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
https://arxiv.org/abs/2407.04620v1
Summary (excerpt): Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state. We propose a new class of sequence modeling …
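The "expressive hidden state" in the excerpt is, as I understand the paper, a small model that is itself trained on a self-supervised objective as the sequence is processed, so the update rule is a gradient step taken at test time. The toy NumPy sketch below illustrates only that mechanism, with a plain reconstruction loss and a single matrix as the inner model; the paper's actual layers use learned projections and richer parameterizations, so treat this as a conceptual illustration, not the authors' algorithm.

```python
# Toy sketch of the "hidden state = small model trained at test time" idea.
# Simplifications (mine, not the paper's): the inner model is one matrix W,
# the self-supervised task is plain reconstruction, one gradient step per token.
import numpy as np

def ttt_linear_scan(tokens: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """tokens: (seq_len, d) array; returns per-token outputs of the same shape."""
    seq_len, d = tokens.shape
    W = np.zeros((d, d))                 # hidden state: an inner linear model
    outputs = np.empty_like(tokens)
    for t in range(seq_len):
        x = tokens[t]
        grad = np.outer(W @ x - x, x)    # d/dW of 0.5 * ||W x - x||^2
        W = W - lr * grad                # test-time gradient step (the "update rule")
        outputs[t] = W @ x               # emit with the updated hidden state
    return outputs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(ttt_linear_scan(rng.normal(size=(16, 8))).shape)   # (16, 8)
```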
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
https://arxiv.org/abs/2305.10973
Summary (excerpt): Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) …
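For intuition about "point-based manipulation", below is a heavily simplified sketch of a DragGAN-style motion-supervision loss: features in a small patch around a handle point are pulled one step toward a target point, and the resulting loss would be backpropagated into the generator's latent code. The feature map is taken as a given tensor, integer-pixel lookups replace the paper's bilinear sampling, and point tracking and the mask term are omitted, so this is an assumption-laden illustration rather than the authors' implementation.

```python
# Simplified motion-supervision loss in the spirit of DragGAN. `feat` stands in
# for an intermediate generator feature map; integer offsets replace bilinear
# sampling, and point tracking / masking are omitted.
import torch
import torch.nn.functional as F

def motion_supervision_loss(feat: torch.Tensor, handle, target, radius: int = 3):
    """feat: (1, C, H, W); handle/target: integer (y, x) pixel coordinates."""
    (hy, hx), (ty, tx) = handle, target
    d = torch.tensor([ty - hy, tx - hx], dtype=feat.dtype)
    d = d / (d.norm() + 1e-8)                      # unit step toward the target
    dy_step, dx_step = int(round(d[0].item())), int(round(d[1].item()))
    H, W = feat.shape[-2:]
    loss = feat.new_zeros(())
    for dy in range(-radius, radius + 1):          # patch around the handle point
        for dx in range(-radius, radius + 1):
            y, x = hy + dy, hx + dx
            y2, x2 = y + dy_step, x + dx_step
            if 0 <= y < H and 0 <= x < W and 0 <= y2 < H and 0 <= x2 < W:
                # detach the source so only the shifted location is "pulled"
                loss = loss + F.l1_loss(feat[..., y2, x2], feat[..., y, x].detach())
    return loss

if __name__ == "__main__":
    feat = torch.randn(1, 32, 64, 64, requires_grad=True)
    loss = motion_supervision_loss(feat, handle=(30, 30), target=(30, 45))
    loss.backward()                                 # gradients reach `feat`
    print(float(loss), bool(feat.grad.abs().sum() > 0))
```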
You Only Cache Once: Decoder-Decoder Architectures for Language Models
https://arxiv.org/abs/2405.05254
Summary (excerpt): We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) …
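To make "caches key-value pairs once" concrete, here is a toy sketch of the memory pattern the excerpt describes: a self-decoder stack runs ordinary self-attention, one global KV cache is produced from its output, and every cross-decoder layer reuses that same cache through cross-attention, so KV memory no longer grows with depth. Everything here (single-head attention, no causal masking, shapes and projection names) is a hypothetical simplification, not the paper's architecture.

```python
# Toy sketch of the YOCO memory pattern: KV is cached exactly once and shared
# by all cross-decoder layers. Modules are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def attend(q, k, v):
    """Plain scaled dot-product attention; q, k, v: (seq, dim)."""
    scores = q @ k.T / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def yoco_like_forward(x, self_layers, cross_layers, to_kv):
    """x: (seq, dim). self_layers: [(wq, wk, wv), ...]; cross_layers: [wq, ...]."""
    h = x
    for wq, wk, wv in self_layers:          # self-decoder: ordinary self-attention
        h = h + attend(h @ wq, h @ wk, h @ wv)
    k_cache, v_cache = h @ to_kv[0], h @ to_kv[1]   # cache KV exactly once
    for wq in cross_layers:                 # cross-decoder: reuse the shared cache
        h = h + attend(h @ wq, k_cache, v_cache)
    return h

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, seq = 16, 8
    x = torch.randn(seq, dim)
    self_layers = [tuple(torch.randn(dim, dim) * 0.1 for _ in range(3)) for _ in range(2)]
    cross_layers = [torch.randn(dim, dim) * 0.1 for _ in range(4)]
    to_kv = (torch.randn(dim, dim) * 0.1, torch.randn(dim, dim) * 0.1)
    print(yoco_like_forward(x, self_layers, cross_layers, to_kv).shape)  # (8, 16)
```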
BitNet: Scaling 1-bit Transformers for Large Language Models
https://arxiv.org/abs/2310.11453
Abstract (excerpt): The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed …
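Since the excerpt introduces a "1-bit Transformer architecture", a minimal sketch of the kind of layer involved may help: a linear layer whose weights are binarized to ±1 (scaled by their mean absolute value) in the forward pass, with a straight-through estimator so the latent full-precision weights still receive gradients. This drops BitNet's activation quantization and normalization details, so it is an illustrative toy, not the paper's BitLinear.

```python
# Hedged toy sketch of a BitLinear-style layer: 1-bit weights in the forward
# pass, straight-through estimator in the backward pass. Simplified; the paper
# also quantizes activations and inserts normalization.
import torch
import torch.nn as nn

class ToyBitLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean()                      # per-tensor scaling factor
        w_bin = torch.sign(w) * scale               # 1-bit weights (+/- scale)
        # straight-through estimator: forward uses w_bin, backward sees w
        w_ste = w + (w_bin - w).detach()
        return nn.functional.linear(x, w_ste)

if __name__ == "__main__":
    layer = ToyBitLinear(8, 4)
    y = layer(torch.randn(2, 8))
    y.sum().backward()                              # gradients reach self.weight
    print(y.shape, layer.weight.grad.shape)
```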
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
https://arxiv.org/abs/2407.10969
Overview (excerpt): We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs, which can bring significant efficiency gains in inference. This is achieved by applying …
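The excerpt cuts off at "this is achieved by applying …"; as far as I recall, the abstract continues with top-K sparsification of the activations, trained with a straight-through estimator. The sketch below shows just that mechanism on a random tensor; the function name and shapes are mine.

```python
# Hedged toy sketch of top-K activation sparsification with a straight-through
# estimator, the kind of mechanism the Q-Sparse abstract points at.
import torch

def topk_sparsify_ste(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries per row and zero the rest.
    The straight-through trick keeps the backward pass dense."""
    idx = x.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
    x_sparse = x * mask
    return x + (x_sparse - x).detach()   # forward: sparse, backward: identity

if __name__ == "__main__":
    x = torch.randn(2, 16, requires_grad=True)
    y = topk_sparsify_ste(x, k=4)
    print((y != 0).sum(dim=-1))          # 4 nonzeros per row
    y.sum().backward()
    print(bool(x.grad.abs().min() > 0))  # dense gradient thanks to the STE
```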
JEST: Data curation via joint example selection further accelerates multimodal learning
https://arxiv.org/abs/2406.17711
Abstract (excerpt): Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the dependencies …
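Since the excerpt's point is that batches should be selected jointly rather than example by example, here is a deliberately small sketch in that spirit: a sub-batch is grown greedily, and each candidate is scored by how much it raises the batch-level "learnability" (contrastive loss under the learner minus under a frozen reference model), so every choice depends on the examples already picked. The actual JEST algorithm uses blocked Gibbs sampling over chunks with CLIP-style models; the toy embeddings, greedy loop, and function names below are all my simplifications.

```python
# Hedged toy sketch of joint, learnability-based batch selection in the spirit
# of JEST. Everything here is a simplified stand-in for the paper's procedure.
import numpy as np

def contrastive_loss(img: np.ndarray, txt: np.ndarray, idx: list[int]) -> float:
    """Mean softmax cross-entropy over the image-text similarity matrix of a sub-batch."""
    logits = img[idx] @ txt[idx].T
    logits -= logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(logp)))

def jest_like_select(learner, reference, k: int) -> list[int]:
    """learner/reference: (img_emb, txt_emb) pairs of shape (n, d). Greedy joint pick."""
    n = learner[0].shape[0]
    chosen, remaining = [], list(range(n))
    for _ in range(k):
        # score each candidate jointly with what is already chosen
        scores = [contrastive_loss(*learner, chosen + [j]) -
                  contrastive_loss(*reference, chosen + [j]) for j in remaining]
        chosen.append(remaining.pop(int(np.argmax(scores))))
    return chosen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 64, 16
    learner = (rng.normal(size=(n, d)), rng.normal(size=(n, d)))
    reference = (rng.normal(size=(n, d)), rng.normal(size=(n, d)))
    print(jest_like_select(learner, reference, k=8))
```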