Artificial Intelligence

How much do language models memorize?
https://arxiv.org/abs/2505.24832?_bhlid=6015c87a2128aa76e108898443aa2727b63d9786
We propose a new method for estimating how much a model "knows" about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from generalization. W…
Abstract: …

$XX^{t}$ Can Be Faster
https://arxiv.org/abs/2505.09814?ref=mail.bycloud.ai&_bhlid=77b4be6f732dbeaa5cbd4eb23d6dc4ce93c750ee
We present RXTX, a new algorithm for computing the product of matrix by its transpose $XX^{t}$ for $X\in \mathbb{R}^{n\times m}$. RXTX uses $5\%$ fewer multiplications and $5\%$ fewer operations (additions and multiplications) than State-of-the-Art algorit…
1 Introduction: Faster …

MatFormer: Nested Transformer for Elastic Inference
https://arxiv.org/abs/2310.07707
Foundation models are applied in a broad spectrum of settings with different inference constraints, from massive multi-accelerator clusters to resource-constrained standalone mobile devices. However, the substantial costs associated with training these mod…
This post reviews MatFormer, which Google says it used in its small models. Abstract: Foundation …

Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network
https://arxiv.org/abs/1710.02998
This paper proposes a neural network architecture and training scheme to learn the start and end time of sound events (strong labels) in an audio recording given just the list of sound events existing in the audio without time information (weak labels). We…

Perceiver: General Perception with Iterative Attention
https://arxiv.org/abs/2103.03206
Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for individual…
The first proposal of the MLA concept. Abstract: Biological systems [perceive through] vision, audition, touch, proprioception (p…

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators
A new AI agent, solving mathematics and practical computing problems …

Absolute Zero: Reinforced Self-play Reasoning with Zero Data
https://www.arxiv.org/abs/2505.03335
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervisio…
Using verifiable rewards …

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer
https://arxiv.org/abs/2303.03689
In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for audio tagging (AT) task, termed AST-SED. Pretrained AST models have recently shown…
Abstract: In this paper, …