JunHan's AI Factory


All Posts

Nano Banana
https://flux-ai.io/model/nano-banana-ai/?utm_source=chatgpt.com
Flux AI: Free Online Flux Kontext, Flux.1 AI Image Generator. Free online advanced Flux AI for various styles of image & video creation, powered by Flux Kontext AI and Flux 1.1 AI models. Try Flux Context & Flux AI for free at Flux AI official: flux-ai.io!
https://nano-banana.org/?utm_source=chatgpt.com
Nano Banana - Google’..

TRANSFORMER EXPLAINER
https://poloclub.github.io/transformer-explainer/
Transformer Explainer: LLM Transformer Model Visually Explained. An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
What is a Transformer? The Transformer is a neural network architecture that fundamentally changed the approach to artificial intelligence (AI). It was first introduced in the groundbreaking 2017 paper "Attention is All You Need", and since then, OpenAI's G..

YOLOE: Real-Time Seeing Anything
https://arxiv.org/abs/2503.07465
Object detection and segmentation are widely employed in computer vision applications, yet conventional models like YOLO series, while efficient and accurate, are limited by predefined categories, hindering adaptability in open scenarios. Recent open-set m..
Abstract: Object detection and segmentation are widely used in computer vision applications..

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
https://arxiv.org/abs/2507.13344
This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. Previous methods solve the issue of insufficient observation by leveraging 4D diffusion models to generate videos at novel viewpoints. However, ta..

FastVLM: Efficient Vision Encoding for Vision Language Models
https://arxiv.org/abs/2412.13303
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions due to..
Abstract: Scaling the input image resolution is [...] Vision Language ..

Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
https://arxiv.org/abs/2408.06266
Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We st..

Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation
https://arxiv.org/abs/2504.06225
While decoder-only large language models (LLMs) have shown impressive results, encoder-decoder models are still widely adopted in real-world applications for their inference efficiency and richer encoder representation. In this paper, we study a novel prob..
Abstract: Decoder-only (Decoder-on..

T5Gemma: A new collection of encoder-decoder Gemma models
https://developers.googleblog.com/en/t5gemma/
Google Developers Blog: In the rapidly evolving landscape of large language models (LLMs), the spotlight has largely focused on the decoder-only architecture. While these models have shown impressive capabilities across a wide range of generation tasks, the classic encoder-decode..

