
All Posts

Japanese image translation to Korean. https://github.com/cs20131516/image_translator https://huggingface.co/ogkalu/comic-text-segmenter-yolov8m (ogkalu/comic-text-segmenter-yolov8m · Hugging Face): a YOLOv8-medium model trained on about 3k Manga, Webtoon, Manhua, and (very few) Western-comic-style images for text detection and segmentation. Training imgsz = 1024; training images were resized, not cropped. It can handle extreme image r..
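Not from the post itself, just a minimal sketch of how one might run the segmenter with the ultralytics package; the weight filename inside the Hugging Face repo is an assumption.

```python
# Hypothetical usage sketch: download the YOLOv8 comic text segmenter from
# Hugging Face and run it on a comic page. The weight filename is assumed.
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Assumed filename inside the ogkalu/comic-text-segmenter-yolov8m repo.
weights = hf_hub_download(
    repo_id="ogkalu/comic-text-segmenter-yolov8m",
    filename="comic-text-segmenter.pt",
)

model = YOLO(weights)

# The model was trained at imgsz=1024, so inference uses the same size.
results = model.predict("comic_page.jpg", imgsz=1024, conf=0.25)

for r in results:
    print(r.boxes.xyxy)   # detected text-region boxes
    print(r.masks)        # segmentation masks, if the model provides them
```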
Revisiting Feature Prediction for Learning Visual Representations from Video. https://arxiv.org/abs/2404.08471 (arxiv.org): this paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders .. Post excerpt: this paper is about unsupervised learning from video ..
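As a rough illustration of the feature-prediction objective the abstract describes, here is a toy PyTorch sketch; the tiny linear encoders, the random masking, and the L1 loss are stand-ins, not V-JEPA's actual configuration.

```python
# Toy sketch of a feature-prediction objective (V-JEPA style); all sizes are placeholders.
import torch
import torch.nn as nn

dim = 256
context_encoder = nn.Linear(dim, dim)   # stand-in for a video ViT over visible patches
target_encoder  = nn.Linear(dim, dim)   # stand-in for a target encoder, used without gradients
predictor       = nn.Linear(dim, dim)   # predicts features of masked patches

tokens = torch.randn(8, 196, dim)       # toy batch of patchified video clips
mask = torch.rand(8, 196) < 0.5         # which patches are hidden from the context

with torch.no_grad():
    targets = target_encoder(tokens)    # features of the full clip, stop-gradient

context = context_encoder(tokens * (~mask).unsqueeze(-1))
pred = predictor(context)

# Regress target features only at masked positions; no pixels are reconstructed.
loss = (pred[mask] - targets[mask]).abs().mean()
loss.backward()
```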
Adding Conditional Control to Text-to-Image Diffusion Models. https://arxiv.org/abs/2302.05543 (arxiv.org): we present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models and reuses their deep and robust encoding layers .. Post excerpt: Abstract: we add spatial conditioning controls to large, pre-trained text-to-image diffusion models ..
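For context, a minimal inference sketch of spatial conditioning with the diffusers library; the Canny ControlNet and Stable Diffusion 1.5 checkpoints are illustrative choices, not necessarily what the post uses.

```python
# Illustrative ControlNet inference sketch with diffusers; checkpoint names assumed.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Spatial condition (e.g. a Canny edge map) prepared beforehand.
edge_map = load_image("canny_edges.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The base diffusion model stays frozen; the ControlNet injects the edge map
# as an extra spatial conditioning signal during denoising.
image = pipe("a cozy reading room", image=edge_map, num_inference_steps=30).images[0]
image.save("controlled.png")
```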
DiT: Self-supervised Pre-training for Document Image Transformer. https://arxiv.org/abs/2203.02378 (arxiv.org): Image Transformers have recently achieved significant progress for natural image understanding, using either supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-supervised .. Post excerpt: Summary: Image Transformers have recently made significant progress in natural image understanding using supervised learning (ViT ..
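A short sketch of using a DiT checkpoint for document-image classification via transformers; the microsoft/dit-base-finetuned-rvlcdip checkpoint name and the input file are assumptions for illustration, while the post itself is about the pre-training paper.

```python
# Sketch: classify a scanned document with a DiT backbone fine-tuned on RVL-CDIP.
# Checkpoint name assumed; DiT checkpoints reuse the BEiT architecture in transformers.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

name = "microsoft/dit-base-finetuned-rvlcdip"  # assumed public checkpoint
processor = AutoImageProcessor.from_pretrained(name)
model = AutoModelForImageClassification.from_pretrained(name)

doc = Image.open("scanned_page.png").convert("RGB")
inputs = processor(images=doc, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Predicted document type, e.g. "letter", "invoice", "form", ...
print(model.config.id2label[logits.argmax(-1).item()])
```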
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. https://arxiv.org/abs/2301.00808 (arxiv.org): driven by improved architectures and better representation-learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boosts in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrat.. Post excerpt: reading papers while working on things in a different field has left me scatterbrained ..
Image prompt guessing: a site that infers the prompt words behind an AI-generated image you have. https://replicate.com/methexis-inc/img2prompt (methexis-inc/img2prompt, run with an API on Replicate): the model runs on Nvidia T4 GPU hardware; predictions typically complete within 27 seconds, though predict time varies significantly with the inputs. It provides approximate text prompts that can be ..
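A small sketch of calling the model through the Replicate Python client; the exact input key and whether a pinned version hash is required are assumptions based on typical Replicate usage.

```python
# Sketch: ask img2prompt for an approximate prompt of an AI-generated image.
# Requires REPLICATE_API_TOKEN in the environment; the "image" input name and
# version handling are assumptions from typical Replicate model usage.
import replicate

with open("ai_image.png", "rb") as f:
    prompt = replicate.run(
        "methexis-inc/img2prompt",   # an "owner/model:version" pin may be required
        input={"image": f},
    )

print(prompt)  # approximate text prompt describing the image
```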
Analysis of models in the AI field. https://www.salesforceairesearch.com/crm-benchmark (Generative AI Benchmark for CRM | Salesforce AI Research): powering the world's smartest CRM by embedding state-of-the-art deep learning technology into the Salesforce Platform.
FIFO-Diffusion: Generating Infinite Videos from Text without Training. https://arxiv.org/abs/2405.11473 (arxiv.org): we propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achie.. Post excerpt: Abstract: we propose a technique based on a pre-trained diffusion model for text-conditional video generation ..
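To make the "infinite video without extra training" idea concrete, here is a toy sketch of a first-in-first-out denoising queue as I understand the method: frames sit in the queue at increasing noise levels, each step denoises the whole queue once, the nearly clean head frame is emitted, and fresh noise is enqueued at the tail. The denoise_one_step function and latent shapes are placeholders, not the paper's implementation.

```python
# Toy illustration of a FIFO-style denoising queue; denoise_one_step is a placeholder
# standing in for one step of a frozen, pretrained text-conditional video diffusion model.
from collections import deque
import torch

def denoise_one_step(latents: torch.Tensor, noise_levels: torch.Tensor) -> torch.Tensor:
    """Placeholder: one denoising step of a frozen, pretrained diffusion model."""
    return latents * 0.98  # stand-in; a real model would predict and remove noise

num_steps = 16                        # queue length = number of denoising levels
latent_shape = (4, 32, 32)            # placeholder per-frame latent shape

# Frames enter fully noisy at the tail and leave (nearly) fully denoised at the head.
queue = deque(torch.randn(latent_shape) for _ in range(num_steps))
noise_levels = torch.linspace(0.0, 1.0, num_steps)   # head is nearly clean

generated_frames = []
for _ in range(64):                   # arbitrarily many frames, no retraining
    stacked = torch.stack(list(queue))
    stacked = denoise_one_step(stacked, noise_levels)
    queue = deque(stacked)

    generated_frames.append(queue.popleft())          # emit the cleanest frame
    queue.append(torch.randn(latent_shape))           # enqueue fresh noise at the tail

print(len(generated_frames), generated_frames[0].shape)
```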