
All Posts

Google Colab pricing: T4 = 17.15, L4 = 4.82, A100 = 11.77
Robust Speech Recognition via Large-Scale Weak Supervision
https://arxiv.org/abs/2212.04356
Summary: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard..
MotionBooth: Motion-Aware Customized Text-to-Video Generation
https://arxiv.org/abs/2406.17758
Summary: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video m..
https://huggingface.co/papers/2406.1775..
Claude 3.5 review
The buzz felt bigger than the actual experience; it didn't strike me as clearly better (perhaps because I was writing long code). If choosing between GPT and Claude, I'd subscribe to just one. After using it more, Claude does write code better.
What are Diffusion Models?
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references).] [Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen.] [Updated on 2022-08..
Japanese-image translation to Korean
https://github.com/cs20131516/image_translator
https://huggingface.co/ogkalu/comic-text-segmenter-yolov8m
ogkalu/comic-text-segmenter-yolov8m: a YOLOv8-medium model trained on about 3k Manga, Webtoon, Manhua, and (very few) Western-comic-style images for text detection and segmentation. Training imgsize = 1024; training images were resized, not cropped. It can handle the extreme image r..
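A translator like the repo above pairs a text segmenter (such as the YOLOv8 model linked) with OCR and machine translation. As a rough sketch of how the stages fit together — every function below is a hypothetical stand-in stub, not the repo's actual API, and the model/OCR/translation calls would need to be swapped in:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextRegion:
    box: tuple            # (x1, y1, x2, y2) pixel coordinates of a text area
    text: str = ""        # recognized Japanese text
    translated: str = ""  # Korean translation

def detect_text_regions(image) -> List[TextRegion]:
    """Stub for the segmentation stage (a real pipeline would run the
    comic-text-segmenter-yolov8m model here and return its boxes)."""
    return [TextRegion(box=(10, 10, 120, 40))]

def ocr(image, region: TextRegion) -> str:
    """Stub for the OCR stage (recognize Japanese text inside the box)."""
    return "こんにちは"

def translate_ja_ko(text: str) -> str:
    """Stub for the translation stage (Japanese -> Korean)."""
    return {"こんにちは": "안녕하세요"}.get(text, text)

def translate_image(image) -> List[TextRegion]:
    """Detect text regions, OCR each one, and attach a translation.
    A full pipeline would also erase the source text and redraw the
    translations onto the image."""
    regions = detect_text_regions(image)
    for region in regions:
        region.text = ocr(image, region)
        region.translated = translate_ja_ko(region.text)
    return regions

regions = translate_image(image=None)
print(regions[0].translated)  # → 안녕하세요
```

The split into detect / recognize / translate stages is the design choice worth copying: each stage can be benchmarked and replaced independently.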
Revisiting Feature Prediction for Learning Visual Representations from Video
https://arxiv.org/abs/2404.08471
Summary: This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, t..
Adding Conditional Control to Text-to-Image Diffusion Models
https://arxiv.org/abs/2302.05543
Abstract: We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pr..