Inui

Norm

https://normxu.github.io/

AI & ML interests

Video Diffusion; Large Language Model; Object Detection; OCR

Recent Activity

upvoted a paper 19 days ago

Less is More: Recursive Reasoning with Tiny Networks

liked a model about 1 month ago

rednote-hilab/dots.ocr

liked a model 2 months ago

meituan-longcat/LongCat-Flash-Chat

Norm's collections 9

VAE

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Paper • 2411.17459 • Published Nov 26, 2024 • 12
MAGVIT: Masked Generative Video Transformer

Paper • 2212.05199 • Published Dec 10, 2022
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Paper • 2310.05737 • Published Oct 9, 2023 • 6
Finite Scalar Quantization: VQ-VAE Made Simple

Paper • 2309.15505 • Published Sep 27, 2023 • 23

TI2V Research

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12, 2024 • 39
AtomoVideo: High Fidelity Image-to-Video Generation

Paper • 2403.01800 • Published Mar 4, 2024 • 23
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7, 2024 • 57
AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published Nov 16, 2024 • 24

Multimodal Language Model

What matters besides the data recipe when training a multimodal language model?

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 60
VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 41
PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 72
openbmb/MiniCPM-V-2_6

Image-Text-to-Text • 8B • Updated Jun 13 • 85.3k • 1.01k

Language Model

STaR: Bootstrapping Reasoning With Reasoning

Paper • 2203.14465 • Published Mar 28, 2022 • 9
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 9
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 420

Open Datasets

Thank you for sharing your datasets. I've fed them to my model, and it has benefited from them.

vikhyatk/lnqa

Viewer • Updated Aug 18, 2024 • 303k • 675 • 87
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24 • 3.94M • 18.7k • 221
naver-clova-ix/synthdog-en

Viewer • Updated Jan 31, 2024 • 66k • 2.5k • 23
Mutonix/Vript_Chinese

Viewer • Updated Oct 16, 2024 • 294k • 1.05k • 11

Video2Video

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14, 2024 • 31

Image / Video Gen

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion

Understanding Diffusion Models: A Unified Perspective

Paper • 2208.11970 • Published Aug 25, 2022
Tutorial on Diffusion Models for Imaging and Vision

Paper • 2403.18103 • Published Mar 26, 2024 • 2
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 6
Denoising Diffusion Implicit Models

Paper • 2010.02502 • Published Oct 6, 2020 • 4

Fundamental Research

Scaling Law with Learning Rate Annealing

Paper • 2408.11029 • Published Aug 20, 2024 • 4
Token Turing Machines

Paper • 2211.09119 • Published Nov 16, 2022 • 1
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Paper • 2203.12602 • Published Mar 23, 2022
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Paper • 2305.13035 • Published May 22, 2023

Computer Vision

Do we still need a dedicated network for each specific computer vision task today?

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 116
facebook/sam2.1-hiera-large

Mask Generation • 0.2B • Updated Aug 15 • 60.8k • 108
