kelseye committed on
Commit
d78d008
·
verified ·
1 Parent(s): db95a1d

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ _cover_images_/cover_video.mp4 filter=lfs diff=lfs merge=lfs -text
+ _cover_images_/video_with_lora.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/image_with_lora.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/image_without_lora.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/video_with_lora.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/video_with_lora_2.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/video_without_lora.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/video_without_lora_2.mp4 filter=lfs diff=lfs merge=lfs -text
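Reviewer note: the hunk above mixes glob rules (`*.zip`) with exact-path rules (`assets/video_with_lora.mp4`). The sketch below approximates how such rules decide whether a path goes through LFS. It is a simplification, not Git's actual matcher: it assumes a pattern without a slash is compared against the file name only, a pattern with a slash is compared against the whole repository-relative path, and it ignores negation, `**`, and rule ordering. The function name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns taken from the .gitattributes hunk above.
LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "_cover_images_/cover_video.mp4",
    "_cover_images_/video_with_lora.mp4",
    "assets/image_with_lora.jpg",
    "assets/image_without_lora.jpg",
    "assets/video_with_lora.mp4",
    "assets/video_with_lora_2.mp4",
    "assets/video_without_lora.mp4",
    "assets/video_without_lora_2.mp4",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if any pattern matches the repository-relative path."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        # Simplifying assumption: slash-free patterns match the base name,
        # patterns containing a slash match the full relative path.
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for candidate in ["assets/video_with_lora.mp4", "assets/other.mp4", "README.md"]:
        print(candidate, is_lfs_tracked(candidate))
```

Because the new rules are exact paths, a media file added later under `assets/` with a different name would not be picked up by them.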
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ license: apache-2.0
+ ---
+ # Wan2.1-1.3B-LoRA-High-Resolution-Fix-v1
+
+ ## Model Introduction
+
+ This LoRA model was trained from the [Wan2.1-1.3B](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) (Tongyi Wanxiang 2.1) base model using the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Because the base model was trained at 480P, its output lacks clarity at higher resolutions. We trained this LoRA to improve results on high-resolution videos and to avoid visual artifacts, dimness, and image collapse. We recommend using this model in one of two ways:
+
+ 1. **Direct generation of high-resolution short videos**: Set the resolution to 1024 x 1024 and reduce the number of frames so that generation does not take excessively long.
+ 2. **Detail refinement for high-resolution videos**: Generate a video at low resolution, upscale it with super-resolution, then run video-to-video generation with this model to refine fine details.
+
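Reviewer note: the advice to reduce the frame count at 1024 x 1024 can be made concrete with a rough token-count estimate. The sketch below is not part of the commit and rests on assumptions about the Wan2.1 architecture that the README does not state: an 8x spatial and 4x temporal VAE stride, a 2 x 2 spatial patch in the diffusion transformer, frame counts of the form 4k + 1, and 480 x 832 with 81 frames as the base model's usual setting. The function names are hypothetical.

```python
def largest_valid_num_frames(num_frames: int) -> int:
    """Largest frame count of the form 4k + 1 that does not exceed num_frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return (num_frames - 1) // 4 * 4 + 1


def dit_token_count(height: int, width: int, num_frames: int) -> int:
    """Approximate number of tokens the diffusion transformer attends over."""
    # Assumed factors: an 8x spatial VAE stride times a 2x2 patch gives
    # 16 pixels per token side; the temporal stride is 4 with the first
    # frame encoded on its own.
    if height % 16 or width % 16:
        raise ValueError("height and width must be multiples of 16")
    if num_frames % 4 != 1:
        raise ValueError("num_frames must have the form 4k + 1")
    latent_frames = (num_frames - 1) // 4 + 1
    return latent_frames * (height // 16) * (width // 16)


if __name__ == "__main__":
    for h, w, f in [(480, 832, 81), (1024, 1024, 33), (1024, 1024, 81)]:
        print(f"{w}x{h}, {f} frames: {dit_token_count(h, w, f)} tokens")
```

Under these assumptions, 33 frames at 1024 x 1024 yield 36864 tokens, close to the 32760 tokens of an 81-frame 480 x 832 clip, whereas 81 frames at 1024 x 1024 would yield 86016 tokens.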
+ ## Model Performance
+
+ ### Anime Style
+
+ Prompt: Anime style, a cute anime girl with short black hair swaying in the wind, gently turning her head.
+
+ Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+
+ |Without this LoRA model|With this LoRA model|
+ |-|-|
+ |<div align="center"><video width="80%" controls><source src="assets/video_without_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+
+ ### Sword and Magic
+
+ Prompt: An ancient mythological scene depicting a confrontation between a warrior and a dragon, set against a backdrop of steep cliffs. The warrior wears armor and holds a shining sword, while the dragon spreads its massive wings, flames building in its mouth.
+
+ Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+
+ |Without this LoRA model|With this LoRA model|
+ |-|-|
+ |<div align="center"><video width="80%" controls><source src="assets/video_without_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+
+ ## Usage Instructions
+
+ This model was trained with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Install it first:
+
+ ```
+ pip install diffsynth
+ ```
+
+ ```python
+ import torch
+ from diffsynth import ModelManager, WanVideoPipeline, save_video
+ from modelscope import snapshot_download
+
+ # Download the base model weights that load_models() reads below.
+ snapshot_download(
+     model_id="Wan-AI/Wan2.1-T2V-1.3B",
+     local_dir="models/Wan-AI/Wan2.1-T2V-1.3B",
+ )
+ # Download the LoRA weights.
+ snapshot_download(
+     model_id="DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1",
+     local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1",
+     allow_file_pattern="*.safetensors",
+ )
+
+ # Load the base model (DiT, text encoder, VAE) on the CPU, then apply the LoRA.
+ model_manager = ModelManager(device="cpu")
+ model_manager.load_models(
+     [
+         "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
+         "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
+         "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
+     ],
+     torch_dtype=torch.bfloat16,
+ )
+ model_manager.load_lora("models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1/model.safetensors")
+ pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+ pipe.enable_vram_management(num_persistent_param_in_dit=None)
+
+ # Generate a short 1024 x 1024 clip (33 frames) and save it at 15 fps.
+ video = pipe(
+     prompt="An ancient mythological scene depicting a confrontation between a warrior and a dragon, with steep cliffs in the background. The warrior wears armor and holds a shining sword, while the dragon spreads its enormous wings, flames building up in its mouth.",
+     negative_prompt="Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, still image, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, deformed, extra fingers, poorly drawn hands, poorly drawn face, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",
+     num_inference_steps=50,
+     seed=1, tiled=True,
+     num_frames=33, height=1024, width=1024, sigma_shift=10,
+ )
+ save_video(video, "video.mp4", fps=15, quality=5)
+ ```
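Reviewer note: the usage snippet reads four weight files from fixed relative paths, so a missing download surfaces only when loading fails. A small pre-flight check, sketched below with the standard library only, could report missing files up front. It is not part of the commit; the helper name `missing_model_files` and its `exists` parameter are hypothetical, and the paths are copied from the snippet above.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Paths copied from the usage snippet in the README.
REQUIRED_FILES = [
    "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
    "models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1/model.safetensors",
]


def missing_model_files(
    root: str = ".",
    required: Iterable[str] = REQUIRED_FILES,
    exists: Callable[[Path], bool] = Path.is_file,
) -> List[str]:
    """Return the required paths that are not present under root."""
    base = Path(root)
    return [rel for rel in required if not exists(base / rel)]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing weight files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All weight files are present.")
```

Running this before `load_models` and `load_lora` gives a clearer message than a failure partway through loading.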
README_from_modelscope.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ base_model: MusePublic/wan2.1-1.3b@v1
+ cover_images:
+ - _cover_images_/cover_video.mp4
+ frameworks:
+ - Pytorch
+ license: Apache License 2.0
+ tags:
+ - LoRA
+ - text2video generation
+ tasks:
+ - text-to-video-synthesis
+
+ trigger_words:
+ - ""
+
+ vision_foundation: WAN_VIDEO_2_1_T2V_1_3_B
+ ---
+
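+ # Tongyi Wanxiang 2.1-1.3B-LoRA-High-Resolution-Fix-v1
+
+ ## Model Introduction
+
+ This LoRA model was trained from the [Tongyi Wanxiang 2.1-1.3B](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) base model using the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Because the base model was trained at 480P, its output lacks clarity at higher resolutions. We therefore trained this LoRA to repair the model's results on high-resolution videos and avoid image collapse and dimness. Recommended ways to use this model:
+
+ 1. **Direct generation of high-resolution short videos**: Set the resolution to 1024 x 1024 and reduce the number of frames so that generation does not take too long.
+ 2. **Detail refinement for high-resolution videos**: Generate a video at low resolution, upscale it with super-resolution, then run video-to-video generation with this model to refine the details.
+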
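+ ## Model Performance
+
+ ### Anime Style
+
+ Prompt: Anime style, a cute anime girl with short black hair, her hair swaying in the wind, gently turning her head.
+
+ Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, many people in the background, walking backwards
+
+ |Without this LoRA model|With this LoRA model|
+ |-|-|
+ |<div align="center"><video width="80%" controls><source src="assets/video_without_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+
+ ### Sword and Magic
+
+ Prompt: An ancient mythological scene showing a confrontation between a warrior and a dragon against a backdrop of steep cliffs. The warrior wears armor and holds a shining sword, while the dragon spreads its huge wings, flames building in its mouth.
+
+ Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, many people in the background, walking backwards
+
+ |Without this LoRA model|With this LoRA model|
+ |-|-|
+ |<div align="center"><video width="80%" controls><source src="assets/video_without_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+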
+ ## Usage Instructions
+
+ This model was trained with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Install it first:
+
+ ```
+ pip install diffsynth
+ ```
+
+
+ ```python
+ import torch
+ from diffsynth import ModelManager, WanVideoPipeline, save_video
+ from modelscope import snapshot_download
+
+
+ # Download the base model weights that load_models() reads below.
+ snapshot_download(
+     model_id="Wan-AI/Wan2.1-T2V-1.3B",
+     local_dir="models/Wan-AI/Wan2.1-T2V-1.3B",
+ )
+ # Download the LoRA weights.
+ snapshot_download(
+     model_id="DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1",
+     local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1",
+     allow_file_pattern="*.safetensors",
+ )
+
+ # Load the base model (DiT, text encoder, VAE) on the CPU, then apply the LoRA.
+ model_manager = ModelManager(device="cpu")
+ model_manager.load_models(
+     [
+         "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
+         "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
+         "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
+     ],
+     torch_dtype=torch.bfloat16,
+ )
+ model_manager.load_lora("models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1/model.safetensors")
+ pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+ pipe.enable_vram_management(num_persistent_param_in_dit=None)
+
+ # The prompts are kept in the original Chinese, exactly as passed to the model.
+ video = pipe(
+     prompt="一幅古代神话场景,展现了勇士与龙的对峙,背景是险峻的山崖,勇士身披铠甲,手持闪亮的剑,龙展开巨大翅膀,火焰在口中蓄势待发。",
+     negative_prompt="色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走",
+     num_inference_steps=50,
+     seed=1, tiled=True,
+     num_frames=33, height=1024, width=1024, sigma_shift=10,
+ )
+ save_video(video, "video.mp4", fps=15, quality=5)
+ ```
_cover_images_/cover_video.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4
+ size 263642
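Reviewer note: each media file in this commit is stored as a three-line Git LFS pointer like the one above, where each line is a key, a space, and a value. The sketch below parses such a pointer into its fields. It is a minimal illustration that handles only the three keys visible in this diff and only `sha256` object IDs; the `LfsPointer` class and `parse_lfs_pointer` function are hypothetical names, not part of any Git LFS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str
    oid: str  # hex digest without the "sha256:" prefix
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256" or not digest:
        raise ValueError("expected an oid of the form sha256:<hex digest>")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


# Pointer contents copied from the hunk above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4
size 263642
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer.oid[:8], pointer.size)
```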
_cover_images_/video_with_lora.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee
+ size 834808
assets/image_with_lora.jpg ADDED

Git LFS Details

  • SHA256: 263f632b3d80f1d22d0d71bb99a5bc397e1863368fcc909bc22545fececab468
  • Pointer size: 131 Bytes
  • Size of remote file: 162 kB
assets/image_without_lora.jpg ADDED

Git LFS Details

  • SHA256: 3f053ef2cd83be94dcb3cf435d25e92ff668d09b7a8a48ba9ce86092fa9fa1d3
  • Pointer size: 131 Bytes
  • Size of remote file: 109 kB
assets/video_with_lora.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee
+ size 834808
assets/video_with_lora_2.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4
+ size 263642
assets/video_without_lora.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:990e5c59973dd8fc713b87cf3fb971e9dfb3f7fe06e860d92b3665cf6b86dd4f
+ size 547379
assets/video_without_lora_2.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a61f7efbb4eec3082e2cbd310360926caafd9f076ac8ae14e91e94be4304f7a8
+ size 192589
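Reviewer note: two pairs of the video pointers in this commit carry identical object IDs, which means the files under `_cover_images_/` are byte-for-byte copies of files under `assets/`. The sketch below groups the six video paths by their `oid` to show this. The table is transcribed from the pointer hunks in this diff, and the helper name `duplicate_groups` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# path -> sha256 oid, transcribed from the LFS pointer hunks in this diff.
VIDEO_OIDS: Dict[str, str] = {
    "_cover_images_/cover_video.mp4": "b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4",
    "_cover_images_/video_with_lora.mp4": "0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee",
    "assets/video_with_lora.mp4": "0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee",
    "assets/video_with_lora_2.mp4": "b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4",
    "assets/video_without_lora.mp4": "990e5c59973dd8fc713b87cf3fb971e9dfb3f7fe06e860d92b3665cf6b86dd4f",
    "assets/video_without_lora_2.mp4": "a61f7efbb4eec3082e2cbd310360926caafd9f076ac8ae14e91e94be4304f7a8",
}


def duplicate_groups(oids: Dict[str, str]) -> List[List[str]]:
    """Return sorted groups of paths that share the same object ID."""
    by_oid = defaultdict(list)
    for path, oid in oids.items():
        by_oid[oid].append(path)
    return sorted(sorted(paths) for paths in by_oid.values() if len(paths) > 1)


if __name__ == "__main__":
    for group in duplicate_groups(VIDEO_OIDS):
        print(" == ".join(group))
```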
configuration.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "aigc_model": true,
+     "model_file_location": "model.safetensors",
+     "framework": "Pytorch",
+     "task": "other"
+ }
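Reviewer note: `model_file_location` names the same `model.safetensors` file that the README passes to `load_lora`. The sketch below reads the configuration and resolves that path relative to the download directory, so a script would not need to hard-code the file name. Treating the field as a path relative to the repository root is an assumption, and `lora_weight_path` is a hypothetical helper.

```python
import json
from pathlib import PurePosixPath

# Contents of configuration.json as added in this commit.
CONFIG_TEXT = """
{
    "aigc_model": true,
    "model_file_location": "model.safetensors",
    "framework": "Pytorch",
    "task": "other"
}
"""


def lora_weight_path(config_text: str, local_dir: str) -> str:
    """Join the download directory with the weight file named in the config."""
    config = json.loads(config_text)
    # Assumption: the location is relative to the repository root.
    return str(PurePosixPath(local_dir) / config["model_file_location"])


if __name__ == "__main__":
    print(lora_weight_path(CONFIG_TEXT, "models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1"))
```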
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0680e3d63e29fa5c831774e27532cfec9680fdbc5bae242b28869a010b098e86
+ size 175049888