Upload folder using huggingface_hub

Files changed (13)

.gitattributes +8 -0
README.md +76 -0
README_from_modelscope.md +92 -0
_cover_images_/cover_video.mp4 +3 -0
_cover_images_/video_with_lora.mp4 +3 -0
assets/image_with_lora.jpg +3 -0
assets/image_without_lora.jpg +3 -0
assets/video_with_lora.mp4 +3 -0
assets/video_with_lora_2.mp4 +3 -0
assets/video_without_lora.mp4 +3 -0
assets/video_without_lora_2.mp4 +3 -0
configuration.json +6 -0
model.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+_cover_images_/cover_video.mp4 filter=lfs diff=lfs merge=lfs -text
+_cover_images_/video_with_lora.mp4 filter=lfs diff=lfs merge=lfs -text
+assets/image_with_lora.jpg filter=lfs diff=lfs merge=lfs -text
+assets/image_without_lora.jpg filter=lfs diff=lfs merge=lfs -text
+assets/video_with_lora.mp4 filter=lfs diff=lfs merge=lfs -text
+assets/video_with_lora_2.mp4 filter=lfs diff=lfs merge=lfs -text
+assets/video_without_lora.mp4 filter=lfs diff=lfs merge=lfs -text
+assets/video_without_lora_2.mp4 filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,76 @@

+---
+license: apache-2.0
+---
+# Tongyi Wanxiang 2.1-1.3B-LoRA-High-Resolution-Fix-v1
+## Model Introduction
+This LoRA was trained on top of the [Tongyi Wanxiang 2.1-1.3B](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) model using the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Because the base model was trained at 480P, its clarity is limited. We ran additional training to improve its results on high-resolution video and to avoid visual artifacts, dimness, and image collapse. We recommend using this model in one of two ways:
+1. **Direct generation of high-resolution short videos**: Set the resolution to 1024 x 1024 and reduce the number of frames appropriately to avoid excessively long generation times.
+2. **Detail refinement for high-resolution videos**: First generate a video at low resolution, perform super-resolution upscaling, then use this model for video-to-video generation to enhance fine details.
+## Model Performance
+### Anime Style
+Prompt: Anime style, a cute anime girl with short black hair swaying in the wind, gently turning her head.
+Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+|Without this LoRA model|With this LoRA model|
+|-|-|
+|<div align="center"><video width="80%" controls><source src="assets/video_without_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+### Sword and Magic
+Prompt: An ancient mythological scene depicting a confrontation between a warrior and a dragon, set against a backdrop of steep cliffs. The warrior wears armor and holds a shining sword, while the dragon spreads its massive wings, flames building in its mouth.
+Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+|Without this LoRA model|With this LoRA model|
+|-|-|
+|<div align="center"><video width="80%" controls><source src="assets/video_without_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+## Usage Instructions
+This model is built upon the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Please install it first:
+```
+pip install diffsynth
+```
+```python
+import torch
+from diffsynth import ModelManager, WanVideoPipeline, save_video
+from modelscope import snapshot_download
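+# Download the base model that load_models() reads below (the original snippet assumed it was already on disk).
+snapshot_download(model_id="Wan-AI/Wan2.1-T2V-1.3B", local_dir="models/Wan-AI/Wan2.1-T2V-1.3B")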
+snapshot_download(
+    model_id="DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1",
+    local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1",
+    allow_file_pattern="*.safetensors"
+)
+model_manager = ModelManager(device="cpu")
+model_manager.load_models(
+    [
+        "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
+    ],
+    torch_dtype=torch.bfloat16,
+)
+model_manager.load_lora("models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1/model.safetensors")
+pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+pipe.enable_vram_management(num_persistent_param_in_dit=None)
+video = pipe(
+    prompt="An ancient mythological scene depicting a confrontation between a warrior and a dragon, with steep cliffs in the background. The warrior wears armor and holds a shining sword, while the dragon spreads its enormous wings, flames building up in its mouth.",
+    negative_prompt="Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, still image, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, deformed, extra fingers, poorly drawn hands, poorly drawn face, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",
+    num_inference_steps=50,
+    seed=1, tiled=True,
+    num_frames=33, height=1024, width=1024, sigma_shift=10,
+)
+save_video(video, "video.mp4", fps=15, quality=5)
+```
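
The usage example above pairs `num_frames=33` with `fps=15` in `save_video`, and the README's advice to reduce the number of frames implies a trade-off between clip length and generation time. The following is a minimal sketch of that arithmetic and is not part of DiffSynth-Studio. `clip_seconds` and `frames_for_duration` are hypothetical helper names, and the `4k + 1` frame-count rule is an assumption (the example's 33 frames fit it), so check it against your installed `diffsynth` version before relying on it.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip saved with the given frame count and fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def frames_for_duration(seconds: float, fps: int) -> int:
    """Largest frame count of the form 4k + 1 that fits within `seconds`.

    ASSUMPTION: the pipeline prefers frame counts of the form 4k + 1.
    Returns at least 1 frame.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    budget = int(seconds * fps)  # whole frames that fit in the time budget
    if budget < 1:
        return 1
    return ((budget - 1) // 4) * 4 + 1


# The README example: 33 frames saved at 15 fps.
example_length = clip_seconds(33, 15)

# A shorter clip to cut generation time, still of the form 4k + 1.
shorter = frames_for_duration(1.0, 15)
```

With these helpers, the README's settings correspond to a clip of a little over two seconds. Lowering the target duration gives a smaller `num_frames` value to pass to the pipeline.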

README_from_modelscope.md ADDED

	@@ -0,0 +1,92 @@

+---
+base_model: MusePublic/wan2.1-1.3b@v1
+cover_images:
+- _cover_images_/cover_video.mp4
+frameworks:
+- Pytorch
+license: Apache License 2.0
+tags:
+- LoRA
+- text2video generation
+tasks:
+- text-to-video-synthesis
+trigger_words:
+- ""
+vision_foundation: WAN_VIDEO_2_1_T2V_1_3_B
+---
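+# Tongyi Wanxiang 2.1-1.3B-LoRA-High-Resolution-Fix-v1
+## Model Introduction
+This LoRA was trained on the [Tongyi Wanxiang 2.1-1.3B](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) model with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Because the base model was trained at 480P, its clarity is limited, so we ran additional training to fix the model's results on high-resolution video and avoid image collapse and dim, grayish frames. Recommended ways to use this model:
+1. **Direct generation of high-resolution short videos**: Set the resolution to 1024 x 1024 and reduce the number of frames appropriately to avoid excessively long generation times.
+2. **Detail refinement for high-resolution videos**: First generate a video at low resolution, upscale it with super-resolution, then run video-to-video generation with this model to refine the fine details.
+## Model Performance
+### Anime Style
+Prompt: Anime style, a cute anime girl with short black hair swaying in the wind, gently turning her head.
+Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+|Without this LoRA model|With this LoRA model|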
+|-|-|
+|<div align="center"><video width="80%" controls><source src="assets/video_without_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
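+### Sword and Magic
+Prompt: An ancient mythological scene depicting a confrontation between a warrior and a dragon, set against a backdrop of steep cliffs. The warrior wears armor and holds a shining sword, while the dragon spreads its massive wings, flames building in its mouth.
+Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+|Without this LoRA model|With this LoRA model|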
+|-|-|
+|<div align="center"><video width="80%" controls><source src="assets/video_without_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|<div align="center"><video width="80%" controls><source src="assets/video_with_lora.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>|
+## Usage Instructions
+This model was trained with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Please install it first:
+```
+pip install diffsynth
+```
+```python
+import torch
+from diffsynth import ModelManager, WanVideoPipeline, save_video
+from modelscope import snapshot_download
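+# Download the base model (missing from the original snippet), then the LoRA weights.
+snapshot_download(model_id="Wan-AI/Wan2.1-T2V-1.3B", local_dir="models/Wan-AI/Wan2.1-T2V-1.3B")
+snapshot_download(
+    model_id="DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1", local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1", allow_file_pattern="*.safetensors"
+)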
+model_manager = ModelManager(device="cpu")
+model_manager.load_models(
+    [
+        "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
+    ],
+    torch_dtype=torch.bfloat16,
+)
+model_manager.load_lora("models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1/model.safetensors")
+pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+pipe.enable_vram_management(num_persistent_param_in_dit=None)
+video = pipe(
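+    prompt="An ancient mythological scene depicting a confrontation between a warrior and a dragon, with steep cliffs in the background. The warrior wears armor and holds a shining sword, while the dragon spreads its enormous wings, flames building up in its mouth.",
+    negative_prompt="Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, stillness, overall gray tone, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",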
+    num_inference_steps=50,
+    seed=1, tiled=True,
+    num_frames=33, height=1024, width=1024, sigma_shift=10,
+)
+save_video(video, "video.mp4", fps=15, quality=5)
+```

_cover_images_/cover_video.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4
+size 263642

_cover_images_/video_with_lora.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee
+size 834808

assets/image_with_lora.jpg ADDED

Git LFS Details

SHA256: 263f632b3d80f1d22d0d71bb99a5bc397e1863368fcc909bc22545fececab468
Pointer size: 131 Bytes
Size of remote file: 162 kB

assets/image_without_lora.jpg ADDED

Git LFS Details

SHA256: 3f053ef2cd83be94dcb3cf435d25e92ff668d09b7a8a48ba9ce86092fa9fa1d3
Pointer size: 131 Bytes
Size of remote file: 109 kB

assets/video_with_lora.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee
+size 834808

assets/video_with_lora_2.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4
+size 263642

assets/video_without_lora.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:990e5c59973dd8fc713b87cf3fb971e9dfb3f7fe06e860d92b3665cf6b86dd4f
+size 547379

assets/video_without_lora_2.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a61f7efbb4eec3082e2cbd310360926caafd9f076ac8ae14e91e94be4304f7a8
+size 192589
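
Several of the video pointers in this commit carry the same `oid`, which means they reference the same stored object. The following is a minimal sketch for spotting such duplicates. `POINTER_OIDS` and `group_by_oid` are illustrative names, and the digests are copied from the pointer files listed in this commit.

```python
from collections import defaultdict

# Path -> sha256 oid, copied from the LFS pointer files in this commit.
POINTER_OIDS = {
    "_cover_images_/cover_video.mp4": "b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4",
    "_cover_images_/video_with_lora.mp4": "0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee",
    "assets/video_with_lora.mp4": "0ee0576f086e03c05e964d660840c52ad412946f2796bc07409823ff7213e0ee",
    "assets/video_with_lora_2.mp4": "b792577bcfe53b36c03db9d27e98b2feb46b8c6e992dd33ff48f71825cde63f4",
    "assets/video_without_lora.mp4": "990e5c59973dd8fc713b87cf3fb971e9dfb3f7fe06e860d92b3665cf6b86dd4f",
    "assets/video_without_lora_2.mp4": "a61f7efbb4eec3082e2cbd310360926caafd9f076ac8ae14e91e94be4304f7a8",
}


def group_by_oid(pointer_oids):
    """Return {oid: [paths]} for every oid referenced by more than one path."""
    groups = defaultdict(list)
    for path, oid in sorted(pointer_oids.items()):
        groups[oid].append(path)
    return {oid: paths for oid, paths in groups.items() if len(paths) > 1}


duplicates = group_by_oid(POINTER_OIDS)
for oid, paths in duplicates.items():
    print(oid[:12], paths)
```

Under these inputs, each cover video resolves to the same object as one of the `assets/` videos, so LFS stores the content once even though two paths point at it.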

configuration.json ADDED

	@@ -0,0 +1,6 @@

+{
+    "aigc_model": true,
+    "model_file_location": "model.safetensors",
+    "framework": "Pytorch",
+    "task": "other"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0680e3d63e29fa5c831774e27532cfec9680fdbc5bae242b28869a010b098e86
+size 175049888
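
The pointer above records the SHA-256 digest and byte size of the real `model.safetensors`, so a downloaded copy can be checked against it. The following is a minimal sketch under stated assumptions: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the parser assumes the simple three-line `key value` layout shown in this commit's pointer files.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer with `version`, `oid <algo>:<hex>`, and `size` lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """True if the file at `path` has the size and digest recorded in `pointer`."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the model.safetensors entry in this commit.
MODEL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0680e3d63e29fa5c831774e27532cfec9680fdbc5bae242b28869a010b098e86
size 175049888
"""
pointer = parse_lfs_pointer(MODEL_POINTER)
```

After downloading, a call such as `matches_pointer("models/DiffSynth-Studio/Wan2.1-1.3b-lora-highresfix-v1/model.safetensors", pointer)` would report whether the local file matches this commit's weights. The path is the one used in the README example.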