Skip to content

Commit 939bb9e

Browse files
a-r-r-o-wDN6yiyixuxu
authored andcommitted
[core] Mochi T2V (#9769)
* update * udpate * update transformer * make style * fix * add conversion script * update * fix * update * fix * update * fixes * make style * update * update * update * init * update * update * add * up * up * up * update * mochi transformer * remove original implementation * make style * update inits * update conversion script * docs * Update src/diffusers/pipelines/mochi/pipeline_mochi.py Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * Update src/diffusers/pipelines/mochi/pipeline_mochi.py Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * fix docs * pipeline fixes * make style * invert sigmas in scheduler; fix pipeline * fix pipeline num_frames * flip proj and gate in swiglu * make style * fix * make style * fix tests * latent mean and std fix * update * cherry-pick 1069d21 * remove additional sigma already handled by flow match scheduler * fix * remove hardcoded value * replace conv1x1 with linear * Update src/diffusers/pipelines/mochi/pipeline_mochi.py Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> * framewise decoding and conv_cache * make style * Apply suggestions from code review * mochi vae encoder changes * rebase correctly * Update scripts/convert_mochi_to_diffusers.py * fix tests * fixes * make style * update * make style * update * add framewise and tiled encoding * make style * make original vae implementation behaviour the default; note: framewise encoding does not work * remove framewise encoding implementation due to presence of attn layers * fight test 1 * fight test 2 --------- Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com> Co-authored-by: yiyixuxu <yixu310@gmail.com>
1 parent a820e3a commit 939bb9e

26 files changed

+3727
-22
lines changed

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -270,6 +270,8 @@
270270
title: LatteTransformer3DModel
271271
- local: api/models/lumina_nextdit2d
272272
title: LuminaNextDiT2DModel
273+
- local: api/models/mochi_transformer3d
274+
title: MochiTransformer3DModel
273275
- local: api/models/pixart_transformer2d
274276
title: PixArtTransformer2DModel
275277
- local: api/models/prior_transformer
@@ -306,6 +308,8 @@
306308
title: AutoencoderKLAllegro
307309
- local: api/models/autoencoderkl_cogvideox
308310
title: AutoencoderKLCogVideoX
311+
- local: api/models/autoencoderkl_mochi
312+
title: AutoencoderKLMochi
309313
- local: api/models/asymmetricautoencoderkl
310314
title: AsymmetricAutoencoderKL
311315
- local: api/models/consistency_decoder_vae
@@ -400,6 +404,8 @@
400404
title: Lumina-T2X
401405
- local: api/pipelines/marigold
402406
title: Marigold
407+
- local: api/pipelines/mochi
408+
title: Mochi
403409
- local: api/pipelines/panorama
404410
title: MultiDiffusion
405411
- local: api/pipelines/musicldm
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# AutoencoderKLMochi
13+
14+
The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Tsinghua University & ZhipuAI.
15+
16+
The model can be loaded with the following code snippet.
17+
18+
```python
19+
from diffusers import AutoencoderKLMochi
20+
21+
vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
22+
```
23+
24+
## AutoencoderKLMochi
25+
26+
[[autodoc]] AutoencoderKLMochi
27+
- decode
28+
- all
29+
30+
## DecoderOutput
31+
32+
[[autodoc]] models.autoencoders.vae.DecoderOutput
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# MochiTransformer3DModel
13+
14+
A Diffusion Transformer model for 3D video-like data was introduced in [Mochi-1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
15+
16+
The model can be loaded with the following code snippet.
17+
18+
```python
19+
from diffusers import MochiTransformer3DModel
20+
21+
vae = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
22+
```
23+
24+
## MochiTransformer3DModel
25+
26+
[[autodoc]] MochiTransformer3DModel
27+
28+
## Transformer2DModelOutput
29+
30+
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

docs/source/en/api/pipelines/mochi.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
-->
15+
16+
# Mochi
17+
18+
[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo.
19+
20+
*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*
21+
22+
<Tip>
23+
24+
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
25+
26+
</Tip>
27+
28+
## MochiPipeline
29+
30+
[[autodoc]] MochiPipeline
31+
- all
32+
- __call__
33+
34+
## MochiPipelineOutput
35+
36+
[[autodoc]] pipelines.mochi.pipeline_output.MochiPipelineOutput

0 commit comments

Comments
 (0)