🧱 Data Preprocess

To save GPU memory, we precompute text embeddings and VAE latents to eliminate the need to load the text encoder and VAE during training.
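The idea in a few lines: run the text encoder and VAE once per sample, cache the outputs, and train against the cached tensors. The sketch below is illustrative only, using stand-in Hugging Face models and dummy frames rather than FastVideo's actual pipeline:

```python
# Illustrative sketch only: stand-in text encoder and image VAE, dummy frames
# instead of real video. FastVideo's preprocessing uses the model-specific
# encoders, but the caching idea is the same.
import torch
from transformers import AutoTokenizer, T5EncoderModel
from diffusers import AutoencoderKL

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
text_encoder = T5EncoderModel.from_pretrained("google/t5-v1_1-base").eval()
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

caption = "a toy car being crushed by a hydraulic press"
frames = torch.randn(4, 3, 256, 256)  # placeholder for decoded video frames

with torch.no_grad():
    tokens = tokenizer(caption, return_tensors="pt")
    text_emb = text_encoder(**tokens).last_hidden_state   # [1, seq_len, hidden]
    latents = vae.encode(frames).latent_dist.sample()     # [4, 4, 32, 32]
    latents = latents * vae.config.scaling_factor

# Cache both tensors; training then loads this file instead of the encoders.
torch.save({"text_emb": text_emb, "latents": latents}, "sample_0.pt")
```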

We provide a sample dataset to help you get started. Download the source media using the following command:

python scripts/huggingface/download_hf.py --repo_id=FastVideo/mini_i2v_dataset --local_dir=FastVideo/mini_i2v_dataset --repo_type=dataset
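If you prefer to stay in Python, an equivalent download with huggingface_hub (assuming the package is installed) looks roughly like:

```python
# Same repo and destination as the download script above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="FastVideo/mini_i2v_dataset",
    repo_type="dataset",
    local_dir="FastVideo/mini_i2v_dataset",
)
```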

The folder crush-smol_raw/ contains raw videos and captions for testing preprocessing, while crush-smol_preprocessed/ contains already-preprocessed latents that can be used to test training.

To preprocess the dataset for fine-tuning or distillation, run:

bash scripts/preprocess/v1_preprocess_wan_data_t2v.sh # for Wan

Process your own dataset

If you wish to create your own dataset for finetuning or distillation, please refer to mini_i2v_dataset/crush-smol_raw/ and structure your video dataset in the following format:

path_to_your_dataset_folder/
├── videos/
│   ├── 0.mp4
│   ├── 1.mp4
├── videos.txt
└── prompt.txt
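Assuming the same convention as crush-smol_raw/ (videos.txt holds one relative video path per line and prompt.txt holds the caption for the same line), a small hypothetical helper to generate both index files could look like this:

```python
# Hypothetical helper, not part of the FastVideo repo: writes videos.txt (one
# relative video path per line) and prompt.txt (the matching caption on the
# same line number), mirroring the layout shown above.
from pathlib import Path

def build_index(dataset_root: str, captions: dict[str, str]) -> None:
    root = Path(dataset_root)
    video_files = sorted((root / "videos").glob("*.mp4"))
    with open(root / "videos.txt", "w") as vf, open(root / "prompt.txt", "w") as pf:
        for video in video_files:
            vf.write(f"videos/{video.name}\n")
            pf.write(captions[video.name].strip() + "\n")

# Example usage (captions are placeholders):
# build_index("path_to_your_dataset_folder",
#             {"0.mp4": "a toy car being crushed", "1.mp4": "a bottle being crushed"})
```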

To generate videos2caption.json and merge.txt, run:

python scripts/dataset_preparation/prepare_json_file.py --data_folder mini_i2v_dataset/crush-smol_raw/ --output your_output_folder

In scripts/preprocess/v1_preprocess_****.sh, set DATA_MERGE_PATH to the merge.txt generated above and OUTPUT_DIR to the directory where the preprocessed data should be written, then run:

bash scripts/preprocess/v1_preprocess_****.sh

The preprocessed data will be written to OUTPUT_DIR, and the resulting videos2caption.json can be used in the finetuning and distillation scripts.
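Before launching a training run, a quick sanity check is to parse the generated JSON and count its entries. The snippet below only assumes the file parses as JSON; the per-entry fields are whatever the preprocessing script emits, and the path should be replaced with your own OUTPUT_DIR:

```python
# Illustrative sanity check: confirm the file parses and count top-level entries.
import json

with open("path/to/OUTPUT_DIR/videos2caption.json") as f:  # substitute your OUTPUT_DIR
    entries = json.load(f)

print(f"{len(entries)} preprocessed samples")
```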