vision_utils
Functions
fastvideo.models.vision_utils.create_default_image
create_default_image(width: int = 512, height: int = 512, color: tuple[int, int, int] = (0, 0, 0)) -> Image
Create a solid-color PIL image (black by default).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `width` | `int` | Image width in pixels | `512` |
| `height` | `int` | Image height in pixels | `512` |
| `color` | `tuple[int, int, int]` | RGB color tuple | `(0, 0, 0)` |

Returns:
| Type | Description |
|---|---|
| `Image` | PIL.Image.Image: A new PIL image with specified dimensions and color |
Source code in fastvideo/models/vision_utils.py
fastvideo.models.vision_utils.get_default_height_width
get_default_height_width(image: Image | ndarray | Tensor, vae_scale_factor: int, height: int | None = None, width: int | None = None) -> tuple[int, int]
Returns the height and width of the image, each rounded down to the nearest integer multiple of `vae_scale_factor`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image` | `Union[PIL.Image.Image, np.ndarray, torch.Tensor]` | The image input, which can be a PIL image, NumPy array, or PyTorch tensor. If it is a NumPy array, it should have shape | required |
| `height` | `Optional[int]`, *optional*, defaults to `None` | The height of the preprocessed image. If | `None` |
| `width` | `Optional[int]`, *optional*, defaults to `None` | The width of the preprocessed image. If | `None` |

Returns:
| Type | Description |
|---|---|
| `tuple[int, int]` | |
Source code in fastvideo/models/vision_utils.py
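The rounding to a multiple of `vae_scale_factor` described above can be sketched in plain Python. This is a minimal illustration rather than the FastVideo source: `round_down_to_multiple` is a hypothetical helper name, and the scale factor of 8 and the image size are assumed example values.

```python
def round_down_to_multiple(value: int, factor: int) -> int:
    """Round value down to the nearest integer multiple of factor."""
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    # Subtracting the remainder leaves the largest multiple <= value.
    return value - value % factor


# Assumed example: a 513x770 image with a scale factor of 8.
height, width = (round_down_to_multiple(x, 8) for x in (513, 770))
print(height, width)  # 512 768
```

Values that are already exact multiples pass through unchanged, so applying the helper twice gives the same result as applying it once.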
fastvideo.models.vision_utils.load_image
Loads an image into a PIL Image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image` | `str` or `PIL.Image.Image` | The image to convert to the PIL Image format. | required |
| `convert_method` | `Callable[[PIL.Image.Image], PIL.Image.Image]`, *optional* | A conversion method to apply to the image after loading it. When set to | `None` |

Returns:
| Type | Description |
|---|---|
| `Image` | |
Source code in fastvideo/models/vision_utils.py
fastvideo.models.vision_utils.load_video
load_video(video: str, convert_method: Callable[[list[Image]], list[Image]] | None = None, return_fps: bool = False) -> tuple[list[Image], float | Any] | list[Image]
Loads a video into a list of PIL images.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `video` | `str` | A URL or path to a video to convert to a list of PIL images. | required |
| `convert_method` | `Callable[[List[PIL.Image.Image]], List[PIL.Image.Image]]`, *optional* | A conversion method to apply to the video after loading it. When set to `None`, the images are converted to "RGB". | `None` |
| `return_fps` | `bool`, *optional*, defaults to `False` | Whether to return the FPS of the video. If `True`, returns a tuple of `(images, fps)`. If `False`, returns only the list of images. | `False` |

Returns:
| Type | Description |
|---|---|
| `List[PIL.Image.Image]` or `Tuple[List[PIL.Image.Image], float \| None]` | The video as a list of PIL images. If `return_fps` is `True`, also returns the original FPS. |
Source code in fastvideo/models/vision_utils.py
fastvideo.models.vision_utils.normalize
Normalize an image array to the range [-1, 1].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `images` | `np.ndarray` or `torch.Tensor` | The image array to normalize. | required |

Returns:
| Type | Description |
|---|---|
| `ndarray \| Tensor` | |
Source code in fastvideo/models/vision_utils.py
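A common way to map image values into [-1, 1] is the affine transform `2 * x - 1`. The sketch below assumes the input is already scaled to [0, 1], which the description above does not state. `normalize_sketch` is a hypothetical stand-in for illustration, not the library function.

```python
import numpy as np


def normalize_sketch(images: np.ndarray) -> np.ndarray:
    """Map values from [0, 1] to [-1, 1] (the input range is an assumption)."""
    return 2.0 * images - 1.0


pixels = np.array([0.0, 0.5, 1.0], dtype=np.float32)
print(normalize_sketch(pixels).tolist())  # [-1.0, 0.0, 1.0]
```

The same expression works unchanged on a `torch.Tensor`, since it uses only scalar multiplication and subtraction.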
fastvideo.models.vision_utils.numpy_to_pt
Convert a NumPy image to a PyTorch tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `images` | `np.ndarray` | The NumPy image array to convert to PyTorch format. | required |

Returns:
| Type | Description |
|---|---|
| `Tensor` | |
Source code in fastvideo/models/vision_utils.py
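Image pipelines often keep NumPy images channels-last, as (batch, height, width, channels), while PyTorch models usually expect channels-first, as (batch, channels, height, width). The sketch below shows that axis reordering in NumPy alone. Whether `numpy_to_pt` follows this exact convention is an assumption, `to_channels_first` is a hypothetical name, and the final `torch.from_numpy` step is left out so the example has no PyTorch dependency.

```python
import numpy as np


def to_channels_first(images: np.ndarray) -> np.ndarray:
    """Reorder (N, H, W, C) to (N, C, H, W); the layout is an assumed convention."""
    if images.ndim == 3:
        # Treat an (N, H, W) stack as single-channel by appending a channel axis.
        images = images[..., None]
    return images.transpose(0, 3, 1, 2)


batch = np.zeros((2, 64, 48, 3), dtype=np.float32)
print(to_channels_first(batch).shape)  # (2, 3, 64, 48)
```

The transpose returns a view rather than a copy, so the reordering itself is cheap.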
fastvideo.models.vision_utils.pil_to_numpy
pil_to_numpy(images: list[Image] | Image) -> ndarray
Convert a PIL image or a list of PIL images to NumPy arrays.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `images` | `PIL.Image.Image` or `List[PIL.Image.Image]` | The PIL image or list of images to convert to NumPy format. | required |

Returns:
| Type | Description |
|---|---|
| `ndarray` | |
Source code in fastvideo/models/vision_utils.py
fastvideo.models.vision_utils.preprocess_reference_image_for_clip
Preprocess reference image to match CLIP encoder requirements.
Applies normalization, resizing to 224x224, and denormalization to ensure the image is in the correct format for CLIP processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image` | `Image` | Input PIL image | required |
| `device` | `device` | Target device for tensor operations | required |

Returns:
| Type | Description |
|---|---|
| `Image` | Preprocessed PIL image ready for CLIP encoder |
Source code in fastvideo/models/vision_utils.py
fastvideo.models.vision_utils.resize
resize(image: Image | ndarray | Tensor, height: int, width: int, resize_mode: str = 'default', resample: str = 'lanczos') -> Image | ndarray | Tensor
Resize an image to the given height and width.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image` | `PIL.Image.Image`, `np.ndarray` or `torch.Tensor` | The image input, can be a PIL image, NumPy array or PyTorch tensor. | required |
| `height` | `int` | The height to resize to. | required |
| `width` | `int` | The width to resize to. | required |
| `resize_mode` | `str`, *optional*, defaults to `default` | The resize mode to use, can be one of | `'default'` |

Returns:
| Type | Description |
|---|---|
| `Image \| ndarray \| Tensor` | |