LTX 2.3 is one of the fastest open-source video generation models available right now. Running it through ComfyUI gives you full control over every parameter. This tutorial covers the setup, the key nodes, and the settings that actually matter.
What Makes LTX 2.3 Different
Lightricks released LTX 2.3 in March 2026 with a rebuilt core. It runs on a 22-billion-parameter Diffusion Transformer and handles text-to-video, image-to-video, and native audio-to-video generation in one pipeline.
A few things stand out:
- Native 9:16 portrait output — the first in the LTX family, trained on real portrait data. Actual results for Reels and Shorts, not cropped landscape footage.
- Sharper detail — a rebuilt VAE produces cleaner textures and edges across the full frame.
- Native audio sync — sound is generated alongside video, not added in post.
- Fast — roughly 18x the throughput of WAN 2.2 14B on H100s.
The text encoder upgrade is worth noting too. LTX 2.3 uses Gemma 3 12B Instruct, which means it interprets complex prompts — spatial layout, character actions, mood — in a single pass without losing track of any one element.
Setting Up LTX 2.3 in ComfyUI
You need ComfyUI v0.16 or later. LTX 2.3 nodes are built in natively — no extra custom node packs required.
Step 1: Update ComfyUI
Pull the latest version (a `git pull` in your ComfyUI directory, or the update button in ComfyUI Manager) and reinstall the Python requirements with `pip install -r requirements.txt`. Check the bottom-left of the interface to confirm v0.16+.
Step 2: Download the model
You need two files: the FP8-quantized checkpoint (~22GB, works on 16GB+ VRAM) and the Gemma 3 12B text encoder. Place them in models/checkpoints/ and models/text_encoders/ respectively.
```shell
huggingface-cli download Lightricks/LTX-2.3 \
  --include "ltx-2.3-22b-dev-fp8.safetensors" \
  --local-dir /path/to/ComfyUI/models/checkpoints/
```
Step 3: Load the official template
The fastest path is the ComfyUI Template Library. Go to Video > LTX-2.3 and load either the text-to-video or image-to-video workflow. You'll have a working pipeline in under two minutes.
The Settings That Matter
Most of the quality difference comes from three parameters:
- CFG scale: 5.5 — Don't go above 7. Higher CFG makes motion robotic and oversaturates colors.
- Scheduler: euler_ancestral — Produces the most natural-looking motion.
- Steps: 30–50 — 30 is fast. 50 adds more detail.
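If you drive ComfyUI through its API-format workflow JSON rather than the UI, the three settings above can be applied in one pass. This is a minimal sketch: the node id (`"3"`) and the assumption that the sampler is a stock `KSampler` node are illustrative — export your own workflow and check which node actually carries these inputs.

```python
import json

# Recommended values from this article.
RECOMMENDED = {"cfg": 5.5, "sampler_name": "euler_ancestral", "steps": 30}

def apply_sampler_settings(workflow: dict) -> dict:
    """Overwrite cfg/sampler/steps on every KSampler node in the graph."""
    for node in workflow.values():
        if node.get("class_type") == "KSampler":
            node["inputs"].update(RECOMMENDED)
    return workflow

# Toy API-format workflow; real exports have many more nodes.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"cfg": 8.0, "sampler_name": "euler", "steps": 20,
                     "seed": 42, "denoise": 1.0}},
}
apply_sampler_settings(workflow)
print(json.dumps(workflow["3"]["inputs"], indent=2))
```

Untouched inputs like `seed` and `denoise` pass through unchanged, so you can patch a template workflow without rebuilding it.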
For prompting, describe motion — not just what a scene looks like. Write what's happening. Skip quality tags like "8k" or "masterpiece." LTX 2.3 uses Gemma for text encoding and those tokens don't help.
A solid prompt format: [scene] + [camera motion] + [subject action] + [lighting]. Keep it specific. Vague prompts produce vague video.
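If you generate prompts programmatically, the four-slot format is easy to encode. The helper below is this article's sketch, not part of any LTX tooling — it just forces you to fill every slot:

```python
def build_prompt(scene: str, camera: str, action: str, lighting: str) -> str:
    """Join the four slots into one comma-separated prompt string."""
    return ", ".join(part.strip() for part in (scene, camera, action, lighting))

prompt = build_prompt(
    scene="a rain-soaked neon street in Tokyo at night",
    camera="slow dolly forward at street level",
    action="a cyclist in a yellow jacket weaves between puddles",
    lighting="pink and blue signage reflecting off wet asphalt",
)
print(prompt)
```

Note that the action slot describes motion, not appearance — that's the part LTX 2.3 actually animates.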
Hardware Requirements
| GPU VRAM | Model to Use | Max Resolution | Duration |
|---|---|---|---|
| 24GB (e.g., RTX 4090) | FP16 full | 1920×1080 | up to 6s |
| 16GB (e.g., RTX 3090) | FP8 quantized | 1280×720 | up to 4s |
| 10–12GB | INT4 quantized | 512×512 | short clips |
Enable VAE tiling if you're near your VRAM limit. It saves ~3GB with minimal quality loss.
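The table above can be encoded as a quick planning helper. This is a lookup mirroring the table, nothing more — the ComfyUI loader doesn't read it, and the thresholds are the table's, not hard limits:

```python
def pick_config(vram_gb: float) -> dict:
    """Map available VRAM to the table's recommended model variant."""
    if vram_gb >= 24:
        return {"model": "FP16 full", "max_res": (1920, 1080), "max_seconds": 6}
    if vram_gb >= 16:
        return {"model": "FP8 quantized", "max_res": (1280, 720), "max_seconds": 4}
    if vram_gb >= 10:
        # Table only says "short clips" here, so no duration is given.
        return {"model": "INT4 quantized", "max_res": (512, 512), "max_seconds": None}
    raise ValueError("LTX 2.3 needs roughly 10GB of VRAM or more")

print(pick_config(16))
```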
Don't Have a Local GPU?
Running LTX 2.3 locally requires at least 16GB VRAM. If your setup isn't there yet, LTX-23.app runs the same model in the cloud with no setup or downloads required. It supports all four generation modes — text, image, audio, and video-to-video — and outputs up to 1080p with no watermarks. New accounts get free credits to start.
It's a practical way to test prompts and workflows before committing to a local build.
Wrapping Up
LTX 2.3 raises the bar for open-source video generation. The ComfyUI integration is clean, the official templates work out of the box, and native audio sync opens up a new range of creative projects. Get the model, load the template, keep your CFG under 7, and describe motion in your prompts.
