Waypoint-1-Small is a 2.3-billion-parameter control- and text-conditioned causal diffusion model. It uses a transformer architecture with rectified flow, distilled via Self Forcing with DMD (distribution matching distillation). The model autoregressively generates new frames given historical frames, actions, and text.
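The autoregressive loop can be pictured as follows. This is a minimal sketch, not the actual API: the `WaypointModel.step` interface, the `rollout` helper, and the 16-frame context window are all hypothetical placeholders for illustration.

```python
# Minimal sketch of autoregressive frame generation. `model.step`,
# `rollout`, and the 16-frame context window are hypothetical
# placeholders; the real Waypoint-1-Small API may differ.
from typing import List, Sequence

import torch


def rollout(
    model,                                # hypothetical WaypointModel
    seed_frames: List[torch.Tensor],
    actions: Sequence[torch.Tensor],
    prompt_embedding: torch.Tensor,
) -> List[torch.Tensor]:
    """Generate one frame per action, feeding each new frame back
    into the causal context so later frames depend on earlier ones."""
    context = list(seed_frames)
    generated = []
    for action in actions:
        frame = model.step(
            frames=torch.stack(context[-16:]),  # recent history only
            action=action,
            prompt_embedding=prompt_embedding,
        )
        context.append(frame)
        generated.append(frame)
    return generated
```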
Capabilities:
- Can generate worlds in realtime on high-end consumer hardware
- Allows for exploration of and interaction with worlds via control inputs
- Allows for guidance of the generated world via text prompts
- Can be prompted with any number of starting frames and controls
Usage:
To simply use Waypoint-1-Small, we recommend Biome for local play, the Overworld streaming client, or the hosted Gradio Space on Hugging Face.
To run the model locally, we recommend an NVIDIA RTX 5090, which should achieve 20-30 FPS, or an RTX 6000 Pro Blackwell, which should achieve ~35 FPS.
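To sanity-check throughput on your own hardware, you can time the generation loop directly. A rough sketch, reusing the hypothetical `rollout` helper from above:

```python
# Rough FPS measurement for a local generation loop, reusing the
# hypothetical `rollout` sketch above; not an official benchmark.
import time


def measure_fps(model, seed_frames, actions, prompt_embedding) -> float:
    start = time.perf_counter()
    rollout(model, seed_frames, actions, prompt_embedding)
    elapsed = time.perf_counter() - start
    return len(actions) / elapsed
```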
Keywords
To properly explain limitations and misuse, we must first define some terms. While the model can be used for general interactive video generation tasks, we herein refer to interacting with the model by sending controls and receiving new frames as “playing” the model, and to the agent or user inputting the controls as the “player”.

The model has two forms of output: continuations and generations. Continuations occur when seed frames are provided and no inputs are given; they roughly correspond to moving around information that already exists in the given context frames. For example, if a scene contains fire or water, they may evolve progressively in the generated frames even without any action; likewise, if you seed with an image of a humanoid entity, the entity will persist on screen as you move or look around. Generations, by contrast, occur when the player plays with the model extensively, for example by moving around, turning around fully, or interacting with objects and items; they correspond to creating entirely new information.
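In code, the distinction is simply whether the action stream carries real player input. A sketch under the same hypothetical interface as above (the no-op action and its dimensionality are assumptions):

```python
# Continuation vs. generation with the hypothetical interface above.
import torch

ACTION_DIM = 16                    # assumed size of an action vector
NOOP = torch.zeros(ACTION_DIM)     # assumed "do nothing" control


def continuation(model, seed_frames, prompt_embedding, num_frames):
    """No player input: the model only evolves what the seed frames
    already contain (fire flickers, water flows, entities persist)."""
    return rollout(model, seed_frames, [NOOP] * num_frames, prompt_embedding)


def generation(model, seed_frames, prompt_embedding, player_actions):
    """Extensive play (moving, turning fully around, using objects)
    forces the model to synthesize entirely new information."""
    return rollout(model, seed_frames, player_actions, prompt_embedding)
```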
Limitations
- Continuations can plausibly model any inputted scene or photo, and quality will depend largely on the seed frame given.
- For generations, the model may occasionally:
  - Ignore the given text prompt
  - Ignore certain controls in specific contexts
  - Fail to generate realistic text or interactive HUD/UI elements
  - Fail to generate human/animal entities
  - Fail to generate realistic motion for given entities
  - Fail to generate faces
- Prompt adherence is heavily dependent on prompting strategy
Out of Scope Usage
The model and its derivatives must not be used:
- For harassment or bullying
- For the purpose of exploiting or harming minors in any way
- For simulating extremely violent acts
- For generating violent/gory video
- For facilitating large-scale disinformation campaigns
- For the purpose of generating any sexually explicit or suggestive material