05-06-2023, 11:06 PM   #4
freebeard
Things move fast. Six months later and it's Midjourney that is the competitor.

146,064 views 28 Apr 2023
DeepFloyd IF is a state-of-the-art text-to-image model that can generate high-quality images based on text prompts. It was introduced by StabilityAI and its multimodal AI research lab DeepFloyd. The model consists of a frozen text encoder based on the T5 transformer and three cascaded pixel diffusion modules: a base model that generates a 64x64 px image, and two super-resolution models that upscale the image to 256x256 px and 1024x1024 px. The model has a high degree of photorealism and language understanding, achieving a zero-shot FID score of 6.66 on the COCO dataset. The model can also perform image modification, style transfer, super-resolution and inpainting using text prompts.
It's Free and Open Source software so the 3D models should be along shortly. From the transcript at 5:23:
even better deep Floyd's IF barely just
barely runs on consumer Hardware so
consumer graphics cards 16 gigabytes of
vram for IF XL this is the base 64 by 64
model and then upscaling to 256 by 256
but you can't do 1024 by 1024 on the 16
gigabyte model however with 24 gigabytes
of vram you can actually go all the way
up to the 1024 by 1024 model which means
that if you had an RTX 4090 which I mean
most of you don't have that's a very
expensive graphics card you could run
this thing at home and generate up
to 1024 by 1024 images all on your own
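For anyone wondering what their own card can manage, here is a rough Python sketch of the three-stage cascade and the VRAM figures quoted in the transcript. The names STAGES, VRAM_FOR_SIDE and max_resolution are made up for illustration, and the thresholds are just the video's numbers for IF XL, not official requirements. The transcript says nothing about cards under 16 GB, so the sketch returns None there.

```python
# Cascade stages as described above: (stage name, output side in px).
STAGES = [("base", 64), ("upscale-1", 256), ("upscale-2", 1024)]

# VRAM in GB needed to reach each output size, per the video transcript.
# These are the presenter's figures for IF XL, not official numbers.
VRAM_FOR_SIDE = {64: 16, 256: 16, 1024: 24}


def max_resolution(vram_gb):
    """Return the largest output side (px) the transcript suggests will fit.

    Returns None when the card is below every threshold mentioned, since
    the transcript does not cover that case.
    """
    best = None
    for _name, side in STAGES:
        if vram_gb >= VRAM_FOR_SIDE[side]:
            best = side
    return best


if __name__ == "__main__":
    for vram in (8, 16, 24):
        print(f"{vram} GB -> {max_resolution(vram)}")
```

So a 16 GB card stops at 256 by 256, and you need the 24 GB class (the RTX 4090 in the video) to run the last upscaler.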
It seems to do particularly well with food, cars and spelling things correctly.
Without freedom of speech we wouldn't know who all the idiots are. -- anonymous poster

Last edited by freebeard; 05-06-2023 at 11:13 PM..