Is OpenAI's Sora in Trouble Yet?

17 Jun 2024

Have you heard about the latest sensation in the generative AI world, Luma’s Dream Machine? It’s being called the biggest rival to OpenAI’s Sora. But is it really that good?

Comparing them is tricky because Dream Machine is available to everyone, while Sora isn’t. Still, let’s see what we can find out. It’s hard to deny that, right now, Dream Machine is ahead simply because we can actually use it. In my experience, it’s the best tool available for generating videos from images, beating competitors like Pika and Runway ML. But how does it compare to the mysterious Sora?

Since we can’t use Sora, we’ll compare OpenAI’s public demos to what Luma’s Dream Machine can do. Here’s the plan: we’ll take the first frame from each of OpenAI’s demo videos and feed it, together with the same prompt, to Dream Machine. This will show us how well Dream Machine can reproduce the physics, movement, and sense of space in Sora’s clips. Even if OpenAI’s demos are cherry-picked, we can still compare the details and see how both models perform.

Below, I’ve put together some video comparisons. Each set has three examples. The first video is from OpenAI’s demo on Sora’s website. The second is made with Dream Machine’s image-to-video feature, using the same prompt and the first frame of Sora’s demo as a guide. The third shows what Luma’s tool produces from the prompt alone. This last one is interesting because Sora’s demos were also generated from text, so comparing two text-to-video results shows us each model’s creativity and how well it follows the prompt.

So, without further ado, let’s check out the examples and see which tool comes out on top.

Tokyo Walk

https://youtu.be/-1sqHhXZHjM?embedable=true

Let’s compare OpenAI’s demo with Luma’s Dream Machine. In the first comparison, Dream Machine shows impressive camera movement, and the main person’s actions are smooth and natural. However, there are unnatural artifacts, and the appearance of objects and people is inconsistent throughout the clip. Unlike in OpenAI’s video, the background crowd appears to melt and change shape as the video progresses.

The main character’s face also changes unnaturally, making the video look obviously fake, a problem absent from Sora’s clip.

In the text-to-video example, Dream Machine’s video isn’t bad, but the unnatural morphing of objects is noticeable. For instance, an umbrella appears in a pedestrian’s hand out of nowhere, a clear sign of AI generation. This means it can’t yet compete with royalty-free stock footage, something Sora’s generations probably could.

However, Dream Machine sticks to the prompt well: black jacket, red dress, lipstick, sunglasses, reflective street, pedestrians, and neon lights are all present. So, well done on following the details!

Gold Rush

https://youtu.be/CXHwkkCdUCE?embedable=true

Luma’s image-to-video result isn’t terrible compared to OpenAI’s. However, the camera movement isn’t as smooth as in the Tokyo video: it stops abruptly, which makes the scene feel jarring. The worst part is the character’s movement at the end of the clip, which looks unnatural and random. Additionally, the buildings on the left lose realism with each frame, a problem not seen in Sora’s example.

As in the previous clip, there’s a lack of stability and consistency, with too many artifacts. Sora also excels at making the clip look vintage, with a low frame rate and an overall old-school quality, which suggests it can stylize its output according to the prompt. Dream Machine didn’t achieve that here.

In the text-to-video example, given a short and open-ended prompt, Luma’s model chose a different scene from gold rush history. It feels more true to the era, with fitting colors and lighting. However, the morphing effect and unnatural movement ruin the whole clip, making it unusable in video projects.

SUV in the Dust

https://youtu.be/6BkApqdQ5ZY?embedable=true

This video is my favorite on OpenAI’s website. The car moves very naturally, with excellent lighting, shadows, and dynamics. It’s nearly indistinguishable from real footage, which makes it perfect for content creators. In contrast, Dream Machine gets the camera movement right, but the objects get squashed and mangled unnaturally. In the second part of the clip, the perspective becomes heavily distorted and clearly looks like an AI generation.

For the text-to-video example, the result is actually quite nice, one of the best I’ve managed to get from Luma’s product. It’s less dynamic than the first one but looks pretty natural. However, it suffers from a different issue. The prompt was extensive, specifying that the SUV should be seen from behind with dust coming off the tires, and Dream Machine interpreted it differently.

This highlights a key aspect of AI content generators: without precise prompt interpretation, we can waste hours generating variations that don’t fit our vision or needs.

Museum

https://youtu.be/EBAyuJapaAY?embedable=true

The Museum example is a different kind of beast. Well, not actually a beast: it’s more subtle, calm, and less dynamic, just a simple walk with a steady camera. OpenAI’s version is accurate. It’s not exciting, but it is realistic. Luma’s version uses a different camera movement but looks good too, without the distortions seen in other clips. The main issue is that the paintings that weren’t in the original frame appear blurred and lack definition. Overall, the video is fine, and with a few tweaks, we could get a usable result.

There are no obvious visual flaws in the text-to-video version either. The gallery looks fine. My biggest issue is the choice of camera movement in the first part, which isn’t very realistic. Interestingly, Dream Machine generated two scenes from one prompt, with a cut in the middle to a different room in the museum. It’s fascinating that the model decided to do this. The second part has better camera movement, making it more pleasing to the eye.

Backward Jogger

https://youtu.be/Ti_vdFi7Z3c?embedable=true

This example is interesting because, on Sora’s page, it’s shown as one of the model’s problems: the jogger is running the wrong way. No treadmill works like that, but in the AI world, anything is possible. Is this Dream Machine’s chance to shine? The image-to-video result is actually pretty good.

The jogger still runs backward, as in the input image, but the camera movement and jogger’s behavior are almost perfect. There are some minor distortions, and the camera perspective gets a little weird over time, but with a bit of cherry-picking, we could get a decent result for our productions.

The version generated from the prompt alone is also interesting. It’s very dynamic and a bit distorted, but this might suit certain productions, especially if a shaky, sketch-like aesthetic is desired. Not bad at all. Here, at last, Luma’s model comes close to its future competitor.

Italian Puppy

https://youtu.be/6lB5YLwe4jQ?embedable=true

The last main example on the OpenAI site features a Dalmatian in a colorful Italian city. The original video made with Sora isn’t perfect. As the clip goes on, the dog starts acting a bit oddly, and its animation isn’t as natural as in the other showcased videos. How does Luma’s newest AI handle this?

Not well at all. Maybe it’s because I only had one attempt (the generator is pretty rate-limited), but what we see is a festival of glitches and unrealistic imagery. The dog’s texture changes as the video progresses, the buildings look like they’re made of playdough, and another dog-like abomination appears at the end, making it look more like a Salvador Dalí painting than a real video. This is definitely the worst example so far.

Dream Machine’s own creation isn’t any better. It didn’t follow the prompt, leaving out the Dalmatian entirely. There’s no window for the dog to sit in, the buildings look cartoonish, and the overall architecture is nonsensical. Worst of all are the cyclists on heavily distorted bikes: deformed figures that ride into the canal or morph into other cyclists for no reason. This falls way below expectations.

Verdict?

For what’s available to the public now, Luma’s new AI is truly impressive. It pushes the boundaries, generating really nice camera motion and often very realistic movement of people and objects. It seems to work best when given a reference image, producing better results than its current competitors.

But is it as good as Sora? It seems far from it, at least for now. Sora’s creations can be mistaken for real videos, at least at first glance. The showcase suggests that Sora could compete with stock videos and make life easier for filmmakers and content creators. Dream Machine, on the other hand, often produces glitches and doesn’t always follow prompts accurately.

It’s another step forward in model improvements, but still not reliable and stable enough for widespread use.

Is it a true rival for Sora? Not yet. However, we haven’t interacted directly with Sora, and OpenAI’s showcase might be carefully curated. Sora could well make the same kinds of mistakes as Luma’s model. Until Sora is publicly available, we can’t be certain.

Personally, I’m glad we have Dream Machine. It brings us closer to the perfect AI video generator. It’s useful in some cases and will likely improve over time. I appreciate Luma releasing this tool, giving us another way to enjoy generative AI for video clips.

On the other hand, I hope Sora works as shown in the showcase. If it does, it will be a significant leap forward. I’m eagerly waiting for it to become publicly available so I can compare the results myself.