I uploaded a photo of an outdoor scene and got a three-paragraph description giving the location (taken from GPS coordinates, presumably), a description of the scene, the weather conditions, and the statement that there were things in the sky that could be UFOs.
Another one: “The car license plates visible give a hint of local registration.”
It looks like an LLM trained on images, which is to say, its output is text that sounds like it plausibly belongs in a description of an image, whether or not it is true or even meaningful.