Do you like (AI) clocks?

LegoBrickOnFire@lemmy.world · edit-2 7 days ago

Do you like (AI) clocks?

criticon@lemmy.ca · 7 days ago

This is my favorite

Shaper@lemmy.world · 7 days ago

gaslight clock

ripcord@lemmy.world · edit-2 7 days ago

So the prompt is intentionally breaking things…?

Edit: I guess this was somehow the chatty output doing who know wtf

huppakee@piefed.social · 7 days ago

Removed by mod

LegoBrickOnFire@lemmy.world · 7 days ago

that’s scary how dementia works :'(

LaLuzDelSol@lemmy.world · 7 days ago

This kinda freaked me out: AI models fed their own outputs as training data will quickly start making distorted images that look spookily like human painting made under the progression of mental illness or drugs.

https://www.nature.com/articles/d41586-024-02420-7

BeeegScaaawyCripple@lemmy.world · 7 days ago

well that was terrifying

BatmanAoD@programming.dev · edit-2 7 days ago

Thanks for sharing this! I really think that when people see LLM failures and say that such failures demonstrate how fundamentally different LLMs are from human cognition, they tend to overlook how humans actually do exhibit remarkably similar failures modes. Obviously dementia isn’t really analogous to generating text while lacking the ability to “see” a rendering based on that text. But it’s still pretty interesting that whatever feedback loops did get corrupted in these patients led to such a variety of failure modes.

As an example of what I’m talking about, I appreciated and generally agreed with this recent Octomind post, but I disagree with the list of problems that “wouldn’t trip up a human dev”; these are all things I’ve seen real humans do, or could imagine a human doing.

huppakee@piefed.social · 7 days ago

such a variety of failure modes

What i find interesting is that in both cases there is a certain consistency in the mistakes too - basically every dementia patient still understands the clock is something with a circle and numbers and not a square with letters for example. LLMs can tell you cokplete bullshit, but still understands it has to be done with perfect grammar in a consistant language. So much so it struggles to respond outside of this box - ask it to insert spelling errors to look human for example.

the ability to “see”

This might be the true problem in both cases, both the patient and the model can not comprehend the bigger picture (a circle is divided into 12 segments, because that is how we deconstructed the time it takes for the earth to spin around it’s axis). Things that seem logical to use, are logical because of these kind of connections with other things we know and comprehend.

NeatNit@discuss.tchncs.de · 7 days ago

… what

monotremata@lemmy.ca · 7 days ago

Basically this: https://www.psychdb.com/cognitive-testing/clock-drawing-test

psud@aussie.zone · 5 days ago

I guess that test is going to become less useful as generation Z age

NeatNit@discuss.tchncs.de · edit-2 7 days ago

Thanks.

What I still didn’t figure out about the comment I replied to is:

What is each row? They’re labeled I, II, III, IV. What’s being counted?
Why did they link to a home interior design website under “via”?

monotremata@lemmy.ca · 6 days ago

Good questions. I don’t know, and I can no longer try to find out, as the mods have now removed the comment. (Sorry for the double-post–I got briefly confused about which comment you were referring to and deleted my first post, then realized I’d been frazzled and the post in question really was deleted by the mods.)

monotremata@lemmy.ca · 6 days ago

deleted by creator

ripcord@lemmy.world · 7 days ago

Get educated

ZC3rr0r@lemmy.ca · 6 days ago

Thanks for sharing that mindfuck. I honestly would’ve thought something was wrong with my cognition if you hadn’t mentioned it was a test beforehand.

crunchy@lemmy.dbzer0.com · 7 days ago

It’s funny how GPT-5 is consistently the worst one, and it’s not even close.

snooggums@piefed.world · 7 days ago

qwen 2.5 is absolutely pants on head ridiculous compared to gpt5 when I’m looking at it right now.

dimjim@sh.itjust.works · 7 days ago

Some of these are absolutely hilarious

sheepishly@fedia.io · 7 days ago

Given that the AI models are basically constructing these “blindly”- using the language model to string together html and javascript without really being able to check how it looks- some of these are actually pretty impressive. But also making the AI do things it’s bad at is funny. Reminds me of all the AI ASCII art fails…

dullbananas (Joseph Silva)@lemmy.ca · 7 days ago

deleted by creator

I Cast Fist@programming.dev · 6 days ago

I don’t even

I was surprised that both Grok and Gemini 2.5 got it right once, only to fuck it up on the refresh

QuinnyCoded@sh.itjust.works · edit-2 7 days ago

qwen is trying her best 😭😔

zerofk@lemmy.zip · edit-2 7 days ago

So far, I’d give qwen the prize for most artistic impression of a clock.

Kimi K2 appears to consistently get it right.

zerofk@lemmy.zip · 7 days ago

And just as I typed that, Kimi made one where 9 and 10, and 11 and 12 overlapped.

TrickDacy@lemmy.world · 7 days ago

What is this obsession with clocks recently?

Panties@lemmy.ca · edit-2 7 days ago

I don’t know if it’s actually related, but I’ve read that asking people to draw a clock face is a simple way to identify some brain problems

Quick screening for dementia, according to this

Edit: I guess this means most of the AI has ‘Conceptual Deficits’, pretty accurate lol

Trainguyrom@reddthat.com · 7 days ago

Would be funny if AI models are generating such wildly useless “clocks” because they ingested too many dementia screening tests in their training data

altkey (he\him)@lemmy.dbzer0.com · 7 days ago

There is someone training the biggest, bestest model to draw clock faces to pass that test as we speak.

tomiant@piefed.social · 7 days ago

I’m guessing it’s an easy metric to compare benchmarks. “Write a clock”.

tomiant@piefed.social · 7 days ago

You know, I don’t, and what the fuck?

Bazell@lemmy.zip · 7 days ago

Well, KIMI K2 seems to have created the working one. Others failed. I suppose that this model was optimized for this while others not.

SolarBoy@slrpnk.net · 7 days ago

The clocks change every minute. I’ve seen some from deepseek and qwen that looked ok. But kimi seems to be the most consistent

rekabis@lemmy.ca · 6 days ago

Another reason why, while AI might be a fun toy, no one who is serious about getting work done will touch it with a dirty barge pole. The gratuitous hallucinations alone ought to be a sufficient deterrent.

psud@aussie.zone · edit-2 5 days ago

I got a couple of good ones:

A correct AI generated clock

Another correct AI clock

When I first opened it, deepseek also had a correct clock, but I accidentally refreshed the page when I scrolled up to double check its time, second time through only these two were right

Ed. Names are below the clock, the top one was by the clock drawing champion Kimi K2

kersplomp@piefed.blahaj.zone · edit-2 6 days ago

Really cool idea, but the site seems a bit biased for the chinese models, or is otherwise set up weird. I’m not able to reproduce how consistently bad the others are in web dev arena, which generally accepted as the gold standard for testing AI web dev ability.

AppleTea@lemmy.zip · 6 days ago

Each model is allowed 2000 tokens to generate its clock. Here is its prompt: Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.

are you using the same prompt?

kersplomp@piefed.blahaj.zone · 6 days ago

There’s a couple differences. It’s giving it the current time as part of the prompt, which is interesting. The other difference is that it’s asking it to make it responsive. But even when I use that exact prompt (inserting the time obv), it works fine on claude, openai, and gemini.

So there’s definitely an issue specific to this page somewhere. Maybe it’s not iframing them? I’m on mobile so I can’t check.

Steve@startrek.website · 6 days ago

goatinspace@feddit.org · 7 days ago

Gpt 5 👍

Alaknár@sopuli.xyz · 7 days ago

Ummm…

Do you like (AI) clocks?

Do you like (AI) clocks?

AI World Clocks

It’s funny how GPT-5 is consistently the worst one, and it’s not even close.