Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post; there's no quota for posting and the bar really isn't that high.
The post-Xitter web has spawned soo many "esoteric" right wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality-challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can't escape them, I would love to sneer at them.


ZITRON DROPPED
I don't really understand what point Zitron is making about each query requiring a "completely fresh static prompt", nor about the relative ordering of the user and static prompts. Why would these things matter?
There are techniques for caching some of the steps involved with LLMs. Like I think you can cache the tokenization and maybe some of the work the attention heads are doing if you have a static, known prompt? But I don't see why you couldn't just do that caching separately for each model your model router might direct things to, and if you have multiple prompts, just keep a separate cache for each one. This creates a lot of memory overhead, but not excessively more computation… well, you do need to do the computation to generate each cache once. I don't find it that implausible that OpenAI managed to screw all this up somehow, but I'm not quite sure the exact explanation of the problem Zitron has given fits together.
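To make that concrete, here's a toy sketch of what I mean by a separate cache per model/prompt pair (purely illustrative Python; `expensive_prefill`, the model names, and the string "caches" are all made up, and a real serving stack would be caching tokenization and KV tensors, not strings):

```python
from functools import lru_cache

def expensive_prefill(model_id: str, static_prompt: str) -> str:
    # Stand-in for the real work: tokenizing the static prompt and running
    # the model over it once to build a reusable prefix (KV) cache.
    print(f"prefilling {model_id} over {len(static_prompt)} chars...")
    return f"kv-cache[{model_id}]"

@lru_cache(maxsize=None)  # memory grows with each (model, prompt) pair...
def get_prefix_cache(model_id: str, static_prompt: str) -> str:
    # ...but the expensive prefill runs only once per pair.
    return expensive_prefill(model_id, static_prompt)

def answer(model_id: str, static_prompt: str, user_turns: str) -> str:
    cache = get_prefix_cache(model_id, static_prompt)
    return f"decode({cache}, {user_turns!r})"

# The router can bounce between models; each one pays its prefill cost once.
print(answer("model-a", "You are the cheap router target.", "hi"))
print(answer("model-b", "You are the expensive router target.", "hi"))
print(answer("model-a", "You are the cheap router target.", "hi again"))  # cache hit, no new prefill
```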
(The order of the prompts vs. user interactions does matter, especially for caching… but I think you could just cut and paste the user interactions to separate them from the old prompt and stick a new prompt on in whatever order works best? You would get wildly varying quality in the output as it switches between models and prompts, but this wouldn't add in more computation…)
Zitron mentioned a scoop, so I hope/assume someone did some prompt hacking to get GPT-5 to spit out some of its behind-the-scenes prompts and he has solid proof of what he is saying. I wouldn't put anything past OpenAI, for certain.
I think this hinges on the system prompt going after the user prompt, for some non-obvious router-related reason, meaning that at each model change the input is always new and thus uncacheable.
Also, going by the last Claude system prompt that leaked, these things can be like 20,000 tokens long.
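Rough toy illustration of why that ordering would wreck the cache (made-up numbers and structure, nothing scraped from the actual router): prompt caching can only reuse the longest prefix that exactly matches an earlier request, so if a ~20,000-token static prompt sits behind user turns that change every message, it gets re-prefilled every single time:

```python
# Toy model of prefix reuse: count how many leading segments of the current
# request exactly match a previous request (whitespace tokens as a stand-in).

def cached_tokens(previous: list[str], current: list[str]) -> int:
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += len(a.split())
    return n

system = "system-prompt " * 20_000          # pretend ~20k tokens of static prompt
turn1, turn2 = "user: hi", "user: tell me more"

# System prompt first: the big static prefix matches across turns -> cheap.
first  = [system, turn1]
second = [system, turn1, turn2]
print("system-first, reused tokens:", cached_tokens(first, second))

# System prompt after the user turns: the prefix now starts with user text
# that changes every turn, so the 20k-token prompt is re-prefilled each time.
first  = [turn1, system]
second = [turn1, turn2, system]
print("system-last, reused tokens:", cached_tokens(first, second))
```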