

Good luck telling the promptfondlers that LLMs are only useful for entertainment and not for any useful work.


I'm sure these English instructions work because they feel like they work. Look, these LLMs feel really great for coding. If they don't work, that's because you didn't pay $200/month for the pro version and you didn't put enough boldface and all-caps words in the prompt. Also, I really feel like these homeopathic sugar pills cured my cold. I got better after I started taking them!
No joke, I watched a talk once where some people used an LLM to model how certain users would behave in their scenario given their socioeconomic backgrounds. But they had a slight problem, which was that LLMs are nondeterministic and would of course often give different answers when prompted twice. Their solution was to literally use an automated tool that would try a bunch of different prompts until they happened to get one that would give consistent answers (at least on their dataset). I would call this the xkcd green jelly bean effect, but I guess if you call it "finetuning" then suddenly it sounds very proper and serious. (The cherry on top was that they never actually evaluated the output of the LLM, e.g. by seeing how consistent it was with actual user responses. They just had an LLM generate fiction and called it a day.)
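That "finetuning" procedure is a textbook multiple-comparisons problem, and you don't even need an LLM to demonstrate it. A toy simulation (entirely hypothetical, nothing to do with their actual tooling): every "prompt" below is literally the same fair coin, yet the best-looking one out of thirty appears noticeably more "consistent" than any single honest measurement.

```python
import random

random.seed(0)

def consistency(n_items: int = 20, n_repeats: int = 5) -> float:
    """Fraction of items on which 5 repeated 'LLM answers' all agree.

    Every answer is an independent fair coin flip, so any measured
    'consistency' above the ~6% chance baseline is pure luck.
    """
    agree = 0
    for _ in range(n_items):
        answers = [random.random() < 0.5 for _ in range(n_repeats)]
        if all(answers) or not any(answers):
            agree += 1
    return agree / n_items

# Thirty 'prompt variants' that are all identical under the hood.
scores = [consistency() for _ in range(30)]

print(f"typical prompt: {scores[0]:.0%} consistent")
print(f"best of 30:     {max(scores):.0%} consistent")  # survivor bias in action
```

Run it with a few different seeds: the "winner" reliably looks well above the baseline, which is exactly the green jelly bean effect with a serious-sounding name.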


AI seems good at purple prose and metaphors that don't exactly make sense. No, I do not give a fuck about the "triangle of calm" when it comes to, of all things, the narrator taking off her shoes. No, I am not interested in how long the narrator sets the timer on the microwave when she makes literally the blandest meal of all time.
Now I'm sure the techbros truly think this is good "literary" writing. After all, they only care that the writing sounds flowery, because they seem to be very good at missing the actual meaning of everything. I remember Saltman saying that the movie Oppenheimer needed to be more optimistic to inspire more kids to become physicists (while also saying that The Social Network did that for startup founders).


The article's entire premise is Musk saying some random shit. Remember how, 13 years ago, Musk said that he would land a man on Mars within 10 years? Honestly, I am incensed that people like Musk and Trump can just say shit and many people will just accept it. I can no longer tolerate it.
Putting aside the very real human ability to screw up such a concept and turn any fair system into an unfair one, …
He says this after mentioning UBI. He really doesn't want to confront the unfortunate fact that UBI is entirely a political issue. Whatever magical beliefs one may have about how AI can create wealth, the question of how to distribute it is a social arrangement. What exactly stops the wealthy from consolidating all that wealth for themselves? The goodness of their hearts? Or is it political pushback (and violence in the bad old days), as demonstrated in every single example we have in history?
I'd say the problem is even worse now. In previous eras, some wealthy people funded libraries and parks. Nowadays we see them donate to weirdo rationalist nonsense that is completely disconnected from reality.
No getting up early and commuting on public transit. …
This is followed by four whole paragraphs about how the office sucks and wouldn't it be wonderful if AI got rid of all that. Guess what, we have remote work already! Remember how, during COVID, many software engineering jobs went fully remote, and it turned out that the work was perfectly doable and the workers' lives improved? But then there were so many puff pieces by managers about the wonderful environment of the office, and back to the office they went. Don't worry, when the magical AI is here, they'll change their minds.
Yes, there are "mindless, stupid, inane things" like chores that are unavoidable. There are also other mindless, stupid, inane things that are entirely avoidable but exist anyway because some people base their entire lives around number go up.


I'd say that the great problems that last for decades do not fall purely to random bullshit and require serious advances in new concepts and understanding. But even then, the romanticized warrior culture view is inaccurate. It's not like some big brain genius says "I'm gonna solve this problem" and comes up with big brain ideas that solve it. Instead, a big problem is solved after people make tons of incremental progress by trying random bullshit, and then someone realizes that the tools are now good enough to solve the big problem. A better analogy than the Good Will Hunting genius is picking a fruit: you wait until it is ripe.
But math/CS research is not just about random bullshit go. The truly valuable part is theory and understanding, which comes from critically evaluating the results of whatever random bullshit one tries. Why did idea X work well with Y but not so well with Z, and where else could it work? So random bullshit go is a necessary part of the process, but I'd say research has value (and prestige) because of the theory that comes from people thinking about it critically. Needless to say, LLMs are useless at this. (In the Knuth example, the AI didn't even prove that its construction worked.)
I think intelligence is overrated for research, and the most important quality for research is giving a shit. Solving big problems is mostly a question of having the right perspective and tools, and raw intelligence is not very useful without them. To do that, one needs to take time to develop opinions and feelings about the strengths and weaknesses of various tools.
Of course, every rule has exceptions, and there have been long-standing problems that were solved only when someone had the chutzpah to apply far more random bullshit than anyone had dared to try before.


The 31st try resulted in them only solving the problem for odd m, but the even m case was still open. So of course this happened:
Filip also told me that he asked Claude to continue on the even case after the odd case had been resolved. "But there after a while it seemed to get stuck. In the end, it was not even able to write and run explore programs correctly anymore, very weird. So I stopped the search."
Knuth did add a postscript on other friends maybe kinda vibing a possible solution for even m:
On March 3, Stappers wrote me as follows: "The story has a bit of a sequel. I put Claude Opus 4.6 to work on the m = even cases again for about 4 hours yesterday. It made some progress, but not a full solution. The final program . . . sets up a partial fiber construction similar to the odd case, then runs a search to fix it all up. . . . Claude spent the last part of the process mostly on making the search quicker instead of looking for an actual construction. . . . It was running many programs trying to find solutions using simulated annealing or backtrack. After I suggested to use the ORTools CP-SAT [part of Google's open source toolkit, with the AddCircuit constraint] to find solutions, progress was better, since now solutions could be found within seconds." This program is [4].
Then on March 4, another friend, Ho Boon Suan in Singapore, wrote as follows: "I have code generated by gpt-5.3-codex that generates a decomposition for even m ≥ 8. . . . I've tested it for all even m from 8 to 200 and a bunch of random even values between 400 and 2000, and it looks good. Seems far more chaotic to prove correctness by hand here though; the pattern is way more complex." That program is [5]. (Wow. The graph for m = 2000 has 8 billion vertices!)
I find it slightly funny how Stappers suggested that the AI use specific external tools that are actually reliable (like ORTools). This also makes me question how much of the AI's "insight" was a result of handholding and the rubber duck effect.
For context:


I can see them by clicking on the "Reasoned for 6m 34s" box on my desktop browser (I know, great UI design) without needing to log in. This did not work on my phone. In any case, I can copy it here:
Exploring Hadamard matrix construction for order 428
For a Hadamard matrix of order 428, we need to confirm that it's a multiple of 4, which it is (428 = 4 * 107). However, the challenge is that order 107 doesn't work since it's not a multiple of 4. A possible approach could involve using constructions like Williamson or Goethals-Seidel. Alternatively, I could try a Kronecker product of smaller matrices.
Examining constructions for Hadamard matrix of order 428
Since 107 ≡ 3 mod 4, Paley type I gives a Hadamard of order 108, but I need 428, not 432. That's a little tricky. I wonder if there's a known construction for order 4*107 using "Hadamard matrix of order 4q" or conference matrices. Could I adapt the generalized Paley approach for this?
Exploring conference matrix construction for Hadamard order 428
For q = 107, Paley construction doesn't work since q ≡ 3 mod 4. But if I can create a conference matrix of order 214 (since 214 ≡ 2 mod 4), then a Hadamard matrix of order 428 would follow. Conference matrices require specific conditions, but this seems promising. Let's focus on 214.
Searching for explicit Hadamard matrix of order 428
To generate a Hadamard matrix of order 428, it might be feasible since many such matrices exist up to large orders, but I need one explicitly with ±1 entries. A direct construction could involve methods like Goethals-Seidel, which uses circulant matrices to satisfy specific conditions. I'll need to implement this method.
Exploring construction methods and searching for Hadamard matrix 428
For n=107, there might be a way to use Legendre sequences for Goethals-Seidel difference sets. I know there are constructions for order 4q where q is an odd prime power, but using approaches like Scarpis or Turyn may help too. I'll search for an explicit Hadamard matrix of order 428 for a possible example or construction.
Exploring accessible sources and construction options for Hadamard matrix
It seems like there may be server timeouts, so I'll need to try other sources like GitHub, arXiv, or design theory papers. We could construct the Hadamard matrix ourselves using known algorithms, but it's heavy. If accessible, sequences from a Kharaghani paper could help, and SageMath might have relevant functions too.
Identifying base sequences to build Hadamard matrix
To create a Hadamard matrix of order 428, I think I can use base sequences of lengths 71 and 36, which combine to 107. I'll look for Turyn-type sequences or T-sequences in accessible PDFs, potentially from open sources like arXiv. A useful file might be "H_428," so I'll search for it.
Checking GitHub for Hadamard matrix at 428 order
I found a GitHub project, "anymatrix," which claims to have Hadamard matrices up to order 428. It might include the 428 matrix in MATLAB format. I'll check the repository to see if it has the actual file with the matrix data. Let's open it and explore!
Opening GitHub repository for 428 Hadamard matrix
The page at math.ntua.gr timed out before, but I might try downloading the file directly instead of opening it. Alternatively, I could explore the GitHub "anymatrix" repository, which likely has Hadamard matrices up to order 428. I'll open the repository's root to check its structure and find the matrix file.
Opening Hadamard folder
I'll go ahead and open the Hadamard folder at link ID 81. Time to check what's inside and see if the matrix or relevant files are there!
Trying GitHub raw content for Hadamard matrix
It looks like extracting the files might have failed because of JS requirements, so I'm considering an alternative: accessing raw content directly from GitHub. I'll search the repository for relevant files containing the 428 matrix; once I find out the exact file names, I can access them with no issue!
Trying direct access to Hadamard file
It seems the math.ntua listing might still be promising, with both text and zip formats for the Hadamard matrix. I'll try opening the direct file URL instead of the page, hoping it'll avoid the timeout issues. It's worth retrying, as the timeout could have been temporary. Let's try again!
Trying alternative ways to access Hadamard matrix
Since opening the math.ntua.gr server is
Checking old Hadamard links
The page seems to be from 2001, and it has a link to Neil Sloane's library list, which might include the H428 matrix. However, the link for item 8 points to "www.research.att.com", which could have moved. Let's click and see if it's still accessible.
Trying alternative download for Hadamard matrix
The ATT site removed the old page, and Sloane's library now points to a page that times out. Another option might be using Anymatrix, but GitHub listings aren't working. I'll try accessing the IPM website directly to download the matrix; maybe the file's URL works better with HTTP instead of HTTPS. Let's give it a try.
Searching for accessible Hadamard matrix
The page redirects to HTTPS and times out, which might indicate blocking. Maybe the matrix is hosted elsewhere, like in a plain text or PDF format. I'll search for terms like "h428.txt" or "hadamard428.txt" on cs.uleth.ca; there's a mention of a PDF that could include it. Let's give that a try.
Opening ADTHM page for possible matrix
I've found a listing on cs.uleth, and there's a link on the ADTHM page that might lead to the matrix in text format. It could be what I'm looking for, so I'll go ahead and open the ADTHM page to check it out. Let's see if it includes the matrix!
Accessing matrix text through link
Awesome! I found a link labeled "Entries as text" (ID: 26). This could be the matrix in the text format I'm looking for. I'll go ahead and click the link to download it. Let's see if this works!
Parsing the XPM image to a matrix
The H428.html contains XPM image data, not matrix entries. It seems to represent values with two colors, p and m, which likely stand for +1 and -1. I can map p to 1 and m to -1 (just to simplify things) and validate that the Hadamard property holds by checking H*H^T = 428I. I'll fetch and parse the full content to confirm.
Downloading H428.html
Alright, I'll use the container.download method to fetch the H428.html file. I previously opened the URL, so that should meet the requirement. Now, I'll download the file directly to /mnt/data/H428.html and get the content I need. This should work smoothly and efficiently!
Understanding limitations for container.download
It seems the container.download method restricts us to file types other than text/html, likely just images or data. Since I can't use click or fetch via Python, I'll need to rely on web.run for the content. However, the 200-word limit means we can't fully retrieve the 428 lines. I may need to process it in parts.
Handling 200-line limits in tool response
Since the tool response gets cut off around 200 words, I can't present more than that in the final output. However, I can strategically open various line ranges and parse those internal portions to reconstruct the full matrix. This should allow me to gather all the necessary data without breaking the word limits.
Handling file retrieval and parsing
To retrieve the full content, I can call the tool multiple times from different lines, reconstructing the file. However, retrieving 428 lines in total means a heavy operation, spread across several calls. Given the content-type restrictions, I can process the data using Python, but I need to work with a local file.
Figuring out how to retrieve and process data
To get more lines, I can use a tool to fetch a longer response, but there's still a limit to the tool's output. Since the H428.html file is too large, I could check if there's a plain text version like "H428.txt" in the directory that can be downloaded. I'll also look for any relevant comments in the HTML.
Looking for matrix in different format
The file starts with "/* XPM */," indicating it's actually plain text despite being labeled as HTML. I could try adding a download parameter or using a raw link, but the content type restriction remains. Next, I'll check if there's a plain text version of the matrix, like "H428.txt" in the directory.
Considering ways to fetch and process the file
There's no plain text, but the XPM format might be parseable. I could gradually access the file using the web tool, then reconstruct it with Python. One option is fetching parts with the web tool, accumulating in Python, and then processing. I could work around content type restrictions with query parameters for text/plain.
Exploring options to access content
The H428.html doesn't seem like real HTML. Maybe I can access it through a ?download=1 query or H428.xpm? I might also try the enlarge image link (id 25). If it's a GIF/PNG, I could download it and extract pixel values, though this might involve some image processing. I'll check it out.
Downloading and inspecting image
The image is a PNG, so I can download it using the container. I'll pull down H428_big.png and check the image's pixel structure. It's likely 856x856, possibly scaled at 2x, though the matrix should be 428x428. I'll inspect the image further after downloading it.
Opening image for inspection
The file is small, just 5kb, which likely means it's compressed and may be a 428x428 image with a palette. I'll go ahead and open the image to inspect it using the container, and then I can analyze the pixels to understand how it's structured further.
Processing image for matrix conversion
We have the image, so it's time to use Python to read it and convert the pixel colors into matrix entries. First, I need to check the image's pixel dimensions; it will more likely be 428x428 or a multiple of that. I'll use Python with PIL to open the image and confirm
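For what it's worth, the verification step the trace keeps circling (map the p/m pixels to ±1, then check H·Hᵀ = 428·I) is the trivial part. A minimal sketch of that check, run against a small Sylvester-constructed matrix since the order-428 data is exactly what the model spent its time hunting for:

```python
import numpy as np

def is_hadamard(H: np.ndarray) -> bool:
    """True iff H has only +/-1 entries and H @ H.T == n * I."""
    n = H.shape[0]
    return bool((np.abs(H) == 1).all()
                and np.array_equal(H @ H.T, n * np.eye(n, dtype=H.dtype)))

# No H428 data here, so build an order-8 example by Sylvester doubling:
# H_{2n} = [[H, H], [H, -H]].
H = np.array([[1]])
for _ in range(3):  # order 1 -> 2 -> 4 -> 8
    H = np.block([[H, H], [H, -H]])

print(H.shape, is_hadamard(H))  # (8, 8) True
```

The same `is_hadamard` call would confirm a parsed 428x428 matrix in milliseconds; the hard part was always obtaining the matrix, not checking it.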


The AI people are still infatuated with math. The Epoch AI staff, after being thoroughly embarrassed last year by the FrontierMath scandal, have now decided to make a new FrontierMath Open Problems benchmark, this time with problems that people might give a shit about!
I decided to look at one of the easiest "moderately interesting" problems and noticed that GPT-5.2 Pro managed to solve a warm-up version of the problem, i.e. a version that had been previously solved. Wow, these reasoning models sure are capable of math! So I was curious and looked at the reasoning trace, and it turns out that … the model just found an obscure website with the right answer and downloaded it. Well, I guess you could say it has some impressive reasoning as it figures out how to download and parse the data, maybe.


Hey, you're selling them short: there are also ReLU and softmax activation functions thrown around here and there. Clankers aren't just linear transformations! /j


I am a computer science PhD so I can give some opinion on exactly what is being solved.
First of all, the problem is very contrived. I cannot think of what the motivation or significance of this problem is, and Knuth literally says that it is a planned homework exercise. It's not a problem that many people have thought about before.
Second, I think this problem is easy (by research standards). The problem is of the form: "Within this object X of size m, find any example of Y." The problem is very limited (the only thing that varies is how large m is), and you only need to find one example of Y for each m, even if there are many such examples. In fact, Filip found that for small values of m, there were tons of examples of Y. In this scenario, my strategy would be "random bullshit go": there are likely so many ways to solve the problem that a good idea is literally just trying stuff and seeing what sticks. Knuth did say the problem was open for several weeks, but:
I guess "random bullshit go" is served well by a random bullshit machine, but you still need an expert who actually understands the problem to read the tea leaves and evaluate whether you got something useful. Knuth's narrative is also not very transparent about how much hand-holding Filip did for the AI.
I think the main danger of this (putting aside the severe societal costs of AI) is not that doing this is faster or slower than just thinking through the problem yourself. It's that relying on AI atrophies your ability to think, and eventually even your ability to guard against the AI bullshitting you. The only way to retain a deep understanding is to constantly be in the weeds thinking things through. We've seen this story play out in software before.


I was pissed when my (non-academic) friends saw this and immediately started talking about how mathematicians and computer scientists need to use AI from now on.


scott jumpscare


Baldur Bjarnason's essay remains evergreen.
Consider homeopathy. You might hear a friend talk about "water memory", citing all sorts of scientific-sounding evidence. So, the next time you have a cold you try it.
And you feel better. It even feels like you got better faster, although you can't prove it because you generally don't document these things down to the hour.
"Maybe there is something to it."
Something seemingly working is not evidence of it working.
Were you doing something else at the time which might have helped your body fight the cold?
Would your recovery have been any different had you not taken the homeopathic "remedy"?
Did your choosing of homeopathy over established medicine expose you to risks you weren't aware of?
Even when looking at Knuth's account of what happened, you can already tell that the AI is receiving far more credit than warranted by what it actually did. There is something about a nondeterministic slot machine that makes it feel far more miraculous when it succeeds, while reliable tools that always do their job are boring and stupid. The downsides of the slot machine never register in comparison to the rewards. Does it feel so miraculous when I get an idea after experimenting in Mathematica?
I feel like math research is particularly susceptible to this, because it is the default that almost all of one's attempts do not succeed. So what if most of the AI's attempts do not succeed? But if it is to be evaluated as a tool, we have to check if the benefits outweigh the costs. Did it give me more productive ideas, or did it actually waste more of my time leading me down blind alleys? More importantly, is the cognitive decline caused by relying on slot machines going to destroy my progress in the long term? I don't think anyone is going to do proper experiments for this in math research, but we have already seen this story play out in software. So many people were impressed by superficial performances, and now we are seeing the dumpster fire of bloat, bugs, and security holes. No, I don't think I want that.
And then there is the narrative of not evaluating AI as an objective tool based on what it can actually do, but instead as a tidal wave of Unending Progress that will one day sweep away those elitists with actual skills. Random lemmas today mean the Millennium Prize problems tomorrow! This is where the AI hype comes from, and why people avoid, say, comparing AI with Mathematica. To them I say good luck. We have dumped hundreds of billions of dollars into this, and there are only so many more hundreds of billions of dollars left. Were these small positive results (and significant negatives) worth hundreds of billions of dollars, or perhaps were there better things that these resources could have been used for?


Don't worry, there's always Effective Altruism if you ever feel guilty about causing the suffering of regular people. Just say you're going to donate your money at some point eventually in the future. There you go, 40 trillion hypothetical lives saved!


This somehow makes things even funnier. If he had any understanding of modern math, he would know that representing a set of things as points in some geometric space is one of the most common techniques in math. (A basic example: a pair of numbers can be represented by a point in 2D space.) Also, a manifold is an extremely broad geometric concept: knowing that two things are manifolds does not mean that they are the same or even remotely similar, without checking the details. There are tons of things you can model as a manifold if you try hard enough.
From what I see, Scoot read a paper modeling LLM inference with manifolds and thought "wow, cool!" Then he fished for neuroscience papers until he found one that modeled neurons using manifolds. Both of the papers have blah blah blah something something manifolds, so there must be a deep connection!
(Maybe there is a deep connection! But the burden of proof is on him, and he needs to do a little more work than noticing that both papers use the word manifold.)


Kolmogorov complexity:
So we should see some proper definitions and basic results on Kolmogorov complexity, like in modern papers, right? We should at least see a Kt or a pKt thrown in there, right?
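For the record, the definitions being asked for are one-liners (this is standard plain Kolmogorov complexity and Levin's time-bounded Kt with respect to a fixed universal machine U; pKt is a more recent probabilistic refinement whose exact form varies by paper):

```latex
K_U(x) = \min\{\, |p| : U(p) = x \,\}
\qquad
\mathrm{Kt}_U(x) = \min\{\, |p| + \log t : U(p) \text{ outputs } x \text{ within } t \text{ steps} \,\}
```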
Understanding IS compression – extracting structure from data. Optimal compression is uncomputable. Understanding is therefore always provisional, always improvable, never verifiably complete. This kills "stochastic parrot" from a second independent direction: if LLMs were memorizing rather than understanding, they could not generalize to inputs not in their training data. But they do. Generalization to novel input IS compression – extracting structure, not regurgitating sequences.
Fuck!


Nonsensical analogies are always improved by adding a chart with colorful boxes and arrows going between them. Of course, the burden of proof is on you, dear reader, to explain why the analogy doesn't make sense, not on the author to provide more justification than waving his hands really really hard.
Many of these analogies are about as bad as, I don't know, "Denmark and North Korea are the same because they both have governments" or something. Humans and LLMs both produce sequences of words, where the next word depends in some way on the previous words, so they are basically the same (and you can call this "predicting" the next word as a rhetorical flourish). Yeah, what a revolutionary concept, knowing that both humans and LLMs follow the laws of time and causality. And as we know, evolution "optimizes" for reproduction, and that's why there are only bacteria around (they can reproduce every 20 minutes). He has to be careful, these types of dumbass "optimization" interpretations of evolution that arose in the late 1800s led to horrible ideas about race science … wait a minute …
He isn't even trying with the yellow and orange boxes. What the fuck do "high-D toroidal attractor manifolds" and "6D helical manifolds" have to do with anything? Why are they there? And he really thinks he can get away with nobody closely reading his charts, with the "(???, nothing)" business. Maybe I should throw in that box in my publications and see how that goes.
I feel like his arguments rely on the Barnum effect. He makes statements like "humans and LLMs predict the next word" and "evolution optimizes for reproduction" that are so vague that they can be assigned whatever meaning he wants. Because of this, you can't easily dispel them (he just comes up with some different interpretation), and he can use them as carte blanche to justify whatever he wants.


Maybe I should apply to be a director of AI safety at Meta. I know one safety measure that works: don't use AI.


What's next, are the crypto bros gonna make some dumb talking point about how traditional finance also uses so much energy … oh wait, they already did that.
By far the dumbest "feature" in the codebase is this thing called "Buddy" (described in a few places such as here). Honestly, I don't really know what it's for or what the point is.
Great, so they were planning on a gacha system where you can get an ASCII virtual pet that, uhh, occasionally makes comments? Truly a serious feature for a serious tool for the serious discipline of software engineering. Imagine if IntelliJ decided to pull this bullshit.
The Onion could not have come up with a better way to illustrate this very point.