• 7 Posts
  • 37 Comments
Joined 7 days ago
Cake day: January 6th, 2026


  • Not to be a downer if you’re anti-AI, but you should know that a functional, small, 1B-parameter model only needs ~85GB of data if the training data set is high quality (the four-year-old Chinchilla paper set out the rule of thumb of roughly 20 training tokens per parameter, so it may require even less today).

    That’s basically nothing. If a language has over ~130,000 books or an equivalent amount of writing (1,500 books is about a gig in plain ASCII), a functional text-based AI model could be built that uses it.
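
    The arithmetic behind those two figures can be sketched roughly as follows. This is a back-of-the-envelope check, not a precise estimate: the 20 tokens per parameter and the 1,500 books per gigabyte come from the comment itself, while `BYTES_PER_TOKEN` is an assumed average (about 4.25 bytes of plain text per token) chosen because it reproduces the ~85GB figure. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the data-size figures quoted above.
# TOKENS_PER_PARAM and BOOKS_PER_GB are taken from the comment;
# BYTES_PER_TOKEN is an ASSUMED average and will vary by tokenizer and language.

TOKENS_PER_PARAM = 20    # Chinchilla-style rule of thumb cited in the comment
BYTES_PER_TOKEN = 4.25   # assumption: average plain-text bytes per token
BOOKS_PER_GB = 1_500     # "1,500 books is about a gig" from the comment


def training_data_gb(params, tokens_per_param=TOKENS_PER_PARAM,
                     bytes_per_token=BYTES_PER_TOKEN):
    """Estimate the plain-text training data size in gigabytes (1 GB = 1e9 bytes)."""
    tokens = params * tokens_per_param
    return tokens * bytes_per_token / 1e9


def books_needed(data_gb, books_per_gb=BOOKS_PER_GB):
    """Convert a data size in gigabytes to an approximate number of books."""
    return round(data_gb * books_per_gb)


if __name__ == "__main__":
    gb = training_data_gb(1_000_000_000)  # a 1B-parameter model
    print(f"{gb:.0f} GB, about {books_needed(gb):,} books")
```

    Under those assumptions a 1B-parameter model works out to 85 GB, or about 127,500 books, which is where the "over ~130,000 books" ballpark comes from. A different bytes-per-token average shifts the result proportionally.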

    My understanding is that next to zero languages in existence today lack this amount of quality text. Certainly, spoken languages that have no written word are not accessible like this, but most endangered languages with few speakers and a historical written record could, in theory, have AI models built that communicate effectively in those languages.

    To give you an idea of what this means for less-written languages and a website revolving around them, look at WorldCat (which does NOT cover anywhere near most of the written text available online for each language listed; it’s JUST a resource for libraries): https://www.oclc.org/en/worldcat/inside-worldcat.html

    But this gets even harder for a theoretical website meant to avoid LLMs that can read it, because all of the above assumes building an AI model for the language from scratch. That is no longer necessary today because of transfer learning.

    Major LLMs covering over 100 diverse major languages can be fine-tuned on an insignificant amount of data (even 1GB could work in theory) and produce results like those of a 1B-parameter model trained solely on one language. This is because the multilingual models have developed cross-cultural, vector-based understandings of grammar.

    In truth, the only remaining major barriers to fine-tuning an AI model on a language it does not yet understand are (1) digitization and (2) character recognition. Digitization will vanish as an issue for basically every written language that has a unique script within the next ten years. Character recognition (and more specifically, the economic viability of building the character recognition) will be the only remaining issue.

    Ironically, in creating such a website, you will be creating more data for a potential future AI model to use in training, especially if whatever you write makes the language more economically important.


  • Iirc:

    The same officer who killed Renee Good stuck his hand in the window of a car driven by a convicted sex offender earlier this year, and refused to let go when the sex offender started driving away, attempting all sorts of nonlethal force like a taser, until the sex offender crashed.

    There is some dispute over whether the officer was truly “stuck” or just held on in order to bring greater charges against the convicted sex offender. What is indisputable is that the officer never attempted to use his gun in that moment, while he did use it on Renee Good.


  • You’re following rules no one else does and only wanting to try if it’s guaranteed to succeed.

    That’s categorically the opposite of what “that said, it’s still worth going after the guy criminally. He deserves prison and the AG should give it a shot” means, right?

    I said try even if it’s not guaranteed. My contribution to our conversation was opining on the likelihood of success.

    Throw his ass in jail for murder with no bail

    I’d love this, and think it’s unlikely.

    He can beat the charges, but he’ll never get the months/years of his life back while awaiting trial

    Make it so in the back of every ICE agent’s head there’s a constant reminder: “There could be consequences”.

    Regardless of how we accomplish it, nothing gets fixed till that thought is always in their minds.

    Yes to all of this!



  • You don’t have to support Maduro (I sure as fuck don’t) to know that the Trump administration is definitely in the wrong to play world police, invade Venezuela, and kidnap a foreign leader. Originally they claimed operations in Venezuela were in defense of democracy, and now some vague accusation about drugs is supposed to explain why all of this is necessary. Both excuses are complete bullshit, but it’s especially hypocritical (although not surprising in the least) for Trump to threaten to cancel U.S. midterms days after kidnapping Maduro and pretending to be protecting the U.S. or to be some kind of global defender of free speech and the democratic process.

    Yeah, most of that is right, I think. I’d caveat that the attack was more about the naked imperialism in Trump’s publicly articulated “Donroe Doctrine” than drugs or oil specifically.

    I don’t really think the Chevron stuff Trump did is odd. Chevron has a longer history operating in Venezuela than any of the other companies. Bad, certainly (I have no love for Trump or Chevron), but not odd.

    I kinda miss Chevron deference. As an aside, it is ironic that the namesake for a legal theory providing more administrative authority to the federal government was a private oil company, instead of, like, “administrative deference.”