• luciferofastora@feddit.org · ↑46 · 6 days ago

    I’m a data analyst and the primary authority on the data model of a particular source system. Most questions about figures from that system that can’t be answered directly and easily in the frontend end up with me.

    I had a manager show me how some new LLM they were developing (one I had contributed some information about the data model to) could quickly answer questions I usually have to answer manually, as part of a pitch to get me to switch to his department so I could apply my expertise to improving this fancy AI instead of answering questions by hand.

    He entered a prompt and got a figure that I knew wasn’t correct, so I queried my data model for the same info and got a significantly different answer. Given how much said manager leaned on my expertise in the first place, he couldn’t very well challenge my results, and he got all sheepish about how the AI was still in development and all.

    I don’t know how that model arrived at that figure. I don’t know if it generated and ran a query against the data I’d provided. I don’t know if it just invented the number. I don’t know how the devs would figure out the error and how to fix it. But I do know how to explain my own queries, how to investigate errors and (usually) how to find a solution.

    Anyone who relies on a random text generator - no matter how complex the generation method that makes it sound human - to generate facts is dangerously inept.

    • jj4211@lemmy.world · ↑15 · 6 days ago

      I don’t know how the devs would figure out the error and how to fix it.

      This is like the biggest factor that people don’t get when thinking of these models in the context of software. “Oh, it got it wrong, but the developers will fix it in an update.” Nope. They can fix traditional software mistakes; LLM output and machine learning things are another matter. They can throw more training data at it (which sometimes just changes what it gets wrong) and hope for the best, they can do a better job of curating the context window to give the model the best shot at outputting the right stuff (e.g. the guy who got Opus to generate a slow, crappy, buggy compiler had to traditionally write a filter to find and show only the ‘relevant’ compiler output back to the models), or they can have it generate code to do what you want and have you review that code and correct issues. But debugging and fixing the model itself… that’s just not a thing at all.
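
      To give a feel for what that kind of context curation means in practice, here’s a minimal sketch (the regex and function name are invented for illustration, not the actual filter from that project): keep only the error/warning lines of the tool’s output, plus a little surrounding context, before any of it reaches the model.

      ```python
      import re

      # Hypothetical filter: keep only the compiler output lines that likely
      # matter (errors/warnings plus a little surrounding context), so the
      # model's context window isn't flooded with irrelevant noise.
      RELEVANT = re.compile(r"\b(error|warning|undefined reference)\b", re.IGNORECASE)

      def filter_compiler_output(raw_output: str, context_lines: int = 2) -> str:
          lines = raw_output.splitlines()
          keep = set()
          for i, line in enumerate(lines):
              if RELEVANT.search(line):
                  start = max(0, i - context_lines)
                  end = min(len(lines), i + context_lines + 1)
                  keep.update(range(start, end))
          return "\n".join(lines[i] for i in sorted(keep))

      # The filtered text is what goes back into the prompt; the model itself
      # never gets "debugged", only what it is shown.
      ```

      That filter is ordinary, testable code; the model on the other end stays a black box.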

      I was in a meeting where a sales executive was bragging about the ‘AI sales agent’ they were working on, but admitting frustration with the developers and a bit confused about why the software developers weren’t making progress, when those same developers had always made decent progress before and should be able to do this even faster because they have AI tools to help them… It eternally seemed to be in a state that almost worked but not quite: no matter what model or iteration they went to, no matter how much budget they allocated, when it came down to the specific facts and figures it would always screw up.

      I cannot understand how these executives can wade in the LLM pool for so long and still believe in capabilities beyond what anyone has actually experienced.

      • lightnsfw@reddthat.com · ↑4 · 6 days ago

        I cannot understand how these executives can wade in the LLM pool for so long and still believe in capabilities beyond what anyone has actually experienced.

        They leave the actual work to the boots on the ground, so they don’t see how shitty the output is. They listen to marketing about how great it is, mandate that everyone use it, and then any feedback is filtered through all the brownnosers that report to them.

      • luciferofastora@feddit.org · ↑3 · edited · 6 days ago

        It eternally seemed to be in a state that almost worked but not quite: no matter what model or iteration they went to, no matter how much budget they allocated, when it came down to the specific facts and figures it would always screw up.

        This is probably the biggest misunderstanding since “Project Managers think three developers can produce a baby in three months”: just throw more time and money at AI model “development” for better results. It supposes predictable, deterministic behaviour that can be corrected, but LLMs aren’t deterministic by design, since that wouldn’t sound human anymore.
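
        To put a toy example behind the non-determinism point (the tokens and probabilities here are invented, nothing like a real model): generation samples from a probability distribution over next tokens, so the same prompt can legitimately come back with different figures on different runs.

        ```python
        import random

        # Invented next-token probabilities for a single step of generation.
        candidates = {"42": 0.40, "47": 0.35, "roughly 50": 0.25}

        def sample_token(probs: dict[str, float]) -> str:
            # Pick a token proportionally to its probability (standard sampling).
            r = random.random()
            cumulative = 0.0
            for token, p in probs.items():
                cumulative += p
                if r <= cumulative:
                    return token
            return token  # guard against floating-point rounding

        # Five runs of the "same prompt" can yield five different answers.
        print([sample_token(candidates) for _ in range(5)])
        ```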

        Sure, when you’re a developer dedicated to advancing the underlying technology, you may actually produce better results in time. But if you’re just the consumer, you may get a quick turnaround for an alright result (and for some purposes, “alright” may be enough), yet eventually you’ll plateau at the limitations of the model.

        Of course, executives universally seem to struggle with the concept of upper limits, such as sustainable growth or productivity.

  • pseudo@jlai.lu · ↑53 · 6 days ago

    When you delegate, whether to a person, a tool or a process, you check the result. You make sure that the delegated tasks get done correctly and that the results are what you expect.

    Finding out only by luck, after months, that this is not the case shows incompetence. Look for the incompetent.

    • flying_sheep@lemmy.ml · ↑13 · edited · 6 days ago

      Yeah. Trust is also a thing: if you delegate to a person you’ve seen get the job done multiple times before, you won’t check as closely.

      But this person asked to verify and was told not to. Insane.

    • Tja@programming.dev · ↑7 · 6 days ago

      100%

      Hallucinations are widely known; this is a collective failure of the whole chain of leadership.

    • jj4211@lemmy.world · ↑5 · 6 days ago

      The problem is that whoever is checking the result in this case has to do the work anyway, and in that case… why bother with an LLM that can’t be trusted to pull the data?

      I suppose they could take the facts and figures that a human pulled and have an LLM verbose it up for people who for whatever reason want needlessly verbose BS. Or maybe an LLM could review the human-generated report to help identify awkward writing or inconsistencies. But delegating work that you then have to redo anyway just to double-check it seems pointless.

      • pseudo@jlai.lu · ↑1 · 6 days ago

        Like someone here said, “trust is also a thing”. Once you’ve checked a few times that the process is right and the results are right, you don’t need to check more than occasionally. Unfortunately, that’s not what happened in this story.

  • MuteDog@lemmy.world · ↑22 ↓1 · 6 days ago

    Apparently that Reddit post itself was generated with AI. Using AI to bash AI is an interesting flex.

    • Crozekiel@lemmy.zip · ↑4 · 6 days ago

      Have any evidence of that? The only thing I saw was commenters in that thread (who were obvious AI bros) claiming it must be AI-generated because “it just wouldn’t happen”…

  • Bubbaonthebeach@lemmy.ca · ↑38 · 7 days ago

    To everyone I’ve talked to about AI, I’ve suggested a test: take a subject you know you’re an expert in, then ask the AI questions you already know the answers to and see what percentage it gets right, if any. Often they find that plausible-sounding answers are produced; however, if you know the subject, you know that what’s produced isn’t quite fact. A recovery from an injury might be listed as 3 weeks when the average is 6-8, or similar. Someone who did not already know the correct information could be harmed by the “guessed” response of the AI. AI can have uses, but its output needs to be heavily scrutinized before you pass on anything it generates. If you are good at something, that usually means you have to waste time in order to use AI.
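
    If you want to make that test slightly more systematic, a minimal sketch could look like this (ask_model is a stand-in for whatever chatbot is being tested, and the questions and answers are placeholders you’d replace with ones from your own field):

    ```python
    # Hypothetical harness for the "ask it things you already know" test.
    def ask_model(question: str) -> str:
        # Placeholder: wire this up to whatever AI tool you want to evaluate.
        raise NotImplementedError

    # Replace with questions from a subject you are genuinely expert in.
    known_answers = {
        "Typical recovery time for injury X?": "6-8 weeks",
        "Torque spec for component Y?": "25 Nm plus a 90 degree turn",
    }

    def run_test(cases: dict[str, str]) -> float:
        correct = 0
        for question, expected in cases.items():
            answer = ask_model(question)
            # A substring check is a crude stand-in; in practice you judge
            # correctness yourself, since you already know the right answer.
            if expected.lower() in answer.lower():
                correct += 1
        return correct / len(cases)

    # print(f"{run_test(known_answers):.0%} of expert questions answered correctly")
    ```

    The ground truth is only cheap because you already know it; for any subject you don’t know, you’re back to trusting the guess.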

    • NABDad@lemmy.world · ↑17 · 7 days ago

      I had a very simple script. All it does is trigger an action on a monthly schedule.

      I passed the script to Copilot to review.

      It caught some typos. It also said the logic of the script was flawed and it wouldn’t work as intended.

      I didn’t need it to check the logic of the script. I knew the logic was sound because it was a port of a script I was already using. I asked because I was curious about what it would say.

      After restating the prompt several times, I was able to get it to confirm that the logic was not flawed, but the process did not inspire any confidence in Copilot’s abilities.

    • laranis@lemmy.zip · ↑9 · 7 days ago

      Happy cake day, and this, absolutely. I figured out its game the first time I asked it for a spec for an automotive project I was working on. I asked it the torque specs for some head bolts and it gave me the wrong answer - not just the wrong number, but the wrong procedure altogether. Modern engines have torque-to-yield specs, meaning essentially that you torque the bolts to a number and then add additional rotation to permanently stretch them and lock them in. This car was absolutely not that, and when I explained the error back to it, IT DID IT AGAIN. It sounded very plausible, but someone following those directions would likely have ruined the engine.

      So, yeah, test it and see how dumb it really is.

    • Tja@programming.dev · ↑1 · 6 days ago

      Do the same to any person online, most blogs by experts, or journalists.

      Even for apparently easy-to-find data, like the specs of a car. Sucking and lying is not exclusive to LLMs.

  • tover153@lemmy.world · ↑49 ↓1 · 7 days ago

    Before anything else: whether the specific story in the linked post is literally true doesn’t actually matter. The following observation about AI holds either way. If this example were wrong, ten others just like it would still make the same point.

    What keeps jumping out at me in these AI threads is how consistently the conversation skips over the real constraint.

    We keep hearing that AI will “increase productivity” or “accelerate thinking.” But in most large organizations, thinking is not the scarce resource. Permission to think is. Demand for thought is. The bottleneck was never how fast someone could draft an email or summarize a document. It was whether anyone actually wanted a careful answer in the first place.

    A lot of companies mistook faster output for more value. They ran a pilot, saw emails go out quicker, reports get longer, slide decks look more polished, and assumed that meant something important had been solved. But scaling speed only helps if the organization needs more thinking. Most don’t. They already operate at the minimum level of reflection they’re willing to tolerate.

    So what AI mostly does in practice is amplify performative cognition. It makes things look smarter without requiring anyone to be smarter. You get confident prose, plausible explanations, and lots of words where a short “yes,” “no,” or “we don’t know yet” would have been more honest and cheaper.

    That’s why so many deployments feel disappointing once the novelty wears off. The technology didn’t fail. The assumption did. If an institution doesn’t value judgment, uncertainty, or dissent, no amount of machine assistance will conjure those qualities into existence. You can’t automate curiosity into a system that actively suppresses it.

    Which leaves us with a technology in search of a problem that isn’t already constrained elsewhere. It’s very good at accelerating surfaces. It’s much less effective at deepening decisions, because depth was never in demand.

    If you’re interested, I write more about this here: https://tover153.substack.com/

    Not selling anything. Just thinking out loud, slowly, while that’s still allowed.

    • plenipotentprotogod@lemmy.world · ↑8 ↓1 · 7 days ago

      Very well put. This is a dimension of the ongoing AI nonsense that I haven’t seen brought up before, but it certainly rings true. May I also say that “They already operate at the minimum level of reflection they’re willing to tolerate” is a hell of a sentence, and I’m a little jealous that I didn’t come up with it.

      • tover153@lemmy.world · ↑7 ↓1 · 7 days ago

        Thanks, I really appreciate that. I’ve been getting a little grief this weekend because some of my posts are adapted from essays I’ve been working on for Substack, and apparently careful editing now makes people suspect you’re not an actual person.

        I’m very real, just flu-ridden and overthinking in public. Glad the line landed for you.

    • GalacticSushi@lemmy.blahaj.zone · ↑27 · 7 days ago

      Bro, just give us a few trillion dollars, bro. I swear bro. It’ll be AGI this time next year, bro. We’re so close, bro. I just need some money, bro. Some money and some god-damned faith, bro.

      • vaderaj@lemmy.world · ↑5 · edited · 7 days ago

        User: Hi big corp AI (LLM), do this task

        Big Corp AI: Here is the output

        User: Hi big corp, your AI’s output is not up to standard. I guess it’s a waste of…

        Big Corp: Use this agent, which ensures correct output (for more energy)

        User: It still doesn’t work… guess I was wrong all along, let me retry…

        And the loop continues until they get a few trillion dollars.

    • rumba@lemmy.zip · ↑9 ↓1 · 7 days ago

      You can make something AI-based that does this, but it’s not cheap or easy. You have to build agents that handle data retrieval and programmatically make the LLM choose the right agent. We set one up at work; it took months. If it can’t find the data with high certainty, it tells you to ask the analytics dept.
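
      Very roughly, the shape of that setup (a simplified sketch with invented names, not our actual code): the LLM only picks which retrieval agent to call, the data work itself is ordinary code, and anything below a confidence threshold gets punted to the analytics department.

      ```python
      from dataclasses import dataclass
      from typing import Callable

      @dataclass
      class RetrievalResult:
          answer: str
          confidence: float  # 0.0 - 1.0, however the agent estimates it

      # Each "agent" is ordinary code that knows how to pull data from one
      # system. The names and stubs here are invented for illustration.
      AGENTS: dict[str, Callable[[str], RetrievalResult]] = {
          "sales_db": lambda q: RetrievalResult(answer="...", confidence=0.0),
          "ticket_system": lambda q: RetrievalResult(answer="...", confidence=0.0),
      }

      CONFIDENCE_THRESHOLD = 0.8

      def answer_question(question: str, chosen_agent: str) -> str:
          # In the real setup an LLM picks `chosen_agent`; the retrieval itself
          # is deterministic code, not generated text.
          result = AGENTS[chosen_agent](question)
          if result.confidence < CONFIDENCE_THRESHOLD:
              return "Not confident enough - please ask the analytics dept."
          return result.answer
      ```

      In a sketch like this the interesting decisions are the threshold and the hand-off to a human, not the LLM call.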

  • db_null@lemmy.dbzer0.com · ↑17 · 6 days ago

    I guarantee you this is how several, if not most, Fortune 500 companies currently operate. The 50k Dow is not just propped up by the circlejerk spending on imaginary RAM. There are bullshit reports being generated and presented every day.

    I patiently wait. There is a diligent bureaucrat sitting somewhere going through fiscal reports line by line. It won’t add up… receipts will be requested… bubble goes pop

  • Strider@lemmy.world · ↑37 · 7 days ago

    It doesn’t matter. Management wants this and will not stop until they run into a wall at full speed. 🤷

  • mudkip@lemdro.id · ↑23 · 6 days ago

    Ah yes, what a surprise. The random word generator gave you random numbers that aren’t actually real.

  • Jankatarch@lemmy.world · ↑21 · 6 days ago

    Tbf, at this point the corporate economy is made up anyway, so as long as investors are gambling their endless generational wealth, does it matter?

    • wabasso@lemmy.ca · ↑9 · 6 days ago

      This is how I’m starting to see it too. Stock market is just the gambling statistics of the ownership class. Line goes down and we’re supposed to pretend it’s harder to grow food and build houses all of a sudden.

      • jj4211@lemmy.world · ↑6 · 6 days ago

        There’s a difference. If I go and gamble away my life savings, then I’m on the street. If they gamble away their investments, the government will say ‘poor thing’ and give them money to keep the economy ok.

  • CaptPretentious@lemmy.world · ↑26 · 7 days ago

    At my workplace, senior management is going all in on Copilot. So much so that at the end of last year they told us to use Copilot for year-end reviews! They even provided a prompt to use and told us to link it to Outlook (not sure why, since our email retention isn’t very long)… but whatever.

    I tried it out of curiosity, because I had no faith. It started printing out stats for things that never happened: a 35% increase here, a 20% decrease there, blah blah blah. It didn’t actually highlight anything I do or did. And I’m banking on a human at least partially reading my review, not just using AI.

    If someone reads it, I’m good. If AI reads it, I do wonder if I screwed myself, since senior mgmt is just offloading to AI…

    • jj4211@lemmy.world · ↑4 · 6 days ago

      Ah, the fun of performance reviews. No one actually cares what is written there; the result is decided without regard to the actual content.

      So everyone pretends that what you write in there is important and that the written response is important, but nothing you or they write has any chance of changing promotions and raises. Those may come, but when they do, it’s never because someone read your write-up and thought ‘OMG, give that person a raise and a promotion’.

      So it’s all an act, and I can see why management wants to take any opportunity to shuffle people off to even more token efforts.

      Every year I try to convince my coworker that his hours and hours of scrutinizing his records and crafting the perfect performance review that captures the essence of his entire year are wasted, compared to me logging into the tool and spending 10 minutes writing some vague stuff off the top of my head. I don’t lie or anything, I just keep the review relatively brief and vague, because they already know how much they care about what I did, and I’m not going to talk them into caring more.

  • Decq@lemmy.world · ↑16 · 6 days ago

    Surely this is just fraud, right? Seeing as they have a board of directors, they probably have shareholders too? I feel they should at least all get fired, if not prosecuted. This lack of competence is just criminal to me.

  • Lemminary@lemmy.world · ↑20 · edited · 7 days ago

    Our AI that monitors customer interactions sometimes makes up shit that didn’t happen during the call. Any agent smart enough could probably fool it into giving the wrong summary with the right keywords. I only caught on when I started reading the logs carefully, but I don’t know if management cares so long as the business client is happy.

    • jj4211@lemmy.world · ↑2 · 6 days ago

      Sounds like material that the executives demand be generated but never actually use. My work has a ton of this, because the executives want people to feel like they are accountable and being reviewed, even though everyone knows the executives don’t understand the direct output of their work. So people have to do the technical thing and then separately, eternally, do non-technical write-ups of what the technical work meant. I think someone checked and found that the executives didn’t even log into the system they had demanded.

      So: an LLM to generate the bullshit that no one wants to write or read, but everyone wants to pretend is important.