• NιƙƙιDιɱҽʂ@lemmy.world · 2 days ago

    Right, other words see higher attention as the model builds a sentence, steering it toward where it “wants” to go, but LLMs literally take a series of words and spit out the next one. There’s a lot more going on under the hood, as you said, but fundamentally that is the algorithm. Repeat it over and over and you get a sentence.
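    A minimal sketch of that loop in Python, using GPT-2 via Hugging Face transformers purely as a stand-in model (the prompt and token count are just illustrative):

    ```python
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    # Start from a prompt, then repeatedly ask the model for the
    # single most likely next token and append it.
    ids = tokenizer.encode("As the wind blows,", return_tensors="pt")
    for _ in range(12):                    # extend by 12 tokens
        with torch.no_grad():
            logits = model(ids).logits     # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()   # greedy: most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(ids[0]))
    ```

    Everything interesting (the attention you mention) lives inside that one `model(ids)` call; the outer loop really is just “append the next word and go again.”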

    If it’s writing a poem about flowers and ends the first part on “As the wind blows,” sure as shit “rose” is going to carry significant attention within the model, even if it isn’t the immediate next word, along with the strongly associated words that build the bridge to it.
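    To make “significant attention” concrete, here’s a toy scaled dot-product attention in Python. The vectors and word labels are made up; in a real model they come from learned projections over thousands of dimensions, so this only shows the mechanics:

    ```python
    import torch

    query = torch.tensor([[1.0, 0.0, 1.0]])           # the current position
    keys  = torch.tensor([[0.9, 0.1, 1.0],            # "rose"  (similar direction)
                          [0.0, 1.0, 0.0],            # "the"   (unrelated)
                          [0.1, 0.9, 0.2]])           # "blows" (unrelated)

    scores  = query @ keys.T / keys.shape[-1] ** 0.5  # scaled dot products
    weights = torch.softmax(scores, dim=-1)
    print(weights)  # the "rose"-like key gets the largest share of attention
    ```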

    • kromem@lemmy.world · 1 day ago

      That the attention mechanism works this way was at odds with the common wisdom among frontier researchers.

      Yes, the final step of the network is producing the next token.

      But the fact that intermediate steps have now been shown to plan for and target specific future results is a much bigger deal than you seem to appreciate.
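      The kind of probing behind such findings can be sketched. One simple variant (“logit lens” style) projects each layer’s hidden state through the unembedding and asks which token it already favors; GPT-2 here is only a stand-in for the much larger models the published results are about:

      ```python
      import torch
      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.eval()

      ids = tokenizer.encode("As the wind blows, the", return_tensors="pt")
      with torch.no_grad():
          out = model(ids, output_hidden_states=True)

      # out.hidden_states: one tensor per layer (plus the embeddings),
      # each shaped (1, seq_len, hidden_dim)
      for layer, h in enumerate(out.hidden_states):
          logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
          top = tokenizer.decode([logits.argmax().item()])
          print(f"layer {layer:2d} -> {top!r}")
      ```

      The published planning results use more sophisticated probes than this, but the idea is the same: read the intermediate state, not just the final token.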

      If I ask you to play chess and you plan only one move ahead vs. n moves ahead, you are going to play very different games, even though in both cases you only make one immediate move at a time.
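      A toy version of that difference in Python, with a made-up subtraction game standing in for chess (take 1–3 stones; whoever takes the last stone wins). Both players output exactly one move per turn, but the deeper search plays a different game:

      ```python
      def negamax(pile, depth):
          """Score the position for the player to move; return (score, move)."""
          if pile == 0:
              return -1, None          # opponent took the last stone: we lost
          if depth == 0:
              return 0, None           # search horizon reached: call it neutral
          best_score, best_move = -2, None
          for m in (1, 2, 3):
              if m > pile:
                  break
              score = -negamax(pile - m, depth - 1)[0]
              if score > best_score:
                  best_score, best_move = score, m
          return best_score, best_move

      # Depth-1 vs depth-10 choices from the same positions:
      for pile in (5, 6, 7, 8):
          print(pile, negamax(pile, 1)[1], negamax(pile, 10)[1])
      ```

      At depth 1, every non-winning move looks identical, so the player just grabs the first legal one; the deeper search steers toward leaving the opponent a losing pile. Same one-move-at-a-time output, very different play.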