• CrabLangEnjoyer@lemmy.world
    link
    fedilink
    English
    arrow-up
    10
    ·
    1 year ago

    A current state of the art ai model from Microsoft can achieve acceptable quality with about 3 seconds of audio. Commercially available stuff like eleven labs about 30 minutes. But quality will obviously vary heavily but then again they’re using a low quality phone call so maybe not that important

    • sramder@lemmy.world
      link
      fedilink
      English
      arrow-up
      3
      ·
      1 year ago

      That’s downright scary :-) I think it took longer in the last Mission Impossible.

      30 minutes is still pretty minimal for the kind of targeted attack it sounds like this is used for. I suppose we all need to work with our families on code words or something.

      I went in thinking the article was a bit alarmist, but that’s clearly not the case. Thank for the insight.