• 0 Posts
  • 21 Comments
Joined 5 months ago
Cake day: July 10th, 2024

  • For the non-roboticists: SLAM = Simultaneous Localization And Mapping.

    In robot navigation, we often face the problem of getting a grasp of the environment and of the robot’s position within it. It’s easier if a map is already provided, along with some sort of external observer who knows where the robot is relative to that map.

    Since people don’t usually come into your home to map it out and install sensors in order to locate the robot, SLAM is the way to go. While the robot moves through an environment, a map of that environment is built, and by utilizing some fancy techniques based on sensor data (e.g., from cameras, mic+loudspeaker, LIDAR or whatever), it is possible to infer the robot’s position at the same time.
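    To make this a bit more concrete, here is a deliberately tiny sketch, assuming a robot that moves along a line and measures its offset to a single landmark. Robot position and landmark position are estimated jointly with a Kalman filter, which is the core of classic EKF-SLAM (the models here are linear, so the "extended" part isn't needed). The 1D setup, the noise values and the function name `slam_1d` are illustrative assumptions, not how any real robot does it.

```python
def slam_1d(controls, measurements, q=0.01, r=0.25):
    """Jointly estimate robot position x and landmark position m on a line.

    controls[k]:     commanded displacement at step k (odometry)
    measurements[k]: measured offset z = m - x after that move
    q, r:            assumed motion and measurement noise variances
    """
    # State [x, m]: the robot starts at the origin (known), the landmark is unknown.
    mu = [0.0, 0.0]
    P = [[0.0, 0.0], [0.0, 1e6]]  # huge variance = "no idea where the landmark is"

    for u, z in zip(controls, measurements):
        # Predict: the robot moves by u, so only its own uncertainty grows.
        mu[0] += u
        P[0][0] += q

        # Update with the measurement model z = m - x, i.e. H = [-1, 1].
        innovation = z - (mu[1] - mu[0])
        s = P[0][0] - P[0][1] - P[1][0] + P[1][1] + r  # H P H^T + R
        k = [(P[0][1] - P[0][0]) / s, (P[1][1] - P[1][0]) / s]  # Kalman gain P H^T / S
        mu = [mu[0] + k[0] * innovation, mu[1] + k[1] * innovation]

        # Covariance update P <- (I - K H) P
        a = [[1.0 + k[0], -k[0]], [k[1], 1.0 - k[1]]]
        P = [[sum(a[i][j] * P[j][c] for j in range(2)) for c in range(2)]
             for i in range(2)]
    return mu, P


mu, P = slam_1d(controls=[1, 1, 1], measurements=[4, 3, 2])
print(mu)  # robot estimate near 3, landmark estimate near 5
```

    Note that the landmark's position is never given to the filter: it is inferred from odometry and measurements alone, while those same measurements keep the robot's own estimate in check. That's the "simultaneous" part.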


  • God forbid people have some self expression

    They do indeed forbid it.

    10 "If you go to battle against your enemies, and the LORD your God delivers them into your control, you may take some prisoners captive. 11 If you see among the prisoners a beautiful woman and you desire her, then you may take her as your wife. 12 Bring her to your house, but shave her head and trim her nails

    Deuteronomy 21

    Oh man, religions are batshit crazy.



  • My point is that the following statement is not entirely correct:

    When AI systems ingest copyrighted works, they’re extracting general patterns and concepts […] not copying specific text or images.

    One obvious flaw in that sentence is the blanket statement about AI systems. There are huge differences between the different realms of AI, and failing to address them, even by briefly mentioning them, disqualifies the author regarding factual correctness. For example, there is a plethora of non-generative AI systems, i.e., those that do not generate text, audio or images/videos but merely operate as, for instance, a classifier or clustering algorithm. Without further modifications, these are not intended to replicate data similar to their inputs but rather to provide insights.
    However, I can overlook this, as the author might just not have thought about it in that very moment of writing.
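    A minimal illustration of the non-generative case, assuming a nearest-centroid classifier (chosen here only for brevity; the helper names are hypothetical): whatever it was trained on, its only output is a class label, not data resembling its inputs.

```python
from statistics import fmean

def fit_centroids(samples, labels):
    """Average the feature vectors of each class."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [fmean(col) for col in zip(*xs)] for y, xs in groups.items()}

def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

centroids = fit_centroids([[0, 0], [0, 1], [5, 5], [6, 5]],
                          ["cat", "cat", "dog", "dog"])
print(predict(centroids, [5.5, 4.0]))  # prints: dog (a label, not a sample)
```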

    Next:
    While it is true that transformer models like ChatGPT try to learn patterns, namely the most likely next token in a sequence of contextually coherent data, given the right context it is quite possible that they reproduce their training data nearly or even completely verbatim, as I’ve demonstrated before. The less data is available for a specific context to generalise from, the more likely it becomes that the model just replicates its training data. This is in principle fine, because this is what such models are designed to do: draw the best possible conclusions from the available data to predict the next output in a sequence. (That’s one of the reasons why they need such an insane amount of data to be trained on.)
    This can ultimately lead to occurrences of indeed “copying specific texts or images”.
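    The effect of sparse context can be reproduced with something much simpler than a transformer. The sketch below (my own illustration, not how ChatGPT works internally) uses a word-level trigram model with greedy decoding: where the training data offers only one continuation for a context, the "most likely next token" is exactly what the training text contained, so generation replays it verbatim.

```python
from collections import Counter, defaultdict

def train(corpus, n=3):
    """Count which word follows each (n-1)-word context in the training texts."""
    table = defaultdict(Counter)
    for text in corpus:
        words = text.split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
    return table

def generate(table, prompt, n=3, max_words=20):
    """Greedy decoding: repeatedly append the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = table.get(tuple(words[-(n - 1):]))
        if not followers:
            break  # unseen context: the model has nothing to go on
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the sofa",
    # A "rare" context: this sentence is the only data the model has about robots.
    "the robot builds a map while it localizes itself in the unknown room",
]
table = train(corpus)
print(generate(table, "the cat sat on the"))  # frequent context: majority continuation
print(generate(table, "the robot"))           # rare context: the training sentence, verbatim
```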

    but the fact that you prompted the system to do it seems to kind of dilute this point a bit

    It doesn’t matter whether I prompted it directly. I set the right context to achieve this kind of behaviour, because context matters most for transformer models. Directly prompting it to do that was just an easy way of setting the required context. I’ve occasionally observed ChatGPT replicating identical sentences from some (copyright-protected) scientific literature when I used it to get an overview of a specific topic and also had books or papers about that topic on hand. The latter demonstrates again that transformers become more likely to replicate training data the more “specific” a context becomes, i.e., the less training data is available for that context compared to others.






  • If we’re speaking of transformer models like ChatGPT, BERT or whatever: They don’t have memory at all.

    The closest thing that resembles memory is the maximum accepted length of the input sequence (the context window) combined with the attention mechanism. (If left unmodified, though, the computation time grows quadratically with the length of that sequence.) And since the attention weights are computed from learned parameters, in practice it is probable that earlier tokens of the input sequence get basically ignored the further they lie “in the past”, as they usually do not contribute much to the current context.
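    For reference, the attention mechanism in the original transformer is scaled dot-product attention, softmax(QKᵀ/√d)·V. The pure-Python sketch below is a simplification on my part (single head, no learned projections, no masking, hypothetical function names). It makes the quadratic cost visible: each of the n query positions is scored against each of the n key positions, so n² scores are computed.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Also returns how many query-key scores were computed, to show the n * n cost.
    """
    d = len(keys[0])
    outputs, n_scores = [], 0
    for q in queries:  # one row per query position ...
        # ... scored against every key position
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        n_scores += len(scores)
        weights = softmax(scores)  # attention weights for this query, summing to 1
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs, n_scores

for n in (4, 8, 16):
    x = [[math.sin(i), math.cos(i)] for i in range(n)]  # toy 2-d "token" vectors
    _, cost = attention(x, x, x)  # self-attention with Q = K = V = x
    print(n, cost)  # doubling n quadruples the number of scores
```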

    “In the past”: Transformers technically “see” the whole input sequence at once. But they are equipped with positional encodings, which incorporate spatial and/or temporal ordering into the input sequence (e.g., the position of words in a sentence). That way they can model sequential relationships such as those found in natural language (sentences), videos, movement trajectories and other kinds of contextually coherent sequences.
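    One common way to do this is the sinusoidal positional encoding from the original transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Many models instead learn their position embeddings, so treat this as one example rather than what every transformer does. A minimal sketch, assuming an even d_model:

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: even dimensions use sin, odd dimensions use cos."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model // 2):
            angle = pos / 10000 ** (2 * i / d_model)
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table

pe = positional_encoding(seq_len=50, d_model=16)
```

    The resulting vectors are added element-wise to the token embeddings, so the same word at two different positions enters the model as two different vectors.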