@sweng

sweng@programming.dev · 20 days ago

Isn’t that the point of the article? It’s not open-source currently, but will be, once the AGPL option is added.

sweng@programming.dev · 3 months ago

They can be made, and are being made. See e.g. the state of accessibility on GNOME.

sweng@programming.dev · 4 months ago

I think you are replying to the wrong person?

I did not say it helps with accuracy. I did not say LLMs will get better. I did not even say we should use LLMs.

But even if I did, none of your points are relevant to the Firefox use case.

sweng@programming.dev · 4 months ago

Wikipedia is no less reliable than other content. There’s even academic research about it (no, I will not dig for sources now, so feel free to not believe it). But factual correctness only matters for models that deal with facts: for a translation model, for example, it does not matter.

Reddit has a massive amount of user-generated content it owns, e.g. comments. Again, the factual correctness only matters in some contexts, not all.

I’m not sure why you keep mentioning LLMs since that is not what is being discussed. Firefox has no plans to use some LLM to generate content where facts play an important role.

sweng@programming.dev · edit-2 4 months ago

What do you mean by “full set of data”?

Obviously you cannot train on 100% of the material ever created, so you pick a subset. There is a lot of permissively licensed content (e.g. Wikipedia) and content you can license (e.g. Reddit). While not sufficient for an advanced LLM, it certainly is for smaller models that do not need wide knowledge.

sweng@programming.dev · 4 months ago

I’d say the main differences are at least:

package availability
update frequency
backporting
packaging philosophy (e.g. plain upstream vs customizations, including all functionality in a single package vs splitting out optional features)
default configuration for packages

sweng@programming.dev · edit-2 4 months ago

Feel free to assume that, but don’t claim an assumption as a fact.

You recommended using native package managers. How many of them have been audited?

sweng@programming.dev · edit-2 4 months ago

You know what else we shouldn’t assume? That it doesn’t have a security feature. And we additionally shouldn’t then go around posting that incorrect assumption as if it were a fact. You know, like you did.

sweng@programming.dev · 4 months ago

There is no general copyright issue with AIs. It completely depends on the training material (if even then), so it’s not possible to make blanket statements like that. Banning technology, because a particular implementation is problematic, makes no sense.

sweng@programming.dev · edit-2 4 months ago

I’m confused why you think it would be anything else, and why you are so dead set on this. Repos include a signing key. There is an option to skip signature checking. And you think that signature checking is not used during downloads, despite this?

OK, here are a few issues related to signatures being checked by default when downloading:
https://github.com/flatpak/flatpak/issues/4836
https://github.com/flatpak/flatpak/issues/5657
https://github.com/flatpak/flatpak/issues/3769
https://github.com/flatpak/flatpak/issues/5246
https://askubuntu.com/questions/1433512/flatpak-cant-check-signature-public-key-not-found
https://stackoverflow.com/questions/70839691/flatpak-not-working-apparently-gpg-issue

Flatpak repos are signed and the signature is checked when downloading.

It’s OK to be wrong. Dying on this hill seems pretty weird to me.

sweng@programming.dev · edit-2 4 months ago

From the page:

It is recommended that OSTree repositories are verified using GPG whenever they are used. However, if you want to disable GPG verification, the --no-gpg-verify option can be used when a remote is added.

That is talking about downloading as well. Yes, you can turn it off, but you can usually do the same with native package managers, e.g. pacman: https://wiki.archlinux.org/title/Pacman/Package_signing

sweng@programming.dev · edit-2 4 months ago

That doesn’t seem to be true? https://flatpak-testing.readthedocs.io/en/latest/distributing-applications.html#gpg-signatures

sweng@programming.dev · 4 months ago

In what way don’t they “securely download”?

sweng@programming.dev · 4 months ago

Do you happen to know where? Searching seems to give no results.

sweng@programming.dev · edit-2 4 months ago

In theory, if you have the inputs, you have reproducible outputs, modulo perhaps some small deviations due to non-deterministic parallelism. But if those effects are large enough to make your model perform differently you already have big issues, no different than if a piece of software performs differently each time it is compiled.
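
To illustrate what I mean, here is a minimal sketch (a toy linear model, nothing like a real LLM training setup; the `train` function and its parameters are made up for this example). With the same data and the same seed, every source of randomness is pinned down, so two runs give bit-identical weights:

```python
import random


def train(seed, data, epochs=50, lr=0.01):
    """Fit y = w*x + b with plain SGD. All randomness comes from `seed`."""
    rng = random.Random(seed)
    # Random initial weights, drawn from the seeded generator.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        # The sample order is shuffled, but deterministically given the seed.
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


# Toy "training material": points on the line y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

run_a = train(42, DATA)
run_b = train(42, DATA)

print(run_a == run_b)  # same inputs and seed give identical weights
```

This is single-threaded, so it sidesteps the parallelism caveat entirely. On real hardware you would also have to pin library versions and deterministic kernels, which is the "modulo small deviations" part.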

sweng@programming.dev · 4 months ago

The analogy works perfectly well. It does not matter how common it is. Patching binaries is very hard compared to e.g. LoRA. But it is still essentially the same thing: making a derivative work by modifying parts of the original.
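
For anyone unfamiliar with LoRA, a rough sketch of the merge step (pure Python, toy 3x3 matrix; `apply_lora` and `scale` are names I made up here, and the real thing trains the small factors with gradient descent inside a neural network). The original weights are left untouched and a low-rank "patch" is added on top:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w, b, a, scale=1.0):
    """Return w + scale * (b @ a) without modifying w itself."""
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Original ("closed") weights: a 3x3 identity matrix as a stand-in.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# The patch: rank-1 factors, 3x1 and 1x3, i.e. 6 numbers instead of 9.
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, -1.0]]

W_patched = apply_lora(W, B, A)
print(W_patched)
```

Just like a binary patch, the small factors are useless on their own and only make sense relative to the original they modify.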

sweng@programming.dev · edit-2 4 months ago

I don’t see your point. What “source” would I use for the Mona Lisa? For LLMs, I could reproduce them given the original inputs.

Creating those inputs may be an art, but so could any piece of code. No one claims that code being elegant disqualifies it from being open source.

sweng@programming.dev · 4 months ago

How is that different from e.g. patching a closed-source binary? There are plenty of community patches to old games to e.g. make them work on newer hardware. Architectural independence seems irrelevant; it’s no different from e.g. Java bytecode.

sweng@programming.dev · 4 months ago

What counts as source, and what doesn’t, would depend on the format.

You can create a picture by hand, using no input data.

I challenge you to do the same for model weights. If you truly just sit down and type away numbers in a file, then yes, the model would have no further source. But that is not something that can be done in practice.

sweng@programming.dev · edit-2 4 months ago

“Open source” and “source available” are different things. See e.g. https://opensource.org/osd and https://opensource.com/article/18/2/coining-term-open-source-software