It strikes me, reading this, that edited archives of authors' works are an antiqua...

delusional · on July 2, 2024

> What a treasure it would be to have an LLM that is trained on every single piece of philosophical, religious, political, economic, etc. writing from the earliest Sumerian clay tablets to the current copyright cut-off date.

It strikes me that such an LLM would have weights tuned only for predicting the languages that were put into it. It would be unable to connect those texts to modern ideas and modern language.

You could ask it about Vatican texts, but only in Vatican language.

zoogeny · on July 2, 2024

I suppose that is only if you assume I meant "only trained on ...", which is a limited reading of that idea.

But even if you were to make that assumption, I feel pretty confident that an LLM trained on 5000 years of recorded language, from Egyptian hieroglyphs, through Hellenic Greek, through Shakespeare, and including all text in all languages up to 1928, would have a pretty broad training base.

czarit · on July 2, 2024

Challenge: Tell me you have never read a thoughtful and contextualizing scholarly archival study without saying so.

Solution: "Dump it in a database and let AI sort it out"

zoogeny · on July 2, 2024

If the available options are:

1. Purchase an out-of-print copy of a scholarly archival study on eBay for $100+

2. Load the original raw contents into an LLM and perform the analysis myself

I think the freedom to choose would be a massive benefit. It doesn't prevent you from doing what you want to do.