
It strikes me, reading this, that edited archives of authors' works are an antiquated concept. I can understand that back in the 1800s (or even the 1930s) having a giant box of paper would have been daunting to sort out, prepare for publication and then release. But in 2024? It seems ridiculous. Dump it all in SQLite and put it onto the Internet. Let AI sort it out.

I really wish a lot of antique content was available this way. I like to watch YouTube channels like Esoterica [1], and the host often laments that scholarly editions of ancient works are either unavailable or only available with much effort at exorbitant prices. We are living in a time where I should be able to have access to the entire Nag Hammadi library as high-quality images that I can feed into an LLM for casual analysis. Imagine the entire Vatican Library available in a format similar to The Pile.

What a treasure it would be to have an LLM that is trained on every single piece of philosophical, religious, political, economic, etc. writing from the earliest Sumerian clay tablets to the current copyright cut-off date.

1. https://www.youtube.com/@TheEsotericaChannel



> What a treasure it would be to have an LLM that is trained on every single piece of philosophical, religious, political, economic, etc. writing from the earliest Sumerian clay tablets to the current copyright cut-off date.

It strikes me that such an LLM would have weights tuned only for predicting the languages that were put into it. It would be unable to connect those texts to modern ideas and modern language.

You can ask it about Vatican texts, but only in Vatican language.


I suppose that is only if you assume I meant "only trained on ..." which is a limited reading of that idea.

But even if you were to make that assumption, I feel pretty confident that an LLM trained on 5000 years of recorded language, from the Egyptian Hieroglyphs, through Hellenic Greek, through Shakespeare and including all text in all languages up to 1928, would be a pretty broad base of training.


Challenge: Tell me you have never read a thoughtful and contextualizing scholarly archival study without saying so.

Solution: "Dump it in a database and let AI sort it out"


If the available options are:

1. Purchase an out-of-print copy of a scholarly archival study on eBay for $100+

2. Load the original raw contents into an LLM and perform the analysis myself

I think the freedom to choose would be a massive benefit. It doesn't prevent you from doing what you want to do.



