
> Static linking prevents that, at the cost of disk space and memory [...]

Some deduplication work for disk and memory pages should be able to help here?

(It might need compiler and linker support to produce binaries that are easier to deduplicate. E.g. you might want to disable unused-function elimination for your static libraries and restrict whole-program link-time optimization (LTO)? Or you might want your deduplicator to be a bit smarter and store diffs internally, like git does? Or your build system could work this way in the first place and produce diffs against common bases.

I don't know what's optimal or even practical, but you can do static linking and still save memory and disk space.)



In principle, yes? But it's a much more roundabout way of saving space. Reducing or avoiding optimizations like LTO or unused function elimination is at odds with minimizing binary sizes and maximizing performance. It's asking developers to prioritize the disk usage of the system as a whole over the performance of their own software.


You are right. But it's the same kind of roundabout approach that git itself uses.

Older version control systems like Subversion stored diffs directly.

Users of git care a lot about diffs between versions, and typically treat a specific commit as if it were a diff. Commands like 'git cherry-pick' reinforce that notion.

However, internally each git commit is a complete snapshot of the state of the repository. Diffs are computed on the fly as derived data.

Going one level deeper still, git does store most snapshots as deltas (in its packfiles). But to close the circle: those deltas have nothing to do with the diffs that users see.

This sounds very roundabout, but it results in a simple, performant system that doesn't ask the user to make compromises.

My suggestion for static libraries was along the same lines, if a bit clumsier.


I think deduping distinct-but-equal memory pages and marking them copy-on-write is a relatively new kernel feature, arriving easily 20 years after shared libraries became common.


In this case you wouldn't need copy-on-write, because executable pages aren't writeable these days. https://en.wikipedia.org/wiki/W%5EX


I'd say from a kernel perspective they should be copy-on-write.

Firstly, the generalized kernel mechanism that scans for equal pages and dedupes them (which, by the way, is disabled by default on Linux) probably doesn't care whether it's working on data or code. Its primary use case at introduction was KVM: from the host's point of view, a guest kernel's code pages are ordinary memory that the guest writes to at some point, such as when it loads them from disk.

Second, someone can use mprotect(2) to make them writeable.



