
Not to sound too sales-y but if you are looking into clickhouse and are currently based on postgres, you might also want to check out timescale. Since we're just a postgres extension it's 100% compatible with existing systems but provides a lot of the same speed benefits as clickhouse for analytical queries.

Don't be confused by the timeseries branding.



I've been very confused by the timeseries branding - I had always thought timescale was purely about adding time series features to PostgreSQL. I didn't know the extension had other applications.

Looks like you've expanded into vector indexing - https://github.com/timescale/pgvectorscale - and an extension which bakes RAG patterns (including running prompts from SQL queries) into PostgreSQL: https://github.com/timescale/pgai


That's interesting. Our first extension (TimescaleDB) is great for time-series and real-time analytics.

And yes you are correct, pgvectorscale scales pgvector for embeddings, and pgai includes dev experience niceties for AI (eg automatic embedding management).

Would love to hear any suggestions on how we could make this less confusing. :-)


The name of the company is timescale. That’s what’s confusing.


People form initial impressions of a company and what they do, then file those away in their heads and it can be hard to get them to expand their mental model later on.

I guess that's why we have marketing teams!


Honestly, before I joined timescale I had the same impression. Since then I learned that a bunch of the improvements timescale brings (continuous aggregates, hybrid row/column storage and automatic partitioning) are much more widely useful than just IOT sensor data. There is definitely some room for improvement just in communication there.
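For readers unfamiliar with the term, the idea behind a continuous aggregate can be illustrated with a toy in-memory rollup: keep a small per-bucket summary up to date so queries read the summary instead of rescanning raw rows. This is only a sketch of the concept, not how TimescaleDB implements it (the real feature materializes results inside Postgres and refreshes them on a schedule); the class and method names below are hypothetical.

```python
from datetime import datetime


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


class HourlyRollup:
    """Hypothetical toy: maintains count and sum per hourly bucket."""

    def __init__(self):
        # Maps bucket start -> [row count, running sum of values]
        self._buckets = {}

    def add(self, ts, value):
        """Fold one new raw row into its hourly bucket."""
        entry = self._buckets.setdefault(hour_bucket(ts), [0, 0.0])
        entry[0] += 1
        entry[1] += value

    def average(self, ts):
        """Average for the hour containing ts, or None if no rows were seen."""
        entry = self._buckets.get(hour_bucket(ts))
        if entry is None:
            return None
        count, total = entry
        return total / count


rollup = HourlyRollup()
rollup.add(datetime(2024, 1, 1, 10, 5), 10.0)
rollup.add(datetime(2024, 1, 1, 10, 55), 20.0)
rollup.add(datetime(2024, 1, 1, 11, 0), 5.0)
print(rollup.average(datetime(2024, 1, 1, 10, 30)))
```

The point of the pattern is that a dashboard query touches one small entry per bucket rather than every raw row, which is why it is useful well beyond sensor data.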

Ajay already commented that he's open to new ideas on how to frame timescale. Personally, I've always thought of postgres as the best jack of all trades database. It's basically the best db if you don't 100% know what's the best choice yet. Timescale expands on that and enhances postgres's capabilities even further so that use-cases which would usually call for a second storage option (e.g. analytics, vector) end up working great with just postgres itself.

I'd personally love it if we also had a full-text offering akin to paradedb/pg_search so no one would ever need to host elasticsearch again. But it also doesn't make sense to spread the valuable postgres expert resources too thin.


Not at all too sales-y.

I'm all for keeping as much as possible in your initial Postgres deployment. If your team isn't having to work around things and things "just work" it's a wonderful conjunction of requirements and opportunity. It's incredible how much you can get out of a single instance, really remarkable. I'd also add it's still worth it even if there is a little pain.

But I've found that once I cross about 8-12 terabytes of data I need to specialize, and that a pure columnar solution like ClickHouse really begins to shine even compared to hybrid solutions given the amortized cost of most analytic workloads. This difference quickly adds up, and I think at that scale it matters enough to the developer experience that a switch is worth considering. Otherwise stick to Postgres and save your org some money and, more importantly, sanity.

You reach a point when you have enough queries doing enough work that the extra I/O and memory required by PAX/hybrid becomes noticeably more costly than pure columnar, at least for the workloads that I have experience with.

ClickHouse is now in my toolbox right alongside Postgres with things to deploy that I can trust to get the job done.


Great summary and spot on! Once you reach that multi-TB scale, Postgres is hard to scale. Yes, you could make Postgres scale further, but it would need deep expertise and architecting, and the experience wouldn’t be “it just works”. Ex-Citus here: we had PB scale deployments which needed multiple months of effort to implement and an expert team to manage. Eventually many (ex: Cloudflare, Heap) migrated to purpose built stores like ClickHouse and SingleStore. And not to forget storage costs. Sure, there was compression/columnar in Postgres/Citus too - but it didn’t fare well compared to pure columnar stores.

(Disclaimer: This is Sai from ClickHouse/PeerDB team)


YMMV but our largest internal dogfooded Timescale instance is 100s of terabytes

https://www.timescale.com/blog/how-we-scaled-postgresql-to-3...

(Post is a year old, IIRC the database is over one petabyte now)


Totally doable, of course. But I'll need fewer ClickHouse servers for the same amount of data, and I'll get more utilization out of them with faster query times. High selectivity combined with colocated row data means that hybrid storage formats will need to read more I/O, use more memory, and churn through more buffer for the same queries.
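A back-of-envelope way to see the I/O argument: if some share of the rows is still stored row-wise, a query that only needs two columns still has to read every column for those rows. The sketch below is a toy model under loud assumptions (made-up column widths, no compression, no indexes or caching, and a hybrid layout modelled simply as "some rows row-wise, the rest columnar"); it does not measure ClickHouse, Timescale, or any real system.

```python
# Toy model only: all widths and row counts are illustrative assumptions.

def bytes_scanned_columnar(num_rows, column_widths, needed_columns):
    """Pure columnar: read only the columns the query touches."""
    return num_rows * sum(column_widths[c] for c in needed_columns)


def bytes_scanned_hybrid(num_rows, rows_in_row_form, column_widths, needed_columns):
    """Assumed hybrid: row-wise rows read every column, the rest read as columnar."""
    full_row_width = sum(column_widths.values())
    columnar_rows = num_rows - rows_in_row_form
    return (rows_in_row_form * full_row_width
            + bytes_scanned_columnar(columnar_rows, column_widths, needed_columns))


# Hypothetical table: byte width per column.
column_widths = {"ts": 8, "user_id": 8, "url": 120, "status": 2, "bytes_sent": 8}
needed = ["ts", "bytes_sent"]       # columns the analytic query actually reads
num_rows = 1_000_000_000
rows_in_row_form = 100_000_000      # assumed 10% of rows still stored row-wise

columnar = bytes_scanned_columnar(num_rows, column_widths, needed)
hybrid = bytes_scanned_hybrid(num_rows, rows_in_row_form, column_widths, needed)
print(f"columnar: {columnar:,} bytes, hybrid: {hybrid:,} bytes, "
      f"ratio: {hybrid / columnar:.4f}")
```

Real systems blur this picture considerably (compression ratios, chunk exclusion, and caching all matter), so treat the ratio as an intuition pump rather than a prediction.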


At the risk of getting my CEO angry (sorry Ajay :D): ClickHouse is great. But it also means hosting another database and losing ACID compliance. The question is often not ClickHouse vs Timescale but Postgres + ClickHouse vs just Timescale.

In general the argument I was originally trying to make is not to never use ClickHouse, I think it's a great product. But if you already are on postgres, it might just be easier to give Timescale a try than to adapt everything to work with ClickHouse right away. There is more to consider here than raw query speed.

And while I'm sure the systems behave differently scale and speed wise, I also wouldn't say Timescale loses straight up. There are situations where Timescale is faster, and if it really breaks down for a use-case nothing stops you from still doing the postgres to ClickHouse migration. In the end timescale is just a better postgres, so there is no lock in.


A few other things I can think of as well

- you'd probably at least want a read replica so you're not running queries on your primary db

- if you're going to the trouble of setting up a column store, it seems likely you're wanting to integrate other data sources so need some ETL regardless

- usually column store is more olap with lower memory and fast disks whereas operational is oltp with more memory and ideally less disk io usage

I suppose you could get some middle ground with PG logical rep if you're mainly integrating PG data sources


We use TSDB and are pretty happy with it.

But it is much less performant than CH.


Glad you like it! And yes I'm not saying Timescale is better than Clickhouse generally. But it does avoid having to host a second database, you keep ACID compliance, your application code doesn't have to change... There is more to analytics than raw speed, and even in raw speed we're slowly catching up. For some types of queries TS actually performs better than Clickhouse afaik, but I'm not a benchmarking expert so take it with a grain of salt.

Always choose the right tool for the right job, Clickhouse is amazing software. I just wanted to mention that if someone currently runs analytics queries via postgres and runs into performance issues, trying out timescale doesn't really hurt and might be a simpler solution than migrating to Clickhouse.


Timescale is a very nice product but not at all close to clickhouse in terms of speed based on my own tests on very large tables (billions of rows)


How does Timescale compare to other extensions like Citus and Hydra/DuckDB?


I'm not an expert on either of those, but this is my take anyway: Citus is distributed and afaik just eventually consistent, for a lot of folks that's simply not an option as their application relies on ACID compliance.

As far as I understand Hydra + DuckDB, it is a higher level add-on onto postgres than timescale is. This is on the one hand nice since you can just put it into an existing database without any migration effort whatsoever. But this also means they likely don't really interact with systems like the storage engine. For example, Timescale's deeper integration allows us to actually store the data differently on disk, which allows for saving space via compression.


for one thing, depending on your licensing constraints:

- Citus is AGPLv3 https://github.com/citusdata/citus/blob/v13.0.1/LICENSE

- Hydra is Apache 2 https://github.com/hydradatabase/columnar/blob/v1.1.2/LICENS...

- Timescale is mostly Apache 2 https://github.com/timescale/timescaledb/blob/2.18.2/LICENSE


How do you guys fare for ad tech aggregation? We have something similar to this: https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

But actively trying to simplify and remove as many gears as possible.


I didn't expect so many comments. I'm about to fly across the Atlantic and can't answer appropriately to everyone right now without internet but will try to do it justice once I'm home.


Could you go into the details of how one might go about replicating a PG db to a tsdb one? I assume application level would not be the most simple/reliable?


As other folks have already mentioned, since timescale is just a postgres extension you can install it on your existing postgres instance and don't need to migrate anything.

Of course if you come to our cloud you're going to have to do some sort of migration effort but that shouldn't be more complicated than going from one postgresdb to another.
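As a rough sketch of what "install it on your existing instance" can look like in practice, the helper below just assembles the SQL you would run. Assumptions: the TimescaleDB extension is already installed on the server, the table and column names are hypothetical placeholders, and the older positional form of `create_hypertable` with `migrate_data => true` is used for a table that already holds rows (newer releases also offer a different calling style, so check the docs for your version).

```python
def timescale_enable_statements(table, time_column):
    """Build SQL to turn an existing Postgres table into a hypertable.

    Names are interpolated directly, so pass only trusted, hard-coded
    identifiers here (this is a sketch, not an injection-safe builder).
    """
    return [
        # Load the extension into the current database (no-op if present).
        "CREATE EXTENSION IF NOT EXISTS timescaledb;",
        # Convert the existing table in place; migrate_data moves current rows
        # into time-partitioned chunks.
        f"SELECT create_hypertable('{table}', '{time_column}', migrate_data => true);",
    ]


# Hypothetical table "page_views" with a timestamp column "created_at".
for statement in timescale_enable_statements("page_views", "created_at"):
    print(statement)
```

The statements would be run with whatever Postgres client the application already uses; on a large table the conversion can take a while, so it is worth trying on a copy first.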


you don't

the data stays in PGDB - TSDB is an extension installed onto the database server


Exactly. You can have the best of both worlds with Timescale.



