Follow

New article: From SQL to Nested Data Structures.
dev.yorhel.nl/doc/sqlobject

@ayo Only skimmed through the article, so sorry if you already mentioned it and I didn't notice, but:

In my (small) experience, queries with a "WHERE id IN (... lots... of...ids...)" part tend to be very slow, at least in Postgres.

Joins tend to work better, and since you do the grouping on the application side anyway, you may as well modify the initial query to be "SELECT * FROM games AS g JOIN releases AS r ON r.game_id = g.game_id", which should be faster.

Also, if you "ORDER BY game_id" on the db side, you could then optimize the aggregation by taking advantage of releases with the same game_id being next to each other in the query results.

@Wolf480pl Interesting, using "IN(lots of ids)" has been really fast for me, and that's what I've been using on VNDB for the past 12 years. There were a few points where Postgres had trouble dealing with >50 ids in one query, but that was fixed at some point.

The problem with that join is that it either works only for 1-to-1 mappings, or you duplicate all the game columns with the results. Fetching multiple nested lists that way gets complex and slow pretty fast.

ORDER BY optimization: Possibly, but such optimizations tend to make the code less general, and perl hash tables have been fast enough for me. You'll usually don't want to fetch more data from the SQL database than you're going to display to the user anyway, so the performance improvement may not matter much.

@ayo
>50 ids in one query

I'm talking more like 500 ids in one query. Well, I guess your dataset doesn't have as big fanouts as mine, so it works fine in your case.

That in my case the list of IDs was generated by Django ORM (i.e. lots of Python code) probably doesn't help either.

@ayo as for duplicating game columns, AFAIK it doesn't hurt if there isn't too many of them, or if you just need one or two and select exactly those which you need.

@Wolf480pl @ayo remind me, are we still on pg 9.4 with that?
@Wolf480pl @ayo that setup was especially effective when you were a super user but the application still did `WHERE id IN` on 1500 contests in batches of 500

@314 @ayo
context:
The query's purpose was to check which contests you're allowed to see.
Being a superuser means it doesn't need to query the database because you're allowed to see everything.

@Wolf480pl @ayo you mean, should not need to query the database, but thanks to the recently introduced third permission type, it needed a better fix than "if user.is_super_user"

@Wolf480pl 50 was the cutoff where Postgres used to have trouble, I have a bunch of pages with some in the 1000 range. This character listing listing for example has lots of traits on one page... vndb.org/v16106/chars#chars

@ayo wat... a VN with 160 characters?

Also, which join are you talking about? character -< caracter_trait?
That'd be 160 ids... still a lot tho.

@Wolf480pl Hmm, indeed not in the thousands. I thought I had a much larger IN() somewhere, but can't find it now.

Maybe that explains why it hasn't been a performance issue for me, I guess. :blobcat:

Sign in to participate in the conversation
niu.moe

Welcome to your niu world ! We are a cute and loving international community O(≧▽≦)O !