SQL random - CSS-Tricks

This topic is empty.

Viewing 4 posts - 1 through 4 (of 4 total)

Author

Posts
October 15, 2014 at 7:02 am #186293

fooman
Participant

Is there truly anything wrong with:

SELECT *
FROM tableName
ORDER BY RAND
LIMIT 1

Assuming we have less than a million records haha
I read a lot of articles online and a lot of them seem to go super deep into the subject but are built on the impression that you have a large dataset.

And if this is incorrect, how would YOU do it efficiently in a table with ~50 records?

October 15, 2014 at 8:06 am #186305

__
Participant

If it’s only 50 records, it _almost_* doesn’t really matter how you do it. However, be aware that the “trouble” point is well under one million. : )

* that * is actually more likely to cause trouble. You should always explicitly specify the column names you want, and only select the ones you actually need.

In your case, if any of those columns are large (e.g., TEXT columns), things might go slowly even though it’s a small number of records. Think of it this way: you’re selecting the entire table just to pick one record. That’s like putting every carton of eggs in your shopping basket but only buying one (at random, no less).

You might consider selecting one random id first (make sure this column is indexed), and then selecting only that record.

October 15, 2014 at 10:03 am #186315

fooman
Participant

Solid analogy haha :)

How do you go about doing that if, say, the table has gaps between ids?
Articles are great for learning and all that, but I’d love to see what you guys would actually do in real-world scenerios where you need to choose a random news article in a table of 200 on every page load or something.

October 15, 2014 at 10:29 am #186319
__
Participant
This exact question is addressed in a book by Bill Karwin called SQL Antipatterns. If you want to get more serious about DBs, I highly recommend it. Regardless, I think the “random selection” question is published online (IIRC, that’s how I found the book in the first place).

Assuming int PKs named “id“, this will select a record with a random id between 0 and the greatest id, or the next highest if that number doesn’t exist. Not as even a distribution (records right after a gap will be selected more frequently—usually not a big deal), but scales fantastically:
```
select * from tableName t1
join (
    select ceil(
        rand() * (select max(id) from tableName)
    ) randID
) t2
where t1.id >= t2.randID
order by t1.id limit 1;
```
Author

Posts

Viewing 4 posts - 1 through 4 (of 4 total)

The forum ‘Other’ is closed to new topics and replies.