- This topic is empty.
October 15, 2014 at 7:02 am #186293foomanParticipant
Is there truly anything wrong with:
ORDER BY RAND
Assuming we have less than a million records haha
I read a lot of articles online and a lot of them seem to go super deep into the subject but are built on the impression that you have a large dataset.
And if this is incorrect, how would YOU do it efficiently in a table with ~50 records?October 15, 2014 at 8:06 am #186305__Participant
If it’s only 50 records, it _almost_* doesn’t really matter how you do it. However, be aware that the “trouble” point is well under one million. : )
*is actually more likely to cause trouble. You should always explicitly specify the column names you want, and only select the ones you actually need.
In your case, if any of those columns are large (e.g.,
TEXTcolumns), things might go slowly even though it’s a small number of records. Think of it this way: you’re selecting the entire table just to pick one record. That’s like putting every carton of eggs in your shopping basket but only buying one (at random, no less).
You might consider selecting one random
idfirst (make sure this column is indexed), and then selecting only that record.October 15, 2014 at 10:03 am #186315foomanParticipant
Solid analogy haha :)
How do you go about doing that if, say, the table has gaps between ids?
Articles are great for learning and all that, but I’d love to see what you guys would actually do in real-world scenerios where you need to choose a random news article in a table of 200 on every page load or something.October 15, 2014 at 10:29 am #186319__Participant
This exact question is addressed in a book by Bill Karwin called SQL Antipatterns. If you want to get more serious about DBs, I highly recommend it. Regardless, I think the “random selection” question is published online (IIRC, that’s how I found the book in the first place).
intPKs named “
id“, this will select a record with a random id between 0 and the greatest id, or the next highest if that number doesn’t exist. Not as even a distribution (records right after a gap will be selected more frequently—usually not a big deal), but scales fantastically:
select * from tableName t1 join ( select ceil( rand() * (select max(id) from tableName) ) randID ) t2 where t1.id >= t2.randID order by t1.id limit 1;
- The forum ‘Other’ is closed to new topics and replies.