Forums

The forums ran from 2008-2020 and are now closed and viewable here as an archive.

Home Forums Other SQL random

  • This topic is empty.
Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • #186293
    fooman
    Participant

    Is there truly anything wrong with:

    SELECT *
    FROM tableName
    ORDER BY RAND
    LIMIT 1

    Assuming we have less than a million records haha
    I read a lot of articles online and a lot of them seem to go super deep into the subject but are built on the impression that you have a large dataset.

    And if this is incorrect, how would YOU do it efficiently in a table with ~50 records?

    #186305
    __
    Participant

    If it’s only 50 records, it _almost_* doesn’t really matter how you do it. However, be aware that the “trouble” point is well under one million. : )

    * that * is actually more likely to cause trouble. You should always explicitly specify the column names you want, and only select the ones you actually need.

    In your case, if any of those columns are large (e.g., TEXT columns), things might go slowly even though it’s a small number of records. Think of it this way: you’re selecting the entire table just to pick one record. That’s like putting every carton of eggs in your shopping basket but only buying one (at random, no less).

    You might consider selecting one random id first (make sure this column is indexed), and then selecting only that record.

    #186315
    fooman
    Participant

    Solid analogy haha :)

    How do you go about doing that if, say, the table has gaps between ids?
    Articles are great for learning and all that, but I’d love to see what you guys would actually do in real-world scenerios where you need to choose a random news article in a table of 200 on every page load or something.

    #186319
    __
    Participant

    This exact question is addressed in a book by Bill Karwin called SQL Antipatterns. If you want to get more serious about DBs, I highly recommend it. Regardless, I think the “random selection” question is published online (IIRC, that’s how I found the book in the first place).

    Assuming int PKs named “id“, this will select a record with a random id between 0 and the greatest id, or the next highest if that number doesn’t exist. Not as even a distribution (records right after a gap will be selected more frequently—usually not a big deal), but scales fantastically:

    select * from tableName t1
    join (
        select ceil(
            rand() * (select max(id) from tableName)
        ) randID
    ) t2
    where t1.id >= t2.randID
    order by t1.id limit 1;
    
Viewing 4 posts - 1 through 4 (of 4 total)
  • The forum ‘Other’ is closed to new topics and replies.