• # May 10, 2013 at 9:43 am

    I’m just curious. How do large-scale websites like FB and Twitter store images and go about retrieving them?
    They are named in what looks like a hash of some sort, including the directory structure.

    I’m just being curious here. If anyone knows anything as to how and why they do what they do please point me to where I can learn!

    # May 10, 2013 at 10:32 am

    Well, here’s one of mine:

    I think there might be some clues in that string.

    # May 10, 2013 at 10:36 am

    I think the 858221159 is the secret, right, Paulie?

    (On topic: look into CDNs. They work like caching on servers across the world.)

    # May 10, 2013 at 10:43 am

    TBH I dunno, but my assumption is that the string includes a reference to me specifically (by user number or something) and to the specific image.

    The ‘folder’ **may** relate to the ‘type’ of photo…in this case it’s a profile header image (hphotos?) etc., etc.

    # May 10, 2013 at 10:50 am

    Care to share some insights I’m missing aside from CDNs?

    For example, the image name… why do they even bother renaming it to that? Is it a random filename generated based on upload time? Is it a government secret hash that will burn your soul if you know why they generate the filenames the way they do? haha

    I guess the same goes for sites like Flickr. They have very similar filenames. The directory structure is different (all numeric) and they are also hosted on a farm server… literally named farm8 on the account I’ve been looking at for the past 2 mins.

    *please excuse my ignorance on any of the above

    # May 10, 2013 at 10:53 am

    I didn’t see your post before replying.
    It seems like the filenames of both image-hosting sites are similar. I wonder if it’s a similar hash of sorts.
    Directory structure is all a guess to me.

    I wonder if they create directories per user… or just one larger directory…. but there are file limits in directories aren’t there?

    # May 10, 2013 at 10:54 am

    The link is just a subfolder on a well-known CDN (content delivery network).

    a nice article for fooman getting started

    # May 10, 2013 at 10:56 am

    >why do they even bother renaming it to that?

    Because you could have a billion images all called ‘myimage01’. Renaming it makes it unique.

    As to how they assemble the string, that’s anyone’s guess.
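
    For what it's worth, here is a minimal Python sketch of two common ways to get a unique name. It is only an illustration, not what FB, Twitter or Flickr actually do. The function names `content_name` and `random_name` are made up for this example.

```python
import hashlib
import uuid


def content_name(data: bytes, ext: str) -> str:
    """Name derived from the file's bytes: identical uploads get identical names."""
    return hashlib.sha256(data).hexdigest() + ext


def random_name(ext: str) -> str:
    """Random name: different on every call, even when the bytes are identical."""
    return uuid.uuid4().hex + ext


# Two users both upload a file called 'myimage01.jpg' with different contents.
upload_a = b"first user's photo bytes"
upload_b = b"second user's photo bytes"

print(content_name(upload_a, ".jpg"))
print(content_name(upload_b, ".jpg"))
print(random_name(".jpg"))
```

    The original filename never appears in either scheme, so a billion copies of ‘myimage01’ can't collide on name alone.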

    # May 10, 2013 at 11:21 am

    Thanks guys! Much appreciated pointing me in the right direction.

    # May 10, 2013 at 4:41 pm

    After more research and seeing what the general consensus is online, I found this site:
    File Name Hashing

    It’s all well and good, but a few things I’m not sure about.

    1. I’ve read in a few places that keeping the files limited to 1000 per directory is a good idea (the reasons differ here and there, but it’s the number often stated). What keeps the file count under this 1000 limit in the linked algorithm? I’ve not seen any algorithm that actually enforces the limit; it just seems to depend on the hash to spread out the files.

    2. How does the previous link ensure that the files are spread evenly?

    3. Why does he use bytes as directory structure indicators, rather than just using the first few chars of the hash?

    Is all this worth it if you aren’t storing a million user images? It’s awesome to know either way.
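
    To make the three questions concrete, here is a rough Python sketch of hash-based directory sharding. It is not the algorithm from the linked page, just the general idea under my own assumptions: SHA-256 as the hash, two hex characters per directory level, and a made-up helper called `shard_path`.

```python
import hashlib
from collections import Counter


def shard_path(filename: str, levels: int = 2) -> str:
    """Build a path like 'ab/cd/<full digest>' from the hash of the filename."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Two hex characters encode exactly one byte, so each level has 256 possible names.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return "/".join(parts + [digest])


print(shard_path("myimage01.jpg"))

# Rough check of the spread: hash 100,000 made-up names into one level of 256 directories.
counts = Counter(
    shard_path(f"user{i}.jpg", levels=1).split("/")[0] for i in range(100_000)
)
print(len(counts), min(counts.values()), max(counts.values()))
```

    Nothing in a scheme like this enforces a hard limit. The count per directory is only an average: total files divided by 256 to the power of the number of levels. With two levels that is 65,536 directories, so it takes roughly 65 million files before the average reaches 1000. The even spread comes from the hash itself, since a good hash behaves like a uniform random number. And because two hex characters are one byte, slicing the first few characters of the hex string gives the same directories as reading the leading bytes of the digest.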
