Site Announcements » Floor Bored's Dev Blog » Post 3
The Symlink Situation on Twibooru
Wow, that's a bit of an ominous title. Let's get exactly what I mean by that out of the way first.
While I was talking with a friend recently, they pointed out that Twibooru creates a lot of symlinks to images on its local filesystem. Well, I knew this already, but I hadn't fully considered the implications.
Why all the symlinks?
Let's look at some code:
```ruby
# VERSIONS_TO_GENERATE looks a little like this:
VERSIONS_TO_GENERATE = {
  thumb: [250, 250],
  ...
}

# And somewhere else...
VERSIONS_TO_GENERATE.each_pair do |type, size|
  w, h = size
  dest_file = "#{dir}/#{type}.#{version_file_ext}"

  # If the version we're about to generate is smaller, generate it.
  if (w < model.image_width || h < model.image_height) ||
     processor.is_video? ||
     model.image_mime_type == 'image/gif'
    processor.generate_version(size, dest_file)
  else
    # just link it
    platform_link(processor.rasterized, dest_file)
  end
end
```
Pretty simple, hopefully. A little messy, but it gets the job done. If the image size/"version" we're about to generate is at least as large as the original image in both dimensions, we skip generating it and just symlink the original image file to it instead. This saves both CPU time and disk space, since it makes absolutely no sense to enlarge an image to create a thumbnail. (Videos and GIFs are the exception: their versions are always generated.)
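To make the generation-time rule concrete, here is a small, self-contained sketch that only decides, per version, whether it would be generated or linked. It is not Twibooru's actual code: `generation_plan`, `VERSIONS`, and the `video:`/`gif:` keywords are hypothetical stand-ins for the `processor` and `model` objects above, and the sizes are copied from the site's version list.

```ruby
# Version sizes (width, height), copied from Twibooru's version list.
VERSIONS = {
  tall:        [1024, 4096],
  large:       [1280, 1024],
  medium:      [800, 600],
  small:       [320, 240],
  thumb:       [250, 250],
  thumb_small: [150, 150],
  thumb_tiny:  [50, 50]
}.freeze

# Hypothetical helper mirroring the generation-time check. Instead of touching
# the disk, it returns :generate or :link for each version.
# `video:` and `gif:` stand in for processor.is_video? and the GIF MIME check.
def generation_plan(width, height, video: false, gif: false)
  VERSIONS.to_h do |version, (w, h)|
    smaller = w < width || h < height
    [version, (smaller || video || gif) ? :generate : :link]
  end
end

# A 350x350 image: the three largest versions would just be symlinks.
generation_plan(350, 350).select { |_, action| action == :link }.keys
# => [:tall, :large, :medium]
```

An 8x8 image would get a symlink for every version, unless it is a GIF, in which case every version is generated regardless of size.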
`platform_link` is simply a function added by the original Booru-on-Rails developers. It doesn't attempt to make a symlink on Windows, because Windows doesn't reliably support typical Unix-style symlinks.
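The original helper isn't reproduced here, but a function in its spirit might look like the sketch below. This is an assumption, not the Booru-on-Rails implementation: in particular, copying the file on Windows is only a guess at what "doesn't attempt to make a symlink" turns into in practice.

```ruby
require 'fileutils'

# Hypothetical stand-in for Booru-on-Rails' platform_link.
# The Windows branch (copying the file) is an assumption.
def platform_link(source, dest)
  if Gem.win_platform?
    FileUtils.cp(source, dest)
  else
    # force: true replaces anything already at dest, e.g. when regenerating.
    FileUtils.ln_s(source, dest, force: true)
  end
end
```

On Linux or macOS, `platform_link("full.png", "medium.png")` leaves `medium.png` as a symlink whose contents are those of `full.png`.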
The better way
Why do we need to store all of these symlinks? They take up disk space (seriously! Each one is small, but it adds up when you have many thousands or even millions of them), slow down access (the symlink has to be dereferenced), and hurt cache hit rates (you're serving the exact same file under two different names).
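To see why the duplicate names hurt caching, remember that HTTP caches key their entries primarily by URL. The toy `UrlCache` class below is purely illustrative (it is not part of Twibooru or any real CDN): it counts how often it has to go back to the origin, and two names for one file cost two misses and two stored copies.

```ruby
# Toy URL-keyed cache, for illustration only.
class UrlCache
  attr_reader :misses

  def initialize
    @entries = {}
    @misses = 0
  end

  # Returns the cached body for url; on a miss, calls the block (the "origin").
  def fetch(url)
    @entries.fetch(url) do
      @misses += 1
      @entries[url] = yield
    end
  end

  def size
    @entries.size
  end
end

origin = ->(path) { "bytes of #{path}" }
base = "https://cdn.twibooru.org/img/2020/7/18/1223260"

# Old scheme: medium.png is a symlink to full.png, so both names serve the same bytes.
old = UrlCache.new
old.fetch("#{base}/full.png") { origin.call("full.png") }
old.fetch("#{base}/medium.png") { origin.call("full.png") }
old.misses # => 2

# New scheme: both representations share the full.png URL.
new_cache = UrlCache.new
2.times { new_cache.fetch("#{base}/full.png") { origin.call("full.png") } }
new_cache.misses # => 1
```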
Well... we don't! What if, instead of creating these links when thumbnails are generated, we defer deciding which versions should exist until we actually generate URLs to them, and, if a version is larger than the original image, simply return a URL to the best-fitting size?
And that's exactly what I did. The URL-generation code now contains a snippet that looks like this:
```ruby
urls = { full: "#{base_path}/full.#{file_ext}" }
smallest_candidate = :full

VERSIONS_TO_GENERATE.each do |version, dimensions|
  # Are we requesting a version that is larger than the original image?
  # If so, just return the next best fit (which may be the original image).
  if dimensions[0] >= model.image_width && dimensions[1] >= model.image_height
    urls[version] = urls[smallest_candidate]
  else
    urls[version] = "#{base_path}/#{version}.#{file_ext}"
    smallest_candidate = version
  end
end
```
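Here is a self-contained version of that loop that can be run outside the app. `representation_urls` and its parameters are hypothetical names standing in for `model`, `base_path`, and `file_ext`; the version sizes are copied from the site's version list. Like the original, it relies on the versions being listed from largest to smallest.

```ruby
VERSIONS_TO_GENERATE = {
  tall:        [1024, 4096],
  large:       [1280, 1024],
  medium:      [800, 600],
  small:       [320, 240],
  thumb:       [250, 250],
  thumb_small: [150, 150],
  thumb_tiny:  [50, 50]
}.freeze

# Hypothetical wrapper around the URL-generation loop.
def representation_urls(base_path, file_ext, width, height)
  urls = { full: "#{base_path}/full.#{file_ext}" }
  smallest_candidate = :full

  VERSIONS_TO_GENERATE.each do |version, (max_w, max_h)|
    if max_w >= width && max_h >= height
      # The whole image fits inside this version: reuse the last real file.
      urls[version] = urls[smallest_candidate]
    else
      urls[version] = "#{base_path}/#{version}.#{file_ext}"
      smallest_candidate = version
    end
  end

  urls
end

base = "https://cdn.twibooru.org/img/2020/7/18/1223260"
urls = representation_urls(base, "png", 350, 350)
urls[:medium] # => "https://cdn.twibooru.org/img/2020/7/18/1223260/full.png"
urls[:small]  # => "https://cdn.twibooru.org/img/2020/7/18/1223260/small.png"
```

Because the loop carries the last generated version forward rather than always falling back to `full`, the ordering matters: `tall` (1024 wide) is narrower than `large` (1280 wide), so an image around 1100x500 would get `tall.png` as its `large` URL instead of `full.png`.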
How does this look in practice?
We can use the site's API to see this in action.
Here are all the versions defined in the code, as [width, height]:
```ruby
tall:        [1024, 4096],
large:       [1280, 1024],
medium:      [800, 600],
small:       [320, 240],
thumb:       [250, 250],
thumb_small: [150, 150],
thumb_tiny:  [50, 50]
```
Here's a nice, big image with dimensions of 2383x2000: https://twibooru.org/1740018.json
"representations": { "full":"https://cdn.twibooru.org/img/2020/7/20/1740018/full.png", "tall":"https://cdn.twibooru.org/img/2020/7/20/1740018/tall.png", "large":"https://cdn.twibooru.org/img/2020/7/20/1740018/large.png", "medium":"https://cdn.twibooru.org/img/2020/7/20/1740018/medium.png", "small":"https://cdn.twibooru.org/img/2020/7/20/1740018/small.png", "thumb":"https://cdn.twibooru.org/img/2020/7/20/1740018/thumb.png", "thumb_small":"https://cdn.twibooru.org/img/2020/7/20/1740018/thumb_small.png", "thumb_tiny":"https://cdn.twibooru.org/img/2020/7/20/1740018/thumb_tiny.png" }
This is exactly how the API response would have looked for ANY image before my change, and exactly how it will continue to look for images that exceed every version's size in at least one dimension.
Things change if we pick a smaller image, like this one, which is 350x350: https://twibooru.org/1223260.json
"representations": { "full":"https://cdn.twibooru.org/img/2020/7/18/1223260/full.png", "tall":"https://cdn.twibooru.org/img/2020/7/18/1223260/full.png", "large":"https://cdn.twibooru.org/img/2020/7/18/1223260/full.png", "medium":"https://cdn.twibooru.org/img/2020/7/18/1223260/full.png", "small":"https://cdn.twibooru.org/img/2020/7/18/1223260/small.png", "thumb":"https://cdn.twibooru.org/img/2020/7/18/1223260/thumb.png", "thumb_small":"https://cdn.twibooru.org/img/2020/7/18/1223260/thumb_small.png", "thumb_tiny":"https://cdn.twibooru.org/img/2020/7/18/1223260/thumb_tiny.png" }
Would you look at that? If we consult our handy table of versions above, you'll find that for every version that is bigger than the original image (medium and larger), the full version's URL is returned instead. For every version that is actually smaller (small and below), we return a URL to a generated thumbnail.
Note that the files actually being served have not changed in any way; only the URLs have. Previously, for this image, you would have been given a URL such as https://cdn.twibooru.org/img/2020/7/18/1223260/medium.png for the medium version, but on the server side that file was just a symlink to the full version, so you'd have been served the full file anyway.
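Since several representations can now share one URL, a client that wants every size only needs to download each distinct file once. The sketch below is an assumption about how an API consumer might handle this, not part of the site: it parses a `representations` object (a trimmed copy of the 350x350 response) with Ruby's standard `json` library and groups version names by URL.

```ruby
require 'json'

# Trimmed copy of the "representations" object from image 1223260's API response.
response = <<~JSON
  {
    "representations": {
      "full": "https://cdn.twibooru.org/img/2020/7/18/1223260/full.png",
      "medium": "https://cdn.twibooru.org/img/2020/7/18/1223260/full.png",
      "small": "https://cdn.twibooru.org/img/2020/7/18/1223260/small.png"
    }
  }
JSON

reps = JSON.parse(response).fetch("representations")

# Map each distinct URL to the version names that point at it.
by_url = reps.group_by { |_, url| url }.transform_values { |pairs| pairs.map(&:first) }

by_url.each do |url, versions|
  puts "#{url} <- #{versions.join(', ')}"
end
```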
And one more for completeness: a really tiny image, just 8x8. https://twibooru.org/2144747.json
As you might expect, every version's URL points at the full image:
"representations": { "full":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "tall":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "large":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "medium":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "small":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "thumb":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "thumb_small":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png", "thumb_tiny":"https://cdn.twibooru.org/img/2020/7/23/2144747/full.png" }
So, yeah. That's why that's like that now. As usual, if you have any questions, poke me here or on the thread and I'll see what I can do!