Interesting question from Andy:
Serious question. Why do we use .html instead of .htm? / @adactio @css
— Andy Clarke (@Malarkey) December 12, 2019
The most likely answer from the thread: DOS was a massive operating system for PCs for a long time and it had a three-character limit on file extensions.
Interesting that the first book on HTML covers this specifically:
From the HTML Manual of Style (1994)… the first book on HTML 📚🙂 pic.twitter.com/PtUTdr7I2k
— Phil (@phildcpickering) December 12, 2019
Where my mind went was server software. I know that web servers automatically do different things with different file types. In a test on my own server (set up to serve a WordPress site), I put some files at the root that all contain the exact same content: <h1>Cool</h1>
- file.text = file is rendered as plain text in browser (Content-Type: text/plain)
- file.html = file renders as HTML in browser (Content-Type: text/html)
- file.htm = file renders as HTML in browser (Content-Type: text/html)
- file.fart = file is downloaded by browser (Content-Type: application/octet-stream)
You can write code to serve files with whatever content types you want, but absent that, file extensions do matter because they determine which Content-Type header a web server sends by default.
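As a rough sketch of that default behavior, here is an extension lookup that reproduces the four results from the test above. The table and the function name `default_content_type` are made up for illustration. Real servers ship much larger tables and differ in the details.

```python
from pathlib import PurePosixPath

# Assumed table: only the extensions from the test above, not a real server's list.
EXTENSION_TYPES = {
    ".text": "text/plain",
    ".html": "text/html",
    ".htm": "text/html",
}

# Unknown extensions fall back to a generic binary type, which browsers download.
FALLBACK_TYPE = "application/octet-stream"


def default_content_type(filename: str) -> str:
    """Pick a Content-Type from the file extension alone (hypothetical helper)."""
    suffix = PurePosixPath(filename).suffix
    return EXTENSION_TYPES.get(suffix, FALLBACK_TYPE)


for name in ("file.text", "file.html", "file.htm", "file.fart"):
    print(name, "->", default_content_type(name))
```

The file contents never enter into it: all four files hold the same `<h1>Cool</h1>`, and only the name decides the header.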
The answer to the question about why we use a 4 letter file extension cannot be:
“we used to use an OS with 3 letter file extensions”
While not explicit, it seems to me the answer explains why we use .html indirectly, by explaining the only reason an alternative to .html exists at all.
When I first started learning HTML in the ’90s, conventional wisdom said that .htm files were understood by Windows servers, while .html was used on *nix. Microsoft later brought in .asp, then .aspx, and the rest, as they say, is history.
HTML and HTM are the same file format. As written in the article, it’s supposed to be HTML, but we used to write HTM on DOS because of the OS’s three-character limit.
This doesn’t answer the question in the title (?)
Never mind, I didn’t read the tweeted screenshot.
If you add .fart to your web server as text/html, it renders as HTML too.
I, too, have always wondered about this. Thank you for tracking down the answer!
Unless, like you said, you are on DOS they have always been interchangeable. And web servers can be configured to handle any extension as any MINE type. The ones you listed are just the defaults. You could easily serve HTML as binary files if you wanted.
I meant MIME type :)
Did you know that current Microsoft Office still saves signatures as .htm, as legacy behavior? I noticed this more than 10 years ago, and it has stayed that way ever since.
You can use any file extension you want for your website as long as you configure the MIME type for that extension. If you wanted .fart, you could make this work.
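To make that concrete, here is a minimal sketch that uses Python’s standard-library `http.server` as a stand-in for the web server (an assumption, since the thread doesn’t name a server). The class name `FartFriendlyHandler` and the `serve` helper are hypothetical. `extensions_map` is the handler’s real extension-to-type table.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class FartFriendlyHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .fart files as HTML (illustrative only)."""

    # Copy the parent's table so the override stays local to this class.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".fart": "text/html",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), FartFriendlyHandler).serve_forever()
```

Calling `serve()` from a directory that contains file.fart should make the browser render it as a page rather than download it.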
The early web servers ran on Unix file systems almost exclusively and gave little consideration to the PC/DOS file systems that only supported 3 character extensions.
For most practical purposes even the file names were limited to 8 characters. Windows had some sort of long file name support at the time but most of the FTP clients didn’t support this and used the 8 character DOS filename alias when transferring files back and forth. It was a mess and attempting to work between the two systems was a nightmare.
The Internet preceded DOS and Mac systems. Internet servers were all some variant of Unix, in which dotted extension names were really just part of the file name as far as the Unix file system was concerned, but certain apps and services (i.e. system apps) were designed to expect certain dotted suffixes in the name (i.e. extensions). The standard for webpages was .html, but early Windows systems couldn’t participate in this new world because of the 3-character extension limit until .htm was allowed as an alternative. Macs had a different issue, since they did not require extension suffixes in the name and users could easily rename the entire file name. So Mac users had to make sure to explicitly enter the extension part of the name when creating files for the Internet. Finally, the name matters and so does the exact casing of each character in it, because a hyperlink to a file name on the internet is case sensitive as well as name sensitive. Thus .html, .htm, and .HTM are all different if they reside on a Unix (or Linux) server (and even if the site is on a Windows server now, which is case insensitive, it may someday be ported to a Unix server and the links will break). A web browser and the web server itself “could” be designed to try different alternatives of a file name to give the appearance of having the right file name, but I don’t know if they do that. Perhaps a Windows-based web server will try that, but that just leads to sloppy practices with inconsistent naming that falls like a house of cards in the future when your site gets ported to a Unix-based server.
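The casing point can be sketched in a few lines. The file listing and both function names below are invented for illustration; they only model how a lookup behaves on a case-sensitive versus a case-insensitive file system.

```python
# Hypothetical listing of files at a site root.
FILES = {"index.html", "about.htm"}


def found_case_sensitive(requested: str) -> bool:
    """Unix-style lookup: the name must match character for character."""
    return requested in FILES


def found_case_insensitive(requested: str) -> bool:
    """Windows-style lookup: letter case is ignored when matching."""
    return requested.lower() in {name.lower() for name in FILES}


for link in ("about.htm", "about.HTM", "about.html"):
    print(link, found_case_sensitive(link), found_case_insensitive(link))
```

A link to about.HTM resolves only under the case-insensitive lookup, which is the kind of link that breaks after a move to a Unix server. A link to about.html fails either way, because .htm and .html are simply different names.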
Yes it can. It explains the origins of .htm in the first place, which was a compromise for a particular OS. The extension was intended to be .html to mirror the acronym HTML.
If DOS had allowed more than 3 characters for file extensions, then the alternative .htm extension wouldn’t even exist. And its existence is the main reason people ask why we use .html instead of .htm. And if that’s not the reason, then you can equally ask, instead of .html, why not “x”? .html seems the most descriptive. .htm and other alternatives may be shorter or have some other qualities, but a choice needs to be made, and .html seems the most fitting and natural.
The reason we don’t use .htm very much and use .html instead is that the internet mostly runs on servers that allow 4-letter file extensions, and HTML stands for hypertext markup language. The only reason .htm even exists is that early DOS and Windows systems couldn’t handle the full .html file extension, so it was abbreviated to .htm. So the answer is that .html is the original and preferred way, whereas .htm is a variation.
Back in the ’90s, when Commodore was still in business selling that “computer way ahead of its time,” the legendary Amiga, the OS had built-in Datatype support that removed the need to give files of common types an extension at all. The OS just knew what type a file was and would create a preview in Workbench (think Finder).
I think Macs and Windows still rely on file name extensions, even in 2020. It would be interesting to have a definitive answer.
https://en.wikipedia.org/wiki/AmigaOS
The difference was whether the file type was specified as part of the file name or stored as a separate field in the file system. It’s basically two different ways to do the same thing. Having it as part of the file name is probably easier to use, since you can change the file type just by changing its name (like changing txt to html).