Over the past several years, I have taken an interest in usability and web design. One area of site design that is often overlooked is the design of the site's URIs. Modern CMSs allow for varying degrees of URI customization, but the defaults are often not as usable as they could be, and URIs are frequently left to the end of the design process.
Clean URIs are one component of a clean website, and it is an important one. The majority of end-user access to the Internet involves a URI, and whether or not the user actually enters the URI, they are working with one nonetheless.
First, I would like to talk about the guiding principles behind URI design, then talk about the practical implementation of the principles.
Note: Originally, I wrote this article draft using the term "URL," but since "URL" has largely been superseded by "URI," I've updated the article to use URI. More information from W3C.
Principles
First, let’s take a look at some of the general principles of URI design.
A URI must represent an object, uniquely and permanently
One of the most fundamental philosophies behind a URI is that it represents a data object on the Internet. The URI must be unique so that it is a one-to-one match – one URI per one data object.
While this is always the goal, there are times at which it is very difficult or impossible to accomplish. Canonical URL tags were invented to help reduce the amount of duplicate content seen by a search engine. While not a final solution, canonical URLs are strongly recommended as large search engines like Google are now paying attention to them. For more information about canonical URLs, check out this article by SEOmoz.
URIs should also be permanent (i.e. choose the URI once and leave it at that). This speaks to good URI design before a site is launched, with the URIs carefully planned. There will come a time when you do want to improve on your choices or otherwise must change URI structure. When that becomes necessary, be sure to set up HTTP 301 (Moved Permanently) redirects on your server. This tells browsers and search engines the new location of the content and will also preserve any PageRank that the old URI has accumulated.
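For example, on an Apache server a handful of one-off moves can be handled with Redirect directives in an .htaccess file (the old and new paths below are just placeholders):
Redirect 301 /old-about http://domain.com/about
Redirect 301 /blog/2010/07/my-post http://domain.com/posts/2010/07/my-post
Nginx and most CMSs offer equivalent ways of issuing the same 301 response.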
Be as human-friendly as possible
This is the most fundamental driving factor behind URI design (or it should be). URIs should be designed with the end user in mind. Search Engine Optimization (SEO) and ease of development should come second.
One way to keep a URI user-friendly is to keep it short and to the point: use as few characters as possible while still maintaining usability. So, /about is better than /about-acme-corp-page. Striving for brevity should not sacrifice that user-friendliness, though; a URI like /13d2 is short but holds no meaning for the end user.
That said, using a shortlink when sharing a URI is encouraged. This is great for tweeting links on Twitter, or otherwise sharing on social sites like Facebook or Google Buzz. It is great if you can control your own URI shortener for SEO reasons, although a service like Bit.ly is good too. I personally use PrettyLink Pro (a WordPress plugin) to create my short URIs. An alternative is the Short URL plugin.

WordPress provides a button to get a shortlink to a post based on WordPress' own /?p=XXX format, which is likely to be shorter than your chosen permalink structure. The advantage is that it will work as long as your site is around; the disadvantage is that the shortness of the link depends on the length of your domain name.
The URI should not rely on information that is not important to the content or the user. A common example of this is using the database ID as the URI, as in /products/23. The end user does not care that the product is database record number 23, so a URI like /products/ballpoint-pen is much better. It can be tempting to resort to such poor URI structure as it is often easier on the backend to query the database with an ID rather than have to do a lookup on an alias to find the object.
One good test to see if a URI is a user-friendly URI is the “speech-friendly” test. You should be able to mention a URI in a conversation with a friend, and it should make sense. For example:
My bio’s at domain dot com slash jim
instead of
My bio’s at domain dot com slash page slash g g 2 3
Consistency
URIs across a site must be consistent in format. Once you pick your URI structure, be consistent and follow it! Having good URI structure for only part of the site still means you have poor structure overall. In order for a user to trust that URIs work a certain way on a site, the format must be consistent. If you must switch structure (maybe you're updating a poorly-designed site), use 301 redirects as previously mentioned.
“Hackable” URIs
Related to consistency, URIs should be structured so that they are intelligibly “hackable” or changeable. For example, if /events/2010/01 shows a monthly calendar with events from January 2010, then:
- /events/2009/01 should show an events calendar for January 2009
- /events/2010 should show events for the entire year of 2010
- /events/2010/01/21 should show the events for January 21st, 2010
Keywords
The URI should be composed of keywords that are important to the content of the page. So, if the URI is for a blog post that has a long title, only the words important to the content of the page should be in the URI. For example, if the blog post is “My Trip to Best Buy for Memory Cards,” then the URI might be /posts/2010/07/02/trip-best-buy-memory-cards or something similar.
As a side benefit, using important keywords in the URI will improve SEO. My personal SEO philosophy is that, rather than optimize for search engines, optimize for good content. Search engines have made it their goal to find the best content on the web, so doing everything possible to create an easy-to-use site with great content and opportunities for further information (links) will, in my opinion, yield the best long-term results for search engine visibility.
Technical Details
We have covered some of the guiding principles behind URI design. Now, let’s look at some technical implementations of those guidelines.
No evidence of the underlying technology
The URI should not have .html, .htm, .aspx (a big annoyance), or anything else attached to it that is only designed to show the underlying technology. No end user cares that your site was written in ASP.NET (.aspx), ColdFusion (.cfm), or uses Server Side Includes (.shtml) – or at least most end users don’t. The extra info is just extra typing and extra room for error and frustration.
The one exception to this rule is appending a suffix like .atom, .rss, or .json to request that a particular format be returned. Alternatively, the format could be requested with the Accept HTTP header.
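As a rough illustration, reusing the example URI from later in this article, the two approaches look like this on the wire (which one your server honors depends entirely on how it is built or configured). Suffix version:
GET /posts/2010/07/02/trip-best-buy-memory-cards.json HTTP/1.1
Host: domain.com
Accept-header version:
GET /posts/2010/07/02/trip-best-buy-memory-cards HTTP/1.1
Host: domain.com
Accept: application/json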
No WWW
The www. should be dropped from the website URI, as it is unnecessary typing and violates the rules of being as human-friendly as possible and not including unnecessary information in the URI.
Many users, however, will still type in the www. prefix, so www.domain.com should 301 redirect to domain.com. The same goes for 301 redirecting www.subdomain.domain.com to subdomain.domain.com.
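As a sketch, on Apache the redirect can be handled with a couple of mod_rewrite lines in .htaccess (this generic version strips www. from whatever hostname was requested):
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]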
Format
URIs should be in the format:
domain.com/[key information]/[name]/?[modifiers]
Key information is information that is not the object identifier (like the post title), but is still key to the object being accessed. This may include:
- the type of thing (i.e. posts)
- the overall parent category (i.e. technology)
- key data members (i.e. the date posted)
Modifiers modify the view, not the data model being represented, and thus they are part of the query string and not the URI itself.
The amount of “key information” should be kept to a minimum, as URIs should not be overly nested. Each item placed in the key information section must really be key to addressing the page.
In the end, the URI should represent a descending hierarchy. For example
- domain
- type
- category
- title
Example: http://domain.com/posts/servers/nginx-ubuntu-10.04. In the case of items with dates, the format should follow the descending hierarchy:
- year
- month
- day
Example: http://domain.com/news/tech/2007/11/05/google-announces-android.
Google News has some interesting requirements for webpages that want to be listed in the Google News results – Google requires a unique number of at least 3 digits in the URI. Because numbers that look like years are ignored, a number of 5 or more digits is preferred. A Google News sitemap is also recommended. This is one of those cases where, if you absolutely must target Google News, you have to conform to this inferior URI structure. But if you do, make sure that you are consistent and that the URI is still hackable (for example, use the format yyyymmdd, like 20100701).
All lowercase
All characters must be lowercase. Attempting to describe a URI to someone when mixed case is involved is next to impossible.
If someone types the URI in mixed case, they should be 301 redirected to the lowercase page. That sounds really nice, but in practice it takes a little work. Using a CMS that rewrites all requests to a single file is the easiest way to accomplish it, as the script can issue the 301 to the lowercase version; otherwise it has to happen at the server level.
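One server-level possibility (I haven't tested this myself) is mod_rewrite's RewriteMap with its built-in tolower function; note that RewriteMap is only allowed in the main server or virtual-host configuration, not in .htaccess:
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]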
Actions appended to the URI
Actions may be appended to the URI, like show, delete, edit, etc. Non-destructive actions (those that do not change the object) should be requested with an HTTP GET, while destructive actions should be POSTed to the URI. Run a Google search for REST URI design for more information.
URI identifiers should be made URI friendly
A URI might contain the title of a post, and that title might contain characters that are not URI-friendly. The post title must therefore be made URI friendly. For example:
- All uppercase characters are made lowercase
- Characters like é should be converted to e (etc.)
- Spaces should be replaced with hyphens
- Unknown characters (!, @, #, $, %, ^, &, *, etc.) should be replaced with a hyphen
- Double hyphens (--) should be replaced with a single hyphen
- Probably more rules I’m forgetting
Characters can be URI escaped (like %20 for the space character), but this is generally a bad idea for many of the above reasons (shows technology, unnecessary typing, etc.)
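Putting the rules above together, here is a minimal sketch of a slug helper in PHP (the function name is made up, and real CMSs ship their own version, e.g. WordPress's sanitize_title(); trimming unimportant words, as suggested earlier, would still be a manual step):
function make_uri_slug($title) {
    $slug = iconv('UTF-8', 'ASCII//TRANSLIT', $title); // é becomes e (rough transliteration)
    $slug = strtolower($slug);                         // all lowercase
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);  // runs of spaces and unknown characters become a single hyphen
    return trim($slug, '-');                           // no leading or trailing hyphens
}
So make_uri_slug('My Trip to Best Buy for Memory Cards!') returns 'my-trip-to-best-buy-for-memory-cards'.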
Fun idea
Use a sentence-like structure (credit to Chris Shiflett):
chriscoyier.net/authored/digging-into-wordpress/
chriscoyier.net/has-worked-for/chatman-design/
chriscoyier.net/likes/trailer-park-boys
jacobwg.com/thinks/this-post/is/basically-done
If you know of any more URI guidelines that I missed or have any comments about those I did remember, I’d love to hear them!
Credits
Many thanks to the Forrst community who saw the initial (very) rough drafts of this post and contributed many insightful comments. Special thanks to @chriscoyier, @caludio, @steerpike, and @mattthehoople for directly contributing to the guideline list and to all the other Forrst commenters for providing helpful discussion.
Thank you to my dad for proofreading and review! Thank you also to Chris for being kind enough to offer to post this on CSS Tricks!
“No evidence of the underlying technology”
How would you recommend doing that exactly?
I know one way of doing it but that seems a little bit too complicated…
Using a CMS should solve that issue… almost all modern CMSs manage the clean URIs in such a way as to hide the .html (etc.) part.
As for the .rss or .atom, you can setup that in a CMS with custom aliases (I’m thinking of Drupal), but the easiest implementation that I have seen so far is from Ruby on Rails.
Absolutely. Rails makes this a cinch :)
True, but I’m not a CMS person (yet anyway :P)
So basically the question is: how would you do this if you were writing a plain HTML site or something that doesn’t use a CMS.
you need to use mod_rewrite (on Apache) to modify the end-user URL into whatever URL you are using internally.
basically you put a set of rules in your site's .htaccess file (do a google search on htaccess & mod_rewrite) that does the hard work of sending requests where they should go.
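a rough sketch of the kind of rule people use (assuming your pages live next to the .htaccess file and end in .html):
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ $1.html [L]
this serves /about from about.html without the extension ever appearing in the URI.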
the ‘worst case’ scenario for doing this would be to use your web server’s default page handling functionality and organize your site in a hierarchical directory structure. basically each post becomes an ‘index.html’ file in the appropriate sub-directory.
so if you have example.com/news as a URI, you would simply put an index.html file in the /news directory and it ‘hides’ the underlying technology being used.
of course this is a bit of a kludge, but if you have a simple couple-page brochure site, it might be acceptable.
@Speedgun:
All of the solutions suggested here would work. But, the very simplest would be to use Apache’s DirectoryIndex directive in your server config or .htaccess.
DirectoryIndex basically tells Apache: "If no file is specified, use this." So, 'DirectoryIndex index.php' would cause http://www.foo.com/bar to actually serve http://www.foo.com/bar/index.php.
Of course, the other solutions work better for anything more complex than a few pages, but that should get you started on simpler sites.
Use Apache MultiViews. That way, when you 'GET /some/path/file', Apache will look for 'file.*' in '/some/path/' and send it with the correct MIME type, and language too. Very nice.
Or create a 'file.var' defining what to send depending on the browser request.
Lots of possibilities.
You use Apache’s Rewrite, not Multiviews or strange directory structures. The gist is the user asks the server for a URL, and that URL is rewritten according to certain rules, and then the rewritten URL is what the server uses to return the page.
The simplest example is just to rewrite URLs to add ".html" at the end. So the user asks for /foo and you rewrite that to /foo.html to return the actual file.
Underscores in a URL are hard to type for almost all users. So if you have /what_are_they.html, don't give out that link; instead use /what/are/they, rewrite the slashes to underscores, add ".html", and the right file will be returned.
Me, I like to just take the title of the document, lowercase it, replace spaces with slashes, and that is the URL. So the article “Guidelines for URI Design” by CSS-Tricks would be:
https://css-tricks.com/guidelines/for/uri/design
Then all through the site there is no question what the link should be. You can deduce the URL for an article just by its name. It makes it quite a bit harder to screw up a link.
If the title changes, I leave the original in place, but add a redirect to the new article, so that the old title becomes a synonym for the new title instead of a 404.
Regarding Hamranhansenhansen’s comment:
I disagree that replacing spaces with slashes is the best solution. Slashes represent directories, and directories are generally a hierarchical way to organize information. People understand this convention and machines certainly understand it.
The example “/guidelines/for/uri/design” is almost ok, especially if the site has multiple articles with titles that start with “guidelines,” but the slashes approach is often going to say the wrong thing about the content it points to.
Think about this – if I see a URL that has example.com/guidelines/for/uri/design , I’m going to think there are probably more articles at the example.com/guidelines/ address. It may be that there’s nothing but a 404 at this address, though.
So, I recommend using slashes for actual hierarchical directories that help organize your site’s content and using dashes “-” to replace the spaces in your pages’ titles – just like the actual canonical URI for this article does.
Well, for a static site, I would probably use .htaccess rewrite rules (or Nginx config, if you’re not using Apache) to get the clean URIs. I’m assuming that it’s a relatively small site (since it’s static), so maybe you could update the .htaccess rules each time you add a page (obviously for larger sites, that’s very impractical).
Feel free to contact me at my website and I’d be happy to help with the rewrite rules.
Static or dynamic doesn't matter. Both use URLs. All you are doing with Rewrite is translating a clean URL into a technical URL. So you can have /products/foo rewritten to /products.php?item=foo or rewritten to /index.html?page=products&item=foo or whatever you require.
Also see:
Cool! Thanks for the solution!
Yep thanks :)
Beware of the "www" guidelines above. If you run a sizable site, remember that cookies served from .domain.com are sent with EVERY SINGLE HTTP REQUEST. If you have a large number of subdomains, you may want to consider redirecting all main traffic TO www so that you can serve, say, images from another subdomain without all the cookie traffic.
Note what many large sites do, like Facebook.
That's really interesting! I hadn't considered the cookie issue before. So, you're saying that a cookie set from http://domain.com will be sent with all requests to http://*.domain.com? I'm definitely not an expert on cookies!
One possible workaround would be to use a separate domain name for the static content, or maybe just use the CDN URI. That’s not as clean as using static.domain.com, though. But, does the end-user care about the static domain name?
This may be one of those places where reality makes it inconvenient to follow the optimistic guideline. Though, on my sites, with the amount of traffic I get (basically none), it probably isn’t an issue…
Yes, that's right. But with pretty much all languages, you can specify the cookie's path and domain. So for example:
setcookie(“TestCookie”, $value, time()+3600, “/scripts/”, “.example.com”);
So this cookie is only available in the “/scripts” directory, but is sent to all subdomains of example.com.
setcookie(“TestCookie”, $value, time()+3600, “/”, “www.example.com”);
This one is sent only with requests to http://www.example.com, but of course, it has to be set by http://www.example.com.
Using a www means that if you serve your images from media.example.com or img.example.com those requests won’t relay cookies. You’re right that using examplecdn.com would serve the same purpose, but then you need multiple domains.
I wouldn’t call the www. portion of the URI “unnecessary” information, either – it’s the hostname of the web server.
That’s rather important in a variety of cases, including the cookie aspect mentioned above. Others that come immediately to mind are the impact on edge caching, intranet sites, SSL traffic (since http://www.domain.tld and domain.tld may not resolve to the same IP address), and so on.
Another drag about the ‘www’ suggestion: sending a lot of 301 redirects to alter URIs will slow down your page loads because of the extra trip to the server to get the redirect.
Yahoo! has a great discussion of faster/better alternatives to redirects. I like to use mod_rewrite.
Very interesting article!
BTW, totally unrelated, but how did you make Chrome look like that? First mod that I really like!
It is not Chrome. It is Safari browser.
It appears to be the Mac version of Chrome. Safari’s tabs look a bit different.
Yeah that’s Chrome for Macintosh.
Chrome for Mac has borders around buttons. If you know how to turn them off, say it. It looks really, really cool like that.
Nice article. Do you have any thoughts on pagination URLs (blah.com/widgets/blue/page-16 or blah.com/widgets/blue?page=16 or something)
And.. what’s your feeling about “order by” URLs? (e.g. blah.com/widgets/blue?sort=price)
I would lean toward putting pagination in the query string (like blue?page=16), since it’s a view modifier, not an object (same with the order by URIs).
Since URIs are supposed to be permanent, doing something like blah.com/widgets/blue/page-16 would violate that rule because the content of the page would be constantly changing with the addition of new blue widgets. :) Make sense?
Also related – I really don't like pagination systems that have the second page be ?page=1. That can get really annoying, especially when you're trying to "hack" your way to a deeper page and have to remember that the page number in the URI is one less than the actual page number.
Seems obvious, but there are lots of sites that use that scheme for URIs.
So the guideline:
* /list
* /list?page=2
* /list?page=3
* etc.
Interested in this question as well.
Jacob is correct with his answer. The reason for this is that "/widgets/blue" tells me that I am looking at a resource of blue widgets, whereas the query ?page=16 says that I have asked your server to send me a specific subset of the blue widget resources.
The reason this is good practice is that the HTTP protocol is stateless. This means that when I access a URI, I should retrieve the correct representation of it regardless of any other actions I have performed on the site.
Query parameters in a URI should tell the server how it should send back a resource (e.g. a specific subset). Do not confuse this with sending back a different representation of the resource (e.g. XML, HTML, or JSON); HTTP's Accept and Content-Type headers determine that.
Great explanation!
Thanks Jacob! This is a great article on general resource design and introduction to RESTful principles. Great work!
I have a question on URLs with directories. A co-worker says that you should have all files in the root of the site, so that it’s something like this: http://www.domain.com/kevin instead of http://www.domain.com/about/kevin
They said this is because it is better for SEO purposes, saying the page ranks better in the root. I did some research into this and the general consensus seems to be that this is not correct: it doesn't matter how many directories you have in the URI. And it makes sense that it should not matter, since all Google cares about is content. I know that in the past Google did care about URLs and how they were formed, such as long database URLs with & and numbers in them.
What are your thoughts on this? I’ve been trying to educate my boss on this for years but they won’t listen.
Thanks!
Kevin
I’m definitely no expert, but as I understand it, keywords in URIs are very important to Google (that may have changed, I don’t know for sure). So, having a page at the root might be a good idea if the stuff that might come before it would be irrelevant to the page itself.
Again, it seems like the best strategy is to optimize for content and user-friendliness, and Google will treat you well.
So, in the case of your example, /about/kevin is better than /kevin because with /kevin, it is not obvious that it is an about page. And the fact that it is an about page is an important part of the page.
It is also not hackable; whereas with /about/kevin you could go to /about and expect to find all the company bios (just an example).
Now, if the URI was /pages/about/kevin or even /db/content/pages/about/kevin, then /about/kevin would be better because the other info is truly not important to the page.
Just my thoughts, anyway…
For SEO, search engines do consider a page more important if it is near the root, but in the site navigation structure rather than the physical directory structure, and the two are sometimes not the same. Let me give an example.
Suppose you have /about/kevin and /staff/employee-of-the-month/richard on your site. From the home page if you want to access Kevin’s page, you have to go first to /about and from here to Kevin’s page. But you have a direct link in the homepage to Richard’s profile page. So Kevin is 2 pages deep (in navigation) from the home while Richard is only one. Search engines will consider Richard’s page more important than Kevin’s.
Also, answering your first question, it is better to have /about/kevin and /about/richard than /kevin and /richard because the 'about' word offers semantic value to search engines and also to humans.
I would also still suggest certain items be kept in the query parameters. Many search engines do not like duplicate content, so ?sort=up and ?sort=down can look like duplicate pages. Many webmaster tools from the top contenders allow you to mark certain query parameters to be ignored in your site index. You cannot do that if you rewrite your addresses to look like a folder structure, unless you remove "folders" that look like "/sort/up/" from the index.
And yes, it is a balance of relevant and top content on the site, but still semantic. So it is good to be near the root, but if you have a deeply structured site, it is still important to show the semantic path.
Superb review about URI design!!
great!
That’s great but it’d be nice to give some real “Technical Details” on how to achieve those URIs!
I know htaccess can do some of those, like removing “.html” or “.php” and “www”.
But what about changing a URI like:
domain.com/portfolio.php?id=24
into
domain.com/portfolio/chris/
?
Yeah… I’m hoping to do some technical follow-up on my site sometime in the near future.
Best bet for changing portfolio?id to a clean URI would probably be a custom / pre-built CMS where all traffic is routed to the index.php script that determines what content to serve.
You can use Apache's Rewrite for that as well, but you might want to use "id=chris" instead of "id=24". It is straightforward to rewrite /portfolio/chris to /portfolio.php?id=chris because the same information is there. Ideally, your clean and dirty URLs have the exact same information and you just translate their form using Rewrite.
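Something along these lines in .htaccess would do it (untested, using the names from the example above):
RewriteEngine On
RewriteRule ^portfolio/([a-z0-9-]+)/?$ portfolio.php?id=$1 [L,QSA]
The visitor only ever sees /portfolio/chris, while portfolio.php quietly receives id=chris.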
From today on, I will also refer to URLs as URIs.
I might even lose the www, as you're right about that too. It is a waste of time.
Thumbs up, cheers Chris.
Modern Web frameworks such as Django and Rails make it difficult to create bad URLs.
I’m the lead developer of Mango, a file-based blogging system. Mango makes it simple to create elegant URLs, and supports short URLs “out of the box”.
One could write a post and save it as 4=>wordpress-conversion-script.text which would make the post accessible at /wordpress-conversion-script/, and /4/ would automatically redirect to the canonical URL.
It’s possible to set this up in WordPress, as well. At our WordPress sites, the blog posts use the permalink structure of example.com/blog/%postid%/%posttitle%/
This structure has a couple of nice advantages. First, it puts the postid first, which makes it easier for WordPress to figure out that you’re requesting a post and not a page.
Second, for a URI like example.com/blog/123/some-blog-article/ , we have the article’s title in the canonical url, but the cool thing is that a link to example.com/blog/123/ will also show the correct content. We have a built-in link shortener, in this case.
Cool post Jacob. I’ve been thinking quite a lot about URI design for an upcoming project.
I had this pretty much figured out, but I’m wondering about more complicated actions.
For instance, doing an edit on an item within a category.
http://myawesomeblog.com/categoryname/postname/edit or
http://myawesomeblog.com/postname/edit
Maybe put the edit/delete before the item alias?
Flickr still has one of the coolest URI design schemes out there. For example http://www.flickr.com/photos/username/4846720345/in/contacts/ to display recent shots from your contacts.
I would recommend to move towards a more RESTful way to interact with your URIs.
What I mean by this is that you should rather use HTTP’s Methods (GET, POST, PUT, DELETE) to interact with your resources.
This practice boils down to the following patterns:
GET: retrieve a resource from the server
POST: create a resource on the server
PUT: modify/edit/interact with a resource on the server
DELETE: remove a resource from the server
so instead of:
http://myawesomeblog.com/postname/edit
I would recommend to perform the following request from the client:
PUT http://myawesomeblog.com/postname/
{…bunch of http headers}
{request body goes here}
I agree, although you do need a URI for the actual edit page, right? So /postname/edit would show the edit form, which would then send an HTTP PUT back to /postname (that's roughly how Rails works).
Hi guys,
Totally agree on the REST-ful part, thanks.
@jacob yes, you are right.
What I would do is append the original URI with the action, so http://myawesomeblog.com/categoryname/postname/edit. Not sure if that’s the best way, but it seems like the best option.
I don’t care for systems like Drupal that have /my/clean/uri, then /node/2443/edit for the edit page. There’s a module for that, though… :)
Google says that if you have a news sitemap, the need to have a 3+ digit identifier is waived.
It's actually possible to catch all mixed-case requests by adding [NC] to your mod_rewrite rules in .htaccess and then redirecting to the corresponding lowercase URL.
Cool! I’ll have to do some more looking into that for all of my projects.
A really great post. I didn't know that /about is better than using /about-company-name.
Also, the get-shortlink tip is nice.
One problem I have is having /blog/category/
I would prefer just /blog/interviews etc
One other really strong reason for /about that I forgot to mention is that many users nowadays will just go to /about on any domain and expect the about page. Doesn’t hurt that it’s short and to the point.
*No evidence of the underlying technology*
My website is just a bunch of static HTML files. I just made a script that generates a simple site for me. Should I keep the .html or rewrite it?
Look at the top of the comments, I asked this same question and got an answer
Great article on URIs!! Very helpful, and I totally agree with what you're saying! :)
Good stuff, though IMO:
(a) It’s better to canonicalize on the “www” subdomain and redirect from the root domain rather than the other way around. “www” is more specific, and distinguishes from other services/subdomains, like “mail”, “calendar”, “store”, etc.
(b) URI’s should never be case-sensitive, so you shouldn’t have to worry about making them lowercase. I would love to see the official spec updated to reflect this.
I would argue the other way for both.
The “www” confuses users and is hard to say and the assumption when it is missing is you’re talking about the Web. If you say store.domain.com then we know you’re not asking for www. And it’s 4 less characters in every single URL you make. The collective amount of time that www has wasted is many, many lifetimes.
Some file systems are case-sensitive, and some are not. Making everything lowercase gets around this and it makes URLs easier to type. If your URL is /Index.html do you want “Index.html” or “index.html”? They are not the same file on most Web servers.
This is a great subject and one that I think many people creating websites often neglect.
As you did not mention it I would like to suggest Tim Berners-Lee’s Cool URIs don’t change from 1998 as further reading.
You do not explicitly mention it, but I think it is also important to keep in mind: "No evidence of the underlying technology" also applies to media files, not only web pages. Or what do you think?
I agree with your last comment, “No evidence of the underlying technology”. This is why RESTful patterns are such a great way to interact with one’s server – all that one needs to know is how to use the HTTP protocol. The server then handles what representation the client should receive back.
I agree with webpages: do not include .html, .htm, .php, .aspx… on webpages
But I don't agree for media files, for usability and accessibility reasons (if I understood your comment correctly). If you link to a non-HTML document, you should always include the extension or file format in the URI. I mean, using "http://example.com/annual-report" to redirect on the server to "http://example.com/annual-report.pdf" or "http://example.com/annual-report.doc" is not a good practice (in my opinion).
Think about people who don't have Adobe Reader or a .doc file viewer installed. If they see the link without an extension, they will click it expecting to see another HTML page, but instead they will get a "Download file" dialog box, and they will not be able to open the file.
If they see the file extension in the URI (displayed in the status bar when you hover over the link), they can realize that the link goes to a document they cannot open, and they will not click it.
Think also about people browsing on mobile devices (even smartphones) or people with screen readers, for example.
You make a valid point here. If I see a dialog box pop up and I do not know the file extension, I'm going to be a lot more wary of what I am downloading (though with PDF documents, browsers generally display them rather than force a download).
Agreed – the file extension should be in the URI for media and downloads… I agree with James; if I don’t see a file extension, I will assume either (1) the site is broken or (2) the link went to a download splash page.
another reason to hide the technology is an extra step in hiding what you’re using on the backend from malicious users. Sure, they can still figure out what you are using fairly easily, but it’s an extra step they need to take beyond a simple glance, which helps protect you from the really lazy ones. :)
The Google News requirement isn’t necessarily inferior and it doesn’t cause any problems. You don’t have to replace your URI entirely:
All you have to do is add (for WordPress) /%post_id%/ somewhere to your permalink structure.
The news site we run (http://rvanews.com/) is successfully pulled into Google News by having the post ID tacked onto the end of the URIs.
Really good post man!
I myself use WordPress permalinks with the following structure:
/%postname%/
That sets up nice and clean URIs
There is constant discussion about that particular format being a bad choice for WordPress performance. That is what I use here on CSS-Tricks, though, and I can't say it's that big of a deal. Who knows, though; I don't have any speed tests or anything to prove it either way.
See above – /%post_id%/%postname%/ is what we're using on our sites.
yourls.org is awesome. check it out :P
Great post!
But, I disagree with your statement about dropping the www from a URI and using a 301 to redirect it to the non-www page.
It should be the other way around.
'www' specifies that the URI is an HTTP URI, in the same way that 'mail.' specifies a mail server, or 'ftp.' specifies an FTP server.
An http request sent to foo.com should be 301’ed to http://www.foo.com, since http://www.foo.com is the correct server for http. Same goes for mail requests to foo.com getting redirected to mail.foo.com.
http specifies that the URL is an HTTP URL. That is why the www is redundant. It wastes an incredible amount of time and effort for the vast majority of humanity.
www is the name of the Web server, mail is the name of the mail server. You can name these anything you want. Your mail server could be fred.foo.com. Your Web server could be web.foo.com.
I agree with all your points, except for the “No WWW” one. For some sites it makes sense to remove it as it is indeed not needed and makes for a better looking domain.
However, removing the “www” actually makes many domains look a lot uglier. I couldn’t imagine big name sites like Yahoo! or Google removing the www as to not be redundant. I agree to remove it for subdomains, but as for the main domain it looks a lot better and more professional to keep the www.
Of course the user doesn’t have to type it, and you don’t even have to link it with the WWW if you don’t want to. Simply force it to be added with .htaccess: http://dev.myunv.com/snippets/htaccess/force-www-subdomain/
Ugly is subjectively worse. Four characters less is objectively better. If you show “www.” to the user they will type it. They don’t know it is not necessary.
They will also type “ww.” and “wwww.”
The problem is that, I believe, saying the host name is not needed is false. Host names are needed by HTTP. Not using them can cause problems with HTTP, as pointed out earlier with cookies and SSL.
My vote is to leave the www. You also say that people will type what you tell them; in my experience, this is false. If I tell someone to go to sub.domain.com they almost always type http://www.sub.domain.com. Test redirecting without www and then redirecting to the www. I am willing to bet (unless everyone bookmarks your site) that you will have more redirects when you remove the www than when you leave the www.
Learning that typing "google.com" will redirect you to "www.google.com" is not that hard to figure out. We should stop giving in to users who don't know these basics; it's their fault for not knowing.
Plus having the www in front for a site that takes advantage of organizing their content through subdomains has a huge benefit by distinguishing the homepage from everything else. At least in my opinion.
Hi chris,
cudos for wrapping this topic up.
Kind regards, mtness.
marvelous. remember kids, url is what people refer to you as, uri is where you’re at!
random – on keeping them short, instead of /images i use /i, and instead of /media i use /m. not exactly what you're saying, but it fits in there.
great write up!
The sentence-like structure seems to be a great idea !
how about the suffix?
is it better to use …/my-article.html
rather than …/my-article
for filetype recognition and also caching ?
You don’t remove the “.html” from the files, only from the URLs. You use the URL /my-article and you use Apache Rewrite to add “.html” and then the file /my-article.html is what the server returns. Later you may want to change the suffix of that file to “.shtml” or “.php” but the URL will stay the same.
Excellent article Chris.
I have something to add to the “No WWW” statement though:
There was a time where I was hell bent against including www. in the domain name for all the reasons you state, but as I started working on larger sites I couldn’t ignore the domain cookie benefits of including www in the domain.
The issue with omitting www from a domain is that any cookie you set will be returned to you for requests from any subdomain. So if you have static.domain.com serving your static content and you set a cookie for domain.com, static.domain.com is no longer a cookie-less domain.
Alternatively, setting a cookie for http://www.domain.com ensures that your cookie will only be returned to you for that domain, and static.domain.com will remain cookie-less.
That said, most people aren’t serving enough content to need a CDN or take this as a concern, in which case I completely agree. It’s ugly and reduces readability. Unfortunately, this is one of those instances where it makes sense for technology to dictate an aspect of the user experience: Just don’t admit that to the back-end department if you can help it ;)
It’s also good practice to include a trailing slash due to the way some web servers handle contextual URL’s and files/directories.
There’s a lot of good advice here, but I feel like a lot of this is the author’s personal opinion wrapped up as fact. I feel very strongly about semantic urls, and I’ve never once considered removing the extension from a static HTML file. In fact, I don’t think I’ve ever heard of ANYONE doing that (because anyone comfortable enough with mod_rewrite to do that probably doesn’t make a lot of static HTML sites).
If you’re using a CMS, it makes sense, but it seems like a LOT of work for very little return to just arbitrarily remove the extensions from the URL. The fact that you imply that .aspx is somehow the worst extension makes it pretty clear that at least some of this is just your own pet peeves.
The WWW one also seems a little suspect. If you type amazon.com into any modern browser, it will turn it into http://www.amazon.com and try that first. So the only way that would be of any benefit to anyone is if they typed the full “http://amazon.com” into a browser, and who does that?
I’m not saying any of this is a bad thing to do, but it’s kind of presented here as “If you don’t do it this way you’re doing it wrong.” and I’m just not sure that’s true.
> There’s a lot of good advice here, but I
> feel like a lot of this is the author’s
> personal opinion wrapped up as fact.
That’s Web development for you. If you don’t like the ideas presented here, don’t use them. Most of Web development is dogma, not fact. It’s tradition and best practices being constantly adapted for the changing Web. Users use URL’s today in a different way than they did 10 years ago, and the users themselves are different.
>but it seems like a LOT of work for
>very little return to just arbitrarily
>remove the extensions from the URL.
It’s not a lot of work, and you get a huge return. If you think it is a small return, you’re not understanding it.
The whole point of URLs is not to change. Otherwise, we could just give out the IP address. If you put “.php” or “.aspx” in your URLs you are committing either to always using PHP or ASP forever after, or committing to the URL changing at some point in the future, causing a train wreck.
Imagine if you bought a Nokia phone and your phone number was nokia.555.1212. You give it out to all of your friends. It becomes a synonym for “call Grover.” It’s in directories and contact lists everywhere. Then a couple of years later you want a Motorola phone but you can’t use “nokia” anymore, your number has to change to “motorola.555.1212” and now your entire history is washed away. Nobody can call you. A better way to do it is to just use “555.1212” routed through a system that doesn’t care what phone you have. Same with removing “.aspx” from your URLs.
At this point, the Web is old enough that we know we are going to change technologies over and over again. Even if you stay with ASP, a new version may come out that uses “.aspx2” and you may want to use that and all your URLs break.
All of your analytics are attached to your URLs, all your user’s bookmarks, all the sites linking to you.
Rewriting URLs is not very hard. There is a command called Rewrite in Apache’s server directives.
And you make it easier for all of your users to use your URLs, which they do in Twitter and other social networking.
You may not want to do this with a site that is live, but on new sites you should definitely do it. Also, clients like clean URLs (especially if they have been through a redesign that broke all their links and analytics) and will think better of you as a Web developer if you build them a site that has links like domain.com/name/of/article which they themselves can decode and type.
I'm not convinced by your argument about omitting suffixes (although I would do it for neatness, personally, anyway). If I were to build a site with .aspx extensions and then later switch to using PHP, it would make no difference at all: you can use any extension, with the appropriate rewriting. A site running on PHP could just as easily be configured to display a .aspx extension as it could to have no extension at all.
Still, as I said, I would go for the clean URIs anyway. Just not for that reason. :)
Sorry for the double-post, but I meant to add that I HAVE run into situations where a novice user won’t take an address at face value without the WWW. http://amazon.com looks wrong to many novice users that I’ve encountered.
Excellent round up of good practices, delicious-ed and added to a weekly round up of tips and tricks.
Only thing I disagreed with is the ‘www’ thing which I think @greg johnson covered nicely.
Jacob, great job on this article. Some important details discussed here, thank you.
I will have to say, however, that I strongly disagree with some assumptions you made regarding how URIs are shared from person to person. For example you said:
And later:
Generally speaking, people don't attempt to "describe" URIs in person, unless it's a basic page like "about". Most URIs (even the clean ones that abide by your URI design standards) are near impossible to relate in person, and near impossible to remember. That's just not the way people communicate "links" to others. In most cases, people will do one of three things.
1) They’ll say “I’ll text/email the link to you”;
2) They’ll say “Go to the home page (or Google) and search for [whatever]”; or
3) They’ll say “Go to the home page and click on [whatever]”
The only circumstance where this would change would be something very simple like the “about” or “contact” page, but those are so easy to find on most websites anyhow.
Anyways, thanks for the info you provided here. Regardless of my above opinion, I think you did a great job with this and have brought up some very important issues for consideration.
Going to have to agree with this. Personally, I do Louis' number 1 or 3, mainly because even I can't remember the URI, or I just plain don't pay attention to where I am within the site. I just say go to whatever.com and click on About; you'll find the guy there.
No one is going to remember anything more than that kind of stuff.
Also, URL might not be the proper term anymore, but try asking a client who is an average web user for their URI. You'll get a blank look. Developers might know, but average Joe doesn't. Therefore, in defiance of all of you, I shall continue to say URL!
LONG LIVE URL!
I usually say URL in conversation myself, simply because that’s what I originally learned it as. Yeah, my mom looked at this article and said, “what’s a URI?” to which I explained it was basically the same as a URL, which she understood.
Maybe we’re to blame for the general public not knowing the term URI… I don’t know. So, I’m going to try to use it from now on in the hopes of educating those around me about a tool that they use every day.
But, I’ll probably still slip into URL when I’m not thinking about it. :)
I agree with you… I rarely ever tell someone a full URI in a conversation. I was mainly thinking that you should strive to make it possible, not actually do it. More of a quick test for clean URIs. I could theoretically tell someone to go to css-tricks.com/guidelines-for-uri-design, but could definitely not tell someone to go to http://www.amazon.com/Designing-Social-Interfaces-Principles-Experience/dp/0596154925/ref=sr_1_1?ie=UTF8&s=books&qid=1280927599&sr=8-1 (a good book, by the way).
I usually share links via email, or will sometimes write down the link on a piece of paper (old-school method, I guess). In the paper situation, it only works with well-designed URIs.
Thanks for your feedback! Sorry I wasn’t clear in the article.
Oh, I think it was clear, you did a good job. I just don’t think it’s all that important for a URI to be readable for that reason, since people just don’t do that.
Again, great job on this article.
Not using WordPress? How would you do this?
For those interested in creating your own Short URLs, see my tutorial:
http://sean-o.com/short-URL
A good source of URI design for me is flickr.
They employ a pattern for single pictures like this:
http://www.flickr.com/photos/[user-name]/[picture-id]
instead of:
http://www.flickr.com/photos/[user-name]/[picture-TITLE]
The advantage is that users may change the title without losing the URI.
While the resulting URIs may not be as nice as the blog style (keywords in the last part of the URI), they conform with the principle of having permanent URIs.
Imagine the case that someone gets into legal trouble because of the title of their entry. Being forced to change the title will mess up the URI as well.
What do you think about this?
Stefan
I have always thought flickr was great for doing this. I think that it does still look ok. Granted, not as nice, as you said. But you can change your titles without affecting the URI.
However, with blogs you could argue: hey, I might change the category it is in as well, or the tag. So I would also argue not to use those in a permalink. I always stick to a publish-date structure as this won't change.
That final ID is what always throws me. Title for readability or numeric ID to allow the flexibility of not having to do redirects.
I’m going to agree with those who say keep the www part. The cookie point is a good one, but even without that, a lot of people find the lack of www on a URI confusing. These tend to be people who’ve been using the web for a long time, but not that frequently and not with any great experience.
Whichever way you do it, though, it’s got to be better than those damned sites that refuse to recognize the domain if you *don’t* type the www. There are still a surprising number of them out there.
I agree with most of your article. Where I get caught up, and I’m surprised no one has mentioned it yet, is your use of shortlinks. They go against everything you suggest in your article.
If shortlinks are what users are reading the most, and I would argue this is becoming more and more the case because of social networks, then users are rarely benefiting from the hard work you've put into your canonical URIs.
I don't believe most users actually read URIs once they are in the address bar, much less hack them. Services like Twitter and Facebook are where, I believe, users are actually reading URIs the most. Shortlinks are, for the most part, unreadable.
Short links have no other purpose than conserving character counts. They are needed in tight situations but “using a shortlink whenever sharing a URI is encouraged” seems antithetical to your article’s main point.
Could you explain the reasoning behind your suggestion a bit more?
Great article. Design for good content and good SEO will follow; excellent theory, and more needs to be written on this. SEO should be the result of a well-designed site, rather than a separate objective.
.aspx is very annoying? Over .html, .htm, or, say, .php or .jsp? How is the four-letter .NET page extension more annoying than the other server-side page extensions? Care to explain this to all the .NET developers out there?
Chris
Just personal experience with .NET websites… nothing against .NET developers. Most .NET-powered websites that I’ve visited usually have URIs like http://msdn.microsoft.com/en-us/library/015103yb.aspx or at least redirect to Default.aspx. (Microsoft has gotten better with URIs over the years, though)
I personally don't like technologies that "color" your product… as in, you shouldn't look at a website and say, "that's built in ASP" or "I can tell that's an X site." So, you should have complete control of the HTML, JS, CSS, and URIs for your site.
I have not personally done much with ASP.NET, and it may have easy ways to rewrite URIs… I know it is a very powerful language.
So, nothing against .NET devs… .jsp is usually just as bad (session IDs in the URI, etc.), and you're right, there's nothing worse about .aspx than .1234; it's just that, from the sites I have seen, .aspx usually comes with other non-clean elements.
I apologize if my comments seemed like an attack on .NET developers.
They don't seem like an attack. I am not sure I can say that more .NET sites include .aspx than PHP sites include .php, or anything else for that matter.
And even though you could overcome the whole suffix issue fairly simply, it wasn't something that was really talked about in the community. CMS systems written in .NET pretty much all took care of it, but people who sat down with a starter site just never second-guessed it.
Since then (better late than never) Microsoft has given us System.Web.Routing, which is built into the framework and makes things uber easy (especially with MVC). It was available in .NET 3/3.5, but it ships in .NET 4.
Anyhow, Microsoft seems to be doing a fair amount to make sure people know the routing namespace is there, and I believe they have made all their starter sites and project templates use it by default in 4.0, so hopefully people will see it and start using it.
Hi Jacob,
Thanks for the reply (and for the helpful article by the way I forgot to mention).
You are right that in the very early days of .NET, the URL rewriting was pretty bad. They finally do have their act together with the newer rewrite modules (http://learn.iis.net/page.aspx/460/using-the-url-rewrite-module/), so that's a good thing. I think the problem was IIS 6 more than anything else, and IIS 7 seems to have gone a long way toward resolving that problem.
I definitely agree that you want to eliminate the server-side processing from the URI. For us, this has been a bigger challenge because most blogging software (like WordPress, etc.) is better optimized for a Linux/PHP environment. Hopefully, .NET will get some more focus on being able to easily re-route URIs, as well as some of the other cool features I’m seeing more and more of (and your article touches upon).
Thanks again,
Chris
I’ll give it a go. ASPX is annoying to me because it serves as a warning that the site I’m about to view is probably loaded with invalid and inefficient code.
Now, don’t get me wrong. I think ASP.NET is a great platform for building real applications that have web-based interfaces, but many of the sites and CMS systems I’ve seen built on top of ASP.NET are simply atrocious.
I say this after working 3.5 years as a senior consultant managing large web-based software projects written in ASP.NET.
I know it’s possible to get ASP.NET to put out clean code, but in my experience, it’s a rarity to find an actual example of this in the wild.
I have always seemed to be able to get ASP.NET to output pretty clean code (especially since you can change just about everything about .NET). My problem was always WebForms and ViewState. I understand why MS did WebForms (to make it easier for WinForms devs), but man, it was messy. Again, with your own handler you could do away with WebForms (and I sometimes did), but now we have MVC, which maybe isn't yet where Rails is, but for me it is close and very nice.
Hi Kurt,
Very true: CMS systems built on top of ASP.NET can be very limited and problematic to accomplish easy tasks. There just isn’t enough of them and not enough variation/choice in platforms of CMS for .NET it seems.
Like John says, it's definitely possible to output clean code in .NET; it's more that unaware or lazy developers simply don't add the appropriate settings to their web.config file (similar to your .htaccess file for Apache folks). There are other things they can do, but that does help output much cleaner code.
Yeah, viewstate kinda sucks for most applications, but can be turned off.
MVC seems like the future for sure but I may switch to Rails vs. trying to learn that.
Chris
I found MVC very simple to pick up. I looked at Rails also; I really like it, but Ruby is my main issue there. It was simpler for me to pick up .NET MVC than Rails, simply because my Rails skills are, well, non-existent at best.
Anyhow, I think they are both great. I have heard people say that Rails, performance-wise, can sometimes get away from you. And from a commercial standpoint, I like that .NET is compiled.
The Ruby community, though, seems to be friendlier than the .NET one. More helpful, I mean. .NET guys seem to like to talk over you when you ask a question.
Not using WordPress? How do you do this?
This is quite obviously off topic, but how did you get the borderless buttons on chrome!?
Maybe you should search on this page for »chrome« and you’ll find the answer.
Taggart
Thank you for your very helpful and friendly guidance. I published a translation into Russian (http://legco.net/entry-382.php); I hope you do not mind.