I got a tweet about this a while back, and it’s been on my list of things to start a discussion about, or even better, gather some real data on. It also might be a smidge related to SEO (as the asker leads with), and yesterday’s baby thought about SEO reminded me of it.
Here’s the question:
Hi @chriscoyier can CSS class name affect SEO? i.e. can class name "sex" for <select> element with M/F options trigger safe search filters?
— Piotr Merton (@piotrmerton) October 18, 2016
In other words, HTML like this:
<label for="sex">Sex:</label>
<select id="sex" name="sex">
<option>Male</option>
<option>Female</option>
<option>Yes, please</option>
<option>Heyyy-ooo</option>
<option>Honk honk</option>
<option>Wakka Wakka</option>
</select>
Silly attempts to dodge gender norms aside, is the mere presence of the word “Sex” on the page enough to trigger warnings, blockage, or exclusion from certain apps? Perhaps an app using Google Safe Browsing APIs, a firewall of sorts configured to block certain content, or some other kind of software designed to filter web content.
What if your business were Barry’s Delicious Smoothie Mart, so you prefixed a bunch of your CSS selectors like this:
.bdsm-header {
background: papayawhip;
}
Will there be unintended consequences there?
I’m having a bit of déjà vu here. I remember seeing a conversation about this somewhere in which someone said it had indeed caused some minor problems for them, but I can’t seem to dig it up right now.
I don’t have any personal stories or data to share with you on this subject; I just wanted to open it up for comments from folks who actually do have some data.
In spending some time searching around about this issue, I found plenty of “experts” chiming in saying “no, CSS classes don’t affect anything.” That stands to reason, but no data is cited, no references are provided, and no proof is supplied. Also note that in our first example, the word we were worried about, “sex”, doesn’t appear in a class at all: it’s in the for, id, and name attributes, not to mention the visible text of the label.
So if you know something for real, let us know below.
There’s a classic talk by a DeviantArt dev on the subject http://dt.deviantart.com/journal/We-Give-a-F-How-the-Site-Loads-392679726
This is very interesting! From there, here’s the comment from a user that kicked off an investigation:
We once, long ago, had a heck of a time with a number of “enterprise” customers that used WebSense filtering proxies.
Our app had reports where you could pick the format, one of which was Excel. The URL was of the form /app/report?format=MSEXCEL.
Of course, this URL was blocked for all of our WebSense-using customers because it contained the string “sex”. Testing indicated that the string “sex”, or any number of other naughty words, in any form in a URL or in most places in the HTML body caused the block. This must have been the WebSense default way back when, because it was reported by multiple customers as soon as we rolled out the export-to-Excel feature.
To fix it, we made HTTPS mandatory for the whole app, so the proxy could no longer see the URL path, query string, or page body (remember when we used to protect only login forms with SSL? This was pre-TLS 1.0, I think). Fortunately, TLS-interception proxies weren’t yet a thing.
Content filters are remarkably stupid and usually poorly maintained (even today).
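To make that failure mode concrete, here’s a minimal TypeScript sketch of the kind of naive keyword matching described above. The blocklist and both function names are hypothetical; WebSense’s actual matching rules aren’t described here, so this only illustrates why plain substring matching trips on MSEXCEL while whole-word matching doesn’t, and why a hyphenated prefix like bdsm-header gets caught either way.

```typescript
// Hypothetical blocklist for illustration only; real filtering products ship
// their own (much larger, vendor-specific) lists and rules.
const BLOCKED_WORDS: readonly string[] = ["sex", "bdsm"];

// Naive substring check: flags the text if a blocked word appears anywhere,
// even buried inside a longer token such as "MSEXCEL".
function naiveIsBlocked(text: string): boolean {
  const lower = text.toLowerCase();
  return BLOCKED_WORDS.some((word) => lower.includes(word));
}

// Whole-word check: \b only matches at a word/non-word boundary, so "MSEXCEL"
// passes, but "sex" inside quotes or "bdsm" followed by "-" still match.
// The blocked words are plain letters, so no regex escaping is needed here.
function wholeWordIsBlocked(text: string): boolean {
  return BLOCKED_WORDS.some((word) => new RegExp(`\\b${word}\\b`, "i").test(text));
}

const samples: string[] = [
  "/app/report?format=MSEXCEL",
  '<select id="sex" name="sex">',
  ".bdsm-header { background: papayawhip; }",
];

// Print each sample with its naive and whole-word results.
for (const sample of samples) {
  console.log(sample, naiveIsBlocked(sample), wholeWordIsBlocked(sample));
}
```

With naive matching, all three samples are flagged; whole-word matching lets only the MSEXCEL URL through. Neither check knows whether the match sits in a class name, an attribute, or visible text, which is exactly why a class prefix alone could plausibly get a page blocked.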
It’s an interesting thought, but I would simply stay away from any risk wherever possible. Finding out that a prefix used across a whole site or app had to be changed could be a real ball-ache.
Having the class name .ad or .advert can cause AdBlock to hide that content.