When I came up in web development (2005-2010 were formative years for me), one of the first lessons I learned was to have a clean foundation of HTML. “What Beautiful HTML Code Looks Like” is actually one of the most popular posts on this very site. The image in that post made its way to popular pages on subreddits every once in a while.
Now, while I still generally write HTML like that by default when working on sites like this one, I also work on projects that don’t have HTML output like that at all. I don’t work on Twitter, but here’s what you might see when opening up DevTools and inspecting the DOM there:
Nobody would accuse that of being “clean” HTML. In fact, it’s not hard to imagine criticism being thrown at it. Why all the divs! Divitis!! Is that seriously a <div role="button">? C’mon now. Those are awful class names! Robot barf!
What’s probably closer to the truth is that it doesn’t actually matter that much. It’s not that semantics don’t matter. It’s not that accessibility doesn’t matter. It’s not that performance doesn’t matter. It’s that this output actually does those things fairly well, or at least as well as they intend to do them.
Giuseppe Gurgone gets into the details.
React Native for Web provides cross platform primitives that normalize inconsistencies and allow to build web applications that are, among other things, touch friendly.
To the eyes of somebody who’s not familiar with the framework, the HTML produced by React Native for Web might look utterly ugly and full of bad practices.
That DOM actually does produce an accessibility tree that is expected and usable. The <div>s with roles are to overcome certain cross-platform styling limitations. Those classes are from a styling framework that helps with scoping CSS. It looks wacky, but it’s all for a reason.
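To make the accessibility point a little more concrete, here is a rough sketch of how a role gets resolved for an element: an explicit `role` attribute wins, otherwise the tag’s implicit role applies. This is a deliberately tiny approximation, not how a browser actually builds its accessibility tree. The `IMPLICIT_ROLES` table covers only three tags, and the class names in the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

# Tiny, deliberately incomplete subset of implicit roles (an assumption for
# this sketch; the real mapping is much larger and more nuanced).
IMPLICIT_ROLES = {"button": "button", "article": "article", "nav": "navigation"}


class RoleCollector(HTMLParser):
    """Collects the role of every element that has one."""

    def __init__(self):
        super().__init__()
        self.roles = []

    def handle_starttag(self, tag, attrs):
        # An explicit role attribute takes precedence over the implicit role.
        explicit = dict(attrs).get("role")
        role = explicit or IMPLICIT_ROLES.get(tag)
        if role:
            self.roles.append(role)


def roles_in(markup):
    collector = RoleCollector()
    collector.feed(markup)
    collector.close()
    return collector.roles


# Hand-written markup versus hypothetical framework output.
native = "<button>Follow</button>"
generated = (
    '<div class="css-18t94o4">'
    '<div role="button" tabindex="0" class="r-1niwhzg">Follow</div>'
    "</div>"
)

print(roles_in(native))     # ['button']
print(roles_in(generated))  # ['button']
```

Both snippets end up exposing a single button, and the wrapper div contributes nothing. Keep in mind that the role alone is only half the story: a div with a button role still needs `tabindex` and keyboard handlers to behave like a real button, which is work the framework has to do for you.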
That’s not to say all this is above criticism. You could argue that robotic class names don’t allow for user stylesheets that may assist with accessibility. You could argue the superfluous divs make for an unnecessarily heavy DOM. You could argue that shipping robot barf makes the web less learnable, particularly without sourcemaps.
There are things to talk about, but just seeing a bunch of divs with weird class names doesn’t mean it’s bad code. And it’s not limited to React Native either. Loads of frameworks have their own special twists in what they actually ship to browsers, and it’s almost always in service of making the site work better in some fashion, not in service of teaching or readability.
There is a non-trivial rationale for this markup: it makes the page hard to scrape programmatically with libraries such as lxml or Beautiful Soup.
This moves “data collection use cases” for Twitter away from the web, and into the Twitter API, where Twitter can more easily control rate and access. Mission accomplished.
It’s a protection against scrapers. I tried to use one but couldn’t make it work.
That is indeed the reason behind this “robotic” DOM construction. As for the post in general, there is no excuse, aside from security through obscurity (against scrapers), not to build a lightweight and fast web app. You don’t need any type of obfuscation to make it work better on touch devices. It makes no sense!
That won’t help against scrapers though, since somewhere at the bottom of that nested hell is a single <article>. Its structure is pretty static, so a scraper can traverse it just fine.
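To illustrate that last point, here is a minimal sketch of why generated class names don’t hide much when a semantic element sits inside the wrappers. It uses only Python’s standard-library `html.parser`. The sample markup and its class names are invented for the example and are not real Twitter output.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collects the text inside each top-level <article>, ignoring class names."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many <article> elements we are currently inside
        self.articles = []   # one list of text fragments per top-level article

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1
            if self.depth == 1:
                self.articles.append([])

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that appears inside an article.
        if self.depth > 0 and data.strip():
            self.articles[-1].append(data.strip())


def extract_articles(markup):
    parser = ArticleText()
    parser.feed(markup)
    parser.close()
    return [" ".join(parts) for parts in parser.articles]


# Hypothetical markup: wrapper divs with generated-looking class names
# around a single semantic <article>.
sample = (
    '<div class="css-1dbjc4n r-18u37iz"><div class="css-1dbjc4n">'
    '<article role="article"><div class="css-901oao">'
    "<span>Hello</span> <span>world</span>"
    "</div></article>"
    "</div></div>"
)

print(extract_articles(sample))  # ['Hello world']
```

The parser never looks at a class name. It keys off the one semantic tag, which is why the wrappers around it make little difference.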