Chapter 4: Search | CSS-Tricks

Previously in web history…

After an influx of rapid browser development following the creation of the web, Mosaic becomes the popular choice. Recognizing the commercial potential of the web, a team at O’Reilly builds GNN, the first commercial website. With something to browse with, and something to browse for, more and more people begin to turn to the web. Many create small, personal sites of their own. The best the web has to offer becomes almost impossible to find.

eBay had had enough of these spiders. They were fending them off by the thousands. Their servers buzzed with nonstop activity; a relentless stream of trespassers. One aggressor, however, towered above the rest. Bidder’s Edge, which billed itself as an auction aggregator, would routinely crawl the pages of eBay to extract its content and list it on its own site alongside other auction listings.

The famed auction site had unsuccessfully tried blocking Bidder’s Edge in the past. Like an elaborate game of Whac-A-Mole, they would restrict the IP address of a Bidder’s Edge server, only to be breached once again by a proxy server with a new one. Technology had failed. Litigation was next.

eBay filed suit against Bidder’s Edge in December of 1999, citing a handful of causes. That included “an ancient trespass theory known to legal scholars as trespass to chattels, basically a trespass or interference with real property — objects, animals, or, in this case, servers.” eBay, in other words, was arguing that Bidder’s Edge was trespassing — in the most medieval sense of that word — on their servers. In order for it to constitute trespass to chattels, eBay had to prove that the trespassers were causing harm. That their servers were buckling under the load, they argued, was evidence of that harm.

Judge Ronald M. Whyte found that last bit compelling. Quite a bit of back and forth followed, in one of the strangest lawsuits of a new era that included the phrase “rude robots” entering the official court record. These robots — as opposed to the “polite” ones — ignored eBay’s requests to block spidering on their sites, and made every attempt to circumvent counter measures. They were, by the judge’s estimation, trespassing. Whyte granted an injunction to stop Bidder’s Edge from crawling eBay until it was all sorted out.

Several appeals and countersuits and counter-appeals later, the matter was settled. Bidder’s Edge paid eBay an undisclosed amount and promptly shut their doors. eBay had won this particular battle. They had gotten rid of the robots. But the actual war was already lost. The robots — rude or otherwise — were already here.

If not for Stanford University, web search may have been lost. It is the birthplace of Yahoo!, Google and Excite. It ran the servers that ran the code that ran the first search engines. The founders of both Yahoo! and Google are alumni. But many of the most prominent players in search were not in the computer science department. They were in the symbolic systems program.

Symbolic systems was created at Stanford in 1985 as a study of the “relationship between natural and artificial systems that represent, process, and act on information.” Its interdisciplinary approach is rooted at the intersection of several fields: linguistics, mathematics, semiotics, psychology, philosophy, and computer science.

These are the same fields of study one would find at the heart of artificial intelligence research in the second half of the 20ᵗʰ century. But this isn’t the A.I. in its modern smart home manifestation, but in the more classical notion conceived by computer scientists as a roadmap to the future of computing technology. It is the understanding of machines as a way to augment the human mind. That parallel is not by accident. One of the most important areas of study at the symbolics systems program is artificial intelligence.

Numbered among the alumni of the program are several of the founders of Excite and Srinija Srinivasan, the fourth employee at Yahoo!. Her work in artificial intelligence led to a position at the ambitious A.I. research lab Cyc right out of college.

Marisa Mayer, an early employee at Google and, later, Yahoo!’s CEO, also drew on A.I. research during her time in the symbolic systems program. Her groundbreaking thesis project used natural language processing to help its users find the best flights through a simple conversation with a computer. “You look at how people learn, how people reason, and ask a computer to do the same things. It’s like studying the brain without the gore,” she would later say of the program.

Search on the web stems from this one program at one institution at one brief moment in time. Not everyone involved in search engines studied that program — the founders of both Yahoo! and Google, for instance, were graduate students of computer science. But the ideology of search is deeply rooted in the tradition of artificial intelligence. The goal of search, after all, is to extract from the brain a question, and use machines to provide a suitable answer.

At Yahoo!, the principles of artificial intelligence acted as a guide, but it would be aided by human perspective. Web crawlers, like Excite, would bear the burden of users’ queries and attempt to map websites programmatically to provide intelligent results.

However, it would be at Google that A.I. would become an explicitly stated goal. Steven Levy, who wrote the authoritative book on the history of Google, In the Plex, describes Google as a “vehicle to realize the dream of artificial intelligence in augmenting humanity.” Founders Larry Page and Sergey Brin would mention A.I. constantly. They even brought it up in their first press conference.

The difference would be a matter of approach. A tension that would come to dominate search for half a decade. The directory versus the crawler. The precision of human influence versus the completeness of machines. Surfers would be on one side and, on the other, spiders. Only one would survive.

The first spiders were crude. They felt around in the dark until they found the edge of the web. Then they returned home. Sometimes they gathered little bits of information about the websites they crawled. In the beginning, they gathered nothing at all.

One of the earliest web crawlers was developed at MIT by Matthew Gray. He used his World Wide Wanderer to go and find every website on the web. He wasn’t interested in the content of those sites, he merely wanted to count them up. In the summer of 1993, the first time he sent his crawler out, it got to 130. A year later, it would count 3,000. By 1995, that number grew to just shy of 30,000.

Like many of his peers in the search engine business, Gray was a disciple of information retrieval, a subset of computer science dedicated to knowledge sharing. In practice, information retrieval often involves a robot (also known as “spiders, crawlers, wanderers, and worms”) that crawls through digital documents and programmatically collects their contents. They are then parsed and stored in a centralized “index,” a shortcut that eliminates the need to go and crawl every document each time a search is made. Keeping that index up to date is a constant struggle, and robots need to be vigilant; going back out and re-crawling information on a near constant basis.

The World Wide Web posed a problematic puzzle. Rather than a predictable set of documents, a theoretically infinite number of websites could live on the web. These needed to be stored in a central index —which would somehow be kept up to date. And most importantly, the content of those sites needed to be connected to whatever somebody wanted to search, on the fly and in seconds. The challenge proved irresistible for some information retrieval researchers and academics. People like Jonathan Fletcher.

Fletcher, a former graduate and IT employee at the University of Stirling in Scotland, didn’t like how hard it was to find websites. At the time, people relied on manual lists, like the WWW Virtual Library maintained at CERN, or Mosaic’s list of “What’s New” that they updated daily. Fletcher wanted to handle it differently. “With a degree in computing science and an idea that there had to be a better way, I decided to write something that would go and look for me.”

He built Jumpstation in 1993, one of the earliest examples of a searchable index. His crawler would go out, following as many links as it could, and bring them back to a searchable, centralized database. Then it would start over. To solve for the issue of the web’s limitless vastness, Fletcher began by crawling only the titles and some metadata from each webpage. That kept his index relatively small, but but it also restricted search to the titles of pages.

Fletcher was not alone. After tinkering for several months, WebCrawler launched in April of 1994 out of the University of Washington. It holds the distinction of being the first search engine to crawl entire webpages and make them searchable. By November of that year, WebCrawler had served 1 million queries. At Carnegie Mellon, Michael Maudlin released his own spider-based search engine variant named for the Latin translation of wolf spider, Lycos. By 1995, it had indexed over a million webpages.

Search didn’t stay in universities long. Search engines had a unique utility for wayward web users on the hunt for the perfect site. Many users started their web sessions on a search engine. Netscape Navigator — the number one browser for new web users — connected users directly to search engines on their homepage. Getting listed by Netscape meant eyeballs. And eyeballs meant lucrative advertising deals.

In the second half of the 1990’s, a number of major players entered the search engine market. InfoSeek, initially a paid search option, was picked up by Disney, and soon became the default search engine for Netscape. AOL swooped in and purchased WebCrawler as part of a bold strategy to remain competitive on the web. Lycos was purchased by a venture capitalist who transformed it into a fully commercial enterprise.

Excite.com, another crawler started by Stanford alumni and a rising star in the search engine game for its depth and accuracy of results, was offered three million dollars not long after they launched. Its six co-founders lined up two couches, one across from another, and talked it out all night. They decided to stick with the product and bring in a new CEO. There would be many more millions to be made.

AltaVista, already a bit late to the game at the end of 1995, was created by the Digital Equipment Corporation. It was initially built to demonstrate the processing power of DEC computers. They quickly realized that their multithreaded crawler was able to index websites at a far quicker rate than their competitors. AltaVista would routinely deploy its crawlers — what one researcher referred to as a “brood of spiders” — to index thousands of sites at a time.

As a result, AltaVista was able to index virtually the entire web, nearly 10 million webpages at launch. By the following year, in 1996, they’d be indexing over 100 million. Because of the efficiency and performance of their machines, AltaVista was able to solve the scalability problem. Unlike some of their predecessors, they were able to make the full content of websites searchable, and they re-crawled sites every few weeks, a much more rapid pace than early competitors, who could take months to update their index. They set the standard for the depth and scope of web crawlers.

Never fully at rest, AltaVista used its search engine as a tool for innovation, experimenting with natural language processing, translation tools, and multi-lingual search. They were often ahead of their time, offering video and image search years before that would come to be an expected feature.

Those spiders that had not been swept up in the fervor couldn’t keep up. The universities hosting the first search engines were not at all pleased to see their internet connections bloated with traffic that wasn’t even related to the university. Most universities forced the first experimental search engines, like Jumpstation, to shut down. Except, that is, at Stanford.

Stanford’s history with technological innovation begins in the second half of the 20th century. The university was, at that point, teetering on the edge of becoming a second-tier institution. They had been losing ground and lucrative contracts to their competitors on the East Coast. Harvard and MIT became the sites of a groundswell of research in the wake of World War II. Stanford was being left behind.

In 1951, in a bid to reverse course on their downward trajectory, Dean of Engineering Frederick Terman brokered a deal with the city of Palo Alto. Stanford University agreed to annex 700 acres of land for a new industrial park that upstart companies in California could use. Stanford would get proximity to energetic innovation. The businesses that chose to move there would gain unique access to the Stanford student body for use on their product development. And the city of Palo Alto would get an influx of new taxes.

Hewlett-Packard was one of the first companies to move in. They ushered in a new era of computing-focused industry that would soon be known as Silicon Valley. The Stanford Research Park (later renamed Stanford Industrial Park) would eventually host Xerox during a time of rapid success and experimentation. Facebook would spend their nascent years there, growing into the behemoth it would become. At the center of it all was Stanford.

The research park transformed the university from one of stagnation to a site of entrepreneurship and cutting-edge technology. It put them at the heart of the tech industry. Stanford would embed itself — both logistically and financially — in the crucial technological developments of the second half of the 20ᵗʰ century, including the internet and the World Wide Web.

The potential success of Yahoo!, therefore, did not go unnoticed.

Jerry Yang and David Filo were not supposed to be working on Yahoo!. They were, however, supposed to be working together. They had met years ago, when David was Jerry’s teaching assistant in the Stanford computer science program. Yang eventually joined Filo as a graduate student and — after building a strong rapport — they soon found themselves working on a project together.

As they crammed themselves into a university trailer to begin working through their doctoral project, their relationship become what Yang has often described as perfectly balanced. “We’re both extremely tolerant of each other, but extremely critical of everything else. We’re both extremely stubborn, but very unstubborn when it comes to just understanding where we need to go. We give each other the space we need, but also help each other when we need it.”

In 1994, Filo showed Yang the web. In just a single moment, their focus shifted. They pushed their intended computer science thesis to the side, procrastinating on it by immersing themselves into the depths of the World Wide Web. Days turned into weeks which turned into months of surfing the web and trading links. The two eventually decided to combine their lists in a single place, a website hosted on their Stanford internet connection. It was called Jerry and David’s Guide to the World Wide Web, launched first to Stanford students in 1993 and then to the world in January of 1994. As catchy as that name wasn’t, the idea (and traffic) took off as friends shared with other friends.

Jerry and David’s Guide was a directory. Like the virtual library started at CERN, Yang and Filo organized websites into various categories that they made up on the fly. Some of these categories had strange or salacious names. Others were exactly what you might expect. When one category got too big, they split it apart. It was ad-hoc and clumsy, but not without charm. Through their classifications, Yang and Filo had given their site a personality. Their personality. In later years, Yang would commonly refer to this as the “voice of Yahoo!”

That voice became a guide — as the site’s original name suggested — for new users of the web. Their web crawling competitors were far more adept at the art of indexing millions of sites at a time. Yang and Filo’s site featured only a small subset of the web. But it was, at least by their estimation, the best of what the web had to offer. It was the cool web. It was also a web far easier to navigate than ever before.

Jerry Yang (left) and David Filo (right) in 1995 (Yahoo, via Flickr)

At the end of 1994, Yang and Filo renamed their site to Yahoo! (an awkward forced acronym for Yet Another Hierarchical Officious Oracle). By then, they were getting almost a hundred thousand hits a day, sometimes temporarily taking down Stanford’s internet in the process. Most other universities would have closed down the site and told them to get back to work. But not Stanford. Stanford had spent decades preparing for on-campus businesses just like this one. They kept the server running, and encouraged its creators to stake their own path in Silicon Valley.

Throughout 1994, Netscape had included Yahoo! in their browser. There was a button in the toolbar labeled “Net Directory” that linked directly to Yahoo!. Marc Andreessen, believing in the site’s future, agreed to host their website on Netscape’s servers until they were able to get on steady ground.

Yahoo! homepage in Netscape Navigator, circa 1994

Yang and Filo rolled up their sleeves, and began talking to investors. It wouldn’t take long. By the spring of 1996, they would have a new CEO and hold their own record-setting IPO, outstripping even their gracious host, Netscape. By then, they became the most popular destination on the web by a wide margin.

In the meantime, the web had grown far beyond the grasp of two friends swapping links. They had managed to categorize tens of thousands of sites, but there were hundreds of thousands more to crawl. “I picture Jerry Yang as Charlie Chaplin in Modern Times,” one journalist described, “confronted with an endless stream of new work that is only increasing in speed.” The task of organizing sites would have to go to somebody else. Yang and Filo found help in a fellow Stanford alumni, someone they had met years ago while studying abroad together in Japan, Srinija Srinivasan, a graduate of the symbolic systems program. Many of the earliest hires at Yahoo! were given slightly absurd titles that always ended in “Yahoo.” Yang and Filo went by Chief Yahoos. Srinivasan’s job title was Ontological Yahoo.

That is a deliberate and precise job title, and it was not selected by accident. Ontology is the study of being, an attempt to break the world into its component parts. It has manifested in many traditions throughout history and the world, but it is most closely associated with the followers of Socrates, in the work of Plato, and later in the groundbreaking text Metaphysics, written by Aristotle. Ontology asks the question “What exists?”and uses it as a thought experiment to construct an ideology of being and essence.

As computers blinked into existence, ontology found a new meaning in the emerging field of artificial intelligence. It was adapted to fit the more formal hierarchical categorizations required for a machine to see the world; to think about the world. Ontology became a fundamental way to describe the way intelligent machines break things down into categories and share knowledge.

The dueling definitions of the ontology of metaphysics and computer science would have been familiar to Srinija Srinivasan from her time at Stanford. The combination of philosophy and artificial intelligence in her studies gave her a unique perspective on hierarchical classifications. It was this experience that she brought to her first job after college at the Cyc Project, an artificial intelligence research lab with a bold project: to teach a computer common sense.

**Srinija Srinivasan (**Getty Images/James D. Wilson)

At Yahoo!, her task was no less bold. When someone looked for something on the site, they didn’t want back a random list of relevant results. They wanted the result they were actually thinking about, but didn’t quite know how to describe. Yahoo! had to — in a manner of seconds — figure out what its users really wanted. Much like her work in artificial intelligence, Srinivasan needed to teach Yahoo! how to think about a query and infer the right results.

To do that, she would need to expand the voice of Yahoo! to thousands of more websites in dozens of categories and sub-categories without losing the point of view established by Jerry and David. She would need to scale that perspective. “This is not a perfunctory file-keeping exercise. This is defining the nature of being,” she once said of her project. “Categories and classifications are the basis for each of our worldviews.”

At a steady pace, she mapped an ontology of human experience onto the site. She began breaking up the makeshift categories she inherited from the site’s creators, re-constituting them into more concrete and findable indexes. She created new categories and destroyed old ones. She sub-divided existing subjects into new, more precise ones. She began cross-linking results so that they could live within multiple categories. Within a few months she had overhauled the site with a fresh hierarchy.

That hierarchical ontology, however, was merely a guideline. The strength of Yahoo!’s expansion lay in the 50 or so content managers she had hired in the meantime. They were known as surfers. Their job was to surf the web — and organize it.

Each surfer was coached in the methodology of Yahoo! but were left with a surprising amount of editorial freedom. They cultivated the directory with their own interests, meticulously deliberating over websites and where they belong. Each decision could be strenuous, and there were missteps and incorrectly categorized items along the way. But by allowing individual personality to dictate hierarchal choices, Yahoo! retained its voice.

They gathered as many sites as they could, adding hundreds each day. Yahoo! surfers did not reveal everything on the web to their site’s visitors. They showed them what was cool. And that meant everything to users grasping for the very first time what the web could do.

At the end of 1995, the Yahoo! staff was watching their traffic closely. Huddled around consoles, employees would check their logs again and again, looking for a drop in visitors. Yahoo! had been the destination for the “Internet Directory” button on Netscape for years. It had been the source of their growth and traffic. Netscape had made the decision, at the last minute (and seemingly at random), to drop Yahoo!, replacing them with the new kids on the block, Excite.com. Best case scenario: a manageable drop. Worst case: the demise of Yahoo!.

But the drop never came. A day went by, and then another. And then a week. And then a few weeks. And Yahoo! remained the most popular website. Tim Brady, one of Yahoo!’s first employees, describes the moment with earnest surprise. “It was like the floor was pulled out in a matter of two days, and we were still standing. We were looking around, waiting for things to collapse in a lot of ways. And we were just like, I guess we’re on our own now.”

Netscape wouldn’t keep their directory button exclusive for long. By 1996, they would begin allowing other search engines to be listed on their browser’s “search” feature. A user could click a button and a drop-down of options would appear, for a fee. Yahoo! bought themselves back in to the drop-down. They were joined by four other search engines, Lycos, InfoSeek, Excite, and AltaVista.

By that time, Yahoo! was the unrivaled leader. It had transformed its first mover advantage into a new strategy, one bolstered by a successful IPO and an influx of new investment. Yahoo! wanted to be much more than a simple search engine. Their site’s transformation would eventually be called a portal. It was a central location for every possible need on the web. Through a number of product expansions and aggressive acquisitions, Yahoo! released a new suite of branded digital products. Need to send an email? Try Yahoo! Mail. Looking to create website? There’s Yahoo! Geocities. Want to track your schedule? Use Yahoo! Calendar. And on and on the list went.

Competitors rushed the fill the vacuum of the #2 slot. In April of 1996, Yahoo!, Lycos and Excite all went public to soaring stock prices. Infoseek had their initial offering only a few months later. Big deals collided with bold blueprints for the future. Excite began positioning itself as a more vibrant alternative to Yahoo! with more accurate search results from a larger slice of the web. Lycos, meanwhile, all but abounded the search engine that had brought them initial success to chase after the portal-based game plan that had been a windfall for Yahoo!.

The media dubbed the competition the “portal wars,” a fleeting moment in web history when millions of dollars poured into a single strategy. To be the biggest, best, centralized portal for web surfers. Any service that offered users a destination on the web was thrown into the arena. Nothing short of the future of the web (and a billion dollar advertising industry) was at stake.

In some ways, though, the portal wars were over before they started. When Excite announced a gigantic merger with @Home, an Internet Service Provider, to combine their services, not everyone thought it was a wise move. “AOL and Yahoo! were already in the lead,” one investor and cable industry veteran noted, “and there was no room for a number three portal.” AOL had just enough muscle and influence to elbow their way into the #2 slot, nipping at the heels of Yahoo!. Everyone else would have to go toe-to-toe with Goliath. None were ever able to pull it off.

Battling their way to market dominance, most search engines had simply lost track of search. Buried somewhere next to your email and stock ticker and sports feed was, in most cases, a second rate search engine you could use to find things — only not often and not well. That’s is why it was so refreshing when another search engine out of Stanford launched with just a single search box and two buttons, its bright and multicolored logo plastered across the top.

A few short years after it launched, Google was on the shortlist of most popular sites. In an interview with PBS Newshour in 2002, co-founder Larry Page described their long-term vision. “And, actually, the ultimate search engine, which would understand, you know, exactly what you wanted when you typed in a query, and it would give you the exact right thing back, in computer science we call that artificial intelligence.”

Google could have started anywhere. It could have started with anything. One employee recalls an early conversation with the site’s founders where he was told “we are not really interested in search. We are making an A.I.” Larry Page and Sergey Brin, the creators of Google, were not trying to create the web’s greatest search engine. They were trying to create the web’s most intelligent website. Search was only their most logical starting point.

Imprecise and clumsy, the spider-based search engines of 1996 faced an uphill battle. AltaVista had proved that the entirety of the web, tens of millions of webpages, could be indexed. But unless you knew your way around a few boolean logic commands, it was hard to get the computer to return the right results. The robots were not yet ready to infer, in Page’s words, “exactly what you wanted.”

Yahoo! had filled in these cracks of technology with their surfers. The surfers were able to course-correct the computers, designing their directory piece by piece rather than relying on an algorithm. Yahoo! became an arbiter of a certain kind of online chic; tastemakers reimagined for the information age. The surfers of Yahoo! set trends that would last for years. Your site would live or die by their hand. Machines couldn’t do that work on their own. If you wanted your machines to be intelligent, you needed people to guide them.

Page and Brin disagreed. They believed that computers could handle the problem just fine. And they aimed to prove it.

That unflappable confidence would come to define Google far more than their “don’t be evil” motto. In the beginning, their laser-focus on designing a different future for the web would leave them blind to the day-to-day grind of the present. On not one, but two occasions, checks made out to the company for hundreds of thousands of dollars were left in desk drawers or car trunks until somebody finally made the time to deposit them. And they often did things different. Google’s offices, for instances, were built to simulate a college dorm, an environment the founders felt most conducive to big ideas.

Google would eventually build a literal empire on top of a sophisticated, world-class infrastructure of their own design, fueled by the most elaborate and complex (and arguably invasive) advertising mechanism ever built. There are few companies that loom as large as Google. This one, like others, started at Stanford.

Even among the most renowned artificial intelligence experts, Terry Winograd, a computer scientist and Stanford professor, stands out in the crowd. He was also Larry Page’s advisor and mentor when he was a graduate student in the computer science department. Winograd has often recalled the unorthodox and unique proposals he would receive from Page for his thesis project, some of which involved “space tethers or solar kites.” “It was science fiction more than computer science,” he would later remark.

But for all of his fanciful flights of imagination, Page always returned to the World Wide Web. He found its hyperlink structure mesmerizing. Its one-way links — a crucial ingredient in the web’s success — had led to a colossal proliferation of new websites. In 1996, when Page first began looking at the web, there were tens of thousands of sites being added every week. The master stroke of the web was to enable links that only traveled in one direction. That allowed the web to be decentralized, but without a central database tracking links, it was nearly impossible to collect a list of all of the sites that linked to a particular webpage. Page wanted to build a graph of who was linking to who; an index he could use to cross-reference related websites.

Page understood that the hyperlink was a digital analog to academic citations. A key indicator of the value of a particular academic paper is the amount of times it has been cited. If a paper is cited often (by other high quality papers), it is easier to vouch for its reliability. The web works the same way. The more often your site is linked to (what’s known as a backlink), the more dependable and accurate it is likely to be.

Theoretically, you can determine the value of a website by adding up all of the other websites that link to it. That’s only one layer though. If 100 sites link back to you, but each of them has only ever been linked to one time, that’s far less valuable than if five sites that each have been linked to 100 times link back to you. So it’s not simply how many links you have, but the quality of those links. If you take both of those dimensions and aggregate sites using backlinks as a criteria, you can very quickly start to assemble a list of sites ordered by quality.

John Battelle describes the technical challenge facing Page in his own retelling of the Google story, The Search.

Page realized that a raw count of links to a page would be a useful guide to that page’s rank. He also saw that each link needed its own ranking, based on the link count of its originating page. But such an approach creates a difficult and recursive mathematical challenge — you not only have to count a particular page’s links, you also have to count the links attached to the links. The math gets complicated rather quickly.

Fortunately, Page already knew a math prodigy. Sergey Brin had proven his brilliance to the world a number of times before he began a doctoral program in the Stanford computer science department. Brin and Page had crossed paths on several occasions, a relationship that began on rocky ground but grew towards mutual respect. The mathematical puzzle at the center of Page’s idea was far too enticing for Brin to pass up.

He got to work on a solution. “Basically we convert the entire Web into a big equation, with several hundred million variables,” he would later explain, “which are the page ranks of all the Web pages, and billions of terms, which are the links. And we’re able to solve that equation.” Scott Hassan, the seldom talked about third co-founder of Google who developed their first web crawler, summed it up a bit more concisely, describing Google’s algorithm as an attempt to “surf the web backward!”

The result was PageRank — as in Larry Page, not webpage. Brin, Page, and Hassan developed an algorithm that could trace backlinks of a site to determine the quality of a particular webpage. The higher value of a site’s backlinks, the higher up the rankings it climbed. They had discovered what so many others had missed. If you trained a machine on the right source — backlinks — you could get remarkable results.

It was only after that that they began matching their rankings to search queries when they realized PageRank fit best in a search engine. They called their search engine Google. It was launched on Stanford’s internet connection in August of 1996.

Google solved the relevancy problem that had plagued online search since its earliest days. Crawlers like Lycos, AltaVista and Excite were able to provide a list of webpages that matched a particular search. They just weren’t able to sort them right, so you had to go digging to find the result you wanted. Google’s rankings were immediately relevant. The first page of your search usually had what you needed. They were so confident in their results they added an “I’m Feeling Lucky” button which took users directly to the first result for their search.

Google’s growth in their early days was not unlike Yahoo!’s in theirs. They spread through word of mouth, from friends to friends of friends. By 1997, they had grown big enough to put a strain on the Stanford network, something Yang and Filo had done only a couple of years earlier. Stanford once again recognized possibility. It did not push Google off their servers. Instead, Stanford’s advisors pushed Page and Brin in a commercial direction.

Initially, the founders sought to sell or license their algorithm to other search engines. They took meetings with Yahoo!, Infoseek and Excite. No one could see the value. They were focused on portals. In a move that would soon sound absurd, they each passed up the opportunity to buy Google for a million dollars or less, and Page and Brin could not find a partner that recognized their vision.

One Stanford faculty member was able to connect them with a few investors, including Jeff Bezos and David Cheriton (which got them those first few checks that sat in a desk drawer for weeks). They formally incorporated in September of 1998, moving into a friend’s garage, bringing a few early employees along, including symbolics systems alumni Marissa Mayer.

Larry Page (left) and Sergey Brin (right) started Google in a friend’s garage.

Even backed by a million dollar investment, the Google founders maintained a philosophy of frugality, simplicity, and swiftness. Despite occasional urging from their investors, they resisted the portal strategy and remained focused on search. They continued tweaking their algorithm and working on the accuracy of their results. They focused on their machines. They wanted to take the words that someone searched for and turn them into something actually meaningful. If you weren’t able to find the thing you were looking for in the top three results, Google had failed.

Google was followed by a cloud of hype and positive buzz in the press. Writing in Newsweek, Steven Levy described Google as a “high-tech version of the Oracle of Delphi, positioning everyone a mouse click away from the answers to the most arcane questions — and delivering simple answers so efficiently that the process becomes addictive.” It was around this time that “googling” — a verb form of the site synonymous with search — entered the common vernacular. The portal wars were still waging, but Google was poking its head up as a calm, precise alternative to the noise.

At the end of 1998, they were serving up ten thousand searches a day. A year later, that would jump to seven million a day. But quietly, behind the scenes, they began assembling the pieces of an empire.

As the web grew, technologists and journalists predicted the end of Google; they would never be able to keep up. But they did, outlasting a dying roster of competitors. In 2001, Excite went bankrupt, Lycos closed down, and Disney suspended Infoseek. Google climbed up and replaced them. It wouldn’t be until 2006 that Google would finally overtake Yahoo! as the number one website. But by then, the company would transform into something else entirely.

After securing another round of investment in 1999, Google moved into their new headquarters and brought on an army of new employees. The list of fresh recruits included former engineers at AltaVista, and leading artificial intelligence expert Peter Norvig. Google put an unprecedented focus on advancements in technology. Better servers. Faster spiders. Bigger indexes. The engineers inside Google invented a web infrastructure that had, up to that point, been only theoretical.

They trained their machines on new things, and new products. But regardless of the application, translation or email or pay-per-click advertising, they rested on the same premise. Machines can augment and re-imagine human intelligence, and they can do it at limitless scale. Google took the value proposition of artificial intelligence and brought it into the mainstream.

In 2001, Page and Brin brought in Silicon Valley veteran Eric Schmidt to run things as their CEO, a role he would occupy for a decade. He would oversee the company during its time of greatest growth and innovation. Google employee #4 Heather Cairns recalls his first days on the job. “He did this sort of public address with the company and he said, ‘I want you to know who your real competition is.’ He said, ‘It’s Microsoft.’ And everyone went, What?“

Bill Gates would later say, “In the search engine business, Google blew away the early innovators, just blew them away.” There would come a time when Google and Microsoft would come face to face. Eric Schmidt was correct about where Google was going. But it would take years for Microsoft to recognize Google as a threat. In the second half of the 1990’s, they were too busy looking in their rearview mirror at another Silicon Valley company upstart that had swept the digital world. Microsoft’s coming war with Netscape would subsume the web for over half a decade.

Enjoy learning about web history with stories just like this? Jay is telling the full story of the web, with new stories every 2 weeks. Sign up for his newsletter to catch up on the latest... of what's past.

John

# September 15, 2020

I believe PageRank was invented at the Internet Archive and appropriated by Google.

Jay Hoffmann

Permalink to comment# September 16, 2020

I did not come across that in my research, but would love to read more about it if you have any links for me to follow.
Frank

Permalink to comment# December 24, 2020

… PageRank is just a silly idea in practice, but it is beautiful mathematically. You start off with a simple idea, such as the quality of a page is the sum of the quality of the pages that link to it, times a scalar. This sets you up with finding the eigen vectors of a huge sparse matrix. And because it is so much work, Google appears not to be updating its PageRank values that much.

A lot of people think this way of analyzing links was invented by Google. It wasn’t. As I said, IBM’s CLEVER project did it first. Furthermore, it doesn’t work. Well, it works, but no better than the simple link and link text analysis methods employed by the other engines. I know this because we implemented our own version at Infoseek and didn’t see a whole lot of difference. And Yahoo did a result comparison between Google and Inktomi before it purchased Inktomi and concluded the same thing.

What I think really brought Google into the spotlight was its index size, speed, and dynamically generated summaries. Those were Google’s winning points and still are till this day, not PageRank.SK Now tell us about Gigablast.

From: queue.acm.org “A Conversation with Matt Wells”

Markus Seyfferth

# September 22, 2020

What a fantastic read. Really enjoyed this masterpiece of web search, thank you for writing (and publishing) that!

Zuhair A K

# September 23, 2020

Excellent history of search engine evolution. Would love to read next chapters. Isn’t it Peter Norvig instead of Peter Norving ?

Previously in web history…

Comments