Searching for information: If I were Neeva

Almost from its start, Google Search has been regarded as best-in-class, delivering more relevant results to both general and specific queries than its competitors. Maintaining this relevance and dominance has required large and ongoing investments in the search algorithm and web indexers, investments recouped through the use of user data for targeted advertising. In this, Google is solving two simultaneous search problems: users searching for information they don’t know, and advertisers searching for users who may purchase their products.

Many users are disconcerted by this (~anonymized) use of their search history to display ads, which has motivated various competitors to Google Search. DuckDuckGo, for instance, also sells advertising space in search listings, but they don’t use user information to choose which ads to display. Lacking Google’s profitability and scale, their results are less relevant, but relevant enough for many users. Startpage is similarly motivated, paying Google to display Google search results while showing generic, not user-specific, Google ads. In this they’re like a pure Google Search, one that isn’t trying to maximize profit or pay for bad ideas.

In this vein, I was interested to see the launch of Neeva as a ‘subscription search’ service, paying for Google Search results and showing them without ads or (Google) tracking. While some people will be attracted to this, it seems like a minor differentiator, as the primary user experience is still delivered entirely by Google and remains subject to cancellation at Google’s whim.

If I were to start a search company, I’d start by recognizing that many individual searches are not successful (1) because the user either doesn’t know what they’re actually searching for (2) or doesn’t know how to phrase the question in a sufficiently specific way (3). I’d focus on the value of questions: searching for information is valuable because it is an attempt to take what I know and learn something else.

That is to say, while I’m sometimes scatterbrained (*multitasking), I rarely conduct multiple, unrelated searches at the same time. Successive searches, observation (1), indicate that the prior searches failed to produce a sufficient answer, yet Google and every other search engine I’ve seen neglect this temporal aspect, continuing to show results you’ve already rejected. Most of this is due to the too-simple interface of a text box plus contextualized hyperlinks, which prevents search from attempting to group results by their textual similarity, vintage, or domain. It’s nice if the answer is in the first three results, but that is often not the case, and search should help us better understand how the various results compare to each other. This naturally leads to a longer interaction with the search results (the thing Google pays to serve), as the user is encouraged to see how the results differ and, through a click or two, to delve into the differences before leaving the search engine. And here’s the critical point: if the search engine can show the user the point at which it cannot determine which of two groups of results is more responsive to the user’s query, that is precisely the limitation of the search engine, and the kind of information it needs to learn in order to serve better results. Keeping users on the search page for that initial screening allows the collection of data to make search better. Not to sell ads, but to make search better!
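As a minimal sketch of the session-aware grouping described above (not how Google, Neeva, or anyone else actually does it), the Python below assumes each result is a hypothetical `Result` with a URL, title, and publication year, and treats every result already shown for an earlier query in the same session as rejected. It drops those and groups the rest by domain by default, or by any other key, such as decade for vintage. Grouping by textual similarity would need a clustering step that isn't shown here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Result:
    # Hypothetical result record; a real index would carry far more.
    url: str
    title: str
    year: int  # publication year, standing in for "vintage"


def domain_of(result):
    """Group key: the host part of the URL, e.g. 'support.microsoft.com'."""
    return urlparse(result.url).netloc


def session_groups(results, seen_urls, key=domain_of):
    """Group this query's results, skipping ones shown earlier in the session.

    A result the user already saw, and then searched again anyway, is treated
    as rejected. Dicts keep insertion order, so groups appear in the order of
    their best-ranked member and the engine's ranking still decides the order.
    """
    groups = {}
    for result in results:
        if result.url in seen_urls:
            continue  # shown for an earlier query; the user kept searching
        groups.setdefault(key(result), []).append(result)
    return list(groups.items())


# Two successive queries in one session (made-up URLs).
first = [Result("https://a.example/x", "A", 2015), Result("https://b.example/y", "B", 2020)]
second = [
    Result("https://a.example/x", "A", 2015),  # already seen: dropped
    Result("https://c.example/z", "C", 2021),
    Result("https://a.example/w", "A2", 2019),
]
seen = {r.url for r in first}
by_domain = session_groups(second, seen)
by_decade = session_groups(second, seen, key=lambda r: r.year // 10 * 10)
```

Showing the groups side by side, rather than as one flat list, is what would let the user's choice between them become the feedback signal the paragraph argues for.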

Building off (1), (2) recognizes that many searches have as their target a vague notion, something that may be only half remembered or a conjecture the user is looking to verify or disambiguate. Here, the constraint of one query = one set of results is limiting, as multiple queries might be more helpful in circling toward what the user is actually asking. Google attempts this with ‘other users searched for’, but these suggestions are almost always entirely different questions. When the user hasn’t made a sufficiently specific query, the thing to do is to ask them for more context, not to suggest a different question. In recognizing when the results are insufficiently specific and facilitating this narrowing, the search engine is again able to gather more information to answer the user’s question more directly, again making search better.
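One hypothetical way to detect an insufficiently specific query is sketched in Python below. It assumes each top result has already been labeled with a topic (the labeling itself, e.g. by clustering, is not shown, and the labels here are invented). If no single topic covers at least half the results, the query is treated as ambiguous, and the engine asks the user to pick among the leading topics instead of suggesting a different question. The 0.5 threshold and the function name `clarifying_options` are arbitrary assumptions.

```python
from collections import Counter


def clarifying_options(result_topics, dominance=0.5, max_options=3):
    """Return topics to offer the user, or [] if one reading clearly dominates.

    result_topics: one (hypothetical) topic label per top-ranked result.
    dominance: share of results the leading topic needs to count as "clear".
    """
    if not result_topics:
        return []
    counts = Counter(result_topics)
    _, top_count = counts.most_common(1)[0]
    if top_count / len(result_topics) >= dominance:
        return []  # the query is specific enough; just show results
    # Ties keep first-seen order, i.e. the order of the best-ranked result.
    return [topic for topic, _ in counts.most_common(max_options)]


options = clarifying_options(
    ["python language", "python snake", "python language", "monty python", "python snake"]
)
```

A UI would then present these as a "which did you mean?" choice, and the user's answer is exactly the extra context the engine was missing.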

In the other direction, (3), some searches are nearly impossible. Searching for anything to do with Microsoft Excel, for instance, invariably returns pages of low-value clutter. Google Search provides no ability to filter out SEO clickbait, nor any facility to distinguish between the needs of beginning and advanced users. The problem isn’t in making the query exact but in specifying the kinds of resources to exclude in a way the search engine understands. Now, Google and other engines have long offered boolean keywords or the ability to negate (-) a search term, but because of the interface it can be challenging to verify that the query has been parsed correctly. Just as rephrasing a question before answering it communicates understanding of the question, search engines should restate how they interpreted the query, showing the user that their pages of results are not just wasting the user’s time.
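To illustrate restating a parsed query, here is a minimal Python sketch that assumes only two operators, "quoted phrases" and a leading - for exclusion. It is not Google's actual query grammar, and `interpret` and `explain` are hypothetical names. The point is the second function: it echoes the parse back in plain words so the user can confirm the engine understood the exclusions before reading any results.

```python
import re

# A term: an optional leading '-' followed by a "quoted phrase" or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def interpret(query):
    """Split a query into (required, excluded) term lists."""
    required, excluded = [], []
    for minus, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if not term:
            continue  # skip empty quotes ""
        (excluded if minus else required).append(term)
    return required, excluded


def quoted(terms):
    """Render terms as a comma-separated list of quoted strings."""
    return ", ".join(f'"{t}"' for t in terms) or "(nothing)"


def explain(query):
    """Restate the parsed query in plain words so the user can check it."""
    required, excluded = interpret(query)
    return (f"Results must mention {quoted(required)} "
            f"and must not mention {quoted(excluded)}.")


print(explain('excel "pivot table" -macro -"for beginners"'))
# prints: Results must mention "excel", "pivot table" and must not mention "macro", "for beginners".
```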

So if I were Neeva (or DuckDuckGo, since they’ve written their own search engine), I’d attempt to make search a better experience by providing better results and context than Google does. I’d classify users according to their search history/sophistication and prefer results that other users in their class have found helpful, while at the same time making clear that the results are tailored based on their search history and offering a way to remove the class restriction. Search that uses my and other users’ search history to become better, in a transparent way: that’s what I’d subscribe to.
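A rough Python sketch of the transparent, class-based tailoring described above. Everything here is an assumption for illustration: the two-class split, the toy rule that a user is "advanced" if more than a third of their past queries use quotes or -, and the `helpful` counts (how often users in each class clicked a result and then stopped searching). The hypothetical `tailor` function always returns a note saying whether tailoring was applied, and `personalize=False` is the opt-out.

```python
from collections import Counter


def classify_user(history):
    """Toy rule (hypothetical): frequent use of quotes or '-' marks an advanced user."""
    if not history:
        return "beginner"
    with_operators = sum(1 for q in history if '"' in q or " -" in f" {q}")
    return "advanced" if with_operators / len(history) > 1 / 3 else "beginner"


def tailor(results, helpful, user_class, personalize=True):
    """Reorder result URLs by how often users in the same class found them helpful.

    helpful: {class name: Counter(url -> helpful-click count)}, assumed collected
    from earlier sessions. Returns (results, note shown to the user).
    """
    if not personalize or user_class not in helpful:
        return list(results), "These results are not tailored to your search history."
    counts = helpful[user_class]
    # sorted() is stable, so ties keep the engine's original order.
    ranked = sorted(results, key=lambda url: -counts[url])
    note = (f"Tailored for {user_class} searchers based on your search history. "
            "Turn off tailoring to see the untailored order.")
    return ranked, note


history = ['excel "pivot table" -macro', "excel sum", "python -snake"]
helpful = {"advanced": Counter({"https://b.example": 5, "https://c.example": 2})}
results = ["https://a.example", "https://b.example", "https://c.example"]
user_class = classify_user(history)
ranked, note = tailor(results, helpful, user_class)
```

Returning the note alongside the ranking keeps the transparency requirement in the same code path as the personalization, so tailored results can't be shown without saying so.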