Searching for information: If I were Neeva

Almost from its start, Google Search has been regarded as best-in-class, delivering more relevant results to both general and specific queries than competitors. Maintaining this relevance and dominance has required large and ongoing investments in the search algorithm and web indexers, investments recouped through the use of user data for targeted advertising. In this, Google is solving two simultaneous search problems: users searching for information they don’t know, and advertisers searching for users who may purchase their products.

Many users are disconcerted by this (~anonymized) use of their search history to display ads, motivating various competitors to Google Search. DuckDuckGo, for instance, also sells advertising space in search listings, but they don’t use user information in ad display. Lacking Google’s profitability and scale, their results are less relevant, but sufficient for many users. Startpage is similarly motivated, paying Google to display Google search results while showing generic, not user-specific, Google ads. In this they’re like a pure Google Search, one that isn’t trying to maximize profit or pay for bad ideas.

In this vein, I was interested to see the launch of Neeva as a ‘subscription search’ service, paying for Google Search results without ads or (Google) tracking. While some people will be attracted to this, it seems like a minor differentiator, as the primary user experience is still delivered entirely by Google, one subject to cancellation at Google’s whim.

If I were to start a search company, I’d start by recognizing that many individual searches are not successful (1) because either the user doesn’t know what they’re actually searching for (2) or how to phrase the question in a sufficiently specific way (3). I’d focus on the value of questions: searching for information is valuable because it is an attempt to take what I know and learn something else.

That is to say that while I’m sometimes scatterbrained (*multitasking), I rarely conduct multiple, unrelated searches at the same time. Successive searches, observation (1), are an indication of the failure of the prior searches to produce a sufficient answer, yet Google and every other search engine I’ve seen neglect this temporal aspect, continuing to show results you’ve already rejected. Most of this is due to the too-simple interface of text box + contextualized hyperlinks, which prevents search from attempting to group results by their textual similarity, vintage, or domain. It’s nice if the answer can be in the first three results, but that is often not the case, and search should help us better understand how various results compare to each other. This naturally leads to a longer interaction with the search results (the thing Google pays to serve), as the user is encouraged to see how the results differ and, through a click or two, to delve into the differences before leaving the search engine. And here’s the critical point: if the search engine can show the user the point at which it cannot determine which of two groups of results is more responsive to the user’s query, that is precisely the limitation of the search engine and the kind of information the search engine needs to learn to serve better results. Keeping users on the search page for that initial screening allows the collection of data to make search better. Not to sell ads, but to make search better!
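To make the grouping idea concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not how any real engine works: the function names, the word-overlap (Jaccard) similarity, and the `threshold` and `margin` values are all hypothetical. It groups result snippets by textual similarity and flags the case where the two best groups score too closely to rank, which is the point at which the engine should ask the user.

```python
import re


def tokens(text):
    """Lowercased set of words in a result snippet."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Overlap of two token sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_results(snippets, threshold=0.3):
    """Greedy grouping: a snippet joins the first group whose seed it resembles.

    The threshold is an assumed tuning value, not a recommendation.
    """
    groups = []  # list of (seed_tokens, member_snippets)
    for snippet in snippets:
        t = tokens(snippet)
        for seed, members in groups:
            if jaccard(seed, t) >= threshold:
                members.append(snippet)
                break
        else:
            groups.append((t, [snippet]))
    return [members for _, members in groups]


def needs_user_input(group_scores, margin=0.05):
    """True when the two best-scoring groups are too close to rank confidently."""
    top = sorted(group_scores, reverse=True)[:2]
    return len(top) == 2 and (top[0] - top[1]) < margin
```

When `needs_user_input` returns true, the interface would show both groups side by side, and whichever one the user opens is the signal the engine was missing.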

Building off (1), (2) realizes that many searches have as their target a vague notion, something that may be only half remembered or a conjecture the user is looking to verify or disambiguate. Here, the constraint of one query = one set of results is limiting, as multiple queries might be more helpful in circling in on what the user is actually asking. Google attempts this with the ‘other users searched for’ suggestions, but these are almost always entirely different questions. When the user hasn’t made a sufficiently specific query, the thing to do is to ask them for more context, not suggest a different question. In recognizing when the results are insufficiently specific and facilitating this narrowing, the search engine is again able to gather more information to more directly answer the user’s question, again making search better.

In the other direction, (3), some searches are nearly impossible. Searching for anything to do with Microsoft Excel, for instance, invariably returns pages of low-value clutter. Google search provides no ability to filter out SEO clickbait nor any facility to distinguish between the needs of beginning and advanced users. The problem isn’t in making the query exact but specifying the kinds of resources to exclude in a way the search engine understands. Now, Google and other engines have long offered boolean keywords or the ability to negate (-) a search term, but because of the interface it can be challenging to verify that the query has been parsed correctly. Just as rephrasing a question before answering communicates understanding of the question, search engines should attempt to communicate to the user that their pages of results are not just wasting the user’s time.

So if I were Neeva (or DuckDuckGo since they’ve written their own search engine), I’d attempt to make search a better experience, by providing better results and context than Google does. I’d classify users according to their search history/sophistication and prefer results that other users in their class have found helpful, while at the same time making clear that the results are tailored based on their search history and offering a way to remove the class restriction. Search that uses my and other users’ search history to become better, in a transparent way, that’s what I’d subscribe to.

If I Were: Arm&Hammer

‘Fresh box for baking!’

..I’d sell baking soda in 1 teaspoon packages.   They already realize the need (‘Fresh box for baking!’) but choose to sell 100tsp per container for $0.94.  Other parts of the box tell me to change boxes every month, so, according to Arm&Hammer, 95% (5tsp = 5 batches of cookies / mo) of the product they sell to customers is intended to be wasted.  I’ll gladly pay $1/5tsp/mo to have a less self-contradicting package!

If I Were: Smuckers
Smucker’s Grape Jelly

Seriously. Sell jelly in squeeze tubes. Just be done with it already.  OK, get fancy and sell PB&J in a dual-tube.  Tube packaging efficiency is so much better, and, if you do the nozzle right, you don’t need to refrigerate because no air will enter (see Franzia/bagged wine).  TetraPak and others have this figured out; why are we still discussing this?


A machine to help us communicate…

The last time I had free nights and weekends, 2011, our country was growing predictably chipper over the coming election, so I spent some of that free time considering debaterdebator.

The idea was simple: build a machine to help us communicate. I wondered whether personal relationships could be leveraged to draw friends into better discussions on political things, whether that discussion might benefit from ready access to conversation aids, and if the participants could be encouraged beyond the cable news/talk radio and red/blue strawmen. Fundamentally, I thought (and think), that people desire increased good for themselves, family, friends, and broader culture, that they would find it disconcerting to disagree with people they already trust and interact with, and that these relationships would have the best chance of drawing people together, into some greater understanding of and respect for our mutual interests and individual concerns.

Now, this idea struck some friends as obviously bad; that once you start down the partisan debate path, forever will it taint the relationship. I think that’s both wrong and unfortunate.  It’s unfortunate because it is the willful maintenance of a veneer, a retreat from true, honest conversation and, actually, a diminution of the friendship.  That’s Facebook, that’s the echo chamber we have today.  Instead, the (benevolent!) platform could strategically choose topics for conversation and then support the users in the formulation and conduct of a discussion.

So, in the case that Uncle Jimbo had posted/tweeted articles and messages critical of the latest IPCC report on climate change, and nephew Jimmy the converse, our platform would begin with their relationship and detect that the article content and audience graph between these two relatives only minimally overlapped.  From there it would have prompted each person to review the other’s endorsements and encouraged them to pose questions to each other.  Since neither participant is an expert in climate change, the platform would use the same article graph to find and suggest bridge articles that appeared to span the difference in perspective and might serve as a basis for shared knowledge and increased accord.  By encouraging each person to question the other (filtering, at least crudely, against attacks) within the context of a shared set of information, I hoped that they would come to a better understanding of each other’s perspective, their own, their culture, and their world.  And this would be valuable to them, to our society, and to the many interested parties.
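A rough sketch of the overlap-and-bridge step, in Python, might look like the following. Everything here is an assumption for illustration: the `audiences` mapping (article to the set of people who shared or read it), the function names, and the scoring rule were never part of a built system. A bridge article is taken to be one neither relative has shared, but whose audience reaches into the readership of both.

```python
def overlap(a, b):
    """Jaccard overlap between two sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bridge_articles(audiences, shared_by_a, shared_by_b, min_each=1):
    """Rank unshared articles whose readers overlap with both sides' readers.

    audiences: dict of article id -> set of reader ids (hypothetical data).
    The score is the smaller of the two overlaps, so an article must reach
    both camps to rank highly.
    """
    readers_a = set().union(*(audiences[x] for x in shared_by_a))
    readers_b = set().union(*(audiences[x] for x in shared_by_b))
    scored = []
    for article, readers in audiences.items():
        if article in shared_by_a or article in shared_by_b:
            continue  # already endorsed by one side, so not a bridge
        score = min(len(readers & readers_a), len(readers & readers_b))
        if score >= min_each:
            scored.append((score, article))
    # Highest score first; ties broken alphabetically for stable output.
    return [article for score, article in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

A low `overlap` between the two relatives' audiences is the cue to start a conversation, and the top few bridge articles become the suggested shared reading.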

I believed this interaction was enabled by reasoning over social platforms and the simple encouraging of people to seek explanations for their positions and, in citing them, attempt to defend their sources and arguments against those mustered charitably by their friend.  So, the platform should be a quest to discover truth, at first between two friends and likely reaching a trivial depth, but it had the potential to stoke something other than partisan, tribal cynicism, which we have far too much of.

I believed this system was technologically possible 5 years ago (2011); it is even more the case today, and this is slowly being realized in a predictably breathless manner:

Those all referred to Facebook, and indeed their seat high atop Mt. Data enables them to rain benefits and harms on those below according to, essentially, their Ferengi whim.

While there are rational fears about the power of big-data (with the right analysis Facebook can create knowledge, and hence powers, that no other entity save the NSA can approach), it need not be exercised so insidiously.  I very much believe technological advances are tools that we can choose to apply for our benefit; so the contrast between what Facebook is doing and what I wanted to do is the user’s choice.  Facebook wants to drive greater engagement to expose their users to more ads, to encourage consumerism.  The problem is that Facebook is not helping people; it seeks to capitalize on their preferences, fears, and weaknesses, rather than aid their discovery, growth, and participation in the world.  It and many other platforms today provide a useful service, but they could do so much more.

Enable your customers

Here’s one of many articles describing the challenge of maintaining personal privacy amidst the spread of data brokers

The core and often unacknowledged challenge here is the asymmetry, that people will generally modify their behavior (self-censor in some contexts) according to their surroundings and present purposes.  This ability falters when we do not know what our friends and adversaries know about us, preventing us from leveraging our friendship or guarding against their power.  So far we’ve avoided the utopic and dystopic extremes, but the classic example of Target illustrates the potential benefits and dangers:

“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again.

On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”

These situations and extremes can be avoided.  The first responsibility is on companies to interact ethically and honestly with their customers; that is essential.  Still, I can see the temptation to wring customer data for pennies and, once some return has been achieved, to expand these datastores. As Target stated in the preceding article, “We’ve developed a number of research tools that allow us to gain insights into trends and preferences within different demographic segments of our guest population.” and “Our mission is to make Target the preferred shopping destination for our guests by delivering outstanding value, continuous innovation and exceptional guest experience…”

Years ago, I complained on Twitter that companies like Roundy’s, in the notoriously low-margin grocery sector, have customer loyalty programs but retain all of the data and conclusions for their internal use.  Hiding the purpose and mechanism of these programs is one way to avoid customer ire, but this tactic is one misstep or abuse from cataclysmic failure.  I think it better to share the results with customers.  Given the limited shelf life of many products and limited shelf space for all the rest, customer predictability is directly related to efficient stocking and pricing, but they could do much better than coupons in creating and maintaining product demand.  Specifically, grocery stores should send an email or offer an app that forecasts my likely grocery needs and serves as a starting point for the weekly grocery list. It is quite easy to curate a list of things that a given customer uses on a weekly and monthly basis and is probably close to running out of.  This is a service to the customer, an aid to their busy life, but most importantly it shifts the customer’s mindset from one of generally needing to get groceries to a plan to go to a particular grocery store for these specific items…and some others.  Coupons could be issued in the email/app for ancillary purchases (Cool Whip® to go with the planned ice cream), but they’re made more valuable because they can be applied to the customer’s shopping list, reinforcing the commitment and articulating the expected savings.  At this point it does not matter if other stores have the same prices and coupons; they can’t match the convenience.  This system would naturally encourage online/pre-bagged grocery pickup and it could give a very meaningful perspective on the customer’s consumption trends.  
Again, they have the data and are currently using it to improve their processes while denying the same benefits and insights to their customers; if they open up access, both grocer and consumer can understand that consumption, and they can also collaborate and discover a more efficient relationship.
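To show how little is needed for the forecast described above, here is a minimal Python sketch. It assumes loyalty-card history is available as a mapping from item to purchase dates; that data shape, the function name, and the `slack_days` parameter are my own illustration, not any grocer's actual system. It estimates each item's typical repurchase interval and lists the items that are due soon.

```python
from datetime import date, timedelta


def forecast_list(purchases, today, slack_days=2):
    """Return items the customer is probably close to running out of.

    purchases: dict of item -> chronologically sorted purchase dates
    (hypothetical loyalty-card data).
    slack_days: how far ahead of the expected date an item is still listed.
    """
    due = []
    for item, dates in purchases.items():
        if len(dates) < 2:
            continue  # a single purchase gives no interval to estimate
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        typical_gap = round(sum(gaps) / len(gaps))
        expected_next = dates[-1] + timedelta(days=typical_gap)
        if expected_next <= today + timedelta(days=slack_days):
            due.append((expected_next, item))
    # Most overdue items first.
    return [item for _, item in sorted(due)]
```

The output is only the starting point for the weekly list; the customer edits it, and those edits are themselves better data than the coupons ever produced.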

So, I do not see a reason for companies to hide their insights from their customers and believe that the negative effects, both in bad PR and missed opportunities, exceed whatever benefit there is in deceiving customers.  As we both know, it’s been too long since I had a Twinkie..

LED Christmas Lights

For any given product, it is always interesting to see which aspects are improved and which languish.  Engadget’s recommendation of the best LED and incandescent Christmas lights highlighted this; the difference between the LED and incandescent strands is almost entirely restricted to the bulb (and associated electronic driver).

LED Christmas light (top) versus incandescent (bottom)

Looking carefully at the above picture, the incandescent bulb can be removed from the socket while the continuous mold line (lying exactly between the socket’s two flat sides) suggests that the LED bulb cannot be removed.  Assuming this, I’d wager that the LED bulb is first connected to the wires and then the socket molded around this connection.  The socket length and design, then, serve no functional purpose beyond fulfilling the consumer’s expectation.

A simple machine to make LED Christmas lights. From left: a spool of multistrand, multiconductor wire passes between two vertical-alignment rollers, through a wire-puller (drive), and into the LED inserter. LEDs fall from the vertical hopper into a bit and are driven by the pneumatic cylinder into the wire, such that their leads pierce the wire and make electrical contact. Exiting the wire, the leads are bent against the wire in the same manner as a stapler curls the staple on the back of the paper. This prototype does not encase the LEDs, so after the press the light strand is wound around a spool for packaging.

LEDs can be inserted directly into the wire, then encapsulated in plastic/rubber for electrical isolation.

LED Christmas lights were just coming to the market during my senior year in high school.  My senior project focused on the attachment of the bulb to the wire, where I realized that the increase in bulb quality (incandescent to LED) and associated decrease in bulb failure lessened the need for consumer-replaceable bulbs.  So, I designed a light strand where the LEDs were directly inserted into the wire and also a machine to construct these strands.

Removing the socket results in a more compact light strand which should be cheaper to produce (less material and elimination of a dedicated electrical assembly) and less visually-intrusive because the ‘socket’ has been substantially reduced (Christmas light strands are green to blend in with the tree).  Electrical contact is maintained without soldering by the compliance of the wire, much as a nail driven into wood is retained by forces from the compressed fibers.

I’ve learned much in the ten years since this project, but this idea remains relevant and would be fun to revisit.  I built this project in the context of the Szmanda science scholarship, so my paper and presentation highlight the energy efficiency of LED lights against traditional incandescents:

Pandora Bounties

I’ve used Pandora for many years and its selection is persistently limited in some genres of music I’d like to hear more of.  So given this, why can’t it offer bounties for creation of new music?

In my experience, liking or disliking more than every fourth track quickly leads their algorithm to overfit and play the same collection of, say, 50 songs without any variety.  This leads to user fatigue and consumption elsewhere.  As each station is essentially trying to learn what subgenre you enjoy, many users may end up with essentially the same overfit station and Pandora can tell how long they persist under the repetition before leaving for another source.  These statistics give Pandora a reasonable way to set the music-creation bounties, as they can be viewed as a projection of future earnings once the music has been created.
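As a back-of-the-envelope sketch of that projection, in Python: the function, its parameters, and every number passed to it are hypothetical, since I have no knowledge of Pandora's actual revenue figures. The bounty is simply a share of the revenue expected from listening hours that would otherwise be lost to fatigue.

```python
def bounty_estimate(frustrated_users, extra_hours_per_user, revenue_per_hour, payout_share=0.5):
    """Projected earnings from retained listening, scaled by the share paid as bounty.

    All inputs are assumed values for illustration:
    frustrated_users: users whose overfit station matches this subgenre.
    extra_hours_per_user: listening hours expected to be recovered per user.
    revenue_per_hour: ad or subscription revenue per listening hour.
    payout_share: fraction of the projected revenue offered as the bounty.
    """
    projected_revenue = frustrated_users * extra_hours_per_user * revenue_per_hour
    return projected_revenue * payout_share
```

For example, 10,000 fatigued listeners who would each stay five hours longer at an assumed $0.02 per hour project to $1,000 of revenue, or a $500 bounty at a 50% share.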

More importantly, this idea creates what should be a central element of Pandora’s business plan, that is to better inform artists what music is in demand.  Nostalgic artists are free to cling to the idea that they only produce what moves the heart, but for practically-minded and emerging artists, this program would give them some basic criteria to shoot for.

Works submitted for each particular bounty could then be introduced into the stations of previously-frustrated users, with the bounty winner determined by likes or plays over a set trial period.  (An element of this would be notifying users that Pandora has found new music for their station, encouraging them to give it another try.)

These works needn’t be new, as the bounty could incentivize retired musicians to enroll their music in Pandora’s catalog where rightsholder discovery may have prevented their prior inclusion.  Newly-created works would benefit both the artist and Pandora, as made-for-Pandora works could be recorded with significantly less or no label involvement, allowing Pandora to give artists all of the streaming fees (they could also be treated as works-for-hire, though this might reduce participation).


Consider a Facebook/Netflix/etc. app that analyzes a user’s profile to determine how sheltered they may be relative to their peers.  This data would be presented to the user, accompanied by a breakout plan to get them up to their peer norm.  Offers from media providers (Netflix/Amazon/etc.) would encourage the user to ‘rectify’ the situation through term-limited catch-up deals/packages ($10 to ‘get’ Game of Thrones or $20 and Fifty Shades of Grey will be on your doorstep tomorrow).  The obvious revenue elements are kickbacks from providers (essentially referral fees), demographic intuition for advertisers (what groups are not responding to our advertising campaigns), appreciation from data providers (for promoting complete taste profiles), social literacy trivia games (à la Jeopardy), and offers in an ongoing to-do list to maintain social literacy (number and frequency of entries informed by prior rectification speed/thoroughness).  The offerings should also be tailored by age/maturation as the desire to be socially-literate varies/decreases with age.

I can’t say I would use such a ‘service’, but because the data and revenue model exist it should as well.