Scientists make theories, and engineers make devices. Computer scientists make algorithms, which are both theories and devices.
In farming, we plant the seeds, make sure they have enough water and nutrients, and reap the grown crops. Why can’t technology be more like this? It can, and that’s the promise of machine learning. Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants. The machine-learning expert is like a farmer, sowing the seeds, irrigating and fertilizing the soil, and keeping an eye on the health of the crop but otherwise staying out of the way.
A new type of network effect takes hold: whoever has the most customers accumulates the most data, learns the best models, wins the most new customers, and so on in a virtuous circle.
Machine learning comes to the rescue, scouring the literature for relevant information, translating one area’s jargon into another’s, and even making connections that scientists weren’t aware of.
Here, then, is the central hypothesis of this book: All knowledge—past, present, and future—can be derived from data by a single, universal learning algorithm. I call this learner the Master Algorithm.
Bayes’ theorem, as the formula is known, tells you how to update your beliefs whenever you see new evidence. A Bayesian learner starts with a set of hypotheses about the world. When it sees a new piece of data, the hypotheses that are compatible with it become more likely, and the hypotheses that aren’t become less likely (or even impossible). After seeing enough data, a single hypothesis dominates, or a few do.
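The updating process described above can be illustrated with a minimal Python sketch. It assumes a toy setting that is not in the original: the hypotheses are candidate values for a coin's probability of heads, and all start equally likely. Each observation multiplies every hypothesis's probability by how likely that observation is under it, then renormalizes so the probabilities sum to one. This is Bayes' theorem, P(h | d) = P(d | h) P(h) / P(d). The names `bayes_update` and `coin_likelihood` are illustrative, not from the text.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior P(h | observation) for each hypothesis h.

    prior: dict mapping hypothesis -> P(h)
    likelihood: function (h, observation) -> P(observation | h)
    """
    unnormalized = {h: p * likelihood(h, observation) for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(observation), the normalizer
    return {h: w / evidence for h, w in unnormalized.items()}

def coin_likelihood(h, flip):
    # h is the hypothesized probability that the coin lands heads
    return h if flip == "H" else 1 - h

# Hypothetical hypotheses: P(heads) is 0, 1/4, 1/2, 3/4, or 1, all equally likely at first
hypotheses = [Fraction(k, 4) for k in range(5)]
belief = {h: Fraction(1, 5) for h in hypotheses}

for flip in "HHTHHHHH":  # invented data: seven heads, one tails
    belief = bayes_update(belief, coin_likelihood, flip)

for h, p in belief.items():
    print(f"P(heads={h}) = {float(p):.3f}")
```

After the first heads, the hypothesis that the coin never lands heads becomes impossible; after the single tails, so does the hypothesis that it always does. By the end, heads=3/4 holds about 89 percent of the probability, so one hypothesis dominates.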
Being aware of this is the first step to a happy life in the twenty-first century. Teach the learners, and they will serve you; but first you need to understand them. What in my job can be done by a learning algorithm, what can’t, and—most important—how can I take advantage of machine learning to do it better? The computer is your tool, not your adversary. Armed with machine learning, a manager becomes a supermanager, a scientist a superscientist, an engineer a superengineer. The future belongs to those who understand at a very deep level how to combine their unique expertise with what algorithms do best.
Control of data and ownership of the models learned from it is what many of the twenty-first century’s battles will be about—between governments, corporations, unions, and individuals.
In his Pensées, published in 1669, Pascal said we should believe in the Christian God because if he exists, that gains us eternal life, and if he doesn’t, we lose very little. This was a remarkably sophisticated argument for the time, but as Diderot pointed out, an imam could make the same argument for believing in Allah. And if you pick the wrong God, the price you pay is eternal hell. On balance, considering the wide variety of possible Gods, you’re no better off picking a particular one to believe in than you are picking any other. For every God that says “do this,” there’s another that says “no, do that.” You may as well just forget about God and enjoy life without religious constraints.
A clock that’s always an hour late has high bias but low variance. If instead the clock alternates erratically between fast and slow but on average tells the right time, it has high variance but low bias.
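To make the clock analogy concrete, here is a minimal Python sketch. It assumes we log each clock's error (reading minus true time, in minutes) at a few checks; the numbers are invented for illustration. Bias is the average error, variance is the spread of the errors around that average, and together they make up the mean squared error: bias squared plus variance.

```python
from statistics import mean, pvariance

def bias_and_variance(errors):
    """Return (bias, variance) of a clock's errors.

    errors: reading minus true time, in minutes, one entry per check.
    Bias is the average error; variance is the spread of errors around it.
    """
    return mean(errors), pvariance(errors)

# Hypothetical logs, four checks per clock (numbers invented for illustration)
late_clock = [-60, -60, -60, -60]   # always an hour behind: consistent but wrong
erratic_clock = [25, -40, 35, -20]  # fast, slow, fast, slow: right on average

for name, errors in [("late", late_clock), ("erratic", erratic_clock)]:
    bias, variance = bias_and_variance(errors)
    mse = mean(e * e for e in errors)  # mean squared error = bias**2 + variance
    print(f"{name}: bias={bias}, variance={variance}, mse={mse}")
```

The late clock comes out with a bias of -60 minutes and zero variance; the erratic one has zero bias but a variance of 962.5 square minutes.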
In 1994, a team of researchers from the University of Minnesota and MIT built a recommendation system based on what they called “a deceptively simple idea”: people who agreed in the past are likely to agree again in the future. That notion led directly to the collaborative filtering systems that all self-respecting e-commerce sites have.
You don’t need explicit ratings to do collaborative filtering, by the way. If Ken ordered a movie on Netflix, that means he expects to like it. So the “ratings” can just be ordered/not ordered, and two users are similar if they’ve ordered a lot of the same movies. Even just clicking on something implicitly shows interest in it. Nearest-neighbor works with all of the above. These days all kinds of algorithms are used to recommend items to users, but weighted k-nearest-neighbor was the first widely used one, and it’s still hard to beat.
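The ordered/not-ordered version of weighted k-nearest-neighbor can be sketched in a few lines of Python. This is a minimal illustration under assumptions not in the text: each user is represented by the set of movies they ordered, similarity between two users is the Jaccard index (shared orders divided by all orders, one common choice among several), and a movie's score is the sum of the similarities of the k nearest neighbors who ordered it. Apart from Ken, the users and titles are hypothetical, as are the function names.

```python
def jaccard(a, b):
    """Similarity of two users: shared orders / all orders (an illustrative choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, orders, k=2, n=3):
    """Weighted k-nearest-neighbor recommendation from ordered/not-ordered data.

    target: name of the user to recommend for
    orders: dict mapping user -> set of movies that user ordered
    k: number of most similar users (neighbors) to consult
    n: maximum number of recommendations to return
    """
    mine = orders[target]
    neighbors = sorted(
        ((jaccard(mine, theirs), user) for user, theirs in orders.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        for movie in orders[user] - mine:  # only movies the target hasn't ordered
            # each neighbor's "vote" is weighted by how similar that neighbor is
            scores[movie] = scores.get(movie, 0.0) + sim
    return sorted(scores, key=lambda m: (-scores[m], m))[:n]

# Hypothetical order histories
orders = {
    "Ken":   {"Alien", "Blade Runner", "The Matrix"},
    "Ana":   {"Alien", "Blade Runner", "The Matrix", "Gattaca"},
    "Raj":   {"Alien", "The Matrix", "Inception"},
    "Sofia": {"Notting Hill", "Amelie"},
}
print(recommend("Ken", orders))  # ['Gattaca', 'Inception']
```

Ana shares all three of Ken’s orders (similarity 0.75) and Raj shares two (0.5), so Gattaca outranks Inception; Sofia, who has nothing in common with Ken, falls outside his two nearest neighbors.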
Teaching the computer about you. The more you teach it, the better it can serve you—or manipulate you. Life is a game between you and the learners that surround you. You can refuse to play, but then you’ll have to live a twentieth-century life in the twenty-first. Or you can play to win. What model of you do you want the computer to have? And what data can you give it that will produce that model? Those two questions should always be in the back of your mind whenever you interact with a learning algorithm—as they are when you interact with other people.
Your model will go on millions of dates so you don’t have to, and come Saturday, you’ll meet your top prospects at an OkCupid-organized party, knowing that you’re also one of their top prospects—and knowing, of course, that their other top prospects are also in the room. It’s sure to be an interesting night.
In the world of the Master Algorithm, “my people will call your people” becomes “my program will call your program.” Everyone has an entourage of bots, smoothing his or her way through the world. Deals get pitched, terms negotiated, arrangements made, all before you lift a finger.
Between you and them there needs to be an honest data broker that guarantees your data won’t be misused, but also that no free riders share the benefits without sharing the data.
The twentieth century needed labor unions to balance the power of workers and bosses. The twenty-first needs data unions for a similar reason. Corporations have a vastly greater ability to gather and use data than individuals. This leads to an asymmetry in power, and the more valuable the data—the better and more useful the models that can be learned from it—the greater the asymmetry. A data union lets its members bargain on equal terms with companies about the use of their data. Perhaps labor unions can get the ball rolling, and shore up their membership, by starting data unions for their members. But labor unions are organized by occupation and location; data unions can be more flexible. Join up with people you have a lot in common with; the models learned will be more useful to you that way.
The European Union’s Court of Justice has decreed that people have the right to be forgotten, but they also have the right to remember, whether it’s with their neurons or a hard disk. So do companies, and up to a point, the interests of users, data gatherers, and advertisers are aligned. Wasted attention benefits no one, and better data makes better products. Privacy is not a zero-sum game, even though it’s often treated like one.
Today, most people are unaware of both how much data about them is being gathered and what the potential costs and benefits are. Companies seem content to continue doing it under the radar, terrified of a blowup. But sooner or later a blowup will happen, and in the ensuing fracas, draconian laws will be passed that in the end will serve no one. Better to foster awareness now and let everyone make their individual choices about what to share, what not, and how and where.
(Hold on to your vote—it may be the most valuable thing you have.) When the unemployment rate rises above 50 percent, or even before, attitudes about redistribution will radically change. The newly unemployed majority will vote for generous lifetime unemployment benefits and the sky-high taxes needed to fund them.
Eventually, we’ll start talking about the employment rate instead of the unemployment one, and reducing it will be seen as a sign of progress. (“The US is falling behind. Our employment rate is still 23 percent.”) Unemployment benefits will be replaced by a basic income for everyone. Those of us who aren’t satisfied with it will be able to earn more, stupendously more, in the few remaining human occupations. Liberals and conservatives will still fight about the tax rate, but the goalposts will have permanently moved. With the total value of labor greatly reduced, the wealthiest nations will be those with the highest ratio of natural resources to population. (Move to Canada now.) For those of us not working, life will not be meaningless, any more than life on a tropical island where nature’s bounty meets all needs is meaningless. A gift economy will develop, of which the open-source software movement is a preview. People will seek meaning in human relationships, self-actualization, and spirituality, much as they do now. The need to earn a living will be a distant memory, another piece of humanity’s barbaric past that we rose above.
People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.