Timnit Gebru wants to replace the U.S. Census (which costs $1B/year to implement) by simply analyzing the cars seen in Google Street View images.

After processing 22 million observed cars, she found some fascinating things, like the predictive power of “the sedan/truck ratio” for political party. Republicans sure like trucks! More findings in the comments below.

From the AI in Fintech Forum today at Stanford ICME.

6 responses to “Visual Computational Sociology”

  1. Car make and age give a good indication of income levels, and she has a lot of granularity. Here is a ranking of analyzed cities by income segregation and the detail view of the worst one, Chicago. Red regions have expensive cars; yellow is cheap; green is not statistically significant. She did not skip a beat showing the ranked list of car correlations with black neighborhoods… #1: the Cadillac count. And how about the crime rate? Vehicle density… and those suspicious vans.

  2. Another street-view vector: the number and type of trees on the street vs. telephone poles and wires. No trees, all poles and wires = poor and economically depressed. All old/mature trees, no visible poles and wires = guess?

  3. Yes! So much more they could do.

    P.S. As we ponder how much your car says about you, I was reminded of this ad network analysis of Tesla and Prius fans from 2014: “What Tesla fans find interesting — SpaceX, Victoria’s Secret & New Jersey!”

  4. And now her work has been published. A summary in the Economist: Neighbourhood watch
    A machine-learning census of America’s cities

    Millions of images of public streets offer a cheap, sweeping view of America’s demography

    Mar 2nd 2017
    “WOULD it not be of great satisfaction to the king to know, at a designated moment every year, the number of his subjects?” A military engineer by the name of Sébastien le Prestre de Vauban posed this question to Louis XIV in 1686, pitching him the idea of a census. All France’s resources, the wealth and poverty of its towns and the disposition of its nobles would be counted, so that the king could control them better.

    These days, such surveys are common. But they involve a lot of shoe-leather, and that makes them expensive. America, for instance, spends hundreds of millions of dollars every year on a socioeconomic investigation called the American Community Survey; the results can take half a decade to become available. Now, though, a team of researchers, led by Timnit Gebru of Stanford University in California, have come up with a cheaper, quicker method. Using powerful computers, machine-learning algorithms and mountains of data collected by Google, the team carried out a crude, probabilistic census of America’s cities in just two weeks.

    First, the researchers trained their machine-learning model to recognise the make, model and year of many different types of cars. To do that they used a labelled data set, downloaded from automotive websites like Edmunds and Cars dot com. Once the algorithm had learned to identify cars, it was turned loose on 50m images from 200 cities around America, all collected by Google’s Streetview vehicles, which provide imagery for the firm’s mapping applications. Streetview has photographed most of the public streets in America, and in among them the researchers spotted 22m different cars—around 8% of the number on America’s roads.

    The computer classified those cars into one of 2,657 categories it had learned from studying the Edmunds and Cars dot com data. The researchers then took data from the traditional census, and split them in half. One half was fed to the machine-learning algorithm, so it could hunt for correlations between the cars it saw on the roads in those neighbourhoods and such things as income levels, race and voting intentions. Once that was done, the algorithm was tested on the other half of the census data, to see if these correlations held true for neighbourhoods it had never seen before. They did. The sorts of cars you see in an area, in other words, turn out to be a reliable proxy for all sorts of other things, from education levels to political leanings. Seeing more sedans than pickup trucks, for instance, strongly suggests that a neighbourhood tends to vote for the Democrats.

    The system has limitations: unlike a census, it generates predictions, not facts, and the more fine-grained those predictions are the less certain they become. The researchers reckon their system is accurate to the level of a precinct, an American political division that contains about 1,000 people. And because those predictions rely on the specific, accurate data generated by traditional surveys, it seems unlikely ever to replace them.

    On the other hand, it is much cheaper and much faster. Dr Gebru’s system ran on a couple of hundred processors, a modest amount of hardware by the standards of artificial-intelligence research. It nevertheless managed to crunch through its 50m images in two weeks. A human, even one who could classify all the cars in an image in just ten seconds, would take 15 years to do the same.

    The other advantage of the AI approach is that it can be re-run whenever new data become available. As Dr Gebru points out, Streetview is not the only source of information out there. Self-driving cars, assuming they catch on, will use cameras, radar and the like to keep track of their surroundings. They should, therefore, produce even bigger data sets. (Vehicles made by Tesla, an electric-car firm, are capturing such information even now.) Other kinds of data, such as those from Earth-imaging satellites, which Google also uses to refresh its maps, could be fed into the models, too. De Vauban’s “designated moment” could soon become a constantly updated one.

  5. Your picture is now on Wikipedia, on Timnit’s page – nice license

  6. And in the forthcoming cover story of WIRED: "Steve Jurvetson, a friend of Elon Musk and an early investor in Tesla, enthusiastically posted photos of her slides to Facebook. A longtime AI aficionado, he wasn’t surprised that machine-learning algorithms could identify specific cars. But the way Gebru had extracted signals about society from photos illustrated how the technology could spin gold from unexpected sources—at least for those with plenty of data to mine. “It was, ‘My God, think of all the data that Google has,’” Jurvetson says. “It made me realize the power of having the biggest data set.”"
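
For readers curious about the split-half check described in the Economist summary above (fit on one half of the census data, then test on the half the model has never seen), here is a minimal Python sketch. It is not the researchers' actual method or code. The data are synthetic, the single "sedan share" feature and the linear relationship are assumptions made purely for illustration, and every function name here is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def make_synthetic_precincts(n, seed=0):
    """Invented data: (sedan share of cars seen, Democratic vote share).

    The linear link plus Gaussian noise is an assumption for illustration,
    not a finding from the study.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sedan_share = rng.uniform(0.2, 0.8)
        dem_share = 0.1 + 0.8 * sedan_share + rng.gauss(0, 0.05)
        rows.append((sedan_share, min(1.0, max(0.0, dem_share))))
    return rows


def split_half_validation(rows, seed=0):
    """Fit on a random half of the precincts, score on the other half."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]

    slope, intercept = fit_line([r[0] for r in train], [r[1] for r in train])

    # Predictions are made only for precincts the fit never saw.
    predicted = [slope * r[0] + intercept for r in test]
    actual = [r[1] for r in test]
    return slope, intercept, pearson_r(predicted, actual)


slope, intercept, held_out_r = split_half_validation(make_synthetic_precincts(1000))
print(f"slope={slope:.3f} intercept={intercept:.3f} held-out r={held_out_r:.3f}")
```

A high held-out correlation here only reflects the relationship that was built into the synthetic data. The point is the procedure: the score comes from precincts that played no part in the fit.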
