Finding an interesting data set, and the story it tells, can be the most difficult part of producing an infographic or data visualization.
The visualization is the end artifact, but getting there involves multiple steps: finding reliable data, getting the data into the right format, cleaning it up (a step whose time cost is often underestimated!) and then finding the story you will eventually visualize.
The following is a list of useful resources for finding data. Your needs will vary from project to project, but this list is a great place to start, and to bookmark.
1. Government and political data
Data.gov: This is the go-to resource for government-related data. It claims to host up to 400,000 data sets, both raw and geospatial, in a variety of formats.
The main caveat is that you have to clean the data sets before using them, since many contain missing values and stray characters.
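As a rough illustration of that cleanup step, here is a minimal Python sketch that is not tied to any particular Data.gov data set. It assumes the data has been downloaded as CSV; the function name clean_rows, the column names and the sample values are all made up for illustration. It trims stray whitespace from headers and fields and drops rows that are missing a value in any column you mark as required.

```python
import csv
import io


def clean_rows(csv_text, required_columns):
    """Parse CSV text, trim whitespace, and drop rows missing required values.

    Hypothetical helper for illustration; adapt the rules to your data set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # Trim whitespace from keys and values. Short rows yield None values,
        # which become empty strings; extra fields (stored under a None key)
        # are discarded.
        row = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Keep the row only if every required column has a non-empty value.
        if all(row.get(col) for col in required_columns):
            cleaned.append(row)
    return cleaned


# Made-up sample data with messy headers and missing values.
sample = """city, population ,median_income
Alpha, 1000 , 50000
Beta,,
 Gamma ,2000,
"""

rows = clean_rows(sample, ["city", "population"])
for row in rows:
    print(row)
```

Here the Beta row is dropped because its population is empty, while Gamma is kept because median_income was not marked as required. Whether to drop, fill or flag incomplete rows is a judgment call that depends on the story you want to tell.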
Socrata is another good place to explore government-related data. One great thing about Socrata is that it offers visualization tools that make exploring the data easier.
City-specific government data: Some cities have set up their own portals for browsing city-related data. At San Francisco Data, for example, you can browse everything from crime statistics to parking availability in the city.
The Census Bureau houses a ton of information on income, race, education, population and business.
2. Data aggregators
These sites house data from all kinds of sources, which sometimes makes it easier to find data on a specific topic.
Programmable Web: A really useful resource for exploring APIs, as well as mashups of different APIs.
Infochimps has a data marketplace that offers thousands of public and proprietary data sets for download and API access, in a wide range of categories and formats, from historical Twitter and OkCupid data to geolocation data. You can even upload your own data if you like.
Data Market is a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.
Google Public Data Explorer houses data from sources such as the World Development Indicators, the OECD and the Human Development Indicators, mostly related to economics and global development.
Junar is a great data scraping service that also houses data feeds.
Buzzdata is a social data-sharing service that lets you upload your own data and connect with and follow others who share theirs.
3. Social data
Usually, the best place to get social data is the site's own API: Instagram, GetGlue, Foursquare and pretty much every other social media site has one. Here are more details on the most popular ones.
Twitter: Historical access through the Twitter API is fairly limited, to 3,200 tweets. For more, check out PeopleBrowsr, Gnip (which also offers historical access to the WP Automattic data feed), DataSift, Infochimps and Topsy.
Foursquare: Foursquare has its own API, and its data is also available through Infochimps.
Facebook: The Facebook Graph API is the best resource for Facebook data.
Face.com: A great tool for facial recognition data.
4. Weather data
Wunderground has detailed weather information and also lets you search historical data by zip code or city. For a given day, it reports temperature, wind, precipitation and hourly observations.
Weatherbase has detailed weather stats on temperature, rain and humidity for nearly 27,000 cities.
5. Sports data
These three sites have comprehensive information on teams, players, coaches and leaders by season.
ESPN recently launched its own API, too, though you have to be a partner to access its data.
6. Universities and research
The work of academics who specialize in a particular area is always a great place to find interesting data.
If you come across specific data that you would like to use, say, in a research paper, the best way to go is to contact the professor directly. (That is how we got the data for our What are the Odds piece, which is one of the most-viewed infographics on the web.)
One university that makes some of the datasets used in its courses publicly available is UCLA.
7. News data
The New York Times has a great API and a really good explorer for accessing any article in the publication. The data is returned in JSON format.
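Because the data comes back as JSON, pulling out the fields you need takes only a few lines. The Python sketch below works on a small hand-written sample rather than a live API call, so no endpoint or API key is involved. The field names (response, docs, headline, main, web_url, pub_date) are an assumption about the shape of an article search response, and the sample values are placeholders; check both against the official API documentation before relying on them.

```python
import json

# Hand-written placeholder payload. The structure is an ASSUMPTION meant to
# resemble an article search response; verify it against the API docs.
sample_payload = """
{
  "status": "OK",
  "response": {
    "docs": [
      {"headline": {"main": "Example headline one"},
       "web_url": "https://www.example.com/article-one",
       "pub_date": "2012-01-01T00:00:00Z"},
      {"headline": {"main": "Example headline two"},
       "web_url": "https://www.example.com/article-two",
       "pub_date": "2012-01-02T00:00:00Z"}
    ]
  }
}
"""


def extract_articles(payload_text):
    """Flatten each article document into a small dict (hypothetical helper)."""
    data = json.loads(payload_text)
    articles = []
    # Use .get() with defaults so a missing field doesn't raise a KeyError.
    for doc in data.get("response", {}).get("docs", []):
        articles.append({
            "headline": doc.get("headline", {}).get("main", ""),
            "url": doc.get("web_url", ""),
            # Keep only the YYYY-MM-DD part of the timestamp.
            "date": doc.get("pub_date", "")[:10],
        })
    return articles


for article in extract_articles(sample_payload):
    print(article["date"], article["headline"], article["url"])
```

Flattening nested JSON into simple rows like this makes it easy to write the results to a CSV file or load them into a spreadsheet for the visualization step.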
The Guardian Data Blog regularly posts visualizations and makes the underlying data available in Google Docs. The great thing about this is that the data has already been cleaned.