Little Big Data Assignment

Create a poster that features a visualization of at least 1000 separate pieces of data.

Working with large data sets by hand is time-consuming, tedious, and error-prone. Data processing applications and custom code can make working with this kind of data much easier. In addition, these tools afford a level of experimentation and exploration that working by hand does not. Start this assignment by sourcing, collecting, or generating an interesting and large data set. Create a rich visualization of this data that separately represents at least 1000 pieces of data. This is enough data that working manually is impractical, but little enough that even very inefficient computer programs can handle it easily.

Proposal Due February 11
Proposal Due February 18
WIP Review February 25
Final Critique March 11

Choosing a Data Set

The data you work with is up to you. By today’s standards you don’t need that much, but you do need at least 1000 separate pieces of data. A list of U.S. state populations counts as 50 pieces of data, even though they add up to about 314 million people. The annual state populations for the past 50 years would be more than enough.
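
As a rough sketch of how I count pieces of data, the arithmetic for the state population example looks like this. The variable names are mine, and the round figure of 50 years comes from the example above rather than from any particular data source.

```python
# Illustrative tally only: one "piece" is one separately shown value.
states = 50
years = 50

single_year = states            # one population figure per state
annual_series = states * years  # one figure per state per year

print(single_year)    # 50
print(annual_series)  # 2500
```

The single-year list falls well short of 1000, while the annual series clears it comfortably.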

You can use existing data provided by a third party. For example, the U.S. government provides a lot of data. The FBI knows how many murders involve arson, spouses, and/or bare hands (XLS). The USDA can tell you if baked fish have more calories than Lucky Charms (XML). Also, be sure to look through the research assignment posts for more sources. Better yet, start with what you want to make work about and find out what related data is available.

Alternatively, you can collect or generate your own data. Create a Google survey and get 1000 people to fill it out. Or get 100 people to tell you 10 things. Or get one person to tell you 1000 things. Use a Fitbit, Nike FuelBand, or a smartphone app to track your daily activity. Use the server logs from your website. Use metadata from your photo collection. Make a program that spiders a website. Count the frequency of names in the Bible or GOT by chapter. Count something. Measure something.
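
Counting names by chapter might look something like the sketch below. It is only one possible approach, written in Python. The function name, the made-up sample text, and the assumption that every chapter starts on a line beginning with "CHAPTER" are all mine, so a real text would probably need a different splitting rule.

```python
import re
from collections import Counter

def count_names_by_chapter(text, names):
    """Return a list of (chapter_title, Counter) pairs, one per chapter."""
    # Assumption: each chapter starts on its own line with "CHAPTER ...".
    # Splitting on a capture group yields [preamble, title1, body1, title2, body2, ...].
    parts = re.split(r"(?m)^(CHAPTER .*)$", text)
    results = []
    for title, body in zip(parts[1::2], parts[2::2]):
        words = re.findall(r"[A-Za-z']+", body)
        # Keep only the words that exactly match one of the names of interest.
        counts = Counter(w for w in words if w in names)
        results.append((title.strip(), counts))
    return results

# Made-up sample text, just to show the shape of the result.
sample = """CHAPTER 1
Arya met Jon. Jon left.
CHAPTER 2
Arya wrote to Jon about Sansa.
"""

for title, counts in count_names_by_chapter(sample, {"Arya", "Jon", "Sansa"}):
    print(title, dict(counts))
```

Each name count in each chapter is one piece of data, so a long book with many names adds up quickly.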

Visualizing the Data

What about your data is interesting? What do you want to show? Why? First figure out the intent of your communication, then design a visualization in service of that intent. The type of visualization is open. You may use any type of chart, graph, or map. You can stick to conventional types of displays or design your own.

The choice of tools is open. Consider the following:

  • The nature and format of your data
  • The type of visualization you want to create
  • Integration with Illustrator or other software for creating the final poster
  • Your experience and skill
  • Your learning objectives

I will demo using the Processing language to generate an editable vector visualization from tabular data. D3 also has tools for reading common data formats and exporting editable visualizations.
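
To give a feel for the general idea of turning a table into editable vector shapes, here is a minimal sketch. It is not the Processing demo and it does not use D3. It is plain Python that writes SVG, a vector format that Illustrator can open. The function name, the two-column "label,value" layout, and the sample numbers are all assumptions made for illustration.

```python
import csv
import io

def table_to_svg(csv_text, width=800, height=400):
    """Draw one bar per row of a CSV with a "value" column, as SVG text."""
    # Assumptions: at least one row, and all values are non-negative numbers.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(r["value"]) for r in rows]
    peak = max(values) or 1.0  # avoid dividing by zero if every value is 0
    bar_w = width / len(values)
    shapes = []
    for i, v in enumerate(values):
        bar_h = height * v / peak
        # SVG's y axis points down, so a bar starts at (height - bar_h).
        shapes.append(
            f'<rect x="{i * bar_w:.2f}" y="{height - bar_h:.2f}" '
            f'width="{bar_w * 0.8:.2f}" height="{bar_h:.2f}" />'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        + "\n".join(shapes)
        + "\n</svg>\n"
    )

# Made-up table with three rows.
sample = "label,value\na,3\nb,7\nc,5\n"
svg = table_to_svg(sample)
print(svg)
```

Saving the printed text to a file ending in .svg gives you separate rectangles that you can restyle by hand, which is the part of the work flow the code does not need to handle.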

Designing the Poster

This is a poster, not just a printed data visualization. “Poster” is a pretty broad term, but your work should stand on its own, expressing a complete thought. Your visualization should be the most important element on the poster, but it does not have to stand alone. You may choose to support your visualization with additional graphics or text.

Similarly, the code you use to build your visualization doesn’t have to do everything. Rather, consider it as part of your workflow.

Constraints

  • Your final product will be a printed poster (around A1 size).
    It’s printed, so animation, interactivity, and depth are out. The large, printed format offers much more resolution than your screen. Take advantage of that.

  • You need to separately represent at least 1000 pieces of data.
    If you show 1000 people’s birth months in a pie chart, you are showing 12 pieces of data. If you plot their births on a timeline, you are showing 1000.
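
The difference between the pie chart and the plot of individual births can be sketched in a few lines. The birth dates here are randomly generated stand-ins, not real data, and the variable names are mine.

```python
import random
from collections import Counter
from datetime import date, timedelta

random.seed(1)  # fixed seed so the made-up data is repeatable
start = date(1990, 1, 1)

# Made-up data: 1000 birth dates spread over roughly 30 years.
births = [start + timedelta(days=random.randrange(365 * 30)) for _ in range(1000)]

by_month = Counter(d.month for d in births)  # what the pie chart would show
plotted = sorted(births)                     # what the plot of births would show

print(len(by_month))  # number of pie slices, never more than 12
print(len(plotted))   # 1000 separate marks
```

Grouping collapses the 1000 records into a dozen totals. Plotting each date keeps every record visible as its own mark.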

Proposal

Your proposal is due next week. The effort you put into planning over the next week will be critical to the success of your project. You will need to research sources of data, programming languages, and other tools. Your real assignment for this week is to develop an achievable plan to create a project you are excited about; the proposal is only a summary of this effort.

By next week, post your proposal to the blog. Include:

  • The subject of your poster.

    Whether your poster promotes, persuades, or informs, it will be about something. Start by telling me the purpose of your project.

  • The data you will use.

    If you will be using an existing data source, include a link. Be sure to review the data to make sure it includes the information you need, and that it is formatted in a way that you can use. If you are going to collect or generate your data, include a brief outline of the tools and methods you will use to do so.

  • The tools you will use to create the visualization.

    Include a list of the primary tools you will use. This includes online services, applications, and programming languages and libraries. For each tool, please indicate your level of proficiency. Be sure that you will have access to the tools you need.
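
When you review an existing data source for the proposal, a quick scripted check can save you from surprises later. The sketch below is one way to do that for a CSV file, assuming that each complete row counts as one piece of data. The function name, the column names, and the generated sample table are all hypothetical.

```python
import csv
import io

def review_table(csv_text, required_columns, minimum_rows=1000):
    """Report whether a CSV has the needed columns and enough complete rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in required_columns if c not in header]
    complete = 0
    if not missing:
        for row in reader:
            # A row is complete when every required cell is non-empty.
            if all((row.get(c) or "").strip() for c in required_columns):
                complete += 1
    return {
        "missing_columns": missing,
        "complete_rows": complete,
        "enough_data": not missing and complete >= minimum_rows,
    }

# Made-up table: 1200 rows of placeholder values.
sample = "state,year,population\n" + "".join(
    f"S{i % 50},{1970 + i // 50},{1000 + i}\n" for i in range(1200)
)
report = review_table(sample, ["state", "year", "population"])
print(report)
```

If a column you need shows up under "missing_columns", that is worth knowing before you commit to the data set in your proposal.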

Example Code

I’m hosting the in-class demo code for this class on GitHub.