Little Big Data: Top Reddit Posts

  • Top 2.5 Million Posts on Reddit: The idea of Reddit as the front page of the internet is intriguing, and it generally conjures images of memes or cats. But where do users' interests actually tend, and can the data provide any useful insight?
  • Chris Dary’s Dataset: Broken down into separate CSV files by subreddit:
    https://github.com/umbrae/reddit-top-2.5-million
  • Gephi, maybe Processing: I’ve got some experience with Processing, so it would be a shame not to put it to use. But I’d like to play with the data in Gephi first and see whether I’ll need more capability. I’d also like to take a stab at R, but maybe now is not the time?
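
As a starting point for the Gephi step, here is a minimal Python sketch of one way to turn the per-subreddit CSV files into an edge list. It rests on several assumptions that are not confirmed here: that each file is named after its subreddit, that each file has a header row with a `domain` column, and that a subreddit-to-domain graph is an interesting first view of the data. The function names and the `min_weight` parameter are hypothetical, made up for illustration. The output uses the `Source`, `Target`, `Weight` column headers that Gephi's spreadsheet importer recognizes for edge tables.

```python
import csv
from collections import Counter
from pathlib import Path


def count_subreddit_domain_edges(csv_dir):
    """Count how often each subreddit links to each domain.

    Assumes one CSV per subreddit, named <subreddit>.csv, with a header
    row that includes a 'domain' column (an assumption about the dataset).
    """
    edges = Counter()
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        subreddit = csv_path.stem  # file name without the .csv extension
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                domain = (row.get("domain") or "").strip()
                if domain:  # skip rows with a missing or empty domain
                    edges[(subreddit, domain)] += 1
    return edges


def write_gephi_edge_list(edges, out_path, min_weight=1):
    """Write the counts as a Source,Target,Weight edge table for Gephi."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])
        for (subreddit, domain), weight in sorted(edges.items()):
            if weight >= min_weight:  # drop rare links to keep the graph readable
                writer.writerow([subreddit, domain, weight])
```

With the repository cloned locally, something like `write_gephi_edge_list(count_subreddit_domain_edges("reddit-top-2.5-million/data"), "edges.csv", min_weight=10)` would produce a file to load through Gephi's spreadsheet import; the `data` directory path is also an assumption about the repository layout.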

Some other datasets:
– Huge movie database, 2,300+ movies (https://docs.google.com/file/d/0ByF5keQa-4J1ZkxCYlVPcTVxc2c/edit)
– Full transcription of public utterances/recordings by Mark Zuckerberg (http://zuckerbergfiles.org)