Exploring what speakers say at TED conferences with Big Data tools
You probably know what TED is. If not, TED is a nonprofit devoted to spreading ideas, usually in the form of short, powerful talks (18 minutes or less). TED began in 1984 as a conference where Technology, Entertainment and Design converged, and today covers almost all topics, from science to business to global issues, in more than 100 languages.
Word count is the classic first example in the big data world, and the easiest step you can take to learn how MapReduce works. It becomes much more interesting when you run it on real data. So, here is the story.
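For readers who have not seen a word count before, here is a minimal sketch of the idea in plain Python. It simulates the map, shuffle and reduce phases on a single machine and is not the code that ran on the cluster. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the tokenizing rule are assumptions made for this example.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of text."""
    # Assumed tokenizer: lowercase the line, keep runs of letters and apostrophes.
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def shuffle_phase(pairs):
    """Group the emitted counts by word, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts to get one total per word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Two made-up sentences, only to show the shape of the result.
    sample = ["People love ideas", "Ideas change people"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the framework handles the shuffle, but the logic per word is the same.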
My friend Jamal Maktoubian and I wondered how we could find out what TED speakers talk about, and since we work with big data tools, we decided to answer the question with them. We fetched the videos from the most recent 6 months (93 videos) and saved all their transcripts in a single text file, which became the input file for our word count program. Then we ran the program on a Hadoop cluster with two nodes. In the next step, we filtered the output down to the words of interest and made the graph that you can see below.
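The filtering step can be sketched roughly like this. The helper name filter_counts is hypothetical, and the counts and word list in the example are placeholders for illustration only, not our results.

```python
def filter_counts(counts, words_of_interest):
    """Keep only the words of interest and sort them by frequency, highest first."""
    # A word that never appeared in the transcripts gets a count of 0.
    selected = {word: counts.get(word, 0) for word in words_of_interest}
    return sorted(selected.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers, NOT the real output of the Hadoop job.
    counts = {"the": 50, "and": 40, "people": 12, "love": 7, "idea": 2}
    for word, total in filter_counts(counts, ["idea", "people", "love"]):
        print(word, total)
```

Filtering after counting keeps the cluster job generic: the same output file can be reused with a different list of words without running the job again.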
The graph shows that TED speakers use the words "people", "love" and "women" more often, and the word "idea" less often. This is only a count of word frequencies, so it is not a one hundred percent accurate picture of what the talks are about. We can add more insights to this use case. If you are interested in developing it further, simply leave a comment. The code is available here.