Indian PM Narendra Modi addresses the public through a radio programme, an hour or so long, called Mann ki Baat, which translates to Inner Thoughts or, more literally, Heart's Talk.
The first coronavirus case in India was reported on 30th January 2020. Up to the next Mann ki Baat episode on 23rd February 2020, India had only 3 cases, so there was no real need for the programme to focus on the virus. From that point onward, however, the number of cases grew rapidly, and by the following episode India had almost 1000 cases, warranting action from the government.
This code cleans the data and prepares the word cloud for the programme, to see which topics were talked about most as the year progressed. It removes special characters, numbers, extra spaces, and the most common and less relevant words from each speech, converts the text to lower case, and then adds the text to the text area a few lines at a time, to give a sense of the topics changing as the programme progresses.
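The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the contents of text_cleaner.js: the function name `cleanText`, the sample blocklist and safelist entries, and the `MIN_WORD_LENGTH` cut-off are all assumptions made for the example, and the order of the steps may differ from the real script.

```javascript
// Hypothetical stand-ins for the contents of blocklist.txt and safelist.txt.
const blocklist = new Set(["friends", "today", "countrymen"]);
const safelist = new Set(["war", "aid"]);

// Assumed cut-off: shorter words are dropped unless they are safelisted.
const MIN_WORD_LENGTH = 4;

function cleanText(raw) {
  return raw
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ") // replace special characters and numbers with spaces
    .replace(/\s+/g, " ")      // collapse runs of whitespace
    .trim()
    .split(" ")
    .filter((word) => word.length > 0)
    .filter(
      (word) =>
        safelist.has(word) ||
        (word.length >= MIN_WORD_LENGTH && !blocklist.has(word))
    )
    .join(" ");
}

console.log(cleanText("My dear friends, today 130 crore Indians fight the war!"));
```

Checking the safelist first lets a short but meaningful word survive the length filter, which appears to be the purpose of keeping a separate list of important words.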
To show different perspectives, there are two word clouds. One combines the text of all programmes since March and animates a single word cloud. The other builds a word cloud for each individual episode. Together they show the overall trend as well as how it changed over time as the cases increased.
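The difference between the two views comes down to whether the word counts are reset for each episode or carried forward. The sketch below illustrates that idea only; `countWords` and the sample `episodes` array are hypothetical and are not taken from combine_divide_animate.js or divide_animate_single.js.

```javascript
// Count how often each word occurs in an already-cleaned string.
// Passing an existing Map adds to it instead of starting from zero.
function countWords(text, counts = new Map()) {
  for (const word of text.split(/\s+/)) {
    if (word) counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Hypothetical cleaned episode texts, in broadcast order.
const episodes = [
  "lockdown doctors lockdown",
  "lockdown farmers",
  "farmers toys",
];

// Per-episode view: a fresh count for every episode.
const perEpisode = episodes.map((text) => countWords(text));

// Cumulative view: one running count, copied after each episode
// so that every snapshot stays fixed.
const running = new Map();
const cumulative = episodes.map((text) => new Map(countWords(text, running)));

console.log(perEpisode[1].get("lockdown"), cumulative[1].get("lockdown"));
```

Each Map can then be converted into whatever word and size structure the word cloud layout expects.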
Watch the videos here:
- Cumulative Word frequency cloud of Mann ki Baat since March 2020
- Word frequency cloud of Mann ki Baat since March 2020
|____raw/ Contains text taken verbatim from pmindia.gov.in
|____clean_1/ Text with Hindi words, headings from dialogues, etc. removed manually
|____clean_2/ Text with special characters, numbers, repeated characters, etc. removed
|____clean_3/ Text with small words, along with some others, removed
|____blocklist.txt List of words that are not small but are insignificant for identifying the topic
|____safelist.txt List of important words (contextual, nouns, etc.) to keep
|____text_cleaner.js JS file that does the actual cleaning, from the raw/ folder through to clean_3/
|____app.js Sample server to create a webpage where the words can be animated in word cloud
|____combine_divide_animate.js Takes the text of all speeches and animates it over time
|____divide_animate_single.js Opens the next file and starts creating its word cloud
- Speeches
- Text cleaning idea taken from Approsto's text cleaner
- Word cloud done with Jason Davies' wordcloud
- Coronavirus data from owid/covid-19-data; some missing data was filled in from Statista's website
- Coronavirus data plotted and animated on Flourish Studio
- Reddit post that motivated/guided me to the tools used.
I've tried to be as unbiased as possible, but because I clean the data and manually choose which words to keep or remove, there is likely some bias in the result. Please feel free to open a pull request to improve this tool in any way.