Dem Debate 1: Sentiment analysis

Tue, Jul 2, 2019

The 2020 Democratic race finally got underway with last week’s first debates (even though it feels like the race has been going on forever). I’ve been absorbing plenty of commentary on how the two nights transpired (including a few more data-centric and analytical pieces).

(Not forgetting the gif-able moments.)

I downloaded the full debate transcripts from NBC News (posted here and here) and cleaned them up a little to play around with. You can download the cleaned data here as an R data frame.

For this post, I attempted some basic sentiment analysis to visualize how the mood of the debates progressed over time and between candidates.

Moody speeches

First, I wanted to look at the mood of the speeches given. A speech is a conversational turn taken by one speaker until another speaker takes a turn.¹
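To make the definition of a speech concrete, here is a minimal Python sketch (not the code behind this post, which used R). It assumes the transcript is a list of (speaker, text) rows in spoken order; the speaker labels, the sample lines, and the `to_speeches` helper are all hypothetical.

```python
from itertools import groupby

# Hypothetical transcript rows in spoken order: (speaker, text).
rows = [
    ("SPEAKER_A", "Thank you."),
    ("SPEAKER_A", "It is good to be here."),
    ("MODERATOR", "Thirty seconds."),
    ("SPEAKER_A", "One more thing."),
]

def to_speeches(rows):
    """Merge consecutive rows by the same speaker into a single speech."""
    speeches = []
    # groupby only groups adjacent rows, so a speaker who returns later
    # (after an interruption, say) starts a new speech.
    for speaker, group in groupby(rows, key=lambda row: row[0]):
        text = " ".join(line for _, line in group)
        speeches.append((speaker, text))
    return speeches

speeches = to_speeches(rows)
```

With these rows, the first two lines merge into one speech, the moderator's interruption counts as the next speech, and the speaker's return counts as a third.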

The tidytext package contains several datasets that classify or rate words in terms of their sentiment. One of these, by Bing Liu et al., classifies 6,788 terms as either “positive” or “negative”. Using this dataset, I counted the number of positive and negative terms within each speech after removing stopwords (“the”, “and”, “a”…). Subtracting the negative count from the positive count yielded a sentiment difference score for each speech. For example, a speech containing 10 “positive” words and 4 “negative” words has a sentiment score of +6.
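The difference score is simple enough to sketch in a few lines. The Python below is an illustration only, not the R code used for this post: the word sets are tiny toy stand-ins for the Bing lexicon and a real stopword list, and `tokenize` and `sentiment_difference` are hypothetical helper names.

```python
import re

# Toy stand-ins, NOT the real Bing lexicon or a real stopword list.
POSITIVE = {"great", "strong", "hope"}
NEGATIVE = {"crisis", "threat", "broken"}
STOPWORDS = {"the", "and", "a", "is", "to", "of", "we", "our"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_difference(text):
    """Count of positive words minus count of negative words, ignoring stopwords."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return positive - negative

# Three toy "positive" words and one toy "negative" word.
score = sentiment_difference("We have great hope, and a strong plan to end the crisis.")
```

Words that appear in neither list contribute nothing, so a long speech made up of neutral words scores the same as an empty one.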

Plotting the difference scores over time shows the sentiment trends of each debate. On both nights, the speeches start off slightly positive, become more negative, and end with a positive flourish. In the following graphs, I also labeled a few outliers.

During the Night 1 debate, Amy Klobuchar wrapped up with a pitch containing a +11 sentiment score. During the Night 2 debate, Kamala Harris gave both the most positive and most negative² speeches.

Moodiest candidate?

Plotting the difference scores above by candidate instead reveals which candidates employed more “positive” vs. “negative” terms in their speeches.

Amy Klobuchar uses the most “positive” terms on average. Among the other major candidates, Elizabeth Warren and Joe Biden are the only ones whose speeches are at least neutral on average. Every other major candidate (and most candidates overall) uses more “negative” terms on average, with Pete Buttigieg and Kamala Harris being the most extreme.
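The per-candidate comparison boils down to averaging the speech-level difference scores for each speaker. A minimal Python sketch, assuming the scores are already computed; the speaker labels and numbers below are made up for illustration and are not values from the debates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (speaker, difference score) pairs, one per speech.
speech_scores = [("A", 3), ("A", -1), ("B", -2), ("B", -4), ("C", 0)]

def mean_score_by_speaker(speech_scores):
    """Average the speech-level scores separately for each speaker."""
    by_speaker = defaultdict(list)
    for speaker, score in speech_scores:
        by_speaker[speaker].append(score)
    return {speaker: mean(scores) for speaker, scores in by_speaker.items()}

averages = mean_score_by_speaker(speech_scores)

# Rank speakers from most positive to most negative on average.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

A mean above zero indicates more “positive” than “negative” terms per speech on average, zero is neutral, and a mean below zero indicates the reverse.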

Candidate trends

Returning to the sentiment trend evident across the debates, I wondered if the candidates had different sentiment trends as the debate wore on. Computing sentiment by speech will likely result in too few observations to discern these trends. One solution is to compute the sentiment of individual words (excluding stopwords).

The tidytext package contains another sentiment dataset by Finn Årup Nielsen that rates 2,476 words on a scale from -5 to +5. The numerical range allows us to discriminate between words that possess sentiments of differing magnitudes (as opposed to simply “positive” vs. “negative”).
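Scoring individual words works the same way as before, except that each matched word contributes its own rating instead of a flat positive or negative count. The Python sketch below is an illustration rather than the R code used for this post: `WORD_SCORES` is a four-word toy dictionary standing in for the Nielsen dataset, the stopword set is likewise a toy, and `scored_words` is a hypothetical helper.

```python
import re

# Toy stand-in for a -5..+5 word rating list; illustrative values only.
WORD_SCORES = {"outstanding": 5, "good": 3, "bad": -3, "catastrophic": -4}
STOPWORDS = {"the", "is", "but", "was", "a", "and"}

def scored_words(text):
    """Return (position, word, score) for every rated, non-stopword token.

    Position is the word's index among the non-stopword tokens, which can
    serve as a rough time axis when plotting a trend.
    """
    tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [(i, w, WORD_SCORES[w]) for i, w in enumerate(tokens) if w in WORD_SCORES]

hits = scored_words("The plan is good, but the outcome was catastrophic.")
```

Because every rated word becomes its own observation, a single speech can yield several data points, which is what makes per-candidate trends easier to see.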

In the following graphs, I plot the trends for the top 8 candidates (in terms of polling averages). Most of these candidates appear to use more “positive” words at the end of the debate, which makes sense if they are trying to leave a positive impression on the audience.

A quick visual inspection reveals some qualitative differences. Some candidates, like Beto O’Rourke, Cory Booker, and Joe Biden, show relatively little variation in their sentiment. Elizabeth Warren and Amy Klobuchar demonstrate more complex trends: they start out pretty negative and fluctuate before a positive finish.

Beyond sentiment

While a sentiment analysis of the words used in the debates reveals trends and distinctions between candidates, there are limitations to focusing on sentiment alone. The sentiment of words can be influenced by more complex contextual factors, and the intended meaning of an utterance may not be obvious from the words being used. I’ll come back to this dataset, hopefully before the next Democratic debate.

¹ A speech could end because a speaker yields the floor, but could also end due to an interruption (in which case, the interruption counts as the next speech). This distinction is unimportant for the present analyses.
² I thought Harris’ most negative speech would be when she attacked Joe Biden, but it was actually a speech during which she managed to cover both the climate crisis and the threat Donald Trump poses to national security because of his relationships with Putin and Kim Jong-un.