Dem Debate 1: Sentiment analysis

The 2020 Democratic race finally got underway with last week’s first debates (even though it feels like the race has been going on forever). I’ve been absorbing plenty of commentary on how the two nights transpired (including a few more data-centric and analytical pieces).

Not forgetting the gif-able moments.

I downloaded the full debate transcripts from NBC News (posted here and here) and cleaned them up a little to play around with. You can download the cleaned data here as an R data frame.

For this post, I attempted some basic sentiment analysis to visualize how the mood of the debates progressed over time and between candidates.

Moody speeches

First, I wanted to look at the mood of the speeches given. A speech is a conversational turn taken by one speaker until another speaker takes a turn.[1]
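As a rough illustration of that definition, here is a minimal Python sketch that merges consecutive lines by the same speaker into a single speech. It is not the code used for this post (the analysis was done in R); the row layout, the speaker labels, and the `to_speeches` helper are all hypothetical.

```python
from itertools import groupby

# Hypothetical transcript rows in spoken order: (speaker, text).
# These are placeholder lines, not quotes from the debates.
rows = [
    ("CANDIDATE_A", "First point."),
    ("CANDIDATE_A", "Second point."),
    ("MODERATOR", "Thank you."),
    ("CANDIDATE_B", "My answer."),
    ("CANDIDATE_A", "A quick response."),
]

def to_speeches(rows):
    """Merge consecutive rows by the same speaker into one speech (turn)."""
    speeches = []
    # groupby only groups adjacent rows, so a speaker who returns later
    # starts a new speech rather than extending an earlier one.
    for speaker, group in groupby(rows, key=lambda row: row[0]):
        text = " ".join(line for _, line in group)
        speeches.append((speaker, text))
    return speeches

speeches = to_speeches(rows)
```

Because only adjacent rows are merged, an interruption simply becomes the next speech, which matches the definition above.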

The tidytext package contains several datasets that classify or rate words in terms of their sentiment. One of these, by Bing Liu et al., classifies 6,788 terms as either “positive” or “negative”. Using this dataset, I counted the number of positive and negative terms within each speech after removing stopwords (“the”, “and”, “a”…). Subtracting the negative count from the positive count yielded a sentiment difference score for each speech. For example, a speech containing 10 “positive” words and 4 “negative” words gets a sentiment score of +6.
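The scoring itself boils down to a few lines. The sketch below is a simplified Python stand-in, not the R/tidytext code behind this post: the three word sets are tiny hypothetical placeholders for the Bing lexicon and the stopword list, and the tokenizer is a plain regular expression.

```python
import re

# Tiny hypothetical stand-ins; the real lexicon and stopword list are far larger.
POSITIVE = {"good", "great", "strong", "win"}
NEGATIVE = {"bad", "crisis", "threat", "broken"}
STOPWORDS = {"the", "and", "a", "is", "to", "of", "we", "our"}

def sentiment_score(speech):
    """Return (positive word count) minus (negative word count) for one speech."""
    # Lowercase and split into word tokens, then drop stopwords.
    words = re.findall(r"[a-z']+", speech.lower())
    words = [w for w in words if w not in STOPWORDS]
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return positive - negative

# 10 positive words and 4 negative words give a score of +6.
print(sentiment_score("good " * 10 + "bad " * 4))  # 6
```

Words that appear in neither list (most of any speech) contribute nothing, so the score only reflects the balance of lexicon hits, not the length of the speech.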

Plotting the difference scores over time shows the sentiment trends of each debate. On both nights, the speeches start off slightly positive, become more negative as the debate goes on, and end with a positive flourish. In the following graphs, I also labeled a few outliers.

During the Night 1 debate, Amy Klobuchar wrapped up with a pitch containing a +11 sentiment score. During the Night 2 debate, Kamala Harris gave both the most positive and most negative[2] speeches.

Moodiest candidate?

Plotting the difference scores above by candidate instead reveals which candidates employed more “positive” vs. “negative” terms in their speeches.

Amy Klobuchar uses the most “positive” terms on average. Among the other major candidates, Elizabeth Warren and Joe Biden are the only ones whose speeches are at least neutral on average. Every other major candidate (and most candidates overall) uses more “negative” terms than “positive” ones on average, with Pete Buttigieg and Kamala Harris being the most extreme.
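The per-candidate comparison amounts to averaging the speech-level scores for each speaker. A minimal Python sketch, assuming the scores are available as (speaker, score) pairs; the pairs and the `mean_score_by_speaker` helper below are made up for illustration and are not the debate results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (speaker, score) pairs, one per speech; not real debate data.
scored = [("A", 3), ("A", -1), ("B", -2), ("B", -4), ("C", 0)]

def mean_score_by_speaker(scored):
    """Average the sentiment difference scores of each speaker's speeches."""
    by_speaker = defaultdict(list)
    for speaker, score in scored:
        by_speaker[speaker].append(score)
    return {speaker: mean(scores) for speaker, scores in by_speaker.items()}

averages = mean_score_by_speaker(scored)

# Rank speakers from most positive to most negative on average.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Averaging rather than summing keeps candidates with more speaking turns from looking more extreme simply because they spoke more often.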

Beyond sentiment

While a sentiment analysis of the words used in the debates reveals trends and distinctions between candidates, there are limitations to focusing on sentiment alone. The sentiment of a word can depend on more complex contextual factors, and the intended meaning of an utterance may not be obvious from the words being used. I’ll come back to this dataset, hopefully before the next Democratic debate.


  1. A speech could end because a speaker yields the floor, but could also end due to an interruption (in which case, the interruption counts as the next speech). This distinction is unimportant for the present analyses.

  2. I thought Harris’ most negative speech would be when she attacked Joe Biden, but it was actually a speech during which she managed to cover both the climate crisis and the threat Donald Trump poses to national security because of his relationships with Putin and Kim Jong-un.