WSJ Sentiment, Reader Engagement, and SP500 Moves

Link to R Shiny App | Link to GitHub repo

Introduction

Online reader engagement has become a growing focus for news publications that want to build reader loyalty and, with it, web advertising revenue (Masullo Chen, Ng, Riedl, & Chen, 2020; Farhi, 2007). As the share of U.S. newspaper advertising revenue generated digitally grew from ~17% in 2011 to nearly 35% in 2018 (Pew Research Center, 2018), how newspapers engage with the ~70 million U.S. adults who prefer to get their daily news online has become a central question (Geiger, 2019). Power users, those who visit a site ≥10 times per month and spend >1 hour, are especially prized (Olmstead & Rosenstiel, 2011). Commenting behavior is positively correlated with repeat visits and time on site, which makes comments and inter-user discussion increasingly valuable (Ziegele, Weber, Quiring, & Breiner, 2018).

Opinion pieces and emotionally charged articles can drive engagement by confirming readers’ predispositions (Bakir & McStay, 2018), leading to more comments/shares and higher ad revenue. To investigate relationships between emotionality, subjectivity, polarity, and engagement, I scraped Wall Street Journal (WSJ) articles and asked:

  • Question 1: Can a statistically significant relationship be demonstrated between a WSJ article’s subjectivity/objectivity and positivity/negativity (as defined by Python sentiment libraries) and the number of reader comments?
  • Question 2: Given WSJ’s financial readership and prior literature linking media tone/coverage to markets (e.g., Aman, 2013), is there a significant relationship between WSJ sentiment on day t and S&P 500 moves on day t + n where 0 ≤ n ≤ 1?

Data and Web Scraping

To answer Question 1, I scraped 22,772 full-text WSJ articles published between Jan‑2019 and Jul‑2020 from the WSJ news archives. For each article I captured: article text, headline, sub-headline, date, author, number of comments, and section (rubric). Below is a descriptive table from the R Shiny App showing variables scraped, followed by a sample of where this text appears on a WSJ page:

Variables table and article DOM locations

I used Selenium because (i) WSJ requires login and Selenium can fill username/password by element ID, and (ii) explicit waits ensure all elements load before scraping (via WebDriverWait). The login flow is shown below:

Selenium login flow

The app’s EDA tab includes a word cloud of common keywords, with a slider to control how many are shown. Unsurprisingly in an election year, many high-frequency words are political: democrats, political, Trump, Biden, Schumer. More general terms common to national and global coverage (public, rule, federal, law, policy) also appear. A bar chart ranks WSJ sections by average comments per article: Politics and Opinion lead, with Politics averaging more than double the comments of the third-ranked section, U.S.

Section engagement summary

Data Cleaning and Preprocessing

For the S&P regression, paragraph texts for the same date were concatenated into one cell per day (groupby + join). This avoided dropping empty-paragraph rows and sped up processing. The resulting 232 unique days were inner-joined with daily SPX prices/volumes from Yahoo Finance. Because markets close on weekends and holidays, the merged set contains only 158 unique trading days. That is consistent with the calendar: markets are open on roughly five of every seven days, less holidays.

Daily aggregation and merge
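
The groupby + join and inner-join steps could look roughly like the pandas sketch below. The column names (`date`, `paragraph`, `close`, `volume`) and the toy rows are made up for illustration; they are not the project's actual schema or data.

```python
import pandas as pd

# Toy stand-ins for the scraped paragraphs (one row per paragraph).
articles = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-10", "2020-07-10", "2020-07-11", "2020-07-13"]),
    "paragraph": ["Stocks rose.", "Jobs data beat forecasts.",
                  "Weekend feature.", "Tech led gains."],
})

# Toy stand-in for the daily price table; note there is no row for Saturday 2020-07-11.
spx = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-10", "2020-07-13"]),
    "close": [100.0, 101.5],
    "volume": [1_000, 1_200],
})

# One row per day: concatenate all paragraphs published on the same date.
daily_text = (
    articles.groupby("date")["paragraph"]
    .agg(" ".join)
    .reset_index()
)

# Inner join keeps only dates present in both tables, i.e. trading days with articles.
merged = daily_text.merge(spx, on="date", how="inner")
print(merged)
```

The inner join is what drops weekend and holiday dates: they have text but no price row, so they never make it into `merged`.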

Python Sentiment Analysis

Two common frameworks were used:

VADER variables

  • Negative — float in [0,1] (negativity score)
  • Neutral — float in [0,1] (neutrality score)
  • Positive — float in [0,1] (positivity score)
  • Compound — float in [-1,1] (normalized sum of word-level valence scores; -1 = most negative, 1 = most positive)
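
To make "normalized" concrete: as far as I understand the reference vaderSentiment implementation, the compound score takes the summed word-level valence x and squashes it with x / sqrt(x² + α), with α = 15 by default. The sketch below reimplements that formula on its own; the function name `normalize_compound` is mine, not the library's.

```python
import math


def normalize_compound(valence_sum, alpha=15.0):
    """Squash an unbounded summed valence score into [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum ** 2 + alpha)
    # Clamp defensively so floating-point error can never push the result outside [-1, 1].
    return max(-1.0, min(1.0, score))


for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(total, round(normalize_compound(total), 3))
```

Because of the squashing, compound approaches ±1 only for text with a large summed valence, so most article-level scores sit well inside the interval.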

TextBlob variables

  • Polarity — float in [-1,1] (1 = positive, -1 = negative)
  • Subjectivity — float in [0,1] (1 = subjective, 0 = objective)

Visualizations and Data Manipulation

TextBlob. At first glance, article polarity shows little relationship with comment counts; polarity is tightly distributed around the mean. Subjectivity similarly explains little variation, though it’s more widely spread with more outliers.

Polarity & subjectivity vs comments

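The many low‑comment articles flatten any visible relationship: 2,497 of 12,329 had ≤5 comments. The comment distribution is right‑skewed, with most articles bunched at low counts and a long tail of heavily commented ones (box/bar plots below). I kept articles with more than 5 comments and trimmed high-end outliers at the upper IQR fence: Q3 (165) + 1.5 × IQR (147) = 385.5, rounded to 386 comments. The filtered set has 8,578 articles (more than 5 and at most 386 comments).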

Comment distribution boxplot

Comment distribution bars
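
The filtering rule can be sketched as below. I assume here that the quartiles are taken after dropping the low-comment articles, which the post does not state; `upper_fence` and `filter_comments` are illustrative names.

```python
import pandas as pd


def upper_fence(q1, q3, k=1.5):
    """Upper outlier cutoff: Q3 + k * IQR."""
    return q3 + k * (q3 - q1)


def filter_comments(counts, min_exclusive=5, k=1.5):
    """Drop low-comment articles, then trim counts above the upper IQR fence."""
    kept = counts[counts > min_exclusive]
    # Assumption: quartiles are computed on the already-filtered counts.
    fence = upper_fence(kept.quantile(0.25), kept.quantile(0.75), k)
    return kept[kept <= fence]


# Reported values: Q3 = 165 and IQR = 147, which implies Q1 = 18.
print(upper_fence(18, 165))

# Toy comment counts (made up) to show the filter end to end.
toy = pd.Series([1, 3, 6, 10, 20, 30, 40, 1000])
print(filter_comments(toy).tolist())
```

Only the upper fence matters here, since the lower end is already handled by the minimum-comment cutoff.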

On the filtered set, I repeated visual regressions (seaborn) using paragraph and headline text scores. Results still look largely flat between comments and TextBlob polarity/subjectivity.

Paragraph polarity vs comments

Paragraph subjectivity vs comments

Headline polarity vs comments

Headline subjectivity vs comments

VADER. Overall, positivity vs comments is flat on the full dataset.

A surprisingly strong positive relationship appeared between negativity and comments in the full set:

VADER scores vs comments (full)

After filtering, positivity remains flat and the negativity link weakens—suggesting outliers drove much of it. Next I quantified with linear regressions.

VADER positivity vs comments (filtered)

VADER negativity vs comments (filtered)

Results — Simple Linear Regression Analysis

i) Number of Comments

A linear regression of comments on TextBlob polarity, TextBlob subjectivity, VADER positivity, and VADER negativity (full 12,329‑article set) yields very low adjusted R² (0.014) and a non‑significant overall F‑test (Prob(F) = 0.2045). Interestingly, VADER negativity is individually significant at the 1% level.

OLS summary (all sentiment vars)

Regressing comments on negativity alone gives a similarly low adjusted R² (0.014). Because it matches the larger model's fit with a single predictor, the simple model is preferable despite its weak predictive power. One hypothesis: highly negative stories (e.g., tragedies) may prompt communal responses, boosting comments. Testing this would require further analysis.

ii) Same‑day S&P 500 % Change

Four OLS models regressed same‑day SPX % change on (i) TextBlob only, (ii) VADER only, (iii) both sets, and (iv) both + SPX volume. Adjusted R² values are ~0.01 and the overall p‑values are high, so all four models have little predictive power. Somewhat surprisingly, TextBlob polarity is individually significant at the 10% level, despite its earlier flat relationship with comments.

Same‑day models

iii) Following‑day S&P 500 % Change

A model for next‑day SPX % change again shows low explanatory power (Adj R² 0.008, Prob(F) 0.315). VADER compound is individually significant at the 5% level (β ≈ 0.36): a 1‑unit rise in the compound score is associated with a +0.36% SPX move the next day. Since compound only ranges over [-1, 1], a 1‑unit change is half of its full range, and the small sample and non-significant overall F‑test caution against strong conclusions.

Next‑day model
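
One detail worth spelling out is how day-t sentiment gets lined up with the day t + 1 move. A minimal pandas sketch, with made-up prices and scores and hypothetical column names:

```python
import pandas as pd

# Toy daily table (made-up values): one row per trading day.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-06", "2020-07-07", "2020-07-08", "2020-07-09"]),
    "close": [100.0, 102.0, 101.0, 103.0],
    "compound": [0.2, -0.1, 0.4, 0.0],
}).sort_values("date").reset_index(drop=True)

# Same-day move: percent change of today's close versus the previous row's close.
daily["same_day_pct"] = daily["close"].pct_change() * 100

# Next-day move: pull the following row's change up onto today's row,
# so each row pairs day-t sentiment with the day t + 1 return.
daily["next_day_pct"] = daily["same_day_pct"].shift(-1)

# The last row has no following day, so it cannot be used in the next-day model.
model_rows = daily.dropna(subset=["next_day_pct"])
print(model_rows[["date", "compound", "next_day_pct"]])
```

Because the shift works on rows, "next day" here means the next trading day in the table, so a Friday is paired with the following Monday.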

Thanks for reading!