{"cells":[{"cell_type":"markdown","metadata":{},"source":["## Sentiment Analysis using LDA\n","\n","1. Data Collection: Collect up to 20 news summaries for each company in the Dow Jones Industrial Average from the Yahoo Finance RSS feed, using the `yahoo_fin` package.\n","\n","2. Initial Sentiment Analysis: Score each summary with TextBlob polarity and average the scores to get an initial sentiment score for each company.\n","\n","3. Topic Modeling: Use Latent Dirichlet Allocation (LDA) to identify five key topics that these news summaries discuss.\n","\n","4. Topic-Specific Sentiment Analysis: Calculate the average sentiment of the news summaries assigned to each topic.\n","\n","5. Weighted Sentiment Analysis: Weight each summary's sentiment by the average sentiment of its topic, then average the weighted scores for each company.\n","\n","6. Comparison: Compare the original and weighted sentiment scores to evaluate the difference."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install -q yahoo_fin pandas_datareader gensim textblob"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"text/plain":["[nltk_data] Downloading package stopwords to\n","[nltk_data] /home/hexuser/nltk_data...\n","[nltk_data] Unzipping corpora/stopwords.zip.\n","[nltk_data] Downloading package punkt to /home/hexuser/nltk_data...\n","[nltk_data] Unzipping tokenizers/punkt.zip.\n"]},"execution_count":null,"metadata":{},"output_type":"execute_result"},{"data":{"text/plain":["True"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["import nltk\n","nltk.download('stopwords')\n","nltk.download('punkt')"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import requests\n","import pandas as pd\n","from yahoo_fin import stock_info as info\n","from yahoo_fin import news\n","from pandas_datareader import DataReader\n","import numpy as np\n","import 
warnings\n","warnings.filterwarnings('ignore')\n","\n","from gensim import corpora, models\n","from nltk.corpus import stopwords\n","from nltk.tokenize import word_tokenize\n","import string\n"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"text/plain":["['AAPL',\n"," 'AMGN',\n"," 'AMZN',\n"," 'AXP',\n"," 'BA',\n"," 'CAT',\n"," 'CRM',\n"," 'CSCO',\n"," 'CVX',\n"," 'DIS',\n"," 'DOW',\n"," 'GS',\n"," 'HD',\n"," 'HON',\n"," 'IBM',\n"," 'INTC',\n"," 'JNJ',\n"," 'JPM',\n"," 'KO',\n"," 'MCD',\n"," 'MMM',\n"," 'MRK',\n"," 'MSFT',\n"," 'NKE',\n"," 'PG',\n"," 'TRV',\n"," 'UNH',\n"," 'V',\n"," 'VZ',\n"," 'WMT']"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Get the list of tickers that comprise the Dow Jones Industrial Average\n","tickers = info.tickers_dow()\n","tickers"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/de3ed2a9-623b-4a20-a897-fce77b20c7f5\"}","text/html":["
\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Summaries | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," [Magnificent Seven stocks, including AI leader... | \n","
\n"," \n"," 1 | \n"," AMGN | \n"," [Amgen's shares have come under pressure this ... | \n","
\n"," \n"," 2 | \n"," AMZN | \n"," [Amazon.com said on Wednesday it plans to push... | \n","
\n"," \n"," 3 | \n"," AXP | \n"," [The pair both declared substantial improvemen... | \n","
\n"," \n"," 4 | \n"," BA | \n"," [Boeing’s global fleet of 787 Dreamliner jets ... | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Fetch news summaries for each Dow ticker, one row per ticker\n","# (DataFrame.append was removed in pandas 2.0, so rows are collected in a list first)\n","rows = []\n","for ticker in tickers:\n","    ticker_news = news.get_yf_rss(ticker)\n","    summaries = [article['summary'] for article in ticker_news]\n","    rows.append({'Ticker': ticker, 'Summaries': summaries})\n","dow_news_df = pd.DataFrame(rows, columns=['Ticker', 'Summaries'])\n","dow_news_df.head()"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/2420f88f-a6f1-40cd-8392-d445b2bd5720\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Summaries | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," [Magnificent Seven stocks, including AI leader... | \n","
\n"," \n"," 1 | \n"," AMGN | \n"," [Amgen's shares have come under pressure this ... | \n","
\n"," \n"," 2 | \n"," AMZN | \n"," [Amazon.com said on Wednesday it plans to push... | \n","
\n"," \n"," 3 | \n"," AXP | \n"," [The pair both declared substantial improvemen... | \n","
\n"," \n"," 4 | \n"," BA | \n"," [Boeing’s global fleet of 787 Dreamliner jets ... | \n","
\n"," \n"," 5 | \n"," CAT | \n"," [The bull and bear debate over the cyclical st... | \n","
\n"," \n"," 6 | \n"," CRM | \n"," [Key Insights Institutions' substantial holdin... | \n","
\n"," \n"," 7 | \n"," CSCO | \n"," [Cisco Systems (CSCO) concluded the recent tra... | \n","
\n"," \n"," 8 | \n"," CVX | \n"," [(Bloomberg) -- President Joe Biden’s administ... | \n","
\n"," \n"," 9 | \n"," DIS | \n"," [Workers who help bring Disneyland’s beloved c... | \n","
\n"," \n"," 10 | \n"," DOW | \n"," [Should corporate executives’ pay be tied to c... | \n","
\n"," \n"," 11 | \n"," GS | \n"," [(Bloomberg) -- Banks have found another way t... | \n","
\n"," \n"," 12 | \n"," HD | \n"," [Home Depot (HD) has been one of the stocks mo... | \n","
\n"," \n"," 13 | \n"," HON | \n"," [Honeywell (HON) gains from solid momentum in ... | \n","
\n"," \n"," 14 | \n"," IBM | \n"," [IBM (IBM) doesn't possess the right combinati... | \n","
\n"," \n"," 15 | \n"," INTC | \n"," [Shares of the chip equipment manufacturer pul... | \n","
\n"," \n"," 16 | \n"," JNJ | \n"," [Johnson & Johnson (JNJ) continued with its lo... | \n","
\n"," \n"," 17 | \n"," JPM | \n"," [(Bloomberg) -- JPMorgan Chase & Co. Chief Exe... | \n","
\n"," \n"," 18 | \n"," KO | \n"," [Mastercard, Netflix, Coca-Cola, Berkshire Hat... | \n","
\n"," \n"," 19 | \n"," MCD | \n"," [With over 38,000 locations in more than 100 c... | \n","
\n"," \n"," 20 | \n"," MMM | \n"," [NORTHAMPTON, MA / ACCESSWIRE / April 15, 2024... | \n","
\n"," \n"," 21 | \n"," MRK | \n"," [The fact that multiple Merck & Co., Inc. ( NY... | \n","
\n"," \n"," 22 | \n"," MSFT | \n"," [Consumers face the prospect of permanently hi... | \n","
\n"," \n"," 23 | \n"," NKE | \n"," [Adidas's strong first-quarter figures suggest... | \n","
\n"," \n"," 24 | \n"," PG | \n"," [The Management Top 250 ranking, compiled by r... | \n","
\n"," \n"," 25 | \n"," TRV | \n"," [Travelers' (TRV) first-quarter results reflec... | \n","
\n"," \n"," 26 | \n"," UNH | \n"," [UnitedHealth Group (UNH) breezed past the Zac... | \n","
\n"," \n"," 27 | \n"," V | \n"," [Visa (V) has an impressive earnings surprise ... | \n","
\n"," \n"," 28 | \n"," VZ | \n"," [Looking beyond Wall Street's top -and-bottom-... | \n","
\n"," \n"," 29 | \n"," WMT | \n"," [The stock has significantly outperformed the ... | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["dow_news_df"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/64783261-bb2c-4fb3-be16-258583a23e81\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Average Sentiment | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," 0.195268 | \n","
\n"," \n"," 1 | \n"," AMGN | \n"," 0.125121 | \n","
\n"," \n"," 2 | \n"," AMZN | \n"," 0.143147 | \n","
\n"," \n"," 3 | \n"," AXP | \n"," 0.158369 | \n","
\n"," \n"," 4 | \n"," BA | \n"," 0.145588 | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["from textblob import TextBlob\n","\n","# TextBlob polarity ranges from -1 (negative) to 1 (positive)\n","def calculate_sentiment(text):\n","    return TextBlob(text).sentiment.polarity\n","\n","# Average the summary polarities for each ticker; tickers without summaries are skipped\n","rows = []\n","for _, row in dow_news_df.iterrows():\n","    summaries = row['Summaries']\n","    if summaries:\n","        avg_sentiment = np.mean([calculate_sentiment(summary) for summary in summaries])\n","        rows.append({'Ticker': row['Ticker'], 'Average Sentiment': avg_sentiment})\n","dow_sentiment_df = pd.DataFrame(rows, columns=['Ticker', 'Average Sentiment'])\n","dow_sentiment_df.head()"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/b007097d-7793-4eef-9bd0-523f0dd3dd89\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Average Sentiment | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," 0.195268 | \n","
\n"," \n"," 1 | \n"," AMGN | \n"," 0.125121 | \n","
\n"," \n"," 2 | \n"," AMZN | \n"," 0.143147 | \n","
\n"," \n"," 3 | \n"," AXP | \n"," 0.158369 | \n","
\n"," \n"," 4 | \n"," BA | \n"," 0.145588 | \n","
\n"," \n"," 5 | \n"," CAT | \n"," 0.099819 | \n","
\n"," \n"," 6 | \n"," CRM | \n"," 0.134925 | \n","
\n"," \n"," 7 | \n"," CSCO | \n"," 0.088520 | \n","
\n"," \n"," 8 | \n"," CVX | \n"," 0.124590 | \n","
\n"," \n"," 9 | \n"," DIS | \n"," 0.169991 | \n","
\n"," \n"," 10 | \n"," DOW | \n"," 0.180742 | \n","
\n"," \n"," 11 | \n"," GS | \n"," 0.239865 | \n","
\n"," \n"," 12 | \n"," HD | \n"," 0.168376 | \n","
\n"," \n"," 13 | \n"," HON | \n"," 0.109561 | \n","
\n"," \n"," 14 | \n"," IBM | \n"," 0.148592 | \n","
\n"," \n"," 15 | \n"," INTC | \n"," 0.043373 | \n","
\n"," \n"," 16 | \n"," JNJ | \n"," 0.087794 | \n","
\n"," \n"," 17 | \n"," JPM | \n"," 0.075948 | \n","
\n"," \n"," 18 | \n"," KO | \n"," 0.215687 | \n","
\n"," \n"," 19 | \n"," MCD | \n"," 0.155715 | \n","
\n"," \n"," 20 | \n"," MMM | \n"," 0.157566 | \n","
\n"," \n"," 21 | \n"," MRK | \n"," 0.140685 | \n","
\n"," \n"," 22 | \n"," MSFT | \n"," 0.105799 | \n","
\n"," \n"," 23 | \n"," NKE | \n"," 0.073771 | \n","
\n"," \n"," 24 | \n"," PG | \n"," 0.160547 | \n","
\n"," \n"," 25 | \n"," TRV | \n"," 0.138650 | \n","
\n"," \n"," 26 | \n"," UNH | \n"," 0.114048 | \n","
\n"," \n"," 27 | \n"," V | \n"," 0.124004 | \n","
\n"," \n"," 28 | \n"," VZ | \n"," 0.145537 | \n","
\n"," \n"," 29 | \n"," WMT | \n"," 0.099878 | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["dow_sentiment_df"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":[]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/0d236868-ff6d-48fe-834b-ac2342b966fe\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Summary | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," Magnificent Seven stocks, including AI leader ... | \n","
\n"," \n"," 1 | \n"," AAPL | \n"," So much for the \"pay or okay\" model that the F... | \n","
\n"," \n"," 2 | \n"," AAPL | \n"," Apple is opening up web distribution for iOS a... | \n","
\n"," \n"," 3 | \n"," AAPL | \n"," These four stocks will be the cream of the cro... | \n","
\n"," \n"," 4 | \n"," AAPL | \n"," Apple has fixed a bug that suggested the Pales... | \n","
\n"," \n"," 5 | \n"," AAPL | \n"," Apple CEO Tim Cook says ‘the investment abilit... | \n","
\n"," \n"," 6 | \n"," AAPL | \n"," These are stocks you should always consider bu... | \n","
\n"," \n"," 7 | \n"," AAPL | \n"," Amazon, Apple initiated: Wall Street's top ana... | \n","
\n"," \n"," 8 | \n"," AAPL | \n"," The tech giant is no longer the world's top sm... | \n","
\n"," \n"," 9 | \n"," AAPL | \n"," These companies are at earlier stages in their... | \n","
\n"," \n"," 10 | \n"," AAPL | \n"," Which of these tech titans is the better buy r... | \n","
\n"," \n"," 11 | \n"," AAPL | \n"," After Vietnam, Cook then flew further south to... | \n","
\n"," \n"," 12 | \n"," AAPL | \n"," Apple will consider making some of its product... | \n","
\n"," \n"," 13 | \n"," AAPL | \n"," The iShares Expanded Tech Sector ETF is outper... | \n","
\n"," \n"," 14 | \n"," AAPL | \n"," In just a few days, India will commence the wo... | \n","
\n"," \n"," 15 | \n"," AAPL | \n"," The legendary investor has made tens of billio... | \n","
\n"," \n"," 16 | \n"," AAPL | \n"," Not all AI stocks trade at stratospheric valua... | \n","
\n"," \n"," 17 | \n"," AAPL | \n"," This ETF contains some of the world's most imp... | \n","
\n"," \n"," 18 | \n"," AAPL | \n"," (Bloomberg) -- Apple Inc. is weighing the poss... | \n","
\n"," \n"," 19 | \n"," AAPL | \n"," Apple CEO Tim Cook said the company will “look... | \n","
\n"," \n"," 20 | \n"," AMGN | \n"," Amgen's shares have come under pressure this y... | \n","
\n"," \n"," 21 | \n"," AMGN | \n"," Not every business in operation today is built... | \n","
\n"," \n"," 22 | \n"," AMGN | \n"," Amgen (NASDAQ:AMGN) today provided an update r... | \n","
\n"," \n"," 23 | \n"," AMGN | \n"," Zacks.com users have recently been watching Am... | \n","
\n"," \n"," 24 | \n"," AMGN | \n"," In the most recent trading session, Amgen (AMG... | \n","
\n"," \n"," 25 | \n"," AMGN | \n"," Amgen's stock isn't expensive, but the busines... | \n","
\n"," \n"," 26 | \n"," AMGN | \n"," GLP-1 medications will likely expand beyond th... | \n","
\n"," \n"," 27 | \n"," AMGN | \n"," In this article, we discuss 11 best biotech ET... | \n","
\n"," \n"," 28 | \n"," AMGN | \n"," It’s been a volatile start to the second quart... | \n","
\n"," \n"," 29 | \n"," AMGN | \n"," Looking ahead, the future of the U.S. economy ... | \n","
\n"," \n"," 30 | \n"," AMGN | \n"," Joseph Artuso of Easterly Investment Partners ... | \n","
\n"," \n"," 31 | \n"," AMGN | \n"," In this piece, we will take a look at the ten ... | \n","
\n"," \n"," 32 | \n"," AMGN | \n"," In this article, we discuss 13 best cheap divi... | \n","
\n"," \n"," 33 | \n"," AMGN | \n"," These drugmakers have made important moves ove... | \n","
\n"," \n"," 34 | \n"," AMGN | \n"," Amgen (AMGN) concluded the recent trading sess... | \n","
\n"," \n"," 35 | \n"," AMGN | \n"," Use the recent short-term weakness from these ... | \n","
\n"," \n"," 36 | \n"," AMGN | \n"," A Phase 3 study will compare Merck’s experimen... | \n","
\n"," \n"," 37 | \n"," AMGN | \n"," Amgen (AMGN) expects strong sales growth of pr... | \n","
\n"," \n"," 38 | \n"," AMGN | \n"," Amgen (AMGN) closed the most recent trading da... | \n","
\n"," \n"," 39 | \n"," AMGN | \n"," Amgen (AMGN) has an impressive earnings surpri... | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Fetch up to 20 news summaries for each Dow ticker, one row per summary\n","# (DataFrame.append was removed in pandas 2.0, so rows are collected in a list first)\n","rows = []\n","for ticker in tickers:\n","    ticker_news = news.get_yf_rss(ticker)[:20]\n","    for article in ticker_news:\n","        rows.append({'Ticker': ticker, 'Summary': article['summary']})\n","dow_top20_summaries_df = pd.DataFrame(rows, columns=['Ticker', 'Summary'])\n","dow_top20_summaries_df.head(40)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/200645fa-f332-4423-96e3-6c89de913ed6\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Summary | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," Magnificent Seven stocks, including AI leader ... | \n","
\n"," \n"," 1 | \n"," AAPL | \n"," So much for the \"pay or okay\" model that the F... | \n","
\n"," \n"," 2 | \n"," AAPL | \n"," Apple is opening up web distribution for iOS a... | \n","
\n"," \n"," 3 | \n"," AAPL | \n"," These four stocks will be the cream of the cro... | \n","
\n"," \n"," 4 | \n"," AAPL | \n"," Apple has fixed a bug that suggested the Pales... | \n","
\n"," \n"," ... | \n"," ... | \n"," ... | \n","
\n"," \n"," 595 | \n"," WMT | \n"," The price reductions come as consumers feel th... | \n","
\n"," \n"," 596 | \n"," WMT | \n"," This retailer's faster growth helped fund a bi... | \n","
\n"," \n"," 597 | \n"," WMT | \n"," Nichole Hart walks 20,000 steps as she searche... | \n","
\n"," \n"," 598 | \n"," WMT | \n"," Alaska Permanent, the largest U.S. state wealt... | \n","
\n"," \n"," 599 | \n"," WMT | \n"," The retailer could be a more exciting stock to... | \n","
\n"," \n","
\n","
600 rows × 2 columns
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["dow_top20_summaries_df"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/c5f1dec3-12cf-4f26-828a-7975cc587610\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Summary | \n"," Sentiment | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," Magnificent Seven stocks, including AI leader ... | \n"," 1.000000 | \n","
\n"," \n"," 1 | \n"," AAPL | \n"," So much for the \"pay or okay\" model that the F... | \n"," 0.233333 | \n","
\n"," \n"," 2 | \n"," AAPL | \n"," Apple is opening up web distribution for iOS a... | \n"," 0.225000 | \n","
\n"," \n"," 3 | \n"," AAPL | \n"," These four stocks will be the cream of the cro... | \n"," 0.000000 | \n","
\n"," \n"," 4 | \n"," AAPL | \n"," Apple has fixed a bug that suggested the Pales... | \n"," 0.100000 | \n","
\n"," \n"," 5 | \n"," AAPL | \n"," Apple CEO Tim Cook says ‘the investment abilit... | \n"," -0.125000 | \n","
\n"," \n"," 6 | \n"," AAPL | \n"," These are stocks you should always consider bu... | \n"," 0.000000 | \n","
\n"," \n"," 7 | \n"," AAPL | \n"," Amazon, Apple initiated: Wall Street's top ana... | \n"," 0.500000 | \n","
\n"," \n"," 8 | \n"," AAPL | \n"," The tech giant is no longer the world's top sm... | \n"," 0.250000 | \n","
\n"," \n"," 9 | \n"," AAPL | \n"," These companies are at earlier stages in their... | \n"," 0.062500 | \n","
\n"," \n"," 10 | \n"," AAPL | \n"," Which of these tech titans is the better buy r... | \n"," 0.392857 | \n","
\n"," \n"," 11 | \n"," AAPL | \n"," After Vietnam, Cook then flew further south to... | \n"," 0.100000 | \n","
\n"," \n"," 12 | \n"," AAPL | \n"," Apple will consider making some of its product... | \n"," 0.000000 | \n","
\n"," \n"," 13 | \n"," AAPL | \n"," The iShares Expanded Tech Sector ETF is outper... | \n"," 0.000000 | \n","
\n"," \n"," 14 | \n"," AAPL | \n"," In just a few days, India will commence the wo... | \n"," -0.200000 | \n","
\n"," \n"," 15 | \n"," AAPL | \n"," The legendary investor has made tens of billio... | \n"," 0.500000 | \n","
\n"," \n"," 16 | \n"," AAPL | \n"," Not all AI stocks trade at stratospheric valua... | \n"," 0.000000 | \n","
\n"," \n"," 17 | \n"," AAPL | \n"," This ETF contains some of the world's most imp... | \n"," 0.450000 | \n","
\n"," \n"," 18 | \n"," AAPL | \n"," (Bloomberg) -- Apple Inc. is weighing the poss... | \n"," 0.066667 | \n","
\n"," \n"," 19 | \n"," AAPL | \n"," Apple CEO Tim Cook said the company will “look... | \n"," 0.350000 | \n","
\n"," \n"," 20 | \n"," AMGN | \n"," Amgen's shares have come under pressure this y... | \n"," 0.300000 | \n","
\n"," \n"," 21 | \n"," AMGN | \n"," Not every business in operation today is built... | \n"," 0.000000 | \n","
\n"," \n"," 22 | \n"," AMGN | \n"," Amgen (NASDAQ:AMGN) today provided an update r... | \n"," 0.000000 | \n","
\n"," \n"," 23 | \n"," AMGN | \n"," Zacks.com users have recently been watching Am... | \n"," 0.150000 | \n","
\n"," \n"," 24 | \n"," AMGN | \n"," In the most recent trading session, Amgen (AMG... | \n"," 0.058333 | \n","
\n"," \n"," 25 | \n"," AMGN | \n"," Amgen's stock isn't expensive, but the busines... | \n"," -0.500000 | \n","
\n"," \n"," 26 | \n"," AMGN | \n"," GLP-1 medications will likely expand beyond th... | \n"," 0.000000 | \n","
\n"," \n"," 27 | \n"," AMGN | \n"," In this article, we discuss 11 best biotech ET... | \n"," 0.384091 | \n","
\n"," \n"," 28 | \n"," AMGN | \n"," It’s been a volatile start to the second quart... | \n"," 0.125000 | \n","
\n"," \n"," 29 | \n"," AMGN | \n"," Looking ahead, the future of the U.S. economy ... | \n"," 0.188333 | \n","
\n"," \n"," 30 | \n"," AMGN | \n"," Joseph Artuso of Easterly Investment Partners ... | \n"," 0.000000 | \n","
\n"," \n"," 31 | \n"," AMGN | \n"," In this piece, we will take a look at the ten ... | \n"," 0.428571 | \n","
\n"," \n"," 32 | \n"," AMGN | \n"," In this article, we discuss 13 best cheap divi... | \n"," 0.487143 | \n","
\n"," \n"," 33 | \n"," AMGN | \n"," These drugmakers have made important moves ove... | \n"," 0.075000 | \n","
\n"," \n"," 34 | \n"," AMGN | \n"," Amgen (AMGN) concluded the recent trading sess... | \n"," 0.000000 | \n","
\n"," \n"," 35 | \n"," AMGN | \n"," Use the recent short-term weakness from these ... | \n"," 0.000000 | \n","
\n"," \n"," 36 | \n"," AMGN | \n"," A Phase 3 study will compare Merck’s experimen... | \n"," 0.100000 | \n","
\n"," \n"," 37 | \n"," AMGN | \n"," Amgen (AMGN) expects strong sales growth of pr... | \n"," 0.433333 | \n","
\n"," \n"," 38 | \n"," AMGN | \n"," Amgen (AMGN) closed the most recent trading da... | \n"," 0.058333 | \n","
\n"," \n"," 39 | \n"," AMGN | \n"," Amgen (AMGN) has an impressive earnings surpri... | \n"," 0.214286 | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Score each of the top 20 summaries per ticker\n","# (calculate_sentiment is defined in an earlier cell)\n","dow_top20_sentiment_df = dow_top20_summaries_df[['Ticker', 'Summary']].copy()\n","dow_top20_sentiment_df['Sentiment'] = dow_top20_sentiment_df['Summary'].apply(calculate_sentiment)\n","dow_top20_sentiment_df.head(40)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"text/plain":["[(0, '0.014*\"’\" + 0.008*\"2024\" + 0.007*\"\\'s\" + 0.006*\"april\"'),\n"," (1, '0.014*\"stocks\" + 0.014*\"\\'s\" + 0.009*\"trading\" + 0.007*\"earnings\"'),\n"," (2, '0.012*\"\\'s\" + 0.007*\"2024\" + 0.006*\"stock\" + 0.006*\"market\"'),\n"," (3, '0.009*\"stocks\" + 0.008*\"earnings\" + 0.008*\"company\" + 0.007*\"\\'s\"'),\n"," (4, '0.012*\"\\'s\" + 0.010*\"’\" + 0.006*\"u.s.\" + 0.005*\"rate\"')]"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Function to clean and tokenize text\n","def clean_tokenize(text):\n","    stop_words = set(stopwords.words('english'))\n","    tokens = word_tokenize(text.lower())\n","    # Note: tokens such as 's and the curly apostrophe ’ are not in string.punctuation,\n","    # so they pass this filter and show up among the top topic words\n","    tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]\n","    return tokens\n","\n","# Tokenize the summaries\n","tokenized_summaries = dow_top20_summaries_df['Summary'].apply(clean_tokenize)\n","\n","# Create a dictionary and corpus from the tokenized summaries\n","dictionary = corpora.Dictionary(tokenized_summaries)\n","corpus = [dictionary.doc2bow(text) for text in tokenized_summaries]\n","\n","# 
Apply LDA model\n","lda_model = models.ldamodel.LdaModel(corpus, num_topics=5, id2word=dictionary, passes=15)\n","topics = lda_model.print_topics(num_words=4)\n","\n","topics"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"text/plain":["[(0, '0.013*\"trading\" + 0.011*\"stocks\" + 0.011*\"day\" + 0.010*\"’\"'),\n"," (1, '0.011*\"\\'s\" + 0.011*\"’\" + 0.009*\"stocks\" + 0.007*\"market\"'),\n"," (2, '0.013*\"earnings\" + 0.012*\"’\" + 0.011*\"2024\" + 0.008*\"company\"'),\n"," (3, '0.011*\"\\'s\" + 0.011*\"stocks\" + 0.006*\"’\" + 0.006*\"company\"'),\n"," (4, '0.019*\"\\'s\" + 0.008*\"said\" + 0.007*\"street\" + 0.007*\"wall\"')]"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Re-run the LDA topic modeling code after downloading the required NLTK resources\n","from gensim import corpora, models\n","from nltk.corpus import stopwords\n","from nltk.tokenize import word_tokenize\n","import string\n","\n","# Function to clean and tokenize text\n","def clean_tokenize(text):\n"," stop_words = set(stopwords.words('english'))\n"," tokens = word_tokenize(text.lower())\n"," tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]\n"," return tokens\n","\n","# Tokenize the summaries\n","tokenized_summaries = dow_top20_summaries_df['Summary'].apply(clean_tokenize)\n","\n","# Create a dictionary and corpus from the tokenized summaries\n","dictionary = corpora.Dictionary(tokenized_summaries)\n","corpus = [dictionary.doc2bow(text) for text in tokenized_summaries]\n","\n","# Apply LDA model\n","lda_model = models.ldamodel.LdaModel(corpus, num_topics=5, id2word=dictionary, passes=15)\n","topics = 
lda_model.print_topics(num_words=4)\n","\n","topics"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/fe4fd92e-0c89-4b96-a036-2719999dcfeb\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Topic | \n"," Sentiment | \n","
\n"," \n"," \n"," \n"," 0 | \n"," 0 | \n"," 0.124343 | \n","
\n"," \n"," 1 | \n"," 1 | \n"," 0.110615 | \n","
\n"," \n"," 2 | \n"," 2 | \n"," 0.126383 | \n","
\n"," \n"," 3 | \n"," 3 | \n"," 0.178993 | \n","
\n"," \n"," 4 | \n"," 4 | \n"," 0.126170 | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Function to assign topics to summaries based on LDA model\n","def assign_topic_to_summary(summary):\n"," bow = dictionary.doc2bow(clean_tokenize(summary))\n"," topic_scores = lda_model[bow]\n"," dominant_topic = max(topic_scores, key=lambda x: x[1])[0]\n"," return dominant_topic\n","\n","# Assign topics to each summary\n","dow_top20_summaries_df['Topic'] = dow_top20_summaries_df['Summary'].apply(assign_topic_to_summary)\n","\n","# Perform sentiment analysis on each summary\n","dow_top20_summaries_df['Sentiment'] = dow_top20_summaries_df['Summary'].apply(calculate_sentiment)\n","\n","# Group by topic and calculate average sentiment\n","topic_sentiment_df = dow_top20_summaries_df.groupby('Topic')['Sentiment'].mean().reset_index()\n","\n","topic_sentiment_df"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[{"data":{"application/vnd.hex.export+parquet":"{\"success\":true,\"exportKey\":\"4a8043e5-f038-4821-9c1d-4b8d3d5b0fcd/4f22f623-94d2-4685-a2bb-957a6cfa4229/exports/3e79ced9-c958-471a-9f7b-bda913cf5601\"}","text/html":["\n","\n","
\n"," \n"," \n"," | \n"," Ticker | \n"," Original_Sentiment | \n"," New_Weighted_Sentiment | \n","
\n"," \n"," \n"," \n"," 0 | \n"," AAPL | \n"," 0.195268 | \n"," 0.027609 | \n","
\n"," \n"," 1 | \n"," AMGN | \n"," 0.125121 | \n"," 0.018886 | \n","
\n"," \n"," 2 | \n"," AMZN | \n"," 0.143147 | \n"," 0.019433 | \n","
\n"," \n"," 3 | \n"," AXP | \n"," 0.158369 | \n"," 0.022741 | \n","
\n"," \n"," 4 | \n"," BA | \n"," 0.145588 | \n"," 0.021359 | \n","
\n"," \n"," 5 | \n"," CAT | \n"," 0.099819 | \n"," 0.012247 | \n","
\n"," \n"," 6 | \n"," CRM | \n"," 0.134925 | \n"," 0.018329 | \n","
\n"," \n"," 7 | \n"," CSCO | \n"," 0.088520 | \n"," 0.011163 | \n","
\n"," \n"," 8 | \n"," CVX | \n"," 0.124590 | \n"," 0.017225 | \n","
\n"," \n"," 9 | \n"," DIS | \n"," 0.169991 | \n"," 0.022478 | \n","
\n"," \n"," 10 | \n"," DOW | \n"," 0.180742 | \n"," 0.024173 | \n","
\n"," \n"," 11 | \n"," GS | \n"," 0.239865 | \n"," 0.030563 | \n","
\n"," \n"," 12 | \n"," HD | \n"," 0.168376 | \n"," 0.024265 | \n","
\n"," \n"," 13 | \n"," HON | \n"," 0.109561 | \n"," 0.015721 | \n","
\n"," \n"," 14 | \n"," IBM | \n"," 0.148592 | \n"," 0.022415 | \n","
\n"," \n"," 15 | \n"," INTC | \n"," 0.043373 | \n"," 0.006019 | \n","
\n"," \n"," 16 | \n"," JNJ | \n"," 0.087794 | \n"," 0.010825 | \n","
\n"," \n"," 17 | \n"," JPM | \n"," 0.075948 | \n"," 0.010967 | \n","
\n"," \n"," 18 | \n"," KO | \n"," 0.215687 | \n"," 0.033600 | \n","
\n"," \n"," 19 | \n"," MCD | \n"," 0.155715 | \n"," 0.022637 | \n","
\n"," \n"," 20 | \n"," MMM | \n"," 0.157566 | \n"," 0.022900 | \n","
\n"," \n"," 21 | \n"," MRK | \n"," 0.140685 | \n"," 0.018020 | \n","
\n"," \n"," 22 | \n"," MSFT | \n"," 0.105799 | \n"," 0.016556 | \n","
\n"," \n"," 23 | \n"," NKE | \n"," 0.073771 | \n"," 0.011753 | \n","
\n"," \n"," 24 | \n"," PG | \n"," 0.160547 | \n"," 0.023201 | \n","
\n"," \n"," 25 | \n"," TRV | \n"," 0.138650 | \n"," 0.021667 | \n","
\n"," \n"," 26 | \n"," UNH | \n"," 0.114048 | \n"," 0.015892 | \n","
\n"," \n"," 27 | \n"," V | \n"," 0.124004 | \n"," 0.017746 | \n","
\n"," \n"," 28 | \n"," VZ | \n"," 0.145537 | \n"," 0.017689 | \n","
\n"," \n"," 29 | \n"," WMT | \n"," 0.099878 | \n"," 0.012784 | \n","
\n"," \n","
\n","
"]},"execution_count":null,"metadata":{},"output_type":"execute_result"}],"source":["# Weight each summary's sentiment by the mean sentiment of its topic\n","# (the weight is a topic-level average polarity, not a separate importance score)\n","def calculate_weighted_sentiment(row):\n","    topic = row['Topic']\n","    sentiment = row['Sentiment']\n","    topic_weight = topic_sentiment_df[topic_sentiment_df['Topic'] == topic]['Sentiment'].values[0]\n","    return sentiment * topic_weight\n","\n","# Calculate weighted sentiment for each summary\n","dow_top20_summaries_df['Weighted_Sentiment'] = dow_top20_summaries_df.apply(calculate_weighted_sentiment, axis=1)\n","\n","# Calculate new average sentiment for each company based on weighted sentiment\n","new_dow_sentiment_df = dow_top20_summaries_df.groupby('Ticker')['Weighted_Sentiment'].mean().reset_index()\n","\n","# Merge with original dow_sentiment_df to compare\n","comparison_df = pd.merge(dow_sentiment_df, new_dow_sentiment_df, on='Ticker', how='inner')\n","comparison_df.columns = ['Ticker', 'Original_Sentiment', 'New_Weighted_Sentiment']\n","\n","comparison_df"]},{"cell_type":"markdown","metadata":{},"source":["## Conclusions:\n","\n","1. Nuanced Understanding: The weighted sentiment scores combine each summary's own sentiment with the average sentiment of the topic it belongs to, so they reflect the tone of the surrounding coverage as well as the individual article. Because each weighted score is a product of two polarity values, it sits on a much smaller scale than the original average; compare how companies rank rather than the raw magnitudes.\n","\n","2. Risk Mitigation: By focusing on topic-specific sentiment, investors can potentially mitigate risks. The LDA topics are unlabeled, so a reader has to name them after inspecting their top words. For example, if a company has negative sentiment in a critical topic that a reader labels \"Corporate Announcements,\" it might be a red flag.\n","\n","3. Strategic Investment: The topic-weighted sentiment can be used to fine-tune investment strategies. For instance, you might prioritize companies with positive news in topics that are currently trending or strategically important, such as one labeled \"Stock Market Trends.\"\n","\n","4. 
Dynamic Adaptation: Topic prominence shifts over time (e.g., during earnings season or around product launches). Refitting the topic model on fresh summaries lets the weighted sentiment scores track those shifts and keep the insights timely.\n","\n","5. Comprehensive Analysis: Combining general and topic-specific sentiment gives a more rounded view than either alone, which can support better-informed investment decisions.\n","\n","Used alongside the original scores, topic-weighted sentiment can help investors make more strategic decisions, potentially leading to better investment outcomes."]}],"metadata":{"hex_info":{"author":"Brandon Doey","exported_date":"Wed Apr 17 2024 18:05:36 GMT+0000 (Coordinated Universal Time)","project_id":"4f22f623-94d2-4685-a2bb-957a6cfa4229","version":"draft"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"orig_nbformat":4},"nbformat":4,"nbformat_minor":4}