{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Document Indexing and RAG" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we'll demonstrate how RAG operations can be implemented in Pixeltable. In particular, we'll develop a RAG application that indexes a collection of PDF documents and uses ChatGPT to answer questions about them.\n", "\n", "In a traditional RAG workflow, such operations might be implemented as a Python script that runs on a periodic schedule or in response to certain events. In Pixeltable, they are implemented as persistent tables that are updated automatically and incrementally as new data becomes available.\n", "\n", "
\n", "If you are running this tutorial in Colab:\n", "In order to make the tutorial run a bit snappier, let's switch to a GPU-equipped instance for this Colab session. To do that, click on the Runtime -> Change runtime type menu item at the top, then select the GPU radio button and click on Save.\n", "
\n", "\n", "We first set up our OpenAI API key:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "if 'OPENAI_API_KEY' not in os.environ:\n", " os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then install the packages we need for this tutorial and set up our environment." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q pixeltable sentence-transformers tiktoken openai openpyxl" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/sergeymkhitaryan/.pixeltable/pgdata\n", "Created directory 'rag_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pixeltable as pxt\n", "\n", "# Ensure a clean slate for the demo\n", "pxt.drop_dir('rag_demo', force=True)\n", "pxt.create_dir('rag_demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll create a table containing the sample questions we want to answer. The questions are stored in an Excel spreadsheet, along with a set of \"ground truth\" answers to help evaluate our model pipeline. We can use `create_table()` with the `source` parameter to load them. Note that we can pass the URL of the spreadsheet directly."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'queries'.\n", "Inserting rows into `queries`: 8 rows [00:00, 2469.96 rows/s]\n", "Inserted 8 rows with 0 errors.\n" ] } ], "source": [ "base = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/rag-demo/'\n", "qa_url = base + 'Q-A-Rag.xlsx'\n", "queries_t = pxt.create_table('rag_demo/queries', source=qa_url)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
S__No_Questioncorrect_answer
1What is roughly the current mortage rate?0.07
2What is the current dividend yield for Alphabet Inc. (\\$GOOGL)?0.0046
3What is the market capitalization of Alphabet?\\$2182.8 Billion
4What are the latest financial metrics for Accenture PLC?missed consensus forecasts and strong total bookings rising by 22% annually
5What is the overall latest rating for Amazon.com from analysts?SELL
6What is the operating cash flow of Amazon in Q1 2024?18,989 Million
7What is the expected EPS for Nvidia in Q1 2026?0.73 EPS
8What are the main reasons to buy Nvidia?Datacenter, GPUs Demands, Self-driving, and cash-flow
" ], "text/plain": [ " S__No_ Question \\\n", "0 1 What is roughly the current mortage rate? \n", "1 2 What is the current dividend yield for Alphabe... \n", "2 3 What is the market capitalization of Alphabet? \n", "3 4 What are the latest financial metrics for Acce... \n", "4 5 What is the overall latest rating for Amazon.c... \n", "5 6 What is the operating cash flow of Amazon in Q... \n", "6 7 What is the expected EPS for Nvidia in Q1 2026? \n", "7 8 What are the main reasons to buy Nvidia? \n", "\n", " correct_answer \n", "0 0.07 \n", "1 0.0046 \n", "2 $2182.8 Billion \n", "3 missed consensus forecasts and strong total bo... \n", "4 SELL \n", "5 18,989 Million \n", "6 0.73 EPS \n", "7 Datacenter, GPUs Demands, Self-driving, and ca... " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outline\n", "\n", "There are two major parts to our RAG application:\n", "\n", "1. Document Indexing: Load the documents, split them into chunks, and index them using a vector embedding.\n", "1. Querying: For each question on our list, do a top-k lookup for the most relevant chunks, use them to construct a ChatGPT prompt, and send the enriched prompt to an LLM.\n", "\n", "We'll implement both parts in Pixeltable.\n", "\n", "## Document Indexing\n", "\n", "All data in Pixeltable, including documents, resides in tables.\n", "\n", "Tables are persistent containers that can serve as the store of record for your data. Since we are starting from scratch, we'll begin with an empty table, `rag_demo/documents`, that has a single column, `document`."
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2024-04-11T20:58:43.144688Z", "start_time": "2024-04-11T20:58:42.912418Z" }, "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'documents'.\n" ] }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'rag_demo/documents'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
documentDocument
\n" ], "text/plain": [ "table 'rag_demo/documents'\n", "\n", " Column Name Type Computed With\n", " document Document " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "documents_t = pxt.create_table(\n", " 'rag_demo/documents', {'document': pxt.Document}\n", ")\n", "\n", "documents_t" ] }, { "cell_type": "markdown", "metadata": { "metadata": {} }, "source": [ "Next, we'll insert our first few source documents into the new table. We'll leave the rest for later, in order to show how to update the indexed document base incrementally." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "document_urls = [\n", " base + 'Argus-Market-Digest-June-2024.pdf',\n", " base + 'Argus-Market-Watch-June-2024.pdf',\n", " base + 'Company-Research-Alphabet.pdf',\n", " base + 'Jefferson-Amazon.pdf',\n", " base + 'Mclean-Equity-Alphabet.pdf',\n", " base + 'Zacks-Nvidia-Report.pdf',\n", "]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `documents`: 3 rows [00:00, 491.31 rows/s]\n", "Inserted 3 rows with 0 errors.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
document
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
" ], "text/plain": [ " document\n", "0 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "1 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "2 /Users/sergeymkhitaryan/.pixeltable/file_cache..." ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "documents_t.insert({'document': url} for url in document_urls[:3])\n", "documents_t.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In RAG applications, we often decompose documents into smaller units, or chunks, rather than treating each document as a single entity. In this example, we'll use Pixeltable's built-in `document_splitter`, but in general the chunking methodology is highly customizable. `document_splitter` has a variety of options for controlling the chunking behavior, and it's also possible to replace it entirely with a user-defined iterator (or an adapter for a third-party document splitter).\n", "\n", "In Pixeltable, operations such as chunking can be automated by creating **views** of the base `documents` table. A view is a virtual derived table: rather than adding data directly to the view, we define it via a computation over the base table. In this example, the view is defined by iteration over the chunks of a `document_splitter`." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `chunks`: 41 rows [00:00, 20799.04 rows/s]\n" ] } ], "source": [ "from pixeltable.functions.document import document_splitter\n", "\n", "chunks_t = pxt.create_view(\n", " 'rag_demo/chunks',\n", " documents_t,\n", " iterator=document_splitter(\n", " documents_t.document, separators='token_limit', limit=300\n", " ),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our `chunks` view now has 3 columns:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
view 'rag_demo/chunks' (of 'rag_demo/documents')
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
posRequired[Int]
textRequired[String]
documentDocument
\n" ], "text/plain": [ "view 'rag_demo/chunks' (of 'rag_demo/documents')\n", "\n", " Column Name Type Computed With\n", " pos Required[Int] \n", " text Required[String] \n", " document Document " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chunks_t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `text` is the chunk text produced by the `document_splitter`.\n", "- `pos` is a system-generated integer column, starting at 0, that gives the sequence number of each chunk within its source document.\n", "- `document` is simply the `document` column from the base table `documents`. We won't need it here, but having access to the base table's columns (in effect a parent-child join) can be quite useful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that as soon as we created it, `chunks` was automatically populated with data from the existing documents in our base table. We can select the first 2 chunks from each document using common query operations, in order to get a feel for what was extracted:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "metadata": {}, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
postextdocument
0MARKET DIGEST\n", "- 1 -\n", " FRIDAY, JUNE 21, 2024\n", "JUNE 20, DJIA: 39,134.76\n", "UP 299.90\n", "Independent Equity Research Since 1934\n", "ARGUS\n", "A R G U S R E S E A R C H C O M P A N Y • 6 1 B R O A D W A Y • N E W Y O R K , N. Y. 1 0 0 0 6 • ( 2 1 2 ) 4 2 5 - 7 5 0 0\n", "LONDON SALES & MARKETING OFFICE TEL 011-44-207-256-8383 / FAX 011-44-207-256-8363\n", "®\n", "Good Morning. This is the Market Digest for Friday, June 21, 2024, with analysis of the financial markets and comments on\n", "Accenture plc.\n", "IN THIS ISSUE:\n", "* Growth Stock: Accenture plc: Shares rally on AI optimism (Jim Kelleher)\n", "MARKET REVIEW:\n", "Yogi Berra famously said "When you come to a fork in the road, take it." Stock investors did just that on Thursday,\n", "pushing the Dow Jones Industrial Average higher by 0.77% but the Nasdaq Composite and S&P 500 lower by 0.79% and 0.25%,
\n", " \n", " \n", " \n", "
1respectively. In a rare event, shares of Nvidia not only lost ground, but lost a relatively meaningful amount (3.5%), proving nothing\n", "can go up forever. Still, the major indices are comfortably ahead for the year to date — and the big non-AI mover for stocks is\n", "the future direction of interest rates, which remains a concern for Wall Street and (for one day at least) offset AI mania.\n", "ACCENTURE PLC (NYSE: ACN, \\$306.16) BUY\n", "ACN: Shares rally on AI optimism\n", "* Accenture posted fiscal 3Q24 non-GAA ...... Q24, rising 22% annually on acceleration in managed services and resulting in a\n", "1.3 book-to-bill. Accenture still appears to be taking share from competitors.\n", "* We believe that Accenture has the financial resources, customer presence, and market strength to thrive as\n", "companies accelerate the process of digital transformation and begin their AI journeys.\n", "ANALYSIS\n", "INVESTMENT THESIS\n", "Shares of BUY-rated Accenture plc (NYSE: ACN) rose solidly on June 20, despite the company reporting fiscal 3Q24
\n", " \n", " \n", " \n", "
0Friday, June 21, 2024\n", "Intermediate Term:\n", "Market Outlook\n", "Bullish\n", "-------------- PORTFOLIO STRATEGY -------------\n", "Equity: 72%\n", "Cash: 1%\n", "Today's Market Movers\n", "IMPACT\n", "aGlobal Shares Lower\n", "GILD Pops on HIV Drug Results\n", "SRPT Soars on FDA Approval\n", "SWBI Drops on Sales Guidance\n", "+\n", "+\n", "+\n", "-\n", "a\n", "a\n", "a\n", "Recent Research Review\n", "ADSK, MRNA, IQV, WMB, BUD, LYFT, SRE, BP, \n", "AEE, PPC, JNPR, ORCL, CMG, TPR, DPZ, EOG, \n", "COST, PLTR, COR, VRTX\n", "Statistics Diary\n", "12-Mth S&P 500 Forcast:\n", "S&P 500 Current/Next EPS:\n", "S&P 500 P/E:\n", "12-Mth S&P P/E Range:\n", "10-Year Yield:\n", "12-Mth 10-Yr. Bond Forecast:\n", "Current Fed Funds Target:\n", "12-Mth Fed Funds Forecast:\n", "4800-5600\n", "247/265\n", "22.16\n", "18.1 - 21.1\n", "4.26%\n", "3.50-4.50%\n", "4.62%\n", "4.50-5.50%\n", "DJIA:\n", "S&P 500:\n", "NASDAQ:\n", "Lrg/Small Cap:\n", "Growth/Value:\n", "PREVIOUS\n", "CLOSE\n", "200-DAY\n", "AVERAGE\n", "39134
\n", " \n", " \n", " \n", "
1.76 37058.23\n", "5473.17 4831.39\n", "17721.59 15160.55\n", "1.48 1.37\n", "2.07 1.86\n", "CURRENT RANKING\n", "Five-Day Put/Call:\n", "Momentum:\n", "Bullish Sentiment:\n", "Mutual Fund Cash:\n", "Vickers Insider Index:\n", "1.00 Positive\n", "346000 Neutral\n", "44% Positive\n", "1.70% Negative\n", "3.42 Negative\n", "Housing Sentiment Slumps\n", "Mortgage rates near 7% are pushing prospective buyers to the sidelines and \n", "could turn housing to a drag on 2Q GDP after a strong contribution to 1Q \n", "growth. "Millions of potential homebuyers have been priced out of the market \n", " ...... s published yesterday by Harvard's Joint \n", "Center for Housing Studies. Fannie Mae's Home Purchase Sentiment Index \n", "for May dropped by 2.5 points to an all-time survey low of 69.4. Just 14% of \n", "consumers said that it is a good time to buy a home, down from 20% in April. \n", "Doug Duncan, Chief Economist at Fannie Mae, said "While many \n", "respondents expressed optimism at the beginning of the year that mortgage \n", "rates would decline, that simply hasn't happened, and current sentiment \n", "reflects pent-up
\n", " \n", " \n", " \n", "
0Company Research Highlights®\n", "Report created on June 21, 2024\n", "This is not an investment recommendation from Fidelity Investments. Fidelity provides this information as a service to investors from independent, third-party sources.\n", "Performance of analyst recommendations are provided by StarMine from Refinitiv. Current analyst recommendations are collected and standardized by Investars.\n", "See each section in this report for third-party content attribution, as well as the final page of the report f ...... Capitalization: \\$2182.8 B Interactive Media & Services Industry\n", "Business Description Data provided by S&P Compustat\n", "Alphabet Inc. offers various products and platforms in the United States, Europe, the Middle\n", "East, Africa, the Asia-Pacific, Canada, and Latin America. It operates through Google Services,\n", "Google Cloud, and Other Bets segments.\n", "Key Statistics\n", "Employee Count 182,502\n", "Institutional Ownership 80.9%\n", "Total Revenue (TTM) \\$80,539.00\n", "3/31/2024\n", "Revenue Growth\n", "(TTM vs. Prior TTM)\n", "+11.78%
\n", " \n", " \n", " \n", "
1Enterprise Value \\$2103.1 B\n", "6/20/2024\n", "Ex. Dividend Date 6/10/2024\n", "Dividend \\$0.200000\n", "Dividend Yield (Annualized) 0.45%\n", "6/20/2024\n", "P/E (TTM) 27.0\n", "6/20/2024\n", "Earning Yield (TTM) +3.70%\n", "6/20/2024\n", "EPS (Adjusted TTM) \\$1.89\n", "4/25/2024\n", "Consensus EPS Estimate\n", "(Q2 2024)\n", "\\$1.84\n", "EPS Growth\n", "(TTM vs. Prior TTM)\n", "+45.2%\n", "3-Year Price Performance Data provided by DataScope from Refinitiv\n", "Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2\n", "2021 2022 2023 2024\n", "-40%\n", "-20%\n", "0%\n", "20%\n", "40%\n", "0\n", "500\n", "Average Monthly Volume (Millions)\n", "50-Day Moving Average\n", "200-Day Moving Average\n", "Trading Characteristics\n", "52 Week High\n", "6/12/2024\n", "\\$180.41\n", "52 Week Low\n", "7/11/2023\n", "\\$115.35\n", "% Price Above/Below\n", " 20-Day Average 2.4\n", " 50-Day Average 6.0\n", " 200-Day Average 22.6\n", "Price Performance (% Change)
\n", " \n", " \n", " \n", "
" ], "text/plain": [ " pos text \\\n", "0 0 MARKET DIGEST\\n- 1 -\\n FRIDAY, JUNE 21, 2024\\n... \n", "1 1 respectively. In a rare event, shares of Nvidi... \n", "2 0 Friday, June 21, 2024\\nIntermediate Term:\\nMar... \n", "3 1 .76 37058.23\\n5473.17 4831.39\\n17721.59 15160.... \n", "4 0 Company Research Highlights®\\nReport created o... \n", "5 1 Enterprise Value $2103.1 B\\n6/20/2024\\nEx. Div... \n", "\n", " document \n", "0 /Users/sergeymkhitaryan/.pixeltable/file_cache... \n", "1 /Users/sergeymkhitaryan/.pixeltable/file_cache... \n", "2 /Users/sergeymkhitaryan/.pixeltable/file_cache... \n", "3 /Users/sergeymkhitaryan/.pixeltable/file_cache... \n", "4 /Users/sergeymkhitaryan/.pixeltable/file_cache... \n", "5 /Users/sergeymkhitaryan/.pixeltable/file_cache... " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chunks_t.where(chunks_t.pos < 2).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's compute vector embeddings for the document chunks and store them in a vector index. Pixeltable has built-in support for vector indexing using a variety of embedding model families, and it's easy for users to add new ones via UDFs. In this demo, we're going to use the E5 model from the Huggingface `sentence_transformers` library, which runs locally.\n", "\n", "The following command creates a vector index on the `text` column in the `chunks` table, using the E5 embedding model. (For details on index creation, see the [Embedding and Vector Indices](https://github.com/pixeltable/pixeltable/blob/release/docs/platform/embedding-indexes.ipynb) guide.) Note that defining the index is sufficient in order to load it with the existing data (and also to update it when the underlying data changes, as we'll see later)." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "metadata": {} }, "outputs": [], "source": [ "from pixeltable.functions.huggingface import sentence_transformer\n", "\n", "chunks_t.add_embedding_index(\n", " 'text',\n", " embedding=sentence_transformer.using(model_id='intfloat/e5-large-v2'),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This completes the first part of our application, creating an indexed document base. Next, we'll use it to run some queries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying\n", "\n", "In order to express a top-k lookup against our index, we use Pixeltable's `similarity` operator in combination with the standard `order_by` and `limit` operations. Before building this into our application, let's run a sample query to make sure it works." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "metadata": {}, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
similaritytext
0.798this report for third-party content attribution.\n", "Page 3\n", "Report created on June 21, 2024\n", "Recent Recap\n", "Last Report: Q1 Earnings on 04/25/24\n", "Reported Earnings: \\$1.89 per share\n", "Next Expected Report Date: 07/23/24\n", "GOOGL exceeded the First Call Consensus\n", "of \\$1.515 and exceeded the StarMine\n", "SmartEstimate from Refinitiv of \\$1.533 for\n", "Q1 2024.\n", "About Starmine SmartEstimate\n", "The StarMine SmartEstimate from Refinitiv\n", "seeks to be more accurate than the\n", "consensus EPS by calculating an analyst's\n", "accuracy and timeliness of an analyst's\n", "estimates into its estimate of earnings.\n", "Actuals vs. Estimates by Fiscal Quarter Data provided by I/B/E/S from Refinitiv\n", "ACTUALS ESTIMATES STARMINE ESTIMATES GOOGL PRICE Earnings in US Dollars\n", "Q1 Q2\n", "2024\n", "Q1 Q2 Q3 Q4\n", "2023\n", "Q1 Q2 Q3 Q4\n", "2022\n", "\\$0\n", "\\$100\n", "\\$200 Today\n", "1.23 1.21 1.06 1.05 1.17 1.44 1.55 1.64 1.89 1.84\n", "Actuals vs. Estimates for Fiscal Year\n", "First Call Estimates\n", " Actual (\\$) Cons
0.797, being in the 100th percentile is not the best for items such as Debt to Capital Ratio where a lower number means less debt. Therefore, being\n", "in the 1st percentile indicates lower debt than its peers in the industry.\n", "The Industry Average is a market cap-weighted average of the non-null values in the industry.Company Research Highlights®\n", "NASDAQ GOOGL This is not an investment recommendation from Fidelity Investments. The information\n", "contained in this report is sourced from independent, third ...... content attribution.\n", " © 2024 FMR LLC. All rights reserved. 447628.8.0\n", "Page 4\n", "Report created on June 21, 2024\n", "Important Information Regarding Third-Party Content\n", "The content compiled in this report is provided by third parties and not Fidelity. Fidelity did not prepare and does not endorse such content. Historical prices provided by Datascope from\n", "Refinitiv. Fundamental data provided by Standard & Poor's Compustat®. Earnings estimates provided by Refinitiv. Analyst recommendations performance
0.794Friday, June 21, 2024\n", "Intermediate Term:\n", "Market Outlook\n", "Bullish\n", "-------------- PORTFOLIO STRATEGY -------------\n", "Equity: 72%\n", "Cash: 1%\n", "Today's Market Movers\n", "IMPACT\n", "aGlobal Shares Lower\n", "GILD Pops on HIV Drug Results\n", "SRPT Soars on FDA Approval\n", "SWBI Drops on Sales Guidance\n", "+\n", "+\n", "+\n", "-\n", "a\n", "a\n", "a\n", "Recent Research Review\n", "ADSK, MRNA, IQV, WMB, BUD, LYFT, SRE, BP, \n", "AEE, PPC, JNPR, ORCL, CMG, TPR, DPZ, EOG, \n", "COST, PLTR, COR, VRTX\n", "Statistics Diary\n", "12-Mth S&P 500 Forcast:\n", "S&P 500 Current/Next EPS:\n", "S&P 500 P/E:\n", "12-Mth S&P P/E Range:\n", "10-Year Yield:\n", "12-Mth 10-Yr. Bond Forecast:\n", "Current Fed Funds Target:\n", "12-Mth Fed Funds Forecast:\n", "4800-5600\n", "247/265\n", "22.16\n", "18.1 - 21.1\n", "4.26%\n", "3.50-4.50%\n", "4.62%\n", "4.50-5.50%\n", "DJIA:\n", "S&P 500:\n", "NASDAQ:\n", "Lrg/Small Cap:\n", "Growth/Value:\n", "PREVIOUS\n", "CLOSE\n", "200-DAY\n", "AVERAGE\n", "39134
0.793. Each firm has its own recommendation\n", "categories, making it difficult to compare\n", "one firm's recommendation to another's.\n", "Investars, a third-party research\n", "company, collects and standardizes\n", "these recommendations using a\n", "five-point scale.\n", "† The Equity Summary Score provided by\n", "StarMine from Refinitiv is current as of\n", "the date specified. There may be\n", "differences between the Equity Summary\n", "Score analyst count and the number of\n", "underlying analysts listed. Due to the\n", "timing in receiving ratings ...... mendation from Fidelity Investments. The information\n", "contained in this report is sourced from independent, third party providers.\n", "Price on 6/20/2024: \\$176.30 Communication Services Sector\n", "Market Capitalization: \\$2182.8 B Interactive Media & Services Industry Alphabet Class A\n", "The content on this page is provided by third parties and not Fidelity. Fidelity did not prepare and does not endorse such content. All are third-party companies that are not affiliated with Fidelity.\n", "See each section in
0.792in this report for third-party content attribution and see page 4 for full disclosures.\n", "Page 2\n", "Report created on June 21, 2024\n", "Equity Summary Score is a weighted,\n", "aggregated view of opinions from the\n", "independent research firms on Fidelity.com.\n", "It uses the past accuracy of these firms in\n", "determining the emphasis placed on any\n", "individual opinion.\n", "First Call Consensus Recommendation\n", "is provided by I/B/E/S from Refinitiv, an\n", "independent third-party research provider,\n", "using information gathered ...... firm descriptions,\n", "detailed methodologies, and more\n", "information on the Equity Summary Score,\n", "First Call Consensus, opinion history and\n", "performance, and most current available\n", "research reports for GOOGL.\n", "Equity Summary Score (7 Firms†) Provided by StarMine from Refinitiv as of 6/21/2024\n", "Firm1\n", "Starmine\n", "Relative Accuracy2Standardized Opinion3\n", "Refinitiv/Verus (i) 30 Neutral\n", "Zacks Investment Research, Inc (i) 56 Outperform\n", "ISS-EVA (i) 86 Underperform\n", "Jefferson Research (i) 34 Buy\n", "Trading Central
" ], "text/plain": [ " similarity text\n", "0 0.798383 this report for third-party content attributi...\n", "1 0.796918 , being in the 100th percentile is not the bes...\n", "2 0.794085 Friday, June 21, 2024\\nIntermediate Term:\\nMar...\n", "3 0.793386 . Each firm has its own recommendation\\ncatego...\n", "4 0.792263 in this report for third-party content attrib..." ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query_text = 'What is the expected EPS for Nvidia in Q1 2026?'\n", "sim = chunks_t.text.similarity(string=query_text)\n", "nvidia_eps_query = (\n", " chunks_t.order_by(sim, asc=False)\n", " .select(similarity=sim, text=chunks_t.text)\n", " .limit(5)\n", ")\n", "nvidia_eps_query.collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll run this context retrieval for every row of the `queries` table by adding it as a computed column. The operation is a top-k similarity lookup against the `chunks` view, which we package as a reusable, parameterized query using Pixeltable's `@pxt.query` decorator." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 8 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "8 rows updated, 8 values computed."
] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A @query is essentially a reusable, parameterized query that is attached to a table (or view),\n", "# which is a modular way of getting data from that table.\n", "\n", "\n", "@pxt.query\n", "def top_k(query_text: str):\n", " sim = chunks_t.text.similarity(string=query_text)\n", " return (\n", " chunks_t.order_by(sim, asc=False)\n", " .select(chunks_t.text, sim=sim)\n", " .limit(5)\n", " )\n", "\n", "\n", "# Now add a computed column to `queries_t`, calling the query\n", "# `top_k` that we just defined.\n", "queries_t.add_computed_column(question_context=top_k(queries_t.Question))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our `queries` table now looks like this:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'rag_demo/queries'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
S__No_Int
QuestionString
correct_answerString
question_contextJsontop_k(Question)
\n" ], "text/plain": [ "table 'rag_demo/queries'\n", "\n", " Column Name Type Computed With\n", " S__No_ Int \n", " Question String \n", " correct_answer String \n", " question_context Json top_k(Question)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new column `question_context` now contains the result of executing the query for each row, formatted as a list of dictionaries:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
question_context
[{"sim": 0.793, "text": ".76 37058.23\\n5473.17 4831.39\\n17721.59 15160.55\\n1.48 1.37\\n2.07 1.86\\nCURRENT RANKING\\nFive-Day Put/Call:\\nMomentum:\\nBullish Sentiment:\\nMutual Fund Cas ...... sed optimism at the beginning of the year that mortgage \\nrates would decline, that simply hasn't happened, and current sentiment \\nreflects pent-up"}, {"sim": 0.789, "text": " frustration with the overall lack of purchase affordability.\\" \\nBased on the June 20 GDPNow estimate from the Atlanta Fed, residential \\nfixed inve ...... . High mortgage rates are a challenge, but we remain bullish on \\nthe sector because demographics point to strong demand amid a decades-long \\nshort"}, {"sim": 0.773, "text": ". The content of this report \\nmay be derived from Argus research reports, notes, or analyses. The opinions and information contained herein have b ...... all recipients of this report as \\ncustomers simply by virtue of their receipt of this material. Investments involve risk and an investor may incur"}, {"sim": 0.773, "text": "Enterprise Value \\$2103.1 B\\n6/20/2024\\nEx. Dividend Date 6/10/2024\\nDividend \\$0.200000\\nDividend Yield (Annualized) 0.45%\\n6/20/2024\\nP/E (TTM) 27.0\\n6/2 ...... 0.41\\n52 Week Low\\n7/11/2023\\n\\$115.35\\n% Price Above/Below\\n 20-Day Average 2.4\\n 50-Day Average 6.0\\n 200-Day Average 22.6\\nPrice Performance (% Change)\\n"}, {"sim": 0.773, "text": " in this report for third-party content attribution and see page 4 for full disclosures.\\nPage 2\\nReport created on June 21, 2024\\nEquity Summary Sco ...... iv/Verus (i) 30 Neutral\\nZacks Investment Research, Inc (i) 56 Outperform\\nISS-EVA (i) 86 Underperform\\nJefferson Research (i) 34 Buy\\nTrading Central"}]
" ], "text/plain": [ " question_context\n", "0 [{'sim': 0.7933434484405357, 'text': '.76 3705..." ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.select(queries_t.question_context).head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Asking the LLM\n", "\n", "Now it's time for the final step in our application: feeding the document chunks and questions to an LLM to generate answers. In this demo we'll use OpenAI, but any other inference provider or local model could be used instead.\n", "\n", "We start by defining a UDF that takes a top-k list of context chunks and a question and turns them into a ChatGPT prompt." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Define a UDF to create an LLM prompt given a top-k list of\n", "# context chunks and a question.\n", "@pxt.udf\n", "def create_prompt(top_k_list: list[dict], question: str) -> str:\n", " # top_k_list is sorted by descending similarity, so reversing it\n", " # places the most relevant chunk last, right before the question.\n", " concat_top_k = '\\n\\n'.join(\n", " elt['text'] for elt in reversed(top_k_list)\n", " )\n", " return f\"\"\"\n", " PASSAGES:\n", "\n", " {concat_top_k}\n", "\n", " QUESTION:\n", "\n", " {question}\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then add the prompt as another computed column of `queries`:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 8 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "8 rows updated, 16 values computed."
] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.add_computed_column(\n", " prompt=create_prompt(queries_t.question_context, queries_t.Question)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a new string column containing the prompt:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'rag_demo/queries'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
S__No_Int
QuestionString
correct_answerString
question_contextJsontop_k(Question)
promptStringcreate_prompt(question_context, Question)
\n" ], "text/plain": [ "table 'rag_demo/queries'\n", "\n", " Column Name Type Computed With\n", " S__No_ Int \n", " Question String \n", " correct_answer String \n", " question_context Json top_k(Question)\n", " prompt String create_prompt(question_context, Question)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
prompt
PASSAGES:\n", "\n", " in this report for third-party content attribution and see page 4 for full disclosures.\n", "Page 2\n", "Report created on June 21, 2024\n", "Equity Summary Score is a weighted,\n", "aggregated view of opinions from the\n", "independent research firms on Fidelity.com.\n", "It uses the past accuracy of these firms in\n", "determining the emphasis placed on any\n", "individual opinion.\n", "First Call Consensus Recommendation\n", "is provided by I/B/E/S from Refinitiv, an\n", "independent third-party research provider,\n", "using i ...... tudies. Fannie Mae's Home Purchase Sentiment Index \n", "for May dropped by 2.5 points to an all-time survey low of 69.4. Just 14% of \n", "consumers said that it is a good time to buy a home, down from 20% in April. \n", "Doug Duncan, Chief Economist at Fannie Mae, said "While many \n", "respondents expressed optimism at the beginning of the year that mortgage \n", "rates would decline, that simply hasn't happened, and current sentiment \n", "reflects pent-up\n", "\n", " QUESTION:\n", "\n", " What is roughly the current mortage rate?
" ], "text/plain": [ " prompt\n", "0 \\n PASSAGES:\\n\\n in this report for thi..." ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.select(queries_t.prompt).head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now add another computed column to call OpenAI. For the `chat_completions()` call, we need to construct two messages: a system message containing the instructions for the model, and a user message containing the prompt. For the user message, we can simply reference the `prompt` column we just added." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 8 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "8 rows updated, 8 values computed." ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pixeltable.functions import openai\n", "\n", "# Assemble the prompt and instructions into OpenAI's message format\n", "messages = [\n", " {\n", " 'role': 'system',\n", " 'content': 'Please read the following passages and answer the question based on their contents.',\n", " },\n", " {'role': 'user', 'content': queries_t.prompt},\n", "]\n", "\n", "# Add a computed column that calls OpenAI for each row\n", "queries_t.add_computed_column(\n", " response=openai.chat_completions(\n", " model='gpt-4o-mini', messages=messages\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our `queries` table now contains a JSON column `response`, which holds the full API response. At the moment, we're only interested in the text of the model's reply, which we can easily extract into another computed column:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 8 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "8 rows updated, 8 values computed." 
] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.add_computed_column(\n", " answer=queries_t.response.choices[0].message.content\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have the following `queries` schema:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
table 'rag_demo/queries'
\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column NameTypeComputed With
S__No_Int
QuestionString
correct_answerString
question_contextJsontop_k(Question)
promptStringcreate_prompt(question_context, Question)
responseRequired[Json]chat_completions(model='gpt-4o-mini', messages=[{'role': 'system', 'content': 'Please read the following passages and answer the question based on their contents.'}, {'role': 'user', 'content': prompt}])
answerJsonresponse.choices[0].message.content
\n" ], "text/plain": [ "table 'rag_demo/queries'\n", "\n", " Column Name Type Computed With\n", " S__No_ Int \n", " Question String \n", " correct_answer String \n", " question_context Json top_k(Question)\n", " prompt String create_prompt(question_context, Question)\n", " response Required[Json] chat_completions(model='gpt-4o-mini', messages...\n", " answer Json response.choices[0].message.content" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at what we got back:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Questioncorrect_answeranswer
What is roughly the current mortage rate?0.07The current mortgage rate is near 7%.
What is the market capitalization of Alphabet?\\$2182.8 BillionThe market capitalization of Alphabet is \\$2,182.8 billion.
What is the overall latest rating for Amazon.com from analysts?SELLThe provided passages do not contain specific information about Amazon.com or its overall latest rating from analysts. The data focuses on Alphabet Inc. (GOOGL) and its financial metrics, growth rates, and analyst recommendations. As a result, I cannot provide the latest rating for Amazon.com based on the content provided.
What is the current dividend yield for Alphabet Inc. (\\$GOOGL)?0.0046The passages provided do not contain any information regarding the current dividend yield for Alphabet Inc. (\\$GOOGL). To find the dividend yield, you would typically need to look at the company's dividend payment history or current financial statements, which are not included in the text you provided.
What is the operating cash flow of Amazon in Q1 2024?18,989 MillionThe passages provided do not contain information regarding Amazon's operating cash flow for Q1 2024. They primarily focus on financial data related to Accenture and Alphabet (Google). Therefore, I'm unable to provide the operating cash flow figure for Amazon in Q1 2024 based on the given text.
What is the expected EPS for Nvidia in Q1 2026?0.73 EPSThe provided passages do not contain any information about Nvidia or its expected earnings per share (EPS) for Q1 2026. The information focuses on the equity summary and earnings reports for Alphabet (GOOGL) and does not mention Nvidia. Therefore, I cannot provide the expected EPS for Nvidia in Q1 2026 based on the supplied text.
What are the main reasons to buy Nvidia?Datacenter, GPUs Demands, Self-driving, and cash-flowThe passages provided do not specifically outline the reasons to buy Nvidia. They instead mention that Nvidia shares lost ground by 3.5%, point out the influence of AI on stock movement, and note that despite this loss, major indices remain ahead for the year. Therefore, without additional context or specific reasons from another source, there are no explicit justifications for buying Nvidia mentioned in the passages.\n", "\n", "For more comprehensive reasons to consider buying Nvidia, typically, one would look at factors such as its leadership in the GPU market, growth in AI and machine learning applications, strong financial performance, and innovative product offerings. However, these points are not covered in the provided passages.
What are the latest financial metrics for Accenture PLC?missed consensus forecasts and strong total bookings rising by 22% annuallyThe latest financial metrics for Accenture PLC, as of fiscal 3Q24, are as follows:\n", "\n", "1. **Revenue**: \\$16.47 billion, which is down 1% year over year.\n", "2. **GAAP Gross Margin**: 33.4%.\n", "3. **GAAP Operating Margin**: 16.0%.\n", "4. **Non-GAAP Operating Margin**: 16.4%.\n", "5. **Earnings per Share (EPS)**: Non-GAAP EPS of \\$3.13 per diluted share, down 2% year over year. The consensus forecast was \\$3.15.\n", "6. **Total Revenue for FY23**: \\$64.1 billion, up 4% on a GAAP basis.\n", "7. **Free Cash Flow for FY23**: \\$9.0 billion, with a forecast for FY24 of \\$8.7-\\$9.3 billion.\n", "8. **Debt**: \\$1.68 billion at the end of 3Q24.\n", "9. **Cash**: \\$5.54 billion at the end of 3Q24.\n", "10. **Shareholder Returns**: Accenture expects to return at least \\$7.7 billion to shareholders in FY24, following \\$7.2 billion in FY23.\n", "\n", "Additionally, they have announced a quarterly dividend of \\$1.29 per share, which reflects a 15% increase from the previous quarterly payout.
" ], "text/plain": [ " Question \\\n", "0 What is roughly the current mortage rate? \n", "1 What is the market capitalization of Alphabet? \n", "2 What is the overall latest rating for Amazon.c... \n", "3 What is the current dividend yield for Alphabe... \n", "4 What is the operating cash flow of Amazon in Q... \n", "5 What is the expected EPS for Nvidia in Q1 2026? \n", "6 What are the main reasons to buy Nvidia? \n", "7 What are the latest financial metrics for Acce... \n", "\n", " correct_answer \\\n", "0 0.07 \n", "1 $2182.8 Billion \n", "2 SELL \n", "3 0.0046 \n", "4 18,989 Million \n", "5 0.73 EPS \n", "6 Datacenter, GPUs Demands, Self-driving, and ca... \n", "7 missed consensus forecasts and strong total bo... \n", "\n", " answer \n", "0 The current mortgage rate is near 7%. \n", "1 The market capitalization of Alphabet is $2,18... \n", "2 The provided passages do not contain specific ... \n", "3 The passages provided do not contain any infor... \n", "4 The passages provided do not contain informati... \n", "5 The provided passages do not contain any infor... \n", "6 The passages provided do not specifically outl... \n", "7 The latest financial metrics for Accenture PLC... " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.select(\n", " queries_t.Question, queries_t.correct_answer, queries_t.answer\n", ").show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The application works, but, as expected, several questions couldn't be answered because the documents containing the relevant information haven't been added yet. As a final step, let's add the remaining documents to our document base and run the queries again." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Incremental Updates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pixeltable's views and computed columns update automatically in response to new data. We can see this when we add the remaining documents to our `documents` table. 
Watch how the `chunks` view is updated to stay in sync with `documents`:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `documents`: 3 rows [00:00, 569.05 rows/s]\n", "Inserting rows into `chunks`: 67 rows [00:00, 325.91 rows/s]\n", "Inserted 70 rows with 0 errors.\n" ] }, { "data": { "text/plain": [ "70 rows inserted, 6 values computed." ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "documents_t.insert({'document': p} for p in document_urls[3:])" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
document
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", "
" ], "text/plain": [ " document\n", "0 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "1 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "2 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "3 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "4 /Users/sergeymkhitaryan/.pixeltable/file_cache...\n", "5 /Users/sergeymkhitaryan/.pixeltable/file_cache..." ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "documents_t.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Note: although Pixeltable updates `documents` and `chunks`, it **does not** automatically update the `queries` table. This is by design: we don't want all rows in `queries` to get automatically re-executed every time a single new document is added to the document base. However, newly-added rows will be run over the new, incrementally-updated index.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To confirm that the `chunks` index got updated, we'll re-run the chunks retrieval query for the question\n", "\n", "`What is the expected EPS for Nvidia in Q1 2026?`\n", "\n", "Previously, our most similar chunk had a similarity score of ~0.8. Let's see what we get now:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
similaritytext
0.855Sales revenues to customers outside the United\n", "States accounted for more than 56% of the total revenues for fiscal 2024. Hence, we believe that any unfavorable currency fluctuation and\n", "an uncertain macroeconomic environment may moderate the company's growth.\n", "Zacks Equity Research www.zacks.com Page 4 of 10Last Earnings Report\n", "NVIDIA Q1 Earnings Top Estimates, Revenues Rise Y/Y\n", "NVIDIA reported first-quarter fiscal 2025 earnings of \\$6.12 per share, which beat the Zacks\n", "Consensus Estimate by 1 ...... trong and innovative portfolio, with the growing adoption of its GPUs. It\n", "benefits from a strong partner base that includes the likes of TSMC, Synopsys, AWS, Alphabet,\n", "Microsoft, Oracle, and Johnson & Johnson MedTech.\n", "NVIDIA also announced a ten-for-one forward stock split of its issued common stock and raised the quarterly cash dividend by 150%.\n", "Segment Details\n", "NVIDIA reports revenues under two segments — Graphics, and Compute & Networking.\n", "Graphics accounted for 13% of fiscal first-quarter
0.853anding Portfolio Aids Prospects\n", "In the fiscal first quarter, NVIDIA launched the Blackwell platform targeted for AI computing at a trillion-parameter scale and the Blackwellpowered DGX SuperPOD for Generative AI supercomputing.\n", "It announced NVIDIA Quantum and NVIDIA Spectrum X800 series switches for InfiniBand and Ethernet, respectively, optimized for trillionparameter GPU computing and AI infrastructure.\n", "Moreover, the company launched NVIDIA AI Enterprise 5.0 with NVIDIA NIM inference micro ...... esearch www.zacks.com Page 5 of 10optimizations and integrations for Windows to deliver maximum performance on NVIDIA GeForce RTX AI PCs and workstations.\n", "For the Professional Visualization domain, it launched NVIDIA RTX 500 and 1000 professional Ada generation laptop GPUs for AI-enhanced\n", "workflows, NVIDIA RTX A400 and A1000 GPUs for desktop workstations and NVIDIA Omniverse Cloud APIs.\n", "Operating Details\n", "NVIDIA's non-GAAP gross margin increased to 78.9% from 66.8% in the year-ago quarter and
0.85376.7% from the previous quarter, mainly driven\n", "by higher Data Center sales.\n", "Non-GAAP operating expenses increased 43% year over year and 13.2% sequentially to \\$2.50 billion. The increase was due to higher\n", "compensations and related benefits.\n", "However, as a percentage of total revenues, non-GAAP operating expenses declined to 9.6% from 24.3% in the year-ago quarter and 30.7% in\n", "the previous quarter.\n", "The non-GAAP operating income was \\$18.06 billion compared with \\$3.05 billion in the year-ago qu ...... up from \\$25.98 billion as of Jan 28, 2024.\n", "As of Apr 28, 2024, the total long-term debt was \\$8.46 billion, unchanged sequentially.\n", "NVIDIA generated \\$15.4 billion in operating cash flow, up from the previous quarter's \\$11.5 billion.\n", "The company ended the fiscal first quarter with a free cash flow of \\$14.94 billion.\n", "In the fiscal first quarter, it returned \\$7.8 billion to shareholders through dividend payouts and share repurchases.\n", "Guidance\n", "For the second quarter of fiscal 2025, NVIDIA anticip
0.847Q2 Q3 Q4 Annual*\n", "2026 0.73 E 0.73 E 0.77 E 0.81 E 3.04 E\n", "2025 0.61 A 0.62 E 0.67 E 0.71 E 2.62 E\n", "2024 0.11 A 0.27 A 0.40 A 0.52 A 1.30 A\n", "*Quarterly figures may not add up to annual.\n", "1) The data in the charts and tables, except the estimates, is as of 06/11/2024.\n", "2) The report's text, the analyst-provided estimates, and the price target are as of 06/12/2024.\n", "Zacks Report Date: June 12, 2024\n", "© 2024 Zacks Investment Research, All Rights Reserved 10 S. Riverside Plaza Suite 1600 · Chicago, IL 6 ...... the worldwide leader in visual computing\n", "technologies and the inventor of the graphic processing unit, or GPU.\n", "Over the years, the company's focus has evolved from PC graphics to\n", "artificial intelligence (AI) based solutions that now support high\n", "performance computing (HPC), gaming and virtual reality (VR) platforms.\n", "NVIDIA's GPU success can be attributed to its parallel processing\n", "capabilities supported by thousands of computing cores, which are\n", "necessary to run deep learning algorithms. The
0.843company's GPU\n", "platforms are playing a major role in developing multi-billion-dollar endmarkets like robotics and self-driving vehicles.\n", "NVIDIA is a dominant name in the Data Center, professional\n", "visualization and gaming markets where Intel and Advanced Micro\n", "Devices are playing a catch-up role. The company's partnership with\n", "almost all major cloud service providers (CSPs) and server vendors is a\n", "key catalyst.\n", "NVIDIA's GPUs are also getting rapid adoption in diverse fields ranging\n", "from radio ...... Us for gaming and PCs, the GeForce NOW game streaming service and related infrastructure, and solutions for\n", "gaming platforms; Quadro GPUs for enterprise design; GRID software for cloud-based visual and virtual computing; and automotive platforms for\n", "infotainment systems.\n", "Compute & Networking comprises Data Center platforms and systems for AI, HPC, and accelerated computing; DRIVE for autonomous vehicles;\n", "and Jetson for robotics and other embedded platforms. Mellanox revenues included in this
" ], "text/plain": [ " similarity text\n", "0 0.855290 Sales revenues to customers outside the Unite...\n", "1 0.853228 anding Portfolio Aids Prospects\\nIn the fiscal...\n", "2 0.852772 76.7% from the previous quarter, mainly drive...\n", "3 0.847299 Q2 Q3 Q4 Annual*\\n2026 0.73 E 0.73 E 0.77 E 0...\n", "4 0.843375 company's GPU\\nplatforms are playing a major ..." ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nvidia_eps_query.collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our most similar chunk now has a score of ~0.855 and pulls in more relevant chunks from the newly-inserted documents.\n", "\n", "Let's recompute the `question_context` column of the `queries_t` table, which will automatically recompute the `answer` column as well." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `queries`: 8 rows [00:00, 580.60 rows/s]\n" ] }, { "data": { "text/plain": [ "8 rows updated, 40 values computed." ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.recompute_columns('question_context')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a final step, let's confirm that all the queries now have answers:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Questioncorrect_answeranswer
What is roughly the current mortage rate?0.07The current mortgage rate is near 7%.
What is the expected EPS for Nvidia in Q1 2026?0.73 EPSThe expected EPS (Earnings Per Share) for Nvidia in Q1 2026 is 0.73.
What is the current dividend yield for Alphabet Inc. (\\$GOOGL)?0.0046The current dividend yield for Alphabet Inc. (\\$GOOGL) is 0.46%.
What is the overall latest rating for Amazon.com from analysts?SELLThe overall latest rating for Amazon.com from analysts is "SELL" for the 1st quarter of 2024.
What is the market capitalization of Alphabet?\\$2182.8 BillionThe market capitalization of Alphabet Inc. (GOOGL) is \\$2,182.8 billion.
What is the operating cash flow of Amazon in Q1 2024?18,989 MillionThe operating cash flow of Amazon in Q1 2024 is reported as 18,989 million USD.
What are the latest financial metrics for Accenture PLC?missed consensus forecasts and strong total bookings rising by 22% annuallyAs of fiscal 3Q24, the latest financial metrics for Accenture PLC are as follows:\n", "\n", "- **Revenue**: \\$16.47 billion, down 1% year over year.\n", "- **GAAP Gross Margin**: 33.4% in 3Q24.\n", "- **GAAP Operating Margin**: 16.0% in 3Q24.\n", "- **Non-GAAP Operating Margin**: 16.4% in 3Q24.\n", "- **Earnings Per Share (EPS)**: \\$3.13 per diluted share in fiscal 3Q24, down 2% year over year, and missed the consensus forecast of \\$3.15.\n", "- **Record Revenue for FY23**: \\$64.1 billion, up 4% on a GAAP basis.\n", "- **Annual EPS Gu ...... at \\$8.7-\\$9.3 billion for FY24; actual free cash flow for FY23 was \\$9.0 billion.\n", "- **Debt**: \\$1.68 billion as of the end of 3Q24.\n", "- **Cash**: \\$5.54 billion at the end of 3Q24, down from \\$9.05 billion at the end of FY23.\n", "- **Dividend**: Quarterly dividend increased to \\$1.29 per share, with estimated dividends of \\$5.16 per share for FY24 and \\$5.40 for FY25.\n", "\n", "Overall, Accenture's financial strength ranking is high, with a long-term credit rating of Aa3 from Moody's and A+ from Standard & Poor's.
What are the main reasons to buy Nvidia?Datacenter, GPUs Demands, Self-driving, and cash-flowThe main reasons to buy NVIDIA, as derived from the passages, include:\n", "\n", "1. **Strong Financial Performance**: NVIDIA has demonstrated impressive financial metrics, including a significant increase in earnings (19% sequential growth and 262% year-over-year rise) and revenues (beating estimates by 7.02%).\n", "\n", "2. **Growing Market Opportunities**: The company has identified several growth opportunities in sectors such as ray-traced gaming, high-performance computing, artificial intelligence (AI), se ...... uting is leading to increased demand for datacenters, which benefits NVIDIA's product offerings. Their acquisition of Mellanox can potentially drive further growth in this segment.\n", "\n", "8. **Low Debt Levels**: NVIDIA has a low total debt to total capital ratio (0.16), indicating a strong balance sheet and less leverage compared to industry averages, providing stability and financial flexibility.\n", "\n", "These factors combined create a compelling case for potential investors looking to buy NVIDIA stock.
" ], "text/plain": [ " Question \\\n", "0 What is roughly the current mortage rate? \n", "1 What is the expected EPS for Nvidia in Q1 2026? \n", "2 What is the current dividend yield for Alphabe... \n", "3 What is the overall latest rating for Amazon.c... \n", "4 What is the market capitalization of Alphabet? \n", "5 What is the operating cash flow of Amazon in Q... \n", "6 What are the latest financial metrics for Acce... \n", "7 What are the main reasons to buy Nvidia? \n", "\n", " correct_answer \\\n", "0 0.07 \n", "1 0.73 EPS \n", "2 0.0046 \n", "3 SELL \n", "4 $2182.8 Billion \n", "5 18,989 Million \n", "6 missed consensus forecasts and strong total bo... \n", "7 Datacenter, GPUs Demands, Self-driving, and ca... \n", "\n", " answer \n", "0 The current mortgage rate is near 7%. \n", "1 The expected EPS (Earnings Per Share) for Nvid... \n", "2 The current dividend yield for Alphabet Inc. (... \n", "3 The overall latest rating for Amazon.com from ... \n", "4 The market capitalization of Alphabet Inc. (GO... \n", "5 The operating cash flow of Amazon in Q1 2024 i... \n", "6 As of fiscal 3Q24, the latest financial metric... \n", "7 The main reasons to buy NVIDIA, as derived fro... " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "queries_t.select(\n", " queries_t.Question, queries_t.correct_answer, queries_t.answer\n", ").show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 4 }