{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import os\n", "import numpy as np\n", "import pandas as pd\n", "pd.set_option('display.max_colwidth', None)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "I1016 09:52:24.127507 140653204371264 file_utils.py:39] PyTorch version 1.4.0 available.\n", "I1016 09:52:24.130346 140653204371264 file_utils.py:55] TensorFlow version 2.1.0 available.\n" ] } ], "source": [ "import ktrain" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Learning from Unlabeled Text Data\n", "\n", "Unlabeled, unstructured text or document data abound, and it is often necessary to \"make sense\" of these data for various applications. Examples include:\n", "- *exploratory analysis of text data*: providing rich overviews of the information space to discover relevant information that one may not have even known to look for\n", "- *building training sets for text classification*: identifying positive and negative example documents to train a [text classifier](https://en.wikipedia.org/wiki/Document_classification) in a semi-automated fashion\n", "- *document similarity*: measuring the semantic similarity between documents or sets of documents\n", "- *document recommender systems*: given a specific document of interest, recommending other documents that are semantically similar to it\n", "\n", "Each of these examples involves **learning from largely unlabeled text data**. In this notebook, we will show you how to accomplish the above with minimal coding using *ktrain*. The *ktrain* library is an open-source, augmented ML library built around Keras and scikit-learn. 
It can be installed with `pip3 install ktrain` and is [available on GitHub](https://github.com/amaiya/ktrain).\n", "\n", "We will use the well-known [20-newsgroup dataset](http://qwone.com/~jason/20Newsgroups/) for this demonstration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Raw Document Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# 20newsgroups\n", "from sklearn.datasets import fetch_20newsgroups\n", "\n", "# we only want to keep the body of the documents!\n", "remove = ('headers', 'footers', 'quotes')\n", "\n", "# fetch train and test data\n", "newsgroups_train = fetch_20newsgroups(subset='train', remove=remove)\n", "newsgroups_test = fetch_20newsgroups(subset='test', remove=remove)\n", "\n", "# compile the texts\n", "texts = newsgroups_train.data + newsgroups_test.data\n", "\n", "# let's also store the newsgroup category associated with each document\n", "# we can display this information in visualizations\n", "targets = [target for target in list(newsgroups_train.target) + list(newsgroups_test.target)]\n", "categories = [newsgroups_train.target_names[target] for target in targets]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are loading the targets (i.e., newsgroup categories), but will not use them for learning a model. Rather, they are simply employed as an example of how to incorporate metadata about documents in visualizations and analyses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train an LDA Topic Model to Discover Topics\n", "\n", "The `get_topic_model` function learns a [topic model](https://en.wikipedia.org/wiki/Topic_model) using [Latent Dirichlet Allocation (LDA)](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) by default. 
\n", "\n", "To use [non-negative matrix factorization](https://en.wikipedia.org/wiki/Non-negative_matrix_factorization) (NMF) instead of LDA, you can supply `model_type='nmf'` to the `get_topic_model` function.\n", "\n", "The `n_features` argument specifies the maximum size of the vocabulary, and the `n_topics` argument sets the number of topics (or clusters) to discover. Setting `n_topics=None`, as in the cell below, lets *ktrain* choose the number of topics automatically (97 in this run)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "n_topics automatically set to 97\n", "lang: en\n", "preprocessing texts...\n", "fitting model...\n", "iteration: 1 of max_iter: 5\n", "iteration: 2 of max_iter: 5\n", "iteration: 3 of max_iter: 5\n", "iteration: 4 of max_iter: 5\n", "iteration: 5 of max_iter: 5\n", "done.\n", "CPU times: user 15min 53s, sys: 43min 10s, total: 59min 4s\n", "Wall time: 1min 59s\n" ] } ], "source": [ "%%time\n", "tm = ktrain.text.get_topic_model(texts, n_topics=None, n_features=10000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can examine the discovered topics using `print_topics`, `get_topics`, or `topics`. 
Here, we will use `print_topics`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "topic 0 | tape adam tim case moved bag quote mass marked zionism\n", "topic 1 | image jpeg images format programs tiff files jfif save lossless\n", "topic 2 | alternative movie film static cycles films philips dynamic hou phi\n", "topic 3 | hell humans poster frank reality kent gerard gant eternal bell\n", "topic 4 | air phd chz kit cbc ups w-s rus w47 mot\n", "topic 5 | dog math great figure poster couldn don trying rushdie fatwa\n", "topic 6 | collaboration nazi fact end expression germany philly world certified moore\n", "topic 7 | gif points scale postscript mirror plane rendering algorithm polygon rayshade\n", "topic 8 | fonts font shell converted iii characters slight composite breaks compress\n", "topic 9 | power station supply options option led light tank plastic wall\n", "topic 10 | transmission rider bmw driver automatic shift gear japanese stick highway\n", "topic 11 | tyre ezekiel ruler hernia appeared appointed supreme man land power\n", "topic 12 | space nasa earth data launch surface solar moon mission planet\n", "topic 13 | israel jews jewish israeli arab peace war arabs palestinian kuwait\n", "topic 14 | olvwm xremote animals kinds roughing toolkit close corp glenn imakefile\n", "topic 15 | medical health disease cancer patients drug treatment drugs aids study\n", "topic 16 | biden chip gear like information number automatic mode insurance know\n", "topic 17 | graphics zip amiga shareware formats ftp gif program sgi convert\n", "topic 18 | brilliant mail did god coming christianity people got ideas reading\n", "topic 19 | black red white blue green cross wires lines helmet mask\n", "topic 20 | car engine cars miles clutch new ford rear slip road\n", "topic 21 | list mailing service model small large lists radar available major\n", "topic 22 | key encryption chip keys clipper phone 
security use government privacy\n", "topic 23 | talking pit nyr stl phi edm mtl wsh hfd cgy\n", "topic 24 | signal input switch connected circuit audio noise output control voltage\n", "topic 25 | stuff deleted die posting beware fantastic motives authentic reluctant hope\n", "topic 26 | adams douglas dc-x garrett ingres tin sdio incremental mcdonnell guide\n", "topic 27 | men homosexual homosexuality women gay sexual homosexuals male kinsey pop\n", "topic 28 | usual leo rs-232 martian reading cooperative unmanned somalia decompress visited\n", "topic 29 | edu university information send new computer research mail internet address\n", "topic 30 | reserve naval marine ret commission one-way irgun prior closure facilities\n", "topic 31 | state intelligence militia units army zone georgia sam croats belongs\n", "topic 32 | says article pain known warning doctor stone bug kidney response\n", "topic 33 | faq rsa ripem lights yes patent nist management wax cipher\n", "topic 34 | wolverine comics hulk appearance special liefeld sabretooth incredible hobgoblin x-force\n", "topic 35 | software ram worth cycles controller available make dram dynamic situation\n", "topic 36 | religion people religious catalog bobby used driven involved long like\n", "topic 37 | intel sites experiment ftp does know family good like mrs\n", "topic 38 | armenian people army russian turkish genocide armenians ottoman turks jews\n", "topic 39 | theft geo available face couldn cover sony people number shop\n", "topic 40 | christianity did exists mail matter mind tool status god reading\n", "topic 41 | propane probe earth orbit orbiter titan cassini space atmosphere gravity\n", "topic 42 | people government right think rights law make public fbi don\n", "topic 43 | god people does say believe bible true think evidence religion\n", "topic 44 | mov phone south key war supply push left just registered\n", "topic 45 | period goal pts play chicago pittsburgh buffalo shots new blues\n", "topic 46 | game 
team games year hockey season players player baseball league\n", "topic 47 | speed dod student technician just hits right note giant light\n", "topic 48 | sex marriage relationship family married couple depression pregnancy childhood trademark\n", "topic 49 | protects rejecting com4 couple decides taking connect unc nearest richer\n", "topic 50 | president states united american national press april washington america white\n", "topic 51 | card memory windows board ram bus drivers driver cpu problem\n", "topic 52 | window application manager display button xterm path widget event resources\n", "topic 53 | cable win van det bos tor cal nyi chi buf\n", "topic 54 | americans baltimore rochester cape springfield moncton providence utica binghamton adirondack\n", "topic 55 | color monitor screen mouse video colors resolution vga colour monitors\n", "topic 56 | option power ssf flights capability module redesign missions station options\n", "topic 57 | body father son vitamin diet day cells cell form literature\n", "topic 58 | max g9v b8f a86 bhj giz bxn biz qax b4q\n", "topic 59 | bit fast chip ibm faster mode chips scsi-2 speeds quadra\n", "topic 60 | book books law adl islam islamic iran media bullock muslims\n", "topic 61 | armenian russian turkish ottoman people army armenians genocide war turks\n", "topic 62 | oscillator partition tune nun umumiye nezareti mecmuasi muharrerat-i evrak version\n", "topic 63 | tongues seat est didn raise copied lazy schemes adapter leap\n", "topic 64 | com object jim app function motorola heterosexual objects pointers encountered\n", "topic 65 | effective boy projects grow jason ain dump keyboards vastly grants\n", "topic 66 | armenian people russian armenians turks ottoman army turkish genocide muslim\n", "topic 67 | mac apple pin ground wire quicktime macs pins connector simms\n", "topic 68 | bastard turning likes hooks notions turks cited proud pointers chuck\n", "topic 69 | bought dealer cost channel replaced face sony stereo 
warranty tube\n", "topic 70 | myers food reaction msg writes loop eat dee effects taste\n", "topic 71 | lander contradiction reconcile apparent somebody supplement essential needs produce insulin\n", "topic 72 | re-boost systems virginia voice unix input ken easily summary developing\n", "topic 73 | block tests suck shadow dte screws macedonia sunlight fin message\n", "topic 74 | jesus church christ god lord holy spirit mary shall heaven\n", "topic 75 | gun number year guns rate insurance police years new firearms\n", "topic 76 | rule automatically characteristic wider thumb recommendation inline mr2 halfway width\n", "topic 77 | drive disk hard scsi drives controller floppy ide master transfer\n", "topic 78 | stephanopoulos water gas oil heat energy hot temperature cold nuclear\n", "topic 79 | like know does use don just good thanks need want\n", "topic 80 | starters mlb mov higher signing left accessible argument viola teams\n", "topic 81 | entry rules info define entries year int printf include contest\n", "topic 82 | price new sale offer sell condition shipping interested asking prices\n", "topic 83 | issue germany title magazine german cover race generation origin nazi\n", "topic 84 | armenian armenians people turkish war said killed children russian turkey\n", "topic 85 | dos windows software comp library os/2 version microsoft applications code\n", "topic 86 | probe space launch titan earth cassini orbiter orbit atmosphere mission\n", "topic 87 | housed throws fills daylight occurring activities adjacent presenting punish occuring\n", "topic 88 | statement folk raids thor disarmed anatolia polygon inria arrive smehlik\n", "topic 89 | sound steve pro convert ati ultra fahrenheit orchid hercules blaster\n", "topic 90 | joke tricky wearing golden trickle seen geneva csh course caesar\n", "topic 91 | moral objective values morality child defined bank definition wrong different\n", "topic 92 | files file edu ftp available version server data use sun\n", "topic 93 
| catalog tons seal ordering kawasaki tools fax free ultraviolet packages\n", "topic 94 | file program error output use section line code command problem\n", "topic 95 | power ssf module capability option flights redesign missions human station\n", "topic 96 | just don think know like time did going didn people\n" ] } ], "source": [ "tm.print_topics()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the above, we can immediately get a feel for what kinds of subjects are discussed within this dataset. For instance, topic \\#13 appears to be about the Middle East, with top words \"*israel jews jewish israeli arab peace*\".\n", "\n", "We can examine the word weights for this topic, where each \"weight\" is a pseudo-count that can be converted to a probability by normalizing over all words in the vocabulary:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('israel', 832.6639019298735),\n", " ('jews', 670.7851381220836),\n", " ('jewish', 539.6844212023212),\n", " ('israeli', 426.2434455755054),\n", " ('arab', 376.1511464659808),\n", " ('peace', 269.33000902133085),\n", " ('war', 229.3267450288203),\n", " ('arabs', 223.59854716988627),\n", " ('palestinian', 191.29031007300767),\n", " ('kuwait', 182.83909796210693),\n", " ('land', 173.23994664154932),\n", " ('palestinians', 158.31898772111572),\n", " ('state', 151.5461024601567),\n", " ('palestine', 121.14271427446522),\n", " ('west', 113.54334001125812),\n", " ('iraq', 111.54753922935043),\n", " ('jew', 106.56410679110718),\n", " ('attacks', 106.47122225017874),\n", " ('israelis', 99.7315473662208),\n", " ('gaza', 98.13080829032971),\n", " ('killed', 92.8557266909479),\n", " ('occupied', 90.79070215809577),\n", " ('country', 89.68676723731086),\n", " ('policy', 86.85153167299889),\n", " ('civilians', 86.63190206561713)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tm.get_word_weights(topic_id=13, 
n_words=25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing the Document-Topic Matrix\n", "\n", "We will now pre-compute the document-topic matrix. Each row in this matrix represents a document and holds its probability distribution over the 97 topics. This allows us to easily see what kinds of topics are covered by any specific document in the original corpus.\n", "\n", "When computing the document-topic matrix, we will also filter out documents whose maximum topic probability is less than 0.25, so that only the most representative documents for each topic are kept. Removing these \"unfocused\" documents may improve the clarity of the visualizations shown later." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "done.\n", "CPU times: user 1min 23s, sys: 3min 18s, total: 4min 42s\n", "Wall time: 12.3 s\n" ] } ], "source": [ "%%time\n", "tm.build(texts, threshold=0.25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the `build` method prunes documents based on `threshold`, we should prune the original data and any metadata in the same way for consistency. This can be accomplished with the `filter` method." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "texts = tm.filter(texts)\n", "categories = tm.filter(categories)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This ensures that all data and metadata are aligned with the same array indices in case we want to use them later (e.g., in visualizations)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having computed the document-topic matrix, we can now easily access the topic probability distribution for any document in the corpus using `get_doctopics`. 
For instance, this document in the corpus is about sports:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For the second straight game, California scored a ton of late runs to crush\n", "the Brewhas. It was six runs in the 8th for a 12-5 win Monday and five in\n", "the 8th and six in the 9th for a 12-2 win yesterday. Jamie Navarro pitched\n", "seven strong innings, but Orosco, Austin, Manzanillo and Lloyd all took part\n", "in the mockery of a bullpen yesterday. How's this for numbers? Maldanado has\n", "pitched three scoreless innings and Navarro's ERA is 0.75. The next lowest\n", "on the staff is Wegman at 5.14. Ouch!\n", "\n", "It doesn't look much better for the hitters. Hamilton is batting .481, while\n", "Thon is hitting .458 and has seven RBI. The next highest is three. The next\n", "best hitter is Jaha at .267 and then Vaughn, who has the team's only HR, at\n", ".238. Another ouch. Looking at the stats, it's not hard to see why the team\n", "is 2-5. In fact, 2-5 doesn't sound bad when you're averaging three runs/game\n", "and giving up 6.6/game. \n", "\n", "Still, it's early and things will undoubtedly get better. The offense should\n", "come around, but the bullpen is a major worry. Fetters, Plesac and Austin gave\n", "the Brewers great middle relief last year. Lloyd, Maldanado, Manzanillo, \n", "Fetters, Austin and Orosco will have to pick up the pace for the team to be\n", "successful. Milwaukee won a number of games last year when middle relief either\n", "held small leads or kept small deficits in place. The starters will be okay,\n", "the defense will be alright and the hitting will come around, but the bullpen\n", "is a big question mark.\n", "\n", "In other news, Nilsson and Doran were reactivated yesterday, while William\n", "Suero was sent down and Tim McIntosh was picked up by Montreal. 
Today's game\n", "with California was cancelled.\n" ] } ], "source": [ "print(texts[35])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, here is the topic probability distribution for this document:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[4.64381931e-04, 4.64381908e-04, 4.64381908e-04, 4.64381909e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381911e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381912e-04, 4.64381914e-04, 4.64381908e-04, 4.64381909e-04,\n", " 4.64381908e-04, 4.64381909e-04, 4.64381908e-04, 4.64381922e-04,\n", " 4.64381909e-04, 8.60305392e-02, 4.64381913e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381908e-04, 4.64381915e-04,\n", " 4.64381908e-04, 4.64381924e-04, 4.64381908e-04, 4.64381919e-04,\n", " 4.64381947e-04, 4.64381915e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381926e-04, 4.64381929e-04,\n", " 4.64381908e-04, 4.64381911e-04, 6.93731322e-01, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 1.50902793e-02, 4.64381911e-04,\n", " 4.64381909e-04, 9.05690040e-03, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 4.64381911e-04, 4.64381908e-04,\n", " 4.64381908e-04, 4.64381908e-04, 1.26873995e-02, 4.64381921e-04,\n", " 4.64381908e-04, 4.64381937e-04, 4.64381908e-04, 1.92280658e-02,\n", " 9.47339092e-03, 4.64381909e-04, 4.64381912e-04, 4.64381908e-04,\n", " 4.64381918e-04, 4.64381908e-04, 4.64381908e-04, 4.64381908e-04,\n", " 4.64381908e-04, 9.47334319e-03, 4.64381908e-04, 4.64381909e-04,\n", " 4.64381910e-04, 4.64381908e-04, 4.64381909e-04, 
4.64381908e-04,\n", " 1.04363152e-01]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tm.get_doctopics(doc_ids=[35])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the highest topic probability (69%) is associated with a topic about sports:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'game team games year hockey season players player baseball league'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tm.topics[ np.argmax(tm.get_doctopics(doc_ids=[35]))]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting the Topics of New Documents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `predict` method can predict the topic probability distribution for any arbitrary document directly from raw text:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.65009096, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.06185567, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 
0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214, 0.00303214, 0.00303214, 0.00303214,\n", " 0.00303214, 0.00303214]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tm.predict(['Elon Musk leads Space Exploration Technologies (SpaceX), where he oversees ' +\n", " 'the development and manufacturing of advanced rockets and spacecraft for missions ' +\n", " 'to and beyond Earth orbit.'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the highest topic probability for this sentence (about 0.65, in the third row and third column of the printed array) belongs to topic \\#12, which is about space:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'space nasa earth data launch surface solar moon mission planet'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tm.topics[ np.argmax(tm.predict(['Elon Musk leads Space Exploration Technologies (SpaceX), where he oversees ' +\n", " 'the development and manufacturing of advanced rockets and spacecraft for missions ' +\n", " 'to and beyond Earth orbit.']))]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing Topics\n", "Let's take another look at the list of discovered topics, this time sorted by document count."
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "topic:79 | count:3782 | like know does use don just good thanks need want\n", "topic:96 | count:3643 | just don think know like time did going didn people\n", "topic:43 | count:1599 | god people does say believe bible true think evidence religion\n", "topic:42 | count:1246 | people government right think rights law make public fbi don\n", "topic:51 | count:900 | card memory windows board ram bus drivers driver cpu problem\n", "topic:46 | count:782 | game team games year hockey season players player baseball league\n", "topic:92 | count:597 | files file edu ftp available version server data use sun\n", "topic:29 | count:399 | edu university information send new computer research mail internet address\n", "topic:82 | count:371 | price new sale offer sell condition shipping interested asking prices\n", "topic:84 | count:312 | armenian armenians people turkish war said killed children russian turkey\n", "topic:12 | count:296 | space nasa earth data launch surface solar moon mission planet\n", "topic:22 | count:283 | key encryption chip keys clipper phone security use government privacy\n", "topic:75 | count:236 | gun number year guns rate insurance police years new firearms\n", "topic:15 | count:157 | medical health disease cancer patients drug treatment drugs aids study\n", "topic:94 | count:152 | file program error output use section line code command problem\n", "topic:74 | count:146 | jesus church christ god lord holy spirit mary shall heaven\n", "topic:45 | count:123 | period goal pts play chicago pittsburgh buffalo shots new blues\n", "topic:13 | count:104 | israel jews jewish israeli arab peace war arabs palestinian kuwait\n", "topic:77 | count:75 | drive disk hard scsi drives controller floppy ide master transfer\n", "topic:85 | count:58 | dos windows software comp library os/2 version microsoft applications code\n", "topic:21 
| count:46 | list mailing service model small large lists radar available major\n", "topic:52 | count:29 | window application manager display button xterm path widget event resources\n", "topic:20 | count:28 | car engine cars miles clutch new ford rear slip road\n", "topic:27 | count:22 | men homosexual homosexuality women gay sexual homosexuals male kinsey pop\n", "topic:19 | count:19 | black red white blue green cross wires lines helmet mask\n", "topic:53 | count:16 | cable win van det bos tor cal nyi chi buf\n", "topic:78 | count:16 | stephanopoulos water gas oil heat energy hot temperature cold nuclear\n", "topic:91 | count:14 | moral objective values morality child defined bank definition wrong different\n", "topic:24 | count:13 | signal input switch connected circuit audio noise output control voltage\n", "topic:60 | count:12 | book books law adl islam islamic iran media bullock muslims\n", "topic:17 | count:12 | graphics zip amiga shareware formats ftp gif program sgi convert\n", "topic:32 | count:12 | says article pain known warning doctor stone bug kidney response\n", "topic:55 | count:12 | color monitor screen mouse video colors resolution vga colour monitors\n", "topic:59 | count:11 | bit fast chip ibm faster mode chips scsi-2 speeds quadra\n", "topic:58 | count:10 | max g9v b8f a86 bhj giz bxn biz qax b4q\n", "topic:70 | count:10 | myers food reaction msg writes loop eat dee effects taste\n", "topic:81 | count:9 | entry rules info define entries year int printf include contest\n", "topic:54 | count:8 | americans baltimore rochester cape springfield moncton providence utica binghamton adirondack\n", "topic:50 | count:8 | president states united american national press april washington america white\n", "topic:9 | count:8 | power station supply options option led light tank plastic wall\n", "topic:34 | count:8 | wolverine comics hulk appearance special liefeld sabretooth incredible hobgoblin x-force\n", "topic:67 | count:7 | mac apple pin ground wire 
quicktime macs pins connector simms\n", "topic:3 | count:6 | hell humans poster frank reality kent gerard gant eternal bell\n", "topic:25 | count:6 | stuff deleted die posting beware fantastic motives authentic reluctant hope\n", "topic:4 | count:5 | air phd chz kit cbc ups w-s rus w47 mot\n", "topic:64 | count:5 | com object jim app function motorola heterosexual objects pointers encountered\n", "topic:47 | count:5 | speed dod student technician just hits right note giant light\n", "topic:8 | count:4 | fonts font shell converted iii characters slight composite breaks compress\n", "topic:7 | count:4 | gif points scale postscript mirror plane rendering algorithm polygon rayshade\n", "topic:0 | count:3 | tape adam tim case moved bag quote mass marked zionism\n", "topic:33 | count:3 | faq rsa ripem lights yes patent nist management wax cipher\n", "topic:83 | count:2 | issue germany title magazine german cover race generation origin nazi\n", "topic:89 | count:2 | sound steve pro convert ati ultra fahrenheit orchid hercules blaster\n", "topic:65 | count:2 | effective boy projects grow jason ain dump keyboards vastly grants\n", "topic:69 | count:1 | bought dealer cost channel replaced face sony stereo warranty tube\n", "topic:48 | count:1 | sex marriage relationship family married couple depression pregnancy childhood trademark\n", "topic:31 | count:1 | state intelligence militia units army zone georgia sam croats belongs\n", "topic:57 | count:1 | body father son vitamin diet day cells cell form literature\n", "topic:10 | count:1 | transmission rider bmw driver automatic shift gear japanese stick highway\n", "topic:76 | count:1 | rule automatically characteristic wider thumb recommendation inline mr2 halfway width\n" ] } ], "source": [ "tm.print_topics(show_counts=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The topic with the most documents (topic 79) appears to consist of conversational questions, replies, and comments that aren't focused on a particular subject. 
Other topics are focused on specific domains (e.g., topic 15 is about **medicine**, with top words \"*medical health disease cancer patients drug treatment*\")." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can easily generate an interactive visualization of the documents under consideration using `visualize_documents`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reducing to 2 dimensions...[t-SNE] Computing 91 nearest neighbors...\n", "[t-SNE] Indexed 15644 samples in 0.174s...\n", "[t-SNE] Computed neighbors for 15644 samples in 41.661s...\n", "[t-SNE] Computed conditional probabilities for sample 1000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 2000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 3000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 4000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 5000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 6000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 7000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 8000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 9000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 10000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 11000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 12000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 13000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 14000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 15000 / 15644\n", "[t-SNE] Computed conditional probabilities for sample 15644 / 15644\n", "[t-SNE] Mean sigma: 0.071015\n", "[t-SNE] KL divergence after 250 iterations with early exaggeration: 87.289635\n", "[t-SNE] KL divergence after 1000 iterations: 1.866425\n", 
"done.\n" ] }, { "data": { "text/html": [ "\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " 
function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* 
Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
|   | Prediction | Score | Text |
|---|---|---|---|
| 0 | 1 | 0.212587 | I'm looking for recommendations for a laser printer. It will\\nbe used mostly for text by a single user. It doesn't need to\\nbe a postscript printer. Any advice would be appreciated.\\n |
| 1 | 1 | 0.211690 | I get the picture, I just find it humorous that Running Windows 3.1 apps ( 3.0 for 2.0 ) \\nis what makes os/2 more credible... |
| 2 | 1 | 0.211690 | Two-part question:\\n\\n1) What is Windows NT - a 'real' windows OS?\\n\\n2) This past weekend, a local 'hacker' radio show metioned a new product\\n from Microsoft called 'Chicago' if I recall. Anyone know what this is?\\n\\nThat is it -\\n\\nThanks a heap.\\n\\n- Alan\\n |
| 3 | 1 | 0.211690 | Is there any one know:\\n\\nWhat is the FTP tool for Windows and where to get the tool ?\\n\\nThanks for any help !! |
| 4 | 1 | 0.205488 | Could someone point me toward a source (FTP/BBS/whatever) for development\\ntools for the 8051 microprocessor. I specifically am looking for a Macintosh\\ncross-assembler/disassembler. Also, is there a mailing-list dedicated to\\ndiscussing the 8051? Thanks.\\n |
|   | Prediction | Score | Text |
|---|---|---|---|
| 377 | 0 | -0.000711 | I was at avalon today and found texture maps in some \"tex\" and \"txc\"\\nformat, something I've never encountered before. These are obviously\\nnot tex or LaTeX files.\\n\\nIF you have a clue how I can convert these to something\\nreasonable, please let me know. |
| 378 | 0 | -0.002478 | I need to be able to cause a beep, but without using any interrupt\\nroutines, as I cannot use the BIOS. I believe that the PIC might have\\nsomething to do with it, but I'm having troubles deciphering the\\ninformation I have on it to figure out how to program it!\\n\\n\\tI'm programming all of this in Turbo C, if that makes any\\ndiference at all...\\n\\n\\tPlease can anyone help me??!\\n\\nThanks, |
| 379 | 0 | -0.003216 | \\nThe only things you'll be able to salvage from the junior are the floppy drives\\nand monitor. The floppies are 360k, and the monitor is CGA, but you will need\\nan adaptor cable to use it. The junior does not use standard cards. Unless \\nyou're really strapped for cash, you should just junk the thing and buy new \\nstuff.\\n\\nDan\\n |
| 380 | 0 | -0.003326 | \\nMacintosh II cx with 40 MB HD, 8 MB RAM and 19\" monochrome\\nmonitor (Ikegami) is for sale.\\nAsking $3,000, no reasonable (best) offer will be rejected.\\nContact Konrad at (416) 365-0564m Mon-Frii 9-5.\\n |
| 381 | 0 | -0.003883 | I edited a few newsgroup from that line (don't like to crosspost THAT\\nmuch). I can't compare the two, but I recently got an HP DeskJet 500.\\n\\nI'm very pleased with the output (remember that I'm used to imagens,\\nlaser and postscript printers at school -- looks very good. You have\\nto be careful to let it dry before touching it, as it will smudge.\\n\\nThe deskjet is SLOW. This is in comparison to the other printers I\\nmentioned. I have no idea how the bubblejet compares.\\n\\nThe interface between Win3.1 and the printer is just dandy, I've not\\nhad any problems with it.\\n\\nHope that helps some.\\n\\n--Cindy\\n\\n--\\nCindy Tittle Moore |
|   | Prediction | Score | Text |
|---|---|---|---|
| 2217 | 0 | -0.510036 | \\nDon't forget Chemical Abstracts Service (which is pretty much the international\\nclearinghouse for all chemical information), whose former director (Ronald\\nWigington) and head of R&D (Nick Farmer) were openly former NSA employees. |
| 2218 | 0 | -0.510492 | From article <1993Apr21.013846.1374@cx5.com>, by tlc@cx5.com:\\n\\nAccording to my ColoRIX manual .SCF files are 640x480x256\\n\\n\\nYou may try VPIC, I think it handles the 256 color RIX files OK..\\n |
| 2219 | 0 | -0.510556 | What about disks? Won't it erase them if you're carrying them in the bag? |
| 2220 | 0 | -0.510582 | was\\nYuppies\\nstarted\\nYep, that's when I noticed it too. I stopped replacing the hood badge \\nafter the second or third one (at $12.00 each).\\n\\n2002 drivers used to flash their headlight at each other in greeting. Try \\nflashing your headlights at a 318i driver and see what kind of look you \\nget. They usually check their radar detector...they think you're alerting \\nthem to a cop. |
| 2221 | 0 | -0.510692 | refrettably you are mistaken. alt.drugs was used to recruit people for the\\nworldwide pot religion. I, however hve no problem being in both of them\\n\\n |
|   | Prediction | Score | Text |
|---|---|---|---|
| 0 | 1 | 0.418117 | \\nAnd does it not say in scripture that no man knows the hour of His coming, not\\neven the angels in Heaven but only the Father Himself? DK was trying to play\\nGod by breaking the seals himself. DK killed himself and as many of his\\nfollowers as he could. BTW, God did save the children. They are in Heaven,\\na far better place. How do I know? By faith.\\n\\nGod be with you, |
| 1 | 1 | 0.409673 | \\n \\nFirst of all, the original poster misquoted. The reference is from 2 Tim 3:16.\\nThe author was Paul, and his revelations were anything but \"(at best) \\nsecond-hand\".\\n\\n\\t\"And is came about that as [Saul] journeyed, he was approaching\\n\\t Damascus, and suddenly a light from heaven flashed around him; and\\n\\t he fell to the ground, and heard a voice saying to him, \"Saul, Saul,\\n\\t why are you persecuting Me?\" And he said, \"Who art Thou, Lord?\" And\\n\\t He said, \"I am Jesus whom you are persecuting, . . .\"\\n\\t\\t(Acts 9:3-5, NAS)\\n\\nPaul received revelation directly from the risen Jesus! (Pretty cool, eh?) He\\nbecame closely involved with the early church, the leaders of which were \\nfollowers of Jesus throughout his ministry on earth.\\n\\n\\nI agree. I don't believe anyone but the Spirit would be able to convince you \\nthe Spirit exists. Please don't complain about this being circular. I know\\nit is, but really, can anything of the natural world explain the supernatural?\\n(This is why revelation is necessary to the authors of the Bible.)\\n\\n\\nThe Spirit is part of God. How much closer to the source can you get?\\nThe Greek in 2 Timothy which is sometimes translated as \"inspired by God\", \\nliterally means \"God-breathed\". In other words, God spoke the actual words \\ninto the scriptures. Many theologians and Bible scholars (Dr. James Boice is \\none that I can remember off-hand) get quite annoyed by the dryness and \\nincompleteness of \"inspired by God\".\\n\\n\\nThat's what the verse taken from 2 Timothy was all about. The continuity of a \\nbook written over a span of 1500 years by more than 40 authors from all walks \\nof life is a testimony to the single authorship of God.\\n\\n\\n\\nWhat source to you claim to have discovered which has information of superior\\nhistoricity to the Bible? Certainly not Josephus' writings, or the writings \\nof the Gnostics which were third century, at the earliest.\\n\\n\\nJesus was fully God as well. That's why I'd assert that he is wise.\\n\\n\\nPlease rethink this last paragraph. If there is no God, which seems to be your\\ncurrent belief, then Jesus was either a liar or a complete nut because not\\nonly did he assert that God exists, but he claimed to be God himself! (regards\\nto C.S. Lewis) How then could you have the least bit of respect for Jesus?\\n\\tIn conclusion, be careful about logically unfounded hypotheses based\\non gut feelings about the text and other scholars' unsubstantiated claims. \\nThe Bible pleads that we take it in its entirety or throw the whole book out.\\n\\tAbout your reading of the Bible, not only does the Spirit inspire the\\nwriters, but he guides the reader as well. We cannot understand it in the \\nleast without the Spirit's guidance:\\n\\n\\t\"For to us God revealed them through the Spirit; for the Spirit \\n\\tsearches all things, even the depths of God.\" (1 Cor 2:10, NAS)\\n \\nPeace and may God guide us in wisdom.\\n\\n+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=\\nCarter C. Page \| Of happiness the crown and chiefest part is wisdom,\\nA Carpenter's Apprentice \| and to hold God in awe. This is the law that,\\ncpage@seas.upenn.edu \| seeing the stricken heart of pride brought down,\\n \| we learn when we are old. -Adapted from Sophocles\\n+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=-+-=+-=-+-=+-=-+=-+-=-+-=-+=-+-= |
| 2 | 1 | 0.405350 | I differ with our moderator on this. I thought the whole idea of God coming\\ndown to earth to live as one of us \"subject to sin and death\" (as one of\\nthe consecration prayers in the Book of Common Prayer (1979) puts it) was\\nthat Jesus was tempted, but did not succumb. If sin is not part of the\\nbasic definition of humanity, then Jesus \"fully human\" (Nicea) would not\\nbe \"subject to sin\", but then the Resurrection loses some of its meaning,\\nbecause we encounter our humanity most powerfully when we sin. To distinguish\\nbetween \"human\" and \"fallen human\" makes Jesus less like one of us at the\\ntime we need him most.\\n\\n\\nFirst, the Monophysites inherited none of Nestorius's version -- they \\nwere on the opposite end of the spectrum from him. Second, the historical\\nrecord suggests that the positions attributed to Nestorius were not as\\nextreme as his (successful) opponents (who wrote the conventional history)\\nclaimed. Mainly Nestorius opposed the term Theotokos for Mary, arguing\\n(I think correctly) that a human could not be called Mother of God. I mean,\\nin the Athanasian Creed we talk about the Son \"uncreate\" -- surely even \\nArians would concede that Jesus existed long before Mary. Anyway, Nestorius's\\nopponents claimed that by saying Mary was not Theotokos, that he claimed\\nthat she only gave birth to the human nature of Jesus, which would require\\ntwo seperate and distinct natures. The argument fails though, because\\nMary simply gave birth to Jesus, who preexisted her either divinely,\\nif you accept \"Nestorianism\" as commonly defined, or both natures intertwined,\\na la Chalcedon.\\n\\nSecond, I am not sure that \"Nestorianism\" is not a better alternative than\\nthe orthodox view. After all, I find it hard to believe that pre-Incarnation\\nthat Jesus's human nature was in heaven; likewise post-Ascension. I think\\nrather that God came to earth and took our nature upon him. It was a seperate\\nnature, capable of being tempted as in Gethsemane (since I believe the divine\\nnature could never be tempted) but in its moments of weakness the divine nature\\nprevailed.\\n\\nComments on the above warmly appreciated.\\n\\nJason Albert\\n\\n[There may be differences in what we mean by \"subject to sin\". The\\noriginal complaint was from someone who didn't see how we could call\\nJesus fully human, because he didn't sin. I completely agree that\\nJesus was subject to temptation. I simply object to the idea that by\\nnot succumbing, he is thereby not fully human. I believe that you do\\nnot have to sin in order to be human.\\n\\nI again apologize for confusing Nestorianism and monophysitism. I\\nagree with you, and have said elsewhere, that there's reason to think\\nthat not everyone who is associated with heretical positions was in\\nfact heretical. There are scholars who maintain that Nestorius was\\nnot Nestorian. I have to confess that the first time I read some of\\nthe correspondence between Nestorius and his opponents, I thought he\\ngot the better of them.\\n\\nHowever, most scholars do believe that the work that eventually led to\\nChalcedon was an advance, and that Nestorius was at the very least\\n\"rash and dogmatic\" (as the editor of \"The Christological Controversy\"\\nrefers to him) in rejecting all approaches other than his own. As\\nregular Usenet readers know, narrowness can be just as much an\\nimpediment as being wrong. Furthermore, he did say some things that I\\nthink are problematical. He responds to a rather mild letter from\\nCyril with a flame worthy of Usenet. In it he says \"To attribute also\\nto [the Logos], in the name of [the incarnation] the characteristics\\nof the flesh that has been conjoined with him ... is, my brother,\\neither the work of a mind which truly errs in the fashion of the\\nGreeks or that of a mind diseased with the insane heresy of Arius and\\nApollinaris and the others. Those who are thus carried away with the\\nidea of this association are bound, because of it, to make the divine\\nLogos have a part in being fed with milk and participate to some\\ndegree in growh and stand in need of angelic assistance because of his\\nfearfulness ... These things are taken falsely when they are put off\\non the deity and they become the occasion of just condemnation for us\\nwho perpetrate the falsehood.\"\\n\\nIt's all well and good to maintain a proper distinction between\\nhumanity and divinity. But the whole concept of incarnation is based\\non exactly the idea that the divine Logos does in fact have \"to some\\ndegree\" a part in being born, growing up, and dying. Of course it\\nmust be understood that there's a certain indirectness in the Logos'\\nparticipation in these things. But there must be some sort of\\nidentification between the divine and human, or we don't have an\\nincarnation at all. Nestorius seemed to think in black and white\\nterms, and missed the sorts of nuances one needs to deal with this\\narea.\\n\\nYou say \"I find it hard to believe that pre-Incarnation that Jesus's\\nhuman nature was in heaven.\" I don't think that's required by\\northodox doctrine. It's the divine Logos that is eternal. |