{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tf-idf Vectorizer, Naive Bayes" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected Category: description\n", "description has 280 samples;\n", "installation has 70 samples;\n", "invocation has 70 samples;\n", "citation has 70 samples;\n", "Selected Category: installation\n", "description has 200 samples;\n", "installation has 800 samples;\n", "invocation has 200 samples;\n", "citation has 200 samples;\n", "Selected Category: invocation\n", "description has 279 samples;\n", "installation has 279 samples;\n", "invocation has 1118 samples;\n", "citation has 279 samples;\n", "Selected Category: citation\n", "description has 77 samples;\n", "installation has 77 samples;\n", "invocation has 77 samples;\n", "citation has 309 samples;\n", "{'description': excerpt description\n", "0 Puppeteer is a Node library which provides a h... True\n", "1 The major contributors of this repository incl... True\n", "2 Integral Regression is initially described in ... True\n", "3 We build a 3D pose estimation system based mai... True\n", "4 The Integral Regression is also known as soft-... True\n", "5 This is an official implementation for Integra... True\n", "6 The original implementation is based on our in... True\n", "7 LibGEOS is a LGPL-licensed package for manipul... True\n", "8 Among other things, it allows you to parse Wel... True\n", "9 This repository contains the experiments in th... True\n", "10 For the results presented in the paper, we did... True\n", "11 Batch normalization is currently not supported... True\n", "12 Open-source Ground Penetrating Radar processin... True\n", "13 Pytorch implementation for high-resolution (e.... True\n", "14 The PVGeo Python package contains VTK powered ... True\n", "15 A PyVista (and VTK) interface for the Open Min... True\n", "16 GeoNotebook is an application that provides cl... 
True\n", "17 Fiona is OGR's neat and nimble API for Python ... True\n", "18 Fiona is designed to be simple and dependable.... True\n", "19 Shapely is a BSD-licensed Python package for m... True\n", "20 Rain streaks can severely degrade the visibili... True\n", "21 The pytorch branch contains: True\n", "22 the pytorch implementation of Peak Response Ma... True\n", "23 the PASCAL-VOC demo (training, inference, and ... True\n", "24 Lithology and stratigraphic logs for wells and... True\n", "25 This Python module allows you to: True\n", "26 Interactively control an instance of ANSYS v14... True\n", "27 Extract data directly from binary ANSYS v14.5+... True\n", "28 Rapidly read in binary result (.rst), binary m... True\n", "29 Official implementation of GANimation. In this... True\n", ".. ... ...\n", "460 author = {Xinlei Chen and Li-Jia Li and Li Fei... False\n", "461 journal={arXiv preprint arXiv:1809.06079}, False\n", "462 booktitle = {Proceedings of the IEEE Conferenc... False\n", "463 booktitle = {IEEE Conference on Computer Visio... False\n", "464 @article{yu2018pygeopressure, False\n", "465 Tristan van Leeuwen, TristanvanLeeuwen False\n", "466 year={2018} False\n", "467 @inproceedings{chen18iterative, False\n", "468 Dieter Werthmüller, prisae False\n", "469 } False\n", "470 title = {Two-Stream Convolutional Networks for... False\n", "471 If you find our work useful in your research, ... False\n", "472 booktitle = {International Conference on Machi... False\n", "473 } False\n", "474 volume = {3}, False\n", "475 Citation False\n", "476 author = {Lim, Bee and Son, Sanghyun and Kim, ... False\n", "477 Citation False\n", "478 Title = {{R-FCN}: Object Detection via Region-... False\n", "479 M. Attene. A lightweight approach to repairing... False\n", "480 year={2018} False\n", "481 False\n", "482 @InProceedings{kato2018renderer False\n", "483 year = {2018} False\n", "484 Learning Spatio-Temporal Features with 3D Resi... 
False\n", "485 author={Chen, Yuhua and Li, Wen and Sakaridis,... False\n", "486 } False\n", "487 Calcagno, P., Chilès, J. P., Courrioux, G., & ... False\n", "488 Year = {2017} False\n", "489 HyVR can be attributed by citing the following... False\n", "\n", "[490 rows x 2 columns], 'installation': excerpt installation\n", "0 Neural Renderer (this repository) False\n", "1 This repository only contains the core compone... False\n", "2 Additionally, the aim is not to support the fu... False\n", "3 Lithology and stratigraphic logs for wells and... False\n", "4 Faster R-CNN False\n", "5 Basically, he wears a top hat, lives in your c... False\n", "6 A Jupyter / Leaflet bridge enabling interactiv... False\n", "7 mplstereonet provides lower-hemisphere equal-a... False\n", "8 By default, a modified Kamb method with expone... False\n", "9 Detectron is Facebook AI Research's software s... False\n", "10 This work is based on our research paper, whic... False\n", "11 This Python module allows you to: False\n", "12 spatial regression and statistical modeling on... False\n", "13 Rain streaks can severely degrade the visibili... False\n", "14 PySAL, the Python spatial analysis library, is... False\n", "15 Shapely is a BSD-licensed Python package for m... False\n", "16 At FAIR, Detectron has enabled numerous resear... False\n", "17 Import meshes from many common formats (use py... False\n", "18 This is a NodeJS port of pymasker. It provides... False\n", "19 For simplicity, each dot represents one U-Net.... False\n", "20 Export meshes as VTK, STL, OBJ, or PLY file types False\n", "21 Sandbox False\n", "22 This is the code for the paper False\n", "23 Modelling routines: False\n", "24 The Laplacian Pyramid Super-Resolution Network... False\n", "25 fdesign: Design digital linear filters for the... False\n", "26 The goal of Detectron is to provide a high-qua... False\n", "27 Airwave (semi-analytical in the case of step r... False\n", "28 Geographic information systems use GeoTIFF and... 
False\n", "29 tiles server for live feedback when coding False\n", "... ... ...\n", "1370 Algorithm and Citation Policy False\n", "1371 Title = {Multi{P}ose{N}et: Fast Multi... False\n", "1372 volume = {4}, False\n", "1373 @inproceedings{LapSRN, False\n", "1374 @inproceedings{tesfaldet2018, False\n", "1375 author = {Yiping Chen and Jingkang Wang and Jo... False\n", "1376 If you use this code or pre-trained models, pl... False\n", "1377 Lajaunie, C., Courrioux, G., & Manuel, L. (199... False\n", "1378 } False\n", "1379 and Michael J. Black False\n", "1380 Title = {{R-FCN}: Object Detection via Region-... False\n", "1381 For a more detailed elaboration of the theory ... False\n", "1382 To better understand how the algorithm works, ... False\n", "1383 author={Sun, Xiao and Xiao, Bin and Liang, Shu... False\n", "1384 booktitle = {The IEEE Conference on Computer V... False\n", "1385 } False\n", "1386 @article{zhang2018rdnir, False\n", "1387 title={Integral human pose regression}, False\n", "1388 booktitle={The IEEE Conference on Computer Vis... False\n", "1389 Citation False\n", "1390 } False\n", "1391 } False\n", "1392 year = {2018} False\n", "1393 Citing DaSiamRPN False\n", "1394 title = {Detectron}, False\n", "1395 booktitle = {The IEEE Conference on Computer V... False\n", "1396 year = {2017} False\n", "1397 Learning Spatio-Temporal Features with 3D Resi... False\n", "1398 Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, H... False\n", "1399 @inproceedings{wang2018vid2vid, False\n", "\n", "[1400 rows x 2 columns], 'invocation': excerpt invocation\n", "0 Just so you get an idea, it took NYPL staff co... False\n", "1 This repository contains the experiments in th... False\n", "2 The code is built on EDSR (Torch) and tested o... False\n", "3 Surface contact points: 3D coordinates of poin... False\n", "4 Additionally, the aim is not to support the fu... False\n", "5 Renderer backend for tilelive.js that uses nod... 
False\n", "6 Resulting tiles conform to the JSON equivalent... False\n", "7 construction of graphs from spatial data False\n", "8 Single-image 3D mesh reconstruction False\n", "9 model - model spatial relationships in data wi... False\n", "10 The original motivation for HyVR was the lack ... False\n", "11 SEG-Y Revisions False\n", "12 gprMax is principally written in Python 3 with... False\n", "13 Among other things, it allows you to parse Wel... False\n", "14 Note this is not a package for reading LiDAR d... False\n", "15 TetGen is a program to generate tetrahedral me... False\n", "16 project loading False\n", "17 PyVista is a helper module for the Visualizati... False\n", "18 Segyio can handle a lot of files that are SEG-... False\n", "19 In this repository, we release demo code and p... False\n", "20 tiles server for live feedback when coding False\n", "21 Complete full-space (electric and magnetic sou... False\n", "22 analytical: interface to the analytical, space... False\n", "23 Linear operators and inverse problems are at t... False\n", "24 Nikos Kolotouros provides PyTorch re-implement... False\n", "25 TetGen provides various features to generate g... False\n", "26 For now, only Carto based projects are support... False\n", "27 Tilematrix supports metatiling and tile buffer... False\n", "28 This Python module is an interface to Hang Si'... False\n", "29 A highly efficient JavaScript library for slic... False\n", "... ... ...\n", "1925 title={Scale-recurrent Network for Deep Image ... False\n", "1926 Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang False\n", "1927 booktitle={CVPR}, False\n", "1928 @inproceedings{li2018recurrent, False\n", "1929 Presented at CVPR 2018 False\n", "1930 {ethanlee, jlwu1992, zlin, hongliu}@pku.edu.cn... False\n", "1931 booktitle = {IEEE Conferene on Computer Vision... False\n", "1932 Citation False\n", "1933 title = {Two-Stream Convolutional Networks for... 
False\n", "1934 @inproceedings{tao2018srndeblur, False\n", "1935 References False\n", "1936 Key Laboratory of Machine Perception (MOE), Sc... False\n", "1937 title = {{PyVista}: 3D plotting and mesh analy... False\n", "1938 HyVR can be attributed by citing the following... False\n", "1939 } False\n", "1940 journal = {Journal of Open Source Software} False\n", "1941 Year = {2018} False\n", "1942 } False\n", "1943 Tristan van Leeuwen, TristanvanLeeuwen False\n", "1944 @inproceedings{zhang2018residual, False\n", "1945 journal={arXiv preprint arXiv:1711.08229}, False\n", "1946 title = {Detectron}, False\n", "1947 } False\n", "1948 If you use Detectron in your research or wish ... False\n", "1949 booktitle={BMVC}, False\n", "1950 booktitle={Proceedings of the European Confere... False\n", "1951 author = {Xinlei Chen and Abhinav Gupta}, False\n", "1952 @inproceedings{LapSRN, False\n", "1953 url = {https://doi.org/10.21105/joss.01450}, False\n", "1954 publisher = {The Open Journal}, False\n", "\n", "[1955 rows x 2 columns], 'citation': excerpt citation\n", "0 model - model spatial relationships in data wi... False\n", "1 Features False\n", "2 A scene graph is a structured representation o... False\n", "3 Renderer backend for tilelive.js that uses nod... False\n", "4 The input is assumed to represent a single clo... False\n", "5 GemPy was designed from the beginning to suppo... False\n", "6 Complete full-space (electric and magnetic sou... False\n", "7 The mapshaper command line program supports es... False\n", "8 Very lite but extendable mapping framework to ... False\n", "9 Learn Once, Write Anywhere: We don't make assu... False\n", "10 graph construction from polygonal lattices, li... False\n", "11 exploratory spatio-temporal data analysis False\n", "12 The file read parameters are based on GSSI's D... False\n", "13 If you give it all of OpenStreetMap and zoom o... False\n", "14 PySAL, the Python spatial analysis library, is... 
False\n", "15 Resulting tiles conform to the JSON equivalent... False\n", "16 The input scene graph is processed with a grap... False\n", "17 A Jupyter / Leaflet bridge enabling interactiv... False\n", "18 SEG-Y Revisions False\n", "19 This is the implementation of our CVPR 2018 wo... False\n", "20 mplstereonet also includes a number of utiliti... False\n", "21 All traces in a file are assumed to be of the ... False\n", "22 ResNet{50,101,152} False\n", "23 This repository contains the experiments in th... False\n", "24 This is the code for the paper False\n", "25 We build a 3D pose estimation system based mai... False\n", "26 Overview False\n", "27 Flow-Guided Feature Aggregation (FGFA) is init... False\n", "28 The major contributors of this repository incl... False\n", "29 mplleaflet is a Python library that converts a... False\n", ".. ... ...\n", "510 booktitle = {Computer Vision and Pattern Recog... True\n", "511 year={2018} True\n", "512 } True\n", "513 Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhon... True\n", "514 Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhon... True\n", "515 @InProceedings{Lim_2017_CVPR_Workshops, True\n", "516 author = {Lim, Bee and Son, Sanghyun and Kim, ... True\n", "517 title = {Enhanced Deep Residual Networks for S... True\n", "518 booktitle = {The IEEE Conference on Computer V... True\n", "519 month = {July}, True\n", "520 year = {2017} True\n", "521 } True\n", "522 @inproceedings{zhang2018residual, True\n", "523 title={Residual Dense Network for Image Super-... True\n", "524 author={Zhang, Yulun and Tian, Yapeng and Kong... True\n", "525 booktitle={CVPR}, True\n", "526 year={2018} True\n", "527 @article{zhang2018rdnir, True\n", "528 title={Residual Dense Network for Image Restor... True\n", "529 booktitle={arXiv}, True\n", "530 @inproceedings{tang2018quantized, True\n", "531 title={Quantized densely connected U-Nets for ... True\n", "532 author={Tang, Zhiqiang and Peng, Xi and Geng, ... 
True\n", "533 booktitle={ECCV}, True\n", "534 year={2018} True\n", "535 } True\n", "536 @inproceedings{tang2018cu, True\n", "537 title={CU-Net: Coupled U-Nets}, True\n", "538 author={Tang, Zhiqiang and Peng, Xi and Geng, ... True\n", "539 booktitle={BMVC}, True\n", "\n", "[540 rows x 2 columns]}\n" ] } ], "source": [ "from setup_corpus import build_corpora\n", "corpora = build_corpora()\n", "print(corpora)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a pipeline that chains a tf-idf vectorizer with a multinomial Naive Bayes classifier, and set up shuffled, stratified 5-fold cross-validation to evaluate it on each category." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score, cross_validate, StratifiedKFold\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.naive_bayes import MultinomialNB\n", "from sklearn.pipeline import make_pipeline\n", "from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_curve, auc\n", "pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())\n", "\n", "cv = StratifiedKFold(n_splits=5, shuffle=True)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Category: description\n", "Scores: [0.79591837 0.75510204 0.78571429 0.75510204 0.74489796]\n", "Accuracy: 0.7673 (+/- 0.0396)\n", "Category: installation\n", "Scores: [0.85714286 0.84285714 0.86785714 0.84285714 0.83928571]\n", "Accuracy: 0.8500 (+/- 0.0217)\n", "Category: invocation\n", "Scores: [0.88010204 0.84693878 0.86189258 0.87179487 0.87692308]\n", "Accuracy: 0.8675 (+/- 0.0240)\n", "Category: citation\n", "Scores: [0.88990826 0.91666667 0.93518519 0.85185185 0.92523364]\n", "Accuracy: 0.9038 (+/- 0.0600)\n" ] } ], "source": [ "for category in corpora:\n", " scores = cross_val_score(pipeline, corpora[category].excerpt, corpora[category][category], cv=cv)\n", " print(f\"Category: {category}\\nScores: {scores}\\nAccuracy: {scores.mean():.4f} (+/- {scores.std()*2:.4f})\")" ] }, 
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy import interp\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Description ROC\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "X = corpora['description'].excerpt\n", "y = corpora['description'].description\n", "tprs = []\n", "aucs = []\n", "mean_fpr = np.linspace(0, 1, 100)\n", " \n", "i = 0\n", "print('Description ROC')\n", "for train, test in cv.split(X, y):\n", " probas_ = pipeline.fit(X[train], y[train]).predict_proba(X[test])\n", " # Compute ROC curve and area under the curve\n", " fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])\n", " tprs.append(interp(mean_fpr, fpr, tpr))\n", " tprs[-1][0] = 0.0\n", " roc_auc = auc(fpr, tpr)\n", " aucs.append(roc_auc)\n", " plt.plot(fpr, tpr, lw=1, alpha=0.3, label='ROC fold %d (AUC = %0.2f)' % (i, roc_auc))\n", " i+=1\n", "plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',\n", " label='Chance', alpha=.8)\n", "\n", "mean_tpr = np.mean(tprs, axis=0)\n", "mean_tpr[-1] = 1.0\n", "mean_auc = auc(mean_fpr, mean_tpr)\n", "std_auc = np.std(aucs)\n", "plt.plot(mean_fpr, mean_tpr, color='b',\n", " label=r'Mean ROC (AUC = %0.2f $\\pm$ %0.2f)' % (mean_auc, std_auc),\n", " lw=2, alpha=.8)\n", "\n", "std_tpr = np.std(tprs, axis=0)\n", "tprs_upper = np.minimum(mean_tpr + std_tpr, 1)\n", "tprs_lower = np.maximum(mean_tpr - std_tpr, 0)\n", "plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,\n", " label=r'$\\pm$ 1 std. dev.')\n", "\n", "plt.xlim([-0.05, 1.05])\n", "plt.ylim([-0.05, 1.05])\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title('ROC Curve for Description Classification')\n", "plt.legend(loc=\"lower right\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Installation ROC\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "X = corpora['installation'].excerpt\n", "y = corpora['installation'].installation\n", "tprs = []\n", "aucs = []\n", "mean_fpr = np.linspace(0, 1, 100)\n", " \n", "i = 0\n", "print('Installation ROC')\n", "for train, test in cv.split(X, y):\n", " probas_ = pipeline.fit(X[train], y[train]).predict_proba(X[test])\n", " # Compute ROC curve and area under the curve\n", " fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])\n", " tprs.append(interp(mean_fpr, fpr, tpr))\n", " tprs[-1][0] = 0.0\n", " roc_auc = auc(fpr, tpr)\n", " aucs.append(roc_auc)\n", " plt.plot(fpr, tpr, lw=1, alpha=0.3, label='ROC fold %d (AUC = %0.2f)' % (i, roc_auc))\n", " i+=1\n", "plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',\n", " label='Chance', alpha=.8)\n", "\n", "mean_tpr = np.mean(tprs, axis=0)\n", "mean_tpr[-1] = 1.0\n", "mean_auc = auc(mean_fpr, mean_tpr)\n", "std_auc = np.std(aucs)\n", "plt.plot(mean_fpr, mean_tpr, color='b',\n", " label=r'Mean ROC (AUC = %0.2f $\\pm$ %0.2f)' % (mean_auc, std_auc),\n", " lw=2, alpha=.8)\n", "\n", "std_tpr = np.std(tprs, axis=0)\n", "tprs_upper = np.minimum(mean_tpr + std_tpr, 1)\n", "tprs_lower = np.maximum(mean_tpr - std_tpr, 0)\n", "plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,\n", " label=r'$\\pm$ 1 std. dev.')\n", "\n", "plt.xlim([-0.05, 1.05])\n", "plt.ylim([-0.05, 1.05])\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title('ROC Curve for Installation Classification')\n", "plt.legend(loc=\"lower right\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Invocation ROC\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "X = corpora['invocation'].excerpt\n", "y = corpora['invocation'].invocation\n", "tprs = []\n", "aucs = []\n", "mean_fpr = np.linspace(0, 1, 100)\n", " \n", "i = 0\n", "print('Invocation ROC')\n", "for train, test in cv.split(X, y):\n", " probas_ = pipeline.fit(X[train], y[train]).predict_proba(X[test])\n", " # Compute ROC curve and area under the curve\n", " fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])\n", " tprs.append(interp(mean_fpr, fpr, tpr))\n", " tprs[-1][0] = 0.0\n", " roc_auc = auc(fpr, tpr)\n", " aucs.append(roc_auc)\n", " plt.plot(fpr, tpr, lw=1, alpha=0.3, label='ROC fold %d (AUC = %0.2f)' % (i, roc_auc))\n", " i+=1\n", "plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',\n", " label='Chance', alpha=.8)\n", "\n", "mean_tpr = np.mean(tprs, axis=0)\n", "mean_tpr[-1] = 1.0\n", "mean_auc = auc(mean_fpr, mean_tpr)\n", "std_auc = np.std(aucs)\n", "plt.plot(mean_fpr, mean_tpr, color='b',\n", " label=r'Mean ROC (AUC = %0.2f $\\pm$ %0.2f)' % (mean_auc, std_auc),\n", " lw=2, alpha=.8)\n", "\n", "std_tpr = np.std(tprs, axis=0)\n", "tprs_upper = np.minimum(mean_tpr + std_tpr, 1)\n", "tprs_lower = np.maximum(mean_tpr - std_tpr, 0)\n", "plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,\n", " label=r'$\\pm$ 1 std. dev.')\n", "\n", "plt.xlim([-0.05, 1.05])\n", "plt.ylim([-0.05, 1.05])\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title('ROC Curve for Invocation Classification')\n", "plt.legend(loc=\"lower right\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Citation ROC\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "X = corpora['citation'].excerpt\n", "y = corpora['citation'].citation\n", "tprs = []\n", "aucs = []\n", "mean_fpr = np.linspace(0, 1, 100)\n", " \n", "i = 0\n", "print('Citation ROC')\n", "for train, test in cv.split(X, y):\n", " probas_ = pipeline.fit(X[train], y[train]).predict_proba(X[test])\n", " # Compute ROC curve and area under the curve\n", " fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])\n", " tprs.append(interp(mean_fpr, fpr, tpr))\n", " tprs[-1][0] = 0.0\n", " roc_auc = auc(fpr, tpr)\n", " aucs.append(roc_auc)\n", " plt.plot(fpr, tpr, lw=1, alpha=0.3, label='ROC fold %d (AUC = %0.2f)' % (i, roc_auc))\n", " i+=1\n", "plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',\n", " label='Chance', alpha=.8)\n", "\n", "mean_tpr = np.mean(tprs, axis=0)\n", "mean_tpr[-1] = 1.0\n", "mean_auc = auc(mean_fpr, mean_tpr)\n", "std_auc = np.std(aucs)\n", "plt.plot(mean_fpr, mean_tpr, color='b',\n", " label=r'Mean ROC (AUC = %0.2f $\\pm$ %0.2f)' % (mean_auc, std_auc),\n", " lw=2, alpha=.8)\n", "\n", "std_tpr = np.std(tprs, axis=0)\n", "tprs_upper = np.minimum(mean_tpr + std_tpr, 1)\n", "tprs_lower = np.maximum(mean_tpr - std_tpr, 0)\n", "plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,\n", " label=r'$\\pm$ 1 std. 
dev.')\n", "\n", "plt.xlim([-0.05, 1.05])\n", "plt.ylim([-0.05, 1.05])\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title('ROC Curve for Citation Classification')\n", "plt.legend(loc=\"lower right\")\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }