{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 08 - Model Deployment\n", "\n", "by [Alejandro Correa Bahnsen](http://www.albahnsen.com/) & [Iván Torroledo](http://www.ivantorroledo.com/)\n", "\n", "version 1.3, June 2018\n", "\n", "## Part of the class [Applied Deep Learning](https://github.com/albahnsen/AppliedDeepLearningClass)\n", "\n", "This notebook is licensed under a [Creative Commons Attribution-ShareAlike 3.0 Unported License](http://creativecommons.org/licenses/by-sa/3.0/deed.en_US)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Agenda:\n", "\n", "1. Creating and saving a model\n", "2. Running the model in batch\n", "3. Exposing the model as an API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Phishing Detection\n", "\n", "Phishing, by definition, is the act of defrauding an online user in order to obtain personal information by posing as a trustworthy institution or entity. Users usually have a hard time differentiating between legitimate and malicious sites because they are made to look exactly the same. Therefore, there is a need to create better tools to combat attackers." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import zipfile\n", "with zipfile.ZipFile('../datasets/model_deployment/phishing.csv.zip', 'r') as z:\n", " f = z.open('phishing.csv')\n", " data = pd.read_csv(f, index_col=False)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>url</th><th>phishing</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>http://www.subalipack.com/contact/images/sampl...</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>http://fasc.maximecapellot-gypsyjazz-ensemble....</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>http://theotheragency.com/confirmer/confirmer-...</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>http://aaalandscaping.com/components/com_smart...</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>http://paypal.com.confirm-key-21107316126168.s...</td><td>1</td></tr>\n",
"  </tbody>\n", "</table>
" ], "text/plain": [ " url phishing\n", "0 http://www.subalipack.com/contact/images/sampl... 1\n", "1 http://fasc.maximecapellot-gypsyjazz-ensemble.... 1\n", "2 http://theotheragency.com/confirmer/confirmer-... 1\n", "3 http://aaalandscaping.com/components/com_smart... 1\n", "4 http://paypal.com.confirm-key-21107316126168.s... 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>url</th><th>phishing</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>39995</th><td>http://www.diaperswappers.com/forum/member.php...</td><td>0</td></tr>\n",
"    <tr><th>39996</th><td>http://posting.bohemian.com/northbay/Tools/Ema...</td><td>0</td></tr>\n",
"    <tr><th>39997</th><td>http://www.tripadvisor.jp/Hotel_Review-g303832...</td><td>0</td></tr>\n",
"    <tr><th>39998</th><td>http://www.baylor.edu/content/services/downloa...</td><td>0</td></tr>\n",
"    <tr><th>39999</th><td>http://www.phinfever.com/forums/viewtopic.php?...</td><td>0</td></tr>\n",
"  </tbody>\n", "</table>
" ], "text/plain": [ " url phishing\n", "39995 http://www.diaperswappers.com/forum/member.php... 0\n", "39996 http://posting.bohemian.com/northbay/Tools/Ema... 0\n", "39997 http://www.tripadvisor.jp/Hotel_Review-g303832... 0\n", "39998 http://www.baylor.edu/content/services/downloa... 0\n", "39999 http://www.phinfever.com/forums/viewtopic.php?... 0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.tail()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 20000\n", "0 20000\n", "Name: phishing, dtype: int64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.phishing.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating features" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['http://dothan.com.co/gold/austspark/index.htm\\n',\n", " 'http://78.142.63.63/%7Enetsysco/process/fc1d9c7ea4773b7ff90925c2902cb5f2\\n',\n", " 'http://verify95.5gbfree.com/coverme2010/\\n',\n", " 'http://www.racom.com/uploads/productscat/bookmark/ii.php?.rand=13vqcr8bp0gud&cbcxt=mai&email=abuse@tradinghouse.ca\\n',\n", " 'http://www.cleanenergytci.com/components/update.logon.l3an7lofamerica/2342343234532534546347677898765432876543345687656543876/\\n',\n", " 'http://209.148.89.163/-/santander.co.uk/weblegn/AccountLogin.php\\n',\n", " 'http://senevi.com/confirmation/\\n',\n", " 'http://www.hellenkeller.cl/tmp/new/noticias/Modulo_de_Atualizacao_Bradesco/index2.php?id=PSO1AM04L3Q6PSBNVJ82QUCO0L5GBSY2KM2U9BYUEO14HCRDVZEMTRB3DGJO9HPT4ROC4M8HA8LRJD5FCJ27AD0NTSC3A3VDUJQX6XFG519OED4RW6Y8J8VC19EAAAO5UF21CHGHIP7W4AO1GM8ZU4BUBQ6L2UQVARVM\\n',\n", " 'http://internet-sicherheit.co/de/konflikt/src%3Dde/AZ00276ZZ75/we%3Dhs_0_2/sicherheit/konto_verifizieren/verifizierung.php\\n',\n", " 'http://alen.co/docs/cleaner\\n',\n", " 
'http://rattanhouse.co/Atualizacao_Bradesco/cadastro2013.php?2MAS2XACUJPI3U8D9ZDDG2G9YJICVABQ3K73KWDKYK0NA0AWWWCOUEDUJRXHRKPNMUYLDV89RA6OCG2MQUS0TAUXX9IOGJUEIXPDS5B0RM18OF1H860UAMJOY6ICUR81VSEKKJFPBYNLYGUXBGJ1HEHKOMLTM01P658M\\n',\n", " 'http://steamcommunily.co/p.php?login=true\\n',\n", " 'http://www.nyyg.com/Bradesco/5W9SQ394.html\\n',\n", " 'http://wp.tipografiacentral.com.co/sparkde/index.html\\n',\n", " 'http://www.entrerev.com/component/.secure.wpa/.www.paypal.com.returnUrl=/cgi-bin/5RF3S6y0K349/PayPal.co.uk/dispute_centre/sotmks/npsw&st.payment.decline.centre/ipoi/secure-codes.paypal.account4738154login.complete-infrmations.login.accountSecure26/securities/\\n',\n", " 'http://x.co/SecurCent\\n',\n", " 'http://dejatequerer.co/united.com/index.html\\n',\n", " 'http://www.speakeasymovies.com/components/com_wrapper/.amazon.co.uk/\\n',\n", " 'http://www.culturaespanola.com.br/bt/www.paypal.com/paypal.com.com/index-new.php\\n',\n", " 'http://www.agroassistance.com/components/com_content/c05354aa285b6a932a57086ba13762a1/\\n',\n", " 'http://www.estranetsrl.com.ar/bbvacambios.html\\n',\n", " 'http://osfsw.cba.pl/content/classic/html/ibpf/bradesco/?UOREEIYGQTERIRVSJTUHMVMZJWWYSVNYQOFSPWVFTEJEEKMJWHFERRYTFRWPSYYWGFIGJUPLZMZLTNSKOGMQQSHSXPLMXILVSM\\n',\n", " 'http://bitcrush.co/~geetha5/natwest/natwest/ibcarregister-natwst.html\\n',\n", " 'http://cannot-hide-from-PhishTank.zenith-services.com/controllare/auth/\\n',\n", " 'http://nova.pymesonline.co/fr.php\\n',\n", " 'http://comococino.com/wp-content/uploads/2013/01/paypal.com/us/cgi-bin/webscr.htm?\\n',\n", " 'http://www.fundacionchwinqlal.com.gt/imgs/Notas/img/_New/Agencias_Bradesco/Public_201133.php?KSR6YOU359CY1USIRMSBI8CFJF7TVREFJ6KIUFKZNXXNRP7JBYVU79APNGJI8YYR5I0YXUXLRU0JKF4WEYQL81BUGVDOTBFXUPVSKSEBNNU84X4IWT54UFYABCY5OE3J5XBOQQ1EDVMHTPZPJ4TEJSOU5NZS32B8ZNWQ\\n',\n", " 'http://flightripe.com/confirmation/update/billing/9a523c6017caa3406af9d5c2c0cb1854/\\n',\n", " 
'http://accademiazerootto.it/templates/zerootto-new/html/com_content/category/bompreco.php\\n',\n", " 'http://santanderseguranca.zapto.org/Clientesx/\\n',\n", " 'http://www.muttico.com/components/com_media/p3rs0na4l/53f8b14c76c890e1806b8f9d97f12f80/\\n',\n", " 'http://us.fxlhtvf.ml/login/en/login.html.asp?refhttp:%2F%2Futddirect.com%2Fcomponents%2Fcom_content%2Fviews%2Fcategories%2Fmenu.html\\n',\n", " 'http://conferencistainternacional.com.co/urruirrhyttjk/Index.htm\\n',\n", " 'http://www.creativesovereign.com/components/com_newsfeeds/views/.../perfil/\\n',\n", " 'http://villamarina.com.co/administrator/servers/BankofAmerica/security-update/SecMeasure/account-overview.cgi/presentation/jskeys/sas/signonScreen.do/\\n',\n", " 'http://www.vipturismolondres.com/com.br/?atendimento=Cliente&/LgSgkszm64/B8aNzHa8Aj.php\\n',\n", " 'http://www.enoxia.fr/components/com_content/tamfidelidade01.php\\n',\n", " 'http://gobbva.com/bb/empresa/index.php?tarjeta=\\n',\n", " 'http://paypal-com-confim.sharmikelectric.com/s4575234bf5055889415\\n',\n", " 'http://paypal.com.au.au.webapps.mpp.homes.konyadosemeciler.com/confirm/login.australia/au/webapps/mpp/home/initthi.php?cmd=SignIn&co_partnerId=2&pUserId=&siteid=0&pageType=&pa1=&i1=&bshowgif=&UsingSSL=&ru=&pp=&pa2=&errmsg=&runame=%5C%5C%5C%5C\\n',\n", " 'http://www.bbvabancocontinental.ya.st\\n',\n", " 'http://www.giannielectric.com/company/components/com_poll/assets/a/a5643cded2383f7568719482a943e1a5\\n',\n", " 'http://cooperativasanjose.com.co/plugins/josetta_ext/k2category/section/first.php\\n',\n", " 'http://appleid-apple-com-confirm-oyns-uattw6w61x3oka3pq.scientificcollectables.com/3c43e3d92e0b8a48f09f5fbb25d008a9/index1.php?cmd=https://connect.paypal.com/WebObjects/iTunesConnect.woa?login-processing=t&login_access=13409884065d3a174c294a9bf21bf71c23a3\\n',\n", " 'http://consultoriojuridico.co/pp/www.paypal.com/\\n',\n", " 'http://lovetodo.in.th/administrator/components/com_content/models/key/\\n',\n", " 
'http://lnk.co/io6u45y45?erydh?mario.Carelli@poste.it\\n',\n", " 'http://www2.bancobbvacontnental.com/Centroll/informe/03/14/datitarlz/WUJFQ0VSUkFATVVOSVpMQVcuQ09N\\n',\n", " 'http://lfcintl.com/components/com_user/zzxc/bpd.com.do/app/do/personas/289302294350311363178310441412402464323394411438376403437407/banco.popular.php?Personal\\n',\n", " 'http://procuraduria.videoteca.com.co/update/apple.com/.cgi-bin/WebObjects/MyAppleIdwoa/wa/sign_in.html?appId=4129.returnURL=DaHR0cDovL3N0b3JlLmFwcGxlLmNvbS91c3wxYW9zZmU4OGZjNWIyNThhYWVhOTM5MzVjZjI2NTk1OGE3MWUwY2Y0MmI2OA%26r%3DSDHCD9JUYKX777H9KT\\n']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.url[data.phishing==1].sample(50, random_state=1).tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Keyword features: flag whether the URL contains each of the following:\n", "* https\n", "* login\n", "* .php\n", "* .html\n", "* @\n", "* sign\n", "* ? (listed here, but not included in the `keywords` list below)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "keywords = ['https', 'login', '.php', '.html', '@', 'sign']" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# str.contains treats each keyword as a regular expression, so the '.' in '.php' matches any character\n", "for keyword in keywords:\n", "    data['keyword_' + keyword] = data.url.str.contains(keyword).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Additional features:\n", "* Length of the URL\n", "* Length of the domain\n", "* Whether the domain is an IP address\n", "* Number of occurrences of `com` (the code counts the substring `com`, not only `.com`)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data['lenght'] = data.url.str.len() - 2" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "domain = data.url.str.split('/', expand=True).iloc[:, 2]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data['lenght_domain'] = domain.str.len()" ] }, { "cell_type": "code", "execution_count": 11, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 www.subalipack.com\n", "1 fasc.maximecapellot-gypsyjazz-ensemble.nl\n", "2 theotheragency.com\n", "3 aaalandscaping.com\n", "4 paypal.com.confirm-key-21107316126168.securepp...\n", "5 lcthomasdeiriarte.edu.co\n", "6 livetoshare.org\n", "7 www.i-m.co\n", "8 manuelfernando.co\n", "9 www.bladesmithnews.com\n", "10 www.rasbaek.com\n", "11 199.231.190.160\n", "Name: 2, dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "domain.head(12)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data['isIP'] = (domain.str.replace('.', '') * 1).str.isnumeric().astype(int)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data['count_com'] = data.url.str.count('com')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>url</th><th>phishing</th><th>keyword_https</th><th>keyword_login</th><th>keyword_.php</th><th>keyword_.html</th><th>keyword_@</th><th>keyword_sign</th><th>lenght</th><th>lenght_domain</th><th>isIP</th><th>count_com</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>28607</th><td>http://pennstatehershey.org/web/ibd/home/event...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>80</td><td>20</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>3689</th><td>http://guiadesanborja.com/multiprinter/muestra...</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>81</td><td>18</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>6405</th><td>http://paranaibaweb.com/faleconosco/accounting...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>65</td><td>16</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>35355</th><td>http://courts.delaware.gov/Jury%20Services/Hel...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>94</td><td>19</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>16520</th><td>http://erpa.co/tmp/getproductrequest.htm\\n</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>39</td><td>7</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>16196</th><td>http://pulapulapipoca.com/components/com_media...</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>239</td><td>18</td><td>0</td><td>4</td></tr>\n",
"    <tr><th>3810</th><td>http://www.dag.or.kr/zboard/icon/visa/img/Atua...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>62</td><td>13</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>3005</th><td>http://www.amazingdressup.com/wp-content/theme...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>94</td><td>22</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>9003</th><td>http://web.indosuksesfutures.com/content_file/...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>80</td><td>25</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>34704</th><td>http://www.nutritionaltree.com/subcat.aspx?cid...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>69</td><td>23</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>12561</th><td>http://www.formation-continue-loiret.fr/compon...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>122</td><td>32</td><td>0</td><td>5</td></tr>\n",
"    <tr><th>10885</th><td>http://191.91.128.205/httpss/bancolombiaa.olb....</td><td>1</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>451</td><td>14</td><td>1</td><td>2</td></tr>\n",
"    <tr><th>2633</th><td>http://www.sternies-hp.de/components/com_conte...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>85</td><td>18</td><td>0</td><td>2</td></tr>\n",
"    <tr><th>22253</th><td>http://www.silive.com/northshore/index.ssf/200...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>85</td><td>14</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>4720</th><td>http://www.dineo.co.za/components/com_content/...</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>172</td><td>15</td><td>0</td><td>3</td></tr>\n",
"  </tbody>\n", "</table>
" ], "text/plain": [ " url phishing \\\n", "28607 http://pennstatehershey.org/web/ibd/home/event... 0 \n", "3689 http://guiadesanborja.com/multiprinter/muestra... 1 \n", "6405 http://paranaibaweb.com/faleconosco/accounting... 1 \n", "35355 http://courts.delaware.gov/Jury%20Services/Hel... 0 \n", "16520 http://erpa.co/tmp/getproductrequest.htm\\n 1 \n", "16196 http://pulapulapipoca.com/components/com_media... 1 \n", "3810 http://www.dag.or.kr/zboard/icon/visa/img/Atua... 1 \n", "3005 http://www.amazingdressup.com/wp-content/theme... 1 \n", "9003 http://web.indosuksesfutures.com/content_file/... 1 \n", "34704 http://www.nutritionaltree.com/subcat.aspx?cid... 0 \n", "12561 http://www.formation-continue-loiret.fr/compon... 1 \n", "10885 http://191.91.128.205/httpss/bancolombiaa.olb.... 1 \n", "2633 http://www.sternies-hp.de/components/com_conte... 1 \n", "22253 http://www.silive.com/northshore/index.ssf/200... 0 \n", "4720 http://www.dineo.co.za/components/com_content/... 1 \n", "\n", " keyword_https keyword_login keyword_.php keyword_.html keyword_@ \\\n", "28607 0 0 0 0 0 \n", "3689 0 1 1 0 0 \n", "6405 0 0 0 1 0 \n", "35355 0 0 0 0 0 \n", "16520 0 0 0 0 0 \n", "16196 0 1 1 0 0 \n", "3810 0 0 0 0 0 \n", "3005 0 0 0 1 0 \n", "9003 0 0 0 0 0 \n", "34704 0 0 0 0 0 \n", "12561 0 0 0 0 0 \n", "10885 1 0 1 1 0 \n", "2633 0 0 0 0 0 \n", "22253 0 0 0 1 0 \n", "4720 0 0 1 0 0 \n", "\n", " keyword_sign lenght lenght_domain isIP count_com \n", "28607 0 80 20 0 0 \n", "3689 0 81 18 0 1 \n", "6405 0 65 16 0 1 \n", "35355 0 94 19 0 0 \n", "16520 0 39 7 0 0 \n", "16196 0 239 18 0 4 \n", "3810 0 62 13 0 0 \n", "3005 0 94 22 0 1 \n", "9003 0 80 25 0 1 \n", "34704 0 69 23 0 1 \n", "12561 0 122 32 0 5 \n", "10885 0 451 14 1 2 \n", "2633 0 85 18 0 2 \n", "22253 0 85 14 0 1 \n", "4720 0 172 15 0 3 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sample(15, random_state=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 
Create Model" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = data.drop(['url', 'phishing'], axis=1)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y = data.phishing" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import cross_val_score" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "clf = RandomForestClassifier(n_jobs=-1, n_estimators=100)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.806 , 0.8125 , 0.807 , 0.794 , 0.80525, 0.81725, 0.80525,\n", " 0.80475, 0.805 , 0.791 ])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cross_val_score(clf, X, y, cv=10)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save model" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package\n", "import joblib" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": 
[ "['../datasets/model_deployment/08_phishing_clf.pkl']" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joblib.dump(clf, '../datasets/model_deployment/08_phishing_clf.pkl', compress=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Model in batch\n", "\n", "See `m08_model_deployment.py`, which provides the `predict_proba` function imported below." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from m08_model_deployment import predict_proba" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6136666666666666" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predict_proba('http://www.vipturismolondres.com/com.br/?atendimento=Cliente&/LgSgkszm64/B8aNzHa8Aj.php')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: API\n", "\n", "Flask is a lightweight Python web framework. It is often described as more Pythonic than Django because application code tends to be more explicit, and a simple app needs very little boilerplate, which makes it easy to pick up as a beginner." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, install the required library:\n", "\n", "```\n", "pip install flask-restplus\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load Flask" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from flask import Flask\n", "from flask_restplus import Api, Resource, fields\n", "import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the API" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "app = Flask(__name__)\n", "\n", "api = Api(\n", "    app, \n", "    version='1.0', \n", "    title='Phishing Prediction API',\n", "    description='Phishing Prediction API')\n", "\n", "ns = api.namespace('predict', \n", "    description='Phishing Classifier')\n", "\n", "parser = api.parser()\n", "\n", "parser.add_argument(\n", "    'URL', \n", "    type=str, \n", "    required=True, \n", "    help='URL to be analyzed', \n", "    location='args')\n", "\n", "resource_fields = api.model('Resource', {\n", "    'result': fields.String,\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the model and create a function that scores a URL" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "clf = joblib.load('../datasets/model_deployment/08_phishing_clf.pkl') \n", "\n", "@ns.route('/')\n", "class PhishingApi(Resource):\n", "\n", "    @api.doc(parser=parser)\n", "    @api.marshal_with(resource_fields)\n", "    def get(self):\n", "        args = parser.parse_args()\n", "        result = self.predict_proba(args)\n", "\n", "        return result, 200\n", "\n", "    def predict_proba(self, args):\n", "        url = args['URL']\n", "\n", "        url_ = pd.DataFrame([url], columns=['url'])\n", "\n", "        # Create features (must match the features used for training)\n", "        keywords = ['https', 'login', '.php', '.html', '@', 'sign']\n", "        for keyword in keywords:\n", "            url_['keyword_' + keyword] = url_.url.str.contains(keyword).astype(int)\n", "\n", "        url_['lenght'] = url_.url.str.len() - 2\n", "        domain = url_.url.str.split('/', expand=True).iloc[:, 2]\n", "        url_['lenght_domain'] = domain.str.len()\n", "        # Test the domain, not the full URL, as in the training code\n", "        url_['isIP'] = (domain.str.replace('.', '') * 1).str.isnumeric().astype(int)\n", "        url_['count_com'] = url_.url.str.count('com')\n", "\n", "        # Make prediction\n", "        p1 = clf.predict_proba(url_.drop('url', axis=1))[0,1]\n", "\n", "        print('url=', url,'| p1=', p1)\n", "\n", "        return {\n", "            \"result\": p1\n", "        }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run API" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET / HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swaggerui/droid-sans.css HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swaggerui/swagger-ui.css HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swaggerui/swagger-ui-bundle.js HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swaggerui/swagger-ui-standalone-preset.js HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swagger.json HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swaggerui/favicon-32x32.png HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:06] \"GET /swaggerui/favicon-16x16.png HTTP/1.1\" 200 -\n", "127.0.0.1 - - [01/Jun/2018 12:51:27] \"GET /predict/?URL=http%3A%2F%2Fconsultoriojuridico.co%2Fpp%2Fwww.paypal.com%2F HTTP/1.1\" 200 -\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "url= http://consultoriojuridico.co/pp/www.paypal.com/ | p1= 0.2845662182244998\n" ] } ], "source": [ "app.run(debug=True, use_reloader=False, host='0.0.0.0', port=5000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check 
using \n", "\n", "* http://localhost:5000/predict/?URL=http://consultoriojuridico.co/pp/www.paypal.com/\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }