{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ " # Naive Bayes NLTK Demo\n", " \n", " $$ P(y \\mid x) \\propto \\underbrace{P(y)}_{\\textit{prior}} \\prod_i P( f_i \\mid y) $$\n", " \n", " Here $x$ is the input, $y$ is a candidate label, and $f_i$ are the features extracted from $x$. The right-hand side is normalized over all labels to obtain a probability.\n", " " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Features used to classify a name as male or female:\n", "{ 'alwayson': True,\n", " 'count(a)': 1,\n", " 'count(b)': 0,\n", " 'count(c)': 0,\n", " 'count(d)': 0,\n", " 'count(e)': 0,\n", " 'count(f)': 0,\n", " 'count(g)': 0,\n", " 'count(h)': 0,\n", " 'count(i)': 0,\n", " 'count(j)': 0,\n", " 'count(k)': 0,\n", " 'count(l)': 0,\n", " 'count(m)': 0,\n", " 'count(n)': 1,\n", " 'count(o)': 2,\n", " 'count(p)': 1,\n", " 'count(q)': 0,\n", " 'count(r)': 0,\n", " 'count(s)': 0,\n", " 'count(t)': 0,\n", " 'count(u)': 0,\n", " 'count(v)': 0,\n", " 'count(w)': 0,\n", " 'count(x)': 0,\n", " 'count(y)': 0,\n", " 'count(z)': 0,\n", " 'endswith': 'p',\n", " 'has(a)': True,\n", " 'has(b)': False,\n", " 'has(c)': False,\n", " 'has(d)': False,\n", " 'has(e)': False,\n", " 'has(f)': False,\n", " 'has(g)': False,\n", " 'has(h)': False,\n", " 'has(i)': False,\n", " 'has(j)': False,\n", " 'has(k)': False,\n", " 'has(l)': False,\n", " 'has(m)': False,\n", " 'has(n)': True,\n", " 'has(o)': True,\n", " 'has(p)': True,\n", " 'has(q)': False,\n", " 'has(r)': False,\n", " 'has(s)': False,\n", " 'has(t)': False,\n", " 'has(u)': False,\n", " 'has(v)': False,\n", " 'has(w)': False,\n", " 'has(x)': False,\n", " 'has(y)': False,\n", " 'has(z)': False,\n", " 'startswith': 'a'}\n" ] } ], "source": [ "import pprint\n", "from nltk.classify.naivebayes import NaiveBayesClassifier\n", "from nltk.classify.util import names_demo, names_demo_features\n", "\n", "print(\"Features used to classify a name as male or female:\")\n", "pp = pprint.PrettyPrinter(indent=4)\n", "test_features = names_demo_features(\"anoop\")\n", "pp.pprint(test_features)" ] }, { "cell_type": "code", 
"execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train NaiveBayes classifier and run on some example input names:\n", "Training classifier...\n", "Testing classifier...\n", "Accuracy: 0.7820\n", "Avg. log likelihood: -0.7476\n", "\n", "Unseen Names P(Male) P(Female)\n", "----------------------------------------\n", " Kelli 0.0132 *0.9868\n", " Er *0.8826 0.1174\n", " Ally 0.0903 *0.9097\n", " Stephan *0.8361 0.1639\n", " Chriss 0.6864 *0.3136\n" ] } ], "source": [ "print(\"Train NaiveBayes classifier and run on some example input names:\")\n", "classifier = names_demo(NaiveBayesClassifier.train)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Run trained classifier on input name: nate\n", "P(male|nate)=0.08246413295145613\n", "P(female|nate)=0.9175358670485438\n" ] } ], "source": [ "name='nate'\n", "print(\"Run trained classifier on input name:\", name)\n", "test_features = names_demo_features(name)\n", "output = classifier.prob_classify(test_features)\n", "print(\"P(male|{0})={1}\".format(name,output.prob('male')))\n", "print(\"P(female|{0})={1}\".format(name,output.prob('female')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Most informative features\n", "\n", "The informativeness of a feature `feature_type = feature_value` or $f=v$ is computed by taking the ratio of choosing one label over the other, so if there are two labels: $\\ell_1$ or $\\ell_2$\n", "\n", "$$ score(f=v) = \\frac{ P( f=v \\mid \\ell_1 ) }{ P( f=v \\mid \\ell_2 ) } $$\n", "\n", "If there are more than 2 labels, say $\\ell_1, \\ldots \\ell_n$, then just compare one label versus all others:\n", "\n", "$$ score(f=v) = \\frac{ P( f=v \\mid \\ell_i ) }{ \\sum_{k \\neq i} P( f=v \\mid \\ell_k ) } $$\n", "\n", "We sort all the features by this score and report the top 10 below.\n" ] }, { "cell_type": "code", "execution_count": 5, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Most Informative Features\n", " endswith = 'a' female : male = 31.5 : 1.0\n", " endswith = 'p' male : female = 14.2 : 1.0\n", " endswith = 'v' male : female = 13.0 : 1.0\n", " endswith = 'f' male : female = 10.5 : 1.0\n", " endswith = 'm' male : female = 10.3 : 1.0\n", " endswith = 'd' male : female = 10.2 : 1.0\n", " endswith = 'o' male : female = 7.7 : 1.0\n", " count(v) = 2 female : male = 6.5 : 1.0\n", " endswith = 'r' male : female = 6.4 : 1.0\n", " endswith = 'w' male : female = 6.1 : 1.0\n" ] } ], "source": [ "classifier.show_most_informative_features()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{ 'alwayson': True,\n", " 'count(a)': 0,\n", " 'count(b)': 0,\n", " 'count(c)': 0,\n", " 'count(d)': 1,\n", " 'count(e)': 2,\n", " 'count(f)': 0,\n", " 'count(g)': 0,\n", " 'count(h)': 0,\n", " 'count(i)': 0,\n", " 'count(j)': 0,\n", " 'count(k)': 0,\n", " 'count(l)': 0,\n", " 'count(m)': 1,\n", " 'count(n)': 1,\n", " 'count(o)': 1,\n", " 'count(p)': 0,\n", " 'count(q)': 0,\n", " 'count(r)': 1,\n", " 'count(s)': 0,\n", " 'count(t)': 1,\n", " 'count(u)': 0,\n", " 'count(v)': 0,\n", " 'count(w)': 0,\n", " 'count(x)': 0,\n", " 'count(y)': 0,\n", " 'count(z)': 0,\n", " 'count2(aa)': 0,\n", " 'count2(ab)': 0,\n", " 'count2(ac)': 0,\n", " 'count2(ad)': 0,\n", " 'count2(ae)': 0,\n", " 'count2(af)': 0,\n", " 'count2(ag)': 0,\n", " 'count2(ah)': 0,\n", " 'count2(ai)': 0,\n", " 'count2(aj)': 0,\n", " 'count2(ak)': 0,\n", " 'count2(al)': 0,\n", " 'count2(am)': 0,\n", " 'count2(an)': 0,\n", " 'count2(ao)': 0,\n", " 'count2(ap)': 0,\n", " 'count2(aq)': 0,\n", " 'count2(ar)': 0,\n", " 'count2(as)': 0,\n", " 'count2(at)': 0,\n", " 'count2(au)': 0,\n", " 'count2(av)': 0,\n", " 'count2(aw)': 0,\n", " 'count2(ax)': 0,\n", " 'count2(ay)': 0,\n", " 'count2(az)': 0,\n", " 'count2(ba)': 0,\n", " 'count2(bb)': 
0,\n", " 'count2(bc)': 0,\n", " 'count2(bd)': 0,\n", " 'count2(be)': 0,\n", " 'count2(bf)': 0,\n", " 'count2(bg)': 0,\n", " 'count2(bh)': 0,\n", " 'count2(bi)': 0,\n", " 'count2(bj)': 0,\n", " 'count2(bk)': 0,\n", " 'count2(bl)': 0,\n", " 'count2(bm)': 0,\n", " 'count2(bn)': 0,\n", " 'count2(bo)': 0,\n", " 'count2(bp)': 0,\n", " 'count2(bq)': 0,\n", " 'count2(br)': 0,\n", " 'count2(bs)': 0,\n", " 'count2(bt)': 0,\n", " 'count2(bu)': 0,\n", " 'count2(bv)': 0,\n", " 'count2(bw)': 0,\n", " 'count2(bx)': 0,\n", " 'count2(by)': 0,\n", " 'count2(bz)': 0,\n", " 'count2(ca)': 0,\n", " 'count2(cb)': 0,\n", " 'count2(cc)': 0,\n", " 'count2(cd)': 0,\n", " 'count2(ce)': 0,\n", " 'count2(cf)': 0,\n", " 'count2(cg)': 0,\n", " 'count2(ch)': 0,\n", " 'count2(ci)': 0,\n", " 'count2(cj)': 0,\n", " 'count2(ck)': 0,\n", " 'count2(cl)': 0,\n", " 'count2(cm)': 0,\n", " 'count2(cn)': 0,\n", " 'count2(co)': 0,\n", " 'count2(cp)': 0,\n", " 'count2(cq)': 0,\n", " 'count2(cr)': 0,\n", " 'count2(cs)': 0,\n", " 'count2(ct)': 0,\n", " 'count2(cu)': 0,\n", " 'count2(cv)': 0,\n", " 'count2(cw)': 0,\n", " 'count2(cx)': 0,\n", " 'count2(cy)': 0,\n", " 'count2(cz)': 0,\n", " 'count2(da)': 0,\n", " 'count2(db)': 0,\n", " 'count2(dc)': 0,\n", " 'count2(dd)': 0,\n", " 'count2(de)': 1,\n", " 'count2(df)': 0,\n", " 'count2(dg)': 0,\n", " 'count2(dh)': 0,\n", " 'count2(di)': 0,\n", " 'count2(dj)': 0,\n", " 'count2(dk)': 0,\n", " 'count2(dl)': 0,\n", " 'count2(dm)': 0,\n", " 'count2(dn)': 0,\n", " 'count2(do)': 0,\n", " 'count2(dp)': 0,\n", " 'count2(dq)': 0,\n", " 'count2(dr)': 0,\n", " 'count2(ds)': 0,\n", " 'count2(dt)': 0,\n", " 'count2(du)': 0,\n", " 'count2(dv)': 0,\n", " 'count2(dw)': 0,\n", " 'count2(dx)': 0,\n", " 'count2(dy)': 0,\n", " 'count2(dz)': 0,\n", " 'count2(ea)': 0,\n", " 'count2(eb)': 0,\n", " 'count2(ec)': 0,\n", " 'count2(ed)': 0,\n", " 'count2(ee)': 0,\n", " 'count2(ef)': 0,\n", " 'count2(eg)': 0,\n", " 'count2(eh)': 0,\n", " 'count2(ei)': 0,\n", " 'count2(ej)': 0,\n", " 
'count2(ek)': 0,\n", " 'count2(el)': 0,\n", " 'count2(em)': 1,\n", " 'count2(en)': 1,\n", " 'count2(eo)': 0,\n", " 'count2(ep)': 0,\n", " 'count2(eq)': 0,\n", " 'count2(er)': 0,\n", " 'count2(es)': 0,\n", " 'count2(et)': 0,\n", " 'count2(eu)': 0,\n", " 'count2(ev)': 0,\n", " 'count2(ew)': 0,\n", " 'count2(ex)': 0,\n", " 'count2(ey)': 0,\n", " 'count2(ez)': 0,\n", " 'count2(fa)': 0,\n", " 'count2(fb)': 0,\n", " 'count2(fc)': 0,\n", " 'count2(fd)': 0,\n", " 'count2(fe)': 0,\n", " 'count2(ff)': 0,\n", " 'count2(fg)': 0,\n", " 'count2(fh)': 0,\n", " 'count2(fi)': 0,\n", " 'count2(fj)': 0,\n", " 'count2(fk)': 0,\n", " 'count2(fl)': 0,\n", " 'count2(fm)': 0,\n", " 'count2(fn)': 0,\n", " 'count2(fo)': 0,\n", " 'count2(fp)': 0,\n", " 'count2(fq)': 0,\n", " 'count2(fr)': 0,\n", " 'count2(fs)': 0,\n", " 'count2(ft)': 0,\n", " 'count2(fu)': 0,\n", " 'count2(fv)': 0,\n", " 'count2(fw)': 0,\n", " 'count2(fx)': 0,\n", " 'count2(fy)': 0,\n", " 'count2(fz)': 0,\n", " 'count2(ga)': 0,\n", " 'count2(gb)': 0,\n", " 'count2(gc)': 0,\n", " 'count2(gd)': 0,\n", " 'count2(ge)': 0,\n", " 'count2(gf)': 0,\n", " 'count2(gg)': 0,\n", " 'count2(gh)': 0,\n", " 'count2(gi)': 0,\n", " 'count2(gj)': 0,\n", " 'count2(gk)': 0,\n", " 'count2(gl)': 0,\n", " 'count2(gm)': 0,\n", " 'count2(gn)': 0,\n", " 'count2(go)': 0,\n", " 'count2(gp)': 0,\n", " 'count2(gq)': 0,\n", " 'count2(gr)': 0,\n", " 'count2(gs)': 0,\n", " 'count2(gt)': 0,\n", " 'count2(gu)': 0,\n", " 'count2(gv)': 0,\n", " 'count2(gw)': 0,\n", " 'count2(gx)': 0,\n", " 'count2(gy)': 0,\n", " 'count2(gz)': 0,\n", " 'count2(ha)': 0,\n", " 'count2(hb)': 0,\n", " 'count2(hc)': 0,\n", " 'count2(hd)': 0,\n", " 'count2(he)': 0,\n", " 'count2(hf)': 0,\n", " 'count2(hg)': 0,\n", " 'count2(hh)': 0,\n", " 'count2(hi)': 0,\n", " 'count2(hj)': 0,\n", " 'count2(hk)': 0,\n", " 'count2(hl)': 0,\n", " 'count2(hm)': 0,\n", " 'count2(hn)': 0,\n", " 'count2(ho)': 0,\n", " 'count2(hp)': 0,\n", " 'count2(hq)': 0,\n", " 'count2(hr)': 0,\n", " 'count2(hs)': 0,\n", 
" 'count2(ht)': 0,\n", " 'count2(hu)': 0,\n", " 'count2(hv)': 0,\n", " 'count2(hw)': 0,\n", " 'count2(hx)': 0,\n", " 'count2(hy)': 0,\n", " 'count2(hz)': 0,\n", " 'count2(ia)': 0,\n", " 'count2(ib)': 0,\n", " 'count2(ic)': 0,\n", " 'count2(id)': 0,\n", " 'count2(ie)': 0,\n", " 'count2(if)': 0,\n", " 'count2(ig)': 0,\n", " 'count2(ih)': 0,\n", " 'count2(ii)': 0,\n", " 'count2(ij)': 0,\n", " 'count2(ik)': 0,\n", " 'count2(il)': 0,\n", " 'count2(im)': 0,\n", " 'count2(in)': 0,\n", " 'count2(io)': 0,\n", " 'count2(ip)': 0,\n", " 'count2(iq)': 0,\n", " 'count2(ir)': 0,\n", " 'count2(is)': 0,\n", " 'count2(it)': 0,\n", " 'count2(iu)': 0,\n", " 'count2(iv)': 0,\n", " 'count2(iw)': 0,\n", " 'count2(ix)': 0,\n", " 'count2(iy)': 0,\n", " 'count2(iz)': 0,\n", " 'count2(ja)': 0,\n", " 'count2(jb)': 0,\n", " 'count2(jc)': 0,\n", " 'count2(jd)': 0,\n", " 'count2(je)': 0,\n", " 'count2(jf)': 0,\n", " 'count2(jg)': 0,\n", " 'count2(jh)': 0,\n", " 'count2(ji)': 0,\n", " 'count2(jj)': 0,\n", " 'count2(jk)': 0,\n", " 'count2(jl)': 0,\n", " 'count2(jm)': 0,\n", " 'count2(jn)': 0,\n", " 'count2(jo)': 0,\n", " 'count2(jp)': 0,\n", " 'count2(jq)': 0,\n", " 'count2(jr)': 0,\n", " 'count2(js)': 0,\n", " 'count2(jt)': 0,\n", " 'count2(ju)': 0,\n", " 'count2(jv)': 0,\n", " 'count2(jw)': 0,\n", " 'count2(jx)': 0,\n", " 'count2(jy)': 0,\n", " 'count2(jz)': 0,\n", " 'count2(ka)': 0,\n", " 'count2(kb)': 0,\n", " 'count2(kc)': 0,\n", " 'count2(kd)': 0,\n", " 'count2(ke)': 0,\n", " 'count2(kf)': 0,\n", " 'count2(kg)': 0,\n", " 'count2(kh)': 0,\n", " 'count2(ki)': 0,\n", " 'count2(kj)': 0,\n", " 'count2(kk)': 0,\n", " 'count2(kl)': 0,\n", " 'count2(km)': 0,\n", " 'count2(kn)': 0,\n", " 'count2(ko)': 0,\n", " 'count2(kp)': 0,\n", " 'count2(kq)': 0,\n", " 'count2(kr)': 0,\n", " 'count2(ks)': 0,\n", " 'count2(kt)': 0,\n", " 'count2(ku)': 0,\n", " 'count2(kv)': 0,\n", " 'count2(kw)': 0,\n", " 'count2(kx)': 0,\n", " 'count2(ky)': 0,\n", " 'count2(kz)': 0,\n", " 'count2(la)': 0,\n", " 'count2(lb)': 
0,\n", " 'count2(lc)': 0,\n", " 'count2(ld)': 0,\n", " 'count2(le)': 0,\n", " 'count2(lf)': 0,\n", " 'count2(lg)': 0,\n", " 'count2(lh)': 0,\n", " 'count2(li)': 0,\n", " 'count2(lj)': 0,\n", " 'count2(lk)': 0,\n", " 'count2(ll)': 0,\n", " 'count2(lm)': 0,\n", " 'count2(ln)': 0,\n", " 'count2(lo)': 0,\n", " 'count2(lp)': 0,\n", " 'count2(lq)': 0,\n", " 'count2(lr)': 0,\n", " 'count2(ls)': 0,\n", " 'count2(lt)': 0,\n", " 'count2(lu)': 0,\n", " 'count2(lv)': 0,\n", " 'count2(lw)': 0,\n", " 'count2(lx)': 0,\n", " 'count2(ly)': 0,\n", " 'count2(lz)': 0,\n", " 'count2(ma)': 0,\n", " 'count2(mb)': 0,\n", " 'count2(mc)': 0,\n", " 'count2(md)': 0,\n", " 'count2(me)': 1,\n", " 'count2(mf)': 0,\n", " 'count2(mg)': 0,\n", " 'count2(mh)': 0,\n", " 'count2(mi)': 0,\n", " 'count2(mj)': 0,\n", " 'count2(mk)': 0,\n", " 'count2(ml)': 0,\n", " 'count2(mm)': 0,\n", " 'count2(mn)': 0,\n", " 'count2(mo)': 0,\n", " 'count2(mp)': 0,\n", " 'count2(mq)': 0,\n", " 'count2(mr)': 0,\n", " 'count2(ms)': 0,\n", " 'count2(mt)': 0,\n", " 'count2(mu)': 0,\n", " 'count2(mv)': 0,\n", " 'count2(mw)': 0,\n", " 'count2(mx)': 0,\n", " 'count2(my)': 0,\n", " 'count2(mz)': 0,\n", " 'count2(na)': 0,\n", " 'count2(nb)': 0,\n", " 'count2(nc)': 0,\n", " 'count2(nd)': 0,\n", " 'count2(ne)': 0,\n", " 'count2(nf)': 0,\n", " 'count2(ng)': 0,\n", " 'count2(nh)': 0,\n", " 'count2(ni)': 0,\n", " 'count2(nj)': 0,\n", " 'count2(nk)': 0,\n", " 'count2(nl)': 0,\n", " 'count2(nm)': 0,\n", " 'count2(nn)': 0,\n", " 'count2(no)': 0,\n", " 'count2(np)': 0,\n", " 'count2(nq)': 0,\n", " 'count2(nr)': 0,\n", " 'count2(ns)': 0,\n", " 'count2(nt)': 1,\n", " 'count2(nu)': 0,\n", " 'count2(nv)': 0,\n", " 'count2(nw)': 0,\n", " 'count2(nx)': 0,\n", " 'count2(ny)': 0,\n", " 'count2(nz)': 0,\n", " 'count2(oa)': 0,\n", " 'count2(ob)': 0,\n", " 'count2(oc)': 0,\n", " 'count2(od)': 0,\n", " 'count2(oe)': 0,\n", " 'count2(of)': 0,\n", " 'count2(og)': 0,\n", " 'count2(oh)': 0,\n", " 'count2(oi)': 0,\n", " 'count2(oj)': 0,\n", " 
'count2(ok)': 0,\n", " 'count2(ol)': 0,\n", " 'count2(om)': 0,\n", " 'count2(on)': 0,\n", " 'count2(oo)': 0,\n", " 'count2(op)': 0,\n", " 'count2(oq)': 0,\n", " 'count2(or)': 1,\n", " 'count2(os)': 0,\n", " 'count2(ot)': 0,\n", " 'count2(ou)': 0,\n", " 'count2(ov)': 0,\n", " 'count2(ow)': 0,\n", " 'count2(ox)': 0,\n", " 'count2(oy)': 0,\n", " 'count2(oz)': 0,\n", " 'count2(pa)': 0,\n", " 'count2(pb)': 0,\n", " 'count2(pc)': 0,\n", " 'count2(pd)': 0,\n", " 'count2(pe)': 0,\n", " 'count2(pf)': 0,\n", " 'count2(pg)': 0,\n", " 'count2(ph)': 0,\n", " 'count2(pi)': 0,\n", " 'count2(pj)': 0,\n", " 'count2(pk)': 0,\n", " 'count2(pl)': 0,\n", " 'count2(pm)': 0,\n", " 'count2(pn)': 0,\n", " 'count2(po)': 0,\n", " 'count2(pp)': 0,\n", " 'count2(pq)': 0,\n", " 'count2(pr)': 0,\n", " 'count2(ps)': 0,\n", " 'count2(pt)': 0,\n", " 'count2(pu)': 0,\n", " 'count2(pv)': 0,\n", " 'count2(pw)': 0,\n", " 'count2(px)': 0,\n", " 'count2(py)': 0,\n", " 'count2(pz)': 0,\n", " 'count2(qa)': 0,\n", " 'count2(qb)': 0,\n", " 'count2(qc)': 0,\n", " 'count2(qd)': 0,\n", " 'count2(qe)': 0,\n", " 'count2(qf)': 0,\n", " 'count2(qg)': 0,\n", " 'count2(qh)': 0,\n", " 'count2(qi)': 0,\n", " 'count2(qj)': 0,\n", " 'count2(qk)': 0,\n", " 'count2(ql)': 0,\n", " 'count2(qm)': 0,\n", " 'count2(qn)': 0,\n", " 'count2(qo)': 0,\n", " 'count2(qp)': 0,\n", " 'count2(qq)': 0,\n", " 'count2(qr)': 0,\n", " 'count2(qs)': 0,\n", " 'count2(qt)': 0,\n", " 'count2(qu)': 0,\n", " 'count2(qv)': 0,\n", " 'count2(qw)': 0,\n", " 'count2(qx)': 0,\n", " 'count2(qy)': 0,\n", " 'count2(qz)': 0,\n", " 'count2(ra)': 0,\n", " 'count2(rb)': 0,\n", " 'count2(rc)': 0,\n", " 'count2(rd)': 0,\n", " 'count2(re)': 0,\n", " 'count2(rf)': 0,\n", " 'count2(rg)': 0,\n", " 'count2(rh)': 0,\n", " 'count2(ri)': 0,\n", " 'count2(rj)': 0,\n", " 'count2(rk)': 0,\n", " 'count2(rl)': 0,\n", " 'count2(rm)': 0,\n", " 'count2(rn)': 0,\n", " 'count2(ro)': 0,\n", " 'count2(rp)': 0,\n", " 'count2(rq)': 0,\n", " 'count2(rr)': 0,\n", " 'count2(rs)': 0,\n", 
" 'count2(rt)': 0,\n", " 'count2(ru)': 0,\n", " 'count2(rv)': 0,\n", " 'count2(rw)': 0,\n", " 'count2(rx)': 0,\n", " 'count2(ry)': 0,\n", " 'count2(rz)': 0,\n", " 'count2(sa)': 0,\n", " 'count2(sb)': 0,\n", " 'count2(sc)': 0,\n", " 'count2(sd)': 0,\n", " 'count2(se)': 0,\n", " 'count2(sf)': 0,\n", " 'count2(sg)': 0,\n", " 'count2(sh)': 0,\n", " 'count2(si)': 0,\n", " 'count2(sj)': 0,\n", " 'count2(sk)': 0,\n", " 'count2(sl)': 0,\n", " 'count2(sm)': 0,\n", " 'count2(sn)': 0,\n", " 'count2(so)': 0,\n", " 'count2(sp)': 0,\n", " 'count2(sq)': 0,\n", " 'count2(sr)': 0,\n", " 'count2(ss)': 0,\n", " 'count2(st)': 0,\n", " 'count2(su)': 0,\n", " 'count2(sv)': 0,\n", " 'count2(sw)': 0,\n", " 'count2(sx)': 0,\n", " 'count2(sy)': 0,\n", " 'count2(sz)': 0,\n", " 'count2(ta)': 0,\n", " 'count2(tb)': 0,\n", " 'count2(tc)': 0,\n", " 'count2(td)': 0,\n", " 'count2(te)': 0,\n", " 'count2(tf)': 0,\n", " 'count2(tg)': 0,\n", " 'count2(th)': 0,\n", " 'count2(ti)': 0,\n", " 'count2(tj)': 0,\n", " 'count2(tk)': 0,\n", " 'count2(tl)': 0,\n", " 'count2(tm)': 0,\n", " 'count2(tn)': 0,\n", " 'count2(to)': 1,\n", " 'count2(tp)': 0,\n", " 'count2(tq)': 0,\n", " 'count2(tr)': 0,\n", " 'count2(ts)': 0,\n", " 'count2(tt)': 0,\n", " 'count2(tu)': 0,\n", " 'count2(tv)': 0,\n", " 'count2(tw)': 0,\n", " 'count2(tx)': 0,\n", " 'count2(ty)': 0,\n", " 'count2(tz)': 0,\n", " 'count2(ua)': 0,\n", " 'count2(ub)': 0,\n", " 'count2(uc)': 0,\n", " 'count2(ud)': 0,\n", " 'count2(ue)': 0,\n", " 'count2(uf)': 0,\n", " 'count2(ug)': 0,\n", " 'count2(uh)': 0,\n", " 'count2(ui)': 0,\n", " 'count2(uj)': 0,\n", " 'count2(uk)': 0,\n", " 'count2(ul)': 0,\n", " 'count2(um)': 0,\n", " 'count2(un)': 0,\n", " 'count2(uo)': 0,\n", " 'count2(up)': 0,\n", " 'count2(uq)': 0,\n", " 'count2(ur)': 0,\n", " 'count2(us)': 0,\n", " 'count2(ut)': 0,\n", " 'count2(uu)': 0,\n", " 'count2(uv)': 0,\n", " 'count2(uw)': 0,\n", " 'count2(ux)': 0,\n", " 'count2(uy)': 0,\n", " 'count2(uz)': 0,\n", " 'count2(va)': 0,\n", " 'count2(vb)': 
0,\n", " 'count2(vc)': 0,\n", " 'count2(vd)': 0,\n", " 'count2(ve)': 0,\n", " 'count2(vf)': 0,\n", " 'count2(vg)': 0,\n", " 'count2(vh)': 0,\n", " 'count2(vi)': 0,\n", " 'count2(vj)': 0,\n", " 'count2(vk)': 0,\n", " 'count2(vl)': 0,\n", " 'count2(vm)': 0,\n", " 'count2(vn)': 0,\n", " 'count2(vo)': 0,\n", " 'count2(vp)': 0,\n", " 'count2(vq)': 0,\n", " 'count2(vr)': 0,\n", " 'count2(vs)': 0,\n", " 'count2(vt)': 0,\n", " 'count2(vu)': 0,\n", " 'count2(vv)': 0,\n", " 'count2(vw)': 0,\n", " 'count2(vx)': 0,\n", " 'count2(vy)': 0,\n", " 'count2(vz)': 0,\n", " 'count2(wa)': 0,\n", " 'count2(wb)': 0,\n", " 'count2(wc)': 0,\n", " 'count2(wd)': 0,\n", " 'count2(we)': 0,\n", " 'count2(wf)': 0,\n", " 'count2(wg)': 0,\n", " 'count2(wh)': 0,\n", " 'count2(wi)': 0,\n", " 'count2(wj)': 0,\n", " 'count2(wk)': 0,\n", " 'count2(wl)': 0,\n", " 'count2(wm)': 0,\n", " 'count2(wn)': 0,\n", " 'count2(wo)': 0,\n", " 'count2(wp)': 0,\n", " 'count2(wq)': 0,\n", " 'count2(wr)': 0,\n", " 'count2(ws)': 0,\n", " 'count2(wt)': 0,\n", " 'count2(wu)': 0,\n", " 'count2(wv)': 0,\n", " 'count2(ww)': 0,\n", " 'count2(wx)': 0,\n", " 'count2(wy)': 0,\n", " 'count2(wz)': 0,\n", " 'count2(xa)': 0,\n", " 'count2(xb)': 0,\n", " 'count2(xc)': 0,\n", " 'count2(xd)': 0,\n", " 'count2(xe)': 0,\n", " 'count2(xf)': 0,\n", " 'count2(xg)': 0,\n", " 'count2(xh)': 0,\n", " 'count2(xi)': 0,\n", " 'count2(xj)': 0,\n", " 'count2(xk)': 0,\n", " 'count2(xl)': 0,\n", " 'count2(xm)': 0,\n", " 'count2(xn)': 0,\n", " 'count2(xo)': 0,\n", " 'count2(xp)': 0,\n", " 'count2(xq)': 0,\n", " 'count2(xr)': 0,\n", " 'count2(xs)': 0,\n", " 'count2(xt)': 0,\n", " 'count2(xu)': 0,\n", " 'count2(xv)': 0,\n", " 'count2(xw)': 0,\n", " 'count2(xx)': 0,\n", " 'count2(xy)': 0,\n", " 'count2(xz)': 0,\n", " 'count2(ya)': 0,\n", " 'count2(yb)': 0,\n", " 'count2(yc)': 0,\n", " 'count2(yd)': 0,\n", " 'count2(ye)': 0,\n", " 'count2(yf)': 0,\n", " 'count2(yg)': 0,\n", " 'count2(yh)': 0,\n", " 'count2(yi)': 0,\n", " 'count2(yj)': 0,\n", " 
'count2(yk)': 0,\n", " 'count2(yl)': 0,\n", " 'count2(ym)': 0,\n", " 'count2(yn)': 0,\n", " 'count2(yo)': 0,\n", " 'count2(yp)': 0,\n", " 'count2(yq)': 0,\n", " 'count2(yr)': 0,\n", " 'count2(ys)': 0,\n", " 'count2(yt)': 0,\n", " 'count2(yu)': 0,\n", " 'count2(yv)': 0,\n", " 'count2(yw)': 0,\n", " 'count2(yx)': 0,\n", " 'count2(yy)': 0,\n", " 'count2(yz)': 0,\n", " 'count2(za)': 0,\n", " 'count2(zb)': 0,\n", " 'count2(zc)': 0,\n", " 'count2(zd)': 0,\n", " 'count2(ze)': 0,\n", " 'count2(zf)': 0,\n", " 'count2(zg)': 0,\n", " 'count2(zh)': 0,\n", " 'count2(zi)': 0,\n", " 'count2(zj)': 0,\n", " 'count2(zk)': 0,\n", " 'count2(zl)': 0,\n", " 'count2(zm)': 0,\n", " 'count2(zn)': 0,\n", " 'count2(zo)': 0,\n", " 'count2(zp)': 0,\n", " 'count2(zq)': 0,\n", " 'count2(zr)': 0,\n", " 'count2(zs)': 0,\n", " 'count2(zt)': 0,\n", " 'count2(zu)': 0,\n", " 'count2(zv)': 0,\n", " 'count2(zw)': 0,\n", " 'count2(zx)': 0,\n", " 'count2(zy)': 0,\n", " 'count2(zz)': 0,\n", " 'endswith': 'r',\n", " 'has(a)': False,\n", " 'has(b)': False,\n", " 'has(c)': False,\n", " 'has(d)': True,\n", " 'has(e)': True,\n", " 'has(f)': False,\n", " 'has(g)': False,\n", " 'has(h)': False,\n", " 'has(i)': False,\n", " 'has(j)': False,\n", " 'has(k)': False,\n", " 'has(l)': False,\n", " 'has(m)': True,\n", " 'has(n)': True,\n", " 'has(o)': True,\n", " 'has(p)': False,\n", " 'has(q)': False,\n", " 'has(r)': True,\n", " 'has(s)': False,\n", " 'has(t)': True,\n", " 'has(u)': False,\n", " 'has(v)': False,\n", " 'has(w)': False,\n", " 'has(x)': False,\n", " 'has(y)': False,\n", " 'has(z)': False,\n", " 'has2(aa)': False,\n", " 'has2(ab)': False,\n", " 'has2(ac)': False,\n", " 'has2(ad)': False,\n", " 'has2(ae)': False,\n", " 'has2(af)': False,\n", " 'has2(ag)': False,\n", " 'has2(ah)': False,\n", " 'has2(ai)': False,\n", " 'has2(aj)': False,\n", " 'has2(ak)': False,\n", " 'has2(al)': False,\n", " 'has2(am)': False,\n", " 'has2(an)': False,\n", " 'has2(ao)': False,\n", " 'has2(ap)': False,\n", " 'has2(aq)': 
False,\n", " 'has2(ar)': False,\n", " 'has2(as)': False,\n", " 'has2(at)': False,\n", " 'has2(au)': False,\n", " 'has2(av)': False,\n", " 'has2(aw)': False,\n", " 'has2(ax)': False,\n", " 'has2(ay)': False,\n", " 'has2(az)': False,\n", " 'has2(ba)': False,\n", " 'has2(bb)': False,\n", " 'has2(bc)': False,\n", " 'has2(bd)': False,\n", " 'has2(be)': False,\n", " 'has2(bf)': False,\n", " 'has2(bg)': False,\n", " 'has2(bh)': False,\n", " 'has2(bi)': False,\n", " 'has2(bj)': False,\n", " 'has2(bk)': False,\n", " 'has2(bl)': False,\n", " 'has2(bm)': False,\n", " 'has2(bn)': False,\n", " 'has2(bo)': False,\n", " 'has2(bp)': False,\n", " 'has2(bq)': False,\n", " 'has2(br)': False,\n", " 'has2(bs)': False,\n", " 'has2(bt)': False,\n", " 'has2(bu)': False,\n", " 'has2(bv)': False,\n", " 'has2(bw)': False,\n", " 'has2(bx)': False,\n", " 'has2(by)': False,\n", " 'has2(bz)': False,\n", " 'has2(ca)': False,\n", " 'has2(cb)': False,\n", " 'has2(cc)': False,\n", " 'has2(cd)': False,\n", " 'has2(ce)': False,\n", " 'has2(cf)': False,\n", " 'has2(cg)': False,\n", " 'has2(ch)': False,\n", " 'has2(ci)': False,\n", " 'has2(cj)': False,\n", " 'has2(ck)': False,\n", " 'has2(cl)': False,\n", " 'has2(cm)': False,\n", " 'has2(cn)': False,\n", " 'has2(co)': False,\n", " 'has2(cp)': False,\n", " 'has2(cq)': False,\n", " 'has2(cr)': False,\n", " 'has2(cs)': False,\n", " 'has2(ct)': False,\n", " 'has2(cu)': False,\n", " 'has2(cv)': False,\n", " 'has2(cw)': False,\n", " 'has2(cx)': False,\n", " 'has2(cy)': False,\n", " 'has2(cz)': False,\n", " 'has2(da)': False,\n", " 'has2(db)': False,\n", " 'has2(dc)': False,\n", " 'has2(dd)': False,\n", " 'has2(de)': True,\n", " 'has2(df)': False,\n", " 'has2(dg)': False,\n", " 'has2(dh)': False,\n", " 'has2(di)': False,\n", " 'has2(dj)': False,\n", " 'has2(dk)': False,\n", " 'has2(dl)': False,\n", " 'has2(dm)': False,\n", " 'has2(dn)': False,\n", " 'has2(do)': False,\n", " 'has2(dp)': False,\n", " 'has2(dq)': False,\n", " 'has2(dr)': False,\n", " 'has2(ds)': 
False,\n", " 'has2(dt)': False,\n", " 'has2(du)': False,\n", " 'has2(dv)': False,\n", " 'has2(dw)': False,\n", " 'has2(dx)': False,\n", " 'has2(dy)': False,\n", " 'has2(dz)': False,\n", " 'has2(ea)': False,\n", " 'has2(eb)': False,\n", " 'has2(ec)': False,\n", " 'has2(ed)': False,\n", " 'has2(ee)': False,\n", " 'has2(ef)': False,\n", " 'has2(eg)': False,\n", " 'has2(eh)': False,\n", " 'has2(ei)': False,\n", " 'has2(ej)': False,\n", " 'has2(ek)': False,\n", " 'has2(el)': False,\n", " 'has2(em)': True,\n", " 'has2(en)': True,\n", " 'has2(eo)': False,\n", " 'has2(ep)': False,\n", " 'has2(eq)': False,\n", " 'has2(er)': False,\n", " 'has2(es)': False,\n", " 'has2(et)': False,\n", " 'has2(eu)': False,\n", " 'has2(ev)': False,\n", " 'has2(ew)': False,\n", " 'has2(ex)': False,\n", " 'has2(ey)': False,\n", " 'has2(ez)': False,\n", " 'has2(fa)': False,\n", " 'has2(fb)': False,\n", " 'has2(fc)': False,\n", " 'has2(fd)': False,\n", " 'has2(fe)': False,\n", " 'has2(ff)': False,\n", " 'has2(fg)': False,\n", " 'has2(fh)': False,\n", " 'has2(fi)': False,\n", " 'has2(fj)': False,\n", " 'has2(fk)': False,\n", " 'has2(fl)': False,\n", " 'has2(fm)': False,\n", " 'has2(fn)': False,\n", " 'has2(fo)': False,\n", " 'has2(fp)': False,\n", " 'has2(fq)': False,\n", " 'has2(fr)': False,\n", " 'has2(fs)': False,\n", " 'has2(ft)': False,\n", " 'has2(fu)': False,\n", " 'has2(fv)': False,\n", " 'has2(fw)': False,\n", " 'has2(fx)': False,\n", " 'has2(fy)': False,\n", " 'has2(fz)': False,\n", " 'has2(ga)': False,\n", " 'has2(gb)': False,\n", " 'has2(gc)': False,\n", " 'has2(gd)': False,\n", " 'has2(ge)': False,\n", " 'has2(gf)': False,\n", " 'has2(gg)': False,\n", " 'has2(gh)': False,\n", " 'has2(gi)': False,\n", " 'has2(gj)': False,\n", " 'has2(gk)': False,\n", " 'has2(gl)': False,\n", " 'has2(gm)': False,\n", " 'has2(gn)': False,\n", " 'has2(go)': False,\n", " 'has2(gp)': False,\n", " 'has2(gq)': False,\n", " 'has2(gr)': False,\n", " 'has2(gs)': False,\n", " 'has2(gt)': False,\n", " 'has2(gu)': 
False,\n", " 'has2(gv)': False,\n", " 'has2(gw)': False,\n", " 'has2(gx)': False,\n", " 'has2(gy)': False,\n", " 'has2(gz)': False,\n", " 'has2(ha)': False,\n", " 'has2(hb)': False,\n", " 'has2(hc)': False,\n", " 'has2(hd)': False,\n", " 'has2(he)': False,\n", " 'has2(hf)': False,\n", " 'has2(hg)': False,\n", " 'has2(hh)': False,\n", " 'has2(hi)': False,\n", " 'has2(hj)': False,\n", " 'has2(hk)': False,\n", " 'has2(hl)': False,\n", " 'has2(hm)': False,\n", " 'has2(hn)': False,\n", " 'has2(ho)': False,\n", " 'has2(hp)': False,\n", " 'has2(hq)': False,\n", " 'has2(hr)': False,\n", " 'has2(hs)': False,\n", " 'has2(ht)': False,\n", " 'has2(hu)': False,\n", " 'has2(hv)': False,\n", " 'has2(hw)': False,\n", " 'has2(hx)': False,\n", " 'has2(hy)': False,\n", " 'has2(hz)': False,\n", " 'has2(ia)': False,\n", " 'has2(ib)': False,\n", " 'has2(ic)': False,\n", " 'has2(id)': False,\n", " 'has2(ie)': False,\n", " 'has2(if)': False,\n", " 'has2(ig)': False,\n", " 'has2(ih)': False,\n", " 'has2(ii)': False,\n", " 'has2(ij)': False,\n", " 'has2(ik)': False,\n", " 'has2(il)': False,\n", " 'has2(im)': False,\n", " 'has2(in)': False,\n", " 'has2(io)': False,\n", " 'has2(ip)': False,\n", " 'has2(iq)': False,\n", " 'has2(ir)': False,\n", " 'has2(is)': False,\n", " 'has2(it)': False,\n", " 'has2(iu)': False,\n", " 'has2(iv)': False,\n", " 'has2(iw)': False,\n", " 'has2(ix)': False,\n", " 'has2(iy)': False,\n", " 'has2(iz)': False,\n", " 'has2(ja)': False,\n", " 'has2(jb)': False,\n", " 'has2(jc)': False,\n", " 'has2(jd)': False,\n", " 'has2(je)': False,\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 'has2(jf)': False,\n", " 'has2(jg)': False,\n", " 'has2(jh)': False,\n", " 'has2(ji)': False,\n", " 'has2(jj)': False,\n", " 'has2(jk)': False,\n", " 'has2(jl)': False,\n", " 'has2(jm)': False,\n", " 'has2(jn)': False,\n", " 'has2(jo)': False,\n", " 'has2(jp)': False,\n", " 'has2(jq)': False,\n", " 'has2(jr)': False,\n", " 'has2(js)': False,\n", " 'has2(jt)': False,\n", " 
'has2(ju)': False,\n", " 'has2(jv)': False,\n", " 'has2(jw)': False,\n", " 'has2(jx)': False,\n", " 'has2(jy)': False,\n", " 'has2(jz)': False,\n", " 'has2(ka)': False,\n", " 'has2(kb)': False,\n", " 'has2(kc)': False,\n", " 'has2(kd)': False,\n", " 'has2(ke)': False,\n", " 'has2(kf)': False,\n", " 'has2(kg)': False,\n", " 'has2(kh)': False,\n", " 'has2(ki)': False,\n", " 'has2(kj)': False,\n", " 'has2(kk)': False,\n", " 'has2(kl)': False,\n", " 'has2(km)': False,\n", " 'has2(kn)': False,\n", " 'has2(ko)': False,\n", " 'has2(kp)': False,\n", " 'has2(kq)': False,\n", " 'has2(kr)': False,\n", " 'has2(ks)': False,\n", " 'has2(kt)': False,\n", " 'has2(ku)': False,\n", " 'has2(kv)': False,\n", " 'has2(kw)': False,\n", " 'has2(kx)': False,\n", " 'has2(ky)': False,\n", " 'has2(kz)': False,\n", " 'has2(la)': False,\n", " 'has2(lb)': False,\n", " 'has2(lc)': False,\n", " 'has2(ld)': False,\n", " 'has2(le)': False,\n", " 'has2(lf)': False,\n", " 'has2(lg)': False,\n", " 'has2(lh)': False,\n", " 'has2(li)': False,\n", " 'has2(lj)': False,\n", " 'has2(lk)': False,\n", " 'has2(ll)': False,\n", " 'has2(lm)': False,\n", " 'has2(ln)': False,\n", " 'has2(lo)': False,\n", " 'has2(lp)': False,\n", " 'has2(lq)': False,\n", " 'has2(lr)': False,\n", " 'has2(ls)': False,\n", " 'has2(lt)': False,\n", " 'has2(lu)': False,\n", " 'has2(lv)': False,\n", " 'has2(lw)': False,\n", " 'has2(lx)': False,\n", " 'has2(ly)': False,\n", " 'has2(lz)': False,\n", " 'has2(ma)': False,\n", " 'has2(mb)': False,\n", " 'has2(mc)': False,\n", " 'has2(md)': False,\n", " 'has2(me)': True,\n", " 'has2(mf)': False,\n", " 'has2(mg)': False,\n", " 'has2(mh)': False,\n", " 'has2(mi)': False,\n", " 'has2(mj)': False,\n", " 'has2(mk)': False,\n", " 'has2(ml)': False,\n", " 'has2(mm)': False,\n", " 'has2(mn)': False,\n", " 'has2(mo)': False,\n", " 'has2(mp)': False,\n", " 'has2(mq)': False,\n", " 'has2(mr)': False,\n", " 'has2(ms)': False,\n", " 'has2(mt)': False,\n", " 'has2(mu)': False,\n", " 'has2(mv)': False,\n", " 
'has2(mw)': False,\n", " 'has2(mx)': False,\n", " 'has2(my)': False,\n", " 'has2(mz)': False,\n", " 'has2(na)': False,\n", " 'has2(nb)': False,\n", " 'has2(nc)': False,\n", " 'has2(nd)': False,\n", " 'has2(ne)': False,\n", " 'has2(nf)': False,\n", " 'has2(ng)': False,\n", " 'has2(nh)': False,\n", " 'has2(ni)': False,\n", " 'has2(nj)': False,\n", " 'has2(nk)': False,\n", " 'has2(nl)': False,\n", " 'has2(nm)': False,\n", " 'has2(nn)': False,\n", " 'has2(no)': False,\n", " 'has2(np)': False,\n", " 'has2(nq)': False,\n", " 'has2(nr)': False,\n", " 'has2(ns)': False,\n", " 'has2(nt)': True,\n", " 'has2(nu)': False,\n", " 'has2(nv)': False,\n", " 'has2(nw)': False,\n", " 'has2(nx)': False,\n", " 'has2(ny)': False,\n", " 'has2(nz)': False,\n", " 'has2(oa)': False,\n", " 'has2(ob)': False,\n", " 'has2(oc)': False,\n", " 'has2(od)': False,\n", " 'has2(oe)': False,\n", " 'has2(of)': False,\n", " 'has2(og)': False,\n", " 'has2(oh)': False,\n", " 'has2(oi)': False,\n", " 'has2(oj)': False,\n", " 'has2(ok)': False,\n", " 'has2(ol)': False,\n", " 'has2(om)': False,\n", " 'has2(on)': False,\n", " 'has2(oo)': False,\n", " 'has2(op)': False,\n", " 'has2(oq)': False,\n", " 'has2(or)': True,\n", " 'has2(os)': False,\n", " 'has2(ot)': False,\n", " 'has2(ou)': False,\n", " 'has2(ov)': False,\n", " 'has2(ow)': False,\n", " 'has2(ox)': False,\n", " 'has2(oy)': False,\n", " 'has2(oz)': False,\n", " 'has2(pa)': False,\n", " 'has2(pb)': False,\n", " 'has2(pc)': False,\n", " 'has2(pd)': False,\n", " 'has2(pe)': False,\n", " 'has2(pf)': False,\n", " 'has2(pg)': False,\n", " 'has2(ph)': False,\n", " 'has2(pi)': False,\n", " 'has2(pj)': False,\n", " 'has2(pk)': False,\n", " 'has2(pl)': False,\n", " 'has2(pm)': False,\n", " 'has2(pn)': False,\n", " 'has2(po)': False,\n", " 'has2(pp)': False,\n", " 'has2(pq)': False,\n", " 'has2(pr)': False,\n", " 'has2(ps)': False,\n", " 'has2(pt)': False,\n", " 'has2(pu)': False,\n", " 'has2(pv)': False,\n", " 'has2(pw)': False,\n", " 'has2(px)': False,\n", " 
'has2(py)': False,\n", " 'has2(pz)': False,\n", " 'has2(qa)': False,\n", " 'has2(qb)': False,\n", " 'has2(qc)': False,\n", " 'has2(qd)': False,\n", " 'has2(qe)': False,\n", " 'has2(qf)': False,\n", " 'has2(qg)': False,\n", " 'has2(qh)': False,\n", " 'has2(qi)': False,\n", " 'has2(qj)': False,\n", " 'has2(qk)': False,\n", " 'has2(ql)': False,\n", " 'has2(qm)': False,\n", " 'has2(qn)': False,\n", " 'has2(qo)': False,\n", " 'has2(qp)': False,\n", " 'has2(qq)': False,\n", " 'has2(qr)': False,\n", " 'has2(qs)': False,\n", " 'has2(qt)': False,\n", " 'has2(qu)': False,\n", " 'has2(qv)': False,\n", " 'has2(qw)': False,\n", " 'has2(qx)': False,\n", " 'has2(qy)': False,\n", " 'has2(qz)': False,\n", " 'has2(ra)': False,\n", " 'has2(rb)': False,\n", " 'has2(rc)': False,\n", " 'has2(rd)': False,\n", " 'has2(re)': False,\n", " 'has2(rf)': False,\n", " 'has2(rg)': False,\n", " 'has2(rh)': False,\n", " 'has2(ri)': False,\n", " 'has2(rj)': False,\n", " 'has2(rk)': False,\n", " 'has2(rl)': False,\n", " 'has2(rm)': False,\n", " 'has2(rn)': False,\n", " 'has2(ro)': False,\n", " 'has2(rp)': False,\n", " 'has2(rq)': False,\n", " 'has2(rr)': False,\n", " 'has2(rs)': False,\n", " 'has2(rt)': False,\n", " 'has2(ru)': False,\n", " 'has2(rv)': False,\n", " 'has2(rw)': False,\n", " 'has2(rx)': False,\n", " 'has2(ry)': False,\n", " 'has2(rz)': False,\n", " 'has2(sa)': False,\n", " 'has2(sb)': False,\n", " 'has2(sc)': False,\n", " 'has2(sd)': False,\n", " 'has2(se)': False,\n", " 'has2(sf)': False,\n", " 'has2(sg)': False,\n", " 'has2(sh)': False,\n", " 'has2(si)': False,\n", " 'has2(sj)': False,\n", " 'has2(sk)': False,\n", " 'has2(sl)': False,\n", " 'has2(sm)': False,\n", " 'has2(sn)': False,\n", " 'has2(so)': False,\n", " 'has2(sp)': False,\n", " 'has2(sq)': False,\n", " 'has2(sr)': False,\n", " 'has2(ss)': False,\n", " 'has2(st)': False,\n", " 'has2(su)': False,\n", " 'has2(sv)': False,\n", " 'has2(sw)': False,\n", " 'has2(sx)': False,\n", " 'has2(sy)': False,\n", " 'has2(sz)': False,\n", " 
'has2(ta)': False,\n", " 'has2(tb)': False,\n", " 'has2(tc)': False,\n", " 'has2(td)': False,\n", " 'has2(te)': False,\n", " 'has2(tf)': False,\n", " 'has2(tg)': False,\n", " 'has2(th)': False,\n", " 'has2(ti)': False,\n", " 'has2(tj)': False,\n", " 'has2(tk)': False,\n", " 'has2(tl)': False,\n", " 'has2(tm)': False,\n", " 'has2(tn)': False,\n", " 'has2(to)': True,\n", " 'has2(tp)': False,\n", " 'has2(tq)': False,\n", " 'has2(tr)': False,\n", " 'has2(ts)': False,\n", " 'has2(tt)': False,\n", " 'has2(tu)': False,\n", " 'has2(tv)': False,\n", " 'has2(tw)': False,\n", " 'has2(tx)': False,\n", " 'has2(ty)': False,\n", " 'has2(tz)': False,\n", " 'has2(ua)': False,\n", " 'has2(ub)': False,\n", " 'has2(uc)': False,\n", " 'has2(ud)': False,\n", " 'has2(ue)': False,\n", " 'has2(uf)': False,\n", " 'has2(ug)': False,\n", " 'has2(uh)': False,\n", " 'has2(ui)': False,\n", " 'has2(uj)': False,\n", " 'has2(uk)': False,\n", " 'has2(ul)': False,\n", " 'has2(um)': False,\n", " 'has2(un)': False,\n", " 'has2(uo)': False,\n", " 'has2(up)': False,\n", " 'has2(uq)': False,\n", " 'has2(ur)': False,\n", " 'has2(us)': False,\n", " 'has2(ut)': False,\n", " 'has2(uu)': False,\n", " 'has2(uv)': False,\n", " 'has2(uw)': False,\n", " 'has2(ux)': False,\n", " 'has2(uy)': False,\n", " 'has2(uz)': False,\n", " 'has2(va)': False,\n", " 'has2(vb)': False,\n", " 'has2(vc)': False,\n", " 'has2(vd)': False,\n", " 'has2(ve)': False,\n", " 'has2(vf)': False,\n", " 'has2(vg)': False,\n", " 'has2(vh)': False,\n", " 'has2(vi)': False,\n", " 'has2(vj)': False,\n", " 'has2(vk)': False,\n", " 'has2(vl)': False,\n", " 'has2(vm)': False,\n", " 'has2(vn)': False,\n", " 'has2(vo)': False,\n", " 'has2(vp)': False,\n", " 'has2(vq)': False,\n", " 'has2(vr)': False,\n", " 'has2(vs)': False,\n", " 'has2(vt)': False,\n", " 'has2(vu)': False,\n", " 'has2(vv)': False,\n", " 'has2(vw)': False,\n", " 'has2(vx)': False,\n", " 'has2(vy)': False,\n", " 'has2(vz)': False,\n", " 'has2(wa)': False,\n", " 'has2(wb)': False,\n", " 
'has2(wc)': False,\n", " 'has2(wd)': False,\n", " 'has2(we)': False,\n", " 'has2(wf)': False,\n", " 'has2(wg)': False,\n", " 'has2(wh)': False,\n", " 'has2(wi)': False,\n", " 'has2(wj)': False,\n", " 'has2(wk)': False,\n", " 'has2(wl)': False,\n", " 'has2(wm)': False,\n", " 'has2(wn)': False,\n", " 'has2(wo)': False,\n", " 'has2(wp)': False,\n", " 'has2(wq)': False,\n", " 'has2(wr)': False,\n", " 'has2(ws)': False,\n", " 'has2(wt)': False,\n", " 'has2(wu)': False,\n", " 'has2(wv)': False,\n", " 'has2(ww)': False,\n", " 'has2(wx)': False,\n", " 'has2(wy)': False,\n", " 'has2(wz)': False,\n", " 'has2(xa)': False,\n", " 'has2(xb)': False,\n", " 'has2(xc)': False,\n", " 'has2(xd)': False,\n", " 'has2(xe)': False,\n", " 'has2(xf)': False,\n", " 'has2(xg)': False,\n", " 'has2(xh)': False,\n", " 'has2(xi)': False,\n", " 'has2(xj)': False,\n", " 'has2(xk)': False,\n", " 'has2(xl)': False,\n", " 'has2(xm)': False,\n", " 'has2(xn)': False,\n", " 'has2(xo)': False,\n", " 'has2(xp)': False,\n", " 'has2(xq)': False,\n", " 'has2(xr)': False,\n", " 'has2(xs)': False,\n", " 'has2(xt)': False,\n", " 'has2(xu)': False,\n", " 'has2(xv)': False,\n", " 'has2(xw)': False,\n", " 'has2(xx)': False,\n", " 'has2(xy)': False,\n", " 'has2(xz)': False,\n", " 'has2(ya)': False,\n", " 'has2(yb)': False,\n", " 'has2(yc)': False,\n", " 'has2(yd)': False,\n", " 'has2(ye)': False,\n", " 'has2(yf)': False,\n", " 'has2(yg)': False,\n", " 'has2(yh)': False,\n", " 'has2(yi)': False,\n", " 'has2(yj)': False,\n", " 'has2(yk)': False,\n", " 'has2(yl)': False,\n", " 'has2(ym)': False,\n", " 'has2(yn)': False,\n", " 'has2(yo)': False,\n", " 'has2(yp)': False,\n", " 'has2(yq)': False,\n", " 'has2(yr)': False,\n", " 'has2(ys)': False,\n", " 'has2(yt)': False,\n", " 'has2(yu)': False,\n", " 'has2(yv)': False,\n", " 'has2(yw)': False,\n", " 'has2(yx)': False,\n", " 'has2(yy)': False,\n", " 'has2(yz)': False,\n", " 'has2(za)': False,\n", " 'has2(zb)': False,\n", " 'has2(zc)': False,\n", " 'has2(zd)': False,\n", " 
'has2(ze)': False,\n", " 'has2(zf)': False,\n", " 'has2(zg)': False,\n", " 'has2(zh)': False,\n", " 'has2(zi)': False,\n", " 'has2(zj)': False,\n", " 'has2(zk)': False,\n", " 'has2(zl)': False,\n", " 'has2(zm)': False,\n", " 'has2(zn)': False,\n", " 'has2(zo)': False,\n", " 'has2(zp)': False,\n", " 'has2(zq)': False,\n", " 'has2(zr)': False,\n", " 'has2(zs)': False,\n", " 'has2(zt)': False,\n", " 'has2(zu)': False,\n", " 'has2(zv)': False,\n", " 'has2(zw)': False,\n", " 'has2(zx)': False,\n", " 'has2(zy)': False,\n", " 'has2(zz)': False,\n", " 'startswith': 'd'}\n" ] } ], "source": [ "def bigram_features(name):\n", "    features = {}\n", "    features['alwayson'] = True\n", "    features['startswith'] = name[0].lower()\n", "    features['endswith'] = name[-1].lower()\n", "    # Unigram features: count and presence of each letter\n", "    for letter in 'abcdefghijklmnopqrstuvwxyz':\n", "        features['count(%s)' % letter] = name.lower().count(letter)\n", "        features['has(%s)' % letter] = letter in name.lower()\n", "    # Bigram features: non-overlapping count and presence of each two-letter sequence\n", "    for letter1 in 'abcdefghijklmnopqrstuvwxyz':\n", "        for letter2 in 'abcdefghijklmnopqrstuvwxyz':\n", "            bigram = \"%s%s\" % (letter1, letter2)\n", "            features['count2(%s)' % bigram] = name.lower().count(bigram)\n", "            features['has2(%s)' % bigram] = bigram in name.lower()\n", "    return features\n", "\n", "pp.pprint(bigram_features(\"Dementor\"))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train new classifier using bigram features\n", "Training classifier...\n", "Testing classifier...\n", "Accuracy: 0.8020\n", "Avg. 
log likelihood: -1.0164\n", "\n", "Unseen Names P(Male) P(Female)\n", "----------------------------------------\n", " Kelli 0.0013 *0.9987\n", " Er *0.9782 0.0218\n", " Ally 0.0076 *0.9924\n", " Stephan *0.9741 0.0259\n", " Chriss 0.1445 *0.8555\n" ] } ], "source": [ "print(\"Train new classifier using bigram features\")\n", "classifier2 = names_demo(NaiveBayesClassifier.train, bigram_features)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Run trained classifier on input name: nate\n", "P(male|nate)=0.0012201724411289498\n", "P(female|nate)=0.9987798275588747\n" ] } ], "source": [ "name='nate'\n", "print(\"Run trained classifier on input name:\", name)\n", "test_features = bigram_features(name)\n", "output = classifier2.prob_classify(test_features)\n", "print(\"P(male|{0})={1}\".format(name,output.prob('male')))\n", "print(\"P(female|{0})={1}\".format(name,output.prob('female')))\n", "\n", "# try the following:\n", "# luke, lee, leigh, karol, chris, kris, pat" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Most Informative Features\n", " endswith = 'a' female : male = 31.5 : 1.0\n", " count2(hu) = 1 male : female = 26.7 : 1.0\n", " has2(hu) = True male : female = 26.7 : 1.0\n", " has2(rv) = True male : female = 23.3 : 1.0\n", " count2(rv) = 1 male : female = 23.3 : 1.0\n", " count2(lt) = 1 male : female = 19.9 : 1.0\n", " has2(lt) = True male : female = 19.9 : 1.0\n", " has2(rk) = True male : female = 15.3 : 1.0\n", " has2(fo) = True male : female = 15.3 : 1.0\n", " count2(rk) = 1 male : female = 15.3 : 1.0\n" ] } ], "source": [ "classifier2.show_most_informative_features()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }