{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intermine-Python: Tutorial 11: Combining Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial shows how to combine two lists in Intermine. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need to log in to your Intermine account so that the combined lists can be saved. " ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from intermine.webservice import Service" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Enter your username and password in the line below, uncomment it, execute it, and then proceed with the rest of the tutorial. " ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#service = Service(\"www.flymine.org/flymine/service\",username=\"Enter your username\",password=\"Enter your password\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We begin by creating a list manager object, which we will use to combine lists. " ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "lm=service.list_manager()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose you want to combine the most enriched genes in the adult fly brain with those in the adult fly hindgut. These currently exist as two separate lists on FlyMine. We begin by fetching both lists. 
" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "l1=lm.get_list(name=\"PL FlyAtlas_brain_top\")" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "l2=lm.get_list(name=\"PL FlyAtlas_hindgut_top\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two ways to combine the two lists, i.e. to take their union. The first, shown below, uses the addition operator, which combines both lists into a new list. " ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [], "source": [ "l3=l1+l2" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [], "source": [ "l3.set_name(\"combination-1\")" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gene(score = None, primaryIdentifier = 'FBgn0039584', briefDescription = None, cytoLocation = '98D1-98D2', description = None, name = 'beat-VI', symbol = 'beat-VI', length = 56607, secondaryIdentifier = 'CG14064', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0004619', briefDescription = None, cytoLocation = '65C1-65C1', description = None, name = 'Glutamate receptor IA', symbol = 'GluRIA', length = 11028, secondaryIdentifier = 'CG8442', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0263997', briefDescription = None, cytoLocation = '9B7-9B7', description = None, name = None, symbol = 'CG43740', length = 14143, secondaryIdentifier = 'CG43740', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0051774', briefDescription = None, cytoLocation = '24C9-24D1', description = None, name = 'friend of echinoid', symbol = 'fred', length = 88786, secondaryIdentifier = 'CG31774', scoreType = None)\n", "Gene(score = None, 
primaryIdentifier = 'FBgn0031307', briefDescription = None, cytoLocation = '21F1-21F1', description = None, name = 'Major Facilitator Superfamily Transporter 3', symbol = 'MFS3', length = 4195, secondaryIdentifier = 'CG4726', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0029761', briefDescription = None, cytoLocation = '4F5-4F9', description = None, name = 'small conductance calcium-activated potassium channel', symbol = 'SK', length = 64903, secondaryIdentifier = 'CG10706', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035575', briefDescription = None, cytoLocation = '64B17-64B17', description = None, name = None, symbol = 'CG7509', length = 2198, secondaryIdentifier = 'CG7509', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0037475', briefDescription = None, cytoLocation = '84C5-84C6', description = None, name = '48 related 1', symbol = 'Fer1', length = 5767, secondaryIdentifier = 'CG33323', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0259164', briefDescription = None, cytoLocation = '65A5-65A5', description = None, name = None, symbol = 'CG42269', length = 5250, secondaryIdentifier = 'CG42269', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033497', briefDescription = None, cytoLocation = '46F7-46F7', description = None, name = None, symbol = 'CG12912', length = 29154, secondaryIdentifier = 'CG12912', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0032601', briefDescription = None, cytoLocation = '36A14-36A14', description = None, name = 'yellow-b', symbol = 'yellow-b', length = 3948, secondaryIdentifier = 'CG17914', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0051547', briefDescription = None, cytoLocation = '83A5-83A6', description = None, name = None, symbol = 'CG31547', length = 13198, secondaryIdentifier = 'CG31547', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0259236', briefDescription 
= None, cytoLocation = '71E5-71E5', description = None, name = 'comm3', symbol = 'comm3', length = 34579, secondaryIdentifier = 'CG42334', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0031562', briefDescription = None, cytoLocation = '24B3-24B3', description = None, name = None, symbol = 'CG3604', length = 631, secondaryIdentifier = 'CG3604', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0039081', briefDescription = None, cytoLocation = '95A1-95A1', description = None, name = 'Inwardly rectifying potassium channel 2', symbol = 'Irk2', length = 5829, secondaryIdentifier = 'CG4370', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0039196', briefDescription = None, cytoLocation = '95F15-95F15', description = None, name = None, symbol = 'CG17781', length = 2206, secondaryIdentifier = 'CG17781', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0011603', briefDescription = None, cytoLocation = '24F4-24F4', description = None, name = 'inebriated', symbol = 'ine', length = 9023, secondaryIdentifier = 'CG15444', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0265597', briefDescription = None, cytoLocation = '11D8-11D8', description = None, name = 'radish', symbol = 'rad', length = 90456, secondaryIdentifier = 'CG44424', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0259110', briefDescription = None, cytoLocation = '14A1-14A3', description = None, name = 'mind-meld', symbol = 'mmd', length = 26224, secondaryIdentifier = 'CG42252', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0031865', briefDescription = None, cytoLocation = '27C1-27C1', description = None, name = 'Na[+]/H[+] hydrogen antiporter 1', symbol = 'Nha1', length = 9485, secondaryIdentifier = 'CG10806', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035245', briefDescription = None, cytoLocation = '62A9-62A9', description = None, name = 'gamma-glutamyl 
carboxylase', symbol = 'GC', length = 6123, secondaryIdentifier = 'CG13927', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0263390', briefDescription = None, cytoLocation = '94D6-94D7', description = None, name = 'Na[+]/H[+] hydrogen antiporter 2', symbol = 'Nha2', length = 17070, secondaryIdentifier = 'CG43442', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0003380', briefDescription = None, cytoLocation = '16F3-16F5', description = None, name = 'Shaker', symbol = 'Sh', length = 138941, secondaryIdentifier = 'CG12348', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0029814', briefDescription = None, cytoLocation = '5C2-5C3', description = None, name = None, symbol = 'CG15765', length = 22265, secondaryIdentifier = 'CG15765', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033524', briefDescription = None, cytoLocation = '47A7-47A9', description = None, name = 'Cyp49a1', symbol = 'Cyp49a1', length = 9543, secondaryIdentifier = 'CG18377', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0051371', briefDescription = None, cytoLocation = '99F6-99F6', description = None, name = None, symbol = 'CG31371', length = 2383, secondaryIdentifier = 'CG31371', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0086778', briefDescription = None, cytoLocation = '18C2-18C3', description = None, name = 'nicotinic Acetylcholine Receptor alpha7', symbol = 'nAChRalpha7', length = 21373, secondaryIdentifier = 'CG32538', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0011723', briefDescription = None, cytoLocation = '68E3-68E3', description = None, name = 'brachyenteron', symbol = 'byn', length = 8227, secondaryIdentifier = 'CG7260', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0039332', briefDescription = None, cytoLocation = '96D2-96D3', description = None, name = 'astrocytic leucine-rich repeat molecule', symbol = 'alrm', length 
= 1717, secondaryIdentifier = 'CG11910', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0266100', briefDescription = None, cytoLocation = '68E2-68E2', description = None, name = None, symbol = 'CG44837', length = 88228, secondaryIdentifier = 'CG44837', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0008636', briefDescription = None, cytoLocation = '57B5-57B5', description = None, name = 'homeobrain', symbol = 'hbn', length = 6242, secondaryIdentifier = 'CG33152', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0015524', briefDescription = None, cytoLocation = '57B4-57B4', description = None, name = 'orthopedia', symbol = 'otp', length = 19790, secondaryIdentifier = 'CG10036', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0029746', briefDescription = None, cytoLocation = '4F2-4F2', description = None, name = None, symbol = 'CG15465', length = 169106, secondaryIdentifier = 'CG15465', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0030259', briefDescription = None, cytoLocation = '10A3-10A3', description = None, name = None, symbol = 'CG1545', length = 2075, secondaryIdentifier = 'CG1545', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0036939', briefDescription = None, cytoLocation = '76F2-76F2', description = None, name = None, symbol = 'CG7365', length = 3084, secondaryIdentifier = 'CG7365', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0000442', briefDescription = None, cytoLocation = '21E2-21E2', description = None, name = 'Protein kinase, cGMP-dependent at 21D', symbol = 'Pkg21D', length = 4646, secondaryIdentifier = 'CG3324', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0004842', briefDescription = None, cytoLocation = '97D14-97E1', description = None, name = 'RYamide receptor', symbol = 'RYa-R', length = 60956, secondaryIdentifier = 'CG5811', scoreType = None)\n", "Gene(score = None, 
primaryIdentifier = 'FBgn0013767', briefDescription = None, cytoLocation = '88B3-88B3', description = None, name = 'Corazonin', symbol = 'Crz', length = 863, secondaryIdentifier = 'CG3302', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0036343', briefDescription = None, cytoLocation = '69F6-69F6', description = None, name = None, symbol = 'CG14115', length = 873, secondaryIdentifier = 'CG14115', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0034553', briefDescription = None, cytoLocation = '57B3-57B3', description = None, name = None, symbol = 'CG9993', length = 1980, secondaryIdentifier = 'CG9993', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0034552', briefDescription = None, cytoLocation = '57B3-57B3', description = None, name = None, symbol = 'CG17999', length = 1960, secondaryIdentifier = 'CG17999', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0053517', briefDescription = None, cytoLocation = '19A4-19A4', description = None, name = 'Dopamine 2-like receptor', symbol = 'Dop2R', length = 58027, secondaryIdentifier = 'CG33517', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035557', briefDescription = None, cytoLocation = '64B9-64B9', description = None, name = None, symbol = 'CG11353', length = 6101, secondaryIdentifier = 'CG11353', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033108', briefDescription = None, cytoLocation = '42D4-42D6', description = None, name = None, symbol = 'CG15236', length = 10005, secondaryIdentifier = 'CG15236', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033753', briefDescription = None, cytoLocation = '49B4-49B5', description = None, name = 'Cyp301a1', symbol = 'Cyp301a1', length = 2877, secondaryIdentifier = 'CG8587', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035611', briefDescription = None, cytoLocation = '64D3-64D3', description = None, name = None, 
symbol = 'CG13285', length = 1792, secondaryIdentifier = 'CG13285', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0085417', briefDescription = None, cytoLocation = '88C1-88C1', description = None, name = 'natalisin', symbol = 'natalisin', length = 3273, secondaryIdentifier = 'CG34388', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0262870', briefDescription = None, cytoLocation = '64B12-64B13', description = None, name = 'axotactin', symbol = 'axo', length = 57714, secondaryIdentifier = 'CG43225', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0036179', briefDescription = None, cytoLocation = '68C14-68C15', description = None, name = None, symbol = 'CG7368', length = 8173, secondaryIdentifier = 'CG7368', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0052564', briefDescription = None, cytoLocation = '15F3-15F3', description = None, name = None, symbol = 'CG32564', length = 1448, secondaryIdentifier = 'CG32564', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0050053', briefDescription = None, cytoLocation = '49B12-49B12', description = None, name = None, symbol = 'CG30053', length = 1144, secondaryIdentifier = 'CG30053', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033592', briefDescription = None, cytoLocation = '47E1-47E1', description = None, name = None, symbol = 'CG13215', length = 621, secondaryIdentifier = 'CG13215', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0011582', briefDescription = None, cytoLocation = '88A10-88A12', description = None, name = 'Dopamine 1-like receptor 1', symbol = 'Dop1R1', length = 49432, secondaryIdentifier = 'CG9652', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0266137', briefDescription = None, cytoLocation = '99B5-99B6', description = None, name = 'Dopamine 1-like receptor 2', symbol = 'Dop1R2', length = 29681, secondaryIdentifier = 'CG18741', scoreType = 
None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0034128', briefDescription = None, cytoLocation = '53C4-53C4', description = None, name = None, symbol = 'CG4409', length = 1635, secondaryIdentifier = 'CG4409', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0083945', briefDescription = None, cytoLocation = None, description = None, name = None, symbol = 'CG34109', length = 31649, secondaryIdentifier = 'CG34109', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033873', briefDescription = None, cytoLocation = '50C6-50C6', description = None, name = None, symbol = 'CG6337', length = 1522, secondaryIdentifier = 'CG6337', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0015011', briefDescription = None, cytoLocation = '89E5-89E5', description = None, name = 'Adenosylhomocysteinase like 2', symbol = 'AhcyL2', length = 3720, secondaryIdentifier = 'CG8956', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0032151', briefDescription = None, cytoLocation = '30D1-30E1', description = None, name = 'nicotinic Acetylcholine Receptor alpha6', symbol = 'nAChRalpha6', length = 92934, secondaryIdentifier = 'CG4128', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0265296', briefDescription = None, cytoLocation = '65E6-65E7', description = None, name = 'Down syndrome cell adhesion molecule 2', symbol = 'Dscam2', length = 30733, secondaryIdentifier = 'CG42256', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0038693', briefDescription = None, cytoLocation = '91F12-91F12', description = None, name = 'uncoordinated 79', symbol = 'unc79', length = 15367, secondaryIdentifier = 'CG5237', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035358', briefDescription = None, cytoLocation = '62E8-62E8', description = None, name = None, symbol = 'CG14949', length = 1241, secondaryIdentifier = 'CG14949', scoreType = None)\n", "Gene(score = None, 
primaryIdentifier = 'FBgn0039770', briefDescription = None, cytoLocation = '99F4-99F4', description = None, name = None, symbol = 'CG15537', length = 8423, secondaryIdentifier = 'CG15537', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0016650', briefDescription = None, cytoLocation = '90C2-90C2', description = None, name = 'Leucine-rich repeat-containing G protein-coupled receptor 1', symbol = 'Lgr1', length = 7431, secondaryIdentifier = 'CG7665', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0038641', briefDescription = None, cytoLocation = '91C1-91C1', description = None, name = None, symbol = 'CG7708', length = 6136, secondaryIdentifier = 'CG7708', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0030887', briefDescription = None, cytoLocation = '16F5-16F5', description = None, name = None, symbol = 'CG6867', length = 4392, secondaryIdentifier = 'CG6867', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0036146', briefDescription = None, cytoLocation = '68B1-68B1', description = None, name = None, symbol = 'CG14141', length = 808, secondaryIdentifier = 'CG14141', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035359', briefDescription = None, cytoLocation = '62E8-62E8', description = None, name = None, symbol = 'CG1143', length = 1825, secondaryIdentifier = 'CG1143', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0033207', briefDescription = None, cytoLocation = '43E9-43E9', description = None, name = None, symbol = 'CG12826', length = 777, secondaryIdentifier = 'CG12826', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0039200', briefDescription = None, cytoLocation = '96A1-96A1', description = None, name = None, symbol = 'CG13616', length = 874, secondaryIdentifier = 'CG13616', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0031650', briefDescription = None, cytoLocation = '25B4-25B4', description = None, 
name = None, symbol = 'CG14044', length = 1564, secondaryIdentifier = 'CG14044', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0029950', briefDescription = None, cytoLocation = '7B2-7B2', description = None, name = None, symbol = 'CG9657', length = 3549, secondaryIdentifier = 'CG9657', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0034218', briefDescription = None, cytoLocation = '54B14-54B15', description = None, name = None, symbol = 'CG18467', length = 2187, secondaryIdentifier = 'CG18467', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0035279', briefDescription = None, cytoLocation = '62B6-62B6', description = None, name = 'Cuticular protein 62Ba', symbol = 'Cpr62Ba', length = 7655, secondaryIdentifier = 'CG13934', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0038753', briefDescription = None, cytoLocation = '92B8-92B8', description = None, name = None, symbol = 'CG4459', length = 2323, secondaryIdentifier = 'CG4459', scoreType = None)\n", "Gene(score = None, primaryIdentifier = 'FBgn0004911', briefDescription = None, cytoLocation = '63F7-64A1', description = None, name = 'Ecdysone-induced gene 63F 2', symbol = 'Eig63F-2', length = 1275, secondaryIdentifier = 'CR32265', scoreType = None)\n" ] } ], "source": [ "for r in l3:\n", " print(r)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second way of combining two lists is to first declare a Python List object with the lists that you want to combine. I've called this temporary list \"y\". You can then use the union method present in list manager to take the set union and give a name to the list in one step. This has been shown below. 
" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y=[l1,l2]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "<intermine.lists.list.List at 0x7f6ea2531588>" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm.union(y,name=\"combination-2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, to find the intersection of two lists, use the intersect method of the list manager. Note that you can combine lists and queries in the same way. " ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }