{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "kl_py_scraping01.ipynb", "provenance": [], "collapsed_sections": [], "include_colab_link": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "cells": [
{ "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "Open In Colab" ] },
{ "cell_type": "markdown", "metadata": { "id": "bsp3x_zlo74F", "colab_type": "text" }, "source": [ "# Python basics: web mining\n", "\n", "---" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "collapsed": true, "id": "yXxqgKt6SYTt" }, "source": [ "# Web mining / scraping\n", "\n", "## Collecting links from a web page\n", "\n", "Starting from the given URL, the page is parsed and its links (external and internal alike) are collected and printed to the screen." ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "WUb_ZSiVSYTw", "outputId": "ea3cee42-3445-4c13-e193-dfb042701121", "colab": { "base_uri": "https://localhost:8080/", "height": 329 } }, "source": [ "from urllib.request import urlopen\n", "from bs4 import BeautifulSoup\n", "\n", "szamlal = 1\n", "html = urlopen('http://en.wikipedia.org/wiki/Kevin_Bacon')\n", "bs = BeautifulSoup(html, 'html.parser')\n", "for link in bs.find_all('a'):\n", "    szamlal += 1\n", "    if 'href' in link.attrs:\n", "        print(link.attrs['href'])\n", "    if szamlal >= 20:  ## list only the first 20\n", "        break" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/wiki/Wikipedia:Protection_policy#semi\n", "#mw-head\n", "#p-search\n", "/wiki/Kevin_Bacon_(disambiguation)\n", "/wiki/File:Kevin_Bacon_SDCC_2014.jpg\n", "/wiki/Philadelphia\n", "/wiki/Pennsylvania\n", "/wiki/Kyra_Sedgwick\n", "/wiki/Sosie_Bacon\n", "#cite_note-1\n", "/wiki/Edmund_Bacon_(architect)\n", "/wiki/Michael_Bacon_(musician)\n", "/wiki/Holly_Near\n", "http://baconbros.com/\n", "#cite_note-2\n", "#cite_note-actor-3\n", "/wiki/Footloose_(1984_film)\n", "/wiki/JFK_(film)\n" ], "name": "stdout" } ] },
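{ "cell_type": "markdown", "metadata": {}, "source": [ "The links above are printed exactly as they appear in the href attributes, so many of them are relative paths. A minimal added sketch (not part of the original run) of resolving them to absolute URLs with urllib.parse.urljoin:" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "from urllib.request import urlopen\n", "from urllib.parse import urljoin\n", "from bs4 import BeautifulSoup\n", "\n", "base = 'http://en.wikipedia.org/wiki/Kevin_Bacon'  # same start page as above\n", "bs = BeautifulSoup(urlopen(base), 'html.parser')\n", "for link in bs.find_all('a', href=True)[:10]:  # first 10 links only\n", "    # urljoin leaves absolute URLs untouched and resolves relative ones\n", "    print(urljoin(base, link['href']))" ], "execution_count": null, "outputs": [] },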
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8PluJRB6SYT0" }, "source": [ "## Collecting only the internal article links\n", "\n", "The links are filtered by local domain, keeping only those that point inside the Wiki itself." ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "x_-FKIqiSYT1", "outputId": "7456e7c0-6b97-464d-dc4b-9b5310c0c94a", "colab": { "base_uri": "https://localhost:8080/", "height": 347 } }, "source": [ "from urllib.request import urlopen\n", "from bs4 import BeautifulSoup\n", "import re\n", "\n", "szamlal = 1\n", "html = urlopen('http://en.wikipedia.org/wiki/Kevin_Bacon')\n", "bs = BeautifulSoup(html, 'html.parser')\n", "for link in bs.find('div', {'id':'bodyContent'}).find_all(\n", "        'a', href=re.compile('^(/wiki/)((?!:).)*$')):\n", "    szamlal += 1\n", "    if 'href' in link.attrs:\n", "        print(link.attrs['href'])\n", "    if szamlal >= 20:  ## list only the first 20\n", "        break" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/wiki/Kevin_Bacon_(disambiguation)\n", "/wiki/Philadelphia\n", "/wiki/Pennsylvania\n", "/wiki/Kyra_Sedgwick\n", "/wiki/Sosie_Bacon\n", "/wiki/Edmund_Bacon_(architect)\n", "/wiki/Michael_Bacon_(musician)\n", "/wiki/Holly_Near\n", "/wiki/Footloose_(1984_film)\n", "/wiki/JFK_(film)\n", "/wiki/A_Few_Good_Men\n", "/wiki/Apollo_13_(film)\n", "/wiki/Mystic_River_(film)\n", "/wiki/Sleepers\n", "/wiki/The_Woodsman_(2004_film)\n", "/wiki/He_Said,_She_Said_(film)\n", "/wiki/Fox_Broadcasting_Company\n", "/wiki/The_Following\n", "/wiki/HBO\n" ], "name": "stdout" } ] },
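{ "cell_type": "markdown", "metadata": {}, "source": [ "The pattern ^(/wiki/)((?!:).)*$ accepts hrefs that start with /wiki/ and contain no colon, which screens out File:, Special:, Talk: and other non-article namespaces. A small added check (illustration only, not from the original notebook) of what the pattern matches:" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "import re\n", "\n", "pattern = re.compile('^(/wiki/)((?!:).)*$')\n", "samples = ['/wiki/Kevin_Bacon',           # article page -> matches\n", "           '/wiki/File:Kevin_Bacon.jpg',  # File: namespace -> rejected\n", "           '/wiki/Special:Random',        # Special: namespace -> rejected\n", "           '#cite_note-1']                # fragment link -> rejected\n", "for s in samples:\n", "    print(s, '->', bool(pattern.match(s)))" ], "execution_count": null, "outputs": [] },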
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dfdD7c1HSYT4" }, "source": [ "## Random walk\n", "\n", "Traversing the internal article links in random order and printing each step." ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "D54GA-yiSYT4", "outputId": "c71e5cff-0026-406c-b9a3-8f3262ffe607", "colab": { "base_uri": "https://localhost:8080/", "height": 347 } }, "source": [ "from urllib.request import urlopen\n", "from bs4 import BeautifulSoup\n", "import datetime\n", "import random\n", "import re\n", "\n", "szamlal = 1\n", "random.seed(datetime.datetime.now())\n", "def getLinks(articleUrl):\n", "    html = urlopen('http://en.wikipedia.org{}'.format(articleUrl))\n", "    bs = BeautifulSoup(html, 'html.parser')\n", "    return bs.find('div', {'id':'bodyContent'}).find_all('a', href=re.compile('^(/wiki/)((?!:).)*$'))\n", "\n", "links = getLinks('/wiki/Kevin_Bacon')\n", "while len(links) > 0:\n", "    newArticle = links[random.randint(0, len(links)-1)].attrs['href']\n", "    print(newArticle)\n", "    szamlal += 1\n", "    if szamlal >= 20:  ## list only the first 20\n", "        break\n", "    links = getLinks(newArticle)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/wiki/Golden_Globe_Award\n", "/wiki/To_Kill_a_Mockingbird_(film)\n", "/wiki/La_Jolla,_California\n", "/wiki/Klamath_Basin\n", "/wiki/Crater_Lake\n", "/wiki/Rhyodacite\n", "/wiki/Latite\n", "/wiki/Petrology\n", "/wiki/Seismology\n", "/wiki/Volcanology\n", "/wiki/INGV\n", "/wiki/CNN\n", "/wiki/Warner_Bros._International_Television_Production\n", "/wiki/DirecTV-5\n", "/wiki/AT%26T_satellite_fleet\n", "/wiki/Red_by_HBO\n", "/wiki/Sky_News_International\n", "/wiki/Digital_television_in_the_United_Kingdom\n", "/wiki/Royal_Television_Society\n" ], "name": "stdout" } ] },
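{ "cell_type": "markdown", "metadata": {}, "source": [ "Every step of the walk issues a live request to Wikipedia, so a crawl like this should be throttled. A minimal added sketch of a rate-limited wrapper (getLinksPolitely is a hypothetical name) around the getLinks function defined above:" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "import time\n", "\n", "def getLinksPolitely(articleUrl, delay=1.0):\n", "    # pause before each request so the crawl does not hammer the server\n", "    time.sleep(delay)\n", "    return getLinks(articleUrl)  # reuses getLinks() from the cell above\n", "\n", "# one throttled step of the random walk\n", "links = getLinksPolitely('/wiki/Kevin_Bacon')\n", "print(len(links), 'links found')" ], "execution_count": null, "outputs": [] },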
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dF2p1NxASYT-" }, "source": [ "## Recursively crawling the whole site\n", "\n", "A self-calling function walks the entire site and stores each newly found page in the pages variable." ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "RmEeMf8SSYT_", "outputId": "0f0f0f7a-69f8-400b-c3fd-ab97828e7891", "colab": { "base_uri": "https://localhost:8080/", "height": 329 } }, "source": [ "from urllib.request import urlopen\n", "from bs4 import BeautifulSoup\n", "import re\n", "\n", "szamlal = 1\n", "pages = set()\n", "def getLinks(pageUrl):\n", "    global pages, szamlal\n", "    html = urlopen('http://en.wikipedia.org{}'.format(pageUrl))\n", "    bs = BeautifulSoup(html, 'html.parser')\n", "    for link in bs.find_all('a', href=re.compile('^(/wiki/)')):\n", "        if 'href' in link.attrs:\n", "            if link.attrs['href'] not in pages:\n", "                szamlal += 1\n", "                if szamlal >= 20:  ## list only the first 20\n", "                    break\n", "                # We have encountered a new page\n", "                newPage = link.attrs['href']\n", "                print(newPage)\n", "                pages.add(newPage)\n", "                getLinks(newPage)\n", "getLinks('')" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/wiki/Wikipedia\n", "/wiki/Wikipedia:Protection_policy#semi\n", "/wiki/Wikipedia:Requests_for_page_protection\n", "/wiki/Wikipedia:Protection_policy#move\n", "/wiki/Wikipedia:Lists_of_protected_pages\n", "/wiki/Wikipedia:Protection_policy\n", "/wiki/Wikipedia:Perennial_proposals\n", "/wiki/Wikipedia:Reliable_sources/Perennial_sources\n", "/wiki/Wikipedia:Reliable_sources\n", "/wiki/Wikipedia:WikiProject_Reliability\n", "/wiki/Wikipedia:WRE\n", "/wiki/File:People_icon.svg\n", "/wiki/Special:WhatLinksHere/File:People_icon.svg\n", "/wiki/Help:What_links_here\n", "/wiki/Wikipedia:Project_namespace#How-to_and_information_pages\n", "/wiki/Wikipedia:Policies_and_guidelines\n", "/wiki/Wikipedia:WikiProject_Politics\n", "/wiki/File:A_coloured_voting_box.svg\n" ], "name": "stdout" } ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "bXWeecq1SYUB" }, "source": [ "## Collecting data across the whole site\n", "\n", "We crawl the web pages with error handling and gather the content of each newly found page, printing its title, first paragraph, and edit link." ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "6dJLczHlSYUC", "outputId": "9114b6f4-90db-4e52-981c-c9fec08c4e79", "colab": { "base_uri": "https://localhost:8080/", "height": 419 } }, "source": [ "from urllib.request import urlopen\n", "from bs4 import BeautifulSoup\n", "import re\n", "\n", "szamlal = 1\n", "pages = set()\n", "def getLinks(pageUrl):\n", "    global pages, szamlal\n", "    html = urlopen('http://en.wikipedia.org{}'.format(pageUrl))\n", "    bs = BeautifulSoup(html, 'html.parser')\n", "    try:\n", "        print(bs.h1.get_text())\n", "        print(bs.find(id='mw-content-text').find_all('p')[0])\n", "        print(bs.find(id='ca-edit').find('span').find('a').attrs['href'])\n", "    except AttributeError:\n", "        print('This page is missing something! Continuing.')\n", "\n", "    for link in bs.find_all('a', href=re.compile('^(/wiki/)')):\n", "        if 'href' in link.attrs:\n", "            if link.attrs['href'] not in pages:\n", "                szamlal += 1\n", "                if szamlal >= 5:  ## list only the first 5\n", "                    break\n", "                # We have encountered a new page\n", "                newPage = link.attrs['href']\n", "                print('-'*20)\n", "                print(newPage)\n", "                pages.add(newPage)\n", "                getLinks(newPage)\n", "getLinks('')" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Main Page\n", "Jean-François-Marie de Surville (1717–1770) was a merchant captain with the French East India Company who commanded a voyage of exploration to the Pacific in 1769 and 1770. Born in Brittany, France, Surville joined the company when he was 10 years old. For the next several years, he sailed on voyages in Indian and Chinese waters. In 1740, he joined the French Navy. He fought in the War of the Austrian Succession and the Seven Years' War, twice becoming a prisoner of war. In 1769, in command of Saint Jean-Baptiste, he sailed from India on an expedition to the Pacific looking for trading opportunities. He explored the seas around the Solomon Islands and anchored in December at Doubtless Bay, New Zealand (commemorative plaque pictured). Part of his route around New Zealand overlapped that of James Cook in Endeavour, who had preceded him by only a few days. Three months later, Surville drowned off the coast of Peru while seeking help for his scurvy-afflicted crew. (Full article...)\n", "This page is missing something! Continuing.\n", "--------------------\n", "/wiki/Wikipedia\n", "Wikipedia\n", "This page is missing something! Continuing.\n", "--------------------\n", "/wiki/Wikipedia:Protection_policy#semi\n", "Wikipedia:Protection policy\n", "This page is missing something! Continuing.\n", "--------------------\n", "/wiki/Wikipedia:Requests_for_page_protection\n", "Wikipedia:Requests for page protection\n", "This page is for requesting that a page, file or template be fully protected, create protected (salted), extended confirmed protected, semi-protected, added to pending changes, move-protected, template protected, upload protected (file-specific), or unprotected. Please read up on the protection policy. Full protection is used to stop edit warring between multiple users or to prevent vandalism to high-risk templates; semi-protection and pending changes are usually used only to prevent IP and new user vandalism (see the rough guide to semi-protection); and move protection is used to stop page-move wars. Extended confirmed protection is used where semi-protection has proved insufficient (see the rough guide to extended confirmed protection)\n", "This page is missing something! Continuing.\n" ], "name": "stdout" } ] },
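{ "cell_type": "markdown", "metadata": {}, "source": [ "urlopen raises HTTPError for pages that return an error status and URLError when the server cannot be reached; the try/except above only guards the parsing step. An added sketch (safeGetSoup is a hypothetical helper) of guarding the fetch itself:" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "from urllib.request import urlopen\n", "from urllib.error import HTTPError, URLError\n", "from bs4 import BeautifulSoup\n", "\n", "def safeGetSoup(url):\n", "    # return a BeautifulSoup object, or None if the page cannot be fetched\n", "    try:\n", "        html = urlopen(url)\n", "    except HTTPError as e:   # the server returned an error status, e.g. 404\n", "        print('HTTP error:', e.code, url)\n", "        return None\n", "    except URLError as e:    # the server could not be reached at all\n", "        print('URL error:', e.reason, url)\n", "        return None\n", "    return BeautifulSoup(html, 'html.parser')\n", "\n", "print(safeGetSoup('http://en.wikipedia.org/wiki/Kevin_Bacon') is not None)" ], "execution_count": null, "outputs": [] },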
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "m0H-KSw9SYUE" }, "source": [ "## Crawling across the internet" ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "HKSWbuhLSYUF", "outputId": "b4aa8909-829b-451c-89dd-7dc9178ceaed", "colab": { "base_uri": "https://localhost:8080/", "height": 52 } }, "source": [ "from urllib.request import urlopen\n", "from urllib.parse import urlparse\n", "from bs4 import BeautifulSoup\n", "import re\n", "import datetime\n", "import random\n", "\n", "pages = set()\n", "random.seed(datetime.datetime.now())\n", "szamlal = 1\n", "\n", "# Retrieves a list of all internal links found on a page\n", "def getInternalLinks(bs, includeUrl):\n", "    includeUrl = '{}://{}'.format(urlparse(includeUrl).scheme, urlparse(includeUrl).netloc)\n", "    internalLinks = []\n", "    # Finds all links that begin with a '/'\n", "    for link in bs.find_all('a', href=re.compile('^(/|.*'+includeUrl+')')):\n", "        if link.attrs['href'] is not None:\n", "            if link.attrs['href'] not in internalLinks:\n", "                if link.attrs['href'].startswith('/'):\n", "                    internalLinks.append(includeUrl+link.attrs['href'])\n", "                else:\n", "                    internalLinks.append(link.attrs['href'])\n", "    return internalLinks\n", "\n", "# Retrieves a list of all external links found on a page\n", "def getExternalLinks(bs, excludeUrl):\n", "    global szamlal\n", "    externalLinks = []\n", "    # Finds all links that start with 'http' and do not contain the current URL\n", "    for link in bs.find_all('a', href=re.compile('^(http|www)((?!'+excludeUrl+').)*$')):\n", "        szamlal += 1\n", "        if szamlal >= 5:  ## list only the first 5\n", "            break\n", "        if link.attrs['href'] is not None:\n", "            if link.attrs['href'] not in externalLinks:\n", "                externalLinks.append(link.attrs['href'])\n", "    return externalLinks\n", "\n", "def getRandomExternalLink(startingPage):\n", "    try:\n", "        html = urlopen(startingPage)\n", "        bs = BeautifulSoup(html, 'html.parser')\n", "        externalLinks = getExternalLinks(bs, urlparse(startingPage).netloc)\n", "        if len(externalLinks) == 0:\n", "            print('No external links, looking around the site for one')\n", "            domain = '{}://{}'.format(urlparse(startingPage).scheme, urlparse(startingPage).netloc)\n", "            internalLinks = getInternalLinks(bs, domain)\n", "            return getRandomExternalLink(internalLinks[random.randint(0, len(internalLinks)-1)])\n", "        else:\n", "            return externalLinks[random.randint(0, len(externalLinks)-1)]\n", "    except Exception:\n", "        return None\n", "\n", "def followExternalOnly(startingSite):\n", "    externalLink = getRandomExternalLink(startingSite)\n", "    if externalLink is None:\n", "        return  # no link could be retrieved, stop the crawl\n", "    else:\n", "        print('Random external link is: {}'.format(externalLink))\n", "        followExternalOnly(externalLink)\n", "\n", "followExternalOnly('https://klajosw.blogspot.com/p/kezdolap.html')" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Random external link is: https://klajosw.blogspot.hu/\n", "No external links, looking around the site for one\n" ], "name": "stdout" } ] },
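{ "cell_type": "markdown", "metadata": {}, "source": [ "Before following links onto arbitrary sites, it is good practice to honour each site's robots.txt. An added sketch using the standard-library urllib.robotparser (allowedByRobots is a hypothetical helper):" ] },
{ "cell_type": "code", "metadata": {}, "source": [ "from urllib.robotparser import RobotFileParser\n", "from urllib.parse import urlparse\n", "\n", "def allowedByRobots(url, userAgent='*'):\n", "    # fetch the site's robots.txt and ask whether this URL may be crawled\n", "    parts = urlparse(url)\n", "    rp = RobotFileParser('{}://{}/robots.txt'.format(parts.scheme, parts.netloc))\n", "    rp.read()\n", "    return rp.can_fetch(userAgent, url)\n", "\n", "print(allowedByRobots('https://en.wikipedia.org/wiki/Kevin_Bacon'))" ], "execution_count": null, "outputs": [] },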
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "IHYJZZMCSYUH" }, "source": [ "## Collect all External Links from a Site" ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "iiSzM6C8SYUI", "outputId": "2e0d9e10-ac27-47fa-d8e6-faf719a249f2", "colab": { "base_uri": "https://localhost:8080/", "height": 87 } }, "source": [ "from urllib.request import urlopen\n", "from urllib.parse import urlparse\n", "from bs4 import BeautifulSoup\n", "\n", "allExtLinks = set()\n", "allIntLinks = set()\n", "szamlal = 1\n", "\n", "def getAllExternalLinks(siteUrl):\n", "    global szamlal\n", "    html = urlopen(siteUrl)\n", "    domain = '{}://{}'.format(urlparse(siteUrl).scheme, urlparse(siteUrl).netloc)\n", "    bs = BeautifulSoup(html, 'html.parser')\n", "    # getInternalLinks() and getExternalLinks() are defined in the cell above\n", "    internalLinks = getInternalLinks(bs, domain)\n", "    externalLinks = getExternalLinks(bs, domain)\n", "\n", "    for link in externalLinks:\n", "        szamlal += 1\n", "        if szamlal >= 10:  ## list only the first 10\n", "            break\n", "        if link not in allExtLinks:\n", "            allExtLinks.add(link)\n", "            print('Új külsö link : ', link)\n", "    for link in internalLinks:\n", "        szamlal += 1\n", "        if szamlal >= 10:  ## list only the first 10\n", "            break\n", "        if link not in allIntLinks:\n", "            print('Új belső link : ', link)\n", "            allIntLinks.add(link)\n", "            getAllExternalLinks(link)\n", "\n", "allIntLinks.add('https://klajosw.blogspot.com')\n", "getAllExternalLinks('https://klajosw.blogspot.com/p/kezdolap.html')" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Új külsö link : https://klajosw.blogspot.com/\n", "Új külsö link : https://mierdekel.hu/\n", "Új külsö link : https://klajosw.blogspot.hu/\n", "Új belső link : https://klajosw.blogspot.com/\n" ], "name": "stdout" } ] } ] }