# <center>How To Retrieve Unstructured Web Data in a Structured Manner with riko</center>
## <center>A Riveting 15-688 Tutorial</center><br/><center>*by* Ahmet Emre Unal ([aemreunal](https://github.com/aemreunal))</center>

You might have heard about [Google Reader](https://en.wikipedia.org/wiki/Google_Reader). It was a free [RSS](https://en.wikipedia.org/wiki/RSS) reader that brought RSS reading to the masses. It was a great product and I, personally, was a very heavy user. Google Reader allowed me to follow many websites that publish things infrequently. This, though, was only possible through the RSS feeds published by the websites.

It's great when a website admin takes the time to create the necessary RSS feeds (or implement the tool that does it) but every so often, you come across websites that you want to follow but that don't have an RSS feed. How can you make use of this beautiful system then? Can you somehow parse the plain HTML web page to retrieve data in an ordered fashion?

[The riko library](https://github.com/nerevu/riko/) allows you to do exactly that. By using Riko, we can parse the plain HTML of a website and retrieve its elements in an orderly fashion, for example by iterating through ```<li>``` elements with a for-loop.

I personally believe in walking through examples to learn something, so let's jump right in (if you would like to follow along, you can [install riko](https://github.com/nerevu/riko/blob/master/README.rst#installation) on your local environment):

In [1]:
import os

from os import path as p
from riko.collections.sync import SyncPipe

CWD = os.getcwd()

def get_test_site_url(test_site_name):
    return 'file://%s' % p.join(CWD, 'test_sites', test_site_name)

In [2]:
##########################################################################################
#
# Note: You can use the following section to create the test sites' files:
#
##########################################################################################

test_site_1_contents = '''
<!DOCTYPE html>
<html>
    <body>
        <h4>This is a simple example</h4>
        <div class="container">
            <ul>
                <li class="drink hot">Coffee</li>
                <li class="drink hot">Green Tea</li>
                <li class="hot drink">Black Tea</li>
                <li class="drink cold">Milk</li>
                <li class="food">Chocolate</li>
                <li class="food">Marshmallow</li>
            </ul>
        </div>
    </body>
</html>
'''

test_site_2_contents = '''
<!DOCTYPE html>
<html>
    <body>
        <h4>This is a slightly more complex example</h4>
        <div class="container">
            <ul>
                <li class="drink hot">Coffee</li>
                <li class="drink hot">Green Tea
                    <p>Oolong Tea</p>
                    <a href="https://en.wikipedia.org/wiki/Oolong"></a>
                </li>
                <li class="hot drink">Black Tea
                    <p>Rize Tea</p>
                    <a href="https://en.wikipedia.org/wiki/Rize_Tea"></a>
                </li>
                <li class="drink cold">Milk</li>
                <li class="food">Chocolate</li>
                <li class="food">Marshmallow</li>
            </ul>
        </div>
    </body>
</html>
'''

# You can use the following functions to create the test sites' files:
path = p.join(CWD, 'test_sites')
test1_path = p.join(path, 'test1.html')
test2_path = p.join(path, 'test2.html')

# Create the 'test_sites' folder (no error if it already exists)
os.makedirs(path, exist_ok=True)

# Pair each test site's file path with the contents it should hold
test_sites = (
    (test1_path, test_site_1_contents),
    (test2_path, test_site_2_contents),
)

# Write each test site's file, unless that file already exists
for site_path, site_contents in test_sites:
    if not p.exists(site_path):
        with open(site_path, 'w') as test_site:
            test_site.write(site_contents)

In the ```test_sites``` folder, you will find a couple of HTML files that are simple website examples. The first one, ```test1.html```, is as follows:
```html
<!DOCTYPE html>
<html>
<body>

<h4>This is a simple example</h4>
<div class="container">
    <ul>
      <li class="drink hot">Coffee</li>
      <li class="drink hot">Green Tea</li>
      <li class="hot drink">Black Tea</li>
      <li class="drink cold">Milk</li>
      <li class="food">Chocolate</li>
      <li class="food">Marshmallow</li>
    </ul>
</div>

</body>
</html>
```
Riko sees things through what's called a 'pipe'. By fetching a webpage through a URL and pointing Riko to the appropriate part of said webpage, we can obtain 'streams' coming out of those pipes that can be iterated over. Let's start with the very simple act of retrieving the webpage in its entirety. We can achieve this with the very simple [```fetchpage```](https://github.com/nerevu/riko/blob/master/riko/modules/fetchpage.py) module, which will literally just fetch a page:

In [4]:
url = get_test_site_url('test1.html')          # The URL of our test website
fetch_conf = {'url': url}                      # A configuration dictionary for Riko
pipe = SyncPipe('fetchpage', conf=fetch_conf)  # A pipe that streams 'test1.html'
stream = pipe.output                           # The stream being output from the pipe

What we did was to tell Riko to create a synchronous pipe (using the [SyncPipe class](https://github.com/nerevu/riko/blob/master/riko/collections/sync.py)) that uses the webpage fetching module (called ```fetchpage```) to fetch the URL specified in the ```fetch_conf``` configuration dictionary. 

We could've created the stream directly by calling the ```fetchpage``` module ourselves:
```python
from riko.modules import fetchpage
stream = fetchpage.pipe(conf=fetch_conf)
```
but we'll see in a bit why we're using the ```SyncPipe``` class.

You might be wondering when Riko even had the time to go fetch the page. Well, pipes in Riko are *lazy*. That means a pipe won't start fetching (or processing) a URL before we start iterating. So let's iterate:

In [5]:
for item in stream:
    print(item)

{'content': b'<!DOCTYPE html>\n\n<html>\n\n<body>\n\n\n\n<h4>This is a simple example</h4>\n\n<div class="container">\n\n    <ul>\n\n      <li class="drink hot">Coffee</li>\n\n      <li class="drink hot">Green Tea</li>\n\n      <li class="hot drink">Black Tea</li>\n\n      <li class="drink cold">Milk</li>\n\n      <li class="food">Chocolate</li>\n\n      <li class="food">Marshmallow</li>\n\n    </ul>\n\n</div>\n\n\n\n</body>\n\n</html>\n'}


I told you it would literally just fetch the entire page. The whole webpage being printed is not really that useful; there is nothing special about this. We could've at least specified a start and end tag for Riko to fetch only that part:

In [7]:
# The same config as above, but with the start and end tags to fetch specified
fetch_conf = {   'url': url, 'start': '<body>', 'end': '</body>'}

# A pipe that streams 'test1.html' according to the config above
pipe = SyncPipe('fetchpage', conf=fetch_conf)  

# The stream being output from the pipe
stream = pipe.output                           

for item in stream:
    print(item)

{'content': b'\n\n\n\n<h4>This is a simple example</h4>\n\n<div class="container">\n\n    <ul>\n\n      <li class="drink hot">Coffee</li>\n\n      <li class="drink hot">Green Tea</li>\n\n      <li class="hot drink">Black Tea</li>\n\n      <li class="drink cold">Milk</li>\n\n      <li class="food">Chocolate</li>\n\n      <li class="food">Marshmallow</li>\n\n    </ul>\n\n</div>\n\n\n\n'}


This isn't very useful either, honestly. To get to the list items we want, we'd need to do some weird string processing. We don't want to do that and that's why we have riko!
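
To get a feel for what that string processing might look like, here is a rough sketch that runs a regular expression over bytes shaped like the ```content``` value above. The sample bytes are shortened and the pattern is my own illustration, not something riko provides:

```python
import re

# Bytes shaped like the 'content' value printed above (shortened for brevity)
content = (
    b'<h4>This is a simple example</h4>\n'
    b'<ul>\n'
    b'  <li class="drink hot">Coffee</li>\n'
    b'  <li class="food">Chocolate</li>\n'
    b'</ul>\n'
)

# A hand-written pattern that only works for this exact markup: it breaks
# as soon as an <li> gains another attribute or a nested tag
pattern = re.compile(rb'<li class="([^"]*)">([^<]*)</li>')

items = [
    {'class': cls.decode(), 'content': text.decode()}
    for cls, text in pattern.findall(content)
]

for item in items:
    print(item)
```

It works for this tiny page, but every change to the markup means rewriting the pattern, which is exactly the kind of chore we'd like to avoid.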

***
Let's take a side step and ask ourselves a question: a URL is a string that points to a webpage (or a file in the filesystem), but what could point to an element *inside* a webpage? The answer is [XPath](https://en.wikipedia.org/wiki/XPath). An XPath is very similar to a URL, except that it denotes a path inside a markup document. For example, the XPath of the ```<ul>``` element in the website structure above is ```/html/body/div/ul```. In turn, a single ```<li>``` element under that ```<ul>``` element can be pointed to using the XPath ```/html/body/div/ul/li[<index>]```, where ```<index>``` is the 1-based index (index = 1 is the first element), while the XPath ```/html/body/div/ul/li``` points to all of the ```<li>``` elements at once.
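
If you'd like to try these paths without riko, here is a small sketch using Python's built-in ```xml.etree.ElementTree```, which understands a limited subset of XPath. Two assumptions to keep in mind: the sample markup is a shortened, well-formed version of the test page, and ElementTree paths are written relative to the root ```<html>``` element, so ```/html/body/div/ul/li``` becomes ```./body/div/ul/li```:

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed version of the test page
html = '''
<html>
    <body>
        <div class="container">
            <ul>
                <li class="drink hot">Coffee</li>
                <li class="drink hot">Green Tea</li>
                <li class="food">Chocolate</li>
            </ul>
        </div>
    </body>
</html>
'''.strip()

root = ET.fromstring(html)  # 'root' is the <html> element

# All <li> elements: the counterpart of /html/body/div/ul/li
all_items = [li.text for li in root.findall('./body/div/ul/li')]

# Only the first <li> element: the counterpart of /html/body/div/ul/li[1]
first_item = root.find('./body/div/ul/li[1]').text

print(all_items)
print(first_item)
```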
***
Riko has an alternative module called [```xpathfetchpage```](https://github.com/nerevu/riko/blob/master/riko/modules/xpathfetchpage.py) that takes a URL, as well as an XPath, and pipes the element(s) pointed to by that XPath:

In [8]:
# The XPath of the <ul> element
xpath = '/html/body/div/ul'                         

# The XPath configuration dictionary for riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside 'test1.html'
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  

# The stream being output from the pipe
stream = pipe.output                                
    
for item in stream:
    print(item)

{'{http://www.w3.org/1999/xhtml}li': [{'class': 'drink hot', 'content': 'Coffee'}, {'class': 'drink hot', 'content': 'Green Tea'}, {'class': 'hot drink', 'content': 'Black Tea'}, {'class': 'drink cold', 'content': 'Milk'}, {'class': 'food', 'content': 'Chocolate'}, {'class': 'food', 'content': 'Marshmallow'}]}


Ah, now this seems interesting. The pipe seems to have retrieved a dictionary with a single key, ```'{http://www.w3.org/1999/xhtml}li'``` (weird key, I know), which points to a list of dictionaries, like ```{'class': 'drink hot', 'content': 'Coffee'}```, that look eerily similar to our list elements! But it's still tedious at this point to unwrap that outer dictionary. Let's try pointing Riko to an XPath that matches all of the `<li>` elements, which is ```/html/body/div/ul/li```:

In [9]:
# The XPath of the <li> element(s)
xpath = '/html/body/div/ul/li'                      

# The XPath configuration dictionary for riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside 'test1.html'
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  

# The stream being output from the pipe
stream = pipe.output                                
    
for item in stream:
    print(item)

{'class': 'drink hot', 'content': 'Coffee'}
{'class': 'drink hot', 'content': 'Green Tea'}
{'class': 'hot drink', 'content': 'Black Tea'}
{'class': 'drink cold', 'content': 'Milk'}
{'class': 'food', 'content': 'Chocolate'}
{'class': 'food', 'content': 'Marshmallow'}


Now we're talking. We have retrieved each ```<li>``` element as a separate item through the stream we created.

As mentioned above, we could've retrieved a specific ```<li>``` element by specifying its index on the XPath; adding '`[1]`' to the end of the XPath above will return:
```python
{'class': 'drink hot', 'content': 'Coffee'}
```

Let's say we are only interested in the drinks. How do we only get the drinks? Do we check for and do some weird string matching with the ```class``` of each element *while* iterating over the stream elements and only add ones that match our criteria? Nope!

The point of having streams and pipes is to *filter* the streams and prevent the unwanted objects from going through the stream in the first place. Riko has a way to filter streams, by using the very handy [```filter```](https://github.com/nerevu/riko/blob/master/riko/modules/filter.py) pipe module. The gist of thinking in Riko's terms is to think of chaining pipes together. The first pipe will be carrying a flow of the ```<li>``` elements we pointed to. The second pipe, the ```filter``` pipe, will only let through elements that match certain criteria:

In [10]:
# The URL of our test website
url = get_test_site_url('test1.html')               

# The XPath of the <li> element(s)
xpath = '/html/body/div/ul/li'                      

# The XPath configuration dictionary for riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside 'test1.html'
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  

# A 'filter' rule that tells the 'filter'
# pipe to perform the 'contains' operation on the 'class'
# field, to check whether the value 'drink' exists, and
# only let through the items that do match the rule
filter_rule = {'field': 'class', 'op': 'contains', 'value': 'drink'}

# The 'filter' pipe configuration created from the rule
filter_conf = {'rule': filter_rule}                 

# A chained pipe that filters according to the configuration
pipe = pipe.filter(conf=filter_conf)                

# The stream being output from the pipe
stream = pipe.output                                
    
for item in stream:
    print(item)

{'class': 'drink hot', 'content': 'Coffee'}
{'class': 'drink hot', 'content': 'Green Tea'}
{'class': 'hot drink', 'content': 'Black Tea'}
{'class': 'drink cold', 'content': 'Milk'}


This is getting really cool. We seem to have retrieved all the drinks, and only the drinks! A similar operation can be performed to only retrieve the hot drinks:

In [11]:
# The URL of our test website
url = get_test_site_url('test1.html')               

# The XPath of the <li> element(s)
xpath = '/html/body/div/ul/li'                      

# The XPath configuration dictionary for riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside 'test1.html'
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  
                                                    
# A 'filter' rule that tells the 'filter'
# pipe to perform the 'contains' operation on the 'class'
# field, to check whether the value 'drink hot' exists, and
# only let through the items that do match the rule
filter_rule = {'field': 'class', 'op': 'contains', 'value': 'drink hot'}

# The 'filter' pipe configuration created from the rule
filter_conf = {'rule': filter_rule}                 

# A chained pipe that filters according to the configuration
pipe = pipe.filter(conf=filter_conf)                

# The stream being output from the pipe
stream = pipe.output                                
    
for item in stream:
    print(item)

{'class': 'drink hot', 'content': 'Coffee'}
{'class': 'drink hot', 'content': 'Green Tea'}


Wow, this is even cooler. But it seems like we have a problem: because the '```value```' key in the rule above has the value '```drink hot```', the rule doesn't match the ```<li>``` element with the class '```hot drink```', which is perfectly valid and equivalent to the class '```drink hot```'. It looks like having a long, more specific value can get pretty unwieldy. It seems to me like it would make more sense if we could apply *multiple* shorter, more general rules to the ```filter``` pipe:

In [12]:
# The URL of our test website
url = get_test_site_url('test1.html')               

# The XPath of the <li> element(s)
xpath = '/html/body/div/ul/li'                      

# The XPath configuration dictionary for riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside 'test1.html'
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  
                                                    
# A 'filter' rule that tells the 'filter'
# pipe to perform the 'contains' operation on the 'class'
# field, to check whether the value 'drink' exists, and
# only let through the items that do match the rule
filter_rule_drink = { 'field': 'class', 'op': 'contains', 'value': 'drink'}

# A 'filter' rule that tells the 'filter'
# pipe to perform the 'contains' operation on the 'class'
# field, to check whether the value 'hot' exists, and
# only let through the items that do match the rule
filter_rule_hot = {'field': 'class', 'op': 'contains', 'value': 'hot'}

# The 'filter' pipe configuration created from the two
# rules specified above
filter_conf = {'rule': [filter_rule_drink, filter_rule_hot]}

# A chained pipe that filters according to the configuration
pipe = pipe.filter(conf=filter_conf)                

# The stream being output from the pipe
stream = pipe.output                                
    
for item in stream:
    print(item)

{'class': 'drink hot', 'content': 'Coffee'}
{'class': 'drink hot', 'content': 'Green Tea'}
{'class': 'hot drink', 'content': 'Black Tea'}


Have you heard? They're saying you're the coolest kid on the block. It seems to be pretty clear how you can apply different filters to get the elements you want. You can use the filter pipe to filter based on the content as well, to, for example, print only the teas:

```python
filter_rule = {          # A 'filter' rule that tells the 'filter'
    'field': 'content',  # pipe to perform the 'contains' operation on the 'content'
    'op': 'contains',    # field, to check whether the value 'tea' exists, and
    'value': 'tea'       # only let through the items that do match the rule
}
```
which, when used in the ways above, would print:
```python
{'class': 'drink hot', 'content': 'Green Tea'}
{'class': 'hot drink', 'content': 'Black Tea'}
```
Notice that the rule was applied case-insensitively: the value '```tea```' matched '```Tea```'.
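
If it helps to picture what these rules do, here is a plain-Python sketch of the behavior we observed above: every rule has to match, and the comparison ignores case. The ```matches``` and ```keep``` helpers are hypothetical names of my own and only handle the ```contains``` operation; this is an analogy for the observed output, not riko's actual implementation:

```python
# The items our XPath stream produced for 'test1.html'
items = [
    {'class': 'drink hot', 'content': 'Coffee'},
    {'class': 'drink hot', 'content': 'Green Tea'},
    {'class': 'hot drink', 'content': 'Black Tea'},
    {'class': 'drink cold', 'content': 'Milk'},
    {'class': 'food', 'content': 'Chocolate'},
    {'class': 'food', 'content': 'Marshmallow'},
]


def matches(item, rule):
    """Hypothetical helper: case-insensitive 'contains' check for one rule."""
    return rule['value'].lower() in item.get(rule['field'], '').lower()


def keep(item, rules):
    """Hypothetical helper: an item passes only if every rule matches."""
    return all(matches(item, rule) for rule in rules)


hot_drink_rules = [
    {'field': 'class', 'op': 'contains', 'value': 'drink'},
    {'field': 'class', 'op': 'contains', 'value': 'hot'},
]
tea_rules = [{'field': 'content', 'op': 'contains', 'value': 'tea'}]

hot_drinks = [item['content'] for item in items if keep(item, hot_drink_rules)]
teas = [item['content'] for item in items if keep(item, tea_rules)]

print(hot_drinks)
print(teas)
```

The difference with riko is where the work happens: the ```filter``` pipe drops unwanted items inside the stream, so your loop never sees them.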
***
Through all of these streams, you can use the items, which are plain old Python objects, in any way you want. You can go ahead and print the list of hot drinks you have with the following for-loop:

```python
for item in stream:
    print(item['content'])  # 'item' object is a regular Python dictionary
```

which would print:

```
Coffee
Green Tea
Black Tea
```
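
Since the items are ordinary dictionaries, any further processing is just regular Python. As a small sketch, here is one way to group the drinks by temperature; the grouping scheme is my own illustration, and the list below simply repeats the filtered drink items from earlier. Splitting the ```class``` value into individual words also sidesteps the '```drink hot```' versus '```hot drink```' ordering problem:

```python
from collections import defaultdict

# The drink items from the filtered stream above
items = [
    {'class': 'drink hot', 'content': 'Coffee'},
    {'class': 'drink hot', 'content': 'Green Tea'},
    {'class': 'hot drink', 'content': 'Black Tea'},
    {'class': 'drink cold', 'content': 'Milk'},
]

by_temperature = defaultdict(list)

for item in items:
    # Treat the class attribute as a set of words, so order doesn't matter
    class_names = set(item['class'].split())
    class_names.discard('drink')  # What remains is 'hot' or 'cold'

    for class_name in sorted(class_names):
        by_temperature[class_name].append(item['content'])

print(dict(by_temperature))
```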

***
Let's look at the following, more complicated webpage structure, which is `test2.html`:

```html
<!DOCTYPE html>
<html>
<body>

<h4>This is a slightly more complex example</h4>
<div class="container">
    <ul>
      <li class="drink hot">Coffee</li>
      <li class="drink hot">Green Tea
          <p>Oolong Tea</p>
          <a href="https://en.wikipedia.org/wiki/Oolong"></a>
      </li>
      <li class="hot drink">Black Tea
          <p>Rize Tea</p>
          <a href="https://en.wikipedia.org/wiki/Rize_Tea"></a>
      </li>
      <li class="drink cold">Milk</li>
      <li class="food">Chocolate</li>
      <li class="food">Marshmallow</li>
    </ul>
</div>

</body>
</html>
```

How would we access the URLs nested under the teas in the list? If you thought of 'XPath', you can congratulate yourself:

In [13]:
# The URL of our test website
url = get_test_site_url('test2.html')               

# The XPath of the <a> element(s)
xpath = '/html/body/div/ul/li/a'                    

# The XPath configuration dictionary for Riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside 'test2.html'
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  

# The stream being output from the pipe
stream = pipe.output                                
    
for item in stream:
    print(item['href'])

https://en.wikipedia.org/wiki/Oolong
https://en.wikipedia.org/wiki/Rize_Tea


It seems like we got both of the URLs. Notice how Riko didn't raise an error for ```<li>``` tags that lack ```<a>``` tags underneath them. This is because the XPath only matches those that do have the ```<a>``` tags. This is very handy for unstructured web data, where some tags might have nested elements, while some might not.
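
The same matching behavior can be seen outside of riko as well. Here is a sketch using Python's built-in ```xml.etree.ElementTree``` on a shortened, well-formed version of ```test2.html``` (again, its paths are written relative to the root ```<html>``` element, which is an ElementTree convention rather than anything riko-specific):

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed version of 'test2.html'
html = '''
<html>
    <body>
        <div class="container">
            <ul>
                <li class="drink hot">Coffee</li>
                <li class="drink hot">Green Tea
                    <p>Oolong Tea</p>
                    <a href="https://en.wikipedia.org/wiki/Oolong"></a>
                </li>
                <li class="hot drink">Black Tea
                    <p>Rize Tea</p>
                    <a href="https://en.wikipedia.org/wiki/Rize_Tea"></a>
                </li>
                <li class="drink cold">Milk</li>
            </ul>
        </div>
    </body>
</html>
'''.strip()

root = ET.fromstring(html)  # 'root' is the <html> element

# Only <li> elements that contain an <a> contribute a match
links = [a.get('href') for a in root.findall('./body/div/ul/li/a')]

# Pair each link with the text of the <li> it came from
pairs = [
    (li.text.strip(), li.find('a').get('href'))
    for li in root.findall('./body/div/ul/li')
    if li.find('a') is not None
]

print(links)
print(pairs)
```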
***
Finally, let's apply what we learned to a real-world example: a prominent Turkish writer by the name of 'Yılmaz Özdil' publishes an article every day in the newspaper 'Sözcü', talking about the current affairs of Turkey. The newspaper lists his articles under the URL:

In [14]:
url = 'http://www.sozcu.com.tr/kategori/yazarlar/yilmaz-ozdil/'

On this page, you can see a list of article titles (that link to the articles themselves), along with the date each one was published. The XPath of the list elements is:

In [15]:
xpath = '/html/body/div[5]/div[6]/div[3]/div[1]/div[2]/div[1]/div[1]/div[2]/ul/li/a'

Let's go ahead and set up a pipe to fetch these list entries:

In [17]:
# The XPath configuration dictionary for Riko
xpath_conf = {'xpath': xpath, 'url': url}           

# A pipe that streams what's pointed by the XPath inside the web page
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)  

# truncate will allow us to get only 
# the first n elements, which is 3 in this case
truncated = pipe.truncate(conf={'count': 3})

# The stream being output from the truncated pipe
stream = truncated.output                                
    
for item in stream:
    print(item)
    print()

{'title': 'Amok', 'href': 'http://www.sozcu.com.tr/2016/yazarlar/yilmaz-ozdil/amok-1491254/', '{http://www': {'w3': {'org/1999/xhtml}span': {'class': 'date', 'content': '6 Kasım 2016'}, 'org/1999/xhtml}p': 'Amok'}}}

{'title': 'hdp', 'href': 'http://www.sozcu.com.tr/2016/yazarlar/yilmaz-ozdil/hdp-1489785/', '{http://www': {'w3': {'org/1999/xhtml}span': {'class': 'date', 'content': '5 Kasım 2016'}, 'org/1999/xhtml}p': 'hdp'}}}

{'title': 'Darbe komisyonu', 'href': 'http://www.sozcu.com.tr/2016/yazarlar/yilmaz-ozdil/darbe-komisyonu-1486993/', '{http://www': {'w3': {'org/1999/xhtml}span': {'class': 'date', 'content': '4 Kasım 2016'}, 'org/1999/xhtml}p': 'Darbe komisyonu'}}}



It seems like we retrieved the first 3 articles (the exact articles you retrieve will be different when run on a different day). Notice that each element has a title, a URL and a date. Let's say that we want to parse all of this and return it as a list of tuples, where each entry is of the form `(title, date, url)`.

We can do this by iterating through each of those dictionaries and pulling out the fields we want. The title and the URL sit at the top level of each item, while the date lives under the nested ```<span>``` element, so we reach into that entry for its ```content```:

In [36]:
# Top-level <a> elements stream
# The XPath config. for the top-level <a> elements
xpath_conf = {'xpath': xpath, 'url': url}                   

# A pipe that streams the top-level <a> elements 
pipe = SyncPipe('xpathfetchpage', conf=xpath_conf)

# truncate will allow us to get only 
# the first n elements, which is 3 in this case
truncated = pipe.truncate(conf={'count': 3})            

# The stream being output from the pipe
stream = truncated.output                                    
  
for item in stream:  
    date = item.get('{http://www.w3.org/1999/xhtml}span')['content']
    article = (item['title'], date, item['href'])
    print(article)
    print()

('Amok', '6 Kasım 2016', 'http://www.sozcu.com.tr/2016/yazarlar/yilmaz-ozdil/amok-1491254/')

('hdp', '5 Kasım 2016', 'http://www.sozcu.com.tr/2016/yazarlar/yilmaz-ozdil/hdp-1489785/')

('Darbe komisyonu', '4 Kasım 2016', 'http://www.sozcu.com.tr/2016/yazarlar/yilmaz-ozdil/darbe-komisyonu-1486993/')



This is awesome! You can go to the website and see the list elements for yourself. For a website that is mostly auto-generated (disastrously, might I say), this was relatively easy to achieve!

Let's look at one last example: let's fetch this list, follow each link it contains, and retrieve the full text of each article:

In [45]:
# Article list elements stream
# The XPath configuration for the article list
xpath_conf_list = {'xpath': xpath, 'url': url}

# A pipe that streams the article list elements
pipe_list = SyncPipe('xpathfetchpage', conf=xpath_conf_list)

# The stream being output from the pipe
stream_list = pipe_list.output

# The article stream
# XPath of article body
xpath_article = '/html/body/div[5]/div[6]/div[3]/div/div[2]/div[1]/div/div[2]/div[2]'

# The XPath configuration for the articles
# Notice how we can refer to a 'subkey' as the URL of this configuration
xpath_conf_article = {'url': {'subkey': 'href'}, 'xpath': xpath_article}

# A pipe that streams the articles linked to by the list stream
# Notice how we create this pipe by chaining a pipe on top of the list pipe;
# how this one is 'dependent' on the list pipe
pipe_article = pipe_list.xpathfetchpage(conf=xpath_conf_article)

# truncate will allow us to get only 
# the first n elements, which is 3 in this case
truncated_article = pipe_article.truncate(conf={'count': 3})   

# The stream being output from the article pipe
stream_article = truncated_article.output                           

# Create a zipped iterator from the two pipes
for list_item, article in zip(stream_list, stream_article):
    # Get the list of <p> elements under this XPath    
    elements = article.get('{http://www.w3.org/1999/xhtml}p')

    # Grab only the strings under the <p> elements
    string_els = (el for el in elements if not hasattr(el, 'values'))

    # Join strings to create the whole article
    article_body = '\n'.join(string_els)

    # Print the article's title, underlined, followed by its body
    title = list_item['title']
    print(title)
    print('=' * len(title))
    print(article_body)  
    print()

Amok
====
Durup dururken…
*
Amok koşusu bu.
*
Malezya'da görülen bir tür çıldırma hali… Malezya'ya özgü tarihsel ve kültürel unsurlardan kaynaklanıyor. Manasız intihar saldırısı da deniyor. Kişi önce derin bunalıma sürükleniyor, kafasının içinde kurmaya başlıyor, gerçekte öyle olmadığı halde, kendisine yönelik gizli planlar yapıldığını, kendisini aşağıladıklarını, kendisine hakaret edildiğini düşünüyor, biriktiriyor biriktiriyor, ansızın patlıyor, güya kendini koruyor, vahşice, ayrım gözetmeden saldırıya geçiyor. İçinde yaşadığı topluma hayati zararlar vermesine rağmen, kendine geldiğinde ne yaptığını hatırlamıyor, palasından kan süzülürken, şaşkınlıkla “burda ne oldu böyle?” diye soruyor.
*
Bi kaç sene öncesine kadar Türkiye'nin en hararetli tartışma konularından biri “Malezya mı oluyoruz?” sorusuydu.
*
Henüz Malezya olmadık ama…

hdp
===
Neymiş efendim, hdp milletvekilleri terörle arasına mesafe koymamış filan… Geçiniz efendim!
*
Zor yazıdır bu.
*
Zoruma giden yazıdır.
*
hdp'ye oy ve

We now have a script to fetch the articles and read them easily, without needing to go to the website. Can this be the most effective ad blocker?
***
The power of Riko and its pipes may not be immediately visible through parsing just a website but as you explore different options, you can appreciate the power it gives you, the developer, over the mess that is HTML and the World Wide Web. 