---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 2.4</h1>

<a href="https://colab.research.google.com/github/arifpucit/data-science/blob/master/Section-2-Basics-of-Python-Programming/Lec-2.04-String-Data-Type/Python-Strings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## _Python-Strings.ipynb_
#### [Learn more about Python Strings](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)

<img align="right" width="400" height="800"  src="images/datatypes1.png" > 

## Learning agenda of this notebook
A string is an object type in Python that is used to record textual information. The string data type is a sub-type of a broader category of object types called sequences. A sequence is an object that contains components placed one after the other, where each component is given a numeric index that identifies it and its position within the whole sequence. As Python uses zero-based indexing, the first component of any sequence always has index zero. A string is defined using opening and closing delimiters, which are either single or double quotes.



1. Defining strings in Python
2. Accessing characters of a string in Python
3. Strings are immutable
4. Slicing strings
5. String concatenation
6. Creating large strings

7. String Methods: `lower()`, `upper()`, `strip()`, `startswith()`, `split()`, `join()`, `find()`, `replace()`, `format()`
8. String Membership test

## 1. Defining Strings in Python
- A string is a collection of character(s) enclosed within single or double quotation marks. (Unlike C/C++, Python has no `char` data type.)
- A string can also contain a single character or be entirely empty.
- To make a single quote part of a string, define the string using double quotes, and vice versa. You can also use an escape sequence.

In [1]:
string1 = 'Hello'
print(string1)

string2 = "World"
print(string2)

string3 = ""
print(string3)

string4 = "A"
print(string4)

Hello
World

A


In [2]:
# A triple-quoted string can span multiple lines

string5 = """Hello, This is
            multi-line string"""
print(string5)

string5 = '''Hello, This is
            multi-line string'''
print(string5)

Hello, This is
            multi-line string
Hello, This is
            multi-line string


In [None]:
# Be careful with quotes!
'I'm using single quotes, but will create an error'

The error above occurs because the single quote in `I'm` ends the string early. You can combine double and single quotes to keep the complete statement inside one string.

In [3]:
"Now I'm ready to use the single quotes inside a string!"

"Now I'm ready to use the single quotes inside a string!"

Escape characters: if you need several quotes inside a string, you can escape each one with a backslash, e.g. `\'` or `\"`.
Just make sure that the quote closing the string is not escaped.

In [None]:
'Now I\'m ready to use the single quotes inside a string!'
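Besides `\'`, a few other escape sequences come up often. The snippet below is a small illustrative sketch; the variable names and sample strings are chosen here for illustration and are not part of the original notebook.

```python
# Common escape sequences inside string literals (illustrative values)
quote = 'It\'s a "quoted" word'          # \' keeps a single quote inside a single-quoted string
two_lines = "first line\nsecond line"    # \n starts a new line
tabbed = "name\tage"                     # \t inserts a tab
backslash = "one\\two"                   # \\ produces one literal backslash

print(quote)
print(two_lines)
print(tabbed)
print(backslash)
print(len("\n"))   # 1: an escape sequence counts as a single character
```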

## 2. Accessing Characters of a String in Python
- A string is a sequence, and any component within a sequence can be accessed by entering its index within square brackets, so this naturally works for strings as well.
- Similarly, if we want to find out the index of a specific item/character, we can use the `str.index()` method.

In [4]:
# Note: the variable is named `text` rather than `str`, so that the built-in str type is not shadowed
text = 'Python Programming is fun'
print('text = ', text)

# access the first character (index 0)
print('text[0] = ', text[0])

# Negative indices start from the opposite end of the string. Hence, index -1 corresponds to the last character
print('text[-1] = ', text[-1])

# access the second last character
print('text[-2] = ', text[-2])

#print(text[25])     # accessing an index out of range raises IndexError

#print(text[1.5])    # using a number other than an integer as index raises TypeError

text =  Python Programming is fun
text[0] =  P
text[-1] =  n
text[-2] =  u


In [None]:
# To find out the index at which a specific character or substring first occurs
text = "Python Programming is fun"
text.index('th')
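As a hedged illustration of how `index()` behaves, the sketch below shows the optional start argument and what happens when the substring is absent. The variable name `text` is an assumption made for this example.

```python
text = "Python Programming is fun"

print(text.index('th'))      # 2: 'th' starts at index 2
print(text.index('P'))       # 0: the first of the two 'P' characters
print(text.index('P', 1))    # 7: the search starts at index 1, so the first 'P' is skipped

try:
    text.index('xyz')        # the substring is not present
except ValueError as err:
    print('ValueError:', err)
```

If a missing substring should not raise an exception, `str.find()` (covered later in this notebook) returns -1 instead.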

In [None]:
dir()

## 3. Strings are Immutable

In [4]:
help(str)

In [None]:
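# Strings are immutable, i.e., a string object does not support item assignment
str1 = 'ArifButt'

#str1[5] = 'c'     # uncommenting this line raises TypeError

print(id(str1))

# Assigning a new string to the variable is valid; the name now refers to a different object
str1 = 'python'

print(id(str1))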

The object `'ArifButt'` is now orphaned, since no variable refers to it any more, and it becomes eligible for collection by Python's garbage collector.
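Since item assignment is not allowed, a "modified" string has to be built as a new object. The following is a minimal sketch of one way to do that with slicing and concatenation; the names `name` and `changed` are introduced here only for illustration.

```python
name = 'ArifButt'

try:
    name[5] = 'c'                 # item assignment is not supported by str
except TypeError as err:
    print('TypeError:', err)

# Build a new string from slices of the old one instead
changed = name[:5] + 'c' + name[6:]
print(changed)                    # ArifBctt
print(name)                       # ArifButt (the original string is unchanged)
```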

## 4. Slicing Strings
- Slicing is the process of obtaining a portion (substring) of a string by using its indices.
- Given a string, we can use the following syntax to slice it and obtain a substring:
```
string[start:end]
```

- **start** is the index from where we want the substring to start. If start is not provided, slicing starts from the beginning.
- **end** is the index where we want our substring to end (not inclusive in the substring). If end is not provided, slicing goes till the end of the string (includes the last character of the string).

In [22]:
str1 = 'DataScienceToolsAndTechniques'

print(str1[0:4]) # From index 0 up to, but not including, index 4
print(str1[:4]) # Same as above, since start defaults to 0
print(str1[11:16])
print(str1[19:]) # From index 19 till the end
print(str1[19:len(str1)]) # From index 19 till the end
# If start is greater than end (with the default step), an empty string is returned
print(str1[5:2])

Data
Data
Tools
Techniques
Techniques



### a. Slicing with a Step 
- In the above example, we've used slicing to obtain a contiguous piece of a string, i.e., all the characters from the starting index to before the ending index are retrieved.
- However, we can define a step through which we can skip characters in the string. The default step is 1, so we iterate through the string one character at a time.
- The step is defined after the end index:
```
string[start:end:step]
```

In [None]:
str1 = 'DataScienceToolsAndTechniques'
print(str1[::])  # A default step of 1
print(str1[::1])  # A step of 1
print(str1[::2])  # A step of 2

### b. Reverse Slicing
- Strings can also be sliced to return a reversed substring.
- For reverse slicing, we need to give a negative step.
- For reverse slicing, the `start` index must be greater than the `end` index, otherwise an empty string is returned.

In [23]:
str1 = '0123456789'
print(str1[::-1]) 
print(str1[5:1:-1]) 
print(str1[2:10:-1])
print(str1[::-2]) 

9876543210
5432

97531


## 5. String Concatenation
- Two strings can be joined or concatenated using the `+` operator

In [None]:
str1 = 'Hello'
str2 =' World!'
str3 = str1 + str2
print('str1 + str2 = ', str3)


print("Y" + str3[1:])

## 6. Creating Large strings
- A string can be replicated/repeated using the `*` operator

In [6]:
str1 = 'Hello'
print('str1 * 5 =', str1 * 5)

buffer = 'A' * 100
print(buffer)

str1 * 5 = HelloHelloHelloHelloHello
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


In [None]:
"Arif"*100

## 7. String Methods
- Objects in Python usually have built-in **methods**. These methods are functions defined in the class that can perform actions on the object of that class. 
- To keep it simple, methods are actions that are performed on an object of a class, while functions are not bound to any particular object; they are defined on their own and receive their data as arguments.
- Methods will perform specific actions on the object and can also take arguments. We call methods with a period and then the method name. Basically, we say `"Hey object, do this to these arguments"`. The syntax to call methods is:
```
object.method(arg1, arg2, ...)
```
- Here `arg1`, `arg2`, ... are the arguments we pass to the method.
- Remember, string methods do not modify the string object on which they are called; they return a new object (usually a new string) that reflects the requested operation.
- Let me re-emphasize: because strings are immutable, all string methods return new values and DO NOT change the existing string.
- [Click me to learn more about string methods](https://www.w3schools.com/python/python_ref_string.asp)
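The difference between calling a function and calling a method, and the fact that a string method leaves the original untouched, can be sketched as follows. The variable names `greeting` and `shouted` are chosen here for illustration only.

```python
greeting = 'hello'

# len() is a function: the object is passed to it as an argument
print(len(greeting))          # 5

# upper() is a method: it is called on the object using a dot
shouted = greeting.upper()
print(shouted)                # HELLO
print(greeting)               # hello (the original string is unchanged)

# To "update" a variable, rebind its name to the returned string
greeting = greeting.upper()
print(greeting)               # HELLO
```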

### a. The `len()` function and the `str.lower()`, `str.upper()` and `str.capitalize()` methods
- The `len()` is a built-in function (not a method) that returns the number of items in the container passed as its argument. It works on any collection data type, including strings.
- The `str.lower()` method returns a copy of the string converted to lowercase.
- The `str.upper()` method returns a copy of the string converted to uppercase.
- The `str.capitalize()` method returns a copy of the string with its first character capitalized and the rest lowercased.

In [None]:
help(len)

In [24]:
str1 = "Hello World"
mylist = [1, 2, 3, 4, 5]
print(len(str1))    # number of characters in the string
print(len(mylist))  # number of items in the list

11
5

- To find out which methods a string object supports, type the name of the object, place a dot, and then press `<tab>` to view the list of its attributes and methods.
`str.<tab>`
- Similarly, to get help about a method, press `<shift+tab>` after the method name to see what the method does, what parameters it takes, and what it returns.
`str.lower<shift+tab>`

In [None]:
str1 = "DS"
# Type `str1.` and press <tab> to list the attributes and methods of the string object

In [25]:
str1 = 'LearNing is Fun with Arif'
print('Original string = ', str1)

rv = len(str1)
print('len(str1) = ', rv)

rv = str1.lower()
print('str1.lower() = ', rv)

print('str1.upper() = ', str1.upper())

rv = str1.capitalize()
print('str1.capitalize() = ', rv)
print('Original string = ', str1)


Original string =  LearNing is Fun with Arif
len(str1) =  25
str1.lower() =  learning is fun with arif
str1.upper() =  LEARNING IS FUN WITH ARIF
str1.capitalize() =  Learning is fun with arif
Original string =  LearNing is Fun with Arif


### b. The `str.strip()` method
- The `str.strip()` method removes whitespace characters from the beginning and end of a string.
- The `str.lstrip()` method removes whitespace characters from the beginning of a string.
- The `str.rstrip()` method removes whitespace characters from the end of a string.


In [None]:
str1="DS"
help(str1.strip)

In [26]:
buffer ="    hello world, this is       Arif Butt      "
rv = buffer.strip()
print(buffer)
print(rv)

    hello world, this is       Arif Butt      
hello world, this is       Arif Butt


In [27]:
buffer
buffer.lstrip()

'hello world, this is       Arif Butt      '

In [28]:
buffer
buffer.rstrip()

'    hello world, this is       Arif Butt'
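The three strip methods also accept an optional argument naming the characters to remove instead of whitespace. The sketch below uses made-up sample strings to illustrate this; note that the argument is treated as a set of characters, not as a prefix or suffix.

```python
raw = '***Data Science***'    # illustrative sample string

print(raw.strip('*'))         # Data Science
print(raw.lstrip('*'))        # Data Science***
print(raw.rstrip('*'))        # ***Data Science

# Every leading/trailing character that appears in 'xy' is removed
print('xyxHelloyx'.strip('xy'))   # Hello
```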

### c. The `str.startswith()` method
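The `str.startswith()` method returns `True` if the string starts with the specified prefix, and `False` otherwise. The optional `start` and `end` arguments restrict the check to that slice of the string.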
```
str.startswith(prefix[, start[, end]])
```

In [None]:
str1="DS"
help(str1.startswith)

In [29]:
str1 = "Learning is fun with Arif Butt"

rv = str1.startswith('Learning')
print(rv)

rv = str1.startswith('Arif')
print(rv)


rv = str1.startswith('Arif', 21)
print(rv)

# case sensitive
rv = str1.startswith('arif', 21)
print(rv)

rv = str1.startswith('arn', 2, 10)  # character at ending index is not included
print(rv)


True
False
True
False
True
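Two related features are worth a quick look: `startswith()` also accepts a tuple of prefixes, and `str.endswith()` performs the same kind of check at the end of a string. The following sketch uses this notebook's file name as sample data; the variable name `filename` is an assumption made for the example.

```python
filename = 'Python-Strings.ipynb'

print(filename.endswith('.ipynb'))             # True
print(filename.startswith(('Py', 'py')))       # True: any prefix in the tuple may match
print(filename.lower().startswith('python'))   # True: lowercasing first gives a case-insensitive check
print(filename.startswith('Strings'))          # False
```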


### d. The `str.split()` and `str.join()` methods
- The `str.split()` method splits a string into a list of strings. By default it splits at whitespace; you may pass a parameter such as `sep='i'` to split at that specific separator instead.
- The `sep.join(iterable)` method is called on a separator string and is passed a list (or any other iterable) of strings. It joins those strings into a single string, placing the separator between them, and returns the result.

In [None]:
str1="DS"
help(str1.split)

In [7]:
str1 = 'Learning is fun with Arif Butt'
rv = str1.split()
print(rv)
print(type(rv))

['Learning', 'is', 'fun', 'with', 'Arif', 'Butt']
<class 'list'>


In [8]:
str1 = 'Learning is fun with Arif Butt'
rv = str1.split(sep='i')
print(rv)
print(type(rv))

['Learn', 'ng ', 's fun w', 'th Ar', 'f Butt']
<class 'list'>


In [30]:
str1="DS"
help(str1.join)

Help on built-in function join:

join(iterable, /) method of builtins.str instance
    Concatenate any number of strings.
    
    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.
    
    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'



In [31]:
mystr = "L e a r n i n g"
mystr.split(' ')

['L', 'e', 'a', 'r', 'n', 'i', 'n', 'g']

In [9]:
# The join() method takes all items in an iterable and joins them into one string.
mylist = ['Learning', 'is', 'fun', 'with', 'Arif']

#Note the separator is space character
mystr = ' '.join(mylist)

print(mylist, type(mylist))
print(mystr, type(mystr))

['Learning', 'is', 'fun', 'with', 'Arif'] <class 'list'>
Learning is fun with Arif <class 'str'>


In [10]:
# You can call the join() method on a separator of your choice
mylist = ['Arif', 'Rauf', 'Maaz', 'Hadeed', 'Mujahid', 'Mohid']

#Note the separator is hash character
mystr = '#'.join(mylist)

print(mylist, type(mylist))
print(mystr, type(mystr))

['Arif', 'Rauf', 'Maaz', 'Hadeed', 'Mujahid', 'Mohid'] <class 'list'>
Arif#Rauf#Maaz#Hadeed#Mujahid#Mohid <class 'str'>
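Because `split()` and `join()` are inverse operations for a fixed separator, a round trip should give back the original string. The sketch below illustrates this, along with the difference between `split()` and `split(' ')`; the variable names are chosen here for illustration.

```python
days = 'Sun,Mon,Tue,Wed,Thu,Fri,Sat'

day_list = days.split(',')       # split at every comma
print(day_list)
print(len(day_list))             # 7

rejoined = ','.join(day_list)    # join back using the same separator
print(rejoined == days)          # True

# split() with no argument collapses runs of whitespace; split(' ') does not
spaced = 'a  b'                  # two spaces between the letters
print(spaced.split())            # ['a', 'b']
print(spaced.split(' '))         # ['a', '', 'b']
```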


### e. The `str.find()` method
- The `str.find()` method searches for a substring within a string and returns the first index at which the substring occurs. If no occurrence of the substring is found, the method returns -1.
```
str.find(substring, start, end)
```
where
- `substring` is what we are searching for,
- `start` is the index from where we want to start searching (default value is 0),
- `end` is the index before which the search stops (default value is `len(str)`, i.e., the search runs to the end of the string).

In [32]:
str1 = 'DataScienceToolsAndTechniques'
print(str1.find('Data'))
print(str1.find('And'))


print(str1.find('S',2)) # second argument starts searching from that index
print(str1.find('s',2)) # case sensitive

print(str1.find('S',0, 4)) # third argument stops the search before that index
print(str1.find('S',0, 5)) 

0
16
4
15
-1
4
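Since `find()` returns only the first match, locating every occurrence means calling it repeatedly with an advancing `start` index. The helper `find_all` below is a hypothetical function written for this illustration (it is not a built-in string method), and it reports non-overlapping matches only.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub in text."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # continue searching just past the match that was found
        start = text.find(sub, start + len(sub))
    return positions

str1 = 'DataScienceToolsAndTechniques'
print(find_all(str1, 'e'))      # [7, 10, 20, 27]
print(find_all(str1, 'xyz'))    # []
```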


### f. Use `str.replace()` method to replace a substring
- The `str.replace()` method returns a string after replacing all occurrences of `substring_to_be_replaced` with `new_string`. If `count` is given, only the first `count` occurrences are replaced.
```
str.replace(substring_to_be_replaced, new_string, count = -1)
```
- Note that `replace` returns a new string, and the original string is not modified.

In [None]:
str1="DS"
help(str1.replace)

In [None]:
print("hello".replace("e","a"))

In [33]:
str1 = 'Welcome to Learning Data Science with Arif'
newstring = str1.replace('Data Science', 'Life')
print(str1)
print(newstring)

Welcome to Learning Data Science with Arif
Welcome to Learning Life with Arif
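The optional `count` argument from the syntax above is not exercised in the cells so far. A minimal sketch, using a made-up sample sentence, might look like this:

```python
text = 'one fish, two fish, red fish'    # illustrative sample string

print(text.replace('fish', 'cat'))       # one cat, two cat, red cat
print(text.replace('fish', 'cat', 1))    # one cat, two fish, red fish
print(text.replace('bird', 'cat'))       # no match, so the text comes back unchanged
print(text)                              # the original string is never modified
```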


### g. The `str.format()` method
- The `str.format()` method combines values of other data types, e.g., integers, floats, booleans, lists, etc. with strings.
- You can use `str.format()` to construct output messages for display with the Python built-in `print()` function.
- You put placeholders `{}` within the format string, and pass the values (usually variables) as arguments to the `str.format()` method.
- Each placeholder is replaced with the value of the corresponding argument provided to the `str.format()` method.

In [None]:
#Example 1:
age = 51
name = "Arif Butt"

print("Mr. {}, you are {} years old.".format(name, age))

In [None]:
#Example 2:
name = "Hadeed Butt"
cost = 100
discount = .2
bill = cost - cost * discount

# The numbers inside the braces are the positions of the arguments passed to format()
print("Mr. {2}, your total cost is {1}, discount rate is {3}, and bill is {0}"
      .format(bill, cost, name, discount))
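Placeholders can be filled by position, by index, or by keyword, and can carry a format specification after a colon. The sketch below reuses the values from Example 2; the keyword names `who` and `amount` are chosen here purely for illustration.

```python
name = 'Hadeed Butt'
cost = 100
discount = 0.2
bill = cost - cost * discount

# Empty braces are filled in order
print('{} owes {}'.format(name, bill))                        # Hadeed Butt owes 80.0

# Numbered braces refer to argument positions
print('{1} owes {0}'.format(bill, name))                      # Hadeed Butt owes 80.0

# Named braces refer to keyword arguments
print('{who} owes {amount}'.format(who=name, amount=bill))    # Hadeed Butt owes 80.0

# A format spec after ':' controls the presentation of the value
print('Bill: {:.2f}, discount: {:.0%}'.format(bill, discount))   # Bill: 80.00, discount: 20%
```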


### Comparing two strings using `is` operator and `==` operator

In [None]:
# Let us check the IDs of the following two variables. As with small integers,
# they are the same here, as both a and b refer to the same object containing the string 'hello'
a = 'hello'
b = 'hello'
id(a), id(b)

In [None]:
# In this case, both a and b refer to the same object containing the string 'hello'
a = 'hello'
b = 'hello'

# The `is` operator checks whether two names refer to the same object (identity)
print (a is b) 
# The `==` operator checks whether the contents of two strings are equal
print (a == b) 


print(a is not b)
print (a != b)

In [None]:
# x and y refer to two different objects containing different strings, 'hello' and 'bye'
x = 'hello'
y = 'bye'

# The `is` operator checks whether two names refer to the same object (identity)
print (x is y) 
# The `==` operator checks whether the contents of two strings are equal
print (x == y) 


print(x is not y)
print (x != y)
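Two strings with equal contents are not guaranteed to be the same object; whether identical strings share one object is an implementation detail of the interpreter. The sketch below builds one string at run time to illustrate why `==` is the operator to use for comparing contents. The result of `is` is deliberately not stated, since it may vary between interpreters and versions.

```python
a = 'hello'
b = ''.join(['hel', 'lo'])   # same characters, but assembled at run time

print(a == b)   # True: the contents are equal
print(a is b)   # identity check; the result depends on the interpreter
```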

### String Membership test using `in` operator

In [None]:
'a' in 'DataScience'

In [None]:
'th' not in 'python'
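The `in` operator works for substrings of any length and is case sensitive. A short sketch, with the variable name `course` chosen for illustration:

```python
course = 'DataScience'

print('Sci' in course)            # True
print('sci' in course)            # False: the test is case sensitive
print('sci' in course.lower())    # True: lowercase first for a case-insensitive test
print('xyz' not in course)        # True
print('' in course)               # True: the empty string is contained in every string
```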

## Check your Concepts

Try answering the following questions to test your understanding of the topics covered in this notebook:

1. What are the container types available in Python?
2. What kind of data does the String data type represent?
3. What are the different ways of creating strings in Python?
4. What is the difference between strings created using single quotes, i.e. `'` and `'` vs. those created using double quotes, i.e. `"` and `"`?
5. How do you create multi-line strings in Python?
6. What is the newline character, `\n`?
7. What are escaped characters? How are they useful?
8. How do you check the length of a string?
9. How do you convert a string into a list of characters?
10. How do you access a specific character from a string?
11. How do you access a range of characters from a string?
12. How do you check if a specific character occurs in a string?
13. How do you check if a smaller string occurs within a bigger string?
14. How do you join two or more strings?
15. What are "methods" in Python? How are they different from functions?
16. What do the `.count`, `.isalnum` and `.isalpha` methods on strings do?
17. How do you replace a specific part of a string with something else?
18. How do you split the string "Sun,Mon,Tue,Wed,Thu,Fri,Sat" into a list of days?
19. How do you remove whitespace from the beginning and end of a string?
20. What is the string `.format` method used for? Can you give an example?
21. What are the benefits of using the `.format` method instead of string concatenation?
22. How do you convert a value of another type to a string?
23. How do you check if two strings have the same value?
24. Where can you find the list of all the methods supported by strings?