{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "***\n",
    "# 7. 문자열 정의 및 기초 연산\n",
    "***\n",
    "***"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## 1 시퀀스 자료형의 지원 연산\n",
    "***"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-1 시퀀스 자료형이란?\n",
    "- 저장된 각 요소를 정수 Index를 이용하여 참조가 가능한 자료형 \n",
    "- 시퀀스(Sequence) 자료형: 문자열, 리스트, 튜플"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'abcdef'\n",
    "L = [100, 200, 300]\n",
    "t = ('tuple', 'object', 1, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 시퀀스 자료형이 가지는 공통적인 연산\n",
    "  - 인덱싱 (Indexing)\n",
    "  - 슬라이싱 (Slicing)\n",
    "  - 확장 슬라이싱 (Extended Slicing)\n",
    "  - 연결 (Concatenation)\n",
    "  - 반복 (Repitition)\n",
    "  - 멤버쉽 테스트 (Membership Test)\n",
    "  - 길이 정보 (Length)\n",
    "  - for ~ in 문"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-2 인덱싱"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "b\n",
      "f\n",
      "\n",
      "200\n",
      "900\n"
     ]
    }
   ],
   "source": [
    "s = 'abcdef'\n",
    "l = [100, 200, 300]\n",
    "print(s[0])\n",
    "print(s[1])\n",
    "print(s[-1])\n",
    "print()\n",
    "print(l[1])\n",
    "l[1] = 900\n",
    "print(l[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "ename": "IndexError",
     "evalue": "list index out of range",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mIndexError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-3-bd1a663e00f3>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ml\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m100\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mIndexError\u001b[0m: list index out of range"
     ]
    }
   ],
   "source": [
    "print(l[100])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-3 슬라이싱\n",
    "- L[start:end]: start는 inclusive, end는 exclusive"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bc\n",
      "bcdef\n",
      "abcdef\n",
      "abcdef\n",
      "\n",
      "[100, 200]\n",
      "[100, 200]\n"
     ]
    }
   ],
   "source": [
    "s = 'abcdef'\n",
    "L = [100, 200, 300]\n",
    "\n",
    "print(s[1:3])\n",
    "print(s[1:])\n",
    "print(s[:])\n",
    "print(s[-100:100])\n",
    "print()\n",
    "print(L[:-1])     # L[:2] 와 동일\n",
    "print(L[:2])"
   ]
  },
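  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The half-open rule means that `s[a:b]` holds `b - a` elements when both bounds lie inside the sequence, and that `s[:k] + s[k:]` rebuilds the original for any `k`. The cell below is a minimal sketch of those two properties; the names `a`, `b`, `k`, `left`, and `right` are illustrative and not part of the original example. It also shows that out-of-range slice bounds are clamped instead of raising `IndexError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'abcdef'\n",
    "a, b = 1, 4\n",
    "\n",
    "# Half-open interval: index a is included, index b is excluded\n",
    "print(s[a:b])                  # bcd\n",
    "print(len(s[a:b]) == b - a)    # True\n",
    "\n",
    "# Splitting at any position k and concatenating restores the original\n",
    "k = 2\n",
    "left, right = s[:k], s[k:]\n",
    "print(left, right)             # ab cdef\n",
    "print(left + right == s)       # True\n",
    "\n",
    "# Slice bounds beyond the ends are clamped, so no IndexError is raised\n",
    "print(s[2:100])                # cdef"
   ]
  },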
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-4 확장 슬라이싱\n",
    "- L[start:end:step]: 인덱싱되어지는 각 원소들 사이의 거리가 인덱스 기준으로 step 만큼 떨어짐 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ac\n",
      "dcba\n"
     ]
    }
   ],
   "source": [
    "s = 'abcd'\n",
    "print(s[::2])   #step:2 - 각 원소들 사이의 거리가 인덱스 기준으로 2가 됨\n",
    "print(s[::-1])  #step:-1 - 왼쪽 방향으로 1칸씩"
   ]
  },
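  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With a negative step, `start` has to lie to the right of `end`, and `end` is still excluded. The sketch below uses an illustrative string `digits` (an assumption, not from the original example) so that each character equals its own index, which makes the selected positions easy to read off."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "digits = '0123456789'\n",
    "\n",
    "print(digits[1:8:3])     # 147    (indices 1, 4, 7)\n",
    "print(digits[8:1:-3])    # 852    (indices 8, 5, 2)\n",
    "print(digits[::-2])      # 97531  (every second element, right to left)\n",
    "print(digits[1:8:-1])    # empty string: moving left from index 1 never reaches 8"
   ]
  },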
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-5 연결하기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "abcdef\n",
      "[1, 2, 3, 4, 5, 6]\n"
     ]
    }
   ],
   "source": [
    "s = 'abc' + 'def'\n",
    "print(s)\n",
    "\n",
    "L = [1,2,3] + [4,5,6]\n",
    "print(L)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-6 반복하기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "abcabcabcabc\n",
      "[1, 2, 3, 1, 2, 3]\n"
     ]
    }
   ],
   "source": [
    "s = 'abc'\n",
    "print(s * 4)\n",
    "\n",
    "L = [1, 2, 3]\n",
    "print(L * 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-7 멤버십 테스트"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "s = 'abcde'\n",
    "print('c' in s)\n",
    "\n",
    "t = (1, 2, 3, 4, 5)\n",
    "print(2 in t)\n",
    "print(10 in t)\n",
    "print(10 not in t)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n",
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "print('ab' in 'abcd')\n",
    "print('ad' in 'abcd')\n",
    "print(' ' in 'abcd')\n",
    "print(' ' in 'abcd ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-8 길이 정보"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5\n",
      "3\n",
      "4\n"
     ]
    }
   ],
   "source": [
    "s = 'abcde'\n",
    "l = [1, 2, 3]\n",
    "t = (1, 2, 3, 4)\n",
    "print(len(s))\n",
    "print(len(l))\n",
    "print(len(t))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1-9 for~in 문"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a b c d "
     ]
    }
   ],
   "source": [
    "for c in 'abcd':\n",
    "    print(c, end=\" \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## 2 문자열 정의하기\n",
    "***"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2-1 한 줄 문자열"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "s = ''\n",
    "str1 = 'Python is great!'\n",
    "str2 = \"Yes, it is.\"\n",
    "str3 = \"It's not like any other languages\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Don't walk. \"Run\"\n"
     ]
    }
   ],
   "source": [
    "str4 = 'Don\\'t walk. \"Run\"'\n",
    "print(str4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- \\ : 다음 라인이 현재 라인의 뒤에 이어짐을 나타냄 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is a rather long string containing back slash and new line.\n",
      "Good!\n"
     ]
    }
   ],
   "source": [
    "long_str = \"This is a rather long string \\\n",
    "containing back slash and new line.\\nGood!\"\n",
    "print(long_str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2-2 여러 줄 문자열"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " While the rest of the world has been catching on to\n",
      "the Perl scripting language, the Linux commnunity,\n",
      "long since past the pleasing shock of Perl's power,\n",
      "has been catching on to a different scripting animal -- Python.\n",
      "\n",
      " While the rest of the world has been catching on to\n",
      "the Perl scripting language, the Linux commnunity,\n",
      "long since past the pleasing shock of Perl's power,\n",
      "has been catching on to a different scripting animal -- Python.\n"
     ]
    }
   ],
   "source": [
    "multiline = \"\"\" While the rest of the world has been catching on to\n",
    "the Perl scripting language, the Linux commnunity,\n",
    "long since past the pleasing shock of Perl's power,\n",
    "has been catching on to a different scripting animal -- Python.\"\"\"\n",
    "print(multiline)\n",
    "\n",
    "print()\n",
    "\n",
    "ml = ''' While the rest of the world has been catching on to\n",
    "the Perl scripting language, the Linux commnunity,\n",
    "long since past the pleasing shock of Perl's power,\n",
    "has been catching on to a different scripting animal -- Python.'''\n",
    "print(ml)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2-3 이스케이프 문자 (Escape Characters)\n",
    "- 문자열 내부의 이스케이프 문자\n",
    "\n",
    "| 이스케이프 문자  | 의미    |\n",
    "|--------------|--------------|\n",
    "| \\ \\           | \\            |\n",
    "| \\'           | '            |\n",
    "| \\\"           | \"            |\n",
    "| \\b           | 백스페이스   |\n",
    "| \\n           | 개행         |\n",
    "| \\t           | 탭           |\n",
    "| \\0nn         | 8진법 수 nn  |\n",
    "| \\xnn         | 16진법 수 nn |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\\abc\\\n",
      "\n",
      "abc\tdef\tghi\n",
      "\n",
      "a\n",
      "b\n",
      "c\n"
     ]
    }
   ],
   "source": [
    "print('\\\\abc\\\\')\n",
    "print()\n",
    "print('abc\\tdef\\tghi')\n",
    "print()\n",
    "print('a\\nb\\nc')"
   ]
  },
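  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each escape sequence stands for a single character, even though it takes several keystrokes to type. The following sketch checks this with `len` and compares an octal and a hexadecimal escape against the letter `A` (code point 65, which is `101` in octal and `41` in hexadecimal). The letter `A` is chosen here purely for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(len('\\n'))        # 1: a single newline character\n",
    "print(len('a\\tb'))      # 3: a, tab, b\n",
    "print('\\101' == 'A')    # True: octal 101 is 65\n",
    "print('\\x41' == 'A')    # True: hexadecimal 41 is 65\n",
    "print(ord('\\\\'))       # 92: one backslash character"
   ]
  },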
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2-3 문자열 연산"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First String Second String\n",
      "First StringFirst StringFirst String\n",
      "\n",
      "r\n",
      "irst Strin\n",
      "12\n",
      "\n",
      "First String\n"
     ]
    }
   ],
   "source": [
    "str1 = 'First String'\n",
    "str2 = 'Second String'\n",
    "str3 = str1 + ' ' + str2\n",
    "print(str3)\n",
    "print(str1 * 3)\n",
    "print()\n",
    "print(str1[2])\n",
    "print(str1[1:-1])\n",
    "print(len(str1))\n",
    "print()\n",
    "print(str1[0:len(str1)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 문자열 자료 - Immutable (변경불가능)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'str' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-20-7213ba3e679f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mstr1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'f'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "str1[0] = 'f'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'str' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-21-08d42bad7f22>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mstr1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'abc'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "str1[0:3] = 'abc'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 문자열 변경을 위해서는 여러 Slicing 연결 활용\n",
    "  - [주의] 문자열 자체가 변경되는 것이 아니라 새로운 문자열을 생성하여 재할당하는 것임"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "spam, cheese, and egg\n"
     ]
    }
   ],
   "source": [
    "s = 'spam and egg'\n",
    "s = s[:4] + ', cheese, ' + s[5:]\n",
    "print(s)"
   ]
  },
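  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the slicing expression builds a new object, any other name still bound to the old string keeps seeing the old text. The sketch below makes that visible; `original` and `replaced` are hypothetical names introduced only for this demonstration. It also uses the built-in `str.replace` method, which likewise returns a new string and leaves its receiver untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'spam and egg'\n",
    "original = s                         # second name bound to the same object\n",
    "\n",
    "s = s[:4] + ', cheese, ' + s[5:]     # builds a new string and rebinds s\n",
    "print(s)                             # spam, cheese, and egg\n",
    "print(original)                      # spam and egg (unchanged)\n",
    "print(s is original)                 # False: s now refers to a different object\n",
    "\n",
    "replaced = original.replace('egg', 'ham')   # also returns a new string\n",
    "print(replaced)                      # spam and ham\n",
    "print(original)                      # spam and egg"
   ]
  },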
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2-4 유니코드"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 다국어 문자의 올바른 표현을 위하여 유니코드 타입 지원이 됨\n",
    "- 유니코드 타입의 문자열 리터럴: u'Hello'\n",
    "- 하지만 Python 3 부터는 모든 str은 기본적으로 Unicode로 저장되기 때문에 u 표기 리터럴을 사용할 필요가 없음\n",
    "   - Since Python 3.0, the language features a str type that contain Unicode characters, meaning any string created using \"unicode rocks!\", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode [Source: https://docs.python.org/3/howto/unicode.html]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Spam and Egg\n",
      "\n",
      "<class 'str'>\n",
      "a\n",
      "<class 'str'>\n",
      "bc\n",
      "\n",
      "<class 'str'>\n",
      "abc\n"
     ]
    }
   ],
   "source": [
    "print(u'Spam and Egg')\n",
    "print()\n",
    "a = 'a'\n",
    "b = u'bc'\n",
    "print(type(a))\n",
    "print(a)\n",
    "print(type(b))\n",
    "print(b)\n",
    "print()\n",
    "c = a + b\n",
    "print(type(c))\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Spam 또 Egg\n"
     ]
    }
   ],
   "source": [
    "print('Spam \\uB610 Egg')    # 문자열 내에 유티코드 이스케이프 문자인 \\uHHHH 사용가능, HHHH는 4자리 16진수 (unicode 포맷)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'str'>\n",
      "한글\n"
     ]
    }
   ],
   "source": [
    "#a = unicode('한글', 'utf-8')     # '한글' 문자열의 인코딩 방식을 'utf-8'로 인식시키며 unicode로 변환 --> python3 부터는 지원되지 않음\n",
    "a = str('한글')\n",
    "print(type(a))\n",
    "print(a)\n",
    "\n",
    "\n",
    "# <type 'unicode'>\n",
    "# 한글"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8\n",
      "8\n"
     ]
    }
   ],
   "source": [
    "print(len('한글과 세종대왕'))\n",
    "#print(len(unicode('한글과 세종대왕', 'utf-8')))   #python 3 에서 지원하지 않음\n",
    "print(len(u'한글과 세종대왕'))"
   ]
  },
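  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`len` counts characters (code points), not bytes. A minimal sketch of the difference, assuming UTF-8 as the encoding; the names `text` and `encoded` are illustrative. Each Hangul syllable in this string takes three bytes in UTF-8 and the space takes one, so the byte length is larger than the character count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = '한글과 세종대왕'\n",
    "encoded = text.encode('utf-8')           # str -> bytes\n",
    "\n",
    "print(type(encoded))                     # <class 'bytes'>\n",
    "print(len(text))                         # 8 characters (7 syllables + 1 space)\n",
    "print(len(encoded))                      # 22 bytes (7 * 3 + 1)\n",
    "print(encoded.decode('utf-8') == text)   # True: decoding restores the str"
   ]
  },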
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "한\n",
      "글\n",
      "한글과\n",
      "세종대왕\n",
      "왕대종세 과글한\n",
      "\n",
      "한\n",
      "글\n",
      "한글과\n",
      "세종대왕\n",
      "왕대종세 과글한\n",
      "\n",
      "한\n",
      "글\n",
      "한글과\n",
      "세종대왕\n",
      "왕대종세 과글한\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# u = unicode('한글과 세종대왕', 'utf-8')    #python 3에서 지원하지 않음 \n",
    "u = str('한글과 세종대왕')    #python 3 부터는 임의의 str은 유니코드이기 때문에 한글이 포함된 str에 대해서도 인덱싱 및 슬라이싱이 올바르게 수행됨 \n",
    "print(u[0])\n",
    "print(u[1])\n",
    "print(u[:3])\n",
    "print(u[4:])\n",
    "print(u[::-1])\n",
    "print()\n",
    "u2 = u'한글과 세종대왕' \n",
    "print(u2[0])\n",
    "print(u2[1])\n",
    "print(u2[:3])\n",
    "print(u2[4:])\n",
    "print(u2[::-1])\n",
    "print()\n",
    "u3 = '한글과 세종대왕'\n",
    "print(u3[0])\n",
    "print(u3[1])\n",
    "print(u3[:3])\n",
    "print(u3[4:])\n",
    "print(u3[::-1])\n",
    "print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style='text-align: right;'>참고 문헌: 파이썬(열혈강의)(개정판 VER.2), 이강성, FreeLec, 2005년 8월 29일</p>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}