{ "cells": [ { "cell_type": "code", "execution_count": 10, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "# NDAK18000U Overview\n", "Content at [https://github.com/coastalcph/nlp-course](github.com/coastalcph/nlp-course).\n", "Click on slides for *Course Logistics*." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### NDAK18000U Details\n", "\n", "- **Course Organizer**: [Daniel Hershcovich](https://danielhers.github.io/)\n", "- **Teachers**: Daniel Hershcovich and [Desmond Elliott](https://elliottd.github.io/)\n", "- **Teaching Assistants**:\n", " - Ruchira Dhar, Zain Muhammad Mujahid, Yingming Wang, Hu Wanjing (Michelle) and Christian Mølholt Jensen" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### NDAK18000U Schedule\n", "\n", "- Lectures: \n", " - Tuesdays, 13-15 in Lille UP1, Weeks 36-41 + 43-44\n", " \n", "- Lab Sessions:\n", " - Group 1: Mondays, 8-10 in the old library (4-0-17), Universitetsparken 1, Weeks 37-41 + 43-44\n", " - Group 2: Fridays, 8-10 in the old library (4-0-17), Universitetsparken 1, Weeks 36-41 + 43-44\n", "\n", "We will assign you to one of two lab session groups based on your answers to the [Getting to Know You survey](https://absalon.ku.dk/courses/85199/assignments/239795).\n", "If you have not filled it in yet, do it as soon as possible.\n", "You will receive an announcement about your assignment before the first lab session." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/daniel/nlp-course\n" ] } ], "source": [ "%cd .." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "# Natural Language Processing (NDAK18000U)\n", "## Course at the University of Copenhagen\n", "\n", "Materials from this interactive book are used throughout the Natural Language Processing course at the Department of Computer Science, University of Copenhagen. The official course description can be found [here](https://kurser.ku.dk/course/ndak18000u). Materials covered each week are listed below. The course schedule and materials are tentative and subject to minor changes. Most reading material is from [Speech and Language Processing by Jurafsky & Martin](https://web.stanford.edu/~jurafsky/slp3).\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| Week | Reading (before lecture) | Lecture (Tuesday) | Lab (Friday & Monday) | Lab notebook |
|---|---|---|---|---|
| 36 | Chapter 2 up to end of 2.5; Chapter 4 up to end of 4.5; Chapter 5 up to end of 5.6 | 2. Sep. 2025: Course Logistics (slides); Introduction to NLP (slides); Tokenisation & Sentence Splitting (notes, slides, exercises); Text Classification (slides) | 5. & 8. Sep. 2025: Jupyter notebook setup, introduction to Colab; Introduction to PyTorch; Project group arrangements; Questions about the course project | lab 1 |
| 37 | Chapter 3 up to end of 3.5; Chapter 6 up to end of 6.4; Chapter 7 up to end of 7.5 | 9. Sep. 2025: Language Modelling (slides); Word Embeddings (slides) | 12. & 15. Sep. 2025: Word representations and sentiment classification; Project help | lab 2 |
| 38 | Chapter 7 up to end of 7.6; Chapter 8 up to end of 8.7 | 16. Sep. 2025: Recurrent Neural Networks (slides); Neural Language Models (slides) | 19. & 22. Sep. 2025: Error analysis and explainability; Project help | lab 3 |
| 39 | Chapter 8 up to end of 8.8; Chapter 9 up to end of 9.2; Chapter 10 up to end of 10.2; Chapter 11 | 23. Sep. 2025: Attention (slides); Transformers (slides) | 26. & 29. Sep. 2025: Language Models with Transformers and RNNs; Project help | lab 5 |
| 40 | Chapter 17 up to end of 17.3; Chapter 19 up to end of 19.2 | 30. Sep. 2025: Sequence Labelling (slides); Parsing (slides) | 3. & 6. Oct. 2025: Sequence labelling and beam search; Project help | lab 4 |
| 41 | Chapter 14; Chapter 20 | 7. Oct. 2025: Information Extraction (slides); Question Answering (slides) | 10. & 20. Oct. 2025: In-depth look at Transformers and Multilingual QA; Project help | lab 6 |
| 43 | Chapter 12; Chapter 13 | 21. Oct. 2025: Machine Translation (slides); Transfer Learning (slides) | 24. & 27. Oct. 2025: Project help | |
| 44 | Belinkov and Glass, 2019 | 28. Oct. 2025: Interpretability (slides) | 31. Oct. 2025: Project help | |
\n", "The easiest way to view the course content is via the static [nbviewer](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb). \n", "To be able to make changes to the book and render it dynamically, see the [installation instructions](INSTALL.md).\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import re\n", "from IPython.display import Markdown, display\n", "\n", "with open('README.md', 'r') as f:\n", "    content = f.read()\n", "\n", "# Regular expression to find all <a href=...> targets that don't start with 'http'\n", "# (assumes the links in README.md are written as HTML anchors)\n", "pattern = re.compile(r'<a href=([\"\\'])(?!http)(.*?)\\1')\n", "\n", "# Function to prepend '../' to each link\n", "def replacer(match):\n", "    quote = match.group(1)\n", "    link = match.group(2)\n", "    new_link = f\"../{link}\"\n", "    return f'<a href={quote}{new_link}{quote}'\n", "\n", "# Replace links in content to be relative to the top directory of the repo\n", "display(Markdown(pattern.sub(replacer, content)))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### Course Requirements\n", "* Familiarity with machine learning (probability theory, linear algebra, classification)\n", "* Knowledge of programming (Python)\n", "* No prior knowledge of natural language processing or linguistics is required\n", "\n", "Relevant machine learning competencies can be obtained through one of the following courses: \n", "* [NDAK22000U Machine Learning A (MLA)](https://kurser.ku.dk/course/ndak22000u) and/or [NDAK22001U Machine Learning B (MLB)](https://kurser.ku.dk/course/ndak22001u)\n", "* [NDAK16003U Introduction to Data Science (IDS)](https://kurser.ku.dk/course/ndak16003u)\n", "* [NDAB23000U Grundlæggende Data Science (GDS)](https://kurser.ku.dk/course/NDAB23000U)\n", "* [Machine Learning, Coursera](https://www.coursera.org/learn/machine-learning)\n", "\n", "See also the [course description](https://kurser.ku.dk/course/ndak18000u)." 
] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### About You: previously taken courses related to NLP?\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### About You: previously taken courses in Machine Learning?\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### About You: experience with using neural network software libraries?\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### About You: degree are you enrolled in\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### About You: what you want to get out of this course\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### About You: what you want to get out of the lab sessions\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### Course Materials\n", "* We will be using the [nlp-course](../overview.ipynb) repository \n", "* Contains **interactive** [jupyter](http://jupyter.org/) notebooks and slides\n", " * View statically [here](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb)\n", " * Use interactively via install, see [github repo](https://github.com/coastalcph/nlp-course) instructions \n", "* Recordings of 2020 lectures are available on [Absalon](https://absalon.ku.dk/courses/85199/external_tools/14563)\n", "* References to other material are given in context\n", "* This is work in progress.\n", " * [Previous iterations of the course at DIKU](https://github.com/copenlu/stat-nlp-book)\n", " * Use `git pull` regularly for updates\n", " * *Watch* for updates\n", " * Please contribute by adding issues on github when you see errors\n", "* For assignment hand-in, announcements, discussion forum, check [Absalon](https://absalon.ku.dk/courses/85199)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### Teaching Methods\n", "* Lectures\n", "* Hands-on lab (TA) sessions\n", "* Group project\n", "* Occasional small exercises during lectures, so bring your laptop\n", "* Background material to read before each lecture" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": 
"slide" } }, "source": [ "### Assessment\n", "\n", "* **[Group project](https://absalon.ku.dk/courses/85199/assignments/239793)**, can be completed in a group of up to 3 students\n", " * Group project, can be completed in a group of up to 3 students\n", " * Released 2 September, hand-in 31 October 17:00\n", " * Joint report, contribution of each student should be stated clearly\n", " * Code to be uploaded as attachment\n", " * Individual grade for each group member, based on the quality and quantity of their contributions\n", " * Submission via Digital Exam\n", " * Consists of several parts tied to weekly lecture topics\n", " * We cannot guarantee responses to queries about the project after 30 October 15:00" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assessment Methods\n", "\n", "* **Group project**, can be completed in a group of up to 3 students\n", " * Finding a group: \n", " * Deadline for group forming: **8 September 17:00**\n", " * We offer to help you find a group -- fill in the [Getting to Know You survey](https://absalon.ku.dk/courses/85199/assignments/239795) by the end of *first lecture day,* **2 September 17:00**\n", " * If you choose this option, you will be informed of your assigned group on **4 September**\n", " * You can still change groups afterwards by asking other students to swap groups (it's your responsibility to arrange this)\n", " * Otherwise, we assume you will find a group by yourself in the first course week, e.g. by coordinating with other students in the lab session" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Late Hand-In\n", "\n", "* Late hand-ins **cannot be accepted**\n", "* Exceptions can be made in rare cases, e.g. 
due to illness with a doctor's note\n", " * Get in touch with the course organizer at least one working day in advance" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Project Report Submission for Feedback\n", "Optional weekly submission deadlines on Absalon: weeks 36-41.\n", " \n", "Informal feedback, no grading.\n", "\n", "Highly recommended to make sure you're on track and have the right format etc." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Plagiarism\n", "\n", "* Don't do it\n", "* Don't enable it\n", "* Check [rules and consequences](https://student-ambassador.ku.dk/rights/avoid-plagiarism/) if unclear" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### GenAI Policy\n", "\n", "See [UCPH guidelines](https://kunet.ku.dk/work-areas/teaching/digital-learning/chatgpt-and-ai/guidelines-and-rules-for-chatgpt/Pages/default.aspx).\n", "\n", "As per the [course description](https://kurser.ku.dk/course/ndak18000u), all aids are allowed.\n", "\n", "Attach a [declaration of academic use of generative AI](https://kunet.ku.dk/work-areas/teaching/digital-learning/chatgpt-and-ai/new-rules-and-principles-from-september-2025/template-for-declaration/Pages/default.aspx) to your project submission." 
] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Python\n", "\n", "* Lectures, lab exercises and assignments focus on **Python**\n", "* Python is a leading language for data science, machine learning etc., with many relevant libraries\n", "* We expect you to know Python, or be willing to learn it **on your own**\n", "* Labs and assignments focus on development within [jupyter notebooks](http://jupyter.org/)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### Lab Sessions\n", "\n", "* Some lab sessions are tutorial-style (to introduce you to practical aspects of the course)\n", "* Other lab sessions are open-topic. You can use them as an opportunity to:\n", " * ask the TAs clarifying questions about the lectures and/or project\n", " * ask the TAs for informal feedback on your project so far\n", " * work on your project with your group" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### Discussion Forum\n", "\n", "* Our Absalon page has a [**discussion forum**](https://absalon.ku.dk/courses/85199/discussion_topics).\n", "* Please post questions there (instead of sending private emails) \n", "* We give low priority to **questions already answered** in previous lectures, tutorials and posts, \n", " * and to **pure programming related issues**\n", "* We expect you to **search online** for answers to your questions before you contact us.\n", "* You are highly encouraged to participate and **help each other** on the forum. 
\n", "* The teaching team will check the discussion forum regularly **within normal working hours**\n", " * do not expect answers late in the evenings and on weekends\n", " * **start working on your project early**\n", " * come to the lab sessions and ask questions there" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "### DIKU NLP\n", "\n", "* Research Section, UCPH Computer Science Department\n", "* Faculty members: Isabelle Augenstein (head of section), Pepa Atanasova, Daniel Hershcovich, Desmond Elliott, Anders Søgaard\n", "* Official webpage: https://di.ku.dk/english/research/nlp/\n", "* List of group members: https://www.copenlu.com/ ; http://coastalcph.github.io/; https://lampgroup.github.io/\n", "* Social media: \n", " * @copenlu https://x.com/CopeNLU, https://bsky.app/profile/copenlu.bsky.social\n", " * @coastalcph https://x.com/coastalcph\n", "* Always looking for strong MSc students\n", "* PhD positions available dependent on funding" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" } }, "nbformat": 4, "nbformat_minor": 1 }