---
_db_id: 151
content_type: topic
ready: true
title: Natural Language Processing
---

Natural Language Processing, or NLP, is used to analyse, visualise, and predict natural language, that is, languages that developed naturally, such as isiZulu, English or Spanish (but not programming languages such as Java or C#).

## Videos and Readings

Watch the following videos as an introduction to Natural Language Processing in Python:

- [Introduction to NLP](https://youtu.be/5BVebXXb2o4)
- [Data cleaning and text-preprocessing in Python](https://youtu.be/iQ1bfDMCv_c)
- [Exploratory data analysis and word clouds in Python](https://youtu.be/VraAbgAoYSk)
- [Sentiment analysis with TextBlob in Python](https://youtu.be/N9CT6Ggh0oE)

The code from the videos can be found [on GitHub](https://github.com/adashofdata/nlp-in-python-tutorial/blob/master/1-Data-Cleaning.ipynb).

The videos are a great introduction to the basic NLP analysis pipeline. They show how to do NLP with the packages `NLTK` and `TextBlob`. However, we will be using `spaCy`, as it is widely used in industry. For documentation on `spaCy` commands, see [spaCy's website](https://spacy.io/usage/spacy-101) and [RealPython](https://realpython.com/natural-language-processing-spacy-python/).

## Terms to know

Natural Language Processing has its own vocabulary, which you need in order to talk about it. By the end of this topic, you should know what the following terms mean:

- Tokenization
- Corpus
- Document-Term Matrix
- Stop words
- Bag-of-words
- Lemmatization
- Bi-grams
- Word cloud
- Named Entity Recognition
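
To make a few of these terms concrete, here is a minimal sketch that uses only the Python standard library rather than `spaCy`, `NLTK` or `TextBlob`. The two-sentence corpus, the tiny stop-word list and all function names are made up for illustration; real libraries use far more careful tokenizers and much longer stop-word lists.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: a corpus is a collection of documents,
# and here each string is one document.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "on", "a", "an", "and"}


def tokenize(text):
    """Tokenization: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Stop words: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def bigrams(tokens):
    """Bi-grams: pairs of tokens that appear next to each other."""
    return list(zip(tokens, tokens[1:]))


def bag_of_words(tokens):
    """Bag-of-words: token counts, ignoring word order."""
    return Counter(tokens)


def document_term_matrix(docs):
    """Document-term matrix: one row per document, one column per term."""
    bags = [bag_of_words(remove_stop_words(tokenize(d))) for d in docs]
    vocabulary = sorted(set().union(*bags))
    matrix = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, matrix


tokens = remove_stop_words(tokenize(corpus[0]))
vocabulary, matrix = document_term_matrix(corpus)

print(tokens)            # cleaned tokens of the first document
print(bigrams(tokens))   # adjacent token pairs
print(vocabulary)        # column labels of the matrix
for row in matrix:       # one row of counts per document
    print(row)
```

Lemmatization, word clouds and Named Entity Recognition are not covered by this sketch, because they rely on language data or plotting tools that the standard library does not provide.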