Python offers a number of libraries for NLP, but here we'll focus on the Natural Language Toolkit (NLTK), a comprehensive library for building NLP programs.
1. Installation
First, let's install NLTK:
pip install nltk
After installation, download the required datasets:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
2. Tokenization
Tokenization is the process of breaking text into individual words or sentences.
from nltk.tokenize import word_tokenize, sent_tokenize

text = "Hey there! Welcome to the world of NLP."
print(sent_tokenize(text))
print(word_tokenize(text))
3. Removing Stopwords
Stopwords are common words (like “the”, “is”, “in”) that may not add significant meaning to a sentence.
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
words = word_tokenize(text)
filtered_words = [word for word in words if word.lower() not in stop_words]
print(filtered_words)
4. Stemming and Lemmatization
These techniques reduce words to their root forms.
Stemming:
from nltk.stem import PorterStemmer

ps = PorterStemmer()
print(ps.stem("running"))
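Stemming is a fast, rule-based chop of suffixes, so the result is not always a dictionary word. Running the Porter stemmer over a few words makes this visible:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Stems are cheap to compute but not guaranteed to be real words.
for word in ["running", "flies", "studies", "easily"]:
    print(word, "->", ps.stem(word))
# running -> run
# flies -> fli
# studies -> studi
# easily -> easili
```

When you need actual dictionary forms, lemmatization (next) is the better fit.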
Lemmatization:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))