analisis-sentimen-program-m.../nltk_data/corpora/stopwords
nand999 a2b89a6f1e first commit 2026-05-22 10:25:40 +07:00
README
albanian
arabic
azerbaijani
basque
belarusian
bengali
catalan
chinese
danish
dutch
english
finnish
french
german
greek
hebrew
hinglish
hungarian
indonesian
italian
kazakh
nepali
norwegian
portuguese
romanian
russian
slovene
spanish
swedish
tajik
tamil
turkish

README

Stopwords Corpus

This corpus contains lists of stop words for several languages.  These
are high-frequency grammatical words which are usually ignored in text
retrieval applications.
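
As a rough illustration of how such a list is typically used, the sketch
below loads one language file into a set and drops matching tokens from a
tokenized sentence. It assumes each file holds one stop word per line in
UTF-8; the helper names load_stopwords and remove_stopwords are
hypothetical and not part of NLTK.

```python
from pathlib import Path


def load_stopwords(path):
    """Read a stop word file (assumed: one word per line, UTF-8) into a set."""
    text = Path(path).read_text(encoding="utf-8")
    # Skip blank lines and lowercase entries so lookups are case-insensitive.
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not stop words, preserving their order."""
    return [tok for tok in tokens if tok.lower() not in stopwords]
```

With NLTK installed and this corpus downloaded, calling
nltk.corpus.stopwords.words("english") should return the same words as a
list, so the file-based loader above is only needed when reading the
files directly.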

They were obtained from:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/

The stop words for the Romanian language were obtained from:
http://arlc.ro/resources/

The English list has been augmented:
https://github.com/nltk/nltk_data/issues/22

The German list has been corrected:
https://github.com/nltk/nltk_data/pull/49

A Kazakh list has been added:
https://github.com/nltk/nltk_data/pull/52

A Nepali list has been added:
https://github.com/nltk/nltk_data/pull/83

An Azerbaijani list has been added:
https://github.com/nltk/nltk_data/pull/100

A Greek list has been added:
https://github.com/nltk/nltk_data/pull/103

An Indonesian list has been added:
https://github.com/nltk/nltk_data/pull/112