Keyword extraction in Chinese and English

Welcome to the Chinese and English Keyword Extraction Tool. It supports a variety of keyword extraction algorithms to help users quickly pull the most important information out of a text. The following figure shows the keyword extraction algorithms we support:

Introduction

This tool provides a variety of keyword extraction algorithms to meet different needs. The supported algorithms are as follows:

TF-IDF: Measures a word's importance by combining its term frequency with its inverse document frequency.
TextRank: An unsupervised, graph-based keyword extraction method.
KeyBERT: A keyword extraction technique that uses BERT embeddings to capture semantic relevance.
Word2Vec: Uses word vector representations for keyword extraction.
LDA: A keyword extraction method based on topic modeling.

Usage

1. TF-IDF

from keyword_extract import KeywordExtract

input_list = [
    "Natural language processing is an important direction in the field of artificial intelligence. It studies how people and computers communicate effectively with each other using natural language."
]
key_extract = KeywordExtract(type="TF-IDF")
# Keyword extraction based on TF-IDF
print(key_extract.infer(input_list))
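To show what a TF-IDF scorer computes, here is a minimal, dependency-free sketch of the standard formula. It is not this tool's implementation: the English-only tokenizer, the small stop-word list, and the smoothed IDF variant `log((1 + N) / (1 + df)) + 1` are assumptions chosen for illustration. Chinese input would first need word segmentation (for example with a segmenter such as jieba), which this sketch omits.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real tools ship much larger lists)
STOP_WORDS = {"is", "an", "in", "the", "of", "it", "how", "and", "with",
              "each", "other", "using", "can", "a", "to"}


def tokenize(text):
    """Lowercase English text, keep alphabetic runs, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_keywords(docs, top_k=5):
    """Return the top_k (word, score) pairs for each document in docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word appears in
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # TF (relative frequency) times smoothed IDF
        scores = {
            w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results


docs = [
    "Natural language processing is an important direction in the field of artificial intelligence.",
    "Computers communicate with people using natural language.",
    "Artificial intelligence studies how computers learn.",
]
for doc_keywords in tfidf_keywords(docs, top_k=3):
    print(doc_keywords)
```

IDF needs more than one document to matter: in the first document, the four words that appear nowhere else (direction, field, important, processing) tie for the highest score, and the alphabetical tie-break keeps the first three.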

2. TextRank

from keyword_extract import KeywordExtract
   
input_list = ["Natural language processing is an important direction in the field of artificial intelligence. It studies how people and computers communicate effectively with each other using natural language."]
key_extract = KeywordExtract(type="TextRank")
# Keyword extraction based on TextRank
print(key_extract.infer(input_list))
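TextRank builds a graph whose nodes are words and whose edges link words that co-occur within a small window, then runs PageRank on that graph. The following sketch illustrates the idea in pure Python; the window size of 4, the damping factor of 0.85, the fixed 50 iterations, and the stop-word list are assumptions, and this is not the tool's own code.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption)
STOP_WORDS = {"is", "an", "in", "the", "of", "it", "how", "and", "with",
              "each", "other", "using", "can", "a", "to"}


def textrank_keywords(text, window=4, damping=0.85, iterations=50, top_k=5):
    """Rank words with PageRank over a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    # Undirected edges between distinct words fewer than `window` positions apart
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # PageRank iteration: each word shares its score equally among its neighbors
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Highest score first; ties broken alphabetically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


text = ("Natural language processing is an important direction in the field of "
        "artificial intelligence. It studies how people and computers communicate "
        "effectively with each other using natural language.")
print(textrank_keywords(text))
```

Words that co-occur with many different words collect score from all of them, which is why no labeled training data is needed.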

3. KeyBERT

from keyword_extract import KeywordExtract
  
input_list = ["Natural language processing is an important direction in the field of artificial intelligence. It studies how people and computers communicate effectively with each other using natural language."]
key_extract = KeywordExtract(type="KeyBERT")
# Keyword extraction based on KeyBERT
print(key_extract.infer(input_list))
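KeyBERT ranks candidate words and phrases by the cosine similarity between each candidate's embedding and the embedding of the whole document. The sketch below reproduces only that ranking step. So that it runs without downloading a model, a bag of character trigrams (the hypothetical `embed` function) stands in for the BERT encoder, which means its scores reflect surface overlap rather than meaning. The candidate rule (unigrams and bigrams containing no stop words) is also an assumption, and none of this is the tool's code.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption)
STOP_WORDS = {"is", "an", "in", "the", "of", "it", "how", "and", "with",
              "each", "other", "using", "can", "a", "to"}


def embed(text):
    """Hypothetical stand-in for a BERT encoder: sparse bag of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidates(text, max_ngram=2):
    """Unigrams and bigrams of consecutive words, skipping any n-gram with a stop word."""
    words = re.findall(r"[a-z]+", text.lower())
    cands = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOP_WORDS for w in gram):
                cands.add(" ".join(gram))
    return cands


def keybert_style_keywords(doc, top_k=5):
    """Rank candidates by similarity between their embedding and the document's."""
    doc_vec = embed(doc)
    scored = [(c, cosine(doc_vec, embed(c))) for c in candidates(doc)]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return scored[:top_k]


text = ("Natural language processing is an important direction in the field of "
        "artificial intelligence. It studies how people and computers communicate "
        "effectively with each other using natural language.")
print(keybert_style_keywords(text))
```

Swapping `embed` for a real sentence-embedding model is what gives KeyBERT its semantic behavior; the candidate generation and cosine ranking stay the same.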

4. Word2Vec

from keyword_extract import KeywordExtract

input_list = ["Natural language processing is an important direction in the field of artificial intelligence. It studies how people and computers can communicate effectively with each other using natural language."]
key_extract = KeywordExtract(type="Word2Vec")
# Keyword extraction based on Word2Vec
print(key_extract.infer(input_list))
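The README does not say how Word2Vec vectors are turned into keywords. One common approach, sketched here as an assumption rather than as this tool's method, averages the vectors of a document's words into a centroid and ranks each word by its cosine similarity to that centroid. `TOY_VECTORS` holds made-up 3-dimensional vectors standing in for trained Word2Vec embeddings.

```python
import math

# Hypothetical toy vectors; a real setup would load embeddings trained with Word2Vec
TOY_VECTORS = {
    "natural": [0.9, 0.1, 0.0],
    "language": [0.8, 0.2, 0.1],
    "processing": [0.7, 0.3, 0.1],
    "computers": [0.2, 0.9, 0.1],
    "field": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word2vec_keywords(tokens, vectors, top_k=3):
    """Rank a document's words by similarity to the document's mean vector."""
    # Deduplicate while keeping order, and skip out-of-vocabulary words
    known = [t for t in dict.fromkeys(tokens) if t in vectors]
    if not known:
        return []
    dim = len(vectors[known[0]])
    centroid = [sum(vectors[t][i] for t in known) / len(known) for i in range(dim)]
    scored = [(t, cosine(vectors[t], centroid)) for t in known]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return scored[:top_k]


tokens = ["natural", "language", "processing", "computers", "field", "unknown", "language"]
print(word2vec_keywords(tokens, TOY_VECTORS))
```

With these toy vectors, the three related words (processing, language, natural) sit closest to the centroid, while the outliers computers and field rank lower.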

5. LDA

from keyword_extract.lda_model.lda import LDA
 
input_list = ["Natural language processing is an important direction in the field of artificial intelligence. It studies how people and computers communicate effectively with each other using natural language."]
lda_model = LDA(type="LDA")
# Keyword extraction based on LDA; topic_num sets the number of topics
print(lda_model.infer(input_list, topic_num=3))
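LDA treats each document as a mixture of topics and each topic as a distribution over words. The sketch below fits LDA with collapsed Gibbs sampling in pure Python and returns the highest-count words for each topic. `topic_num` mirrors the parameter in the example above; `alpha`, `beta`, `iterations`, `seed`, and `top_n` are assumptions of this sketch. How the tool turns topics into per-document keywords is not described here, so treat this as an illustration of LDA itself, not of the tool's pipeline.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, topic_num=3, alpha=0.1, beta=0.01, iterations=200, seed=0, top_n=3):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    doc_topic = [[0] * topic_num for _ in docs]              # n(d, k)
    topic_word = [defaultdict(int) for _ in range(topic_num)]  # n(k, w)
    topic_total = [0] * topic_num                            # n(k)
    # Random initial topic assignment for every token
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(topic_num)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token's current assignment from the counts
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(topic_num)
                ]
                z = rng.choices(range(topic_num), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Top words per topic by assignment count; ties broken alphabetically
    return [sorted(vocab, key=lambda w: (-topic_word[k][w], w))[:top_n]
            for k in range(topic_num)]


docs = [
    "natural language processing language text".split(),
    "computers communicate people language".split(),
    "topic model word topic distribution".split(),
]
print(lda_gibbs(docs, topic_num=3))
```

Fixing `seed` makes runs reproducible; on a corpus this small the topics are noisy, and LDA generally needs many documents before its topics become meaningful.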

Project repository: /TW-NLP/KeywordExtract

You are welcome to use the tool and share feedback. If you know of a keyword extraction algorithm you think works well, suggest it in the questionnaire, and we will reproduce and integrate it.