Keyword extraction in Chinese and English
Welcome to the Chinese and English Keyword Extraction Tool. It supports a variety of keyword extraction algorithms to help users quickly extract important information from text. The following figure shows the keyword extraction algorithms we support:
Introduction
This tool provides a variety of keyword extraction algorithms to meet different needs. The supported algorithms are as follows:
- TF-IDF: Measures how important a word is from its term frequency and inverse document frequency.
- TextRank: An unsupervised, graph-based keyword extraction method.
- KeyBERT: A keyword extraction technique built on BERT embeddings that captures semantic relevance.
- Word2Vec: Extracts keywords using word-vector representations.
- LDA: A keyword extraction method based on topic modeling.
Usage
1. TF-IDF
```python
from keyword_extract import KeywordExtract

input_list = [
    "Natural language processing is an important direction in the field of artificial intelligence. "
    "It studies how people and computers communicate effectively with each other using natural language."
]

# Keyword extraction based on TF-IDF
key_extract = KeywordExtract(type="TF-IDF")
print(key_extract.infer(input_list))
```
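To make the TF-IDF idea concrete, the sketch below scores words across a small collection of documents without using this library. It is a minimal illustration under stated assumptions: English text tokenized on letters only, a tiny hypothetical stopword list, and the common smoothed IDF variant log((1 + N) / (1 + df)) + 1. The names `tfidf_keywords`, `tokenize`, and `STOPWORDS` are hypothetical and not part of `keyword_extract`; Chinese input would additionally need a word segmenter.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "each", "how", "in", "is", "it", "of",
             "other", "the", "to", "using", "with"}


def tokenize(text):
    """Lowercase, keep runs of ASCII letters, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(docs, top_k=5):
    """Return the top_k (word, score) pairs for each document, ranked by TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word at least once.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            # Term frequency times smoothed IDF; the +1 keeps words that occur
            # in every document from getting a zero weight.
            w: (count / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results
```

For example, `tfidf_keywords(["apple apple banana", "banana cherry"])` ranks "apple" first in the first document and "cherry" first in the second, because "banana" appears in both documents and gets a lower IDF. With a single input document every word has the same IDF, so the ranking reduces to term frequency; IDF only helps when several documents are scored together.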
2. TextRank
```python
from keyword_extract import KeywordExtract

input_list = [
    "Natural language processing is an important direction in the field of artificial intelligence. "
    "It studies how people and computers communicate effectively with each other using natural language."
]

# Keyword extraction based on TextRank
key_extract = KeywordExtract(type="TextRank")
print(key_extract.infer(input_list))
```
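TextRank treats words as graph nodes, links words that co-occur within a small window, and ranks them with PageRank. The sketch below is a minimal, library-independent version of that idea, assuming English input, a hypothetical stopword list, a window of 2 (adjacent words only), and the usual damping factor of 0.85. The name `textrank_keywords` and its parameters are illustrative, not the `keyword_extract` API.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "each", "how", "in", "is", "it", "of",
             "other", "the", "to", "using", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words with PageRank over an undirected co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Link two words when their positions differ by less than `window`.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = list(neighbors)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node receives a share of every neighbor's score, split evenly
        # across that neighbor's edges (PageRank on an unweighted graph).
        score = {
            n: (1 - damping) + damping * sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            for n in nodes
        }
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

On `"hub x hub y hub z"`, "hub" is linked to all three other words and ends up ranked first. A fixed iteration count keeps the sketch short; a fuller version would stop once scores change by less than a small tolerance.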
3. KeyBERT
```python
from keyword_extract import KeywordExtract

input_list = [
    "Natural language processing is an important direction in the field of artificial intelligence. "
    "It studies how people and computers communicate effectively with each other using natural language."
]

# Keyword extraction based on KeyBERT
key_extract = KeywordExtract(type="KeyBERT")
print(key_extract.infer(input_list))
```
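KeyBERT embeds the document and each candidate phrase with the same BERT encoder, then ranks candidates by cosine similarity to the document embedding. The sketch below shows only that ranking step. So that it runs without downloading a model, the encoder is passed in as a parameter (`embed`), and the default `toy_embed` is a bag-of-words stand-in, **not** BERT, so it matches shared words rather than meaning. All names here (`embedding_keywords`, `candidates`, `toy_embed`, `STOPWORDS`) are hypothetical and not part of `keyword_extract`.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "each", "how", "in", "is", "it", "of",
             "other", "the", "to", "using", "with"}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def candidates(text, max_ngram=2):
    """Candidate phrases: 1- to max_ngram-word n-grams that contain no stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    cands = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOPWORDS for w in gram):
                cands.add(" ".join(gram))
    return cands


def toy_embed(text):
    """Stand-in encoder (NOT BERT): word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def embedding_keywords(text, embed=toy_embed, top_k=5):
    """Rank candidate phrases by cosine similarity to the document embedding."""
    doc_vec = embed(text)
    scored = [(c, cosine(embed(c), doc_vec)) for c in candidates(text)]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Replacing `toy_embed` with a real sentence encoder that returns vectors in the same dict form is what would give the semantic matching KeyBERT is known for; the candidate generation and cosine ranking stay the same.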
4. Word2Vec
```python
from keyword_extract import KeywordExtract

input_list = [
    "Natural language processing is an important direction in the field of artificial intelligence. "
    "It studies how people and computers can communicate effectively with each other using natural language."
]

# Keyword extraction based on Word2Vec
key_extract = KeywordExtract(type="Word2Vec")
print(key_extract.infer(input_list))
```
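This README does not say how the Word2Vec option turns word vectors into keywords. One common approach, assumed for this sketch, is to average the vectors of the document's words into a centroid and rank each word by its cosine similarity to that centroid, so words closest to the document's overall meaning come first. The 3-dimensional `VECTORS` table is made up for illustration (real vectors come from a trained Word2Vec model), and `centroid_keywords` is a hypothetical name, not the library's API.

```python
import math

# Made-up 3-d word vectors for illustration only; real ones come from a trained model.
VECTORS = {
    "natural": [0.7, 0.3, 0.0],
    "language": [0.9, 0.1, 0.0],
    "processing": [0.8, 0.2, 0.1],
    "computers": [0.4, 0.6, 0.1],
    "direction": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid_keywords(tokens, vectors, top_k=3):
    """Rank distinct in-vocabulary tokens by similarity to the document centroid."""
    # Distinct known words in first-seen order; out-of-vocabulary tokens are skipped.
    known = [t for t in dict.fromkeys(tokens) if t in vectors]
    if not known:
        return []
    # Centroid: mean over every in-vocabulary occurrence, so repeated words weigh more.
    occurrences = [vectors[t] for t in tokens if t in vectors]
    dim = len(occurrences[0])
    centroid = [sum(v[i] for v in occurrences) / len(occurrences) for i in range(dim)]
    scored = [(w, cosine(vectors[w], centroid)) for w in known]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

With tokens `["natural", "language", "processing", "language", "computers", "direction", "unknownword"]`, the off-topic word "direction" lies far from the centroid and drops out of the top three, while the unknown token is ignored.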
5. LDA
```python
from keyword_extract.lda_model.lda import LDA

input_list = [
    "Natural language processing is an important direction in the field of artificial intelligence. "
    "It studies how people and computers communicate effectively with each other using natural language."
]

# Keyword extraction based on LDA
lda_model = LDA(type="LDA")
# topic_num is the number of topics
print(lda_model.infer(input_list, topic_num=3))
```
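The `topic_num` argument above sets how many topics LDA fits. To show what happens underneath, here is a minimal collapsed Gibbs sampling sketch that returns the most probable words of each topic. It assumes already-tokenized documents and fixed hyperparameters `alpha` and `beta`; the function `lda_topics` and its parameters are hypothetical, and the library may use a different inference method entirely.

```python
import random
from collections import defaultdict


def lda_topics(docs, topic_num=2, iterations=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Collapsed Gibbs sampling LDA over token lists; returns top_n words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: tokens per (doc, topic), per (topic, word), and per topic.
    doc_topic = [[0] * topic_num for _ in docs]
    topic_word = [defaultdict(int) for _ in range(topic_num)]
    topic_total = [0] * topic_num
    # Start by assigning every token a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(topic_num)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(topic_num)
                ]
                # Resample the topic and add the token back to the counts.
                k = rng.choices(range(topic_num), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Top words per topic by count, ties broken alphabetically.
    return [sorted(vocab, key=lambda w: (-topic_word[k][w], w))[:top_n] for k in range(topic_num)]
```

Fixing `seed` makes runs reproducible. Because every topic ranks the whole vocabulary, each returned list contains distinct words, but which topic captures which theme depends on the sampling.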
Project repository: /TW-NLP/KeywordExtract
Feel free to use the tool and share your feedback. If you know of a keyword extraction algorithm you think should be supported, suggest it in the questionnaire, and we will reproduce and integrate it.