NLP
Cosine Similarity
Cosine similarity is a measure of similarity between two non-zero vectors in a multi-dimensional space, defined as the cosine of the angle between them. It is calculated as the dot product of the vectors divided by the product of their magnitudes. This metric is often used in text analysis, clustering, and information retrieval to determine the similarity of documents or data points, independent of their magnitude.
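The definition above (dot product divided by the product of the magnitudes) can be sketched in a few lines of plain Python. The function name `cosine_similarity` and the choice to raise an error on zero vectors are assumptions made for this illustration, not part of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))

    # Euclidean magnitudes (L2 norms).
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The measure is undefined for zero vectors (assumption: raise an error).
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")

    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([1, 2, 3], [2, 4, 6])` is 1 up to floating-point rounding, because the second vector is just a scaled copy of the first. This is what "independent of their magnitude" means in practice. Orthogonal vectors such as `[1, 0]` and `[0, 1]` give 0.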
TextRank
TextRank is a graph-based ranking model used for natural language processing tasks such as text summarization and keyword extraction. It adapts the PageRank algorithm to rank sentences or words by their importance within a text, using a graph in which nodes represent text units and edges represent relationships between them, such as content overlap between sentences or co-occurrence of words within a window.
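A minimal sketch of sentence ranking in this style follows. It makes several simplifying assumptions that are not taken from the text above: sentences are tokenized by whitespace, edge weights use Jaccard overlap between token sets (a stand-in for whichever similarity a real implementation uses), and the scores are computed with a fixed number of weighted PageRank iterations. The function name `textrank` and its parameters are hypothetical.

```python
def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    token_sets = [set(s.lower().split()) for s in sentences]

    # Build a symmetric weight matrix. Assumption: Jaccard overlap as edge weight.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            union = token_sets[i] | token_sets[j]
            if union:
                sim = len(token_sets[i] & token_sets[j]) / len(union)
                weights[i][j] = weights[j][i] = sim

    # Total edge weight leaving each node, used to normalize contributions.
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of the edge between j and i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores
```

A summarizer built on this would sort sentences by score and keep the top few. A sentence that shares no words with any other receives only the baseline `1 - damping`, so it ranks last.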
TF-IDF
Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used in information retrieval and text mining to evaluate how important a word is to a document in a collection or corpus. It combines term frequency (TF), which measures how frequently a term appears in a document, with inverse document frequency (IDF), which down-weights terms that appear in many documents. A term therefore scores highly when it is frequent in one document but rare across the rest of the corpus.
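The combination of TF and IDF can be sketched as follows. This uses one common variant, assumed here for illustration: TF is the term count divided by the document length, and IDF is the natural log of the number of documents divided by the number of documents containing the term. Libraries often apply different smoothing or normalization, so their numbers will differ. The function name `tf_idf` and the pre-tokenized input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores for a corpus given as a list of token lists.

    Returns one dict per document, mapping each term to its score.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # TF (relative frequency) times IDF (unsmoothed log ratio).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

With `docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]`, the term "the" appears in every document, so its IDF is log(1) = 0 and its score is 0 in both documents. Terms such as "cat" and "dog", which each appear in only one document, receive positive scores.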