Text Analysis
Cosine Similarity
Cosine similarity measures the similarity between two non-zero vectors in a multi-dimensional space, defined as the cosine of the angle between them. It is calculated as the dot product of the vectors divided by the product of their magnitudes, so it ranges from -1 to 1 and depends only on the direction of the vectors, not their length. For vectors with non-negative components, such as term counts, the value lies between 0 and 1. This metric is often used in text analysis, clustering, and information retrieval to determine the similarity of documents or data points, independent of their magnitude.
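The calculation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function name `cosine_similarity` and the sample vectors `doc_a`, `doc_b`, and `doc_c` are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean magnitude (length) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical term-count vectors over a three-word vocabulary.
doc_a = [1, 2, 0]
doc_b = [2, 4, 0]   # same direction as doc_a, twice the magnitude
doc_c = [0, 0, 3]   # shares no terms with doc_a

same_direction = cosine_similarity(doc_a, doc_b)  # close to 1
no_overlap = cosine_similarity(doc_a, doc_c)      # 0, the vectors are orthogonal
```

Here `doc_b` is `doc_a` scaled by two, so the two score as maximally similar even though their magnitudes differ, which is the magnitude independence the definition refers to.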
TF-IDF
Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used in information retrieval and text mining to evaluate how important a word is to a document in a collection or corpus. It combines term frequency (TF), which measures how frequently a term appears in a document, with inverse document frequency (IDF), which down-weights terms that occur in many documents of the corpus. A term therefore scores highest when it is frequent in one document but rare across the rest of the collection.
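As a rough illustration, the sketch below computes TF-IDF weights for a tiny tokenized corpus. It assumes one common formulation, TF as the term count divided by the document length and IDF as the natural logarithm of N / df (number of documents over the number of documents containing the term); many other variants exist, including smoothed and normalized ones. The function name `tf_idf` and the sample corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = ln(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical corpus of three already-tokenized documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
scores = tf_idf(corpus)
```

Under these assumptions, "the" appears in every document, so its IDF is ln(1) = 0 and its weight is zero everywhere, while "sat" appears in only one document and outweighs "cat", which appears in two.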