N-Gram

The n-gram data structure is a simple container that serves as a probabilistic model, typically in Natural Language Processing (NLP), for predicting sequences of elements such as words or characters. An n-gram is a sequence of \(n\) contiguous items from a given dataset, and the model is often applied to tasks like language modeling, auto-completion, and text prediction.

The n-gram model works by splitting text into overlapping chunks of \(n\) contiguous elements (for example, bigrams for \(n=2\) and trigrams for \(n=3\)) and counting how often each chunk occurs. Those counts give the frequency of a specific sequence and, once normalized, an estimate of its probability.
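The chunk-and-count procedure can be sketched in a few lines of Python. This is a minimal illustration rather than this project's implementation: the function names `extract_ngrams` and `next_token_probabilities`, the whitespace tokenization, and the sample sentence are all assumptions made for the example, and the probability is a plain relative-frequency estimate with no smoothing.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return every run of n contiguous tokens as a tuple (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the tokens; too-short input yields [].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def next_token_probabilities(tokens, n, context):
    """Estimate P(next token | context) from relative n-gram counts."""
    context = tuple(context)
    if len(context) != n - 1:
        raise ValueError("context must contain n - 1 tokens")
    # Count the final token of every n-gram whose first n - 1 tokens match.
    counts = Counter(
        gram[-1] for gram in extract_ngrams(tokens, n) if gram[:-1] == context
    )
    total = sum(counts.values())
    # An unseen context produces an empty dict instead of dividing by zero.
    return {token: count / total for token, count in counts.items()}


# Assumed tokenization: a simple whitespace split of a toy sentence.
tokens = "the cat sat on the mat and the cat slept".split()

bigram_counts = Counter(extract_ngrams(tokens, 2))
print(bigram_counts[("the", "cat")])  # 2

print(next_token_probabilities(tokens, 2, ["the"]))
```

In this toy sentence, "the" is followed by "cat" twice and by "mat" once, so the estimate assigns roughly 0.67 to "cat" and 0.33 to "mat". A context that never appears returns an empty dictionary, which is one reason practical models usually add smoothing or back off to shorter n-grams.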