Algorithms and Data Structures

Algorithms and data structures ported to TypeScript from Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, as well as other sources classic computer science papers.

Breadth-First Search

Breadth-First Search is an algorithm for searching graph data structures. It starts at a specified source node and explores all its immediate neighbors before moving on to the next level of neighbors.

Time complexity is Linear, i.e. O(v + e)
Space complexity is Linear, i.e. O(v)
For large graphs the space complexity is described in terms of depth making it exponential, i.e. O(b^d)

Binary Search Tree

A Binary Search Tree is a data structure made of nodes. Each node has two or less children. The tree follows a basic rule. Left nodes are smaller than their parent. Right nodes are larger. The time to access, search, insert, or delete a node is Linear. The space the tree uses is also linear.

Bubblesort

Bubble Sort is a simple comparison-based sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. This process is repeated until the list is sorted.

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors in a multi-dimensional space, defined as the cosine of the angle between them. It is calculated as the dot product of the vectors divided by the product of their magnitudes. This metric is often used in text analysis, clustering, and information retrieval to determine the similarity of documents or data points, independent of their magnitude.

Counting Sort

Counting Sort is a non-comparison-based sorting algorithm that sorts integers by counting the occurrences of each distinct value in a range and using this information to place elements in their correct positions. It works in 𝑂(𝑛+𝑘) time, where 𝑛 is the number of elements and 𝑘 is the range of the input values. Counting Sort is efficient for small ranges of integers but unsuitable for sorting non-integer or large-range data due to its space complexity of 𝑂(𝑘). It is a stable sorting algorithm when implemented correctly.

Finite Automata

A finite automata algorithm is a step-by-step method used to check if a sequence of symbols (like letters or numbers) matches a specific pattern. It works by moving through a series of states based on the input, starting from an initial state and possibly ending in a state that signals a match.

Chained Hash Table

A Chained Hash Table is a data structure that resolves collisions using linked lists. Each slot in the hash table array points to a chain (linked list) that stores all elements hashing to the same index. When inserting, searching, or deleting an element, the algorithm hashes the key to determine the appropriate slot and traverses the corresponding chain to perform the operation.

Heapsort

A sorting technique that uses a binary heap, first building a max-heap and then repeatedly extracting the maximum element, ensuring efficient sorting.

Time complexity is Linearithmic, i.e. O(n log(n))
Space complexity is Constant, i.e. O(1)

Huffman Encoding

The Huffman Encoding data structure uses a Min Heap to sort characters by frequency. It then assigns codes based on frequency; the most frequent characters will have the shortest codes.

Insertion Sort

A sorting technique that builds the sorted array one element at a time, inserting each new element into its proper position among already-sorted elements.

Time complexity is Quadratic, i.e. O(n^2)
Space complexity is Constant, i.e. O(1)

Intersection

The calculateIntersectionWeight function computes the intersection weight between two arrays of strings by determining the ratio of the size of their intersection to the length of the smaller array. It relies on the getIntersection function to find the common elements between the two arrays, ensuring an efficient and straightforward implementation. The getIntersection function uses sets to identify and return a list of unique shared elements, removing duplicates and simplifying comparison. Together, these functions handle edge cases, such as empty arrays, by returning 0 when appropriate to avoid division by zero, making them robust and easy to use for set-based comparisons.

Iterative Depth-First Search

An iterative depth-first search algorithm using a stack, exploring as far along each branch as possible before backtracking, useful for pathfinding and cycle detection.

Recursive Depth-First Search

A recursive depth-first search algorithm that explores branches deeply before backtracking, managing traversal state with recursion.

Time complexity of insertion and deletion is Constant, i.e. O(1)
Time complexity of search is is Linear, i.e. O(n)
Space complexity is Linear, i.e. O(n)

Longest Common Prefix

An algorithm for finding the longest common prefix among a set of strings.

Longest Common Subsequence

An algorithm for finding the longest subsequence common to two sequences.

Linear Search

Linear Search is a straightforward algorithm for finding a specific element in a list. It sequentially checks each element until the target is found or the list ends, making it simple and effective for unsorted data but inefficient for large datasets.

Linked List

A data structure where each element contains a value and a reference to the next node, supporting efficient insertion and deletion operations.

Time complexity for access and search is Linear, i.e. O(n)
Time complexity for insertion and deletion is Constant, i.e. O(1)
Space complexity is Linear

Merge Sort

A divide-and-conquer sorting technique that splits the array into halves, recursively sorts each half, and then merges the sorted halves.

Minimum Heap

The Minimum Heap is a specialized binary tree that is based on the Heap data structure. It is also known as a Priority Queue.

Myers Diff

Computes the differences between two sequences efficiently, commonly used in version control systems to show changes between file revisions.

N-Gram

The n-gram data structure is a simple container. It is used as a probabilistic model typically used in Natural Language Processing (NLP) to predict sequences of elements such as words or characters. It represents a sequence of \(n\) items from a given dataset, often applied for tasks like language modeling, auto-completion, and text prediction.

Naive String Matcher

The naive string matching algorithm finds a pattern in a text. It uses brute force to check every position. It works by moving the pattern across the text from left to right. At each position, the algorithm compares the pattern’s characters to the text’s characters. If every character matches, the position is valid. Its takes O((n - m + 1) m) time.

Negative Array

A negative array lets you access array elements from the end of an array. It can use negative indices. It works by wrapping a standard array with a Proxy. This proxy intercepts requests. You access a negative index like myArray[-1]. The get logic converts the negative number to a positive index. It then returns the element at that place.

Queue

A linear collection following the First-In-First-Out (FIFO) principle, supporting operations like enqueue and dequeue, useful in task scheduling.

Quicksort

Quick sort is a divide-and-conquer sorting algorithm that selects a “pivot” element from an array, partitions the remaining elements into two sub-arrays (elements less than the pivot and elements greater than the pivot), and then recursively sorts the sub-arrays.

Rabin-Karp Matcher

The Rabin-Karp algorithm finds a pattern in text. It uses hashing to convert the pattern into a number. It converts a piece of the text too. The algorithm does not start over for each piece. It uses a rolling hash. The next number comes from the one before it. This is fast.

Reverse

The reverse algorithm takes an array of values and returns the values in reverse order. It works by iterating over the array and swapping the first and last values, then the second and second-to-last values, and so on until the middle of the array is reached.

Selection Sort

Selection sort is a comparison-based sorting algorithm that divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.

Stack

A stack is a linear data structure that follows the Last In First Out (LIFO) principle. Elements are added and removed from the same end, called the top. Common operations include push (add an element), pop (remove the top element), and peek (view the top element without removing it). Stacks are used in various applications like function call management, expression evaluation, and backtracking.

Suffix Array

The suffix array algorithm constructs a sorted array of all suffixes of a given string, enabling efficient substring searches, pattern matching, and text analysis. It uses sorting and optional auxiliary structures like the LCP array for additional optimizations.

Trie Symbol Table

The Trie Symbol Table is a data structure designed to efficiently store and retrieve key-value pairs where the keys are strings. It organizes data into a tree-like structure, where each node represents a character, and paths from the root to nodes represent strings.

Textrank

TextRank is a graph-based ranking model used for natural language processing tasks like text summarization and keyword extraction. It adapts the PageRank algorithm to rank sentences or words based on their importance within a text, using a graph where nodes represent text units and edges represent their semantic relationships.

TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used in information retrieval and text mining to evaluate how important a word is to a document in a collection or corpus. It combines term frequency (TF), which measures how frequently a term appears in a document, with inverse document frequency (IDF), which measures how important the term is across all documents.

Ternary Search Trie

A Ternary Search Trie (TST) is a data structure used to store and retrieve strings efficiently, combining features of binary search trees and tries. It organizes data in a tree where each node contains three children (low, equal, high) corresponding to characters less than, equal to, or greater than the node’s character, enabling balanced search operations while conserving memory compared to standard tries.

Wagner-Fischer Edit Distance

The Wagner-Fischer algorithm is a dynamic programming method for calculating the Levenshtein distance (edit distance) between two strings. It determines the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another by filling a matrix where each cell represents the edit distance between prefixes of the two strings. The algorithm builds the solution incrementally, ensuring optimal substructure and overlapping subproblems for efficient computation.

[End] 4.754003608703392