Algorithms and Data Structures

Algorithms and data structures ported to TypeScript from Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, as well as from other sources, including classic computer science papers.

Binary Search Tree

Posted by Dustin Boston.

A Binary Search Tree (BST) is a data structure in which each node has at most two children and maintains the property that for every node, the values in its left subtree are less than the node’s value, and the values in its right subtree are greater.
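
A minimal TypeScript sketch of the insert and search operations (no deletion or balancing), not the full implementation from the post:

```typescript
class BSTNode {
  left: BSTNode | null = null;
  right: BSTNode | null = null;
  constructor(public value: number) {}
}

class BinarySearchTree {
  root: BSTNode | null = null;

  insert(value: number): void {
    const node = new BSTNode(value);
    if (!this.root) { this.root = node; return; }
    let current = this.root;
    for (;;) {
      if (value < current.value) {
        if (!current.left) { current.left = node; return; }
        current = current.left;
      } else {
        if (!current.right) { current.right = node; return; }
        current = current.right;
      }
    }
  }

  contains(value: number): boolean {
    let current = this.root;
    while (current) {
      if (value === current.value) return true;
      // The BST property tells us which subtree could hold the value.
      current = value < current.value ? current.left : current.right;
    }
    return false;
  }
}
```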

Breadth-First Search

Breadth-First Search is an algorithm for searching graph data structures. It starts at a specified source node and explores all its immediate neighbors before moving on to the next level of neighbors.
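
One way to sketch the level-by-level traversal in TypeScript, using an adjacency-list graph (a `Map` here, as an illustrative choice):

```typescript
// Returns the nodes reachable from `source` in the order BFS visits them.
function bfs(graph: Map<string, string[]>, source: string): string[] {
  const visited = new Set<string>([source]);
  const queue: string[] = [source];
  const order: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    order.push(node);
    for (const neighbor of graph.get(node) ?? []) {
      if (!visited.has(neighbor)) {
        visited.add(neighbor); // mark on enqueue so nodes are queued once
        queue.push(neighbor);
      }
    }
  }
  return order;
}
```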

Bubblesort

Bubble Sort is a simple comparison-based sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. This process is repeated until the list is sorted. Although easy to implement, its worst-case and average-case time complexity of \(O(n^2)\) makes it inefficient on large datasets.
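
The swap-adjacent-elements loop can be sketched in TypeScript like this, with an early exit once a full pass makes no swaps:

```typescript
function bubbleSort(input: number[]): number[] {
  const a = [...input]; // sort a copy rather than mutating the argument
  for (let i = a.length - 1; i > 0; i--) {
    let swapped = false;
    for (let j = 0; j < i; j++) {
      if (a[j] > a[j + 1]) {
        [a[j], a[j + 1]] = [a[j + 1], a[j]];
        swapped = true;
      }
    }
    if (!swapped) break; // already sorted: stop early
  }
  return a;
}
```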

Chained Hash Table

A Chained Hash Table is a data structure that resolves collisions using linked lists. Each slot in the hash table array points to a chain (linked list) that stores all elements hashing to the same index. When inserting, searching, or deleting an element, the algorithm hashes the key to determine the appropriate slot and traverses the corresponding chain to perform the operation.
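
A compact sketch of the idea in TypeScript; for simplicity the chains here are arrays of key-value pairs rather than true linked lists, and the hash function is an illustrative choice:

```typescript
class ChainedHashTable<V> {
  private slots: Array<Array<[string, V]>>;

  constructor(private size = 31) {
    this.slots = Array.from({ length: size }, () => []);
  }

  // Simple polynomial string hash, reduced to a slot index.
  private hash(key: string): number {
    let h = 0;
    for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) % this.size;
    return h;
  }

  set(key: string, value: V): void {
    const chain = this.slots[this.hash(key)];
    const entry = chain.find(([k]) => k === key);
    if (entry) entry[1] = value; // update in place on key collision
    else chain.push([key, value]);
  }

  get(key: string): V | undefined {
    const entry = this.slots[this.hash(key)].find(([k]) => k === key);
    return entry?.[1];
  }
}
```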

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors in a multi-dimensional space, defined as the cosine of the angle between them. It is calculated as the dot product of the vectors divided by the product of their magnitudes. This metric is often used in text analysis, clustering, and information retrieval to determine the similarity of documents or data points, independent of their magnitude.
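
The formula translates directly to TypeScript; this sketch assumes two equal-length numeric vectors:

```typescript
// cos(theta) = (a . b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
```

Because the result depends only on the angle, `[1, 2]` and `[2, 4]` score a perfect 1 despite their different magnitudes.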

Counting Sort

Counting Sort is a non-comparison-based sorting algorithm that sorts integers by counting the occurrences of each distinct value in a range and using this information to place elements in their correct positions. It works in \(O(n + k)\) time, where \(n\) is the number of elements and \(k\) is the range of the input values. Counting Sort is efficient for small ranges of integers but unsuitable for sorting non-integer or large-range data due to its space complexity of \(O(k)\). It is a stable sorting algorithm when implemented correctly.
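
A sketch of the stable variant in TypeScript, assuming non-negative integers no larger than a caller-supplied `k`:

```typescript
function countingSort(input: number[], k: number): number[] {
  const counts = new Array(k + 1).fill(0);
  for (const value of input) counts[value]++;
  // Prefix sums turn raw counts into final positions.
  for (let i = 1; i <= k; i++) counts[i] += counts[i - 1];
  const output = new Array(input.length);
  // Walking backwards keeps equal elements in their original order (stability).
  for (let i = input.length - 1; i >= 0; i--) {
    output[--counts[input[i]]] = input[i];
  }
  return output;
}
```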

Finite Automata

A finite automaton algorithm is a step-by-step method used to check whether a sequence of symbols (such as letters or digits) matches a specific pattern. It works by moving through a series of states based on the input, starting from an initial state and possibly ending in an accepting state that signals a match.
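
As a toy illustration (not the post's pattern-matching automaton), here is a deterministic finite automaton in TypeScript that accepts binary strings ending in "01":

```typescript
// States: 0 = start, 1 = just saw "0", 2 = just saw "01" (accepting).
const transitions: Record<number, Record<string, number>> = {
  0: { "0": 1, "1": 0 },
  1: { "0": 1, "1": 2 },
  2: { "0": 1, "1": 0 },
};

function accepts(input: string): boolean {
  let state = 0;
  for (const symbol of input) {
    const next = transitions[state][symbol];
    if (next === undefined) return false; // symbol outside the alphabet
    state = next;
  }
  return state === 2; // match iff we end in the accepting state
}
```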

Heapsort

A sorting technique that uses a binary heap, first building a max-heap and then repeatedly extracting the maximum element, ensuring efficient sorting.
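
The two phases (heapify, then repeated extraction) can be sketched in TypeScript as an in-place array algorithm:

```typescript
function heapsort(input: number[]): number[] {
  const a = [...input];
  // Restore the max-heap property for the subtree rooted at `start`,
  // considering only indices below `end`.
  const siftDown = (start: number, end: number): void => {
    let root = start;
    while (2 * root + 1 < end) {
      let child = 2 * root + 1;
      if (child + 1 < end && a[child + 1] > a[child]) child++; // larger child
      if (a[root] >= a[child]) return;
      [a[root], a[child]] = [a[child], a[root]];
      root = child;
    }
  };
  // Phase 1: build a max-heap bottom-up.
  for (let i = Math.floor(a.length / 2) - 1; i >= 0; i--) siftDown(i, a.length);
  // Phase 2: move the max to the end and shrink the heap.
  for (let end = a.length - 1; end > 0; end--) {
    [a[0], a[end]] = [a[end], a[0]];
    siftDown(0, end);
  }
  return a;
}
```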

Huffman Encoding

The Huffman Encoding data structure uses a Min Heap to sort characters by frequency. It then assigns codes based on frequency; the most frequent characters will have the shortest codes.

Insertion Sort

A sorting technique that builds the sorted array one element at a time, inserting each new element into its proper position among already-sorted elements.
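
A TypeScript sketch of the shift-and-insert step:

```typescript
function insertionSort(input: number[]): number[] {
  const a = [...input];
  for (let i = 1; i < a.length; i++) {
    const key = a[i];
    let j = i - 1;
    // Shift larger elements of the sorted prefix right to open a slot for key.
    while (j >= 0 && a[j] > key) {
      a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = key;
  }
  return a;
}
```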

Intersection

The calculateIntersectionWeight function computes the intersection weight between two arrays of strings by determining the ratio of the size of their intersection to the length of the smaller array. It relies on the getIntersection function to find the common elements between the two arrays, ensuring an efficient and straightforward implementation. The getIntersection function uses sets to identify and return a list of unique shared elements, removing duplicates and simplifying comparison. Together, these functions handle edge cases, such as empty arrays, by returning 0 when appropriate to avoid division by zero, making them robust and easy to use for set-based comparisons.
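
A sketch of how the two described functions might look in TypeScript (the actual implementation may differ):

```typescript
// Unique elements common to both arrays, via Set membership.
function getIntersection(a: string[], b: string[]): string[] {
  const setB = new Set(b);
  return [...new Set(a)].filter((item) => setB.has(item));
}

// Ratio of intersection size to the length of the smaller array.
function calculateIntersectionWeight(a: string[], b: string[]): number {
  const smaller = Math.min(a.length, b.length);
  if (smaller === 0) return 0; // avoid division by zero on empty input
  return getIntersection(a, b).length / smaller;
}
```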

Linear Search

Linear Search is a straightforward algorithm for finding a specific element in a list. It sequentially checks each element until the target is found or the list ends, making it simple and effective for unsorted data but inefficient for large datasets.
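
The whole algorithm fits in a few lines of TypeScript:

```typescript
// Returns the index of the first match, or -1 if the target is absent.
function linearSearch<T>(items: T[], target: T): number {
  for (let i = 0; i < items.length; i++) {
    if (items[i] === target) return i;
  }
  return -1;
}
```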

Linked List

A data structure where each element contains a value and a reference to the next node, supporting efficient insertion and deletion operations.
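
A minimal singly linked list sketch in TypeScript, showing the O(1) insertion at the head:

```typescript
class ListNode<T> {
  next: ListNode<T> | null = null;
  constructor(public value: T) {}
}

class LinkedList<T> {
  head: ListNode<T> | null = null;

  // O(1): the new node simply points at the old head.
  prepend(value: T): void {
    const node = new ListNode(value);
    node.next = this.head;
    this.head = node;
  }

  toArray(): T[] {
    const out: T[] = [];
    for (let node = this.head; node; node = node.next) out.push(node.value);
    return out;
  }
}
```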

Merge Sort

A divide-and-conquer sorting technique that splits the array into halves, recursively sorts each half, and then merges the sorted halves.
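
The split-recurse-merge structure in TypeScript:

```typescript
function mergeSort(input: number[]): number[] {
  if (input.length <= 1) return input; // base case: already sorted
  const mid = Math.floor(input.length / 2);
  const left = mergeSort(input.slice(0, mid));
  const right = mergeSort(input.slice(mid));
  // Merge the two sorted halves, always taking the smaller front element.
  const merged: number[] = [];
  let i = 0, j = 0;
  while (i < left.length && j < right.length) {
    merged.push(left[i] <= right[j] ? left[i++] : right[j++]);
  }
  return merged.concat(left.slice(i), right.slice(j));
}
```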

Minimum Heap

The Minimum Heap is a specialized binary tree based on the Heap data structure in which every parent is less than or equal to its children. It is commonly used to implement a Priority Queue.
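
An array-backed sketch in TypeScript: the parent of index `i` lives at `(i - 1) >> 1`, insertion bubbles up, and extraction sifts down:

```typescript
class MinHeap {
  private items: number[] = [];

  insert(value: number): void {
    this.items.push(value);
    let i = this.items.length - 1;
    // Bubble up while the new value is smaller than its parent.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.items[parent] <= this.items[i]) break;
      [this.items[parent], this.items[i]] = [this.items[i], this.items[parent]];
      i = parent;
    }
  }

  extractMin(): number | undefined {
    if (this.items.length === 0) return undefined;
    const min = this.items[0];
    const last = this.items.pop()!;
    if (this.items.length > 0) {
      this.items[0] = last;
      // Sift the moved element down to its correct position.
      let i = 0;
      for (;;) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let smallest = i;
        if (l < this.items.length && this.items[l] < this.items[smallest]) smallest = l;
        if (r < this.items.length && this.items[r] < this.items[smallest]) smallest = r;
        if (smallest === i) break;
        [this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]];
        i = smallest;
      }
    }
    return min;
  }
}
```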

Myers Diff

Computes the differences between two sequences efficiently, commonly used in version control systems to show changes between file revisions.

N-Gram

The n-gram data structure is a simple container used in probabilistic models, typically in Natural Language Processing (NLP), to predict sequences of elements such as words or characters. It represents a sequence of \(n\) items from a given dataset and is often applied to tasks like language modeling, auto-completion, and text prediction.
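
Generating the n-grams themselves is a simple sliding window, sketched here in TypeScript:

```typescript
// Slide a window of size n across the tokens; each window is one n-gram.
function ngrams(tokens: string[], n: number): string[][] {
  const result: string[][] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}
```

For example, the bigrams of `["the", "cat", "sat"]` are `["the", "cat"]` and `["cat", "sat"]`.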

Queue

A linear collection following the First-In-First-Out (FIFO) principle, supporting operations like enqueue and dequeue, useful in task scheduling.
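
A minimal array-backed sketch in TypeScript (a linked list would avoid the O(n) cost of `shift`, but this keeps the FIFO behavior clear):

```typescript
class Queue<T> {
  private items: T[] = [];
  enqueue(item: T): void { this.items.push(item); }       // add at the back
  dequeue(): T | undefined { return this.items.shift(); } // remove from the front
  get size(): number { return this.items.length; }
}
```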

Quicksort

Quicksort is a divide-and-conquer sorting algorithm that selects a “pivot” element from an array, partitions the remaining elements into two sub-arrays (elements less than the pivot and elements greater than the pivot), and then recursively sorts the sub-arrays.
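
A non-in-place TypeScript sketch that makes the partition step explicit, using the last element as the pivot (real implementations usually partition in place):

```typescript
function quicksort(input: number[]): number[] {
  if (input.length <= 1) return input; // base case
  const pivot = input[input.length - 1];
  const rest = input.slice(0, -1);
  const less = rest.filter((x) => x < pivot);
  const greaterOrEqual = rest.filter((x) => x >= pivot);
  return [...quicksort(less), pivot, ...quicksort(greaterOrEqual)];
}
```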

Reverse

The reverse algorithm takes an array of values and returns the values in reverse order. It works by iterating over the array and swapping the first and last values, then the second and second-to-last values, and so on until the middle of the array is reached.
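
The two-pointer swap described above, in TypeScript:

```typescript
// Reverses the array in place by swapping from both ends toward the middle.
function reverse<T>(items: T[]): T[] {
  for (let i = 0, j = items.length - 1; i < j; i++, j--) {
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}
```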

Suffix Array

The suffix array algorithm constructs a sorted array of all suffixes of a given string, enabling efficient substring searches, pattern matching, and text analysis. It uses sorting and optional auxiliary structures like the LCP array for additional optimizations.
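
A deliberately naive TypeScript sketch that sorts suffix start indices by comparing the suffixes directly; production versions use \(O(n \log n)\) constructions instead of this \(O(n^2 \log n)\) approach:

```typescript
// Returns the start indices of all suffixes of `text`, in sorted suffix order.
function suffixArray(text: string): number[] {
  return Array.from({ length: text.length }, (_, i) => i)
    .sort((a, b) => (text.slice(a) < text.slice(b) ? -1 : 1));
}
```

For `"banana"` the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, giving the index array `[5, 3, 1, 0, 4, 2]`.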

Ternary Search Trie

A Ternary Search Trie (TST) is a data structure used to store and retrieve strings efficiently, combining features of binary search trees and tries. It organizes data in a tree where each node contains three children (low, equal, high) corresponding to characters less than, equal to, or greater than the node’s character, enabling balanced search operations while conserving memory compared to standard tries.

Trie Symbol Table

The Trie Symbol Table is a data structure designed to efficiently store and retrieve key-value pairs where the keys are strings. It organizes data into a tree-like structure, where each node represents a character, and paths from the root to nodes represent strings.
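
A minimal sketch of the structure in TypeScript: one node per character, with values stored at the nodes where keys end:

```typescript
class TrieNode<V> {
  children = new Map<string, TrieNode<V>>();
  value: V | undefined; // set only if a key ends at this node
}

class TrieSymbolTable<V> {
  private root = new TrieNode<V>();

  put(key: string, value: V): void {
    let node = this.root;
    for (const ch of key) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode<V>());
      node = node.children.get(ch)!;
    }
    node.value = value;
  }

  get(key: string): V | undefined {
    let node = this.root;
    for (const ch of key) {
      const next = node.children.get(ch);
      if (!next) return undefined; // path does not exist
      node = next;
    }
    return node.value;
  }
}
```

Note that a prefix of a stored key (like `"she"` below) follows an existing path but carries no value of its own.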

Wagner-Fischer Edit Distance

The Wagner-Fischer algorithm is a dynamic programming method for calculating the Levenshtein distance (edit distance) between two strings. It determines the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another by filling a matrix where each cell represents the edit distance between prefixes of the two strings. The algorithm builds the solution incrementally, ensuring optimal substructure and overlapping subproblems for efficient computation.
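
The matrix-filling step in TypeScript, where `dp[i][j]` is the edit distance between the first `i` characters of `a` and the first `j` characters of `b`:

```typescript
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  // Base cases: transforming to/from the empty string.
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost, // substitution, or free match
      );
    }
  }
  return dp[a.length][b.length];
}
```

The classic example: `"kitten"` to `"sitting"` takes three edits (two substitutions and one insertion).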