Algorithms

Breadth-First Search

Breadth-First Search is an algorithm for searching graph data structures. It starts at a specified source node and explores all its immediate neighbors before moving on to the next level of neighbors.

  • Time complexity is Linear, i.e. O(V + E)
  • Space complexity is Linear, i.e. O(V)
  • When the graph is an implicit search tree with branching factor b explored to depth d, the frontier alone requires exponential space, i.e. O(b^d)
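A minimal Python sketch of the level-by-level traversal described above, using a queue to hold the current frontier:

```python
from collections import deque

def bfs(graph, source):
    """Return nodes in breadth-first order from source.

    graph is an adjacency mapping: node -> list of neighbors.
    """
    visited = {source}
    queue = deque([source])
    order = []
    while queue:
        node = queue.popleft()          # FIFO order gives level-by-level visits
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```

Marking nodes as visited when they are enqueued (not when dequeued) keeps each node in the queue at most once, which is what gives the O(V) space bound.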

Bubblesort

Bubble Sort is a simple comparison-based sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. This process is repeated until the list is sorted.
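A straightforward Python sketch, with the common early-exit optimization (stop when a full pass makes no swaps):

```python
def bubble_sort(items):
    """Sort a list in place by repeatedly swapping adjacent out-of-order pairs."""
    n = len(items)
    for i in range(n):
        swapped = False
        # the last i elements are already in their final positions
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:        # a pass with no swaps means the list is sorted
            break
    return items
```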

Chained Hash Table

A Chained Hash Table is a data structure that resolves collisions using linked lists. Each slot in the hash table array points to a chain (linked list) that stores all elements hashing to the same index. When inserting, searching, or deleting an element, the algorithm hashes the key to determine the appropriate slot and traverses the corresponding chain to perform the operation.
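A compact Python sketch of the structure described above, using plain lists as the chains (the class and method names are illustrative):

```python
class ChainedHashTable:
    """Hash table that resolves collisions with a per-slot chain."""

    def __init__(self, num_slots=16):
        self.slots = [[] for _ in range(num_slots)]

    def _chain(self, key):
        """Hash the key to pick the slot's chain."""
        return self.slots[hash(key) % len(self.slots)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key, default=None):
        for k, v in self._chain(key):    # traverse the chain for this slot
            if k == key:
                return v
        return default

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```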

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors in a multi-dimensional space, defined as the cosine of the angle between them. It is calculated as the dot product of the vectors divided by the product of their magnitudes. This metric is often used in text analysis, clustering, and information retrieval to determine the similarity of documents or data points, independent of their magnitude.
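The definition above translates directly into a few lines of Python (dot product over the product of magnitudes):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Note that scaling either vector leaves the result unchanged, which is why the metric is independent of magnitude.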

Counting Sort

Counting Sort is a non-comparison-based sorting algorithm that sorts integers by counting the occurrences of each distinct value in a range and using this information to place elements in their correct positions. It works in O(n + k) time, where n is the number of elements and k is the range of the input values. Counting Sort is efficient for small ranges of integers but unsuitable for sorting non-integer or large-range data due to its space complexity of O(k). It is a stable sorting algorithm when implemented correctly.
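A stable Python sketch for non-negative integers, using prefix sums over the counts to compute final positions:

```python
def counting_sort(items):
    """Stable counting sort for non-negative integers."""
    if not items:
        return []
    k = max(items)
    counts = [0] * (k + 1)
    for value in items:
        counts[value] += 1
    # prefix sums turn counts into one-past-the-end output positions
    for i in range(1, k + 1):
        counts[i] += counts[i - 1]
    output = [0] * len(items)
    for value in reversed(items):    # reverse pass keeps equal keys stable
        counts[value] -= 1
        output[counts[value]] = value
    return output
```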

Finite Automata

A finite automaton algorithm is a step-by-step method used to check whether a sequence of symbols (such as letters or numbers) matches a specific pattern. It works by moving through a series of states driven by the input, starting from an initial state and accepting when it ends in a state that signals a match.
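One common use is string matching: precompute a transition table where each state records how many pattern characters have been matched, then scan the text once. A Python sketch of that construction (function names are illustrative):

```python
def build_dfa(pattern, alphabet):
    """Transition table: state q = number of pattern characters matched so far."""
    m = len(pattern)
    dfa = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for ch in alphabet:
            # longest prefix of pattern that is a suffix of (matched text + ch)
            k = min(m, q + 1)
            while k > 0 and pattern[:k] != (pattern[:q] + ch)[-k:]:
                k -= 1
            dfa[q][ch] = k
    return dfa

def dfa_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    dfa = build_dfa(pattern, set(text))
    q = 0
    for i, ch in enumerate(text):
        q = dfa[q].get(ch, 0)          # missing symbols reset to the start state
        if q == len(pattern):          # reaching state m signals a full match
            return i - len(pattern) + 1
    return -1
```

Once the table is built, the scan itself makes exactly one state transition per text character.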

Heapsort

A sorting technique that uses a binary heap, first building a max-heap and then repeatedly extracting the maximum element, ensuring efficient sorting.

  • Time complexity is Linearithmic, i.e. O(n log(n))
  • Space complexity is Constant, i.e. O(1)
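An in-place Python sketch: build the max-heap bottom-up, then repeatedly swap the root (maximum) to the end and restore the heap:

```python
def sift_down(items, start, end):
    """Restore the max-heap property for the subtree rooted at start."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and items[child] < items[child + 1]:
            child += 1                  # pick the larger of the two children
        if items[root] >= items[child]:
            return
        items[root], items[child] = items[child], items[root]
        root = child

def heapsort(items):
    n = len(items)
    # build a max-heap bottom-up in O(n)
    for start in range(n // 2 - 1, -1, -1):
        sift_down(items, start, n - 1)
    # repeatedly move the maximum to the end and shrink the heap
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(items, 0, end - 1)
    return items
```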

Huffman Encoding

Huffman encoding is a compression algorithm that uses a min-heap to order characters by frequency, then builds a prefix code from those frequencies; the most frequent characters receive the shortest codes.
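A compact Python sketch using `heapq` as the min-heap. Instead of an explicit tree, each heap entry carries the group of characters under that subtree, and merging two groups prepends a bit to every code in them:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Map each character to a prefix-free code; rarer chars get longer codes."""
    freqs = Counter(text)
    if len(freqs) == 1:
        return {next(iter(freqs)): '0'}      # degenerate single-symbol case
    codes = {char: '' for char in freqs}
    # heap entries are (frequency, tiebreak, group of chars in this subtree)
    heap = [(f, i, [c]) for i, (c, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # tiebreak keeps tuples comparable
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)  # two least-frequent subtrees
        f2, _, group2 = heapq.heappop(heap)
        for c in group1:
            codes[c] = '0' + codes[c]        # left branch prepends a 0 bit
        for c in group2:
            codes[c] = '1' + codes[c]        # right branch prepends a 1 bit
        heapq.heappush(heap, (f1 + f2, counter, group1 + group2))
        counter += 1
    return codes
```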

Insertion Sort

A sorting technique that builds the sorted array one element at a time, inserting each new element into its proper position among already-sorted elements.

  • Time complexity is Quadratic, i.e. O(n^2)
  • Space complexity is Constant, i.e. O(1)
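A Python sketch: each new element is shifted left past larger members of the sorted prefix until it lands in position:

```python
def insertion_sort(items):
    """Grow a sorted prefix one element at a time."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # shift larger elements of the sorted prefix right to make room
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items
```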

Intersection

The calculateIntersectionWeight function computes the intersection weight between two arrays of strings: the ratio of the size of their intersection to the length of the smaller array. It relies on the getIntersection function, which uses sets to return the unique elements shared by the two arrays. Both functions handle edge cases such as empty arrays, returning 0 where appropriate to avoid division by zero.
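A Python sketch of the pair of functions described above (snake_case versions of the names in the text):

```python
def get_intersection(a, b):
    """Unique elements common to both lists, via set intersection."""
    return list(set(a) & set(b))

def calculate_intersection_weight(a, b):
    """Ratio of the intersection size to the length of the smaller array."""
    if not a or not b:
        return 0                       # avoid division by zero on empty input
    return len(get_intersection(a, b)) / min(len(a), len(b))
```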

Iterative Depth-First Search

An iterative depth-first search algorithm using a stack, exploring as far along each branch as possible before backtracking, useful for pathfinding and cycle detection.
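A Python sketch with an explicit stack standing in for the call stack:

```python
def dfs_iterative(graph, source):
    """Depth-first preorder using an explicit stack instead of recursion."""
    visited = set()
    stack = [source]
    order = []
    while stack:
        node = stack.pop()             # LIFO order dives down one branch first
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # push neighbors in reverse so they pop in their listed order
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order
```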

Linear Search

Linear Search is a straightforward algorithm for finding a specific element in a list. It sequentially checks each element until the target is found or the list ends, making it simple and effective for unsorted data but inefficient for large datasets.
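The sequential check in Python:

```python
def linear_search(items, target):
    """Return the index of target, or -1 if it is absent."""
    for i, value in enumerate(items):
        if value == target:            # stop at the first match
            return i
    return -1
```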

Longest Common Prefix

An algorithm for finding the longest common prefix among a set of strings.

  • Time complexity is Linear, i.e. O(n)
  • Space complexity is Constant, i.e. O(1)
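One simple Python approach: start with the first string as the candidate prefix and shrink it until it matches every other string:

```python
def longest_common_prefix(strings):
    """Longest prefix shared by every string in the list."""
    if not strings:
        return ''
    prefix = strings[0]
    for s in strings[1:]:
        # shrink the candidate until it is a prefix of s
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ''
    return prefix
```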

Longest Common Subsequence

An algorithm for finding the longest subsequence common to two sequences.
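The standard dynamic program in Python, with a backtracking pass to recover one actual subsequence (there may be several of the same length):

```python
def longest_common_subsequence(a, b):
    """Classic O(len(a) * len(b)) dynamic program; returns one LCS string."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # walk back through the table to recover the subsequence
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(out))
```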

Merge Sort

A divide-and-conquer sorting technique that splits the array into halves, recursively sorts each half, and then merges the sorted halves.

  • Time complexity is Linearithmic, i.e. O(n log(n))
  • Space complexity is Linear, i.e. O(n)
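A Python sketch of the split-sort-merge recursion:

```python
def merge_sort(items):
    """Split, recursively sort, then merge the two sorted halves."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # merge: repeatedly take the smaller front element of the two halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # one of these is already empty
    merged.extend(right[j:])
    return merged
```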

Myers Diff

Computes the differences between two sequences efficiently, commonly used in version control systems to show changes between file revisions.
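A sketch of the greedy forward pass of Myers' O(ND) algorithm in Python, reduced to computing the length of the shortest edit script (full diff recovery additionally records the path):

```python
def myers_ses_length(a, b):
    """Length of the shortest edit script (inserts + deletes) between a and b."""
    n, m = len(a), len(b)
    # v[k] = furthest x reached on diagonal k (x - y == k)
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # choose whether this step extends a deletion or an insertion
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]           # move down: insert from b
            else:
                x = v[k - 1] + 1       # move right: delete from a
            y = x - k
            # follow the "snake": free diagonal moves over matching elements
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

The classic example from Myers' paper, ABCABBA versus CBABAC, needs an edit script of length 5.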

Naive String Matcher
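The naive string matcher slides the pattern over the text one position at a time and compares character by character at every offset, giving O(n * m) worst-case time. A Python sketch:

```python
def naive_match(text, pattern):
    """Return every index where pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        # O(m) comparison at each of the O(n) candidate offsets
        if text[i:i + m] == pattern:
            matches.append(i)
    return matches
```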

Negative Array

Quicksort

Quick sort is a divide-and-conquer sorting algorithm that selects a “pivot” element from an array, partitions the remaining elements into two sub-arrays (elements less than the pivot and elements greater than the pivot), and then recursively sorts the sub-arrays.
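A short out-of-place Python sketch that partitions around the middle element (production versions usually partition in place):

```python
def quicksort(items):
    """Partition around a pivot, then recursively sort the two sides."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]     # handles duplicate pivots
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```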

Rabin-Karp Matcher
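The Rabin-Karp matcher hashes the pattern and each window of the text, sliding the window with a rolling hash so each step is O(1); hash hits are verified directly to rule out collisions. A Python sketch (the base and modulus are illustrative choices):

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Find all occurrences of pattern using a rolling polynomial hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)          # weight of the outgoing character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # verify on hash hits to rule out collisions
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # roll the window: drop text[i], add text[i + m]
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches
```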

Recursive Depth-First Search

A recursive depth-first search algorithm that explores branches deeply before backtracking, managing traversal state with recursion.

  • Time complexity is Linear, i.e. O(V + E)
  • Space complexity is Linear, i.e. O(V), for the visited set and the recursion stack
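A Python sketch where the call stack carries the traversal state:

```python
def dfs_recursive(graph, node, visited=None, order=None):
    """Depth-first preorder traversal; recursion tracks the current path."""
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)
    for neighbor in graph.get(node, []):
        if neighbor not in visited:    # recurse only into unseen nodes
            dfs_recursive(graph, neighbor, visited, order)
    return order
```

For very deep graphs, Python's recursion limit makes the iterative stack-based variant the safer choice.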

Reverse

The reverse algorithm takes an array of values and returns the values in reverse order. It works by iterating over the array and swapping the first and last values, then the second and second-to-last values, and so on until the middle of the array is reached.
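The two-pointer swap described above, in Python:

```python
def reverse(items):
    """Swap symmetric pairs in place until the two indices meet."""
    left, right = 0, len(items) - 1
    while left < right:
        items[left], items[right] = items[right], items[left]
        left += 1
        right -= 1
    return items
```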

Selection Sort

Selection sort is a comparison-based sorting algorithm that divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.
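A Python sketch: the index i marks the boundary between the sorted prefix and the unsorted rest:

```python
def selection_sort(items):
    """Repeatedly move the smallest unsorted element to the sorted boundary."""
    n = len(items)
    for i in range(n):
        smallest = i
        # scan the unsorted sublist for its minimum
        for j in range(i + 1, n):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items
```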

Suffix Array

The suffix array algorithm constructs a sorted array of all suffixes of a given string, enabling efficient substring searches, pattern matching, and text analysis. It uses sorting and optional auxiliary structures like the LCP array for additional optimizations.
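A naive Python sketch: the construction below is O(n^2 log n) because it compares whole suffixes (specialized algorithms reach O(n log n) or O(n)), but it makes the substring-search idea concrete:

```python
def suffix_array(s):
    """Indices of s's suffixes in lexicographic order (naive build)."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def contains(s, sa, pattern):
    """Binary search the suffix array for a suffix starting with pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and s[sa[lo]:sa[lo] + len(pattern)] == pattern
```

Because every substring of s is a prefix of some suffix, the binary search answers "does s contain pattern?" in O(m log n) comparisons.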

Textrank

TextRank is a graph-based ranking model used for natural language processing tasks like text summarization and keyword extraction. It adapts the PageRank algorithm to rank sentences or words based on their importance within a text, using a graph where nodes represent text units and edges represent their semantic relationships.

TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used in information retrieval and text mining to evaluate how important a word is to a document in a collection or corpus. It combines term frequency (TF), which measures how frequently a term appears in a document, with inverse document frequency (IDF), which measures how important the term is across all documents.
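A Python sketch over tokenized documents, using one common smoothed IDF variant (several IDF formulations exist):

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF of a term in one tokenized document relative to a corpus."""
    tf = doc.count(term) / len(doc)              # term frequency in this doc
    doc_freq = sum(1 for d in corpus if term in d)
    # smoothed IDF avoids division by zero for unseen terms
    idf = math.log(len(corpus) / (1 + doc_freq)) + 1
    return tf * idf
```

A term that appears in every document scores lower than one concentrated in a single document, which is the intent of the IDF weighting.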

Wagner-Fischer Edit Distance

The Wagner-Fischer algorithm is a dynamic programming method for calculating the Levenshtein distance (edit distance) between two strings. It determines the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another by filling a matrix where each cell represents the edit distance between prefixes of the two strings. The algorithm builds the solution incrementally, ensuring optimal substructure and overlapping subproblems for efficient computation.
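A Python sketch of the dynamic program, keeping only two matrix rows since each cell depends only on the previous row and the current one:

```python
def edit_distance(a, b):
    """Levenshtein distance via the Wagner-Fischer dynamic program."""
    m, n = len(a), len(b)
    # prev[j] = distance between a[:i-1] and b[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1]                    # match: no edit
            else:
                curr[j] = 1 + min(prev[j],               # deletion
                                  curr[j - 1],           # insertion
                                  prev[j - 1])           # substitution
        prev = curr
    return prev[n]
```

The two-row trick cuts space from O(m * n) to O(n) while leaving the O(m * n) time unchanged.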