
gerp (C++ Recursive File Search Tool)

A grep-inspired recursive search tool that indexes a directory tree and supports case-sensitive and case-insensitive queries.

C++ · Hash Table · File I/O

Highlights

  • Custom nested hash-table index (case-insensitive key → case-sensitive spelling → occurrence count) that reduces redundant string comparisons
  • N-ary tree directory traversal and robust string processing
  • Index designed so queries stay fast on large file sets

gerp is a C++ file search tool inspired by Unix grep. It recursively traverses a directory tree, indexes file contents, and supports both case-sensitive and case-insensitive search queries. The focus of this project is designing the indexing structure so queries stay fast even as the number of files grows.

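The core idea is a nested hash-table index that separates case-normalized lookup from case-preserving results: a word is first looked up by its case-insensitive form, and the exact spellings found in the files, with their occurrence counts, are stored beneath that key.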
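A minimal sketch of that two-level layout follows. It is an illustration rather than gerp's actual code: it uses `std::unordered_map` in place of the project's custom hash table, and the names `NestedIndex`, `add`, `countSensitive`, and `countInsensitive` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical two-level index: lowercased word -> exact spelling -> count.
// std::unordered_map stands in for the project's custom hash table.
class NestedIndex {
public:
    // Record one occurrence of `word` exactly as it appeared in a file.
    void add(const std::string& word) {
        ++index_[toLower(word)][word];
    }

    // Case-sensitive count: find the folded bucket, then the exact spelling.
    std::size_t countSensitive(const std::string& word) const {
        auto outer = index_.find(toLower(word));
        if (outer == index_.end()) return 0;
        auto inner = outer->second.find(word);
        return inner == outer->second.end() ? 0 : inner->second;
    }

    // Case-insensitive count: sum every spelling stored in the folded bucket.
    std::size_t countInsensitive(const std::string& word) const {
        auto outer = index_.find(toLower(word));
        if (outer == index_.end()) return 0;
        std::size_t total = 0;
        for (const auto& entry : outer->second) total += entry.second;
        return total;
    }

private:
    // ASCII-only case folding; the cast avoids undefined behavior in
    // std::tolower for negative char values.
    static std::string toLower(std::string s) {
        for (char& c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    }

    std::unordered_map<std::string,
                       std::unordered_map<std::string, std::size_t>> index_;
};
```

With this layout both query modes start with a single hash lookup on the folded key. A case-sensitive query then does one more lookup on the exact spelling, while a case-insensitive query walks only the spellings that share that key instead of comparing against every indexed word.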

Indexing & Data Structures
- Nested hash-table index: case-insensitive key → case-sensitive spelling → occurrence count
- Directory traversal implemented with an n-ary tree representation of the filesystem
- Memory-conscious storage designed for large directory trees and datasets
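The tree-based traversal can be sketched as below. This assumes the directory tree has already been built in memory. The `DirNode` layout and the `collectPaths` helper are hypothetical names chosen for illustration, not the project's actual types.

```cpp
#include <string>
#include <vector>

// Hypothetical n-ary tree node: one node per directory, with any number of
// subdirectories as children and the directory's own files stored by name.
struct DirNode {
    std::string name;
    std::vector<std::string> files;
    std::vector<DirNode> children;
};

// Depth-first walk that rebuilds the full path of every file in the tree.
// `prefix` is the path of the parent directory ("" for the root).
void collectPaths(const DirNode& node,
                  const std::string& prefix,
                  std::vector<std::string>& out) {
    const std::string here =
        prefix.empty() ? node.name : prefix + "/" + node.name;

    // Files in this directory come first...
    for (const auto& file : node.files) out.push_back(here + "/" + file);

    // ...then each subdirectory is visited recursively.
    for (const auto& child : node.children) collectPaths(child, here, out);
}
```

An indexer could open each collected path in turn and feed its words into the index. Building each path during the walk means a node only has to store its own name, which is one way to keep per-node storage small on large trees.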

Search Behavior
- Supports case-insensitive and case-sensitive queries
- Robust tokenization/string processing to handle formatting and special characters
- Optimized lookup path to minimize redundant comparisons during querying
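As one possible reading of the tokenization step, the sketch below splits a line on whitespace and strips leading and trailing non-alphanumeric characters from each token. The exact stripping rule is an assumption, and `stripNonAlnum` and `tokenize` are hypothetical helper names.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Remove leading and trailing characters that are not letters or digits.
// Characters inside the token (such as the apostrophe in "it's") are kept.
std::string stripNonAlnum(const std::string& token) {
    std::size_t begin = 0;
    std::size_t end = token.size();
    while (begin < end &&
           !std::isalnum(static_cast<unsigned char>(token[begin]))) {
        ++begin;
    }
    while (end > begin &&
           !std::isalnum(static_cast<unsigned char>(token[end - 1]))) {
        --end;
    }
    return token.substr(begin, end - begin);
}

// Split a line on whitespace and keep only tokens that are non-empty
// after stripping.
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream in(line);
    std::string raw;
    while (in >> raw) {
        std::string word = stripNonAlnum(raw);
        if (!word.empty()) words.push_back(word);
    }
    return words;
}
```

Under this rule, the line `Hello, world! -- (it's) fine.` yields the four words `Hello`, `world`, `it's`, and `fine`. A token made up entirely of punctuation, such as `--`, is dropped. Applying the same normalization to query strings keeps what is searched for consistent with what was indexed.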

Systems Engineering Takeaways
- File I/O patterns and recursive traversal at scale
- Hashing and custom data-structure design for fast query resolution
- Practical performance tradeoffs (index build time vs query speed)