Personal Project
Active Development
Linux · Python · Web Scraping
2025–Present

Operation
i1

Type
Personal Coding Project
Platform
Linux
Purpose
Anti-Trafficking Intelligence
Scope
Deep Web Scraping · Automation · Reporting
Environment
Linux (CLI)
Method
Deep Web Scraping
Goal
Automate Intelligence
Status
Active Development

Code written with
purpose behind it.

Operation i1 is a deep web scraping tool built on Linux to streamline and automate intelligence-gathering in support of anti-human-trafficking efforts. It targets publicly accessible platforms — forums, classified ad sites, and other corners of the open and deep web where trafficking activity is known to surface — and extracts structured data that would otherwise require hours of manual review.

The goal isn't surveillance for its own sake. It's efficiency — getting useful, organized information into the hands of people who can act on it faster than any manual process allows.

"The most dangerous part of trafficking is how invisible it is. This project is about removing that advantage."
Automated scraping — replaces hours of manual searching with targeted, repeatable collection runs across known platforms
Structured output — raw data is parsed and formatted for immediate review, flagging and organizing results rather than dumping raw text
Linux-native — built entirely in the terminal, designed for low overhead and repeatable execution without a GUI dependency
Modular design — individual scrapers can be updated or swapped out as platforms change, without rebuilding the pipeline from scratch
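The modular design described above can be sketched as a small base class plus interchangeable per-platform scrapers. This is a hypothetical illustration, not the project's actual code: the names `Scraper`, `ExampleForumScraper`, and `run_pipeline` are assumptions.

```python
class Scraper:
    """Hypothetical base: one self-contained scraper per target platform."""
    def __init__(self, name):
        self.name = name

    def fetch(self):
        """Pull raw content from the target."""
        raise NotImplementedError

    def parse(self, raw):
        """Turn raw content into structured records."""
        raise NotImplementedError


class ExampleForumScraper(Scraper):
    """Stand-in scraper; a real one would issue HTTP requests per platform."""
    def fetch(self):
        return "<html>listing</html>"

    def parse(self, raw):
        return [{"source": self.name, "raw": raw}]


def run_pipeline(scrapers):
    """Swapping one scraper in or out never touches the pipeline itself."""
    records = []
    for s in scrapers:
        records.extend(s.parse(s.fetch()))
    return records
```

Because each scraper owns its own `fetch` and `parse`, a platform changing its page structure means rewriting one class, not the whole chain.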

Manual review
doesn't scale.

Anti-trafficking investigators and advocates face a signal-to-noise problem. The web is vast, the platforms shift constantly, and the volume of content that needs to be reviewed to surface a single actionable lead is enormous. Manual searching is exhausting, inconsistent, and doesn't keep pace with the way these networks operate.

Existing tools are often expensive, inaccessible to independent advocates, or designed for law enforcement environments that most community-level organizations can't access. Operation i1 exists in that gap — lightweight, open, and purpose-built for the problem.

"Every hour spent searching manually is an hour that isn't spent on the case itself."
Volume — relevant platforms generate thousands of new listings daily; no individual can review them by hand
Drift — platforms change structure, move, or disappear without notice; automated tooling adapts faster than manual workflows
Access gap — most intelligence tooling requires institutional access or licensing; independent advocates are effectively locked out

A pipeline from
collection to output.

Operation i1 runs as a set of modular scripts executed from the Linux command line. Each stage of the pipeline is designed to be inspectable, adjustable, and re-runnable — so the tool improves over time without requiring a full rebuild.

01
Target Identification
Known platforms, forums, and listing sites are defined as scraping targets. Target lists are maintained as plain text configs — easy to update as the landscape shifts. Targets span the open web and accessible deep web layers.
02
Scrape & Collect
Scripts send structured requests to each target, collecting page content while respecting rate limits and minimizing detection footprint. Results are pulled into raw staging files for the next stage.
03
Parse & Filter
Raw content is run through parsing logic that extracts relevant fields — contact information, keywords, geographic indicators, patterns of language known to correlate with exploitation ads. Noise is stripped. Signal is elevated.
04
Structure & Output
Parsed results are written to structured output files — organized, timestamped, and ready for review or handoff. The output is designed to be usable without technical knowledge on the receiving end.
05
Iteration & Refinement
Each run informs the next. Keyword lists are refined, target configs are updated, and parsing logic is tightened based on what the data actually looks like — a feedback loop that improves precision over time.
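Stages 01 and 04 can be sketched in a few lines: plain-text target configs in, timestamped structured output out. The file naming, config format, and field names here are assumptions for illustration, not the project's actual conventions.

```python
import json
import time


def load_targets(text):
    """Stage 01: one target URL per line; '#' lines are comments."""
    targets = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            targets.append(line)
    return targets


def write_output(records):
    """Stage 04: timestamped JSON, readable without technical tooling."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    payload = {"run": stamp, "count": len(records), "results": records}
    return f"results-{stamp}.json", json.dumps(payload, indent=2)


# Illustrative target config; the .invalid domains are placeholders.
config = """
# targets.txt — updated as the landscape shifts
https://example-forum.invalid/listings
https://example-classifieds.invalid/ads
"""
targets = load_targets(config)
```

Keeping targets in plain text means the list can be edited with any editor mid-investigation, and each run's output file carries its own timestamp for handoff.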

Built light.
Built to last.

The stack is intentionally minimal — no unnecessary dependencies, no GUI overhead, nothing that creates friction between writing code and running it.

Language
Python 3
Core scripting language for all scraping, parsing, and output logic. Fast to write, easy to modify, and well-supported for web tooling.
Scraping
BeautifulSoup & Requests
HTML parsing and HTTP request handling. BeautifulSoup for structured DOM navigation; Requests for clean, controllable fetching.
Automation
Selenium / Playwright
Headless browser control for JavaScript-rendered targets where static scraping isn't sufficient. Runs without a display server in CLI environments.
Environment
Linux (CLI)
Developed and run entirely from the terminal. No desktop GUI required — designed for low-overhead, repeatable execution via cron or manual trigger.
Output
CSV / JSON
Structured output formats that can be opened in any spreadsheet tool or fed into downstream analysis without additional conversion steps.
Version Control
Git / GitHub
Full version history, modular commits per feature, and a remote backup — making the project portable and collaborative when it needs to be.
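The project pairs Requests (fetching) with BeautifulSoup (DOM navigation). As a dependency-free stand-in, the stdlib `html.parser` shows the same shape: fetch raw HTML at a polite rate, walk the DOM, pull out listing links. The class and function names are illustrative.

```python
from html.parser import HTMLParser
import time


class ListingLinkParser(HTMLParser):
    """Collect every href on a page; a BeautifulSoup equivalent would be
    [a['href'] for a in soup.find_all('a', href=True)]."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def polite_fetch(url, fetcher, delay=1.0):
    """Stage 02 behavior: pause between requests to respect rate limits.
    `fetcher` would be requests.get in practice; injected here for testing."""
    time.sleep(delay)  # crude fixed delay; real runs tune this per target
    return fetcher(url)


html = '<ul><li><a href="/ad/1">Ad 1</a></li><li><a href="/ad/2">Ad 2</a></li></ul>'
parser = ListingLinkParser()
parser.feed(html)
```

Injecting the fetch function keeps the scraper testable offline, which matters when targets disappear without notice.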

One developer.
One clear objective.

This is an independent project — no team, no external spec, no client brief. Every architectural decision, every line of code, and every iteration came from a single question: what does the person trying to use this actually need?

System Architecture
Designed the full pipeline from target config through structured output — modular enough to update individual components without breaking the chain, simple enough to actually run when it needs to.
Scraper Development
Built individual scrapers for each target platform, handling differences in structure, rendering method, and rate-limiting behavior. Each scraper is self-contained and independently testable.
Parsing & NLP Logic
Developed keyword filtering and pattern-matching logic informed by language patterns known to correlate with trafficking ads — designed to surface relevant results without requiring an expert to interpret raw output.
Linux Systems Work
Built and maintained entirely in a Linux CLI environment — writing shell scripts, managing dependencies, scheduling runs, and troubleshooting the full stack without a GUI safety net.
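The keyword-filtering stage can be sketched as a list of compiled patterns scored against each listing. The indicator patterns below are generic placeholders for illustration, not the project's actual indicator sets.

```python
import re

# Placeholder indicators; real lists are refined run over run (stage 05).
INDICATORS = [
    r"\bnew in town\b",                      # example phrasing pattern
    r"\bavailable 24/?7\b",
    r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b",    # US-style phone number
]
PATTERNS = [re.compile(p, re.IGNORECASE) for p in INDICATORS]


def flag(text):
    """Return the indicators that matched, so reviewers see *why* it surfaced."""
    return [p.pattern for p in PATTERNS if p.search(text)]


def filter_listings(listings, min_hits=1):
    """Strip noise, elevate signal: keep listings with enough indicator hits."""
    kept = []
    for item in listings:
        hits = flag(item["text"])
        if len(hits) >= min_hits:
            kept.append({**item, "matched": hits})
    return kept
```

Attaching the matched patterns to each kept record is what makes the output usable without technical knowledge: a reviewer sees the reason alongside the result.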