Micro Post

Hidden Emails in Git Patch Metadata

Tags: osint, github

"Where would a wise man hide a leaf? In the forest." — G.K. Chesterton

GitHub lets you hide your email address from your profile page. What it doesn't tell you is that every commit you authored with that address still contains it, baked into immutable git objects that anyone can read with a single URL.

By appending .patch to any public commit URL, the raw commit metadata comes back as a standard git format-patch response. No authentication is required, no special tooling is needed, and the email address is sitting right there in the From: header. GitHub's web interface hides it, but the underlying git data is completely unchanged.

The reason this works is that git stores identity information at a fundamental level. Every commit object includes author and committer fields, each containing a name, email, and timestamp. These fields are hashed into the commit's SHA, which means changing the author email would produce an entirely different commit. Once a commit is pushed, the email it contains is effectively permanent.
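To make the hashing point concrete, here is a minimal sketch of how git derives an object ID: the SHA-1 of a short type-and-length header followed by the object body. It assumes a SHA-1 repository, and the commit body is a made-up example (the tree ID, name, and timestamps are placeholders).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID git assigns to an object of this kind and body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(email: str) -> bytes:
    # Hypothetical commit: tree ID, name, and timestamps are placeholders.
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        f"author John Doe <{email}> 1741577400 +1100\n"
        f"committer John Doe <{email}> 1741577400 +1100\n"
        "\n"
        "fix: resolve auth bug\n"
    ).encode()


original = git_object_id("commit", commit_body("john@personal-email.com"))
rewritten = git_object_id("commit", commit_body("john@users.noreply.github.com"))
print(original)
print(rewritten)
```

The two printed IDs differ even though only the email changed, which is why an address cannot be edited in place without rewriting the commit and everything built on top of it.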

The disclosure

Appending .patch to any public commit URL is all it takes:

https://github.com/{user}/{repo}/commit/{sha}.patch
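As a rough sketch, building and fetching that URL needs nothing beyond the standard library. The function names and the User-Agent string here are illustrative choices, not part of any GitHub API.

```python
import urllib.request


def patch_url(user: str, repo: str, sha: str) -> str:
    """Build the .patch URL for a public commit."""
    return f"https://github.com/{user}/{repo}/commit/{sha}.patch"


def fetch_patch(user: str, repo: str, sha: str, timeout: float = 10.0) -> str:
    """Download the patch text; no authentication is sent."""
    request = urllib.request.Request(
        patch_url(user, repo, sha),
        headers={"User-Agent": "patch-metadata-demo"},  # hypothetical identifier
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_patch with a real user, repository, and commit SHA should return the patch as a string.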

The response is standard git format-patch output that includes the commit headers:

From abc123 Mon Sep 17 00:00:00 2001
From: John Doe <john@personal-email.com>
Date: Mon, 10 Mar 2025 14:30:00 +1100
Subject: [PATCH] fix: resolve auth bug
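Because format-patch output is laid out like a mail message, Python's standard email module can read these headers directly. The sketch below parses the sample above; real patches may carry RFC 2047 encoded names, which this minimal version does not decode.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

SAMPLE = """\
From abc123 Mon Sep 17 00:00:00 2001
From: John Doe <john@personal-email.com>
Date: Mon, 10 Mar 2025 14:30:00 +1100
Subject: [PATCH] fix: resolve auth bug

"""


def parse_patch_headers(patch_text: str) -> dict:
    """Pull author name, email, date, and subject out of a patch."""
    message = message_from_string(patch_text)
    name, address = parseaddr(message["From"])
    return {
        "name": name,
        "email": address,
        "date": parsedate_to_datetime(message["Date"]),
        "subject": message["Subject"],
    }


info = parse_patch_headers(SAMPLE)
print(info["name"], info["email"], info["date"].isoformat())
```

Note that the Date: header also carries a UTC offset, so a patch hints at the author's timezone as well as their address.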

What's extractable

A single patch file can expose email addresses in up to five different fields, some of them mail-style headers and some of them trailers at the end of the commit message:

  • From: contains the primary author of the commit. This is the most common field and is present in virtually every patch file.
  • Author: appears in some patch formats and contains the same identity information as the From: field.
  • Committer: identifies the person who actually applied the commit to the repository. This often differs from the author, for example when a maintainer merges someone else's pull request. A single commit can therefore carry two separate identities. Plain format-patch output normally includes only the author in From:, so the committer identity usually has to come from another view of the same commit, such as a clone or the GitHub commits API.
  • Signed-off-by: is a trailer added with git commit -s, typically as part of a Developer Certificate of Origin process. It often contains a contributor's real corporate email address.
  • Co-authored-by: is used when multiple people collaborate on a single commit, such as during pair programming sessions. These trailers can reveal additional contributors who may not appear anywhere else in the repository.

A simple extraction script that covers all five patterns looks like this:

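import re

# One pattern per field: field name, display name, then <email>.
# Anchoring at line start skips lookalikes inside the diff body ("+From: ...").
FIELDS = ["From", "Author", "Committer", "Signed-off-by", "Co-authored-by"]
PATTERNS = [
    re.compile(rf'^{field}: [^<\n]*<([^<>\s@]+@[^<>\s@]+\.[^<>\s@]+)>', re.I | re.M)
    for field in FIELDS
]

# Placeholder domains and addresses that carry no identity information.
NOISE_DOMAINS = ("noreply.github.com", "example.com")
NOISE_ADDRESSES = {"noreply@github.com"}

def is_noise(email):
    domain = email.rsplit("@", 1)[1]
    return email in NOISE_ADDRESSES or any(
        domain == d or domain.endswith("." + d) for d in NOISE_DOMAINS)

def extract_emails(patch_text):
    emails = set()
    for pattern in PATTERNS:
        emails.update(e.lower() for e in pattern.findall(patch_text))
    return {e for e in emails if not is_noise(e)}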
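It can also be useful to know which field an address came from, since a Signed-off-by address says something different from a From: address. The variant below is a small self-contained sketch of that idea; the sample patch and its addresses are invented for illustration.

```python
import re
from collections import defaultdict

# Capture both the field name and the address in one pass.
FIELD_RE = re.compile(
    r"^(From|Author|Committer|Signed-off-by|Co-authored-by):[^<\n]*<([^<>\s]+@[^<>\s]+)>",
    re.I | re.M,
)


def emails_by_field(patch_text: str) -> dict:
    """Map each lowercased field name to the set of addresses found in it."""
    found = defaultdict(set)
    for field, address in FIELD_RE.findall(patch_text):
        found[field.lower()].add(address.lower())
    return dict(found)


# Invented sample: one author who signs off, plus one co-author.
sample = """\
From abc123 Mon Sep 17 00:00:00 2001
From: John Doe <john@personal-email.com>
Date: Mon, 10 Mar 2025 14:30:00 +1100
Subject: [PATCH] fix: resolve auth bug

Co-authored-by: Jane Roe <jane@corp.example.org>
Signed-off-by: John Doe <john@personal-email.com>
"""

for field, addresses in emails_by_field(sample).items():
    print(field, sorted(addresses))
```

The leading "From abc123 ..." separator line is not matched because the pattern requires a colon directly after the field name.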

Scaling the extraction

While manually checking individual patches is useful for understanding how the technique works, the real power comes from automation at scale. A single GitHub user might have dozens of repositories, each containing hundreds or thousands of commits. The pipeline for extracting emails across all of them is straightforward:

flowchart LR
    A["Enumerate repos"] --> B["Walk commit history"]
    B --> C["Fetch .patch files"]
    C --> D["Regex extraction"]
    D --> E["Filter + deduplicate"]

The process starts by enumerating all public repositories for a target user or organisation, then walking the commit history of each repository to collect commit SHAs. For each commit, you fetch the .patch file (which requires no authentication), extract emails using the regex patterns above, and finally filter out noise such as noreply@github.com and test@example.com before deduplicating the results.
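The steps above can be sketched in a few functions. This version assumes the public GitHub REST API endpoints for listing a user's repositories and a repository's commits, reads only the first page of each, and takes the HTTP layer as a parameter (get_text, a hypothetical callable that maps a URL to response text) so the logic can be exercised without touching the network.

```python
import json
import re
from typing import Callable, List, Set

API = "https://api.github.com"
EMAIL_RE = re.compile(
    r"^(?:From|Signed-off-by|Co-authored-by):[^<\n]*<([^<>\s]+@[^<>\s]+)>",
    re.I | re.M,
)
NOISE_SUFFIXES = ("noreply.github.com", "example.com")


def list_repos(user: str, get_text: Callable[[str], str]) -> List[str]:
    # First page only; a fuller version would follow pagination.
    repos = json.loads(get_text(f"{API}/users/{user}/repos?per_page=100"))
    return [repo["name"] for repo in repos]


def list_shas(user: str, repo: str, get_text: Callable[[str], str]) -> List[str]:
    # First page only, newest commits first.
    commits = json.loads(get_text(f"{API}/repos/{user}/{repo}/commits?per_page=100"))
    return [commit["sha"] for commit in commits]


def harvest(user: str, get_text: Callable[[str], str]) -> Set[str]:
    """Enumerate repos, walk commits, fetch patches, extract, then filter."""
    emails = set()
    for repo in list_repos(user, get_text):
        for sha in list_shas(user, repo, get_text):
            patch = get_text(f"https://github.com/{user}/{repo}/commit/{sha}.patch")
            emails.update(e.lower() for e in EMAIL_RE.findall(patch))
    return {
        e for e in emails
        if e != "noreply@github.com" and not e.endswith(NOISE_SUFFIXES)
    }
```

Using a set handles deduplication for free, and lowercasing before insertion keeps differently cased copies of the same address from being counted twice.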

The main practical constraint is rate limiting. GitHub returns 429 responses when requests are made too aggressively, so randomised delays and user agent rotation are essential for keeping the pipeline running smoothly over larger datasets.
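A minimal sketch of the delay side of this: retry on 429 with exponential backoff plus random jitter, and honour a Retry-After header if the server happens to send one (whether it does is an assumption here). The function name and default values are illustrative.

```python
import random
import time
import urllib.error
import urllib.request


def polite_get(url, max_attempts=5, base_delay=1.0,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL as text, backing off with jitter whenever a 429 comes back."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            # Give up on other errors, or once the attempts are used up.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            # Random jitter keeps retries from landing in lockstep.
            sleep(delay + random.uniform(0, base_delay))
    raise RuntimeError("max_attempts must be at least 1")
```

The opener and sleep parameters exist only so the retry logic can be tested with stand-ins; in normal use the defaults are fine.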

Countermeasures

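The most important step is to enable GitHub's "Keep my email addresses private" setting. This makes GitHub use a noreply alias of the form ID+USERNAME@users.noreply.github.com for web-based operations such as edits and merges made on the site. Commits made locally use whatever user.email is configured in git, so you also need to set that to the noreply alias; the companion option "Block command line pushes that expose my email" rejects pushes that would leak the real address. However, it's critical to understand that none of this retroactively scrubs old commits. Every commit you pushed before making these changes still contains your real email address in its metadata.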
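To see what your own history already exposes, you can audit a local clone. The sketch below lists every author and committer email in a repository and keeps the ones that are not GitHub noreply addresses. It assumes git is installed and on the PATH, and the alias pattern is an approximation of the formats GitHub uses.

```python
import re
import subprocess

# Approximate shapes: "ID+username@..." and the older "username@..." form.
NOREPLY_RE = re.compile(
    r"^(?:\d+\+)?[A-Za-z0-9\-\[\]]+@users\.noreply\.github\.com$", re.I
)


def is_noreply_alias(email: str) -> bool:
    """True for GitHub noreply aliases and the web-commit committer address."""
    return email.lower() == "noreply@github.com" or bool(NOREPLY_RE.match(email))


def exposed_emails(repo_path: str = ".") -> set:
    """Return author and committer emails in the repo that are not noreply aliases."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=%ae%n%ce"],
        capture_output=True, text=True, check=True,
    ).stdout
    emails = {line.strip() for line in output.splitlines() if line.strip()}
    return {e for e in emails if not is_noreply_alias(e)}
```

Running exposed_emails() inside a clone returns the set of real addresses that anyone fetching the same history can see.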

For a complete cleanup, you can use git filter-repo to rewrite your commit history and replace old email addresses with the noreply alias. This changes all affected commit hashes and requires a force push, so it should be done carefully and with awareness of the impact on any downstream forks or collaborators.
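One way to drive that rewrite is with a mailmap file, which git filter-repo accepts through its --mailmap option. The helper below is a sketch that writes one mapping line per old address; the alias and addresses passed in are placeholders you would replace with your own.

```python
from pathlib import Path
from typing import Iterable, List


def mailmap_lines(old_emails: Iterable[str], noreply_alias: str) -> List[str]:
    """Build mailmap entries of the form '<new@email> <old@email>'."""
    return [f"<{noreply_alias}> <{old}>" for old in sorted(set(old_emails))]


def write_mailmap(path: str, old_emails: Iterable[str], noreply_alias: str) -> None:
    """Write the entries to a file that can be passed to a history rewrite."""
    lines = mailmap_lines(old_emails, noreply_alias)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Placeholder values for illustration only.
for line in mailmap_lines(
    ["john@personal-email.com", "john@old-employer.example"],
    "12345+johndoe@users.noreply.github.com",
):
    print(line)
```

Running the rewrite on a fresh clone, rather than your working copy, keeps an untouched backup around in case the mapping turns out to be wrong.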


In Project Eyrie, Kraken and Kraken Web automate this pipeline with configurable speed profiles, user agent rotation, and intelligent commit selection.

Part of Project Eyrie — by notalex.sh

Open-source tooling for intelligence, investigations, and OSINT.