# Auto-enrich new contacts

> A Python script that finds new contacts in Supersonic, enriches them with web search data, and updates their records. Runs on cron.

This script polls Supersonic for recently created contacts, enriches each one using a web search API (Tavily), and writes the enrichment data back to the contact record. Run it every 15 minutes on cron.

## How it works

1. Fetch contacts sorted by `created_at` descending.
2. Check which ones were created since the last run (tracked via a timestamp file).
3. For each new contact, search the web for their name + company.
4. Extract a LinkedIn profile URL and a short bio snippet from the search results.
5. Update the contact record in Supersonic.

## Prerequisites

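The script reads each contact's `Name` and, if present, `Company`, so your Contacts object type needs those fields. It also needs two text fields to hold the enrichment data: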

```bash
npx supersonic-cli objects add-field \
  --object-type-slug "contacts" \
  --name "LinkedIn" \
  --field-type "text"

npx supersonic-cli objects add-field \
  --object-type-slug "contacts" \
  --name "Bio" \
  --field-type "text"
```

You also need a [Tavily API key](https://tavily.com/) (free tier available). Any search API works; substitute your preferred provider. The script uses `str | None` type hints, so run it with Python 3.10 or later.

## The script

```python
#!/usr/bin/env python3
"""
Auto-enrich new contacts with web search data.

Env vars: SUPERSONIC_API_KEY, TAVILY_API_KEY
State file: ~/.supersonic_enrich_last_run
"""

import os
import json
from datetime import datetime, timezone
from pathlib import Path
import httpx

API_URL = "https://mcp.supersonic.cv/api/developers/mcp/call/"
API_KEY = os.environ["SUPERSONIC_API_KEY"]
TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]
OBJECT_TYPE_SLUG = "contacts"
LAST_RUN_FILE = Path.home() / ".supersonic_enrich_last_run"


def api_call(tool: str, params: dict) -> dict:
    resp = httpx.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={"tool": tool, "params": params},
        timeout=15.0,
    )
    resp.raise_for_status()
    return resp.json()


def get_last_run() -> str | None:
    if LAST_RUN_FILE.exists():
        return LAST_RUN_FILE.read_text().strip()
    return None


def save_last_run(timestamp: str):
    LAST_RUN_FILE.write_text(timestamp)


def get_new_contacts(since: str | None) -> list[dict]:
    params = {
        "object_type_slug": OBJECT_TYPE_SLUG,
        "sort": "-created_at",
        "limit": 50,
    }
    data = api_call("records.list", params)
    records = data.get("records", [])

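    if since:
        # Plain string comparison: assumes created_at uses the same ISO 8601
        # UTC format as the saved timestamp (e.g. "+00:00", not "Z").
        records = [r for r in records if r.get("created_at", "") > since]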

    return records


def search_person(name: str, company: str | None) -> dict:
    """Search the web for a person and extract enrichment data."""
    query = name
    if company:
        query = f"{name} {company}"

    resp = httpx.post(
        "https://api.tavily.com/search",
        json={
            "api_key": TAVILY_API_KEY,
            "query": query,
            "search_depth": "basic",
            "max_results": 5,
        },
        timeout=15.0,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])

    enrichment = {}

    for result in results:
        url = result.get("url", "")
        content = result.get("content", "")

        # Extract LinkedIn URL
        if "linkedin.com/in/" in url and "LinkedIn" not in enrichment:
            enrichment["LinkedIn"] = url

        # Use snippet as bio if it mentions the person
        if name.split()[0].lower() in content.lower() and "Bio" not in enrichment:
            enrichment["Bio"] = content[:500]

    return enrichment


def update_record(record_id: str, data: dict):
    api_call("records.update", {
        "object_type_slug": OBJECT_TYPE_SLUG,
        "record_id": record_id,
        "data": data,
    })


def main():
    last_run = get_last_run()
    now = datetime.now(timezone.utc).isoformat()

    contacts = get_new_contacts(last_run)
    print(f"Found {len(contacts)} new contacts to enrich.")

    for contact in contacts:
        record_id = contact["id"]
        data = contact.get("data", {})
        name = data.get("Name", "")
        company = data.get("Company")

        # Skip missing or whitespace-only names; search_person needs a first name.
        if not name or not name.strip():
            continue

        # Skip if already enriched
        if data.get("LinkedIn") or data.get("Bio"):
            print(f"  Skipping {name} (already enriched)")
            continue

        print(f"  Enriching {name}...")
        try:
            enrichment = search_person(name, company)
            if enrichment:
                update_record(record_id, enrichment)
                print(f"    Updated: {list(enrichment.keys())}")
            else:
                print("    No data found")
        except Exception as e:
            print(f"    Error: {e}")

    save_last_run(now)
    print("Done.")


if __name__ == "__main__":
    main()
```
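
The extraction rules inside `search_person` can be exercised without a network call by pulling them into a pure function. The following is a minimal sketch of that idea, not a change the script requires. `extract_enrichment` is a hypothetical helper, and the sample results are made up to mirror the `url` and `content` keys the script reads.

```python
def extract_enrichment(name: str, results: list[dict]) -> dict:
    """Pick a LinkedIn URL and a bio snippet from Tavily-style results."""
    parts = name.split()
    # Guard against empty or whitespace-only names.
    first_name = parts[0].lower() if parts else ""

    enrichment = {}
    for result in results:
        url = result.get("url", "")
        content = result.get("content", "")

        # First personal LinkedIn profile URL wins.
        if "linkedin.com/in/" in url and "LinkedIn" not in enrichment:
            enrichment["LinkedIn"] = url

        # First snippet mentioning the first name becomes the bio.
        if first_name and first_name in content.lower() and "Bio" not in enrichment:
            enrichment["Bio"] = content[:500]

    return enrichment


# Made-up sample results, shaped like the "results" list the script reads.
sample = [
    {"url": "https://example.com/team",
     "content": "Ada Lovelace leads engineering at Example Corp."},
    {"url": "https://www.linkedin.com/in/ada-lovelace",
     "content": "View the profile."},
]
print(extract_enrichment("Ada Lovelace", sample))
```

Because the bio check is a substring match on the first name, a short name such as "Ada" would also match a snippet containing "Canada". Treat the `Bio` field as a best-effort guess rather than verified data.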

## Setup

<Steps>
  <Step title="Install dependencies">
    ```bash
    pip install httpx
    ```
  </Step>

  <Step title="Set environment variables">
    ```bash
    export SUPERSONIC_API_KEY="supersonic_live_YOUR_KEY"
    export TAVILY_API_KEY="tvly-YOUR_KEY"
    ```
  </Step>

  <Step title="Test with a single run">
    ```bash
    python auto_enrich.py
    ```

    Check a contact to verify enrichment data was written:

    ```bash
    npx supersonic-cli records get \
      --object-type-slug "contacts" \
      --record-id "RECORD_ID"
    ```
  </Step>

  <Step title="Schedule with cron">
    Every 15 minutes:

    ```cron
    */15 * * * * SUPERSONIC_API_KEY=supersonic_live_YOUR_KEY TAVILY_API_KEY=tvly-YOUR_KEY /usr/bin/python3 /path/to/auto_enrich.py >> /var/log/auto_enrich.log 2>&1
    ```
  </Step>
</Steps>

## Using a different search API

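The `search_person` function is the only part that depends on Tavily, so you can swap it out for any provider. Here's the same function using SerpAPI. It reads its key from a `SERPAPI_KEY` environment variable; if you no longer set `TAVILY_API_KEY`, also remove the `TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]` line, which otherwise raises a `KeyError` at startup.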

```python
def search_person(name: str, company: str | None) -> dict:
    query = f"{name} {company}" if company else name

    resp = httpx.get(
        "https://serpapi.com/search",
        params={
            "api_key": os.environ["SERPAPI_KEY"],
            "q": query,
            "num": 5,
        },
        timeout=15.0,
    )
    resp.raise_for_status()
    results = resp.json().get("organic_results", [])

    enrichment = {}
    for result in results:
        link = result.get("link", "")
        snippet = result.get("snippet", "")

        if "linkedin.com/in/" in link and "LinkedIn" not in enrichment:
            enrichment["LinkedIn"] = link
        if name.split()[0].lower() in snippet.lower() and "Bio" not in enrichment:
            enrichment["Bio"] = snippet[:500]

    return enrichment
```

<Note>
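  This script processes at most 50 contacts per run. It fetches the 50 newest records and then advances the last-run timestamp, so if more than 50 contacts arrive between runs, the older ones are skipped on later runs too. The same applies to contacts whose enrichment fails with an error. If you're importing large batches, increase the `limit` parameter or run more frequently. Each `records.list` and `records.update` call counts toward the 1,000 calls/min rate limit.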
</Note>
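
At the default `limit` of 50, a run makes about 51 Supersonic calls (one `records.list` plus up to 50 `records.update`), which is far below the rate limit. If you raise `limit` substantially, a small client-side throttle is one way to keep some headroom. This is only a sketch: `Throttle` is a hypothetical helper, and 600 calls/min is an arbitrary margin chosen for illustration.

```python
import time


class Throttle:
    """Space out calls so they stay under a per-minute budget."""

    def __init__(self, calls_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / calls_per_minute
        self.clock = clock    # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


throttle = Throttle(calls_per_minute=600)
for _ in range(3):
    throttle.wait()  # in the script, call this right before each api_call(...)
```

The throttle enforces a minimum gap between calls rather than counting calls per window, which is simpler and never bursts past the budget.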

<Tip>
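  To enrich existing contacts (not just new ones), remove the `since` filter and the "already enriched" check. `get_new_contacts` still fetches only the 50 newest records, so raise `limit` to cover the contacts you want to backfill. Run it once as a backfill, then switch back to incremental mode.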
</Tip>
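
Instead of editing the script twice, the backfill switch could be a flag. The sketch below isolates the skip decision in one function. `needs_enrichment` and the `ENRICH_BACKFILL` environment variable are hypothetical names introduced for illustration; neither is part of Supersonic or the script above.

```python
import os


def needs_enrichment(data: dict, backfill: bool) -> bool:
    """Decide whether a contact should be enriched on this run."""
    if not data.get("Name"):
        return False  # nothing to search for
    if backfill:
        return True  # backfill re-processes every named contact
    # Incremental mode: skip contacts that already have enrichment data.
    return not (data.get("LinkedIn") or data.get("Bio"))


# Hypothetical toggle: run with ENRICH_BACKFILL=1 for the one-off backfill.
BACKFILL = os.environ.get("ENRICH_BACKFILL") == "1"

print(needs_enrichment({"Name": "Ada Lovelace", "Bio": "Existing bio"}, BACKFILL))
```

In backfill mode you would also pass `None` as `since` to `get_new_contacts`, so that no records are filtered out by timestamp.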
