SQLite — In-Memory Schemas (Option A)

Connect to a SQLite database, infer all table schemas automatically, generate synthetic data, and write it back — all without writing any schema files to disk.

When to use this approach

  • You want the simplest possible workflow
  • You don't need to inspect or edit schemas before generating
  • You're prototyping or running one-off data generation jobs

Install

pip install syda sqlalchemy python-dotenv

Full example

import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
from syda import SyntheticDataGenerator, DatabaseSchemaLoader, ModelConfig

load_dotenv()

# --- 1. Connect to your database ---
engine = create_engine("sqlite:///healthcare_demo.db")

# --- 2. Infer schemas directly as dicts (no files written) ---
loader  = DatabaseSchemaLoader(engine)
schemas = loader.load_schemas()
# schemas = {"patient": {...}, "provider": {...}, ...}

# --- 3. Generate synthetic data ---
generator = SyntheticDataGenerator(
    model_config=ModelConfig(
        provider="anthropic",
        model_name="claude-haiku-4-5-20251001",
        temperature=0.7,
        max_tokens=8192,
    )
)

results = generator.generate_for_schemas(
    schemas=schemas,
    sample_sizes={
        "patient": 10, "provider": 5,
        "diagnosis": 20, "claim": 20,
        "adjudication": 20, "payment": 20,
    },
    prompts={
        "patient":      "Generate realistic patient records with diverse ages (18-85) and genders.",
        "provider":     "Generate realistic healthcare providers with diverse specialties.",
        "diagnosis":    "Generate realistic diagnoses using ICD-10 codes (e.g. I10, E11.9).",
        "claim":        "Generate realistic healthcare claims using CPT codes, amounts $50-$5000.",
        "adjudication": "Generate adjudication records: ~60% Approved, ~25% Partial, ~15% Denied.",
        "payment":      "Generate payment records, mostly Paid status.",
    },
    output_dir="output/load_schemas",
)

# --- 4. Write generated data back to the database ---
loader.write_to_database(results)
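
The inference in step 2 boils down to reading catalog metadata from the live database. syda does this through SQLAlchemy's reflection API; as a stdlib-only illustration of the same idea (not syda's actual implementation), SQLite exposes the same column and foreign-key information directly via PRAGMAs:

```python
import sqlite3

# Build a tiny throwaway database mirroring two of the demo tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE claim (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patient(id)
    );
""")

# Assemble a schema dict per table, shaped like the loader's output:
# column name -> type, plus the list of referenced parent tables.
schemas = {}
for (table,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
):
    cols = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    fks = [row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
    schemas[table] = {"columns": cols, "foreign_keys": fks}
```

The exact keys in syda's schema dicts may differ; the point is that everything needed — columns, types, and FK edges — is already in the database's own catalog.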

What happens

  1. DatabaseSchemaLoader connects to the database and reads every table's columns, types, primary keys, and foreign keys via SQLAlchemy's inspect().
  2. load_schemas() returns a dict[str, dict] — one schema dict per table — ready for generate_for_schemas().
  3. generate_for_schemas() resolves the FK dependency graph, generates each table in topological order (parents before children), and saves CSVs to output_dir.
  4. write_to_database() inserts each DataFrame back into the database in the same FK-safe order.
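
The ordering in steps 3 and 4 is a standard topological sort over the FK graph. A minimal sketch with the standard library's `graphlib`, using a hypothetical FK map for the demo tables (not syda's internals):

```python
from graphlib import TopologicalSorter

# Hypothetical FK map: table -> set of parent tables it references.
fk_deps = {
    "patient": set(),
    "provider": set(),
    "diagnosis": {"patient"},
    "claim": {"patient", "provider"},
    "adjudication": {"claim"},
    "payment": {"claim"},
}

# static_order() yields each table only after all of its parents,
# so generation (and insertion) never hits a dangling FK.
order = list(TopologicalSorter(fk_deps).static_order())
```

If the graph contained a cycle, `static_order()` would raise `graphlib.CycleError` — which is also roughly what you'd want a generator to do, since no FK-safe insert order exists in that case.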

Set your API key

# .env
ANTHROPIC_API_KEY=your_key_here
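
If you'd rather not use a `.env` file, exporting the variable in your shell before running the script works just as well, since `load_dotenv()` never overrides variables that are already set:

```shell
export ANTHROPIC_API_KEY=your_key_here
```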

See also