SQLite — In-Memory Schemas (Option A)
Connect to a SQLite database, infer all table schemas automatically, generate synthetic data, and write it back — all without writing any schema files to disk.
When to use this approach
- You want the simplest possible workflow
- You don't need to inspect or edit schemas before generating
- You're prototyping or running one-off data generation jobs
Install
Full example
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
from syda import SyntheticDataGenerator, DatabaseSchemaLoader, ModelConfig

load_dotenv()

# --- 1. Connect to your database ---
engine = create_engine("sqlite:///healthcare_demo.db")

# --- 2. Infer schemas directly as dicts (no files written) ---
loader = DatabaseSchemaLoader(engine)
schemas = loader.load_schemas()
# schemas = {"patient": {...}, "provider": {...}, ...}

# --- 3. Generate synthetic data ---
generator = SyntheticDataGenerator(
    model_config=ModelConfig(
        provider="anthropic",
        model_name="claude-haiku-4-5-20251001",
        temperature=0.7,
        max_tokens=8192,
    )
)
results = generator.generate_for_schemas(
    schemas=schemas,
    sample_sizes={
        "patient": 10, "provider": 5,
        "diagnosis": 20, "claim": 20,
        "adjudication": 20, "payment": 20,
    },
    prompts={
        "patient": "Generate realistic patient records with diverse ages (18-85) and genders.",
        "provider": "Generate realistic healthcare providers with diverse specialties.",
        "diagnosis": "Generate realistic diagnoses using ICD-10 codes (e.g. I10, E11.9).",
        "claim": "Generate realistic healthcare claims using CPT codes, amounts $50-$5000.",
        "adjudication": "Generate adjudication records: ~60% Approved, ~25% Partial, ~15% Denied.",
        "payment": "Generate payment records, mostly Paid status.",
    },
    output_dir="output/load_schemas",
)

# --- 4. Write generated data back to the database ---
loader.write_to_database(results)
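
After the write-back in step 4, it can be useful to confirm that rows actually landed in each table. The following is a minimal sketch that uses only the standard-library sqlite3 module, not syda; the helper name count_rows and the demo table are hypothetical and exist only for illustration.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table on this connection."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    # Skip SQLite's internal bookkeeping tables (for example sqlite_sequence).
    user_tables = [n for n in names if not n.startswith("sqlite_")]
    # Names come straight from sqlite_master, so double-quoting them is sufficient.
    return {
        n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0]
        for n in user_tables
    }


# Demo on a throwaway in-memory database (hypothetical table, not the real schema).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
demo.executemany("INSERT INTO patient (name) VALUES (?)", [("A",), ("B",)])
print(count_rows(demo))  # {'patient': 2}
```

For the database on this page, pass sqlite3.connect("healthcare_demo.db") instead and compare the counts with sample_sizes. If the tables already held rows before generation, the counts may include them.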
What happens
1. DatabaseSchemaLoader connects to the database and reads every table's columns, types, primary keys, and foreign keys via SQLAlchemy's inspect().
2. load_schemas() returns a dict[str, dict], one schema dict per table, ready for generate_for_schemas().
3. generate_for_schemas() resolves the FK dependency graph, generates each table in topological order (parents before children), and saves CSVs to output_dir.
4. write_to_database() inserts each DataFrame back into the database in the same FK-safe order.
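
To make "topological order (parents before children)" concrete, here is a small sketch of the idea. It is not syda's implementation: it reads foreign keys with SQLite's PRAGMA foreign_key_list through the standard-library sqlite3 module rather than SQLAlchemy's inspect(), and orders tables with graphlib.TopologicalSorter (Python 3.9+). The helper names fk_parents and generation_order and the four-table schema are hypothetical.

```python
import sqlite3
from graphlib import TopologicalSorter


def fk_parents(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each user table to the set of tables it references via foreign keys."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    graph: dict[str, set[str]] = {}
    for table in (n for n in names if not n.startswith("sqlite_")):
        rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        # Column 2 of each row is the referenced (parent) table.
        # Self-references are dropped so they do not register as a cycle.
        graph[table] = {row[2] for row in rows if row[2] != table}
    return graph


def generation_order(graph: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so that every parent precedes its children."""
    # TopologicalSorter takes {node: predecessors}; static_order() yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical schema, loosely modelled on the table names used on this page.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE patient  (id INTEGER PRIMARY KEY);
    CREATE TABLE provider (id INTEGER PRIMARY KEY);
    CREATE TABLE claim (
        id INTEGER PRIMARY KEY,
        patient_id  INTEGER REFERENCES patient(id),
        provider_id INTEGER REFERENCES provider(id)
    );
    CREATE TABLE payment (
        id INTEGER PRIMARY KEY,
        claim_id INTEGER REFERENCES claim(id)
    );
    """
)
order = generation_order(fk_parents(demo))
print(order)  # patient and provider come before claim, which comes before payment
```

Inserting in this order means a child row is never written before the parent row it points to, which is why the same ordering can serve both generation and write-back. A circular foreign-key chain would make static_order() raise graphlib.CycleError.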
Set your API key
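
The full example calls load_dotenv() and uses the anthropic provider, so the key must be present in the environment (or in a .env file) before generation starts. The sketch below is a pre-flight check, not part of syda; the helper require_env is hypothetical, and ANTHROPIC_API_KEY is assumed because it is the variable the Anthropic Python SDK reads by default. Confirm the exact variable name against the syda documentation.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail early with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file")
    return value


# Intended use, right after load_dotenv() in the full example
# (variable name is an assumption, see above):
# require_env("ANTHROPIC_API_KEY")
```

Failing here costs nothing, whereas a missing key would otherwise only surface once the first model call is made during generation.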
See also
- SQLite — File-Based Schemas — save YAML files first, then generate
- PostgreSQL Example — full cycle against a real PostgreSQL instance
- Database Integration deep dive