Foreign Key Handling¶
One of SYDA's most powerful features is its ability to maintain referential integrity across multiple related tables. This document explains in detail how foreign key relationships are defined, detected, and handled during data generation.
Foreign Key Definition Methods¶
SYDA supports three different ways to define foreign key relationships:
1. Special __foreign_keys__
Section¶
You can explicitly define foreign keys using a special __foreign_keys__
section in your schema:
# order.yaml
id:
type: integer
primary_key: true
customer_id:
type: integer
order_date:
type: date
__foreign_keys__:
customer_id: [Customer, id]
2. Field-Level references
Property¶
You can define foreign keys directly in field definitions using the references
property:
# orderitem.yaml
id:
type: integer
primary_key: true
order_id:
type: integer
references:
schema: Order
field: id
product_id:
type: integer
references:
schema: Product
field: id
quantity:
type: integer
This approach keeps the foreign key definition close to the field it applies to, making the schema more readable.
3. SQLAlchemy ForeignKey
Definitions¶
When using SQLAlchemy models, foreign keys are automatically detected from the ForeignKey
definitions:
from sqlalchemy import Column, Integer, String, Date, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
customer_id = Column(Integer, ForeignKey('customers.id'))
order_date = Column(Date)
SYDA will automatically extract these relationships during schema analysis.
Dependency Resolution¶
Once foreign keys are defined, SYDA automatically determines the correct order for generating data:
- Dependency Graph: SYDA builds a directed graph of dependencies between tables
- Topological Sort: It performs a topological sort to determine the generation order
- Execution Order: Tables are generated in an order that ensures all parent tables exist first
For example, with these tables:
-
Customer (no dependencies)
-
Product (no dependencies)
-
Order (depends on Customer)
-
OrderItem (depends on Order and Product)
SYDA would generate them in this order:
-
Customer and Product (can be generated in parallel)
-
Order (after Customer is available)
-
OrderItem (after both Order and Product are available)
Foreign Key Value Assignment¶
When generating data with foreign keys, SYDA ensures that each foreign key references a valid primary key in the parent table:
- Parent Table Access: SYDA maintains access to all previously generated tables
- Random Selection: By default, it randomly selects a valid foreign key value
- Consistent Foreign Keys: When multiple columns in the same schema reference the same parent table, SYDA ensures they get the same parent record for consistency
Best Practices for Foreign Key Handling¶
- Be Explicit: Whenever possible, explicitly define foreign key relationships
- Consistent Naming: Use consistent naming patterns (e.g.,
table_id
) for foreign keys - Handle Nullable Keys: Specify whether foreign keys can be null
- Test Relationships: Verify that generated data maintains proper referential integrity
- Document Dependencies: Add comments or documentation about table dependencies
Examples¶
To see foreign key relationships in action, explore the example projects included with SYDA:
- SQLAlchemy Examples: Check sqlalchemy_models for examples of foreign keys with SQLAlchemy models
- Dictionary Schema Examples: See Dictionary Examples for dictionary-based foreign key handling
- YAML/JSON Schema Examples: The YAML Examples and JSON Examples demonstrate foreign keys in file-based schemas
- Retail Example: Retail Example shows foreign keys connecting multiple related tables
Each example demonstrates different aspects of foreign key handling, including relationship definition, value assignment, and referential integrity verification.