Unstructured Document Generation¶
SYDA provides powerful capabilities for generating unstructured documents alongside structured data. This approach allows you to create realistic documents like invoices, contracts, reports, and more based on the structured data you generate.
Document Template Basics¶
Document generation in SYDA is based on templates. You define a template that includes both static content and dynamic placeholders, which SYDA will fill with generated data.
Key template attributes:¶
To generate documents, your schema must include special template attributes.
Attribute | Description | Example |
---|---|---|
__template__ |
Whether this schema is a template | true |
__description__ |
Human-readable description of the template | Retail receipt template |
__name__ |
Name of the template | Receipt |
__depends_on__ |
Other schemas this template depends on | [Product, Transaction, Customer] |
__foreign_keys__ |
Field-level foreign key relationships | customer_name: [Customer, first_name] |
__template_source__ |
Path to the template file | templates/receipt.html |
__input_file_type__ |
Template format | html |
__output_file_type__ |
Output document format | pdf |
Here's an example using YAML format:
# receipt.yml
__template__: true
__description__: Retail receipt template
__name__: Receipt
__depends_on__: [Product, Transaction, Customer]
__foreign_keys__:
customer_name: [Customer, first_name]
customer_id: [Customer, id]
__template_source__: templates/receipt.html
__input_file_type__: html
__output_file_type__: pdf
# Regular schema fields
store_name:
type: string
length: 50
description: Name of the retail store
store_address:
type: address
length: 150
description: Full address of the store
store_phone:
type: string
length: 20
description: Store phone number
receipt_number:
type: string
length: 12
description: Unique receipt identifier
items:
type: array
description: List of purchased items with product details
Here's an example using SqlAlchemy model:
SQLAlchemy Model-Based Templates¶
When using SQLAlchemy models, you can define template attributes directly as class attributes:
import os
from sqlalchemy import Column, Integer, String, Float, Text, Date, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
templates_dir = os.path.join(os.path.dirname(__file__), 'templates')
class ContractDocument(Base):
"""Contract document for a won opportunity."""
# Special metadata attributes
__tablename__ = 'contract_documents'
__depends_on__ = ['opportunities']
# Template configuration as class attributes
__template__ = True
__template_source__ = os.path.join(templates_dir, 'contract.html')
__input_file_type__ = 'html'
__output_file_type__ = 'pdf'
id = Column(Integer, primary_key=True)
opportunity_id = Column(Integer, ForeignKey('opportunities.id'), nullable=False)
effective_date = Column(Date)
expiration_date = Column(Date)
contract_number = Column(String(50))
customer_name = Column(String(100), ForeignKey('customers.name'))
customer_address = Column(String(200), ForeignKey('customers.address'))
service_description = Column(Text)
payment_terms = Column(Text)
contract_value = Column(Float, ForeignKey('opportunities.value'))
renewal_terms = Column(Text)
Supported Template Formats¶
As of now SYDA supports HTML templates(Jinja2) for unstructured document generation.
HTML Templates(Jinja2)¶
HTML Jinja2 templates provide the most flexibility and control over document formatting:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Receipt #{{ receipt_number }}</title>
<style>
body { font-family: Arial, sans-serif; margin: 2cm; }
.header { text-align: center; margin-bottom: 2em; }
.receipt-details { margin-bottom: 2em; }
.line-items { width: 100%; border-collapse: collapse; }
.line-items th, .line-items td { border: 1px solid #ddd; padding: 8px; }
.total { margin-top: 2em; text-align: right; font-weight: bold; }
</style>
</head>
<body>
<div class="header">
<h1>{{ store_name }}</h1>
<p>{{ store_address }}</p>
<p>{{ store_phone }}</p>
<p>{{ store_website }}</p>
</div>
<div class="receipt-details">
<p><strong>Receipt Number:</strong> {{ receipt_number }}</p>
<p><strong>Date:</strong> {{ transaction_date }}</p>
<p><strong>Time:</strong> {{ transaction_time }}</p>
<p><strong>Customer:</strong> {{ customer_name }}</p>
<p><strong>Customer ID:</strong> {{ customer_id }}</p>
</div>
<table class="line-items">
<thead>
<tr>
<th>Item</th>
<th>Quantity</th>
<th>Unit Price</th>
<th>Total</th>
</tr>
</thead>
<tbody>
{% if items %}
{% for item in items %}
<tr>
<td>{{ item.name }}</td>
<td>{{ item.quantity }}</td>
<td>${{ item.price }}</td>
<td>${{ item.total }}</td>
</tr>
{% endfor %}
{% else %}
<tr><td colspan="4">No items</td></tr>
{% endif %}
</tbody>
</table>
<div class="total">
<p>Subtotal: ${{ subtotal }}</p>
<p>Tax ({{ tax_rate }}%): ${{ tax_amount }}</p>
<p>Discount: ${{ discount_amount }}</p>
<p><strong>Total Amount: ${{ total }}</strong></p>
</div>
</body>
</html>
Template Design¶
SYDA uses Jinja2 for template rendering, providing powerful features for creating dynamic documents:
Variables¶
Access any field from your schema directly:
Loops¶
Iterate over arrays or lists of items:
<table>
<tr><th>Item</th><th>Price</th></tr>
{% for item in items %}
<tr>
<td>{{ item.name }}</td>
<td>${{ item.price }}</td>
</tr>
{% endfor %}
</table>
Conditionals¶
Show or hide content based on conditions:
{% if total_amount > 1000 %}
<div class="premium-customer">
Thank you for your substantial order! You qualify for our premium support.
</div>
{% elif total_amount > 500 %}
<div class="valued-customer">
Thank you for your order! You qualify for priority shipping.
</div>
{% else %}
<div class="standard-customer">
Thank you for your order!
</div>
{% endif %}
Filters¶
Transform data during rendering:
Date: {{ issue_date | date_format('%B %d, %Y') }}
Name: {{ customer_name | upper }}
Summary: {{ description | truncate(100) }}
Jinja2 Template Syntax Requirements¶
SYDA uses Jinja2 for template rendering. Be sure to follow these syntax requirements:
- Use
{{ variable }}
for variable interpolation (with spaces inside the braces) - Use
{% for item in items %}...{% endfor %}
for loops - Use
{% if condition %}...{% endif %}
for conditionals - Use
{# This is a comment #}
for comments - Use
{{ variable | filter }}
for applying filters
Important: Do not use Handlebars-style syntax (e.g., {{variable}}
without spaces or {{\#each items}}
) as these won't be processed correctly.
Example of Correct Jinja2 Syntax:¶
PDF Generation¶
SYDA can automatically convert HTML and Markdown templates to PDF documents:
schemas = {
'Contract': {
'__template__': 'templates/contract.html',
'__template_source__': 'file',
'__input_file_type__': 'html',
'__output_file_type__': 'pdf', # Generate PDF output
'id': {'type': 'integer', 'primary_key': True},
'client_name': {'type': 'string'},
'start_date': {'type': 'date'},
'end_date': {'type': 'date'},
'contract_terms': {'type': 'string', 'format': 'long_text'}
}
}
results = generator.generate_for_schemas(
schemas=schemas,
sample_sizes={'Contract': 5},
output_dir='output/contracts'
)
This will generate a Contract
directory containing PDF files (e.g., Contract_1.pdf
, Contract_2.pdf
, etc.)
Best Practices¶
- Use HTML for Complex Layouts: HTML provides the most control over document appearance
- Test Templates Separately: Validate templates with sample data before full generation
- Include CSS in HTML Templates: Embed CSS for consistent styling in PDF output
- Use Loops for Repetitive Content: Generate tables, lists, and repeated sections efficiently
- Handle Optional Fields: Use conditionals or defaults for fields that might be missing
- Consider Page Breaks: For multi-page documents, control page breaks with CSS
- Document Variable Names: Comment your templates to document expected variables
Examples¶
To see unstructured document generation in action, explore SQLAlchemy Example and Yaml Example