Using xAI Grok Models with SYDA¶
Source code: examples/model_selection/example_grok_models.py
This example demonstrates how to use xAI's Grok models with SYDA for synthetic data generation. Grok-3 and Grok-4 are xAI's large language models that provide OpenAI-compatible API access.
Prerequisites¶
Before running this example, you need to:
- Install SYDA and its dependencies
- Set up your Grok API key in your environment
You can set the API key in your .env
file:
Or set it as an environment variable before running your script:
Example Code¶
The following example demonstrates how to configure and use Grok models for synthetic data generation:
from syda.generate import SyntheticDataGenerator
from syda.schemas import ModelConfig
import os
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Configure Grok-4 model (latest)
config = ModelConfig(
provider="grok",
model_name="grok-4", # Using Grok-4 (latest model)
temperature=0.7,
max_tokens=4000,
extra_kwargs={
"base_url": "https://api.x.ai/v1" # xAI API endpoint
}
)
# Initialize generator with Grok configuration
generator = SyntheticDataGenerator(
model_config=config,
grok_api_key=os.environ.get("GROK_API_KEY")
)
# Define schemas for technology companies and products
schemas = {
'companies': {
'__table_description__': 'Technology companies with innovative products and services',
'id': {'type': 'number', 'description': 'Unique company identifier'},
'name': {'type': 'text', 'description': 'Company name (e.g., TechCorp, InnovateLabs, FutureSystems)'},
'industry': {'type': 'text', 'description': 'Primary industry sector (AI, FinTech, HealthTech, EdTech, etc.)'},
'founded_year': {'type': 'number', 'description': 'Year the company was founded (1990-2024)'},
'employee_count': {'type': 'number', 'description': 'Number of employees (1-10000)'},
'revenue_millions': {'type': 'number', 'description': 'Annual revenue in millions USD (0.1-1000)'},
'is_public': {'type': 'boolean', 'description': 'Whether the company is publicly traded'},
'headquarters': {'type': 'text', 'description': 'City and country where company is headquartered'}
},
'products': {
'__table_description__': 'Products and services offered by technology companies',
'id': {'type': 'number', 'description': 'Unique product identifier'},
'name': {'type': 'text', 'description': 'Product or service name'},
'company_id': {'type': 'foreign_key', 'description': 'Reference to the company that owns this product', 'references': {'schema': 'companies', 'field': 'id'}},
'category': {'type': 'text', 'description': 'Product category (Software, Hardware, Service, Platform)'},
'launch_year': {'type': 'number', 'description': 'Year the product was launched'},
'price_usd': {'type': 'number', 'description': 'Product price in USD (0 for free products)'},
'is_ai_powered': {'type': 'boolean', 'description': 'Whether the product uses AI/ML technology'}
}
}
# Define custom prompts for Grok models
prompts = {
'companies': """
Generate innovative technology companies with diverse backgrounds.
Include a mix of startups and established companies across different industries.
Focus on companies that are pushing the boundaries of technology.
Create realistic company names, industries, and business metrics.
""",
'products': """
Generate cutting-edge technology products and services.
Include both software and hardware products, with realistic pricing.
Focus on innovative products that could exist in the current tech landscape.
Ensure products align with their parent companies' industries and values.
"""
}
# Generate data using Grok-4
results = generator.generate_for_schemas(
schemas=schemas,
prompts=prompts,
sample_sizes={"companies": 8, "products": 15},
output_dir="output/test_grok_models"
)
print("Grok integration test completed successfully!")
Sample Outputs¶
You can view sample outputs generated by Grok models here: examples/model_selection/output/test_grok_models
The output structure follows the same pattern as other providers: - test_grok_models/
- Main directory for Grok model outputs - test_grok_models/grok-3/
- Grok-3 specific outputs - test_grok_models/grok-4/
- Grok-4 specific outputs
Grok Model Configuration¶
SYDA supports xAI Grok models:
- grok-4: Latest xAI model for high-quality data generation with OpenAI-compatible API
- grok-3: Previous xAI model for data generation with OpenAI-compatible API
Key Concepts¶
Model Configuration¶
The ModelConfig
class is used to specify Grok model settings:
model_config = ModelConfig(
provider="grok",
model_name="grok-4", # or "grok-3"
temperature=0.7,
max_tokens=4000,
extra_kwargs={
"base_url": "https://api.x.ai/v1" # xAI API endpoint
}
)
- provider: Set to
"grok"
to use xAI models - model_name: Use
"grok-4"
(latest) or"grok-3"
(previous) - temperature: Controls randomness in generation (0.0-1.0)
- max_tokens: Maximum number of tokens in the response
- top_p: Top-p sampling parameter for response diversity
API Key Setup¶
Grok models require an API key from xAI. Set it up in your environment:
Output Directory Structure¶
The example code creates an organized directory structure for output files:
Best Practices¶
- API Key Security: Store your Grok API key securely in environment variables
- Model Parameters: Adjust temperature and top_p for different creativity levels
- Data Quality: Grok models excel at generating realistic business and technology data
- Referential Integrity: Foreign key relationships are automatically maintained
Performance Characteristics¶
- Speed: Grok models provide fast response times for data generation
- Quality: High-quality, realistic synthetic data generation
- Consistency: Reliable API with OpenAI-compatible interface
- Scalability: Suitable for moderate to large dataset generation
Troubleshooting¶
Common Issues¶
- API Key Not Found: Ensure your GROK_API_KEY is set correctly
- Rate Limiting: Grok models have rate limits; consider batching for large datasets
- Model Availability: Grok model availability may vary by region
Error Handling¶
try:
results = generator.generate_for_schemas(schemas, prompts, sample_sizes)
except ValueError as e:
if "Failed to initialize Grok client" in str(e):
print("Check your GROK_API_KEY environment variable")
else:
print(f"Error: {e}")
Integration with Other Providers¶
Grok models work seamlessly alongside other SYDA-supported providers:
# Mix and match providers
grok_config = ModelConfig(provider="grok", model_name="grok-4")
openai_config = ModelConfig(provider="openai", model_name="gpt-4")
anthropic_config = ModelConfig(provider="anthropic", model_name="claude-3-5-sonnet")
# Use different models for different tasks
generator1 = SyntheticDataGenerator(model_config=grok_config)
generator2 = SyntheticDataGenerator(model_config=openai_config)
This flexibility allows you to choose the best model for each specific data generation task.