Quickstart Guide
This guide will help you get started with Kura quickly using the procedural API for step-by-step conversation analysis.
Overview
Kura provides a functional approach to conversation clustering that allows you to:
- Process conversations step by step with full control
- Use checkpoints to save intermediate results
- Visualize results in multiple formats
Prerequisites
Before you begin, make sure you have:
- Installed Kura
- Set up your API key (Kura uses OpenAI by default):

```bash
export OPENAI_API_KEY=your_api_key_here
```
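If the key is missing, the pipeline only fails once it reaches its first model call. The sketch below is a small fail-fast check that uses only the standard library. The helper name `require_api_key` is hypothetical and is not part of Kura.

```python
import os


def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of Kura.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Run `export {var}=your_api_key_here` first."
        )
    return key
```

You could call `require_api_key()` at the top of `main()` so that a missing key is reported before any conversations are loaded.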
Basic Workflow
Kura's workflow consists of four main steps:
- Summarization: Generate concise summaries of conversations
- Base Clustering: Group similar summaries together
- Meta Clustering: Create hierarchical clusters for better organization
- Dimensionality Reduction: Project clusters to 2D for visualization
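The four steps form a linear chain in which each step consumes the previous step's output. The sketch below shows only that data flow. Every stage in it is a deliberately trivial, hypothetical stand-in: first sentences instead of model-written summaries, grouping by first word instead of real clustering, and a fixed line instead of a real 2D projection. None of these functions belong to Kura.

```python
import asyncio
from collections import defaultdict


# Hypothetical toy stages. Each is async, like the step functions in the
# complete example, and takes the previous stage's output as its input.
async def summarise(conversations: list[str]) -> list[str]:
    # Toy "summary": the first sentence of each conversation.
    return [c.split(".")[0].strip() for c in conversations]


async def base_cluster(summaries: list[str]) -> dict[str, list[str]]:
    # Toy clustering: group summaries by their lowercased first word.
    clusters: dict[str, list[str]] = defaultdict(list)
    for s in summaries:
        clusters[s.split()[0].lower()].append(s)
    return dict(clusters)


async def meta_cluster(clusters: dict[str, list[str]]) -> dict[str, list[str]]:
    # Toy hierarchy: put every base cluster under a single parent label.
    return {"all": sorted(clusters)}


async def project(meta: dict[str, list[str]]) -> dict[str, tuple[float, float]]:
    # Toy 2D "projection": place every child cluster along the x-axis.
    children = [child for group in meta.values() for child in group]
    return {child: (float(i), 0.0) for i, child in enumerate(children)}


async def run(conversations: list[str]):
    summaries = await summarise(conversations)
    clusters = await base_cluster(summaries)
    meta = await meta_cluster(clusters)
    coords = await project(meta)
    return summaries, clusters, meta, coords


if __name__ == "__main__":
    print(asyncio.run(run(["Write an ad. Keep it short.", "Plot sales in R. Use ggplot2."])))
```

The four return values correspond to `summaries`, `clusters`, `reduced_clusters`, and `projected_clusters` in the complete example.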
Complete Example
Here's a complete working example:
```python
import asyncio

from rich.console import Console

from kura import (
    summarise_conversations,
    generate_base_clusters_from_conversation_summaries,
    reduce_clusters_from_base_clusters,
    reduce_dimensionality_from_clusters,
    CheckpointManager,
)
from kura.visualization import visualise_pipeline_results
from kura.types import Conversation
from kura.summarisation import SummaryModel
from kura.cluster import ClusterModel
from kura.meta_cluster import MetaClusterModel
from kura.dimensionality import HDBUMAP


async def main():
    # Initialize models
    console = Console()
    summary_model = SummaryModel(console=console)
    cluster_model = ClusterModel(console=console)
    meta_cluster_model = MetaClusterModel(console=console)
    dimensionality_model = HDBUMAP()

    # Set up checkpointing to save intermediate results
    checkpoint_manager = CheckpointManager("./checkpoints", enabled=True)

    # Load conversations from Hugging Face dataset
    conversations = Conversation.from_hf_dataset(
        "ivanleomk/synthetic-gemini-conversations",
        split="train",
    )

    # Process through the pipeline step by step
    # Step 1: summarise each conversation
    summaries = await summarise_conversations(
        conversations,
        model=summary_model,
        checkpoint_manager=checkpoint_manager,
    )
    # Step 2: group similar summaries into base clusters
    clusters = await generate_base_clusters_from_conversation_summaries(
        summaries,
        model=cluster_model,
        checkpoint_manager=checkpoint_manager,
    )
    # Step 3: organize base clusters into a hierarchy of meta clusters
    reduced_clusters = await reduce_clusters_from_base_clusters(
        clusters,
        model=meta_cluster_model,
        checkpoint_manager=checkpoint_manager,
    )
    # Step 4: project clusters to 2D; the result is saved as a checkpoint
    projected_clusters = await reduce_dimensionality_from_clusters(
        reduced_clusters,
        model=dimensionality_model,
        checkpoint_manager=checkpoint_manager,
    )

    # Visualize results
    visualise_pipeline_results(reduced_clusters, style="enhanced")

    print(f"\nProcessed {len(conversations)} conversations")
    print(f"Created {len(reduced_clusters)} meta clusters")
    print(f"Checkpoints saved to: {checkpoint_manager.checkpoint_dir}")


if __name__ == "__main__":
    asyncio.run(main())
```
This example will:
- Load 190 synthetic conversations from Hugging Face
- Process them through the complete analysis pipeline step by step
- Generate hierarchical clusters organized into categories
- Display the results with enhanced visualization
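Checkpointing lets you rerun the script without repeating expensive steps. The sketch below shows the general load-or-compute pattern, storing one JSON file per step. It only illustrates the idea. The helper `cached_step` and its file layout are hypothetical and do not describe `CheckpointManager`'s actual storage format.

```python
import json
from pathlib import Path
from typing import Any, Awaitable, Callable


async def cached_step(
    checkpoint_dir: Path,
    name: str,
    compute: Callable[[], Awaitable[Any]],
) -> Any:
    """Load a step's result from disk if present, otherwise compute and save it.

    Hypothetical helper illustrating checkpoint/resume; not Kura's API.
    The result must be JSON-serializable in this sketch.
    """
    path = checkpoint_dir / f"{name}.json"
    if path.exists():
        # Resume: reuse the saved result and skip the expensive computation.
        return json.loads(path.read_text())
    result = await compute()
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

On the first run, `compute` is awaited and its result is written to disk. On later runs with the same directory, the saved result is returned instead. In the complete example, `CheckpointManager` plays this role for each step.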
Visualization Options & Output
Kura provides multiple visualization styles through the `visualise_pipeline_results` function. Change the `style` parameter to get a different output format:
```python
from kura.visualization import visualise_pipeline_results

# Run inside main() from the complete example, where `reduced_clusters`
# and `console` are defined.
# Choose from: "basic", "enhanced", or "rich"
visualise_pipeline_results(reduced_clusters, style="basic")
visualise_pipeline_results(reduced_clusters, style="enhanced")  # Recommended
visualise_pipeline_results(reduced_clusters, style="rich", console=console)
```
Basic Style
Clean tree structure without extra formatting:
Output:
```text
Clusters (190 conversations)
╠══ Generate SEO-optimized content for blogs and scripts (38 conversations)
║ ╠══ Assist in writing SEO-friendly blog posts (12 conversations)
║ ╚══ Help create SEO-driven marketing content (8 conversations)
╠══ Help analyze and visualize data with R and Tableau (25 conversations)
║ ╠══ Assist with data analysis and visualization in R (15 conversations)
║ ╚══ Troubleshoot sales data visualizations in Tableau (10 conversations)
... (and more clusters)
```
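The basic style is a plain tree drawn with box characters. Non-last children get `╠══`, the last child gets `╚══`, and a `║` continues under non-last siblings so that deeper levels line up. The sketch below reproduces that layout for a nested structure. The `Node` class and `render` function are hypothetical stand-ins written to resemble the sample output; they are not Kura's implementation, and the exact indentation width is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal cluster node: a name, a size, and child clusters.
    name: str
    count: int
    children: list["Node"] = field(default_factory=list)


def render(node: Node, prefix: str = "") -> list[str]:
    """Render a node's descendants as box-drawing tree lines."""
    lines = []
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        connector = "╚══ " if last else "╠══ "
        lines.append(f"{prefix}{connector}{child.name} ({child.count} conversations)")
        # Continue the vertical bar only beneath non-last siblings.
        lines.extend(render(child, prefix + ("    " if last else "║   ")))
    return lines


# Clusters taken from the sample output (children abbreviated, as in the sample).
root = Node("Clusters", 190, [
    Node("Generate SEO-optimized content for blogs and scripts", 38, [
        Node("Assist in writing SEO-friendly blog posts", 12),
        Node("Help create SEO-driven marketing content", 8),
    ]),
    Node("Help analyze and visualize data with R and Tableau", 25),
])

if __name__ == "__main__":
    print(f"{root.name} ({root.count} conversations)")
    print("\n".join(render(root)))
```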
Enhanced Style (Recommended)
Includes progress bars, statistics, and detailed formatting:
Output:
```text
================================================================================
🎯 ENHANCED CLUSTER VISUALIZATION
================================================================================
🔸 📚 All Clusters (190 total conversations)
📊 190 conversations (100.0%) [████████████████████]
╠══ 🔸 Generate SEO-optimized content for blogs and scripts
║ 📊 38 conversations (20.0%) [████░░░░░░░░░░░░░░░░]
║ ╠══ 🔸 Assist in writing SEO-friendly blog posts
║ ║ 📊 12 conversations (6.3%) [█░░░░░░░░░░░░░░░░░░░]
║ ╠══ 🔸 Write blog posts about diabetes medications
║ ║ 📊 10 conversations (5.3%) [█░░░░░░░░░░░░░░░░░░░]
║ ╚══ 🔸 Help create SEO-driven marketing content
║ 📊 8 conversations (4.2%) [░░░░░░░░░░░░░░░░░░░░]
╠══ 🔸 Help analyze and visualize data with R and Tableau
║ 📊 25 conversations (13.2%) [██░░░░░░░░░░░░░░░░░░]
║ ╠══ 🔸 Assist with data analysis and visualization in R
║ ║ 📊 15 conversations (7.9%) [█░░░░░░░░░░░░░░░░░░░]
║ ╚══ 🔸 Troubleshoot sales data visualizations in Tableau
║ 📊 10 conversations (5.3%) [█░░░░░░░░░░░░░░░░░░░]
... (and more clusters)
================================================================================
📈 CLUSTER STATISTICS
================================================================================
📊 Total Clusters: 29
🌳 Root Clusters: 10
💬 Total Conversations: 190
📏 Average Conversations per Root Cluster: 19.0
================================================================================
```
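The percentages and bars in the enhanced output follow from simple arithmetic. Each cluster's share of the 190 conversations is scaled to a 20-cell bar. For example, 38 / 190 = 20.0%, which fills 4 of 20 cells. The sketch below reproduces the sample numbers. The helper `share_bar` is hypothetical. Its truncating integer division is an assumption inferred from the sample: 25 conversations (13.2%) fill 2 cells rather than the 3 that rounding would give.

```python
def share_bar(count: int, total: int, width: int = 20) -> str:
    """Format a cluster's share of all conversations as text plus a bar.

    Hypothetical helper; floor division (truncation) matches the sample output.
    """
    pct = 100 * count / total
    filled = count * width // total  # integer arithmetic, truncates toward zero
    bar = "█" * filled + "░" * (width - filled)
    return f"{count} conversations ({pct:.1f}%) [{bar}]"
```

Passing `width=15` gives the shorter bars shown in the rich style. For example, 38 conversations fill 3 of 15 cells.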
Rich Style
Colorful, interactive-style output with detailed descriptions and statistics tables:
Output:
```text
╭──────────────────────────────────────────────────────────────────────────────╮
│ 🎯 RICH CLUSTER VISUALIZATION │
╰──────────────────────────────────────────────────────────────────────────────╯
📚 All Clusters (190 conversations)
├── Generate SEO-optimized content for blogs and scripts (38 conversations, 20.0%)
│ Users requested help in creating SEO-optimized blog posts and engaging
│ YouTube v...
│ Progress: [███░░░░░░░░░░░░]
│ ├── Assist in writing SEO-friendly blog posts (12 conversations, 6.3%)
│ │ The users sought assistance in crafting engaging and SEO-friendly blog
│ │ posts acr...
│ │ Progress: [░░░░░░░░░░░░░░░]
│ ├── Write blog posts about diabetes medications (10 conversations, 5.3%)
│ │ The users sought assistance in creating blog posts focused on diabetes
│ │ treatment...
│ │ Progress: [░░░░░░░░░░░░░░░]
│ └── Help create SEO-driven marketing content (8 conversations, 4.2%)
│ The users sought assistance in developing SEO-optimized marketing
│ content across...
│ Progress: [░░░░░░░░░░░░░░░]
├── Help analyze and visualize data with R and Tableau (25 conversations, 13.2%)
│ Users sought help with analyzing and visualizing datasets in both R and
│ Tableau,...
│ Progress: [█░░░░░░░░░░░░░░]
... (and more clusters)
📈 Cluster Statistics 📊 Cluster Size Distribution
╭─────────────────────────┬───────╮ ╭────────────────────┬───────┬────────────╮
│ Metric │ Value │ │ Size Range │ Count │ Percentage │
├─────────────────────────┼───────┤ ├────────────────────┼───────┼────────────┤
│ 📊 Total Clusters │ 29 │ │ 🔥 Large (>100) │ 0 │ 0.0% │
│ 🌳 Root Clusters │ 10 │ │ 📈 Medium (21-100) │ 3 │ 30.0% │
│ 💬 Total Conversations │ 190 │ │ 📊 Small (6-20) │ 7 │ 70.0% │
│ 📏 Avg per Root Cluster │ 19.0 │ │ 🔍 Tiny (1-5) │ 0 │ 0.0% │
╰─────────────────────────┴───────╯ ╰────────────────────┴───────┴────────────╯
```
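The statistics tables can be derived from the cluster hierarchy alone. The sketch below assumes a hypothetical flat list of clusters, each with an `id`, an optional `parent_id`, and a conversation `count`. Root clusters are those without a parent. The size bands mirror the table (Tiny 1-5, Small 6-20, Medium 21-100, Large >100). Applying the bands to root clusters only is inferred from the sample, where the band counts (3 + 7) add up to the 10 root clusters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlatCluster:
    # Hypothetical minimal record; Kura's own clusters also carry names,
    # descriptions, and other data not needed for these statistics.
    id: str
    parent_id: Optional[str]
    count: int


def size_band(count: int) -> str:
    """Map a cluster size to the bands used in the size-distribution table."""
    if count > 100:
        return "Large (>100)"
    if count > 20:
        return "Medium (21-100)"
    if count > 5:
        return "Small (6-20)"
    return "Tiny (1-5)"


def cluster_stats(clusters: list[FlatCluster]) -> dict:
    roots = [c for c in clusters if c.parent_id is None]
    # Sum over roots only: child clusters re-count their parent's conversations.
    total = sum(c.count for c in roots)
    return {
        "total_clusters": len(clusters),
        "root_clusters": len(roots),
        "total_conversations": total,
        "avg_per_root": round(total / len(roots), 1) if roots else 0.0,
        "size_distribution": Counter(size_band(c.count) for c in roots),
    }
```

With the sample run's 10 root clusters covering 190 conversations, the average is 190 / 10 = 19.0, and each band's percentage is its count divided by the 10 roots (3 / 10 = 30.0%).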
Using the Web Interface
For a more interactive experience, Kura includes a web interface:
```bash
# Start with default checkpoint directory
kura start-app

# Or use a custom checkpoint directory
kura start-app --dir ./checkpoints
```
Expected output:
```text
🚀 Access website at (http://localhost:8000)
INFO: Started server process [14465]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
Access the web interface at http://localhost:8000 to explore:
- Cluster Map: 2D visualization of conversation clusters
- Cluster Tree: Hierarchical view of cluster relationships
- Cluster Details: In-depth information about selected clusters
- Conversation Dialog: Examine individual conversations
- Metadata Filtering: Filter clusters based on extracted properties
Benefits of the Procedural API
- Fine-grained Control: Process each step independently
- Flexibility: Mix and match different model implementations
- Checkpoint Management: Resume from any stage
- Multiple Visualization Options: Choose the best format for your needs
- Functional Programming: No hidden state, clear data flow
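Each step receives its model through a `model=` argument, so swapping implementations only means passing a different object. The sketch below shows that idea with `typing.Protocol`. The `Summariser` protocol, both classes, and `summarise_step` are hypothetical and much simpler than Kura's `SummaryModel`.

```python
import asyncio
from typing import Protocol


class Summariser(Protocol):
    # Hypothetical interface: any object with this async method qualifies.
    async def summarise(self, texts: list[str]) -> list[str]: ...


class FirstSentence:
    async def summarise(self, texts: list[str]) -> list[str]:
        # Keep only the text before the first period.
        return [t.split(".")[0] for t in texts]


class Truncate:
    def __init__(self, limit: int) -> None:
        self.limit = limit

    async def summarise(self, texts: list[str]) -> list[str]:
        # Keep only the first `limit` characters.
        return [t[: self.limit] for t in texts]


async def summarise_step(texts: list[str], *, model: Summariser) -> list[str]:
    # The step depends only on the interface, not on a concrete class,
    # and keeps no state between calls.
    return await model.summarise(texts)


texts = ["Help me write a blog post. Make it SEO friendly."]

if __name__ == "__main__":
    print(asyncio.run(summarise_step(texts, model=FirstSentence())))
    print(asyncio.run(summarise_step(texts, model=Truncate(10))))
```

Because `summarise_step` only returns a value and holds no state, the same inputs with the same model always flow through it the same way.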
Next Steps
Now that you've run your first analysis with Kura, you can:
- Learn about configuration options to customize Kura
- Explore core concepts to understand how Kura works
- Check out the API Reference for detailed documentation