Quickstart Guide

This guide walks through a first conversation analysis with Kura using its procedural API, one pipeline step at a time.

Overview

Kura provides a functional approach to conversation clustering that allows you to:

Process conversations step by step with full control
Use checkpoints to save intermediate results
Visualize results in multiple formats

Prerequisites

Before you begin, make sure you have:

Installed Kura
Set up your API key (Kura uses OpenAI by default):

export OPENAI_API_KEY=your_api_key_here
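
Before starting a run, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library; the helper name has_api_key is hypothetical and is not part of Kura.

```python
import os


def has_api_key(env=None, name="OPENAI_API_KEY"):
    """Return True if `name` is set to a non-blank value in `env`.

    `env` defaults to the process environment; passing a dict makes the
    check easy to test. This helper is illustrative, not a Kura function.
    """
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


if not has_api_key():
    print("OPENAI_API_KEY is not set; export it before running Kura.")
```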

Basic Workflow

Kura's workflow consists of four main steps:

1. Summarization: Generate concise summaries of conversations
2. Base Clustering: Group similar summaries together
3. Meta Clustering: Create hierarchical clusters for better organization
4. Dimensionality Reduction: Project clusters to 2D for visualization
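
To make the data flow between these four stages concrete, here is a toy sketch in which each stage is a small async function that consumes the previous stage's output. Every function below is a hypothetical stand-in written for illustration: nothing here calls Kura, an LLM, or an embedding model, and the grouping rules are deliberately trivial.

```python
import asyncio
from collections import defaultdict


async def summarise(conversations):
    # Stand-in for LLM summarization: keep the first sentence, lowercased.
    return [text.split(".")[0].strip().lower() for text in conversations]


async def base_cluster(summaries):
    # Stand-in for embedding-based clustering: group summaries by first word.
    groups = defaultdict(list)
    for summary in summaries:
        if summary:
            groups[summary.split()[0]].append(summary)
    return dict(groups)


async def meta_cluster(clusters):
    # Stand-in for hierarchy building: group cluster names by first letter.
    parents = defaultdict(list)
    for name in clusters:
        parents[name[0]].append(name)
    return dict(parents)


async def project(hierarchy):
    # Stand-in for 2D projection: place each parent at (index, child count).
    return {
        parent: (float(i), float(len(children)))
        for i, (parent, children) in enumerate(hierarchy.items())
    }


async def run_pipeline(conversations):
    # Each stage receives only the output of the stage before it.
    summaries = await summarise(conversations)
    clusters = await base_cluster(summaries)
    hierarchy = await meta_cluster(clusters)
    points = await project(hierarchy)
    return summaries, clusters, hierarchy, points
```

Because every stage returns a plain value, any intermediate result can be inspected or saved before the next stage runs, which is the property the procedural API relies on.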

Complete Example

Here's a complete working example:

import asyncio
from rich.console import Console
from kura import (
    summarise_conversations,
    generate_base_clusters_from_conversation_summaries,
    reduce_clusters_from_base_clusters,
    reduce_dimensionality_from_clusters,
    CheckpointManager,
)
from kura.visualization import visualise_pipeline_results
from kura.types import Conversation
from kura.summarisation import SummaryModel
from kura.cluster import ClusterModel
from kura.meta_cluster import MetaClusterModel
from kura.dimensionality import HDBUMAP

async def main():
    # Initialize models
    console = Console()
    summary_model = SummaryModel(console=console)
    cluster_model = ClusterModel(console=console)
    meta_cluster_model = MetaClusterModel(console=console)
    dimensionality_model = HDBUMAP()

    # Set up checkpointing to save intermediate results
    checkpoint_manager = CheckpointManager("./checkpoints", enabled=True)

    # Load conversations from Hugging Face dataset
    conversations = Conversation.from_hf_dataset(
        "ivanleomk/synthetic-gemini-conversations",
        split="train"
    )

    # Process through the pipeline step by step
    summaries = await summarise_conversations(
        conversations,
        model=summary_model,
        checkpoint_manager=checkpoint_manager
    )

    clusters = await generate_base_clusters_from_conversation_summaries(
        summaries,
        model=cluster_model,
        checkpoint_manager=checkpoint_manager
    )

    reduced_clusters = await reduce_clusters_from_base_clusters(
        clusters,
        model=meta_cluster_model,
        checkpoint_manager=checkpoint_manager
    )

    projected_clusters = await reduce_dimensionality_from_clusters(
        reduced_clusters,
        model=dimensionality_model,
        checkpoint_manager=checkpoint_manager,
    )

    # Visualize results
    visualise_pipeline_results(reduced_clusters, style="enhanced")

    print(f"\nProcessed {len(conversations)} conversations")
    print(f"Created {len(reduced_clusters)} meta clusters")
    print(f"Checkpoints saved to: {checkpoint_manager.checkpoint_dir}")

if __name__ == "__main__":
    asyncio.run(main())

This example will:

Load 190 synthetic programming conversations from Hugging Face
Process them through the complete analysis pipeline step by step
Generate hierarchical clusters organized into categories
Display the results with enhanced visualization
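
The example starts the pipeline with asyncio.run(main()), which creates a new event loop and raises RuntimeError if one is already running. That is the situation inside a Jupyter notebook, where awaiting main() directly in a cell is the usual alternative. The sketch below shows one way to detect the difference; the main coroutine here is a placeholder, not the Kura pipeline, and in_running_loop is a hypothetical helper.

```python
import asyncio


async def main() -> str:
    # Placeholder coroutine standing in for the real pipeline.
    await asyncio.sleep(0)
    return "done"


def in_running_loop() -> bool:
    """Return True when called from inside an active event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


if not in_running_loop():
    # Plain script: safe to let asyncio.run create and close the loop.
    result = asyncio.run(main())
    print(result)
```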

Visualization Options & Output

Kura offers three visualization styles through the visualise_pipeline_results function. Set the style parameter to switch between them:

from rich.console import Console
from kura.visualization import visualise_pipeline_results

# reduced_clusters is the result computed in the complete example above
console = Console()

# Choose from: "basic", "enhanced", or "rich"
visualise_pipeline_results(reduced_clusters, style="basic")
visualise_pipeline_results(reduced_clusters, style="enhanced")  # Recommended
visualise_pipeline_results(reduced_clusters, style="rich", console=console)

Basic Style

Clean tree structure without extra formatting:

Output:

Clusters (190 conversations)
╠══ Generate SEO-optimized content for blogs and scripts (38 conversations)
║   ╠══ Assist in writing SEO-friendly blog posts (12 conversations)
║   ╚══ Help create SEO-driven marketing content (8 conversations)
╠══ Help analyze and visualize data with R and Tableau (25 conversations)
║   ╠══ Assist with data analysis and visualization in R (15 conversations)
║   ╚══ Troubleshoot sales data visualizations in Tableau (10 conversations)
... (and more clusters)
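
A tree in this layout can be approximated with a short recursive renderer. This is a sketch of the general technique, not Kura's implementation; the Node class and render function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    count: int
    children: list["Node"] = field(default_factory=list)


def render(node, prefix="", is_last=True, is_root=True):
    """Return the tree as a list of lines using box-drawing connectors."""
    label = f"{node.name} ({node.count} conversations)"
    if is_root:
        lines = [label]
        child_prefix = ""
    else:
        # The last sibling closes the branch; earlier siblings keep it open.
        lines = [prefix + ("╚══ " if is_last else "╠══ ") + label]
        child_prefix = prefix + ("    " if is_last else "║   ")
    for i, child in enumerate(node.children):
        last_child = i == len(node.children) - 1
        lines.extend(render(child, child_prefix, last_child, False))
    return lines
```

Calling print("\n".join(render(root))) on a small Node hierarchy prints one cluster per line, with children indented under their parent.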

Enhanced Style (Recommended)

Includes progress bars, statistics, and detailed formatting:

Output:

================================================================================
🎯 ENHANCED CLUSTER VISUALIZATION
================================================================================
🔸 📚 All Clusters (190 total conversations)
    📊 190 conversations (100.0%) [████████████████████]

╠══ 🔸 Generate SEO-optimized content for blogs and scripts
║   📊 38 conversations (20.0%) [████░░░░░░░░░░░░░░░░]
║   ╠══ 🔸 Assist in writing SEO-friendly blog posts
║   ║   📊 12 conversations (6.3%) [█░░░░░░░░░░░░░░░░░░░]
║   ╠══ 🔸 Write blog posts about diabetes medications
║   ║   📊 10 conversations (5.3%) [█░░░░░░░░░░░░░░░░░░░]
║   ╚══ 🔸 Help create SEO-driven marketing content
║       📊 8 conversations (4.2%) [░░░░░░░░░░░░░░░░░░░░]

╠══ 🔸 Help analyze and visualize data with R and Tableau
║   📊 25 conversations (13.2%) [██░░░░░░░░░░░░░░░░░░]
║   ╠══ 🔸 Assist with data analysis and visualization in R
║   ║   📊 15 conversations (7.9%) [█░░░░░░░░░░░░░░░░░░░]
║   ╚══ 🔸 Troubleshoot sales data visualizations in Tableau
║       📊 10 conversations (5.3%) [█░░░░░░░░░░░░░░░░░░░]

... (and more clusters)

================================================================================
📈 CLUSTER STATISTICS
================================================================================
📊 Total Clusters: 29
🌳 Root Clusters: 10
💬 Total Conversations: 190
📏 Average Conversations per Root Cluster: 19.0
================================================================================
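
The percentages and bars in this sample are consistent with a simple rule: the percentage is the cluster's share of all 190 conversations, and the bar fills one block per full twentieth of the total, rounding down. The sketch below reproduces the sample values under that assumption. The rule is inferred from the output shown here rather than taken from Kura's source, and the function names are hypothetical.

```python
def share(count, total):
    """Format a cluster's share of all conversations, e.g. '20.0%'."""
    return f"{100 * count / total:.1f}%"


def progress_bar(count, total, width=20):
    """Draw a fixed-width bar with the filled part rounded down."""
    filled = count * width // total  # integer maths avoids float rounding
    return "[" + "█" * filled + "░" * (width - filled) + "]"


# Values taken from the sample output: 38 and 8 of 190 conversations.
print(share(38, 190), progress_bar(38, 190))
print(share(8, 190), progress_bar(8, 190))
```

Rounding down explains why the 8-conversation cluster (4.2%) shows an empty bar even though it is not empty.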

Rich Style

Colorful, interactive-style output with detailed descriptions and statistics tables:

Output:

╭──────────────────────────────────────────────────────────────────────────────╮
│                        🎯 RICH CLUSTER VISUALIZATION                         │
╰──────────────────────────────────────────────────────────────────────────────╯

📚 All Clusters (190 conversations)
├── Generate SEO-optimized content for blogs and scripts (38 conversations, 20.0%)
│   Users requested help in creating SEO-optimized blog posts and engaging
│   YouTube v...
│   Progress: [███░░░░░░░░░░░░]
│   ├── Assist in writing SEO-friendly blog posts (12 conversations, 6.3%)
│   │   The users sought assistance in crafting engaging and SEO-friendly blog
│   │   posts acr...
│   │   Progress: [░░░░░░░░░░░░░░░]
│   ├── Write blog posts about diabetes medications (10 conversations, 5.3%)
│   │   The users sought assistance in creating blog posts focused on diabetes
│   │   treatment...
│   │   Progress: [░░░░░░░░░░░░░░░]
│   └── Help create SEO-driven marketing content (8 conversations, 4.2%)
│       The users sought assistance in developing SEO-optimized marketing
│       content across...
│       Progress: [░░░░░░░░░░░░░░░]
├── Help analyze and visualize data with R and Tableau (25 conversations, 13.2%)
│   Users sought help with analyzing and visualizing datasets in both R and
│   Tableau,...
│   Progress: [█░░░░░░░░░░░░░░]
... (and more clusters)

       📈 Cluster Statistics                📊 Cluster Size Distribution
╭─────────────────────────┬───────╮  ╭────────────────────┬───────┬────────────╮
│ Metric                  │ Value │  │ Size Range         │ Count │ Percentage │
├─────────────────────────┼───────┤  ├────────────────────┼───────┼────────────┤
│ 📊 Total Clusters       │ 29    │  │ 🔥 Large (>100)    │ 0     │ 0.0%       │
│ 🌳 Root Clusters        │ 10    │  │ 📈 Medium (21-100) │ 3     │ 30.0%      │
│ 💬 Total Conversations  │ 190   │  │ 📊 Small (6-20)    │ 7     │ 70.0%      │
│ 📏 Avg per Root Cluster │ 19.0  │  │ 🔍 Tiny (1-5)      │ 0     │ 0.0%       │
╰─────────────────────────┴───────╯  ╰────────────────────┴───────┴────────────╯
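
The size distribution table groups root clusters into the ranges shown in its first column. A minimal sketch of that bucketing follows. The boundaries come from the table labels, but the list of root cluster sizes is hypothetical apart from 38 and 25 (the two roots visible in the sample); the rest were chosen only so that the totals match the 190 conversations and 10 root clusters reported above.

```python
from collections import Counter


def size_bucket(n):
    """Map a root cluster's conversation count to its size range."""
    if n > 100:
        return "Large (>100)"
    if n >= 21:
        return "Medium (21-100)"
    if n >= 6:
        return "Small (6-20)"
    return "Tiny (1-5)"


# Hypothetical root cluster sizes (only 38 and 25 appear in the sample).
root_sizes = [38, 25, 22, 20, 18, 17, 15, 13, 12, 10]

counts = Counter(size_bucket(n) for n in root_sizes)
average = sum(root_sizes) / len(root_sizes)

for label in ("Large (>100)", "Medium (21-100)", "Small (6-20)", "Tiny (1-5)"):
    pct = 100 * counts[label] / len(root_sizes)
    print(f"{label:<16} {counts[label]:>2} {pct:5.1f}%")
print(f"Average per root cluster: {average:.1f}")
```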

Using the Web Interface

For a more interactive experience, Kura includes a web interface:

# Start with default checkpoint directory
kura start-app

# Or use a custom checkpoint directory
kura start-app --dir ./checkpoints

Expected output:

🚀 Access website at (http://localhost:8000)

INFO:     Started server process [14465]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Access the web interface at http://localhost:8000 to explore:

Cluster Map: 2D visualization of conversation clusters
Cluster Tree: Hierarchical view of cluster relationships
Cluster Details: In-depth information about selected clusters
Conversation Dialog: Examine individual conversations
Metadata Filtering: Filter clusters based on extracted properties

Benefits of the Procedural API

Fine-grained Control: Process each step independently
Flexibility: Mix and match different model implementations
Checkpoint Management: Resume from any stage
Multiple Visualization Options: Choose the best format for your needs
Functional Programming: No hidden state, clear data flow
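
Resuming from a checkpoint generally follows a load-or-compute pattern: if a stage's output file already exists it is read back, otherwise the stage runs and its output is written for next time. The sketch below illustrates that pattern with JSON Lines files. It is not Kura's CheckpointManager; the load_or_compute helper and the file format are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_or_compute(path, compute):
    """Return cached items from `path`, or run `compute` and cache them.

    Items are stored one JSON document per line, so they must be
    JSON-serializable. Hypothetical helper, not part of Kura.
    """
    path = Path(path)
    if path.exists():
        # Resume: skip the expensive step and read the saved results.
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    items = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")
    return items
```

With one file per stage, deleting a single file forces only that stage and any later ones that depend on it to be recomputed.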

Next Steps

Now that you've run your first analysis with Kura, you can:

Learn about configuration options to customize Kura
Explore core concepts to understand how Kura works
Check out the API Reference for detailed documentation