
Building Conversational Artificial Intelligence: Lessons from DataGalaxy
Creating an AI assistant for the enterprise isn’t just about large language models — it’s about understanding context, constraints, and real business needs. In this post, I’ll share key lessons from building Blink, an enterprise-grade AI chatbot at DataGalaxy, a SaaS company specializing in data cataloging and governance.
Why Build an AI Assistant?
As organizations grow, so does the complexity of their data. Navigating a modern data catalog requires both technical precision and business understanding. The goal of Blink was to bridge that gap — providing users with a conversational layer to interact with metadata, data lineage, and cataloged assets using natural language.
Tech Stack Overview
To build Blink, we used:
- LangChain & LangGraph for orchestrating complex LLM workflows.
- OpenAI GPT-4 for natural language understanding and generation.
- Elasticsearch for hybrid search (keyword + semantic).
- FastAPI and aiohttp for backend APIs.
- LiteLLM and LangFuse for cost management and observability.
Each layer played a key role in making the assistant accurate, fast, and enterprise-ready.
Key Lessons Learned
1. Context Is Everything
Enterprise tools have deeply structured data. Injecting structured metadata into LLM prompts — like entity types, data domains, or glossary terms — drastically improved precision. Retrieval-Augmented Generation (RAG) wasn’t enough on its own; context engineering became a core focus.
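To make that concrete, here's a minimal sketch of metadata-aware prompting with LangChain. The entity type, domain, and glossary fields are illustrative stand-ins, not DataGalaxy's actual schema:

```python
# Illustrative only: fold structured catalog metadata into the system prompt
# alongside the retrieved context, so the model answers within the right scope.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are Blink, the DataGalaxy catalog assistant.\n"
     "Entity type: {entity_type}\n"
     "Data domain: {domain}\n"
     "Glossary terms in scope: {glossary_terms}\n"
     "Answer using only the context below.\n\n{context}"),
    ("human", "{question}"),
])

messages = prompt.format_messages(
    entity_type="Table",                      # hypothetical metadata values
    domain="Finance",
    glossary_terms="ARR, churn rate",
    context="...snippets retrieved from the catalog...",
    question="Which tables feed the monthly revenue dashboard?",
)
```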
2. Tool Calling ≠ Just Plugins
Tool calls in our assistant needed to:
- Trigger filtered API searches (like get_users, search_tags, or search_by_module)
- Return precise catalog objects with metadata-enriched links
- Be stateless and fast
We designed structured tools using LangChain's StructuredTool, with tight validation and JWT-based auth, integrating with DataGalaxy's internal APIs.
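As an illustration, a tool like search_tags could be declared along these lines. The endpoint, argument schema, and token handling below are assumptions made for the sketch, not DataGalaxy's real API:

```python
# Sketch only: a StructuredTool wrapping a filtered catalog search over
# an internal API, authenticated with a per-session JWT.
import aiohttp
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

API_BASE = "https://catalog.example.internal/api"   # placeholder base URL
JWT_TOKEN = "<per-session token>"                    # placeholder credential

class SearchTagsInput(BaseModel):
    query: str = Field(description="Free-text tag query")
    module: str | None = Field(default=None, description="Optional module filter")

async def search_tags(query: str, module: str | None = None) -> list:
    """Call the tag-search endpoint and return matching catalog objects."""
    headers = {"Authorization": f"Bearer {JWT_TOKEN}"}
    params = {"q": query, **({"module": module} if module else {})}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(f"{API_BASE}/tags/search", params=params) as resp:
            resp.raise_for_status()
            return await resp.json()

search_tags_tool = StructuredTool.from_function(
    coroutine=search_tags,
    name="search_tags",
    description="Search catalog tags, optionally scoped to a module.",
    args_schema=SearchTagsInput,
)
```

Keeping each tool stateless means any request can be routed to any worker, and the tight argument schema gives the LLM an unambiguous contract to fill in.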
3. Prompt Optimization Is a Battle
Small prompt tweaks had huge impacts on:
- Token usage (e.g., trimming verbose object schemas)
- Response consistency (e.g., instructing LLMs to return structured markdown)
- Tool selection accuracy
Over time, we moved toward modular prompt compaction, where only the required context is injected per node.
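A simplified, hypothetical version of that idea: each graph node declares the prompt fragments it needs, and only those are assembled at call time.

```python
# Hypothetical sketch of modular prompt compaction: nodes declare the
# fragments they need, and the system prompt is assembled per node.
PROMPT_FRAGMENTS = {
    "persona": "You are Blink, the DataGalaxy catalog assistant.",
    "markdown": "Answer with structured markdown: headings and bullet lists.",
    "tooling": "Prefer calling a search tool over guessing catalog contents.",
    "lineage": "For lineage questions, cite upstream and downstream assets explicitly.",
}

NODE_FRAGMENTS = {
    "router": ["persona", "tooling"],
    "lineage_answer": ["persona", "markdown", "lineage"],
    "generic_answer": ["persona", "markdown"],
}

def build_system_prompt(node: str) -> str:
    """Assemble only the fragments a node needs, keeping token usage down."""
    return "\n".join(PROMPT_FRAGMENTS[key] for key in NODE_FRAGMENTS[node])

print(build_system_prompt("lineage_answer"))
```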
4. Monitoring Matters in Production
We used LangFuse to track:
- Model latency
- Tool success/failure rates
- User feedback traces
This helped us quickly debug performance bottlenecks, refine prompts, and reroute requests when a tool or model misbehaved.
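At its simplest, instrumenting a tool call can look like the sketch below. It assumes LangFuse's observe decorator (the import path varies across SDK versions), and the tool body is a stand-in:

```python
# Sketch only: wrap a tool call so each invocation becomes a LangFuse trace
# capturing latency, inputs/outputs, and errors. The import path shown is the
# v2-style decorators module; newer SDKs expose `observe` at the top level.
from langfuse.decorators import observe

@observe()
async def search_tags(query: str) -> list:
    # The real implementation calls the catalog API; any exception raised
    # here surfaces as a failed span, which feeds tool success/failure rates.
    return [{"tag": "finance", "score": 0.92}]
```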
5. Don't Underestimate Search
Users expect Google-like speed and relevance, combined with enterprise-grade accuracy. We layered:
- Semantic vector search (via OpenAI + Elasticsearch)
- Keyword fallback search
- Filtering logic based on metadata attributes (tags, owners, domains)
This hybrid search architecture was critical for relevance and speed.
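For a sense of what that looks like in practice, here's an illustrative hybrid query. The index name, field names, and embedding model are assumptions, and the combined knn-plus-query request relies on Elasticsearch 8.x:

```python
# Illustrative hybrid search: a kNN clause over OpenAI embeddings combined
# with a keyword match and metadata filters in a single Elasticsearch query.
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")   # placeholder cluster URL
openai_client = OpenAI()

def hybrid_search(text: str, domain: str, k: int = 10) -> list:
    # Embed the query text with a general-purpose OpenAI embedding model.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

    resp = es.search(
        index="catalog-assets",                              # hypothetical index
        knn={"field": "embedding", "query_vector": embedding,
             "k": k, "num_candidates": 100},                 # semantic retrieval
        query={"bool": {
            "should": [{"match": {"name": text}}],           # keyword fallback
            "filter": [{"term": {"domain": domain}}],        # metadata filtering
        }},
        size=k,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```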
What’s Next?
- Multi-turn memory (with session history)
- Dynamic context injection per module
- Domain-specific embeddings to replace general-purpose ones
- More explainable responses to aid governance audits
Final Thoughts
Enterprise AI assistants aren't about mimicking ChatGPT — they're about augmenting structured systems with natural language. If you're building one, focus not just on the LLM, but on the metadata, search infrastructure, and workflows that drive real value.