
Building Conversational Artificial Intelligence: Lessons from DataGalaxy
Creating an AI assistant for the enterprise isn’t just about large language models — it’s about understanding context, constraints, and real business needs. In this post, I’ll share key lessons from building Blink, an enterprise-grade AI chatbot at DataGalaxy, a SaaS company specializing in data cataloging and governance.
Why Build an AI Assistant?
As organizations grow, so does the complexity of their data. Navigating a modern data catalog requires both technical precision and business understanding. The goal of Blink was to bridge that gap — providing users with a conversational layer to interact with metadata, data lineage, and cataloged assets using natural language.
Tech Stack Overview
To build Blink, we used:
- LangChain & LangGraph for orchestrating complex LLM workflows.
- OpenAI GPT-4 for natural language understanding and generation.
- Elasticsearch for hybrid search (keyword + semantic).
- FastAPI and aiohttp for backend APIs.
- LiteLLM and LangFuse for cost management and observability.
Each layer played a key role in making the assistant accurate, fast, and enterprise-ready.
Key Lessons Learned
1. Context Is Everything
Enterprise tools have deeply structured data. Injecting structured metadata into LLM prompts — like entity types, data domains, or glossary terms — drastically improved precision. Retrieval-Augmented Generation (RAG) wasn’t enough on its own; context engineering became a core focus.
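To make that concrete, here's a minimal sketch of metadata-aware prompting with LangChain. The entity type, domain, and glossary fields are illustrative stand-ins, not DataGalaxy's actual schema:

```python
# Illustrative only: fold structured catalog metadata into the system prompt
# alongside the retrieved context, so the model answers within the right scope.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are Blink, the DataGalaxy catalog assistant.\n"
     "Entity type: {entity_type}\n"
     "Data domain: {domain}\n"
     "Glossary terms in scope: {glossary_terms}\n"
     "Answer using only the context below.\n\n{context}"),
    ("human", "{question}"),
])

messages = prompt.format_messages(
    entity_type="Table",                      # hypothetical metadata values
    domain="Finance",
    glossary_terms="ARR, churn rate",
    context="...snippets retrieved from the catalog...",
    question="Which tables feed the monthly revenue dashboard?",
)
```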
2. Tool Calling ≠ Just Plugins
Tool calls in our assistant needed to:
- Trigger filtered API searches (like get_users, search_tags, or search_by_module)
- Return precise catalog objects with metadata-enriched links
- Be stateless and fast
We designed structured tools using LangChain's StructuredTool, with tight validation and JWT-based auth, integrating with DataGalaxy's internal APIs.
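As an illustration, a tool like search_tags could be declared along these lines. The endpoint, argument schema, and token handling below are assumptions made for the sketch, not DataGalaxy's real API:

```python
# Sketch only: a StructuredTool wrapping a filtered catalog search over
# an internal API, authenticated with a per-session JWT.
import aiohttp
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

API_BASE = "https://catalog.example.internal/api"   # placeholder base URL
JWT_TOKEN = "<per-session token>"                    # placeholder credential

class SearchTagsInput(BaseModel):
    query: str = Field(description="Free-text tag query")
    module: str | None = Field(default=None, description="Optional module filter")

async def search_tags(query: str, module: str | None = None) -> list:
    """Call the tag-search endpoint and return matching catalog objects."""
    headers = {"Authorization": f"Bearer {JWT_TOKEN}"}
    params = {"q": query, **({"module": module} if module else {})}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(f"{API_BASE}/tags/search", params=params) as resp:
            resp.raise_for_status()
            return await resp.json()

search_tags_tool = StructuredTool.from_function(
    coroutine=search_tags,
    name="search_tags",
    description="Search catalog tags, optionally scoped to a module.",
    args_schema=SearchTagsInput,
)
```

Keeping each tool stateless means any request can be routed to any worker, and the tight argument schema gives the LLM an unambiguous contract to fill in.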
3. Prompt Optimization Is a Battle
Small prompt tweaks had huge impacts on:
- Token usage (e.g., trimming verbose object schemas)
- Response consistency (e.g., instructing LLMs to return structured markdown)
- Tool selection accuracy
Over time, we moved toward modular prompt compaction, where only the required context is injected per node.
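A simplified, hypothetical version of that idea: each graph node declares the prompt fragments it needs, and only those are assembled at call time.

```python
# Hypothetical sketch of modular prompt compaction: nodes declare the
# fragments they need, and the system prompt is assembled per node.
PROMPT_FRAGMENTS = {
    "persona": "You are Blink, the DataGalaxy catalog assistant.",
    "markdown": "Answer with structured markdown: headings and bullet lists.",
    "tooling": "Prefer calling a search tool over guessing catalog contents.",
    "lineage": "For lineage questions, cite upstream and downstream assets explicitly.",
}

NODE_FRAGMENTS = {
    "router": ["persona", "tooling"],
    "lineage_answer": ["persona", "markdown", "lineage"],
    "generic_answer": ["persona", "markdown"],
}

def build_system_prompt(node: str) -> str:
    """Assemble only the fragments a node needs, keeping token usage down."""
    return "\n".join(PROMPT_FRAGMENTS[key] for key in NODE_FRAGMENTS[node])

print(build_system_prompt("lineage_answer"))
```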
4. Monitoring Matters in Production
We used LangFuse to track:
- Model latency
- Tool success/failure rates
- User feedback traces
This helped us quickly debug performance bottlenecks, refine prompts, and reroute requests when a tool or model misbehaved.
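At its simplest, instrumenting a tool call can look like the sketch below. It assumes LangFuse's observe decorator (the import path varies across SDK versions), and the tool body is a stand-in:

```python
# Sketch only: wrap a tool call so each invocation becomes a LangFuse trace
# capturing latency, inputs/outputs, and errors. The import path shown is the
# v2-style decorators module; newer SDKs expose `observe` at the top level.
from langfuse.decorators import observe

@observe()
async def search_tags(query: str) -> list:
    # The real implementation calls the catalog API; any exception raised
    # here surfaces as a failed span, which feeds tool success/failure rates.
    return [{"tag": "finance", "score": 0.92}]
```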
5. Don't Underestimate Search
Users expect Google-like speed and relevance, combined with enterprise-grade accuracy. We layered:
- Semantic vector search (via OpenAI + Elasticsearch)
- Keyword fallback search
- Filtering logic based on metadata attributes (tags, owners, domains)
This hybrid search architecture was critical for relevance and speed.
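For a sense of what that looks like in practice, here's an illustrative hybrid query. The index name, field names, and embedding model are assumptions, and the combined knn-plus-query request relies on Elasticsearch 8.x:

```python
# Illustrative hybrid search: a kNN clause over OpenAI embeddings combined
# with a keyword match and metadata filters in a single Elasticsearch query.
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")   # placeholder cluster URL
openai_client = OpenAI()

def hybrid_search(text: str, domain: str, k: int = 10) -> list:
    # Embed the query text with a general-purpose OpenAI embedding model.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

    resp = es.search(
        index="catalog-assets",                              # hypothetical index
        knn={"field": "embedding", "query_vector": embedding,
             "k": k, "num_candidates": 100},                 # semantic retrieval
        query={"bool": {
            "should": [{"match": {"name": text}}],           # keyword fallback
            "filter": [{"term": {"domain": domain}}],        # metadata filtering
        }},
        size=k,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```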
What’s Next?
- Multi-turn memory (with session history)
- Dynamic context injection per module
- Domain-specific embeddings to replace general-purpose ones
- More explainable responses to aid governance audits
Final Thoughts
Enterprise AI assistants aren't about mimicking ChatGPT — they're about augmenting structured systems with natural language. If you're building one, focus not just on the LLM, but on the metadata, search infrastructure, and workflows that drive real value.