Nadhir MAZARI BOUFARES

Machine Learning Engineer

Image credit: datagalaxy.com

Building Conversational Artificial Intelligence: Lessons from DataGalaxy

Creating an AI assistant for the enterprise isn’t just about large language models — it’s about understanding context, constraints, and real business needs. In this post, I’ll share key lessons from building Blink, an enterprise-grade AI chatbot at DataGalaxy, a SaaS company specializing in data cataloging and governance.


Why Build an AI Assistant?

As organizations grow, so does the complexity of their data. Navigating a modern data catalog requires both technical precision and business understanding. The goal of Blink was to bridge that gap — providing users with a conversational layer to interact with metadata, data lineage, and cataloged assets using natural language.

Tech Stack Overview

To build Blink, we used:

  • LangChain & LangGraph for orchestrating complex LLM workflows.
  • OpenAI GPT-4 for natural language understanding and generation.
  • Elasticsearch for hybrid search (keyword + semantic).
  • FastAPI and aiohttp for backend APIs.
  • LiteLLM and LangFuse for cost management and observability.

Each layer played a key role in making the assistant accurate, fast, and enterprise-ready.
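To give a feel for how these layers fit together, here is a minimal sketch of a LangGraph workflow wiring a retrieval step to a generation step. The state fields, node names, and placeholder retrieval logic are hypothetical simplifications, not Blink's actual graph.

```python
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END


class ChatState(TypedDict):
    # Hypothetical state: the user question, retrieved catalog context, and the answer.
    question: str
    context: str
    answer: str


llm = ChatOpenAI(model="gpt-4")


def retrieve(state: ChatState) -> dict:
    # Placeholder retrieval step; in practice this would query Elasticsearch.
    return {"context": f"(catalog results for: {state['question']})"}


def generate(state: ChatState) -> dict:
    prompt = f"Context:\n{state['context']}\n\nQuestion: {state['question']}"
    return {"answer": llm.invoke(prompt).content}


graph = StateGraph(ChatState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()

# Usage: app.invoke({"question": "Which tables belong to the finance domain?"})
```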

Key Lessons Learned

1. Context Is Everything

Enterprise tools have deeply structured data. Injecting structured metadata into LLM prompts — like entity types, data domains, or glossary terms — drastically improved precision. Retrieval-Augmented Generation (RAG) wasn’t enough on its own; context engineering became a core focus.
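As a rough illustration of what "injecting structured metadata" means in practice, the sketch below renders a few catalog fields into a compact prompt section. The field names and template are made up for the example, not DataGalaxy's schema.

```python
# Illustrative only: field names and the template are hypothetical, not DataGalaxy's schema.
def build_context_block(asset: dict) -> str:
    """Render structured catalog metadata into a compact prompt section."""
    lines = [
        f"Entity type: {asset.get('entity_type', 'unknown')}",
        f"Data domain: {asset.get('domain', 'unspecified')}",
        f"Glossary terms: {', '.join(asset.get('glossary_terms', [])) or 'none'}",
        f"Owner: {asset.get('owner', 'unassigned')}",
    ]
    return "\n".join(lines)


asset = {
    "entity_type": "Table",
    "domain": "Finance",
    "glossary_terms": ["Revenue", "Invoice"],
    "owner": "data-platform-team",
}

prompt = (
    "You are a data catalog assistant. Answer using only the metadata below.\n\n"
    f"{build_context_block(asset)}\n\n"
    "Question: Who owns this table and which domain does it belong to?"
)
```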

2. Tool Calling ≠ Just Plugins

Tool calls in our assistant needed to:

  • Trigger filtered API searches (like get_users, search_tags, or search_by_module)
  • Return precise catalog objects with metadata-enriched links
  • Be stateless and fast

We designed structured tools using LangChain’s StructuredTool, with tight validation and JWT-based auth, integrating with DataGalaxy’s internal APIs.
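Here is a minimal sketch of what one such tool could look like with LangChain's StructuredTool. The input schema, return shape, and URL are hypothetical; a real implementation would attach the JWT and call DataGalaxy's internal APIs, and the pydantic import may differ depending on your LangChain version.

```python
from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field


class SearchTagsInput(BaseModel):
    query: str = Field(description="Free-text query to match against tag names")
    limit: int = Field(default=10, ge=1, le=50, description="Maximum number of tags to return")


def search_tags(query: str, limit: int = 10) -> list[dict]:
    # Hypothetical stand-in for an authenticated call to an internal catalog API.
    return [{"tag": query, "url": f"https://example.invalid/tags/{query}"}][:limit]


search_tags_tool = StructuredTool.from_function(
    func=search_tags,
    name="search_tags",
    description="Search catalog tags by name and return metadata-enriched links.",
    args_schema=SearchTagsInput,
)

# The tool can then be bound to a chat model, e.g. llm.bind_tools([search_tags_tool]).
```

Keeping tools stateless like this also makes them easy to validate and cache independently of the conversation.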

3. Prompt Optimization Is a Battle

Small prompt tweaks had huge impacts on:

  • Token usage (e.g., trimming verbose object schemas)
  • Response consistency (e.g., instructing LLMs to return structured markdown)
  • Tool selection accuracy

Over time, we moved toward modular prompt compaction, where only the required context is injected per node.
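The sketch below shows the basic idea of per-node compaction: each node declares the context keys it needs, and only those slices are rendered into its prompt. Node names and context keys here are invented for the example.

```python
# Hypothetical registry: each graph node declares only the context keys it needs.
NODE_CONTEXT_KEYS = {
    "search": ["user_filters", "module"],
    "answer": ["glossary_terms", "retrieved_assets"],
}


def compact_prompt(node: str, context: dict, instruction: str) -> str:
    """Inject only the context slices a given node actually needs."""
    keys = NODE_CONTEXT_KEYS.get(node, [])
    parts = [f"{key}: {context[key]}" for key in keys if key in context]
    return instruction + ("\n\n" + "\n".join(parts) if parts else "")


context = {
    "user_filters": {"domain": "Finance"},
    "module": "glossary",
    "glossary_terms": ["Revenue"],
    "retrieved_assets": ["fact_invoice"],
}

print(compact_prompt("search", context, "Find matching catalog objects."))
```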

4. Monitoring Matters in Production

We used LangFuse to track:

  • Model latency
  • Tool success/failure rates
  • User feedback traces

This helped us quickly debug performance bottlenecks, refine prompts, and reroute requests when a tool or model misbehaved.
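For reference, this is roughly what the instrumentation looks like with LangFuse's LangChain integration, assuming the v2 Python SDK (the import path differs in v3) and credentials in the LANGFUSE_* environment variables.

```python
from langchain_openai import ChatOpenAI
from langfuse.callback import CallbackHandler  # LangFuse v2 SDK; import path differs in v3

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST from the environment.
langfuse_handler = CallbackHandler()

llm = ChatOpenAI(model="gpt-4")

# Attaching the handler to any LangChain/LangGraph invocation traces latency,
# token usage, tool calls and errors without extra instrumentation.
response = llm.invoke(
    "Summarize what a data catalog is in one sentence.",
    config={"callbacks": [langfuse_handler]},
)
print(response.content)
```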

5. Don't Underestimate Search

Users expect Google-like performance, but with enterprise accuracy. We layered:

  • Semantic vector search (via OpenAI + Elasticsearch)
  • Keyword fallback search
  • Filtering logic based on metadata attributes (tags, owners, domains)

This hybrid search architecture was critical for relevance and speed.
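A rough sketch of that hybrid query is shown below, assuming an Elasticsearch 8.x index with a dense_vector field and the OpenAI embeddings API. The index name, field names, and embedding model are illustrative, not Blink's actual configuration.

```python
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
openai_client = OpenAI()


def hybrid_search(question: str, domain: str | None = None, k: int = 10) -> list[dict]:
    """Combine kNN vector search with a keyword query, plus metadata filtering."""
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    filters = [{"term": {"domain": domain}}] if domain else []

    resp = es.search(
        index="catalog-assets",
        # Semantic leg: approximate kNN over the asset's embedding field.
        knn={
            "field": "embedding",
            "query_vector": embedding,
            "k": k,
            "num_candidates": 100,
            "filter": filters,
        },
        # Keyword leg: BM25 match on asset names; Elasticsearch blends both scores.
        query={
            "bool": {
                "must": [{"match": {"name": question}}],
                "filter": filters,
            }
        },
        size=k,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```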

What’s Next?

  • Multi-turn memory (with session history)
  • Dynamic context injection per module
  • Domain-specific embeddings to replace general-purpose ones
  • More explainable responses to aid governance audits

Final Thoughts

Enterprise AI assistants aren't about mimicking ChatGPT — they're about augmenting structured systems with natural language. If you're building one, focus not just on the LLM, but on the metadata, search infrastructure, and workflows that drive real value.