Ahead of AI & Big Data Expo Europe, AI News caught up with Ivo Everts, Senior Solutions Architect at Databricks, to discuss several key developments set to shape the future of open-source AI and data governance.
One of Databricks’ notable achievements is the DBRX model, which set a new standard for open large language models (LLMs).
“Upon release, DBRX outperformed all other leading open models on standard benchmarks and has up to 2x faster inference than models like Llama2-70B,” Everts explains. “It was trained more efficiently due to a variety of technological advances.
“From a quality standpoint, we believe that DBRX is one of the best open-source models out there and when we refer to ‘best’ this means a wide range of industry benchmarks, including language understanding (MMLU), Programming (HumanEval), and Math (GSM8K).”
The open-source AI model aims to “democratise the training of custom LLMs beyond a small handful of model providers and show organisations that they can train world-class LLMs on their data in a cost-effective way.”
In line with their commitment to open ecosystems, Databricks has also open-sourced Unity Catalog.
“Open-sourcing Unity Catalog enhances its adoption across cloud platforms (e.g., AWS, Azure) and on-premise infrastructures,” Everts notes. “This flexibility allows organisations to uniformly apply data governance policies regardless of where the data is stored or processed.”
Unity Catalog addresses the challenges of data sprawl and inconsistent access controls through various features:
The company has introduced Databricks AI/BI, a new business intelligence product that leverages generative AI to enhance data exploration and visualisation. Everts believes that “a truly intelligent BI solution needs to understand the unique semantics and nuances of a business to effectively answer questions for business users.”
The AI/BI system includes two key components:
Everts states that Databricks AI/BI is designed to provide “a deep understanding of your data’s semantics, enabling self-service data analysis for everyone in an organisation.” He notes it’s powered by “a compound AI system that continuously learns from usage across an organisation’s entire data stack, including ETL pipelines, lineage, and other queries.”
Databricks also unveiled Mosaic AI, which Everts describes as “a comprehensive platform for building, deploying, and managing machine learning and generative AI applications, integrating enterprise data for enhanced performance and governance.”
Mosaic AI offers several key components, which Everts outlines:
Everts highlights that Mosaic AI’s approach to fine-tuning and customising foundation models includes unique features like “fast startup times” by “utilising in-cluster base model caching,” “live prompt evaluation” where users can “track how the model’s responses change throughout the training process,” and support for “custom pre-trained checkpoints.”
At the heart of these innovations lies the Data Intelligence Platform, which Everts says “transforms data management by using AI models to gain deep insights into the semantics of enterprise data.” The platform combines features of data lakes and data warehouses, utilises Delta Lake technology for real-time data processing, and incorporates Delta Sharing for secure data exchange across organisational boundaries.
Everts explains that the Data Intelligence Platform plays a crucial role in supporting new AI and data-sharing initiatives by providing:
As a key sponsor of AI & Big Data Expo Europe, Databricks plans to showcase their open-source AI and data governance solutions during the event.
“At our stand, we will also showcase how to create and deploy – with Lakehouse apps – a custom GenAI app from scratch using open-source models from Hugging Face and data from Unity Catalog,” says Everts.
“With our GenAI app you can generate your own cartoon picture, all running on the Data Intelligence Platform.”
Databricks will be sharing more of their expertise at this year’s AI & Big Data Expo Europe. Swing by Databricks’ booth at stand #280 to hear more about open AI and improving data governance.
Explore other upcoming enterprise technology events and webinars powered by TechForge here.