Essential principles to produce and consume data for AI acceleration



This is a VB Lab Insights article presented by Capital One.

AI offers transformative potential, but unlocking its value requires strong data management. AI is built on a solid data foundation, and each can iteratively improve the other, creating a flywheel effect between data and AI. This flywheel enables companies to build more customized, real-time solutions that unlock impact for their customers and the business.

Managing data in today’s world is not without complexity. Data volume is skyrocketing, with research showing it’s doubled in the last five years alone. As a result, 68% of data available to enterprises is left untapped. Within that data, there’s a huge variety of structures and formats, with MIT noting that around 80-90% of data is unstructured — fueling complexity in putting it to use. And finally, the velocity at which data needs to be deployed to users is accelerating. Some use cases call for sub-10 millisecond data availability, or in other words, ten times faster than the blink of an eye.

The data ecosystems of today are big, diverse and fast — and the AI revolution is further raising the stakes on how companies manage and use data.


Fundamentals for great data

The data lifecycle is complicated and unforgiving, often involving many steps, many hops and many tools. This can lead to disparate ways of working with data and varying levels of maturity and instrumentation to drive data management.

To empower users with trustworthy data for innovation, we need to first tackle the fundamentals of managing great data: self-service, automation and scale.

  • Self-service means empowering users to do their job with minimal friction. It covers areas like seamless data discovery, ease of data production and tools that democratize data access.
  • Automation ensures that all core data management capabilities are embedded in the tools and experiences that enable users to work with data.
  • Data ecosystems need to scale — especially in the AI era. Among other factors, enterprises need to consider the scalability of key technologies, resilience capabilities and service level agreements (SLAs) that set baseline obligations for how data is to be managed, along with mechanisms to enforce those agreements (see the sketch below).

These principles lay the foundation to produce and consume great data.
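
To make the SLA point concrete, here is a minimal, hypothetical sketch of what a machine-checkable data SLA and its enforcement check might look like. Every field name and threshold is an illustrative assumption, not an actual Capital One implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataSLA:
    """Hypothetical, machine-checkable SLA for a published dataset."""
    dataset: str
    max_staleness: timedelta   # how old the newest data may be
    min_completeness: float    # fraction of expected rows that must arrive

def check_sla(sla: DataSLA, last_updated: datetime, completeness: float) -> list[str]:
    """Return a list of SLA violations (an empty list means compliant)."""
    violations = []
    if datetime.now(timezone.utc) - last_updated > sla.max_staleness:
        violations.append(f"{sla.dataset}: data older than {sla.max_staleness}")
    if completeness < sla.min_completeness:
        violations.append(f"{sla.dataset}: completeness {completeness:.0%} "
                          f"below required {sla.min_completeness:.0%}")
    return violations

# Example: a table that must be refreshed hourly and be at least 99% complete.
sla = DataSLA("transactions_daily", timedelta(hours=1), 0.99)
print(check_sla(sla, datetime.now(timezone.utc) - timedelta(hours=3), 0.97))
```

An enforcement service could run checks like this on a schedule, paging producers or blocking downstream consumption when a dataset falls out of compliance.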

Producing great data

Data producers are responsible for onboarding and organizing data, enabling quick and efficient consumption. A well-designed, self-service portal can play a key role here by allowing producers to interact seamlessly with systems across the ecosystem — such as storage, access controls, approvals, versioning and business catalogs. The goal is to create a unified control plane that mitigates the complexity of these systems, making data available in the right format, at the right time and in the right place.
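
As an illustration of what such a control plane might expose to producers, here is a hedged sketch of a single onboarding request that bundles storage, access, approvals, versioning and catalog metadata into one call. The request shape and the commented-out portal client are hypothetical.

```python
# Hypothetical payload a self-service portal might accept. One request drives
# storage provisioning, access controls, approval routing, versioning and
# catalog registration behind the scenes.
onboarding_request = {
    "dataset": "card_transactions",
    "storage": {"zone": "curated", "format": "parquet"},
    "access": {"readers": ["analytics-team"], "approvers": ["data-steward"]},
    "versioning": {"strategy": "append-only", "retain_versions": 10},
    "catalog": {
        "description": "Daily card transaction summaries",
        "owner": "payments-data-team",
        "tags": ["transactions", "daily"],
    },
}

# A hypothetical portal client would validate the request, then fan it out to
# the underlying systems so the producer never touches them individually:
# portal_client.onboard(onboarding_request)
```

The design point is that the producer declares intent once, and the control plane translates it into the right calls to each backing system.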

To scale and enforce governance, enterprises can choose between a central platform and a federated model — or even adopt a hybrid approach. A central platform simplifies data publishing and governance rules, while a federated model offers flexibility, using purpose-built SDKs to manage governance and infrastructure locally. The key is to implement consistent mechanisms that ensure automation and scalability, enabling the business to reliably produce high-quality data that fuels AI innovation.
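
In a federated model, that might look something like the sketch below, where a purpose-built SDK enforces centrally defined governance rules in each team's local publish path. The rules, function shape and dataset are illustrative assumptions.

```python
# Hypothetical SDK shape for a federated governance model: teams publish from
# their own infrastructure, but central policies are enforced at publish time.

REQUIRED_METADATA = {"owner", "classification", "retention_days"}

def publish(dataset_name: str, metadata: dict, rows: list[dict]) -> None:
    """Publish a dataset, applying central governance rules locally."""
    missing = REQUIRED_METADATA - metadata.keys()
    if missing:
        raise ValueError(f"governance: missing metadata {sorted(missing)}")
    if metadata["classification"] == "restricted" and not metadata.get("encrypted"):
        raise ValueError("governance: restricted data must be encrypted at rest")
    # ... write rows to team-owned storage and register in the shared catalog ...
    print(f"published {dataset_name} ({len(rows)} rows)")

publish(
    "branch_visits",
    {"owner": "retail-team", "classification": "internal", "retention_days": 365},
    [{"branch_id": 1, "visits": 42}],
)
```

Whether the checks run in a central platform or in a federated SDK, the same rules apply everywhere, which is what keeps governance consistent at scale.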

Consuming great data

Data consumers — such as data scientists and data engineers — need easy access to reliable, high-quality data for rapid experimentation and development. Simplifying the storage strategy is a foundational step: by centralizing compute within the data lake and standardizing on a single storage layer, enterprises can minimize data sprawl and reduce complexity, since every compute engine consumes data from the same place.
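
To illustrate, the sketch below has two different compute engines (pandas and DuckDB, chosen here purely as stand-ins) reading the same Parquet files from one storage location. The local path is a placeholder for an object-store URI in a real lake, and it assumes pandas, pyarrow and duckdb are installed.

```python
import os

import duckdb
import pandas as pd

# One storage layer: a single Parquet location in the lake (placeholder path;
# in practice this would be an object-store URI governed by the platform).
os.makedirs("lake/curated", exist_ok=True)
LAKE_PATH = "lake/curated/transactions.parquet"

# Write once so the example is self-contained.
pd.DataFrame({"account": [1, 2, 2], "amount": [10.0, 5.0, 7.5]}).to_parquet(LAKE_PATH)

# Engine 1: pandas reads the shared storage for ad-hoc analysis.
df = pd.read_parquet(LAKE_PATH)
print(df["amount"].sum())

# Engine 2: DuckDB runs SQL directly over the same files — no copy is made.
print(duckdb.sql(f"SELECT account, SUM(amount) AS total FROM '{LAKE_PATH}' GROUP BY account"))
```

Because both engines point at the same files, there is no per-engine copy to keep in sync, which is exactly the sprawl the single-storage-layer strategy avoids.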

Enterprises should also adopt a zone strategy to handle diverse use cases. For instance, a raw zone may support expanded data and file types such as unstructured data, while a curated zone enforces stricter schema and quality requirements. This setup allows for flexibility while maintaining governance and data quality. Consumers can use these zones for activities like creating personal spaces for experimentation or collaborative zones for team projects.
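
Here is a hedged sketch of what promotion from a raw zone to a curated zone could look like, with a stricter schema enforced before data lands in curated. The zone paths and schema are illustrative assumptions.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Curated-zone contract: an explicit, stricter schema (illustrative).
CURATED_SCHEMA = pa.schema([
    ("event_id", pa.string()),
    ("amount", pa.float64()),
    ("event_ts", pa.timestamp("us")),
])

def promote_to_curated(raw_path: str, curated_path: str) -> None:
    """Validate raw-zone data against the curated contract before promotion."""
    table = pq.read_table(raw_path)
    try:
        # cast() fails loudly if columns are missing or types are incompatible,
        # keeping nonconforming data out of the curated zone.
        curated = table.cast(CURATED_SCHEMA)
    except ValueError as err:
        raise ValueError(f"rejected promotion from {raw_path}: {err}") from err
    pq.write_table(curated, curated_path)
```

The raw zone stays permissive for exploration, while every dataset that reaches the curated zone is guaranteed to honor the published contract.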

Automated services ensure data access, lifecycle management and compliance, empowering users to innovate with confidence and speed.
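
As one example of automated lifecycle management, on AWS such policies can be expressed declaratively; the sketch below uses boto3 to expire raw-zone objects and tier curated data to colder storage. The bucket name, prefixes and retention windows are assumptions for illustration.

```python
import boto3

s3 = boto3.client("s3")

# Declarative lifecycle rules the platform applies automatically, so no user
# has to remember to clean up or re-tier data by hand.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {   # Raw zone: experimentation data expires after 90 days.
                "ID": "expire-raw-zone",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            },
            {   # Curated zone: tier to infrequent access after 30 days.
                "ID": "tier-curated-zone",
                "Filter": {"Prefix": "curated/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            },
        ]
    },
)
```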

Lead with simplicity

Effective AI strategies are grounded in robust, well-designed data ecosystems. By simplifying how they produce and consume data — and improving the quality of that data — businesses can empower users to innovate in new performance-driving areas with confidence.

As a foundation, it’s paramount that businesses prioritize ecosystems and processes that enhance trustworthiness and accessibility. By implementing the principles outlined above, they can do just that, building scalable and enforceable data management that will power rapid experimentation in AI and ultimately deliver long-term business value.

Marty Andolino is VP, Software Engineering at Capital One. Kajal Wood is Sr. Director, Software Engineering at Capital One.

VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.


