Denny's Top Session Picks for Data + AI Summit 2025

Data + AI Summit 2025 is just a few weeks away! This year we're offering our widest selection of sessions yet, with more than 700 to choose from. Sign up and join us in person in San Francisco or virtually.

With a career rooted in open source, I have seen firsthand how open technologies and formats remain central to enterprise strategy. As a long-time contributor to Apache Spark™ and MLflow, a maintainer and committer for Delta Lake and Unity Catalog, and most recently a contributor to Apache Iceberg™, I have had the honor of working with some of the brightest minds in the field.

For this year's sessions, I am focusing on the intersection of open source and AI, with a special interest in multimodal AI. Specifically, how open table formats such as Delta Lake and Iceberg, combined with a unified Unity Catalog, are driving the next wave of real-time, trusted AI and analytics.

My top picks

Apache Spark 4.1: The Next Chapter in Unified Analytics

Apache Spark™ has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-tier performance. In the upcoming Spark 4.1 release, the Spark community reimagines Spark to excel at everything from massive cluster deployments to local notebook development. Come listen and ask questions of:

  • Xiao Li, Director of Engineering at Databricks, Apache Spark committer and PMC member
  • DB Tsai, an engineering leader on the Databricks Spark team and an Apache Spark Project Management Committee (PMC) member and committer

Iceberg GEO Type: Transforming Geospatial Data Management at Scale

Geospatial support is increasingly important for lakehouse formats. Learn from Jia Yu, co-founder and Chief Architect of Wherobots Inc., and Szehon Ho, software engineer at Databricks, about the latest and greatest developments around geospatial data types in Apache Iceberg™.

We're going to save a lot of money using cloud data!

R. Tyler Croy of Scribd, Delta Lake maintainer and a founding shepherd of delta-rs, dives into the cloud architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose, and more. Using open source tools such as kafka-delta-ingest, oxbow, and others, Scribd has redefined its architecture to be event-driven, reliable and, most importantly, cheaper. No heavy lifting required!

This session dives into the value propositions of the lakehouse architecture and cost efficiency within the Rust/Arrow/Python ecosystems. Several recommended videos to watch in advance:

Daft and Unity Catalog: The Multimodal AI-Native Lakehouse

Multimodal AI is fundamentally changing the landscape because data is more than just tables. Workflows now routinely include documents, images, audio, video, embeddings, URLs, and more.

In this session, Jay Chia, co-founder of Eventual, will show how Daft, a popular multimodal processing framework, combined with Unity Catalog can help unify authentication, authorization, and data lineage, providing a holistic view of governance.

Bridging Big Data and AI: Empowering PySpark with the Lance Format for Multimodal AI Data Pipelines

PySpark has long been a cornerstone of big data processing, but the rise of multimodal AI and vector search poses challenges beyond its traditional capabilities. The new Python Data Source API enables integration with evolving AI data lakes built on the multimodal Lance format.

This session will dive into the Lance format and why it is an important building block for multimodal AI. Allison Wang, Apache Spark™ committer, and Li Qiu, LanceDB database engineer and Alluxio PMC member, will explore how combining Apache Spark (PySpark) with LanceDB empowers multimodal AI data pipelines.

Streamlining DSPy Development: Monitoring, Tuning, and Deployment with MLflow

Chen Qian, a senior software engineer at Databricks, will show how to integrate MLflow with DSPy to bring full observability to your DSPy development.

You will see how to monitor DSPy module calls, evaluations, and optimizers using MLflow tracing and autologging. Together, these two tools make it easier to tune, iterate on, and understand DSPy workflows, and then deploy your DSPy program end to end.

From Code Completion to Autonomous Software Engineering Agents

Kilian Lieret, a research software engineer at Princeton University, was recently a guest on the Data Brew by Databricks vidcast for a fascinating discussion on new tools for evaluating and empowering AI in software engineering.

This session extends that conversation, with Kilian digging into SWE-bench (a benchmark) and SWE-agent (an agent), the current state of AI for developers, and how to experiment with AI agents.

Building High-Accuracy AI Systems with SLMs and Mini-Agents

It is always a pleasure to chat with Sharon Zhou, CEO and founder of Lamini, who will discuss how to use small language models (SLMs) and mini-agents to reduce hallucinations using Mixture of Memory Experts (i.e., MoME knows best)!

Learn a little more about MoME in this fun Data Brew by Databricks episode with Sharon: Mixture of Memory Experts.

Beyond the Trade-off: Differential Privacy in Tabular Data Synthesis

Differential privacy is an important tool for providing mathematical guarantees around individual privacy in data. In this talk, Lipika Ramaswamy of Gretel.ai (now part of NVIDIA) examines using Gretel Navigator to generate differentially private synthetic data that maintains high fidelity to the source data and high utility for downstream tasks across heterogeneous datasets.
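As background (not part of the talk itself), here is a minimal, self-contained sketch of the classic Laplace mechanism, the textbook way to release a single count with epsilon-differential privacy; the function name is illustrative:

```python
import math
import random


def dp_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with epsilon-differential privacy by adding
    Laplace(0, sensitivity/epsilon) noise (the Laplace mechanism)."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise


# A single release is noisy but unbiased: averaging many independent
# releases (for demonstration only -- each one spends privacy budget!)
# recovers the true count.
rng = random.Random(42)
samples = [dp_count(100, epsilon=1.0, rng=rng) for _ in range(10_000)]
avg = sum(samples) / len(samples)
```

Smaller epsilon means a larger noise scale and stronger privacy; synthetic-data generators like the one discussed in this talk build far more sophisticated machinery on top of this same accuracy/privacy trade-off.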

Some good pre-reading on this topic:

Building Knowledge Agents to Automate Document Workflows

One of the greatest promises of LLM agents is automating all knowledge work over unstructured data – we call these "knowledge agents". Jerry Liu, founder of LlamaIndex, will dive into how to build knowledge agents that automate document workflows. For something that can be complex to implement, Jerry will demonstrate how to build a simplified flow for a core business process.
