I am pleased to share that Buoyant Data will be attending Data and AI Summit 2023. There are many great sessions already on the agenda covering all aspects of modern data and ML platforms. My contribution will be a Delta Lake committers "AMA" in the exposition hall, along with meeting the many other contributors and users of the delta-rs and kafka-delta-ingest open source projects. If you're interested in talking about high-performance data pipelines with Rust or infrastructure cost optimization with Databricks, please reach out!
Recommended sessions
There are dozens of interesting sessions on the schedule, but here are a few I can particularly recommend!
Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?
This session will cover some of the really interesting technology being built on top of the Rust bindings which Buoyant Data contributes to:
Rust is a unique language whose traits make it very appealing for data engineering. In this session, we'll walk through the aspects of the language that make it such a good fit for big data processing, including how it improves performance, how it provides greater safety guarantees, and how its compatibility with a wide range of existing tools positions it to become a major building block for the future of analytics.
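To give a flavor of what that looks like in practice, here is a minimal sketch of opening a Delta table with the deltalake crate (the Rust side of delta-rs). The table path is made up for illustration, and the exact method names vary a bit between 0.x releases of the crate:

```rust
// Assumes roughly these Cargo.toml dependencies (versions are illustrative):
//   deltalake = "0.12"
//   tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
use deltalake::open_table;

#[tokio::main]
async fn main() -> Result<(), deltalake::DeltaTableError> {
    // Open an existing Delta table straight from the filesystem; no JVM
    // or Spark cluster is required to read the transaction log.
    // "./data/events" is a hypothetical path; substitute your own table.
    let table = open_table("./data/events").await?;

    // Inspect transaction-log metadata from the loaded table state.
    println!("table version: {}", table.version());
    for uri in table.get_file_uris() {
        println!("data file: {uri}");
    }
    Ok(())
}
```

Everything above runs in a plain async Rust binary with no Spark involved, which is a big part of the appeal this session digs into.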
Why Delta Lake is the Best Storage Format for Pandas Analyses
Fellow delta-rs contributor Matthew Powers (aka MrPowers) will be leading this session:
Pandas analyses are often limited by file formats like CSV and Parquet. CSV doesn't allow for column pruning, which is an important performance optimization. Parquet doesn't allow for critical features like ACID transactions, time travel, and schema enforcement. In this session, we will discuss why Delta Lake is the fastest storage format for pandas users and the great features it provides.
Top Mistakes to Avoid in Streaming Applications
This session will discuss the top mistakes to avoid when working with streaming applications across different sources and sinks like DLT, Kafka, Delta, and so on. Avoiding these mistakes while architecting, running, and restarting your applications will save you from costly problems later.
Databricks Cost Management: Tips and Tools to Stay Under Budget
How do you prevent surprise bills at the end of the month? Join us as we discuss best practices for cost management. You'll learn how to analyze and break down costs, and pick up practical tips for keeping your budget in check.
There's a lot going on at Data and AI Summit, so whether you can make it to San Francisco or will be "attending virtually", I encourage anybody working with data to check it out!