Lessons learned building delta-rs
March 09, 2025
deltalake
Reviewing some of the lessons learned building the delta-rs tooling and community.
The landscape of data and ML technology changes quickly. Our blog is where we will continue to share interesting tidbits from our work. You can read below or subscribe via RSS.
February 24, 2025
deltalake
lambda
sqs-ingest
Serverless data ingestion can be extremely cost-effective, but limitations of AWS Lambda can result in transaction log bloat. In this post we'll discuss the "BUFFER_MORE" feature in sqs-ingest and how it helps get more bang for your Lambda buck.
December 31, 2024
deltalake
lambda
kafka-delta-ingest
Facing a large backlog of data, it is tempting to horizontally scale Delta writers as much as compute and budget will allow. In this post we'll dive into how this can be counterproductive and actually slow throughput rather than accelerate it!
November 25, 2024
deltalake
python
rust
Introducing the definitive guide to Delta Lake, the high-performance open table format for cloud and on-premise big data needs. The book is now available from O'Reilly, including the contributed chapter on using Delta Lake with Rust and Python by R. Tyler Croy.
October 17, 2024
databricks
aws
event
rust
The future of data engineering is becoming more and more Rust-powered. In this video session Tyler walks the audience through a starting point on using Rust for real-world data engineering tasks with the deltalake, datafusion, and arrow crates.
October 16, 2024
databricks
aws
event
In this session we will dive into examples of how to work with Delta tables from AWS Lambdas written in Python and Rust. For many ingestion or lightweight data processing workloads, AWS Lambda provides a fast, easy, and cheap execution environment.
June 04, 2024
databricks
aws
event
Buoyant Data will be in San Francisco for Data and AI Summit this year for a number of sessions including a book signing, an open source summit, an AMA, and two conference track sessions! Come chat with us!
December 30, 2023
aws
lambda
S3 Event Notifications are a highly useful way of orchestrating workflows around AWS S3-based Delta tables. This post details a pattern for ensuring highly concurrent Lambda execution with S3 Event Notifications.
November 27, 2023
rust
deltalake
At the protocol level, Delta Lake can in theory scale to an unlimited number of concurrent readers and writers, so long as the underlying storage provider supports strong atomicity. On AWS, the Simple Storage Service (S3) lacks the necessary "put if absent" operation, which requires Delta writers to coordinate to ensure consistent writes to any given table.
July 08, 2023
rust
python
deltalake
aws
Remove those pesky hard-coded secret keys from your data applications and learn how to assume roles using built-in credential providers in AWS. This post includes examples that can be copied for both Rust and Python applications which need to access Delta tables.
May 21, 2023
databricks
aws
Optimizing the cost of workloads running on Databricks can be daunting at first, but there is plenty of low-hanging fruit! These tips will help you save thousands of dollars annually on your big data's big bills!
May 17, 2023
databricks
aws
event
Buoyant Data will be in San Francisco for Data and AI Summit from June 26th to June 29th. We'll be talking about alternative data pipelines using Rust and Python, and cost optimization in AWS. Come find us!
February 09, 2023
deltalake
rust
developer
A developer-focused post explaining how to write to a Delta table in Rust using the Apache Arrow RecordBatch data structure.
January 03, 2023
aws
databricks
Discussing whether it is possible to have a Databricks deployment with a $0 idle cost in AWS. It is a nice idea, but not entirely possible in practice. This post discusses the minimum footprint possible with Databricks.
December 18, 2022
news
aws
deltalake
databricks
An introductory post outlining what Buoyant Data can do to help customers save on their Databricks and AWS costs, along with our preferences for the most cost-effective data platform architecture.