Cost-effective data

Big data adds significant value to your organization, but it can also add significant cost. Buoyant Data specializes in improving data infrastructure with high-performance, low-cost ingestion and transformation pipelines built with Rust, Python, Databricks, and AWS.

Delta Lake Support

As creators of the deltalake Python and Rust packages, we have been supporting Delta Lake applications since the beginning. Buoyant Data offers one-time, on-demand support as well as ongoing technical support subscriptions for your team!

Rust Development

With years of experience creating and deploying Rust data applications with delta-rs, kafka-delta-ingest, and more, Buoyant Data can help your organization adopt and excel with high-performance, low-cost data services or AWS Lambdas built with Rust.

Data Architecture Consulting

Our expertise in leveraging Delta Lake spans both the Databricks platform (Serverless, Unity Catalog, etc.) and the AWS data platform (Glue, Athena, EMR). We can help design and implement a scalable and efficient data platform for your organization.

Infrastructure Optimization

For organizations with existing data infrastructure and analytics, we can analyze and optimize in place to squeeze faster queries and lower costs out of your existing data platform without substantial rearchitecture.

Recent Posts

Fast, cheap, and easy data ingestion with AWS Lambda and Delta Lake

In this session we will dive into examples of how to work with Delta tables from AWS Lambdas written in Python and Rust. For many ingestion or lightweight data-processing workloads, AWS Lambda provides a fast, easy, and cheap execution environment.
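For a flavor of the approach, here is a minimal sketch of a Python Lambda handler appending records to a Delta table with the deltalake package; the table location and event shape are illustrative assumptions, not taken from the session itself.

```python
# Minimal sketch (assumed table location and event shape): a Lambda
# handler that appends incoming records to a Delta table on S3.
import pyarrow as pa
from deltalake import write_deltalake

TABLE_URI = "s3://example-bucket/tables/events"  # hypothetical location

def handler(event, context):
    # Assumes the event payload carries a list of flat JSON records
    records = event.get("records", [])
    if not records:
        return {"written": 0}
    batch = pa.Table.from_pylist(records)
    # Appending to S3-backed tables also needs a locking provider or
    # equivalent safe-write configuration; see the concurrency post below.
    write_deltalake(TABLE_URI, batch, mode="append")
    return {"written": len(records)}
```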

Read more

Join us for two talks at Data and AI Summit

Buoyant Data will be in San Francisco for Data and AI Summit this year for a number of sessions, including a book signing, an open source summit, an AMA, and two conference track sessions! Come chat with us!

Read more

Scaling S3 Event Notifications for Delta Lake

S3 Event Notifications are a highly useful way of orchestrating workflows around AWS S3-based Delta tables. This post details a pattern for ensuring highly concurrent Lambda execution with S3 Event Notifications.

Read more

Concurrency limitations for Delta Lake on AWS

At a protocol level, Delta Lake can in theory scale to an unlimited number of concurrent readers and writers, so long as the underlying storage provider supports strong atomicity. On AWS, the Simple Storage Service (S3) lacks the necessary "put if absent" operation, which means Delta writers must coordinate to ensure consistent writes to any given table.
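One common remedy with delta-rs based writers is a DynamoDB-backed lock. A minimal sketch, assuming a pre-created DynamoDB table (the table name `delta_log` and the S3 location here are placeholders):

```python
# Sketch of coordinating concurrent S3 writers through the DynamoDB
# locking provider supported by delta-rs. The DynamoDB table name and
# S3 location are placeholder assumptions.
import pyarrow as pa
from deltalake import write_deltalake

data = pa.table({"id": [1, 2, 3]})
write_deltalake(
    "s3://example-bucket/tables/events",  # placeholder table location
    data,
    mode="append",
    storage_options={
        "AWS_S3_LOCKING_PROVIDER": "dynamodb",
        "DELTA_DYNAMO_TABLE_NAME": "delta_log",  # must already exist
    },
)
```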

Read more

Automating credentials for Delta Lake on AWS

Remove those pesky hard-coded secret keys from your data applications and learn how to assume roles using AWS's built-in credential providers. This post includes copy-ready examples for both Rust and Python applications that need to access Delta tables.
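As one hedged illustration of the pattern (the role ARN and table URI below are placeholders), temporary STS credentials can be handed to the deltalake package through storage options:

```python
# Illustration with placeholder role ARN and table URI: assume an IAM
# role via STS and pass the temporary credentials to deltalake.
import boto3
from deltalake import DeltaTable

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/delta-reader",  # placeholder
    RoleSessionName="delta-session",
)["Credentials"]

dt = DeltaTable(
    "s3://example-bucket/tables/events",  # placeholder
    storage_options={
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    },
)
print(dt.version())
```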

Read more

5 tips for cheaper Databricks workloads

Optimizing the cost of workloads running on Databricks can be daunting at first, but there is plenty of low-hanging fruit! These tips will help you save thousands of dollars annually on your big data's big bills!

Read more

Join us at Data and AI Summit 2023

Buoyant Data will be in San Francisco for Data and AI Summit from June 26th to June 29th. We'll be talking about alternative data pipelines using Rust and Python, and cost optimization in AWS. Come find us!

Read more

Writing RecordBatches to Delta in Rust

A developer-focused post explaining how to write to a Delta table in Rust using the Apache Arrow RecordBatch data structure.
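The post itself works in Rust; as a rough Python analogue (the schema and table path are assumptions), the deltalake package accepts Arrow RecordBatches directly:

```python
# Rough Python analogue of the Rust flow described in the post:
# deltalake's write_deltalake accepts an Arrow RecordBatch directly.
# Schema and table path are illustrative assumptions.
import pyarrow as pa
from deltalake import write_deltalake

schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2], type=pa.int64()), pa.array(["alpha", "beta"])],
    schema=schema,
)
write_deltalake("./example-table", batch, mode="append")
```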

Read more

The cheapest Databricks deployment is $33/month

A discussion of whether it is possible to run a Databricks deployment in AWS with $0 idle cost. It is a nice idea, but not entirely achievable in practice. This post walks through the minimum possible Databricks footprint.

Read more

Initial commit

An introductory post outlining how Buoyant Data can help organizations save on their Databricks and AWS costs, along with our preferences for the most cost-effective data platform architecture.

Read more