We strongly believe that open source data technology is the right choice for most organizations. An open source solution allows you to take something off the shelf and tailor it to the unique needs of your data platform. We can recommend and, in some cases, support your organization's adoption of open source tooling for your data and ML platform.
We Build
- Delta Lake Lambdas for managing data platform workflows:
  - oxbow: Utility for converting a set of Apache Parquet files into a Delta Lake table.
  - s3-restructure: Lambda for restructuring objects created in an S3 bucket.
  - delta-optimize: Lambda for periodically running OPTIMIZE on Delta tables.
- spark-connect-rust: Thin Rust bindings for Spark Connect.
We Customize
- delta-rs, Rust and Python bindings to Delta Lake.
- kafka-delta-ingest, fast and efficient ingestion from Apache Kafka into Delta Lake.
We Support
- Apache Airflow, a platform created by the community to programmatically author, schedule and monitor workflows.
- Apache Arrow (Rust), the Rust implementation of the Arrow in-memory data format.
- Apache Kafka, a distributed event streaming platform.
- DataFusion, an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
- terraform-provider-databricks, a provider for automating Databricks infrastructure with Terraform.
We Like
If you need help with open source data technology in your organization, let us know!