Stars
Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake
A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset
POC on Flink MySQL CDC and Iceberg
Declarative schema migrations with schema-as-code workflows
A modern web-based user interface for dbt-core projects
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼
The world’s fastest framework for building websites.
A fast, simple & powerful blog framework, powered by Node.js.
🌐 Jekyll is a blog-aware static site generator in Ruby
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Learn ML engineering for free in 4 months! Register here 👇🏼
Actively curated list of awesome BI tools. PRs welcome!
JavaScript API for Chrome and Firefox
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
Create committing rules for projects 🚀 auto bump versions ⬆️ and auto changelog generation 📂
AliSQL is a MySQL branch that originated at Alibaba Group. See the Release Notes at the bottom for documentation.
Distributed stream processing engine in Rust
A high-performance observability data pipeline.
A curated list of Rust code and resources.
A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data replication between MySQL, PostgreSQL, Redis, MongoDB, Kafka and ClickHouse, ideal for disaster recovery (DR) and migration…
Event streaming platform for agents, apps, and analytics. Continuously ingest, transform, and serve event data in real time, at scale.
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and…
An MCP (Model Context Protocol) server for interacting with dbt.
The observability platform for Iceberg lakehouses.
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.