This board is meant to be a place to park any interesting tools I come across, as well as any notes I take on each tool as I check it out.
SQL Mesh
What is it? Per their website:
SQLMesh is an open source data transformation framework that brings the best practices of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python. It is created and maintained by Tobiko Data, a company founded by data leaders from Airbnb, Apple, and Netflix.
Why am I interested?
It seems like an interesting alternative to a tool like Synapse or Fabric.
Satyrn
Satyrn is a Mac-based alternative to Jupyter. Some nice features I've found so far are:
- It is a really clean interface with few distractions
- They use most of the same keyboard shortcuts as Jupyter
- They list their keyboard shortcuts right away in an intro notebook
- The auto-complete is really fast, and the app itself feels snappy too.
Some things that have been a bit of a challenge:
- I ended up setting up a venv and then pointing Satyrn at the venv's bin path to get a kernel with some packages (I wouldn't mind having a default environment with the option to pip install)
- Not entirely sure how to use black (I set the path to it, but didn't spot a keyboard shortcut).
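The venv workaround above can be sketched roughly as follows (the directory name and package list are just examples):

```shell
# Create an isolated environment to serve as the notebook kernel
python3 -m venv .venv

# Install whatever packages the notebooks need, e.g.:
#   .venv/bin/pip install ipykernel polars

# This interpreter path is what to point the new kernel at
echo "$PWD/.venv/bin/python"
```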
Things to Try
- Get an API key for ChatGPT and try the built-in help
Tauri
An Electron alternative with a Rust backend.
Marimo
Reactive notebook option.
Supabase
A Postgres database service with auth.
Kolo
Invert a trace and get a working integration test in fifteen minutes.
Difftastic
Difftastic is a CLI diff tool that compares files based on their syntax, not line-by-line. Difftastic produces accurate diffs that are easier for humans to read.
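One quick way to try it out (a sketch, assuming the binary is installed under its documented name `difft`) is to register it as git's external diff tool:

```shell
# Tell git to shell out to Difftastic whenever `git diff` runs
git config --global diff.external difft

# Confirm the setting
git config --get diff.external
```

Undoing it later is `git config --global --unset diff.external`.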
Quarto
An open-source scientific and technical publishing system
FastHTML
Modern web applications in pure Python
LanceDB
LanceDB is an open-source vector database for AI that's designed to store, manage, query and retrieve embeddings on large-scale multi-modal data. The core of LanceDB is written in Rust 🦀 and is built on top of Lance, an open-source columnar data format designed for performant ML workloads and fast random access.
Datasette
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Fire Ducks
Compiler Accelerated DataFrame Library for Python with fully-compatible pandas API
Pyper
Concurrent Python made simple
SparkDQ
Most data quality frameworks weren't designed with PySpark in mind. They aren't Spark-native and often lack proper support for declarative pipelines. Instead of integrating seamlessly, they require you to build custom wrappers around them just to fit into production workflows. This adds complexity and makes your pipelines harder to maintain. On top of that, many frameworks only validate data after processing, so you can't react dynamically or fail early when data issues occur.
Patito
Patito offers a simple way to declare pydantic data models which double as schemas for your polars data frames. These schemas can be used for:
- Simple and performant data frame validation.
- Easy generation of valid mock data frames for tests.
- Retrieving and representing singular rows in an object-oriented manner.
- Providing a single source of truth for the core data models in your code base.
Dataframely
Dataframely is a Python package to validate the schema and content of polars data frames. Its purpose is to make data pipelines more robust by ensuring that data meet expectations and more readable by adding schema information to data frame type hints.
DuckLake
TL;DR: DuckLake simplifies lakehouses by using a standard SQL database for all metadata, instead of complex file-based systems, while still storing data in open formats like Parquet. This makes it more reliable, faster, and easier to manage.