Posts tagged "Datasets"

Storing only the changed rows with Ducklake

Tips for using ANTI JOIN and EXCEPT in DuckDB to write only changed data in Ducklake.

Datasets, DuckDB
Tagging useful information in OpenStreetMap

OpenStreetMap contains plenty of useful data, which can be used freely by anyone. Here's a short introduction and a few useful examples to help you find family-, laptop-, or dog-friendly places. Hopefully, this motivates you to start contributing too! 🗺️

Datasets, Lifestyle, OSS
Data engineering to find domains pointing to certain CNAMEs

Using the Merklemap DNS database and DuckDB to reverse-look-up popular CNAME values. Parquet is a very powerful format for storing large quantities of data. We also learn the importance of ordering and compression.

DNS, Datasets, DuckDB
How to add 3 million real companies to your empty Cloudflare D1 database?

These are the steps I took to add 3 million real companies to my empty database. Crunchbase has an amazing free dataset, which I converted to SQL using DuckDB and sqlite3 and uploaded to a Cloudflare D1 database with wrangler.

Datasets, SaaS, Startups