Posts tagged "Datasets"
Storing only the changed rows with Ducklake
Tips for using ANTI JOIN and EXCEPT in DuckDB to write only changed data in Ducklake.
Tagging useful information in OpenStreetMap
OpenStreetMap contains plenty of useful data, which can be used freely by anyone. Here's a short introduction and a few useful examples to help you find family-, laptop-, or dog-friendly places. Hopefully, this motivates you to start contributing too! 🗺️
Data engineering to find domains pointing to certain CNAMEs
Using the Merklemap DNS database and DuckDB to reverse look up popular CNAME values. Parquet is a very powerful format for storing large quantities of data. We also learn the importance of ordering and compression.
How to add 3 million real companies to your empty Cloudflare D1 database?
These are the steps I took to add 3 million real companies to my empty database. CrunchBase has an amazing free dataset, which I converted to SQL using DuckDB and sqlite3 and uploaded to a Cloudflare D1 database with wrangler.