DuckDB extensions for AWS Lambda

Run DuckDB serverlessly to query remote and local data, with access to all currently available extensions.

Getting started

1. Deploy DuckDB on AWS Lambda

You can use DuckDB for your data applications on AWS in multiple ways. If you'd like to expose an API that provides dynamic query capabilities for data in S3 or other remote locations, the serverless-duckdb project could come in handy.

Repartitioning data in S3 data lakes is another common use case that DuckDB supports (even better than AWS Athena). The serverless-parquet-repartitioner project was created for exactly this purpose.

Or just build something completely custom by leveraging the pre-built Lambda layers.
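A minimal sketch of such a custom function follows. It assumes a Python Lambda runtime with a DuckDB layer attached so that `import duckdb` works. The `query` event field, the response shape, and the names `make_handler` and `_duckdb_connect` are hypothetical choices for illustration. The connection is opened once per execution environment and reused across warm invocations.

```python
import json


def make_handler(connect):
    """Build a Lambda handler around a connection factory.

    `connect` is any zero-argument callable returning a DuckDB-style
    connection (an object whose .execute(sql) result has .fetchall()).
    """
    state = {"con": None}

    def handler(event, context):
        query = event.get("query")  # hypothetical event field
        if not query:
            return {"statusCode": 400, "body": json.dumps({"error": "missing 'query'"})}
        # Open the connection on the first (cold) invocation and keep it
        # for later (warm) invocations of the same execution environment.
        if state["con"] is None:
            state["con"] = connect()
        rows = state["con"].execute(query).fetchall()
        # default=str serializes values JSON cannot encode natively (dates, decimals).
        return {"statusCode": 200, "body": json.dumps(rows, default=str)}

    return handler


def _duckdb_connect():
    import duckdb  # assumed to be provided by the DuckDB Lambda layer
    return duckdb.connect(":memory:")


# Entry point to configure in Lambda, e.g. "app.handler" (hypothetical module name).
handler = make_handler(_duckdb_connect)
```

Taking the connection factory as a parameter keeps the handler testable without DuckDB installed and makes it easy to add setup steps (such as extension configuration) in one place.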

2. Add custom extension source

DuckDB doesn't currently build its extensions for older Linux variants such as Amazon Linux 2, which AWS Lambda uses, so trying to install or load the official builds there fails with GLIBC incompatibilities. The extensions therefore had to be built separately. Luckily, DuckDB lets you specify another extension repository URL as the source:

SET custom_extension_repository = 'http://extensions.quacking.cloud';
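From Python, the same setting can be applied to a connection before running any INSTALL. This is a sketch assuming the `duckdb` Python package is available (for example via a Lambda layer). The `home_directory` line reflects an assumption: DuckDB needs a writable location for downloaded extensions, and in Lambda only `/tmp` is writable. `configure_extension_repository` and `sql_string_literal` are hypothetical helper names.

```python
DEFAULT_REPOSITORY = "http://extensions.quacking.cloud"


def sql_string_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def configure_extension_repository(con, repository=DEFAULT_REPOSITORY, home="/tmp"):
    """Point a DuckDB-style connection at a custom extension repository.

    Returns the statements it executed, which is handy for logging.
    """
    statements = [
        # Assumption: extensions are stored below the home directory,
        # and /tmp is the only writable path inside AWS Lambda.
        f"SET home_directory = {sql_string_literal(home)};",
        f"SET custom_extension_repository = {sql_string_literal(repository)};",
    ]
    for stmt in statements:
        con.execute(stmt)
    return statements
```

Usage would look like `con = duckdb.connect()` followed by `configure_extension_repository(con)`; after that, INSTALL statements on `con` fetch from the custom repository.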

3. Install & Load extensions

Once you have set the custom extension repository, you can dynamically install and load the custom DuckDB extensions for different use cases. For an overview, see the table of available extensions below.
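Installing and loading comes down to one INSTALL and one LOAD statement per extension. The hypothetical helper below runs both for a list of names and rejects anything not listed in the table below, which guards against typos and against splicing arbitrary text into SQL. It assumes the custom extension repository has already been set on the connection.

```python
# Extension names taken from the table of available extensions.
AVAILABLE_EXTENSIONS = frozenset({
    "arrow", "aws", "iceberg", "postgres_scanner",
    "spatial", "sqlite_scanner", "substrait",
})


def install_and_load(con, names):
    """INSTALL and LOAD each extension in `names` on a DuckDB-style connection."""
    unknown = [n for n in names if n not in AVAILABLE_EXTENSIONS]
    if unknown:
        raise ValueError(f"unsupported extension(s): {', '.join(unknown)}")
    executed = []
    for name in names:
        # INSTALL fetches the extension from the configured repository
        # (DuckDB skips the download if it is already installed);
        # LOAD makes it available on this connection.
        for stmt in (f"INSTALL {name};", f"LOAD {name};"):
            con.execute(stmt)
            executed.append(stmt)
    return executed
```

Validating all names before executing anything means a bad list fails fast without leaving the connection half-configured.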

Available DuckDB extensions

Extension | Description | Install | Load | Repository
arrow | Use Apache Arrow functions within DuckDB | INSTALL arrow; | LOAD arrow; | Link
aws | Use AWS credentials from the credential chain | INSTALL aws; | LOAD aws; | Link
iceberg | Use Apache Iceberg tables with DuckDB | INSTALL iceberg; | LOAD iceberg; | Link
postgres_scanner | Access Postgres databases from DuckDB | INSTALL postgres_scanner; | LOAD postgres_scanner; | Link
spatial | Use spatial features from GDAL and GEOS | INSTALL spatial; | LOAD spatial; | Link
sqlite_scanner | Access SQLite databases from DuckDB | INSTALL sqlite_scanner; | LOAD sqlite_scanner; | Link
substrait | Use and generate Substrait query plans | INSTALL substrait; | LOAD substrait; | Link
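As an example of using one of these extensions, `sqlite_scanner` provides a `sqlite_scan(path, table)` table function. The sketch below builds and runs such a query on a DuckDB-style connection that already has the custom repository configured. The database path and table name are hypothetical, the helper names are illustrative only, and in Lambda a SQLite file would typically need to be copied to `/tmp` first (an assumption about your setup).

```python
def _quote(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sqlite_table_query(db_path: str, table: str, limit: int = 10) -> str:
    """Build a query that reads `table` from a SQLite file via sqlite_scanner."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        f"SELECT * FROM sqlite_scan({_quote(db_path)}, "
        f"{_quote(table)}) LIMIT {int(limit)};"
    )


def read_sqlite_table(con, db_path, table, limit=10):
    """Install/load sqlite_scanner on `con`, then return up to `limit` rows."""
    con.execute("INSTALL sqlite_scanner;")
    con.execute("LOAD sqlite_scanner;")
    return con.execute(sqlite_table_query(db_path, table, limit)).fetchall()
```

For instance, `read_sqlite_table(con, "/tmp/app.db", "users")` (hypothetical file and table) would return the first ten rows of `users` as Python tuples.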