Run DuckDB serverlessly to query remote and local data, with access to all currently available extensions.
You can leverage DuckDB for your data applications on AWS in multiple ways. If you'd like to expose an API that provides dynamic query capabilities for data in S3 or other remote locations, the serverless-duckdb project could come in handy.
Repartitioning data in S3 data lakes is another common use case that DuckDB supports (even better than AWS Athena), which is why serverless-parquet-repartitioner was created.
Or just build something completely custom by leveraging the pre-built Lambda layers.
DuckDB doesn't currently build its extensions for older Linux variants such as Amazon Linux 2, which AWS Lambda uses, so trying to install or load the official builds results in GLIBC incompatibilities. The extensions therefore had to be built separately. Luckily, DuckDB offers a way to specify another extension repository URL as the source:
SET custom_extension_repository = 'http://extensions.quacking.cloud';
Once you have set the custom extension repository, you can dynamically install and load the custom DuckDB extensions for different use cases. For an overview, have a look at the table of usable extensions below.
Extension Name | Description | Install | Load | Repository |
---|---|---|---|---|
arrow | Use Apache Arrow functions within DuckDB | INSTALL arrow; | LOAD arrow; | Link |
aws | Use AWS credentials from the credential chain | INSTALL aws; | LOAD aws; | Link |
iceberg | Use Apache Iceberg tables with DuckDB | INSTALL iceberg; | LOAD iceberg; | Link |
postgres_scanner | Access Postgres databases from DuckDB | INSTALL postgres_scanner; | LOAD postgres_scanner; | Link |
spatial | Use spatial features from GDAL and GEOS | INSTALL spatial; | LOAD spatial; | Link |
sqlite_scanner | Access SQLite databases from DuckDB | INSTALL sqlite_scanner; | LOAD sqlite_scanner; | Link |
substrait | Use and generate Substrait query plans | INSTALL substrait; | LOAD substrait; | Link |
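
To show how the steps fit together, here is a minimal sketch of a full session: set the custom repository, install and load one extension from the table, then run a quick check. The `home_directory` line is an assumption that is not stated on this page: it presumes that on AWS Lambda only `/tmp` is writable, so DuckDB needs to be pointed there before it can store downloaded extensions. The choice of `spatial` and the test query are illustrative only.

```sql
-- Assumption: on AWS Lambda only /tmp is writable, so DuckDB's home
-- directory (where installed extensions are stored) is pointed there.
SET home_directory = '/tmp';

-- Use the custom repository from this page as the extension source.
SET custom_extension_repository = 'http://extensions.quacking.cloud';

-- Install and load one of the extensions listed in the table.
INSTALL spatial;
LOAD spatial;

-- Check that DuckDB reports the extension as installed and loaded.
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'spatial';

-- Smoke test: build a point and render it as WKT text.
SELECT ST_AsText(ST_Point(1, 2)) AS wkt;
```

The same pattern should apply to the other extensions: swap `spatial` for the extension name and replace the final query with one that uses it.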