Skip to main content

MotherDuck

Install dlt with MotherDuckโ€‹

To install the dlt library with MotherDuck dependencies:

pip install "dlt[motherduck]"
tip

If you see a lot of retries in your logs with various timeouts, decrease the number of load workers to 3-5 depending on the quality of your internet connection. Add the following to your config.toml:

[load]
workers=3

or export the LOAD__WORKERS=3 env variable. See more in performance

Setup Guideโ€‹

1. Initialize a project with a pipeline that loads to MotherDuck by running

dlt init chess motherduck

2. Install the necessary dependencies for MotherDuck by running

pip install -r requirements.txt

This will install dlt with the motherduck extra which contains duckdb and pyarrow dependencies.

3. Add your MotherDuck token to .dlt/secrets.toml

[destination.motherduck.credentials]
database = "dlt_data_3"
password = "<your token here>"

Paste your service token into the password field. The database field is optional, but we recommend setting it. MotherDuck will create this database (in this case dlt_data_3) for you.

Alternatively, you can use the connection string syntax.

[destination]
motherduck.credentials="md:///dlt_data_3?token=<my service token>"
tip

Motherduck now supports configurable access tokens. Please refer to the documentation

4. Run the pipeline

python3 chess_pipeline.py

Motherduck connection identifierโ€‹

We enable Motherduck to identify that the connection is created by dlt. Motherduck will use this identifier to better understand the usage patterns associated with dlt integration. The connection identifier is dltHub_dlt/DLT_VERSION(OS_NAME).

Write dispositionโ€‹

All write dispositions are supported.

Data loadingโ€‹

By default, Parquet files and the COPY command are used to move files to the remote duckdb database. All write dispositions are supported.

The INSERT format is also supported and will execute large INSERT queries directly into the remote database. This method is significantly slower and may exceed the maximum query size, so it is not advised.

dbt supportโ€‹

This destination integrates with dbt via dbt-duckdb, which is a community-supported package. dbt version >= 1.7 is required

Multi-statement transaction supportโ€‹

Motherduck supports multi-statement transactions. This change happened with duckdb 0.10.2.

Syncing of dlt stateโ€‹

This destination fully supports dlt state sync.

Troubleshootingโ€‹

My database is attached in read only modeโ€‹

ie. Error: Invalid Input Error: Cannot execute statement of type "CREATE" on database "dlt_data" which is attached in read-only mode! We encountered this problem for databases created with duckdb 0.9.x and then migrated to 0.10.x. After switch to 1.0.x on Motherduck, all our databases had permission "read-only" visible in UI. We could not figure out how to change it so we dropped and recreated our databases.

I see some exception with home_dir missing when opening md: connection.โ€‹

Some internal component (HTTPS) requires the HOME env variable to be present. Export such a variable to the command line. Here is what we do in our tests:

os.environ["HOME"] = "/tmp"

before opening the connection.

Additional Setup guidesโ€‹

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub โ€“ it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.