Skip to content
Transformations

How do I run a DuckDB transformation?

Create, configure, and run a DuckDB SQL transformation in Keboola from start to finish — create the configuration, map input, write the SQL, map output, run it, and confirm the result landed in Storage.

You have a table in Keboola Storage and you want to transform it with DuckDB SQL and write the result back to Storage. This page takes you from nothing to a successful run using a small worked example. For all settings and syntax rules, see the reference; for when to choose DuckDB, see the explanation.

Time: ~10 minutes · You will need: a Keboola project where you can create configurations, and one table in Storage to read from.

Get a table into Storage to use as the input. If you do not have one handy, upload the sample CSV file as a new table — the example SQL below expects a sample table with order_date and order_amount columns.

  1. Open Components → Transformations and click New Transformation.
  2. Select DuckDB Transformation.
  3. Name it, optionally add a description and folder, and click Create Transformation.
  1. In Input Mapping, add your Storage table.
  2. Set its Destination (staging table name) to sample.
  3. Save the mapping.

In the code editor, paste:

CREATE TABLE "output" AS
SELECT "order_date", SUM("order_amount") AS "sum_orders_amount"
FROM "sample"
GROUP BY "order_date";

End every statement with a semicolon (;). Quote identifiers that need exact case ("sample"). You can split longer scripts into blocks, which DuckDB runs with automatic dependency analysis (see block-based orchestration).

If SUM() fails with a type error, your input is loading as VARCHAR. Either cast explicitly, or enable Infer input table data types — see Step: typed inputs.

  1. In Output Mapping, set Source to output (the table the script creates).
  2. Set Destination to a new Storage table, for example out.c-main.orders.
  3. Save the mapping.
  1. Click Run on the transformation.
  2. Wait for the job to finish with a success status.
  3. Open Storage, find your destination table, and confirm it has one row per order_date with the summed amount.

By default, input columns load as VARCHAR, so numeric and date functions need explicit casts. To use real types directly, enable Infer input table data types in the configuration settings — DuckDB then detects types like INTEGER, FLOAT, and DATE. See Infer input table data types.

If the job is slow or runs out of memory, raise the Backend size (XSmall → Small → Medium → Large). The sizes and their memory are listed in the reference. For datasets over 10 GB, also see memory management.

You can validate without a full run using sync actions — for example Syntax check to catch SQL errors, or Expected input tables to confirm the inputs your script references.

SymptomLikely causeFix
Syntax error between statementsMissing semicolonEnd every statement with ; (reference).
SUM()/aggregation fails on a columnInput loaded as VARCHARCast explicitly, or enable Infer input table data types.
table not found for a mixed-case nameUnquoted name folded to lowercaseQuote the identifier ("MyTable"); see case sensitivity.
Run succeeds but nothing in StorageMissing/incorrect output mappingAdd an output mapping whose Source matches the table the script created (output).
Ask Kai

Ask anything about Keboola — I'll search the docs and cite the pages I use.