Python transformation reference

Lookup reference for Python transformations in Keboola: runtime environment and limits, file locations, script requirements, installing packages, reading and writing CSV, and backend sizes.

Reference material for Python transformations. To create and run one, see the how-to.

Environment

The Python script runs in an isolated environment. The Python version is updated regularly, a few weeks after the official release; updates are announced on the status page.

Limits

Resource	Limit
Memory	8 GB
Max running time	6 hours
CPU	Equivalent of two 2.3 GHz processors

File locations

The script is compiled to /data/script.py.
Mapped input/output tables: use relative paths (in/tables/file.csv, out/tables/file.csv) or absolute paths (/data/in/tables/file.csv, /data/out/tables/file.csv).
Downloaded files: in/files/ (or /data/in/files/).
Temporary files: /tmp/. Do not use /data/ for files you don’t want exchanged with Keboola.
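
The locations above can be wrapped in a small helper so that a script builds its paths in one place. This is a minimal sketch, not part of any Keboola API: the function name table_paths and its data_dir parameter are hypothetical, and source.csv and result.csv are placeholder file names.

```python
from pathlib import PurePosixPath
import tempfile

def table_paths(in_name, out_name, data_dir="/data"):
    """Return (input, output) paths for mapped tables under data_dir (hypothetical helper)."""
    base = PurePosixPath(data_dir)
    return base / "in" / "tables" / in_name, base / "out" / "tables" / out_name

in_path, out_path = table_paths("source.csv", "result.csv")
print(in_path)   # /data/in/tables/source.csv
print(out_path)  # /data/out/tables/result.csv

# Scratch data goes to the system temp directory (normally /tmp on Linux),
# never under /data/, so it is not exchanged with Keboola.
with tempfile.NamedTemporaryFile(mode="wt", suffix=".tmp") as scratch:
    scratch.write("intermediate data")
```

Passing data_dir explicitly makes it easier to run the same script against a local copy of the data folder.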

See the full Common Interface specification.

Script requirements

Python is sensitive to indentation, so do not mix tabs and spaces. Files are assumed to be UTF-8 (the # coding=utf-8 header is not needed). No main function is required:

print("Hello Keboola")

If you define a main function, do not wrap the call in if __name__ == '__main__': because that block will not run. Call the function directly:

def main():
    print("Hello Keboola")

main()

You can organize the script into blocks.

Packages

List extra packages in the UI; they are installed with pip from PyPI. Some packages have external dependencies that may not be available; contact support if you hit problems. Installing a package does not import it, so you still need an import statement in your script.

The latest versions are installed at release time. To pin a version, force-reinstall it from your code:

import subprocess
import sys

# Force-reinstall the pinned version; check_call raises CalledProcessError if pip fails
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--disable-pip-version-check', '--no-cache-dir', '--force-reinstall', 'pandas==0.20.0'])
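
Because a pinned install happens at run time, it can help to confirm which version is actually present before relying on it. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is hypothetical, and pandas is only an example package.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string, or None when the package is missing
print(installed_version("pandas"))
```

This reads the installed package metadata only. A module that was already imported before a reinstall stays loaded in its old version, so it is safest to run the pinning step before the first import.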

Some packages are preinstalled and don’t need to be listed.

Reading and writing CSV

Input tables arrive as CSV in in/tables/; write outputs to out/tables/. Read with the standard csv module; specifying formatting options explicitly is recommended. Process line-by-line for memory efficiency.

Dictionaries (named columns):

import csv

csvlt = '\n'
csvdel = ','
csvquo = '"'
with open('in/tables/source.csv', mode='rt', encoding='utf-8') as in_file, open('out/tables/result.csv', mode='wt', encoding='utf-8') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=['col1', 'col2'], lineterminator=csvlt, delimiter=csvdel, quotechar=csvquo)
    writer.writeheader()

    lazy_lines = (line.replace('\0', '') for line in in_file)
    reader = csv.DictReader(lazy_lines, lineterminator=csvlt, delimiter=csvdel, quotechar=csvquo)
    for row in reader:
        writer.writerow({'col1': row['first'] + 'ping', 'col2': int(row['second']) * 42})

The generator lazy_lines = (line.replace('\0', '') for line in in_file) strips null characters. Always use encoding='utf-8'.
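
To see what the null-character filter does, it can be tried on in-memory data. This is a small illustrative sketch: the sample row is made up, and io.StringIO stands in for the real input file. On some Python versions the csv module rejects lines that contain a NUL character, which is one reason to filter before parsing.

```python
import csv
import io

# One header line and one data row with a NUL character inside the first value
raw = io.StringIO('first,second\nfoo\0bar,1\n')

# The generator is lazy: each line is cleaned only when the reader asks for it
lazy_lines = (line.replace('\0', '') for line in raw)
rows = list(csv.DictReader(lazy_lines, delimiter=',', quotechar='"'))

print(rows)  # [{'first': 'foobar', 'second': '1'}]
```

Because the generator never holds more than one line at a time, the filter adds no meaningful memory overhead on large tables.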

Lists (numbered columns):

import csv

with open('/data/in/tables/source.csv', mode='rt', encoding='utf-8') as in_file, open('/data/out/tables/result.csv', mode='wt', encoding='utf-8') as out_file:
    writer = csv.writer(out_file, lineterminator='\n', delimiter=',', quotechar='"')
    lazy_lines = (line.replace('\0', '') for line in in_file)
    reader = csv.reader(lazy_lines, lineterminator='\n', delimiter=',', quotechar='"')
    for row in reader:
        writer.writerow([row[0] + 'ping', int(row[1]) * 42])

Preinstalled kbc dialect (simplifies the format options):

import csv

with open('/data/in/tables/source.csv', mode='rt', encoding='utf-8') as in_file, open('/data/out/tables/result.csv', mode='wt', encoding='utf-8') as out_file:
    lazy_lines = (line.replace('\0', '') for line in in_file)
    reader = csv.DictReader(lazy_lines, dialect='kbc')
    writer = csv.DictWriter(out_file, dialect='kbc', fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({"first": row['first'] + 'ping', "second": int(row['second']) * 42})

To register the kbc dialect locally: csv.register_dialect('kbc', lineterminator='\n', delimiter=',', quotechar='"').
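
For local testing, the registration and a round trip can be sketched as follows. The sample data is made up and io.StringIO replaces the real files; the guard with csv.list_dialects() is an assumption for convenience, so that the same script runs whether or not the dialect is already registered.

```python
import csv
import io

# Register the dialect only if the environment has not already done so
if 'kbc' not in csv.list_dialects():
    csv.register_dialect('kbc', lineterminator='\n', delimiter=',', quotechar='"')

source = io.StringIO('first,second\na,1\nb,2\n')
result = io.StringIO()

reader = csv.DictReader(source, dialect='kbc')
writer = csv.DictWriter(result, dialect='kbc', fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    # Same transformation as in the examples above
    writer.writerow({'first': row['first'] + 'ping', 'second': int(row['second']) * 42})

print(result.getvalue())
```

With the sample input, the output contains the header line followed by aping,42 and bping,84.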

Backend sizes (dynamic backends)

A larger backend allocates more resources for long or heavy transformations. Available sizes:

Size
XSmall
Small (default)
Medium
Large

Scaling up impacts time-credit consumption. Dynamic backends are not available on the Free Plan (Pay As You Go).
