Skip to main content

xls2ymljob

Synopsis

starlake xls2ymljob [options]

Description

Generate Starlake YAML transform/job configuration files from Excel spreadsheets. This is a convenience alias for xls2yml --job=true.

Business analysts define transform tasks in Excel (target tables, write strategies, partitioning, etc.), and data engineers generate the corresponding Starlake YAML task files automatically. The SQL logic is written separately — this command only generates the YAML metadata.

See also: xls2yml (domain/schema variant), yml2xls (reverse operation).

Parameters

ParameterCardinalityDescription
--files <value>RequiredList of Excel files (.xlsx or .xls) describing job/task definitions. Can also be a directory (recursively finds all .xlsx files).
--iamPolicyTagsFile <value>OptionalPath to an Excel file for generating IAM PolicyTags YAML.
--outputDir <value>OptionalPath for saving the resulting YAML files. Defaults to the Starlake transform path.
--policyFile <value>OptionalExcel file for centralizing ACL & RLS definitions.

Excel Workbook Structure

The Excel file must contain a schemas sheet defining the transform tasks.

Sheet: schemas

Each row defines one transform task. Maps to a {job_name}.sl.yml file.

ColumnRequiredDescription
_jobYesJob/task name
_domainYesTarget domain (output database schema/dataset)
_nameYesTarget table name
_writeNoWrite strategy: OVERWRITE, APPEND, OVERWRITE_BY_PARTITION, UPSERT_BY_KEY, UPSERT_BY_KEY_AND_TIMESTAMP, SCD2, DELETE_THEN_INSERT
_partitionNoPartition column
_descriptionNoJob/task description
_databaseNoTarget database name
_clusteringNoComma-separated clustering columns
_tagsNoComma-separated tags
_presqlNoPre-SQL statements (separate multiple with ###)
_postsqlNoPost-SQL statements (separate multiple with ###)
_sinkNoSink type: BQ, FS, JDBC, ES, KAFKA
_sinkConnectionRefNoConnection name for the sink
_connectionRefNoRun engine connection name/type
_formatNoOutput format for FS sink (e.g., csv, parquet)
_extensionNoFile extension for FS sink
_coalesceNoCoalesce output into a single file (true/false, FS sink only)
_optionsNoSink options as key=value pairs (e.g., opt1=val1,opt2=val2)
_dagRefNoDAG reference for orchestration
_freshnessNoFreshness rules as key=value pairs
_policyNoAccess policy names (comma-separated)

Attribute Sheets (optional)

For each task, you can optionally create a sheet matching the _name value to define output attribute descriptions.

ColumnRequiredDescription
_nameYesAttribute/column name
_descriptionNoAttribute description

Example

Given an Excel file analytics_jobs.xlsx with a schemas sheet:

_job_domain_name_write_dagRef
daily_revenueanalyticsrevenue_by_productOVERWRITEdaily_dag
customer_ltvanalyticscustomer_lifetime_valueSCD2weekly_dag
starlake xls2ymljob --files analytics_jobs.xlsx

This generates:

metadata/transform/analytics/
├── daily_revenue.sl.yml # Task definition for revenue_by_product
└── customer_ltv.sl.yml # Task definition for customer_lifetime_value

Each generated .sl.yml contains the task metadata (write strategy, partitioning, etc.). The actual SQL query must be provided separately as a .sql file alongside the YAML.