extract-data

Synopsis

starlake extract-data [options]

Description

Extract data from a JDBC database into CSV or Parquet files. Supports incremental extraction, parallel partitioning, and selective table filtering for efficient large-scale data exports. See Extract Tutorial.
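As a minimal sketch, an invocation using only the two required parameters (`--config` and `--outputDir`) might look like the following. The config name `my_extract` and the output path are hypothetical placeholders, not values defined by Starlake. The script only assembles and prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical values: replace with your own extraction config and output directory.
CONFIG="my_extract"
OUTPUT_DIR="/tmp/starlake-extract"

# Assemble the command from the two required parameters.
CMD="starlake extract-data --config $CONFIG --outputDir $OUTPUT_DIR"

# Print the command for review; once it looks right, run it with: eval "$CMD"
echo "$CMD"
```

Printing the command first is a deliberate choice here: it keeps the sketch safe to run on a machine where no database connection is configured yet.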

Parameters

| Parameter | Cardinality | Description |
| --- | --- | --- |
| --config <value> | Required | Database tables and connection info. |
| --limit <value> | Optional | Limit the number of records. |
| --numPartitions <value> | Optional | Parallelism level for partitioned tables. |
| --parallelism <value> | Optional | Parallelism level of the extraction process. Defaults to the number of available cores (listed here as 16). |
| --ignoreExtractionFailure <value> | Optional | Do not fail the extraction job when any extraction fails. |
| --clean <value> | Optional | Clean all files of a table only when it is extracted. |
| --outputDir <value> | Required | Where to output CSV files. |
| --incremental <value> | Optional | Export only new data since the last extraction. |
| --ifExtractedBefore <value> | Optional | DateTime to compare with the start time of the last extraction. If the last extraction started before this DateTime, the extraction runs; otherwise it is skipped. |
| --includeSchemas schema1,schema2 | Optional | Schemas (domains) to include during extraction. |
| --excludeSchemas schema1,schema2... | Optional | Schemas (domains) to exclude during extraction. Ignored if --includeSchemas is defined. |
| --includeTables table1,table2,table3... | Optional | Tables to include during extraction. |
| --excludeTables table1,table2,table3... | Optional | Tables to exclude during extraction. Ignored if --includeTables is defined. |
| -- <value> | Optional | |
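
The optional parameters above can be combined in a single call. The sketch below restricts extraction to two schemas, skips two tables, and caps parallelism and record count. All names and values (`my_extract`, `sales,hr`, `audit_log,tmp_import`, `4`, `1000`) are hypothetical, and the assumption that these particular options can be used together is not confirmed by this page. As before, the script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical values throughout: adjust to your own project.
CONFIG="my_extract"
OUTPUT_DIR="/tmp/starlake-extract"
SCHEMAS="sales,hr"                   # comma-separated list, no spaces
SKIP_TABLES="audit_log,tmp_import"   # comma-separated list, no spaces

# Start with the required parameters, then append optional ones one per line.
CMD="starlake extract-data --config $CONFIG --outputDir $OUTPUT_DIR"
CMD="$CMD --includeSchemas $SCHEMAS"
CMD="$CMD --excludeTables $SKIP_TABLES"
CMD="$CMD --parallelism 4"
CMD="$CMD --limit 1000"

# Print the assembled command for review before running it.
echo "$CMD"
```

Note that, per the table, an exclude list is ignored when the matching include list is set, so the sketch pairs an include option for schemas with an exclude option for tables rather than setting both for the same level.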