Debezium SQL Server Configuration

The Debezium SQL Server connector has many configuration properties that you can use to achieve the right connector behavior for your application. Many properties have default values. The following configuration properties are required unless a default value is provided.

| Property | Default | Description |
| --- | --- | --- |
| name | | Unique name for the connector. Attempting to register again with the same name will fail. (This property is required by all Kafka Connect connectors.) |
| connector.class | | The name of the Java class for the connector. Always use a value of io.debezium.connector.sqlserver.SqlServerConnector for the SQL Server connector. |
| tasks.max | 1 | The maximum number of tasks that should be created for this connector. The SQL Server connector always uses a single task and therefore does not use this value, so the default is always acceptable. |
| database.hostname | | IP address or hostname of the SQL Server database server. |
| database.port | 1433 | Integer port number of the SQL Server database server. |
| database.user | | Username to use when connecting to the SQL Server database server. |
| database.password | | Password to use when connecting to the SQL Server database server. |
| database.dbname | | The name of the SQL Server database from which to stream the changes. |
| database.server.name | | Logical name that identifies and provides a namespace for the SQL Server database server that you want Debezium to capture. The logical name should be unique across all other connectors, since it is used as a prefix for all Kafka topic names emanating from this connector. Only alphanumeric characters, hyphens, and underscores may be used. |
| database.history.kafka.topic | | The full name of the Kafka topic where the connector will store the database schema history. |
| database.history.kafka.bootstrap.servers | | A list of host and port pairs that the connector will use for establishing an initial connection to the Kafka cluster. This connection is used for retrieving database schema history previously stored by the connector, and for writing each DDL statement read from the source database. This should point to the same Kafka cluster used by the Kafka Connect process. |
| table.include.list | | An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables that you want Debezium to capture; any table that is not included in table.include.list is excluded from capture. Each identifier is of the form schemaName.tableName. By default, the connector captures all non-system tables for the designated schemas. Must not be used with table.exclude.list. |
| table.exclude.list | | An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; Debezium captures all tables that are not included in table.exclude.list. Each identifier is of the form schemaName.tableName. Must not be used with table.include.list. |
| column.include.list | empty string | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be included in change event message values. Fully-qualified names for columns are of the form schemaName.tableName.columnName. Note that primary key columns are always included in the event's key, even if not included in the value. Do not also set the column.exclude.list property. |
| column.exclude.list | empty string | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form schemaName.tableName.columnName. Note that primary key columns are always included in the event's key, even if excluded from the value. Do not also set the column.include.list property. |
| column.mask.hash._hashAlgorithm_.with.salt._salt_ | n/a | An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be pseudonymized in the change event message values, replaced by a field value consisting of the hashed value computed with the algorithm hashAlgorithm and salt salt. Depending on the hash function used, referential integrity is maintained while data is pseudonymized. Supported hash functions are described in the MessageDigest section of the Java Cryptography Architecture Standard Algorithm Name Documentation. The hash is automatically shortened to the length of the column. Multiple properties with different hash algorithms and salts can be used in a single configuration. Fully-qualified names for columns are of the form schemaName.tableName.columnName. Example: column.mask.hash.SHA-256.with.salt.CzQMA0cB5K = dbo.orders.customerName, dbo.shipment.customerName, where CzQMA0cB5K is a randomly selected salt. Note: depending on the hashAlgorithm used, the salt selected, and the actual data set, the resulting masked data set may not be completely anonymized. |
| time.precision.mode | adaptive | Time, date, and timestamp values can be represented with different kinds of precision: adaptive (the default) captures time and timestamp values exactly as in the database, using millisecond, microsecond, or nanosecond precision based on the database column's type; connect always represents time and timestamp values using Kafka Connect's built-in representations for Time, Date, and Timestamp, which use millisecond precision regardless of the database columns' precision. See temporal values. |
| decimal.handling.mode | precise | Specifies how the connector should handle values for DECIMAL and NUMERIC columns: precise (the default) represents them precisely using java.math.BigDecimal values, represented in change events in binary form; double represents them using double values, which may result in a loss of precision but is easier to use; string encodes values as formatted strings, which is easy to consume but loses semantic information about the real type. |
| include.schema.changes | true | Boolean value that specifies whether the connector should publish changes in the database schema to a Kafka topic with the same name as the database server ID. Each schema change is recorded with a key that contains the database name and a value that is a JSON structure describing the schema update. This is independent of how the connector internally records database history. |
| tombstones.on.delete | true | Controls whether a delete event is followed by a tombstone event: true means a delete operation is represented by a delete event and a subsequent tombstone event; false means only a delete event is emitted. After a source record is deleted, emitting a tombstone event (the default behavior) allows Kafka to completely delete all events that pertain to the key of the deleted row in case log compaction is enabled for the topic. |
| column.truncate.to._length_.chars | n/a | An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be truncated in the change event message values if the field values are longer than the specified number of characters. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer. Fully-qualified names for columns are of the form schemaName.tableName.columnName. |
| column.mask.with._length_.chars | n/a | An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be replaced in the change event message values with a field value consisting of the specified number of asterisk (*) characters. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer or zero. Fully-qualified names for columns are of the form schemaName.tableName.columnName. |
| column.propagate.source.type | n/a | An optional comma-separated list of regular expressions that match the fully-qualified names of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change messages. The schema parameters __debezium.source.column.type, __debezium.source.column.length, and __debezium.source.column.scale are used to propagate the original type name and length (for variable-width types), respectively. Useful to properly size corresponding columns in sink databases. Fully-qualified names for columns are of the form schemaName.tableName.columnName. |
| datatype.propagate.source.type | n/a | An optional comma-separated list of regular expressions that match the database-specific data type names of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change messages. The schema parameters __debezium.source.column.type, __debezium.source.column.length, and __debezium.source.column.scale are used to propagate the original type name and length (for variable-width types), respectively. Useful to properly size corresponding columns in sink databases. Fully-qualified data type names are of the form schemaName.tableName.typeName. See SQL Server data types for the list of SQL Server-specific data type names. |
| message.key.columns | n/a | A semicolon-separated list of regular expressions that match fully-qualified tables and columns to map to a primary key. Each item (regular expression) must match the fully-qualified form <fully-qualified table>:<a comma-separated list of columns> representing the custom key. Fully-qualified tables could be defined as schemaName.tableName. |
| binary.handling.mode | bytes | Specifies how binary (binary, varbinary) columns should be represented in change events: bytes (the default) represents binary data as a byte array; base64 represents binary data as a base64-encoded String; hex represents binary data as a hex-encoded (base16) String. |
| database.include.list | | Capture changes from only the listed databases; by default, all databases are monitored. May be configured as regular expressions. |
| database.exclude.list | | Do not capture changes from the listed databases. Cannot be used together with database.include.list. |
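
To make the required properties concrete, here is a minimal sketch of a connector registration payload, in the form you would POST to the Kafka Connect REST API (the /connectors endpoint). The hostname, credentials, and the database, server, and table names below are placeholders for illustration, not values from this document:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver.example.com",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "<password>",
    "database.dbname": "inventory",
    "database.server.name": "server1",
    "table.include.list": "dbo.customers,dbo.orders",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

With database.server.name set to server1, change events for dbo.customers would be written to a topic named server1.dbo.customers, which is why the logical name must be unique across all connectors.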

The following advanced configuration properties have sensible defaults that work in most situations, so they rarely need to be specified in the connector's configuration.

| Property | Default | Description |
| --- | --- | --- |
| snapshot.mode | initial | A mode for taking an initial snapshot of the structure and optionally data of captured tables. Once the snapshot is complete, the connector continues reading change events from the database's redo logs. The following values are supported: initial takes a snapshot of the structure and data of captured tables, useful if topics should be populated with a complete representation of the data from the captured tables; initial_only takes a snapshot of structure and data like initial but does not transition into streaming changes once the snapshot has completed; schema_only takes a snapshot of the structure of captured tables only, useful if only changes happening from now onwards should be propagated to topics. |
| snapshot.include.collection.list | All tables specified in table.include.list | An optional, comma-separated list of regular expressions that match fully-qualified table names (<db-name>.<schema-name>.<name>) included in table.include.list for which you want to take the snapshot. |
| snapshot.isolation.mode | repeatable_read | Mode to control which transaction isolation level is used and how long the connector locks tables that are designated for capture. The following values are supported: read_uncommitted, read_committed, repeatable_read, snapshot, and exclusive (exclusive mode uses the repeatable read isolation level but takes an exclusive lock on all tables to be read). The snapshot, read_committed, and read_uncommitted modes do not prevent other transactions from updating table rows during the initial snapshot. The exclusive and repeatable_read modes do prevent concurrent updates. Mode choice also affects data consistency. Only the exclusive and snapshot modes guarantee full consistency, that is, the initial snapshot and streamed logs constitute a linear history. With the repeatable_read and read_committed modes it might happen that, for instance, a newly added record appears twice: once in the initial snapshot and once in the streaming phase. Nonetheless, that consistency level should suffice for data mirroring. For read_uncommitted there are no data consistency guarantees at all (some data might be lost or corrupted). |
| event.processing.failure.handling.mode | fail | Specifies how the connector should react to exceptions during processing of events: fail propagates the exception (indicating the offset of the problematic event), causing the connector to stop; warn causes the problematic event to be skipped and the offset of the problematic event to be logged; skip causes the problematic event to be skipped. |
| poll.interval.ms | 1000 | Positive integer value that specifies the number of milliseconds the connector should wait during each iteration for new change events to appear. Defaults to 1000 milliseconds, or 1 second. |
| max.queue.size | 8192 | Positive integer value that specifies the maximum size of the blocking queue into which change events read from the database log are placed before they are written to Kafka. This queue can provide backpressure to the CDC table reader when, for example, writes to Kafka are slower or if Kafka is not available. Events that appear in the queue are not included in the offsets periodically recorded by this connector. Defaults to 8192, and should always be larger than the maximum batch size specified in the max.batch.size property. |
| max.batch.size | 2048 | Positive integer value that specifies the maximum size of each batch of events that should be processed during each iteration of this connector. Defaults to 2048. |
| heartbeat.interval.ms | 0 | Controls how frequently heartbeat messages are sent. This property contains an interval in milliseconds that defines how frequently the connector sends messages to a heartbeat topic. The property can be used to confirm whether the connector is still receiving change events from the database. You should also leverage heartbeat messages in cases where only records in non-captured tables are changed for a longer period of time. In such a situation the connector would keep reading the log from the database but never emit any change messages into Kafka, which in turn means that no offset updates are committed to Kafka. This may result in more change events being re-sent after a connector restart. Set this parameter to 0 to not send heartbeat messages at all. Disabled by default. |
| heartbeat.topics.prefix | __debezium-heartbeat | Controls the naming of the topic to which heartbeat messages are sent. The topic is named according to the pattern <heartbeat.topics.prefix>.<server.name>. |
| snapshot.delay.ms | | An interval in milliseconds that the connector should wait before taking a snapshot after starting up. This can be used to avoid snapshot interruptions when starting multiple connectors in a cluster, which may cause re-balancing of connectors. |
| snapshot.fetch.size | 2000 | Specifies the maximum number of rows that should be read in one go from each table while taking a snapshot. The connector will read the table contents in multiple batches of this size. Defaults to 2000. |
| query.fetch.size | | Specifies the number of rows that will be fetched for each database round-trip of a given query. Defaults to the JDBC driver's default fetch size. |
| snapshot.lock.timeout.ms | 10000 | An integer value that specifies the maximum amount of time (in milliseconds) to wait to obtain table locks when performing a snapshot. If table locks cannot be acquired in this time interval, the snapshot will fail (also see snapshots). When set to 0, the connector will fail immediately when it cannot obtain the lock. The value -1 indicates infinite waiting. |
| snapshot.select.statement.overrides | | Controls which rows from tables are included in the snapshot. This property contains a comma-separated list of fully-qualified tables (SCHEMA_NAME.TABLE_NAME). SELECT statements for the individual tables are specified in further configuration properties, one for each table, identified by the id snapshot.select.statement.overrides.[SCHEMA_NAME].[TABLE_NAME]. The value of those properties is the SELECT statement to use when retrieving data from the specific table during snapshotting. A possible use case for large append-only tables is setting a specific point where to start (resume) snapshotting, in case a previous snapshotting was interrupted. Note: this setting has an impact on snapshots only; events captured during log reading are not affected by it. |
| source.struct.version | v2 | Schema version for the source block in CDC events. Debezium 0.10 introduced a few breaking changes to the structure of the source block in order to unify the exposed structure across all the connectors. By setting this option to v1, the structure used in earlier versions can be produced. Note that this setting is not recommended and is planned for removal in a future Debezium version. |
| sanitize.field.names | true when the connector configuration explicitly specifies the key.converter or value.converter parameters to use Avro, otherwise false | Whether field names are sanitized to adhere to Avro naming requirements. See Avro naming for more details. |
| database.server.timezone | | Timezone of the server. This property defines the timezone of the transaction timestamp (ts_ms) retrieved from the server (which is actually not zoned). By default, the value is unset. Set a value for this property only when running on SQL Server 2014 or older, and the database server and the JVM running the Debezium connector use different timezones. When unset, the default behavior is to use the timezone of the VM running the Debezium connector; in that case, when running on SQL Server 2014 or older with different timezones on the server and the connector, incorrect ts_ms values may be produced. Possible values include "Z", "UTC", offset values like "+02:00", short zone ids like "CET", and long zone ids like "Europe/Paris". |
| provide.transaction.metadata | false | When set to true, Debezium generates events with transaction boundaries and enriches data event envelopes with transaction metadata. See Transaction Metadata for additional details. |
| retriable.restart.connector.wait.ms | 10000 (10 seconds) | The number of milliseconds to wait before restarting a connector after a retriable error occurs. |
| skipped.operations | | A comma-separated list of operation types that will be skipped during streaming. The operations include: c for inserts/creates, u for updates, and d for deletes. By default, no operations are skipped. |
| signal.data.collection | | Fully-qualified name of the data collection that is used to send signals to the connector. The name format is database_name.schema-name.table-name. |
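
As a sketch of how several of these advanced properties combine, the fragment below resumes the snapshot of a large append-only table from a known position and enables heartbeats; the dbo.orders table, its order_id column, and the threshold value are hypothetical. These keys would be added to the config block of a registration payload like the one shown after the first table:

```json
{
  "snapshot.isolation.mode": "snapshot",
  "snapshot.select.statement.overrides": "dbo.orders",
  "snapshot.select.statement.overrides.dbo.orders": "SELECT * FROM dbo.orders WHERE order_id > 1000000",
  "snapshot.fetch.size": "2000",
  "heartbeat.interval.ms": "60000",
  "event.processing.failure.handling.mode": "warn"
}
```

Here snapshot isolation avoids blocking concurrent writers during the initial snapshot (with the consistency trade-offs described in the snapshot.isolation.mode row), and the per-table override property follows the snapshot.select.statement.overrides.[SCHEMA_NAME].[TABLE_NAME] naming scheme.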

As of Debezium 1.3.0, some of the terminology around "whitelists" and "blacklists" was changed:

| Old name | New name |
| --- | --- |
| database.whitelist | database.include.list |
| database.blacklist | database.exclude.list |
| schema.whitelist | schema.include.list |
| schema.blacklist | schema.exclude.list |
| table.whitelist | table.include.list |
| table.blacklist | table.exclude.list |
| column.whitelist | column.include.list |
| column.blacklist | column.exclude.list |
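
The mapping is mechanical, so migrating an older connector configuration is a matter of renaming keys. For example (filter values are hypothetical), a pre-1.3.0 fragment such as:

```json
{ "table.whitelist": "dbo.customers", "column.blacklist": "dbo.customers.ssn" }
```

becomes:

```json
{ "table.include.list": "dbo.customers", "column.exclude.list": "dbo.customers.ssn" }
```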