# dump_hash

## Description

The dump_hash processor exports the documents of an index from a cluster and calculates a hash value for each document.

## Configuration Example

A simple example is as follows:

```yaml
pipeline:
- name: bulk_request_ingest
  auto_start: true
  keep_running: true
  processor:
    - dump_hash: # dump docs from cluster es1
        indices: "medcl-dr3"
        scroll_time: "10m"
        elasticsearch: "source"
        query: "field1:elastic"
        fields: "doc_hash"
        output_queue: "source_docs"
        batch_size: 10000
        slice_size: 5
```
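When `slice_size` is greater than 1, the export can run as parallel sliced scrolls against the cluster. The fragment below is not the processor's internal request, but a sketch of the equivalent Elasticsearch sliced-scroll search that the settings above would map to (slice 0 of 5 shown; the index, scroll time, batch size, and query string are taken from the example):

```json
POST /medcl-dr3/_search?scroll=10m
{
  "slice": { "id": 0, "max": 5 },
  "size": 10000,
  "query": { "query_string": { "query": "field1:elastic" } }
}
```

Each of the 5 slices would issue the same request with its own `slice.id` (0 through 4) and scroll independently until its slice of the index is exhausted.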

## Parameter Description

| Name | Type | Description |
| --- | --- | --- |
| elasticsearch | string | Name of the cluster from which documents are dumped |
| scroll_time | string | Scroll session timeout duration |
| batch_size | int | Scroll batch size. The default value is 5000. |
| slice_size | int | Slice size. The default value is 1. |
| sort_type | string | Document sorting type. The default value is asc. |
| sort_field | string | Document sorting field |
| indices | string | Index name |
| level | string | Request processing level. Set it to cluster to skip node- and shard-level splitting of requests; this is useful when a proxy sits in front of Elasticsearch. |
| query | string | Query filter conditions |
| fields | string | List of fields to be returned |
| sort_document_fields | bool | Whether to sort the fields in _source before the hash value is calculated. The default value is false. |
| hash_func | string | Hash function, which can be set to xxhash32, xxhash64, or fnv1a. The default value is xxhash32. |
| output_queue | string | Name of the queue to which results are output |
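The `sort_document_fields` option exists because the hash is computed over the serialized document, so field order changes the result. The sketch below (Python, with hypothetical helper names; it uses the simple fnv1a variant rather than the default xxhash32, since fnv1a needs no external library) illustrates why sorting fields first makes hashes comparable between two clusters that may store the same `_source` with different key order:

```python
import json

# FNV-1a 32-bit constants (per the FNV specification)
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193

def fnv1a32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h

def doc_hash(source: dict, sort_fields: bool = True) -> str:
    """Serialize a document's _source and hash it.

    With sort_fields=True (analogous to sort_document_fields),
    key order in the input no longer affects the hash.
    """
    payload = json.dumps(source, sort_keys=sort_fields, separators=(",", ":"))
    return format(fnv1a32(payload.encode("utf-8")), "08x")

# Same fields, different insertion order:
a = {"name": "elastic", "field1": "elastic"}
b = {"field1": "elastic", "name": "elastic"}

print(doc_hash(a) == doc_hash(b))                                        # sorted: equal
print(doc_hash(a, sort_fields=False) == doc_hash(b, sort_fields=False))  # unsorted: differ
```

The same reasoning applies to the real processor: with `sort_document_fields: false` (the default), two clusters must serialize `_source` identically for the hashes to match.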