
Tinkering with Quickwit, a distributed search engine written in Rust - ingesting data from different sources



Ingest API

In this section of the tutorial, we will cover how to send data to Quickwit using the Ingest API.

To follow this tutorial, you will need a local Quickwit instance up and running.

  • /docs/get-started/installation

To start it, run ./quickwit run in a terminal.

Creating Indexes

First, we create a schema-less index.

# Create the index config file.
cat << EOF > 
version: 0.7
index_id: *-schemaless
doc_mapping:
  mode: dynamic
indexing_settings:
  commit_timeout_secs: 30
EOF
# Use the CLI to create the index...
./quickwit index create --index-config 
# Or with cURL.
curl -XPOST -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/indexes' --data-binary @

Ingesting data

Let's start by downloading a sample of the Stack Overflow dataset:

  • /*/stacksample
# Download the first 10,000 posts.
curl -O https://quickwit-datasets-public./

You can use the command line interface or cURL to send data. The command line interface is more convenient for sending a few gigabytes of data, because Quickwit may return a 429 response when the ingest queue is full; in that case, the Quickwit CLI automatically retries sending.

# Ingest the first 10,000 posts with the CLI...
./quickwit index ingest --index *-schemaless --input-path  --force

# OR with cURL.
curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/*-schemaless/ingest?commit=force' --data-binary @
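If you ingest with cURL directly, you can approximate the CLI's retry behavior with curl's built-in retry options. A minimal sketch, assuming a reasonably recent curl (which treats 429 as a retryable status when --retry is set) and a hypothetical file name for the downloaded posts:

# Retry up to 5 times, with a delay between attempts, when the ingest queue returns 429.
curl -XPOST \
  --retry 5 --retry-delay 2 \
  -H 'Content-Type: application/json' \
  'http://localhost:7280/api/v1/*-schemaless/ingest?commit=force' \
  --data-binary @posts.json  # hypothetical file name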

Execute a search query

Now you can perform a search on the index.

curl 'http://localhost:7280/api/v1/*-schemaless/search?query=body:python'
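The same endpoint also accepts a JSON body via POST, which is handy when you want to cap the number of hits or add aggregations later. A minimal sketch using only parameters that already appear elsewhere in this tutorial (query and max_hits):

curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/*-schemaless/search' -d '
{
  "query": "body:python",
  "max_hits": 5
}'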

Clean up resources (optional)

curl -XDELETE 'http://localhost:7280/api/v1/indexes/*-schemaless'

This completes the tutorial. You can now move on to the next tutorial.

Local files

In this section of the tutorial, we will cover how to index local files using the Quickwit command line interface.

To follow this tutorial, you will need the Quickwit binary:

  • /docs/main-branch/get-started/installation

Creating Indexes

First, let's create a schema-less index. We need to start the Quickwit server just to create the index, so we will start it and shut it down later.

Start the Quickwit server.

./quickwit run

Create an index in another terminal.

# Create the index config file.
cat << EOF > 
version: 0.7
index_id: *-schemaless
doc_mapping:
  mode: dynamic
indexing_settings:
  commit_timeout_secs: 30
EOF

./quickwit index create --index-config 

You can now go back to the first terminal and press Ctrl+C to shut down the server.

Ingesting documents

To send a file, simply execute the following command:

./quickwit tool local-ingest --index *-schemaless --input-path 

After a few seconds, you should see the following output:

❯ Ingesting documents locally...

---------------------------------------------------
 Connectivity checklist
 ✔ metastore
 ✔ storage
 ✔ _ingest-cli-source

 Num docs   10000 Parse errs     0 PublSplits   1 Input size     6MB Thrghput  3.34MB/s Time 00:00:02
 Num docs   10000 Parse errs     0 PublSplits   1 Input size     6MB Thrghput  2.23MB/s Time 00:00:03
 Num docs   10000 Parse errs     0 PublSplits   1 Input size     6MB Thrghput  1.67MB/s Time 00:00:04

Indexed 10,000 documents in 4s.
Now, you can query the index with the following command:
quickwit index search --index *-schemaless --config ./config/ --query "my query"
Clearing local cache directory...
✔ Local cache directory cleared.
✔ Documents successfully indexed.

Object store URIs like s3://mybucket/ are also supported as the --input-path, provided that your environment is configured with the appropriate permissions.
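For example, assuming your environment holds valid AWS credentials and a hypothetical bucket and object name, the same command can read directly from S3 (a sketch, not a verified setup):

# Hypothetical bucket and object key; requires AWS credentials and region in the environment.
AWS_ACCESS_KEY_ID=<your_access_key_id> \
AWS_SECRET_ACCESS_KEY=<your_secret_access_key> \
AWS_REGION=us-east-1 \
./quickwit tool local-ingest --index *-schemaless --input-path s3://mybucket/posts.json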

Clean up resources (optional)

That's it! You can now clean up the resources you created. Start the Quickwit server again:

./quickwit run

Then, in another terminal, delete the index:

./quickwit index delete --index-id *-schemaless

This completes the tutorial. You can now move on to the next tutorial.

Kafka

In this tutorial, we'll cover how to set up Quickwit to ingest data from Kafka in a few minutes. First, we'll create an index and configure a Kafka source. Then, we'll create a Kafka topic and load some events from the GH Archive into it. Finally, we will run some search and aggregation queries to explore the newly ingested data.


Prerequisites

To complete this tutorial, you will need the following:

  • A running Kafka cluster (see the Kafka quickstart)
    • /quickstart
  • A local Quickwit installation (see the installation guide)
    • /docs/main-branch/get-started/installation

Creating Indexes

First, we create a new index. Below is the index configuration and document mapping corresponding to the GH Archive event schema:

#
# Index config file for gh-archive dataset.
#
version: 0.7

index_id: gh-archive

doc_mapping:
  field_mappings:
    - name: id
      type: text
      tokenizer: raw
    - name: type
      type: text
      fast: true
      tokenizer: raw
    - name: public
      type: bool
      fast: true
    - name: payload
      type: json
      tokenizer: default
    - name: org
      type: json
      tokenizer: default
    - name: repo
      type: json
      tokenizer: default
    - name: actor
      type: json
      tokenizer: default
    - name: other
      type: json
      tokenizer: default
    - name: created_at
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: created_at

indexing_settings:
  commit_timeout_secs: 10

Run these Bash commands to download the index configuration and create the gh-archive index:

# Download GH Archive index config.
wget -O  /quickwit-oss/quickwit/main/config/tutorials/gh-archive/

# Create index.
./quickwit index create --index-config 

Create and populate a Kafka topic

Now, let's create a Kafka topic and load some events into it.

# Create a topic named `gh-archive` with 3 partitions.
bin/kafka-topics.sh --create --topic gh-archive --partitions 3 --bootstrap-server localhost:9092

# Download a few GH Archive files.
wget /2022-05-12-{10..15}.

# Load the events into Kafka topic.
gunzip -c 2022-05-12*. | \
bin/kafka-console-producer.sh --topic gh-archive --bootstrap-server localhost:9092

Creating a Kafka Source

This tutorial assumes that the Kafka cluster is locally available on the default port (9092).
If this is not the case, update the bootstrap.servers parameter accordingly.

#
# Kafka source config file.
#
version: 0.8
source_id: kafka-source
source_type: kafka
num_pipelines: 2
params:
  topic: gh-archive
  client_params:
    bootstrap.servers: localhost:9092

Run these commands to download the source configuration file and create the source.

# Download Kafka source config.
wget /quickwit-oss/quickwit/main/config/tutorials/gh-archive/

# Create source.
./quickwit source create --index gh-archive --source-config 

If you encounter the following error:

Command failed: Topic `gh-archive` has no partitions.

This means that the Kafka topic gh-archive was not created correctly in the previous step.
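You can double-check the topic and its partitions with the standard Kafka CLI before retrying (a sketch, assuming it is run from the Kafka distribution directory used above):

# The topic should exist and report 3 partitions.
bin/kafka-topics.sh --describe --topic gh-archive --bootstrap-server localhost:9092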

Starting the indexing and search service

Finally, execute this command to start Quickwit in server mode.

# Launch Quickwit services.
./quickwit run

Behind the scenes, this command starts an indexer and a searcher. On startup, the indexer connects to the Kafka topic specified by the source and begins streaming and indexing events from the partitions that make up the topic. With the default commit timeout value (see indexing settings), the indexer should publish the first split after about 60 seconds.

  • /docs/configuration/index-config#indexing-settings

You can run this command in another shell to check the properties of the index and see the number of splits currently published:

# Display some general information about the index.
./quickwit index describe --index gh-archive

Once the first split is published, you can start running search queries. For example, we can find all events related to the Kubernetes repository:

  • /kubernetes/kubernetes
curl 'http://localhost:7280/api/v1/gh-archive/search?query=:kubernetes%20AND%:kubernetes'

You can also access these results through the Quickwit UI:

  • http://localhost:7280/ui/search?query=%3Akubernetes+AND+%3Akubernetes&index_id=gh-archive&max_hits=10

We can also group these events by type and count them:

curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/gh-archive/search' -d '
{
  "query":":kubernetes AND :kubernetes",
  "max_hits":0,
  "aggs":{
    "count_by_event_type":{
      "terms":{
        "field":"type"
      }
    }
  }
}'

Secure Kafka connection (optional)

Quickwit's Kafka source supports SSL and SASL authentication. This is especially useful for consuming data from external Kafka services.

The certificate and key files must exist on all Quickwit nodes in order for the Kafka source to be created and the indexing pipeline to run successfully.

SSL Configuration

version: 0.8
source_id: kafka-source-ssl
source_type: kafka
num_pipelines: 2
params:
  topic: gh-archive
  client_params:
    bootstrap.servers:
    security.protocol: SSL
    ssl.ca.location: /path/to/
    ssl.certificate.location: /path/to/
    ssl.key.location: /path/to/

SASL Configuration

version: 0.8
source_id: kafka-source-sasl
source_type: kafka
num_pipelines: 2
params:
  topic: gh-archive
  client_params:
    bootstrap.servers:
    ssl.ca.location: /path/to/
    security.protocol: SASL_SSL
    sasl.mechanisms: SCRAM-SHA-256
    sasl.username: your_sasl_username
    sasl.password: your_sasl_password
If you encounter the following error:

Client creation error: failed: error:05880002:x509 certificate routines::system lib

This usually means that the path to the CA certificate is incorrect. Please update the CA certificate path in client_params.
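Before recreating the source, you can sanity-check the certificate files referenced in client_params, for example with openssl (a sketch with hypothetical file names):

# Print the CA subject and verify that the client certificate chains to the CA.
openssl x509 -in /path/to/ca.pem -noout -subject
openssl verify -CAfile /path/to/ca.pem /path/to/client.pem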

Clean up resources (optional)

Let's delete the files and sources created for this tutorial.

# Delete Kafka topic.
bin/kafka-topics.sh --delete --topic gh-archive --bootstrap-server localhost:9092

# Delete index.
./quickwit index delete --index gh-archive

# Delete source config.
rm 

This completes the tutorial. If you have any questions about Quickwit or run into any problems, don't hesitate to start a GitHub discussion, open an issue, or reach out to us directly on Discord.

  • /quickwit-oss/quickwit
  • /quickwit-oss/quickwit/discussions
  • /quickwit-oss/quickwit/issues
  • /invite/MT27AG5EVE

Pulsar

In this tutorial, we'll cover how to set up Quickwit to ingest data from Pulsar in a few minutes. First, we'll create an index and configure a Pulsar source. Then, we'll create a Pulsar topic and load some events from the Stack Overflow dataset into it. Finally, we will run a few search queries.

  • /*/stacksample

Prerequisites

To complete this tutorial, you will need the following:

  • A locally running Quickwit instance
    • /docs/main-branch/get-started/installation
  • A locally running Pulsar instance
    • /docs/next/getting-started-standalone/

Quickwit setup

Download Quickwit and start a server. Then open a new terminal and run the CLI commands using the same binary.

  • /docs/main-branch/get-started/installation
./quickwit run

Test that the cluster is running:

./quickwit index list
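Equivalently, you can check over the REST API that the server is up and lists its indexes, assuming the index-listing endpoint at /api/v1/indexes, the same base path used to create and delete indexes earlier in this guide:

# Returns a JSON array of index configurations (empty on a fresh instance).
curl http://localhost:7280/api/v1/indexes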

Pulsar setup

Local

wget /dist/pulsar/pulsar-2.11.0/apache-pulsar-2.11.
tar xvfz apache-pulsar-2.11.
cd apache-pulsar-2.11.0
bin/pulsar standalone

Docker

docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.11.0 bin/pulsar standalone

See the official documentation for more details:

  • /docs/next/getting-started-docker/

Prepare Quickwit

First, we create a new index. Here is the index configuration and document mapping corresponding to the post schema:

#
# Index config file for Stack Overflow dataset.
#
version: 0.7

index_id: *

doc_mapping:
  field_mappings:
    - name: user
      type: text
      fast: true
      tokenizer: raw
    - name: tags
      type: array<text>
      fast: true
      tokenizer: raw
    - name: type
      type: text
      fast: true
      tokenizer: raw
    - name: title
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: body
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: questionId
      type: u64
    - name: answerId
      type: u64
    - name: acceptedAnswerId
      type: u64
    - name: creationDate
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: creationDate

search_settings:
  default_search_fields: [title, body]

indexing_settings:
  commit_timeout_secs: 10

Run these Bash commands to download the index configuration and create the index:

# Download * index config.
wget -O  /quickwit-oss/quickwit/main/config/tutorials/*/

# Create index.
./quickwit index create --index-config 

Creating a Pulsar Source

The Pulsar source only needs to define the list of topics and the instance address.

#
# Pulsar source config file.
#
version: 0.7
source_id: pulsar-source
source_type: pulsar
params:
  topics:
    - *
  address: pulsar://localhost:6650

Run these commands to download the source configuration file and create the source.

# Download Pulsar source config.
wget -O  /quickwit-oss/quickwit/main/config/tutorials/*/

# Create source.
./quickwit source create --index * --source-config 

Once the Pulsar source is created, the Quickwit control plane will request the indexer to start a new indexing pipeline. You can see a log similar to the following on the indexer:

INFO spawn_pipeline{index=* gen=0}:pulsar-consumer{subscription_name="quickwit-*-pulsar-source" params=PulsarSourceParams { topics: ["*"], address: "pulsar://localhost:6650", consumer_name: "quickwit", authentication: None } current_positions={}}: quickwit_indexing::source::pulsar_source: Seeking to last checkpoint positions. positions={}

Create and populate a Pulsar topic

We will use Pulsar's default tenant/namespace public/default. To populate the topic, we'll use a Python script:

import json
import pulsar

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('public/default/*')

with open('', encoding='utf8') as file:
    for i, line in enumerate(file):
        # Send each NDJSON line as a raw message payload.
        producer.send(line.encode('utf-8'))
        if i % 100 == 0:
            print(f"{i}/10000 messages sent.", i)

client.close()

To install the Pulsar Python client locally, see the documentation page:

  • /docs/2./client-libraries-python/
# Download the first 10,000 posts.
curl -O https://quickwit-datasets-public./

# Install pulsar python client.
# Requires a python version < 3.11
pip3 install 'pulsar-client==2.10.1'
wget /quickwit-oss/quickwit/main/config/tutorials/*/send_messages_to_pulsar.py
python3 send_messages_to_pulsar.py
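You can check that the messages actually landed in Pulsar with pulsar-admin (a sketch, run from the Pulsar distribution directory; replace the topic with the one used by the producer script above):

# Show message counts and backlog for the topic.
bin/pulsar-admin topics stats 'public/default/<your-topic>'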

Start searching!

You can run this command to check the properties of the index and see the number of splits and documents currently published:

# Display some general information about the index.
./quickwit index describe --index *

In particular, note the number of published documents.

Now you can execute some queries.

curl 'http://localhost:7280/api/v1/*/search?query=search+AND+engine'

If your Quickwit server is local, you can access the results through the Quickwit UI at localhost:7280:

  • http://localhost:7280/ui/search?query=&index_id=*&max_hits=10

Clean up resources (optional)

Let's delete the files and sources created for this tutorial.

# Delete quickwit index.
./quickwit index delete --index * --yes
# Delete Pulsar topic.
bin/pulsar-admin topics delete *

This completes the tutorial. If you have any questions about Quickwit or run into any problems, don't hesitate to start a GitHub discussion, open an issue, or reach out to us directly on Discord.

  • /quickwit-oss/quickwit
  • /quickwit-oss/quickwit/discussions
  • /quickwit-oss/quickwit/issues
  • /invite/MT27AG5EVE

Kinesis

In this tutorial, we'll cover how to set up Quickwit to ingest data from Kinesis in a few minutes. First, we'll create an index and configure a Kinesis source. Then, we'll create a Kinesis stream and load some events from the GH Archive into it. Finally, we will run some search and aggregation queries to explore the newly ingested data.


There are some costs associated with using the Amazon Kinesis service in this tutorial.

Prerequisites

To complete this tutorial, you will need the following:

  • AWS CLI version 2 (see Getting started with the AWS CLI for prerequisites and installation)
    • /cli/latest/userguide/
  • A local Quickwit installation (see the installation guide)
    • /docs/main-branch/get-started/installation
  • jq
    • /jq/download/
  • GNU parallel
    • /software/parallel/

jq is used to reshape the events into records that can be sent via the Amazon Kinesis API.
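To see what that reshaping looks like, here is a minimal sketch of the same jq transformation applied to a single inline event (the field names mirror the GH Archive documents):

# Wrap one JSON event into the Records format expected by `aws kinesis put-records`.
echo '{"id":"123","type":"PushEvent"}' | \
jq --slurp -c '{"Records": [.[] | {"Data": (. | tostring), "PartitionKey": .id}], "StreamName": "gh-archive"}'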

Creating Indexes

First, we create a new index. Below is the index configuration and document mapping corresponding to the GH Archive event schema:

#
# Index config file for gh-archive dataset.
#
version: 0.7

index_id: gh-archive

doc_mapping:
  field_mappings:
    - name: id
      type: text
      tokenizer: raw
    - name: type
      type: text
      fast: true
      tokenizer: raw
    - name: public
      type: bool
      fast: true
    - name: payload
      type: json
      tokenizer: default
    - name: org
      type: json
      tokenizer: default
    - name: repo
      type: json
      tokenizer: default
    - name: actor
      type: json
      tokenizer: default
    - name: other
      type: json
      tokenizer: default
    - name: created_at
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: created_at

indexing_settings:
  commit_timeout_secs: 10

Run these Bash commands to download the index configuration and create the gh-archive index.

# Download GH Archive index config.
wget -O  /quickwit-oss/quickwit/main/config/tutorials/gh-archive/

# Create index.
./quickwit index create --index-config 

Creating and populating Kinesis streams

Now we create a Kinesis stream and load some events into it.

This step can be quite slow, depending on your available bandwidth. The command below limits the amount of data sent by taking only the first 10,000 lines of each file downloaded from GH Archive; if you have enough bandwidth, you can remove the head command to send the whole set of files. You can also speed things up by increasing the number of shards and/or the number of parallel jobs (the -j option).

# Create a stream named `gh-archive` with 8 shards.
aws kinesis create-stream --stream-name gh-archive --shard-count 8

# Download a few GH Archive files.
wget /2022-05-12-{10..12}.

# Load the events into Kinesis stream
gunzip -c 2022-05-12*. | \
head -n 10000 | \
parallel --gnu -j8 -N 500 --pipe \
'jq --slurp -c "{\"Records\": [.[] | {\"Data\": (. | tostring), \"PartitionKey\": .id }], \"StreamName\": \"gh-archive\"}" > records-{%}.json && \
aws kinesis put-records --cli-input-json file://records-{%}.json --cli-binary-format raw-in-base64-out >> '

Creating a Kinesis Source

#
# Kinesis source config file.
#
version: 0.7
source_id: kinesis-source
source_type: kinesis
params:
  stream_name: gh-archive

Run these commands to download the source configuration file and create the source.

# Download Kinesis source config.
wget /quickwit-oss/quickwit/main/config/tutorials/gh-archive/

# Create source.
./quickwit source create --index gh-archive --source-config 

If this command fails with the following error message:

Command failed: Stream gh-archive under account XXXXXXXXX not found.

Caused by:
    0: Stream gh-archive under account XXXXXXXX not found.
    1: Stream gh-archive under account XXXXXXXX not found.

This means that the Kinesis stream was not created correctly in the previous step.
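You can confirm that the stream exists and is active before recreating the source (a sketch using the standard AWS CLI):

# The StreamStatus field should read ACTIVE.
aws kinesis describe-stream-summary --stream-name gh-archive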

Starting the indexing and search service

Finally, execute this command to start Quickwit in server mode.

# Launch Quickwit services.
./quickwit run

Behind the scenes, this command starts an indexer and a searcher. On startup, the indexer connects to the Kinesis stream specified by the source and begins streaming and indexing events from the shards that make up the stream. With the default commit timeout value (see indexing settings), the indexer should publish the first split after about 60 seconds.

  • /docs/configuration/index-config#indexing-settings

You can run this command in another shell to check the properties of the index and see the number of splits currently published:

# Display some general information about the index.
./quickwit index describe --index gh-archive

You can also get indexing information through the Quickwit UI:

  • http://localhost:7280/ui/indexes/gh-archive

Once the first split is published, you can start running search queries. For example, we can find all events related to the Kubernetes repository:

  • /kubernetes/kubernetes
curl 'http://localhost:7280/api/v1/gh-archive/search?query=:kubernetes%20AND%:kubernetes'

You can also access these results through the UI:

  • http://localhost:7280/ui/search?query=%3Akubernetes+AND+%3Akubernetes&index_id=gh-archive&max_hits=10

We can also group these events by type and count them:

curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/gh-archive/search' -d '
{
  "query":":kubernetes AND :kubernetes",
  "max_hits":0,
  "aggs":{
    "count_by_event_type":{
      "terms":{
        "field":"type"
      }
    }
  }
}'

Clean up resources (optional)

Let's delete the files and sources created for this tutorial.

# Delete Kinesis stream.
aws kinesis delete-stream --stream-name gh-archive

# Delete index.
./quickwit index delete --index gh-archive

# Delete source config.
rm 

This completes the tutorial. If you have any questions about Quickwit or run into any problems, don't hesitate to start a GitHub discussion, open an issue, or reach out to us directly on Discord.

  • /quickwit-oss/quickwit
  • /quickwit-oss/quickwit/discussions
  • /quickwit-oss/quickwit/issues
  • /invite/MT27AG5EVE

S3 with SQS notifications

In this tutorial, we describe how to set up Quickwit to ingest data from S3, with bucket notification events delivered through SQS. We first create the AWS resources (S3 bucket, SQS queue, notifications) using Terraform, then configure the Quickwit index and file source. Finally, we send some data to the source bucket and verify that it is properly indexed.

AWS resources

The complete Terraform script can be downloaded here:

  • /docs/assets/

First, create the bucket that receives the source data file (in NDJSON format):

resource "aws_s3_bucket" "file_source" {
  bucket_prefix = "qw-tuto-source-bucket"
}

The SQS queue is then set up to carry notifications when files are added to the bucket. The queue is configured with a policy that allows the source bucket to write S3 notification messages to it. A Dead Letter Queue (DLQ) is also created to receive messages that the file source cannot process (e.g. corrupted files). Messages are moved to the DLQ after 5 indexing attempts.

locals {
  sqs_notification_queue_name = "qw-tuto-s3-event-notifications"
}

data "aws_iam_policy_document" "sqs_notification" {
  statement {
    effect = "Allow"

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    actions   = ["sqs:SendMessage"]
    resources = ["arn:aws:sqs:*:*:${local.sqs_notification_queue_name}"]

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_s3_bucket.file_source.arn]
    }
  }
}

resource "aws_sqs_queue" "s3_events_deadletter" {
  name = "${locals.sqs_notification_queue_name}-deadletter"
}

resource "aws_sqs_queue" "s3_events" {
  name   = local.sqs_notification_queue_name
  policy = data.aws_iam_policy_document.sqs_notification.json

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.s3_events_deadletter.arn
    maxReceiveCount     = 5
  })
}

resource "aws_sqs_queue_redrive_allow_policy" "s3_events_deadletter" {
  queue_url = aws_sqs_queue.s3_events_deadletter.id

  redrive_allow_policy = jsonencode({
    redrivePermission = "byQueue",
    sourceQueueArns   = [aws_sqs_queue.s3_events.arn]
  })
}

Configure bucket notifications to write messages to SQS whenever a new file is created in the source bucket:

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = aws_s3_bucket.file_source.id

  queue {
    queue_arn = aws_sqs_queue.s3_events.arn
    events    = ["s3:ObjectCreated:*"]
  }
}

Only s3:ObjectCreated:* event types are supported.
Other types (e.g. ObjectRemoved) are acknowledged and a warning is logged.

The source needs to be able to access the notification queue and the source bucket. The following policy document contains the minimum permissions required by the source:

data "aws_iam_policy_document" "quickwit_node" {
  statement {
    effect = "Allow"
    actions = [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:ChangeMessageVisibility",
      "sqs:GetQueueAttributes",
    ]
    resources = [aws_sqs_queue.s3_events.arn]
  }
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.file_source.arn}/*"]
  }
}

Create an IAM user and credentials to associate with the local Quickwit instance:

resource "aws_iam_user" "quickwit_node" {
  name = "quickwit-filesource-tutorial"
  path = "/system/"
}

resource "aws_iam_user_policy" "quickwit_node" {
  name   = "quickwit-filesource-tutorial"
  user   = aws_iam_user.quickwit_node.name
  policy = data.aws_iam_policy_document.quickwit_node.json
}

resource "aws_iam_access_key" "quickwit_node" {
  user = aws_iam_user.quickwit_node.name
}

We do not recommend running Quickwit nodes with IAM user credentials in a production environment.
This is just to simplify the tutorial setup. When running on EC2/ECS, the policy document should be attached to the IAM role.

Download the complete Terraform script and deploy it with terraform init and terraform apply. On successful execution, the outputs required to configure Quickwit will be listed. You can display the values of the sensitive outputs (access key ID and secret key) with:

terraform output quickwit_node_access_key_id
terraform output quickwit_node_secret_access_key

Run Quickwit

Install Quickwit locally, then, in the installation directory, run Quickwit with the necessary access keys, replacing <quickwit_node_access_key_id> and <quickwit_node_secret_access_key> with the matching Terraform output values:

  • /docs/get-started/installation
AWS_ACCESS_KEY_ID=<quickwit_node_access_key_id> \
AWS_SECRET_ACCESS_KEY=<quickwit_node_secret_access_key> \
AWS_REGION=us-east-1 \
./quickwit run

Configuring Indexes and Sources

In another terminal, in the Quickwit installation directory, create an index:

cat << EOF > 
version: 0.7
index_id: tutorial-sqs-file
doc_mapping:
  mode: dynamic
indexing_settings:
  commit_timeout_secs: 30
EOF

./quickwit index create --index-config 

Replace <notification_queue_url> with the corresponding Terraform output value and create a file source for the index:

cat << EOF > 
version: 0.8
source_id: sqs-filesource
source_type: file
num_pipelines: 2
params:
  notifications:
    - type: sqs
      queue_url: <notification_queue_url>
      message_type: s3_notification
EOF

./quickwit source create --index tutorial-sqs-file --source-config 

The num_pipelines setting controls how many consumers poll the queue in parallel. Choose this number based on the indexer compute resources you want to allocate to this source; as a rule of thumb, configure one pipeline for every two cores.

Ingesting data

We can now send data to Quickwit by uploading files to S3. If you have the AWS CLI installed, run the following command, replacing <source_bucket_name> with the associated Terraform output:

curl https://quickwit-datasets-public./ | \
    aws s3 cp - s3://<source_bucket_name>/

If you don't want to use the AWS CLI, you can also download the files and manually upload them to the source bucket through the AWS console.

Wait about 1 minute and the data should appear in the index:

./quickwit index describe --index tutorial-sqs-file
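Once documents show up, a quick match-all query is an easy way to eyeball the ingested data (a sketch; * is Quickwit's match-all query):

curl 'http://localhost:7280/api/v1/tutorial-sqs-file/search?query=*'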

Clean up resources

The AWS resources instantiated in this tutorial do not incur a fixed cost, but we still recommend deleting them when you are done. In the directory containing the Terraform script, run terraform destroy:
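For example, from the directory where you ran terraform apply earlier:

terraform destroy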

More

1. How Binance Used Quickwit to Build a 100PB Logging Service (Quickwit Blog)