Configuring Elasticsearch - Configuring your cluster logging deployment | Logging

Configuring Elasticsearch CPU and memory limits
Configuring Elasticsearch replication policy
Configuring Elasticsearch storage
Configuring Elasticsearch for emptyDir storage
Exposing Elasticsearch as a route
About Elasticsearch alerting rules

OpenShift Container Platform uses Elasticsearch (ES) to store and organize the log data.

You can modify your Elasticsearch deployment to:

configure storage for your Elasticsearch cluster;
define how shards are replicated across data nodes in the cluster, from full replication to no replication;
configure external access to Elasticsearch data.

Scaling down Elasticsearch nodes is not supported. When scaling down, Elasticsearch pods can be accidentally deleted, possibly resulting in shards not being allocated and replica shards being lost.

Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits, unless you specify otherwise in the ClusterLogging Custom Resource. The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. In that case, you must add nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory.

Each Elasticsearch node can operate with a lower memory setting, though this is not recommended for production deployments.

If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.

Configuring Elasticsearch CPU and memory limits

Each component specification allows for adjustments to both the CPU and memory limits. You should not have to manually adjust these values as the Elasticsearch Operator sets values sufficient for your environment.

Each Elasticsearch node can operate with a lower memory setting, though this is not recommended for production deployments. For production use, allocate no less than the default 16Gi to each Pod. Preferably, allocate as much as possible, up to 64Gi per Pod.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

$ oc edit ClusterLogging instance

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
    logStore:
      type: "elasticsearch"
      elasticsearch:
        resources: (1)
          limits:
            memory: "16Gi"
          requests:
            cpu: "1"
            memory: "16Gi"

1	Specify the CPU and memory limits as needed. If you leave these values blank, the Elasticsearch Operator sets default values that should be sufficient for most deployments.
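As an alternative to editing the CR interactively, the same fields can be set with a JSON merge patch. The following is a minimal sketch rather than the documented procedure: the helper name `es_resources_patch` is hypothetical, and the values `16Gi` and `1` only mirror the example above.

```shell
# Hypothetical helper: print a JSON merge patch that sets the Elasticsearch
# memory limit and request, and the CPU request, in the ClusterLogging CR.
es_resources_patch() {
  mem="$1"   # for example 16Gi
  cpu="$2"   # for example 1
  printf '{"spec":{"logStore":{"elasticsearch":{"resources":{"limits":{"memory":"%s"},"requests":{"cpu":"%s","memory":"%s"}}}}}}\n' \
    "$mem" "$cpu" "$mem"
}

# Print the patch so that it can be reviewed before it is applied.
es_resources_patch 16Gi 1

# To apply it (requires cluster access, so it is left commented out here):
# oc patch clusterlogging/instance -n openshift-logging --type merge \
#   -p "$(es_resources_patch 16Gi 1)"
```

Printing the patch first keeps the sketch runnable without a cluster and makes the change easy to inspect.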

Configuring Elasticsearch replication policy

You can define how Elasticsearch shards are replicated across data nodes in the cluster.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

$ oc edit clusterlogging instance

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"

....

spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      redundancyPolicy: "SingleRedundancy" (1)

1	Specify a redundancy policy for the shards. The change is applied when you save the CR.

FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.
MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.
SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. This policy performs better than MultipleRedundancy when you use 5 or more nodes. You cannot apply this policy to a deployment with a single Elasticsearch node.
ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
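To make the four policies concrete, the following sketch computes how many replica copies of each primary shard a policy implies for a given number of data nodes. It is an interpretation of the descriptions above, not the operator's implementation: the helper name `replica_count` is hypothetical, and rounding "half of the data nodes" down is an assumption.

```shell
# Hypothetical helper: number of replica copies per primary shard for a given
# redundancy policy and data node count, as read from the descriptions above.
replica_count() {
  policy="$1"
  nodes="$2"
  case "$policy" in
    FullRedundancy)     echo $(( nodes - 1 )) ;;        # a copy on every other data node
    MultipleRedundancy) echo $(( (nodes - 1) / 2 )) ;;  # assumption: "half" rounds down
    SingleRedundancy)
      # The text says this policy cannot be used with a single node.
      if [ "$nodes" -lt 2 ]; then
        echo "SingleRedundancy needs at least two data nodes" >&2
        return 1
      fi
      echo 1 ;;
    ZeroRedundancy)     echo 0 ;;
    *) echo "unknown policy: $policy" >&2; return 1 ;;
  esac
}

replica_count FullRedundancy 3
replica_count MultipleRedundancy 5
```

Each replica is a full copy of the index data, so disk usage grows in proportion to the replica count plus one.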

Configuring Elasticsearch storage

Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance is.

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Edit the Cluster Logging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"

....

spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "gp2"
        size: "200G"

This example specifies each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.
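Because every data node gets its own claim, the total storage requested scales with `nodeCount`. Note also that the `G` suffix in a Kubernetes quantity is decimal (10^9 bytes), while `Gi` is binary (2^30 bytes). The sketch below works through that arithmetic for the example values; the helper name `total_storage_gib` is hypothetical.

```shell
# Hypothetical helper: total storage requested across the cluster, in whole GiB,
# given the node count and a per-node size expressed in decimal gigabytes ("G").
total_storage_gib() {
  nodes="$1"
  size_g="$2"
  # G is 10^9 bytes and Gi is 2^30 bytes; integer division rounds down.
  echo $(( nodes * size_g * 1000000000 / 1073741824 ))
}

total_storage_gib 3 200   # three data nodes at 200G each
```

For the example above this prints 558, that is, roughly 558 GiB in total, or about 186 GiB per node.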

Configuring Elasticsearch for emptyDir storage

You can use emptyDir with Elasticsearch, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.

When using emptyDir, if Elasticsearch is restarted or redeployed, you will lose data.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Edit the Cluster Logging CR to specify emptyDir:

 spec:
    logStore:
      type: "elasticsearch"
      elasticsearch:
        nodeCount: 3
        storage: {}

Exposing Elasticsearch as a route

By default, Elasticsearch deployed with cluster logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to Elasticsearch for those tools that access its data.

Externally, you can access Elasticsearch by creating a reencrypt route and using your OpenShift Container Platform token and the installed Elasticsearch CA certificate. Then, access an Elasticsearch node with a cURL request that contains:

The Authorization: Bearer ${token} header.
The Elasticsearch reencrypt route and an Elasticsearch API request.

Internally, you can access Elasticsearch using the Elasticsearch cluster IP:

$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging
172.30.183.229

$ oc get service elasticsearch
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.183.229   <none>        9200/TCP   22h

$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -- curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    108      0 --:--:-- --:--:-- --:--:--   108

Prerequisites

Cluster logging and Elasticsearch must be installed.
You must have access to the project in order to access the logs.

Procedure

To expose Elasticsearch externally:

Change to the openshift-logging project:
```
$ oc project openshift-logging
```
Extract the CA certificate from Elasticsearch and write to the admin-ca file:
```
$ oc extract secret/elasticsearch --to=. --keys=admin-ca

admin-ca
```

Create a YAML file that defines the route for the Elasticsearch service:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: elasticsearch
  namespace: openshift-logging
spec:
  host:
  to:
    kind: Service
    name: elasticsearch
  tls:
    termination: reencrypt
    destinationCACertificate: | (1)

1	Add the Elasticsearch CA certificate or use the command in the next step. You do not have to set the `spec.tls.key`, `spec.tls.certificate`, and `spec.tls.caCertificate` parameters required by some reencrypt routes.

Add the Elasticsearch CA certificate to the route YAML you created:
```
cat ./admin-ca | sed -e "s/^/      /" >> <file-name>.yaml
```

Create the route:

$ oc create -f <file-name>.yaml

route.route.openshift.io/elasticsearch created
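The file creation and the certificate append can also be combined into a single step. The sketch below prints a complete route manifest with the certificate already indented under `destinationCACertificate`. It assumes the CA certificate was extracted to a file as shown earlier; the helper name `make_es_route_yaml` and the output file name `es-route.yaml` are hypothetical.

```shell
# Hypothetical helper: print a complete reencrypt Route manifest with the CA
# certificate from the given file indented under destinationCACertificate.
make_es_route_yaml() {
  ca_file="$1"
  cat <<'EOF'
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: elasticsearch
  namespace: openshift-logging
spec:
  host:
  to:
    kind: Service
    name: elasticsearch
  tls:
    termination: reencrypt
    destinationCACertificate: |
EOF
  # Indent every certificate line by six spaces so it sits inside the block scalar.
  sed -e 's/^/      /' "$ca_file"
}

# Usage (assumes ./admin-ca was extracted in the earlier step):
# make_es_route_yaml ./admin-ca > es-route.yaml
# oc create -f es-route.yaml
```

Generating the whole file at once avoids appending the certificate twice if the append command is run again.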

Check that the Elasticsearch service is exposed:

Get the token of the currently logged-in user or service account to use in the request:
```
$ token=$(oc whoami -t)
```
Set the Elasticsearch route you created as an environment variable:
```
$ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
```

To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:

$ curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/.operations.*/_search?size=1" | jq

The response appears similar to the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   944  100   944    0     0     62      0  0:00:15  0:00:15 --:--:--   204
{
  "took": 441,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 89157,
    "max_score": 1,
    "hits": [
      {
        "_index": ".operations.2019.03.15",
        "_type": "com.example.viaq.common",
        "_id": "ODdiNWIyYzAtMjg5Ni0TAtNWE3MDY1MjMzNTc3",
        "_score": 1,
        "_source": {
          "_SOURCE_MONOTONIC_TIMESTAMP": "673396",
          "systemd": {
            "t": {
              "BOOT_ID": "246c34ee9cdeecb41a608e94",
              "MACHINE_ID": "e904a0bb5efd3e36badee0c",
              "TRANSPORT": "kernel"
            },
            "u": {
              "SYSLOG_FACILITY": "0",
              "SYSLOG_IDENTIFIER": "kernel"
            }
          },
          "level": "info",
          "message": "acpiphp: Slot [30] registered",
          "hostname": "localhost.localdomain",
          "pipeline_metadata": {
            "collector": {
              "ipaddr4": "10.128.2.12",
              "ipaddr6": "fe80::xx:xxxx:fe4c:5b09",
              "inputname": "fluent-plugin-systemd",
              "name": "fluentd",
              "received_at": "2019-03-15T20:25:06.273017+00:00",
              "version": "1.3.2 1.6.0"
            }
          },
          "@timestamp": "2019-03-15T20:00:13.808226+00:00",
          "viaq_msg_id": "ODdiNWIyYzAtMYTAtNWE3MDY1MjMzNTc3"
        }
      }
    ]
  }
}

About Elasticsearch alerting rules

You can view these alerting rules in Prometheus.

Alert	Description	Severity
ElasticsearchClusterNotHealthy	Cluster health status has been RED for at least 2m. Cluster does not accept writes, shards may be missing or master node hasn’t been elected yet.	critical
ElasticsearchClusterNotHealthy	Cluster health status has been YELLOW for at least 20m. Some shard replicas are not allocated.	warning
ElasticsearchBulkRequestsRejectionJumps	High Bulk Rejection Ratio at node in cluster. This node may not be keeping up with the indexing speed.	warning
ElasticsearchNodeDiskWatermarkReached	Disk Low Watermark Reached at node in cluster. Shards can not be allocated to this node anymore. You should consider adding more disk to the node.	alert
ElasticsearchNodeDiskWatermarkReached	Disk High Watermark Reached at node in cluster. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node.	high
ElasticsearchJVMHeapUseHigh	JVM Heap usage on the node in cluster is <value>	alert
AggregatedLoggingSystemCPUHigh	System CPU usage on the node in cluster is <value>	alert
ElasticsearchProcessCPUHigh	ES process CPU usage on the node in cluster is <value>	alert
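The two ElasticsearchClusterNotHealthy rules key off the cluster health status, which the `_cat/health` request used earlier also reports. The sketch below maps one line of that output to the severity listed in the table. It assumes the default column layout, in which the status is the fourth field; the helper name `health_to_severity` and the sample line are hypothetical.

```shell
# Hypothetical helper: read one line of `_cat/health` output and print the
# alert severity that the table above associates with that cluster status.
health_to_severity() {
  # Assumption: in the default _cat/health output the status is the fourth column.
  status=$(printf '%s\n' "$1" | awk '{print $4}')
  case "$status" in
    red)    echo "critical" ;;  # ElasticsearchClusterNotHealthy, RED for at least 2m
    yellow) echo "warning" ;;   # ElasticsearchClusterNotHealthy, YELLOW for at least 20m
    green)  echo "none" ;;
    *)      echo "unknown" ;;
  esac
}

# Sample line with illustrative values, not captured from a real cluster.
health_to_severity "1552680000 20:00:00 elasticsearch yellow 3 3 6 3 0 0 0 0 - 100.0%"
```

The alerts fire only after the status has persisted for the stated duration, so a single reading like this indicates which rule could fire, not that it has.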
