$ oc edit ClusterLogging instance
OpenShift Container Platform uses Elasticsearch (ES) to store and organize the log data.
Some of the modifications you can make to your log store include:
storage for your Elasticsearch cluster;
how shards are replicated across data nodes in the cluster, from full replication to no replication;
allowing external access to Elasticsearch data.
Scaling down Elasticsearch nodes is not supported. When scaling down, Elasticsearch pods can be accidentally deleted, possibly resulting in shards not being allocated and replica shards being lost. |
Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits, unless you specify otherwise in the Cluster Logging Custom Resource. The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. You must add additional nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory.
Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments.
If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO. |
Each component specification allows for adjustments to both the CPU and memory limits. You should not have to manually adjust these values as the Elasticsearch Operator sets values sufficient for your environment.
Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments. For production use, you should have no less than the default 16Gi allocated to each Pod. Preferably you should allocate as much as possible, up to 64Gi per Pod.
Cluster logging and Elasticsearch must be installed.
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging
project:
$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
....
spec:
logStore:
type: "elasticsearch"
elasticsearch:
resources: (1)
limits:
memory: 16Gi
requests:
cpu: 500m
memory: 16Gi
1 | Specify the CPU and memory limits as needed. If you leave these values blank, the Elasticsearch Operator sets default values that should be sufficient for most deployments. |
If you adjust the amount of Elasticsearch CPU and memory, you must change both the request value and the limit value.
For example:
resources:
limits:
cpu: "8"
memory: "32Gi"
requests:
cpu: "8"
memory: "32Gi"
Kubernetes generally adheres the node CPU configuration and DOES not allow Elasticsearch to use the specified limits.
Setting the same value for the requests
and limits
ensures that Elasticseach can use the CPU and memory you want, assuming the node has the CPU and memory available.
You can define how Elasticsearch shards are replicated across data nodes in the cluster.
Cluster logging and Elasticsearch must be installed.
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging
project:
oc edit clusterlogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
....
spec:
logStore:
type: "elasticsearch"
elasticsearch:
redundancyPolicy: "SingleRedundancy" (1)
1 | Specify a redundancy policy for the shards. The change is applied upon saving the changes.
|
The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes. |
Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance.
Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur. |
Cluster logging and Elasticsearch must be installed.
Edit the Cluster Logging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
....
spec:
logStore:
type: "elasticsearch"
elasticsearch:
nodeCount: 3
storage:
storageClassName: "gp2"
size: "200G"
This example specifies each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.
You can use emptyDir with Elasticsearch, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.
When using emptyDir, if Elasticsearch is restarted or redeployed, you will lose data. |
Cluster logging and Elasticsearch must be installed.
Edit the Cluster Logging CR to specify emptyDir:
spec:
logStore:
type: "elasticsearch"
elasticsearch:
nodeCount: 3
storage: {}
By default, Elasticsearch deployed with cluster logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to Elasticsearch for those tools that access its data.
Externally, you can access Elasticsearch by creating a reencrypt route, your OpenShift Container Platform token and the installed Elasticsearch CA certificate. Then, access an Elasticsearch node with a cURL request that contains:
The Authorization: Bearer ${token}
The Elasticsearch reencrypt route and an Elasticsearch API request.
Internally, you can access Elastiscearch using the Elasticsearch cluster IP:
You can get the Elasticsearch cluster IP using either of the following commands:
$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging 172.30.183.229
oc get service elasticsearch -n openshift-logging NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP 172.30.183.229 <none> 9200/TCP 22h $ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -n openshift-logging -- curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health" % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 29 100 29 0 0 108 0 --:--:-- --:--:-- --:--:-- 108
Cluster logging and Elasticsearch must be installed.
You must have access to the project in order to be able to access to the logs.
To expose Elasticsearch externally:
Change to the openshift-logging
project:
$ oc project openshift-logging
Extract the CA certificate from Elasticsearch and write to the admin-ca file:
$ oc extract secret/elasticsearch --to=. --keys=admin-ca admin-ca
Create the route for the Elasticsearch service as a YAML file:
Create a YAML file with the following:
apiVersion: route.openshift.io/v1
kind: route
metadata:
name: elasticsearch
namespace: openshift-logging
spec:
host:
to:
kind: Service
name: elasticsearch
tls:
termination: reencrypt
destinationCACertificate: | (1)
1 | Add the Elasticsearch CA certifcate or use the command in the next step. You do not have to set the spec.tls.key , spec.tls.certificate , and spec.tls.caCertificate parameters required by some reencrypt routes. |
Run the following command to add the Elasticsearch CA certificate to the route YAML you created:
cat ./admin-ca | sed -e "s/^/ /" >> <file-name>.yaml
Create the route:
$ oc create -f <file-name>.yaml route.route.openshift.io/elasticsearch created
Check that the Elasticsearch service is exposed:
Get the token of this ServiceAccount to be used in the request:
$ token=$(oc whoami -t)
Set the elasticsearch route you created as an environment variable.
$ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:
curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/.operations.*/_search?size=1" | jq
The response appears similar to the following:
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 944 100 944 0 0 62 0 0:00:15 0:00:15 --:--:-- 204 { "took": 441, "timed_out": false, "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 }, "hits": { "total": 89157, "max_score": 1, "hits": [ { "_index": ".operations.2019.03.15", "_type": "com.example.viaq.common", "_id": "ODdiNWIyYzAtMjg5Ni0TAtNWE3MDY1MjMzNTc3", "_score": 1, "_source": { "_SOURCE_MONOTONIC_TIMESTAMP": "673396", "systemd": { "t": { "BOOT_ID": "246c34ee9cdeecb41a608e94", "MACHINE_ID": "e904a0bb5efd3e36badee0c", "TRANSPORT": "kernel" }, "u": { "SYSLOG_FACILITY": "0", "SYSLOG_IDENTIFIER": "kernel" } }, "level": "info", "message": "acpiphp: Slot [30] registered", "hostname": "localhost.localdomain", "pipeline_metadata": { "collector": { "ipaddr4": "10.128.2.12", "ipaddr6": "fe80::xx:xxxx:fe4c:5b09", "inputname": "fluent-plugin-systemd", "name": "fluentd", "received_at": "2019-03-15T20:25:06.273017+00:00", "version": "1.3.2 1.6.0" } }, "@timestamp": "2019-03-15T20:00:13.808226+00:00", "viaq_msg_id": "ODdiNWIyYzAtMYTAtNWE3MDY1MjMzNTc3" } } ] } }
You can view these alerting rules in Prometheus.
Alert | Description | Severity |
---|---|---|
ElasticsearchClusterNotHealthy |
Cluster health status has been RED for at least 2m. Cluster does not accept writes, shards may be missing or master node hasn’t been elected yet. |
critical |
ElasticsearchClusterNotHealthy |
Cluster health status has been YELLOW for at least 20m. Some shard replicas are not allocated. |
warning |
ElasticsearchBulkRequestsRejectionJumps |
High Bulk Rejection Ratio at node in cluster. This node may not be keeping up with the indexing speed. |
warning |
ElasticsearchNodeDiskWatermarkReached |
Disk Low Watermark Reached at node in cluster. Shards can not be allocated to this node anymore. You should consider adding more disk space to the node. |
alert |
ElasticsearchNodeDiskWatermarkReached |
Disk High Watermark Reached at node in cluster. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node. |
high |
ElasticsearchJVMHeapUseHigh |
JVM Heap usage on the node in cluster is <value> |
alert |
AggregatedLoggingSystemCPUHigh |
System CPU usage on the node in cluster is <value> |
alert |
ElasticsearchProcessCPUHigh |
ES process CPU usage on the node in cluster is <value> |
alert |