Monitoring Octopus

Octopus is built on sigs.k8s.io/controller-runtime, so some of its metrics come from controller-runtime and client-go. In addition, github.com/prometheus/client_golang provides metrics about the Go runtime and process state.

Metrics Category

In the "Type" column, a single letter abbreviates the metric type: G - Gauge, C - Counter, H - Histogram, S - Summary.

Exposing from Controller Runtime

Controller metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| C | controller_runtime_reconcile_total | Total number of reconciliations per controller. | |
| C | controller_runtime_reconcile_errors_total | Total number of reconciliation errors per controller. | |
| H | controller_runtime_reconcile_time_seconds | Length of time per reconciliation per controller. | |
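Outside of Prometheus, these counters can be read straight off the /metrics endpoint. The snippet below computes a per-controller error ratio from a fabricated exposition sample with awk; in a live cluster the input would come from a scrape such as `curl http://<pod-ip>:8080/metrics` (the `controller="device"` label and the values are illustrative).

```shell
# Fabricated exposition lines in the format served on /metrics;
# in practice they would come from scraping the brain or limb pod.
metrics='controller_runtime_reconcile_total{controller="device"} 200
controller_runtime_reconcile_errors_total{controller="device"} 10'

# Error ratio = reconcile_errors_total / reconcile_total.
ratio=$(echo "$metrics" | awk '
  /reconcile_errors_total/       { errors = $2 }
  /reconcile_total/ && !/errors/ { total  = $2 }
  END { printf "%.2f", errors / total }
')
echo "reconcile error ratio: $ratio"   # 10 / 200 = 0.05
```

In Prometheus itself the equivalent would be a ratio of the two counters' rates rather than of their raw values.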

Webhook metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| H | controller_runtime_webhook_latency_seconds | Histogram of the latency of processing admission requests. | |

Exposing from Kubernetes client

Rest client metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| C | rest_client_requests_total | Number of HTTP requests, partitioned by status code, method, and host. | |
| H | rest_client_request_latency_seconds | Request latency in seconds, broken down by verb and URL. | |

Workqueue metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| G | workqueue_depth | Current depth of workqueue. | |
| G | workqueue_unfinished_work_seconds | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. | |
| G | workqueue_longest_running_processor_seconds | How many seconds the longest running processor for workqueue has been running. | |
| C | workqueue_adds_total | Total number of adds handled by workqueue. | |
| C | workqueue_retries_total | Total number of retries handled by workqueue. | |
| H | workqueue_queue_duration_seconds | How long in seconds an item stays in workqueue before being requested. | |
| H | workqueue_work_duration_seconds | How long in seconds processing an item from workqueue takes. | |
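The stuck-thread deduction mentioned for workqueue_unfinished_work_seconds works because the gauge sums the in-progress time of all workers: if it climbs by roughly N seconds per elapsed second, about N processors are wedged. A minimal sketch with fabricated sample values:

```shell
# Two fabricated samples of workqueue_unfinished_work_seconds, 30 s apart.
sample_t0=12
sample_t1=72
interval=30

# Growth rate of the gauge approximates the number of stuck processors:
# (72 - 12) / 30 = 2 seconds of unfinished work accrued per second.
stuck=$(( (sample_t1 - sample_t0) / interval ))
echo "approximately $stuck stuck processor(s)"
```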

Exposing from Prometheus client

Go runtime metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| G | go_goroutines | Number of goroutines that currently exist. | |
| G | go_threads | Number of OS threads created. | |
| G | go_info | Information about the Go environment. | |
| S | go_gc_duration_seconds | A summary of the pause duration of garbage collection cycles. | |
| G | go_memstats_alloc_bytes | Number of bytes allocated and still in use. | |
| C | go_memstats_alloc_bytes_total | Total number of bytes allocated, even if freed. | |
| G | go_memstats_sys_bytes | Number of bytes obtained from system. | |
| C | go_memstats_lookups_total | Total number of pointer lookups. | |
| C | go_memstats_mallocs_total | Total number of mallocs. | |
| C | go_memstats_frees_total | Total number of frees. | |
| G | go_memstats_heap_alloc_bytes | Number of heap bytes allocated and still in use. | |
| G | go_memstats_heap_sys_bytes | Number of heap bytes obtained from system. | |
| G | go_memstats_heap_idle_bytes | Number of heap bytes waiting to be used. | |
| G | go_memstats_heap_inuse_bytes | Number of heap bytes that are in use. | |
| G | go_memstats_heap_released_bytes | Number of heap bytes released to OS. | |
| G | go_memstats_heap_objects | Number of allocated objects. | |
| G | go_memstats_stack_inuse_bytes | Number of bytes in use by the stack allocator. | |
| G | go_memstats_stack_sys_bytes | Number of bytes obtained from system for stack allocator. | |
| G | go_memstats_mspan_inuse_bytes | Number of bytes in use by mspan structures. | |
| G | go_memstats_mspan_sys_bytes | Number of bytes used for mspan structures obtained from system. | |
| G | go_memstats_mcache_inuse_bytes | Number of bytes in use by mcache structures. | |
| G | go_memstats_mcache_sys_bytes | Number of bytes used for mcache structures obtained from system. | |
| G | go_memstats_buck_hash_sys_bytes | Number of bytes used by the profiling bucket hash table. | |
| G | go_memstats_gc_sys_bytes | Number of bytes used for garbage collection system metadata. | |
| G | go_memstats_other_sys_bytes | Number of bytes used for other system allocations. | |
| G | go_memstats_next_gc_bytes | Number of heap bytes when next garbage collection will take place. | |
| G | go_memstats_last_gc_time_seconds | Number of seconds since 1970 of last garbage collection. | |
| G | go_memstats_gc_cpu_fraction | The fraction of this program's available CPU time used by the GC since the program started. | |

Running process metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| C | process_cpu_seconds_total | Total user and system CPU time spent in seconds. | |
| G | process_open_fds | Number of open file descriptors. | |
| G | process_max_fds | Maximum number of open file descriptors. | |
| G | process_virtual_memory_bytes | Virtual memory size in bytes. | |
| G | process_virtual_memory_max_bytes | Maximum amount of virtual memory available in bytes. | |
| G | process_resident_memory_bytes | Resident memory size in bytes. | |
| G | process_start_time_seconds | Start time of the process since unix epoch in seconds. | |

Exposing from Octopus

Limb metrics

| Type | Name | Description | Usage |
| ---- | ---- | ----------- | ----- |
| G | limb_connect_connections | Number of connections currently established to adaptors. | |
| C | limb_connect_errors_total | Total number of errors connecting to adaptors. | |
| C | limb_send_errors_total | Total number of errors sending desired device state to adaptors. | |
| H | limb_send_latency_seconds | Histogram of the latency of sending desired device state to adaptors. | |
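Since limb_send_latency_seconds is a Prometheus histogram, it also exposes companion _sum and _count series, so the mean send latency is simply the sum divided by the count. A sketch over fabricated values (the adaptor label below is illustrative):

```shell
# Fabricated _sum/_count series of the limb_send_latency_seconds histogram.
metrics='limb_send_latency_seconds_sum{adaptor="dummy"} 1.5
limb_send_latency_seconds_count{adaptor="dummy"} 30'

# Mean latency = total observed seconds / number of observations.
avg=$(echo "$metrics" | awk '
  /_sum/   { sum   = $2 }
  /_count/ { count = $2 }
  END { printf "%.3f", sum / count }
')
echo "mean send latency: ${avg}s"   # 1.5 / 30 = 0.050
```

The same sum/count pattern applies to the controller-runtime and webhook histograms listed earlier.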

Monitor

By default, the metrics are exposed on port 8080 (see brain options and limb options). They can be collected by Prometheus and visually analyzed through Grafana. Octopus provides a ServiceMonitor definition YAML to integrate with the Prometheus Operator, which makes it easy to configure and manage Prometheus instances.
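The ServiceMonitor definition shipped by Octopus is authoritative; as a rough sketch of what such a definition looks like, a ServiceMonitor selects the Services exposing the metrics port (the names, namespaces, label selector, and port name below are illustrative assumptions, not the actual Octopus values):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: octopus-monitoring          # illustrative name
  namespace: octopus-monitoring
spec:
  namespaceSelector:
    matchNames:
      - octopus-system              # namespace of the monitored Services (assumed)
  selector:
    matchLabels:
      app.kubernetes.io/name: octopus   # label assumed; must match the metrics Service
  endpoints:
    - port: metrics                 # port name assumed; must match the Service's port name
      path: /metrics
```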

Grafana Dashboard

For convenience, Octopus provides a Grafana Dashboard to visualize the monitoring metrics.


Integrate with Prometheus Operator

Using the prometheus-operator Helm chart, you can easily set up a Prometheus Operator to monitor Octopus. The following steps demonstrate how to run the Prometheus Operator on a local Kubernetes cluster:

  1. Use cluster-k3d-spinup.sh to set up a local Kubernetes cluster via k3d.
  2. Follow the Helm installation guide to install the helm tool, and then fetch the prometheus-operator chart to the local /tmp directory via helm fetch --untar --untardir /tmp stable/prometheus-operator.
  3. Generate a deployment YAML from the prometheus-operator chart as below:
    helm template --namespace octopus-monitoring \
    --name octopus \
    --set defaultRules.create=false \
    --set global.rbac.pspEnabled=false \
    --set prometheusOperator.admissionWebhooks.patch.enabled=false \
    --set prometheusOperator.admissionWebhooks.enabled=false \
    --set prometheusOperator.kubeletService.enabled=false \
    --set prometheusOperator.tlsProxy.enabled=false \
    --set prometheusOperator.serviceMonitor.selfMonitor=false \
    --set alertmanager.enabled=false \
    --set grafana.defaultDashboardsEnabled=false \
    --set coreDns.enabled=false \
    --set kubeApiServer.enabled=false \
    --set kubeControllerManager.enabled=false \
    --set kubeEtcd.enabled=false \
    --set kubeProxy.enabled=false \
    --set kubeScheduler.enabled=false \
    --set kubeStateMetrics.enabled=false \
    --set kubelet.enabled=false \
    --set nodeExporter.enabled=false \
    --set prometheus.serviceMonitor.selfMonitor=false \
    --set prometheus.ingress.enabled=true \
    --set prometheus.ingress.hosts={localhost} \
    --set prometheus.ingress.paths={/prometheus} \
    --set prometheus.ingress.annotations.'traefik\.ingress\.kubernetes\.io\/rewrite-target'=/ \
    --set prometheus.prometheusSpec.externalUrl=http://localhost/prometheus \
    --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
    --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
    --set prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues=false \
    --set grafana.adminPassword=admin \
    --set grafana.rbac.pspUseAppArmor=false \
    --set grafana.rbac.pspEnabled=false \
    --set grafana.serviceMonitor.selfMonitor=false \
    --set grafana.testFramework.enabled=false \
    --set grafana.ingress.enabled=true \
    --set grafana.ingress.hosts={localhost} \
    --set grafana.ingress.path=/grafana \
    --set grafana.ingress.annotations.'traefik\.ingress\.kubernetes\.io\/rewrite-target'=/ \
    --set grafana.'grafana\.ini'.server.root_url=http://localhost/grafana \
    /tmp/prometheus-operator > /tmp/prometheus-operator_all_in_one.yaml
  4. Create the octopus-monitoring Namespace via kubectl create ns octopus-monitoring.
  5. Apply the prometheus-operator all-in-one deployment into the local cluster via kubectl apply -f /tmp/prometheus-operator_all_in_one.yaml.
  6. Apply the Octopus all-in-one deployment via kubectl apply -f https://raw.githubusercontent.com/cnrancher/octopus/master/deploy/e2e/all_in_one.yaml.
  7. Apply the monitoring integration into the local cluster via kubectl apply -f https://raw.githubusercontent.com/cnrancher/octopus/master/deploy/e2e/integrate_with_prometheus_operator.yaml.
  8. Visit http://localhost/prometheus to view the Prometheus web console in a browser, or http://localhost/grafana to view the Grafana console (the administrator credentials are admin/admin).
  9. (Optional) Import the Octopus Overview dashboard from Grafana console.