Getting Started with OpenTelemetry (OTel)

Last updated: April 20, 2025

1. Introduction: The Quest for Observability

Modern distributed systems, composed of microservices, serverless functions, and various infrastructure components, can be incredibly complex to understand and debug. Observability is the practice of gaining insights into these systems' internal states based on the data they generate. Traditionally, this data comes in three main forms, often called the "three pillars" of observability:

  • Metrics: Aggregated numerical data about system performance over time (e.g., CPU usage, request latency, error rates).
  • Logs: Timestamped records of discrete events occurring within the system (e.g., application start, error encountered, request received).
  • Traces: Records of the path a single request takes as it travels through different services in a distributed system, showing causality and latency breakdown.

Historically, collecting and correlating these signals required using disparate tools and vendor-specific agents, leading to integration challenges and vendor lock-in.

2. What is OpenTelemetry?

OpenTelemetry (OTel) is an open-source observability framework, created through the merger of the OpenTracing and OpenCensus projects and now hosted by the Cloud Native Computing Foundation (CNCF). Its primary goal is to standardize how telemetry data (traces, metrics, logs) is generated, collected, and exported.

OTel provides a set of vendor-neutral APIs, SDKs (Software Development Kits), and tools for instrumenting applications and infrastructure. This allows developers to instrument their code once and send the data to a variety of observability backends (open-source tools like Jaeger or Prometheus, or commercial platforms like Datadog, Dynatrace, New Relic, etc.) without being locked into a specific vendor's proprietary agents or libraries.

3. Core Concepts

3.1 Signals (Traces, Metrics, Logs)

OpenTelemetry defines standard specifications and APIs for the three primary observability signals:

  • Traces: Represent the journey of a request through a system. A trace is composed of one or more Spans. Each Span represents a unit of work (e.g., an HTTP request, a database query) with a start time, duration, attributes (metadata), and relationships to other Spans (parent/child).
  • Metrics: Quantitative measurements about the system. OTel defines various metric instruments (e.g., Counter, Gauge, Histogram) for recording numerical data.
  • Logs: Timestamped text records, potentially with structured attributes. OTel aims to correlate logs with traces and metrics by including context like Trace IDs and Span IDs.
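
To make the trace and span relationship concrete, here is a minimal sketch of the data model using only the Python standard library. `ToySpan` and its helper functions are hypothetical names invented for illustration, not the OpenTelemetry Span class, and the attribute keys are examples only.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToySpan:
    """Illustrative stand-in for a span; not the OpenTelemetry Span class."""
    name: str
    trace_id: str                  # shared by every span in one trace
    span_id: str                   # unique per span
    parent_span_id: Optional[str]  # None marks the root span
    start_ns: int
    end_ns: int
    attributes: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def new_trace_id() -> str:
    return secrets.token_hex(16)   # 16 random bytes -> 32 hex characters


def new_span_id() -> str:
    return secrets.token_hex(8)    # 8 random bytes -> 16 hex characters


def children_of(spans: List[ToySpan], parent: ToySpan) -> List[ToySpan]:
    """Return the direct children of `parent` within the same trace."""
    return [s for s in spans
            if s.trace_id == parent.trace_id and s.parent_span_id == parent.span_id]


# One trace: an HTTP request (root span) that issues a database query (child span).
trace_id = new_trace_id()
root = ToySpan("GET /checkout", trace_id, new_span_id(), None,
               0, 50_000_000, {"http.request.method": "GET"})
db = ToySpan("SELECT orders", trace_id, new_span_id(), root.span_id,
             10_000_000, 30_000_000, {"db.operation": "SELECT"})
spans = [root, db]
```

Both spans carry the same trace ID, and the child points at its parent through `parent_span_id`. That link is what lets a backend rebuild the call tree and the latency breakdown.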

3.2 APIs and SDKs

  • API (Application Programming Interface): Defines how application code interacts with OpenTelemetry to create Spans, record Metrics, or emit Logs. Application code should ideally only depend on the API.
  • SDK (Software Development Kit): The concrete implementation of the API for a specific language. It handles concerns like sampling, processing, and exporting telemetry data. Different SDKs exist for various languages (Go, Python, Java, .NET, Rust, etc.).

This separation allows application code to remain vendor-agnostic while the SDK configuration determines where and how the data is sent.
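
The separation can be sketched with a toy model. The class and function names below (`NoOpTracer`, `RecordingTracer`, `set_tracer`, `get_tracer`) are hypothetical and are not the real OpenTelemetry API. They only mirror the idea that application code calls an interface that does nothing until an SDK implementation is installed.

```python
from contextlib import contextmanager
from typing import Iterator, List


class NoOpTracer:
    """API-level default: accepts calls and records nothing."""

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        yield


class RecordingTracer(NoOpTracer):
    """SDK-level implementation: same interface, but keeps finished span names."""

    def __init__(self) -> None:
        self.finished: List[str] = []

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        try:
            yield
        finally:
            self.finished.append(name)


class _Registry:
    """Holds the process-wide tracer; starts out as the no-op default."""
    tracer: NoOpTracer = NoOpTracer()


def set_tracer(tracer: NoOpTracer) -> None:
    _Registry.tracer = tracer


def get_tracer() -> NoOpTracer:
    return _Registry.tracer


def handle_request() -> str:
    """Application code: talks to the API only, unaware of any SDK."""
    with get_tracer().start_span("handle_request"):
        return "ok"


handle_request()            # works with the no-op default, records nothing
sdk_tracer = RecordingTracer()
set_tracer(sdk_tracer)      # done once at startup by the SDK setup code
handle_request()            # the same application code is now recorded
```

Note that `handle_request` never changes. Only the startup code decides whether telemetry is recorded and where it goes.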

3.3 Collector

The OpenTelemetry Collector is an optional but highly recommended component. It is a vendor-agnostic service, deployable as an agent alongside each application or as a standalone gateway, that can receive telemetry data (via OTLP or other formats such as Jaeger or Prometheus), process it (e.g., filter, batch, add attributes), and export it to one or more observability backends.

Using a Collector simplifies instrumentation (applications just send data to the Collector) and provides a central point for managing data processing and routing.
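
The receive, process, export flow can be modelled in a few lines. This is a toy sketch of the idea, not how the Collector is implemented or configured. The record shape, the processor functions, and `run_pipeline` are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Processor = Callable[[List[Record]], List[Record]]
Exporter = Callable[[List[Record]], None]


def drop_health_checks(records: List[Record]) -> List[Record]:
    """Filter step: discard noisy health-check spans."""
    return [r for r in records if r.get("name") != "GET /healthz"]


def add_environment(records: List[Record]) -> List[Record]:
    """Enrichment step: stamp every record with a deployment attribute."""
    return [{**r, "deployment.environment": "staging"} for r in records]


def run_pipeline(received: List[Record],
                 processors: List[Processor],
                 exporters: List[Exporter],
                 batch_size: int = 2) -> None:
    """Apply processors in order, then send fixed-size batches to every exporter."""
    records = received
    for process in processors:
        records = process(records)
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for export in exporters:
            export(batch)


# Two in-memory lists stand in for two observability backends.
backend_a: List[List[Record]] = []
backend_b: List[List[Record]] = []

run_pipeline(
    received=[{"name": "GET /cart"}, {"name": "GET /healthz"},
              {"name": "POST /order"}, {"name": "GET /cart"}],
    processors=[drop_health_checks, add_environment],
    exporters=[backend_a.append, backend_b.append],
)
```

Because filtering, enrichment, and fan-out live in the pipeline, the applications that produce the records need no knowledge of which backends exist.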

3.4 OTLP

The OpenTelemetry Protocol (OTLP) is the native wire protocol for OTel. It defines how telemetry data (traces, metrics, logs) should be encoded and transmitted between SDKs, Collectors, and backends.
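
OTLP payloads are most often Protobuf messages sent over gRPC or HTTP, but the protocol also has a JSON encoding that is easier to read. The sketch below builds a one-span payload in that JSON shape. The function name, the scope name, and the sample values are assumptions for illustration, and the code builds the payload without sending it anywhere.

```python
import json
from typing import Any, Dict


def otlp_json_trace_payload(service_name: str, span_name: str,
                            trace_id: str, span_id: str,
                            start_ns: int, end_ns: int) -> Dict[str, Any]:
    """Build a one-span payload shaped like an OTLP/JSON trace export request."""
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "example.instrumentation"},  # hypothetical scope name
                "spans": [{
                    "traceId": trace_id,   # 32 hex characters
                    "spanId": span_id,     # 16 hex characters
                    "name": span_name,
                    "kind": 2,             # 2 = SPAN_KIND_SERVER in the OTLP schema
                    # 64-bit integers are written as strings in the JSON encoding.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


payload = otlp_json_trace_payload(
    service_name="checkout",
    span_name="GET /checkout",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_000_050_000_000,
)
body = json.dumps(payload)  # the bytes an OTLP/HTTP client would put in the request body
```

The nesting mirrors the data model: a resource (the service) owns instrumentation scopes, and each scope owns the spans it produced.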

3.5 Context Propagation

Crucial for distributed tracing, context propagation is the mechanism by which OTel ensures that trace information (like Trace IDs and Span IDs) is carried across service boundaries (e.g., in HTTP headers). This allows Spans generated in different services for the same request to be linked together into a single Trace.
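
OTel's default propagator uses the W3C Trace Context `traceparent` header, whose value has the form `version-traceid-parentid-flags`. The sketch below hand-rolls injection and extraction for version `00` only, to show what travels between services. The `inject` and `extract` functions are simplified stand-ins rather than the OTel propagator API, and they assume header names are already lower-cased.

```python
import re
from typing import Dict, Optional, Tuple

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: Dict[str, str], trace_id: str, span_id: str, sampled: bool) -> None:
    """Caller side: write the current span's identity into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Callee side: return (trace_id, parent_span_id, sampled), or None if absent/invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, parent_span_id, bool(int(flags, 16) & 0x01)


# Service A adds the header to its outgoing request; service B reads it back.
outgoing: Dict[str, str] = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", sampled=True)
context = extract(outgoing)
```

Service B would start its own span with the extracted trace ID and use the extracted span ID as its parent, which is how spans from separate processes end up in one trace.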

4. Benefits of Using OpenTelemetry

  • Standardization: Provides a unified way to generate telemetry across different languages and platforms.
  • Vendor Neutrality: Avoids lock-in to specific observability platforms; switch backends without re-instrumenting code.
  • Comprehensive Observability: Integrates traces, metrics, and logs for a holistic view of system behavior.
  • Rich Ecosystem: Growing number of integrations, instrumentation libraries for common frameworks/libraries, and backend support.
  • Flexibility: Can export data directly from SDKs or use the powerful Collector for advanced processing and routing.

5. Conceptual Getting Started Flow

While specifics vary by language, the general process involves:

  1. Add Dependencies: Include the OTel API and SDK packages for your language in your project.
  2. Initialize SDK: Configure the SDK, typically setting up a TracerProvider, MeterProvider, and/or LoggerProvider. This often includes defining resource attributes (like service name, version).
  3. Configure Exporter: Specify where the telemetry data should be sent. Options include:
    • Console Exporter (for local testing).
    • OTLP Exporter (to send data to an OTel Collector or OTLP-compatible backend).
    • Backend-specific exporters (e.g., Prometheus or Zipkin). Jaeger now ingests OTLP natively, so the OTLP exporter is the usual route to it.
  4. Instrument Code:
    • Automatic Instrumentation: Use language-specific agents or libraries that automatically instrument common frameworks (e.g., web servers, HTTP clients, database drivers).
    • Manual Instrumentation: Use the OTel API directly in your code to create custom Spans, record Metrics, or emit Logs with context.
  5. Run Application: Execute your instrumented application.
  6. (Optional) Deploy Collector: Configure and run an OTel Collector to receive data from your application(s) and forward it to your chosen backend(s).
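
Steps 2 and 3 are often driven by environment variables rather than code. The variable names below (`OTEL_SERVICE_NAME`, `OTEL_RESOURCE_ATTRIBUTES`, `OTEL_TRACES_EXPORTER`, `OTEL_EXPORTER_OTLP_ENDPOINT`) come from the OpenTelemetry specification. The parsing functions are a hypothetical sketch of how a setup routine might read them, and the fallback values are assumptions for this example rather than guaranteed defaults of any particular SDK.

```python
import os
from typing import Dict, Mapping


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse 'key1=value1,key2=value2' into a dict, skipping malformed items."""
    attributes: Dict[str, str] = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            attributes[key.strip()] = value.strip()
    return attributes


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Collect resource attributes and exporter settings from the environment."""
    resource = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    # OTEL_SERVICE_NAME takes precedence over service.name in the attribute list.
    if "OTEL_SERVICE_NAME" in env:
        resource["service.name"] = env["OTEL_SERVICE_NAME"]
    return {
        "resource": resource,
        # Fallbacks below are assumptions for this sketch.
        "traces_exporter": env.get("OTEL_TRACES_EXPORTER", "otlp"),
        "otlp_endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318"),
    }


settings = load_settings({
    "OTEL_SERVICE_NAME": "checkout",
    "OTEL_RESOURCE_ATTRIBUTES": "service.name=ignored,service.version=1.2.3",
    "OTEL_TRACES_EXPORTER": "console",
})
```

Keeping these values in the environment means the same build can print spans to the console on a laptop and send OTLP to a Collector in production.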

6. Integration with Backends

OpenTelemetry itself is not an observability backend; it focuses on generation and collection. You need a backend system to store, analyze, and visualize the data. Popular choices include:

  • Open Source: Jaeger (Tracing), Prometheus (Metrics), Grafana Tempo/Mimir/Loki (Traces/Metrics/Logs).
  • Commercial Platforms: Datadog, Dynatrace, New Relic, Honeycomb, Splunk Observability Cloud, etc. (Most now support OTLP).

The Collector plays a key role in exporting data to these diverse backends.

7. Conclusion

OpenTelemetry is rapidly becoming the industry standard for instrumenting cloud-native applications and infrastructure. By providing vendor-neutral APIs and SDKs for traces, metrics, and logs, it simplifies observability and empowers developers to gain deeper insights into their systems without vendor lock-in.

Getting started involves instrumenting your application using the appropriate SDK and configuring an exporter, often leveraging the flexibility of the OpenTelemetry Collector. Embracing OTel allows you to build more observable systems and choose the backend tools that best fit your needs.
