Kofi Gyasi

2026-05-02

Stop Logging Everything and Start Telling Stories

  • logging
  • observability
  • OpenTelemetry
  • Datadog
  • tracing
  • .NET
  • SRE

How poor logging quality quietly costs your team more than poor logging quantity ever could — and how to fix it with logs, traces, and spans working together.

Observability · structured logging · traces & spans · OpenTelemetry · Datadog (as the concrete example) · ~15 min read · Backend & reliability

In this post

  1. Why this matters
  2. The problem — logs that do not tell a story
  3. What quality logs look like
  4. Where logs end and traces begin
  5. Cost, sampling, and the observability layer cake
  6. Automating the standard with an AI skill

1. Why this matters

Recently I found myself in a frustrating but familiar situation: sitting in front of a Datadog dashboard in the middle of a working day, trying to diagnose a production issue with my team, and realising that the logs we had were not telling us anything useful. Not because there were too few of them — there were plenty. But because the ones that existed did not answer the questions we were actually asking. Which path did this request take? Why did it behave differently for this particular user? What did the payment provider actually return before this order ended up in a failed state?

It was not an isolated incident. The same problem kept surfacing whenever we needed insights into a specific trace or event — during mid-day debugging sessions, during stakeholder queries, during post-mortems. The logs were present. The story was not.

That experience pushed me to think more carefully about what quality logging and observability actually mean — not just "log more things" but "log the right things, in the right way, with the right tools sharing the load." This post is what I found.

It is about the difference between logs that generate noise and logs that tell stories. It is about how traces and spans can carry the burden that developers mistakenly put on logs. And it is about how to approach all of this without burning your observability budget on lines that no one will ever query.

The core mistake
Treating logs as the only observability surface. That pushes teams to either under-log (missing crucial context) or over-log (drowning in noise and cost) — and still leaves execution shape and timing invisible without traces.

2. The problem — logs that do not tell a story

Ask yourself honestly: if a critical bug appeared in your production system right now, could you open your log aggregator, find a single request, and reconstruct the complete sequence of events that led to the failure? Could you answer: what came in, what decision was made, which external system was called, what it returned, and what happened next?

If the answer is "probably not" or "it depends on the feature", you have a log quality problem.

This usually manifests in a few recognisable patterns.

The vague confirmation. Logs that tell you something happened but not what or why.

// What order? What status? What changed?
_logger.LogInformation("Order updated");

// What failed? In what context? What was being attempted?
_logger.LogError("Something went wrong");

The context vacuum. Logs that contain a message but no entity identifiers, no correlation IDs, nothing to tie a log line to a specific user, request, or record.

// Which user? Which product? From where?
_logger.LogInformation("User viewed product");

The missing decision. Code that takes a branching path — a cache hit, a feature flag, a fallback strategy — and logs nothing about why it took that path. When something goes wrong, you have no idea which branch the code was in.
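A sketch of what that looks like (the flag and engine names are hypothetical): a fallback branch executes, and nothing records which path actually ran.

// ❌ Which pricing engine handled this basket? The logs will never say.
if (await _featureFlags.IsEnabledAsync("new-pricing-engine"))
{
    return await _newPricingEngine.CalculateAsync(basket);
}
return await _legacyPricingEngine.CalculateAsync(basket);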

The silent catch. Exception handlers that catch errors and log only a generic message, or worse, swallow them entirely. By the time you need that stack trace, it is gone.
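For illustration, a hypothetical handler that does exactly this: the exception object, and with it the stack trace, never reaches the logs.

// ❌ The exception is caught, the stack trace is discarded
try
{
    await _paymentGateway.ChargeAsync(order);
}
catch (Exception)
{
    _logger.LogError("Payment failed");
}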

The flood with no signal. Teams that log aggressively at every function entry and exit, filling their log aggregator with noise, and then find that the one log line they actually needed — the one that captures the specific business context at the moment of failure — is not there.

Every one of these patterns has the same root cause: the logs were written for the developer who wrote the code, not for the engineer who will debug it six months later with no context.

3. What quality logs look like

A quality log line answers four questions: what happened, to which entity, in what context, and why — in a format that a machine can index and a human can read.

Structured, not narrated

Unstructured log messages are human-readable and machine-unfriendly. Structured logs with named properties are both.

// ❌ Unstructured — readable but unsearchable
_logger.LogInformation($"Payment of {amount} processed for order {orderId}");

// ✅ Structured — readable AND queryable in Datadog
_logger.LogInformation(
    "Payment processed. {OrderId} {UserId} {AmountGbp} {Provider}",
    order.Id, order.UserId, order.Amount, provider.Name);

The difference matters enormously at scale. With structured logs you can query @OrderId:12345 in Datadog and find every log line related to that order across every service. With a flat string, you are running a regex against millions of lines.

Log the decision, not just the fact

When your code takes a path, log the reason — especially at branching points.

// ✅ Cache hit — log the reason
_logger.LogInformation(
    "Product catalogue served from cache. {CacheKey} {CacheAgeSeconds}",
    cacheKey, age.TotalSeconds);

// ✅ Cache miss — log the reason
_logger.LogWarning(
    "Cache miss for product catalogue. Falling back to DB. {CacheKey}",
    cacheKey);

Now when you see a latency spike, you can immediately tell whether it was caused by cache misses — because the decision is in the log.

Log at boundaries, not internals

A common mistake is logging inside every private helper method. This produces volume without signal. Instead, log at the boundaries of your system: where data enters, where decisions are made, where external systems are called, and where operations succeed or fail.

The boundaries that matter are: HTTP handlers, service layer entry and exit for non-trivial operations, database calls, outbound HTTP calls, message consumers, and background jobs.
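As a sketch (the operation and helper names are illustrative), here is a service-layer boundary logged at entry and exit, with the private helpers it calls left silent:

public async Task<ShipmentResult> DispatchOrderAsync(Guid orderId)
{
    _logger.LogInformation("Order dispatch started. {OrderId}", orderId);

    // Internal helpers stay quiet; the boundary carries the story
    var order = await LoadOrderAsync(orderId);
    var label = await CreateShippingLabelAsync(order);
    var result = await HandOverToCarrierAsync(order, label);

    _logger.LogInformation(
        "Order dispatch completed. {OrderId} {Carrier} {TrackingNumber}",
        orderId, result.Carrier, result.TrackingNumber);

    return result;
}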

Log with the right severity

Log levels are not decoration. Using them correctly is what makes alerting, filtering, and triage possible.

  • Trace: Step-by-step execution detail. Development only, never on in production.
  • Debug: Diagnostic data for investigation. Off by default in prod; enable dynamically when needed.
  • Info: Normal, expected business events.
  • Warn: Something unexpected happened but the system recovered.
  • Error: An operation failed and could not recover.
  • Critical: The system is in an unrecoverable state.

When everything is Info or Error, you cannot filter. When your alerts fire on Error, you want to know that every Error log line represents a real failure worth waking someone up over.

Exceptions deserve full context

Every caught exception must produce a log. Not a string. Not a message. A log entry that includes the exception object itself, the relevant entity IDs, and a description of what operation was being attempted.

// ❌ Useless in production
_logger.LogError("Payment failed");

// ✅ Actionable — exception + context + entity IDs
_logger.LogError(ex,
    "Payment processing failed after retry. {OrderId} {PaymentProvider} {AttemptCount}",
    orderId, provider.Name, attemptCount);

4. Where logs end and traces begin

Tooling
The examples below use Datadog as the APM and log backend, and OpenTelemetry with .NET (ActivitySource), because that is the stack the examples in this post are written against. The ideas are vendor-agnostic: structured logs, trace–log correlation, span attributes instead of narrative log spam, and metrics for aggregates. You can apply the same split of responsibilities with Grafana, Honeycomb, New Relic, Jaeger, Tempo, AWS X-Ray, Google Cloud Trace, or any stack that gives you traces, searchable logs, and dashboards.

Here is the architectural mistake that leads both to poor observability and to unnecessary cost: putting the entire observability burden on logs.

Logs are excellent at recording discrete events — something happened at a specific moment in time, with specific context. They are poor at representing the shape of an execution: how long each step took, which path through the system a request travelled, where the bottleneck was in a complex workflow.

That is what distributed tracing is for.

The three pillars working together

A mature observability setup divides responsibility across three tools:

Logs answer: what happened and why? Business events, decision points, errors with full context.

Traces and spans answer: what was the execution path and how long did each step take? The complete lifecycle of a request, from entry point through every service, DB call, and external API.

Metrics answer: how is the system performing overall? Request rates, error rates, latency percentiles, resource utilisation.

When these three work together, a single production incident becomes navigable: metrics alert you that something is wrong, traces show you which service and which operation is the bottleneck, and logs give you the specific business context at the moment of failure.

When you rely only on logs for all three jobs, you end up either under-logging (missing crucial context) or over-logging (drowning in noise and racking up a significant Datadog bill).

What a span actually is

A span represents a unit of work: a database query, an outbound HTTP call, a message being processed, a business operation executing. Spans have a start time, an end time, a status, and a set of key-value attributes.

In .NET with OpenTelemetry, a span looks like this:

using var activity = Telemetry.Source.StartActivity("ProcessOrder");

// Attach domain context as attributes
activity?.SetTag("order.id", orderId.ToString());
activity?.SetTag("order.user_id", userId.ToString());

try
{
    var result = await ExecuteOrderAsync(orderId);

    activity?.SetTag("order.final_status", result.Status.ToString());
    activity?.SetStatus(ActivityStatusCode.Ok);
    return result;
}
catch (Exception ex)
{
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    activity?.RecordException(ex);
    throw;
}

Spans are collected by the OpenTelemetry SDK and exported to your observability backend — in this case Datadog — where they appear as a flame graph showing the complete execution tree of a request.

Span attributes replace a category of log lines

Here is the shift in thinking that changes how you log: anything that describes the shape, path, or timing of an execution belongs on a span attribute, not a log line.

Consider an order processing workflow. Without tracing, you might log:

_logger.LogInformation("Starting order processing. {OrderId}", orderId);
_logger.LogInformation("Fetched order from DB. {OrderId} took 45ms", orderId);
_logger.LogInformation("Calling payment provider. {OrderId} {Provider}", orderId, provider);
_logger.LogInformation("Payment provider responded. {OrderId} took 230ms", orderId);
_logger.LogInformation("Order saved. {OrderId}", orderId);

These are five log lines per order. At scale, that is expensive. And it still does not give you a visual flame graph.

With tracing:

  • The DB call latency (45ms) is a span attribute
  • The payment provider call latency (230ms) is a child span with its own timing
  • The total operation duration is the parent span
  • The OrderId and Provider are span attributes queryable in Datadog APM

You eliminate four of those five log lines and gain a richer picture of the execution.
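A sketch of that shift (helper names are illustrative), with the same workflow carried by a parent span, a child span, and a single remaining log line:

using var activity = Telemetry.Source.StartActivity("ProcessOrder");
activity?.SetTag("order.id", orderId.ToString());

// DB fetch: its latency shows up as span duration, via auto-instrumentation or a child span
var order = await FetchOrderAsync(orderId);

// Payment call as an explicit child span with its own timing and attributes
using (var paymentSpan = Telemetry.Source.StartActivity("ChargePayment"))
{
    paymentSpan?.SetTag("payment.provider", provider.Name);
    await ChargePaymentAsync(order, provider);
}

await SaveOrderAsync(order);

// The one log line worth keeping: the business outcome with full context
_logger.LogInformation("Order processed. {OrderId} {Provider}", orderId, provider.Name);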

Span events for mid-span milestones

For notable milestones within a long-running operation — validation passed, a retry was triggered, an important threshold was crossed — use span events rather than log lines:

activity?.AddEvent(new ActivityEvent("validation_passed",
    tags: new ActivityTagsCollection
    {
        { "rules_evaluated", ruleCount },
        { "validation_duration_ms", duration.TotalMilliseconds }
    }));

Span events are timestamped annotations attached to a span. They appear in the trace view in Datadog and do not contribute to log ingestion costs.

5. Cost, sampling, and the observability layer cake

None of this happens in a vacuum. Logs and traces cost money, and in platforms like Datadog, the cost can grow surprisingly fast.

Where the cost comes from

Datadog charges for log ingestion (volume of data sent) and log indexing (what you make queryable and searchable). Traces have their own pricing based on ingested spans. The more you log and trace, the more you pay — but crucially, the relationship between cost and value is not linear. You can pay a lot for logs that add very little observability, or pay moderately for logs that give you everything you need.

The principle of appropriate tool for appropriate job

Every log line you write should be justifiable on its own merits:

  • Does this log line answer a question that a trace or metric cannot?
  • Is this the right level — would anyone ever filter for this in an incident?
  • Should this be logged at Debug, so that it is off in production by default?

The same test applies to every span attribute you attach:

  • Is this data that would help isolate a performance problem or a business logic failure?
  • Is this already captured by auto-instrumentation?

OpenTelemetry provides automatic instrumentation for most frameworks — incoming HTTP requests, outbound HTTP calls, database queries via ORM. These are captured for free without a single line of manual instrumentation code. Before you write a manual span for a DB call, check whether AddEntityFrameworkCoreInstrumentation() (or its equivalent in your stack) is already capturing it.

Sampling for high-volume Debug logs

For services that handle very high request volumes, Debug-level logging in production is expensive even if the level is normally off, because you pay for it the moment you enable it. Use log sampling for scenarios where you need diagnostic detail but not for every request:

// Log debug detail for approximately 1% of requests
if (_random.NextDouble() < 0.01)
{
    _logger.LogDebug("Detailed cart state. {UserId} {ItemCount} {CartValue}",
        userId, cart.Items.Count, cart.TotalValue);
}

Datadog also supports dynamic log level changes at runtime without redeployment, which means you can turn Debug on temporarily for a specific service during an investigation, then turn it off — paying only for the window you need it.

The observability layer cake in one request

A well-instrumented request through a backend service looks like this:

At the trace level: Datadog APM shows a flame graph. The parent span is ProcessOrder, taking 320ms total. Child spans show FetchOrder (DB, 45ms), ChargePayment (external HTTP, 230ms), SaveOrder (DB, 12ms). Every span has order.id, user.id, and outcome status as attributes.

At the log level: Three log lines exist for this request: entry (Order processing started. OrderId=123), a business decision (Payment provider selected based on currency. Provider=Stripe OrderId=123), and exit (Order processing completed. OrderId=123 Status=Confirmed). All three carry TraceId and SpanId automatically via the OTel log bridge, so from any log line you can jump directly to the trace in Datadog.

At the metric level: Counters and histograms capture order throughput, payment success rate, and p99 latency — none of which requires a log line.
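A sketch of that metric layer in .NET with System.Diagnostics.Metrics (the meter name, instrument names, and tag are illustrative); registering the meter name with the OpenTelemetry metrics pipeline (AddMeter) exports these alongside the traces:

using System.Diagnostics.Metrics;

public static class OrderMetrics
{
    private static readonly Meter Meter = new("YourCompany.Orders", "1.0.0");

    private static readonly Counter<long> OrdersProcessed =
        Meter.CreateCounter<long>("orders.processed");

    private static readonly Histogram<double> ProcessingDurationMs =
        Meter.CreateHistogram<double>("orders.processing.duration_ms");

    public static void Record(string status, double durationMs)
    {
        // Counter tagged with the outcome; histogram feeds latency percentiles
        OrdersProcessed.Add(1, new KeyValuePair<string, object?>("order.status", status));
        ProcessingDurationMs.Record(durationMs);
    }
}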

This is the story your observability stack should be able to tell about every request. Not every implementation will be perfect from day one, but this is the standard to aim for when reviewing and writing new code.

6. Automating the standard with an AI skill

Establishing these standards as a team policy is necessary but not sufficient. The gap between "we have a logging standard" and "every feature ships with quality observability" is enforcement — and enforcement is expensive when it relies entirely on code review.

One practical way to close that gap is to encode the standard into your AI coding agent's instructions, so it automatically reviews and corrects observability on every feature it touches.

The AGENT.md file in your repository root sets standing instructions for Claude Code and similar agents. The SKILL.md file is a structured reference the agent loads when performing observability review — covering wrapper detection, logging audit, tracing audit, and a post-task summary of everything changed.

Below is a generalised SKILL.md you can drop into your codebase. It is language-agnostic by design — the principles and checklist apply to any backend stack — but uses C# / .NET for all code examples. Adapt the library references in Step 1 to your language.

```markdown
---
name: observability-review
description: >
  Use this skill whenever you are working on any backend feature, bug fix, refactor, or code
  update and need to review, improve, or add logging and distributed tracing. Triggers include:
  any mention of "logging", "logs", "tracing", "spans", "OpenTelemetry", "observability",
  "structured logs", "Datadog", "Grafana", "Jaeger", "ILogger", "Serilog", "winston", "zap",
  "loguru", or any logging/tracing library in any language. Also trigger automatically whenever
  you create or modify a service, controller, repository, message consumer, background job, or
  HTTP client — even if the user has not explicitly mentioned logging. Logging and tracing review
  is part of the definition of done for every backend code change. If you are touching backend
  code, consult this skill.
---

# Observability Review Skill

## Purpose

Automatically audit and improve the logging and distributed tracing quality of every backend file
created or modified. This skill is language-agnostic — the principles apply equally to C#, Python,
Go, Java, Node.js, and any other backend language. Code examples use C# / .NET throughout, but
the patterns and rules translate directly to your stack.

---

## Language & Library Reference

Before applying this skill, identify the stack in use and map to the appropriate libraries:

| Language | Structured Logging | Tracing (OTel) |
| --- | --- | --- |
| C# / .NET | Microsoft.Extensions.Logging, Serilog | OpenTelemetry.Api (ActivitySource) |
| Python | structlog, loguru, logging (stdlib) | opentelemetry-sdk |
| Go | zap (Uber), zerolog, slog (stdlib 1.21+) | go.opentelemetry.io/otel/trace |
| Java / Kotlin | SLF4J + Logback, Log4j2 | opentelemetry-java |
| Node.js / TypeScript | pino, winston | @opentelemetry/sdk-node |
| Rust | tracing, log + env_logger | opentelemetry crate |

---

## Step 1 — Detect Existing Observability Wrappers

Do this before writing any logging or tracing code and before adding any dependencies.

Many teams wrap OpenTelemetry behind internal packages. Using them is mandatory if they exist.

Scan:

1. Dependency manifests (.csproj, package.json, pyproject.toml, go.mod, pom.xml) for internal
   packages referencing: Telemetry, Tracing, Observability, Logging, Instrumentation, Diagnostics
2. Files named telemetry.*, tracing.*, observability.* at any directory depth
3. Bootstrap / DI setup for non-standard calls like AddCompanyTelemetry(), setup_observability(),
   init_tracing() that are not from standard library packages
4. Config files for non-standard observability sections

Decision table:

- Wrapper exposes tracer factory → use it, do NOT initialise a new tracer
- Wrapper registers OTel at startup → do NOT call OTel setup again
- Wrapper provides logger setup → use it, do NOT reconfigure logging from scratch
- No wrapper found → proceed with raw OTel setup per Step 4

---

## Step 2 — Logging Audit

Log levels:

- Trace/Verbose: step-by-step execution, dev only, never on by default in prod
- Debug: diagnostic detail, off by default in prod, enable dynamically when needed
- Info/Information: normal expected business events
- Warn/Warning: recovered unexpected situations
- Error: operation failed and could not recover — must include exception and entity context
- Fatal/Critical: unrecoverable state / imminent crash

Rules:

- Always use named structured properties / fields — never plain string interpolation as the message
- Log at boundaries (HTTP handler, service entry/exit, DB calls, outbound HTTP, consumers, jobs)
- Log every significant branching decision and the reason the path was taken
- Every catch/except block must log at Error/Critical with the exception object — never swallow
- Never log passwords, tokens, secrets, or PII

C# example — structured vs unstructured:

  // CORRECT
  _logger.LogInformation(
      "Payment processed. {OrderId} {UserId} {AmountGbp} {Provider}",
      order.Id, order.UserId, order.Amount, provider.Name);

  // WRONG
  _logger.LogInformation($"Payment done for order {order.Id}");

C# example — exception logging:

  // CORRECT
  _logger.LogError(ex,
      "Failed to process payment. {OrderId} {PaymentProvider}", orderId, provider);

  // WRONG
  _logger.LogError("Payment failed");

---

## Step 3 — Tracing Audit

Add a manual span for every operation that:

- Has latency worth isolating in a flame graph
- Represents a discrete step in a business workflow
- Involves an external system (DB, HTTP, queue, file I/O)
- Is NOT already covered by auto-instrumentation

What auto-instrumentation typically covers (do NOT add manual spans):

- Incoming HTTP requests (ASP.NET Core, Express, FastAPI, Spring MVC)
- Outbound HTTP client calls
- DB queries via ORM (EF Core, SQLAlchemy, GORM, Hibernate)

Span pattern (C# — translate structure to your language):

  using var activity = Telemetry.Source.StartActivity("OperationName");
  activity?.SetTag("entity.id", entityId.ToString());

  try
  {
      var result = await DoWorkAsync();
      activity?.SetTag("result.count", result.Count);
      activity?.SetStatus(ActivityStatusCode.Ok);
      return result;
  }
  catch (Exception ex)
  {
      activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
      activity?.RecordException(ex);
      throw;
  }

Required attributes by type:

- Service/business: entity.id, entity.type, operation.outcome
- DB: db.operation, db.table, db.result.row_count
- External HTTP: external.service, external.operation, entity ID
- Message consumer: messaging.system, messaging.destination, messaging.message_id
- Background job: job.name, job.trigger, job.batch_size, job.records_processed

Use span events (not log lines) for mid-span milestones:

  activity?.AddEvent(new ActivityEvent("validation_passed",
      tags: new ActivityTagsCollection { { "rules_evaluated", ruleCount } }));

Span naming: PascalCase or snake_case verb+noun. ProcessPayment, FetchUserProfile.
Never: Method1, DoStuff, Handler, Run.

---

## Step 4 — Raw OTel Setup (only if no wrapper found in Step 1)

C# ActivitySource:

  public static class Telemetry
  {
      public static readonly ActivitySource Source =
          new ActivitySource("YourCompany.YourServiceName", "1.0.0");
  }

C# startup registration:

  builder.Services.AddOpenTelemetry()
      .WithTracing(tracing => tracing
          .AddSource(Telemetry.Source.Name)
          .AddAspNetCoreInstrumentation()
          .AddHttpClientInstrumentation()
          .AddEntityFrameworkCoreInstrumentation()
          .AddOtlpExporter());

C# log correlation (TraceId/SpanId on every log line):

  builder.Logging.AddOpenTelemetry(logging =>
  {
      logging.IncludeFormattedMessage = true;
      logging.IncludeScopes = true;
  });

For other languages, use the OTel log bridge / appender for your framework.

---

## Step 5 — Full Review Checklist

Logging:

- [ ] All log calls use named structured properties (no plain string interpolation)
- [ ] Log levels correct per Step 2 table
- [ ] Every catch/except logs at Error/Critical with exception object
- [ ] Entry and exit covered for every boundary type
- [ ] Branching decisions log the reason
- [ ] No secrets, tokens, passwords, or PII in any log property
- [ ] No logging inside tight loops without a guard

Tracing:

- [ ] Span exists for every DB call, external HTTP, message consumption, background job not covered by auto-instrumentation
- [ ] Every span has domain attributes at start and outcome attributes at end
- [ ] Every span sets Ok status on success or Error + RecordException on failure
- [ ] No duplicate spans over auto-instrumentation
- [ ] Span names are descriptive verb+noun
- [ ] Span events used for mid-span milestones

Correlation:

- [ ] trace_id and span_id flow into logs automatically via OTel log bridge or enricher
- [ ] Incoming correlation headers (X-Correlation-ID, traceparent) extracted and propagated

---

## Step 6 — Post-Task Summary

Always output after every code change:

  ## Observability Changes Made

  ### Wrapper Detection
  <"No wrapper found — used raw OTel" or wrapper details>

  ### Logging Updates
  - <file>: <what changed and why>

  ### Spans Added / Updated
  - <file>: <span name> | attributes: <list>

  ### Issues Requiring Manual Review
  - <anything needing human decision, e.g. unclear PII risk>

  ### Files With No Changes Needed
  - <file>: ✅ Observability already satisfactory

---

## References

- https://opentelemetry.io/docs/
- https://opentelemetry.io/docs/specs/semconv/
- https://docs.datadoghq.com/opentelemetry/
```

The question to ask of every log line you write is not "does this record that something happened?" It is: can someone on my team, in the middle of a busy working day with three other things on their plate, open this trace and reconstruct exactly what happened from the logs and spans alone — without needing to read the source code or ask the person who wrote it?

That is a high bar. It should be. The cost of not meeting it — in engineering time, in missed SLAs, in incidents that take four hours to diagnose instead of fifteen minutes — far exceeds the cost of writing good log lines in the first place.

Logs and traces are not a burden on top of feature development. They are how you prove your features work in production. Treat them that way, and your observability stack becomes one of the most valuable tools your team has.


Key takeaway
Log quality beats log quantity: structured fields, boundaries, decisions, and severities tell a story; traces and spans carry shape and timing; metrics carry aggregates. Split the work, justify every line and span attribute against cost, and correlate logs with traces. Encode the habit in agent skills if you have to — the payoff is faster incidents and a lower noise bill.


Found this useful? The SKILL.md above is free to use and adapt for your own codebase. Drop it in your skills directory and your AI coding agent can enforce these standards automatically on every feature it works on.