Serilog + OpenTelemetry: correlating request logs with distributed traces in practice

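This article goes straight to a working setup and skips the background on structured logging. It has one goal: in an ASP.NET Core service, connect Serilog logs to OpenTelemetry traces so that, during troubleshooting, you can jump from a single error log entry directly to the full trace.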

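1. Background: what this article delivers

Once you have worked through the steps below, you will have a repeatable troubleshooting path:

  1. Every request log reliably carries TraceId, SpanId, RequestPath, and RequestMethod
  2. Business logs consistently emit key fields such as OrderId and TenantId
  3. When a downstream HTTP call fails, the log entry leads straight back to the same TraceId
  4. On-call troubleshooting follows three fixed steps: find the log -> copy the TraceId -> open the trace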

The examples assume the following environment:

  • .NET 8
  • Serilog
  • OpenTelemetry + OTLP

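2. How it works: conventions and the field contract to agree on first

Settle the conventions first so the configuration does not have to be reworked later.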

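2.1 Platform-level fields (must be uniform)

  • TraceId: unique key for the whole request trace
  • SpanId: unique key for the current span (one operation within the trace)
  • RequestPath: request path
  • RequestMethod: HTTP request method
  • StatusCode: HTTP response status code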
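
To make the relationship between TraceId and SpanId concrete, here is a minimal sketch that uses only System.Diagnostics from the base class library. The class name TraceIdDemo and the activity names are illustrative assumptions, not part of the setup in this article. A child activity started while a parent is current should keep the parent's TraceId and get its own SpanId.

```csharp
using System.Diagnostics;

public static class TraceIdDemo
{
    // Starts a parent and a child Activity and returns their W3C identifiers.
    public static (string TraceId, string ParentSpanId, string ChildTraceId, string ChildSpanId) Run()
    {
        // Force the W3C id format explicitly so the ids are 32/16 hex characters.
        using var parent = new Activity("incoming-request")
            .SetIdFormat(ActivityIdFormat.W3C)
            .Start();

        // The parent is Activity.Current at this point, so the child joins its trace.
        using var child = new Activity("downstream-call").Start();

        return (
            parent.TraceId.ToString(),
            parent.SpanId.ToString(),
            child.TraceId.ToString(),
            child.SpanId.ToString());
    }
}
```

Calling TraceIdDemo.Run() yields one shared TraceId and two different SpanId values, which is why TraceId is the field to search on when you move from a log entry to a trace.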

2.2 Business-level fields (add per scenario)

  • Order domain: OrderId, CustomerId
  • Multi-tenancy: TenantId
  • External dependencies: DependencyName, DownstreamStatusCode

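2.3 Field naming rules (decide once)

  • Use PascalCase throughout
  • Keep exactly one name per concept; for example, always use TraceId
  • Keep business field names stable; they do not change when the message wording does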
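
One way to keep the contract from drifting is to put the field names in a single shared class. The sketch below is only an illustration: the class name LogFields is an assumption, and the constants simply mirror the fields listed in sections 2.1 and 2.2.

```csharp
// Hypothetical shared contract; the names mirror sections 2.1 and 2.2.
public static class LogFields
{
    // Platform-level fields
    public const string TraceId = "TraceId";
    public const string SpanId = "SpanId";
    public const string RequestPath = "RequestPath";
    public const string RequestMethod = "RequestMethod";
    public const string StatusCode = "StatusCode";

    // Business-level fields
    public const string OrderId = "OrderId";
    public const string CustomerId = "CustomerId";
    public const string TenantId = "TenantId";
    public const string DependencyName = "DependencyName";
    public const string DownstreamStatusCode = "DownstreamStatusCode";
}
```

The constants can be passed wherever a property name is given as a string, for example LogContext.PushProperty(LogFields.TraceId, value), so a rename happens in one place.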

3. Example code: step-by-step implementation

3.1 Install dependencies

  • Serilog.AspNetCore
  • Serilog.Sinks.Console
  • Serilog.Enrichers.Environment
  • OpenTelemetry.Extensions.Hosting
  • OpenTelemetry.Instrumentation.AspNetCore
  • OpenTelemetry.Instrumentation.Http
  • OpenTelemetry.Exporter.OpenTelemetryProtocol

3.2 Step 1: configure Serilog and OpenTelemetry

Put the following in Program.cs to get the basic pipeline working:

using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Serilog;
using Serilog.Events;

var builder = WebApplication.CreateBuilder(args);

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .MinimumLevel.Override("Microsoft", LogEventLevel.Warning)
    .Enrich.FromLogContext()
    .Enrich.WithEnvironmentName()
    .WriteTo.Console(outputTemplate:
        "[{Timestamp:HH:mm:ss} {Level:u3}] TraceId={TraceId} SpanId={SpanId} {Message:lj}{NewLine}{Exception}")
    .CreateLogger();

builder.Host.UseSerilog();

const string serviceName = "order-api";

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(serviceName))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://localhost:4318/v1/traces");
            options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.HttpProtobuf;
        }));

builder.Services.AddHttpClient("payment");

var app = builder.Build();

3.3 Step 2: inject trace fields in an entry middleware

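// Add these using directives at the top of Program.cs, next to the ones from step 1.
using System.Diagnostics;
using Serilog.Context;

// Register this middleware before the endpoints so every request log inherits the fields.
app.Use(async (context, next) =>
{
    // Activity.Current is the activity for the incoming request.
    // It can be null when no activity was started, hence the fallbacks below.
    var activity = Activity.Current;

    using (LogContext.PushProperty("TraceId", activity?.TraceId.ToString() ?? string.Empty))
    using (LogContext.PushProperty("SpanId", activity?.SpanId.ToString() ?? string.Empty))
    using (LogContext.PushProperty("RequestPath", context.Request.Path.Value ?? string.Empty))
    using (LogContext.PushProperty("RequestMethod", context.Request.Method))
    {
        // Passing the context selects the RequestDelegate overload of Use.
        await next(context);
    }
});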

After this step, every log event written while a request is being handled automatically carries TraceId and SpanId.
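
The middleware only covers logs written inside the request pipeline. As a possible complement, the sketch below reads Activity.Current at the moment each event is logged, using Serilog's ILogEventEnricher interface. The names ActivityEnricher and WithActivityIds are hypothetical, and this is an alternative under those assumptions rather than the approach this article prescribes.

```csharp
using System.Diagnostics;
using Serilog;
using Serilog.Core;
using Serilog.Events;

// Hypothetical enricher: copies the current Activity's ids onto each log event.
public sealed class ActivityEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;
        if (activity is null)
        {
            return; // No active trace: leave the event untouched.
        }

        // AddPropertyIfAbsent keeps any value that was already attached to the event.
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("TraceId", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("SpanId", activity.SpanId.ToString()));
    }
}

public static class ActivityEnricherExtensions
{
    // Hypothetical convenience method for registering the enricher.
    public static LoggerConfiguration WithActivityIds(this LoggerConfiguration configuration)
        => configuration.Enrich.With<ActivityEnricher>();
}
```

It would be registered on the LoggerConfiguration from step 1, for example new LoggerConfiguration().WithActivityIds(). Because both this enricher and the LogContext enricher only add a property when it is absent, whichever runs first determines the value when both are present.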

3.4 Step 3: emit structured logs in the business endpoint and correlate the downstream call

app.MapPost("/api/orders/{orderId:long}/confirm", async (
    long orderId,
    IHttpClientFactory httpClientFactory,
    ILogger<Program> logger,
    CancellationToken ct) =>
{
    logger.LogInformation("Start confirm order. OrderId={OrderId}", orderId);

    var client = httpClientFactory.CreateClient("payment");
    using var response = await client.PostAsync(
        $"https://payment.internal/api/payments/{orderId}/capture",
        content: null,
        ct);

    if (!response.IsSuccessStatusCode)
    {
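        // Use DownstreamStatusCode (section 2.2); StatusCode is reserved for this service's own response.
        logger.LogError(
            "Confirm order failed by downstream. OrderId={OrderId}, DownstreamStatusCode={DownstreamStatusCode}",
            orderId,
            (int)response.StatusCode);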

        return Results.Problem(title: "confirm order failed", statusCode: 502);
    }

    logger.LogInformation("Confirm order succeeded. OrderId={OrderId}", orderId);
    return Results.Ok(new { orderId, status = "Confirmed" });
});

await app.RunAsync();
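
The endpoint above records the downstream failure in the log only. So that the same failure also stands out when the trace is opened, the current activity can be marked as failed. The sketch below uses Activity.SetTag and Activity.SetStatus from System.Diagnostics; the helper name and the tag keys (order.id, dependency.name, downstream.status_code) are illustrative assumptions, not an established convention from this article.

```csharp
using System.Diagnostics;

// Hypothetical helper: mirrors a downstream failure onto the current span.
public static class ActivityFailureMarker
{
    public static void MarkDownstreamFailure(
        Activity? activity,
        long orderId,
        int downstreamStatusCode,
        string dependencyName)
    {
        if (activity is null)
        {
            return; // Nothing to mark when no activity is active.
        }

        // Tag keys are assumptions chosen for this example.
        activity.SetTag("order.id", orderId);
        activity.SetTag("dependency.name", dependencyName);
        activity.SetTag("downstream.status_code", downstreamStatusCode);

        // Marks the span as failed so it is easy to spot in the trace view.
        activity.SetStatus(
            ActivityStatusCode.Error,
            $"{dependencyName} returned {downstreamStatusCode}");
    }
}
```

In the failure branch of the endpoint this could be called as ActivityFailureMarker.MarkDownstreamFailure(Activity.Current, orderId, (int)response.StatusCode, "payment-service"), right next to the LogError call.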

3.5 Step 4: a unified error log template

Emit error logs with one shared template, for example:

// DownstreamStatusCode follows section 2.2; StatusCode stays reserved for this service's own response.
logger.LogError(
    "Order confirm failed. TraceId={TraceId}, SpanId={SpanId}, OrderId={OrderId}, DownstreamStatusCode={DownstreamStatusCode}, DependencyName={DependencyName}",
    Activity.Current?.TraceId.ToString() ?? string.Empty,
    Activity.Current?.SpanId.ToString() ?? string.Empty,
    orderId,
    (int)response.StatusCode,
    "payment-service");

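A template stays uniform more easily when it lives in one place. The sketch below wraps it in an ILogger extension method. The names OrderLogExtensions and LogDependencyFailure are hypothetical, and the status field uses the DownstreamStatusCode name from section 2.2.

```csharp
using System.Diagnostics;
using Microsoft.Extensions.Logging;

// Hypothetical wrapper so every call site emits the same fields in the same order.
public static class OrderLogExtensions
{
    public static void LogDependencyFailure(
        this ILogger logger,
        long orderId,
        int downstreamStatusCode,
        string dependencyName)
    {
        // Read the ids once, at the moment the failure is logged.
        var activity = Activity.Current;

        logger.LogError(
            "Order confirm failed. TraceId={TraceId}, SpanId={SpanId}, OrderId={OrderId}, DownstreamStatusCode={DownstreamStatusCode}, DependencyName={DependencyName}",
            activity?.TraceId.ToString() ?? string.Empty,
            activity?.SpanId.ToString() ?? string.Empty,
            orderId,
            downstreamStatusCode,
            dependencyName);
    }
}
```

A call site then shrinks to logger.LogDependencyFailure(orderId, (int)response.StatusCode, "payment-service"), which leaves no room for a misspelled field name.
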
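3.6 Step 5: the fixed on-call troubleshooting procedure

  1. In the log platform, find the failed event by OrderId or error code
  2. Copy the TraceId from that log entry
  3. In the APM tool, open the full trace by TraceId
  4. Compare span durations to see whether the bottleneck is at the entry point, in middleware, in a downstream HTTP call, or in the database (database spans appear only if a database instrumentation is registered in addition to the ones in step 1)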

3.7 Common misconfigurations and fixes

// Misconfiguration 1: interpolated text only, no structured fields
logger.LogError($"confirm order failed, order={orderId}, trace={Activity.Current?.TraceId}");

// Fix: emit named fields so the log platform can search on them
logger.LogError(
    "Confirm order failed. OrderId={OrderId}, TraceId={TraceId}",
    orderId,
    Activity.Current?.TraceId.ToString() ?? string.Empty);

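4. Summary

With this setup in place, on-call troubleshooting becomes a fixed, repeatable sequence: find the failing log entry, use its TraceId to jump to the trace, then use span durations to locate the bottleneck.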