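Java Services: Global Exception Handling and Alarm Convergence in Practice

I. Java exception concepts

1. Exception classes

The Throwable class

The parent class of every exception class in Java. Its two most important subclasses are Exception and Error.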

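The Error class

Represents serious problems that the program is not expected to handle and that are usually the JVM's responsibility, such as a system crash, running out of memory (OutOfMemoryError), or stack overflow (StackOverflowError). An Error can technically be caught with try-catch, but doing so is rarely useful: the compiler does not check for these errors, and once one occurs the program usually terminates and cannot recover on its own.

The Exception class

Represents exceptions that the program itself can handle. They can be caught with catch, and handling them is our job so that the program keeps running normally. Exception is further divided into runtime exceptions (RuntimeException, also called unchecked exceptions) and non-runtime exceptions (also called checked exceptions).

Handling a runtime exception is optional. It is generally caused by a logic error at run time, and we should avoid such errors when coding. Example: NullPointerException.

Non-runtime exceptions are all subclasses of Exception other than RuntimeException and its subclasses. They are generally caused by external conditions the program cannot control, such as I/O or database failures (IOException, SQLException). The Java compiler checks them at compile time and forces us to either catch or declare them.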
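The difference between the two kinds can be sketched as follows. This is a minimal illustration, not code from the original project; the class and method names (CheckedVsUncheckedDemo, readFirstChar, safeLength) are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;

class CheckedVsUncheckedDemo {

    // Checked: read() declares IOException, so this method must either
    // catch it or declare it with throws, otherwise it does not compile.
    static int readFirstChar(String text) throws IOException {
        try (StringReader reader = new StringReader(text)) {
            return reader.read();
        }
    }

    // Unchecked: a NullPointerException needs no declaration.
    // The better fix is to avoid it with a guard in the code.
    static int safeLength(String text) {
        return text == null ? 0 : text.length();
    }

    public static void main(String[] args) {
        try {
            System.out.println((char) readFirstChar("abc"));
        } catch (IOException e) {
            System.out.println("read failed: " + e.getMessage());
        }
        System.out.println(safeLength(null));
    }
}
```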

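2. Ways to handle exceptions

try-catch

Put the code that may throw inside try. If an exception occurs, the remaining code in the try block is skipped and control jumps straight to the matching catch, where we receive the exception object and handle it.

try-catch-finally

finally runs whether or not an exception occurs and is usually used to release resources.

try-finally

Equivalent to not catching the exception: the finally block runs, and then the exception continues to propagate to the caller.

throws

Declared after the method signature to indicate that the method does not handle the exception and leaves it to the caller. Whoever calls the method handles it, and the caller may in turn declare it and pass it further up.

throw

Used inside a method body to throw an exception object manually.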
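The constructs above can be seen working together in a small sketch. HandlingFlowDemo, parsePositive and trace are hypothetical names used only for this illustration; trace records the order in which the blocks run.

```java
import java.util.ArrayList;
import java.util.List;

class HandlingFlowDemo {

    // throw: raise an exception object manually inside the method.
    // throws: declare that the caller is responsible for handling it.
    static int parsePositive(String raw) throws Exception {
        int value = Integer.parseInt(raw);
        if (value <= 0) {
            throw new Exception("value must be positive: " + value);
        }
        return value;
    }

    // Records the order in which the try, catch and finally blocks run.
    static List<String> trace(String raw) {
        List<String> steps = new ArrayList<>();
        try {
            steps.add("try");
            parsePositive(raw);
            steps.add("after-call"); // skipped when parsePositive throws
        } catch (Exception e) {
            steps.add("catch");
        } finally {
            steps.add("finally"); // always runs; usually releases resources
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(trace("5"));  // [try, after-call, finally]
        System.out.println(trace("-1")); // [try, catch, finally]
    }
}
```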

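3. Notes

Checked (non-runtime) exceptions must be handled, with either try-catch or throws. The IDE (for example IntelliJ IDEA) flags this while you write the code and usually suggests adding throws to the method signature.

If a runtime exception is not explicitly handled with one of the approaches from the previous section, it propagates up the call stack by default, as if every method had declared it with throws.

When a subclass overrides a parent method, the checked exceptions it declares cannot be broader than those declared by the parent method: they may be the same types or subclasses of them. The rule does not restrict unchecked exceptions.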
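The overriding rule can be illustrated with a short sketch. BaseLoader, NarrowLoader and OverrideDemo are hypothetical classes written for this example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class BaseLoader {
    String load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        return "base:" + name;
    }
}

class NarrowLoader extends BaseLoader {
    // Allowed: FileNotFoundException is a subclass of IOException.
    // Declaring `throws Exception` here would not compile, because
    // Exception is broader than the IOException declared by the parent.
    @Override
    String load(String name) throws FileNotFoundException {
        if (name.isEmpty()) {
            throw new FileNotFoundException("empty name");
        }
        return "narrow:" + name;
    }
}

class OverrideDemo {
    // Callers that only know BaseLoader still handle every possible case.
    static String describe(BaseLoader loader, String name) {
        try {
            return loader.load(name);
        } catch (FileNotFoundException e) {
            return "not found";
        } catch (IOException e) {
            return "io error";
        }
    }
}
```

Because the subclass can only narrow the declared exceptions, code written against the parent type never meets a checked exception it did not anticipate.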

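4. Custom exceptions

A custom exception class extends Exception or RuntimeException:

one that extends Exception is a checked (non-runtime) exception;

one that extends RuntimeException is an unchecked (runtime) exception.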
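A minimal sketch of both kinds of custom exception follows. The names InsufficientBalanceException, InvalidAmountException and Account are invented for illustration and do not come from the original project.

```java
// Checked: callers must catch it or declare it.
class InsufficientBalanceException extends Exception {
    InsufficientBalanceException(String message) {
        super(message);
    }
}

// Unchecked: no declaration required.
class InvalidAmountException extends RuntimeException {
    InvalidAmountException(String message) {
        super(message);
    }
}

class Account {
    private long balance;

    Account(long balance) {
        this.balance = balance;
    }

    // Only the checked exception appears in the throws clause.
    long withdraw(long amount) throws InsufficientBalanceException {
        if (amount <= 0) {
            throw new InvalidAmountException("amount must be positive");
        }
        if (amount > balance) {
            throw new InsufficientBalanceException("balance " + balance + " is less than " + amount);
        }
        balance -= amount;
        return balance;
    }
}
```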

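II. Configuring global exception handling in practice (CSDN blog example)

A project usually exposes many endpoints, and they all return a common response format. When an exception escapes, however, the error that comes back no longer matches that format. To prevent this, we configure global exception handling so that errors are also returned in the format we want.

Core: @RestControllerAdvice + @ExceptionHandler

1. Preparation

Common result codes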

/**
 * Enumerates some commonly used API result codes.
 */
public enum ResultCode implements IErrorCode {
    SUCCESS(200, "Operation succeeded"),
    FAILED(400, "Operation failed"),
    VALIDATE_FAILED(404, "Parameter validation failed"),
    UNAUTHORIZED(401, "Not logged in or token has expired"),
    FORBIDDEN(403, "Permission denied");

    private final int code;
    private final String message;

    ResultCode(int code, String message) {
        this.code = code;
        this.message = message;
    }

    public int getCode() {
        return code;
    }

    public String getMessage() {
        return message;
    }
}

The error-code interface

/**
 * Abstraction of an API error code.
 */
public interface IErrorCode {
    int getCode();

    String getMessage();
}

The common response body

import com.lcp.fitness.common.api.IErrorCode;
import com.lcp.fitness.common.api.ResultCode;
import lombok.Data;

import java.io.Serializable;

@Data
public class CommonResponse<T> implements Serializable {

    private int code;
    private String msg;
    private T data;
    private boolean success;

    public CommonResponse(int code, String msg) {
        this.code = code;
        this.msg = msg;
    }

    public CommonResponse(int code, String msg, T data, boolean success) {
        this.code = code;
        this.msg = msg;
        this.data = data;
        this.success = success;
    }

    // Failure with the default code and message
    public static <T> CommonResponse<T> fail() {
        return new CommonResponse<>(ResultCode.FAILED.getCode(), ResultCode.FAILED.getMessage(), null, false);
    }

    // Failure with the default code and a custom message
    public static <T> CommonResponse<T> fail(String msg) {
        return new CommonResponse<>(ResultCode.FAILED.getCode(), msg, null, false);
    }

    // Failure described by an error code
    public static <T> CommonResponse<T> fail(IErrorCode errorCode) {
        return new CommonResponse<>(errorCode.getCode(), errorCode.getMessage(), null, false);
    }

    // Failure with the code of an error code and a custom message
    public static <T> CommonResponse<T> fail(IErrorCode errorCode, String msg) {
        return new CommonResponse<>(errorCode.getCode(), msg, null, false);
    }

    // Failure with an explicit code and message
    public static <T> CommonResponse<T> fail(int code, String msg) {
        return new CommonResponse<>(code, msg, null, false);
    }

    // Success without data
    public static <T> CommonResponse<T> success() {
        return new CommonResponse<>(ResultCode.SUCCESS.getCode(), ResultCode.SUCCESS.getMessage(), null, true);
    }

    // Success with data
    public static <T> CommonResponse<T> success(T data) {
        return new CommonResponse<>(ResultCode.SUCCESS.getCode(), ResultCode.SUCCESS.getMessage(), data, true);
    }

    // Success with a custom message and data
    public static <T> CommonResponse<T> success(String msg, T data) {
        return new CommonResponse<>(ResultCode.SUCCESS.getCode(), msg, data, true);
    }

    /**
     * Result for a failed parameter validation.
     */
    public static <T> CommonResponse<T> validateFailed() {
        return fail(ResultCode.VALIDATE_FAILED);
    }

    /**
     * Result for a failed parameter validation.
     * @param message the message to show
     */
    public static <T> CommonResponse<T> validateFailed(String message) {
        return new CommonResponse<>(ResultCode.VALIDATE_FAILED.getCode(), message, null, false);
    }

    /**
     * Result for a request that is not logged in.
     */
    public static <T> CommonResponse<T> unauthorized(T data) {
        return new CommonResponse<>(ResultCode.UNAUTHORIZED.getCode(), ResultCode.UNAUTHORIZED.getMessage(), data, false);
    }

    /**
     * Result for a request that lacks permission.
     */
    public static <T> CommonResponse<T> forbidden(T data) {
        return new CommonResponse<>(ResultCode.FORBIDDEN.getCode(), ResultCode.FORBIDDEN.getMessage(), data, false);
    }
}

2. Implementing global exception handling

Define our own exception class

import com.lcp.fitness.common.api.IErrorCode;

/**
 * Custom API exception.
 */
public class ApiException extends RuntimeException {
    private IErrorCode errorCode;

    public ApiException(IErrorCode errorCode) {
        super(errorCode.getMessage());
        this.errorCode = errorCode;
    }

    public ApiException(String message) {
        super(message);
    }

    public ApiException(Throwable cause) {
        super(cause);
    }

    public ApiException(String message, Throwable cause) {
        super(message, cause);
    }

    public IErrorCode getErrorCode() {
        return errorCode;
    }
}

Global exception handling:

Either @RestControllerAdvice + @ExceptionHandler or @ControllerAdvice + @ExceptionHandler + @ResponseBody works here, because @RestControllerAdvice = @ControllerAdvice + @ResponseBody.

import com.lcp.fitness.utils.CommonResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.security.access.AccessDeniedException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

/**
 * Global exception handling.
 */
@RestControllerAdvice
@Slf4j
public class GlobalExceptionHandler {

    @ExceptionHandler(value = ApiException.class)
    public CommonResponse<Void> handle(ApiException e) {
        log.error(e.getMessage());
        // Prefer the structured error code when the exception carries one.
        if (e.getErrorCode() != null) {
            return CommonResponse.fail(e.getErrorCode());
        }
        return CommonResponse.fail(e.getMessage());
    }

    @ExceptionHandler(value = Exception.class)
    public CommonResponse<Void> exception(Exception e) {
        log.error(e.getMessage(), e);
        return CommonResponse.fail(e.getMessage());
    }

    /**
     * Response for a failed Spring Security authorization check.
     * @param e the access-denied exception
     * @return a failure response in the common format
     */
    @ExceptionHandler(value = AccessDeniedException.class)
    public CommonResponse<Void> accessDeniedException(AccessDeniedException e) {
        log.error(e.getMessage());
        return CommonResponse.fail("Access denied for this user");
    }

}

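With this in place, a runtime exception in any endpoint no longer produces an oddly formatted response; even errors come back in the format we defined. Keep in mind, though, that what matters most for consistent exception handling is judging where exceptions can occur: we need to identify which code or business-logic exceptions are possible at which points, then catch them there and fill in the appropriate error code.

3. Special case: catching exceptions thrown in a filter

As the annotation name @RestControllerAdvice suggests, it applies advice to the controller layer. If an exception surfaces in the controller layer, it can be handled; but if the request never reaches a controller because a filter in front of it throws and returns, the advice cannot catch it. A Filter can add logic before and after the controller runs, for example validation and request preprocessing before it, or statistics and logging after it.

Filter example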

import com.lcp.fitness.common.component.RedisCache;
import com.lcp.fitness.dto.LoginUser;
import com.lcp.fitness.utils.JwtTokenUtil;
import io.jsonwebtoken.Claims;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;
import org.springframework.web.filter.OncePerRequestFilter;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.Objects;

@Component
public class JwtAuthenticationTokenFilter extends OncePerRequestFilter {

    @Autowired
    private RedisCache redisCache;

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain) throws ServletException, IOException {
        // Read the token
        String token = request.getHeader("Authorization");
        if (!StringUtils.hasText(token)) {
            // No token: let the request through
            filterChain.doFilter(request, response);
            return;
        }
        // Parse the token
        String userId = null;
        try {
            Claims claims = JwtTokenUtil.parseJWT(token);
            userId = claims.getSubject();
        } catch (Exception e) {
            // Keep the cause instead of printing the stack trace
            throw new RuntimeException("invalid token", e);
        }
        // Load the user from Redis
        String redisKey = "login:" + userId;
        LoginUser loginUser = redisCache.getCacheObject(redisKey);
        if (Objects.isNull(loginUser)) {
            throw new RuntimeException("user not logged in");
        }
        // Store the authentication in the SecurityContextHolder
        // TODO load the user's authorities and put them into the Authentication
        UsernamePasswordAuthenticationToken authenticationToken =
                new UsernamePasswordAuthenticationToken(loginUser, null, null);
        SecurityContextHolder.getContext().setAuthentication(authenticationToken);
        // Let the request through
        filterChain.doFilter(request, response);
    }
}

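Solving the filter problem: we cannot change the scope of @RestControllerAdvice, so my approach is to hand exceptions raised in a filter over to the controller layer. This requires a dedicated controller that receives these special-case exceptions.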

import com.lcp.fitness.common.exception.ApiException;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.servlet.http.HttpServletRequest;

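/**
 * Global exception handling for filters. The global handler only sees exceptions from the
 * controller layer and cannot catch those thrown in a filter, so filter errors are forwarded
 * to this controller to get the same unified response format.
 */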
@RestController
@RequestMapping("/exception")
public class ExceptionController {

    @RequestMapping("/handler")
    public void exception(HttpServletRequest request) {
        String msg = (String) request.getAttribute("msg");
        throw new ApiException(msg);
    }
}

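In the filter, replace the original throw statements with the code below. It forwards the request (a server-side forward, not a redirect), together with the error message, to the controller layer, and the return that follows ends the filter so that no further logic runs.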

request.setAttribute("msg", "invalid token");
request.getRequestDispatcher("/exception/handler").forward(request, response);
return;

III. Global exception handling and alarm convergence in the definition-driven service

1. Preparation

Common alarm/exception codes

public enum UmpUserAlarmType {

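    FILTER_EXCEPTION("Dimension filter value is empty", "Check whether the filter value in the request is empty."),

    REQUEST_TIME_START_MORE_END_EXCEPTION("Invalid query time range: start time %s is later than end time %s", "Check the dt parameter."),

    REQUEST_TIME_TOO_LARGE_EXCEPTION("Invalid query time range: span too large. The maximum range is %s days, but %s days were requested", "Check the dt parameter."),

    REQUEST_TIME_EMPTY_EXCEPTION("Query time is empty", "Check the dt parameter."),

    EZD_TIMEOUT("EZD call timed out", "Use the UUID to look up the full SQL and optimize the query."),

    EZD_TABLE_NOT_EXIST("Table %s does not exist", "Check whether the table was created successfully."),

    EZD_COLUMN_NOT_EXIST("Column %s does not exist", "Check whether the column exists in the table."),

    EZD_MEMORY_LIMIT("SQL execution exceeded the CK cluster memory limit", "Use the UUID to look up the full SQL, then optimize it or ask the CK cluster owner to adjust the threshold."),

    EZD_SQL_EXCEPTION("SQL execution error", "Use the UUID to look up the full SQL and confirm the error."),

    EZD_DB_EXCEPTION("DB error", "Use the UUID to look up the full SQL and confirm the error."),

    EZD_IMPLEMENT_TIMEOUT("SQL execution time exceeded the CK cluster limit of %s s", "Use the UUID to look up the full SQL, then optimize the query or adjust the CK threshold."),

    ROUTE_DIMENSION_EXCEPTION("Dimension [%s] is missing from the configuration", "Add the dimension to the logical table."),

    ROUTE_DIMENSION_COMBINE_EXCEPTION("Precomputation does not support the aggregation dimension combination [%s]", "Check whether it exists in the aggregation dimensions."),

    ROUTE_METRIC_EXCEPTION("The requested metric query is not supported", "Check the logical table configuration."),

    ROUTE_LAST_DAY_EXCEPTION("The dynamic function lastday is not supported", "To use it, complete the advanced configuration of logical table [%s]."),

    ROUTE_EXCEPTION("Route not supported", "Contact the definition-driven service developers.");

    /**
     * Error message template
     */
    private final String errorMsg;
    /**
     * Guidance for resolving the problem
     */
    private final String info;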

    UmpUserAlarmType(String errorMsg, String info) {
        this.errorMsg = errorMsg;
        this.info = info;
    }

    public String geErrorMsg() {
        return errorMsg;
    }

    public String getInfo() {
        return info;
    }
}
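How such a template enum is meant to be consumed can be sketched in a self-contained way. AlarmTemplate and its render method are hypothetical stand-ins that mirror the shape of UmpUserAlarmType; they are not part of the original code.

```java
enum AlarmTemplate {
    TIME_RANGE_TOO_LARGE("Query time range too large: max %s days, actual %s days", "Check the dt parameter."),
    TIME_EMPTY("Query time is empty", "Check the dt parameter.");

    private final String errorMsg;
    private final String info;

    AlarmTemplate(String errorMsg, String info) {
        this.errorMsg = errorMsg;
        this.info = info;
    }

    // Fills the %s placeholders of the template and appends the guidance text.
    String render(Object... args) {
        return String.format(errorMsg, args) + ". " + info;
    }
}

class AlarmTemplateDemo {
    public static void main(String[] args) {
        System.out.println(AlarmTemplate.TIME_RANGE_TOO_LARGE.render(30, 45));
        System.out.println(AlarmTemplate.TIME_EMPTY.render());
    }
}
```

Keeping the message template and the guidance text next to each other in one enum constant means every alarm of the same type reads the same way, which helps when alarms are later grouped by type.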

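The common HTTP response body

import lombok.Data;

import java.io.Serializable;

@Data
public class HttpResponse<T> implements Serializable {

    /**
     * Serialization id
     */
    private static final long serialVersionUID = 2621934688487804918L;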

    /**
     * header
     */
    private Header header;

    /**
     * body
     */
    private T body;

    /**
     * default
     */
    private HttpResponse() {
    }

    /**
     * default
     */
    private HttpResponse(Header header, T body) {
        this.header = header;
        this.body = body;
    }

    /**
     * Success response
     */
    public static <T> HttpResponse<T> success(T body) {
        Header header = new Header();
        header.setCode(String.valueOf(ResponseCode.SUCCESS.getCode()));
        header.setDesc(ResponseCode.SUCCESS.getMessage());
        return new HttpResponse<>(header, body);
    }

    /**
     * Failure response built from a response code
     */
    public static <T> HttpResponse<T> fail(ResponseCode err) {
        Header header = new Header();
        header.setCode(String.valueOf(err.getCode()));
        header.setDesc(err.getMessage());
        return new HttpResponse<>(header, null);
    }

    /**
     * Failure response with an explicit code and message
     */
    public static <T> HttpResponse<T> fail(int code, String message) {
        Header header = new Header();
        header.setCode(String.valueOf(code));
        header.setDesc(message);
        return new HttpResponse<>(header, null);
    }

    /**
     * Failure response with the default code -1
     */
    public static <T> HttpResponse<T> fail(String message) {
        return fail(-1, message);
    }
}

2. Implementing global exception handling

The custom exception class

@Slf4j
public class CommonException extends RuntimeException {

    /**
     * Creates the exception.
     * @param message the error message
     */
    public CommonException(String message) {
        super(message);
    }

    /**
     * Logs the message and throws a CommonException.
     */
    public static void throwCommonCommonException(String msg) {
        log.info(msg);
        throw new CommonException(msg);
    }

    /**
     * Throws a not-found exception for the given entity and id.
     */
    public static void throwNotFoundException(String entity, Long id) {
        String msg = String.format("%s %s not found", entity, id);
        throwCommonCommonException(msg);
    }
}

The global exception handler class

@Slf4j
@RestControllerAdvice
public class ExceptionAdvice {

    @ExceptionHandler(value = Exception.class)
    public HttpResponse<Void> exceptionHandler(Exception ex) {
        if (ex instanceof RuntimeException) {
            log.info("runtime exception", ex);
            return fail(ex.getMessage());
        }
        if (ex instanceof MethodArgumentNotValidException) {
            log.info("validation exception", ex);
            String msg = substringAfterLast(ex.getMessage(), "default message");
            return fail(msg);
        }
        // Unified handling of javax.validation.constraints.NotNull validation failures
        if (ex instanceof BindException) {
            log.info("validation exception", ex);
            String msg = substringAfterLast(ex.getMessage(), "default message");
            return fail(msg);
        }
        // Anything else is unexpected, so log it at error level
        log.error("uncaught exception: ", ex);
        return fail(SYSTEM_ERROR);
    }
}

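3. Identifying, catching, and alarming on business-logic exceptions

Catching business-logic exceptions accurately and completely is very important.

/**
 * Builds the list of query dates between the start and end dates.
 */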
private static List<String> replenishDateRangeByStartEnd(String startDate, String endDate) {
    List<String> dateList = Lists.newArrayList();
    String queryStartDate = getDateWithoutTime(startDate);
    String queryEndDate = getDateWithoutTime(endDate);
    int dateNum = 0;
    for (String date = queryStartDate; date.compareTo(queryEndDate) <= 0; date = getDateAgo(date, -1)) {
        dateList.add(date);
        dateNum += 1;
    }
    // The date range is too large
    if (dateNum >= MAX_QUERY_DAYS) {
        log.error("getMatchedDateList fail query date range larger than {} startDate {} endDate {}", MAX_QUERY_DAYS, startDate, endDate);
        String errorMsg = String.format(UmpUserAlarmType.REQUEST_TIME_TOO_LARGE_EXCEPTION.geErrorMsg(), MAX_QUERY_DAYS, dateNum);
        UmpUtil.alarmUser(errorMsg, UmpUserAlarmType.REQUEST_TIME_TOO_LARGE_EXCEPTION.getInfo());
        throw new CommonException(String.format(DRIVE_GET_DATE_ERROR_DATE_RANGE_TOO_LARGE.getDesc(), MAX_QUERY_DAYS, dateNum));
    }
    return dateList;
}
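One possible variation, sketched here under stated assumptions, is to validate the range before building the list, so that an oversized request is rejected without first materializing every date. DateRangeDemo, datesBetween and the limit of 31 days are hypothetical; the sketch treats the limit as inclusive, uses java.time instead of the original date helpers, and throws IllegalArgumentException where the original raises an alarm and a CommonException.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

class DateRangeDemo {

    // Hypothetical limit; the real MAX_QUERY_DAYS is not shown in the article.
    static final int MAX_QUERY_DAYS = 31;

    static List<String> datesBetween(String startDate, String endDate) {
        LocalDate start = LocalDate.parse(startDate); // expects yyyy-MM-dd
        LocalDate end = LocalDate.parse(endDate);
        if (start.isAfter(end)) {
            throw new IllegalArgumentException(
                    String.format("start time %s is later than end time %s", start, end));
        }
        // Both endpoints are included, hence the + 1.
        long days = ChronoUnit.DAYS.between(start, end) + 1;
        if (days > MAX_QUERY_DAYS) {
            throw new IllegalArgumentException(
                    String.format("max range is %s days, but %s days were requested", MAX_QUERY_DAYS, days));
        }
        List<String> dates = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            dates.add(d.toString());
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(datesBetween("2024-01-30", "2024-02-01"));
    }
}
```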

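IV. Establishing exception handling conventions in practice

Background: In the early days of the definition-driven service, many developers worked on it and the business iterated quickly. Custom exception classes multiplied and followed inconsistent rules, and the corresponding error and alarm messages were often unclear or hard to understand. Over time, the guidance shown to users became confusing, troubleshooting alarms became a heavy burden on developers, and the flood of undifferentiated, meaningless alarms greatly reduced developers' willingness to respond. Against this background, it became important to sort out, classify, and rework both the exception information exposed to users or downstream systems over web/RPC protocols and the alarm information sent to developers, to define detailed classification criteria, and to establish clear conventions for new code.

1. Analysis and summary of exception causes in the definition-driven service

Based on the problems above, the exceptions seen so far fall into the following categories: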

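1. Coding conventions

1.1 No effective convention for exception handling; developers handle exceptions according to personal habit

1.2 Incomplete validation of ordinary parameters

1.2.1 Incomplete date validation

1.2.2 Array and list parameters are not checked for emptiness (alarms related to index/size)

2. Protocol conventions

2.1 Parameters required by the standard protocol are not validated: the trend_type capability is not properly validated (the dt parameter is missing), which leads directly to a null pointer exception

2.2 Unsupported protocols fail with a raw exception instead of a reasonable message for the user: the NotNullExpression protocol is not supported

3. Business process conventions

3.1 Business validation should be made complete

3.2 Create, read, update, and delete operations must leave a log trail

4. Code bugs

4.1 retrying alarms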

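2. System review

Existing exception classes

The system currently has thirteen kinds of exceptions, including ExternalApiException, CommonException, BizHandleException, JsonProcessingException, RuntimeException, Exception, EzdJsfRouterException, MetaPullException, ParamException, UserException, and IllegalArgumentException.

Risks and problems:

There are too many exception classes
Some exceptions have unclear meanings
Some exceptions overlap and could essentially be merged

Solution

Agree on a unified set of exception classes that developers must follow, using functional areas of the system as exception boundaries:

UserException: user-related exceptions, i.e. exceptions the user can resolve;
InteractionException: interaction-related exceptions, i.e. all exception handling in the BE;
ProductionException: production-related exceptions, i.e. all exception handling related to production acceleration strategies;
RouteException: query-related exceptions, i.e. all exception handling related to the data query service;
CommonException: the shared fallback exception, used when the scope is unknown or cannot be determined.

Finer-grained cases can be further divided into subsets:

ExternalApiException: third-party call exceptions
ParamException: parameter exceptions
JsonException: JSON serialization exceptions
and so on

Alarms and errors are classified and handled according to the same categories.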
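One way such a scoped hierarchy could look is sketched below. The base class ServiceException, its scope field, and the choice to place ParamException under UserException are assumptions made for illustration; the article does not specify how the classes are related.

```java
// Hypothetical common base that carries the functional scope.
class ServiceException extends RuntimeException {
    private final String scope;

    ServiceException(String scope, String message) {
        super(message);
        this.scope = scope;
    }

    String getScope() {
        return scope;
    }
}

// Exceptions the user can resolve.
class UserException extends ServiceException {
    UserException(String message) {
        super("user", message);
    }
}

// Exceptions from the data query service.
class RouteException extends ServiceException {
    RouteException(String message) {
        super("route", message);
    }
}

// Fallback when the scope is unknown.
class CommonException extends ServiceException {
    CommonException(String message) {
        super("common", message);
    }
}

// A finer-grained subset; its parent is an assumption of this sketch.
class ParamException extends UserException {
    ParamException(String message) {
        super(message);
    }
}
```

With a shared base, a global handler or alarm hook can branch on getScope() instead of enumerating every concrete class.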

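3. Defining the conventions

The conventions cover error exceptions and alarm exceptions: errors are for users, alarms are for developers. We therefore have to consider what each audience can understand: the recipient should be able to read the specific information, understand it, and resolve the problem. An error is not necessarily an alarm, but an alarm is always an error. Exceptions the system does not expect are errors that require an alarm. Some expected exceptions are errors that need no alarm, such as invalid user input; other expected exceptions are errors that do require an alarm, such as a timeout when calling a downstream service.

Definition of "expected"

Query: whatever the standard protocol covers is expected; anything outside the standard protocol is unexpected.
BE: whatever is part of the business process is expected; anything outside it is unexpected.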
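The error-versus-alarm rule can be expressed as a small lookup. ErrorKind and ErrorClassifier are hypothetical, and the mapping from Java exception types to kinds is only an example of how a service might classify them.

```java
import java.util.concurrent.TimeoutException;

enum ErrorKind {
    // Expected and caused by the caller: report to the user, no alarm.
    USER_INPUT(true, false),
    // Expected, but developers need to know: report and alarm.
    DOWNSTREAM_TIMEOUT(true, true),
    // Not expected by the system: always report and alarm.
    UNEXPECTED(false, true);

    private final boolean expected;
    private final boolean alarm;

    ErrorKind(boolean expected, boolean alarm) {
        this.expected = expected;
        this.alarm = alarm;
    }

    boolean isExpected() {
        return expected;
    }

    boolean shouldAlarm() {
        return alarm;
    }
}

class ErrorClassifier {
    // Example mapping only; a real service would map its own exception classes.
    static ErrorKind classify(Throwable t) {
        if (t instanceof IllegalArgumentException) {
            return ErrorKind.USER_INPUT;
        }
        if (t instanceof TimeoutException) {
            return ErrorKind.DOWNSTREAM_TIMEOUT;
        }
        return ErrorKind.UNEXPECTED;
    }
}
```

Every kind is an error, but only those with shouldAlarm() set reach developers, which matches the statement that an alarm is always an error while an error is not always an alarm.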

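3.1 Users: errors

Expected
1. An error should contain [error subject], [error message], and [solution]
   1. Error subject: for example, an error while editing a logical table should include the logical table ID; an error on a modifier should include the modifier ID; a query error should report the logical table being queried, or another error subject if there is none
   2. Solution: it should be understandable to users and let them resolve the problem themselves, rather than exposing exception details that only make sense to developers
Unexpected
1. An error should name the feature being used, such as [Failed to edit logical table]
2. An error should return a unified message, such as [System error, please contact the developers]
3. The final message should look like: Failed to edit logical table. System error, please contact the developers.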
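The two message shapes above can be sketched as a small helper. UserErrorMessages and the exact punctuation of the output are illustrative choices, not part of the original convention.

```java
class UserErrorMessages {

    // Expected error: subject + error message + solution.
    static String expected(String subject, String error, String solution) {
        return String.format("[%s] %s. %s", subject, error, solution);
    }

    // Unexpected error: feature name + one unified system message.
    static String unexpected(String feature) {
        return feature + " failed. System error, please contact the developers.";
    }

    public static void main(String[] args) {
        System.out.println(expected("logical table 42", "Dimension [city] is missing",
                "Add the dimension to the logical table."));
        System.out.println(unexpected("Edit logical table"));
    }
}
```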

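3.2 Developers: alarms

Expected
1. An alarm should contain [severity level], [environment], [functional scope], [machine], [exception information], and [contingency plan (may vary by level)]
   1. Severity level: addresses the cost of time. For example, when the on-call engineer is tied up with something urgent, colleagues who notice the alarm can decide from its level whether to step in.
   2. Functional scope: given that the architecture has not yet been split, makes clear who needs to handle the alarm
   3. Exception information: should contain as much specific detail about the exception as possible
   4. Contingency plan: scenarios such as traffic switching need a matching contingency plan, because some alarms exist to trigger a system switchover rather than to indicate that the system is completely unavailable
Unexpected
1. Alarms should use a unified fallback logic
2. They should include as much of the following as possible: the affected feature, the exception class, and the exception itself.
   1. Affected feature: identifies the responsible developers quickly and shortens troubleshooting
   2. Exception class: helps locate the problem quickly
   3. The exception itself: some exceptions, such as a null pointer exception, can be diagnosed quickly from the exception alone.
Availability alarms
1. Service availability [important]
   1. The availability metric must be defined consistently [important]; erroneous content must not distort how availability is reported
   2. Availability scope
      1. Query service availability: availability of the IntelligentProduceService interface (can be refined per component after the architecture is split)
      2. Production-side service availability: acceleration strategies
      3. Configuration-side service availability: logical table editing, data source creation, etc.
2. Container availability: JDOS, load balancing, disk, etc.
Additional note: exceptions can be divided into recorded and non-recorded exceptions.
1. Recorded exceptions must carry detailed information.
2. Non-recorded exceptions must have a notification mechanism, i.e. an [alarm mechanism].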