Java 双层for循环

在日常开发中，经常会遇到两个集合之间做数据匹配的场景。

比如：

用户列表匹配用户备注；
订单列表匹配支付记录；
商品列表匹配库存信息；
部门列表匹配员工信息；
主表数据匹配明细数据。

很多时候，最直接的写法是双层 for 循环。

数据量小的时候，这种写法问题不明显。

但当数据量变大后，双层循环的性能问题会非常明显。

本文通过一个简单示例记录一下这个问题，以及如何用 Map 进行优化。

一、示例场景

假设现在有两个集合。

第一个是用户列表：

List<User> userList;

第二个是用户备注列表：

List<UserMemo> userMemoList;

需求是：

遍历用户列表，根据 userId 从用户备注列表中找到对应的 content，然后进行业务处理。

用户类：

import lombok.Data;

@Data
public class User {

    private Long userId;

    private String name;
}

用户备注类：

import lombok.Data;

@Data
public class UserMemo {

    private Long userId;

    private String content;
}

二、准备测试数据

模拟 5 万条用户数据：

public static List<User> getUserTestList() {
    List<User> users = new ArrayList<>();

    for (int i = 1; i <= 50000; i++) {
        User user = new User();

        user.setUserId((long) i);
        user.setName(UUID.randomUUID().toString());

        users.add(user);
    }

    return users;
}

模拟 3 万条用户备注数据：

public static List<UserMemo> getUserMemoTestList() {
    List<UserMemo> userMemos = new ArrayList<>();

    for (int i = 30000; i >= 1; i--) {
        UserMemo userMemo = new UserMemo();

        userMemo.setUserId((long) i);
        userMemo.setContent(UUID.randomUUID().toString());

        userMemos.add(userMemo);
    }

    return userMemos;
}

这里的数据量是：

userList：50000 条
userMemoList：30000 条

三、常见写法：双层 for 循环

最容易想到的写法是双层循环。

public static void main(String[] args) {
    List<User> userTestList = getUserTestList();
    List<UserMemo> userMemoTestList = getUserMemoTestList();

    StopWatch stopWatch = new StopWatch();
    stopWatch.start();

    for (User user : userTestList) {
        Long userId = user.getUserId();

        for (UserMemo userMemo : userMemoTestList) {
            if (userId.equals(userMemo.getUserId())) {
                String content = userMemo.getContent();

                System.out.println("模拟 content 业务处理：" + content);
            }
        }
    }

    stopWatch.stop();

    System.out.println("最终耗时：" + stopWatch.getTotalTimeMillis());
}

这段代码的逻辑很直观：

遍历每一个 User
  再遍历每一个 UserMemo
    判断 userId 是否相等

但性能问题也很明显。

这段代码的比较次数大约是：

50000 * 30000 = 1500000000

也就是 15 亿次匹配判断。

数据量一大，耗时会明显增加。

四、如果只匹配一条数据，至少要加 break

如果业务场景是：

每个 userId 在 userMemoList 中最多只对应一条记录。

那么当内层循环找到匹配数据后，就没有必要继续往后遍历了。

可以加上 break。

public static void main(String[] args) {
    List<User> userTestList = getUserTestList();
    List<UserMemo> userMemoTestList = getUserMemoTestList();

    StopWatch stopWatch = new StopWatch();
    stopWatch.start();

    for (User user : userTestList) {
        Long userId = user.getUserId();

        for (UserMemo userMemo : userMemoTestList) {
            if (userId.equals(userMemo.getUserId())) {
                String content = userMemo.getContent();

                System.out.println("模拟 content 业务处理：" + content);

                break;
            }
        }
    }

    stopWatch.stop();

    System.out.println("最终耗时：" + stopWatch.getTotalTimeMillis());
}

加上：break后，找到匹配项就会退出内层循环。

这样可以减少一部分无效遍历。

不过需要注意：

break 只能优化一对一匹配场景。
如果一个用户可能有多条备注，就不能随便加 break。

即使加了 break，整体仍然是双层循环。

时间复杂度依然接近：

O(n * m)

只是减少了一部分实际执行次数。

五、更好的方式：先转成 Map

如果是根据 userId 做匹配，更好的方式是先把 userMemoList 转成 Map。

Map<Long, String> contentMap = userMemoTestList.stream()
        .collect(Collectors.toMap(
                UserMemo::getUserId,
                UserMemo::getContent
        ));

然后遍历用户列表时，直接根据 userId 从 Map 中取值。

public static void main(String[] args) {
    List<User> userTestList = getUserTestList();
    List<UserMemo> userMemoTestList = getUserMemoTestList();

    StopWatch stopWatch = new StopWatch();
    stopWatch.start();

    Map<Long, String> contentMap = userMemoTestList.stream()
            .collect(Collectors.toMap(
                    UserMemo::getUserId,
                    UserMemo::getContent
            ));

    for (User user : userTestList) {
        Long userId = user.getUserId();

        String content = contentMap.get(userId);

        if (StringUtils.hasLength(content)) {
            System.out.println("模拟 content 业务处理：" + content);
        }
    }

    stopWatch.stop();

    System.out.println("最终耗时：" + stopWatch.getTotalTimeMillis());
}

这种写法分成两步：

第一步，构建索引：

遍历 userMemoList，生成 Map

第二步，查询匹配：

遍历 userList，通过 userId 从 Map 中获取 content

整体复杂度从：

O(n * m)

变成了：

O(n + m)

在数据量较大时，性能差异会非常明显。

六、为什么 Map 会快很多

双层循环的本质是：

每拿到一个 userId，都要去 userMemoList 里从头到尾找一遍。

这就像每次都重新扫表。

而使用 Map 后，相当于先对 userMemoList 建了一个索引。

userId -> content

后续查询时直接：

contentMap.get(userId);

大多数情况下，HashMap#get 的查询效率接近 O(1)。

因此整体性能会比双层循环好很多。

可以简单理解为：

双层 for：每次都从列表里查
Map：先建索引，再按 key 查

这和数据库查询中“有没有索引”的差别类似。

七、需要注意重复 key 的问题

上面的写法：

Map<Long, String> contentMap = userMemoTestList.stream()
        .collect(Collectors.toMap(
                UserMemo::getUserId,
                UserMemo::getContent
        ));

有一个前提：

userMemoList 中的 userId 不能重复。

如果存在重复 userId，Collectors.toMap() 会抛出异常。

例如：

Duplicate key

如果业务上允许重复，需要指定合并策略。

例如保留第一条：

Map<Long, String> contentMap = userMemoTestList.stream()
        .collect(Collectors.toMap(
                UserMemo::getUserId,
                UserMemo::getContent,
                (oldValue, newValue) -> oldValue
        ));

或者保留最后一条：

Map<Long, String> contentMap = userMemoTestList.stream()
        .collect(Collectors.toMap(
                UserMemo::getUserId,
                UserMemo::getContent,
                (oldValue, newValue) -> newValue
        ));

如果一个 userId 对应多条备注，则应该转成：

Map<Long, List<UserMemo>>

例如：

Map<Long, List<UserMemo>> memoMap = userMemoTestList.stream()
        .collect(Collectors.groupingBy(UserMemo::getUserId));

然后处理时：

for (User user : userTestList) {
    List<UserMemo> memos = memoMap.get(user.getUserId());

    if (memos == null || memos.isEmpty()) {
        continue;
    }

    for (UserMemo memo : memos) {
        System.out.println("模拟 content 业务处理：" + memo.getContent());
    }
}

八、Map 优化适合哪些场景

这种优化方式适合以下场景：

1. 两个集合按某个 key 关联

例如：

userId
orderId
productId
deptId
tenantId

2. 一个集合是主数据，一个集合是补充数据

例如：

用户列表 + 用户备注
订单列表 + 支付记录
商品列表 + 库存信息
部门列表 + 员工数量

3. 数据量较大

如果只有几十条数据，双层循环问题不大。

但如果数据达到几万、几十万，建议优先考虑 Map。

4. 查询条件可以作为唯一 key

如果能用某个字段作为 key 直接查找，就非常适合提前构建 Map。

九、什么时候不能直接用 Map 替代

也不是所有双层循环都可以直接改成 Map。

下面这些场景需要谨慎。

1. 匹配条件不是等值匹配

例如：

if (order.getAmount().compareTo(rule.getMin()) >= 0
        && order.getAmount().compareTo(rule.getMax()) <= 0) {
    // ...
}

这种范围匹配不能简单用 Map 解决。

2. 一个 key 对应多个值

如果是 1 对多关系，不能简单转成 Map<K, V>。

应该使用：

Map<K, List<V>>

3. 数据量很小

如果两个集合都只有几条数据，没必要过度优化。

直接双层循环反而更直观。

4. 内存非常敏感

构建 Map 需要额外内存。

如果数据量非常大，需要综合考虑内存占用。

十、实际开发中的建议

1. 数据量小，优先可读性

如果数据量很小，双层循环可以接受。

不要为了优化而让代码变复杂。

2. 一对一匹配时，找到后加 break

如果确认只需要找一条数据，内层循环匹配成功后应该加 break。

if (matched) {
    // 处理逻辑
    break;
}

3. 大数据量匹配，优先考虑 Map

如果两个集合都有几千条以上，并且是按 key 匹配，优先考虑转 Map。

4. 注意重复 key

使用 Collectors.toMap() 时，一定要确认 key 是否唯一。

不唯一时，要么指定合并策略，要么使用 groupingBy()。

5. 不要在循环里频繁查数据库

双层循环已经可能很慢。

更严重的是在循环里查数据库：

for (User user : userList) {
    UserMemo memo = userMemoMapper.selectByUserId(user.getUserId());
}

这会产生典型的 N+1 查询问题。

更好的方式是：

一次性查出需要的数据；
转成 Map；
在内存中匹配。

结论

双层 for 循环在小数据量场景下问题不大，但当两个集合数据量都比较大时，性能问题会非常明显。

如果是两个集合根据某个 key 做等值匹配，建议先把被匹配集合转成 Map。

例如：

Map<Long, String> contentMap = userMemoList.stream()
        .collect(Collectors.toMap(
                UserMemo::getUserId,
                UserMemo::getContent
        ));

然后遍历主集合时直接通过 key 获取：

String content = contentMap.get(user.getUserId());

这种方式可以把时间复杂度从：

O(n * m)

优化到：

两个大集合按 key 匹配时，不要习惯性写双层 for。
先把被匹配集合转成 Map，再做关联处理，通常会更清晰，也更高效。