JAVA28-Regular Expressions

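Common metacharacters

Regex   Meaning
^       Matches the position at the start of the input (of each line in MULTILINE mode)
$       Matches the position at the end of the input (of each line in MULTILINE mode)
.       Matches any single character except a line terminator (unless (?s) is on)
\w      Matches a single word character: a letter, digit or underscore ([a-zA-Z_0-9])
\s      Matches a single whitespace character
\d      Matches a single digit ([0-9])
\b      Matches a word boundary: a zero-width position between a word character and a non-word character (or the start/end of the input)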
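A small sketch of how some of these metacharacters behave in Java. The class `MetacharDemo` and its `findAll` helper are hypothetical names used only for illustration; the helper simply collects every `Matcher.find` result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetacharDemo {
    // Returns every substring of input matched by regex, left to right
    public static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "cat concat cat_1 cat.";
        // Plain "cat" also matches within "concat" and "cat_1"
        System.out.println(findAll("cat", text));         // [cat, cat, cat, cat]
        // \b is zero-width: no word character may sit directly before or after "cat"
        System.out.println(findAll("\\bcat\\b", text));   // [cat, cat]
        // ^ anchors to the start of the input; \w+ takes the following run of word characters
        System.out.println(findAll("^\\w+", text));       // [cat]
        // \d picks out the single digit
        System.out.println(findAll("\\d", text));         // [1]
    }
}
```

Note that `cat_1` is not matched by `\bcat\b`, because the underscore counts as a word character.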

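Quantifiers

Regex   Meaning
*       Matches the preceding element 0 or more times
+       Matches the preceding element 1 or more times
?       Matches the preceding element 0 or 1 time
{n}     Matches the preceding element exactly n times
{n,}    Matches the preceding element at least n times
{n,m}   Matches the preceding element between n and m times (inclusive)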
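A quick check of each quantifier, assuming whole-string matching via `String.matches` (the class `QuantifierDemo` is a hypothetical name for illustration):

```java
public class QuantifierDemo {
    // String.matches succeeds only if the regex matches the entire input
    public static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*c", "ac"));        // true: * allows zero b's
        System.out.println(fullMatch("ab+c", "ac"));        // false: + needs at least one b
        System.out.println(fullMatch("colou?r", "color"));  // true: ? makes the u optional
        System.out.println(fullMatch("\\d{4}", "2019"));    // true: exactly four digits
        System.out.println(fullMatch("\\d{2,}", "7"));      // false: at least two digits required
        System.out.println(fullMatch("\\d{2,3}", "1234"));  // false: at most three digits allowed
    }
}
```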

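Character classes

Regex       Meaning
[aeiou]     Matches any one of the characters a, e, i, o, u
[0-9]       Matches any digit from 0 to 9
[A-Z]       Matches any uppercase letter
[A-Z0-9_]   Matches any uppercase letter, digit or underscore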
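A sketch of the classes above in use; `CharClassDemo`, `upperTokens` and `countVowels` are illustrative names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // One or more characters, each an uppercase letter, a digit or an underscore
    private static final Pattern UPPER_TOKEN = Pattern.compile("[A-Z0-9_]+");
    // Exactly one of the five lowercase vowels
    private static final Pattern VOWEL = Pattern.compile("[aeiou]");

    public static List<String> upperTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = UPPER_TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static int countVowels(String input) {
        Matcher m = VOWEL.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(upperTokens("set MAX_SIZE to 1024 and DEBUG")); // [MAX_SIZE, 1024, DEBUG]
        System.out.println(countVowels("regular expression"));           // 7
    }
}
```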

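Negated matches

Regex      Meaning
[^aeiou]   Matches any single character other than a, e, i, o, u
[^x]       Matches any single character other than x
\W         Matches a single non-word character
\S         Matches a single non-whitespace character
\D         Matches a single non-digit character
\B         Matches a non-boundary: a position that is not a word boundary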
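The negated forms are often used to strip or count text. A minimal sketch, with hypothetical class and helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegationDemo {
    // \D matches any non-digit, so deleting every match leaves only the digits
    public static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // [^aeiou] matches any character except the five lowercase vowels
    public static String vowelsOnly(String s) {
        return s.replaceAll("[^aeiou]", "");
    }

    // \S+ matches a run of non-whitespace characters, i.e. one whitespace-separated token
    public static int countTokens(String s) {
        Matcher m = Pattern.compile("\\S+").matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 021-1234 5678")); // 02112345678
        System.out.println(vowelsOnly("regex"));              // ee
        System.out.println(countTokens("  a  bc\td \n"));     // 3
    }
}
```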

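Grouping and capturing

Regex      Meaning
()         Defines a capturing group. Groups are numbered from 1 in the order of their opening parentheses; group 0 is the entire match
(?:)       Groups without capturing or being assigned a number; the parentheses are used only for grouping or precedence
(?s).*     (?s) turns on DOTALL mode, so . also matches line terminators and .* can span several lines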
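Group numbering, non-capturing groups and `(?s)` can be checked with `Matcher.group` and `Matcher.groupCount`. The `GroupDemo` class and its helpers are hypothetical illustration names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns group `index` of the first match of regex in input, or null if there is no match
    public static String group(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    // Number of capturing groups in regex; (?:...) groups are not counted
    public static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String date = "(\\d{4})-(\\d{2})-(\\d{2})";
        System.out.println(group(date, "date: 2019-08-01", 0)); // 2019-08-01 (whole match)
        System.out.println(group(date, "date: 2019-08-01", 2)); // 08 (group 2: month)
        System.out.println(groupCount(date));                   // 3
        // (?:\d{2}:){2} repeats "hh:" twice without creating a group, so the seconds are group 1
        System.out.println(group("(?:\\d{2}:){2}(\\d{2})", "21:24:41", 1)); // 41
        System.out.println(groupCount("(?:\\d{2}:){2}(\\d{2})"));          // 1
        // By default . does not match a newline; (?s) enables DOTALL
        System.out.println("a\nb".matches("a.b"));     // false
        System.out.println("a\nb".matches("(?s)a.b")); // true
    }
}
```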

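Using regular expressions in Java

  • Pattern.compile: compiles a regex string into a Pattern
  • Pattern.matcher: creates a Matcher that applies the pattern to an input string
  • Matcher.find: searches for the next matching substring; returns false when none remain
  • Matcher.group: returns the text captured by a group (group() or group(0) is the whole match)
  • String.replaceAll: replaces every match of a regex with a replacement string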
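A sketch combining the five calls above: compile once, loop over `find`, read groups, and use `$n` back-references in `replaceAll`. `RegexApiDemo` and `pairs` are made-up names for this example. According to the `Pattern` Javadoc, a `Pattern` is immutable and safe for concurrent use while a `Matcher` is not, which is why the pattern sits in a `static final` field and a fresh `Matcher` is created per input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexApiDemo {
    // Compiled once; group 1 is the key, group 2 the numeric value
    private static final Pattern KV = Pattern.compile("(\\w+)=(\\d+)");

    // Collects "key -> value" for every key=number pair in the input
    public static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = KV.matcher(input);  // holds the state of one search over one input
        while (m.find()) {              // advances to the next match, false when none remain
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a=1, b=22, c=x";
        System.out.println(pairs(s)); // [a -> 1, b -> 22]
        // In the replacement string, $1 and $2 refer to capture groups 1 and 2
        System.out.println(s.replaceAll("(\\w+)=(\\d+)", "$1:$2")); // a:1, b:22, c=x
    }
}
```

`c=x` is left alone because `\d+` cannot match `x`.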

Exercise 1: check whether a string is a valid landline phone number

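import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNumberMatcher {
    // Group 1: area code (0 followed by 2 or 3 digits)
    // Group 2: number ([1-9] followed by 6 or 7 digits); [^0] would also accept letters or symbols
    private static final Pattern pattern = Pattern.compile("(0\\d{2,3})-([1-9]\\d{6,7})");

    // Write a function that checks whether a string is a valid landline phone number
    // A valid landline number has the form: area code-number
    // The area code starts with 0 and has three or four digits
    // The number starts with a non-zero digit and has seven or eight digits
    // A three-digit area code may only be followed by an eight-digit number
    // Valid examples:
    // 021-12345678
    // 0571-12345678
    // 0373-1234567
    // Invalid examples:
    // 02134-1234     wrong number of digits
    // 123-45678901   the area code must start with 0
    // 021-1234567    a three-digit area code must be followed by an eight-digit number
    public static boolean isPhoneNumber(String str) {
        Matcher matcher = pattern.matcher(str);
        // matches() requires the whole string to match; find() would also accept
        // inputs such as "x021-12345678" or "021-123456789" by matching only a substring
        if (!matcher.matches()) {
            return false;
        }
        // A three-digit area code may only be followed by an eight-digit number
        if (matcher.group(1).length() == 3) {
            return matcher.group(2).length() == 8;
        }
        return true;
    }
}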
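As an alternative sketch (not the exercise's reference solution), the rule that a three-digit area code must be followed by an eight-digit number can be written into the regex itself with alternation, so that a single `matches()` call performs the whole check. `PhoneNumberRegex` is a hypothetical class name:

```java
import java.util.regex.Pattern;

public class PhoneNumberRegex {
    // Branch 1: 3-digit area code (0 + 2 digits), then an 8-digit number starting with 1-9
    // Branch 2: 4-digit area code (0 + 3 digits), then a 7- or 8-digit number starting with 1-9
    private static final Pattern PHONE =
            Pattern.compile("0\\d{2}-[1-9]\\d{7}|0\\d{3}-[1-9]\\d{6,7}");

    public static boolean isPhoneNumber(String str) {
        // matches() must consume the whole input, whichever branch is used
        return PHONE.matcher(str).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"021-12345678", "0571-12345678", "0373-1234567",
                "02134-1234", "123-45678901", "021-1234567"};
        for (String s : samples) {
            System.out.println(s + " -> " + isPhoneNumber(s));
        }
    }
}
```

The trade-off: the alternation keeps all rules in one place, while the group-length check in the exercise is easier to extend with more conditions.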

Exercise 2: given a log string, remove the timestamp at the start of each line

public class LogProcessor {
    // Given a log string, remove the timestamp at the start of each line
    // and return the resulting string.
    // For example, for the input:
    //
    // [2019-08-01 21:24:41] bt3102 (11m:21s)
    // [2019-08-01 21:24:42] TeamCity server version is 2019.1.1 (build 66192)
    // [2019-08-01 21:24:43] Collecting changes in 2 VCS roots (22s)
    //
    // the result is:
    //
    // bt3102 (11m:21s)
    // TeamCity server version is 2019.1.1 (build 66192)
    // Collecting changes in 2 VCS roots (22s)
    public static String process(String log) {
        // (?m) turns on MULTILINE mode, so ^ matches at the start of every line.
        // No \b after the optional space: a word boundary there would reject lines
        // whose message starts with a non-word character, such as "(22s)".
        return log.replaceAll("(?m)^\\[\\d{4}-\\d{2}-\\d{2} (?:\\d{2}:){2}\\d{2}\\] ?", "");
    }

    public static void main(String[] args) {
        String str =
                "[2019-08-01 21:24:41] bt3102 (11m:21s)\n"
                        + "[2019-08-01 21:24:42] TeamCity server version is 2019.1.1 (build 66192)\n"
                        + "[2019-08-01 21:24:43] Collecting changes in 2 VCS roots (22s)\n";

        System.out.println(process(str));
    }
}
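`String.replaceAll(regex, replacement)` is documented as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(replacement)`, so it recompiles the regex on every call. A sketch that compiles the timestamp pattern once and passes the `Pattern.MULTILINE` flag instead of an inline `(?m)`; the class name `LogTimestampStripper` is hypothetical:

```java
import java.util.regex.Pattern;

public class LogTimestampStripper {
    // Pattern.MULTILINE has the same effect as an inline (?m): ^ matches at the start of every line
    private static final Pattern TIMESTAMP = Pattern.compile(
            "^\\[\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\] ?", Pattern.MULTILINE);

    // Removes a leading "[yyyy-MM-dd HH:mm:ss] " from each line of the log
    public static String process(String log) {
        return TIMESTAMP.matcher(log).replaceAll("");
    }

    public static void main(String[] args) {
        String log = "[2019-08-01 21:24:41] bt3102 (11m:21s)\n"
                + "[2019-08-01 21:24:43] Collecting changes in 2 VCS roots (22s)\n";
        System.out.print(process(log));
    }
}
```

A timestamp that appears in the middle of a line is left untouched, because `^` only matches at line starts.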

Exercise 3: extract the GC activity information, producing one GCActivity object per line

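import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class GCLogAnalyzer {
    // Groups 1-3: young generation before/after/total (K)
    // Groups 4-6: whole heap before/after/total (K); the optional [ParOldGen: ...] block of
    //             Full GC lines is skipped so the Metaspace figures are not captured instead
    // Groups 7-9: user/sys/real times in seconds
    private static final Pattern pattern = Pattern.compile(
            "\\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] (?:\\[ParOldGen: [^\\]]*\\] )?"
                    + "(\\d+)K->(\\d+)K\\((\\d+)K\\).*user=(\\d+\\.\\d+).*sys=(\\d+\\.\\d+).*real=(\\d+\\.\\d+)");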

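    // The project root directory contains a gc.log file with the JVM's GC log.
    // Extract the GC activity information from it, producing one GCActivity object per line.
    //
    // 2019-08-21T07:48:17.401+0200: 2.924: [GC (Allocation Failure) [PSYoungGen:
    // 393216K->6384K(458752K)] 416282K->29459K(1507328K), 0.0051622 secs] [Times: user=0.02
    // sys=0.00, real=0.01 secs]
    // For example, in the GC log line above (a single line in gc.log, wrapped here):
    // [PSYoungGen: 393216K->6384K(458752K)] means the young generation has 458752K in total, and GC reduced its used memory from 393216K to 6384K
    // 416282K->29459K(1507328K) means the JVM heap has 1507328K in total, and GC reduced its used memory from 416282K to 29459K
    // user=0.02 sys=0.00, real=0.01 are the time spent in user mode, the time spent in system calls, and the real (wall-clock) time elapsed
    // Parse this information into an instance of the GCActivity class
    // Ignore any line that does not contain this data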
    public static List<GCActivity> parse(File gcLog) throws IOException {
        List<String> lines = Files.readAllLines(gcLog.toPath());
        return lines.stream().map(pattern::matcher)
                .filter(Matcher::find)
                .map(matcher -> new GCActivity(Integer.parseInt(matcher.group(1)),
                        Integer.parseInt(matcher.group(2)),
                        Integer.parseInt(matcher.group(3)),
                        Integer.parseInt(matcher.group(4)),
                        Integer.parseInt(matcher.group(5)),
                        Integer.parseInt(matcher.group(6)),
                        Double.parseDouble(matcher.group(7)),
                        Double.parseDouble(matcher.group(8)),
                        Double.parseDouble(matcher.group(9))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<GCActivity> activities = parse(new File("gc.log"));
        activities.forEach(System.out::println);
    }

    public static class GCActivity {
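        // Young generation usage before GC, in K
        int youngGenBefore;
        // Young generation usage after GC, in K
        int youngGenAfter;
        // Total young generation size, in K
        int youngGenTotal;
        // JVM heap usage before GC, in K
        int heapBefore;
        // JVM heap usage after GC, in K
        int heapAfter;
        // Total JVM heap size, in K
        int heapTotal;
        // Time spent in user mode, in seconds
        double user;
        // Time spent in system calls, in seconds
        double sys;
        // Real (wall-clock) time elapsed, in seconds
        double real;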

        public GCActivity(
                int youngGenBefore,
                int youngGenAfter,
                int youngGenTotal,
                int heapBefore,
                int heapAfter,
                int heapTotal,
                double user,
                double sys,
                double real) {
            this.youngGenBefore = youngGenBefore;
            this.youngGenAfter = youngGenAfter;
            this.youngGenTotal = youngGenTotal;
            this.heapBefore = heapBefore;
            this.heapAfter = heapAfter;
            this.heapTotal = heapTotal;
            this.user = user;
            this.sys = sys;
            this.real = real;
        }

        @Override
        public String toString() {
            return "GCActivity{"
                    + "youngGenBefore="
                    + youngGenBefore
                    + ", youngGenAfter="
                    + youngGenAfter
                    + ", youngGenTotal="
                    + youngGenTotal
                    + ", heapBefore="
                    + heapBefore
                    + ", heapAfter="
                    + heapAfter
                    + ", heapTotal="
                    + heapTotal
                    + ", user="
                    + user
                    + ", sys="
                    + sys
                    + ", real="
                    + real
                    + '}';
        }

        public int getYoungGenBefore() {
            return youngGenBefore;
        }

        public int getYoungGenAfter() {
            return youngGenAfter;
        }

        public int getYoungGenTotal() {
            return youngGenTotal;
        }

        public int getHeapBefore() {
            return heapBefore;
        }

        public int getHeapAfter() {
            return heapAfter;
        }

        public int getHeapTotal() {
            return heapTotal;
        }

        public double getUser() {
            return user;
        }

        public double getSys() {
            return sys;
        }

        public double getReal() {
            return real;
        }
    }
}
  • Sample log file
Java HotSpot(TM) 64-Bit Server VM (25.181-b13) for linux-amd64 JRE (1.8.0_181-b13), built on Jul  7 2018 00:56:38 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 32707224k(7763808k free), swap 16777212k(16772860k free)
CommandLine flags: -XX:InitialHeapSize=1610612736 -XX:MaxHeapSize=1610612736 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
2019-08-21T07:48:15.609+0200: 1.133: [GC (Metadata GC Threshold) [PSYoungGen: 283264K->20213K(458752K)] 283264K->20229K(1507328K), 0.0220106 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
2019-08-21T07:48:15.631+0200: 1.155: [Full GC (Metadata GC Threshold) [PSYoungGen: 20213K->0K(458752K)] [ParOldGen: 16K->19450K(1048576K)] 20229K->19450K(1507328K), [Metaspace: 20743K->20743K(1067008K)], 0.0461462 secs] [Times: user=0.16 sys=0.01, real=0.05 secs]
2019-08-21T07:48:16.604+0200: 2.127: [GC (Metadata GC Threshold) [PSYoungGen: 163389K->16486K(458752K)] 182840K->35944K(1507328K), 0.0129310 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
2019-08-21T07:48:16.617+0200: 2.140: [Full GC (Metadata GC Threshold) [PSYoungGen: 16486K->0K(458752K)] [ParOldGen: 19458K->23066K(1048576K)] 35944K->23066K(1507328K), [Metaspace: 34726K->34723K(1079296K)], 0.0324194 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
2019-08-21T07:48:17.401+0200: 2.924: [GC (Allocation Failure) [PSYoungGen: 393216K->6384K(458752K)] 416282K->29459K(1507328K), 0.0051622 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2019-08-21T07:48:17.832+0200: 3.355: [GC (Allocation Failure) [PSYoungGen: 399600K->4676K(458752K)] 422675K->27751K(1507328K), 0.0053542 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2019-08-21T07:48:18.259+0200: 3.782: [GC (Allocation Failure) [PSYoungGen: 397892K->6632K(458752K)] 420967K->29706K(1507328K), 0.0062974 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2019-08-21T07:48:18.644+0200: 4.167: [GC (Allocation Failure) [PSYoungGen: 399848K->8944K(501760K)] 422922K->32026K(1550336K), 0.0087488 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2019-08-21T07:48:19.026+0200: 4.549: [GC (Allocation Failure) [PSYoungGen: 486640K->11024K(500224K)] 509722K->34114K(1548800K), 0.0050016 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2019-08-21T07:48:19.378+0200: 4.902: [GC (GCLocker Initiated GC) [PSYoungGen: 488720K->12496K(503808K)] 511810K->35594K(1552384K), 0.0070491 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2019-08-21T07:48:19.757+0200: 5.280: [GC (Allocation Failure) [PSYoungGen: 495312K->14048K(503296K)] 518415K->37151K(1551872K), 0.0074778 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2019-08-21T07:48:20.115+0200: 5.639: [GC (Allocation Failure) [PSYoungGen: 496864K->15936K(502272K)] 519967K->39047K(1550848K), 0.0059050 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2019-08-21T07:48:20.492+0200: 6.015: [GC (Allocation Failure) [PSYoungGen: 497216K->17904K(499200K)] 520327K->41015K(1547776K), 0.0061056 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2019-08-21T07:48:20.796+0200: 6.319: [GC (Allocation Failure) [PSYoungGen: 499184K->19200K(498176K)] 522295K->42311K(1546752K), 0.0062008 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
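The sample contains two shapes of `PSYoungGen` line: minor `GC` lines, where the heap figures follow the young-generation block directly, and `Full GC` lines, which put a `[ParOldGen: ...]` block before the heap figures and a `[Metaspace: ...]` block after them. A pattern that reaches the heap group through a greedy `.*\s` captures the last `xK->yK(zK)` triple on the line, which on `Full GC` lines belongs to Metaspace. The sketch below, using a hypothetical `GCLineCheck` class, skips the optional `ParOldGen` block explicitly and checks the captured fields against two lines copied from the sample:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GCLineCheck {
    // Groups 1-3: young gen before/after/total; 4-6: heap before/after/total; 7-9: user/sys/real
    private static final Pattern GC = Pattern.compile(
            "\\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] (?:\\[ParOldGen: [^\\]]*\\] )?"
                    + "(\\d+)K->(\\d+)K\\((\\d+)K\\).*user=(\\d+\\.\\d+).*sys=(\\d+\\.\\d+).*real=(\\d+\\.\\d+)");

    // Returns the nine captured fields as strings, or null if the line has no matching GC entry
    public static String[] fields(String line) {
        Matcher m = GC.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[9];
        for (int i = 0; i < 9; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String minor = "2019-08-21T07:48:17.401+0200: 2.924: [GC (Allocation Failure) [PSYoungGen: 393216K->6384K(458752K)] 416282K->29459K(1507328K), 0.0051622 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]";
        String full = "2019-08-21T07:48:15.631+0200: 1.155: [Full GC (Metadata GC Threshold) [PSYoungGen: 20213K->0K(458752K)] [ParOldGen: 16K->19450K(1048576K)] 20229K->19450K(1507328K), [Metaspace: 20743K->20743K(1067008K)], 0.0461462 secs] [Times: user=0.16 sys=0.01, real=0.05 secs]";
        // 393216, 6384, 458752, 416282, 29459, 1507328, 0.02, 0.00, 0.01
        System.out.println(String.join(", ", fields(minor)));
        // 20213, 0, 458752, 20229, 19450, 1507328, 0.16, 0.01, 0.05
        System.out.println(String.join(", ", fields(full)));
        // Header lines such as "Memory: ..." contain no GC entry
        System.out.println(fields("Memory: 4k page, physical 32707224k(7763808k free)") == null); // true
    }
}
```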
Updated 2025-03-01