Requirement: parse a CSV line while ignoring commas inside quotes
Solution:
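public static void main(String[] args) {
    String s = "a,b,c,\"1,000\"";
    // Split on a comma only when an even number of quotes follows it,
    // i.e. when the comma lies outside a quoted field.
    String[] result = s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
    for (String str : result) {
        System.out.println(str);
    }
}
Output: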
a
b
c
"1,000"
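The lookahead in the split pattern accepts a comma only when the rest of the line contains an even number of double quotes, which is why the comma in "1,000" survives. Below is a minimal sketch that wraps the same pattern in a helper and also strips the surrounding quotes. The names CsvLineSplitter and splitCsvLine are made up for illustration, and the sketch assumes balanced quotes and no escaped quotes ("") inside a field.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSplitter {
    // A comma is a separator only if an even number of quotes follows it.
    private static final String SEPARATOR = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    public static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields, e.g. in "a,b,".
        for (String field : line.split(SEPARATOR, -1)) {
            // Remove one pair of enclosing quotes, if present.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitCsvLine("a,b,c,\"1,000\"")); // prints [a, b, c, 1,000]
    }
}
```

For real-world CSV (escaped quotes, fields spanning lines) a dedicated CSV parser is likely the safer choice; the regex approach only fits simple input like the above.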
When replacing text with a regular expression, check whether the replacement string contains $ or \, for example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("\\{C0\\}");
Matcher matcher = pattern.matcher("Price: [{C0}].");
System.out.println(matcher.replaceAll("€6.99"));
System.out.println(matcher.replaceAll("$6.99"));
Output:
Price: [€6.99].
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 6
at java.util.regex.Matcher.group(Unknown Source)
at java.util.regex.Matcher.appendReplacement(Unknown Source)
at java.util.regex.Matcher.replaceAll(Unknown Source)
at TestExcel2Xml.main(TestExcel2Xml.java:10)
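The first replaceAll works as expected, but the dollar sign in the second one causes trouble: $6 is read as a reference to capture group 6, which the pattern does not have.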
Java API:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
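To make the API note concrete, here is a small sketch of both behaviours: $1 and $2 used deliberately as group references, and a dollar sign escaped by hand with a backslash so that it stays literal. The class and method names (ReplacementGroupDemo, swap, price) are illustrative only.

```java
import java.util.regex.Pattern;

public class ReplacementGroupDemo {
    // Turns "key=value" into "value=key" by referencing groups 1 and 2.
    public static String swap(String input) {
        return Pattern.compile("(\\w+)=(\\w+)").matcher(input).replaceAll("$2=$1");
    }

    // "\\$" in Java source is the two characters \$ which the replacement
    // syntax treats as a literal dollar sign.
    public static String price(String input) {
        return Pattern.compile("\\{C0\\}").matcher(input).replaceAll("\\$6.99");
    }

    public static void main(String[] args) {
        System.out.println(swap("width=100"));       // prints 100=width
        System.out.println(price("Price: [{C0}].")); // prints Price: [$6.99].
    }
}
```

Escaping by hand is fine for a constant, but it is easy to forget when the replacement text comes from user input or a data file.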
You can pre-process the replacement string with Matcher.quoteReplacement(String). From the API:
Returns a literal replacement String for the specified String. This method produces a String that will work use as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes (‘\’) and dollar signs (‘$’) will be given no special meaning.
Change the code to:
Pattern pattern = Pattern.compile("\\{C0\\}");
Matcher matcher = pattern.matcher("Price: [{C0}].");
System.out.println(matcher.replaceAll("€6.99"));
System.out.println(matcher.replaceAll(Matcher.quoteReplacement("$6.99")));
Correct output:
Price: [€6.99].
Price: [$6.99].
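To see what quoteReplacement actually does, it helps to print its return value: every $ and \ gets a backslash in front of it, and other characters are left alone. The sketch below also covers the backslash case, which the example above does not exercise. QuoteReplacementDemo and fill are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{C0\\}");

    // Inserts value verbatim, whatever characters it contains.
    public static String fill(String template, String value) {
        return PLACEHOLDER.matcher(template).replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(Matcher.quoteReplacement("$6.99"));    // prints \$6.99
        System.out.println(Matcher.quoteReplacement("C:\\temp")); // prints C:\\temp
        System.out.println(fill("Path: [{C0}].", "C:\\temp"));    // prints Path: [C:\temp].
    }
}
```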
The email regular expression I use myself:
\\w+([-.]\\w+)*@\\w+([-.]\\w+)*\\.[a-z]{2,3}
It may not be perfect, but it works without major problems in most cases.
Here is a Java test class to go with it:
public class EmailRegexTest {
    // The regular expression used to validate email addresses.
    private static final String REGEX_EMAIL = "\\w+([-.]\\w+)*@\\w+([-.]\\w+)*\\.[a-z]{2,3}";
    // Other candidate patterns:
    //   \\w+([-.]\\w+)*
    //   [\\w]+[\\w.]*@(\\w+\\.)+[A-Za-z]+
    //   [\\w]+[\\w+.]+\\.\\w+
    //   \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

    public static void main(String[] args) {
        String s = "[email protected]";
        System.out.println(s.matches(REGEX_EMAIL));
    }
}
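To show where "not perfect" bites, the sketch below runs the same pattern against a few sample addresses. The addresses and the names EmailRegexCheck and isEmail are made up for illustration. Because the pattern ends in [a-z]{2,3}, it rejects top-level domains longer than three letters and uppercase ones, and it does not allow a plus sign in the local part.

```java
public class EmailRegexCheck {
    // Same pattern as REGEX_EMAIL in the test class above.
    private static final String REGEX_EMAIL =
            "\\w+([-.]\\w+)*@\\w+([-.]\\w+)*\\.[a-z]{2,3}";

    public static boolean isEmail(String s) {
        return s.matches(REGEX_EMAIL);
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com",   // true
            "john@mail.example.org",  // true
            "john@example.info",      // false: four-letter TLD
            "john@EXAMPLE.COM",       // false: uppercase TLD
            "john+tag@example.com"    // false: '+' in the local part
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isEmail(sample));
        }
    }
}
```

If those cases matter, one option is to loosen the last part to something like [A-Za-z]{2,}, at the cost of accepting more malformed input.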
Parse the email addresses out of a given piece of text and display them. The same approach also works for extracting QQ numbers.
/**
 * Find all email addresses in a string with a regular expression.
 * @param string $str
 * @return array The email addresses found, keyed by address so duplicates collapse.
 */
public static function parseEmails($str) {
    $emails = array();
    preg_match_all("(([\w\.-]{1,})@([\w-]{1,}\.+[a-zA-Z]{2,}))", $str, $matches, PREG_PATTERN_ORDER);
    // var_dump($matches);
    foreach ($matches[0] as $email) {
        $emails[$email] = $email;
    }
    return $emails;
}
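$matches contains every group that was matched. The flag passed as the fourth argument controls how the resulting array is ordered; with PREG_PATTERN_ORDER, as above, $matches[0] holds the text matched by the whole pattern.
For details, see: http://php.net/manual/en/function.preg-match-all.php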
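For comparison with the Java snippets earlier in this post, here is a rough Java sketch of the same extraction. It is an assumption about how one might port the PHP function, not a tested drop-in replacement: the pattern is copied from the PHP version, the EmailExtractor class name is made up, and a LinkedHashSet stands in for the PHP array keyed by address.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractor {
    // Same pattern as the PHP version: local part, "@", domain label, dot(s), letters.
    private static final Pattern EMAIL =
            Pattern.compile("([\\w.-]+)@([\\w-]+\\.+[a-zA-Z]{2,})");

    public static Set<String> parseEmails(String text) {
        // A LinkedHashSet drops duplicates and keeps first-seen order.
        Set<String> emails = new LinkedHashSet<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            emails.add(matcher.group()); // whole match, like $matches[0] in PHP
        }
        return emails;
    }

    public static void main(String[] args) {
        String text = "Mail alice@example.com or bob@example.org; alice@example.com again.";
        System.out.println(parseEmails(text)); // prints [alice@example.com, bob@example.org]
    }
}
```

Here matcher.group(1) and matcher.group(2) would give the local part and the domain, the counterparts of the inner groups in $matches.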