
Replacing with sed and regexes

Let’s say you have an HTML file called file.html and you want to replace “.jpg” with “.png”, but only inside the href attribute of anchor elements.

Example of input:
[html]<a href="alice.jpg">alice</a>
<a href="bob.jpg">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">
[/html]

Desired output:
[html]<a href="alice.png">alice</a>
<a href="bob.png">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">
[/html]

Notice that only the first two occurrences of “.jpg” were changed to “.png”: the ones inside the href attribute of an anchor.

You can use sed with regexes to achieve this.
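[bash]
sed -i -E 's/(<a href="[^"]*)\.jpg(")/\1.png\2/' file.html
[/bash]

Where:

  • -i edits the file in place (GNU sed; BSD/macOS sed expects a suffix argument, e.g. -i '')
  • -E enables extended regular expressions, so ( ) create groups without backslashes
  • s/regex/replacement/ is the substitute command
  • (<a href="[^"]*) is group 1: the string <a href=" followed by zero or more characters that are not a double quote, so the match cannot run past the end of the attribute value
  • \.jpg is the literal .jpg we want to replace (the dot is escaped because an unescaped . matches any character)
  • (") is group 2: just the closing "
  • \1.png\2 replaces the match with group 1, then .png, then group 2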
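If you want to try the expression on the sample lines before running sed -i on real files, the same substitution idea can be sketched in Java. This is an illustrative sketch, not part of the original post: the class name SedLikeReplace is made up, it uses [^"]* and an escaped dot so the match stays inside the attribute value, and it calls Matcher.replaceFirst, which refers to groups as $1 and $2 instead of sed's \1 and \2.

```java
import java.util.List;
import java.util.regex.Pattern;

public class SedLikeReplace {
    // Group 1: <a href=" plus every character up to (not including) the next ".
    // \.jpg: the literal extension. Group 2: the closing double quote.
    private static final Pattern HREF_JPG =
            Pattern.compile("(<a href=\"[^\"]*)\\.jpg(\")");

    // Like sed's s/// without the g flag: only the first match on a line is replaced.
    static String replaceLine(String line) {
        return HREF_JPG.matcher(line).replaceFirst("$1.png$2");
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "<a href=\"alice.jpg\">alice</a>",
                "<a href=\"bob.jpg\">bob</a>",
                "something.jpg",
                "href.jpg",
                "<a href=\"example.com\">alice.jpg</a>",
                "<img src=\"href.jpg\">");
        for (String line : input) {
            System.out.println(replaceLine(line));
        }
    }
}
```

Running it prints the desired output: only the two href values change, while the link text alice.jpg and the img src are left alone because the pattern requires <a href=" directly before the file name.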

Regex with negative lookahead and lookbehind

“Looking different directions” by Paul Kline (https://www.flickr.com/photos/paulelijah/6717953239/).

Problem: Match strings that contain a single quotation mark ('), but not several of them in a row.

Solution:

(?<!')'(?!')

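This regex matches a single quotation mark with (?<!') on its left and (?!') on its right. (?<!') is a negative lookbehind: ?<! means "not preceded by", here not preceded by a single quotation mark. (?!') is a negative lookahead: ?! means "not followed by", here not followed by a single quotation mark. Both checks are zero-width, so they inspect the neighboring characters without consuming them.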
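Because lookarounds are zero-width, they also succeed when there is no neighboring character at all. A version built from consuming character classes, such as [^']'[^'], needs a real non-quote character on both sides, so it misses a lone quote at the start or end of a string. The comparison below is an illustrative sketch; the class and method names are made up for this example.

```java
import java.util.regex.Pattern;

public class LookaroundVsCharClass {
    // Zero-width checks: neighbors are tested but not consumed,
    // and the start or end of the string counts as "not a quote".
    static final Pattern LOOKAROUND = Pattern.compile("(?<!')'(?!')");
    // Consuming version: requires an actual non-quote character on each side.
    static final Pattern CHAR_CLASS = Pattern.compile("[^']'[^']");

    static boolean lookaround(String s) {
        return LOOKAROUND.matcher(s).find();
    }

    static boolean charClass(String s) {
        return CHAR_CLASS.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"'", "'a", "a'a", "aa''aa"};
        for (String s : samples) {
            System.out.println(s + " -> lookaround=" + lookaround(s)
                    + ", charClass=" + charClass(s));
        }
    }
}
```

For "'" and "'a" only the lookaround version finds a match; for "a'a" both do, and for "aa''aa" neither does.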

Java code:

[java]import java.util.regex.Pattern;

public class RegexProblem {
    public static void main(String[] args) {
        // A ' that is neither preceded (?<!') nor followed (?!') by another '
        Pattern singleQuote = Pattern.compile("(?<!')'(?!')");
        String[] phrases = {
            "",
            "'",
            "a'a",
            "aaa",
            "aa'aa",
            "aa''aa",
            "aa'''aaa",
            "aaa''''aaa"
        };
        for (String phrase : phrases) {
            // find() looks for the pattern anywhere in the string
            System.out.println(String.format("For %s is %s.", phrase,
                    singleQuote.matcher(phrase).find()));
        }
    }
}
[/java]

The output is:

For  is false.
For ' is true.
For a'a is true.
For aaa is false.
For aa'aa is true.
For aa''aa is false.
For aa'''aaa is false.
For aaa''''aaa is false.
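find() only tells you whether at least one isolated quote exists. If you also need to know where they are, you can call find() in a loop and read start() for each match. A minimal sketch; the class and the helper isolatedQuotePositions are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsolatedQuotes {
    private static final Pattern SINGLE_QUOTE = Pattern.compile("(?<!')'(?!')");

    // Returns the index of every ' that has no other ' directly next to it.
    static List<Integer> isolatedQuotePositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = SINGLE_QUOTE.matcher(text);
        while (m.find()) { // each call resumes searching after the previous match
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(isolatedQuotePositions("it's a ''quoted'' word, isn't it"));
    }
}
```

For that sample sentence it reports the apostrophes in "it's" and "isn't" (indices 2 and 27) and skips the doubled quotes around "quoted".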