Replacing with SED and regexes

Let’s say you have an HTML file called file.html and you want to replace “.jpg” with “.png”, but only inside the href attribute of anchor (<a>) elements.

Example of input:

<a href="alice.jpg">alice</a>
<a href="bob.jpg">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">

Desired output:

<a href="alice.png">alice</a>
<a href="bob.png">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">

Notice that only the first two references to “.jpg” were changed to “.png”: the ones inside the href of an anchor element.

You can use sed with regexes to achieve this.

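sed -i -E 's/(<a href="[^"]*)\.jpg(")/\1.png\2/g' file.html

Where:

  • -i edits the file in place (GNU sed syntax; BSD/macOS sed expects a backup suffix argument, which can be empty: -i '')
  • -E enables extended regular expressions, so ( ) create groups without backslashes
  • s/regex/replacement/ is the substitute command
  • (<a href="[^"]*) is group 1: the string '<a href="' followed by zero or more characters that are not a double quote, so the match cannot run past the end of the href value
  • \.jpg is the “.jpg” we want to replace; the dot is escaped because an unescaped . matches any character
  • (") is group 2: just the closing double quote
  • \1.png\2 is the replacement: group 1, then .png, then group 2
  • the trailing g replaces every match on a line, not only the first one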
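If the same rewrite has to happen inside a program instead of from the shell, here is a minimal sketch in Java, assuming the HTML is already loaded into a String. The class and method names (HrefJpgToPng, rewrite) are made up for this example. It uses the pattern (<a href="[^"]*)\.jpg("), which keeps the match inside the href value and treats the dot literally.

```java
import java.util.regex.Pattern;

public class HrefJpgToPng {
  // The regex written as a Java string literal:
  // group 1 is '<a href="' plus any run of non-quote characters,
  // \\.jpg is a literal ".jpg", and group 2 is the closing quote.
  static final Pattern HREF_JPG = Pattern.compile("(<a href=\"[^\"]*)\\.jpg(\")");

  // Java refers to groups as $1 and $2 in the replacement, where sed uses \1 and \2.
  // Matcher.replaceAll replaces every match in the string.
  static String rewrite(String html) {
    return HREF_JPG.matcher(html).replaceAll("$1.png$2");
  }

  public static void main(String[] args) {
    // Sample input modelled on the example above (with closing </a> tags).
    String input = String.join("\n",
        "<a href=\"alice.jpg\">alice</a>",
        "<a href=\"bob.jpg\">bob</a>",
        "something.jpg",
        "href.jpg",
        "<a href=\"example.com\">alice.jpg</a>",
        "<img src=\"href.jpg\">");
    System.out.println(rewrite(input));
  }
}
```

Only the first two lines change; the link text alice.jpg, the bare href.jpg and the img src are left alone, as in the desired output.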

Regex with negative lookahead and lookbehind

“Looking different directions” by Paul Kline (https://www.flickr.com/photos/paulelijah/6717953239/).

Problem: Match strings that contain a single quotation mark (') on its own, but not one that is part of two or more consecutive quotation marks.

Solution:

(?<!')'(?!')

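This regex matches a single quotation mark that has (?<!') on its left and (?!') on its right. (?<!') is a negative lookbehind: ?<! asserts that the character immediately before the match is not a single quotation mark '. (?!') is a negative lookahead: ?! asserts that the character immediately after it is not a single quotation mark '. Both assertions are zero-width: they also succeed at the start or end of the string, and they do not consume the neighbouring characters.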

Java code:

import java.util.regex.Pattern;

public class RegexProblem {
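  public static void main(String[] args) {
    // A ' with no ' immediately before it (?<!') and none immediately after it (?!').
    Pattern singleQuote = Pattern.compile("(?<!')'(?!')");
    String[] phrases = {
      "",
      "'",
      "a'a",
      "aaa",
      "aa'aa",
      "aa''aa",
      "aa'''aaa",
      "aaa''''aaa"
    };
    // find() returns true if a lone quotation mark occurs anywhere in the phrase.
    for (String phrase : phrases) {
      System.out.println(String.format("For %s is %s.", phrase,
          singleQuote.matcher(phrase).find()));
    }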
  }
}

The output is:

For  is false.
For ' is true.
For a'a is true.
For aaa is false.
For aa'aa is true.
For aa''aa is false.
For aa'''aaa is false.
For aaa''''aaa is false.
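Why use the negative lookbehind and lookahead instead of a plain character class such as [^']'[^']? The sketch below counts the matches of both patterns on a few phrases; the class name and the count helper are invented for this comparison. A character class consumes the character it matches, so the naive pattern needs a real non-quote neighbour on both sides and cannot reuse the a sitting between two quotes, whereas the lookarounds only inspect the neighbours.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundVsCharClass {
  // The pattern from this post: zero-width assertions look at the
  // neighbouring characters without making them part of the match.
  static final Pattern LOOKAROUND = Pattern.compile("(?<!')'(?!')");

  // Hypothetical naive alternative, for comparison only: it requires an actual
  // non-quote character on each side, and those characters are consumed.
  static final Pattern CHAR_CLASS = Pattern.compile("[^']'[^']");

  // Counts non-overlapping matches; each find() resumes where the previous match ended.
  static int count(Pattern pattern, String text) {
    Matcher matcher = pattern.matcher(text);
    int matches = 0;
    while (matcher.find()) {
      matches++;
    }
    return matches;
  }

  public static void main(String[] args) {
    String[] phrases = {"'", "a'", "a'a", "a''a", "a'a'a"};
    for (String phrase : phrases) {
      System.out.println(String.format("%-6s lookaround=%d charClass=%d", phrase,
          count(LOOKAROUND, phrase), count(CHAR_CLASS, phrase)));
    }
  }
}
```

The counts come out as 1/0 for ' and a', 1/1 for a'a, 0/0 for a''a, and 2/1 for a'a'a: the character-class version misses quotes at the edges of the string, and in a'a'a its first match already consumes the a between the two quotes, so the second quote is never found.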