Replacing with SED and regexes

Let’s say you have an HTML file called file.html and you want to replace “.jpg” with “.png”, but only inside the href attribute of anchor elements.

Example of input:

<a href="alice.jpg">alice</a>
<a href="bob.jpg">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">

Desired output:

<a href="alice.png">alice</a>
<a href="bob.png">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">

Notice that only the first two occurrences of “.jpg” were changed to “.png”: the ones in the href of an anchor.

You can use sed with regexes to achieve this.

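sed -i -E 's/(<a href="[^"]*)\.jpg(")/\1.png\2/g' file.html

Where:

  • -i edits the file in place (GNU sed; BSD/macOS sed expects a backup suffix argument, e.g. -i '')
  • -E enables extended regular expressions, so groups can be written as ( ) without backslashes
  • s/// substitutes the text matched by the pattern with the replacement
  • (<a href="[^"]*) is group 1: the string '<a href="' followed by zero or more characters that are not a double quote, which keeps the match inside the href value
  • \.jpg is the literal .jpg we want to replace; the backslash stops the dot from matching any character
  • (") is group 2: only the closing "
  • \1.png\2 is the replacement: group 1, then .png, then group 2
  • g applies the substitution to every match on a line, not just the first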
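
Before editing a real file in place, it can help to preview the result. The sketch below is one way to do that, not the only one: it writes sample lines modeled on the input above to a temporary file (the `tmp` and `result` variable names are just illustrative), runs the substitution without -i so nothing is modified, and only then edits in place while keeping a backup. The pattern here uses `[^"]*` and an escaped dot so the match stays inside the href value.

```shell
# Recreate sample input in a temporary file (the path is arbitrary).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<a href="alice.jpg">alice</a>
<a href="bob.jpg">bob</a>
something.jpg
href.jpg
<a href="example.com">alice.jpg</a>
<img src="href.jpg">
EOF

# Dry run: without -i, sed prints the result and leaves the file untouched.
result=$(sed -E 's/(<a href="[^"]*)\.jpg(")/\1.png\2/g' "$tmp")
printf '%s\n' "$result"

# Once the preview looks right, edit in place and keep a backup copy.
# The attached-suffix form (-i.bak) is accepted by both GNU and BSD sed.
sed -i.bak -E 's/(<a href="[^"]*)\.jpg(")/\1.png\2/g' "$tmp"
```

After the last command, the original content is still available in the file with the .bak suffix, so a wrong pattern is easy to undo.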
