Brain Dump

A place to store my random thoughts and anything else I might find useful.

How-To: Remove errant CR (Carriage return, ^M, ‘\r’, 0x0D, 13 in decimal) from text file.

Posted by mzanfardino on July 5, 2012

Contents

1. Overview
2. Solution
3. Summary
4. References

Overview

There are occasions when one has to work with a text file originating from a DOS/Windows platform on a *nix machine. Opening the file on *nix reveals a carriage return (^M) at the end of each line. This happens because DOS/Windows marks a new line with the pair CR (Carriage return, ^M, ‘\r’, 0x0D, 13 in decimal) + LF (Line feed, ^J, ‘\n’, 0x0A, 10 in decimal), while *nix traditionally uses just the LF[1].
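Before changing anything it can help to confirm that a file really contains CRs. The following is a minimal sketch, not part of the original post; the function name count_cr_lines is hypothetical, and it assumes a POSIX shell with grep available.

```shell
# count_cr_lines FILE: print how many lines of FILE end in a carriage return.
# The function name is hypothetical, chosen only for this sketch.
count_cr_lines() {
    cr=$(printf '\r')    # a literal CR character, built portably
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps a clean file from looking like
    # a failure.
    grep -c "${cr}\$" "$1" || true
}
```

Calling count_cr_lines file_with_errant_cr on the five-line sample used below should print 5, and 0 once the file has been cleaned.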

Solution

Depending on the depth of the problem there are a number of solutions one might implement. When dealing with an individual file I find that vi/m is the right tool for the job. I imagine Emacs or other editors are equally suited to the task, but as I’m a vi/m guy, this is what works for me:

Before:

$ vim file_with_errant_cr
This is a test^M
This is also a test^M
Here again, another test^M
What, more tests?^M
Yeah, but this is the last test.^M
:%s/^v^M//g

NOTE: the ^v in this context is the Ctrl-v key-combination and ^M is the Ctrl-m key-combination. Press those keys; do not type the characters ^, v, ^ and M. Ctrl-v tells vim to insert the next keystroke literally, so once typed the command will look like:

:%s/^M//g

After:

This is a test
This is also a test
Here again, another test
What, more tests?
Yeah, but this is the last test.

That’s it! Save the file and all the errant CR (^M) are gone.

ED: Additional research has revealed (several) additional, easy solutions to this problem. One of the easiest is to issue the following command in vim:

:%s/.$//

This simple but effective alternative replaces the last character of each line (.$) with nothing (//). No need for ^v/^m key combos. However, it does exactly what it says: it removes the last character of each line regardless of what that character is! If some lines do not end in a CR (a file with mixed line endings, for example), those lines lose a real character. So be sure this is what you want to do before doing it!
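The risk of the .$ pattern is easy to see outside the editor. This is a small sketch using sed rather than vim (a swapped-in tool, not the command from the post); the function names strip_last_char and strip_trailing_cr are made up for illustration.

```shell
cr=$(printf '\r')    # a literal CR character

# Removes the last character of every line, whatever it is.
strip_last_char() { sed 's/.$//'; }

# Removes a CR only when it is the last character on the line.
strip_trailing_cr() { sed "s/${cr}\$//"; }

# Sample input: the first line ends in CR+LF, the second in a bare LF.
printf 'abc\r\nxyz\n' | strip_last_char      # second line loses its "z"
printf 'abc\r\nxyz\n' | strip_trailing_cr    # second line is left intact
```

Anchoring the match on the CR itself is the safer choice whenever the file might not be uniformly DOS-formatted.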

When dealing with multiple files it is impractical to have to edit each and every file. In this case I would write a simple bash routine which uses sed to find and remove the errant character and save the results to new files. Let’s take for example a group of files named testn.txt where n is some incremental value:


$ for f in test*.txt; do sed 's/^v^M//g' "${f}" > "${f/.txt/.new}"; done

As with vi/m, the ^v is the Ctrl-v key-combination and ^M is the Ctrl-m key-combination. When typed the output will look like:


$ for f in test*.txt; do sed 's/^M//g' "${f}" > "${f/.txt/.new}"; done

The result is a set of new files, free of the CR, with the file extension .new. The original files will be left untouched.
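The same idea can be wrapped in a function that avoids the Ctrl-v keystrokes by building the CR with printf. This is a sketch under a few assumptions: the name convert_to_new is hypothetical, the input files end in .txt, and only a CR at the end of a line is removed (the loop above removes every CR on the line).

```shell
# convert_to_new FILE...: for each FILE ending in .txt, write a copy without
# trailing CRs next to it, using the extension .new. Originals are not modified.
# (convert_to_new is a hypothetical name used only for this sketch.)
convert_to_new() {
    cr=$(printf '\r')    # a literal CR character
    for f in "$@"; do
        # ${f%.txt} drops a trailing .txt before .new is appended.
        sed "s/${cr}\$//" "$f" > "${f%.txt}.new"
    done
}
```

It would be called as convert_to_new test*.txt from the directory holding the files.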

Summary

The approaches I’ve taken here were inspired by research I did on the subject and the input I received from Quinn McHenry’s post on Tech Recipes[2]. Quinn also explains clearly that “In UNIX, you can escape a control character by preceeding it with a CONTROL-V”. This was critical to my understanding of how to use sed as well as understanding how vi/m can be used to replace control characters.

References

  1. Newline: From Wikipedia, the free encyclopedia
  2. Remove ^M characters at end of lines in vi by Quinn McHenry