Excel Text and String Functions:
Removing spaces and nonprinting characters from text: Excel TRIM & CLEAN functions
Related Links:
1. ASCII Code, Extended ASCII characters (8-bit system) and ANSI Code.
2. Excel CODE & CHAR functions.
3. Excel Text and String Functions: LEFT, RIGHT, MID, LEN, FIND, SEARCH, REPLACE, SUBSTITUTE.
When you import data into an Excel worksheet, or download from internet, or copy from a website, or in case of multiple users working on the same worksheet, you might often find unnecessary spaces or irregular characters, at times invisible, which were not intended to be part of the actual data. To remove the extra / unwanted spaces and non-printable characters from the data, use the Excel TRIM & CLEAN functions. Leading and trailing blank characters will result in incorrect sorting & filtering of your data; extra spaces and characters will prevent correct working of your formulas (say, when you do a vlookup), cause formatting & calculation errors, and generally impede in manipulating text/data besides appearing visibly incorrect.
TRIM Function (Worksheet)
Use the excel TRIM function to remove all spaces from text leaving only single spaces between words. Trim deletes all leading & trailing spaces in a text string, and multiple spaces within words are reduced to a single space. Note that Trim removes the space character with ANSI code 32, but does not remove the non-breaking space character in the Unicode character set with decimal value 160 which is used extensively in web pages (with html entity  ). Syntax: TRIM(text).
Example:
=TRIM(” James Bond 007 “) [Formula]
Returns the string “James Bond 007”.
TRIM leaves single space between words and removes leading, trailing & multiple inbetween spaces from text.
————————————————————————————————————-
CLEAN Function (Worksheet)
The excel CLEAN Function is used to remove all nonprintable characters from text. Note that Trim removes the first 32 non-printing characters in the 7 bit ASCII code (values 0 to 31), but does not remove the additional non-printing characters with values 127, 129, 141, 143, 144, and 157 in the Unicode character set. Syntax: CLEAN(text).
Example:
=CLEAN(A1) [Formula]
Cell A1 contains: =CHAR(17)&”James”&CHAR(17)&”Bond”
CLEAN Function removes the nonprintable characters, CHAR(17) from the string, and returns “JamesBond”.
————————————————————————————————————-
Use a combination of the TRIM, CLEAN and SUBSTITUTE functions
Though most times a simple formula “=TRIM(CLEAN(A1))” can remove spaces and nonprinting characters from text, but the two functions of TRIM & CLEAN are not always fully effective in removing some characters for which an alternate method is required.
You often work with text received from other applications which might have irregular spacing or contain characters which might not print with your operating system, and in these cases it is apt to use the TRIM & CLEAN functions. A text may contain: space characters – Unicode character set value 32 which is a 7-bit ASCII space character; non-breaking space characters – Unicode character set value 160 generally used in web pages with html name non-printing characters – Unicode character set values 0 to 31 which are the first 32 non-printing characters in the 7 bit ASCII code; additional non-printing characters – Unicode character set values 127, 129, 141, 143, 144, and 157.
If you copy data into Excel from a web page, the non-breaking spaces that are generally used in web pages may also show up in your worksheet. The TRIM function was meant to remove the 7-bit ASCII space character (value 32). The CLEAN function was meant to remove the first 32 non-printing characters in the 7 bit ASCII code (values 0 to 31). By themselves both these functions do not remove the higher value Unicode characters – the TRIM function does not remove the nonbreaking space character (value 160) and the CLEAN function does not remove the additional non-printing characters (values 127, 129, 141, 143, 144, and 157). To remove the higher value Unicode characters, the SUBSTITUTE function is used.
Example:
=TRIM(A1) [Formula]
Cell A1 contains the string: =“James”&CHAR(32)&CHAR(32)&”Bond”&CHAR(160)&CHAR(160)&”007″. Note that there are 2 spaces between ” James” and “Bond” and then 2 spaces (nonbreaking space characters with decimal value of 160) between “Bond” and “007”. The string Length – LEN(A1) – is 16 characters. Refer Image 1a.
Formula returns the string “James Bond 007”. TRIM Function deletes one space between “James” and “Bond”. Length of returned string is 15 characters. TRIM function does not remove the nonbreaking spaces represented by CHAR(160), which can be done by using the SUBSTITUTE function as shown below. Refer Image 1b.
=TRIM(SUBSTITUTE(A1, CHAR(160), CHAR(32))) [Formula]
Replaces each nonbreaking space character (Unicode value of 160) with a space character (ASCII value of 32) by using the SUBSTITUTE function, and then removes the multiple embedded spaces from the above string in Cell A1 (Image 1a).
Formula returns the string “James Bond 007”. Length of returned string is 14 characters. Refer Image 2.
Example:
=TRIM(SUBSTITUTE(A1,”/”,””)) [Formula]
String in cell A1 is “James / Bond” which has 2 spaces and the character “/” between the words. Its length is 12 characters, determined by LEN(A1).
Formula part =SUBSTITUTE(A1,”/”, “”) replaces the symbol “/” with nothing (“”), and returns the string “James Bond”, with 2 spaces between the words. The string length is now 11 characters.
TRIM function removes the one extra space between the words, and returns the string “James Bond”, whose length is 10 characters.
Note: The excel LEN function is particularly useful to determine hidden space characters, as shown above.
Example:
=CLEAN(SUBSTITUTE(A1, CHAR(127), CHAR(7))) [Formula]
Cell A1 contains the string: =”James”&CHAR(7)&”Bond”&CHAR(127)&”007″ . Formula replaces the nonprinting DEL character (ASCII value of 127) with a BEL character (ASCII value of 7) by using the SUBSTITUTE function, and then removes both the BEL characters from the string, and returns the string “JamesBond007”.
Note: CLEAN function does not remove the nonprinting DEL character (ASCII value of 127).
Example:
=CLEAN(SUBSTITUTE(A1,CHAR(128),CHAR(7))) [Formula]
Cell A1 contains the string: =”James”&CHAR(128)&”Bond”. Formula replaces the € character (ASCII value of 128) with a BEL character (ASCII value of 7) by using the SUBSTITUTE function, and then removes both the BEL characters from the string, and returns the string “JamesBond” .
Note: CLEAN function does not remove the € character (ASCII value of 128).
————————————————————————————————————-
Find ASCII value of Characters to be replaced
Above examples show how to use the SUBSTITUTE function to TRIM & CLEAN data. However, you might need to know what characters are to be substituted (ie. replaced), to use the formula. Try finding out what the actual characters are (ie. their ASCII code) by using the CODE function, in a helper column, as shown below.
Find Visible Spaces:
If cell A1 has the string ” James”, with the leading space being a nonbreaking space character with decimal value of 160, then the formula =CODE(LEFT(A71)) will return the ASCII value of “160”.
Find Visible ASCII Characters:
=CODE(MID(A1,6,1)) [Formula]
Cell A1 contains the string “James€Bond”. Formula returns the ASCII value of 128 for the sixth character in the string (€), which is required to be removed by using CLEAN and SUBSTITUTE functions.
Find Invisible spaces / characters:
Obviously, this task gets more difficult in case of invisible characters viz. if Cell A1 contains =“JamesB”&CHAR(9)&”ond” which is visible as “JamesBond”, and the character (ASCII value 9) between “JamesB” & “ond” is not visible.
Using =LEN(A1), you can determine that the string length is 10 characters, which is one more than what is visible in “JamesBond”. One way of finding out this invisible character is as follows:
Find each character by repeating the formula in column C for the number of times (column B) as the Length of the string. The blank cell in column C is the invisible character to be removed, and its ASCII value is determined by the Code function in cell D7 (refer Images 3a to 3c).
Find All Spaces & Characters of ASCII value 127 and above, which cannot be removed by TRIM & CLEAN functions:
Cell A1 contains =“James”&CHAR(160)&”B”&CHAR(127)&”ond” wherein the narrow box-shaped character appears after “B”.
Using =LEN(A1), you can determine that the string length is 11 characters, which is 2 more than the regular characters you want in the string (JamesBond). To find ASCII values of 127 and above in this string, do as follows: use the formula =CODE(MID($A$1, B1,1)) to determine code number (ie. ASCII value) of each character in the string, as shown in Images 4a & 4b. Column B represents the position of each character, while column C returns the ASCII code number of each character.