Extract Unique List from a Column: (i) Fixed Length of 1st 5 Characters of each cell string; or (ii) Variable Length of Characters in each cell Preceding the 1st “-” (Dash)
Extract a Unique List of Codes from column cells: (i) Fixed Length of 1st 5 Characters; (ii) Variable Number of Characters Preceding the 1st “-” (dash)
Extract Fixed Length Sub-strings: Column A cells contains strings, of which the 1st Five Characters represent a Code – Extract a Unique List of these codes in column B, 2 options: cell B2 onwards and cell B14 onwards.
Extract Variable Length Sub-strings: Column D contains strings, of which the Characters to the left of the 1st “-” (Dash) represent a Code – Extract a Unique List of these codes in column E, 2 options: cell E2 onwards and cell E14 onwards.
Cell B14 / E14 formulas are relatively shorter / efficient compared to cells B2 / E2 formulas respectively. However both B14 / E14 formulas refer to preceding cells of their column, which is not the case with cells B2 / E2 formulas. If cell B14 formula is in a different column, replace “B$13:B13” in the formula with that column (ex. if formula is in column F then update to “F$13:F13”); if cell B14 formula starts from a different cell, say G2 instead of B14, then replace “B$13:B13” with “G$1:G1”; do similarly for cell E14 formula.
Enter Array formula (Ctrl+Shift+Enter) in cell B2 / E2 and copy down. Enter Normal Non-Array formula in cell B14 / E14 and copy down.
———————————————————————————————–
Extract a Unique List of Fixed Length codes of 1st Five Characters from each column A cell string, in column B, cell B14 onwards
Cell B14 formula:
=IFERROR(INDEX( LEFT(A$2:A$20, 5), MATCH(0, INDEX( COUNTIF( B$13:B13, LEFT( A$2:A$20, 5)),), 0)),””)
Explanation – Cell B14 formula (Non-Array formula):
1) COUNTIF(B$13:B13, LEFT(A$2:A$20, 5)): returns an array of zeros and numbers – {0;0;0;0; 0;0;0;0; 0;0; 0;0; 0;0; 1;1; 1;1;1} – this counts values of column A (the 1st 5 characters) in the preceding cells of the formula column (column B) and as the formula gets copied down it returns an array of 1s and zeros, with 1 indicating column A values which already occur in these preceding cells (or blank cells). The aim is to match with zero values which return non-blank column A cells which have not already appeared in preceding cells of column B. INDEX is used with COUNTIF to keep the formula as non-array. This is the lookup_array argument of the MATCH function.
2) Using MATCH with zero as lookup_value returns the position of zeros and ensures that only those column A values are considered which have not already occurred in the preceding cells of column B ie. unique values are considered. This picks up the 1st value from column A which has not already been listed in preceding cells starting from B13. Using the INDEX function – “INDEX(LEFT(A$2:A$20,5),” – returns the corresponding values of 1st 5 characters of column A (ie. cell A2, A3, A6 … values).
———————————————————————————————–
Related Formulas – illustrated in the Ebook of Excel Formulas.
Column B – cell B2 formula: Similar to cell B14 formula this also extracts a Unique List of Fixed Length codes of 1st Five Characters from each column A cell string, in column B, cell B14 onwards. The cell B2 formula uses the FREQUENCY function and does not refer to preceding cells of the column unlike cell B14 formula so that even if the start cell is in a different column or row the cell B2 formula will remain the same. This is an Array (Ctrl+Shift+Enter) formula.
Column E – cell E2 formula: Extracts a Unique List of Variable Length codes of Characters to the left of the 1st “-” (Dash) from each column D cell string, in column E, cell E2 onwards. This is an Array formula (Ctrl+Shift+Enter) using the FREQUENCY function and does not refer to preceding cells of the column unlike cell E14 formula so that even if the start cell is in a different column or row the cell E2 formula will remain the same.”.
Column E – cell E14 formula: Similar to cell E2 formula this also extracts a Unique List of Variable Length codes of Characters to the left of the 1st “-” (Dash) from each column D cell string, in column E, cell E14 onwards. This is a shorter formula as compared to cell E2 formula, but refers to preceding cells of its column, which is not the case with cells E2 formula – if cell E14 formula is in a different column, replace “E$13:E13” in the formula with that column (ex. if formula is in column F then update to “F$13:F13”); if cell E14 formula starts from a different cell, say G2 instead of E14, then replace “E$13:EB13” with “G$1:G1”. This is a Non-Array / Normal formula.