Once upon a time, I was requested to extract TFL titles, TFL layout orientation from individual RTF files into Excel file. It is not difficult as there are tons of materials in internet telling us how to extract data from word tables or paragraphs, right? At least this is what I thought when I decided to take the task initially. Unfortunately, I found that there are some special characters in the extracted text. Those special characters were produced by RTF code like “\line”. I could be able to remove those characters using CLEAN function. However, the problem emerged when I tried to use the special character as delimiter of SPLIT function to manipulate strings.

I tried a lot of methods like vbcrtl or vbcrlf and so on. But no one worked. Suddenly, it occurred to me that I can use ASC function to get the ASCII value of the special characters and then use CHR function to reference the special character. Luckily, it worked.

I have created a RTF file like below.

Here display RTF code, corresponding text in RTF and extracted text using VBA. You can see that there are some weird squares in the third column. And these weird squares are resulted from RTF code.

VBA to Extract contents from RTF file

Suppose that there are two spreadsheets in workbook, “Source File” and “Output Info”.  In “Source File” tab, full path of RTF file was placed in the first cell of column B. And text from this RTF file was placed in third and fourth column of “Output Info” tab. Extracted information were already displayed in above figure.

Click here to hide/show code


Sub extract()
Dim objWord As Object
Set objWord = CreateObject(“Word.Application”)
objWord.Visible = True
WordNam = ThisWorkbook.Worksheets(“Source File”).Cells(1, 2)
objWord.Documents.Open WordNam

For i = 1 To objWord.Documents(WordNam).tables(1).Rows.Count
ThisWorkbook.Worksheets(“Output Info”).Cells(i, 3) = objWord.Documents(WordNam).tables(1).cell(i, 1).Range.Text
ThisWorkbook.Worksheets(“Output Info”).Cells(i, 4) = WorksheetFunction.Clean(objWord.Documents(WordNam).tables(1).cell(i, 1).Range.Text)
Next i
objWord.Documents.Close
objWord.Quit
End Sub

VBA to Get ASCII Value of Special Characters Produced by RTF code

Following code can be used to print each character including blank space from extracted text into column A of “Ascii Value” tab. What is more, corresponding ASCII values were also put in column B.

Click here to hide/show code


Sub ChkSChar()
a = 1
For m = 1 To ThisWorkbook.Worksheets(“Output Info”).UsedRange.Rows.Count
Text = ThisWorkbook.Worksheets(“Output Info”).Cells(m, 3)

For i = 1 To Len(Text)
ThisWorkbook.Worksheets(“Ascii Value”).Cells(a + i – 1, 1) = WorksheetFunction.Clean(ThisWorkbook.Worksheets(“Output Info”).Cells(m, 3))
ThisWorkbook.Worksheets(“Ascii Value”).Cells(a + i – 1, 2) = Mid(Text, i, 1)
ThisWorkbook.Worksheets(“Ascii Value”).Cells(a + i – 1, 3) = Asc(Mid(Text, i, 1))

Next i
a = a + i
Next m
End Sub

Partial Results

Here are partial results and explanations. Please refer to the first figure when checking results. Blank space can be represented with CHR(32) and a carriage return can be referred by CHR(13). While for special character produced by “\line” is actually CHR(11). Look at the lower left part of following figure. If we use CHR(11) as delimiter of SPLIT function, we can get the first substring before “\line”.

This method can also be used to remove special characters that cannot be cleared by CLEAN function.