Friday, 30 May 2014

Removing all formatting from a complicated Word document including margins and headers

I have spent the last few weeks trying to sort out my document, a very complicated Word file that started life as about twenty smaller files, merged together and worked on over eight years. I decided to paste it into an HTML editor (BlueGriffon, which I happened to have about my person), and in HTML my document was revealed to have over 4,000 lines of code before the text began. Not good! However much I faffed about, some margins and header spacing seemed to have a text-wrapping mind of their own, probably (but not necessarily) controlled by one of those 4,000 lines of code. But which one?

Having played about with the concept of using styles instead of embedded paragraph formatting, which is obviously the correct way to ensure consistency and to control things centrally, I decided to take a deep breath and remove all formatting from my document. This would mean losing the embedded index and reference hyperlinks (gulp) and all the nice header and footer formatting that I had worked on for months. But look on the bright side: all that work was not in vain, because I now understand a lot more about styles, and my document will, hopefully, be neat and clean at the end of the process. I did think about using LaTeX instead of Word in future, but apparently, although it is good for HTML documents, it does not work well with CreateSpace. Go with the devil you know (will I regret this???)

To remove all the formatting code, I discovered there was a choice of some nice downloadable Word add-ins, but also some even easier online web pages. On these pages you just paste in your text and it spits out a clean version. I found a very nice site where you could configure exactly what you wanted left in, but after an hour today I could not find that site again, sadly (should have bookmarked it). Nonetheless, I did discover a similar online site, http://www.cleanuphtml.com/cleanup.html, that takes all your text and converts it to a blob of HTML. I cut and pasted this blob into my CreateSpace template, removed the few lines of HTML code at the beginning and end of the text, highlighted everything, selected 'normal' as a style, modified paragraphs in 'normal' to 'block' justification, and behold: I am confronted with a 193-page block of text containing everything I ever wrote over 8 years.
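For anyone who would rather not rely on a web page, the same kind of clean-up can be sketched in a few lines of Python using only the standard library. This is not what the site above does internally (I have no idea how it works), just a rough illustration of the idea: throw away every tag, attribute and style block, and keep only the text, one paragraph per block. The names `TextOnly` and `strip_formatting` are made up for this example, and the list of block-level tags is my own guess at what matters.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text only, ignoring tags, attributes and style blocks."""

    # Elements whose contents are never wanted in the output (assumed list).
    SKIP = {"head", "style", "script"}
    # Elements treated as paragraph boundaries (assumed list).
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self.skip_depth:
            self.parts.append(data)


def strip_formatting(html):
    """Return the plain text of an HTML string, paragraphs separated by blank lines."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    # Collapse runs of whitespace inside each paragraph and drop empty ones.
    paragraphs = [" ".join(line.split()) for line in lines]
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{margin:0}</style></head><body>"
        '<p class="MsoNormal" style="margin-left:2cm">Hello <b>world</b></p>'
        "<p>Fish &amp; chips</p></body></html>"
    )
    print(strip_formatting(sample))
```

The result is plain text that can be pasted into a template and given a single style, which is the same position I ended up in by hand. Note that, just like my manual route, this loses bold, italics, hyperlinks and everything else along with the rubbish.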

Now all it needs is a little massage (should take about six weeks or so, at four hours a week???) and perhaps we will end up with some sort of presentable product. If anyone has any ideas as to how this process could be improved, please let me know.
