Converting your Word Doc to HTML
As many of you know, converting a Word doc to HTML is a huge pain in the butt. Whenever I tried to export as HTML from Microsoft® Office Word, I got a mess that took me forever to clean up.
I have found a few ways around this that make the process easier. I have not tested any of them with Microsoft® Office Word 2008, so I would like to know if anyone else has tried them and what worked best.
Option 1.) You can upload your Word doc to your Google™ Apps account and then choose “view as html”. You can then use your browser's view source tool to get the source code. This process may still give you junk HTML code that has to be cleaned out.
Option 2.) You can email the doc to your Google™ Gmail account, then open the message in webmail. You may need to click on “All Mail” to see your email. There is a “view as html” option next to the attachment. Then you can view the source through your browser tools. There is still stuff to be cleaned out, but with a batch find tool in your HTML editor it has been the cleanest result, or at least the easiest to clean. I originally found this tip in this article (cited: H.Tschabitscher (2009) http://email.about.com/od/gmailtips/qt/et070306.htm).
Option 3.) textism.com/wordcleaner/ Some of you may have tried this one already. I could not get it to clean up my code: I saved my Word doc as HTML, submitted the file, and it failed. (Let me know if anyone has used this successfully and which version of Word you used.) You may find a solution in that the files need to be over 20kb to work (cited: J.Atwood (2006) http://www.codinghorror.com/blog/archives/000485.html).
OpenOffice.org may be another viable option, but I think they still have some kinks to work out. The software crashed on my Mac. I am currently using OS X 10.5.6.
Also, as a side note: I have seen some WYSIWYG editors with an option to import Word into the content area. Though I have not seen how clean, or rather messy, the HTML code is with this, I’m hoping this will become more available in future CMS & WYSIWYG website systems.
So for me, Option 2 has been working out the best, using find and replace to get the bad code out of the converted HTML. It really helps if the Word docs are formatted correctly from the start. I hope this helps you out as you convert your Word content to clean HTML for your website systems.
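If you would rather script the find and replace than run it by hand in an editor, here is a rough sketch in Python. The patterns are my assumptions about the kind of clutter Word-style HTML tends to carry (Mso classes, inline styles, <o:p> tags, conditional comments); they are not taken from any of the tools above, and the function name clean_word_html is just a made-up label. Regular expressions are not a real HTML parser, so check your own view source output and adjust the rules to what you actually see.

```python
import re

# Hypothetical cleanup rules: each pair is (pattern to find, text to replace it with).
# These are assumptions about typical Word export clutter, not a complete list.
CLEANUP_RULES = [
    # HTML comments, including Word's conditional comments
    (re.compile(r"<!--.*?-->", re.DOTALL), ""),
    # Office namespace paragraph tags such as <o:p> and </o:p>
    (re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE), ""),
    # class attributes whose value starts with Mso (quoted or unquoted)
    (re.compile(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w+)", re.IGNORECASE), ""),
    # inline style attributes
    (re.compile(r"\s+style=(\"[^\"]*\"|'[^']*')", re.IGNORECASE), ""),
    # span tags (the text inside them is kept)
    (re.compile(r"</?span\b[^>]*>", re.IGNORECASE), ""),
    # paragraphs left holding only whitespace or &nbsp;
    (re.compile(r"<p>(\s|&nbsp;)*</p>", re.IGNORECASE), ""),
]


def clean_word_html(html: str) -> str:
    """Apply every find-and-replace rule in order and return the result."""
    for pattern, replacement in CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html.strip()


if __name__ == "__main__":
    sample = (
        '<p class="MsoNormal" style="margin:0in">'
        '<span style="font-size:12pt">Hello<o:p></o:p></span></p>'
        "<!--[if gte mso 9]><xml></xml><![endif]-->"
        "<p class=MsoNormal>&nbsp;</p>"
    )
    print(clean_word_html(sample))
```

The order of the rules matters: the empty paragraph rule runs last so it can catch paragraphs that only become empty after the attributes and spans are stripped.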