I have content that can be provided in plain text, HTML, or (arbitrary schemaless) XML.
What format is going to get the best results? Or will any of these be fine?
Plain text is your best bet.
I agree with Ram: plain text is fastest and carries the least potential for
analysis errors. We do strip tags from HTML and XML, but the only point of
that is to get to plain text, so if you can provide that text directly,
you're ahead of the game.
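The reply doesn't say how the service removes tags. If your content only exists as HTML or XML, here is a hypothetical client-side sketch that reduces it to plain text before you submit it. It uses only Python's standard library (`html.parser.HTMLParser` and `xml.etree.ElementTree`). The helper names `html_to_text` and `xml_to_text` are illustrative and are not part of any OpenAmplify API.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class _TextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join text nodes with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip HTML tags and return plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def xml_to_text(xml: str) -> str:
    """Hypothetical helper: concatenate all text nodes of an XML document."""
    root = ET.fromstring(xml)
    return " ".join(" ".join(root.itertext()).split())
```

For example, `html_to_text("<p>Hello <b>world</b></p>")` gives `"Hello world"`. Because text nodes are joined with spaces, a tag in the middle of a word (`Hel<b>lo</b>`) splits that word. That is an acceptable trade-off for most prose but worth checking against your own markup.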
© 2012 OpenAmplify