Whether this chunking process is slow and manual or quick and automated depends on how much of the legacy content was created with standardized styles and templates applied correctly. If practically none of it was, a great deal of manual evaluation must be done.
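One rough way to gauge this before committing to a process is to measure, per document, what share of paragraphs carry an approved style. The sketch below is illustrative only: the style names, the 0.9 cutoff, and the function names `style_coverage` and `triage` are all assumptions, and it presumes the paragraph style names have already been extracted from each legacy file by whatever tool fits the source format.

```python
from collections import Counter

# Hypothetical set of approved template style names; substitute your own.
STANDARD_STYLES = {"Heading 1", "Heading 2", "Body Text", "List Bullet", "Caption"}


def style_coverage(paragraph_styles, standard_styles=STANDARD_STYLES):
    """Return the fraction of paragraphs whose style is in the approved set."""
    if not paragraph_styles:
        return 0.0
    counts = Counter(paragraph_styles)
    standard = sum(n for style, n in counts.items() if style in standard_styles)
    return standard / len(paragraph_styles)


def triage(paragraph_styles, threshold=0.9):
    """Label a document using an assumed cutoff, not a recommended value."""
    if style_coverage(paragraph_styles) >= threshold:
        return "automated"
    return "manual"


# Example input: one style name per paragraph, in document order.
styled = ["Heading 1", "Body Text", "Body Text", "List Bullet"]
ad_hoc = ["Normal", "Normal", "Heading 1", "Normal"]

print(triage(styled))  # automated
print(triage(ad_hoc))  # manual
```

A document labeled "manual" here is only a candidate for closer human review; the cutoff should be tuned against a sample of documents that have already been chunked by hand.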
The most important aspect of the chunking process is to have the people doing the chunking UNDERSTAND what they are doing. This is best accomplished by providing them with thorough training, support, and supervision. Consistency is the key. Select a single process, train everyone in that process and execute the process without exception.
NOTE: The importance of thorough and consistent content editing increases by several orders of magnitude when content is entered into the database. Enter it wrong once ... use it wrong many times.
"Organizations that implement highly configurable or customizable products need to rely on their software vendors to meet the early training needs of the planners and technicians. To the degree that they wish to own or control product configuration, customization, and the ongoing support of those modifications, they also need to be prepared to invest in the staff development required to enable those capabilities."[5]
There are two approaches to legacy content that are usually successful.
The advantage of the first method is that you generally obtain a more consistent conversion with fewer errors. The advantage of the second method is that you train your entire group in the XML database and process. You may also learn things early on that allow you to modify the database or your processes so that they are more applicable to your training.
As with any complex operation, where there are advantages, there are also risks. The risk inherent in the first method is that it may result in a fully functional content base with no one trained to use it properly. The second method risks creating a database with so many inconsistencies that it is practically useless. The correct method for each organization depends on the technical background of the team and its workload. Organizations with lower levels of technical proficiency and higher per-capita workload generally do better with the first method.