Web Conversion Process (HTML)

Step 1: Obtain files

This may require chasing Account Managers.

Step 2: Convert to HTML

  1. Clean Up Source Document
    1. If Word document –
      1. Select all text and set it to Arial 12 pt, left aligned.
      2. Remove table of contents leader dots/lines and page numbers.
      3. Remove duplicate title text, logos, extra line breaks where possible.
      4. Separate chapters that will form individual pages using a marker or extra line breaks. Use table of contents as guide for separation.
        1. If no table of contents is provided, use the heading styles as a guide. Some documents will be small enough to need only a single HTML page and a contents page.
      5. Replace graphics with placeholder text or the img src code.
      6. Superscript footnote markers will not transfer, so place [#] (the footnote number in square brackets) next to each footnote marker.
      7. Remove underlining and replace it with bold or italic, as appropriate for the style of the document.
      8. Copy/paste all footnotes to end of document under heading Footnotes. Check for endnotes and place at end of document.
        1. If footnote numbering restarts in each section, place each section's footnotes in a separate HTML page immediately after the corresponding chapter.
    2. If InDesign Source –
      1. Use the TextExporter plugin to extract text and footnotes from the document, saving as RTF.
      2. Open the RTF in Word and follow the Word document steps.
  2. Select all text in the Word document and drag it into a TextEdit document, saving the file as .htm
  3. Capture images: take screenshots of the images directly from the supplied PDF.
    1. Optimise images in Photoshop and name them in order using the job number, an underscore, and g plus the image number, e.g. j0000_g01.gif.
      1. Only use JPG format for actual pictures; all other graphics should be GIF format.
  4. Clean up HTML working file
    1. Use BBEdit Tidy function to convert to XHTML and clean document.
    2. Open in Dreamweaver and run the custom regex HTML clean-up script to remove any remaining classes and Word formatting.
    3. Remove remaining tags
      1. Remove style tags.
      2. Remove extra line breaks
      3. Fix bullets and remove numbered lists by running Dreamweaver command.
      4. Remove table tags around single cell tables.
      5. Strip div tags.
    4. Format tables
      1. Paste in tables from Excel source files if provided.
        1. Remove col tags from imported Excel tables
      2. Remove p tags from inside td tags.
      3. Strip tbody tags unless it is a Transport Scotland publication.
        1. For Transport Scotland, use caption tags for the table title and place the table heading inside thead tags.
      4. Remove align and valign from td tags.
      5. Set table headings to the column heading and row heading classes.
        1. For GROS, use the GROS-specific column and row heading classes, and make the headings bold.
      6. Set table cellpadding to 5, and cellspacing and border to 1.
    5. Add acronym tags
    6. Add abbr tags
    7. Add footnote links and other internal links
    8. Add images into corresponding placeholders
  5. Supply customer with acronyms CSV
    1. Generate using Toolbox
  6. Separate working HTML file into individual template pages
    1. Template corresponds
  7. View files in web browser and compare to document PDF
    1. Check external and internal links.
      1. If links are broken, obtain the correct links from the customer via the account manager.
  8. Zip files and add acronyms through Toolbox process
    1. Files must be zipped through Terminal so that the .DS_Store files are left out.
  9. Add zip folder to Simply Asset
  10. Email notice of the Simply Asset upload to the SG web team, BCC’ing the account manager.
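
The "Format tables" rules in item 4 are mechanical enough to script. The following is a minimal Python sketch of them, not the custom Dreamweaver script referred to above. The function name clean_table_html and the keep_tbody flag are hypothetical, stripping colgroup along with col is an assumption, and the regexes assume simple, non-nested tables. It does not handle heading classes or caption/thead placement, which depend on the customer.

```python
import re

# Matches a quoted or unquoted HTML attribute value.
ATTR_VALUE = r"""(?:"[^"]*"|'[^']*'|[^\s>]+)"""


def _strip_p_in_cell(match):
    """Drop <p> and </p> tags from the body of one <td> cell."""
    open_tag, body, close_tag = match.groups()
    body = re.sub(r"</?p\b[^>]*>", "", body, flags=re.IGNORECASE)
    return open_tag + body.strip() + close_tag


def _strip_align(match):
    """Drop align and valign attributes from one <td> opening tag."""
    return re.sub(r"\s+v?align\s*=\s*" + ATTR_VALUE, "", match.group(0),
                  flags=re.IGNORECASE)


def _set_table_attrs(match):
    """Replace any existing padding/spacing/border with the house values."""
    attrs = re.sub(r"\s+(?:cellpadding|cellspacing|border)\s*=\s*" + ATTR_VALUE,
                   "", match.group(1), flags=re.IGNORECASE)
    return f'<table{attrs} cellpadding="5" cellspacing="1" border="1">'


def clean_table_html(html, keep_tbody=False):
    """Apply the table clean-up rules to an HTML string (hypothetical helper).

    Pass keep_tbody=True for publications that keep tbody tags
    (Transport Scotland in the process above).
    """
    flags = re.IGNORECASE | re.DOTALL
    # Remove col tags from imported Excel tables (colgroup is an assumption).
    html = re.sub(r"</?col(?:group)?\b[^>]*>", "", html, flags=flags)
    # Remove p tags from inside td tags.
    html = re.sub(r"(<td\b[^>]*>)(.*?)(</td>)", _strip_p_in_cell, html, flags=flags)
    # Remove align and valign from td tags.
    html = re.sub(r"<td\b[^>]*>", _strip_align, html, flags=flags)
    # Strip tbody tags unless the publication keeps them.
    if not keep_tbody:
        html = re.sub(r"</?tbody\b[^>]*>", "", html, flags=flags)
    # Set cellpadding to 5, cellspacing and border to 1.
    html = re.sub(r"<table\b([^>]*)>", _set_table_attrs, html, flags=flags)
    return html
```

Run it on a copy of the working HTML file and compare the result in a browser before replacing the original, since regex-based clean-up can misfire on unusual markup.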

Step 3: Package web publication archive

Compress HTML and images along with PDF and any associated files in a zip archive.
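
As an alternative to zipping by hand in Terminal, the archive can be built with a short script that leaves out the macOS .DS_Store files. This is a minimal sketch using Python's standard zipfile module. The function name zip_publication and the example paths are hypothetical, and skipping __MACOSX folders is an added assumption.

```python
import zipfile
from pathlib import Path


def zip_publication(source_dir, archive_path):
    """Zip every file under source_dir, skipping macOS metadata files.

    Returns the list of relative paths written to the archive.
    """
    source = Path(source_dir)
    archive = Path(archive_path)
    added = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            # Skip Finder metadata (the __MACOSX check is an assumption).
            if path.name == ".DS_Store" or "__MACOSX" in path.parts:
                continue
            # Skip the archive itself if it is being written inside source_dir.
            if path.resolve() == archive.resolve():
                continue
            relative = path.relative_to(source).as_posix()
            zf.write(path, relative)
            added.append(relative)
    return added


# Example call (hypothetical paths):
# zip_publication("j0000", "j0000.zip")
```

The returned list can be checked against the expected HTML pages, images, and PDF before the archive is uploaded.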

Next steps:

Send web conversion archive to SG web team


Follow process to create e-book version