• Published:October 6th, 2008
  • Comments:No Comment
  • Category:PHP, XML
  • Share / Bookmark

One of my current clients has an old FoxPro (old as in created in '97 and upgraded in '99) database application that the users can't stand.  It frequently crashes and it's all too easy to lose valuable data after a lengthy data entry session (just by clicking the sadly misnamed Save button).  We are porting them over to a LAMP-based web application.  One of the early tasks in this project is to analyze their existing database structure.  This application involves hundreds of FoxPro DBF files which are tables (data and schema) that reside in various child directories.  

My initial approach (following the KISS principle) was to inspect some of these files and see what's in 'em and how to get the schema out of 'em.  I downloaded a trial version of DBF Viewer 2000 and it's a fine tool.  I was able to view the data and the schema and, using SnagIt, I captured the text and pasted the values to an Excel spreadsheet.  But purely manual approach would take way too long for me.  Time to whip out a quick script.  The problem: .NET or PHP or other?  I'm a recovering .NET dev working on PHP apps.  I surprised myself by deciding to go for the PHP option.  And I'm glad I did.  Here's how I was able to create an Excel spreadsheet with hundreds of worksheets in PHP.


I assumed I'd be able to plug into some existing PHP-based Excel spreadsheet API and knock this out.  Boy was I surprised.  There's a few classes related to Excel over at PHPClasses.org but they were either too simple (not enough features I needed) or unsupported or just plain useless.  The Spreadsheet Excel Writer PEAR package has way too many bugs in it and I couldn't get it to work on my setup (probably my fault).  It then hit me that moden versions of Excel (since 2003?) can read and write XML (given a proper schema).  After some quick testing (involving creating a simple spreadsheet and exporting to XML and viewing the output in a text editor) I decided the best solution would be to create an Excel-flavored XML file.  

Here's the source data in Excel:

And here's the 92 line XML output for this data:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Brian</Author>
  <LastAuthor>Brian</LastAuthor>
  <Created>2008-10-06T15:14:58Z</Created>
  <LastSaved>2008-10-06T15:17:02Z</LastSaved>
  <Company>One Who Serves</Company>
  <Version>12.00</Version>
 </DocumentProperties>
 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>10170</WindowHeight>
  <WindowWidth>20655</WindowWidth>
  <WindowTopX>600</WindowTopX>
  <WindowTopY>360</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
 </ExcelWorkbook>
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Sheet1">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="2" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
   <Row>
    <Cell><Data ss:Type="String">This is A1</Data></Cell>
    <Cell><Data ss:Type="String">This is B1</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">This is A2</Data></Cell>
    <Cell><Data ss:Type="String">This is B2</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <Selected/>
   <Panes>
    <Pane>
     <Number>3</Number>
     <ActiveRow>17</ActiveRow>
     <ActiveCol>11</ActiveCol>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
 <Worksheet ss:Name="Sheet2">
  <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
 <Worksheet ss:Name="Sheet3">
  <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
</Workbook>

However, most of the elements are either static or unimportant (for my simple needs). So, here's the part that's most important:

 <Worksheet ss:Name="Sheet1">
  <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="2" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
   <Row>
    <Cell><Data ss:Type="String">This is A1</Data></Cell>
    <Cell><Data ss:Type="String">This is B1</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">This is A2</Data></Cell>
    <Cell><Data ss:Type="String">This is B2</Data></Cell>
   </Row>
  </Table>
 </Worksheet>

Everything before the first Worksheet tag became a static heredoc header and the footer is simply the closing </Workbook> tag.  I didn't require setting any <WorksheetOptions> so I removed that element completely. 

Now that I figured out how to layout an XML fragment that represents a worksheet, all I needed to next was the following:

  • loop through the directories where I stored the DBF files in question (using PHP's RecursiveDirectoryIterator class)
  • parse each DBF file for the data schema (here's how I learned how to do this)
  • loop through the parsed data (using foreach loops on arrays)
  • generate the appropriate <Cell> and <Data> tags

After much trial and error, I finally succeeded in creating an well-formed XML document.  I probably should have used some existing PHP XML API but, for the sake of time, I went with the brute force method of hacking my way to a well-formed XML document (via heredoc strings and concatenating variables and new line characters, etc.).  However, this XML file did not open in Excel for many reasons and Excel was not being reasonable with me and giving me information I can use for debugging my text.  I kept getting (nearly) useless errors like this:

I couldn't find the log mentioned either.  After much more experimenting, I finally learned the following lessons:

  • Worksheet names need to be unique (duh but not so obvious in the heat of the moment).
    • Set the ss:Name attribute for the <Worksheet> tag.
  • Worksheet names are limited to 31 characters.
    • This sucked since I was using a version of the file path to create unique worksheet names.  
    • I solved this one by using sequential numbers for the worksheet names and creating a sort of table of content worksheet that maps the files to their matching worksheet (cumbersome but effective).
  • The ss:ExpandedRowCount attribute for the <Table> tag needs to match the number of actual rows to be displayed in each worksheet.
    • This one took me a long time to debug because of the aforementioned useless Exel error message.

In the end, I have my Excel spreadsheet with 183 worksheets in it (one for every DBF file and the master TOC).  I can regenerate it at will.  And, most importantly, I can refactor it later and use it again to extract the data from the DBF files (probably to export the data to MySQL for later use).  While I was fighting the ExpandedRowCount issue, I did find another method of creating Excel Spreadsheets in PHP that I may use in the future - PHPExcel.  Once I got it up and running on my dev machine, I was amazed at what I could do with it.  My script would be much cleaner without all of those heredoc strings in it.  Plus, PHPExcel is well-documented and has a growing base of users which can only bode well for the future of this set of classes.


Rss Comment

No Comment

No comments yet.

Post a Comment





This is a captcha-picture. It is used to prevent mass-access by robots. (see: www.captcha.net)

You must read and type the 5 chars within 0..9 and A..F, and submit the form.

  

Oh no, I cannot read this. Please, generate a