Demo content: importing the complete works of Shakespeare to SharePoint

We often need a fair amount of sample content to really showcase some extended WCM and FAST scenarios, and we found that lorem ipsum and other automatically generated text just wasn't cutting it. I have a keen interest in the theatre, so where better to turn than Shakespeare's plays, especially once I found out that someone had gone to the trouble of marking up all 37 plays in XML.

There were a number of challenges to work through:     

  1. The XML I downloaded had DTD references, and I couldn't be bothered working out how to make those work with the System.Xml.XmlDocument object, so I went in and deleted them from all of the files. This made processing easier.
  2. The structure I chose was a series of webs beneath a root web: Root / Plays / [Play name] / [Act name]. Each act web has one publishing page per scene beneath it.
  3. I needed to convert the data in each scene to XHTML to push into the PageContent field.
  4. It takes about an hour on my VM to process all 37 plays – resulting in around 130 webs and nearly 1000 SharePoint publishing pages.
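Deleting the DTD references by hand works, but the files could probably also be loaded untouched by telling the XML reader to skip the DOCTYPE. The sketch below assumes PowerShell 3.0 or later on .NET 4 or later, because XmlReaderSettings.DtdProcessing was only introduced in .NET 4; the SharePoint 2010 Management Shell runs PowerShell 2.0 on .NET 3.5, so there the manual approach may still be the practical one. Read-PlayXml is a hypothetical helper name.

```powershell
# Hypothetical helper: load a play file without removing its DOCTYPE first.
# Assumes PowerShell 3.0+ (.NET 4+), where XmlReaderSettings.DtdProcessing exists.
function Read-PlayXml([string] $Path)
{
    # Skip the <!DOCTYPE ...> declaration so the external DTD is never fetched
    $settings = New-Object System.Xml.XmlReaderSettings
    $settings.DtdProcessing = [System.Xml.DtdProcessing]::Ignore

    # .NET resolves relative paths against its own current directory,
    # not PowerShell's current location, so hand it a full path
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = [System.Xml.XmlReader]::Create($fullPath, $settings)
    try
    {
        $doc = New-Object System.Xml.XmlDocument
        $doc.Load($reader)
        return $doc
    }
    finally
    {
        # Always release the file handle, even if the XML is malformed
        $reader.Close()
    }
}
```

Inside the folder loop, Read-PlayXml $_.FullName could then stand in for the [xml] (Get-Content ...) cast.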

With this in mind, I wrote the following code:

$xmlpath = "C:\SPBPC2010\Assets\shaks200"
Get-ChildItem -Path $xmlpath -Filter "*.xml" | % { Process-Play -play ([xml] (Get-Content $_.FullName)) -parentWeb $playsWeb }

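This iterates through the folder that contains the XML files for the plays and calls Process-Play for each file. $playsWeb is the "Plays" web; like $basetemplate, $basepagelayout and $sceneXsl used below, it is assumed to have been set up beforehand (not shown). In a script, the functions below also need to be defined before this line runs.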

function Process-Play([xml] $play, [Microsoft.SharePoint.SPWeb] $parentWeb)
{
    Write-Host "Processing play: " $play.PLAY.TITLE

    ## 1. Create the web for the PLAY beneath the parent ("Plays") web
    $name = Convert-ToSafeString $play.PLAY.TITLE
    $playWeb = New-SPWeb ("{0}/{1}" -f $parentWeb.Url, $name) -Name $play.PLAY.TITLE -Template $basetemplate

    ## 2. Process the ACTS
    $play.PLAY.ACT | % { Process-Act -act $_ -web $playWeb }

    ## 3. Release the SPWeb object created above
    $playWeb.Dispose()

    Write-Host "Finished processing play: " $play.PLAY.TITLE
    Write-Host "----------------------------------------"
}

Here, we use a function called Convert-ToSafeString to turn the play title into a URL-friendly name (it replaces spaces with hyphens and strips apostrophes and commas), and then create a new web for that play. We then walk down through the XML ($play.PLAY.ACT) and call Process-Act for each act. Process-Act (not shown) follows the same pattern, creating a web for the act, and then calls Process-Scene for each scene:

$act.SCENE | % { Process-Scene -scene $_ -web $actWeb }
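Convert-ToSafeString, as listed with the utility functions, only handles spaces, apostrophes and commas, so any other punctuation that turns up in a title would go straight into the URL. A slightly more defensive variant is sketched below, assuming that plain ASCII letters, digits and hyphens are all you want in a web or page name; Convert-ToSafeUrlSegment is a hypothetical name, not part of the original script.

```powershell
# Hypothetical variant of Convert-ToSafeString: keeps only ASCII letters, digits and hyphens.
function Convert-ToSafeUrlSegment([string] $s)
{
    # Drop straight and curly apostrophes so "Night's" becomes "Nights", not "Night-s"
    $s = $s -replace "['\u2019]", ''
    # Collapse every run of other characters (spaces, commas, colons, ...) into one hyphen
    $s = $s -replace '[^A-Za-z0-9]+', '-'
    # Remove any hyphens left at either end
    return $s.Trim('-')
}
```

For example, "A Midsummer Night's Dream" becomes "A-Midsummer-Nights-Dream". Swapping it in for Convert-ToSafeString would change the generated URLs, so it is best decided before the first import run.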

PowerShell is great for this kind of nested, iterative processing, and it is a real time saver when writing quick scripts for one-off tasks like this.
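Because the walk is just nested iteration over the XML, it can be rehearsed without SharePoint at all. The sketch below is a hypothetical dry-run helper (Measure-PlayStructure is not part of the original script) that follows the same PLAY -> ACT -> SCENE path as Process-Play, Process-Act and Process-Scene, but only counts what would be created, assuming one web per play, one web per act and one page per scene.

```powershell
# Hypothetical dry-run helper: walks PLAY -> ACT -> SCENE like Process-Play,
# Process-Act and Process-Scene, but only counts what would be created.
function Measure-PlayStructure([xml] $play)
{
    # @() forces an array even when a play has a single ACT element
    $acts = @($play.PLAY.ACT)
    # Gather the SCENE elements of every act into one flat array
    $scenes = @($acts | ForEach-Object { $_.SCENE })

    New-Object PSObject -Property @{
        Title = $play.PLAY.TITLE
        Webs  = 1 + $acts.Count   # one web for the play plus one per act
        Pages = $scenes.Count     # one publishing page per scene
    }
}
```

Piping every file in the folder through it, the same way as the Process-Play loop, gives a quick preview of how many webs and pages an import will create before touching SharePoint.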

Process-Scene is where we start actually putting in some content. For each scene, we want to create a Publishing Page and then add content to the Page Content field: 

function Process-Scene([System.Xml.XmlElement] $scene, [Microsoft.SharePoint.SPWeb] $web)
{
    Write-Host "Processing the scene: " $scene.TITLE

    ## 1. Create the page
    $pubweb = Get-SPPublishingWeb -web $web
    $PageLayout = Get-SPPublishingPageLayout -web $web -name $basepagelayout

    # Page name is the part of the title before the first "." (e.g. "SCENE I");
    # fall back to the whole title if there is no "."
    $title = [string] $scene.TITLE
    $dot = $title.IndexOf(".")
    if ($dot -lt 0) { $dot = $title.Length }
    $pagename = Convert-ToSafeString ($title.Substring(0, $dot))

    $pagecollection = [Microsoft.SharePoint.Publishing.PublishingPageCollection] $pubweb.GetPublishingPages()
    $page = New-SPPublishingPage -PageCollection $pagecollection -Name $pagename -PageLayout $PageLayout -Title $title

    ## 2. Field: PageContent
    $pagecontent = Transform-Xml -xsl $sceneXsl -xml $scene
    $page.ListItem["PublishingPageContent"] = $pagecontent

    ## 3. Update, check in and publish (CheckIn and Publish both take a comment string)
    $page.Update()
    $page.ListItem.Update()
    $page.CheckIn("Imported by script")
    $page.ListItem.File.Publish("Imported by script")
}
Here you can see we create a new publishing page (using a utility function of mine, since there is no out-of-the-box cmdlet for this) and call another utility function, Transform-Xml, which applies the compiled XSLT in $sceneXsl to the scene; its output becomes the value of the PublishingPageContent field.

These utility functions are here: 
## ==============================================================================
## UTILITY FUNCTIONS
## ==============================================================================
 
function Get-SPPublishingWeb([Microsoft.SharePoint.SPWeb] $web)
{
    <#
        .SYNOPSIS
            Gets the PublishingWeb wrapper for a SharePoint web
        .EXAMPLE
            $pubWeb = Get-SPPublishingWeb -web $web
    #>
    return [Microsoft.SharePoint.Publishing.PublishingWeb]::GetPublishingWeb($web)
}
 
## ==============================================================================
 
function Get-SPPublishingPageLayout(
    [Microsoft.SharePoint.SPWeb] $web,
    [string] $name)
{
    <#
        .SYNOPSIS
            Gets a publishing page layout, by name, from the layouts available to the web
        .EXAMPLE
            $layout = Get-SPPublishingPageLayout -web $web -name $basepagelayout
    #>
    $pubWeb = Get-SPPublishingWeb -web $web
    return $pubWeb.GetAvailablePageLayouts() | ? { $_.Name -eq $name }
}
 
## ==============================================================================
 
function New-SPPublishingPage(
    [Microsoft.SharePoint.Publishing.PublishingPageCollection] $PageCollection,
    [string] $Name,
    [Microsoft.SharePoint.Publishing.PageLayout] $PageLayout,
    [string] $Title)
{
    <#
        .SYNOPSIS
            Creates a publishing page in the PublishingPageCollection supplied
        .EXAMPLE
            $page = New-SPPublishingPage -PageCollection $pages -Name "SCENE-I" -PageLayout $layout -Title "SCENE I."
    #>
    Write-Host "Creating page: $Name"
    $newPage = $PageCollection.Add(($Name + ".aspx"), $PageLayout)
    $newPage.Title = $Title
    $newPage.Update()
    $newPage.ListItem.Update()

    return [Microsoft.SharePoint.Publishing.PublishingPage] $newPage
}
 
## ==============================================================================
 
function Convert-ToSafeString([string] $s)
{
    # Spaces become hyphens; apostrophes and commas are removed
    return $s.Replace(" ", "-").Replace("'", "").Replace(",", "")
}
 
## ==============================================================================
 
function Transform-Xml([System.Xml.Xsl.XslCompiledTransform] $xsl, $xml)
{
    # Apply the compiled stylesheet to $xml and return the output as a string
    $sw = New-Object System.IO.StringWriter
    $xsl.Transform($xml, $null, $sw) | Out-Null
    Write-Output $sw.ToString()
}
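The stylesheet behind $sceneXsl, which Transform-Xml applies, isn't shown above. As a rough illustration only, a minimal stylesheet of that kind might render each SPEECH as a paragraph with the SPEAKER in bold and a line break after every LINE, and each STAGEDIR in italics. The element names come from the play XML; the output markup, the class names and the New-SceneXsl helper are assumptions, not the stylesheet used for this import.

```powershell
# Hypothetical helper: compiles a minimal SCENE-to-XHTML stylesheet.
# The output markup and class names are illustrative assumptions.
function New-SceneXsl
{
    $xslText = @'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" />

  <!-- Wrap the scene; its TITLE already becomes the page title, so skip it -->
  <xsl:template match="SCENE">
    <div class="scene">
      <xsl:apply-templates select="STAGEDIR|SPEECH" />
    </div>
  </xsl:template>

  <!-- Stage directions in italics -->
  <xsl:template match="STAGEDIR">
    <p class="stagedir"><em><xsl:value-of select="." /></em></p>
  </xsl:template>

  <!-- Speaker in bold, then a line break after every LINE -->
  <xsl:template match="SPEECH">
    <p class="speech">
      <strong><xsl:value-of select="SPEAKER" /></strong><br />
      <xsl:for-each select="LINE">
        <xsl:value-of select="." /><br />
      </xsl:for-each>
    </p>
  </xsl:template>
</xsl:stylesheet>
'@
    # Parse the stylesheet text, then compile it once for reuse across all scenes
    $xslDoc = New-Object System.Xml.XmlDocument
    $xslDoc.LoadXml($xslText)

    $xsl = New-Object System.Xml.Xsl.XslCompiledTransform
    $xsl.Load($xslDoc)
    return $xsl
}
```

Compiling once and assigning the result to $sceneXsl before the folder loop avoids recompiling the stylesheet for every scene.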