Friday, January 20, 2012

SharePoint WCM HTML clean-up

SharePoint is a great web content management system: it is fast, scalable and reliable, and it comes with lots of out-of-the-box components, web parts, etc. that often make the life of content managers much easier. There are certain aspects of the WCM capabilities of SharePoint, though, that sometimes need a little more time or some hacking to get them to work properly, or the way you want them to work. One such thing is the HTML code that appears on your SharePoint pages: several things in the HTML generated by the SharePoint WCM system make it look not quite neat and tidy.
The SharePoint UI is built on top of ASP.NET Web Forms, so SharePoint inherits some of the HTML issues directly from its ASP.NET foundation. The problem is that with Web Forms you don't have full control over the HTML that is generated in your page. When ASP.NET MVC appeared, this was one of the arguments in its favor, because with ASP.NET MVC the developer does have full control over the generated HTML. Unfortunately SharePoint doesn't use the MVC framework, so many of us have at one point or another had to struggle with the extra HTML bits that get injected into the SharePoint aspx page.
Some examples of such bits that come directly from ASP.NET are the many system hidden fields that appear in the "form" element (the infamous "ViewState" field among them, which can grow very large), the inline JavaScript blocks with "form" submit helpers, etc. Items specific to SharePoint that further inflate your HTML are, for instance, the two "core" files, "core.js" and "core.css" (both quite big), and the many nested HTML "table" elements around your web parts, which are rendered by the containing WebPartZone controls and are especially unwelcome when you want your HTML to contain only nice-looking "div" elements.
The dilemma here is that SharePoint uses in-place page content editing, and a single aspx file handles both the editing process and the actual display of the page to the end user, so the items (web controls in most cases) responsible for these extra (but necessary) HTML artifacts cannot be removed directly from the page. We need them for page content editing, but on the other hand we need to somehow get rid of them, hide them, or at least suppress the extra HTML that they generate, so that the page in display mode contains only the bare minimum of HTML needed to render the page contents. In the WCM context, I assume here that the SharePoint site is publicly accessible, or at least allows anonymous access within some internal network, so hiding the extra artifacts is necessary only when the pages are accessed anonymously. This is a pretty broad scenario, and this particular setup is quite popular in the WCM function of SharePoint.
The next question is how many of the "extra" SharePoint HTML artifacts are unwanted in your scenario. If it is about simple content pages with only SharePoint field controls or standard content editor web parts, you won't need any of the above-mentioned bits in display mode with anonymous access. This is especially true when your HTML design is very different from the standard SharePoint page design.
So, after several years and several partial solutions, I decided to wrap the whole thing up in a single solution. It turned out that the solution was pretty simple to develop and, luckily, very easy to use too. It is a single user control that you need to place in just one spot in your master page, and that's all. The control has several public properties that can be used to configure it, so that it suppresses some of the SharePoint artifacts that it can handle but not others (I will explain these in detail shortly). I chose to create the control as a user control (there is no code-behind assembly; the code is placed inline in the ascx file) because this way you have two deployment options: either place it in the TEMPLATE/CONTROLTEMPLATES folder of the SharePoint hive, or upload it to the Master Page Gallery of your site and reference it in your master page from there (I explained this technique in this recent posting of mine). Of course, the code can easily be moved to a simple web control and put in an assembly of yours.
You can download the user control that I named appropriately "HideSPArtifacts.ascx" from here.
Provided that you have uploaded it to the Master Page Gallery of your site collection (/_catalogs/masterpage), you will need the following lines of code to place it in your master page file.
First you need the "Register" directive at the top of your master page:

<%@ Register TagPrefix="MyControls" TagName="HideSPArtifacts" Src="~SiteCollection/_catalogs/masterpage/HideSPArtifacts.ascx" %>


The second bit is to place the control declaration in the page mark-up:

<MyControls:HideSPArtifacts runat="server" RemoveCoreJS="false" RemoveCoreCss="false" RemoveHeadCss="false" RemoveForm="false" AddBodyOnLoadDummy="false" EnablePageViewState="true" RemoveZoneHeaders="false"/>


Two very important notes here: 1) if you upload the user control to the Master Page Gallery of your site collection, you will additionally have to make certain modifications to your web.config file (check the previous posting that I mentioned above). 2) You need to place the control's declaration (MyControls:HideSPArtifacts) immediately after the opening "html" element of your page and before the "head" HTML element.
One other thing that you should check in your master page is whether you have a "head" element and whether it has the runat="server" attribute (if you have used one of the SharePoint master pages as a base for your master page you will have these). If this condition is not satisfied the control won't be able to remove some of the SharePoint artifacts from the page.
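To make the placement rules concrete, here is a minimal sketch of how the top of a master page could look. Only the "Register" directive and the control declaration come from this posting; the rest of the skeleton (the "Master" directive, the bare "body" and the single PlaceHolderMain content place holder) is a stripped-down assumption for illustration, not a complete SharePoint master page.

```aspx
<%@ Master Language="C#" %>
<%@ Register TagPrefix="MyControls" TagName="HideSPArtifacts" Src="~SiteCollection/_catalogs/masterpage/HideSPArtifacts.ascx" %>
<html>
<%-- The control goes right after the opening "html" element and before "head" --%>
<MyControls:HideSPArtifacts runat="server" RemoveCoreJS="false" RemoveCoreCss="false" RemoveHeadCss="false" RemoveForm="false" AddBodyOnLoadDummy="false" EnablePageViewState="true" RemoveZoneHeaders="false"/>
<%-- The "head" element must have runat="server" --%>
<head runat="server">
    <title>Illustrative master page skeleton</title>
</head>
<body>
    <form runat="server">
        <%-- Assumed minimal layout: one content place holder inside the form --%>
        <asp:ContentPlaceHolder id="PlaceHolderMain" runat="server" />
    </form>
</body>
</html>
```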
So, once you have the user control in your master page, with the values of the control's properties as they are in the snippet above, and you open a page from your site anonymously (if you view the page as an authenticated user the control does nothing), you will see ... no changes in the HTML code of the page. This is because all "Remove.." properties are set to "false". Here is a list of the control's properties, with a brief explanation of the changes that you will see in the generated HTML after setting each of them:
  • RemoveCoreJS - as the name suggests, if this property is set, the script include for SharePoint's "core.js" file is removed from the page
  • RemoveCoreCss - when set, this property causes the core.css style sheet include to be removed from the page. Note that if you use alternate style sheets, these won't be rendered either. This is because the HideSPArtifacts control will block the rendering of the standard SharePoint CssLink control (if available).
  • RemoveHeadCss - when you use certain web controls like the TreeView control, the AspMenu control and some other controls, the ASP.NET page generates an inline CSS block in its "head" HTML element. If you don't want this inline style sheet to appear in the page, set this property to true (but first check that this doesn't affect any of the controls that you use on the page).
  • RemoveZoneHeaders - the WebPartZone controls that contain your web parts have the bad habit of creating several nested "table" HTML elements. The web part's chrome (frame), which you most often set to "none" because you don't need it in public WCM sites, also renders a "table" element. If you don't want any of these "table" elements and want to have only the HTML markup rendered directly by your web parts, set this property to true. Note that even if you set the "ChromeType" property of the web part, no chrome will actually be rendered (in anonymous mode only).
  • EnablePageViewState - the default value of this property is true and in this case the control will change nothing on the page. If you set this property to false it will simply set the EnableViewState property of the containing page to false (only when the page is viewed anonymously). The net effect will be that ... you will still have the "ViewState" hidden field in your page, but it will contain only thirty or so bytes of data.
  • RemoveForm - when set, this property removes the "form" element from your page. Actually it does something much more radical - it also removes all system hidden fields, including the ViewState field, and all inline JavaScript blocks that were included using the methods of Page.ClientScript - e.g. RegisterClientScriptBlock, RegisterStartupScript, etc. You will get rid of a ton of HTML and JavaScript in your page, which you don't need if you don't have controls and logic that do POST submits of the page. If your pages (at least the pages using the master page with the HideSPArtifacts control) contain only SharePoint field controls and content editor web parts, this will be a perfect choice. Note however that you will need to check all your pages carefully - some controls (like Button, LinkButton, etc.) crash outright if there is no "form" rendered on the page. Other controls may stop functioning properly because they will be missing some JavaScript code that doesn't get rendered. Bottom line - use with caution.
  • AddBodyOnLoadDummy - the default value of this property is true. It has a visible effect only when the "RemoveForm" property is set to true. It adds a small JavaScript block with several empty JavaScript functions. One of these functions is called "_spBodyOnLoadWrapper". This function appears in the "onload" attribute of the "body" element of the default SharePoint master page. Since the "RemoveForm" property removes all inline JavaScript associated with the page's "form" element, the real definition of this JavaScript function won't be available on the page, and you will see a JavaScript error in your browser when the page loads. This is the reason why this property causes the adding of this small JavaScript block with empty definitions of this and two other system JS functions.
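
As an illustration of how the properties combine, this is what the declaration might look like for a master page that serves only simple content pages (SharePoint field controls and content editor web parts). The attribute names are the ones listed above; the particular combination of values is just one possible choice, and it assumes that you have already checked that none of your pages contain controls that need the "form" element, "core.js" or "core.css".

```aspx
<%-- Everything switched on: intended for plain content pages viewed anonymously --%>
<MyControls:HideSPArtifacts runat="server" RemoveCoreJS="true" RemoveCoreCss="true" RemoveHeadCss="true" RemoveForm="true" AddBodyOnLoadDummy="true" EnablePageViewState="false" RemoveZoneHeaders="true"/>
```

AddBodyOnLoadDummy is left at "true" here because RemoveForm is set, so the empty "_spBodyOnLoadWrapper" definition is needed. If a page breaks, switching the properties back to their neutral values one by one is a simple way to find the one responsible.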

And now, let me briefly explain how the trick with hiding a control without hiding its contents is possible. Actually the idea is to hide the control itself (or at least parts of it) but display its child controls. This technique is used in the implementation of the "RemoveForm", "RemoveHeadCss" and "RemoveZoneHeaders" properties. The following steps are executed:

  • in its OnInit override the HideSPArtifacts control hooks onto the parent page's InitComplete event
  • in the InitComplete event handler an empty control (class Control) is created and inserted in the "Controls" collection of the target control's parent, right after the target control (the control that we want to hide).
  • The SetRenderMethodDelegate method of the new empty control is called - this method receives a single delegate parameter, which you use to provide a method that the control invokes to render its contents (in place of rendering child controls of its own). The idea is to use the empty control as a place-holder to inject some HTML right after the control that we want to hide.
  • In the "Render" method override of the HideSPArtifacts control the "Visible" property of the control that we want to hide is set to false. Since we have placed the HideSPArtifacts control right after the beginning of the master page, its Render method is guaranteed to be called before the Render methods of the controls that follow it. This way the control whose Visible property is set to false will not get rendered.
  • in the render method passed as the delegate parameter of the SetRenderMethodDelegate method, the Controls collection of the target control is iterated and all child controls are rendered using the Control.RenderControl method. This way the target control itself is not rendered, but all its children actually get rendered within the empty control that was injected right after it. This is how the goal of hiding the control itself but not its child controls is achieved.
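
The steps above can be sketched as a small user control with inline code. This is not the source of HideSPArtifacts, just a minimal illustration under several assumptions: the class name "HideFormSketch" is made up, the target is simply the page's "form" (Page.Form), anonymous access is detected with Request.IsAuthenticated, and the extra clean-up that the real control does is left out.

```aspx
<%@ Control Language="C#" ClassName="HideFormSketch" %>
<%@ Import Namespace="System" %>
<%@ Import Namespace="System.Web.UI" %>
<script runat="server">
    // Hypothetical sketch of the technique, not the actual HideSPArtifacts code.
    private Control target; // the control whose own markup should be suppressed

    protected override void OnInit(EventArgs e)
    {
        base.OnInit(e);
        // Step 1: hook onto the parent page's InitComplete event.
        Page.InitComplete += new EventHandler(OnPageInitComplete);
    }

    private void OnPageInitComplete(object sender, EventArgs e)
    {
        // Assumption: do nothing for authenticated users or pages without a form.
        if (Request.IsAuthenticated || Page.Form == null) return;
        target = Page.Form;

        // Step 2: insert an empty control right after the target in its parent.
        Control parent = target.Parent;
        Control placeholder = new Control();
        parent.Controls.AddAt(parent.Controls.IndexOf(target) + 1, placeholder);

        // Step 3: the placeholder renders through our delegate.
        placeholder.SetRenderMethodDelegate(new RenderMethod(RenderTargetChildren));
    }

    protected override void Render(HtmlTextWriter writer)
    {
        // Step 4: hide the target only now, at render time. This control must
        // come before the target in the page so that this code runs first.
        if (target != null) target.Visible = false;
        base.Render(writer);
    }

    private void RenderTargetChildren(HtmlTextWriter output, Control container)
    {
        // Step 5: render the hidden target's children inside the placeholder.
        foreach (Control child in target.Controls)
        {
            child.RenderControl(output);
        }
    }
</script>
```

Note that the AddAt call fails if the parent control contains inline code blocks (its Controls collection cannot be modified in that case), so a sketch like this works only when the target's parent is free of such blocks.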

You can check the source code of the HideSPArtifacts control for the details of the actual implementation.