External BLOB/Binary Store for Windows SharePoint Services 3.0 in C#/.NET 2.0 - Part III

Part III – File Management

[Note: I know this took several months to get this blog entry completed and I apologize.  Co-worker attrition, project inheritance, and a personal life took priority to help maintain what bits of sanity that I still have left.  Hopefully this is worth the wait.]

Overview

So far in this series we have covered the history, Microsoft documentation, and the architectural decisions around implementing an External Binary Store.  We have also gone over how to implement the COM interface in painful detail.

In this blog we will cover the file management aspects of an EBS.  The first section will cover the file manager component and the second section will address the orphaned file cleanup.

The EBS File Manager

For the most part file management for an EBS is pretty straight forward.  You need to ability to store and retrieve files based on a particular store Id and a binary Id as well as being able to delete a list of files.  These three features will give you all of the functionality that you need to successfully and completely implement an EBS.

It is recommended by Microsoft that you store the files in a secured area that only the SharePoint AppPool user account has access to.  This limits who has access to the physical files being stored by the Binary Store. 

One thing you may want to consider is if you would like to be able to use this for other file storage outside of the EBS.  For example, you want to use this same assembly to store sensitive legal documents that are maintained through a home grown system.  If that’s the case then you’ll probably want to extend the methods to include support for store maintenance and content maintenance. Store maintenance could include the ability to create new stores, get a list of the content in a given store, remove a store, determine if a store exits for a given id, and the ability to compact a store (remove all binary Id's that don't exist in the supplied list of binary Id's).  The only additional content maintenance methods that I could thing of in addition to store and retrieve are remove and does content exist for the given id.

Speaking of Id's... I implemented mine to include support for both GUID based and string based Id's for the store and content Id's.  This provides support for the SharePoint implementation and for any other implementation.

Now that you have decided what all you want this manager to support, now you need to decide how you are going to implement it.  To make this work with the COM component I made it a GAC'd assembly.

The last thing to talk about is the configuration.  I wanted to be able to set the root of the binary store in a configuration file.  Since this is being loaded by a COM object within the SharePoint application domain (yes, I know it's not .NET but you know what I mean) there isn't anywhere for me to store a configuration setting...or is there?  This assembly is loaded in the GAC, so I poked around until I figure out that I could give this assembly it's own .config file even though it's in the GAC.  I wrote a blog on how to do this at: Configuration Files for GAC Assemblies

The EBS Orphaned File Cleanup Process

To implement the COM component for the EBS provider you will need to create a console application that uses your EBS File Manager component. 

This application will need to open a site collection and get the list of current ExternalBinaryId's (from the SPSite.ExternalBinaryIds property).  You will need to get a list of all of the Id's on stored on disk and determine which of those items is no longer referenced by SharePoint (via the ExternalBinaryIds property) and delete those pieces of orphaned content.  I implemented this by building up a list of Id's from the ExternalBinaryId's and then called the compact method in the EBS File Manager (see above.)

You should schedule to be run during a timeframe in which the BLOB content will not be modified and make sure that the account has full control over the directory where the files are physically stored. Also, remember to make this configurable to support multiple site collections should you host multiples on each server.

Summary

That concludes the coding part of this effort. By now you should have fully functional code base that you are ready to plug in to SharePoint and test.

In the next and final blog on this I'll cover deployment and debugging as well as any final thoughts that I have to share.

External BLOB/Binary Store for Windows SharePoint Services 3.0 in C#/.NET 2.0 - Part II

Part II – The COM Component

Overview

In the previous post in this series we discussed the exposure of an external storage API from Window SharePoint Services, Microsoft’s implementation documents, what I have been able to figure out as it relates to the implementation under the covers, the architectural decisions that you must make, and the architectural decisions that I’ve made for this blog series.  If you have not read it, please make sure that you do before you continue.

We are going to focus on the COM Component in this blog entry.  This is easily the most important and most difficult piece of this whole solution which makes this one the longest blog in the series…sorry.  This is also the most technical and detailed entry so I’m going to try my best to hold back on the sarcasm, but as a result this one will be very dry (not like the previous one was much better). 

For those who aren’t going to read the previous blog, all I’m going to tell you is that we are doing this as a C# .NET 2.0 solution.

Now, back to my attempted cure for insomnia and excitement…

The EBS Provider

To implement the COM component for the EBS provider you will need to create a C# class project, prepare the class project for GAC installation, create the interface files from the IDL, prepare the provider class for COM Interop, implement the interface methods, create the ILockByte support methods, and create the memory dereferencing method.

Create the C# Class Project
I’m assuming that you know how to create a C# Class project.  If not, then you can read the Microsoft docs here: http://msdn2.microsoft.com/en-us/library/ms173077(VS.80).aspx.

GAC Settings
For this you will need to add your key file to the project and then in the project properties on the “Signing” tab check the “Sign the assembly” checkbox and select the key file in the “Choose a strong name key file:” dropdown.  For further information on this see Global Assembly Cache concepts at http://msdn2.microsoft.com/en-us/library/yf1d93sz(VS.80).aspx

To help with deployment and development you should consider setting these values on the “Build Events” tab:
Pre-build event command line:
"$(DevEnvDir)..\..\SDK\v2.0\bin\gacutil" /u "$(TargetName)"

Post-build event command line:
"$(DevEnvDir)..\..\SDK\v2.0\bin\gacutil" /i "$(TargetPath)"

Run the post-build event:
On successful build

This will remove the project from the GAC before the build and add the project to the GAC after a successful build.

Interface Implementation
There are two interfaces that must be implemented for this component.  They are the ISPExternalBinaryProvider and the ILockBytes interfaces.

Here is the IDL for the ISPExternalBinaryProvider as provided in the Microsoft implementation documentation (http://msdn2.microsoft.com/en-us/library/bb802811.aspx):

/*************************************************
    File: extstore.idl
    Copyright (c): 2006 Microsoft Corp.
*************************************************/
import "objidl.idl";

[
    uuid(39082a0c-af6e-4ac2-b6f0-1a1ff6abbae1)
]

library SharePointBinaryStore
{
    [
        local,
        object,
        uuid(48036587-c8bc-4aa0-8694-5a7000b4ba4f),
        helpstring("ISPExternalBinaryProvider interface")
    ]
    interface ISPExternalBinaryProvider : IUnknown
    {
        HRESULT StoreBinary(
            [in] unsigned long cbPartitionId,
            [in, size_is(cbPartitionId)] const byte* pbPartitionId,
            [in] ILockBytes* pilb,
            [out] unsigned long* pcbBinaryId,
            [out, size_is(, *pcbBinaryId)] byte** ppbBinaryId,
            [out,optional] VARIANT_BOOL* pfAccepted);

        HRESULT RetrieveBinary(
            [in] unsigned long cbPartitionId,
            [in, size_is(cbPartitionId)] const byte* pbPartitionId,
            [in] unsigned long cbBinaryId,
            [in, size_is(cbBinaryId)] const byte* pbBinaryId,
            [out] ILockBytes** ppilb);
    }
}

 

For my implementation I took this IDL and ran it through the MIDL compiler (midl.exe) to get a type library and then through the Type Library Importer (tlbimp.exe) to get an assembly.  Using the IDL file that way created a bunch of gross looking code that was a pain to work with. I took some time through trial and error and came up with the following interface representations for both the ISPExternalBinaryProvider and the ILockBytes that work in a .NET implementation.  I think these are much cleaner and easier to work with.  By the way, each of these where in their own .cs file without any namespace information.

[ComImport, ComConversionLoss, Guid("48036587-C8BC-4AA0-8694-5A7000B4BA4F"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface ISPExternalBinaryProvider
{
[MethodImpl(MethodImplOptions.InternalCall, 
MethodCodeType=MethodCodeType.Runtime)]
    void StoreBinary([In] uint cbPartitionId, 
                     [In] ref byte pbPartitionId, 
                     [In, MarshalAs(UnmanagedType.Interface)] ILockBytes pilb, 
                     out uint pcbBinaryId, 
                     out IntPtr ppbBinaryId, 
                     [Optional] out bool pfAccepted);

    [MethodImpl(MethodImplOptions.InternalCall, 
MethodCodeType = MethodCodeType.Runtime)]
    void RetrieveBinary([In] uint cbPartitionId, 
                        [In] ref byte pbPartitionId, 
                        [In] uint cbBinaryId, 
                        [In] ref byte pbBinaryId, 
                        [MarshalAs(UnmanagedType.Interface)] out ILockBytes ppilb);
}

[ComImport, Guid("0000000A-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface ILockBytes
{
    [MethodImpl(MethodImplOptions.InternalCall, 
        MethodCodeType=MethodCodeType.Runtime)]
    void ReadAt([In] UInt64 ulOffset, 
                [In] IntPtr pv, 
                [In] uint cb, 
                [Out] out uint pcbRead);

    [MethodImpl(MethodImplOptions.InternalCall, 
MethodCodeType = MethodCodeType.Runtime)]
    void WriteAt([In] UInt64 ulOffset, 
                 [In] IntPtr pv, 
                 [In] uint cb, 
                 [Out] out uint pcbWritten);

    [MethodImpl(MethodImplOptions.InternalCall, 
MethodCodeType = MethodCodeType.Runtime)]
    void Flush();

    [MethodImpl(MethodImplOptions.InternalCall, 
        MethodCodeType=MethodCodeType.Runtime)]
    void SetSize([In] UInt64 cb);

    [MethodImpl(MethodImplOptions.InternalCall, 
        MethodCodeType=MethodCodeType.Runtime)]
    void LockRegion([In] UInt64 libOffset, 
                    [In] UInt64 cb, 
                    [In] uint dwLockType);

    [MethodImpl(MethodImplOptions.InternalCall, 
        MethodCodeType=MethodCodeType.Runtime)]
    void UnlockRegion([In] UInt64 libOffset, 
                      [In] UInt64 cb, 
                      [In] uint dwLockType);

    [MethodImpl(MethodImplOptions.InternalCall, 
        MethodCodeType=MethodCodeType.Runtime)]
    void Stat([Out, MarshalAs(UnmanagedType.Struct)] out STATSTG pstatstg, 
              [In] int grfStatFlag);
}

 

One thing that I want to point out now is the difference in method declaration for the StoreBinary and RetrieveBinary in the interfaces above versus the IDL and Microsoft’s documentation.  The documentation says that the methods must return an HRESULT of S_OK or E_FAIL, but the methods in the interface are declared as void.  The reason for this is that when I tried to implement them to return the HRESULT it caused the process to fail miserably and causes SharePoint to hang.  When I changed them to void and stopped returning values, then everything worked well.

COM Interop Preparation
In the project properties you will need to check the “Register for COM Interop” checkbox on the “Build” tab.

The provider class must inherit the ISPExternalBinaryProvider interface and have a public default constructor even if it is empty.  Also, you will need to set these class attributes.

•    [ProgId("[Object Name].[Class Name]")]
This is the ProgID for the COM class.  Details on the ProgIdAttribute class can be found at http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.progidattribute(VS.80).aspx.

•    [Guid("00000000-0000-0000-0000-000000000000")]
This is a new GUID generated using the GuidGen.exe tool shipped with Visual Studio. Detail on the GuidAttribute class can be found at http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.guidattribute(VS.80).aspx.

•    [ClassInterface(ClassInterfaceType.None)]
Defines the class interface type and for this project I used the above values.  Details on the ClassInterfaceAttribute call can be found at http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.classinterfaceattribute(VS.80).aspx.

 

Implement the Interface Methods
You must explicitly implement the interface members.  The easiest way to do this is let Visual Studio do this for you.  If you hover your mouse over the ISPExternalBinaryProvider after the : in the class definition you will notice a little blue rectangle/line under the “I”.  If you click on that or press “Shift+Alt+F10” you will get a couple of options including “Explicitly implement interface ‘ISPExternalBinaryProvider’”.  When it has finished it should look like this: 

void ISPExternalBinaryProvider.RetrieveBinary(… 

 

Notice that there is no explicit scope (public, private, internal, etc) on the method definitions.  That’s how they should be so don’t change it.

In the RetrieveBinary method you will basically need to dereference the partition Id, the binary Id, retrieve the byte[] from the EBS file manager, create an ILockBytes objects, and set the ppilb output parameter to the ILockBytes object.  I recommend doing all of this in a try/catch block and because of this you will need to explicitly set the ppilb equal to null at the top of the method to get the component to compile.

In the StoreBinary method you will need to dereference the partition Id, read the byte[] out of the ILockBytes, write the byte[] to a file associated with the partition Id which should create a new binary Id for you, dereference the binary Id into the pcbBinaryId for the size and the ppbBinaryId for the first byte of the binary Id, and finally you need to set the pfAccepted to true if you were able to write the file or false if SharePoint should take care of writing the file.  Again, this should be done in a try/catch block and the three output parameters should be set to default values.  Something to remember is that to dereference the binary Id going back to SharePoint that you should first convert it to a byte[].

In the following two sections I’ll discuss how to read the byte[] from and write the byte[] to an ILockBytes object and how to dereference the memory for the Id pointers.

ILockByte Support
The ILockBytes interface supports 3 methods that we use in this process: Stat, ReadAt, and WriteAt.  We don’t need to use any of the other exposed methods.

To read a byte[] from an ILockBytes interface you need to create an memory buffer using Marshal.AllocHGlobal([buffer size]) (I used 8192 for the buffer size), call the Stat method to get the number of bytes in the ILockBytes, create the a byte[] equal to the resulting size, and loop through reading the bytes from the ILockBytes using ReadAt and write them into the resulting byte[] (shown below).  Don’t forget to free the memory using Marshal.FreeHGlobal([buffer variable]).  So...I was wanting to give you the method for reading the bytes from an ILockBytes but that request was denied by my employer (I know, I don't know why either).  Anyway I found a way to do this in the Memory generation of Excel files.

do
{
    lockBytes.ReadAt(offset, buf, (UInt32)8192, out bytesRead);
    if (bytesRead > 0)
    {
        Marshal.Copy(buf, bytes, (Int32)offset, (Int32)bytesRead);
        offset += bytesRead;
    }
} while (bytesRead > 0);

 

To write a byte[] to an ILockBytes interface you will need to reference an external method in the OLE32.dll called CreateILockBytesOnHGlobal.  Here is the code for that declaration:

[DllImport("ole32.dll")] static extern int CreateILockBytesOnHGlobal(IntPtr hGlobal,
                                                                     bool fDeleteOnRelease,

                                                                     out ILockBytes ppLockbytes); 
 

Now that you have the external definition you can continue with writing a byte[] to an ILockBytes. To write the byte[] you need to create the resulting ILockBytes object, get the size of the byte[], call the CreateILockBytesOnHGlobal, allocate a buffer, and loop through writing the bytes into the ILockBytes using the WriteAt method (shown below).  Again, I was hoping to be able to give you the entire method, but that request was denied.  For this one, basically take the opposite of what you've done for reading from the ILockBytes.

while (byteSize > 0)
{
     bytesRead = (byteSize > 8192 ? 8192 : (Int32)byteSize);
     Marshal.Copy(bytes, (Int32)offset, buf, bytesRead);
     lockBytes.WriteAt(offset, buf, (UInt32)bytesRead, out bytesWritten);
     if (bytesWritten == 0)
     {
         throw new ApplicationException("Unable to write to contents");
     }
     offset += bytesWritten;
     byteSize -= bytesRead; 
} 

 

That pretty much covers reading a byte[] from and writing a byte[] to an ILockBytes interface.

Memory Dereferencing
The last thing that you need to do in this component is to have some way to dereference the pointers coming in from SharePoint and going out to SharePoint.  The only way to achieve this that I could find is to use unsafe code and in order to do that you will need to edit the project properties and select the “Allow unsafe code” in the “General” section on the “Build” tab.

To dereference the pointer coming in from SharePoint you will need to create a byte[] buffer to the size indicated, create a byte* equal to the incoming byte from SharePoint using the fixed keyword, and then perform a memory copy (shown below).  I chose to make this a method since it is needed multiple times and since it is a method using the fixed keyword it has to be flagged as unsafe. 

fixed (byte* refBytes = &bytes)
{
     Marshal.Copy(new IntPtr(refBytes), buffer, 0, (Int32)size); 
} 

 

To dereference the newly created binary Id going back to SharePoint you need to convert the Id to a byte[],  set the pcbBinaryId to the number of bytes in the Id, allocate the memory on the heap, and then copy the bytes into the allocated memory using the Marshal.Copy method.  Since this is needed only once I left this code in the StoreBinary method.  Here is the snippet:

pcbBinaryId = (UInt32)binaryIdBytes.Length;
ppbBinaryId = Marshal.AllocHGlobal((Int32)pcbBinaryId); 
Marshal.Copy(binaryIdBytes, 0, ppbBinaryId, (Int32)pcbBinaryId); 

 

You will notice that we are allocating memory on the heap without releasing it.  If we released it then SharePoint wouldn’t get our id back out.  This is pretty much undocumented in its entirety so I’m hoping the SharePoint is freeing this memory when it is finished with it, otherwise we will have a memory leak here.  Likewise, it isn’t clear who’s supposed to free the memory for the incoming partition Id, so we may have a memory leak there. (Believe me, I’ll rant about this and many other things in the Final Thoughts section of Part IV).

Summary

In this entry we covered all of the technical details for creating the COM Component.  You now know how to setup the COM Interop project, implement the required interface, get information in to and out of the ILockBytes interface, and dereference the memory for the values being passed back and forth with SharePoint.

In the next blog I’ll cover the details for implementing a file manager and an orphan file cleanup process.