Part III – File Management
[Note: I know it took several months to get this blog entry completed, and I apologize. Co-worker attrition, project inheritance, and a personal life took priority to help maintain what bits of sanity I still have left. Hopefully this is worth the wait.]
Overview
So far in this series we have covered the history, Microsoft documentation, and the architectural decisions around implementing an External Binary Store. We have also gone over how to implement the COM interface in painful detail.
In this blog we will cover the file management aspects of an EBS. The first section will cover the file manager component and the second section will address the orphaned file cleanup.
The EBS File Manager
For the most part, file management for an EBS is pretty straightforward. You need the ability to store and retrieve files based on a particular store Id and a binary Id, as well as the ability to delete a list of files. These three features give you all of the functionality you need to successfully and completely implement an EBS.
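To make those three operations concrete, here is a minimal sketch of what such a manager could look like. It is not the code from my assembly: the class name `FileSystemBinaryStore`, the method names, and the layout of one folder per store with one file per binary Id are all assumptions made for illustration, and both Ids are assumed to already be safe file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: one folder per store, one file per binary Id.
public class FileSystemBinaryStore
{
    private readonly string _root;

    public FileSystemBinaryStore(string root)
    {
        _root = root;
    }

    // Builds <root>\<storeId>\<binaryId>; both Ids are assumed to be safe file names.
    private string PathFor(string storeId, string binaryId)
    {
        return Path.Combine(Path.Combine(_root, storeId), binaryId);
    }

    // Writes the content, creating the store folder on first use.
    public void Store(string storeId, string binaryId, byte[] content)
    {
        Directory.CreateDirectory(Path.Combine(_root, storeId));
        File.WriteAllBytes(PathFor(storeId, binaryId), content);
    }

    // Reads the whole file back; throws if the file does not exist.
    public byte[] Retrieve(string storeId, string binaryId)
    {
        return File.ReadAllBytes(PathFor(storeId, binaryId));
    }

    // Deletes each listed file; Ids with no file on disk are skipped silently.
    public void Delete(string storeId, IEnumerable<string> binaryIds)
    {
        foreach (string binaryId in binaryIds)
        {
            string path = PathFor(storeId, binaryId);
            if (File.Exists(path))
            {
                File.Delete(path);
            }
        }
    }
}
```

Reading and writing whole byte arrays keeps the sketch short; for large BLOBs you would probably want stream-based overloads instead.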
It is recommended by Microsoft that you store the files in a secured area that only the SharePoint AppPool user account has access to. This limits who has access to the physical files being stored by the Binary Store.
One thing you may want to consider is whether you would like to use this for other file storage outside of the EBS. For example, you might want to use this same assembly to store sensitive legal documents that are maintained through a home-grown system. If that's the case then you'll probably want to extend the methods to include support for store maintenance and content maintenance. Store maintenance could include the ability to create new stores, get a list of the content in a given store, remove a store, determine if a store exists for a given Id, and compact a store (remove all binary Id's that don't exist in the supplied list of binary Id's). The only additional content maintenance methods that I could think of, beyond store and retrieve, are remove and a check for whether content exists for a given Id.
Speaking of Id's... I implemented mine to support both GUID-based and string-based Id's for the store and content Id's. This provides support for the SharePoint implementation and for any other implementation.
Now that you have decided what you want this manager to support, you need to decide how you are going to implement it. To make this work with the COM component I made it a GAC'd assembly.
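One way to support both kinds of Id is a pair of overloads that reduce either form to a single string key usable as a file or folder name. The sketch below is only an illustration of that idea; `BinaryStoreIds` and `ToKey` are hypothetical names, and the validation rule for string Id's is my assumption, added so that a string Id can never point outside the store.

```csharp
using System;
using System.IO;

// Hypothetical helper: turns either kind of Id into one string key.
public static class BinaryStoreIds
{
    // GUID-based Ids: format "N" gives 32 hex digits with no braces or dashes.
    public static string ToKey(Guid id)
    {
        return id.ToString("N");
    }

    // String-based Ids: reject anything that could not be a single file name.
    public static string ToKey(string id)
    {
        if (string.IsNullOrEmpty(id))
        {
            throw new ArgumentException("Id must not be empty.", "id");
        }
        if (id.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0 || id == "." || id == "..")
        {
            throw new ArgumentException("Id is not usable as a file name.", "id");
        }
        return id;
    }
}
```

With both overloads returning a plain string, the rest of the manager only ever has to deal with one Id type.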
The last thing to talk about is the configuration. I wanted to be able to set the root of the binary store in a configuration file. Since this is being loaded by a COM object within the SharePoint application domain (yes, I know it's not .NET but you know what I mean) there isn't anywhere for me to store a configuration setting...or is there? This assembly is loaded in the GAC, so I poked around until I figured out that I could give this assembly its own .config file even though it's in the GAC. I wrote a blog on how to do this at: Configuration Files for GAC Assemblies
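For readers who don't want to chase the link, one common way to read a per-assembly .config file looks roughly like the sketch below. Treat it as an assumption about the general approach rather than a copy of what the linked post does: the class name `BinaryStoreSettings` and the `BinaryStoreRoot` appSettings key are made up for this example, and the project needs a reference to System.Configuration.

```csharp
using System.Configuration;
using System.Reflection;

// Hypothetical settings reader for an assembly that carries its own .config file.
public static class BinaryStoreSettings
{
    // "BinaryStoreRoot" is a hypothetical appSettings key.
    public static string GetStoreRoot()
    {
        // OpenExeConfiguration appends ".config" to the path it is given,
        // so this looks for <assembly>.dll.config beside the loaded assembly.
        string assemblyPath = Assembly.GetExecutingAssembly().Location;
        Configuration config = ConfigurationManager.OpenExeConfiguration(assemblyPath);

        // The indexer returns null when the key is missing.
        KeyValueConfigurationElement setting = config.AppSettings.Settings["BinaryStoreRoot"];
        return setting == null ? null : setting.Value;
    }
}
```

Callers should treat a null result as "not configured" and fail with a clear message rather than fall back to a guessed path.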
The EBS Orphaned File Cleanup Process
To implement the orphaned file cleanup for the EBS provider you will need to create a console application that uses your EBS File Manager component.
This application will need to open a site collection and get the list of current ExternalBinaryId's (from the SPSite.ExternalBinaryIds property). You will then need to get a list of all of the Id's stored on disk, determine which of those items are no longer referenced by SharePoint (via the ExternalBinaryIds property), and delete those pieces of orphaned content. I implemented this by building up a list of Id's from the ExternalBinaryId's and then calling the compact method in the EBS File Manager (see above).
You should schedule this application to run during a timeframe in which the BLOB content will not be modified, and make sure that the account it runs under has full control over the directory where the files are physically stored. Also, remember to make this configurable to support multiple site collections, should you host more than one on each server.
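The compact step itself boils down to a set difference between what is on disk and what SharePoint still references. Here is a minimal sketch of that logic, assuming one file per binary Id with the Id as the file name. `OrphanCleanup` and `Compact` are hypothetical names, and turning the entries of SPSite.ExternalBinaryIds into those file names depends on how your provider encodes its Id's, so that part is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the compact step used by the cleanup application.
public static class OrphanCleanup
{
    // Deletes every file in storeDirectory whose name is not in referencedIds
    // and returns the names of the files it removed.
    public static List<string> Compact(string storeDirectory, IEnumerable<string> referencedIds)
    {
        // Case-insensitive lookup, assuming a case-insensitive Windows file system.
        var keep = new HashSet<string>(referencedIds, StringComparer.OrdinalIgnoreCase);
        var removed = new List<string>();

        foreach (string path in Directory.GetFiles(storeDirectory))
        {
            string id = Path.GetFileName(path);
            if (!keep.Contains(id))
            {
                File.Delete(path);
                removed.Add(id);
            }
        }
        return removed;
    }
}
```

Returning the removed names gives the console application something to log. Note that a file stored after the referenced list was built would look like an orphan to this method, which is why the cleanup should only run while BLOB content is not being modified.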
Summary
That concludes the coding part of this effort. By now you should have a fully functional code base that you are ready to plug into SharePoint and test.
In the next and final blog on this I'll cover deployment and debugging as well as any final thoughts that I have to share.