Thursday, November 17, 2011

A strategy for maintaining sequence in parallel processing in .Net 4.0

Most of us are aware how the offerings of parallel processing introduced in .Net 4.0 (actually with .Net 3.5 as well, but the integration was not as seamless) can bring significant performance benefits while processing CPU intensive tasks. In terms of practical usage, recently for one of my projects I was looking to take advantage of parallel processing while trying to split the image of a 8.5 x 11 inch page into multiple slices, and then process individual slices using GDI+ (some very CPU intensive processing as you can guess).
But there was a slight catch – while it was perfectly OK to process each slice in any CPU in any order, but before I returned the individual image back, I had to put the slices back in order so that the page image could be reconstituted from the individual slices that would be put back in the original sequence. Kind of goes like this:

Unfortunately, that’s when the issue started. When I looked up the info online, there were many wild guesses and some bits and pieces of info here and there using PLINQ’s AsOrdered() approach, but nothing concrete that showed a clean implementation. So I ended up writing my own class implementation to take care of this.
So, starting from scratch, here is the original non-parallel implementation:
//Main CPU intensive function that runs in sequential loop
public List<RealObject> PerformSequential()
{
    List<RealObject> myObjectArray = new List<RealObject>();
    for(int i=0; i< 100; i++)
    {
        myObjectArray.Add(GetMyRealObject());
    }
    return myObjectArray;
}

First step was to make the for-loop parallel:
//Main CPU intensive function that runs in parallel loop
public List<RealObject> PerformParallel()
{
    List<RealObject> myObjectArray = new List<RealObject>();
    Parallel.For(0, 100, i =>
    {
        myObjectArray.Add(GetMyRealObject());
    });
    return myObjectArray;
}

So obviously due to the parallel nature of the way work is performed, the list myObjectArray will have randomly positioned items e.g. i=2, followed by i=0, then 3 and then 1 etc. The intention is to be able to return the myObjectArray list after re-sorting the individual objects within it so that they correspond to i=0, 1, 2, 3 etc.
For doing this we would add a new class, which will hold the individual RealObject objects, and another variable that will hold the location pointer. This object will implement the IComparable interface and we will overload the CompareTo implementation so that later when we sort the objects, we can regain the original sequence of objects that they were intended to be received as:
//The intermediate object that will hold the true object and implement sorting
public class SortableRealObject : IComparable
{
    public int locationPointer;
    public RealObject realObject;

    public int CompareTo(object obj)
    {
        SortableRealObject thisObject = (SortableRealObject)obj;
        return this.locationPointer.CompareTo(thisObject.locationPointer);
    }
}

Then all we will need to do is populate this locationPointer within the loop, do a sort on the array, and finally repopulate the true object to be returned. So the modified code will look like this:
//Main CPU intensive function that runs in parallel loop
public List<RealObject> PerformParallel()
{
    List<RealObject> myObjectArray = new List<RealObject>();
    List<SortableRealObject> mySortableObjectArray = new List<SortableRealObject>();
    Parallel.For(0, 100, i =>
    {
        mySortableObjectArray.Add(new SortableRealObject(){locationPointer = i, realObject = GetMyRealObject()});
    });
    mySortableObjectArray.Sort();
    foreach(SortableRealObject mySortableObject in mySortableObjectArray)
    {
        myObjectArray.Add(mySortableObject. realObject;
    }
    return myObjectArray;
}

That’s it, we now have GetMyRealObject() that was performed parallel on multiple CPU cores, while the returned myObjectArray is perfectly sorted in the way they would have been if the for loop was performed sequentially in a regular for loop!

Tuesday, November 15, 2011

C#: Slice a Bitmap horizontally with a specified maximum (physical) size per slice

I will explain the problem in detail once I find some time... but here is the code for now:

Sample code to call the slicer class (using a PNG image in this example)
******************************************************
Bitmap baseBitmap = Bitmap.FromFile("c:\\mybitmap.png");
ImageSlicer slicer = new ImageSlicer();
List<Bitmap> slicedBitmaps = slicer.SliceImage(baseBitmap, 20000, ImageFormat.Png);

Code for the class
**************

using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

namespace KodeSharp.ImageSlicer
{
    class ImageSlicer
    {
        public List<Bitmap> SliceImage(Bitmap baseBitmap, int maxSizeInBytes, ImageFormat imageType)
        {
            return GetSlicedImages(new List<Bitmap>() { baseBitmap }, maxSizeInBytes, imageType);
        }

        private List<Bitmap> GetSlicedImages(List<Bitmap> baseBitmaps, int maxSizeInBytes, ImageFormat imageType)
        {
            List<Bitmap> bmpOut = new List<Bitmap>();
            foreach (Bitmap bmp in baseBitmaps)
            {
                if (IsSliceTooBig(bmp, maxSizeInBytes, imageType))
                {
                    List<Bitmap> bmps = SliceImageInHalf(bmp);
                    bmpOut.AddRange(GetSlicedImages(new List<Bitmap>() { bmps[0] }, maxSizeInBytes, imageType));
                    bmpOut.AddRange(GetSlicedImages(new List<Bitmap>() { bmps[1] }, maxSizeInBytes, imageType));
                }
                else
                {
                    bmpOut.Add(bmp);
                }
            }
            return bmpOut;
        }

        private bool IsSliceTooBig(Bitmap bmp, int maxSizeInBytes, ImageFormat imageType)
        {
            MemoryStream ms = new MemoryStream();
            bmp.Save(ms, imageType);
            if (ms.Length > maxSizeInBytes)
                return true;
            else
                return false;
        }

        private List<Bitmap> SliceImageInHalf(Bitmap sourceImage)
        {
            int slice1Height = sourceImage.Height / 2;
            int slice2Height = sourceImage.Height - slice1Height;
            int sliceWidth = sourceImage.Width;

            Bitmap slice1 = sourceImage.Clone(new Rectangle(0, 0, sliceWidth, slice1Height), sourceImage.PixelFormat);
            Bitmap slice2 = sourceImage.Clone(new Rectangle(0, slice1Height, sliceWidth, slice2Height), sourceImage.PixelFormat);

            return new List<Bitmap>() { slice1, slice2 };
        }
    }
}

Wednesday, July 13, 2011

C#: Mass search and replace text in Word documents

Code sequence to call (remember to call Dispose() to close all WINWORD instances):
 
DocxSearchAndReplace docxSearchAndReplace = new DocxSearchAndReplace();
DocxSearchAndReplace.SearchAndReplaceResult result = docxSearchAndReplace.SearchAndReplace(parameters...);
docxSearchAndReplace.Dispose();

And finally here is the DocxSearchAndReplace.cs class:

using System;
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;
 
namespace KodeSharp.MassDocx
{
    public class DocxSearchAndReplace : IDisposable
    {
        public object objMissing = System.Reflection.Missing.Value;
        public object objFalse = false;
        public object objTrue = true;
 
        //wordApp is used globally to make sure multiple calls to SearchAndReplace() in a loop is performance-optimized
        public Word.Application wordApp;
 
        public struct SearchAndReplaceResult
        {
            public string FileName;
            public int NumberOfChanges;
            public string Message;
            public SearchAndReplaceResult(string fileName, int numberOfChanges, string message)
            {
                FileName = fileName;
                NumberOfChanges = numberOfChanges;
                Message = message;
            }
        }
 
        public DocxSearchAndReplace()
        {
            wordApp = new Word.Application();
        }
 
        //Dispose must be called to dispose the Word COM objects
        public void Dispose()
        {
            ((Word._Application)wordApp).Quit();
            System.Runtime.InteropServices.Marshal.ReleaseComObject(wordApp);
            wordApp = null;
        }
 
        public SearchAndReplaceResult SearchAndReplace(FileInfo wordFile, string searchText, string replaceText,
            bool matchCase, bool matchWholeWord, bool matchWildcard, bool matchSoundsLike, bool matchAllWordForms,
            string sourceFamily, string sourceSize, bool sourceStyleCheck, bool sourceBold, bool sourceItalic, bool sourceUnderline,
            string targetFamily, string targetSize, bool targetStyleCheck, bool targetBold, bool targetItalic, bool targetUnderline,
            bool dummySave, bool trackChanges)
        {
            //Ignore temporary files that Word creates during a session and are left alone when Word is closed improperly or forcibly
            if (wordFile.Name.StartsWith("~$") || wordFile.Extension.ToLower() != ".docx")
                return new SearchAndReplaceResult(wordFile.Name, 0, "Invalid file");
 
            Word.Document wordDoc = wordApp.Documents.Open(wordFile.FullName);
 
            wordApp.Selection.Find.ClearFormatting();
            wordApp.Selection.Find.Replacement.ClearFormatting();
 
            if (!String.IsNullOrEmpty(sourceFamily)) wordApp.Selection.Find.Font.Name = sourceFamily;
            if (!String.IsNullOrEmpty(sourceSize)) wordApp.Selection.Find.Font.Size = (float)Convert.ToDouble(sourceSize);
            if (sourceStyleCheck)
            {
                wordApp.Selection.Find.Font.Bold = sourceBold ? 1 : 0;
                wordApp.Selection.Find.Font.Italic = sourceItalic ? 1 : 0;
                wordApp.Selection.Find.Font.Underline = sourceUnderline ? Word.WdUnderline.wdUnderlineSingle : Word.WdUnderline.wdUnderlineNone;
            }
 
            if (!String.IsNullOrEmpty(targetFamily)) wordApp.Selection.Find.Replacement.Font.Name = targetFamily;
            if (!String.IsNullOrEmpty(targetSize)) wordApp.Selection.Find.Replacement.Font.Size = (float)Convert.ToDouble(targetSize);
            if (targetStyleCheck)
            {
                wordApp.Selection.Find.Replacement.Font.Bold = targetBold ? 1 : 0;
                wordApp.Selection.Find.Replacement.Font.Italic = targetItalic ? 1 : 0;
                wordApp.Selection.Find.Replacement.Font.Underline = targetUnderline ? Word.WdUnderline.wdUnderlineSingle : Word.WdUnderline.wdUnderlineNone;
            }
 
            List<bool> trackChangeStatus = GetTrackChangeStatus(wordDoc);
            if (!dummySave)
                SetTrackChangeStatus(wordDoc, trackChanges);
 
            int numChanged = 0;
 
            //Body
            wordApp.Selection.WholeStory();
            Word.Selection wordSelection = wordApp.Selection;
            numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
 
            //TextBoxes within body
            Word.Shapes wordShapes = wordDoc.Shapes;
            numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
 
            //Headers, Footers and Textboxes within
            foreach (Word.Section wordSection in wordDoc.Sections)
            {
                //Headers
                wordSection.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterEvenPages].Range.Select();
                numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordSection.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterFirstPage].Range.Select();
                numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordSection.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Range.Select();
                numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
 
                //Footers
                wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterEvenPages].Range.Select();
                numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterFirstPage].Range.Select();
                numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Range.Select();
                numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
 
                //Textboxes within Headers
                wordShapes = wordSection.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterEvenPages].Shapes;
                numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordShapes = wordSection.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterFirstPage].Shapes;
                numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordShapes = wordSection.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Shapes;
                numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
 
                //Textboxes within Footers
                wordShapes = wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterEvenPages].Shapes;
                numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordShapes = wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterFirstPage].Shapes;
                numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                wordShapes = wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Shapes;
                numChanged += MakeChangesWithinShapes(ref wordApp, ref wordDoc, ref wordShapes, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
            }
 
            if (!dummySave)
                SetTrackChangeStatus(wordDoc, trackChangeStatus);
 
            ((Word._Document)wordDoc).Close(!dummySave, objMissing, objMissing);
            wordDoc = null;
 
            //Log
            string message;
            if (numChanged > 0)
                message = "Matches " + (dummySave ? "intercepted" : "successfully replaced");
            else
                message = "No match found";
 
            return new SearchAndReplaceResult(wordFile.Name, numChanged, message);
        }
 
        public int MakeChangesWithinShapes(ref Word.Application wordApp, ref Word.Document wordDoc, ref Word.Shapes wordShapes,
            string searchText, string replaceText,
            bool matchCase, bool matchWholeWord, bool matchWildcard, bool matchSoundsLike, bool matchAllWordForms,
            bool dummySave)
        {
            int numChanged = 0;
            for (int i = 0; i < wordShapes.Count; i++)
            {
                object idx = i + 1;
                Word.Shape wordShape = wordShapes.get_Item(ref idx);
                if (wordShape.Type == Microsoft.Office.Core.MsoShapeType.msoTextBox)
                {
                    wordShape.TextFrame.TextRange.Select();
                    numChanged += MakeChanges(ref wordApp, ref wordDoc, searchText, replaceText, matchCase, matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, dummySave);
                }
            }
            return numChanged;
        }
 
        public int MakeChanges(ref Word.Application wordApp, ref Word.Document wordDoc,
            string searchText, string replaceText,
            bool matchCase, bool matchWholeWord, bool matchWildcard, bool matchSoundsLike, bool matchAllWordForms,
            bool dummySave)
        {
            //Make dummy change to get count first
            wordApp.Selection.WholeStory();
            int numCharsBefore = wordApp.Selection.Characters.Count;
            string tempReplaceText = "@" + searchText;
            wordApp.Selection.Find.Execute(searchText, matchCase,
                matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, objTrue, Word.WdFindWrap.wdFindContinue, objMissing,
                tempReplaceText, Word.WdReplace.wdReplaceAll,
                objMissing, objMissing, objMissing, objMissing);
            wordApp.Selection.WholeStory();
            int numCharsAfter = wordApp.Selection.Characters.Count;
            int numChanged = numCharsAfter - numCharsBefore;
 
            //Undo changes if dummy mode
            if (dummySave)
            {
                wordDoc.Undo();
                wordDoc.UndoClear();
            }
            //Make actual change if otherwise
            else
            {
                wordApp.Selection.Find.Execute(tempReplaceText, matchCase,
                    matchWholeWord, matchWildcard, matchSoundsLike, matchAllWordForms, objTrue, Word.WdFindWrap.wdFindContinue, objMissing,
                    replaceText, Word.WdReplace.wdReplaceAll,
                    objMissing, objMissing, objMissing, objMissing);
            }
 
            return numChanged;
        }
 
        public List<bool> GetTrackChangeStatus(Word.Document wordDoc)
        {
            List<bool> trackChangeStatus = new List<bool>();
            trackChangeStatus.Add(wordDoc.TrackMoves);
            trackChangeStatus.Add(wordDoc.TrackFormatting);
            trackChangeStatus.Add(wordDoc.TrackRevisions);
            return trackChangeStatus;
        }
 
        public void SetTrackChangeStatus(Word.Document wordDoc, List<bool> trackChangeStatus)
        {
            wordDoc.TrackMoves = trackChangeStatus[0];
            wordDoc.TrackFormatting = trackChangeStatus[1];
            wordDoc.TrackRevisions = trackChangeStatus[2];
        }
 
        public void SetTrackChangeStatus(Word.Document wordDoc, bool trackChangeStatus)
        {
            wordDoc.TrackMoves = trackChangeStatus;
            wordDoc.TrackFormatting = trackChangeStatus;
            wordDoc.TrackRevisions = trackChangeStatus;
        }
    }
}