Pages

Friday, May 30, 2014

SSIS - Split a file into multiple files based on string pattern

Scenario:

I have a sample file which looks like this 



Test.his




I have a scenario, where in I need to split the above file into multiple files at every occurrence of the word "Group". 

It means, I need to split the above "Test.his" file  into 7 files (Why is it 7 ? Answer: Count the occurrences of the word "Group").

Solution: 

The solution for this gets simpler using an Execute SQL Task , Control Table in the database , For Each Loop Container and Script Task functionality. Lets see how we could achieve this. 

Step 1: 

Lets create a control table in our database. 



CREATE TABLE [dbo].[Test_ControlTable](

[GROUPNAME] [varchar](20) NULL,

[SEARCHINDEX] [varchar](50) NULL


GO


Step 2: 

Execute the below mentioned script to insert rows into the table. 

INSERT INTO [dbo].[Test_ControlTable] VALUES('PATIENT','Group:PATIENT');
GO
INSERT INTO [dbo].[Test_ControlTable] VALUES('APRENURS','Group:APRENURS');
GO
INSERT INTO [dbo].[Test_ControlTable] VALUES('PHYS','Group:PHYS');
GO
INSERT INTO [dbo].[Test_ControlTable] VALUES('PR','Group:PR');
GO
INSERT INTO [dbo].[Test_ControlTable] VALUES('PN','Group:PN');
GO
INSERT INTO [dbo].[Test_ControlTable] VALUES('PROCTIM','Group:PROCTIM');
GO
INSERT INTO [dbo].[Test_ControlTable] VALUES('STUDY','Group:STUDY');
GO

Your SELECT * from [dbo].[Test_ControlTable] gives the below results.




Step 3: 

Create a new project in Visual Studio(BIDS) with Project Name as Test.



Step 4:

Drag and drop Execute SQL Task into work space of the package.



And configure it as below

ResultSet            :  Full Result Set
 SQLStatement  :  SELECT GroupName, SearchIndex FROM [dbo].[Test_ControlTable]



In the Result set tab, 

Click on Add, and give Result name as 0 (Zero) and map it to Object Variable (create one in variables  window as shown in the image )




Step 5:

Drag and drop For Each Loop container and Script Task into the work space as shown below.


This means, the script task runs for every row that is coming from the control table.

Step 6: 

Create two variables in the variables window as shown below. 

                                        

Step 7: 

Configure the For Each Loop container as shown




Step 8: 

Edit the Script Task with the following script:


#region Namespaces
using System;
using System.Text;
using System.Linq;
using System.IO;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Collections.Generic;
using System.Text.RegularExpressions;
#endregion


public void Main()
{
// TODO: Add your code here

            //Declare and Initialize the variables 
            String vFileName = "C:\\Users\\tmhsxk35\\Documents\\Test.his" ;
            String vGroupName = Dts.Variables["User::vGroupName"].Value.ToString();
            String vSearchIndex = Dts.Variables["User::vSearchIndex"].Value.ToString();
            String vSplitFilePath = "C:\\Users\\tmhsxk35\\Documents\\SplitFiles\\";


            // Reading the file 
            String s = File.ReadAllText(vFileName, Encoding.Default);

            //Finding the location of vSearchIndex 
            Match match = Regex.Match(s, @"\b" + Regex.Escape(vSearchIndex) + @"\b",                           RegexOptions.IgnoreCase);

            //If the search is successful, then capture the position of the Index into vPos
            if (match.Success)
            {
                int vPos = match.Index;

                //Assigning the position of next occurence of "Group" to vNextPos
                int vNextPos = s.IndexOf("Group", vPos + 1);

                //Capturing the content between the occurences of "Group" into vContent
                String vContent = s.Substring(vPos, vNextPos - vPos - 2);

                //Writing the content to a file
                StreamWriter sw = File.CreateText(vSplitFilePath + vGroupName + ".csv");
                sw.Write(vContent);
                sw.Flush();
                sw.Close();

                Dts.TaskResult = (int)ScriptResults.Success;
            }  


Dts.TaskResult = (int)ScriptResults.Success;
}






Use the script mentioned above to split the files. 

Save the package. And execute the package to see the file split to 7 files at the path mentioned above. 






Please modify the script as required.

Thanks for taking time to check out the post!

P.S:

 I used For Each Loop Container, Control Table, Execute SQL Task because my searchIndex is a variable. If your searchIndex is constant, you dont have to use them. Just initialize everything in variables inside the script task. 

No comments:

Post a Comment