An outside vendor supplies flat files as part of the nightly load process into the system. Each file name includes a date, and the vendor may send multiple files at the same time for processing. One important note: the files share the same structure, but the file names differ on every run.
The challenge is figuring out how to load these files into the system, since hard coding the input file name is not a viable option.
Solution
Take advantage of the For Each Loop Container and the Script Task to handle this scenario easily.
Control Flow
The following screenshot illustrates the setup of the SSIS package that handles the processing of the vendor files:
In addition, here are the Connections associated with the above package Control Flow:
Task Breakdown
For this specific scenario, the vendor supplies their files via FTP. There is nothing exciting about this task: it is set up to read the FTP address, grab all files in a specific location, and move them to a destination on the network. The connections used are the Source Files connection, which points to the destination location on the network, and the Vendor FTP Connection.
FTP is not a requirement for receiving the files; it is simply one way to move files from an outside location onto the network, and the details of the FTP task are beyond the scope of this blog.
Once the files have been moved into the Source folder for processing, the For Each Loop Container is the main starting point of the dynamic processing. The For Each Loop Container offers seven different enumerators; this process uses the Foreach File Enumerator. The following is a screenshot of the For Each Loop Collection setup:
The specific property to note is the Retrieve file name option. In this scenario, the For Each Loop Container retrieves the fully qualified path of each file it finds.
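To make the Retrieve file name choices concrete, the following Python sketch (an illustration, not SSIS code; the file path is hypothetical) shows roughly what each option would hand back for a given file:

```python
import ntpath

# Hypothetical vendor file found by the enumerator
path = r"C:\SourceFiles\Vendor_20240101.txt"

# The three "Retrieve file name" options correspond roughly to:
fully_qualified = path                            # Fully qualified
name_and_ext = ntpath.basename(path)              # Name and extension
name_only = ntpath.splitext(name_and_ext)[0]      # Name only
```

Because the Flat File connection needs a complete path, this scenario uses the fully qualified form.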
When a file is found in the source folder, a variable is populated with its file path. In the package, a user variable called FilePath was created to store the file path. The variable is assigned on the Variable Mappings page, displayed below:
The container is now set up to loop through the collection of files in the source path. With the loop in place, the remaining issue is how to map each file found in the collection into the SQL Server database.
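Conceptually, the enumeration the container performs can be sketched in Python (an illustration only; the folder name and file mask are hypothetical):

```python
import glob
import os

# Hypothetical source folder and file mask
source_folder = "source_files"
file_mask = "*.txt"

# The Foreach File Enumerator walks the folder for matching files;
# each fully qualified path is assigned to User::FilePath, and the
# tasks inside the container run once per file.
for file_path in glob.glob(os.path.join(source_folder, file_mask)):
    print(file_path)
```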
In the Control Flow diagram, a Script Task and a Data Flow task sit inside the For Each Loop Container and handle the processing of each file. The Script Task updates the Data Flow task with the file to be processed, but before detailing that task, the setup of the Data Flow task needs to be covered to understand how the Script Task will be used.
The Data Flow task is set up with a Flat File Source feeding a SQL Server Destination. The Flat File Source was originally configured with a sample file from the vendor to capture the metadata about the column types, so that a mapping could be made between the flat file and the SQL Server table. The sample file is the Loop_Load_File Connection in the Connection Manager seen above. The following is the sample Data Flow:
For this scenario, it is a very simple load into a table. However, the Data Flow's source is tied directly to the Loop_Load_File Connection, which can hold only one file name at a time. This is where the Script Task comes in handy: it changes the name of the file to be loaded.
The Script Task lists the FilePath user variable as a ReadOnlyVariable for the script input. The script changes the Loop_Load_File Connection to point to the path stored in the variable:
Public Sub Main()
    Try
        'Dynamically point the connection at the file found by the loop
        Dts.Connections("Loop_Load_File").ConnectionString = _
            Dts.Variables("FilePath").Value.ToString()
        'Return success
        Dts.TaskResult = Dts.Results.Success
    Catch ex As Exception
        'An error occurred; return failure
        Dts.TaskResult = Dts.Results.Failure
    End Try
End Sub
Because this script runs before the Data Flow task, the Flat File connection has already been updated to point to the file pulled from the collection, which allows the Data Flow task to process each file without any hard-coded file names.
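The overall pattern — enumerate the files, repoint the source "connection" each iteration, then load — can be sketched outside of SSIS as follows (a hedged Python illustration; the folder, mask, and function names are hypothetical):

```python
import csv
import glob
import os

def load_file(file_path, rows_out):
    """Stands in for the Data Flow task: read one flat file into the 'table'."""
    with open(file_path, newline="") as f:
        rows_out.extend(csv.reader(f))

def process_source_folder(source_folder, file_mask="*.txt"):
    table = []  # stands in for the SQL Server destination table
    for file_path in glob.glob(os.path.join(source_folder, file_mask)):
        # Script Task equivalent: the source path is simply whatever
        # file the loop handed us on this iteration.
        load_file(file_path, table)
    return table
```

Each run picks up however many files happen to be in the folder, which mirrors the vendor sending one or many files per night.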
Lastly, the Control Flow contains an Archive script that moves each file from the Source folder into an Archive folder, which is a rather simple process.
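The archive step amounts to a move out of the Source folder so a file is not reprocessed on the next run; a minimal sketch (hypothetical folder names, not the actual script):

```python
import os
import shutil

def archive_file(file_path, archive_folder):
    """Move a processed file out of the Source folder into an Archive folder."""
    os.makedirs(archive_folder, exist_ok=True)
    destination = os.path.join(archive_folder, os.path.basename(file_path))
    shutil.move(file_path, destination)
    return destination
```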
Conclusion
The steps presented show a simple approach to loading flat files that have different file names but the same structure into a SQL Server destination, with very minimal scripting, by taking advantage of the power of the For Each Loop Container.