Bulk File Importer
How it works
The Bulk File Importer is a console application that connects to the Content Repository of an existing Sense/Net installation and imports files from the given source path to the specified target location. It also indexes content during the import, so the imported files are searchable in the target Sense/Net installation as soon as the import finishes. The Sense/Net website must be shut down during the import.
We have conducted a number of tests importing a real-life data set, including a vast number of files from our own intranet installations.
|sec||hour||folder||file||total count||MB||cps||MB/s||avg size (B)||commit||buffer (kB)||thread|
|32026||8,9||299 284||1 554 405||1 853 689||328 430||57||10||221 553||5 000||10 240||100|
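The derived columns of the table can be checked against the raw ones. The following is a sketch under two assumptions that the table does not state: that "cps" means content items (folders plus files) per second, and that "MB" means 1 048 576 bytes. Under these assumptions the figures are consistent, with cps and MB/s apparently truncated to whole numbers in the table.

```latex
\begin{align*}
\text{total count} &= 299\,284 + 1\,554\,405 = 1\,853\,689 \\
\text{hour} &= \frac{32\,026}{3\,600} \approx 8.9 \\
\text{cps} &= \frac{1\,853\,689}{32\,026} \approx 57.9 \\
\text{MB/s} &= \frac{328\,430}{32\,026} \approx 10.3 \\
\text{avg size} &= \frac{328\,430 \times 1\,048\,576\ \text{B}}{1\,554\,405} \approx 221\,553\ \text{B}
\end{align*}
```

Note that the average size is computed per file, not per content item, since folders carry no binary data.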
Our test environment
- IBM X3550 rack mount server:
- 2 x Intel Xeon E5420 2.5 GHz processors
- 32GB Memory
- 4 x 146GB SAS HDD (Raid-10) LuceneIndex
- HP MSA100 SAN (2Gbit Fibre Channel connection)
- 4x300GB SCSI HDD (Raid-10) SOURCE: documents to import
- 4x300GB SCSI HDD (Raid-10) DESTINATION: SQL database disk
- Windows Server 2008R2 64-bit with Service Pack 1 and all updates installed
- Microsoft SQL Server 2008R2 64-bit Service Pack 1 and Cumulative update package 4 installed
- Boost SQL server priority enabled
Recommended test environment
- 2 x Intel Xeon E55xx series processor
- >32GB Memory
- SAS Raid10 or Raid5 (with minimum of 4 disks) volumes as source
- SAS Raid10 (with minimum of 4 disks, 8 disks are recommended) as destination SQL database disk
- SAS Raid-1 (4 SAS disk Raid10 is recommended if possible) for LuceneIndex destination
If the above volumes are on a SAN, a 4/8Gbit Fibre Channel connection is recommended. The overall import speed depends on the performance of the source and destination volumes.
The following description shows how to create a clean install and start bulk import right away:
=== Configure database ===
1. Create a new, empty database.
2. Set the recovery model to Simple (Database/Properties/Options). Do this only for the duration of the import, to make it faster.
3. Set file autogrowth to at least 500MB (Database/Properties/Files/Autogrowth/File growth).
4. Restart SQL Server so that its memory usage drops to a minimum.

=== Configure executables ===
1. Set the data source and initial catalog in the following files:
- Deployment\InstallSenseNet.bat
- WebSite\Web.config
- WebSite\bin\import.exe.config
- WebSite\bin\indexpopulator.exe.config
- TurboImport\TurboImport.exe.config
2. Set the source files path in TurboImport\TurboImport.exe.config: <add key="SourcePath" value="\\reposql01\50000"/>
3. Set the Content Repository target path in TurboImport\TurboImport.exe.config: <add key="TargetPath" value="/Root/Import"/>
4. Set the index directory path in TurboImport\TurboImport.exe.config: <add key="IndexDirectoryPath" value="..\WebSite\LuceneIndex"/> (this should point to the same location as the index directory of the website)
5. Set the maximum thread count in TurboImport\TurboImport.exe.config: <add key="MaxThreads" value="100"/> (the default value is fine for an 8-core installation; too many threads can cause excessive overhead on machines with fewer cores)

=== Install Sense/Net ===
Go to Deployment and execute InstallSenseNet.bat. This will create the database and import the startup Content Repository.

=== Check Sense/Net installation by running it ===
1. Create a new IIS website and point it to the WebSite folder.
2. Check that the portal is running, which confirms that all of the above configuration settings are correct.
3. Stop the website in IIS.
4. Stop the IIS service if SQL Server and IIS are on the same machine (to release memory).

=== Import folders & files ===
Do this after the website is stopped. Go to Deployment and execute TurboImport.bat. This will import all files under the path given in the SourcePath config element.
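The four TurboImport settings from steps 2 to 5 of 'Configure executables' can be collected into a single fragment. This is only a sketch: it assumes the keys sit in the standard .NET appSettings section of TurboImport\TurboImport.exe.config, which this article does not state explicitly. The values are the ones quoted in the steps above, and the data source and initial catalog settings from step 1 are deliberately left out because their exact form is not given here.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <!-- Assumption: the importer reads these keys from appSettings -->
  <appSettings>
    <!-- Folder (local or UNC path) whose files are imported -->
    <add key="SourcePath" value="\\reposql01\50000"/>
    <!-- Content Repository path that receives the imported content -->
    <add key="TargetPath" value="/Root/Import"/>
    <!-- Must point to the same index directory the website uses -->
    <add key="IndexDirectoryPath" value="..\WebSite\LuceneIndex"/>
    <!-- 100 suits an 8-core machine; lower it on machines with fewer cores -->
    <add key="MaxThreads" value="100"/>
  </appSettings>
</configuration>
```

Merge these entries into the existing file rather than replacing it, since the shipped configuration contains other settings as well.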
The importer logs its current status to the console and also creates the following files:
- detailedlog.csv - a detailed log of all imported files
- errorlog.txt - an error log containing the exception messages that occurred during import
- importlog.txt - an excerpt of status information similar to the console output

=== Start Sense/Net website ===
Do this after 'Import folders & files' has finished.
1. Set the database recovery model back to the required level.
2. Check the following settings:
- For a successful start after import, the Content Repository's allowed name characters should be set to allow everything except the '*' character. This might not be restrictive enough for a real-life ECMS repository. You can change this setting with the <add key="InvalidNameCharsPattern" value="[*]" /> element in web.config. TurboImport imports everything regardless of this setting.
- The RestoreIndex feature should be turned off for the website, because the possibly huge Lucene index will not be stored in the database. When using multiple web nodes, manually distribute the contents of the LuceneIndex folder to the nodes. This option can be configured with the <add key="RestoreIndex" value="false"/> element in web.config.
3. Start the website (and the IIS service if it was stopped), and check the imported files in the Content Repository.
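The two web.config settings named in 'Start Sense/Net website' are shown together below. As before, this is a sketch that assumes both keys belong to the standard appSettings section of WebSite\Web.config; only the two elements themselves come from this article.

```xml
<configuration>
  <!-- Assumption: both keys are read from appSettings -->
  <appSettings>
    <!-- Reject only '*' in content names so that imported names pass validation -->
    <add key="InvalidNameCharsPattern" value="[*]" />
    <!-- Do not restore the index from the database; the Lucene index stays on disk -->
    <add key="RestoreIndex" value="false"/>
  </appSettings>
</configuration>
```

Add or adjust these two elements in the existing web.config and leave the rest of the file untouched. Once the site is running, the name pattern can be tightened for newly created content, keeping in mind that already imported names were not checked against it.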