UNIX ORACLE SQL SERVER WINDOWS Technology: 2008-07-27

From

http://blogs.conchango.com/jamiethomson/archive/2006/01/05/SSIS_3A00_-Suggested-Best-Practices-and-naming-conventions.aspx

SSIS: Suggested Best Practices and naming conventions

I thought it would be worth publishing a list of guidelines that I see as SSIS development best practices. These are my own opinions and are based upon my experience of using SSIS over the past 18 months. I am not saying you should take them as gospel but these are generally tried and tested methods and if nothing else should serve as a basis for you developing your own SSIS best practices.

One thing I really would like to see getting adopted is a common naming convention for tasks and components and to that end I have published some suggestions at the bottom of this post.

This list will get added to over time so if you find this useful keep checking back here to see updates!

If you know that data in a source is sorted, set IsSorted=TRUE on the source adapter output. This may save unnecessary SORTs later in the pipeline, which can be expensive. Setting this value does not perform a sort operation; it only indicates that the data is sorted.
Rename all Name and Description properties from the default. This will help when debugging particularly if the person doing the debugging is not the person that built the package.
Only select columns that you need in the pipeline, to reduce buffer size and reduce OnWarning events at execution time.
Following on from the previous bullet point, always use a SQL statement in an OLE DB Source component or LOOKUP component rather than just selecting a table. Selecting a table is akin to "SELECT *..." which is universally recognised as bad practice. (http://www.sqljunkies.com/WebLog/simons/archive/2006/01/20/17865.aspx). In certain scenarios the approach of using a SQL statement can result in much improved performance as well (http://blogs.conchango.com/jamiethomson/archive/2006/02/21/2930.aspx).
~~Use SQL Server Destination as opposed to OLE DB Destination where possible for quicker insertions~~ I used to recommend using SQL Server Destinations wherever possible but I've changed my mind. Experience from around the community suggests that the difference in performance between SQL Server Destination and OLE DB Destination is negligible and hence, given the flexibility of packages that use OLE DB Destinations, it may be better to go for the latter. It's an "it depends" consideration, so you should decide which you prefer based on your own testing.
Use Sequence containers to organise package structure into logical units of work. This makes it easier to identify what the package does and also helps to control transactions if they are being implemented.
Where possible, use expressions on the SqlStatementSource property of the Execute SQL Task instead of parameterised SQL statements. This removes ambiguity when different OLE DB providers are being used. It is also easier. (UPDATE: There is a caveat here. Results of expressions are limited to 4000 characters so be wary of this if using expressions.)
If you are implementing custom functionality try to implement custom tasks/components rather than use the script task or script component. Custom tasks/components are more reusable than scripted tasks/components. Custom components are also less bound to the metadata of the pipeline than script components are.
Use caching in your LOOKUP components where possible. It makes them quicker. Watch that you are not grabbing too many resources when you do this though.
LOOKUP components will generally work quicker than MERGE JOIN components where the 2 can be used for the same task (http://blogs.conchango.com/jamiethomson/archive/2005/10/21/2289.aspx).
Always use DTExec to perf test your packages. This is not the same as executing without debugging from SSIS Designer (http://www.sqlis.com/default.aspx?84).
Use naming conventions for your tasks and components. I suggest using acronyms at the start of the name and there are some suggestions for these acronyms at the end of this article. This approach does not help a great deal at design-time, where the tasks and components are easily identifiable, but can be invaluable at debug-time and run-time. e.g. My suggested acronym for a Data Flow Task is DFT, so the name of a data flow task that populates a table called MyTable could be "DFT Load MyTable".
If you want to conditionally execute a task at runtime use expressions on your precedence constraints. Do not use an expression on the "Disable" property of the task.
Don't pull all configurations into a single XML configuration file. Instead, put each configuration into a separate XML configuration file. This is a more modular approach and means that configuration files can be reused by different packages more easily.
If you need a dynamic SQL statement in an OLE DB Source component, set AccessMode="SQL Command from variable" and build the SQL statement in a variable that has EvaluateAsExpression=TRUE. (http://blogs.conchango.com/jamiethomson/archive/2005/12/09/2480.aspx)
When using checkpoints, use an expression to populate the CheckpointFilename property which will allow you to include the value returned from System::PackageName in the checkpoint filename. This will allow you to easily identify which package a checkpoint file is to be used by.
When using raw files and your Raw File Source Component and Raw File Destination Component are in the same package, configure your Raw File Source and Raw File Destination to get the name of the raw file from a variable. This will avoid hardcoding the name of the raw file into the two separate components and running the risk that one may change and not the other.
Variables that contain the name of a raw file should be set using an expression. This will allow you to include the value returned from System::PackageName in the raw file name. This will allow you to easily identify which package a raw file is to be used by. N.B. This approach will only work if the Raw File Source Component and Raw File Destination Component are in the same package.
Use a common folder structure (http://blogs.conchango.com/jamiethomson/archive/2006/01/05/2559.aspx).
Use variables to store your expressions (http://blogs.conchango.com/jamiethomson/archive/2005/12/05/2462.aspx). This allows them to be shared by different objects and also means you can view the values in them at debug-time using the Watch window.
Keep your packages in the dark (http://www.windowsitpro.com/SQLServer/Article/ArticleID/47688/SQLServer_47688.html). In summary, this means that you should make your packages location unaware. This makes it easier to move them across environments.
If you can, filter your data in the Source Adapter rather than filter the data using a Conditional Split transform component. This will make your data flow perform quicker.
When storing information about an OLE DB Connection Manager in a configuration, don't store the individual properties such as Initial Catalog, Username, Password etc... just store the ConnectionString property.
Your variables should only be scoped to the containers in which they are used. Do not scope all your variables to the package container if they don't need to be.
Employ namespaces for your packages.
Make log file names dynamic so that you get a new logfile for each execution.
Use ProtectionLevel=DontSaveSensitive. Other developers will not be restricted from opening your packages and you will be forced to use configurations (which is another recommended best practice)
Use annotations wherever possible. At the very least each data-flow should contain an annotation.
Always log to a text file, even if you are logging elsewhere as well. Logging to a text file has less reliance on external factors and is therefore most likely to contain all information required for debugging.
Create a new solution folder in Visual Studio Solution Explorer in order to store your configuration files. Or, store them in the 'miscellaneous files' section of a project.
Always use template packages to standardise on logging, event handling and configuration.
If your template package contains variables put them in a dedicated namespace called "template" in order to differentiate them from variables that are added later.
Break out all tasks requiring the Jet engine (Excel or Access data sources) into their own packages that do nothing but that data flow task. Load the data into staging tables if necessary. This will ensure that solutions can be migrated to 64-bit with no rework. (Thanks to Sam Loud for this one. See his comment below for an explanation.)
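
Several of the points above (checkpoint files, raw files, log files) come down to the same idea: build the file name from the package name, and for log files add something unique per execution. In a package this would be done with an SSIS expression over System::PackageName (and a start-time value for logs). The Python below is only a sketch of that naming logic, not SSIS expression syntax; the folder paths, file extensions, package name and the package_file_name helper are all hypothetical.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def package_file_name(folder, package_name, suffix, started=None):
    """Build a file path that embeds the package name and, optionally, a start time.

    The folder, suffix and this helper are hypothetical. In SSIS the same
    string would come from an expression that uses System::PackageName.
    """
    stem = package_name
    if started is not None:
        # A timestamp makes the name unique per execution (useful for log files).
        stem += "_" + started.strftime("%Y%m%d_%H%M%S")
    return str(PureWindowsPath(folder) / (stem + suffix))


# Checkpoint and raw files: one stable, package-specific name.
checkpoint = package_file_name(r"C:\ETL\Checkpoints", "LoadCustomers", ".chk")

# Log files: a new name for every execution.
log_file = package_file_name(
    r"C:\ETL\Logs", "LoadCustomers", ".log", started=datetime(2006, 1, 5, 9, 30, 0)
)

print(checkpoint)
print(log_file)
```

Keeping the timestamp optional reflects the difference between the two cases: a checkpoint or raw file has to be found again under the same name, whereas a log file should never overwrite the previous run.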

The acronyms below should be used at the beginning of the names of tasks to identify what type of task it is.

Task	Prefix
For Loop Container	FLC
Foreach Loop Container	FELC
Sequence Container	SEQC
ActiveX Script	AXS
Analysis Services Execute DDL	ASE
Analysis Services Processing	ASP
Bulk Insert	BLK
Data Flow	DFT
Data Mining Query	DMQ
Execute DTS 2000 Package	EDPT
Execute Package	EPT
Execute Process	EPR
Execute SQL	SQL
File System	FSYS
FTP	FTP
Message Queue	MSMQ
Script	SCR
Send Mail	SMT
Transfer Database	TDB
Transfer Error Messages	TEM
Transfer Jobs	TJT
Transfer Logins	TLT
Transfer Master Stored Procedures	TSP
Transfer SQL Server Objects	TSO
Web Service	WST
WMI Data Reader	WMID
WMI Event Watcher	WMIE
XML	XML
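
If you want to apply the task prefixes consistently, for example in a script that generates or reviews package documentation, a small lookup is enough. This is a minimal sketch in Python that uses a subset of the table above; the task_name helper and the choice of a single space between prefix and description (taken from the "DFT Load MyTable" example) are my assumptions for illustration.

```python
# A subset of the task prefixes from the table above.
TASK_PREFIXES = {
    "For Loop Container": "FLC",
    "Foreach Loop Container": "FELC",
    "Sequence Container": "SEQC",
    "Data Flow": "DFT",
    "Execute Package": "EPT",
    "Execute SQL": "SQL",
    "File System": "FSYS",
    "Script": "SCR",
}


def task_name(task_type, description):
    """Return '<prefix> <description>' for a known task type (hypothetical helper)."""
    try:
        prefix = TASK_PREFIXES[task_type]
    except KeyError:
        # Fail loudly rather than invent a prefix that is not in the table.
        raise ValueError(f"No prefix defined for task type: {task_type!r}") from None
    return f"{prefix} {description.strip()}"


print(task_name("Data Flow", "Load MyTable"))  # DFT Load MyTable
```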

These acronyms should be used at the beginning of the names of components to identify what type of component it is.

Component	Prefix
DataReader Source	DR_SRC
Excel Source	EX_SRC
Flat File Source	FF_SRC
OLE DB Source	OLE_SRC
Raw File Source	RF_SRC
XML Source	XML_SRC
Aggregate	AGG
Audit	AUD
Character Map	CHM
Conditional Split	CSPL
Copy Column	CPYC
Data Conversion	DCNV
Data Mining Query	DMQ
Derived Column	DER
Export Column	EXPC
Fuzzy Grouping	FZG
Fuzzy Lookup	FZL
Import Column	IMPC
Lookup	LKP
Merge	MRG
Merge Join	MRGJ
Multicast	MLT
OLE DB Command	CMD
Percentage Sampling	PSMP
Pivot	PVT
Row Count	CNT
Row Sampling	RSMP
Script Component	SCR
Slowly Changing Dimension	SCD
Sort	SRT
Term Extraction	TEX
Term Lookup	TEL
Union All	ALL
Unpivot	UPVT
Data Mining Model Training	DMMT_DST
DataReader Destination	DR_DST
Dimension Processing	DP_DST
Excel Destination	EX_DST
Flat File Destination	FF_DST
OLE DB Destination	OLE_DST
Partition Processing	PP_DST
Raw File Destination	RF_DST
Recordset Destination	RS_DST
SQL Server Destination	SS_DST
SQL Server Mobile Destination	SSM_DST
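
The component prefixes can also be used the other way round: given a list of component names, find the ones that do not follow the convention. The Python below is a sketch under a few assumptions: it covers only a subset of the table, it assumes the prefix is separated from the descriptive text by a single space, and the function names are hypothetical.

```python
# A subset of the component prefixes from the table above.
COMPONENT_PREFIXES = {
    "OLE_SRC": "OLE DB Source",
    "FF_SRC": "Flat File Source",
    "LKP": "Lookup",
    "MRG": "Merge",
    "MRGJ": "Merge Join",
    "CSPL": "Conditional Split",
    "DER": "Derived Column",
    "SRT": "Sort",
    "OLE_DST": "OLE DB Destination",
    "SS_DST": "SQL Server Destination",
}


def component_type(name):
    """Return the component type implied by the name's prefix, or None."""
    # Split on the first space so that "MRG" and "MRGJ" are matched exactly.
    prefix, _, rest = name.partition(" ")
    if not rest.strip():
        return None  # a bare prefix with no descriptive text is not a useful name
    return COMPONENT_PREFIXES.get(prefix)


def unconventional_names(names):
    """Return the names that do not start with a known prefix plus a description."""
    return [n for n in names if component_type(n) is None]


print(unconventional_names(["OLE_SRC Customers", "Lookup 1", "SRT"]))
```

Default names such as "Lookup 1" are reported here, which ties in with the earlier advice to rename every Name property from its default.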
