6/14/2007

Database Tuning and Optimization: Coefficient from QUEST


Tuning and optimizing a database may be one of a database administrator's most important and most demanding tasks. Unfortunately, tuning a database such as SQL Server is not easy. SQL Server ships with several monitoring and debugging tools that are powerful but difficult to master.

[Figure 1: Coefficient resembles Profiler]

Enter Coefficient, a new product from Intrinsic Design that suits both beginners and experienced database administrators. Coefficient strengthens SQL Server performance tuning and improves the operator's visibility into SQL Server's inner workings. Most usefully, it presents the monitoring data in a format almost any database administrator can understand.
Coefficient is not a complete replacement for Performance Monitor, Query Analyzer, or Profiler. It is better thought of as an enhanced SQL Server Profiler: it works much like Profiler but is considerably easier to use and understand (see Figure 1).

How Coefficient Works
Coefficient can be installed on the SQL Server machine itself or on any computer with network access to your SQL Servers. Installation is simple, and monitoring can begin as soon as it completes. The typical workflow is as follows:
First, add a SQL Server on the main page, much as you would register a SQL Server in Enterprise Manager.
Next, select the database you want to analyze from the list and create a trace. This resembles a trace file built with SQL Server Profiler but is easier to set up: there is no wading through Events, Data Columns, and Filters. Instead, a step-by-step wizard asks for a few simple settings, including the trace table name (the SQL Server table where Coefficient stores the trace data), the trace duration (the trial version runs for at most 20 minutes), and whether the analysis should run as soon as the trace finishes.
If you choose to run the analysis immediately after the trace completes, you also specify where the analysis files will be stored and which type of analysis to run. This is usually done by choosing a template; that is, the template specifies exactly which kinds of analysis you want performed.
When you finish the wizard, the trace starts and runs until the specified time has elapsed (see Figure 2). During this period, the monitoring data is saved in the designated trace table in SQL Server. The analysis can run immediately afterward, and you can rerun any number of analyses against the trace table at any later time.
The analysis results are saved as a set of HTML files that are easy to open and click through. Better still, the results can be published on an intranet web server so that multiple users can review them in a browser.

Analysis Is the Key to Coefficient
Before looking at the results, let's review templates and analysis types. When you run an analysis with Coefficient, you can choose a predefined template or build your own. A template is simply a list of analysis types; building a template means selecting the analyses you want performed, and Coefficient processes the trace data collected from the database according to that list. In all, Coefficient can analyze up to 49 different metrics, all related to SQL Server's internal workings. Typical performance analyses include:

  • Client attention requests and disconnect information. Attention events often expose communication problems between client and server.
  • SQL Server and NT user connection information, such as which user was connected and for how long.
  • Deadlock counts and information about what caused the deadlocks.
  • SQL Server errors and warnings.
  • Missing table statistics. This helps determine whether any statistics are missing; that information can be used to optimize queries and speed up executing code.
  • SQL statement execution and use of the SQL Prepare/Execute model, which can often speed up repeatedly executed SQL code.
  • How often execution plans are reused.
  • How often stored procedures are recompiled.
  • Call information for stored procedures and SQL statements, including duration, read and write counts, and so on.
  • SQL Server timeout counts.

[Figure 3: Selecting a template and analysis items]

A Coefficient Example
Suppose you want to analyze the performance of the stored procedures in a database and determine which ones are used most heavily. When tuning a database, one of the best approaches is to identify the most frequently executed stored procedures and optimize those first. Shaving even a third off the running time of a single stored procedure is a real win if that procedure runs 10,000 times a day.
First, create a trace and choose the SPSQL template for the analysis. The SPSQL template produces a great deal of data, but since we only care about which stored procedures run most often, we select just the Stored Procedure Call Frequency report (one component of the SPSQL template).
Following the steps described earlier produces a monitoring report containing both a table and a chart: the table lists stored procedures sorted by how often they ran, with the chart displayed above it (see Figure 3).
[Figure 4: The analysis report]

From the chart, it is easy to see which stored procedures run most frequently. In this example, the procedure named pcGet_SJZH_NJ was called 682 times during the trace period. The table provides still more information about each stored procedure, such as average duration in milliseconds, average counts, reads and writes, and average CPU usage.
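Because the raw trace data sits in an ordinary SQL Server table (the trace table named in the wizard), it is worth seeing what a call-frequency report conceptually boils down to. The following is only a sketch: the table name is invented, and the columns are assumed to mirror Profiler's standard trace columns rather than Coefficient's actual internal schema.

    -- Rank stored procedures by call frequency in a hypothetical trace table
    -- whose columns mirror SQL Profiler's (ObjectName, EventClass, Duration).
    SELECT ObjectName,
           COUNT(*)      AS Calls,
           AVG(Duration) AS AvgDuration
    FROM dbo.MyTraceTable
    WHERE EventClass = 43  -- 43 = SP:Completed in Profiler's event numbering
    GROUP BY ObjectName
    ORDER BY Calls DESC;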
Another of Coefficient's strengths is drill-down detail. Suppose we want to learn more about the pcGet_SJZH_NJ procedure: clicking its name opens a page of specifics showing general statistics, duration statistics, read statistics, record statistics, CPU statistics, stored procedure references, and the procedure's source code (see Figure 4). Clicking "Execution plan" displays the procedure's execution plan. The level of detail is worth emphasizing: you can see not only the data sent to the stored procedure but also its parameters, along with the execution plan itself.

Help Built into the Reports
Coefficient produces a lot of data, and users may worry about figuring out what it all means and how to apply it to performance tuning. There is no need to worry: beneath each analysis report is an extensive general explanation of what the data means. These explanations do not discuss the specific numbers produced, but they cover a great deal of database performance tuning fundamentals. In a sense, Coefficient doubles as a SQL Server optimization handbook.

Who Should Use Coefficient
Coefficient is useful to both SQL Server developers and database administrators. Developers can use it while building and testing SQL Server-based applications, and DBAs will find it helpful for tuning applications already in production: because a database changes continuously over time, DBAs can use Coefficient to see at any moment what is going on inside it. Most commonly, Coefficient is used to baseline each aspect of database performance and then compare analyses from different periods to understand how the database is behaving.

6/13/2007

SqlCredit – Developing a Complete SQL Server OLTP Database Project Part V


I am writing this month from Redmond, where I am attending the SQL Server Katmai "System Design Review". Considering that we're fairly close to seeing the first publicly available CTP for Katmai, I have decided to end my struggle with supporting both SQL Server 2000 and SQL Server 2005 in SqlCredit. As of this month, this project supports only SQL Server 2005. Those of you who have already made that plunge with your products are probably thinking "finally". Those of you who are still required to support SQL Server 2000 in your jobs are probably wishing it were that easy for you to drop support.

As I built out the last pieces of the original table set (Card, Vendor, and Purchase), I found a number of things in the design that needed to be fixed. At this early stage of development, allow yourself to rework the design as you need to, and don't let your application developers lock you into the original design just because you gave them the code. They should be dropping and recreating their development database as you publish changes. At some point, we will consider this database "published". From that point on, changes will have to support upgrades. Until then, nothing is "written in stone".

New Objects: Card, Vendor, Purchase

I added these new tables and their associated stored procedures. There is also a new script (BuildSqlCredit.bat) that builds the SqlCredit database; note that it currently includes a call to drop the existing database.

These tables and procedures are straightforward. The new schema looks like this.

Not all of the unit tests are included yet. I will add those soon.

Running Changes

PrimaryCardholderID

Based on a previous discussion, I added PrimaryCardholderID to the Account table. The Account record needs to be associated with one cardholder record that is the “primary” cardholder.

This required some creative coding; it borders on “hack”, but it works for now. To insert a new Account takes these steps:

  • BEGIN TRAN
  • Create the Account record with a PrimaryCardholderID of 0 (zero) as a placeholder.
  • Create the Cardholder record for the “primary” cardholder.
  • UPDATE the Account record with the ID of the just-created Cardholder record.
  • COMMIT TRAN

I also had to leave out the foreign key on Account.PrimaryCardholderID. That would have created even more problems. Yuck.
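Here is a minimal T-SQL sketch of that sequence. The column lists are abbreviated and the literal values are placeholders; only the overall pattern reflects the SqlCredit code.

    BEGIN TRAN;

    -- Step 1: create the Account with PrimaryCardholderID = 0 as a
    -- placeholder (StatusCd 1 means 'ACTIVE'; column list abbreviated).
    INSERT INTO dbo.Account (PrimaryCardholderID, StatusCd)
    VALUES (0, 1);

    DECLARE @AccountID int, @CardholderID int;
    SET @AccountID = SCOPE_IDENTITY();

    -- Step 2: create the "primary" Cardholder for that Account.
    INSERT INTO dbo.Cardholder (AccountID, FirstName, LastName, StatusCd)
    VALUES (@AccountID, 'Pat', 'Sample', 1);
    SET @CardholderID = SCOPE_IDENTITY();

    -- Step 3: point the Account at the just-created Cardholder.
    UPDATE dbo.Account
    SET PrimaryCardholderID = @CardholderID
    WHERE AccountID = @AccountID;

    COMMIT TRAN;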

I am not happy with this solution; it will definitely have to be refactored in the future. This will work for now, but I want to see a much more elegant solution. As I have stated before, this series is a process. You don’t see just the final, pristine code.

Small Schema Changes

1. Changed Status column name to StatusCd.

This column appears in every table and represents a simple name/value mapping (e.g., Account.StatusCd = 1 means 'ACTIVE'). Because the column stores a code rather than the string itself, I chose the suffix "Cd" (for "code") to differentiate it from other columns. The views do the translation to strings: the string form of the status is returned in a StatusString column. This is consistent throughout the database.
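For illustration, such a view might look like the sketch below. Only StatusCd = 1 meaning 'ACTIVE' comes from the code described here; the view name, the remaining code values, and the abbreviated column list are placeholders.

    -- Translate the StatusCd code into a human-readable StatusString.
    CREATE VIEW dbo.AccountView
    AS
    SELECT AccountID,
           StatusCd,
           CASE StatusCd
               WHEN 1 THEN 'ACTIVE'
               WHEN 2 THEN 'INACTIVE'  -- placeholder mapping
               WHEN 3 THEN 'DELETED'   -- placeholder mapping
               ELSE 'UNKNOWN'
           END AS StatusString
    FROM dbo.Account;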

2. Removed the index on Account.StatusCd

All of the indexes are based on experience and assumptions at this point. None of the other tables have indexes on StatusCd, so I dropped the index here. We can add indexes later if we find they are needed.

3. Added “OBJECT::” to DENY statements.

If you’re like me, every time I look something up in SQL 2005 Books Online, I find something new. I was reading through the entry for DENY and saw that they used this “OBJECT::” prefix. It specifies “the class of the securable on which the permission is being denied. The scope qualifier ‘::’ is required.”
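The earlier DENY example, revised with the qualifier, looks like this:

    DENY DELETE, INSERT, REFERENCES, UPDATE
        ON OBJECT::dbo.Account TO SqlCreditAppSqlRole;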

4. Added DENY statements to views.

I had DENY statements for the tables but no DENY statements for the views on those tables.

5. Added FILLFACTOR to nonclustered indexes

Each of the clustered primary keys uses an IDENTITY column, so I can leave those fillfactors at the default (100%). Each of the non-clustered indexes will be populated fairly randomly, so I set those to 80%.
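One of these index definitions might look like the following sketch; the index name is my own, while the column, fill factor, and filegroup follow the distribution tables below.

    -- Nonclustered index, 80% fill factor, placed on its assigned filegroup.
    CREATE NONCLUSTERED INDEX IX_Cardholder_LastName
        ON dbo.Cardholder (LastName)
        WITH (FILLFACTOR = 80)
        ON SqlCredit_FG4;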

6. Renamed filegroups and files for consistency.

I did not like that the filegroup names and file names included "Data"; in general, the naming was inconsistent.

Example Filegroup Changes

    Item               Old                          New
    Filegroup name     SqlCredit_FGData1            SqlCredit_FG1
    Logical file name  SqlCredit_Primary_Data1      SqlCredit_Primary_File1
    Logical file name  SqlCredit_FGData1_File1      SqlCredit_FG1_File1
    Logical file name  SqlCredit_Log1               SqlCredit_Log_File1
    OS file name       SqlCredit_Primary_Data1.mdf  SqlCredit_Primary_File1.mdf
    OS file name       SqlCredit_FGData1_File1.mdf  SqlCredit_FG1_File1.mdf
    OS file name       SqlCredit_Log1.ldf           SqlCredit_Log_File1.ldf

I also did not like that the four non-primary filegroups were each created with two files while the primary filegroup had just one, so I added a second file to primary. Now each filegroup and the log have two associated files.

7. Redistributed tables and indexes on filegroups.

The table and index distribution over the four non-primary filegroups was not consistent, so I redistributed everything. This exercise is as much art as science. The basic guidelines: avoid putting a table and its indexes on the same filegroup, so the system can write to both simultaneously when a record is added, updated, or deleted; and if two tables are often queried together (JOINed), separate them onto different filegroups, again so the system can read from both simultaneously.

Objects and Filegroups:

    Object                Type   Filegroup
    Account               Table  SqlCredit_FG1
    Cardholder            Table  SqlCredit_FG2
    Cardholder.AccountID  Index  SqlCredit_FG3
    Cardholder.LastName   Index  SqlCredit_FG4
    Card                  Table  SqlCredit_FG3
    Card.CardholderID     Index  SqlCredit_FG1
    Card.CardNumber       Index  SqlCredit_FG2
    Vendor                Table  SqlCredit_FG4
    Vendor.BusinessName   Index  SqlCredit_FG2
    Purchase              Table  SqlCredit_FG1
    Purchase.CardID       Index  SqlCredit_FG4
    Purchase.VendorID     Index  SqlCredit_FG3

Filegroups and Objects:

    Filegroup      Tables             Indexes
    SqlCredit_FG1  Account, Purchase  Card.CardholderID
    SqlCredit_FG2  Cardholder         Card.CardNumber, Vendor.BusinessName
    SqlCredit_FG3  Card               Cardholder.AccountID, Purchase.VendorID
    SqlCredit_FG4  Vendor             Cardholder.LastName, Purchase.CardID

If you look at the distribution of your tables and indexes, they should make sense. For instance, I originally had three indexes on FG3 and one on FG4, so I moved Purchase.CardID to FG4. Some of this may need to be reworked after doing performance testing, but this gives you a good start.

Some of these changes seem fairly minor. What is the point? Consistency. I should have caught some of these as I was writing the initial code. Part of the value I receive from writing this series is ending up with code that I can use as an example in future projects.

Create Boring Schema Code

One thing I notice as I add new tables and procedures to the project: They all look the same. If you looked at Cardholder before and look at Card in the new code, they look painfully similar.

I think of that as a feature, not a lack of creativity. Implement your designs based on best-practices and standards, not on a whim. If someone reads your table-creation schema code, they should be bored. They should not be scratching their head wondering why you chose to name things differently from one table to the next or included DateCreated in one table and not another.


SqlCredit – Developing a Complete SQL Server OLTP Database Project Part IV


Database Security: It’s Not Just For Administrators

Too many applications are built assuming that security needs to be handled at the web tier or middle tier only, ignoring details like object permissions in the database. The other myth is that database security is just a matter of administration.

I am proposing a set of design security standards here that I hope you will scrutinize closely. If you believe they could be better, speak up!

Security Logins, GRANTs

For a web application like SqlCredit, users will not reach the database with their Windows Authentication credentials. The user will log in at the web tier, and their application credentials will accompany their session. When the application accesses the database, it will use Windows Authentication (a.k.a. integrated security). The login used by the application to access the database should have limited privileges. There are many good articles that detail those specific privileges, so I won’t go into that detail here. For in-depth coverage, see Marcin Policht’s two series on SQL Server security, covering both SQL Server 2000 and 2005.

It is a best practice to strip the public role of its permissions on production servers, so there are no specific DENYs targeting the public role here.

Grant Privileges to Stored Procedures and Functions, Not Tables

The security we are setting up here is for the primary users of the application: the on-line customers. We are not handling reporting capabilities or back-office work (at least for now).

Because the application will be accessing the database through our published APIs (stored procedures and functions), those are the only objects that need to have GRANT statements. There is no need for direct access to the tables by the OLTP application login.

I know that privileges are not granted automatically, but I still like to do explicit DENYs to tables and views. I have gotten so much negative feedback from our implementation teams that I have finally been convinced not to DENY SELECT on tables and views. In a perfect world, I would do that, but I won’t for this code. If you control both the design and the implementation, then you may decide to lock down even SELECT.

An example of a DENY statement is here:

    DENY DELETE, INSERT, REFERENCES, UPDATE ON dbo.Account TO SqlCreditAppSqlRole;

An example of a GRANT statement is here:

    GRANT EXEC ON dbo.CreateAccount TO SqlCreditAppSqlRole;

You probably already understand that I can create an INSERT procedure that inserts records into the Account table, GRANT EXEC on that stored procedure, and the INSERTs will work through the stored procedure but will not work directly (ad-hoc).

An important consideration here is to balance the number of distinct groups/roles you create against the complexity of managing those credentials. For an application like SqlCredit, I would recommend setting up distinct groups/roles for the following functions:

  • OLTP access
  • Reporting
  • Purging

The user that can update a single card should not be the same user that can purge a month’s worth of old purchase data.

Feedback: What do you think about this design? I have created a new thread in the forum to discuss security, and I would appreciate any feedback. Certainly, there is more to a complete implementation’s security plan than this. Besides encrypting certain personal data (which I will add in a later installment), what do you think is missing from the application design?

Remember that at this point we are still supporting both SQL Server 2000 and 2005, so we cannot yet use any 2005-specific security features. However, please do comment on changes you would expect once we move this series to 2005-only security.

Windows Authentication Group and User

Before running the scripts, the Windows Authentication group needs to be set up. Since I am doing this development and writing on a laptop that is not connected to anything, I will create the group and user as local accounts. Here are the steps to set up the Windows Auth group and user:

  • Set up SqlCreditAppWinAuthGp
  • Set up SqlCreditWinAuthUser
  • Add user SqlCreditWinAuthUser to group SqlCreditAppWinAuthGp

All the rest of the set-up is done by the scripts.
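The scripts themselves are not reproduced here, but the SQL Server side of the setup presumably resembles this sketch (the machine name is a placeholder and the exact statements are my assumptions; the system procedures shown work on both SQL Server 2000 and 2005, matching the series' current constraint):

    -- Grant the Windows group a server login, map it into the database, and
    -- add it to the role referenced by the GRANT/DENY statements.
    -- 'MYLAPTOP' is a placeholder machine name.
    EXEC sp_grantlogin 'MYLAPTOP\SqlCreditAppWinAuthGp';

    USE SqlCredit;
    EXEC sp_grantdbaccess 'MYLAPTOP\SqlCreditAppWinAuthGp', 'SqlCreditAppWinAuthGp';

    EXEC sp_addrole 'SqlCreditAppSqlRole';
    EXEC sp_addrolemember 'SqlCreditAppSqlRole', 'SqlCreditAppWinAuthGp';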

Testing Object Access Using TSqlTest

Setting up object security is not an easy task, but to be confident that it works correctly, it must be tested. (Of course, to be continually confident requires continuous integration and automated testing.)

TSqlTest allows us to create a separate script to be run as the SqlCreditWinAuthUser Windows login to verify that

  • This login cannot directly affect any tables (INSERT, UPDATE, DELETE).
  • This login can execute the appropriate stored procedures.

The attached test script (UnitTests\TestSecurityForAppUser.sql) is called by a batch file (Tests_LoginAsSqlCreditWinAuthUser.bat) while logged in as the application user (SqlCreditWinAuthUser).

When the script tries to INSERT directly into the Account table, error 229 is raised.

Msg 229, Level 14, State 5, Line 89
INSERT permission denied on object 'Account', database 'SqlCredit', schema 'dbo'.

Because this is the exact error expected, the test passes. Any other error number will cause the test to fail. The script then tests that the application user can execute the appropriate stored procedures.
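A test of this shape can be hand-rolled in plain T-SQL with @@ERROR, since the series still supports SQL Server 2000. This is a generic sketch rather than TSqlTest's actual helper API, and it assumes error 229 terminates only the offending statement so that the batch continues:

    -- Expect the direct INSERT to fail with error 229 (permission denied).
    -- The column list and value are placeholders; the permission check fires
    -- before any constraint would.
    DECLARE @Err int;
    INSERT INTO dbo.Account (PrimaryCardholderID) VALUES (0);
    SET @Err = @@ERROR;

    IF @Err = 229
        PRINT 'PASS: direct INSERT correctly denied.';
    ELSE
        PRINT 'FAIL: expected error 229, got ' + CONVERT(varchar(12), @Err);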

There are now two different batch files that must be run as two different users to complete the full test suite. This requires a bit more scripting but provides confidence that the security is working properly.

For Next Time

Next time, we will fill out much more of the application: cards, vendors, and purchases.


SqlCredit – Developing a Complete SQL Server OLTP Database Project Part III


Running Changes

As stated in part 1, “This series will be about the process that results in a well-designed, well-coded, and well-tested database.” Part of that process is updating the design and code based on feedback from other members of the team. In this case, the feedback generally comes from the forum.

Stored Procedure Naming Convention

I got strong feedback in the stored procedures thread that one naming convention was clearly preferable to the other, so I changed the naming convention.

Account and Cardholder Tables

Based on forum feedback on CardType and AccountType, I will add type (Gold, Platinum, etc.) to Account instead of Card. I originally planned to call this CardType, but that no longer seems appropriate. I will have to call it something other than “type” because that is not descriptive enough. If you have naming suggestions, pass them on. I know this is a fairly minor decision, but it is something typically discussed with other developers in a group.

The Gold/Platinum change caused me to challenge the placement of the “Partner” (Alaska Airlines, Lexus, etc.). That also looks like it needs to be part of the account and not the individual card.

If a person has more than one type or partner (a Gold card from Lexus and a Platinum card from Alaska Airlines), then those cards will be part of separate Accounts. That person’s personal data will be duplicated.

These changes will have to wait until the initial version of the tables is complete and tested.

Diagram

I had requests for an E/R diagram. This would definitely be helpful. In a normal project, I make a point of creating diagrams and passing them out to the developers. My goal is to see people using the E/R diagrams as a reference throughout the project. Here is a logical diagram of the currently-implemented schema:

DateTime and UTC

We had an interesting discussion about datetime columns and UTC. The datetime columns will remain UTC as they were originally.

Also, I found that using smalldatetime for DateCreated and DateLastUpdated made unit testing difficult. smalldatetime is only accurate to the minute, which is not precise enough for unit testing or, most likely, for production. I changed the columns to a full datetime.

NULLable Columns

I had set a number of columns to be nullable (Account.Address2, Cardholder.FirstName, etc.). I really don't want to have to write "IS NULL OR = ''" checks for these simple string values, so I changed those columns to NOT NULL. I will leave it to the calling code to check for empty strings in places where a value is required.

Phone Columns

Somehow, I had named the three phone numbers HomePhone, MobilePhone, and WorkNumber. I fixed it by changing WorkNumber to WorkPhone.

Database Options

There was not as much forum discussion of database options as I had hoped, but I have done more reading based on prompting from Scott Hanselman and other regional directors. (See the References and Further Reading section at the end.) I made two updates to CreateDatabase.sql based on this feedback. Specifically, I added SET statements for ANSI_PADDING and NUMERIC_ROUNDABORT.

I am not at all satisfied with this one. I will continue to do my own research, and I have a few people helping me. I am hoping that I will get more feedback from readers in the forum.

CRUD Procedures for Account and Cardholder

The Account and Cardholder tables now have a complete set of CRUD stored procedures, and the procedures reflect the table changes described above. These are available in the attached code. Other Read stored procedures may be added later, but the base procedures are there now.

Automated Unit Testing Using TSqlTest

I wrote and published TSqlTest out of my own need for an automated set of unit tests for stored procedures. It is written specifically for Microsoft SQL Server. There are alternatives, most notably TSQLUnit and the recently released Visual Studio Team Edition for Database Professionals.

What is TSqlTest and How is it Different?

TSqlTest is a simple, lightweight database test framework. It utilizes .sql and .bat scripts only. No Perl, no VB, no Ruby; just T-SQL code wrapped in a bit of batch file. It is designed to be used for automated testing and incorporated into a continuous integration environment.

It supports testing of stored procedures, functions, triggers, and database “state”. It can test pretty much anything written in T-SQL. I suspect it will be used for parts I haven’t even thought of yet.

Here is a diagram of the TSqlTest tables:

and here is an example of a test command and the output from the test run against the attached code:

>.\Tests .\RGINST1 full
===============================================
Results from tests on database SqlCredit:
Total Tests: 23
Failed Tests: 0
Open Defects: 0
Failed Files: 0
===============================================

Features

  • Automated reporting (human-readable and XML/Nant output)
  • Support for known defects
  • Report results even if a script fails
  • Scripts include before/after counters for tables (verify that the tests did not leave rows behind)

Testing Database “State”

TSqlTest allows you to write simple tests of database “state” to verify a certain level of “goodness” in your database before firing off the rest of the tests. Examples include testing for tables, indexes, stored procedures, functions, defaults, and data.

By convention, I always create a test script called _InitialDatabaseStateTests.sql for each database. The attached script tests for the following (a hand-rolled sketch of such checks appears after the list):

  • the existence of the TSqlTest framework
  • the existence of the Account table
  • the existence of the CreateAccount stored procedure
  • the default for the Account.DateCreated column (making sure it is set to GETUTCDATE())
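The sketch below shows what these checks amount to in generic T-SQL; TSqlTest's actual assertion helpers differ, and the check for the TSqlTest framework itself is omitted.

    -- State tests: verify that key objects and defaults exist before
    -- running the rest of the suite.
    IF OBJECT_ID('dbo.Account') IS NULL
        RAISERROR ('Missing table dbo.Account.', 16, 1);

    IF OBJECT_ID('dbo.CreateAccount') IS NULL
        RAISERROR ('Missing procedure dbo.CreateAccount.', 16, 1);

    -- Verify the Account.DateCreated default references GETUTCDATE().
    IF NOT EXISTS (SELECT 1
                   FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE TABLE_NAME = 'Account'
                     AND COLUMN_NAME = 'DateCreated'
                     AND COLUMN_DEFAULT LIKE '%getutcdate%')
        RAISERROR ('Unexpected default on Account.DateCreated.', 16, 1);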

This script is great for adding simple tests as a database evolves over time. If you add a column to an existing table, add a test here that validates the existence of the column. If you change a default, add a test that validates the new default.

SqlCredit – Developing a Complete SQL Server OLTP Database Project Part II


Priorities

Our product owner has chosen to implement the accounts and users end-to-end for sprint 1. The following database-related tasks have been identified and estimated (and I have volunteered to take them):

    Task                                                   Estimate
    Create database                                        3
    Create Account table and CRUD procedures               2
    Build automated unit tests for Account procedures      2
    Create Cardholder table and CRUD procedures            2
    Build automated unit tests for Cardholder procedures   2

Create Database

In the CREATE DATABASE script, we create four filegroups plus the primary filegroup. Each filegroup, including the log, is split physically between two files.

Filegroups

This database is created from the very beginning with multiple filegroups and multiple files per filegroup (including primary).

Spreading the tables and indexes among multiple filegroups and physical files is a well-known performance enhancement. If this database were deployed to a real production environment, the database administrator would have the ability to place the filegroups and files on separate physical drives or separate LUNs.

Choosing the right number of filegroups and files is a balancing act. Choosing too many filegroups or too many files per filegroup makes administration and installation needlessly complex. Choosing too few will not give the database administrator options if they find performance issues in production.

In this case, I have chosen four filegroups outside of primary. No tables or indexes will be created on the primary filegroup.

Some people create specific filegroups for data versus indexes. I like to mix data and indexes on all [non-primary] filegroups. This is much more an art than a science, so it is unlikely you will find hard-and-fast rules.

Schema

For this sprint, there are only two tables, so these will be on two different filegroups, and the indexes will be spread across the remaining two filegroups.

  • Account: SqlCredit_FGData1
  • Cardholder: SqlCredit_FGData2
  • Index on Account.Status: SqlCredit_FGData3
  • Index on Cardholder.AccountID: SqlCredit_FGData3
  • Index on Cardholder.LastName: SqlCredit_FGData4

Indexes, Clustering

Initially, there are indexes on specific columns that we know will be queried. More indexes may be added later.

Each table has a clustered primary key. Both of these tables have surrogate keys, and both use integer identity columns. When you use identity columns, consider carefully what size the datatype should be. I have used everything from tinyint to bigint. In certain applications, int is just not big enough. You do not want your application to fail years later because the identity has hit its limit. Moreover, the development, testing, and migration required to change the datatype of a primary key is far too great to risk.
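As a minimal sketch of that pattern (the real tables carry many more columns; the constraint name and the Status datatype here are my own choices):

    -- Clustered primary key on an integer IDENTITY surrogate key, placed on
    -- the filegroup assigned to the Account table in this sprint.
    CREATE TABLE dbo.Account (
        AccountID int IDENTITY (1, 1) NOT NULL,
        Status    tinyint NOT NULL,  -- datatype assumed for illustration
        CONSTRAINT PK_Account PRIMARY KEY CLUSTERED (AccountID)
    ) ON SqlCredit_FGData1;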

Database Options

There are a number of database options that are OFF by default but should be ON based on recommendations from the SQL Server 2000 Best Practices Analyzer. The CreateDatabase script sets all these options to ON (equivalent statements are sketched after the list):

  • ANSI_NULLS
  • ANSI_WARNINGS
  • ARITHABORT
  • CONCAT_NULL_YIELDS_NULL
  • QUOTED_IDENTIFIER
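The actual script is in the attached code; its effect corresponds to ALTER DATABASE statements of this form:

    -- Set the Best Practices Analyzer-recommended options ON for SqlCredit.
    ALTER DATABASE SqlCredit SET ANSI_NULLS ON;
    ALTER DATABASE SqlCredit SET ANSI_WARNINGS ON;
    ALTER DATABASE SqlCredit SET ARITHABORT ON;
    ALTER DATABASE SqlCredit SET CONCAT_NULL_YIELDS_NULL ON;
    ALTER DATABASE SqlCredit SET QUOTED_IDENTIFIER ON;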

I am very interested to find what people are using in production environments (either 2000 or 2005). Please provide feedback through the forum (or write me directly): Are you setting these options to ON? If not, why not? I have created a forum thread to discuss database options, so watch for updates there. What other options do you change, and why?

“Status” Columns

The Account and Cardholder tables both include a Status column. The simplest solution here would be to store a string (e.g. ‘Active’, ‘Inactive’, ‘Deleted’). To normalize these values, you might create an AccountStatus table and store a small value in Account that identifies the row in the AccountStatus table (e.g. 1, 2, or ‘A’, ‘I’, ‘D’).

However, for a column like Status where the set of values is well-defined, consider using an enumeration. This requires defining status strings and associated status numbers before the application is deployed. The application code will have to include an enumeration that matches that in the database. To make this Status column human-readable without requiring people to memorize the mapping, use a view that translates the status identifiers into their associated strings.

Application programmers who work all day in C#, C++, or Java are very comfortable with enumerations. The one requirement is to keep the application code enumeration exactly matching the database enumeration. If the set of possible values is not stable, then don’t use this method.

DateTime Columns and UTC

Another design piece that I would like feedback on is datetime columns. Are you storing datetime in local time or UTC? I have created a forum thread to discuss local versus UTC. What do you see as benefits and drawbacks? If you use UTC, do you “tag” the columns somehow so that it is obvious that the time is UTC and not local?

At least initially, I will be using UTC dates. I may reconsider this choice based on the feedback in the forum.

An Open Question: Storing the Primary Cardholder

One requirement of the system is the ability to identify the primary cardholder. This is the person whose name, address, and phone number will be associated with the account. There may be multiple Cardholder records associated with a single account, and we need to be able to choose one record as the primary. The question is this: how should it be modeled?

Option 1: Add a PrimaryCardholder bit column to Cardholder.

Pro: Minimal storage requirements.
Con: More than one Cardholder record could be marked as primary.
Con: Updating the primary cardholder for an account requires transactionally updating two Cardholder records (turning one off and one on).

Option 2: Add a PrimaryCardholderID column to Account.

Pro: Guarantees that exactly one Cardholder is marked as primary for each account.
Con: Requires transactionally adding an Account record, adding a Cardholder record, and updating the Account record.

Option 3: Add an associative table (call it PrimaryCardholder) that stores AccountID and CardholderID.

Pro: Allows creation of the three records (Account, Cardholder, and PrimaryCardholder) independently.
Pro: A unique index on AccountID would guarantee that at most one record exists per account.
Con: There is no way to guarantee that a PrimaryCardholder record does exist for every Account record.
Con: Slightly more complex schema.
Con: Requires transactionally adding an Account record, a Cardholder record, and a PrimaryCardholder record.

Please provide your feedback and recommendations through the forum. I will leave this requirement out of the code for now.

Stored Procedures

I have only included the “create” procedures for now. These procedures are pretty straightforward. Because we are supporting both SQL Server 2000 and 2005, there are no TRY/CATCH blocks. Later in the column, we will switch to 2005-only and will tackle the issue of TRY/CATCH then.

Hopefully, the @TestMode parameter will seem obvious. If not, we will go into details regarding automated unit testing next month.

What do you like or dislike about the stored procedures? I would appreciate hearing your feedback either through the forum or directly. I have written literally thousands of stored procedures but have never had the privilege of such a wide-spread code review.

For Next Time

Next month, we will polish up what is here plus add ‘R’, ‘U’, and ‘D’ procedures and automated stored procedure unit testing. As of this writing, I have not written the automated unit tests for these procedures, so it’s quite possible there are defects in them. The code is available for download, but it’s not ready to ship until the unit tests are written, passing, and integrated into the automation scripts. That’s part of what we will cover next month.

Another critical part that is not built into these scripts is security. That will wait for later, but we can’t put it off for long. Leaving security until you have time for it is a recipe for disaster.

SqlCredit – Developing a Complete SQL Server OLTP Database Project Part I


January 29, 2007

Part 1: Introduction, Business Problem, and Initial Design

Series Introduction

The purpose of the SqlCredit series is to demonstrate the database design and development portion of a development project. We will build a complete, functioning, tested credit card database to illustrate the complete software development lifecycle. Microsoft provides us with AdventureWorks, but all we get to see is the final product. This series will be about the process that results in a well-designed, well-coded, and well-tested database.

Part of the plan is to do an initial design that actually has some built-in “refactoring opportunities” (bits that we will fix later in the series). But overall, the design should be solid, scalable, well-coded, and well-tested. When the series is complete, the code-base will allow a reader to create a complete database with tables, stored procedures, triggers, functions, automated unit tests, and sample data by running a simple script. This is not an academic exercise; it is a pragmatic design based on goals of scalability, performance, and quality.

We? Who’s “We”?

This project will use agile methodologies (1, 2) and scrum, so the series must be interactive. At different times you, the reader, will be asked to play the role of development team member (code reviews and sprint reviews), QA (testing and writing unit tests), and product owner (prioritizing backlog items). You may not like all the decisions that are made, but I doubt you have ever been on a project where you liked all the decisions … even if you were the only developer.

Database Journal has set up a forum specifically for this series here. It is open now for you to provide feedback, suggest changes, weigh in on project priorities, and eventually, to provide automated unit tests. One thing I am really looking forward to is having people say, “I have faced that same issue and solved it this way ….”

I don’t claim to know everything there is to know about OLTP design and coding. I have done quite a bit of it. As part of my work at Corillian, I have helped to develop databases that are in use at some of the largest banks and credit unions in the country. However, part of my motivation for writing this series is to learn from the readers so that I can hone my design and coding skills.

The Business Problem: SqlCredit

The problem we will tackle is a credit card database for a start-up credit card company called SqlCredit. This database will need to store information about accounts, people, cards, vendors, and transactions (purchases).

Partly to make things interesting, and partly because this is always a hard business problem, we will state that accounts have one to n persons and also that accounts have zero to n cards. Therefore, a husband and wife can each have their own card numbers but the same “account.” They may also have two physical cards that share the same number. The database design will not dictate how SqlCredit chooses to issue cards. (Cards will have three-digit security codes, so the natural key for a card will be the combination of card number and security code.)

Project Kick-Off

We have spent time working with our product owner and researching on our own, so we have a good idea of what we need to build. Let’s get started with some initial database schema design and overall design decisions.

Objects to Be Modeled:

  • Account
  • Person (cardholder)
  • Credit Card
  • Card Type (e.g. “Gold”, “Platinum”)
  • Card Partner (e.g. “Alaska Airlines”, “Lexus”)
  • Vendor (the merchant where the purchase was made)
  • Purchase (individual credit card transaction)

Design Decisions/Requirements:

  • An account must have at least one cardholder
  • An account can have multiple cardholders
  • Each account has a “primary” cardholder (the person whose name shows on the bill and who is the first contact point for any issues)
  • Each cardholder is issued a physical card with a unique combination of card number and security code
  • Address is an attribute of an account
  • Phone number is an attribute of a person, not an account
  • The vendor table is much simpler than would be required for a real application, but this piece was simplified for the sake of brevity
  • Each card must have exactly one card type
  • Cards may or may not have a card partner (0 or 1)
  • No need to support UNICODE characters
  • US dollar sales only
  • US addresses only
  • Delay implementation of history records
  • History will not be required for transactions or vendors
  • Built on SQL Server 2005 but not using 2005-specific features until later in the series

Coding Standards, Error Reporting, Unit Tests

Error Reporting

I wrote an extensive article covering stored procedure error checking and reporting. The stored procedures in this series will be based on that pattern and will include full error checking and reporting (not always “level 3” but at least “level 2”).

Triggers and History Records

I have always steered away from writing triggers in my databases. I don’t like the idea of code being fired that is not obvious from the code being called directly. However, at the PASS 2006 conference, I attended a class given by Joy Mundy. She is a business intelligence expert and spoke about feeding data from an OLTP system into an OLAP system.

As a BI designer, she puts more trust in a transactional system if the updates are tracked using triggers versus stored procedures. There are too many opportunities for inserts and updates executed outside of the standard stored procedures. Yes, triggers can be disabled also, but it is much less likely that someone would explicitly disable a trigger than it is that they would update a record using an ad-hoc query or some bit of old code that does not call the standard stored procedure. According to Joy, “Education of DBAs + a trigger works much better for me than a stored procedure.”

Because it is likely that whatever transactional system you build will eventually feed a business intelligence system, I recommend taking Joy’s advice and building these triggers for tracking history.
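History records are deferred for now (see the design decisions above), but to make the recommendation concrete, such a trigger might be sketched as follows. The history table, its columns, and the trigger name are all my assumptions; the series has not defined them yet.

    -- Hypothetical history trigger: archive the prior version of each
    -- updated Account row. Names are placeholders, not series code.
    CREATE TRIGGER trgAccount_History ON dbo.Account
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.AccountHistory (AccountID, Status, DateArchived)
        SELECT AccountID, Status, GETUTCDATE()
        FROM deleted;
    END;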

Automated Testing

TSqlTest is a SQL Server unit test framework that I developed as part of my work at Corillian. It is available at TSqlTest.org. Automated unit tests for the project will utilize TSqlTest. The project will not be “test-driven,” but automated tests will be written early in the project.

Development Standards

The code will be developed using the development standards here. Yes, I wrote those too.