Really quick tip:
A client, after an ASE 15 upgrade (ESD #2), was experiencing a significant increase in Rep Server stable queue length.
The latency vanished after enabling literal autoparameterization.
Tuesday, February 24, 2009
Thursday, February 19, 2009
Finding problem queries in ASE 15
Shoot me an email to get a pdf of this article...
Finding slow running queries in ASE 15
Introduction
Fixing a performance problem isn’t usually difficult, once you’ve found it.
According to studies, 80% of the mean time to resolution for a performance problem is spent identifying the root cause. When you consider that other studies have shown that 24% of IT staff time is devoted to performance issues, the math becomes very simple: eliminate that 80% of the 24% and you get back nearly 20% of your staff time.
There are several different places you have to look when you’re evaluating a performance problem. First, you have to verify that the problem is within the database / database server. In today’s multi-tier environments, this can be nontrivial. There is at least one good tool out on the market that will identify which tier is actually responsible for the most elapsed time, but for purposes of this article, we’ll assume you’ve positively identified issues in the data tier. (Drop me an email for my personal take on application performance monitoring tools!)
If you are fortunate enough to know which queries are problems, most of the time you’ll use “set showplan” output to analyze the query, find out which index it is or isn’t using, and act accordingly. Analyzing showplan output is beyond the scope of this article (it would make a good series, though), but as with each new ASE release, ASE 15 showplan output has become even easier to read (and if you like, you can even get it in XML format for later storage and parsing).
In environments of even light-to-moderate complexity, though, sometimes all you know is that users are complaining of general slowness. In the past, it has been nontrivial to identify specifically what was slowing the users (or the system) down. Is it a single specific query? Is there one query which is blocking a class of users? A query which runs in 1.2 seconds, which perhaps doesn’t register high enough on the pain scale to rate a complaint call, but which is running thousands of times per minute (or second!) and is using 43% of resources? If it’s specific user(s), what are they doing?
If you are responsible for tuning, you’ll first try to isolate the target query. Prior to doing this, though, you may want to start at a high level, look at sp_sysmon output to make sure it’s not a system issue, and then try to track down overall problem queries.
In the mid-12.5.x releases, and with enhancements in ASE 15, Sybase introduced the MDA tables, which, after a relatively simple install, enable the DBA to identify what queries are running, what they’re doing, and what is holding them up (or speeding them along). This has been an amazing advancement over prior tuning work.
In addition, for ASE 15, Sybase has introduced the ability to collect query metrics, which enables you to track queries and query performance for your session, and if you have the sa_role, for another login’s session.
Identify the problem – solve the problem – verify the solution. This will make friends fast.
Identifying problem queries
The first step in solving a problem is identifying the root cause. Sometimes your mission is vague: Everything seems slow today. You might start looking at sp_sysmon output (WAY beyond scope, but check out the Administration Guide or pick up “Administering ASE 15” from Wordware publishing for detail). The sp_sysmon output will validate the environment at the macro scale (i.e. no server-wide bottlenecks at CPU, IO, Network, Memory / Cache, etc.). The next step is to find the problem queries.
At that point, you may have specific information: “The order entry application is slow.” If you are very lucky, you’ve got only a few screens to look at, a few developers to interview regarding what SQL or which stored procedures are associated with the screens, and specific sets of queries to look at.
On other occasions, you hear, “Most of what I’m working on is slow,” and if you inquire further, you hear, “I think I was working in…” and wonder if you’re going to find the root of the problem.
ASE 15 now offers solutions to each problem: When you don’t know where the problem is, you can use the MDA tables to find out what’s taking up the time on your server. When you know which login is causing the problem, you can use query metrics to find out what the login is doing. And, once you’ve identified the problem queries, you can use the updated showplan to identify what the server is using as a query plan.
MDA tables
The Monitoring and Data Access (MDA) tables were added in ASE 12.5.0.3 to help identify how system resources are being used. They are virtual tables in the master database. There was an install process, as well as a bit of configuration, required before they would be usable.
Monitoring has turned out to be so important that with ASE 15 the MDA tables are installed as part of the buildmaster script (i.e. at installation time).
The MDA tables enable you to identify what queries are running, what resources they are taking up, and what the queries are waiting for (for example, network, I/O, or locks to be released).
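Out of the box the tables exist, but most return nothing until the related configuration parameters are turned on. A minimal sketch, assuming the sp_configure “Monitoring” group parameter names from 15.x; the pipe sizes are illustrative values you should size for your own workload:

```sql
-- turn on the MDA collection machinery (these are dynamic parameters)
exec sp_configure 'enable monitoring', 1
exec sp_configure 'sql text pipe active', 1
exec sp_configure 'sql text pipe max messages', 2000    -- illustrative size
exec sp_configure 'statement pipe active', 1
exec sp_configure 'statement pipe max messages', 2000   -- illustrative size
exec sp_configure 'statement statistics active', 1
exec sp_configure 'per object statistics active', 1
exec sp_configure 'wait event timing', 1
```

Without the two “pipe” parameters in particular, monSysSQLText and monSysStatement (used below) stay empty.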
The complete list of MDA tables in 15.0.2 follows:
monCachedObject, monCachedProcedures, monCachedStatement, monCachePool, monDataCache, monDeadLock, monDeviceIO, monEngine, monErrorLog, monIOQueue, monLicense, monLocks, monNetworkIO, monOpenDatabases, monOpenObjectActivity, monOpenPartitionActivity, monProcedureCache, monProcedureCacheMemoryUsage, monProcedureCacheModuleUsage, monProcess, monProcessActivity, monProcessLookup, monProcessNetIO, monProcessObject, monProcessProcedures, monProcessSQLText, monProcessStatement, monProcessWaits, monProcessWorkerThread, monState, monStatementCache, monSysPlanText, monSysSQLText, monSysStatement, monSysWaits, monSysWorkerThread, monTableColumns, monTableParameters, monTables, monWaitClassInfo, monWaitEventInfo
There are other, more exhaustive, treatments available on the Sybase web site and across the internet on using these, but here are a few quick and easy things you can do with them. If you have trouble accessing the MDA tables, ask your DBA to add the “mon_role” role to your login.
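For instance, a first look at what every connection is doing right now can be had from monProcess. A sketch (the login name is a placeholder, and the column choice assumes the 15.x monProcess layout):

```sql
-- the sso grants the monitoring role to the tuner's login
grant role mon_role to tuning_login    -- tuning_login is a placeholder

-- then: what is every connection doing right now, and who is blocked?
select SPID, Login, Command, BlockingSPID, SecondsConnected
from master..monProcess
order by SecondsConnected desc
```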
MDA tables at the application level
Two tables in particular will help identify the SQL running in the application: monProcessSQLText and monSysSQLText.
monProcessSQLText – shows currently executing SQL
monSysSQLText – shows recent, completed SQL
monSysStatement – shows more information on completed SQL statements
Let’s say access to a particular table is slow; what SQL accesses the table?
select * into #sqltext from monSysSQLText
select * from #sqltext where SQLText like '%pt_sample%'
/* Note: after you’ve selected the data once, ASE assumes that you don’t want to see it again. A second select from the same table will get you successive data. If you want to access it again later, you may want to select into a temp table first (as above)… and, this is good information to keep historically. */
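If you do want to keep this information historically, append each sample to a permanent table rather than (or in addition to) a tempdb copy. A sketch, assuming a utility database named dbadb (both the database and table names are illustrative):

```sql
-- one-time setup: clone the structure, plus a sample timestamp
select getdate() as SampleTime, *
into dbadb..sqltext_history
from master..monSysSQLText
where 1 = 2

-- run on a schedule; each read drains the pipe, so rows land once
insert dbadb..sqltext_history
select getdate(), * from master..monSysSQLText
```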
SPID	KPID	ServerUserID	BatchID	SequenceInBatch	SQLText
26	1900573	1	27	1	'select * from pt_sample s, pt_tx t where t.id = s.id'
And, now how about some statistics on the statement?
select * from monSysStatement where KPID = 1900573
SPID=25, KPID=1900573, DBID=1, ProcedureID=1280004560, PlanID=30, BatchID=12, ContextID=2, LineNumber=0, CpuTime=0, WaitTime=0, MemUsageKB=684, PhysicalReads=0, LogicalReads=0, PagesModified=0, PacketsSent=0, PacketsReceived=0, NetworkPacketSize=2048, PlansAltered=0, RowsAffected=0, ErrorStatus=0, HashKey=0, SsqlId=0, StartTime=2008-11-15 10:03:14.793, EndTime=2008-11-15 10:03:14.793
You can see:
· Number of logical I/Os
· Number of physical I/Os
· Number of network packets sent/received
· Number of milliseconds of ‘waiting time’ during statement execution
· Start time & end time of execution
There’s a wealth of other information you can glean from the MDA tables including unused indexes, table utilization, physical io, table activity, heavily utilized tables, and wait states (indicating what applications are waiting for). This is worth taking the time to understand.
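As one example of that “wealth”: monOpenObjectActivity can surface both unused indexes and the hottest tables. A sketch (column names assume the 15.x layout; translate ObjectID/IndexID with object_name() in the relevant database):

```sql
-- indexes never chosen since startup: candidates for dropping
select DBID, ObjectID, IndexID, UsedCount
from master..monOpenObjectActivity
where IndexID > 0 and UsedCount = 0

-- the most heavily hit objects, by logical reads
select DBID, ObjectID, LogicalReads, PhysicalReads, Operations
from master..monOpenObjectActivity
order by LogicalReads desc
```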
Tip: There is a configuration parameter, ‘max SQL text monitored’, which limits the size of the SQL statement that you can see (that is, text is truncated past this limit). The parameter is static, so changing it requires a server restart. You should try 16384 as a value. This increases the amount of memory used by the server, but should give you the ability to look at your entire query most of the time.
Query Metrics
Query metrics allow you to measure performance for your session using the sp_metrics stored procedure. This allows you to track:
· CPU execution time and elapsed time
· Logical I/O (found in cache, cached by async prefetch)
· Physical I/O (regular I/O, async prefetch)
· Count – the number of times the query executed
· Min, max, and avg are tracked for each metric (except count)
This is likely to replace the statistics io and statistics time session settings.
To capture metrics:
1) Enable metrics capture at the server level:
sp_configure "enable metrics capture", 1 /* note that this defaults to zero */
2) Then enable metrics capture for your session:
set metrics capture on
3) Then flush the metrics from the in-memory aggregation to the system view, sysquerymetrics, using the sp_metrics stored procedure:
sp_metrics 'flush'
4) Finally, select from the sysquerymetrics view:
select * from sysquerymetrics
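Once the flush has run, finding the worst offenders is a simple order-by. A sketch (column names are from the sysquerymetrics view; the rowcount limit is just to keep output manageable):

```sql
-- the ten most expensive statements by average elapsed time
set rowcount 10
select elap_avg, lio_avg, pio_avg, cnt, qtext
from sysquerymetrics
order by elap_avg desc
set rowcount 0
```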
Using sp_metrics
The sp_metrics stored procedure is used to collect and retain groups of captured session metric data.
sp_metrics { 'flush' | 'backup' | 'drop' | 'help' } [, @arg1 [, @arg2]]
Query metrics are captured in the default running group (the running group with a group id (gid) of 1) in the sysquerymetrics view. Note that the sysquerymetrics view contains a self join & decoding of the sysqueryplans table, which contains abstract query plan information (among other things).
sp_metrics has built-in help:
sp_metrics 'help' /* in case you forget the syntax or the stored procedure arguments */
(Output: )
Command list:
help - get help on a command
flush - flush all metrics from memory to system tables
drop - drop a metric ID or a group of metrics
backup - move the metrics in the default group to a backup group
filter - delete metrics records based on filtering condition
show - display the number of records in each metrics group
You can move metrics to a different group using the “backup” parameter.
sp_metrics 'flush'
Flushes all metrics from memory to system tables. If you skip this step, the information you need may not yet be in the view.
sp_metrics 'drop', '2', '5'
Drops the groups of saved metrics. @arg1 is an integer in char form representing the first metrics group to drop in the range (must be a valid group); @arg2 is an integer in char form representing the last metrics group to drop in the range.
sp_metrics 'backup', '6'
Saves metrics for future perusal. This example moves the metrics in the default group (i.e. what is currently running) to group #6. @arg1 is an integer in char form representing the group number. It must be higher than 1 (1 represents the current group).
Application tracing
You can trace all of the SQL text for an existing session to a physical file using the set tracefile session option. This is great for a situation where you are exploring complaints from a specific user and want to find out what the user is doing. Use it like this:
set tracefile 'filename' [for spid]
And disable it like this:
set tracefile off [for spid]
For example,
set tracefile '/opt/Sybase/tracefiles/2008 1101.trc12' for 12
… will write all SQL for server process ID 12 to the file at path ‘/opt/Sybase/tracefiles/2008 1101.trc12’. If you don’t specify a spid, you will record SQL for your own session.
This will store SQL as well as query plans (showplan output), statistics (statistics io output), show_sqltext (below!), set option show, and/or dbcc traceon (100) output.
You must either have the sa or sso role in order to run “set tracefile”, or you must have been granted “set tracefile” permission. Note that the tracefile will be on the server machine, not on your client.
set show_sqltext
You can print the SQL text for ad-hoc queries, stored procedures, cursors, and dynamic prepared statements using set show_sqltext. You do not need to enable set show_sqltext before you execute the query (as you do with commands like set showplan on) to collect diagnostic information for a SQL session. You can turn it on when you get the “things are running slow” telephone call.
Before you enable show_sqltext, you must first use dbcc (dbcc traceon(3604)) to instruct the server to send the output to your terminal instead of to standard output (i.e. to the console of the session that started up the ASE instance).
The syntax for set show_sqltext is:
set show_sqltext {on | off}
For example, this enables show_sqltext:
set show_sqltext on
Once set show_sqltext is enabled, Adaptive Server prints all SQL text to standard out for each command or system procedure you enter. Depending on the command or system procedure you run, this output can be extensive. This is a good command to run in conjunction with the set tracefile option.
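A complete session might look like this (the procedure name and argument are placeholders for whatever the user is running):

```sql
dbcc traceon(3604)        -- route dbcc/diagnostic output to this session
set show_sqltext on

-- from here, the server echoes the SQL text of everything this
-- session executes, including the statements inside the procedure
exec usp_order_lookup 42  -- placeholder procedure and argument

set show_sqltext off      -- turn it back off when you're done
```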
Summary
Every application encounters the occasional performance problem. (Some systems encounter problems more occasionally than others).
There are a variety of tools that have been around for a while (for example, sp_sysmon) which will enable you to understand server performance from a macro level. But, when it comes to identifying problem queries, or problem logins, it can be more difficult.
With the advent of ASE 12.5, Sybase introduced the MDA tables to identify all server activity. This has been enhanced with ASE 15. In addition, ASE 15 introduced a variety of ways to identify session metrics, SQL, and query plans to help identify issues as they crop up.
Friday, February 13, 2009
ASE 15 Encryption Best Practices
This is an article I wrote for Sybase... for a .pdf shoot me an email or go to Sybase's web site
http://www.sybase.com/detail_list?id=117763
or
www.sybase.com/asesecurity
Introduction
At this point, you’ve already determined that you want to encrypt the data in your database. You have a handle on basic data encryption techniques. So, you don’t need to be told that encryption serves two purposes: protection of data against internal prying eyes, and protection of data against external threats (hacking, theft of backup tapes, etc.). You also don’t need a discussion on the merits of performing encryption in the database tier, where you have database-caliber performance and protection of encryption keys (among other things), rather than incurring the overhead and additional cost of a 3rd-party encryption tool in an application tier. And, you don’t need to be told that you need to install the ASE_ENCRYPTION license option, enable the encryption configuration option, and create keys.
But, you may want to know what design considerations will affect storage and performance. In this article, we’re going to explore best practices and design considerations in column-level encryption as practiced with Sybase Adaptive Server Enterprise 15 (ASE 15). If you need a more basic overview, or a basic discussion of why and how we encrypt, read the precursor article, “Column-level encryption in ASE 15.” There is also great information in the ASE Systems Administration Guide and the Encryption Guide, available with online books.
This article is intended to be a great read prior to designing an encryption scheme for your secure-server-to-be. There are a variety of decisions you’re going to need to make, from encryption options to storage of encryption keys. We’ll discuss pros & cons and make recommendations based upon some of your security business decisions.
Brief Review of Data Encryption Concepts
First, let’s make sure we’re on the same page from a terminology and process standpoint.
There are two basic components to data encryption: Encryption, wherein we store and protect the data, and decryption, with which we retrieve and unscramble the data.
On the encryption side, you need to be able to create and manage an encryption key, as well as the ability to set permissions for logins who should have access. On the decryption side, you need to be able to transparently select from the appropriate columns.
[Figure: the column encryption and decryption flow, adapted from the ASE Encryption Guide.]
The key custodian (a role initially granted to the System Security Officer, or SSO) uses the create encryption key command to create an encryption key. The server may contain one encryption key for the whole server, one per database, one per column, or any combination thereof. (This will be one of your early decisions; more on that later.) The server uses a symmetric encryption algorithm, which means that the same encryption key is used for both encryption and decryption.
When the client process accesses an encrypted column, the server knows which encryption key was associated with the column (it is stored in the sysencryptkeys table in each database). At insert or update time, the column is transparently encrypted immediately before writing the row. At select time, the server decrypts the data immediately after reading the data. This is all done transparently to the user, if the user has both select and decrypt permissions on the column.
From a utilization standpoint, to enable encryption, you follow these steps:
1) Install the license option (encryption does not come standard with ASE)
2) Enable encryption using sp_configure 'enable encrypted columns', 1
3) Set the system encryption password for the database in which you’re going to store the keys (more on this later, you can do this for each database, or store them all together)
4) Create the column encryption key(s) (CEK) using create encryption key; you may end up creating one or many. The CEK is used in conjunction with a system encryption password (SEP), a database-specific password which is set (or reset) by the SSO; there is one SEP per database that contains CEKs. In other words, if the keys are all stored in one database (more on that later), the SSO only needs to set an SEP in that one database.
5) Specify column(s) to be encrypted (and with which keys)
6) Grant decrypt permission to users authorized to use the information. If the users have select permission only, they will get gibberish back.
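Put together, the six steps above might look something like the following sketch (the key, table, and role names are hypothetical, and you would run the security steps as the SSO / Key Custodian):

```sql
-- 1-2) After installing the license option, enable the feature
sp_configure 'enable encrypted columns', 1
go

-- 3) Set the system encryption password (SEP) in the key database
sp_encryption system_encr_passwd, 'S3cretP@sswd'
go

-- 4) Create a column encryption key (CEK)
create encryption key cc_key for AES with keylength 128
go

-- 5) Specify the column(s) to encrypt, and with which key
create table customer
    (cust_id   int      not null,
     cc_number char(16) encrypt with cc_key)
go

-- 6) Authorized users get plaintext back; select-only users get gibberish
grant decrypt on customer(cc_number) to finance_role
go
```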
Design Considerations
We’re going to start off with design consideration in support of the basic scheme we’ve described.
First, here’s a list of the things you need to decide / know before embarking upon an encryption solution:
1) Where are you going to keep the CEKs? Choices include in each database, or in a separate database, or some combination (potential dump / load issues here).
2) How many CEKs are you going to keep? One per column? One per database? One per server? One per type of column you’re going to encrypt (we’ll go into detail about the limitations of individual encryption types)?
3) What size CEK are we going to use? Choices are 128, 192, or 256 bits; the bigger the key, the higher the CPU cost of encryption.
4) Will we use an init vector or pad the CEKs?
5) Are we going to search / join / create relational integrity on (and, of course, correspondingly index) any encrypted columns?
6) How often (and/or) will we change encryption keys? This requires the reencryption of all data that uses the key.
Don’t answer yet; first, we’ll go into ramifications of each decision.
Creation of Encryption keys
Column encryption in ASE uses Advanced Encryption Standard (AES) with 128, 192, or 256-bit encryption key sizes. The longer the bit string, the more difficult it is for a hacker to decrypt. On the other hand, the more complicated the encryption, the more CPU resources will be taken up by the encryption / decryption algorithm.
Which level is right for you? That’s really a question for your site security officer. You should be aware that most public, commercial products / projects are using 128-bit encryption, and the government uses that up to the SECRET level; TOP SECRET requires 192 or 256-bit encryption. How difficult is this to crack? It’s been calculated that cracking a 128-bit algorithm requires 2^120 operations, which is currently not considered feasible (Source: Wikipedia).
That said, 128-bit encryption seems secure enough for most applications. But, check with your security officer about your shop standard. If you go with a higher level, be sure to benchmark the effect of the higher level of encryption against CPU utilization.
To securely protect key values, ASE uses the system encryption password (set up by the Key Custodian) in conjunction with random values to generate a 128-bit key-encryption key (KEK). The KEK is in turn used to encrypt (prior to storage) all of the CEKs you create.
Syntax:
create encryption key keyname [as default] for algorithm
    [with [keylength num_bits]
    [init_vector [null | random]]
    [pad [null | random]]]
where:
keyname
must be unique in the user’s table, view, and procedure name space in the current database.
as default
allows the Key Custodian to create a database default key for encryption. This CEK is used when a table creator specifies the encrypt clause without naming a CEK.
algorithm
Advanced Encryption Standard (AES) is the only algorithm supported.
keylength num_bits
Valid key lengths are 128, 192, and 256 bits, with a default of 128 bits.
init_vector
Random / null
Instructs the server to use a random initialization vector, so that the ciphertext of two identical pieces of plaintext is different. Use this to prevent detection of data patterns. The catch:
You can create indexes and optimize joins and searches only on columns where the encryption key does not specify an initialization vector.
The default is to use an initialization vector, that is, init_vector random.
pad
Random / null
The default, null, omits random padding of data, and supports the use of an index.
When random, data is automatically padded with random bytes before encryption. You can use padding instead of or in addition to an initialization vector to randomize the ciphertext (the underlying data, stored as varbinary).
Initialization vectors (init_vector) and column padding (pad) might be a reason that you have multiple CEKs. You might have one which uses the default, random vector, and another which you use for searchable strings.
You may also have multiple columns in a single table which might require a higher level of security, or which might need to be searchable.
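That multi-key approach might be sketched as follows (hypothetical key names; init_vector random is the default, shown explicitly here for contrast):

```sql
-- Maximum randomization of ciphertext; columns using this key
-- cannot be usefully indexed, joined, or searched
create encryption key strong_key for AES
    with keylength 256 init_vector random
go

-- No init vector and no random padding; columns using this key
-- can be indexed and searched on equality
create encryption key search_key for AES
    with keylength 128 init_vector null pad null
go
```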
Using CEKs
The encryption key is referenced at table creation time (or via alter table, which will dynamically encrypt all of the target data). You may reference a CEK in the local database or in a remote database. After the CEK is created, you will have to grant select permission on the key to the group (or role) that will be creating / altering the tables.
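For example (hypothetical database, table, and role names; note the fully qualified reference to a key stored in a separate key database):

```sql
-- In the key database: let the schema owners reference the key
use keydb
go
grant select on search_key to table_owner_role
go

-- In the data database: encrypt an existing column in place;
-- alter table encrypts all of the existing data
use hrdb
go
alter table employee modify ssn
    encrypt with keydb.dbo.search_key
go
```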
Local vs. remote database keys
Storing the keys in a database other than the one containing the encrypted data provides an additional layer of security: if the data database’s dump is ever stolen, the encryption key is nowhere to be found. In addition, the administrator or operator can password-protect the database dump, making things that much harder for a hacker.
The flip side is that if you are storing CEKs in a different database, you must make sure you synchronize your database dumps. In fact, you might consider treating the CEK database as another master database, dumping it each time a change occurs. Keep in mind this goes in the opposite direction too… you want to dump the “data” database if the changed “CEK” database gets dumped, for consistency’s sake.
More on dumps & loads
To prevent accidental loss of keys, the drop database command fails if the database contains keys currently used to encrypt columns in other databases. Before dropping the database containing the encryption keys, first remove the encryption on the columns by using alter table, then drop the table or database containing the encrypted columns.
Store database dumps and the dumps of the key databases in separate physical locations. This prevents loss in case an archive is stolen.
Column support
ASE 15 supports the encryption of the int, smallint, tinyint (including the unsigned variants), float4, float8, decimal, numeric, char, varchar, binary, and varbinary datatypes. Note that null values are not encrypted.
Cross platform encryption
Cross-platform support is available if both platforms use the same character set; you will also need the encryption key. In other words, you can decrypt data on a platform other than the one on which it was encrypted. This is useful for bcp, among other things (e.g. test/development systems, where you might move databases over, including the encryption key database(s), and give select but not decrypt permission to the testers / developers).
Changing encryption keys
You will likely make a security decision to change encryption keys on a periodic basis. This is implemented easily with the alter table command.
Because changing the encryption key requires decrypting the data with the old key and re-encrypting it with the new one, you’ll want to set aside plenty of time for the activity (think table scan plus decryption and re-encryption time).
This is likely to preclude frequent CEK changes in large-scale and/or high volume production environments.
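Mechanically, rotating a key is just another alter table (hypothetical names; ASE decrypts each row with the old key and re-encrypts it with the new one, so budget for a full pass over the table):

```sql
-- Create the replacement key, then re-key the column
create encryption key cc_key_2010 for AES with keylength 128
go
alter table customer modify cc_number
    encrypt with cc_key_2010
go
```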
A Final Design Note
Encrypted columns take up more storage space because of changes to the data column, which add 16 bytes of encryption information plus offset lengths. If an encrypted table has a significant number of rows, be sure to take this into account at capacity planning time.
Performance implications of encryption
Encryption is CPU-intensive. How CPU-intensive is something you’re going to have to benchmark on your own application / hardware /load combination, but will vary based upon:
Number of CPUs
ASE engines
System load
Concurrent sessions
Encryption per session
Encryption key size
Length of data
In general, it’s fair to say that the larger the key size and the wider the data, the higher your CPU utilization will be. As a result, you will want to ensure that you only encrypt columns that require the extra security. And there are considerations beyond the raw cost of encrypting and decrypting data.
Indexes on encrypted columns
ASE avoids table scans by using an index to access data. If you are searching for data based upon the content of an encrypted column, as opposed to simply retrieving the information, we have to be cognizant of a few things:
1) The index lookups are efficient because they look up and compare ciphertext values.
2) Noting that encryption happens when data goes in or comes out, things like range searches would end up using the index to get a range of ciphertext, rather than data, which is less than useful. But, indexes on the encrypted data are just fine for equality / inequality searches.
3) The same goes for sorts: any sort of encrypted data is going to incur additional overhead, because the data needs to be decrypted prior to the sort. In other words, the index will not help avoid a sort.
4) In order to index the columns, the encryption key must have been created with both init_vector and pad (random padding) turned off.
5) Joins of encrypted columns are optimized if the following are true:
a. Indexing is permitted (i.e. init_vector and pad set to NULL).
b. The joining columns have the same datatype. Char and varchar are considered to be the same datatypes, as are binary and varbinary.
c. For int and float types, the columns have the same length.
d. For numeric and decimal types, the columns have the same precision and scale.
e. The joining columns are encrypted with the same key.
f. The joining columns are not part of an expression.
g. The join operator is ‘=’ or ‘<>’.
h. The data has the default sort order.
6) In order to use an index, we have to have a search argument (sarg), which is a where clause of the format encrypted_column operator constant.
7) You can define referential integrity constraints between two encrypted columns when:
a. Indexing is permitted (i.e. init_vector and pad set to NULL).
b. Both referencing and referenced columns are encrypted.
c. The referenced and referencing column are encrypted with the same key.
Referential integrity checks are efficient because they are performed on ciphertext values.
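To make the equality-versus-range distinction concrete (hypothetical table; assumes cc_number is encrypted with an indexable key, i.e. no init vector or random padding):

```sql
-- Equality sarg: the constant is encrypted and compared as ciphertext,
-- so an index on cc_number can be used
select cust_id from customer
 where cc_number = '4111111111111111'

-- Range/pattern predicate: ciphertext order tells you nothing about
-- plaintext order, so expect row-by-row decryption and a scan
select cust_id from customer
 where cc_number like '4111%'
```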
Returning Default Values
Protection exceptions are sloppy, unless your business requirement specifically calls for an exception to be thrown. To avoid protection errors, you simply set up a decrypt default at table creation time; note that the default value must fit the column’s datatype:
create table secure_table (ssnum char(11) encrypt with Key1
    decrypt_default 'XXX-XX-XXXX')
It’s that easy.
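Permissions then determine who sees real data and who sees the default (hypothetical role name):

```sql
-- hr_role sees actual social security numbers
grant decrypt on secure_table(ssnum) to hr_role
go

-- Everyone else can still select without error,
-- but gets the decrypt_default value back
grant select on secure_table to public
go
```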
Recommended Practices & Other recommendations
CEK selection (absent indexing needs)
Column characteristics -> Recommended encryption key properties
---------------------------------------------------------------
Low cardinality data -> key with initialization vector or random padding
High cardinality data (SSN, Phone#, Credit Card#) -> key without initialization vector or random padding
Primary key columns and indexed columns -> key without initialization vector or random padding
Foreign key columns -> same key as the referenced primary key (the fully qualified key name must match)
Columns used in joins -> same key, without initialization vector or random padding
You might also consider using a single key for a single “type” of data. For example, a social security number might get the same key regardless of the table it’s in. That way, if you have a specific standard “Change SSN keys monthly” you know where to start looking.
Key locations
Keep keys in separate databases, so that a stolen database doesn’t contain the decryption key.
Make sure dumps of CEK databases and the data itself are synchronized.
Change keys periodically
Makes key attacks harder
Encrypt data during movement
Use bcp with the -c option to keep the ciphertext in character format
Alternatively, use replication
Also, use SSL to send data
Beware impact of range searches (including foreign key searches) on encrypted data
Consider using a (noncompromising) prefix, rather than the entire key, as an unencrypted, indexed, searchable column
As the output may contain secure data, disable / do not use the following capture mechanisms:
monSysSQLText and monProcessSQLText
query metrics capture
statement cache
monitor server
dbcc sqltext
Audit access to tables with encrypted data
Consider using central key management, and replicating keys out to CEK database on the target servers
Protect keys using explicit passwords. Do not give key custodians select permission on encrypted data.
Use the decrypt default feature to preserve application transparency with stored procedures which may start returning scrambled results from suddenly-encrypted columns
Conclusion
Database encryption is in and of itself fairly cookbook. You install the encryption option; enable it; create a system encryption password and a column encryption key; and then you create or alter a table column to use the encryption.
It is also fairly flexible; you can create a system-wide encryption key, or a separate encryption key for every column you encrypt.
There are several options to consider when encrypting data, and these decisions are going to be driven by your security needs and procedures, in conjunction with your knowledge of the resources on your server, as well as your performance needs.
For example, using 128-bit encryption is probably more than adequate, but if you’re going to use 256-bit encryption, make sure your CPU capacity is up to it prior to rolling this set of decisions out to production. Or, if you are going to be searching on an encrypted column, you need to make sure you can index that column, and that limits the type of encryption key you can use on that column.
http://www.sybase.com/detail_list?id=117763
or
www.sybase.com/asesecurity
Introduction
At this point, you’ve already determined that you want to encrypt the data in your database. You have a handle on basic data encryption techniques. So, you don’t need to be told that encryption serves two purposes: protection of data against internal prying eyes, as well as protection of data against external threats (hacking, theft of backup tapes, etc.). You also don’t need a discussion on the merits of performing encryption in the database tier, where you have database-caliber performance and protection of encryption keys (among other things), rather than incurring the overhead and additional cost of a third-party encryption tool in an application tier. And, you don’t need to be told that you need to install the ASE_ENCRYPTION license option, enable the encryption configuration option, and create keys.
But, you may want to know what design considerations will affect storage and performance. In this article, we’re going to explore best practices and design considerations in column-level encryption as practiced with Sybase Adaptive Server Enterprise 15 (ASE 15). If you need a more basic overview, or a basic discussion of why and how we encrypt, read the precursor article, “Column-level encryption in ASE 15.” There is also great information in the ASE Systems Administration Guide and the Encryption Guide, available with online books.
This article is intended to be a great read prior to designing an encryption scheme for your secure-server-to-be. There are a variety of decisions you’re going to need to make, from encryption options to storage of encryption keys. We’ll discuss pros & cons and make recommendations based upon some of your security business decisions.
Brief Review of Data Encryption Concepts
First, let’s make sure we’re on the same page from a terminology and process standpoint.
Tuesday, February 10, 2009
Encryption in ASE 15
Introduction
In the early days of Sybase ASE, there was one level of security: either permission is granted on a column, or it is not. If it is granted, you can query the data with a select statement; if permission is not granted, you may not query the data with a select statement.
This was the mechanism with which Sybase protected login passwords. Server logins all had permission to use the master database, and select from the syslogins table, but would not have permission to select from the password column; therefore the password was protected; wasn’t it?
There are two catches here. First, the sa had permission to select from that column[1]; the sa has permission to do most everything on the server. The second problem was that a dump or database file which was stolen from the company could be mined, and the password pulled from that dump.
Sybase’s answer 15 years ago is the answer that has become commonplace today: The data in the column is encrypted. Now, the sa can still select data from the column, but without the encryption key, the data is meaningless.
Data encryption meets both needs in data security: It protects data from “friendly” eyes, as well as “unfriendly” eyes, who have perhaps stolen a backup tape.
With the advent of ASE 12.5.4, Sybase gave customers the ability to create encryption keys and encrypt any other data in the database(s). This met an immediate market need: there had been a lot of high-profile cases in the news where databases had been stolen, leading to identity theft on a massive scale. With ASE 15, encryption got better and easier, with the ability to recover lost encryption keys, specify default values for data a user has no permission for, and additional datatypes that may be encrypted. (Note: this article focuses on ASE 15; if you are still on 12.5.x and care about encryption, it is probably worth upgrading for this additional functionality alone.)
There are / have been third-party products to perform the encryption, but performing the encryption at the database level is an improvement in performance and reduces the need for a security tier (don’t we already have enough tiers in our environments?).
Data Encryption in ASE 15
There are two basic components to data encryption: Encryption, wherein we store and protect the data, and decryption, with which we retrieve and unscramble the data.
On the encryption side, you need to be able to create and manage an encryption key, as well as the ability to set permissions for logins who should have access. On the decryption side, you need to be able to transparently select from the appropriate columns.
Encrypting the Data – the Encryption Key
ASE 15 introduces several new concepts, which work together to create a consistent and usable encrypted data security model. The primary focus is the management of and access to an encryption key.
Note that ASE manages the security of keys by keeping keys encrypted. There are actually two keys between the user and the data: the column encryption key (CEK) and the key encryption key (KEK). The CEK encrypts data and users must have access to it before they can access the encrypted data. For security it is stored in encrypted form. ASE encrypts the CEK with a KEK when you create or alter an encryption key. The KEK is also used to decrypt the CEK before you can access decrypted data. The KEK is derived internally from the system encryption password, a user-specified password, or a login password, depending on how you specify the key’s encryption with the create and alter encryption key statements. CEKs are stored in encrypted form in sysencryptkeys. ASE 15 provides the ability to move keys between systems in a secure manner.
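The two-key model can be sketched in a few lines. This is an illustration of the idea only: PBKDF2 and the XOR wrap below are stand-ins I chose for the demo, not ASE's actual algorithms or storage format.

```python
import hashlib
import os

def derive_kek(password: str, salt: bytes) -> bytes:
    # The KEK is re-derived from the password on demand; it is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def xor32(a: bytes, b: bytes) -> bytes:
    # Toy "wrap" operation: XOR two 32-byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

salt = os.urandom(16)
cek = os.urandom(32)                        # the key that encrypts column data
kek = derive_kek("Th1sIsmyP@ssw0rd", salt)
wrapped_cek = xor32(cek, kek)               # only this wrapped form is stored

# Unwrapping requires the password; a wrong password yields garbage, not the CEK.
assert xor32(wrapped_cek, derive_kek("Th1sIsmyP@ssw0rd", salt)) == cek
assert xor32(wrapped_cek, derive_kek("WrongPassword", salt)) != cek
```

This also shows why changing a key's password is cheap: only the small CEK gets unwrapped and re-wrapped under a new KEK; the column data itself is never touched.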
Key Custodian
Somebody needs to manage the encryption keys, which includes creating, dropping, and modifying them, distributing passwords, and providing for key recovery in the event of a lost password (important feature!).
In order to separate the management of the encryption key from the day-to-day administration of security (logins, etc.), Sybase has introduced the concept of Key Custodian. The keycustodian_role is automatically granted to the sso_role. It can also be granted to any login by a user with the sso_role. You can have multiple key custodians, who each own a set of keys.
The key custodian can:
• Create and alter encryption keys
• Assign a key as the database default key
• Set up key copies for designated users, allowing each user access to the key through a chosen password or a login password
• Share key encryption passwords
• Grant schema owners select access to encryption keys
• Set the system encryption password
• Recover encryption keys
• Drop encryption keys they own
• Change ownership of keys they own
There are three options for passwords: a system encryption password, an explicit encryption key password, and a login password.
System Encryption Password
The key custodian sets the system encryption password using:
sp_encryption system_encr_passwd, 'password'
Using a system encryption password simplifies the administration of encrypted data because:
• Key management is restricted to setting up and changing the system encryption password
• You need not specify passwords on create and alter encryption key statements.
• Password distribution and recovery from lost passwords are not required
• Privacy of data is enforced through decrypt permission on the column.
• Restricted decrypt permission reinforces this privacy against the power of the administrator
The downside to system encryption passwords is that you may need to synchronize passwords across multiple systems (for example, in a Replication Server environment). There are also vulnerabilities: given that several people may have the keycustodian_role, we might not want each of them arbitrarily able to change the system encryption password, as this could add a layer of confusion.
User-Specific Encryption Passwords
A login with the keycustodian_role or the sso_role can restrict access to data even from the SA or DBO by specifying passwords on keys through the create encryption key or alter encryption key statements.
If keys have explicit passwords, users need decrypt permission on the column and the encryption key’s password. Users must also know the password to run DML commands that encrypt data.
Use the create encryption key command to associate a password with a key:
create encryption key [db.[owner].]keyname [as default]
[for algorithm_name] [with {[keylength num_bits]
[passwd 'password_phrase'] [init_vector {NULL | random}]
[pad {NULL | random}]}]
• for algorithm_name – (optional for the 15.0.2 release) specifies the algorithm you are using. The default is the Advanced Encryption Standard (AES) algorithm
• password_phrase – is a quoted alphanumeric string of up to 255 bytes in length that ASE uses to generate the KEK.
Note that ASE doesn’t save the password. Instead, it saves a string of validating bytes known as the “salt” in sysencryptkeys.eksalt, which allows it to recognize whether a password used on a subsequent encryption or decryption operation is legitimate for a key.
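The salt mechanism can be modeled in a few lines. PBKDF2 here is a stand-in of my choosing; the point is only that the server stores no password, just bytes that let it recognize one.

```python
import hashlib
import hmac
import os

def make_salt_record(password: str) -> tuple[bytes, bytes]:
    # Store only the salt and a derived check value -- never the password.
    salt = os.urandom(16)
    check = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, check

def password_is_valid(password: str, salt: bytes, check: bytes) -> bool:
    # Re-derive from the supplied password and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, check)

salt, check = make_salt_record("Th1sIsmyP@ssw0rd")
```

A subsequent encryption or decryption operation that supplies the wrong password simply fails validation; nothing stored reveals the password itself.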
This example shows how to use passwords on keys, and the key custodian’s function in setting up encryption. Here, the password would be shared among all the users.
1) Key custodian Jeff creates an encryption key:
create encryption key key1 with passwd 'Th1sIsmyP@ssw0rd'
2) The Key custodian distributes the password to all users who need access to encrypted data
3) Each user enters the password before processing tables with encrypted columns:
set encryption passwd 'Th1sIsmyP@ssw0rd' for key jeff.key1
4) If somebody quits, or the password is hacked or guessed, or business rules require us to change the key, key custodian Jeff alters the key to change the password.
alter encryption key key1 with passwd 'Th1sIsmyNewP@ssw0rd'
Note: all this changes is the KEK. The data is not decrypted and re-encrypted, as it would be if you changed the key itself; only the KEK changes, so only the key is re-encrypted. Also note that this method requires an application change: the application needs to be able to submit the 'set encryption passwd…' phrase, along with any associated restrictions/issues around appservers and 'set proxy/session_authorization'.
Encryption Summary
The SSO starts out as a Key Custodian, and can create additional Key Custodians. The Key Custodian is able to manage keys, but not necessarily to see the data: he may have permission on the keys without having select permission on the target tables.
Keys may be protected by the system encryption password, by users’ own login passwords, or by explicit user-specified passwords, which may then be shared at the user level.
Decrypting (querying) the Data
Once the data has been encrypted using keys with user-defined passwords, several things have to happen in order for a user to see data in the clear:
• You need select permission on the column to read the information, and insert/update/delete permission (as appropriate) to modify it
• You need decrypt permission on the column to read it in the clear (or to use it for comparison purposes in a query predicate). If you have select permission but not decrypt permission, you get either the decrypt default (if one has been applied) or a permissions error (if not)[2].
• You need to supply a password, unless the key is encrypted with the system encryption password or your login password.
The encryption password is set on a per-session basis using the “set encryption passwd” syntax above; simple.
This example illustrates how Adaptive Server determines the password when it must encrypt or decrypt data. It assumes that the ssn and credit_card columns in the customer table are encrypted with key1, as shown in these simplified schema creation statements:
create encryption key key1 with passwd "MyPw2008"
create table customer (
ssn char (11) encrypt with key1,
name varchar(30),
credit_card varchar(20) encrypt with key1)
The key custodian shares the password required to access customer.ssn with Penny. He doesn’t need to disclose the name of the key to do this.
1) If Penny has select and decrypt permission on customer, she can select customer data using the password given to her for customer.ssn:
set encryption passwd "MyPw2008" for column customer.ssn
select name from customer where ssn = '111-22-3456'
name
-----------------------
Soaring Eagle Consulting, Inc.
2) Note that even though Penny is not retrieving the ssn column, it still needs to be decrypted, because she references it in the predicate.
3) If Penny attempts to select data from customer without specifying the password for customer.ssn, the following select fails (even if Penny has select and decrypt permission on customer):
select credit_card from customer where ssn = '111-22-3456'
You cannot execute 'SELECT' command because the user
encryption password has not been set.
4) To avoid this error, Penny must first enter:
set encryption passwd "MyPw2008" for column customer.ssn
The key custodian may choose to share passwords on a column-name basis rather than on a key-name basis, to avoid users hard-coding key names in application code, which can make it difficult for the DBO to change the keys used to encrypt the data. However, if one key is used to encrypt several columns, it may be convenient to enter the password once; a user setting passwords by column instead must set the password for every such column they want to process.
The need to share the passwords and embed it in an application is part of the downfall of this approach - but it could be done via application asking for tokens (i.e. asking for both the key name and the key password from the user) - or by looking it up in a catalog.
The column name approach is not necessarily a great solution - if using an encryption password, an alternative such as a key catalog is viable and would allow keys to be changed without an application change.
Key Copies
Key copies enable users to access encrypted columns using their own copy of a single key. This provides accountability for data because a key copy is designated for an individual user with a private password known only to the user. Without knowledge of the passwords protecting the key and its copies, not even the SA can access the data.
To preserve application transparency, you may encrypt key copies with your login password, avoiding application changes to supply the key copy’s password.
Using login passwords on key copies
The easiest way to manage key copies is by associating the password with the user’s login. Users whose login password is associated with a key can access encrypted data without supplying a password. This is a good thing, because the fewer passwords a user has to remember, the less likely he is to put it on a post-it note and stick it to his monitor. In addition, applications don’t have to prompt for a password.
Sample syntax:
alter encryption key [database_name.[owner_name].]key_name
with passwd 'base_key_password'
add encryption for user 'user_name' for login_association
Recovering Lost Keys
What happens when a key is lost? Or a key custodian quits? In short, and without a bundle of syntax, ASE provides the ability to recover from the loss of a user-specified encryption password on a key copy, the loss of a login password, or the loss or unavailability of the base key’s password.
Returning Default Values
Protection exceptions are sloppy, unless your business requirement specifically calls for an exception to be thrown. To avoid protection errors, you simply set up a decrypt default at table creation time.
create table secure_table (ssnum char(11) encrypt with Key1
decrypt_default 'If you get caught looking at this data you are out of a job')
It’s that easy.
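The semantics can be modeled in a few lines (a toy model of the behavior described above, not how ASE implements it):

```python
def read_encrypted_column(plaintext, has_decrypt_perm, decrypt_default=None):
    """What a user with select (but perhaps not decrypt) permission sees."""
    if has_decrypt_perm:
        return plaintext
    if decrypt_default is not None:
        return decrypt_default      # quietly substituted; no error raised
    raise PermissionError("decrypt permission denied on column")
```

As footnote [2] suggests, an application can even test for the sentinel value to detect that the caller lacked decrypt permission, rather than handling an exception.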
Conclusion
Data is encrypted to keep its contents protected from prying eyes, both from within and without.
ASE 15 provides a detailed and complete approach to encrypting, managing, and retrieving data in a consistent manner, while also creating capabilities to manage and secure keys in a flexible yet secure way.
[1] You may ask why this is a bad thing. After all, the sa has permission to do anything / change anything on the server anyway. It was nicely explained to me (about 18 years ago) by a Pentagon employee attending a Systems Administration class I was teaching that people tend to use the same password for all their systems; so, if you are able to read folks’ passwords on Sybase, you may have access to their email, etc., as well. Lesson learned.
[2] This may lend itself to some interesting programming practices at many shops. For example, many of us write defaults into stored procedures, and test the parameters to see if they have default values, and make decisions based upon the absence of a value. You can do the same thing here, and identify that the column has a default value, identifying that the user does not have permission.
Labels:
ASE 15,
encrypted,
encrypted columns,
encryption,
Sybase
Monday, February 2, 2009
ASE 15, the optimizer, and a quick config tip
A customer recently called after an ASE 12.5 to ASE 15 upgrade, complaining of performance issues with certain (previously) quick queries. I didn't have this one in my back pocket then, but should have:
We increased the statement cache and enabled literal autoparameterization, and the problem dissolved.
Apparently, the optimizer in 15 is a lot smarter, but it also works much harder. Caching SQL statements made a huge difference.
Labels:
ASE 12.5,
ASE 15,
ASE upgrade,
sql statement performance,
upgrade
Wednesday, January 28, 2009
Context
This is a repeat of the post from my SQL Server P&T blog... after this, you won't see much overlap. Enjoy!
One of my favorite quotations from a fictional character comes from “Atlas Shrugged” by Ayn Rand: There is no substitute for competence.
One of the most important things that we need to do as IT professionals is to use our own experience and competence even when taking advice from peers, or those you deem as experts. Even my own advice is sometimes not correct, when taken out of context (You will note if you follow these blog entries that I will attempt a lot of “tongue-in-cheek” humor, please take it that way).
As a public speaker (Users Groups, conventions, etc.) I continually try to challenge those attendees at my talks to rethink some of the decisions they’ve made, based upon facts I give them based upon my own experiences. Frequently, they challenge me back (for me, this is part of the fun).
At a recent talk I gave on indexes, I said something along these lines: “You may be able to come up with some theoretical exceptions to this rule, but from a practical perspective, you should have a clustered index on every table. Without a clustered index, you are creating a hot spot at the tail end of your table’s page chain (meaning contention when you have multiple insertions and/or updates which cause rows to move). In addition, updates which cause rows to move will cause data to become unbalanced towards the tail end of the table’s page chain.”
A point that has come up (on at least two recent occasions) from an audience member was along the following lines: “I was reading a paper (no idea who wrote the paper; if anybody reading this does, please let me know so that I can credit the author), and in it he/she stated that benchmarking has shown that insertions are fastest with NO clustered index on the table because of some benefits of page allocations…”
The first time I heard that, I responded that I’d have to read the article to see what it had to say before responding, but that my recommendations from my own extensive practical experience stood. The second time I heard that, something clicked, and I responded with, “Was that a single process performing a data load of some sort, or was that a thousand concurrent processes trying to gain access to a table?”
Context.
If you are trying to load data into a table, your issue isn’t necessarily contention; it is the speed of your disks and the corresponding page allocations. If you are trying to give many users access to the table, your issue is going to be both page and index contention, at update time, page split time, etc.
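As a sketch of the clustered-index recommendation discussed above (the table and column names here are hypothetical, chosen only for illustration):

```sql
-- Hypothetical orders table.
create table orders (order_id int, cust_id int, order_date datetime)
go
-- A clustered index on the customer/date key spreads concurrent
-- inserts across the page chain instead of piling them all onto
-- the last page, relieving the tail-end hot spot.
create clustered index orders_cix on orders (cust_id, order_date)
go
```

For a single-process bulk load, by contrast, dropping the clustered index before the load and recreating it afterward may well be faster; that is exactly the context distinction this post is about.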
Over time, I’m going to be offering a variety of recommendations, every one of which will be based upon my personal experiences in 20 years of tuning Sybase ASE.
Yet, I caution you: everything needs to be taken in context. Test my assertions; make sure that the environments I’m describing actually match your own.