jueves, 14 de enero de 2010

Search in text fields with T-SQL: beyond LIKE

Versión en castellano

One of the first things you learn when begin with T-SQL is to use the LIKE clause to make comparisons in text fields (char, nchar, varchar, nvarchar or text). But this clause is quite limited and don’t have good performance. So, what can be used instead of LIKE?

A good option is to use CONTAINS(that to be honest, I discovered by accident: P), which allows us to refine more comparisons and offers much better performance.


To give an example of using CONTAINS I will use the PUBS database (Microsoft offers as an example with SQL Server and can be downloaded from here) and we will find records in the "jobs" table filtering by "job_desc" field.

To give you an idea (or if you don't want download example databases) the "jobs" table has the following records:


job_id job_desc min_lvl max_lvl
-------- -------------------------------------------------- ------- -------
1 New Hire - Job not specified 10 10
2 Chief Executive Officer 200 250
3 Business Operations Manager 175 225
4 Chief Financial Officier 175 250
5 Publisher 150 250
6 Managing Editor 140 225
7 Marketing Manager 120 200
8 Public Relations Manager 100 175
9 Acquisitions Manager 75 175
10 Productions Manager 75 165
11 Operations Manager 75 150
12 Editor 25 100
13 Sales Representative 25 100
14 Designer 25 100


Before using CONTAINS we must create a FULLTEXT index in the table, some like this:


CREATE FULLTEXT CATALOG ft AS DEFAULT;
CREATE FULLTEXT INDEX ON Jobs(job_desc)
KEY INDEX PK__jobs__173876EA
GO


PK__jobs__173876EA is the table’s PRIMARY KEY.


Once this is done we can start with examples:

Search exact word

Obtains all records containing the word 'manager' in "job_desc" field.
Note: is not case sensitive.


select * from jobs where CONTAINS( job_desc, 'manager' )

job_id job_desc min_lvl max_lvl
-------- -------------------------------------------------- ------- -------
3 Business Operations Manager 175 225
7 Marketing Manager 120 200
8 Public Relations Manager 100 175
9 Acquisitions Manager 75 175
10 Productions Manager 75 165
11 Operations Manager 75 150


Also you can search for phrases (more than 1 word) and separators are ignored.


select * from jobs where CONTAINS (job_desc, '"hire job"');

job_id job_desc min_lvl max_lvl
-------- -------------------------------------------------- ------- -------
1 New Hire - Job not specified 10 10


Search by approximation with several options

Finds the records containing any word beginning with 'pub' or 'edit'.


select * from jobs where CONTAINS(job_desc, '"pub*" OR "edit*"');

job_id job_desc min_lvl max_lvl
-------- -------------------------------------------------- ------- -------
5 Publisher 150 250
6 Managing Editor 140 225
8 Public Relations Manager 100 175
12 Editor 25 100


CONTAINS supports AND, AND NOT, OR.

Search a word near another one

Finds records that contain a word beginning with 'pub' and that word is near 'manager'.


select * from jobs where CONTAINS (job_desc, '"pub*" NEAR manager')

job_id job_desc min_lvl max_lvl
-------- -------------------------------------------------- ------- -------
8 Public Relations Manager 100 175


CONTAINS can also search for synonyms and derived words, but this requires using a thesaurus.

Happy codding ;)

Extra ball
If you find CONTAINS interested, see also FREETEXT and CONTAINSTABLE.

miércoles, 13 de enero de 2010

3 ways to find SQL Server version with T-SQL

Versión en castellano

In a perfect world all our facilities have the same versions of the applications, database engines and auxiliary libraries, but this world is a utopia, and we sometimes have to find out what version of the database engine is working or what dll version is installed.

In the case of SQL Server I have discovered 3 ways to find the version from T-SQL, with any of the 3 options will get the full version, but in most cases only need the "major release". For example if you want to execute features of the new versions, or ones that are obsolete in the latest version but still work in older versions.

Let’s go:

@@VERSION

Is a system variable, so is useful for any version of SQL Server, but from the 2005 version returns a nvarchar instead of varchar from previous versions.


declare @version char(2)

select @version = substring(@@version, charindex('- ', @@version) + 2, charindex('.',@@version) - charindex('- ', @@version)-2)


SERVERPROPERTY(‘ProductVersion’)

This option is only available after SQL Server 2000 and returns a value of type sqlvariant, so you have to convert it to varchar or nvarchar to get the main version.


declare @productversion varchar(50)

declare @version char(2)

select @productversion = CAST(SERVERPROPERTY('productversion') AS VARCHAR(50))

set @version = SUBSTRING(@productversion,1,CHARINDEX('.',@productversion)-1)



xp_msver ‘ProductVersion’

Perhaps the most complicated, because it returns a result set like this:


Index Name Internal_Value Character_Value
------ -------------------------------- -------------- ----------------------
2 ProductVersion 589824 9.00.1406.00

(1 row(s) affected)


But you can also get the version with something like this:

CREATE TABLE #VersionTable(
[Index] int,
Name varchar(30),
Internal_Value int,
Character_Value varchar(50))
GO

INSERT INTO #VersionTable
EXEC master..xp_msver 'ProductVersion'

DECLARE @Version char(2)

SELECT @Version = (
SELECT SUBSTRING(Character_Value,1,CHARINDEX('.', Character_Value)-1)
FROM #VersionTable)

drop table #VersionTable


With any of the 3 options we get the following in @Version:

‘7’ -> SQL Server 7
‘8’ -> SQL Server 2000 y MSDE
‘9’ -> SQL Server 2005
‘10’ -> SQL Server 2008

Happy coding ;)

martes, 12 de enero de 2010

Getting SQL Servers instances and databases’ names with .NET (VB.NET and C#)

Versión en castellano

This is an example of a form that show SQL Server data sources (some like the “Add Connection” in Visual Studio).

Create a new Windows Forms project, set name “MyConnections”, and create a new form frmConnections. In this form add 2 comboboxes: cmbServers and cmbDataBases (yes, I’m working a lot the names :P).

Add this code to the form:

VB.NET


Imports System.Data
Imports System.Data.Sql
Imports System.Data.SqlClient

Public Class frmConnections
Private servers As SqlDataSourceEnumerator
Private tableServers As DataTable
Private server As String


Public Sub New()
InitializeComponent()
servers = SqlDataSourceEnumerator.Instance
tableServers = New DataTable()
End Sub

End Class


C#

using System;
using System.Windows.Forms;
using System.Data;
using System.Data.Sql;
using System.Collections.Generic;

namespace MyConnections
{
public partial class frmConnections: Form
{
SqlDataSourceEnumerator servers;
System.Data.DataTable tableServers;
String server;

public frmConnections ()
{
InitializeComponent();
servers = SqlDataSourceEnumerator.Instance;
tableServers = new DataTable();
}
}
}

To get a list of available instances of SQL Server use the SqlDataSourceEnumerator.Instance object, namely the getDataSource() method that returns a DataTable with the characteristics of instances: server name, instance name, if it is part of a cluster and its version ("8.00.x" for SQL Server 2000 and "9.00.x" for SQL Server 2005).

Now, the cmbServers click event:

VB.NET

Private Sub cmbServers_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmbServers.Click


' Check if datatable is empty
If tableServers.Rows.Count = 0 Then

' Get a datatable with info about SQL Server 2000 and 2005 instances
tableServers = servers.GetDataSources()

' List that will be combobox’s datasource
Dim listServers As List(Of String) = New List(Of String)

' For each element in the datatable add a new element in the list
For Each rowServer As DataRow In tableServers.Rows

' SQL Server instance could have instace name or only server name,
' check this for show the name
If String.IsNullOrEmpty(rowServer ("InstanceName").ToString()) Then
listServers.Add(rowServer ("ServerName").ToString())
Else
listServers.Add(rowServer ("ServerName") & "\\" & rowServer ("InstanceName"))
End If

Next

'Set servers list to combobox’s datasource
Me.cmbServers.DataSource = listServers
End If

End Sub


C#

private void cmbServers_Click(object sender, EventArgs e)
{

// Check if datatable is empty
if (tableServers.Rows.Count == 0)
{
// Get a datatable with info about SQL Server 2000 and 2005 instances
tableServers = servers.GetDataSources();


// List that will be combobox’s datasource
List listServers = new List();


// For each element in the datatable add a new element in the list
foreach (DataRow rowServer in tableServers.Rows)
{

// Server instance could have instace name or only server name,
// check this for show the name
if (String.IsNullOrEmpty(rowServidor["InstanceName"].ToString()))
listServers.Add(rowServer ["ServerName"].ToString());
else
listServers.Add(rowServer["ServerName"] + "\\" + rowServer["InstanceName"]);
}

// Set servers list to combobox’s datasource
this.cmbServers.DataSource = listServers;
}
}



For the combobox with the databases list we must execute a query againts sys.databases view:

VB.NET

Private Sub cmbDataBases_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmbDataBases.Click

Dim listDataBases As List(Of String) = New List(Of String)
Dim connectString As String
Dim selectSQL As String

' Check if user was selected a server to connect
If Me.cmbServers.Text = "" Then
MsgBox("Must select a server")
Return
End If

server = Me.cmbServers.Text

'Set connection string with selected server and integrated security
connectString = "Data Source=" & server & " ;Integrated Security=True;Initial Catalog=master"


Using con As New SqlConnection(connectString)

' Open connection
con.Open()

'Get databases names in server in a datareader
selectSQL = "select name from sys.databases;"

Dim com As SqlCommand = New SqlCommand(selectSQL, con)
Dim dr As SqlDataReader = com.ExecuteReader()


While (dr.Read())
listDataBases.Add(dr(0).ToString())
End While

'Set databases list as combobox’s datasource
Me.cmbDataBases.DataSource = listaBasesDatos

End Using

End Sub


C#

private void cmbBasesdeDatos_Click(object sender, EventArgs e)
{
string select;
string connectString;
List listDataBases = new List();

// Check if user was selected a server to connect
if (this.cmbServers.Text == "")
{
MessageBox.Show("Must select a server");
Return;
}

this.server = this.cmbServers.Text;

// Set connection string with selected server and integrated security
connectString = "Data Source=" + this.server + ";Integrated Security=True;Initial Catalog=master";

using (System.Data.SqlClient.SqlConnection con = new System.Data.SqlClient.SqlConnection(connectString))
{
// Open connection
con.Open();

//Get databases names in server in a datareader
select = "select name from sys.databases;";

System.Data.SqlClient.SqlCommand com =
new System.Data.SqlClient.SqlCommand(select, con);
System.Data.SqlClient.SqlDataReader dr = com.ExecuteReader();

while (dr.Read())
{
listDataBases.Add(dr[0].ToString());
}

// Set databases list as combobox’s datasource
this.cmbDataBases.DataSource = listDataBases;
}
}


And that’s all, we have a form that show SQL Server instances and theirs databases.


Note: if you don’t use integrated security, must change connection string to add user and password attributes.

Happy codding ;)

lunes, 11 de enero de 2010

Converting rows in columns with T-SQL: PIVOT() and things that will brighten the day :)

Versión en castellano

Sometimes things that seem very complex have a easy solution, but this isn’t the first one that we think (yes, sound like Occam’s razor, but isn’t the same).

Well, the case is that I had a logging table in which some external process inserted the results of passing for different states, something like this:



Process State Timestamp

Process1 10 2008/12/21 12:22:05.320
Process2 10 2008/12/21 12:22:05.358
Process1 11 2008/12/21 12:22:06.210
Process1 12 2008/12/21 12:22:06.588
Process2 11 2008/12/21 12:22:10.100
Process2 12 2008/12/21 12:22:10.048
Process3 10 2008/12/21 14:30:05.358
Process3 11 2008/12/21 14:30:20.052


To monitor processes and identify performance problems at certain times: many processes at once, backups were began, the server runs out of memory, ... I get a query grouping by process and hour.


SELECT process, count(*) as requests, datepart(hh,Timestamp) as hour
FROM MiLOG with (nolock)
GROUP BY process, datepart(hh,Timestamp)
ORDER BY 1,3


With this query we retrieve a result like this:



Process Requests Hour

Process1 10 12
Process1 2 13
Process2 5 12
Process2 3 13
Process3 25 14
Process3 4 15


Of course, this output is a bit complicated to treat because even though they are grouped by the process and the time, when you have lots of data becomes too cumbersome to check them, and just bringing the results to a spreadsheet you can see more clearly. Then I wondered, is there not a way to do it from T-SQL convert rows into columns, with a column for each hour and a single row for each process?

As always the first solution to a complicated query is a sub-query (I don't know why but it's true, someone would have to study it :P) but in this case was not a sub-query, but rather 24 sub-queries, all attacking the same table, and all using the function datepart (hh, TimeStamp). Surely you would get the expected result, but at the expense of a very poor performance.

As this situation is very common, I think that should have an easy way to fix it, so in an initial Google search did not surprise me the first results show the same resolution of sub-queries (for the principle that the first solution in a complicated query is a sub-query: P) and I was doing a huge query with a lot of sub-queries, JOINs to the same table. But on the first page of results I found a much simpler solution: the PIVOT clause (that is new SQL Server 2005, so don't try in earlier versions).

So, with a query like that:


WITH Processes(process, requests, hour)
AS(
SELECT process, count(*) as requests, datepart(hh,Timestamp) as hour
FROM MiLOG with (nolock)
GROUP BY process, datepart(hh,Timestamp)
)
SELECT * FROM Processes PIVOT(SUM(requests) FOR Hour IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23])) AS Result
ORDER BY process


we have the expected result:



Process 0 1 2 ... 21 22 23

Process1 0 10 23 4 5 21
Process2 12 11 15 22 17 8
Process3 9 7 2 19 9 13


Note: PIVOT() statement works only with agregate functions (like COUNT, SUM, MAX, AVG, …).


Another interesting thing, is:


WITH Processes( process, requests, hour) AS(...)


that Specifies a temporary named result set, known as a common table expression (CTE).


Happy codding ;)


Extra ball: for this case I also thought of using CUBE, that is other summarizing tool provided by T-SQL.