a NULL means that the value is unknown, whereas an empty string means that someone deliberately entered a "null" string. So my standard sql test "SELECT mycol FROM mytable WHERE mycol IS NOT NULL" produced some additional results due to empty strings.
This is what I do in my code
ISNULL(NULLIF(myfield,''), 'TRUE') != 'TRUE'
More info can be found here
http://decipherinfosys.wordpress.com/2009/03/27/null-vs-empty-zero-length-string/
Posted by decipherinfosys on March 27, 2009
Not many end users understand the difference between an empty
string of zero length (”) vs Null. We have seen that at times, the end
users put in an empty string in a field and there are no checks in the
system to prevent them from being able to get that data into the
database. An empty string of zero length is not the same as a Null
value in SQL Server. A string of zero length is still a
string with no characters – a Null on the other hand can mean more than
just denoting an absence of data. Maybe the data is not applicable or
the data is missing or it is just not present yet. A classic analogy
given in this case is that of a blank CD with nothing on it vs no CD. A
lot of debate is there about the usefulness of Null and the design
patterns but that is not what we want to cover here today. We will
provide some links in the Resources section which cover that topic as
well. What we want to mention in this post today are the differences
between a Null value vs a zero length string and how the behavior is
also different between different RDBMS. We will cover this for a couple
of different data types besides the string data type.
We have blogged about the side effects of using a zero length string for an integer data type column before – here.
Today, we saw a similar issue at a client site – this time with a
datetime data type column. Here is an example (using SQL Server here):
declare @table table (col1 datetime)
insert into @table values (''
);
select * from @table
———————–
1900-01-01 00:00:00.000
The empty string gets converted to the default datetime value of
1900-01-01 00:00:00.000 as shown above. Not really what the end user
was expecting. And why do we get that particular date? Because
datetime is internally stored as two integers (hence the 8 bytes per
datetime value) – the first integer stores the number for computing
dates before or after the base date of 1900-01-01 and the second integer
has the number of clock ticks post midnight with each tick being
1/300th of a second.
So, what could be some other issues that this can result into besides silently corrupting the data?
a) Your SQL code that would otherwise expect to work like the IS NULL or IS NOT NULL checks won’t work anymore,
b) In case you were doing this for a number column, the aggregate
functions like AVG() which otherwise would be ignoring the Null value
will now count the record,
c) The SQL code which uses functions like ISNULL(), COALESCE() etc. will
not function as expected since these fields will not have a Null value.
d) The sorts won’t work as expected since Null and the empty string (and
the subsequent default value that actually gets inserted) are not the
same thing.
e) If you have this column as part of the foreign key, you will get an
error at the time of the insert itself since instead of a Null value,
the code will try to insert another default value in and it will violate
the FK constraint.
f) Any concatenation operations or MAX(), MIN() functions can lead to un-desired results.
As always, there is no substitute for good design and good code. A
good design can put checks in place both at the application level
(validations – either at the client side or application layer) as well
as the DB level (check constraint) to prevent such scenarios from
happening.
Do note that if you enter in an empty string of more than zero
length, in SQL Server, the result would be the same. ANSI_PADDING does
effect the storage but the comparison rules remain the same. You can
read more on that in one of our previous blog posts here.
And if we were to use a varchar or a char column, the behavior is the
same with the difference of course being that it is string of length 1
in the case of a CHAR data-type column.
declare @table table (col1 char(1), col2 varchar(1))
insert into @table values ('', ''
);
select col1, datalength(col1) char_str_length, col2, datalength(col2) varchar_str_length from @table where col1 = ''
col1 char_str_length col2 varchar_str_length
---- --------------- ---- ------------------
1 0
Also, if you are a developer who has worked in both SQL Server and Oracle, then you would know that the behavior in Oracle is a bit different. It treats Null and an empty string of zero length,
the same way. And it treats an empty string of more than a zero length
different than a Null value. This is different than the ANSI (and SQL
Server behavior). So, in Oracle:
a) A zero length variable length string (varchar2 data type) is treated as a NULL.
b) A string of more than zero length is not treated as a NULL.
c) A zero length fixed length string (char data type) is not treated as a NULL since CHAR data types are blank padded strings.
Here is an example:
SQL> select 1 as col1,length(''
) LEN from dual where ''
is null
col1 LEN
———- ———-
1 NULL
SQL> select 1 as col1,length(''
) LEN from dual where to_char(''
) is null
col1 LEN
———- ———-
1 NULL
This shows that an empty string is treated as a null in Oracle and
the default data type of an empty string is varchar2 since if it were
char, then automatically its length would have been one as stated above.
It is always better to check for such issues and in order to follow
the same guidelines across RDBMS, check for empty strings via client
side logic or application layer logic or check constraints at the DB
level and ensure that you are going to put in NULL if that is what your
design intention was rather than having such side effects of the empty
string.
Resources:
- The design of software – good discussion on null vs empty string – here.
- SQL Batman’s post on SQL Server and Null values – here and then the post by Aaron Alton – here. Very good articles.
- Empty strings and default values – here. We covered above the integer, datetime and string fields, this post from Madhivanan covers other data types as well.
- Oracle vs SQL Server discussion on this topic @ Stackoverflow – here and another discussion’s insight into why Oracle’s behavior is not like ANSI standards – here.