GDBase core functions

The core functions listed below are available by default.

Simple functions

Simple functions can be used in any expression. They return a result immediately based on their inputs.

String Functions

charindex(X, Y [,Z])

Given 2 input strings X, Y and an integer Z searches from the Z-th character for the string X in string Y. Returns the position where the match occured. Characters are counted from 1. 0 is returned when no match occurs. If Z is omited, search begins from the first character.

difference(X, Y)

Returns the number of different characters between the soundex value of the string arguments.

glob(X, Y)

This function is used to implement the X GLOB Y syntax of GDBase. See also the description of the GLOB operator.

leftstr(X, Y)

Returns the Y leftmost characters of the string X. If the string has a length smaller than or equal to Y or is NULL this function is a no-op.

length(X)

Return the string length of X in characters.

levenshtein(X, Y, [Z])

Return the Levenshtein distance between two strings X and Y. If the third parameter is not null then the distance is normalized.

levenshtein_damerau(X, Y, [Z])

Return the Levenshtein-Damerau distance between two strings X and Y. If the third parameter is not null then the distance is normalized.

like(X, Y [,Z])

This function is used to implement the Y LIKE X [ESCAPE Z] syntax of SQL. If the variant with the ESCAPE clause is to be used, the function should be invoked with three arguments. Otherwise, it should be invoked with two arguments only. See also the description of the LIKE operator.

lower(X)

Returns a copy of the string X with all characters converted to lower case (ASCII characters only).

ltrim(X)

Removes the whitespace at the begining of the string X.

padc(X, Y[ ,Z])

Appends the character Z at the end and at the begining of X until it has a length of Y characters. Tries to add as many characters at the left as at the right. If Z is ommited spaces are appended. When X has length greater than or equal to Y it is a no-op. Additionally, padc(NULL) returns NULL.

padl(X, Y[, Z])

Given an input string X and an integer Y adds the character Z at the begining of X until it has a length of Y characters. If Z is ommited spaces are appended. When X has length greayer than or equal to Y it is a no-op. Additionally, padl(NULL) returns NULL.

padr(X, Y[ ,Z])

Given an input string X and an integer Y adds the character Z at the end of X until it has a length of Y characters. If Z is ommited spaces are appended. When X has length greater than or equal to Y it is a no-op. Additionally, padr(NULL) returns NULL.

proper(X)

Given a string X returns the same string but with the first upper-case letter and rest in lower-case.

qgram(X, Y[, normalize[, q]])

In simple terms a q-gram is a string of length q. This fuction calculates the q-gram distance from string X to string Y. If normalize is absent or null then non-normalized q-gram distance will be computed. Otherwise, the function returns normalized distance between 0 and 1. The optional parameter q specifies the length (i.e. q) of q-grams.

Non-normalized q-gram distance is computed in the following way: for any q-gram let denote the number of its occurrences in the string . Then, the non-normalized q-gram distance from string to string is calculated according to the formula

where the summation is performed over all q-grams .

The normalized q-gram distance is computed as

quote(X)

Returns a string which is the value of its argument suitable for inclusion into another SQL statement. Strings are surrounded by single-quotes with escapes on interior quotes as needed. The function is useful when writing triggers to implement undo/redo functionality.

regexp(regular-expression, text)

Enables using regular expressions to search the text. The result is 1 if a match is found and 0 otherwise. See also the description of the REGEXP operator.

The art of constructing regular expressions is a very broad subject. A detailed treatment of regular expression can be found at the PCRE website.

regexp_group(regular-expression, text, group-num)

Enables capturing groups defined by a regular expression. The first argument is a regular expression, the second one is the text to search and the third one is the number of the group in the first argument. Groups are counted from left to right according to the order of occurence of the left group delimiters '('. If group-num is 0 then the whole matched text is returned. See also the description of the REGEXP operator.

Example 43.25. Extracting telephone numbers.

table 'TAB1':
    id  number
    1   'phone 500111222'
    2   '+48500333444'
    3   '600112233'
    4   '0501555666'

# this command extracts the last 6 digits from phone numbers if they are preceeded
# by 500 or 501. Note that any digits precceding 500 or 501 are not taken into account.
sql TAB2:
    SELECT *, REGEXP_GROUP('.*((500)|(501))(\d{6}).*', number, 4) AS extr FROM TAB1
print TAB2

# this command extracts complete 9-digit phone numbers starting with 500 or 501
sql TAB3:
    SELECT *, REGEXP_GROUP('.*((((500)|(501))(\d{6}))).*', number, 1) AS extr FROM TAB1
print TAB3
    

Output:

id | number          | extr   |
+--+-----------------+--------+--
 1 | phone 500111222 | 111222 |
 2 | +48500333444    | 333444 |
 3 | 600112233       |   None |
 4 | 0501555666      | 555666 |

id | number          | extr      |
+--+-----------------+-----------+--
 1 | phone 500111222 | 500111222 |
 2 | +48500333444    | 500333444 |
 3 | 600112233       |      None |
 4 | 0501555666      | 501555666 |
    
regexp_replace(X, Y, Z)

Function similar to replace(X, Y, Z). The difference is that here Y is a regular expression. See also the description of the REGEXP operator.

Example 43.26. Removing non-digits from telephone numbers.

table 'TAB1':
    id  number
    1   '500123456'
    2   '500-123-123'
    3   '0500123789'
    4   '600111222'
    5   'tel: 600123321'
    6   '+48500111222'
    7   '0226942500'
    8   '226531010 (work)'
    9   '(0-22)6208518'

#remove all non-digits from the number field
sql TAB2:
    SELECT *, REGEXP_REPLACE(number, '\D', '') AS digits_only FROM TAB1
print TAB2
    

Output:

id | number           | digits_only |
+--+------------------+-------------+--
 1 | 500123456        | 500123456   |
 2 | 500-123-123      | 500123123   |
 3 | 0500123789       | 0500123789  |
 4 | 600111222        | 600111222   |
 5 | tel: 600123321   | 600123321   |
 6 | +48500111222     | 48500111222 |
 7 | 0226942500       | 0226942500  |
 8 | 226531010 (work) | 226531010   |
 9 | (0-22)6208518    | 0226208518  |
    
replace(X, Y, Z)

Replaces the occurrences of Y in X by Z.

replace_chars(X, Y, Z)

In the string X replaces every occurence of every character in Y with the corresponding character in Z.

replicate(X, Y)

Returns a string that constists of the string X contatenated Y times.

reverse(X)

Given a string X returns the same string but with the characters in reverse order.

rightstr(X, Y)

Returns the Y-th rightmost character of the string X. If X has length <= Y or is NULL this function is a no-op.

rtrim(X)

Removes the whitespace at the end of the string X.

strfilter(X, Y)

Returns the string X with the characters NOT in Y removed.

substr(X, Y, Z)

Returns a substring of the string X that starts at the Y-th character and which is Z characters long. The leftmost character of X has number 1. If Y is negative then the first character of the substring is found by counting from the right instead of left.

trim(X)

Removes the whitespace at the begining and end of the string X.

upper(X)

Returns a copy of the string X converted to all upper-case letters (ASCII characters only).

words_count(X[, Y])

Divides the string X into substrings using the separator Y and returns the number of substrings.

The separateor Y is specified as a regular expression and defaults to '\s+'.

words_intersect(X, Y[, Z])

Divides the strings X and Y into substrings using the separator Z and returns the number of substrings which occur in both X and Y.

The separateor Z is specified as a regular expression and defaults to '\s+'.

words_sort(X[, Y[, Z]])

Divides the string X into substrings using the separator Y, sorts the substrings and returns a string obtained by concatenating the substrings in the order they have been sorted using Z as the joining string.

The separateor Y is specified as a regular expression and defaults to '\s+'. The joining string ZZ> dfaults to an empty string.

Numeric Functions

abs(X)

Returns the absolute value of X.

acos(X)

Returns the arc-cosine of X (in the range [0, π] radians).

acosh(X)

Returns the inverse hyperbolic cosine of X (in the range [0, ∞]).

asin(X)

Returns the arc-sine of X (in the range [-π/2, +π/2] radians).

asinh(X)

Returns the inverse hyperbolic sine of X.

atan(X)

Returns the angle whose tangent is X (in the range (-π/2, +π/2) radians).

atanh(X)

Returns the inverse hyperbolic tangent of X.

atan2(Y, X)
or
atn2(Y, X)

Returns the angle whose tangent is Y/X (in the full angular range [-π, +π] radians).

ceil(X)

Returns the smallest integer value not less than X.

cos(X)

Returns the cosine of X. If X is large the value returned might not be meaningful, but the function reports no error.

cosh(X)

Returns the hyperbolic cosine of X.

cot(X)

Returns the cotangent of X. If X is large the value returned might not be meaningful, but the function reports no error.

coth(X)

Returns the hyperbolic cotangent of X.

degrees(X)

Converts Radians into Degrees.

exp(X)

Returns the exponential of X.

floor(X)

Returns the largest integer value not greater than X.

log(X)

Returns the natural logarithm of X. If X <= 0 then NULL is returned.

log10(X)

Returns the base-10 logarithm of X. If X <= 0 then NULL is returned.

pi()

Returns the (approximate) value of Pi (π).

power(X, Y)

Returns X raised to the power Y.

probit(X)

Computes the probit function for the argument X.

radians(X)

Converts Degrees into Radians.

rand([X])

Returns a pseudo-random floating-point value v in the range 0 <= v < 1.0. X is an optional seed parameter for the pseudo-random number generation algorithm.

random([X])

Returns a pseudo-random integer between -9223372036854775808 and +9223372036854775807. X is an optional seed parameter for the pseudo-random number generation algorithm.

round(X[, Y])

Rounds off the number X to Y digits to the right of the decimal point. If Y is omitted, 0 is assumed.

sign(X)

Returns +1,0 or -1 when the argument is respectively positive, 0 or negative. When the argument is NULL the result is also NULL.

sin(X)

Returns the sine of X. If X is large the value returned might not be meaningful, but the function reports no error.

sinh(X)

Returns the hyperbolic sine of X.

square(X)

Returns the square of X.

sqrt(X)

Returns the real square root of X. If X < 0 then NULL is returned.

tan(X)

Returns the tangent of X. If X is large the value returned might not be meaningful, but the function reports no error.

tanh(X)

Returns the hyperbolic tangent of X.

Date And Time Functions

Date and time functions can be used for manipulating date, time and datetime data objects The functions take a date, time or datetime arguments, which are appropriately formatted time strings.

Basic date and time functions

These functions can be used for extracting and representing the information stored in date-time objects The timestring argument may be followed by modifiers, which are also passed as string arguments. The modifiers can be used to further manipulate the timestring.

The strftime() function is the most versatile one in this group.

date(timestring, modifier, modifier, ...)

Returns the date in the 'YYYY-MM-DD' format.

datetime(timestring, modifier, modifier, ...)

Returns the time in the 'YYYY-MM-DD HH:MM:SS' format.

day(timestring, modifier, modifier, ...)

Returns an integer (0-31) equal to the day of the month.

hour(timestring, modifier, modifier, ...)

Returns an integer (0-23) equal to the hour of the day.

julianday(timestring, modifier,  modifier, ...)

Returns a float value representing the date in Julian Date format, i.e. equal to the number of days since noon in Greenwich on November 24, 4714 B.C. Julian Date is the preferred internal representation of dates in computers.

minute(timestring, modifier, modifier,  ...)

Returns an integer number (0-59) equal to the minute of the hour.

month(timestring, modifier, modifier,  ...)

Returns an integer (0-12) equal to the month of the year.

second(timestring, modifier, modifier, ...)

Returns an inetger (0-59) equal to the number of seconds of the minute.

strftime(format, timestring, modifier, modifier, ...)

Returns a TEXT type value with the date formatted according to the format string specified as the first argument.

Notice that strftime() can be used to express other functions from this group:

date(...) -> strftime('%Y-%m-%d', ...) 
time(...) -> strftime('%H:%M:%S', ...) 
datetime(...) -> strftime('%Y-%m-%d %H:%M:%S', ...) 
...
                                

However, while strftime() always returns a string result, other function may return integers or floats.

The following time format strings are supported:

Table 43.3. Format symbols for strftime

SymbolDescriptionFormatting
%d
day of the month 'DD'
%f
seconds of the minute including the fractional part 'SS.SSS'
%H
hour of the day 'HH'
%j
day of the year (001-366) 'NNN'
%m
month of the year 'MM'
%M
minute of the hour 'MM'
%s
number of seconds since 1970-01-01  
%S
second of the minute 'SS'
%w
day of the week (0-6), sunday is '0'. 'N'
%W
week of the year (00-59), the first week is '00' 'NN'
%Y
the year (0000-9999) 'YYYY'
strptime(format, string)

Extract the datetime information from string according to the specification provided in format and returns a datetime object. The format string is interpreted according to the following rules:

Table 43.4. Format characters for strptime

SymbolDescriptionExamples
y
year
yyyy -> 1996, yy > 96
M
month
MM > 01
d
day
dd -> 01, dd > 23
H
hour
HH -> 03, HH > 21
m
minute
mm -> 07, mm > 58
s
second
ss -> 08 ss > 56
S
milisecond
SSS -> 023

Additionally, format may contain other characters, such as - or . or /, etc.

If either format or string is incorrect, NULL is returned.

Note

The result is always a datetime object, even if format or string contain only information about the date (or only about time).
time(timestring, modifier, modifier, ...)

Returns the time in the HH:mm:ss format.

year(timestring, modifier, modifier,...)

Returns the year.

Modifiers

Modifiers can be used to alter the interpretation of the date passed to one of the functions above. Modifiers take effect in the order they are provided, from left to right. The following modifiers are available:

Table 43.5. Date and time modifiers

ModifierDescription
NNN cmonths
changes the timestamp by NNN months; this is slightly different from months, see this note.
NNN cyears
changes the timestamp by NNN years; this is slightly different from years, see this note.
NNN days
changes the timestamp by NNN days
NNN hours
changes the timestamp by NNN hours
NNN minutes
change the timestamp by NNN minutes
NNN months
change the timestamp by NNN months; this is slightly different from cmonths, see this note.
NNN years
chages the date by NNN years; this is slightly different from cyears; see this note.
end of day
advances the timestamp to the end of day, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd 23:59:59.999.
end of hour
advances the timestamp to the end of hour, i.e. changes YYY-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:59:59.999
end of minute
advances the timestamp to the end of minute, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:59.999
end of month
advances the timestamp to the end of month, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd 23:59:59.999, where dd is the last day of the month MM.
end of second
advances the timestamp to the end of second, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:ss.999
end of year
advances the timestamp to the end of year, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-12-31 23:59:59.999.
first day
moves the timestamp back to the first day of the month, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-01 HH:mm:ss.SSS
first hour
moves the timestamp back to the first hour of the day, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd 00:mm:ss.SSS.
first minute
moves the timestamp back to the first minute of the hour i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:00:ss.SSS.
first month
moves the timestamp back to the first month of the year, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-01-dd HH:mm:ss.SSS.
first second
moves the timestamp back to the first second of the minute, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:00.SSS.
last day
advances the timestamp to the last day of the month, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:ss.SSS, where dd is the last day of the month.
last hour
advances the timestamp to the last hour of the day, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd 23:mm:ss.SSS.
last minute
advances the timestamp to the last minute of the hour, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:59:ss.SSS.
last month
advances the timestamp to the last month of the year, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-12-dd HH:mm:ss.SSS.
last second
advances the timestamp to the last second of the minute, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:59.SSS.
localtime
converts the timestamp to local time, assuming the timestamp uses Coordinated Universal Time (UTC).
start of day
moves the timestamp back to the beginning of the day, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd 00:00:00.000.
start of hour
moves the timestamp back to the beginning of the hour, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:00:00.000.
start of minute
moves the timestamp back to the beginning of the minute, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:00.000.
start of month
moves the timestamp back to the beginning of the month, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-01 00:00:00.000.
start of second
moves the timestamp back to the beginning of the second, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-MM-dd HH:mm:ss.000.
start of year
moves the timestamp back to the beginning of the year, i.e. changes yyyy-MM-dd HH:mm:ss.SSS to yyyy-01-01 00:00:00.000.
unixepoch
works only if immediately preceeded by a timestring in the 'DDDDDDDDDD' format; causes the timestring to be interpreted as the number of seconds since 1970-01-01 00:00:00.000 and not as a Julian day number, which is the default interpretation.
utc
converts the timestamp to Coordinated Universal Time, assuming the timestamp uses the local time zone.
weekday N
advances the timestamp to the next weekday indicated by N, where 0 is Sunday, 1 is Monday and so forth.

Note

The difference between months and cmonths is in what happens when the day of month in the initial date does not exist in the modified date, after adding or subtracting the number of months specified by NNN. In such cases months increases the resulting date by the difference of days between the original day of month and the last day of the new month, while cmonths replaces the new day of month with the last day ofthe resulting month. This is best illustrated by executing the following code:

SELECT date('2001-01-31', '+1 cmonths')
                            

returns 2001-02-28, while

SELECT date('2001-01-31', '+1 months')
                            

returns 2001-03-03. Similarly

SELECT date('2001-03-31', '-1 cmonths')
                            

returns 2001-02-28, while

SELECT date('2001-03-31', '-1 months')
                            

returns 2001-03-03.

The difference between years and cyears is exhibited only in the case when the date to be modified is February 29 of a leap year and the modified year is not a leap year. In this case years converts the resulting date to March 1, while cyears to February 28.

Note

The NNN preceeding cmonths, cyears, days, hours, minutes, months, years can be used with or without a sign. Moreover NNN can be replaced by an integer variable: see this example

The modifiers cmonths, days, hours etc. can also be used without the final 's', i.e. cmonth, day, hour, etc.

Example 43.27. Compute the current date.

SELECT date('now');
                        

Example 43.28.  Compute the last day of the current month.

SELECT date('now','start of month','+1month','-1 day');
                        

Example 43.29.  Compute the date and time given a unix timestamp 1092941466.

SELECT datetime(1092941466, 'unixepoch');
                        

Example 43.30.  Compute the date and time given a unix timestamp 1092941466, and compensate for your local timezone.

SELECT datetime(1092941466, 'unixepoch','localtime');
                        

Example 43.31.  Compute the current unix timestamp.

SELECT strftime('%s','now');
                        

Example 43.32.  Compute the number of days since the battle of Hastings.

SELECT julianday('now') - julianday('1066-10-14');
                        

Example 43.33.  Compute the number of seconds between two dates:

SELECT julianday('now')*86400 -	julianday('2004-01-01 02:34:56')*86400;
                        

Example 43.34.  Compute the date of the first Tuesday in October (January + 9) for the current year.

SELECT date('now','start of year','+9 months','weekday 2');
                        

Example 43.35.  Change the current date forward by the number of days specified in the nDays column of the tab1 table.

SELECT datetime('now', nDays || 'days') FROM tab1
                        
Functions for calculating the difference between timestamps

The functions below can be used to calculate the difference between two timestamps. They require two date or time arguments, i.e. objects of date, time or datetime type or appropriately formatted time strings.

datediff(date1, date2)

Returns the difference between the two dates in days. Is is equivalent to julianday(date1) - julianday(date2). This function works with DATE, DATETIME and TIMESTAMP. If DATETIME or TIMESTAMP are used as argument, only the date part is used in the calculation of the result.

daydiff(date1, date2)

This function equivalent to datediff(date1, date2)

monthdiff(date1, date2)

This function is similar to datediff(date1, date2) but returns the difference between the two dates in months. Days are not taken into account when calculating the result.

yeardiff(date1, date2)

This function is similar to datediff(date1, date2) but returns the difference between the two dates in years. Days and months are not taken into account when calculating the result.

timediff(time1, time2)

Return the difference between the two datetimes in hours and seconds. Works only with the TIME type.

Example 43.36.  Compute the number of days between two dates:

SELECT datediff('2004-01-01 02:34:56', '2004-05-17');
                        
Time Strings

A time string can be in any of the following formats:

  1. yyyy-MM-dd

  2. yyyy-MM-dd HH:mm

  3. yyyy-MM-dd HH:mm:ss

  4. yyyy-MM-dd HH:mm:ss.SSS

  5. yyyy-MM-ddTHH:mm

  6. yyyy-MM-ddTHH:mm:ss

  7. yyyy-MM-ddTHH:mm:ss.SSS

  8. HH:mm

  9. HH:mm:ss

  10. HH:mm:ss.SSS

  11. now

  12. NNNN.NNNN

In formats 5 through 7, the "T" is a literal character separating the date and the time, as required by the ISO-8601 standard. Formats 8 through 10 that specify only a time assuming the date is 2000-01-01. In format 11 the string 'now', is converted into the current date and time. Universal Coordinated Time (UTC) is used. Format 12 is the Julian day number expressed as a floating point value.

Other Functions

coalesce(X, Y, ...)

Returns a copy of the first non-NULL argument. If all arguments are NULL then NULL is returned. There must be at least 2 arguments.

ifnull(X, Y)

Returns X if X is not NULL, returns Y if X is NULL, returns NULL if both are NULL. With two arguments this function behaves in the same way as coalesce().

last_insert_rowid()

Returns the ROWID of the last row inserted from this connection to the database.

nullif(X, Y)

Returns the first argument if the arguments are different, otherwise returns NULL.

if(X, Y [,Z])

This function is equivalent to the condition If X is true then Y [ else Z ].

gdbase_version()

Returns the version number of gdbase.

Aggregate Functions

Aggregate functions may only be used in a SELECT statement. If there is no GROUP BY clause, aggregate functions compute their result across all rows of the result set. If a GROUP BY clause is present, agregate functions calculate the results for separately for each group.

In any aggregate function that takes a single argument, that argument can be preceeded by the DISTINCT keyword. In such cases duplicate elements are filtered before being passed into the aggregate function. For example, the function count(DISTINCT X) will return the number of distinct values of the column X instead of the total number of non-null values in this column.

avg(X)

Returns the average value of all not NULL X-es within a group. String values that can not be interpreted as numbers are interpreted as 0. The result of avg() is always a floating point value even if all inputs are integers.

count(X), count(*)

The first form returns the number of times that X is not NULL in the group. The second form (with no argument) returns the total number of rows in the group.

group_concat(X[, Y[, Z]])

The result is a string which is the concatenation of all not NULL values of X. The parameter Y is the separator between the instances of X. A comma (",") is used as the separator if Y is omitted. The optional parameter Z is the maximum buffer size for text data. If the text contains only ASCII characters, the buffer size is equal to text length. If Z <=0 then there are no limits, if Z is omitted then the buffer size is set to 10000.

group_min(X, Y)

Returns the value of the first argument which corresponds to the smallest value of the second argument. See the example for group_max below.

group_max(X, Y)

Returns the value of the first argument which corresponds to the largest value of the second argument. The example below illustrates the use of group_max in a SELECT statement.

Example 43.37. Finding the most recent transactions of customers using group_max.

table 'TAB1':
    customer  day  transaction_id
    1         1    124345324
    2         1    3252352
    3         5    325235
    3         2    4325435
    3         3    3333333
    3         1    111111111
    4         4    4333333
    4         5    3435555
    4         1    9999999

sql TAB2:
    SELECT customer, group_max(transaction_id, day) FROM TAB1 GROUP BY customer

print TAB2
    

Output:

customer | group_max(transaction_id, day) |
+--------+--------------------------------+--
       1 |                      124345324 |
       2 |                        3252352 |
       3 |                         325235 |
       4 |                        3435555 |
    
lower_quartile(X[, W])

Returns the lower_quartile value of the column X. W is an optional parameter denoting a variable with weights. If W is present, a weighted lower quartile of the variable X is returned.

max(X)

Returns the maximum value of all in the group. The usual sort order is used to determine the maximum.

median(X[, W])

Returns the median value of the column X. W is an optional parameter denoting a variable with weights. If W is present, a weighted median of the variable X is returned.

min(X)

Returns the minimum non-NULL value of all values in the group. The usual sort order is used to determine the minimum. NULL is returned only if all values in the group are NULL.

mode(X[, W])

Returns the most frequent value of column X. W is an optional parameter denoting a variable with weights. If W is present, the function returns the value for which the sum of weights is the largest.

quantile(X, N[, W])

Returns the N-th quantile of the sample X. The second parameter must be in the interval <0, 100>. The optional parameter W is a variable with weights for each observation. For N = 25 / 50 / 75 this function is equal to lower_quartile / median / upper_quartile functions.

stdev(X)

Returns the standard deviation value of column X.

upper_quartile(X[, W])

Returns the upper quartile value of the column X. W is an optional parameter denoting a variable with weights. If W is present, a weighted upper quartile of the variable X is returned.

variance(X)

Returns the variance of column X.

sum(X)
,
total(X)

These two functions return the numeric sum of all non-NULL values in the group. If there are no non-NULL input rows then sum() returns NULL but total() returns 0.0. Normally, NULL is not a useful result for the sum of no rows but the SQL standard requires it and most other SQL database engines implement sum() that way, so GDBase does it in the same way to be compatible. The non-standard total() function is provided as a convenient way to work around this design problem in the SQL language.

The result of total() is always a floating point value. The result of sum() is an integer value if all non-NULL inputs are integers. If any input to sum() is neither an integer nor NULL then sum() returns a floating point value which might be an approximation to the true sum.

The function sum() will throw an integer overflow exception if all inputs are integers or NULL and an integer overflow occurs at any point during the computation; total() never throws an exception.

Window functions

Window functions can be used in expressions in a SELECT statement.

Syntax:  For clauses with first_value, last_value, nth_value and lag:

function-name(argument-list) [RESPECT NULLS | IGNORE NULLS] OVER ( [ PARTITION BY partition-expr-list ] [ ORDER BY sort-expr-list ] [ ROWS window-frame ])       
	

For clauses with the remaining window functions and aggregate functions used as window functions:

function-name(argument-list) OVER ( [ PARTITION BY partition-expr-list ] [ ORDER BY sort-expr-list ] [ window-frame ])                

partition-expr-list:
expression[, expression]*

window-frame:
BETWEEN UNBOUNDED PRECEDING AND [ UNBOUNDED FOLLOWING | CURRENT ROW ] |
UNBOUNDED PRECEDING
	

Window functions are similar to aggregate functions in that they operate on multiple rows or groups of rows within the results set. However, the rows on which a window function operates is determined by the PARTITION BY clause instead of the GROUP BY clause. Additionally, window functions are order sensitive and the order in which rows are processed is defined by the ORDER BY clause.

The PARTITION BY clause determines the partitions, i.e. groups of rows on the basis of which the function will be calculated. Every unique combination of the values of expressions following PARTITION BY defines one group. If the PARTITION BY clause is ommitted, all rows are assumed to belong to one partition.

The ORDER BY clause determines the order in which window functions process the rowes in each partition. If the ORDER BY clause is omitted, ordering by constant is assumed. See the ORDER BY section for the details how to construct the ORDER BY clause.

In current implemenentation the PARTITION BY and ORDER BY clauses determine the ordering of rows in the output table: the primary ordering is determined by the expressions following PARTITION BY (i.e. the rows from the same window are adjacent), and the rows in each partition are then ordered as specified by the ORDER BY clause. However, it is not guarnteed that this behavior will not change in future GDBase versions, hence it is recommended to use an explicit ORDER BY clause for the whole SELECT statement in cases when a particular ordering of output table rows is important.

The ROWS clause specifies the so called window, i.e. a contiguous subset of the partition over which the window function is to operate. Currently only the following types of windows can be specifed:

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
The window starts at the first row of the partition and ends at the last row of the partition. This is the default option.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
The window starts at the first row of the partition and ends at the current row. This variant can be used only with aggregate functions.
ROWS UNBOUNDED PRECEDING
This is the same as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The IGNORE NULLS clause can be used with the first_value, last_value, nth_value and lag functions. If present, the clause causes the window function to ignore any rows for which expression evaluates to null. The RESPECT NULLS clause has the oppsoite effect: every row is considered in the calculation of the value returned by the function. Using these clauses with other window functions or aggregate functions results in an error.

See the Order of execution in SELECT statements section for informartion about the execution of window functions relative to other parts of SELECT statements.

Using multiple window functions in a single SELECT statement

Use of more than one window function in a single SELECT statememnt is allowed under certain restrictions on the expressions used in OVER() clauses.

Assume that the longest (i.e. with the largest number of expressions) OVER() clause is of the form

OVER ( PARTITION BY expr_1, expr_2, ... , expr_k ORDER BY expr_k+1, expr_k+2, ... , expr_m )                       
                   

Then any other OVER() clause used in the same SELECT statement must be of the form

OVER ( PARTITION BY expr_1, expr_2, ... , expr_l ORDER BY expr_l+1, expr_l+2, ... , expr_n )                      
                   

with n not greater than m.

As an example, the following usage of window functions is possible:

SELECT ... 
rank() OVER ( PARTITION BY a, b, c ORDER BY d, e ), 
rank() OVER ( PARTITION BY a, b, c, d ORDER BY e ), 
rank() OVER ( PARTITION BY a, b ORDER BY c, d ) 
...
                    

while any of the following will result in an error:

SELECT ... 
rank() OVER ( PARTITION BY a ORDER BY b ), 
rank() OVER ( PARTITION BY b ORDER BY a ) 
...
                    

SELECT ... 
rank() OVER ( PARTITION BY a, b ORDER BY c ), 
rank() OVER ( PARTITION BY b ORDER BY c ) 
...
                    

The following window functions are currently supported in GDBase:

first_value(expression)

Returns the first value of expression in the window. This function is order sensitive and can be used with IGNORE NULLS | RESPECT NULLS clauses. With IGNORE NULLS the first non-null value of expression will be returned.

last_value(expression)

Returns the last value of expression in the window. This function is order sensitive and can be used with IGNORE NULLS | RESPECT NULLS clauses. With IGNORE NULLS the last non-null value of expression will be returned.

nth_value(expression, N)

Returns the N-th value of expression in the current partition. If the partition contains less than N rows Null is returned. This function is order sensitive and can be used with IGNORE NULLS | RESPECT NULLS clauses. With IGNORE NULLS the N-th non-null value of expression will be returned.

lag(expression, K[, default_value])

This function returns the value of expression calculated for the K-th row before the current row in the current partition. If row_number() is less than K+1 the function returns default_value (or null if default_value is not provided). K must be a non-negative integer number. This function is order sensitive and can be used with IGNORE NULLS | RESPECT NULLS clauses. With IGNORE NULLS the function returns the value of expression for the K-th row for which expression is non-null, counting back from the current one.

row_number()

Returns the number of current row in the current partition, according to the ORDER BY clause. The first row has number 1.

rank()

The rank of row is calculated as the number of current row minus the number of preceeding rows with the same combination of the ORDER BY clause expression values.

dense_rank()

The value of dense_rank is calculated in the following way: the first row in a partition has dense_rank equal to 1, for any other row the value of dense_rank is the same as for the preceeding row if it had the same combination of the ORDER BY clause expression values and it is increased by 1 otherwise.

percent_rank()

This is a normalized variant of the rank function. It is calculated as

where is the total number of rows in the current partition.

cume_dist()

This function provides the cumulative distribution function for the different values of the ORDER BY clause expression combinations. It is calculated as

where is the number of rows with the same or lower rank as the current row and is the total number of rows in the partition.

ntile(K)

This function can be used for easy calculation of tertiles, quartiles, deciles, etc, i.e. whenever it is necessary to divide the partition into K subsets (called buckets) of approximately the same size. The ntile function returns the bucket number of the current row with respect to the ordering specified in the ORDER BY clause.

If is the number of rows in a partition and , then the first buckets contain rows, and the remaining buckets contain rows.

aggregate functions as window functions

Any aggregate function can be used as a window function. While theoretically the majority of aggregate functions is not order sensitive, the particular ordering of rows may influence the rounding errors in floating point calculations, hence the ORDER BY clause may effect the returned results. Some aggregate functions, such as group_concat are explicitly order sensitive as well.

The following aggregate functions can be used with the DISTINCT keyword in OVER clauses: sum, total, avg, count, stdev, variance, as well as unweighted quantile functions.

Example 43.38.  Ranking window functions

table 'customer_transactions_1':
    customer_id  amount rebate
    1            200    20
    2            300    20
    2            300    20
    1            200    15
    1            50     0
    2            150    10
    1            200    15
    2            300    25
    1            50     5
    1            200    20
    1            50     0

sql t:
    SELECT customer_id, amount, rebate,
    rank() OVER ( PARTITION BY customer_id ORDER BY amount, rebate ) AS rank,
    dense_rank() OVER ( PARTITION BY customer_id ORDER BY amount, rebate ) AS dense_rank,
    percent_rank() OVER ( PARTITION BY customer_id ORDER BY amount, rebate ) AS percent_rank,
    cume_dist() OVER ( PARTITION BY customer_id ORDER BY amount, rebate ) AS cume_dist
    FROM customer_transactions_1

print t
    

Output:

customer_id | amount | rebate | rank | dense_rank | percent_rank       | cume_dist           | 
+-----------+--------+--------+------+------------+--------------------+---------------------+--
          1 |     50 |      0 |    1 |          1 |                0.0 |  0.2857142857142857 | 
          1 |     50 |      0 |    1 |          1 |                0.0 |  0.2857142857142857 | 
          1 |     50 |      5 |    3 |          2 | 0.3333333333333333 | 0.42857142857142855 | 
          1 |    200 |     15 |    4 |          3 |                0.5 |  0.7142857142857143 | 
          1 |    200 |     15 |    4 |          3 |                0.5 |  0.7142857142857143 | 
          1 |    200 |     20 |    6 |          4 | 0.8333333333333334 |                 1.0 | 
          1 |    200 |     20 |    6 |          4 | 0.8333333333333334 |                 1.0 | 
          2 |    150 |     10 |    1 |          1 |                0.0 |                0.25 | 
          2 |    300 |     20 |    2 |          2 | 0.3333333333333333 |                0.75 | 
          2 |    300 |     20 |    2 |          2 | 0.3333333333333333 |                0.75 | 
          2 |    300 |     25 |    4 |          3 |                1.0 |                 1.0 | 
 
    

Example 43.39.  Using the aggregate function avg to calculate the average transaction value for each customer.

table 'customer_transactions_2':
    customer_id transaction_id transaction_amount
    1           1              100
    2           1              100
    1           2              500
    3           1              300
    1           3              50
    2           2              100
    2           3              100
    3           2              150
    1           4              1500
    2           4              50
    3           3              400
    3           4              1200
    1           5              100
    2           5              1500

sql t:
    SELECT customer_id, transaction_id, transaction_amount,
    avg(transaction_amount) OVER ( PARTITION BY customer_id ) AS average_transaction
    FROM customer_transactions_2

print t
    

Output:

customer_id | transaction_id | transaction_amount | average_transaction | 
+-----------+----------------+--------------------+---------------------+--
          1 |              1 |                100 |               450.0 | 
          1 |              2 |                500 |               450.0 | 
          1 |              3 |                 50 |               450.0 | 
          1 |              4 |               1500 |               450.0 | 
          1 |              5 |                100 |               450.0 | 
          2 |              1 |                100 |               370.0 | 
          2 |              2 |                100 |               370.0 | 
          2 |              3 |                100 |               370.0 | 
          2 |              4 |                 50 |               370.0 | 
          2 |              5 |               1500 |               370.0 | 
          3 |              1 |                300 |               512.5 | 
          3 |              2 |                150 |               512.5 | 
          3 |              3 |                400 |               512.5 | 
          3 |              4 |               1200 |               512.5 |