String functions
String functions operate on string expressions only, and will return an error if used on any other values.
The exception to this rule is toString()
, which also accepts numbers, booleans and temporal values (i.e. DATE
, ZONED TIME`
LOCAL TIME
, ZONED DATETIME
, LOCAL DATETIME
or DURATION
values).
Functions taking a STRING
as input all operate on Unicode characters rather than on a standard char[]
.
For example, the size()
function applied to any Unicode character will return 1
, even if the character does not fit in the 16 bits of one char
.
When |
See also String operators.
left()
left()
returns a STRING
containing the specified number of leftmost characters of the given STRING
.
Syntax:
left(original, length)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
|
An expression that returns a positive |
Considerations:
|
|
|
If |
If |
RETURN left('hello', 3)
left('hello', 3) |
---|
|
Rows: 1 |
ltrim()
ltrim()
returns the original STRING
with leading whitespace removed.
Syntax:
ltrim(original)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN ltrim(' hello')
ltrim(' hello') |
---|
|
Rows: 1 |
normalize()
normalize()
returns the given STRING
normalized using the NFC
Unicode normalization form.
Unicode normalization is a process that transforms different representations of the same string into a standardized form. For more information, see the documentation for Unicode normalization forms. |
The normalize()
function is useful for converting STRING
values into comparable forms.
When comparing two STRING
values, it is their Unicode codepoints that are compared.
In Unicode, a codepoint for a character that looks the same may be represented by two, or more, different codepoints.
For example, the character <
can be represented as \uFE64
(﹤) or \u003C
(<).
To the human eye, the characters may appear identical.
However, if compared, Cypher® will return false as \uFE64
does not equal \u003C
.
Using the normalize()
function, it is possible to
normalize the codepoint \uFE64
to \u003C
, creating a single codepoint representation, allowing them to be successfully compared.
Syntax:
normalize(input)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN normalize('\u212B') = '\u00C5' AS result
result |
---|
|
Rows: 1 |
To check if a STRING
is normalized, use the IS NORMALIZED
operator.
normalize(), with specified normal form
normalize()
returns the given STRING
normalized using the specified normalization form.
The normalization form can be of type NFC
, NFD
, NFKC
or NFKD
.
There are two main types of normalization forms:
-
Canonical equivalence: The
NFC
(default) andNFD
are forms of canonical equivalence. This means that codepoints that represent the same abstract character will be normalized to the same codepoint (and have the same appearance and behavior). TheNFC
form will always give the composed canonical form (in which the combined codes are replaced with a single representation, if possible). The`NFD` form gives the decomposed form (the opposite of the composed form, which converts the combined codepoints into a split form if possible). -
Compatability normalization:
NFKC
andNFKD
are forms of compatibility normalization. All canonically equivalent sequences are compatible, but not all compatible sequences are canonical. This means that a character normalized inNFC
orNFD
should also be normalized inNFKC
andNFKD
. Other characters with only slight differences in appearance should be compatibly equivalent.
For example, the Greek Upsilon with Acute and Hook Symbol ϓ
can be represented by the Unicode codepoint: \u03D3
.
-
Normalized in
NFC
:\u03D3
Greek Upsilon with Acute and Hook Symbol (ϓ) -
Normalized in
NFD
:\u03D2\u0301
Greek Upsilon with Hook Symbol + Combining Acute Accent (ϓ) -
Normalized in
NFKC
:\u038E
Greek Capital Letter Upsilon with Tonos (Ύ) -
Normalized in
NFKD
:\u03A5\u0301
Greek Capital Letter Upsilon + Combining Acute Accent (Ύ)
In the compatibility normalization forms (NFKC
and NFKD
) the character is visibly different as it no longer contains the hook symbol.
Syntax:
normalize(input, normalForm)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
|
A keyword specifying the normal form, can be |
Considerations:
|
RETURN normalize('\uFE64', NFKC) = '\u003C' AS result
result |
---|
|
Rows: 1 |
To check if a STRING
is normalized in a specific Unicode normal form, use the IS NORMALIZED
operator with a specified normalization form.
replace()
replace()
returns a STRING
in which all occurrences of a specified STRING
in the given STRING
have been replaced by another (specified) replacement STRING
.
Syntax:
replace(original, search, replace)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
|
An expression that specifies the |
|
An expression that specifies the replacement |
Considerations:
If any argument is |
If |
RETURN replace("hello", "l", "w")
replace("hello", "l", "w") |
---|
|
Rows: 1 |
reverse()
reverse()
returns a STRING
in which the order of all characters in the given STRING
have been reversed.
Syntax:
reverse(original)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN reverse('anagram')
reverse('anagram') |
---|
|
Rows: 1 |
right()
right()
returns a STRING
containing the specified number of rightmost characters in the given STRING
.
Syntax:
right(original, length)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
|
An expression that returns a positive |
Considerations:
|
|
|
If |
If |
RETURN right('hello', 3)
right('hello', 3) |
---|
|
Rows: 1 |
rtrim()
rtrim()
returns the given STRING
with trailing whitespace removed.
Syntax:
rtrim(original)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN rtrim('hello ')
rtrim('hello ') |
---|
|
Rows: 1 |
split()
split()
returns a LIST<STRING>
resulting from the splitting of the given STRING
around matches of the given delimiter.
Syntax:
split(original, splitDelimiter)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
|
The |
Considerations:
|
|
RETURN split('one,two', ',')
split('one,two', ',') |
---|
|
Rows: 1 |
substring()
substring()
returns a substring of the given STRING
, beginning with a zero-based index start and length.
Syntax:
substring(original, start [, length])
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
|
An expression that returns a positive |
|
An expression that returns a positive |
Considerations:
|
If |
If |
If either |
If |
If |
RETURN substring('hello', 1, 3), substring('hello', 2)
substring('hello', 1, 3) | substring('hello', 2) |
---|---|
|
|
Rows: 1 |
toLower()
toLower()
returns the given STRING
in lowercase.
Syntax:
toLower(original)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN toLower('HELLO')
toLower('HELLO') |
---|
|
Rows: 1 |
toString()
toString()
converts an INTEGER
, FLOAT
, BOOLEAN
, STRING
, POINT
, DURATION
, DATE
, ZONED TIME
, LOCAL TIME
, LOCAL DATETIME
or ZONED DATETIME
value to a STRING
.
Syntax:
toString(expression)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns an |
Considerations:
|
If |
This function will return an error if provided with an expression that is not an |
RETURN
toString(11.5),
toString('already a string'),
toString(true),
toString(date({year: 1984, month: 10, day: 11})) AS dateString,
toString(datetime({year: 1984, month: 10, day: 11, hour: 12, minute: 31, second: 14, millisecond: 341, timezone: 'Europe/Stockholm'})) AS datetimeString,
toString(duration({minutes: 12, seconds: -60})) AS durationString
toString(11.5) | toString('already a string') | toString(true) | dateString | datetimeString | durationString |
---|---|---|---|---|---|
|
|
|
|
|
|
Rows: 1 |
toStringOrNull()
The function toStringOrNull()
converts an INTEGER
, FLOAT
, BOOLEAN
, STRING
, POINT
, DURATION
, DATE
, ZONED TIME
, LOCAL TIME
, LOCAL DATETIME
or ZONED DATETIME
value to a STRING
.
Syntax:
toStringOrNull(expression)
Returns:
|
Arguments:
Name | Description |
---|---|
|
Any expression that returns a value. |
Considerations:
|
If the |
RETURN toStringOrNull(11.5),
toStringOrNull('already a string'),
toStringOrNull(true),
toStringOrNull(date({year: 1984, month: 10, day: 11})) AS dateString,
toStringOrNull(datetime({year: 1984, month: 10, day: 11, hour: 12, minute: 31, second: 14, millisecond: 341, timezone: 'Europe/Stockholm'})) AS datetimeString,
toStringOrNull(duration({minutes: 12, seconds: -60})) AS durationString,
toStringOrNull(['A', 'B', 'C']) AS list
toStringOrNull(11.5) | toStringOrNull('already a string') | toStringOrNull(true) | dateString | datetimeString | durationString | list |
---|---|---|---|---|---|---|
|
|
|
|
|
|
|
Rows: 1 |
toUpper()
toUpper()
returns the given STRING
in uppercase.
Syntax:
toUpper(original)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN toUpper('hello')
toUpper('hello') |
---|
|
Rows: 1 |
trim()
trim()
returns the given STRING
with leading and trailing whitespace removed.
Syntax:
trim(original)
Returns:
|
Arguments:
Name | Description |
---|---|
|
An expression that returns a |
Considerations:
|
RETURN trim(' hello ')
trim(' hello ') |
---|
|
Rows: 1 |