Home » Server Options » Text & interMedia » Oracle Text index catsearch order by close match (Oracle 11g )
Oracle Text index catsearch order by close match [message #616116] Thu, 12 June 2014 07:01 Go to next message
ramesh.sannapureddy
Messages: 5
Registered: June 2014
Location: Bangalore
Junior Member
I am using oracle text index of type ctxcat, but I am new oracle text. I am able to query the table and getting the right results. But we want the results in below order.
If the search string contains two strings then order of the results should be,
In top position: Results that have the exact match to the input text
Then: results that have both words
And at the end: results that have at least one of the two words.
Re: Oracle Text index catsearch order by close match [message #616145 is a reply to message #616116] Thu, 12 June 2014 09:25 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
I have provided one method using a CTXCAT index below, followed by another method using a CONTEXT index.

SCOTT@orcl12c> CREATE TABLE test_tab
  2    (test_col  VARCHAR2(60))
  3  /

Table created.

SCOTT@orcl12c> INSERT ALL
  2  INTO test_tab VALUES ('word1 word3 word2')
  3  INTO test_tab VALUES ('word3 word1 word2')
  4  INTO test_tab VALUES ('word1 word2 word3')
  5  INTO test_tab VALUES ('word1 word3')
  6  INTO test_tab VALUES ('word2 word3')
  7  INTO test_tab VALUES ('word1 word2')
  8  INTO test_tab VALUES ('word1')
  9  INTO test_tab VALUES ('word2')
 10  INTO test_tab VALUES ('other words')
 11  SELECT * FROM DUAL
 12  /

9 rows created.

SCOTT@orcl12c> CREATE INDEX test_col_ctxcat
  2  ON test_tab (test_col)
  3  INDEXTYPE IS CTXSYS.CTXCAT
  4  /

Index created.

SCOTT@orcl12c> SELECT test_col
  2  FROM   (SELECT test_col FROM test_tab
  3  	     WHERE  CATSEARCH
  4  		      (test_col,
  5  		       '<query>
  6  			  <textquery lang="ENGLISH" grammar="CONTEXT">
  7  			    <progression>
  8  			      <seq>word1 word2</seq>
  9  			      <seq>word1 AND word2</seq>
 10  			      <seq>word1 OR word2</seq>
 11  			    </progression>
 12  			  </textquery>
 13  			  <score datatype="INTEGER" algorithm="COUNT"/>
 14  			</query>',
 15  			null) > 0)
 16  ORDER  BY DECODE (test_col, 'word1 word2', 0, ROWNUM)
 17  /

TEST_COL
------------------------------------------------------------
word1 word2
word3 word1 word2
word1 word2 word3
word1 word3 word2
word1 word3
word2 word3
word1
word2

8 rows selected.

SCOTT@orcl12c> CREATE INDEX test_col_context
  2  ON test_tab (test_col)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  /

Index created.

SCOTT@orcl12c> SELECT test_col FROM test_tab
  2  WHERE  CONTAINS
  3  	     (test_col,
  4  	      '(word1 word2 * 10 * 10) OR
  5  	       (word1 AND word2) OR
  6  	       (word1 OR word2) * 0.1 * 0.1',
  7  	      1) > 0
  8  ORDER  BY SCORE(1) DESC, UTL_MATCH.EDIT_DISTANCE (test_col, 'word1 word2')
  9  /

TEST_COL
------------------------------------------------------------
word1 word2
word3 word1 word2
word1 word2 word3
word1 word3 word2
word1 word3
word2 word3
word2
word1

8 rows selected.

Re: Oracle Text index catsearch order by close match [message #617960 is a reply to message #616145] Sat, 05 July 2014 09:16 Go to previous messageGo to next message
ramesh.sannapureddy
Messages: 5
Registered: June 2014
Location: Bangalore
Junior Member
Hello Barbara,
Thanks for your quick reply. The solution worked for me. I tried with CATSEARCH.
I am able to sort the results using the order of sequence in progression. But I would like to know the reason that why we do we need decode in order by clause.

Thank you once again.
Re: Oracle Text index catsearch order by close match [message #617961 is a reply to message #617960] Sat, 05 July 2014 11:42 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Without the DECODE, the progressive relaxation query returns the rows that match the first criteria, which is word1 immediately followed by word2 in any order, rows 1, 2, and 3 below, followed by the rows that match the second criteria, word1 and word2, in any order, row 4 below, followed by the rows that match the third criteria, either word1 or word2, rows 5, 6, 7, and 8 below.

SCOTT@orcl12c> SELECT ROWNUM, test_col FROM test_tab
  2  	     WHERE  CATSEARCH
  3  		      (test_col,
  4  		       '<query>
  5  			  <textquery lang="ENGLISH" grammar="CONTEXT">
  6  			    <progression>
  7  			      <seq>word1 word2</seq>
  8  			      <seq>word1 AND word2</seq>
  9  			      <seq>word1 OR word2</seq>
 10  			    </progression>
 11  			  </textquery>
 12  			  <score datatype="INTEGER" algorithm="COUNT"/>
 13  			</query>',
 14  			null) > 0
 15  /

    ROWNUM TEST_COL
---------- ------------------------------------------------------------
         1 word3 word1 word2
         2 word1 word2 word3
         3 word1 word2
         4 word1 word3 word2
         5 word1 word3
         6 word2 word3
         7 word1
         8 word2

8 rows selected.


If you want to have the exact match, row 3 above, in the first position, then you need to use DECODE to identify that match and assign a row number of 0 that will rank before the other row numbers. Note that the nested sub-query that I used before does not appear to be necessary.

SCOTT@orcl12c> SELECT DECODE (test_col, 'word1 word2', 0, ROWNUM) rn, test_col FROM test_tab
  2  	     WHERE  CATSEARCH
  3  		      (test_col,
  4  		       '<query>
  5  			  <textquery lang="ENGLISH" grammar="CONTEXT">
  6  			    <progression>
  7  			      <seq>word1 word2</seq>
  8  			      <seq>word1 AND word2</seq>
  9  			      <seq>word1 OR word2</seq>
 10  			    </progression>
 11  			  </textquery>
 12  			  <score datatype="INTEGER" algorithm="COUNT"/>
 13  			</query>',
 14  			null) > 0
 15  ORDER  BY DECODE (test_col, 'word1 word2', 0, ROWNUM)
 16  /

        RN TEST_COL
---------- ------------------------------------------------------------
         0 word1 word2
         1 word3 word1 word2
         2 word1 word2 word3
         4 word1 word3 word2
         5 word1 word3
         6 word2 word3
         7 word1
         8 word2

8 rows selected.



Re: Oracle Text index catsearch order by close match [message #618042 is a reply to message #617961] Mon, 07 July 2014 02:09 Go to previous messageGo to next message
ramesh.sannapureddy
Messages: 5
Registered: June 2014
Location: Bangalore
Junior Member
Hello Barbara,

Thanks for your explanation. I understood concept.
When I try to add <seq>word1 AND word2</seq> only without other seq's it is not giving any results. Please provide the solution, if we want to check whether the both words exists in any order what would be the condition.
I Tried as <seq>word1 AND word2 WITHIN name</seq>, here name is the node of xml, which is existing in test_col. I am adding the data in test_col in xml format.

One more thing is, After adding the sequences, I have some conditions which has to be applied,
For ex: if the search text present in first name, last name or middle name exists, then results has to be displayed, along with that in the criteria mentioned as country is USA, then data has to be selected only for USA, not for other countries. If I add one more seq then it is applying "or" condition and not the "And" condition. So please provide me the solution for this as well.

Thank you in advance.
Re: Oracle Text index catsearch order by close match [message #618102 is a reply to message #618042] Mon, 07 July 2014 18:21 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Quote:

...When I try to add <seq>word1 AND word2</seq> only without other seq's it is not giving any results....


Then, you are doing something wrong. It works for me, as demonstrated below. You need to start providing a copy and paste of a run of a script for table creation, data insertion, index creation, and query, as I have provided below, so that we can see what you are doing wrong.

SCOTT@orcl12c> CREATE TABLE test_tab (test_col	VARCHAR2(60))
  2  /

Table created.

SCOTT@orcl12c> INSERT ALL
  2  INTO test_tab VALUES ('<name>word1 word3 word2</name><country>USA</country>')
  3  INTO test_tab VALUES ('<name>word3 word1 word2</name><country>USA</country>')
  4  INTO test_tab VALUES ('<name>word1 word2 word3</name><country>USA</country>')
  5  INTO test_tab VALUES ('<name>word1 word3</name><country>USA</country>')
  6  INTO test_tab VALUES ('<name>word2 word3</name><country>USA</country>')
  7  INTO test_tab VALUES ('<name>word1 word2</name><country>USA</country>')
  8  INTO test_tab VALUES ('<name>word1</name><country>USA</country>')
  9  INTO test_tab VALUES ('<name>word2</name><country>USA</country>')
 10  INTO test_tab VALUES ('<name>other words</name><country>USA</country>')
 11  INTO test_tab VALUES ('<name>word2 word1</name><country>USA</country>')
 12  INTO test_tab VALUES ('<name>word1 word3 word2</name><country>INDIA</country>')
 13  INTO test_tab VALUES ('<name>word3 word1 word2</name><country>INDIA</country>')
 14  INTO test_tab VALUES ('<name>word1 word2 word3</name><country>INDIA</country>')
 15  INTO test_tab VALUES ('<name>word1 word3</name><country>INDIA</country>')
 16  INTO test_tab VALUES ('<name>word2 word3</name><country>INDIA</country>')
 17  INTO test_tab VALUES ('<name>word1 word2</name><country>INDIA</country>')
 18  INTO test_tab VALUES ('<name>word1</name><country>INDIA</country>')
 19  INTO test_tab VALUES ('<name>word2</name><country>INDIA</country>')
 20  INTO test_tab VALUES ('<name>other words</name><country>INDIA</country>')
 21  INTO test_tab VALUES ('<name>word2 word1</name><country>INDIA</country>')
 22  SELECT * FROM DUAL
 23  /

20 rows created.

SCOTT@orcl12c> CREATE INDEX test_col_ctxcat ON test_tab (test_col) INDEXTYPE IS CTXSYS.CTXCAT
  2  /

Index created.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CATSEARCH
  3  	      (test_col,
  4  	       '<query>
  5  		  <textquery lang="ENGLISH" grammar="CONTEXT">
  6  		    <progression>
  7  		      <seq>word1 AND word2</seq>
  8  		    </progression>
  9  		  </textquery>
 10  		  <score datatype="INTEGER" algorithm="COUNT"/>
 11  		</query>',
 12  		null) > 0
 13  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>
<name>word1 word3 word2</name><country>INDIA</country>
<name>word3 word1 word2</name><country>INDIA</country>
<name>word1 word2 word3</name><country>INDIA</country>
<name>word1 word2</name><country>INDIA</country>
<name>word2 word1</name><country>INDIA</country>

10 rows selected.


Quote:

...if we want to check whether the both words exists in any order what would be the condition....


You could use the syntax shown above or the one shown below. Catsearch automatically applies an AND condition between words.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CATSEARCH (test_col, 'word1 word2', null) > 0
  3  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>
<name>word1 word3 word2</name><country>INDIA</country>
<name>word3 word1 word2</name><country>INDIA</country>
<name>word1 word2 word3</name><country>INDIA</country>
<name>word1 word2</name><country>INDIA</country>
<name>word2 word1</name><country>INDIA</country>

10 rows selected.


Quote:

...I Tried as <seq>word1 AND word2 WITHIN name</seq>, here name is the node of xml, which is existing in test_col. I am adding the data in test_col in xml format.


Catsearch does not support WITHIN, even with query template. You will need to use a CONTEXT index with a section group and sections, in order to search WITHIN specific sections using CONTAINS, as demonstrated below.

SCOTT@orcl12c> CREATE INDEX test_col_context ON test_tab (test_col) INDEXTYPE IS CTXSYS.CONTEXT
  2  PARAMETERS ('SECTION GROUP  CTXSYS.AUTO_SECTION_GROUP')
  3  /

Index created.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CONTAINS
  3  	      (test_col,
  4  	       '(word1 AND word2) WITHIN name') > 0
  5  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>
<name>word1 word3 word2</name><country>INDIA</country>
<name>word3 word1 word2</name><country>INDIA</country>
<name>word1 word2 word3</name><country>INDIA</country>
<name>word1 word2</name><country>INDIA</country>
<name>word2 word1</name><country>INDIA</country>

10 rows selected.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CONTAINS
  3  	      (test_col,
  4  	       '<query>
  5  		  <textquery lang="ENGLISH" grammar="CONTEXT">
  6  		    <progression>
  7  		      <seq>(word1 AND word2) WITHIN name</seq>
  8  		    </progression>
  9  		  </textquery>
 10  		  <score datatype="INTEGER" algorithm="COUNT"/>
 11  		</query>',
 12  	       1) > 0
 13  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>
<name>word1 word3 word2</name><country>INDIA</country>
<name>word3 word1 word2</name><country>INDIA</country>
<name>word1 word2 word3</name><country>INDIA</country>
<name>word1 word2</name><country>INDIA</country>
<name>word2 word1</name><country>INDIA</country>

10 rows selected.


Quote:

...After adding the sequences, I have some conditions which has to be applied,
For ex: if the search text present in first name, last name or middle name exists, then results has to be displayed, along with that in the criteria mentioned as country is USA, then data has to be selected only for USA, not for other countries. If I add one more seq then it is applying "or" condition and not the "And" condition. So please provide me the solution for this as well....


The following uses the CONTEXT index with section group from the previous example. The first query lists only what you asked for. The second query uses that criteria as the only sequence in a progressive relaxation. The third query uses that sequence as the first of multiple sequences, so that those rows are returned on top. If you want the country restriction applied to the other sequences, then you will need to add it, as in the first sequence.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CONTAINS
  3  	      (test_col,
  4  	       '(word1 AND word2) WITHIN name AND
  5  		USA WITHIN country') > 0
  6  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>

5 rows selected.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CONTAINS
  3  	      (test_col,
  4  	       '<query>
  5  		  <textquery lang="ENGLISH" grammar="CONTEXT">
  6  		    <progression>
  7  		      <seq>(word1 AND word2) WITHIN name AND USA WITHIN country</seq>
  8  		    </progression>
  9  		  </textquery>
 10  		  <score datatype="INTEGER" algorithm="COUNT"/>
 11  		</query>',
 12  	       1) > 0
 13  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>

5 rows selected.

SCOTT@orcl12c> SELECT * FROM test_tab
  2  WHERE  CONTAINS
  3  	      (test_col,
  4  	       '<query>
  5  		  <textquery lang="ENGLISH" grammar="CONTEXT">
  6  		    <progression>
  7  		      <seq>(word1 AND word2) WITHIN name AND USA WITHIN country</seq>
  8  		      <seq>word1 word2</seq>
  9  		      <seq>word1 AND word2</seq>
 10  		      <seq>word1 OR word2</seq>
 11  		    </progression>
 12  		  </textquery>
 13  		  <score datatype="INTEGER" algorithm="COUNT"/>
 14  		</query>',
 15  	       1) > 0
 16  /

TEST_COL
------------------------------------------------------------
<name>word1 word3 word2</name><country>USA</country>
<name>word3 word1 word2</name><country>USA</country>
<name>word1 word2 word3</name><country>USA</country>
<name>word1 word2</name><country>USA</country>
<name>word2 word1</name><country>USA</country>
<name>word3 word1 word2</name><country>INDIA</country>
<name>word1 word2 word3</name><country>INDIA</country>
<name>word1 word2</name><country>INDIA</country>
<name>word1 word3 word2</name><country>INDIA</country>
<name>word2 word1</name><country>INDIA</country>
<name>word1 word3</name><country>USA</country>
<name>word2 word3</name><country>USA</country>
<name>word1</name><country>USA</country>
<name>word2</name><country>USA</country>
<name>word1 word3</name><country>INDIA</country>
<name>word2 word3</name><country>INDIA</country>
<name>word1</name><country>INDIA</country>
<name>word2</name><country>INDIA</country>

18 rows selected.


Previous Topic: Text Index return min of words specified
Next Topic: How to use Oracle text (split from hijacked topic "Find Words in a String")
Goto Forum:
  


Current Time: Thu Mar 28 15:26:00 CDT 2024