2017-03-11

chg


Revision as of 11:33, 11 March 2017

(4 intermediate revisions by the same user not shown)

Line 1:

Line 1:



{{Note|This page is work in progress.|important}}

{{InfoBox

{{InfoBox



|Description=Fulltext search support

+

|Name=Full-text search

+

|Description=Full-text search support for properties whose datatypes store strings of characters or text in their database tables, e.g. [[Help:Type Text|"Text"]], [[Help:Type Page|"Page"]] or [[Help:Type URL|"URL"]].

|Keyword=queries;fulltext;full-text;full-text search

|Keyword=queries;fulltext;full-text;full-text search

}}

}}

+

[[Semantic MediaWiki 2.5.0]] adds experimental support for accessing the full-text capabilities of relational databases (SQL back-end) for properties whose datatypes store strings of characters or text in their database tables, e.g. [[Help:Type Text|"Text"]], [[Help:Type Page|"Page"]] or [[Help:Type URL|"URL"]]. These datatypes use either <code>CHAR</code>, <code>VARCHAR</code>, or <code>TEXT</code> to store their data in the database tables.



[[Semantic MediaWiki 2.5.0]] adds experimental support (not enabled by default) for accessing the full-text capabilities of the SQL back-end. Support was added for MySQL[[CiteRef::gh:smw:1481]] and SQLite[[CiteRef::gh:smw:1801]] while Postgres is currently not supported[[CiteRef::gh:smw:1956]][[CiteRef::postgres:ft:vector]].

+

{{Note|



+

* This feature is not [[Help:$smwgEnabledFulltextSearch|enabled by default]] since it is still considered experimental. It may be enabled for the wiki with [[Help:$smwgEnabledFulltextSearch|configuration parameter <code>$smwgEnabledFulltextSearch</code>]].



The search is only supported by the <code>SQLStore</code> (the <code>SPARQLStore</code> requires the native support of full-text search capabilities by the triple-store).

+

* Support was added for MySQL[[CiteRef::gh:smw:1481]] and SQLite[[CiteRef::gh:smw:1801]] while PostgreSQL[[CiteRef::gh:smw:1956]][[CiteRef::postgres:ft:vector]] is currently not supported.



+

* Only [[Help:$smwgDefaultStore|<code>SMWSQLStore3</code>]] is supported since the <code>SPARQLStore</code> would require the native support of full-text search capabilities by the triple-store.



==Requirements==

+

}}



* Semantic MediaWiki 2.5+

+

== Requirements ==



* MySQL 5.5+ / MariaDB 10.0.5+[[CiteRef::gh:smw:1481]]

+

* [[Semantic MediaWiki 2.5.0]]+

+

* MySQL 5.5+ or MariaDB 10.0.5+[[CiteRef::gh:smw:1481]]

* SQLite 3.8+[[CiteRef::gh:smw:1801]]

* SQLite 3.8+[[CiteRef::gh:smw:1801]]

* PHP 5.5+

* PHP 5.5+



* <code>SQLStore</code>

+

* [[Help:$smwgDefaultStore|<code>SMWSQLStore3</code>]]

== Features and limitations ==

== Features and limitations ==

+

* The <code>FT_SEARCH</code> table aggregates search content for datatypes storing their data as <code>BLOB</code> and <code>URI</code> values; index searches are executed against this table

+

* Supported operations rely on the relational backend database ([https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html MySQL], [https://mariadb.com/kb/en/mariadb/fulltext-index-overview/ MariaDB] and [https://sqlite.org/fts3.html SQLite])

+

* For MySQL and MariaDB databases, <code>IN BOOLEAN MODE</code> is used as default search mode

+

* Relevance and scores are not used for any sorting purpose, e.g. as in best match

+

* <code>TextSanitizer</code> relies on the <code>[https://github.com/onoi/tesa onoi/tesa]</code> library to sanitize text and string elements, which provides some text manipulation support and, if enabled, language detection. This library is pre-installed for use by Semantic MediaWiki.

+

* Custom stopwords are only applied by the <code>onoi/tesa</code> library when language detection is enabled, whereas MySQL/MariaDB provide their own standard list [[CiteRef::mysql:fulltext:stopwords]], which is enabled by default
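
The idea of an aggregated search table can be illustrated with SQLite's full-text extension. The following is a minimal sketch and not Semantic MediaWiki's implementation: the table name <code>ft_search_sketch</code>, its columns and the sample rows are hypothetical, and it assumes that the Python <code>sqlite3</code> module was built with FTS4 support.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS4 table and fill it with (subject, text) rows.

    The table and column names are hypothetical and only serve this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE ft_search_sketch USING fts4(subject, text_value)"
    )
    conn.executemany(
        "INSERT INTO ft_search_sketch (subject, text_value) VALUES (?, ?)", rows
    )
    return conn


def search(conn, query):
    """Return the subjects whose text matches the full-text query."""
    cursor = conn.execute(
        "SELECT subject FROM ft_search_sketch "
        "WHERE text_value MATCH ? ORDER BY subject",
        (query,),
    )
    return [row[0] for row in cursor]


# Text and URL values end up in the same searchable table.
conn = build_index(
    [
        ("Page A", "Semantic MediaWiki adds full-text search"),
        ("Page B", "https://example.org/search/docs"),
        ("Page C", "Relational database tables"),
    ]
)
```

Calling <code>search(conn, "search")</code> matches both the text value and the URL value, because the default tokenizer splits the URL at its punctuation characters.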



* The <code>FT_SEARCH</code> table aggregates search content for <code>BLOB</code> and <code>URI</code> values; index searches are executed against this table

+

=== Chinese, Japanese, and Korean support (CJK) ===



* Supported operations rely on the backend database ([https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html MySQL], [https://mariadb.com/kb/en/mariadb/fulltext-index-overview/ MariaDB])



* For MySQL and MariaDB, <code>IN BOOLEAN MODE</code> is used as default search mode



* relevance and scores are not used for any sorting purpose (e.g. as in best match)



* <code>TextSanitizer</code> relies on <code>[https://github.com/onoi/tesa onoi/tesa]</code> to provide some text manipulation support as well as a possibility to use language detection if enabled



* Custom stopwords are only applied by <code>onoi/tesa</code> when language detection is enabled, whereas MySQL/MariaDB provide their own standard list [[CiteRef::mysql:fulltext:stopwords]], which is enabled by default
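
For MySQL and MariaDB, boolean mode means that operators in the search string are interpreted, such as <code>+</code> (term must be present), <code>-</code> (term must not be present) and a trailing <code>*</code> (prefix match). The sketch below only assembles such a search string together with a parameterised statement. The table and column names are hypothetical, and the statement Semantic MediaWiki actually generates may differ.

```python
def boolean_mode_expression(required=(), excluded=(), prefix=()):
    """Assemble a search string using MySQL/MariaDB boolean-mode operators."""
    parts = [f"+{term}" for term in required]   # term must be present
    parts += [f"-{term}" for term in excluded]  # term must not be present
    parts += [f"{term}*" for term in prefix]    # words starting with term
    return " ".join(parts)


# Hypothetical table and column names, used for illustration only.
# "%s" is the placeholder style of common Python MySQL drivers.
BOOLEAN_MODE_SQL = (
    "SELECT subject FROM ft_search_sketch "
    "WHERE MATCH(text_value) AGAINST (%s IN BOOLEAN MODE)"
)

expression = boolean_mode_expression(
    required=["semantic"], excluded=["draft"], prefix=["wiki"]
)
```

Passing <code>expression</code> as a bound parameter rather than concatenating it into the statement keeps user input out of the SQL text.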





=== CJK support ===



* General CJK support is a challenging endeavour because text elements have to be broken into corresponding tokens that are not separated by spaces

* General CJK support is a challenging endeavour because text elements have to be broken into corresponding tokens that are not separated by spaces



* <code>onoi/tesa</code> provides some simple <code>Tokenizer</code>s (which do not require language detection) that will try to provide rudimentary CJK search out of the box (expects ICU 54+)

+

* The <code>onoi/tesa</code> library provides some simple <code>Tokenizer</code>s which do not require language detection and will try to provide rudimentary CJK search out of the box. This, however, requires ICU 54+, which is still not being used by MediaWiki.

* [http://mroonga.org/docs/characteristic.html Mroonga] is a MySQL storage engine and said to be a CJK-ready fulltext search, column store

* [http://mroonga.org/docs/characteristic.html Mroonga] is a MySQL storage engine and said to be a CJK-ready fulltext search, column store

* MySQL comes with an optional [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html ngram Full-Text Parser] and [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html MeCab Full-Text Parser Plugin].

* MySQL comes with an optional [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html ngram Full-Text Parser] and [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html MeCab Full-Text Parser Plugin].



* According to https://jira.mariadb.org/browse/MDEV-10267, MariaDB is missing those parser plug-ins

+

* According to [https://jira.mariadb.org/browse/MDEV-10267 this issue], MariaDB is missing those parser plug-ins
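
To illustrate why text without spaces needs special tokenization, the following minimal sketch splits a string into overlapping n-grams, the general approach taken by n-gram parsers. It is a simplified illustration under stated assumptions and is neither the <code>onoi/tesa</code> tokenizer nor the MySQL ngram parser; the function name and the default size of 2 are choices made for this example.

```python
def ngram_tokens(text, size=2):
    """Split text into overlapping character n-grams.

    Whitespace separates chunks; chunks shorter than `size` are kept whole.
    """
    tokens = []
    for chunk in text.split():
        if len(chunk) < size:
            tokens.append(chunk)
            continue
        # Slide a window of `size` characters over the chunk.
        tokens.extend(
            chunk[i:i + size] for i in range(len(chunk) - size + 1)
        )
    return tokens


# A Japanese string without spaces yields three overlapping bigrams.
bigrams = ngram_tokens("全文検索")
```

A search term is tokenized the same way, so a query for <code>検索</code> can match the indexed bigram even though the source text contains no word boundaries.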





== Settings and configurations ==

+

== Configuration settings ==

* <code>[[Help:$smwgEnabledFulltextSearch|$smwgEnabledFulltextSearch]]</code> to enable the feature

* <code>[[Help:$smwgEnabledFulltextSearch|$smwgEnabledFulltextSearch]]</code> to enable the feature

* <code>[[Help:$smwgFulltextDeferredUpdate|$smwgFulltextDeferredUpdate]]</code>

* <code>[[Help:$smwgFulltextDeferredUpdate|$smwgFulltextDeferredUpdate]]</code>

Line 44:

Line 43:

* <code>[[Help:$smwgFulltextSearchIndexableDataTypes|$smwgFulltextSearchIndexableDataTypes]]</code>

* <code>[[Help:$smwgFulltextSearchIndexableDataTypes|$smwgFulltextSearchIndexableDataTypes]]</code>



Changes to any of the above settings require re-running the <code>[[Help:rebuildFulltextSearchTable.php|rebuildFulltextSearchTable.php]]</code> script.

+

{{Note|Changes to any of the above settings require re-running the [[Help:rebuildFulltextSearchTable.php|"rebuildFulltextSearchTable.php" maintenance script]].}}

== Manuals and instructions ==

== Manuals and instructions ==



+

; for users



This feature is not enabled by default. Please read the following manuals and instructions for insights on:

+

* [[Help:Full-text search/Indexing|Indexing]] describes some methods on how to manually create and update the index table



+

; for system administrators



* [[Fulltext/Indexing|Indexing]] describes some methods on how to manually create and update the index table

+

* [[Help:Full-text search/Searching|Searching]] contains some examples and descriptions about the available search syntax



* [[Fulltext/Searching|Searching]] contains some examples and descriptions about the available search syntax

+

; for developers



* [[Fulltext/Technical notes|Technical notes]] has some notes about the technical implementation, fine-tuning, and performance

+

* [[Help:Full-text search/Technical notes|Technical notes]] has some notes about the technical implementation, fine-tuning, and performance

== See also ==

== See also ==



* [[Case insensitive sortkey support]]

+

* Help page on [[Help:Case insensitive sortkey support|case insensitive sortkey support]]

<div style="display: none">

<div style="display: none">

Line 68:

Line 67:

{{#scite:gh:smw:1956

{{#scite:gh:smw:1956

|type=pullrequest

|type=pullrequest



|citation text=[https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/1956 Semantic MediaWiki: GitHub pull request #1956] notes that "... any interested developer who is eager to help with implementing a Postgres solution ..."

+

|citation text=[https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/1956 Semantic MediaWiki: GitHub pull request #1956] notes that "... any interested developer who is eager to help with implementing a PostgreSQL solution ..."

}}

}}

{{#scite:gh:smw:2122

{{#scite:gh:smw:2122
