Revision as of 11:33, 11 March 2017
(4 intermediate revisions by the same user not shown)
Line 1:
Line 1:
−
{{Note|This page is work in progress.|important}}
{{InfoBox
{{InfoBox
−
|Description=Fulltext search support
+
|Name=Full-text search
+
|Description=Full-text search support for properties whose data types store their values as strings of characters or text in the database tables, e.g. [[Help:Type Text|"Text"]], [[Help:Type Page|"Page"]] or [[Help:Type URL|"URL"]].
|Keyword=queries;fulltext;full-text;full-text search
|Keyword=queries;fulltext;full-text;full-text search
}}
}}
+
[[Semantic MediaWiki 2.5.0]] adds experimental support for accessing the full-text capabilities of the relational database (SQL back-end) for properties whose data types store their values as strings of characters or text, e.g. [[Help:Type Text|"Text"]], [[Help:Type Page|"Page"]] or [[Help:Type URL|"URL"]]. These datatypes use either <code>CHAR</code>, <code>VARCHAR</code>, or <code>TEXT</code> to store their data in the database tables.
−
[[Semantic MediaWiki 2.5.0]] adds an experimental support (not enabled by default) for accessing the full-text capabilities of the SQL back-end. Support was added for MySQL[[CiteRef::gh:smw:1481]] and SQLite[[CiteRef::gh:smw:1801]] while Postgres is currently not supported[[CiteRef::gh:smw:1956]][[CiteRef::postgres:ft:vector]].
+
{{Note|
−
+
* This feature is not [[Help:$smwgEnabledFulltextSearch|enabled by default]] since it is still considered experimental. It can be enabled for the wiki with the [[Help:$smwgEnabledFulltextSearch|configuration parameter <code>$smwgEnabledFulltextSearch</code>]].
−
The search is only supported by the <code>SQLStore</code> (the <code>SPARQLStore</code> requires the native support of full-text search capabilities by the triple-store).
+
* Support was added for MySQL[[CiteRef::gh:smw:1481]] and SQLite[[CiteRef::gh:smw:1801]], while PostgreSQL[[CiteRef::gh:smw:1956]][[CiteRef::postgres:ft:vector]] is currently not supported.
−
+
* Only [[Help:$smwgDefaultStore|<code>SMWSQLStore3</code>]] is supported, since the <code>SPARQLStore</code> would require native support of full-text search capabilities by the triple-store.
−
==Requirements==
+
}}
−
* Semantic MediaWiki 2.5+
+
== Requirements ==
−
* MySQL 5.5+ / MariaDB 10.0.5+[[CiteRef::gh:smw:1481]]
+
* [[Semantic MediaWiki 2.5.0]]
+
+
* MySQL 5.5+ or MariaDB 10.0.5+[[CiteRef::gh:smw:1481]]
* SQLite 3.8+[[CiteRef::gh:smw:1801]]
* SQLite 3.8+[[CiteRef::gh:smw:1801]]
* PHP 5.5+
* PHP 5.5+
−
* <code>SQLStore</code>
+
* [[Help:$smwgDefaultStore|<code>SMWSQLStore3</code>]]
== Features and limitations ==
== Features and limitations ==
+
* The <code>FT_SEARCH</code> table aggregates search content for datatypes that store their data as <code>BLOB</code> and <code>URI</code> values; index searches are executed against this table
+
* Supported operations rely on the relational backend database ([https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html MySQL], [https://mariadb.com/kb/en/mariadb/fulltext-index-overview/ MariaDB] and [https://sqlite.org/fts3.html SQLite])
+
* For MySQL and MariaDB databases, <code>IN BOOLEAN MODE</code> is used as default search mode
+
* Relevance and scores are not used for any sorting purpose, e.g. as in best match
+
* <code>TextSanitizer</code> relies on the <code>[https://github.com/onoi/tesa onoi/tesa]</code> library to sanitize text or string elements. The library provides some text manipulation support and, if enabled, language detection. It is pre-installed for use by Semantic MediaWiki.
+
* Custom stopwords are applied by the <code>onoi/tesa</code> library only if language detection is enabled, but MySQL/MariaDB provide their own standard list[[CiteRef::mysql:fulltext:stopwords]], which is enabled by default
−
* <code>FT_SEARCH</code> table aggregated search content for <code>BLOB</code> and <code>URI</code> values against an index search is being executed
+
=== Chinese, Japanese, and Korean support (CJK) ===
−
* Supported operations rely on the backend database ([https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html MySQL], [https://mariadb.com/kb/en/mariadb/fulltext-index-overview/ MariaDB])
−
* For MySQL and MariaDB, <code>IN BOOLEAN MODE</code> is used as default search mode
−
* relevance and scores are not used for any sorting purpose (e.g. as in best match)
−
* <code>TextSanitizer</code> relies on <code>[https://github.com/onoi/tesa onoi/tesa]</code> to provide some text manipulation support as well as a possibility to use language detection if enabled
−
* Custom stopwords are only applied by <code>onoi/tesa</code> in case the language detection is enabled but MySQL/MariaDB provide their own standard list [[CiteRef::mysql:fulltext:stopwords]] which are enabled by default
−
−
=== CJK support ===
−
* General CJK support is a challenging endeavour because text elements have to be broken into corresponding tokens that are not separated by spaces
* General CJK support is a challenging endeavour because text elements have to be broken into corresponding tokens that are not separated by spaces
−
* <code>onoi/tesa</code> provides some simple <code>Tokenizer</code>'s (which doesn't require language detection) that will try to provide rudimentary CJK search out-of-the box (expects ICU 54+)
+
* The <code>onoi/tesa</code> library provides some simple <code>Tokenizer</code>s that do not require language detection and try to provide rudimentary CJK search out of the box. This, however, requires ICU 54+, which is still not being used by MediaWiki.
* [http://mroonga.org/docs/characteristic.html Mroonga] is a MySQL storage engine and said to be a CJK-ready fulltext search, column store
* [http://mroonga.org/docs/characteristic.html Mroonga] is a MySQL storage engine and said to be a CJK-ready fulltext search, column store
* MySQL comes with an optional [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html ngram Full-Text Parser] and [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html MeCab Full-Text Parser Plugin].
* MySQL comes with an optional [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html ngram Full-Text Parser] and [https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html MeCab Full-Text Parser Plugin].
−
*According to https://jira.mariadb.org/browse/MDEV-10267, MariadDB is missing those parser plug-ins
+
* According to [https://jira.mariadb.org/browse/MDEV-10267 this issue], MariaDB is missing those parser plug-ins
−
−
== Settings and configurations ==
+
== Configuration settings ==
* <code>[[Help:$smwgEnabledFulltextSearch|$smwgEnabledFulltextSearch]]</code> to enable the feature
* <code>[[Help:$smwgEnabledFulltextSearch|$smwgEnabledFulltextSearch]]</code> to enable the feature
* <code>[[Help:$smwgFulltextDeferredUpdate|$smwgFulltextDeferredUpdate]]</code>
* <code>[[Help:$smwgFulltextDeferredUpdate|$smwgFulltextDeferredUpdate]]</code>
Line 44:
Line 43:
* <code>[[Help:$smwgFulltextSearchIndexableDataTypes|$smwgFulltextSearchIndexableDataTypes]]</code>
* <code>[[Help:$smwgFulltextSearchIndexableDataTypes|$smwgFulltextSearchIndexableDataTypes]]</code>
−
Changes to any of the above settings, requires to re-run the <code>[[Help:rebuildFulltextSearchTable.php|rebuildFulltextSearchTable.php]]</code> script.
+
{{Note|Changing any of the above settings requires re-running the [[Help:rebuildFulltextSearchTable.php|"rebuildFulltextSearchTable.php" maintenance script]].}}
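The settings listed above are ordinary PHP configuration variables. Enabling the feature could look roughly like the following in <code>LocalSettings.php</code>. This is a minimal sketch rather than an authoritative setup: it assumes that Semantic MediaWiki is already installed and loaded before this line, and that the boolean value <code>true</code> is what switches the feature on.

<syntaxhighlight lang="php">
// LocalSettings.php (sketch; assumes Semantic MediaWiki has been loaded above this line)

// Turn on the experimental full-text search support, which is disabled by default.
$smwgEnabledFulltextSearch = true;
</syntaxhighlight>

After the setting has been changed, the index table still needs to be rebuilt with the [[Help:rebuildFulltextSearchTable.php|rebuildFulltextSearchTable.php]] maintenance script before searches return results.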
== Manuals and instructions ==
== Manuals and instructions ==
−
+
; for users
−
This feature is not enabled by default, please read the following manuals and instructions for insights on:
+
* [[Help:Full-text search/Indexing|Indexing]] describes some methods on how to manually create and update the index table
−
+
; for system administrators
−
* [[Fulltext/Indexing|Indexing]] describes some methods on how to manually create and update the index table
+
* [[Help:Full-text search/Searching|Searching]] contains some examples and descriptions about the available search syntax
−
* [[Fulltext/Searching|Searching]] contains some examples and descriptions about the available search syntax
+
; for developers
−
* [[Fulltext/Technical notes|Technical notes]] has some notes about the technical implementation, fine-tuning, and performance
+
* [[Help:Full-text search/Technical notes|Technical notes]] has some notes about the technical implementation, fine-tuning, and performance
== See also ==
== See also ==
−
* [[Case insensitive sortkey support]]
+
* Help page on [[Help:Case insensitive sortkey support|case insensitive sortkey support]]
<div style="display: none">
<div style="display: none">
Line 68:
Line 67:
{{#scite:gh:smw:1956
{{#scite:gh:smw:1956
|type=pullrequest
|type=pullrequest
−
|citation text=[https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/1956 Semantic MediaWiki: GitHub pull request #1956] notes that "... any interested developer who is eager to help with implementing a Postgres solution ..."
+
|citation text=[https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/1956 Semantic MediaWiki: GitHub pull request #1956] notes that "... any interested developer who is eager to help with implementing a PostgreSQL solution ..."
}}
}}
{{#scite:gh:smw:2122
{{#scite:gh:smw:2122