Semantic-mediawiki.org

Cyc

2014-01-02

Revision as of 19:55, 2 January 2014

Line 1:

−

'''Cyc''' is a project in the field of [[Sztuczna inteligencja|artificial intelligence]] (AI) that aims to build a complete knowledge base of so-called ''common sense''. This is intended as a foundation that lets AI programs carry out human-like reasoning.

+

'''Cyc''' is an [[List of notable artificial intelligence projects|artificial intelligence project]] that attempts to assemble a comprehensive [[ontology (computer science)|ontology]] and [[knowledge base]] of everyday [[common sense knowledge]], with the goal of enabling [[artificial intelligence|AI]] applications to perform human-like reasoning.

+

+

The project was started in 1984 by [[Douglas Lenat]] at [[Microelectronics and Computer Technology Corporation|MCC]] and is developed by the Cycorp company.

+

Parts of the project are released as '''OpenCyc''', which provides an API, [http://sw.opencyc.org/ RDF endpoint], and [[data dump]] under an [[open source]] license.

{{Infobox info

|Nazwa=Cyc

Line 9:

Line 12:

|Strona internetowa=[http://www.cyc.com www.cyc.com]

}}

−

The project was started in [[1984]] by Dr. [[Doug Lenat]]. Although the name "Cyc" (pronounced ''sajk'') comes from the English word "encyclopedia" ([[encyklopedia]]), the knowledge base built within the project holds far more information about the objects it describes than simple definitions. The structure of the knowledge base allows reasoning to be carried out and conclusions to be drawn automatically. The project was originally planned to run for 10 years, but it is still under active development and it is hard to say whether it will succeed. Cyc is currently owned by the corporation ''Cycorp''. One of the first practical applications of the system is [[CycSecure]], which examines the security of a real [[Sieć komputerowa|computer network]] by simulating attacks on it.

+

==Overview==

+

The project was started in 1984 as part of [[Microelectronics and Computer Technology Corporation]]. The objective was to codify, in machine-usable form, millions of pieces of knowledge that compose human common sense. CycL presented a proprietary knowledge representation schema that utilized first-order relationships.

+

The Cyc Project was spun off into Cycorp, Inc. in [[Austin, Texas]] in 1994.

+

+

The name "Cyc" (from "encyclopedia", pronounced {{IPA|[saɪk]}} like ''syke'') is a registered trademark owned by Cycorp. The original knowledge base is proprietary, but a smaller version of the knowledge base, intended to establish a common vocabulary for automatic reasoning, was released as OpenCyc under an [[open source]] (Apache) license. More recently, Cyc has been made available to AI researchers under a research-purposes license as [[ResearchCyc]].

+

+

Typical pieces of knowledge represented in the database are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly. The Knowledge Base (KB) contains over one million human-defined assertions, rules or common sense ideas. These are formulated in the language [[CycL]], which is based on [[predicate calculus]] and has a [[syntax]] similar to that of the [[Lisp programming language]].
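
The tree example can be sketched in a few lines of Python. This only illustrates the reasoning step; it is not Cyc's inference engine or its API, and the dictionary layout and the function name <code>has_property</code> are assumptions made for the sketch.

```python
# Toy knowledge base: "every tree is a plant" and "plants die eventually".
# The data layout is an assumption for illustration, not how Cyc stores facts.
SUBCLASS_OF = {"Tree": "Plant"}               # every tree is a plant
PROPERTIES = {"Plant": {"dies eventually"}}   # plants die eventually


def has_property(kind, prop):
    """Walk up the 'is a' chain until the property is found or the chain ends."""
    seen = set()
    while kind is not None and kind not in seen:
        if prop in PROPERTIES.get(kind, set()):
            return True
        seen.add(kind)
        kind = SUBCLASS_OF.get(kind)
    return False


print(has_property("Tree", "dies eventually"))  # True
```

The question "do trees die?" is answered by inheriting the property from the more general kind, which is the "obvious conclusion" the paragraph refers to.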

+

+

Much of the current work on the Cyc project continues to be [[knowledge engineering]], representing facts about the world by hand, and implementing efficient inference mechanisms on that knowledge. Increasingly, however, work at Cycorp involves giving the Cyc system the ability to communicate with end users in [[natural language]], and to assist with the [[knowledge formation]] process via [[machine learning]].

+

+

Like many companies, Cycorp has ambitions to use the Cyc [http://www.cyc.com/cyc/cycrandd/areasofrandd_dir/cycrandd/nlu natural language understanding tools] to parse the entire internet and extract structured data from it.

+

+

In 2008, Cyc resources were mapped to many [[Wikipedia]] articles, which could make it easier to connect Cyc with other open datasets such as [[DBpedia]] and [[Freebase (database)|Freebase]].

+

+

==Knowledge base==

+

The concept names in Cyc are known as ''constants''. Constants start with an optional "#$" and are case-sensitive. There are constants for:

+

* Individual items known as ''individuals'', such as #$BillClinton or #$France.

+

* ''Collections'', such as #$Tree-ThePlant (containing all trees) or #$EquivalenceRelation (containing all [[equivalence relation]]s). A member of a collection is called an ''instance'' of that collection.

+

* ''Truth Functions'', which can be applied to one or more other concepts and return either true or false. For example, #$siblings is the sibling relationship, true if the two arguments are siblings. By convention, truth function constants start with a lower-case letter. Truth functions may be broken down into logical connectives (such as #$and, #$or, #$not, #$implies), quantifiers (#$forAll, #$thereExists, etc.) and [[Predicate (logic)|predicate]]s.

+

* ''Functions'', which produce new terms from given ones. For example, #$FruitFn, when provided with an argument describing a type (or collection) of plants, will return the collection of its fruits. By convention, function constants start with an upper-case letter and end with the string "Fn".
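
The naming conventions in the list above can be sketched as a small Python helper. The function name <code>classify_constant</code> and the returned labels are assumptions made for this illustration; because the rules are only conventions, the result is a guess based on the name, not a lookup in the knowledge base.

```python
def classify_constant(name):
    """Guess the kind of a Cyc constant from the naming conventions alone."""
    # The "#$" prefix is optional, so strip it if present.
    bare = name[2:] if name.startswith("#$") else name
    if not bare:
        raise ValueError("empty constant name")
    if bare[0].islower():
        # Truth functions (connectives, quantifiers, predicates) start lower-case.
        return "truth function"
    if bare[0].isupper() and bare.endswith("Fn"):
        # Functions start upper-case and end with "Fn".
        return "function"
    # Individuals and collections cannot be told apart by name alone.
    return "individual or collection"


print(classify_constant("#$siblings"))  # truth function
print(classify_constant("#$FruitFn"))   # function
print(classify_constant("France"))      # individual or collection
```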

+

+

The most important predicates are #$isa and #$genls. The first one describes that one item is an [[Instance (computer science)|instance]] of some collection, the second one that one collection is a subcollection of another one. Facts about concepts are asserted using certain CycL ''sentences''. Predicates are written before their arguments, in parentheses:

+

(#$isa #$BillClinton #$UnitedStatesPresident)

+

"Bill Clinton belongs to the collection of U.S. presidents" and

+

(#$genls #$Tree-ThePlant #$Plant)

+

"All trees are plants".

+

(#$capitalCity #$France #$Paris)

+

"Paris is the capital of France."
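
Because predicates are written before their arguments inside parentheses, sentences like these can be read into nested lists with a short recursive parser. The following Python sketch is an illustration only: <code>parse_cycl</code> is a hypothetical helper, not part of any Cyc API, and it handles just constants, variables and parentheses (no quoted strings containing spaces).

```python
import re


def parse_cycl(text):
    """Parse one CycL sentence into nested Python lists of token strings."""
    # Tokens are either a single parenthesis or a run of non-space, non-paren characters.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # skip the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after sentence")
    return result


print(parse_cycl("(#$isa #$BillClinton #$UnitedStatesPresident)"))
# ['#$isa', '#$BillClinton', '#$UnitedStatesPresident']
```

The first element of each list is the predicate and the remaining elements are its arguments, mirroring the prefix notation of the sentences above.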

+

+

Sentences can also contain variables, strings starting with "?". These sentences are called "rules". One important rule asserted about the #$isa predicate reads

+

(#$implies

+

(#$and

+

(#$isa ?OBJ ?SUBSET)

+

(#$genls ?SUBSET ?SUPERSET))

+

(#$isa ?OBJ ?SUPERSET))

+

with the interpretation "if OBJ is an instance of the collection [[subset|SUBSET]] and SUBSET is a subcollection of [[superset|SUPERSET]], then OBJ is an instance of the collection SUPERSET". Another typical example is

+

(#$relationAllExists #$biologicalMother #$ChordataPhylum #$FemaleAnimal)

+

which means that for every instance of the collection #$ChordataPhylum (i.e. for every [[chordate]]), there exists a female animal (instance of #$FemaleAnimal) which is its mother (described by the predicate #$biologicalMother).
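
The #$isa/#$genls rule above can be illustrated by applying it repeatedly until no new facts appear. This Python sketch is a naive fixed-point loop written for clarity, not a description of how Cyc performs inference; the function name <code>close_isa</code> is an assumption, and the #$genls facts in the example are hypothetical and not taken from the Cyc knowledge base.

```python
def close_isa(isa_facts, genls_facts):
    """Apply isa(x, A) and genls(A, B) => isa(x, B) until nothing new appears."""
    derived = set(isa_facts)
    changed = True
    while changed:
        changed = False
        for obj, subset in list(derived):
            for sub, superset in genls_facts:
                if sub == subset and (obj, superset) not in derived:
                    derived.add((obj, superset))
                    changed = True
    return derived


isa = {("#$BillClinton", "#$UnitedStatesPresident")}
# Hypothetical genls facts, invented for this example.
genls = {
    ("#$UnitedStatesPresident", "#$Politician"),
    ("#$Politician", "#$Person"),
}
result = close_isa(isa, genls)
print(("#$BillClinton", "#$Person") in result)  # True
```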

+

+

The [[knowledge base]] is divided into ''microtheories'' (Mt), collections of concepts and facts typically pertaining to one particular realm of knowledge. Unlike the knowledge base as a whole, each microtheory is required to be free from contradictions. Each microtheory has a name which is a regular constant; microtheory constants contain the string "Mt" by convention. An example is #$MathMt, the microtheory containing mathematical knowledge. The microtheories can inherit from each other and are organized in a hierarchy:

+

one specialization of #$MathMt is #$GeometryGMt, the microtheory about geometry.
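
The inheritance between microtheories can be sketched as a lookup that also walks up the hierarchy. The Python below assumes that a specialization sees everything asserted in the microtheories it specializes; the assertions are hypothetical strings invented for the example, and <code>visible_assertions</code> is not a Cyc API function.

```python
# Each microtheory lists the microtheories it inherits from.
PARENTS = {
    "#$GeometryGMt": ["#$MathMt"],
    "#$MathMt": [],
}

# Hypothetical assertions per microtheory, written as plain strings for brevity.
ASSERTIONS = {
    "#$MathMt": {"addition is commutative"},
    "#$GeometryGMt": {"a triangle has three sides"},
}


def visible_assertions(mt):
    """Collect assertions made in mt or inherited from its ancestors."""
    result = set()
    stack = [mt]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result |= ASSERTIONS.get(current, set())
        stack.extend(PARENTS.get(current, []))
    return result


print(sorted(visible_assertions("#$GeometryGMt")))
# ['a triangle has three sides', 'addition is commutative']
```

Keeping facts local to a microtheory is what allows each one to be checked for contradictions on its own while the knowledge base as a whole is not.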

−

The database, the so-called ''knowledge base'' (KB), is written in the CycL language, which somewhat resembles [[Lisp]]. CycL programmers are known as "cyclists". The basic building blocks of the database are so-called ''constants''. They fall into several basic groups: individual items or concepts (e.g. #$Poland, #$HomerSimpson), collections (e.g. #$Tree-ThePlant, the collection of all trees), logical operators (e.g. #$and, #$implies), quantifiers (e.g. #$forAll), predicates (e.g. #$isa, #$genls) and functions (e.g. #$FruitFn). All ''constants'' are linked to other constants through predicates and belong to so-called ''microtheories'', which must be internally consistent. Each microtheory is identified by a constant.

+

==Inference engine==

+

An [[inference engine]] is a computer program that tries to derive answers from a knowledge base.

+

The Cyc inference engine performs general [[logical deduction]] (including [[modus ponens]], [[modus tollens]], [[universal quantification]] and [[existential quantification]]).
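
Two of the deduction patterns named above, modus ponens and modus tollens, can be shown on plain propositions. This Python sketch only illustrates the two rules; it says nothing about how the Cyc engine implements them, and the function names and the example rule are assumptions.

```python
def modus_ponens(implications, known_true):
    """From 'p implies q' and p, conclude q. Returns the propositions concluded true."""
    return {q for p, q in implications if p in known_true}


def modus_tollens(implications, known_false):
    """From 'p implies q' and not q, conclude not p. Returns the propositions concluded false."""
    return {p for p, q in implications if q in known_false}


# One implication: "it is a tree" implies "it is a plant".
rules = [("it is a tree", "it is a plant")]

# Knowing it is a tree, conclude it is a plant.
print(modus_ponens(rules, {"it is a tree"}))    # {'it is a plant'}
# Knowing it is NOT a plant, conclude it is NOT a tree.
print(modus_tollens(rules, {"it is a plant"}))  # {'it is a tree'}
```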

−

Cyc is currently available free of charge in a reduced version called OpenCyc. A ResearchCyc version is also available, offered to researchers and research institutions, likewise free of charge.

+

==Releases==

−

===External links===

+

===OpenCyc===

−

* [http://cyc.com Official website of Cycorp, Inc.]

+

The latest version of OpenCyc, 4.0, was released in June 2012. OpenCyc 4.0 includes the entire Cyc ontology containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other; however, these are mainly taxonomic assertions, not the complex rules available in Cyc. The knowledge base contains 239,000 concepts and 2,093,000 facts and can be browsed on the OpenCyc website.

−

* [http://opencyc.org OpenCyc project website], with downloadable versions for [[Linux]] and [[Microsoft Windows|Windows]]

+

−

* [http://video.google.com/videoplay?docid=-7704388615049492068 Lecture on Cyc], given by Dr. Doug Lenat (Google Video).

+

+

The first version of OpenCyc was released in spring 2002 and contained only 6,000 concepts and 60,000 facts. The knowledge base is released under the [[Apache License]]. [[Cycorp]] has stated its intention to release OpenCyc under parallel, unrestricted licences to meet the needs of its users. The [[CycL]] and [[SubL]] interpreter (the program that allows you to browse and edit the database as well as to draw inferences) is released free of charge, but only as a binary, without source code. It is available for [[Linux]] and [[Microsoft Windows]]. The open source Texai project has released the [[Resource Description Framework|RDF]]-compatible content extracted from OpenCyc.

+

[[Category:Common Lisp software]]

+

[[Category:Ontology (information science)]]

+

[[Category:Knowledge bases]]

+

[[Category:Artificial intelligence]]

+

[[Category:Open data]]

[[Kategoria:Sztuczna inteligencja]]

[[Kategoria:Bazy danych]]
