Why Solr?
It isn’t really feasible to execute blazing fast search queries on very big SQL databases for 2 different reasons. The first reason comes SQL databases favoring lack of radiancy over performance. Basically, you’d need to use JOINs in your SELECT. The second reason is about the nature of data in documents: it’s essentially unstructured plain text so that SELECT would need LIKE. Both joins and likes are performance killers, so this way is a no-go in real-life search engines.
Therefore, most of them propose a way to look at data that is very different from SQL, inverted index(es). This kind of data structure is a glorified dictionary where:
key are individual terms
values are list of documents that match term
Nothing fancy, but this view of data makes for very fast research in very high-volume databases. Note that the term ‘document’ is used very loosely in that it’s should be a field-structured view of the initial document (see below).
Index structure
Though
Solr belongs to the NoSQL database family, it is no schemaless. Schema configuration takes place in a dedicated schema.xml file: individual fields must be defined, and with each its type. Different document types may be different in structure and have few (no?) fields in common. In this case, each document type may be set its own index with its own schema.
Predefined types like strings, integers and dates are available out-of-the-box. Types can be declared searchables (called indexed) and/or stored (returned in queries). For examples, books could (would?) include not only their content, but also author(s), publisher(s), date of publishing, etc.