Class SolrIndexService

java.lang.Object
org.imixs.workflow.engine.solr.SolrIndexService

@DeclareRoles({"org.imixs.ACCESSLEVEL.NOACCESS","org.imixs.ACCESSLEVEL.READERACCESS","org.imixs.ACCESSLEVEL.AUTHORACCESS","org.imixs.ACCESSLEVEL.EDITORACCESS","org.imixs.ACCESSLEVEL.MANAGERACCESS"})
@RolesAllowed({"org.imixs.ACCESSLEVEL.NOACCESS","org.imixs.ACCESSLEVEL.READERACCESS","org.imixs.ACCESSLEVEL.AUTHORACCESS","org.imixs.ACCESSLEVEL.EDITORACCESS","org.imixs.ACCESSLEVEL.MANAGERACCESS"})
public class SolrIndexService extends Object
The SolrIndexService provides methods to add, update and remove imixs documents from a solr index.

The service validates the Solr index schema and updates the schema if it has changed.

The SolrIndexService is used by the SolrUpdateService and the SolrSearchService, which extend and implement the Imixs-Index concept.

The SolrIndexService can be configured by the following properties:

  • solr.api - api endpoint for the solr index
  • solr.core - name of the solr index core (default 'imixs-workflow')
  • solr.configset - an optional solr configset (default '_default')
  • solr.user - userid for optional basic authentication
  • solr.password - password for optional basic authentication
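As a rough illustration of how these properties and their documented defaults fit together, the following sketch resolves them from a java.util.Properties object. The class SolrConfigSketch, its fields and the example URL are hypothetical; how the service itself obtains its configuration is not shown on this page.

```java
import java.util.Properties;

/** Hypothetical holder for the solr.* properties listed above. */
final class SolrConfigSketch {
    final String api;       // solr.api (no default documented)
    final String core;      // solr.core
    final String configset; // solr.configset
    final String user;      // solr.user (optional)
    final String password;  // solr.password (optional)

    SolrConfigSketch(Properties props) {
        api = props.getProperty("solr.api");
        core = props.getProperty("solr.core", "imixs-workflow");
        configset = props.getProperty("solr.configset", "_default");
        user = props.getProperty("solr.user");
        password = props.getProperty("solr.password");
    }

    /** Basic authentication is only relevant if a user id is configured. */
    boolean usesBasicAuth() {
        return user != null && !user.isEmpty();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("solr.api", "http://localhost:8983"); // example value only
        SolrConfigSketch config = new SolrConfigSketch(props);
        System.out.println(config.core + " / " + config.configset);
    }
}
```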
Version:
1.0
Author:
rsoika
  • Field Details

  • Constructor Details

    • SolrIndexService

      public SolrIndexService()
  • Method Details

    • init

      @PostConstruct public void init()
      Create a rest client instance
    • setup

      public void setup(@Observes SetupEvent setupEvent) throws RestAPIException
      This method verifies the schema of the Solr core. If field definitions have changed, a schema update is posted to the Solr REST API.

      The method assumes that a core is already created with a manageable schema.

      Parameters:
      setupEvent -
      Throws:
      RestAPIException
    • updateSchema

      public void updateSchema(String schema) throws RestAPIException
      Updates the schema definition of an existing Solr core.

      The schema definition is built by the method buildUpdateSchema(). The method updateSchema adds or replaces field definitions depending on the fieldList definitions provided by the Imixs SchemaService. See the method buildUpdateSchema() for details.

      The method assumes that a core already exists. Otherwise an exception is thrown.

      Parameters:
      schema - existing schema definition
      Throws:
      RestAPIException
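To make the schema update more concrete, here is a minimal sketch of how such a request could be built with java.net.http (Java 11 or later). It only constructs the request and does not send it. The class name, the use of the Solr v2 endpoint path /api/cores/{core}/schema and the example values are assumptions for illustration; the service uses its own REST client, which is not documented here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class SchemaUpdateRequestSketch {

    /** Builds (but does not send) a POST request carrying the update schema JSON. */
    static HttpRequest build(String api, String core, String schemaJson, String user, String password) {
        // Assumption: the schema endpoint of Solr's v2 API is used.
        URI uri = URI.create(api + "/api/cores/" + core + "/schema");
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(schemaJson, StandardCharsets.UTF_8));
        if (user != null && !user.isEmpty()) {
            // Optional basic authentication (solr.user / solr.password)
            String token = Base64.getEncoder()
                    .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            builder.header("Authorization", "Basic " + token);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        String json = "{\"add-field\":{\"name\":\"field1\",\"type\":\"string\",\"stored\":true}}";
        HttpRequest request = build("http://localhost:8983", "imixs-workflow", json, null, null);
        System.out.println(request.method() + " " + request.uri());
    }
}
```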
    • indexDocuments

      public void indexDocuments(List<ItemCollection> documents) throws RestAPIException
      This method adds a collection of documents to the Lucene Solr index. The documents are added immediately to the index. Calling this method within a running transaction leads to uncommitted reads in the index. For transaction control, it is recommended to use the method solrUpdateService.updateDocuments() instead, which takes care of uncommitted reads.

      This method is used by the JobHandlerRebuildIndex only.

      Parameters:
      documents - list of ItemCollections to be indexed
      Throws:
      RestAPIException
    • indexDocument

      public void indexDocument(ItemCollection document) throws RestAPIException
      This method adds a single document to the Lucene Solr index. The document is added immediately to the index. Calling this method within a running transaction leads to uncommitted reads in the index. For transaction control, it is recommended to use the method solrUpdateService.updateDocuments() instead, which takes care of uncommitted reads.
      Parameters:
      document - the ItemCollection to be indexed
      Throws:
      RestAPIException
    • removeDocuments

      public void removeDocuments(List<String> documentIDs) throws RestAPIException
      This method removes a collection of documents from the Lucene Solr index.
      Parameters:
      documentIDs - collection of UniqueIDs to be removed from the index
      Throws:
      RestAPIException
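For orientation, Solr's XML update format expresses a removal by id as a delete message. The following sketch builds such a message from a list of ids. Whether the service uses exactly this message format is an assumption, and the class and method names are hypothetical.

```java
import java.util.List;

final class DeleteRequestSketch {

    /** Escapes the characters that are significant in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a Solr XML delete message for the given document ids. */
    static String buildDeleteXml(List<String> documentIDs) {
        StringBuilder xml = new StringBuilder("<delete>");
        for (String id : documentIDs) {
            xml.append("<id>").append(escapeXml(id)).append("</id>");
        }
        return xml.append("</delete>").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildDeleteXml(List.of("doc-1", "doc-2")));
    }
}
```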
    • removeDocument

      public void removeDocument(String id) throws RestAPIException
      This method removes a single document from the Lucene Solr index.
      Parameters:
      id - UniqueID of the document to be removed from the index
      Throws:
      RestAPIException
    • rebuildIndex

      public void rebuildIndex()
      This method forces an update of the full text index.
    • query

      public String query(String searchTerm, int pageSize, int pageIndex, SortOrder sortOrder, DefaultOperator defaultOperator, boolean loadStubs) throws QueryException
      This method posts a search query and returns the result.

      The method returns the documents with all stored or DocValues fields. If the param 'loadStubs' is false, only the field '$uniqueid' is returned by the method. In that case the caller is responsible for loading the full document from the DocumentService.

      Because Solr field names must not contain $ symbols, field names containing a $ are replaced before they are used in a query.

      Parameters:
      searchTerm -
      Returns:
      Throws:
      QueryException
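The paging parameters can be pictured as follows. This sketch maps pageSize and pageIndex to Solr's standard rows and start parameters and sets the default operator via q.op. It assumes a zero-based pageIndex and uses a plain boolean in place of the DefaultOperator type; both are assumptions, as is the class name, and the actual request built by the service may differ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParamsSketch {

    /** Builds the query string part of a Solr select request. */
    static String build(String searchTerm, int pageSize, int pageIndex, boolean andOperator) {
        // Assumption: pageIndex is zero-based, so the first row is pageIndex * pageSize.
        int start = pageIndex * pageSize;
        return "q=" + URLEncoder.encode(searchTerm, StandardCharsets.UTF_8)
                + "&q.op=" + (andOperator ? "AND" : "OR")
                + "&rows=" + pageSize
                + "&start=" + start;
    }

    public static void main(String[] args) {
        System.out.println(build("type:workitem", 10, 2, true));
    }
}
```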
    • adaptSolrFieldName

      public String adaptSolrFieldName(String itemName)
      This method adapts a Solr field name to the corresponding Imixs item name. Because Solr does not accept a $ character at the beginning of a field name, a leading _ is replaced with $ if the item is part of the Imixs Index Schema.
      Parameters:
      itemName -
      Returns:
      adapted Imixs item name
    • adaptImixsItemName

      public String adaptImixsItemName(String itemName)
      This method adapts an Imixs item name to the corresponding Solr field name. Because Solr does not accept a $ character at the beginning of a field name, a leading $ is replaced with _ if the item is part of the Imixs Index Schema.
      Parameters:
      itemName -
      Returns:
      adapted Solr field name
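The two adapt methods are inverse mappings of each other. A minimal sketch of the idea, assuming the schema membership check can be modelled as a lookup in a set of item names: the set SCHEMA_ITEMS, its contents and the class and method names are hypothetical stand-ins for the Imixs Index Schema.

```java
import java.util.Set;

final class FieldNameSketch {

    // Hypothetical stand-in for the item names defined by the Imixs Index Schema.
    static final Set<String> SCHEMA_ITEMS = Set.of("$uniqueid", "$workflowstatus", "name");

    /** Imixs item name -> Solr field name: a leading '$' becomes '_'. */
    static String toSolrFieldName(String itemName) {
        if (itemName.startsWith("$") && SCHEMA_ITEMS.contains(itemName)) {
            return "_" + itemName.substring(1);
        }
        return itemName;
    }

    /** Solr field name -> Imixs item name: a leading '_' becomes '$'. */
    static String toImixsItemName(String fieldName) {
        if (fieldName.startsWith("_")) {
            String candidate = "$" + fieldName.substring(1);
            if (SCHEMA_ITEMS.contains(candidate)) {
                return candidate;
            }
        }
        // Not part of the schema: leave the name untouched.
        return fieldName;
    }

    public static void main(String[] args) {
        System.out.println(toSolrFieldName("$uniqueid"));
        System.out.println(toImixsItemName("_uniqueid"));
    }
}
```

Names that are not part of the schema pass through unchanged in both directions, so the round trip is safe for fields that merely happen to start with an underscore.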
    • buildUpdateSchema

      protected String buildUpdateSchema(String oldSchema)
      This method builds a JSON structure to be used to update an existing Solr schema. The method adds or replaces field definitions into a solr update schema.

      The param oldSchema contains the current schema definition of the core.

      In Solr there are two field attributes, stored and docValues, that define whether the value of a field is stored and returned by a query:

      {"add-field":{"name":"field1","type":"text_general","stored":true,"docValues":true}}

      In both cases the values are stored in the Lucene index and returned by a query.

      Stored fields (stored=true) are row-oriented. That means that, as in an SQL table, the values are stored based on the ID of the document.

      In contrast, docValues are stored column-oriented (forward index). The values are ordered based on the search term. For features like sorting, grouping or faceting, docValues generally increase performance. So it may look like docValues are the better choice. But one important difference is how the values are stored. In the case of a stored field with multiple values, the values are stored in exactly the same order as they were indexed. DocValues, in contrast, are sorted and reordered, which would distort the result of a document returned by a query.

      In Imixs-Workflow we use the stored attribute to return parts of a document at query time. We call this a document-stub, which contains only a subset of fields. Later we load the full document from the SQL database. As stored fields in our workflow application are also often used for sorting, we combine both attributes. In the case of a non-stored field we also set docValues=false to avoid unnecessary storing of fields.
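The rule described above, docValues following stored and existing fields being replaced rather than added, can be sketched as a single Schema API command. The class and method names, the set of existing field names and the field type "string" are illustrative assumptions; the real method derives this information from the oldSchema parameter and the Imixs SchemaService.

```java
import java.util.Set;

final class UpdateSchemaSketch {

    /** Returns one Solr Schema API command for the given field. */
    static String fieldCommand(Set<String> existingFields, String name, String type, boolean stored) {
        // Replace the definition if the field already exists in the old schema, otherwise add it.
        String command = existingFields.contains(name) ? "replace-field" : "add-field";
        // docValues follows stored: both true for stored fields, both false otherwise.
        return "{\"" + command + "\":{\"name\":\"" + name + "\",\"type\":\"" + type
                + "\",\"stored\":" + stored + ",\"docValues\":" + stored + "}}";
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("name");
        System.out.println(fieldCommand(existing, "_uniqueid", "string", true));
        System.out.println(fieldCommand(existing, "name", "string", false));
    }
}
```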

      Returns:
    • buildAddDoc

      protected String buildAddDoc(List<ItemCollection> documents)
      This method returns an XML structure to add new documents into the solr index.
      Returns:
      xml content to update documents
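A rough picture of such an XML structure, based on Solr's XML update format with one field element per value. In this sketch each document is a plain map standing in for an ItemCollection, and the values are wrapped in CDATA sections; the wrapping, the map representation and the class name are assumptions made for illustration. Values are assumed to be free of control codes and nested CDATA markers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AddDocSketch {

    /** Builds a Solr XML add message; each map stands in for one ItemCollection. */
    static String buildAddDoc(List<Map<String, List<String>>> documents) {
        StringBuilder xml = new StringBuilder("<add>");
        for (Map<String, List<String>> doc : documents) {
            xml.append("<doc>");
            for (Map.Entry<String, List<String>> item : doc.entrySet()) {
                for (String value : item.getValue()) {
                    // One <field> element per value, so multi-value items repeat the field.
                    xml.append("<field name=\"").append(item.getKey()).append("\">")
                       .append("<![CDATA[").append(value).append("]]>")
                       .append("</field>");
                }
            }
            xml.append("</doc>");
        }
        return xml.append("</add>").toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("1"));
        doc.put("name", List.of("a", "b"));
        System.out.println(buildAddDoc(List.of(doc)));
    }
}
```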
    • stripControlCodes

      protected String stripControlCodes(String s)
      This helper method strips control codes and extended characters from a string. Those characters cannot be placed in the XML request sent to Solr.

      Background:

      In ASCII, the control codes have the decimal codes 0 through 31 and 127. On an ASCII-based system, if the control codes are stripped, all characters of the resulting string have a code of 32 or higher, excluding 127.
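One way to express this filter is over the code points of the string, keeping only the printable ASCII range 32 to 126. This is a sketch of the technique described above, not necessarily the code used by the service; the class name is hypothetical.

```java
final class ControlCodeSketch {

    /** Removes control codes (0-31, 127) and extended characters (above 127). */
    static String stripControlCodes(String s) {
        return s.codePoints()
                .filter(c -> c > 0x1F && c < 0x7F) // keep printable ASCII 32..126 only
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(stripControlCodes("a\u0001b\tc\u007Fd"));
    }
}
```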

      Parameters:
      s -
      Returns:
    • stripCDATA

      protected String stripCDATA(String s)
      This helper method strips CDATA blocks from a string. We cannot post embedded CDATA inside an already existing CDATA section when we post the XML to Solr.
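A minimal sketch of one possible reading of this behaviour, in which only the CDATA delimiters are removed and the enclosed text is kept. Whether the service removes the delimiters or the whole block is not stated here, so treat this interpretation and the class name as assumptions.

```java
final class CdataSketch {

    /** Removes CDATA start and end markers, keeping the text between them. */
    static String stripCDATA(String s) {
        if (s == null || !s.contains("<![CDATA[")) {
            return s; // nothing to strip
        }
        return s.replace("<![CDATA[", "").replace("]]>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripCDATA("a<![CDATA[b]]>c"));
    }
}
```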

      Parameters:
      s -
      Returns:
    • flushEventLogByCount

      public boolean flushEventLogByCount(int count)
      This method flushes a given count of eventLogEntries. The method returns true if no more eventLogEntries exist.
      Parameters:
      count - the maximum number of eventLog entries to remove.
      Returns:
      true if the cache was totally flushed.
    • flushEventLog

      public boolean flushEventLog(int junkSize)
      Flush the EventLog cache. This method is called by the LuceneSearchService only.

      The method flushes the cache in smaller blocks of the given junkSize to avoid a heap size problem. The default flush size is 16. The eventLog cache is tracked by the flag 'dirtyIndex'.

      issue #439 - The method returns false if the event log contains more entries than defined by the given junkSize. In this case the caller should call the method again; each call always runs in a new transaction. The goal of this mechanism is to reduce the event log even in cases where the outer transaction breaks.

      Returns:
      true if the complete event log was flushed. If false the method must be called again.
      See Also:
      • LuceneSearchService
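The calling convention of flushEventLog, repeating the call until it returns true, could be sketched as follows. The flush operation is passed in as a java.util.function.IntPredicate so the loop can be shown without a running service; the class name, the maxCalls safety limit and the simulated event log are hypothetical.

```java
import java.util.function.IntPredicate;

final class FlushLoopSketch {

    /**
     * Calls the flush operation until it reports completion.
     *
     * @return the number of calls that were needed, or -1 if maxCalls was reached
     */
    static int flushCompletely(IntPredicate flush, int junkSize, int maxCalls) {
        for (int call = 1; call <= maxCalls; call++) {
            if (flush.test(junkSize)) {
                return call; // the complete event log was flushed
            }
        }
        return -1; // give up instead of looping forever
    }

    public static void main(String[] args) {
        // Simulated event log with 40 entries; each call removes up to junkSize entries.
        int[] remaining = { 40 };
        IntPredicate fakeFlush = size -> {
            remaining[0] = Math.max(0, remaining[0] - size);
            return remaining[0] == 0;
        };
        System.out.println(flushCompletely(fakeFlush, 16, 10));
    }
}
```

With the real service, the predicate would delegate to flushEventLog(junkSize), so that each block is removed in its own transaction.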