Tag Archives: Solr

On a recent contract the team decided to use Solr to implement full-text search over a product catalogue for an e-commerce platform. Naturally we approached development with a TDD mindset, and were keen to write both unit tests for core business functionality and integration tests for a more end-to-end style of testing. The primary application stack consisted of Spring (Core, Data, MVC), MySQL and Solr 4.

Just a slight aside: for anyone looking to implement full-text search, the primary candidates are Solr and ElasticSearch. I won’t discuss the merits of either further, as it’s best to evaluate each against your own use cases (an excellent resource to help you decide is http://solr-vs-elasticsearch.com/).

With our chosen frameworks and datastores we found unit testing relatively straightforward. We used JUnit (driven via the Maven Surefire plugin), Mockito for mocking external dependencies (persistence layer, API calls etc.), and PowerMock for the difficult mocking (for example, static method calls in several reliable-but-decidedly-old-skool dependencies).

Integration testing was also relatively easy to set up. We again drove the tests via JUnit (this time using the Failsafe plugin), and used Spring’s @ContextConfiguration and AbstractTransactionalJUnit4SpringContextTests to manage injected sub-components (@Autowired etc.) and instantiate various parts of the application for testing. We also ran an embedded H2 database to allow realistic simulation of a SQL datastore (just an aside: in ~99% of ‘standard’ use cases I have found H2 to behave identically to MySQL, but there are a couple of corner cases to watch out for – this will be another blog post :)).

The Problem – How do we run an embedded Solr?

When we first started using Solr 4 we naturally wanted to create integration tests running against this datastore, and we wanted to run this in the same manner as we did with H2 – executing as a light-weight in-memory (embedded) process that we could create, pre-load, and destroy relatively quickly.

We soon found the EmbeddedSolrServer class distributed within the Solr package, and although useful it didn’t fit exactly with the way we wanted to design and deploy the Solr communication layer within our Spring application. For production use we wanted to instantiate a SolrServer bean for which we supply the target endpoint on the network (under the hood this bean would actually be instantiated using a custom HttpSolrServer class). For testing we needed an ‘embedded’ version that exposed the same SolrServer API, but also allowed us to override the Solr config and data directory (to load pre-canned indexes etc.).
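To make the "override the data directory" idea concrete, here is a minimal, stdlib-only sketch of a throwaway data directory that is created for a test run and deleted afterwards. The class names TempSolrDirs and TempSolrDirsDemo are hypothetical and are not part of Solr or of the ZoomInfo code; the only things taken from the listing further down are the "solr.solr.home" and "solr.data.dir" system property names and the default home path. It assumes an embedded server would be created between construction and close().

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: owns a throwaway Solr data directory for one test run. */
final class TempSolrDirs implements Closeable {

    private final Path dataDir;

    TempSolrDirs(String solrHomeDirPath) throws IOException {
        // Fresh directory under the system temp dir, so each run starts with an empty index.
        dataDir = Files.createTempDirectory("solr-data-");
        // Same property names the InProcessSolrServer constructor sets.
        System.setProperty("solr.solr.home", solrHomeDirPath);
        System.setProperty("solr.data.dir", dataDir.toAbsolutePath().toString());
    }

    Path dataDir() {
        return dataDir;
    }

    /** Deletes the data directory (files first, then directories) and clears the property. */
    @Override
    public void close() throws IOException {
        if (!Files.exists(dataDir)) {
            return;
        }
        Files.walkFileTree(dataDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
        System.clearProperty("solr.data.dir");
    }
}

class TempSolrDirsDemo {
    public static void main(String[] args) throws IOException {
        try (TempSolrDirs dirs = new TempSolrDirs("./src/test/resources/solr/")) {
            // An embedded Solr server would be started and exercised here.
            System.out.println("Solr data dir for this run: " + dirs.dataDir());
        }
    }
}
```

Cleaning up explicitly, rather than leaving directories behind in the temp folder, keeps repeated test runs from accumulating stale indexes.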

After a fair bit of searching we stumbled over ZoomInfo’s excellent blog, in which they had shared their version of an embedded SolrServer that could easily be exposed as a Spring bean. They called the class InProcessSolrServer.

We would like to offer many thanks to ZoomInfo for sharing their great work, and this class provided us with many months of good service. However, with the latest releases of Solr (4.2+) ZoomInfo’s InProcessSolrServer will no longer compile due to an interface change within the Solr internals.

In the spirit of sharing the wealth I wanted to blog an update to the original ZoomInfo code, which addresses the interface change, and I’ve also included the Spring scaffolding in the gist below to give you an idea of how we run this code.

package uk.co.taidev.solrtesting.solr;
import com.google.common.base.Throwables;
import com.google.common.io.Files;
import com.iat.compassmassive.normalisedproductloader.springutils.SpringProfileName;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrResourceLoader;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.Collection;
/**
* SolrServer sub-class that manages the life-cycle of an in-process(embedded) Solr server.
* <p/>
* Modified from original source provided by ZoomInfo @ http://browse.feedreader.com/c/ZoomInfo_Blog/12021683
* <p/>
* Required dependencies: Spring 3.2.X, Solr 4.2.X+ (or 4.3.X), Guava 14+
* <p/>
* User: Daniel Bryant
* Date: 01/07/13
*/
@Component
@Profile(SpringProfileName.DEVELOPMENT)
public class InProcessSolrServer extends SolrServer implements Closeable {
//
//------------------ static -------------------------
//
private static final Logger LOGGER = LoggerFactory.getLogger(InProcessSolrServer.class);
private static final String DEFAULT_SOLR_HOME_DIR_PATH = "./src/test/resources/solr/";
//
//------------------ instance-------------------------
//
private File solrHomeDir = null;
private File dataDir = null;
private SolrServer delegate = null;
private transient SolrCore core = null;
//
//------------------ constructor -------------------------
//
/**
* Create an InProcessSolrServer using the default Solr Home Directory.
*/
public InProcessSolrServer() {
this(DEFAULT_SOLR_HOME_DIR_PATH);
}
/**
* Create an InProcessSolrServer using the specified Solr Home Directory and a Solr Data Directory placed
* beneath the system's temporary directory (as defined by the Guava method Files.createTempDir()).
*
* @param solrHomeDirPath path to Solr Root Directory
*/
public InProcessSolrServer(String solrHomeDirPath) {
try {
System.setProperty("solr.solr.home", solrHomeDirPath);
System.setProperty("solr.data.dir", Files.createTempDir().getAbsolutePath());
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
delegate = new EmbeddedSolrServer(coreContainer, "");
} catch (Exception e) {
throw Throwables.propagate(e);
}
}
//
//------------------ public -------------------------
//
/**
* This method passes all queries and indexing events on to an in-process delegate.
*
* @param req Solr Request
* @return NamedList
* @throws SolrServerException if an error occurs when processing the request
* @throws IOException if an IOException occurs when processing the request
*/
@Override
public NamedList<Object> request(final SolrRequest req) throws SolrServerException, IOException {
try {
return getDelegate().request(req);
} catch (final Exception e) {
Throwables.propagateIfInstanceOf(e, SolrServerException.class);
Throwables.propagateIfInstanceOf(e, IOException.class);
throw Throwables.propagate(e);
}
}
/**
* Closes the Solr Core.
*/
@Override
public synchronized void close() {
if (core != null) {
core.close();
core = null;
}
}
/**
* SolrIndexSearcher adds schema awareness and caching functionality over the Lucene IndexSearcher.
* http://lucene.apache.org/solr/normalisedproductloader/org/apache/solr/search/SolrIndexSearcher.html
*
* @return RefCounted SolrIndexSearcher
* @throws SolrServerException
*/
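public RefCounted<SolrIndexSearcher> getIndexSearcher() throws SolrServerException {
getDelegate(); // force the delegate to be created
// The constructor creates the delegate eagerly, so 'core' is only populated on the lazy path in getDelegate().
if (core == null) {
throw new SolrServerException("No SolrCore reference is held by this InProcessSolrServer");
}
return core.getSearcher();
}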
/**
* Returns the index schema used by this Solr server.
*
* @return delegate SolrServer primary core IndexSchema
* @throws SolrServerException
*/
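public IndexSchema getIndexSchema() throws SolrServerException {
getDelegate(); // force the delegate to be created
// The constructor creates the delegate eagerly, so 'core' is only populated on the lazy path in getDelegate().
if (core == null) {
throw new SolrServerException("No SolrCore reference is held by this InProcessSolrServer");
}
return core.getSchema();
}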
/**
* Prepares this SolrServer for shutdown.
*/
@Override
public void shutdown() {
LOGGER.debug("shutdown entry...");
close();
LOGGER.debug("...core closed...");
}
@Override
public UpdateResponse addBeans(Collection<?> beans) throws SolrServerException, IOException {
UpdateResponse updateResponse = super.addBeans(beans);
super.commit();
return updateResponse;
}
@Override
public UpdateResponse deleteByQuery(String query) throws SolrServerException, IOException {
UpdateResponse updateResponse = super.deleteByQuery(query);
super.commit();
return updateResponse;
}
//
//------------------ protected -------------------------
//
@Override
@SuppressWarnings("FinalizeDeclaration")
protected void finalize() throws Throwable {
close();
super.finalize();
}
//
//------------------ private -------------------------
//
/**
* This method creates an in-process Solr server that otherwise behaves just as expected.
*/
private synchronized SolrServer getDelegate() throws SolrServerException {
if (delegate != null) {
return delegate;
}
try {
CoreContainer container = new CoreContainer(SolrResourceLoader.locateSolrHome());
CoreDescriptor descriptor = new CoreDescriptor(container, "core1", solrHomeDir.getCanonicalPath());
core = container.create(descriptor);
container.register("core1", core, false);
delegate = new EmbeddedSolrServer(container, "core1");
return delegate;
} catch (IOException ex) {
throw new SolrServerException(ex);
}
}
/**
* Sets the Solr root directory. In Solr’s documentation, this is generally referred to as "/solr-root". The "conf"
* directory (containing schema, stopwords, synonyms etc) will be a subdirectory of this.
*
* @param solrHomeDir Solr 'Home Directory'
*/
private void setSolrHomeDir(final File solrHomeDir) {
this.solrHomeDir = solrHomeDir;
System.setProperty("solr.home", solrHomeDir.getPath());
if (this.dataDir == null) {
setDataDir(new File(solrHomeDir, "data"));
}
}
/**
* Sets the Solr data directory. This is the parent directory of the "index" and "spellchecker" directories.
*
* @param dataDir Solr 'Data Directory'
*/
private void setDataDir(final File dataDir) {
this.dataDir = dataDir;
System.setProperty("solr.data.dir", dataDir.getPath());
}
}

I hope this helps, and if you have any questions then please feel free to comment or tweet 🙂

I’m currently working on a Java-based component which uses Solr heavily. We bundle the Solr core library dependency within a fat JAR (which we deploy standalone), and this dependency is managed in the usual way via Maven.

The Problem

After a series of new features were added to this component by a member of the team, we suddenly noticed that the usual Solr logging to the console had stopped. At first we thought Solr had stopped working (which caused a little panic 🙂 ), but even though no logging was being displayed we could still access the web console, and everything appeared to be functioning correctly.

What most of the team didn’t realise was that, while adding the new features, one of the developers had also bumped the version of the Solr dependency from 4.2.0 to 4.3.0. The intentions were good (get the latest and greatest version), and the usual expectation is that a minor version increase fixes bugs and adds a few pieces of new functionality.

However, this time around it was worth reading the release notes, as the Solr team fundamentally altered their approach to logging in the 4.3.0 release: http://wiki.apache.org/solr/SolrLogging

The Solution

The fix for us was to include an appropriate log4j.properties config file within our component’s Maven resources directory. We were already including the slf4j and log4j dependencies within our component’s Maven POM, so we didn’t need to perform any additional steps to incorporate these into the deployment artifact (our fat JAR), as described by the Solr team at http://wiki.apache.org/solr/SolrLogging.
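For illustration, a minimal log4j.properties along these lines restores console logging. This is a sketch rather than the exact file we used: the appender name CONSOLE, the INFO level and the pattern are assumptions you should adjust, and it assumes log4j 1.2 picking the file up from the root of the classpath (for a standard Maven layout, src/main/resources).

```properties
# Send everything at INFO and above to the console.
log4j.rootLogger=INFO, CONSOLE

# Console appender with a simple timestamped pattern.
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p %c - %m%n
```

If Solr’s output is too chatty, a per-package override such as log4j.logger.org.apache.solr=WARN can be added beneath the root logger line.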