Tuesday, August 14, 2012

JCR standalone with MySQL


In this post you'll see the steps required to run a Jackrabbit repository in the following mode:

 • manual
 • persistent (mysql)


I am going to use an updated version of the code from the previous two posts, which can be found at https://github.com/mgardellini/jcr-poc (branch: manual_persistent). The database used is MySQL, but the same logic applies to any other database such as Derby or MSSQL.

1) create a database called jackrabbit in your local mysql installation
2) download jackrabbit 2.5.1 standalone from:


3) copy the downloaded jar into <anywhere>/jackrabbit_home; let's call this directory JACKRABBIT_HOME
4) start jackrabbit from JACKRABBIT_HOME:

   java -jar jackrabbit-standalone-2.5.1.jar

5) wait for it to fully start then stop the server (ctrl+d)
6) copy repository/repository.xml to JACKRABBIT_HOME/jackrabbit
7) create the directory JACKRABBIT_HOME/lib
8) download mysql-connector-java-5.1.21.jar (or take it from your local Maven repository) and copy it into JACKRABBIT_HOME/lib
9) restart jackrabbit as follows (on Windows use ; instead of : as the classpath separator):

   java -cp "jackrabbit-standalone-2.5.1.jar:lib/mysql-connector-java-5.1.21.jar" org.apache.jackrabbit.standalone.Main

10) access http://localhost:8080/; you should see the initial page without error messages or forms of any kind
11) add this dependency to your pom:

   <dependency>
    <groupId>org.apache.jackrabbit</groupId>
    <artifactId>jackrabbit-jcr-rmi</artifactId>
    <version>2.5.1</version>
   </dependency>

12) change your code to use JcrUtils for connecting to the repository (see this link for details):

   JcrUtils.getRepository("http://localhost:8080/rmi");

As you can see, it is pretty straightforward. You can now either play with the tests to add and search nodes, or add more beans to your implementation, save them and see how Jackrabbit reacts.

You probably want to change the design of the DataHandler to avoid passing session, namespace, workspace and nodes around. Again, the project is a mere PoC for understanding how to set up and use Jackrabbit.

Thursday, August 9, 2012

JCR project - part 2

In this post I'll show how to instantiate a repository using a repo home directory and a repository.xml file. To focus on the actual code and project configuration I am using the xml generated by Jackrabbit when instantiating an Automatic and Transient repository as shown in the previous post.

These are the steps:

1) Create src/repository directory in your project folder
2) Create repository.xml in the directory at point 1
3) Paste the content below in the file at point 2


<?xml version="1.0"?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
license agreements. See the NOTICE file distributed with this work for additional
information regarding copyright ownership. The ASF licenses this file to
You under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of
the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
by applicable law or agreed to in writing, software distributed under the
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
OF ANY KIND, either express or implied. See the License for the specific
language governing permissions and limitations under the License. -->

<!DOCTYPE Repository
          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">

<!-- Example Repository Configuration File Used by - org.apache.jackrabbit.core.config.RepositoryConfigTest.java
- -->
<Repository>
<!-- virtual file system where the repository stores global state (e.g.
registered namespaces, custom node types, etc.) -->
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/repository" />
</FileSystem>

<!-- data store configuration -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore" />

<!-- security configuration -->
<Security appName="Jackrabbit">
<!-- security manager: class: FQN of class implementing the JackrabbitSecurityManager
interface -->
<SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager"
workspaceName="security"
>
<!-- workspace access: class: FQN of class implementing the WorkspaceAccessManager
interface -->
<!-- <WorkspaceAccessManager class="..."/> -->
<!-- <param name="config" value="${rep.home}/security.xml"/> -->
</SecurityManager>

<!-- access manager: class: FQN of class implementing the AccessManager
interface -->
<AccessManager
class="org.apache.jackrabbit.core.security.DefaultAccessManager"
>
<!-- <param name="config" value="${rep.home}/access.xml"/> -->
</AccessManager>

<LoginModule
class="org.apache.jackrabbit.core.security.authentication.DefaultLoginModule"
>
<!-- anonymous user name ('anonymous' is the default value) -->
<param name="anonymousId" value="anonymous" />
<!-- administrator user id (default value if param is missing is 'admin') -->
<param name="adminId" value="admin" />
</LoginModule>
</Security>

<!-- location of workspaces root directory and name of default workspace -->
<Workspaces rootPath="${rep.home}/workspaces"
defaultWorkspace="default" />
<!-- workspace configuration template: used to create the initial workspace
if there's no workspace yet -->
<Workspace name="${wsp.name}">
<!-- virtual file system of the workspace: class: FQN of class implementing
the FileSystem interface -->
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${wsp.home}" />
</FileSystem>
<!-- persistence manager of the workspace: class: FQN of class implementing
the PersistenceManager interface -->
<PersistenceManager
class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager"
>
<param name="url" value="jdbc:derby:${wsp.home}/db;create=true" />
<param name="schemaObjectPrefix" value="${wsp.name}_" />
</PersistenceManager>
<!-- Search index and the file system it uses. class: FQN of class implementing
the QueryHandler interface -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index" />
<param name="supportHighlighting" value="true" />
</SearchIndex>
</Workspace>

<!-- Configures the versioning -->
<Versioning rootPath="${rep.home}/version">
<!-- Configures the filesystem to use for versioning for the respective
persistence manager -->
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/version" />
</FileSystem>

<!-- Configures the persistence manager to be used for persisting version
state. Please note that the current versioning implementation is based on
a 'normal' persistence manager, but this could change in future implementations. -->
<PersistenceManager
class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager"
>
<param name="url" value="jdbc:derby:${rep.home}/version/db;create=true" />
<param name="schemaObjectPrefix" value="version_" />
</PersistenceManager>
</Versioning>

<!-- Search index for content that is shared repository wide (/jcr:system
tree, contains mainly versions) -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${rep.home}/repository/index" />
<param name="supportHighlighting" value="true" />
</SearchIndex>

<!-- Run with a cluster journal -->
<Cluster id="node1">
<Journal class="org.apache.jackrabbit.core.journal.MemoryJournal" />
</Cluster>
</Repository>


4) Edit your pom.xml in order to instruct maven to copy the repository directory and its content in target/. You simply need to specify the resource in the build part of the pom.


<build>
<resources>
<resource>
<directory>src/repository</directory>
<includes>
<include>*/**</include>
</includes>
<targetPath>repository</targetPath>
</resource>
</resources>
</build>


If you now run: mvn clean test-compile you should see these two entries:

  • target/classes/repository
  • target/classes/repository/repository.xml


5) If your repository initializer implements an interface, change the initializeRepository method to accept the (String configFile, String repHomeDir) parameters.
The configFile parameter points to the repository.xml file (a path relative to your project home is enough for test purposes), while repHomeDir points to the repository directory.

6) The last step is to create the repository through the RegistryHelper. Note that the code relies on the Java Naming and Directory Interface (JNDI).
Rather than creating the repository automatically with the TransientRepository as follows:

repository = new TransientRepository(new File("target"));

You need to execute this code:

Hashtable<String, String> env = new Hashtable<String, String>();
env.put(Context.INITIAL_CONTEXT_FACTORY,
"org.apache.jackrabbit.core.jndi"
+ ".provider.DummyInitialContextFactory");

env.put(Context.PROVIDER_URL, "localhost");

InitialContext ctx = new InitialContext(env);

RegistryHelper.registerRepository(ctx, "repo", configFile, repHomeDir,
true);

repository = (Repository) ctx.lookup("repo");

Note that the automatic approach takes a single parameter: the directory in which the repository.xml file and the repository directory will be created.
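With the manual approach a wrong path only shows up when the repository is registered, so it can help to check both arguments beforehand. A minimal sketch in plain Java, assuming repository.xml sits directly inside the repository home as in the Maven setup above; the RepositoryLocation class is hypothetical and not part of the jcr-poc code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the jcr-poc project.
final class RepositoryLocation {
    final String configFile;
    final String repHomeDir;

    private RepositoryLocation(String configFile, String repHomeDir) {
        this.configFile = configFile;
        this.repHomeDir = repHomeDir;
    }

    // Fails early if repository.xml is not inside the given home directory.
    static RepositoryLocation under(Path repHome) throws IOException {
        Path config = repHome.resolve("repository.xml");
        if (!Files.isRegularFile(config)) {
            throw new IOException("Missing repository descriptor: "
                    + config.toAbsolutePath());
        }
        return new RepositoryLocation(config.toString(), repHome.toString());
    }

    public static void main(String[] args) throws IOException {
        // target/classes/repository is where the Maven resource step copies the files
        RepositoryLocation location =
                under(Paths.get("target", "classes", "repository"));
        System.out.println(location.configFile + " -> " + location.repHomeDir);
    }
}
```

The two fields can then be passed as the configFile and repHomeDir arguments of RegistryHelper.registerRepository.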

Run your tests: everything should work without any problem. This is one of the big advantages of using JCR: changing the underlying storage implementation does not require any change to your business logic!

You can now try to play with the repository DataStore for the binary files or with the PersistenceManager in order to use a different implementation. This link may be handy.

The source code is at https://github.com/mgardellini/jcr-poc (branch: manual_transient)

JCR project - part 1

This is a brief tutorial on how to start playing with JCR and Jackrabbit. At the end of it you will have some code that creates a repository and allows you to interact with it by adding and searching content.

To keep the configuration simple you are going to use the Automatic Configuration for a Transient Repository. This means that you don't need to write a repository descriptor to define how your data will be stored. Moreover, a transient repository starts automatically on the first login and shuts down when the last session logs out; since its files live under target/ in this project, treat the stored data as disposable.

The technologies involved are:

  • JCR 2.0
  • Jackrabbit 2.5.0
  • derby 10.9.1.0
  • slf4j 1.6.6

The source code of the project can be found on GitHub at https://github.com/mgardellini/jcr-poc (branch: automatic_transient), but I will show you how to implement it yourself.


Let's go step by step:

1) Create a quickstart project using maven (follow the link for details)

You can use these as the project parameters:


<groupId>com.acme</groupId>
<artifactId>jcr-poc</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>jcr-poc</name>


1.1) If you want to import the project in eclipse then simply run: mvn eclipse:eclipse in the project directory

2) Add the following dependencies to your pom:


<dependencies>
<dependency>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>jackrabbit-core</artifactId>
<version>2.5.0</version>
</dependency>
<dependency>
<groupId>org.apache.derby</groupId>
<artifactId>derby</artifactId>
<version>10.9.1.0</version>
</dependency>
<dependency>
<groupId>javax.jcr</groupId>
<artifactId>jcr</artifactId>
<version>2.0</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.6.6</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.6.6</version>
</dependency>
<!-- Test -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.9</version>
<scope>test</scope>
</dependency>
</dependencies>

3) Since you are all set it's time to create (initialize) your repository. Create a RepositoryInitializerImpl class that implements the following method:

  • Session initializeRepository() throws Exception
The goal of this method is to create:
  • the repository
  • a session
  • a workspace
  • a namespace 
that will be used to manipulate the content.

public Session initializeRepository() throws Exception {

    // Automatic configuration: repository.xml and the repository
    // directory are created under target/ if they do not exist yet
    repository = new TransientRepository(new File("target"));
    log.debug("Transient repository created");

    // Logging in as admin is required to register a namespace later on
    session = repository.login(new SimpleCredentials("admin",
            "admin".toCharArray()));
    log.debug("Session created");

    workspace = session.getWorkspace();
    log.debug("Workspace created");

    Node root = session.getRootNode();
    log.debug("Root node created");

    // Register the namespace prefix and URI used for the content;
    // a failure is only logged
    try {
        workspace.getNamespaceRegistry().registerNamespace(namespace, url);
        log.debug("Workspace added as: " + namespace + ", " + url);
    } catch (NamespaceException e) {
        log.warn(e.getMessage());
    }

    // Main node under which the content will be added
    repoMainNode = root.addNode(namespace + REPOSITORY);

    log.debug("Saving the session");
    session.save();

    return session;
}

Notice that a session is created by logging in to the repository. When you run the tests (you will!) open the repository.xml under the target directory and take a look at the LoginModule element. Logging in as an anonymous user does not need a password but does not allow the creation of any namespace.

Within this method you can add content to the root node or to the repoMainNode and then use the session to commit your operations. I also implemented a DataHandlerImpl class that performs addition and search. Its design is faulty, so feel free to improve it!

4) Create Article and Author beans

The schema that I used is something like this:

<repository>
     <article doi='1'>
          <authors>
               <author firstname='io' lastname='me'/>
               <author firstname='you' lastname='yourself'/>
          </authors>
     </article>
     <article doi='2'>
          <authors>
               <author firstname='io' lastname='me'/>
               <author firstname='he' lastname='himself'/>
          </authors>
     </article>
</repository>
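To make the shape concrete, the same structure can be sketched in plain Java as a tree of named nodes with properties. This is only an in-memory illustration: SketchNode is a hypothetical class, not javax.jcr.Node, and it stores nothing in a repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a content node; this is NOT javax.jcr.Node.
final class SketchNode {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    // Adds a child and returns it, mirroring the addNode style used in this post.
    SketchNode addNode(String childName) {
        SketchNode child = new SketchNode(childName);
        children.add(child);
        return child;
    }

    // Sets a property and returns this node so calls can be chained.
    SketchNode setProperty(String key, String value) {
        properties.put(key, value);
        return this;
    }

    // Counts this node plus all of its descendants.
    int size() {
        int total = 1;
        for (SketchNode child : children) {
            total += child.size();
        }
        return total;
    }

    public static void main(String[] args) {
        SketchNode repository = new SketchNode("repository");
        SketchNode article = repository.addNode("article").setProperty("doi", "1");
        SketchNode authors = article.addNode("authors");
        authors.addNode("author")
                .setProperty("firstname", "io").setProperty("lastname", "me");
        authors.addNode("author")
                .setProperty("firstname", "you").setProperty("lastname", "yourself");
        // repository + article + authors + two author nodes
        System.out.println(repository.size());
    }
}
```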


You can think of JCR as an n-ary tree where each node has one or more types associated with it and any node can be linked to any other by using an id. In order to retrieve nodes from the repository you can use:

  • XPath (deprecated in JCR 2.0)
  • JCR-SQL2
  • SQL (also deprecated in JCR 2.0)
  • JQOM (Java Query Object Model)
I actually used XPath to implement the search but I will soon rewrite (and post) the operations using JQOM and SQL2. The operations are:
  • add a new article
  • add an author to an article
  • retrieve articles for an author
Adding a new article is straightforward, as shown below:

Node articleNode = repoMainNode.addNode(namespace + ARTICLE_PREFIX);

articleNode.setProperty(namespace + DOI_PREFIX, article.getDoi());
articleNode.setProperty(namespace + TITLE_PREFIX, article.getTitle());

Node authorsNode = articleNode.addNode(namespace + AUTHORS_LIST_PREFIX);

for (Author author : article.getAuthors()) {
    Node authorNode = authorsNode.addNode(namespace + AUTHOR_PREFIX);
    authorNode.setProperty(namespace + AUTHOR_FIRST_NAME_PREFIX,
            author.getFirstName());
    authorNode.setProperty(namespace + AUTHOR_LAST_NAME_PREFIX,
            author.getLastName());
}

session.save();

Saving the session is the equivalent of committing a database transaction, so you need to do it in order to persist your data.

Searching is less straightforward but it is still pretty easy. You need to use the QueryManager object to create and execute a query in any of the supported languages. Here is a snippet of the code that retrieves an article based on its doi:

QueryManager queryManager = workspace.getQueryManager();
Query query = queryManager
.createQuery("//" + namespace + ":article[@" + namespace
+ ":doi = '" + article.getDoi() + "']", Query.XPATH);

log.debug(query.getStatement());
QueryResult results = query.execute();

When you run the code you should see something similar to this query statement: //test:article[@test:doi = '10.1038/2012.11109']

Notice that I used test as the namespace for my content.

A slightly more complex query is needed to get the articles for a given author. At runtime you'll see this entry somewhere in your log: //element(test:article, nt:unstructured)[test:authors/test:author/test:firstName='John' and test:authors/test:author/test:lastName='Doe']
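Both statements are assembled by string concatenation, so a value containing an apostrophe would break the query. Here is a minimal sketch of a builder that produces the same two statements, assuming (as in XPath 2.0 string literals) that an apostrophe is escaped by doubling it. The XPathStatements class and its method names are hypothetical and not part of the jcr-poc code:

```java
// Hypothetical helper, not part of the jcr-poc project: builds the XPath
// statements from this post as plain strings.
final class XPathStatements {

    private XPathStatements() {
    }

    // Wraps a value in single quotes, doubling embedded apostrophes
    // (assumed escaping rule for XPath string literals).
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Statement that selects an article by its doi property.
    static String articleByDoi(String prefix, String doi) {
        return "//" + prefix + ":article[@" + prefix + ":doi = "
                + quote(doi) + "]";
    }

    // Statement that selects the articles of a given author.
    static String articlesByAuthor(String prefix, String firstName,
            String lastName) {
        String authorPath = prefix + ":authors/" + prefix + ":author/"
                + prefix + ":";
        return "//element(" + prefix + ":article, nt:unstructured)["
                + authorPath + "firstName=" + quote(firstName)
                + " and " + authorPath + "lastName=" + quote(lastName) + "]";
    }

    public static void main(String[] args) {
        System.out.println(articleByDoi("test", "10.1038/2012.11109"));
        System.out.println(articlesByAuthor("test", "John", "Doe"));
    }
}
```

The resulting string is what you would pass to queryManager.createQuery(statement, Query.XPATH).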

Take a look at this link for some more information about XPath and JCR.

Last but not least, you can write some tests to see how your repository works. I used the JUnit @Before and @After annotations: before each test I create my DataHandlerImpl object and set the repository parameters (here is the flaw), and after each test I execute session.logout(); to release the session. Note that logout() does not commit anything; changes are persisted by session.save().

If you haven't done so yet, I suggest you take a look at the code on GitHub (branch: automatic_transient). I also strongly encourage you to take a look at the long and verbose log that your tests will produce and, more importantly, at the xml files in the repository that Jackrabbit created for you.