Thursday, August 9, 2012

JCR project - part 1

This is a brief tutorial on how to start playing with JCR and Jackrabbit. By the end of it you will have some code that creates a repository and lets you interact with it by adding and searching content.

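To keep the configuration simple, you are going to use the automatic configuration of a TransientRepository. This means that you don't need to write a repository descriptor (repository.xml) to define how your data will be stored. Note that "transient" refers to the repository's lifecycle rather than to the data: the repository starts automatically when the first session logs in and shuts down when the last session logs out. With the default configuration the content itself is stored on disk under the repository home directory, so it survives restarts until that directory is deleted.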

The technologies involved are:

  • JCR 2.0
  • Jackrabbit 2.5.0
  • Apache Derby 10.9.1.0
  • SLF4J 1.6.6

The source code of the project is available on GitHub (branch: automatic_transient), but I will show you how to implement it yourself.


Let's go step by step:

1) Create a quickstart project using Maven (the maven-archetype-quickstart archetype)

You can use these as the project parameters:


<groupId>com.acme</groupId>
<artifactId>jcr-poc</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>jcr-poc</name>


1.1) If you want to import the project into Eclipse, simply run mvn eclipse:eclipse in the project directory.

2) Add the following dependencies to your pom:


<dependencies>
    <dependency>
        <groupId>org.apache.jackrabbit</groupId>
        <artifactId>jackrabbit-core</artifactId>
        <version>2.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.derby</groupId>
        <artifactId>derby</artifactId>
        <version>10.9.1.0</version>
    </dependency>
    <dependency>
        <groupId>javax.jcr</groupId>
        <artifactId>jcr</artifactId>
        <version>2.0</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.6.6</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.6.6</version>
    </dependency>
    <!-- Test -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.9</version>
        <scope>test</scope>
    </dependency>
</dependencies>

3) Now that you are all set, it's time to create (initialize) your repository. Create a RepositoryInitializerImpl class that implements the following method:

  • Session initializeRepository() throws Exception
The goal of this method is to set up:
  • the repository
  • a session
  • the workspace (obtained from the session)
  • a namespace (registered in the workspace's namespace registry)
which will be used to manipulate the content.

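// Relies on fields declared in RepositoryInitializerImpl: repository, session,
// workspace, namespace, url, repoMainNode, the REPOSITORY constant and a logger.
public Session initializeRepository() throws Exception {

    // Automatic configuration: repository.xml and the data files
    // live under the "target" directory.
    repository = new TransientRepository(new File("target"));
    log.debug("Transient repository created");

    // Logging in starts the transient repository and opens a session.
    session = repository.login(new SimpleCredentials("admin",
            "admin".toCharArray()));
    log.debug("Session created");

    workspace = session.getWorkspace();
    log.debug("Workspace obtained");

    Node root = session.getRootNode();
    log.debug("Root node retrieved");

    try {
        // Map the prefix (e.g. "test") to its URI so names like test:article can be used.
        workspace.getNamespaceRegistry().registerNamespace(namespace, url);
        log.debug("Namespace registered: " + namespace + " -> " + url);
    } catch (NamespaceException e) {
        // Registration failures are only logged, so initialization goes on.
        log.warn(e.getMessage());
    }

    repoMainNode = root.addNode(namespace + REPOSITORY);

    // Persist the new node (the equivalent of a commit).
    log.debug("Saving the session");
    session.save();

    return session;
}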

Notice that a session is created by logging in to the repository. When you run the tests (and you will!), open the repository.xml under the target directory and take a look at the LoginModule element. Logging in as an anonymous user does not require a password, but it does not allow you to register any namespace.

Within this method you could also add content directly to the root node or to repoMainNode and then call session.save() to commit it. Instead, I implemented a separate DataHandlerImpl class that handles adding and searching. The design is flawed, so feel free to improve it!

4) Create Article and Author beans

The schema that I used is something like this:

<repository>
     <article doi='1'>
          <authors>
               <author firstname='io' lastname='me'/>
               <author firstname='you' lastname='yourself'/>
          </authors>
     </article>
     <article doi='2'>
          <authors>
               <author firstname='io' lastname='me'/>
               <author firstname='he' lastname='himself'/>
          </authors>
     </article>
</repository>
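The post doesn't show the beans, so here is a minimal sketch of what they might look like. The getters (getDoi, getTitle, getAuthors, getFirstName, getLastName) are the ones the snippets below call; the constructors and the addAuthor method are assumptions for illustration only. Both classes sit in one listing for brevity; in the project each would normally be a public class in its own file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal author bean: the two values stored on each test:author node.
class Author {
    private final String firstName;
    private final String lastName;

    Author(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }

    public String getLastName() { return lastName; }
}

// Minimal article bean: a doi, a title and an ordered list of authors.
class Article {
    private final String doi;
    private final String title;
    private final List<Author> authors = new ArrayList<Author>();

    Article(String doi, String title) {
        this.doi = doi;
        this.title = title;
    }

    public String getDoi() { return doi; }

    public String getTitle() { return title; }

    // Read-only view: authors are added through addAuthor (a hypothetical helper).
    public List<Author> getAuthors() { return Collections.unmodifiableList(authors); }

    public void addAuthor(Author author) { authors.add(author); }
}
```

Returning a read-only list from getAuthors() is just one possible design choice; a plain mutable list would work equally well with the snippets in this post.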


You can think of JCR as an n-ary tree in which each node has one or more types associated with it (a primary type plus optional mixin types) and any node can point to any other node through a reference to its identifier. To retrieve nodes from the repository you can use:

  • XPath (deprecated in JCR 2.0)
  • JCR-SQL2
  • SQL (also deprecated in JCR 2.0)
  • JQOM (Java Query Object Model)
I actually used XPath to implement the search, but I will soon rewrite (and post) the operations using JQOM and SQL2. The operations are:
  • add a new article
  • add an author to an article
  • retrieve articles for an author
Adding a new article is straightforward, as shown below:

// Every call adds a new article node under the main repository node.
Node articleNode = repoMainNode.addNode(namespace + ARTICLE_PREFIX);

articleNode.setProperty(namespace + DOI_PREFIX, article.getDoi());
articleNode.setProperty(namespace + TITLE_PREFIX, article.getTitle());

// Container node holding one child node per author.
Node authorsNode = articleNode.addNode(namespace + AUTHORS_LIST_PREFIX);

for (Author author : article.getAuthors()) {
    Node authorNode = authorsNode.addNode(namespace + AUTHOR_PREFIX);
    authorNode.setProperty(namespace + AUTHOR_FIRST_NAME_PREFIX,
            author.getFirstName());
    authorNode.setProperty(namespace + AUTHOR_LAST_NAME_PREFIX,
            author.getLastName());
}

// Persist all pending changes made through this session.
session.save();

Saving the session is the equivalent of committing a database transaction: until you call session.save(), your changes are only pending in the session, so you need to save in order to persist your data.
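Because the snippet calls addNode with the same name every time, storing two articles (or two authors of one article) produces same-name siblings, which JCR tells apart with a 1-based index in the path, e.g. test:article[2]; the index is omitted for the first sibling. The toy tree below is plain Java, not the javax.jcr API, and only mimics that naming rule so you can predict the paths Node.getPath() should report. The name test:repository for the main node is an assumption, since the value of REPOSITORY isn't shown.

```java
import java.util.ArrayList;
import java.util.List;

// Toy n-ary tree (NOT the javax.jcr API) that mimics JCR same-name-sibling paths.
class ToyNode {
    private final String name;
    private final ToyNode parent;
    private final List<ToyNode> children = new ArrayList<ToyNode>();

    ToyNode(String name, ToyNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Like addNode(name) in the snippet: always appends, even if the name exists.
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName, this);
        children.add(child);
        return child;
    }

    // 1-based position among the siblings that share this node's name.
    int getIndex() {
        if (parent == null) {
            return 1;
        }
        int index = 0;
        for (ToyNode sibling : parent.children) {
            if (sibling.name.equals(name)) {
                index++;
            }
            if (sibling == this) {
                break;
            }
        }
        return index;
    }

    // JCR-style path: the root is "/", and an index of 1 is left out.
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String prefix = parent.parent == null ? "/" : parent.getPath() + "/";
        int index = getIndex();
        return prefix + name + (index > 1 ? "[" + index + "]" : "");
    }
}
```

For example, adding two articles and giving the second one two authors yields /test:repository/test:article, /test:repository/test:article[2] and /test:repository/test:article[2]/test:authors/test:author[2].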

Searching is less straightforward, but it is still pretty easy. You use the QueryManager object to create and execute a query in any of the supported languages. Here is a snippet that retrieves an article by its doi:

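QueryManager queryManager = workspace.getQueryManager();
// Query.XPATH is deprecated in JCR 2.0 but still supported by Jackrabbit.
Query query = queryManager.createQuery("//" + namespace + ":article[@"
        + namespace + ":doi = '" + article.getDoi() + "']", Query.XPATH);

log.debug(query.getStatement());
QueryResult results = query.execute();

// Walk the matching article nodes.
NodeIterator nodes = results.getNodes();
while (nodes.hasNext()) {
    log.debug("Found " + nodes.nextNode().getPath());
}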

When you run the code you should see a query statement similar to this one: //test:article[@test:doi = '10.1038/2012.11109']

Notice that I used test as the namespace for my content.

A slightly more complex query is needed to get the articles for a given author; at runtime you'll see this entry somewhere in your log: //element(test:article, nt:unstructured)[test:authors/test:author/test:firstName='John' and test:authors/test:author/test:lastName='Doe']
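Both statements are built by string concatenation, so a doi or a name containing a single quote (O'Brien, say) would break the query. Here is a hypothetical helper, plain Java and not part of the JCR API, that reproduces the two logged statements and escapes values by doubling embedded quotes. That is the XPath 2.0 string-literal convention, which I am assuming Jackrabbit's XPath parser follows.

```java
// Hypothetical query builder (not part of javax.jcr) for the two XPath statements.
final class XPathQueries {
    private XPathQueries() {
    }

    // Quotes a value as an XPath string literal, doubling any embedded ' (assumed convention).
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Same shape as the logged doi query: //ns:article[@ns:doi = '...']
    static String articleByDoi(String ns, String doi) {
        return "//" + ns + ":article[@" + ns + ":doi = " + literal(doi) + "]";
    }

    // Same shape as the logged author query.
    static String articlesByAuthor(String ns, String firstName, String lastName) {
        String authorPath = ns + ":authors/" + ns + ":author/";
        return "//element(" + ns + ":article, nt:unstructured)["
                + authorPath + ns + ":firstName=" + literal(firstName)
                + " and " + authorPath + ns + ":lastName=" + literal(lastName) + "]";
    }
}
```

One thing worth checking in your own tests: under XPath's existential comparison semantics, the two conditions of the author query may be satisfied by two different authors of the same article (for example a John Smith and a Jane Doe). Putting both conditions in a single predicate on the author node would tie them to the same author.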

The JCR 1.0 (JSR 170) specification describes the XPath query syntax in more detail.

Last but not least, you can write some tests to see how your repository works. I used the JUnit @Before and @After annotations: before each test I create my DataHandlerImpl object and set the repository parameters (here is the flaw), and after each test I call session.logout(). Logging out does not commit anything (session.save() already did that); it releases the session, and with a TransientRepository the last logout also shuts the repository down.

If you haven't done so already, I suggest you take a look at the code on GitHub (branch: automatic_transient). I also strongly encourage you to look at the long, verbose log your tests will produce and, more importantly, at the XML files that Jackrabbit created for you in the repository directory.

