2016-07-22

Using Maven with OSGi Part 2

Introduction

In the previous installment I described the foundational building block of Maven: the Eclipse Aether library. I also mentioned the declarative use of Maven through the mvn: URI scheme.

In Karaf 4:

karaf@root()> bundle:install mvn:commons-io/commons-io/2.5
Bundle ID: 52

or in JBoss Fuse:

JBossFuse:karaf@root> osgi:install mvn:commons-io/commons-io/2.5
Bundle ID: 295

In this part I'll describe how Maven and the Aether library are used in Karaf 4 and in JBoss Fuse 6.3.x (standalone mode). I'll leave JBoss Fuse fabric mode for the next installment.

Let's recall some fundamental concepts:

local repository - accessed by Aether through the org.eclipse.aether.repository.LocalRepositoryManager interface and the org.eclipse.aether.repository.LocalRepository class. Effectively, a local repository is a wrapper for a locally accessible filesystem directory that follows a specific structure (the layout of Maven artifacts).
remote repository - represented in Aether by the org.eclipse.aether.repository.RemoteRepository class. Effectively, a remote repository is a wrapper for a URI, a set of policies related to snapshot/release versions, plus proxy, mirroring and authentication information.

pax-url-aether

The above commands rely on a java.net.URLStreamHandler implementation for the mvn: protocol. This implementation is provided by the OPS4J PAX URL Aether library, which uses Eclipse Aether to handle Maven artifact resolution.

When the pax-url-aether bundle is installed and active in the OSGi framework, it provides an OSGi service with the org.ops4j.pax.url.mvn.MavenResolver interface.

In Karaf 4:

karaf@root()> bundle:services -p 4

OPS4J Pax Url - aether: (4) provides:
-------------------------------------
objectClass = [org.osgi.service.cm.ManagedService]
service.bundleid = 4
service.id = 9
service.pid = org.ops4j.pax.url.mvn
service.scope = singleton
----
objectClass = [org.osgi.service.url.URLStreamHandlerService]
service.bundleid = 4
service.id = 10
service.scope = singleton
url.handler.protocol = mvn
----
objectClass = [org.ops4j.pax.url.mvn.MavenResolver]
service.bundleid = 4
service.id = 19
service.scope = singleton

In JBoss Fuse:

JBossFuse:karaf@root> ls 4
You are about to access system bundle 4.  Do you wish to continue (yes/no): yes

OPS4J Pax Url - aether: (4) provides:
-------------------------------------
objectClass = [org.osgi.service.cm.ManagedService]
service.id = 8
service.pid = org.ops4j.pax.url.mvn
----
objectClass = [org.osgi.service.url.URLStreamHandlerService]
service.id = 9
url.handler.protocol = mvn
----
objectClass = [org.ops4j.pax.url.mvn.MavenResolver]
service.id = 21

Configuration

This time I won't show any code examples; we don't need them. As with most other OSGi services, we can configure how pax-url-aether uses Aether through the Configuration Admin service. Describing Configuration Admin is a good task for another blog post. There are some articles about Configuration Admin, and if something's not clear, please refer to the original documentation.

To configure the org.ops4j.pax.url.mvn.MavenResolver service (the implementation is org.ops4j.pax.url.mvn.internal.AetherBasedResolver) we use the org.ops4j.pax.url.mvn PID (persistent identifier). We can configure this PID manually using the Configuration Admin API or using the etc/org.ops4j.pax.url.mvn.cfg file.

Internally, pax-url-aether uses an org.ops4j.pax.url.mvn.internal.config.MavenConfigurationImpl object, which holds the information used by org.ops4j.pax.url.mvn.internal.AetherBasedResolver.

Let's look at the default configuration (slightly formatted) provided by Karaf 4 (the /work directory is a Docker volume mount):

karaf@root()> property-list --pid org.ops4j.pax.url.mvn
   felix.fileinstall.filename = file:/work/etc/org.ops4j.pax.url.mvn.cfg
   org.ops4j.pax.url.mvn.defaultRepositories = \
      file:/work/system@id=system.repository@snapshots, \
      file:/work/data/kar@id=kar.repository@multi@snapshots, \
      file:/work/system@id=child.system.repository@snapshots
   org.ops4j.pax.url.mvn.repositories = \
      http://repo1.maven.org/maven2@id=central, \
      http://repository.springsource.com/maven/bundles/release@id=spring.ebr.release, \
      http://repository.springsource.com/maven/bundles/external@id=spring.ebr.external, \
      http://zodiac.springsource.com/maven/bundles/release@id=gemini, \
      http://repository.apache.org/content/groups/snapshots-group@id=apache@snapshots@noreleases, \
      https://oss.sonatype.org/content/repositories/snapshots@id=sonatype.snapshots.deploy@snapshots@noreleases, \
      https://oss.sonatype.org/content/repositories/ops4j-snapshots@id=ops4j.sonatype.snapshots.deploy@snapshots@noreleases, \
      http://repository.springsource.com/maven/bundles/external@id=spring-ebr-repository@snapshots@noreleases
   org.ops4j.pax.url.mvn.useFallbackRepositories = false
   service.pid = org.ops4j.pax.url.mvn

And in JBoss Fuse:

JBossFuse:karaf@root> config:proplist --pid org.ops4j.pax.url.mvn
   felix.fileinstall.filename = file:/data/servers/jboss-fuse-6.3.0.redhat-145/etc/org.ops4j.pax.url.mvn.cfg
   org.ops4j.pax.url.mvn.defaultRepositories = \
      file:/data/servers/jboss-fuse-6.3.0.redhat-145/system@snapshots@id=karaf.system,\
      file:/home/ggrzybek/.m2/repository@snapshots@id=local,\
      file:/data/servers/jboss-fuse-6.3.0.redhat-145/local-repo@snapshots@id=karaf.local-repo,\
      file:/data/servers/jboss-fuse-6.3.0.redhat-145/system@snapshots@id=child.karaf.system
   org.ops4j.pax.url.mvn.globalChecksumPolicy = warn
   org.ops4j.pax.url.mvn.globalUpdatePolicy = daily
   org.ops4j.pax.url.mvn.localRepository = /data/servers/jboss-fuse-6.3.0.redhat-145/data/repository
   org.ops4j.pax.url.mvn.repositories = \
      http://repo1.maven.org/maven2@id=maven.central.repo, \
      https://maven.repository.redhat.com/ga@id=redhat.ga.repo, \
      https://maven.repository.redhat.com/earlyaccess/all@id=redhat.ea.repo, \
      https://repository.jboss.org/nexus/content/groups/ea@id=fuseearlyaccess
   org.ops4j.pax.url.mvn.settings = /data/servers/jboss-fuse-6.3.0.redhat-145/etc/maven-settings.xml
   org.ops4j.pax.url.mvn.useFallbackRepositories = false
   service.pid = org.ops4j.pax.url.mvn

The above configurations differ a bit; JBoss Fuse configures more properties explicitly. Let's describe each of the properties used (or assumed by default) to configure pax-url-aether (org.ops4j.pax.url.mvn.internal.AetherBasedResolver).

org.ops4j.pax.url.mvn.defaultRepositories

This is a list of local repositories searched for an artifact in the first phase of artifact resolution. The list should contain only file:-based URIs. Each repository on this list is treated as a local repository: pax-url-aether iterates over the list and checks one repository at a time. If none of these locations contains the artifact being resolved, pax-url-aether switches to the second phase, which involves remote repositories.
These repositories also don't require write access, because Aether doesn't write any files there.

Access to these repositories can be illustrated with the following code (did I promise not to show any code examples? Sorry...):

RepositorySystem system = locator.getService(RepositorySystem.class);
DefaultRepositorySystemSession session = MavenRepositorySystemUtils.newSession();

String basedir = singleRepositoryFromListOfDefaultRepositories; // one directory from org.ops4j.pax.url.mvn.defaultRepositories
session.setLocalRepositoryManager(system.newLocalRepositoryManager(session, new LocalRepository(basedir)));

ArtifactRequest req = new ArtifactRequest();
req.setArtifact(new DefaultArtifact("commons-io", "commons-io", "jar", "2.5"));

ArtifactResult res = system.resolveArtifact(session, req); // throws ArtifactResolutionException if not found here

We don't invoke req.addRepository(repositoryBuilder.build()) at all, so Aether doesn't try to reach any external location.

Line 4 (the basedir assignment) shows that we try one repository from org.ops4j.pax.url.mvn.defaultRepositories at a time; each such local repository is checked independently.
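The first phase needs nothing from Aether beyond the standard repository layout. The plain-Java sketch below illustrates the lookup order described above. DefaultRepositoriesLookup and findLocally are hypothetical names, not pax-url-aether code. The layoutPath argument is assumed to be the artifact's relative path in the default layout, for example commons-io/commons-io/2.5/commons-io-2.5.jar.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the first resolution phase (read-only, first hit wins). */
final class DefaultRepositoriesLookup {

    static Optional<Path> findLocally(List<Path> defaultRepositories, String layoutPath) {
        for (Path repository : defaultRepositories) {
            Path candidate = repository.resolve(layoutPath);
            // Each directory is checked independently; nothing is ever written here
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        // Empty result: the caller moves on to the second (remote) phase
        return Optional.empty();
    }
}
```

An empty result is the point where the second phase, which involves remote repositories, would take over.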

org.ops4j.pax.url.mvn.repositories

This is a list of remote repositories searched for an artifact in the second phase of artifact resolution. The list may contain file: URIs, but it's better to add such repositories to org.ops4j.pax.url.mvn.defaultRepositories. Each repository is accessed using a configured connector (pax-url-aether uses a connector that invokes the Apache HttpClient 4.x library underneath, which is why configuring the org.apache.http.headers logger may be a good idea).

org.ops4j.pax.url.mvn.localRepository

This is a local repository that supports Aether in the second phase of artifact resolution. Its role differs a bit from the role of org.ops4j.pax.url.mvn.defaultRepositories: when Aether actually resolves an artifact in one of the remote repositories, it stores the downloaded artifact in org.ops4j.pax.url.mvn.localRepository. That's why write access is required for this location.

If not specified, this property defaults to ${user.home}/.m2/repository! Keep this in mind if Aether seems to find artifacts you don't expect.

The second phase of artifact resolution can be illustrated with the following code (I love clean code!):

RepositorySystem system = locator.getService(RepositorySystem.class);
DefaultRepositorySystemSession session = MavenRepositorySystemUtils.newSession();

String basedir = localRepository; // value of org.ops4j.pax.url.mvn.localRepository (must be writable)
session.setLocalRepositoryManager(system.newLocalRepositoryManager(session, new LocalRepository(basedir)));

ArtifactRequest req = new ArtifactRequest();
req.addRepository(new RemoteRepository.Builder("ID1", "default", "http://uri1").build());
req.addRepository(new RemoteRepository.Builder("ID2", "default", "http://uri2").build());
req.setArtifact(new DefaultArtifact("commons-io", "commons-io", "jar", "2.5"));

ArtifactResult res = system.resolveArtifact(session, req); // downloaded file is stored under basedir

Here, for each remote repository from the list of URIs in org.ops4j.pax.url.mvn.repositories property we call org.eclipse.aether.resolution.ArtifactRequest.addRepository().

Also, in line 4 we use the local repository from the org.ops4j.pax.url.mvn.localRepository property; the same local repository is used for every remote repository being searched.

org.ops4j.pax.url.mvn.useFallbackRepositories

If true, Aether will always use the http://repo1.maven.org/maven2 repository in addition to any remote repositories specified. I prefer an explicit declaration of the Maven Central repository (if needed), so it's better to set false here.

org.ops4j.pax.url.mvn.settings

Ah, a big topic. We can specify an explicit location of an XML document that follows the Maven Settings XML Schema.

If not specified, pax-url-aether searches for settings file in the following locations:
  • ${user.home}/.m2/settings.xml
  • ${maven.home}/conf/settings.xml
  • $M2_HOME/conf/settings.xml
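A minimal sketch of this lookup order follows. It assumes that an explicitly configured org.ops4j.pax.url.mvn.settings value simply wins, and that otherwise the candidates are tried in the order listed above. SettingsLocator and its methods are hypothetical, and I haven't verified how pax-url-aether treats an explicit path that doesn't exist.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the settings.xml search order. */
final class SettingsLocator {

    /** Candidate locations in the documented order; null inputs are skipped. */
    static List<Path> candidates(String userHome, String mavenHome, String m2Home) {
        List<Path> result = new ArrayList<>();
        if (userHome != null) result.add(Paths.get(userHome, ".m2", "settings.xml"));
        if (mavenHome != null) result.add(Paths.get(mavenHome, "conf", "settings.xml"));
        if (m2Home != null) result.add(Paths.get(m2Home, "conf", "settings.xml"));
        return result;
    }

    /** Explicit value first (assumption), otherwise the first existing candidate. */
    static Optional<Path> locate(String explicitSettings, List<Path> candidates) {
        if (explicitSettings != null) {
            return Optional.of(Paths.get(explicitSettings));
        }
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Same lookup driven by the JVM/environment values named above. */
    static Optional<Path> locateFromEnvironment(String explicitSettings) {
        return locate(explicitSettings, candidates(
                System.getProperty("user.home"),
                System.getProperty("maven.home"),
                System.getenv("M2_HOME")));
    }
}
```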

In Karaf 4, the implicit location is used (most probably ${user.home}/.m2/settings.xml). In JBoss Fuse, the explicit value ${karaf.etc}/maven-settings.xml is configured, and a default, commented template is shipped.

Why specify a custom settings.xml file when we have properties such as org.ops4j.pax.url.mvn.repositories? Because there are a few things that can be specified only there:

  • HTTP proxies
  • custom HTTP headers added when accessing particular remote repositories

Here's an example of HTTP proxy configuration:

<!--
    This is the place to configure http proxies used by Aether.
    If there's no proxy for "https" protocol, proxy for "http" will be used when accessing remote repository
-->
<proxies>
    <proxy>
        <id>proxy</id>
        <host>127.0.0.1</host>
        <port>3128</port>
        <protocol>http</protocol>
        <username></username>
        <password></password>
        <nonProxyHosts>127.0.0.*|*.repository.corp</nonProxyHosts>
    </proxy>
</proxies>
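The sketch below shows how a nonProxyHosts value like the one above can be matched against a host name. It assumes '|' as the separator, '*' as a wildcard (as in the example), and case-insensitive matching. NonProxyHosts is a hypothetical helper, and the proxy selector that pax-url-aether actually uses may apply slightly different rules.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: does a host bypass the proxy according to nonProxyHosts? */
final class NonProxyHosts {

    static boolean bypassesProxy(String host, String nonProxyHosts) {
        if (nonProxyHosts == null || nonProxyHosts.isEmpty()) {
            return false;
        }
        for (String pattern : nonProxyHosts.split("\\|")) {
            String p = pattern.trim();
            if (p.isEmpty()) {
                continue;
            }
            // Quote the whole pattern literally, then re-open the quote around each '*' as '.*'
            String regex = Pattern.quote(p).replace("*", "\\E.*\\Q");
            if (Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(host).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the configuration above, 127.0.0.1 and nexus.repository.corp would bypass the proxy, while repo1.maven.org would go through it.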

And here's an example of specifying custom HTTP headers:

<!--
    pax-url-aether may use the below configuration to add custom HTTP headers when accessing remote repositories
    with a given identifier
-->
<servers>
    <server>
        <id>maven.central.repo</id>
        <configuration>
            <httpHeaders>
                <httpHeader>
                    <name>User-Agent</name>
                    <value>Karaf</value>
                </httpHeader>
                <httpHeader>
                    <name>Secret-Header</name>
                    <value>secret_value</value>
                </httpHeader>
            </httpHeaders>
        </configuration>
    </server>
</servers>

With these custom headers configured, we can see them in the logs when the repository with ID maven.central.repo is accessed (the repository URI format is described below):

17:30:44,590 | DEBUG | ... | http-outgoing-0 >> GET /maven2/commons-io/commons-io/2.7/commons-io-2.7.jar HTTP/1.1
17:30:44,590 | DEBUG | ... | http-outgoing-0 >> Cache-control: no-cache
17:30:44,590 | DEBUG | ... | http-outgoing-0 >> Cache-store: no-store
17:30:44,590 | DEBUG | ... | http-outgoing-0 >> Pragma: no-cache
17:30:44,591 | DEBUG | ... | http-outgoing-0 >> Expires: 0
17:30:44,591 | DEBUG | ... | http-outgoing-0 >> Accept-Encoding: gzip
17:30:44,591 | DEBUG | ... | http-outgoing-0 >> User-Agent: Karaf
17:30:44,591 | DEBUG | ... | http-outgoing-0 >> Secret-Header: secret_value
17:30:44,591 | DEBUG | ... | http-outgoing-0 >> Host: repo1.maven.org

org.ops4j.pax.url.mvn.repositories - again

Having described org.ops4j.pax.url.mvn.settings, let's get back to org.ops4j.pax.url.mvn.repositories for a moment.

If the list of remote repositories in org.ops4j.pax.url.mvn.repositories is prefixed with a + sign, the repositories from all active profiles defined in the settings.xml file are appended to the effective list of remote repositories searched.
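The + rule can be sketched with plain collections. EffectiveRepositories is hypothetical. It assumes the repositories from active profiles have already been read from settings.xml and rendered in the same URI@options form. It doesn't model what happens when the property isn't set at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how the '+' prefix extends the remote repository list. */
final class EffectiveRepositories {

    /**
     * @param property value of org.ops4j.pax.url.mvn.repositories
     * @param activeProfileRepositories repositories from all active settings.xml profiles
     */
    static List<String> effective(String property, List<String> activeProfileRepositories) {
        String value = property == null ? "" : property.trim();
        boolean appendProfiles = value.startsWith("+");
        if (appendProfiles) {
            value = value.substring(1);
        }
        List<String> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            if (!entry.trim().isEmpty()) {
                result.add(entry.trim()); // configured repositories keep their order
            }
        }
        if (appendProfiles) {
            result.addAll(activeProfileRepositories); // profile repositories come last
        }
        return result;
    }
}
```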

For example, if we have this:

org.ops4j.pax.url.mvn.repositories= \
    +http://repo1.maven.org/maven2@id=maven.central.repo

And this in settings.xml:

<!--
    If org.ops4j.pax.url.mvn.repositories property is _prepended_ with '+' sign, repositories from all active
    profiles will be _appended_ to the list of searched remote repositories
-->
<profiles>
    <profile>
        <id>default</id>
        <repositories>
            <repository>
                <id>private.repository</id>
                <url>http://localhost:8181/maven-repository</url>
            </repository>
        </repositories>
    </profile>
</profiles>
<activeProfiles>
    <activeProfile>default</activeProfile>
</activeProfiles>

We can see this in the logs during a sample resolution:

18:10:17,734 | DEBUG | ... | Using transporter WagonTransporter with priority -1.0 for http://repo1.maven.org/maven2/
18:10:17,736 | DEBUG | ... | Using connector BasicRepositoryConnector with priority 0.0 for http://repo1.maven.org/maven2/
18:10:17,800 | DEBUG | ... | http-outgoing-8 >> GET /maven2/commons-io/commons-io/2.7/commons-io-2.7.jar HTTP/1.1
18:10:17,802 | DEBUG | ... | http-outgoing-8 >> Host: repo1.maven.org
...
18:10:17,872 | DEBUG | ... | Using transporter WagonTransporter with priority -1.0 for http://localhost:8181/maven-repository/
18:10:17,873 | DEBUG | ... | Using connector BasicRepositoryConnector with priority 0.0 for http://localhost:8181/maven-repository/
18:10:17,875 | DEBUG | ... | http-outgoing-9 >> GET /maven-repository/commons-io/commons-io/2.7/commons-io-2.7.jar HTTP/1.1
18:10:17,876 | DEBUG | ... | http-outgoing-9 >> Host: localhost:8181
...

org.ops4j.pax.url.mvn.globalChecksumPolicy

When Aether fetches an artifact from a remote repository, it always tries to download a SHA1/MD5 checksum for it. This may fail, or the checksum may not match the artifact. If the repository URI doesn't specify a per-repository value, this global property's value is used; in fact, if the global value is specified, per-repository values are ignored.

This property may have 3 values determining Aether's behavior:

  • fail - resolution fails
  • warn - information is printed at WARN level
  • ignore - nothing happens.

Note that there's no way to prevent fetching checksums.
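The sketch below shows what the three policies mean for a downloaded artifact, using SHA-1 from the JDK's MessageDigest. ChecksumCheck is hypothetical. Treating a checksum that couldn't be fetched (null) the same way as a mismatch is a simplification on my part.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Logger;

/** Hypothetical sketch of applying a fail/warn/ignore checksum policy. */
final class ChecksumCheck {

    enum Policy { FAIL, WARN, IGNORE }

    private static final Logger LOG = Logger.getLogger(ChecksumCheck.class.getName());

    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every JDK", e);
        }
    }

    /**
     * @param expectedSha1 content of the downloaded .sha1 file, or null if it couldn't be fetched
     * @return true if the artifact is accepted, false if resolution fails
     */
    static boolean accept(byte[] artifact, String expectedSha1, Policy policy) {
        // .sha1 files may contain "hash  filename", so only the first token is compared
        boolean ok = expectedSha1 != null
                && sha1Hex(artifact).equalsIgnoreCase(expectedSha1.trim().split("\\s+")[0]);
        if (ok || policy == Policy.IGNORE) {
            return true;
        }
        if (policy == Policy.WARN) {
            LOG.warning("Checksum missing or mismatched; accepting artifact anyway");
            return true;
        }
        return false; // FAIL
    }
}
```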

org.ops4j.pax.url.mvn.globalUpdatePolicy

When Aether resolves SNAPSHOT artifacts, it needs to fetch maven-metadata.xml first. Before contacting the repositories from org.ops4j.pax.url.mvn.repositories, Aether checks for a resolver-status.properties file in the org.ops4j.pax.url.mvn.localRepository location (this status file is specific to a given groupId, artifactId and version, for example: <REPOSITORY>/commons-io/commons-io/2.5-SNAPSHOT/resolver-status.properties). We can control whether Aether should actually refresh the metadata:

  • always - Aether always fetches maven-metadata.xml when resolving SNAPSHOTs
  • never - opposite of the above
  • daily - Aether fetches maven-metadata.xml if the last check, i.e. the timestamp stored in the maven-metadata-ID_OF_REPOSITORY.xml.lastUpdated property inside the resolver-status.properties file, was made before the start of the current day.
  • interval:<NUMBER_OF_MINUTES> - Aether fetches maven-metadata.xml if the given number of minutes has passed since that timestamp.
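The decision itself is simple arithmetic on the lastUpdated timestamp. As far as I understand it, the sketch below mirrors Aether's DefaultUpdatePolicyAnalyzer in two respects. First, daily means the last check happened before local midnight of the current day, so a check made late yesterday already counts as stale. Second, interval:N compares against N minutes before now. UpdatePolicyCheck is hypothetical, and treating a missing timestamp as "update required" is my assumption.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.util.Properties;

/** Hypothetical sketch of the always/never/daily/interval:N update decision. */
final class UpdatePolicyCheck {

    static boolean updateRequired(String policy, Long lastUpdated, long nowMillis, ZoneId zone) {
        if (lastUpdated == null) {
            return true; // assumption: no recorded check means metadata must be fetched
        }
        if ("always".equals(policy)) {
            return true;
        }
        if ("daily".equals(policy)) {
            long startOfToday = Instant.ofEpochMilli(nowMillis).atZone(zone)
                    .toLocalDate().atStartOfDay(zone).toInstant().toEpochMilli();
            return lastUpdated < startOfToday;
        }
        if (policy != null && policy.startsWith("interval:")) {
            long minutes = Long.parseLong(policy.substring("interval:".length()).trim());
            return nowMillis - lastUpdated > minutes * 60_000L;
        }
        return false; // "never" (and, in this sketch, any unrecognised value)
    }

    /** Reads the per-repository timestamp from a loaded resolver-status.properties. */
    static Long lastUpdated(Properties resolverStatus, String repositoryId) {
        String value = resolverStatus.getProperty("maven-metadata-" + repositoryId + ".xml.lastUpdated");
        return value == null ? null : Long.valueOf(value.trim());
    }
}
```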

Maven Repository URI

I've mentioned the repository URI in a few places. When specifying a URI on the org.ops4j.pax.url.mvn.repositories list, we may use the following format:

http(s)://host:port/path@snapshots@noreleases@id=ID@other_options

Options that may be specified are:

  • id=ID - this option may (should) be specified to identify a repository. We may then refer to the repository for example when specifying custom headers.
  • snapshots - whether the repository should be used when resolving SNAPSHOT artifacts
  • noreleases - whether the repository should not be used when resolving non-SNAPSHOT artifacts
  • releasesUpdate=daily|never|always|interval:MINUTES - see description of org.ops4j.pax.url.mvn.globalUpdatePolicy property
  • snapshotsUpdate=daily|never|always|interval:MINUTES - see description of org.ops4j.pax.url.mvn.globalUpdatePolicy property
  • update=daily|never|always|interval:MINUTES - see description of org.ops4j.pax.url.mvn.globalUpdatePolicy property
  • releasesChecksum=fail|warn|ignore - see description of org.ops4j.pax.url.mvn.globalChecksumPolicy property
  • snapshotsChecksum=fail|warn|ignore - see description of org.ops4j.pax.url.mvn.globalChecksumPolicy property
  • checksum=fail|warn|ignore - see description of org.ops4j.pax.url.mvn.globalChecksumPolicy property
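A rough parser for this format can help when debugging a long repositories list. RepositorySpec is hypothetical and deliberately naive:
- It splits on '@', so a URL that carries user:password@host credentials would be mis-parsed.
- It treats options without a value, such as snapshots and noreleases, as boolean flags.
- It derives only the defaults stated above: releases are on unless noreleases is given, and snapshots are off unless snapshots is given.
- It doesn't handle a leading + on the whole list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, naive parser for "url@option@key=value..." repository entries. */
final class RepositorySpec {
    final String url;
    final Map<String, String> options; // "id=x" -> id:x, bare flags -> flag:"true"

    private RepositorySpec(String url, Map<String, String> options) {
        this.url = url;
        this.options = options;
    }

    static RepositorySpec parse(String spec) {
        String[] parts = spec.trim().split("@");
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                options.put(parts[i], "true");
            } else {
                options.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return new RepositorySpec(parts[0], options);
    }

    /** Splits a comma-separated value such as org.ops4j.pax.url.mvn.repositories. */
    static List<RepositorySpec> parseList(String property) {
        List<RepositorySpec> result = new ArrayList<>();
        for (String entry : property.split(",")) {
            if (!entry.trim().isEmpty()) {
                result.add(parse(entry));
            }
        }
        return result;
    }

    String id() { return options.get("id"); }

    boolean snapshotsEnabled() { return options.containsKey("snapshots"); }

    boolean releasesEnabled() { return !options.containsKey("noreleases"); }
}
```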

Other options

There are other properties that can be configured in the org.ops4j.pax.url.mvn PID, for example:

org.ops4j.pax.url.mvn.defaultLocalRepoAsRemote
Whether the local repository specified in org.ops4j.pax.url.mvn.localRepository should be added as the first remote repository, ahead of those configured with the org.ops4j.pax.url.mvn.repositories property. It's a bad idea...

Caveats

Due to the highly asynchronous nature of OSGi™, there's a short period of time during which a different configuration may be used by the org.ops4j.pax.url.mvn.MavenResolver service. In particular, there is a slight race condition between two sides. On one side, pax-url-aether configures the org.ops4j.pax.url.mvn.MavenResolver service. On the other, the felix.fileinstall and felix.configadmin bundles create the configuration for the org.ops4j.pax.url.mvn PID.

To prevent such an interregnum, I suggest duplicating the Maven/Aether properties from ${karaf.etc}/org.ops4j.pax.url.mvn.cfg in ${karaf.etc}/config.properties. If pax-url-aether can't find ConfigurationAdmin (yet), it falls back to bundle context properties, and these may be specified in etc/config.properties.
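The fallback rule can be sketched as follows. ConfigFallback is a hypothetical helper. The configAdminProps parameter plays the role of the dictionary a ManagedService receives, which is null when there's no configuration yet. The frameworkProperty parameter stands for something like BundleContext.getProperty, which is where values from etc/config.properties end up.

```java
import java.util.Dictionary;
import java.util.function.Function;

/** Hypothetical sketch: Configuration Admin value if available, framework property otherwise. */
final class ConfigFallback {

    static String lookup(Dictionary<String, ?> configAdminProps,
                         Function<String, String> frameworkProperty,
                         String key) {
        if (configAdminProps != null) {
            // Configuration from org.ops4j.pax.url.mvn.cfg has arrived: use it
            Object value = configAdminProps.get(key);
            return value == null ? null : value.toString();
        }
        // Not configured yet: fall back to values defined in etc/config.properties
        return frameworkProperty.apply(key);
    }
}
```

Duplicating the properties means both branches return the same values, so the race described above becomes harmless.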

Summary

I hope the above information clears up the confusion around pax-url-aether configuration in OSGi runtimes such as JBoss Fuse or Karaf.

Using Maven with OSGi Part 1

Introduction

(If you're interested in pax-url-aether configuration in JBoss Fuse standalone mode or in Karaf, please visit Part 2 of the series.)

In this short series of articles I'd like to show how Maven can be used inside an OSGi environment. Both Apache Karaf and JBoss Fuse use Maven extensively, and it's important to understand how it really works to be able to use it successfully.

I personally think that learning the internals of any technology is the best way to use and maintain it in the long run. Ultimately, getting the official source code and reading it in your favourite IDE is much better than relying on official (or unofficial) documentation.

Of course sometimes (usually) there's no time to dig through the internals, so I hope this article will provide an alternative.

Runtime

JBoss Fuse is based on the Apache Karaf runtime, and from the OSGi point of view there's no big difference between them. Both runtimes allow installing OSGi bundles and Karaf features, which provide the OSGi services of the runtime and of user applications. JBoss Fuse provides an even higher-level abstraction, profiles, which group bundles, features and other items.

Apache Karaf and JBoss Fuse may reference Maven artifacts directly using mvn:groupId/artifactId/version[/type[/classifier]] URIs. These may reference bundles, features and other artifacts.
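To make the URI structure concrete, here's a sketch that parses the basic form and maps it to the relative path in the default Maven repository layout. This is the same kind of path that appears in the HTTP requests in the logs later on. MvnUri is hypothetical. It uses the type directly as the file extension and defaults it to jar, which fits the examples here but simplifies Maven's type handling. It doesn't handle any extended syntax that pax-url-aether may accept.

```java
/** Hypothetical, simplified parser for mvn:groupId/artifactId/version[/type[/classifier]]. */
final class MvnUri {
    final String groupId;
    final String artifactId;
    final String version;
    final String type;
    final String classifier; // null when absent

    private MvnUri(String groupId, String artifactId, String version, String type, String classifier) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
        this.type = type;
        this.classifier = classifier;
    }

    static MvnUri parse(String uri) {
        if (uri == null || !uri.startsWith("mvn:")) {
            throw new IllegalArgumentException("Not an mvn: URI: " + uri);
        }
        // limit -1 keeps trailing empty segments, so "mvn:g/a/v/" is rejected below
        String[] parts = uri.substring("mvn:".length()).split("/", -1);
        if (parts.length < 3 || parts.length > 5) {
            throw new IllegalArgumentException("Expected groupId/artifactId/version[/type[/classifier]]: " + uri);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in: " + uri);
            }
        }
        String type = parts.length > 3 ? parts[3] : "jar"; // assumption: jar when omitted
        String classifier = parts.length > 4 ? parts[4] : null;
        return new MvnUri(parts[0], parts[1], parts[2], type, classifier);
    }

    /** Relative path in the default layout, e.g. commons-io/commons-io/2.5/commons-io-2.5.jar. */
    String layoutPath() {
        StringBuilder fileName = new StringBuilder(artifactId).append('-').append(version);
        if (classifier != null) {
            fileName.append('-').append(classifier);
        }
        fileName.append('.').append(type); // simplification: type used as the extension
        return groupId.replace('.', '/') + '/' + artifactId + '/' + version + '/' + fileName;
    }
}
```

For SNAPSHOT versions, remote repositories usually store timestamped file names, so this path matches the local layout rather than every remote one.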

Following my usual way of learning, I'll start with low-level details and continue with higher-level mechanisms and concepts.

Maven

Although there are good alternatives (like Gradle), Apache Maven is still the de facto standard tool for build and dependency management. When we decompose Maven into parts, we can reuse the dependency management part in our own code. Dependency management is one of the most important aspects of software development, and inside an OSGi runtime Maven dependencies are only one of the layers of dependency management. However, we won't cover OSGi bundle and Karaf feature dependencies here.

We would like to fetch any artifact stored in one of the external Maven repositories and use it (as a bundle, feature or configuration file) inside the runtime. We'd like to do it the Maven way, i.e., declaratively. The best example is installing an external Maven artifact inside the OSGi runtime. In Karaf:

karaf@root()> bundle:install mvn:commons-io/commons-io/2.5
Bundle ID: 52

or in Fuse:

JBossFuse:karaf@root> osgi:install mvn:commons-io/commons-io/2.5
Bundle ID: 295

These commands work out of the box, but usually there's a need to change the default configuration, e.g., configure additional remote repositories, change credentials, configure HTTP proxies, etc. Before describing configuration options, let's start with the basics.

Aether

Eclipse Aether is a set of libraries used internally by Maven for dependency resolution. Various tasks can be performed with Aether, like computing the closure of artifacts for a graph of dependencies, but even with this low-level library we'll focus on one particular task: getting artifacts from remote repositories.

The official Aether Wiki page is sufficient to get started and to see how to use it in code. I'll provide more detailed information in order to describe important concepts.

Aether uses an interface-based API, where the actual implementations of the interfaces are wired together using dependency injection (or a service locator, as in the code below). The two most important interfaces are:

  • org.eclipse.aether.RepositorySystem - an entry to repository system that provides various resolution methods
  • org.eclipse.aether.RepositorySystemSession - provides additional information specific to operations performed on RepositorySystem

and a set of classes:
  • org.eclipse.aether.*.*Request - various request classes passed as commands to RepositorySystem. We'll focus mainly on org.eclipse.aether.resolution.ArtifactRequest.

RepositorySystem is configured in dependency-injection style: we can select concrete implementations of several SPI interfaces that alter some aspects of Aether. RepositorySystemSession, by contrast, is configured using properties and directly set objects. The session alters the way in which the repository system deals with requests.

So let's check how these work together. First let's configure the repository system:

DefaultServiceLocator locator = MavenRepositorySystemUtils.newServiceLocator();
// addService() registers an additional implementation; setService() replaces any previous one,
// so two setService() calls for TransporterFactory would leave only the HTTP transporter
locator.addService(RepositoryConnectorFactory.class, BasicRepositoryConnectorFactory.class);
locator.addService(TransporterFactory.class, FileTransporterFactory.class);
locator.addService(TransporterFactory.class, HttpTransporterFactory.class);
locator.setService(org.eclipse.aether.spi.log.LoggerFactory.class, Slf4jLoggerFactory.class);
RepositorySystem system = locator.getService(RepositorySystem.class);

Nothing extraordinary: we'll have access to http:- and file:-based repositories, and the SLF4J API will be used for logging.

Now let's configure the session. The configuration property used here is just an example; more properties will be described later.

// newSession() returns a DefaultRepositorySystemSession, so no casts are needed
DefaultRepositorySystemSession session = MavenRepositorySystemUtils.newSession();
// number of threads the basic connector uses for transfers
session.setConfigProperty("aether.connector.basic.threads", "2");
LocalRepositoryManager localRepositoryManager = system.newLocalRepositoryManager(session, new LocalRepository("/home/user/.m2/repository"));
session.setLocalRepositoryManager(localRepositoryManager);

And finally let's perform some operation - artifact resolution:

ArtifactRequest req = new ArtifactRequest();
req.setArtifact(new DefaultArtifact("commons-io", "commons-io", "jar", "2.5"));
req.addRepository(new RemoteRepository.Builder("central", "default", "http://repo1.maven.org/maven2").build());
ArtifactResult res = system.resolveArtifact(session, req);

The above tells Aether to resolve the commons-io:commons-io:jar:2.5 artifact using the local repository in /home/user/.m2/repository and, if it's not found there, to search for the artifact in the http://repo1.maven.org/maven2 remote repository. We could (and it's common practice) configure more remote repositories (using org.eclipse.aether.resolution.ArtifactRequest#addRepository()) to be searched if the artifact isn't available locally.

The code above isn't needed to use Maven inside Karaf or JBoss Fuse, but it introduces two super important concepts:

local repository - accessed by Aether through the org.eclipse.aether.repository.LocalRepositoryManager interface and the org.eclipse.aether.repository.LocalRepository class. Effectively, a local repository is a wrapper for a locally accessible filesystem directory that follows a specific structure (the layout of Maven artifacts).
remote repository - represented in Aether by the org.eclipse.aether.repository.RemoteRepository class. Effectively, a remote repository is a wrapper for a URI, a set of policies related to snapshot/release versions, plus proxy, mirroring and authentication information.

The key point is that if an artifact can't be found in the local repository, it is searched for in (one of) the remote repositories. Proper code should ensure that local repositories are always searched before remote repositories.
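The rule can be modeled with plain collections. ResolutionOrder and its Repo class are hypothetical stand-ins, in which a repository is just a set of coordinates. The sketch checks the local repository first and then the remote repositories in declaration order. It copies a remote hit into the local repository, mimicking how Aether stores downloaded artifacts there.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of "local first, then remotes in order". */
final class ResolutionOrder {

    /** Stand-in for a repository: an id plus the coordinates it holds. */
    static final class Repo {
        final String id;
        final Set<String> artifacts;

        Repo(String id, Set<String> artifacts) {
            this.id = id;
            this.artifacts = artifacts;
        }
    }

    /** Returns the id of the repository that satisfied the request, if any. */
    static Optional<String> resolve(Repo local, List<Repo> remotes, String coordinates) {
        if (local.artifacts.contains(coordinates)) {
            return Optional.of(local.id); // local hit: no remote repository is contacted
        }
        for (Repo remote : remotes) { // remotes are tried in declaration order
            if (remote.artifacts.contains(coordinates)) {
                local.artifacts.add(coordinates); // "download" into the local repository
                return Optional.of(remote.id);
            }
        }
        return Optional.empty();
    }
}
```

Resolving the same coordinates twice shows the effect: the first call is answered by a remote repository, the second by the local one.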

Logs

For debugging purposes it is very helpful to see all the operations in the logs. We can increase the logging level for a few loggers:

log4j.logger.org.eclipse.aether = DEBUG
log4j.logger.org.apache.http.headers = DEBUG

Also, we'll add another remote repository to see how Aether checks them all:

req.addRepository(new RemoteRepository.Builder("jboss-public", "default", "https://repository.jboss.org/nexus/content/groups/public").build());
req.addRepository(new RemoteRepository.Builder("central", "default", "http://repo1.maven.org/maven2").build());

Here are the logs when the commons-io:commons-io:jar:2.5 artifact is resolved and isn't available in the local repository:

11:13:47.181 DEBUG {main} [o.e.a.i.i.DefaultLocalRepositoryProvider] : Using manager EnhancedLocalRepositoryManager with priority 10.0 for target/repo-1469178827169
11:13:47.188 INFO  {main} [g.t.m.a.AetherTest] : Request: commons-io:commons-io:jar:2.5 < [jboss-public (https://repository.jboss.org/nexus/content/groups/public, default, releases+snapshots), central (http://repo1.maven.org/maven2, default, releases+snapshots)]
11:13:47.631 DEBUG {main} [o.e.a.i.i.DefaultTransporterProvider] : Using transporter HttpTransporter with priority 5.0 for https://repository.jboss.org/nexus/content/groups/public
11:13:47.632 DEBUG {main} [o.e.a.i.i.DefaultRepositoryConnectorProvider] : Using connector BasicRepositoryConnector with priority 0.0 for https://repository.jboss.org/nexus/content/groups/public
11:13:49.015 DEBUG {main} [o.a.h.headers] : >> GET /nexus/content/groups/public/commons-io/commons-io/2.5/commons-io-2.5.jar HTTP/1.1
11:13:49.015 DEBUG {main} [o.a.h.headers] : >> Host: repository.jboss.org
...
11:13:49.385 DEBUG {main} [o.a.h.headers] : << HTTP/1.1 404 Not Found
...
11:13:49.572 DEBUG {main} [o.e.a.i.i.DefaultTransporterProvider] : Using transporter HttpTransporter with priority 5.0 for http://repo1.maven.org/maven2
11:13:49.572 DEBUG {main} [o.e.a.i.i.DefaultRepositoryConnectorProvider] : Using connector BasicRepositoryConnector with priority 0.0 for http://repo1.maven.org/maven2
11:13:49.704 DEBUG {main} [o.a.h.headers] : >> GET /maven2/commons-io/commons-io/2.5/commons-io-2.5.jar HTTP/1.1
11:13:49.705 DEBUG {main} [o.a.h.headers] : >> Host: repo1.maven.org
...
11:13:49.770 DEBUG {main} [o.a.h.headers] : << HTTP/1.1 200 OK
...
11:13:50.079 DEBUG {main} [o.a.h.headers] : >> GET /maven2/commons-io/commons-io/2.5/commons-io-2.5.jar.sha1 HTTP/1.1
11:13:50.079 DEBUG {main} [o.a.h.headers] : >> Host: repo1.maven.org
...
11:13:50.145 DEBUG {main} [o.a.h.headers] : << HTTP/1.1 200 OK
...
11:13:50.156 DEBUG {main} [o.e.a.i.i.EnhancedLocalRepositoryManager] : Writing tracking file /data/ggrzybek/sources/_testing/grgr-test-maven/target/repo-1469178827169/commons-io/commons-io/2.5/_remote.repositories
11:13:50.161 INFO  {main} [g.t.m.a.AetherTest] : Result: commons-io:commons-io:jar:2.5 < central (http://repo1.maven.org/maven2, default, releases+snapshots)

As we can see, here's the sequence of events:

  1. Aether uses local repository at target/repo-1469178827169 location
  2. https://repository.jboss.org/nexus/content/groups/public is checked first and we get HTTP 404
  3. http://repo1.maven.org/maven2 is checked next and we get HTTP 200
  4. Aether then fetches the SHA1 checksum for the found artifact
  5. Aether writes tracking file at target/repo-1469178827169/commons-io/commons-io/2.5/_remote.repositories that looks like this:
    #NOTE: This is an Aether internal implementation file, its format can be changed without prior notice.
    #Fri Jul 22 11:13:50 CEST 2016
    commons-io-2.5.jar>central=
    
    This file allows us to recall where the artifact was downloaded from.
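The tracking file appears to use java.util.Properties syntax: a '#' comment header and name>repositoryId= keys. Because of that, it can be read with the JDK alone. RemoteRepositoriesFile is a hypothetical helper. As the file's own header warns, the format is an Aether implementation detail that may change.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical reader for _remote.repositories tracking files. */
final class RemoteRepositoriesFile {

    /** Returns the ids of repositories the given file was recorded as coming from. */
    static List<String> sourcesOf(String content, String fileName) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content)); // '#' lines are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> repositoryIds = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            // keys look like "commons-io-2.5.jar>central" (the value is empty)
            int separator = key.indexOf('>');
            if (separator > 0 && key.substring(0, separator).equals(fileName)) {
                repositoryIds.add(key.substring(separator + 1));
            }
        }
        Collections.sort(repositoryIds); // Properties has no defined key order
        return repositoryIds;
    }
}
```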

SNAPSHOTs

Let's see how Aether works when resolving SNAPSHOT versions. We'll reuse the same remote repositories as before. By default, new RemoteRepository.Builder("central", "default", "http://repo1.maven.org/maven2").build() gives us a remote repository that's enabled for both SNAPSHOT and non-SNAPSHOT artifacts. We can of course change that:

RemoteRepository.Builder b1 = new RemoteRepository.Builder("central", "default", "http://repo1.maven.org/maven2");
RemoteRepository.Builder b2 = new RemoteRepository.Builder("jboss-public", "default", "https://repository.jboss.org/nexus/content/groups/public");
RepositoryPolicy enabledPolicy = new RepositoryPolicy(true, RepositoryPolicy.UPDATE_POLICY_ALWAYS, RepositoryPolicy.CHECKSUM_POLICY_FAIL);
RepositoryPolicy disabledPolicy = new RepositoryPolicy(false, RepositoryPolicy.UPDATE_POLICY_ALWAYS, RepositoryPolicy.CHECKSUM_POLICY_FAIL);
b1.setReleasePolicy(enabledPolicy);
b1.setSnapshotPolicy(enabledPolicy);
b2.setReleasePolicy(disabledPolicy);
b2.setSnapshotPolicy(enabledPolicy);
req.addRepository(b1.build());
req.addRepository(b2.build());

In the above example, we explicitly enable resolving SNAPSHOT artifacts in the central and jboss-public repositories, and we won't try to resolve non-SNAPSHOT artifacts in jboss-public. Here are the logs related to resolving commons-io:commons-io:jar:2.5-SNAPSHOT:

12:11:17.195 DEBUG {main} [o.e.a.i.i.DefaultLocalRepositoryProvider] : Using manager EnhancedLocalRepositoryManager with priority 10.0 for target/repo-1469182277187
12:11:17.201 INFO  {main} [g.t.m.a.AetherTest] : Request: commons-io:commons-io:jar:2.5-SNAPSHOT < [central (http://repo1.maven.org/maven2, default, releases+snapshots), jboss-public (https://repository.jboss.org/nexus/content/groups/public, default, snapshots)]
12:11:17.851 DEBUG {DefaultMetadataResolver-0-1} [o.e.a.i.i.DefaultTransporterProvider] : Using transporter HttpTransporter with priority 5.0 for https://repository.jboss.org/nexus/content/groups/public
12:11:17.852 DEBUG {DefaultMetadataResolver-0-1} [o.e.a.i.i.DefaultRepositoryConnectorProvider] : Using connector BasicRepositoryConnector with priority 0.0 for https://repository.jboss.org/nexus/content/groups/public
12:11:17.853 DEBUG {DefaultMetadataResolver-0-0} [o.e.a.i.i.DefaultTransporterProvider] : Using transporter HttpTransporter with priority 5.0 for http://repo1.maven.org/maven2
12:11:17.854 DEBUG {DefaultMetadataResolver-0-0} [o.e.a.i.i.DefaultRepositoryConnectorProvider] : Using connector BasicRepositoryConnector with priority 0.0 for http://repo1.maven.org/maven2
12:11:18.158 DEBUG {DefaultMetadataResolver-0-0} [o.a.h.headers] : >> GET /maven2/commons-io/commons-io/2.5-SNAPSHOT/maven-metadata.xml HTTP/1.1
12:11:18.158 DEBUG {DefaultMetadataResolver-0-0} [o.a.h.headers] : >> Host: repo1.maven.org
...
12:11:18.225 DEBUG {DefaultMetadataResolver-0-0} [o.a.h.headers] : << HTTP/1.1 404 Not Found
...
12:11:18.245 DEBUG {DefaultMetadataResolver-0-0} [o.e.a.i.i.DefaultUpdateCheckManager] : Writing tracking file /data/ggrzybek/sources/_testing/grgr-test-maven/target/repo-1469182277187/commons-io/commons-io/2.5-SNAPSHOT/resolver-status.properties
12:11:19.332 DEBUG {DefaultMetadataResolver-0-1} [o.a.h.headers] : >> GET /nexus/content/groups/public/commons-io/commons-io/2.5-SNAPSHOT/maven-metadata.xml HTTP/1.1
12:11:19.332 DEBUG {DefaultMetadataResolver-0-1} [o.a.h.headers] : >> Host: repository.jboss.org
...
12:11:19.611 DEBUG {DefaultMetadataResolver-0-1} [o.a.h.headers] : << HTTP/1.1 200 OK
...
12:11:19.850 DEBUG {DefaultMetadataResolver-0-1} [o.a.h.headers] : >> GET /nexus/content/groups/public/commons-io/commons-io/2.5-SNAPSHOT/maven-metadata.xml.sha1 HTTP/1.1
12:11:19.850 DEBUG {DefaultMetadataResolver-0-1} [o.a.h.headers] : >> Host: repository.jboss.org
...
12:11:20.079 DEBUG {DefaultMetadataResolver-0-1} [o.a.h.headers] : << HTTP/1.1 200 OK
...
12:11:20.082 DEBUG {DefaultMetadataResolver-0-1} [o.e.a.i.i.DefaultUpdateCheckManager] : Writing tracking file /data/ggrzybek/sources/_testing/grgr-test-maven/target/repo-1469182277187/commons-io/commons-io/2.5-SNAPSHOT/resolver-status.properties
12:11:20.107 DEBUG {main} [o.e.a.i.i.DefaultTransporterProvider] : Using transporter HttpTransporter with priority 5.0 for https://repository.jboss.org/nexus/content/groups/public
12:11:20.107 DEBUG {main} [o.e.a.i.i.DefaultRepositoryConnectorProvider] : Using connector BasicRepositoryConnector with priority 0.0 for https://repository.jboss.org/nexus/content/groups/public
12:11:20.694 DEBUG {main} [o.a.h.headers] : >> GET /nexus/content/groups/public/commons-io/commons-io/2.5-SNAPSHOT/commons-io-2.5-20151119.212356-154.jar HTTP/1.1
12:11:20.694 DEBUG {main} [o.a.h.headers] : >> Host: repository.jboss.org
...
12:11:20.901 DEBUG {main} [o.a.h.headers] : << HTTP/1.1 200 OK
...
12:11:21.590 DEBUG {main} [o.e.a.i.i.EnhancedLocalRepositoryManager] : Writing tracking file /data/ggrzybek/sources/_testing/grgr-test-maven/target/repo-1469182277187/commons-io/commons-io/2.5-SNAPSHOT/_remote.repositories
12:11:21.591 INFO  {main} [g.t.m.a.AetherTest] : Result: commons-io:commons-io:jar:2.5-20151119.212356-154 < jboss-public (https://repository.jboss.org/nexus/content/groups/public, default, snapshots)

Here's the sequence of events:

  1. Aether uses the local repository at target/repo-1469182277187.
  2. The commons-io/commons-io/2.5-SNAPSHOT/maven-metadata.xml metadata is requested in parallel from both remote repositories.
  3. The metadata is found only in the jboss-public repository; central answers 404 Not Found.
  4. Aether fetches the SHA-1 checksum of the metadata (maven-metadata.xml.sha1).
  5. target/repo-1469182277187/commons-io/commons-io/2.5-SNAPSHOT/resolver-status.properties is written to track the outcome of the metadata lookups (the log shows it written twice: after the 404 from central and after the download from jboss-public).
  6. The downloaded metadata, stored as target/repo-1469182277187/commons-io/commons-io/2.5-SNAPSHOT/maven-metadata-jboss-public.xml, shows that 2.5-20151119.212356-154 is the latest timestamped version of the SNAPSHOT artifact.
  7. Aether downloads commons-io/commons-io/2.5-SNAPSHOT/commons-io-2.5-20151119.212356-154.jar from jboss-public.
  8. Aether writes a tracking file at target/repo-1469182277187/commons-io/commons-io/2.5-SNAPSHOT/_remote.repositories that looks like this:
    #NOTE: This is an Aether internal implementation file, its format can be changed without prior notice.
    #Fri Jul 22 12:11:21 CEST 2016
    commons-io-2.5-20151119.212356-154.jar>jboss-public=
    
    This file allows us to recall where the artifact was downloaded from.
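Step 6 is where the SNAPSHOT base version becomes a concrete, timestamped one: 2.5-SNAPSHOT combined with the snapshot timestamp and build number from the metadata gives 2.5-20151119.212356-154. The sketch below reproduces that derivation with the JDK's DOM parser. The XML is a trimmed, illustrative document built from the values in the resolved version shown in the log, not a copy of the real maven-metadata-jboss-public.xml. Newer metadata may also list per-file <snapshotVersions> entries, which this sketch ignores, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not Aether's implementation
class SnapshotVersions {

    /** Turns "<base>-SNAPSHOT" into "<base>-<timestamp>-<buildNumber>" using the <snapshot> element. */
    static String timestampedVersion(String metadataXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse metadata", e);
        }
        String version = text(doc, "version");
        if (!version.endsWith("-SNAPSHOT")) {
            throw new IllegalArgumentException("Not a SNAPSHOT version: " + version);
        }
        // "2.5-SNAPSHOT" -> "2.5-" (the trailing '-' is kept)
        String base = version.substring(0, version.length() - "SNAPSHOT".length());
        return base + text(doc, "timestamp") + "-" + text(doc, "buildNumber");
    }

    /** Text of the first element with the given tag name (assumed to be present). */
    private static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<metadata>",
                "  <groupId>commons-io</groupId>",
                "  <artifactId>commons-io</artifactId>",
                "  <version>2.5-SNAPSHOT</version>",
                "  <versioning>",
                "    <snapshot>",
                "      <timestamp>20151119.212356</timestamp>",
                "      <buildNumber>154</buildNumber>",
                "    </snapshot>",
                "  </versioning>",
                "</metadata>");
        System.out.println(timestampedVersion(xml)); // prints 2.5-20151119.212356-154
    }
}
```
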

There's one more thing worth noting: this time Aether runs some operations in separate threads (the DefaultMetadataResolver-0-* threads, in addition to the main thread). Aether does this when it has more than one independent task to perform.

With non-SNAPSHOT artifact resolution, repositories were checked one at a time, because there was only one task: an org.eclipse.aether.resolution.ArtifactRequest.

With SNAPSHOT artifact resolution, Aether internally invokes two org.eclipse.aether.resolution.MetadataRequest tasks (one for each remote repository) to find the latest SNAPSHOT.

The number of threads used in this operation can be controlled with the aether.metadataResolver.threads configuration property.
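To make the threading behaviour concrete, here is a simplified model in plain Java: one lookup task per remote repository, submitted to a fixed-size pool whose size plays the role of aether.metadataResolver.threads, with the newest snapshot timestamp winning when several repositories answer. This is not Aether's DefaultMetadataResolver code. The repository contents are hard-coded stand-ins for the two HTTP requests in the log (a 404 from central, metadata from jboss-public), and all names are hypothetical. In a real Aether session the property would be set as a session configuration property (for example via DefaultRepositorySystemSession.setConfigProperty) rather than passed to a pool by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified model of parallel metadata lookups; all names are hypothetical
class ParallelMetadataLookup {

    /** Stand-in for a remote repository: metadata path -> snapshot timestamp. */
    record Repo(String id, Map<String, String> snapshotTimestamps) {}

    /** Which repository answered, with which snapshot timestamp. */
    record Found(String repoId, String timestamp) {}

    /** Looks the path up in every repository in parallel and keeps the newest timestamp. */
    static Optional<Found> newestSnapshot(List<Repo> repos, String path, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads); // cf. aether.metadataResolver.threads
        try {
            List<Future<Optional<Found>>> futures = new ArrayList<>();
            for (Repo repo : repos) {
                // one task per remote repository, like one MetadataRequest each
                Callable<Optional<Found>> task = () -> Optional
                        .ofNullable(repo.snapshotTimestamps().get(path))
                        .map(ts -> new Found(repo.id(), ts));
                futures.add(pool.submit(task));
            }
            Optional<Found> newest = Optional.empty();
            for (Future<Optional<Found>> future : futures) {
                Optional<Found> result = future.get(); // empty plays the role of "404 Not Found"
                // yyyyMMdd.HHmmss timestamps sort correctly as plain strings
                if (result.isPresent() && (newest.isEmpty()
                        || result.get().timestamp().compareTo(newest.get().timestamp()) > 0)) {
                    newest = result;
                }
            }
            return newest;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while looking up metadata", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Metadata lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String path = "commons-io/commons-io/2.5-SNAPSHOT/maven-metadata.xml";
        List<Repo> repos = List.of(
                new Repo("central", Map.of()),                              // answers 404
                new Repo("jboss-public", Map.of(path, "20151119.212356"))); // has the metadata
        System.out.println(newestSnapshot(repos, path, 2)
                .map(Found::repoId).orElse("not found")); // prints jboss-public
    }
}
```

Collecting results in submission order keeps the outcome deterministic regardless of which thread finishes first; with a pool of one thread the lookups simply run one after another.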

Summary

In this article I presented some of the internal details of the Aether library, using plain Java code examples. In the next installment we'll enter the OSGi world and see how we can use higher-level libraries.