When testing a project that uses Spring Data repositories, we may want to insert ‘known-good’ data into arbitrary MongoDB collections, reading it, for example, from JSON files (typically somewhere in the test/resources subtree).

However, some of the data stored in Mongo is in BSON format, which extends plain JSON with additional types (such as ISODate and ObjectId) that have Java counterparts (java.util.Date and org.bson.types.ObjectId, respectively). To stay consistent with the POJO definitions of our models, we need to convert these values to the corresponding Java type before saving them to Mongo.

Approaches

Naive approach

One approach would be to simply try the conversion on any given String field to parse it into any of the possible types; if all the conversions fail, then we assume it’s a genuine string and we save it as such.

This is expensive, from a computing viewpoint, and not very sophisticated, but it sure is simple to implement and to understand.

A significant drawback, though, is that it causes issues for fields that are genuinely expected to be strings but are, nonetheless, “convertible” to another type (e.g., a UUID).
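To make the trade-off concrete, here is a minimal sketch of the naive approach using only JDK types. The class name NaiveConverter and the choice of candidate types (ISO-8601 instants and UUIDs) are assumptions made for illustration; a real version would presumably also try org.bson.types.ObjectId.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Date;
import java.util.UUID;

final class NaiveConverter {

    /** Tries each known conversion in turn; falls back to the original string. */
    static Object convert(String value) {
        try {
            // ISO-8601 instants such as "2014-01-01T10:01:22Z" become java.util.Date
            return Date.from(Instant.parse(value));
        } catch (DateTimeParseException notADate) {
            // not a date: try the next candidate type
        }
        try {
            return UUID.fromString(value);
        } catch (IllegalArgumentException notAUuid) {
            // not a UUID either
        }
        return value; // every conversion failed: assume a genuine string
    }

    public static void main(String[] args) {
        System.out.println(convert("joe").getClass().getSimpleName());
        System.out.println(convert("2014-01-01T10:01:22Z").getClass().getSimpleName());
        // A field meant to stay a String is silently turned into a UUID:
        System.out.println(
            convert("123e4567-e89b-12d3-a456-426614174000").getClass().getSimpleName());
    }
}
```

The last call shows the drawback described above: nothing in the value itself tells the converter that the field was supposed to remain a string.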

Use Third-Party libraries

A couple of open-source projects (fongo and nosql-unit) are available that address this problem (and many others).

We could just add these as Maven dependencies and use their suggested mechanism.

Custom Parsers

The most flexible option (but also the most complex to implement) would be to define our own custom notation and parsers to convert strings to and from Java objects:

{
    "_id": {"$oid": "54ad737cab0e700c67651e72"},
    "username": "joe",
    "created_at": { "$date": "2014-01-01T10:01:22+0000"},
    /* other data follows */
}

This requires parsing the full JSON (possibly reading it in as a Map<String, ?> via ObjectMapper), then recursively traversing it and executing the custom conversions according to the various $ operators.

Finally, we would write the data to Mongo via the Java Driver and a DBObject created from the converted Map.
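The recursive traversal could look roughly like the sketch below. It assumes the JSON has already been parsed into nested Map and List objects, that dates are written as ISO-8601 instants (ending in Z), and it uses a hypothetical Oid class as a stand-in for org.bson.types.ObjectId so that the example has no dependencies; none of these names come from an existing library.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExtendedJsonConverter {

    /** Hypothetical stand-in for org.bson.types.ObjectId. */
    static final class Oid {
        final String hex;
        Oid(String hex) { this.hex = hex; }
    }

    /** Recursively replaces {"$date": ...} and {"$oid": ...} wrappers with Java objects. */
    static Object convert(Object node) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (map.size() == 1) {
                Object date = map.get("$date");
                if (date instanceof String) {
                    return Date.from(Instant.parse((String) date));
                }
                Object oid = map.get("$oid");
                if (oid instanceof String) {
                    return new Oid((String) oid);
                }
            }
            // An ordinary object: convert each value, keeping the key order
            Map<String, Object> converted = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                converted.put(String.valueOf(entry.getKey()), convert(entry.getValue()));
            }
            return converted;
        }
        if (node instanceof List) {
            List<Object> converted = new ArrayList<>();
            for (Object item : (List<?>) node) {
                converted.add(convert(item));
            }
            return converted;
        }
        return node; // strings, numbers, booleans and nulls are left untouched
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", Collections.singletonMap("$oid", "54ad737cab0e700c67651e72"));
        doc.put("username", "joe");
        doc.put("created_at", Collections.singletonMap("$date", "2014-01-01T10:01:22Z"));
        Map<?, ?> converted = (Map<?, ?>) convert(doc);
        System.out.println(converted.get("created_at").getClass().getName());
    }
}
```

Every new $ operator means another branch in convert(), which is where the extra implementation effort of this approach comes from.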

Semi-Custom Approach, Leveraging Jackson

Given that we already have the POJOs defined, we can use an ObjectMapper to convert the JSON into the appropriate Java object and then persist it using a generic MongoOperations instance.

Example code in the test method:

@Inject
private MyPojoRepository repository;

@Inject
protected MongoOperations ops;

TestDataHelper<MyPojo> helper;

public final String DATA = "/data/my_data.json";

@Before
public void setupData() throws IOException {
    clearColl();
    helper = new TestDataHelper<>(DATA, ops, MyPojo.class);
    helper.saveData();
}

@Test
public void testCanFindById() {
    String id = "1234";
    MyPojo found = repository.findOne(id);
    assertNotNull(found);
    assertThat(found.getSomeData(), equalTo("expected"));
}

The format of the JSON data can be kept extremely simple:

{
    "title": "Test data",
    "collection": "tests",
    "data": [
        {
            "id": 1,
            "name": "test-1",
            "value": "not my fault"
        },
        {
            "id": 2,
            "name": "test-2",
            "value": "did this and that"
        }
    ]
}

This has the advantage of simplicity and enables us to use @JsonXxxx annotations in the entity class(es) to overcome special cases; however, it is relatively inflexible, as our entity classes must be (de)serializable to and from JSON.

Implementation

Given the simplicity and relatively wide applicability of the Jackson-based approach outlined above, this is the one we will demonstrate here, with a full implementation in the Spring Template project (see the TestDataHelper and MultiTestDataHelper classes).

In order to enable full JSON (de)serialization, we first need to define a “wrapper” class that will contain both the data (of a generic type <E>) and the metadata (in our case, the name of the collection where the data ought to be saved):

public class TestData<E> {

  String title;
  String collection;
  List<E> data;

  /**
   * This class should never need to be created directly,
   * use {@link com.fasterxml.jackson.databind.ObjectMapper} to
   * deserialize from a JSON data file.
   */
  private TestData() {}

  public String getTitle() {
    return title;
  }

  public String getCollection() {
    return collection;
  }

  public List<E> getData() {
    return data;
  }
}

The TestDataHelper class then only needs to create an ObjectMapper and point it to the JSON data (typically kept in a test/resources/data folder, which is part of the classpath if the project is built with Maven):

public class TestDataHelper<T> {

    private TestData<T> testData;

    // ...

    private void readValues() throws IOException {
        testData = mapper.readValue(inTestData,
                        new TypeReference<TestData<T>>() { });

        // At this point the inner data is not of the right type,
        // but a List<Map<?, ?>>, so we need to convert it
        List<T> typedResult = new ArrayList<>(testData.getData().size());

        for (int i = 0; i < testData.getData().size(); ++i) {
            // 1: cast to the "wrong" type, which is what Jackson actually created
            Map<?, ?> itemAsMap = (Map<?, ?>) testData.getData().get(i);
            // 2: convert the map to the "right" type, using the Class<T> token
            T item = mapper.convertValue(itemAsMap, clazz);
            typedResult.add(item);
        }
        testData.data = typedResult;
    }
}

As noted in the code above, even with the TypeReference construct typed to the right generic type, Java type erasure [1] prevents Jackson from deserializing the inner array as a list of T values: T is still an unresolved type variable at the point where the TypeReference is created, so Jackson falls back to its default for JSON objects and creates a LinkedHashMap<String, Object> for each element. This requires us to further “convert” the inner objects to the right Java type.

Because of erasure, the assignment (by the code in ObjectMapper) of these Map instances to the elements of TestData.data (which is of type List<T>) does not cause a ClassCastException; however, it requires us (somewhat paradoxically) to cast the i-th element to the “wrong” type so that we can then convert it to the “right” type.

The joy of generics and type erasure [2] .
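The effect can be reproduced without Jackson. The following sketch (all names are made up for illustration) fills a list with maps, hands it out as a List<T>, and shows that the failure only surfaces when an element is actually used as the declared type:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ErasureDemo {

    static final class Dog {
        String name;
    }

    /** Mimics a deserializer that cannot see T at runtime and so produces maps. */
    @SuppressWarnings("unchecked")
    static <T> List<T> readUntyped() {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("name", "Fritz");
        List<Object> raw = new ArrayList<>();
        raw.add(item);
        // Unchecked cast: nothing is verified at runtime, so this always succeeds
        return (List<T>) (List<?>) raw;
    }

    /** Inside generic code T erases to Object, so only the cast to Map is checked. */
    static <T> Map<?, ?> firstAsMap(List<T> items) {
        return (Map<?, ?>) items.get(0);
    }

    /** Reports whether using the first element as a Dog throws a ClassCastException. */
    static boolean failsWhenReadAsDog(List<Dog> dogs) {
        try {
            System.out.println(dogs.get(0).name); // implicit cast to Dog happens here
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Dog> dogs = readUntyped();                   // no exception here
        System.out.println(firstAsMap(dogs).get("name")); // the "wrong" cast works
        System.out.println(failsWhenReadAsDog(dogs));     // the "right" type does not
    }
}
```

This is why the cast to Map in readValues() is legal and necessary: the list really does contain maps, whatever its declared element type says.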

The other awkward area is the need to pass in a Class<T> object at construction, as Java does not allow us to obtain a class literal (T.class) from a generic type parameter; however, an object of that type is needed internally by Jackson (most likely for the same reason) to correctly execute the conversion.
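As a rough illustration of why such a class token is needed, the sketch below builds an object from a map using plain reflection. It is only a simplified stand-in for what a call like mapper.convertValue(itemAsMap, clazz) has to do, not Jackson's actual implementation; the ClassTokenDemo and fromMap names are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

final class ClassTokenDemo {

    static final class Dog {
        String name;
        String species;
    }

    /**
     * Builds a T from a map of field names to values. Neither "new T()" nor
     * "T.class" compiles, so the runtime type arrives as an explicit Class<T>.
     */
    static <T> T fromMap(Map<String, ?> values, Class<T> clazz) {
        try {
            Constructor<T> constructor = clazz.getDeclaredConstructor();
            constructor.setAccessible(true);
            T instance = constructor.newInstance();
            for (Map.Entry<String, ?> entry : values.entrySet()) {
                Field field = clazz.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(instance, entry.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot build " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("name", "Fritz");
        values.put("species", "German Shepherd");
        Dog dog = fromMap(values, Dog.class);
        System.out.println(dog.name + " is a " + dog.species);
    }
}
```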

Hence, the TestDataHelper constructor looks like the following:

public TestDataHelper(String resourceName, MongoOperations operations,
      Class<T> clazz) {
    this.resourceName = resourceName;
    this.inTestData = getClass().getResourceAsStream(resourceName);
    this.operations = operations;
    this.clazz = clazz;
}
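One detail worth keeping in mind with this constructor: getResourceAsStream() returns null, rather than throwing, when the resource is not on the classpath. A small guard such as the one sketched here (the ResourceLoader class is a hypothetical helper, not part of the project) turns a late NullPointerException into an immediate, descriptive failure:

```java
import java.io.InputStream;

final class ResourceLoader {

    /** Opens a classpath resource, failing fast if it cannot be found. */
    static InputStream open(Class<?> anchor, String resourceName) {
        InputStream in = anchor.getResourceAsStream(resourceName);
        if (in == null) {
            throw new IllegalArgumentException(
                "Test data not found on classpath: " + resourceName);
        }
        return in;
    }
}
```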

See the full code in the Spring Template GitHub project.

Multiple Collections

From here, it’s pretty straightforward to generalize the approach to the case in which one needs to test data in multiple collections (e.g., when a document in one collection references a document in another):

public class MultiTestDataHelper {

    private Map<Class<?>, TestDataHelper<?>> helpersMap = new HashMap<>();

    // ...

    public void addTestHelper(String resourceName, Class<?> classFor) {
      helpersMap.put(classFor,
             new TestDataHelper<>(resourceName, ops, classFor));
    }

    public void saveAllData() throws IOException {
      for (TestDataHelper<?> helper : helpersMap.values()) {
        helper.saveData();
      }
    }

    public <T> void saveDataFor(Class<T> clazz) throws IOException {
      getHelperFor(clazz).saveData();
    }

    public <T> List<T> getDataFor(Class<T> clazz) throws IOException {
      return getHelperFor(clazz).getTestData();
    }

    public boolean isAllDataValid() {
      for (TestDataHelper<?> helper : helpersMap.values()) {
        if (!helper.isDataFileValid()) return false;
      }
      return true;
    }
}
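The getHelperFor() method is elided above. One plausible way to write such a lookup is the “typesafe heterogeneous container” pattern, sketched here with a hypothetical Helper<T> class standing in for TestDataHelper<T>; this is an assumption about the shape of the code, not the project's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HelperRegistry {

    /** Hypothetical stand-in for TestDataHelper<T>: it only holds typed data. */
    static final class Helper<T> {
        private final List<T> data;
        Helper(List<T> data) { this.data = data; }
        List<T> getTestData() { return data; }
    }

    private final Map<Class<?>, Helper<?>> helpers = new HashMap<>();

    /** Registering through a generic method ties each key to a helper of the same type. */
    <T> void add(Class<T> clazz, Helper<T> helper) {
        helpers.put(clazz, helper);
    }

    /** The unchecked cast is safe as long as add() is the only way into the map. */
    @SuppressWarnings("unchecked")
    <T> Helper<T> getHelperFor(Class<T> clazz) {
        Helper<T> helper = (Helper<T>) helpers.get(clazz);
        if (helper == null) {
            throw new IllegalArgumentException("No helper registered for " + clazz.getName());
        }
        return helper;
    }

    public static void main(String[] args) {
        HelperRegistry registry = new HelperRegistry();
        registry.add(String.class, new Helper<>(Arrays.asList("Fuffy", "Fritz")));
        List<String> names = registry.getHelperFor(String.class).getTestData();
        System.out.println(names);
    }
}
```

The map itself cannot express that a Class<T> key always maps to a helper of the same T, so that invariant has to be enforced by the methods that write to it.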

Usage

As outlined above, using these helper classes is a two-step process:

define the test data in JSON;
load and save the data using the helper classes, then test against the data.

An example JSON file may look like the following:

{
    "title": "Test data for MultiTest - these are dogs",
    "collection": "dogs",
    "data": [
        {
            "id": 197,
            "species": "Doberman",
            "name": "Fuffy",
            "apple_id": 1
        },
        {
            "id": 245,
            "species": "German Shepherd",
            "name": "Fritz",
            "apple_id": 2
        },
        {
            "id": 246,
            "species": "Italian Shepherd",
            "name": "Antonio",
            "apple_id": 2
        }
    ]
}

to test DB activities against the following POJO:

@Document(collection = "dogs")
public static class Dog {

  @Id
  Long id;
  String name;
  String species;
  Long appleId;

}

and the test class would use it like this:

@Test
public void testSaveAllData() throws Exception {
  instance.addTestHelper("/data/multi/test-dogs.json", Dog.class);
  instance.addTestHelper("/data/multi/test-apples.json", Apple.class);
  instance.saveAllData();
  assertEquals(2, operations.findAll(Apple.class).size());
  assertEquals(3, operations.findAll(Dog.class).size());
}

Again, example usage can be found in the GitHub repository.

NOTES

[1] Type erasure was implemented in Java to support backward compatibility when generics were introduced: like everything that is “backward compatible”, it sounded like a good idea at the time (Java 5), but by now (when Java 8 is with us) it turns out to be a major drag. For more information on the topic, please consult Effective Java, chapter 5 (Generics), or Java Generics, chapter 6 (Reification).

[2] Interestingly enough, Scala essentially came of age after Java 5 had made it abundantly clear that (a) generics were a really useful feature and (b) type erasure was rather inconvenient; its designers still had to go ahead with erasure, to maintain interoperability with Java code and the JVM.
