v0.3 Release Notes

Introducing Blue Brain Nexus Forge

This is an initial release of Nexus Forge and a first step towards our goal of making Knowledge Graphs easier to build and use.

Knowledge Graphs are often built from heterogeneous data and knowledge (i.e. data models such as ontologies and schemas) coming from different sources and in different formats (i.e. structured, unstructured). Building and using Knowledge Graphs from such data and knowledge requires many components, ranging from data transformations (e.g. ETL) and data models (such as ontologies and schemas) defining the targeted domain, to scalable stores for managing the resulting graph, as well as search and access for data and factual knowledge. While many systems and tools that implement these components exist separately, they often come with a high level of complexity when dealing with data, ontologies and schemas.

Blue Brain Nexus Forge enables data scientists, data engineers and knowledge engineers to uniquely combine these components in a single, consistent and generic Python framework. With Blue Brain Nexus Forge, non-expert users can build and use Knowledge Graphs to:

  • Discover and reuse data models such as ontologies and schemas to shape, constrain, link and add semantics to datasets.

  • Build Knowledge Graphs from heterogeneous sources and formats: defining, executing and sharing data mappers to transform data from a source format to a target one conformant to schemas and ontologies.

  • Integrate with various stores for Knowledge Graph storage, management, scaling capabilities, data search and access.

  • Validate and register data and metadata.

  • Search and download data and metadata from a Knowledge Graph without the complexity of underlying technologies such as JSON-LD/RDF (Resource Description Framework).

The different Nexus Forge interfaces exposing its main features are described below.

Forge

This is the main interface for configuring Nexus Forge and using its main features.

The Forge initialisation signature is:

KnowledgeGraphForge(configuration: Union[str, Dict], **kwargs)

where configuration accepts the path to a YAML file or a dictionary, and **kwargs can be used to override the configuration provided for the Store.

The YAML configuration has the following structure:

Model:
  name: <a class name of a Model>
  origin: <'directory', 'url', or 'store'>
  source: <a directory path, a URL, or the class name of a Store>
  bucket: <when 'origin' is 'store', a Store bucket, a section or segment in the Store>
  endpoint: <when 'origin' is 'store', a Store endpoint, defaults to Store:endpoint>
  token: <when 'origin' is 'store', a Store token, defaults to Store:token>
  context:
    iri: <an IRI>
    bucket: <when 'origin' is 'store', a Store bucket, defaults to Model:bucket>
    endpoint: <when 'origin' is 'store', a Store endpoint, defaults to Model:endpoint>
    token: <when 'origin' is 'store', a Store token, defaults to Model:token>
Store:
  name: <a class name of a Store>
  endpoint: <a URL>
  bucket: <a bucket as a string>
  token: <a token as a string>
  versioned_id_template: <a string template using 'x' to access resource fields>
  file_resource_mapping: <an Hjson string, a file path, or a URL>
Resolvers:
  <scope>:
    - resolver: <a class name of a Resolver>
      origin: <'directory', 'web_service', or 'store'>
      source: <a directory path, a web service endpoint, or the class name of a Store>
      targets:
        - identifier: <a name, or an IRI>
          bucket: <a file name, a URL path, or a Store bucket>
      result_resource_mapping: <an Hjson string, a file path, or a URL>
      endpoint: <when 'origin' is 'store', a Store endpoint, defaults to Store:endpoint>
      token: <when 'origin' is 'store', a Store token, defaults to Store:token>
Formatters:
  <identifier>: <a string template with replacement fields delimited by braces, i.e. '{}'>

and the equivalent Python dictionary has the following structure:

{
    "Model": {
        "name": <str>,
        "origin": <str>,
        "source": <str>,
        "bucket": <str>,
        "endpoint": <str>,
        "token": <str>,
        "context": {
              "iri": <str>,
              "bucket": <str>,
              "endpoint": <str>,
              "token": <str>,
        }
    },
    "Store": {
        "name": <str>,
        "endpoint": <str>,
        "bucket": <str>,
        "token": <str>,
        "versioned_id_template": <str>,
        "file_resource_mapping": <str>,
    },
    "Resolvers": {
        "<scope>": [
            {
                "resolver": <str>,
                "origin": <str>,
                "source": <str>,
                "targets": [
                    {
                        "identifier": <str>,
                        "bucket": <str>,
                    },
                    ...,
                ],
                "result_resource_mapping": <str>,
                "endpoint": <str>,
                "token": <str>,
            },
            ...,
        ],
    },
    "Formatters": {
        "<name>": <str>,
        ...,
    },
}

The required minimal configuration is:

  • name for Model and Store

  • origin and source for Model
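
For example, a minimal YAML configuration could look like this (the directory path is illustrative; DemoModel and DemoStore refer to the demo implementations shipped with Nexus Forge, DemoStore being the in-memory store described below):

Model:
  name: DemoModel
  origin: directory
  source: ../path/to/a/model/directory/
Store:
  name: DemoStore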

See nexus-forge/examples/configurations/ for YAML examples.

Create a forge instance:

from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("../path/to/configuration.yml")
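
As mentioned above, keyword arguments can override the configured Store settings at initialisation time; for example (a sketch, assuming the configured Store authenticates with a token):

forge = KnowledgeGraphForge("../path/to/configuration.yml", token="my-token")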

Resource

A Resource is an identifiable data object with a set of properties. It is mainly identified by its Type, whose value is a concept such as Person, Contributor, Organisation or Experiment. The reserved properties of a Resource are: id, type and context.

Create a resource using keyword arguments, a JSON dictionary, or a dataframe:

from kgforge.core import Resource

# from keyword arguments
resource = Resource(name="Jane Doe", type="Person", email="jane.doe@example.org")

# from a JSON dictionary
data = {
    "name": "Jane Doe",
    "type": "Person",
    "email": "jane.doe@example.org"
}
resource = Resource(data)

# from a pandas DataFrame
import pandas as pd

dataframe = pd.read_csv("data.csv")
resources = forge.from_dataframe(dataframe)

A resource can have files attached by assigning the output of the forge.attach method to a property of the resource:

resource.picture = forge.attach("path/to/file.jpg")

Dataset

A Dataset is a specialization of a Resource that provides users with operations to handle files and describe them with metadata. The metadata of a Dataset covers in particular, but is not limited to:

  • provenance:

    • contribution (people or organizations that contributed to the creation of the Dataset),

    • generation (links to resources used to generate this Dataset),

    • derivation (links to another resource from which the Dataset is generated),

    • invalidation (information that the data became invalid)

  • access:

    • distribution (a downloadable form of this Dataset, at a specific location, in a specific format)

The Dataset class provides methods for adding files to a Dataset. For efficiency, and so that the user flow is not slowed down, the added files are only uploaded to the Store when forge.register is called on the Dataset. This is implemented with LazyAction, a class that holds an action to be executed when required.

After the registration of the Dataset, a DataDownload resource is created, along with automatically extracted properties such as content type, size and file name.

Dataset(forge: KnowledgeGraphForge, type: str = "Dataset", **properties)
  add_parts(resources: List[Resource], versioned: bool = True) -> None
  add_distribution(path: str, content_type: str = None) -> None
  add_contribution(agent: str, **kwargs) -> None
  add_generation(**kwargs) -> None
  add_derivation(resource: Resource, versioned: bool = True, **kwargs) -> None
  add_invalidation(**kwargs) -> None
  add_files(path: str, content_type: str = None) -> None
  download(source: str, path: str) -> None
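
A minimal sketch of the typical flow (the file path and dataset name are assumptions; the import path is the one exposed by the kgforge package):

from kgforge.specializations.resources import Dataset

dataset = Dataset(forge, name="Interesting dataset")
dataset.add_distribution("../data/interesting_data.csv")  # upload is deferred via LazyAction
forge.register(dataset)  # the file is uploaded and a DataDownload resource is created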

Storing

Storing allows users to persist and manage Resources in the configured Store. The currently supported stores are:

  • DemoStore: an in-memory Store (do not use it in production)

  • BlueBrainNexus

The Store interface can be extended to support other stores.

Resources contain additional information in hidden properties to allow users to recover from errors:

  • _synchronized, indicates that the last action succeeded

  • _last_action, contains information about the last action performed on the resource (e.g. register, update)

  • _store_metadata, keeps additional resource metadata provided by the Store, such as version and creation date

register(data: Union[Resource, List[Resource]]) -> None
update(data: Union[Resource, List[Resource]]) -> None
deprecate(data: Union[Resource, List[Resource]]) -> None
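
For example, reusing the resource created earlier (a sketch):

forge.register(resource)
print(resource._synchronized)  # True if the registration succeeded
print(resource._last_action)   # details about the last action (here, register)
resource.email = "jane@example.org"
forge.update(resource)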

Querying

It is possible to search for resources in the Store by (1) retrieving them by id, (2) specifying filters on properties and their values, and (3) using a SPARQL query.

retrieve(id: str, version: Optional[Union[int, str]] = None) -> Resource
paths(type: str) -> PathsWrapper
search(*filters, **params) -> List[Resource]
sparql(query: str) -> List[Resource]
download(data: Union[Resource, List[Resource]], follow: str, path: str) -> None
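
A sketch of these query styles (the identifier, type and property names are assumptions; filters are built by comparing paths of a given type to values, and the SPARQL query depends on the configured context):

resource = forge.retrieve("https://example.org/persons/jane_doe")
p = forge.paths("Person")                    # property paths for the (assumed) Person type
persons = forge.search(p.type == "Person")
results = forge.sparql("SELECT ?id WHERE { ?id a Person } LIMIT 5")
forge.download(persons, follow="distribution.contentUrl", path="./downloads/")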

Versioning

Users can create versions of Resources if the Store supports this feature.

tag(data: Union[Resource, List[Resource]], value: str) -> None
freeze(data: Union[Resource, List[Resource]]) -> None
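
For example (a sketch, assuming the configured Store supports versioning):

forge.tag(resource, "v1.0.0")  # name the current version of the resource
forge.freeze(resource)         # pin the resource's references to their current versions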

Converting

To use Resources with other libraries such as pandas, different data conversion functions are available.

as_json(data: Union[Resource, List[Resource]], expanded: bool = False, store_metadata: bool = False) -> Union[Dict, List[Dict]]
as_jsonld(data: Union[Resource, List[Resource]], compacted: bool = True, store_metadata: bool = False) -> Union[Dict, List[Dict]]
as_triples(data: Union[Resource, List[Resource]], store_metadata: bool = False) -> List[Tuple[str, str, str]]
as_dataframe(data: List[Resource], na: Union[Any, List[Any]] = [None], nesting: str = ".", expanded: bool = False, store_metadata: bool = False) -> DataFrame
from_json(data: Union[Dict, List[Dict]], na: Union[Any, List[Any]] = None) -> Union[Resource, List[Resource]]
from_jsonld(data: Union[Dict, List[Dict]]) -> Union[Resource, List[Resource]]
from_triples(data: List[Tuple[str, str, str]]) -> Union[Resource, List[Resource]]
from_dataframe(data: DataFrame, na: Union[Any, List[Any]] = np.nan, nesting: str = ".") -> Union[Resource, List[Resource]]
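
For example, round-tripping the resource created earlier (a sketch):

json_data = forge.as_json(resource)
dataframe = forge.as_dataframe([resource])
resources = forge.from_dataframe(dataframe)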

Formatting

A preconfigured set of string formats can be provided to ensure the consistency of data.

format(what: str, *args) -> str
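
For example, assuming a formatter named 'identifier' configured as 'https://example.org/{}/{}' (both are assumptions):

forge.format("identifier", "persons", "jane_doe")
# -> 'https://example.org/persons/jane_doe'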

Resolving

Resolvers are helpers to find commonly used resources that one may want to link to. For instance, given a set of predefined identifiers for Authors, a resolver can be used to consistently reference the same Authors.

resolve(text: str, scope: Optional[str] = None, resolver: Optional[str] = None, target: Optional[str] = None, type: Optional[str] = None, strategy: ResolvingStrategy = ResolvingStrategy.BEST_MATCH) -> Optional[Union[Resource, List[Resource]]]
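
For example (the text, scope and target values are assumptions that depend on the configured Resolvers; the default strategy returns the best match):

jane = forge.resolve("jane doe", scope="agents", target="agents")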

Reshaping

Reshaping allows trimming Resources by a specific set of properties.

reshape(data: Union[Resource, List[Resource]], keep: List[str], versioned: bool = False) -> Union[Resource, List[Resource]]
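
For example, keeping only two properties of the resource created earlier:

trimmed = forge.reshape(resource, keep=["name", "email"])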

Modeling

Schemas describing the shapes and constraints that apply to Resources can be loaded and accessed through the Modeling functions. Users can explore predefined domain Types and the properties that describe them using Templates. Templates can be used to create resources with the specified properties, and those resources can then be validated.

context() -> Optional[Dict]
prefixes(pretty: bool = True) -> Optional[Dict[str, str]]
types(pretty: bool = True) -> Optional[List[str]]
template(type: str, only_required: bool = False) -> None
validate(data: Union[Resource, List[Resource]]) -> None
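
For example (the Person type is an assumption about the configured Model):

forge.types()                                 # list the types defined by the Model
forge.template("Person", only_required=True)  # print a template for the type
forge.validate(resource)                      # check the resource against its schema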

At the time of this release, the supported schema language is W3C SHACL. Nexus Forge can be extended to support other schema languages (e.g. JSON Schema).

Mapping

Mappings are predefined configuration files that encode the logic, or rules, for transforming data from a specific source into Resources of a target Type conforming to a target schema. Mapping rules are executed by a mapper.

sources(pretty: bool = True) -> Optional[List[str]]
mappings(source: str, pretty: bool = True) -> Optional[Dict[str, List[str]]]
mapping(entity: str, source: str, type: Callable = DictionaryMapping) -> Mapping
map(data: Any, mapping: Union[Mapping, List[Mapping]], mapper: Callable = DictionaryMapper, na: Union[Any, List[Any]] = None) -> Union[Resource, List[Resource]]

Nexus Forge comes with a DictionaryMapper, which takes JSON-structured data as a source and transforms it into other JSON-structured data.
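
For example (the entity and source names are assumptions that depend on the configured Model):

sources = forge.sources()                      # list the configured data sources
mapping = forge.mapping("Person", "a-source")  # retrieve the mapping for an entity/source pair
resources = forge.map(json_records, mapping)   # json_records: data in the source's JSON format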