Earthster Developer Blog


Thursday 22 July 2010

Reference data for chemicals and other substances

The Earthster RefData project is developing reference data relevant to Life Cycle Analysis (LCA) and publishing it as linked open data (LOD). A key component of this reference data is a set of lists of elementary flows, that is, flows of substances or energy to or from the environment that are relevant to LCA.

I have been looking for reference URLs that I can use to identify these substances and that could, ideally, be dereferenced to retrieve RDF data about each substance. Such data doesn't seem to exist at the moment.

There are a number of schemes for identifying chemicals. The most commonly used seems to be CAS Numbers, which are issued by the American Chemical Society. The society operates the CAS Registry, a database of information about chemicals identified by these numbers, but it is not freely available.
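Since CAS Numbers are likely to end up as keys in any reference list of substances, it may be worth checking them on ingest. The sketch below is not part of RefData; it assumes the usual CAS format (two to seven digits, two digits, one check digit) and the published check-digit rule, in which the digits before the check digit are weighted 1, 2, 3, ... from right to left and the sum is taken modulo 10. The function name is made up for illustration.

```python
import re

# Usual CAS Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("62-53-3"))  # aniline: True
print(is_valid_cas("62-53-4"))  # wrong check digit: False
```

A check like this only catches transcription errors. It says nothing about whether the number has actually been assigned to a substance.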

The National Center for Biotechnology Information (NCBI) maintains PubChem, a freely available database of information about chemicals. I even found a reference to an RDF translation of the database, but the link is broken. Unfortunately, PubChem does not use CAS numbers as the identifier for a chemical. A search for aniline, for example, yields one page that does not contain a reference to aniline's CAS number and another page that does. These two pages have different substance IDs (SIDs) but share a compound ID. In fact, there are many pages with the same compound ID and different SIDs. I wonder whether one of them is the reference page. More work is needed to understand the structure and content of PubChem to see whether it is a potential source of LOD reference data for chemicals.

Wikipedia contains quite a rich set of information about chemicals in its infoboxes, and DBpedia is extracting some of that information. The DBpedia infobox ontology defines a property for CAS numbers; however, my recent investigations indicate that extraction of this property can be patchy: in some cases it is missing from the RDF even though it is present in the Wikipedia entry. No doubt that will improve, and we might even be able to help with that.

So far, I haven't found LOD reference data for chemicals that I can use. Still looking ...

RefData Server is public

This post is a little overdue, as the first snapshot release of the RefData Server went up on the Earthster Google Code site on 08 July. RefData Server is the web face of the Earthster reference data activity, which is developing a collection of Life Cycle Assessment (LCA) reference data for publication on the web. It is a configuration of the Earthster LodServer tool.

This first snapshot release takes the data produced by the RefData flows application and publishes it as HTML and RDF on the web. Currently this data includes compartments, substances and elementary flows from EcoSpold 1 and ILCD. It also includes mappings between these concepts.

Sunday 20 June 2010

The FlowData project is now public

Earthster has a number of concurrent projects. The RefData project aims to publish reference data useful to LCA practitioners and to enable the community to comment on and add to this reference data. The first focus of this project is producing a reference list of elementary flows.

The FlowData subproject of RefData ingests data about elementary flows and turns it into RDF formatted reference data. The input data used so far has been supplied by GreenDelta, who produced it whilst working on their format converter, which not only translates between EcoSpold and ILCD formatted files but also, from version 8, between the different nomenclatures used by the two formats.
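To give a feel for what "turning flow data into RDF" can look like, here is a minimal sketch that serialises one flow record as N-Triples. It is not the FlowData code. The base URI, the ElementaryFlow class URI, the function names and the example identifier are all assumptions made for illustration; only the rdf:type and rdfs:label property URIs are the standard W3C ones.

```python
from urllib.parse import quote

# Hypothetical namespace; the real RefData URIs may look quite different.
BASE = "http://example.org/refdata/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def flow_to_ntriples(flow_id: str, label: str) -> list:
    """Return N-Triples lines describing one elementary flow."""
    subject = "<" + BASE + "flow/" + quote(flow_id, safe="") + ">"
    return [
        subject + " <" + RDF_TYPE + "> <" + BASE + "ElementaryFlow> .",
        subject + " <" + RDFS_LABEL + '> "' + escape_literal(label) + '" .',
    ]


# The identifier "f 1" is a placeholder, not a real EcoSpold or ILCD id.
for line in flow_to_ntriples("f 1", "zinc"):
    print(line)
```

In practice an RDF library would handle serialisation. The point here is only that each input record becomes a small set of triples about a URI that identifies the flow.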

The first version of this FlowData project has been loaded into the RefData Mercurial repository and the reference data it is producing made available as a zip file. This release does not indicate that the data is ready for production use. It is merely the first step of developing the code in public.

Friday 18 June 2010

Flows and Effects

If you look at the EcoSpold 1 reference list of elementary flows, you will find flows with terms like "Transformation, from arable, non-irrigated" in the same slot where other flows have terms like "zinc" and "Aluminium, 24% in bauxite, 11% in crude ore, in ground".

ISO 14044 defines an elementary flow to be either:

(1) material or energy entering the system being studied, which has been drawn from the environment without previous human transformation
(2) material or energy leaving the system being studied, which is discarded into the environment without subsequent human transformation

Land transformations don't really fit that definition. This suggests that the data structures originally defined to hold information about elementary flows are being used to hold other kinds of information. This is not an unusual practice in IT systems, but it is not a very good practice and is usually adopted only when the IT system does not meet, and cannot effectively be changed to meet, changing requirements.

The Earthster Core Ontology (ECO) introduces the concept of an Effect. Processes have effects on the environment. There are also social and other effects. Emitting CO2 to the atmosphere is an effect of a process, and this kind of effect is an elementary flow. So Elementary Flows are a subclass of Effect. Other kinds of effect, such as land use and land transformation, are modelled as effects in ECO, but they are not modelled as elementary flows; in those cases nothing is flowing and they do not fit the ISO 14044 definition of an elementary flow.

Elementary flows in ECO must have a flowable, i.e. something that flows. In accordance with the ISO 14044 definition, there are two subclasses of flowable, Energy and Substance.
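The shape of this part of the model can be sketched in a few lines of Python. This is an analogy only: ECO is an OWL ontology, not Python code, and the class and attribute names below (including the direction values "input" and "output") are assumptions chosen for illustration rather than ECO's actual terms.

```python
from dataclasses import dataclass


class Flowable:
    """Something that flows."""


@dataclass(frozen=True)
class Substance(Flowable):
    name: str


@dataclass(frozen=True)
class Energy(Flowable):
    name: str


class Effect:
    """Any effect of a process, environmental or otherwise."""


@dataclass(frozen=True)
class ElementaryFlow(Effect):
    flowable: Flowable
    direction: str  # "input" (from the environment) or "output" (to it)

    def __post_init__(self):
        # An elementary flow must have something that flows.
        if not isinstance(self.flowable, Flowable):
            raise TypeError("flowable must be a Substance or Energy")
        if self.direction not in ("input", "output"):
            raise ValueError("direction must be 'input' or 'output'")


@dataclass(frozen=True)
class LandTransformation(Effect):
    """An effect in which nothing flows, so not an elementary flow."""
    description: str


co2_emission = ElementaryFlow(Substance("carbon dioxide"), "output")
land_change = LandTransformation("from arable, non-irrigated")

print(isinstance(co2_emission, Effect))         # True
print(isinstance(land_change, Effect))          # True
print(isinstance(land_change, ElementaryFlow))  # False
```

The land transformation is still an Effect, so it can be recorded against a process, but it no longer has to pretend to be a flow.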

Here we see the flexibility that the use of Semantic Web technologies allows, enabling the refinement of concepts and the introduction of new concepts as requirements change.

Thursday 17 June 2010

What is a compartment?

A compartment is a concept from the field of Life Cycle Assessment (LCA). The basic idea is that the impact of an emission from a factory, say, depends on where the emission occurs. If CO2 is dissolved in a lake it will have a different impact from being dissolved in a river, which will in turn differ from releasing it into the atmosphere. Compartments are used to classify emissions into different categories to help estimate what impact they have.

The Earthster Core Ontology (ECO) is a core domain ontology for LCA. The first release of ECO modelled compartments as a subspace of the environment defined by properties such as a medium (e.g. air or water), a longevity (either long term or not) and a population density (urban or not). These were represented in OWL as individuals rather than as classes. However, as I wrote the code to turn the list of EcoInvent and ILCD compartments into RDF, I decided to think again about the nature of compartments and how they might be modelled in OWL.

One of the key inputs I received when first working on ECO was a concern from domain experts that the idea of compartments was rather limiting. The suggestion was that there is a need to be able to classify specific flows in arbitrary ways based on arbitrary properties of a flow.

The way the term compartment is used in LCA can be a little confusing. The LCA literature talks about emissions to compartments. ISO 14044, EcoInvent and ILCD name compartments with terms like "emission to air". So if you accept these terms, the literature is really saying "emission to emission to air", which is a bit odd. This is done because compartments are used not just for emissions but also for resource consumption. So "air" is not enough to identify a compartment, because resources from air and emissions to air are two different compartments. The first version of ECO represented the direction of a flow as a property of the flow rather than a property of a compartment. Overall, that works well, but it does make it difficult to give URIs to the compartments as defined in EcoSpold and ILCD. Hence the need to reconsider.

I tried to track down a definition of what a compartment is, which proved to be quite elusive until I found the ILCD nomenclature and other conventions document.

Compartments are used to classify elementary flows. However, as they are referred to in the LCA literature, they are not classes or types, but individuals. The literature does not refer to elementary flows of type "emission to air", but to an emission to the compartment "emission to air". So compartments are individuals used for classification, rather than classes themselves. This may not be an ideal conceptual model, but it is the one we have for now and so ECO should be able to represent it. It may evolve in future.

The properties that compartments, as currently conceived, have include:
  • whether it is a resource consumption compartment or an emissions compartment
  • the medium to or from which a flow occurs, e.g. air or sea water
  • the time frame of the flow, e.g. long term or not
  • population density - whether urban or not

A goal of ECO is to be extensible so that new concepts, such as new compartments, can be easily added in the future. Another goal is to make effective use of OWL's capabilities so that emissions to sea water can be automatically recognised as a subcategory of emissions to water. ECO now defines classes corresponding to the values of this extensible list of properties. There are disjoint subclasses of Compartment that are resource consumption compartments and emissions compartments. There are subclasses that correspond to the various compartment media such as air, water and sea water. There are also pairs of mutually disjoint subclasses corresponding to long term and not long term effects, and to compartments with urban and rural population densities. Any current EcoInvent or ILCD compartment can be defined as an instance of one or more of these classes.

It is possible to extend this set of classes and to use OWL inference to determine subclass relationships amongst them.
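As a rough illustration of the kind of inference involved, the sketch below keeps compartments as individuals, types them with several classes at once, and follows subclass links transitively. It mimics only simple subclass reasoning; a real OWL reasoner does considerably more (disjointness checking, for instance). All class and individual names here are made up for the example and are not ECO's actual terms.

```python
# Direct subclass assertions: class -> set of direct superclasses.
SUBCLASS_OF = {
    "EmissionCompartment": {"Compartment"},
    "ResourceCompartment": {"Compartment"},
    "AirCompartment": {"Compartment"},
    "WaterCompartment": {"Compartment"},
    "SeaWaterCompartment": {"WaterCompartment"},
}

# Compartments are individuals; each is asserted to belong to one or more classes.
ASSERTED_TYPES = {
    "emission_to_sea_water": {"EmissionCompartment", "SeaWaterCompartment"},
    "resource_from_air": {"ResourceCompartment", "AirCompartment"},
}


def superclasses(cls):
    """Return every class reachable from `cls` via subclass links."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(individual):
    """Asserted classes of an individual plus all their superclasses."""
    types = set()
    for cls in ASSERTED_TYPES[individual]:
        types.add(cls)
        types |= superclasses(cls)
    return types


def is_instance_of(individual, cls):
    return cls in inferred_types(individual)


# A sea water emission compartment is recognised as a water compartment.
print(is_instance_of("emission_to_sea_water", "WaterCompartment"))  # True
print(is_instance_of("emission_to_sea_water", "AirCompartment"))    # False
```

Adding a new medium, say a hypothetical "FreshWaterCompartment" under "WaterCompartment", needs only one more entry in the subclass table; existing queries for water compartments pick it up without change.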

As reported above, one of the constraints on the first version of ECO was to support a more general classification mechanism than compartments. The present design offers a choice for future extensibility. The concept of compartment can be extended and refined and that can be the primary mechanism for classifying flows. Alternatively, flows may be classified more directly in terms of their immediate properties. A combination of both is also possible.
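The second option, classifying flows directly by their own properties, can be sketched as a set of named rules evaluated against each flow. This is a plain Python analogy, not how ECO expresses it; the property names, the direction values and the category labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    flowable: str
    direction: str  # "emission" or "resource"
    medium: str     # e.g. "air", "water", "sea water"
    long_term: bool = False


# Each category is defined by a predicate over a flow's own properties,
# so new categories can be added without touching the notion of compartment.
RULES = {
    "emission to water": lambda f: (
        f.direction == "emission" and f.medium in {"water", "sea water"}
    ),
    "long term emission": lambda f: f.direction == "emission" and f.long_term,
    "resource from air": lambda f: (
        f.direction == "resource" and f.medium == "air"
    ),
}


def classify(flow):
    """Return the labels of every category whose rule the flow satisfies."""
    return {label for label, rule in RULES.items() if rule(flow)}


zinc_to_sea = Flow("zinc", "emission", "sea water", long_term=True)
print(sorted(classify(zinc_to_sea)))
# ['emission to water', 'long term emission']
```

A flow can fall into several categories at once, which is the flexibility the domain experts were asking for. The cost is that the categories are no longer a fixed, shared list unless the rules themselves are published.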

Thus the present design reflects the concept of compartments as they are currently conceived, enables the concept of a compartment to be extended and refined, and also supports other classification schemes.