Linked data to streaming data sensors sources for direct access



BACKGROUND
The use of Linked Data (LD), or Web of Data, has increased over the last two decades alongside the concept of the "Semantic Web", and aims to enable humans and computer systems to publish, share and connect data on the Web (Poveda-Villalón, 2012; Vandenbussche et al., 2016). This expansion was possible because earlier work on the Linked Data principles and the Semantic Web (Berners-Lee, 2006) provided the conditions for these technologies to grow. To support Linked Data and the Semantic Web, a set of integrated technologies such as Resource Description Framework (RDF) graphs, Uniform Resource Identifiers (URIs), the SPARQL Protocol and RDF Query Language (SPARQL), and the Web Ontology Language (OWL) are standardised by the World Wide Web Consortium (W3C) (Pauwels et al., 2018). Semantic web technologies are considered one of the most promising methods for achieving efficient information exchange because they can share data and improve interoperability among highly heterogeneous systems (Tchouanguem Djuedja et al., 2021).
An RDF graph is a data model that uses semantic triples in the form of subject-predicate-object expressions linked to each other via URIs (Pauwels and Terkaj, 2016; Pauwels et al., 2018). It can be written in RDF/XML or in other formats, such as Turtle or JSON-LD. Larger RDF datasets are stored in specialised databases called triple or quad stores for easy access, linking, and querying over networks (Pauwels and Terkaj, 2016; Pauwels et al., 2018). For example, the construction industry uses the term Linked Building Data (LBD) to describe the implementation of semantic web technologies (SWT) for organising information as a collection of RDF graphs shared by multiple stakeholders across different software and the web (Tchouanguem Djuedja et al., 2021). While the Linked Building Data field is still in its infancy, it has already produced promising results that have garnered attention from industry professionals.
The IFC is an open standard data model of the buildingSMART standardisation organisation, used for exchanging information in the Architecture, Engineering, Construction, Owner Operator (AECOO) industry and for achieving interoperability, one of the premises of BIM. In addition, some research in the industrial sector has started to apply the IFC for interoperability (D.L. de M. Nascimento, Roeder, Calvetti, Lopez, et al., 2022; D. L. de M. Nascimento, Roeder, Calvetti, Mustelier, et al., 2022; D. L. M. Nascimento et al., 2022; Calvetti et al., 2023). Also, Linked Open Data (LOD) and SWT are necessary and efficient tools for managing information in the construction business and supporting data interoperability (Pauwels and Terkaj, 2016). This scenario led to the development of ifcOWL, a Web Ontology Language (OWL) representation that makes the IFC available in RDF format (Beetz, van Leeuwen and de Vries, 2009; Pauwels et al., 2015; Pauwels and Terkaj, 2016).
Semantic web technologies have been used to monitor industrial plants and to manage and exchange an enormous amount of data for supporting the design and operation phases (Terkaj, Tolio and Urgo, 2015). Also, promising results were obtained when analysing the use of Linked Data for building performance by combining it with Internet of Things (IoT) technologies capable of reading time-series data from various sensors and storing the data in InfluxDB (Donkers et al., 2021).
The IoT concept allows physical objects to be interconnected through the integration of embedded electronics, software, and sensors (Lekic and Gardasevic, 2018). This connectivity enables these objects to exchange data with stakeholders and other connected devices (Lekic and Gardasevic, 2018).
The IoT enhances the ability to remotely monitor and manipulate physical entities via existing network infrastructure, thereby increasing efficiency, precision, and economic benefits while reducing human intervention. The incorporation of sensors into the IoT paradigm has given rise to intelligent cities, homes, power grids, virtual power plants, and transportation systems. This work presents technologies commonly used in IoT, such as MQTT, WebSockets, and server-sent events for sending messages, and InfluxDB for storing time-series data. MQTT (Message Queuing Telemetry Transport) was released in 1999 by IBM and is a lightweight messaging protocol that enables efficient communication between devices or applications with limited processing power and bandwidth (Soni and Makwana, 2017). It follows a publish-subscribe model, where publishers send messages on specific topics, and subscribers receive notifications on the topics of their interest. MQTT's small packet size and low overhead make it an ideal choice for IoT devices with limited memory, processing power, and network resources. Server-Sent Events and WebSocket are used for server-to-client interaction. Both facilitate the delivery of messages from the server to the client (browser), but WebSocket also allows messages to be transmitted in the reverse direction (Słodziak and Nowak, 2016).
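The publish-subscribe model described above can be illustrated with a small in-memory sketch. It is not an MQTT client or broker: the class `ToyBroker` and the topic names are hypothetical, and only the topic-filter matching with the `+` (single-level) and `#` (multi-level) wildcards follows MQTT conventions, in simplified form.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, str], None]  # called with (topic, payload)


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic.

    Simplification: the filter is not validated (e.g. '#' is assumed to be last).
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":  # multi-level wildcard: matches everything that remains
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:  # '+' matches one level
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """Hypothetical in-memory stand-in for an MQTT broker (no network, no QoS)."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic_filter: str, handler: Handler) -> None:
        self._subscriptions[topic_filter].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to every matching subscriber; return the count."""
        delivered = 0
        for topic_filter, handlers in self._subscriptions.items():
            if topic_matches(topic_filter, topic):
                for handler in handlers:
                    handler(topic, payload)
                    delivered += 1
        return delivered


received = []
broker = ToyBroker()
broker.subscribe("sensors/+", lambda topic, payload: received.append((topic, payload)))
broker.publish("sensors/Sensor_1", "42")      # delivered to the subscriber
broker.publish("actuators/Valve_1", "open")   # no matching subscription
```

The publisher never addresses a subscriber directly; the decoupling through topics is what lets new consumers, such as a storage service or a WebSocket bridge, be added without changing the sensor side.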
InfluxDB is an open-source time-series database developed by InfluxData that stores data as individual points consisting of a fieldset and a timestamp (Kang and Hong, 2018). These points can be indexed or unindexed, providing advantages in CPU performance, disk space, and query response time. Its suitability for sensor monitoring cases with big data has made it a popular choice for time-series data storage. Node-RED is a flexible tool for simulating the integration of these technologies. It is a freely available development tool that uses a flow-based design to create prototypes and expedites the development of applications by integrating IoT hardware devices, Application Programming Interfaces (APIs), and online services (Lekic and Gardasevic, 2018). In order to combine and integrate the different types of information received from the sensors, an API is usually combined with a data treatment step that filters and directs every kind of data to specific processing according to its typology. The monolithic Databus concept can be expanded into microservices, thus increasing the system's efficiency. A microservice architecture is an architectural approach employed in enterprise applications that involves decomposing a large, monolithic application into numerous autonomous, smaller standalone components (Mütsch, 2016). This strategy enhances adaptability and scalability by developing fine-grained services that interact through lightweight mechanisms, often via an HTTP resource API (Mütsch, 2016).
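As an illustration of the InfluxDB point structure mentioned above, the following sketch serialises one sensor reading in InfluxDB line protocol, where tags are indexed and fields are not. The measurement name `sensor_readings` and the tag key `sensor` are assumptions made for this example and are not taken from the prototype; escaping is handled only for the common cases.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each of the given characters in text."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value: FieldValue) -> str:
    """Render a field value using line-protocol type conventions."""
    if isinstance(value, bool):  # bool must be tested before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'        # strings are double-quoted


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, FieldValue],
    timestamp_ns: int,
) -> str:
    """Build 'measurement,tag=value field=value timestamp' for a single point."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):     # tags are the indexed part of the point
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical reading: sensor value 42 at Unix time 1644699969 (in nanoseconds).
line = to_line_protocol(
    "sensor_readings", {"sensor": "Sensor_1"}, {"value": 42}, 1644699969 * 10**9
)
```

Keeping the sensor identifier as a tag and the reading as a field is one reasonable design choice here, since queries typically filter by sensor and aggregate over values.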

Streaming data sources approach
The approach proposed in this paper is based on the premise that sensor time-series data can be made searchable in the same way as any Knowledge Base information, by publishing traditional time-series databases and stream endpoints in Linked Data format. This approach can offer several benefits, including efficient data management, improved data accessibility, enhanced data integration, scalability, near real-time analysis, and support for data provenance and trust. The Industrial Internet of Things (IIoT) enhances efficiency across industry by creating a secure data-sharing mechanism for sensors and actuators (Meng and Li, 2021). In machine manufacturing, digital services in the IIoT sector bring solutions such as effective condition monitoring and predictive maintenance (Kammerer et al., 2020). Data Interlinking, the process of connecting the published dataset with related data sources in the cloud, is a significant challenge in publishing Linked Data (Martin, Kühl and Satzger, 2021). MQTT and Kafka can collaborate for near real-time bidirectional data processing (Choii et al., 2010). These applications benefit from sharing sensor data via URIs and from interoperability between MQTT data, RDF datasets, and the IFC schema.

METHOD
The research demonstrates that the prototype fulfils its intended functions. The Proof of Concept (PoC) conducted comprises three main elements: (1) the research prototypes, (2) a proof of concept demonstration, which confirms that the prototypes deliver the anticipated outcomes, and (3) post facto arguments, which advocate for the continued deployment of the prototypes targeting further developments (Elliott, 2021). The methodology suggested in this study proposes that data from sensor time series can be integrated into triple stores and accessed through an API. This prototype API can be set up to link the URIs to the appropriate data points. Then, for demonstration, a system was developed using Node-RED with multiple data connection types and InfluxDB to implement and test a streaming sensor data point over URIs. Node-RED is not designed for building highly scalable and collaborative backends; it is best used to create a rapid prototype first, after which a language or framework such as Node.js, Nest or Golang can be considered when planning tasks for the development team.
Finally, a search was conducted for comparable properties across several established ontologies, and the work covered the sensor data simulator and storage, time-series storage, Linked Data endpoints, the RDF store, and the design of the OWL ontology. The results and discussion highlight possible practical industrial uses. Further developments are envisaged by testing a new approach to query the SPARQL endpoint using an API. The study also suggests exploring data streaming and developing an IFC file generator to enhance supply-chain data use. The importance of user-friendly approaches and system-to-system interaction for data interoperability is also highlighted.

Search for equivalent properties in existing ontologies
Reusing existing and consolidated properties, instead of creating new ones, is good practice whenever possible. So, before creating our properties, we conducted a search for existing ones with the aid of the Linked Open Vocabularies (https://lov.linkeddata.es/dataset/lov/) (Linked Open Vocabularies (LOV), no date). Table 1 summarises the main ontologies found.

Streaming data sources deployment
MQTT is a vital protocol for the IIoT (Industrial Internet of Things) due to its lightweight nature, scalability, reliability, Quality of Service (QoS), and security features (Quincozes, Emilio and Kazienko, 2019). It is designed for low-bandwidth and unstable network environments, making it ideal for real-time messaging services in network-connected devices (Salagean and Zinca, 2020; Roldán-Gómez et al., 2022). MQTT can handle large-scale projects, ensuring message delivery even in unreliable networks (Salagean and Zinca, 2020; Roldán-Gómez et al., 2022). It also supports various authentication and data security mechanisms (Salagean and Zinca, 2020). In the context of IIoT and enterprise layers, MQTT ensures efficient communication between IoT devices and the cloud (Quincozes, Emilio and Kazienko, 2019; Salagean and Zinca, 2020; Roldán-Gómez et al., 2022).
Based on that, a system was created with a sensor simulator using the MQTT protocol and a time-series storage to deploy the sensor streaming data approach. The sensor simulator was built with a random integer generator that publishes a value every second to an MQTT topic on an MQTT broker (MQTT server); see Figures 1 and 2. This allows the envisaged approach to be tested as a proof of concept (PoC), which most industrial projects could then scale up. The storage component receives data from MQTT and stores each measurement in an InfluxDB time-series database (DB), as presented in Figures 3 and 4.
The first GET REST API endpoint queries the time-series database to get the sensor's current value. When the endpoint is reached, it takes the current timestamp and queries the last value written to InfluxDB. It returns the sensor value if that value's timestamp is greater than or equal to the timestamp at which the endpoint was reached. Otherwise, an empty value is returned, meaning that the last sensor value written to the database is outdated. This endpoint is accessed through the URL <http://my_server/timeseries/Sensor_1?currentvalue>. Figure 5 presents the API schema of this endpoint.
To get the last written value, another endpoint was developed (Figures 7 and 8). This GET REST API endpoint queries the time-series database to get the sensor's latest value and timestamp. It is accessed through the URLs <http://my_server/timeseries/Sensor_1?lastvalue> and <http://my_server/timeseries/Sensor_1?lastvaluetimestamp>.
The time-series URI was designed so that users can intuitively discover that they must fill in the initial and final times in timestamp format. However, it cannot be accessed directly without user intervention, which makes it less user-friendly: the user must choose the initial and final timestamps and recreate the URI. A more user-friendly feature will be explored in future work. An example URI is: http://my_server/timeseries/Sensor_1?initial_timestamp=1644699969&final_timestamp=1647032769. In this example, the initial timestamp stands for Sat Feb 12 2022 21:06:09 GMT+0000, and the final timestamp stands for Fri Mar 11 2022 21:06:09 GMT+0000, a period of approximately one month, about one year ago at the time of writing.
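The behaviour of the three time-series endpoints described above can be sketched as plain functions over an in-memory list of points. This is only an illustration under stated assumptions: the prototype was built with Node-RED and InfluxDB, whereas here the storage is a Python list, and the function names (`current_value`, `handle`, and so on) are hypothetical. The query keys are the ones that appear in the URIs of the text.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

Point = Tuple[int, int]  # (Unix timestamp in seconds, sensor value)


def last_point(points: List[Point]) -> Optional[Point]:
    """Most recently written point, or None if nothing is stored."""
    return max(points, key=lambda p: p[0]) if points else None


def current_value(points: List[Point], request_ts: int) -> Optional[int]:
    """Last value only if it is at least as recent as the request, else None."""
    latest = last_point(points)
    if latest is None or latest[0] < request_ts:
        return None  # outdated or missing: the endpoint returns an empty value
    return latest[1]


def time_range(points: List[Point], initial_ts: int, final_ts: int) -> List[Point]:
    """All points with initial_ts <= timestamp <= final_ts, oldest first."""
    return [p for p in sorted(points) if initial_ts <= p[0] <= final_ts]


def handle(uri: str, points: List[Point], request_ts: int):
    """Dispatch on the query part of the URI, as in the endpoint URLs above."""
    query = parse_qs(urlsplit(uri).query, keep_blank_values=True)
    if "currentvalue" in query:
        return current_value(points, request_ts)
    if "lastvalue" in query:
        latest = last_point(points)
        return latest[1] if latest else None
    if "lastvaluetimestamp" in query:
        latest = last_point(points)
        return latest[0] if latest else None
    if "initial_timestamp" in query and "final_timestamp" in query:
        return time_range(
            points,
            int(query["initial_timestamp"][0]),
            int(query["final_timestamp"][0]),
        )
    raise ValueError(f"unsupported query: {uri}")


def to_utc(ts: int) -> str:
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


# The example timestamps from the text, decoded for checking.
initial_utc = to_utc(1644699969)
final_utc = to_utc(1647032769)
```

Decoding the two example timestamps gives 2022-02-12T21:06:09+00:00 and 2022-03-11T21:06:09+00:00, an interval of exactly 27 days.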
In addition to the linked data endpoints, a static file endpoint was created for IFC users. This GET REST API endpoint queries the time-series and sensor IFC databases. The requester sends the sensor id and the initial and end times in timestamp format, and the time-series DB CRUD queries the DB and creates a payload for the API response. Then, the IFC DB CRUD gets the sensor IFC project, and finally, an integrator builds the final file in IFC format.
For the proof of concept, when the endpoint is reached, a static IFC file is fetched and returned as the payload (Figures 11 and 12). As an improvement, we plan to develop an automatic IFC file integrator to build this payload programmatically. This endpoint is accessed through the URL <http://my_server/ifctimeseries/Sensor_1?initial_timestamp=0&final_timestamp=0>.
In addition to the direct data access endpoints, two data stream endpoints were created: one using WebSocket technology and another using SSE (Server-Sent Events). Both automatically stream data over a single request connection. The WebSocket API endpoint exposes the sensor data stream directly from the MQTT receiver, which is subscribed to the Sensor_1 topic. A WebSocket client should connect to this endpoint to get the stream of values. This endpoint is accessed through the URL <ws://my_server/ws_datastream/Sensor_1>. Figure 9 presents the WebSocket API Endpoint schema, and Figure 10 the respective PoC.

Implementation of the RDF store
The RDF store was implemented with Ontotext GraphDB, with the sensor and all the above endpoints stored in URI format as properties of the sensor. The properties were stored in a triple store according to the Turtle RDF presented in Figure 17. This ontology provides a formal description of the properties used in the system to represent data and can be used to ensure consistent use of vocabulary terms in the application. It defines several properties that can be used to describe resources in the system. The properties defined in this ontology are:
• certi_dict:currentvalue sse_datastream
Each property is defined to have a range of rdfs:Class, which means that its values are expected to be instances of an RDF class. The properties are also defined to have a domain of rdfs:Resource, meaning they can be applied to any resource in the ontology, as shown in Figure 18.
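The idea of storing the endpoints as properties of the sensor can be sketched by serialising a few triples in N-Triples syntax. This is not the Turtle shown in Figure 17: the namespace IRI behind the `certi_dict` prefix and the sensor IRI are not given in the text, so the values below are placeholders, and `ws_datastream` is a hypothetical property name chosen by analogy with `sse_datastream`.

```python
from typing import Dict

# Assumption: placeholder namespace, since the real certi_dict IRI is not stated.
CERTI_DICT = "http://example.org/certi_dict#"

# Characters that may not appear unescaped inside an N-Triples IRI.
_FORBIDDEN = set('<>"{}|^`\\')


def _iri(value: str) -> str:
    """Wrap a value as an N-Triples IRI, rejecting characters that need escaping."""
    if any(ch in _FORBIDDEN or ord(ch) < 0x21 for ch in value):
        raise ValueError(f"not a plain IRI: {value!r}")
    return f"<{value}>"


def sensor_triples(sensor_iri: str, endpoints: Dict[str, str]) -> str:
    """One triple per endpoint: <sensor> <certi_dict:property> <endpoint URI> ."""
    lines = [
        f"{_iri(sensor_iri)} {_iri(CERTI_DICT + name)} {_iri(endpoints[name])} ."
        for name in sorted(endpoints)
    ]
    return "\n".join(lines)


example = sensor_triples(
    "http://my_server/sensors/Sensor_1",  # hypothetical sensor IRI
    {
        "currentvalue": "http://my_server/timeseries/Sensor_1?currentvalue",
        "ws_datastream": "ws://my_server/ws_datastream/Sensor_1",  # hypothetical name
    },
)
```

A client that reads these triples from the store only needs to follow the object IRI of the property it is interested in to reach the live data.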

RESULTS AND DISCUSSION
The Linked spectrum part of this proof of concept (PoC) was designed to access endpoints without user authentication, so it works only on a local network. Typically, these endpoints cannot be opened to the Internet on systems with private business data. In future work, this system will be improved with user authentication to access endpoints over the Internet. This work created a user-intuitive URI with the suffix ?initial_timestamp=0&final_timestamp=0; however, it is not machine-readable yet. In this PoC, the machine would have to assemble the initial and end times in the URI automatically. This feature will be explored and enhanced in future work.
A 3D representation is relevant for users' visualisation and engagement. When a user or a machine accesses the triple store, it will "see" the proposed endpoints as properties of the sensor. The primary use would be visual analysis and data interaction. For example, a 3D viewer could infer that the "has data stream" property is associated with the equipment sensor and then use rules to show equipment statistics in different colours depending on the measured sensor value. Also, to support users' data use, dashboard software like Grafana could read the triple store and show real-time sensor data on its dashboards. It is also possible to build a Grafana plugin to read triple stores. Alternatively, Grafana could access the WebSocket endpoint directly and get the stream of sensor values. Users also demand programmable alarms, metrics, and indicators. The RDF store could hold metric properties related to the sensor value in order to create alarms based on the sensor's real-time values. For example, suppose a piece of equipment may explode if its temperature exceeds 500°C, and the equipment related to the sensor has a property such as "maximum temperature" set to 500°C. When the live sensor data rises above 500°C, a reasoner could automatically infer that the equipment is at risk. Complex metrics and indicators could be constructed with the same idea. User interaction will be explored in future work.
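The alarm example above amounts to comparing a live reading against a threshold stored with the equipment. The sketch below expresses that rule in plain Python as a stand-in for what a reasoner or rule engine would infer; it is not an OWL reasoner, and the class `Equipment`, its attribute names, and the equipment name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Equipment:
    """Hypothetical equipment record carrying a 'maximum temperature' property."""

    name: str
    maximum_temperature_c: float


def at_risk(equipment: Equipment, live_temperature_c: Optional[float]) -> bool:
    """True when the live sensor reading is above the stored maximum.

    A missing reading (None) is treated as 'no inference possible', not as risk.
    """
    if live_temperature_c is None:
        return False
    return live_temperature_c > equipment.maximum_temperature_c


furnace = Equipment(name="Furnace_1", maximum_temperature_c=500.0)
alarm = at_risk(furnace, 512.0)       # reading above the 500°C limit
no_alarm = at_risk(furnace, 480.0)    # reading within the limit
```

In the proposed architecture the threshold would come from the triple store and the live value from one of the streaming endpoints, so the same comparison could be evaluated on every new message.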
The APIs developed allow systems-to-systems interaction, proving the approach valuable. In the future, an API can also fetch the data from different graph databases, such as GraphDB (RDF graph) and Neo4J (property graph), extending the same API concept that implements a data bus with multiple data sources used in this work. Users' post-analyses will require time-series collection. Based on that, another intention is to build an automatic IfcTimeSeries. The requester sends the sensor "id" and the initial and end times in timestamp format, and the time-series DB CRUD queries the time-series DB, creates the IfcTimeSeries variable, queries the IFC DB to get the sensor IFC file, and incorporates the time-series values into the IFC file.
This research is scalable mainly because it is based on Linked Data (LD). Moreover, it broadens the scope for semantic queries. Linked Data allows data interconnection across different sources, providing a more comprehensive and interconnected view of information. This can significantly improve the accuracy and relevance of semantic queries. Also, data availability may enhance the company's Learning Organisation (LO) (Tortorella et al., 2019). However, many current Internet of Things (IoT) systems lack integration with the Resource Description Framework (RDF) and LD. This can limit the ability of these systems to fully leverage the benefits of semantic queries and LD. Integrating LD over RDF with IIoT may open new paths for data interaction and semantic queries. It could empower systems to respond to complex queries, enhancing their functionality and usability.
Using LD with semantic queries may provide significant benefits in terms of cost and time. In terms of efficiency, LD allows data interconnection across different sources, improving the accuracy and relevance of semantic data and reducing the time and effort required to gather and analyse data (Konstantinou and Spanos, 2015). For example, accurate data for facility management connected with BIM and Lean may enhance collaboration across the management process (Nascimento et al., 2018). Also, LD is highly scalable and well suited to dealing with big data, leading to cost savings in data storage and processing (Konstantinou and Spanos, 2015). In addition, using Web technologies such as HTTP and URIs makes integration with existing systems easier and reduces the costs associated with data integration (Konstantinou and Spanos, 2015). Finally, LD may support more informed decision-making by providing a more comprehensive view of data, leading to cost savings. However, it is crucial to note that the actual cost and time savings will depend on different factors, such as the data quality, the technology infrastructure and the use case scenarios.

CONCLUSION
This paper discusses the importance of data sensors in industrial operations for ensuring safety and improving production. It emphasises the need for friendly access to operating data in the context of the Industry 4.0 scenario. The work presented introduces the concept of using valid data points over URIs for sensor time-series and storing them in triple stores via APIs. The approach involves integrating industrial sensor data and sharing it via URIs. The proof of concept presented demonstrates the feasibility of sharing sensor data via URIs and lays the foundation for interoperability between MQTT data, RDF datasets, and the IFC schema. The investigation's limitations include using simulated data, but it is expected to yield similar results with real data. The proposed solution has practical implications for companies looking to enhance data shareability and interoperability and for researchers interested in advancing the solution for specific features. The paper offers insights into future deployments of systems-to-systems interoperability, focusing on user-friendly data shareability, and contributes to both industrial and academic developments.
Users might be limited to standard access and topic-based inquiries without the ease of URLs and semantic searches. This could potentially limit the scope and efficiency of conducting searches. This research was intended as a structuring basis for future uses. The PoC is a starting point for developers to structure semantic systems and develop more automated ones. By structuring semantic systems based on these models, developers may create computerised systems that understand and respond to complex queries. This can be particularly useful for advanced users who must perform semantic searches for everyday uses. Large language models (LLMs) can help overcome these limitations by providing a more nuanced understanding of language, enabling more precise and relevant search results. This could revolutionise the way users interact with data in their daily routines.
Further, a different approach will be tested to query the SPARQL endpoint (or to create a GraphQL endpoint, as proposed in this reference). The URI reaches an API that retrieves the time-series values for the period and programmatically converts them into RDF triples. Finally, it will be beneficial to search for ways to make the data stream from the WebSocket endpoint and the time-series payload from the time-series endpoint directly accessible to SPARQL queries for automatic data analysis. Interoperability is mandatory for enhancing supply-chain data use and engagement. An IFC file generator can be developed that takes the result of a SPARQL query with one or more types of equipment and their sensors, with their properties, including time-series data, builds the IFC file, and aggregates it into an existing 3D IFC file. This would be advantageous to users who want an IFC file to load in a 3D viewer. In conclusion, user-friendly approaches and systems-to-systems interaction are critical elements of data interoperability.

Figure 5 - API schema to get the current sensor value Endpoint

Figure 7 - API schema to get the sensor Last Value Endpoint
Figure 9 presents the API schema, and Figure 10 the respective PoC.

Figure 9 - API schema to get the sensor time-series Endpoint

Figure 11 - API schema to get the sensor IFC time-series Endpoint
presents the WebSocket API Endpoint schema, and Figure 10 the respective PoC.

Figure 17 - Properties of the sensor according to turtle RDF

Figure 18 - RDF Schema for the Certi Vocabulary Terms