Real Time Data Ingestion Platform (RTDIP) aims to provide easy access to high-volume, historical and real-time process data for analytics applications, engineers, and data scientists wherever they are.
Components of RTDIP:
- A Delta ingestion engine that processes time series data in the cloud, reading from streaming endpoints (Eventhub, Kafka, etc.) and from files and writing into a Delta Lakehouse.
- A Python SDK that enables interaction with data in the Delta Lakehouse.
- REST APIs that offer the same capabilities as the Python SDK for interacting with the data.
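
As a rough illustration of how these components fit together, the sketch below parses JSON events (as they might arrive from a stream or file), appends them to a store, and queries one tag by time range. It is a minimal, standard-library-only sketch and does not use the RTDIP SDK or its APIs: `Reading`, `parse_event`, `InMemoryStore`, and the `tag` / `event_time` / `value` field names are hypothetical and chosen only for this example, and the in-memory list merely stands in for a Delta Lakehouse table.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """One time series point; the field names are illustrative assumptions."""
    tag: str
    event_time: datetime
    value: float


def parse_event(raw: str) -> Reading:
    """Turn one JSON message (from a stream or a file) into a typed Reading."""
    payload = json.loads(raw)
    return Reading(
        tag=payload["tag"],
        event_time=datetime.fromisoformat(payload["event_time"]),
        value=float(payload["value"]),
    )


class InMemoryStore:
    """Hypothetical stand-in for a lakehouse table: append on ingest, filter on query."""

    def __init__(self):
        self._rows = []

    def ingest(self, raw_events):
        """Parse and append a batch of raw events; return how many were stored."""
        rows = [parse_event(raw) for raw in raw_events]
        self._rows.extend(rows)
        return len(rows)

    def query(self, tag, start, end):
        """Return readings for one tag with start <= event_time < end, oldest first."""
        hits = [r for r in self._rows if r.tag == tag and start <= r.event_time < end]
        return sorted(hits, key=lambda r: r.event_time)


if __name__ == "__main__":
    store = InMemoryStore()
    store.ingest([
        '{"tag": "pump-1.flow", "event_time": "2024-01-01T00:01:00+00:00", "value": 13.0}',
        '{"tag": "pump-1.flow", "event_time": "2024-01-01T00:00:00+00:00", "value": 12.5}',
    ])
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    end = datetime(2024, 1, 2, tzinfo=timezone.utc)
    for reading in store.query("pump-1.flow", start, end):
        print(reading.event_time.isoformat(), reading.value)
```

In a real deployment the ingest step would run continuously against a streaming endpoint and the query step would go through the SDK or the REST APIs; the sketch only shows the shape of the data flow.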
Organizations need data for day-to-day operations and for activities such as optimization, surveillance, forecasting, and predictive analytics. Real-time data makes up a large share of the total data used in these activities.
Using real-time data enables organizations to detect and respond to changes in their systems, thus improving the efficiency of their operations. This data needs to be available in scalable and secure data platforms.
RTDIP addresses this need by leveraging PaaS (Platform as a Service) services along with some custom components to provide data ingestion, data transformation, and data sharing as a platform. RTDIP can interface with several data sources to ingest many data types, including time series, alarms, videos, photos, and lab data. It also uses metadata from sources such as PI, OPC UA, and APIs, making it a true real-time data platform.
The data can be shared through approved architecture patterns and services to meet the data needs of applications such as Digital Twin, C3.ai, SeeQ, and business-specific solutions, either as streaming or as query-based batch transfer. The platform is highly scalable and secure: the data is ring-fenced with approved security controls, making it an ideal repository for industry data and third-party data. The technology stack in RTDIP is a standard offering, which helps onboard and deploy any asset data source quickly.
The code for RTDIP that Shell has contributed provides a foundational time-series data ingestion capability that gathers data from more than three million sensors in Shell’s assets. Impressive as this is, there is huge potential for using this code in other areas rich in time-series data.
Anyone in or beyond the energy sector who needs to manage similar kinds of data may find value in these data ingestion capabilities. We see the opportunity for it to be used to monitor manufacturing or production processes and to track resource utilization, such as water usage, the operation of biorefineries or hydrogen plants, and carbon capture and storage (CCS) projects. Time-series data is also used extensively to forecast solar and wind energy production, where insights from such data help inform decision-making for new or existing renewable energy projects.
Opportunities for collaboration through LF Energy could help industries develop new and better ways to manage power. Industries can anticipate efficiency gains from breaking down traditional data format barriers between companies, sectors, and national jurisdictions. Some of the most obvious benefits are in renewable energy production and distribution. Industries produce electricity from a vast range of wind and solar sources across many different countries. Adopting an open source approach with LF Energy will help better integrate these resources into distribution grids and build towards a more efficient and effective digital energy infrastructure.
Open-source ecosystems also encourage new collaborations. By creating a common data fabric with the member companies of LF Energy, industries can deal more effectively with microgrids and help bring renewable power sources to the grid more easily. Bringing data together in common formats makes it easier for grid operating companies to work together. This contribution will create a shared data fabric that helps industries leverage software capabilities in an interoperable fashion while minimizing underlying data complexity.