Pipeline Pilot

Pipeline Pilot is a desktop software program sold by Dassault Systèmes for processing and analyzing data. The product was originally used in the natural sciences; its basic ETL (extract, transform, load) and analytics capabilities have since been broadened, and it is now used for data science, ETL, reporting, prediction and analytics in a number of sectors. Its main feature is the ability to design data workflows using a graphical user interface, which makes the program an example of visual and dataflow programming. It is used in a variety of settings, such as cheminformatics and QSAR,[1][2][3] next-generation sequencing,[4] image analysis,[5][6] and text analytics.[7]

Developer(s): Accelrys
Initial release: 1999
Stable release: 18.1 / May 2018
Written in: C++
Operating system: Windows and Linux
Type: Visual and dataflow programming language
License: Proprietary
Website: accelrys.com/products/collaborative-science/biovia-pipeline-pilot/

History

The product was created by SciTegic. Accelrys acquired SciTegic and Pipeline Pilot in 2004. Accelrys was itself purchased by Dassault Systèmes in 2014 and renamed BIOVIA. The product expanded from an initial focus on chemistry to include general extract, transform and load (ETL) capabilities. Beyond the base product, Dassault has added analytical and data processing collections for report generation, data visualization and a number of scientific and engineering sectors. Currently, the product is used for ETL, analytics and machine learning in the chemical, energy, consumer packaged goods, aerospace, automotive and electronics manufacturing industries.

Overview

Pipeline Pilot is part of a class of software products that provide user interfaces for manipulating and analyzing data. Pipeline Pilot and similar products allow users with limited or no coding abilities to transform and manipulate datasets. Usually, this is a precursor to conducting analysis of the data. Like other graphical ETL products, it enables users to pull from different data sources, such as CSV files, text files and databases.

Components, pipelines, protocols and data records

The graphical user interface, called the Pipeline Pilot Professional Client, allows users to drag and drop discrete data processing units called "components". Components can load, filter, join or manipulate data. Components can also perform much more advanced data manipulations, such as building regression models, training neural networks or processing datasets into PDF reports.
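The kinds of operations that basic components perform (load, filter, join) can be illustrated in plain Python. This is only a conceptual sketch: the function names `load_csv`, `filter_records` and `join_records` and the sample data are hypothetical and are not part of Pipeline Pilot or its APIs.

```python
import csv
import io

def load_csv(text):
    """Load step: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_records(records, predicate):
    """Filter step: keep records for which predicate(record) is true."""
    return [r for r in records if predicate(r)]

def join_records(left, right, key):
    """Join step: inner join on a shared key (assumes keys are unique in `right`)."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Hypothetical sample data standing in for two data sources.
compounds = load_csv("id,name\n1,aspirin\n2,caffeine\n3,ibuprofen\n")
assays = load_csv("id,activity\n1,0.8\n3,0.2\n")

joined = join_records(compounds, assays, "id")
active = filter_records(joined, lambda r: float(r["activity"]) > 0.5)
print(active)
```

In the graphical client, each of these steps would be a component dropped onto the canvas rather than a function call.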

Pipeline Pilot implements a component paradigm. Components are represented as nodes in a workflow. In a mathematical sense, components are modeled as nodes in a directed graph: "pipes" (graph edges) connect components and move data from node to node, where operations are performed on the data. Users can use predefined components or develop their own. To help in industry-specific applications, such as Next Generation Sequencing (see High-throughput sequencing (HTS) methods), BIOVIA has developed components that reduce the time users need to spend on common industry-specific tasks.
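The directed-graph view described above can be sketched in a few lines of Python. The `Workflow` class, its method names and the sample records are assumptions made for illustration; they do not reflect how Pipeline Pilot is implemented internally.

```python
from collections import defaultdict, deque

class Workflow:
    """Toy directed graph: nodes are components, edges are pipes."""

    def __init__(self):
        self.components = {}             # name -> function(records) -> records
        self.pipes = defaultdict(list)   # upstream name -> list of downstream names

    def add_component(self, name, func):
        self.components[name] = func

    def connect(self, upstream, downstream):
        self.pipes[upstream].append(downstream)

    def run(self, source, records):
        """Push records from `source` along the pipes.

        Returns a dict mapping each terminal component (one with no
        outgoing pipe) to the records it produced.
        """
        results = {}
        queue = deque([(source, records)])
        while queue:
            name, data = queue.popleft()
            out = self.components[name](data)
            downstream = self.pipes.get(name, [])
            if not downstream:
                results[name] = out
            for nxt in downstream:
                queue.append((nxt, out))
        return results

wf = Workflow()
wf.add_component("read", lambda recs: list(recs))
wf.add_component("filter", lambda recs: [r for r in recs if r["mw"] < 500])
wf.add_component("tag", lambda recs: [{**r, "passed": True} for r in recs])
wf.connect("read", "filter")
wf.connect("filter", "tag")

out = wf.run("read", [{"name": "a", "mw": 180}, {"name": "b", "mw": 720}])
print(out)
```

The sketch passes whole lists between nodes for simplicity and assumes an acyclic graph.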

Workflows are called "protocols". A protocol is a set of linked components that can be saved, reused and shared. Users can mix components provided with the software by BIOVIA with their own custom components. Each pipe is drawn in the software as a connection between two components. End users design a protocol and then execute it by running it. Data flows from left to right along the pipes.

Modern data analysis and processing can involve a very large number of manipulations and transformations. One major feature of Pipeline Pilot is the ability to visually condense a lengthy series of data manipulations that involve many components. A workflow of any length can be condensed into a single component that is used in a higher-level workflow. In other words, a protocol can be saved and used as a component in another protocol. In Pipeline Pilot terminology, protocols that are used as components in other protocols are called "subprotocols". This allows users to add layers of complexity to their data processing and manipulation workflows, then hide that complexity so they can design the workflow at a higher level of abstraction.
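The idea of using a protocol as a component in another protocol resembles ordinary function composition. The following Python sketch illustrates this under stated assumptions: the `Protocol` class and the three example components are hypothetical and do not represent Pipeline Pilot's own objects.

```python
class Protocol:
    """A chain of components. It is callable, so it can itself be used as a component."""

    def __init__(self, name, *components):
        self.name = name
        self.components = components

    def __call__(self, records):
        # Pass the records through each component in order.
        for component in self.components:
            records = component(records)
        return records

def drop_missing(records):
    """Remove records whose "value" property is missing or None."""
    return [r for r in records if r.get("value") is not None]

def scale(records):
    """Multiply each "value" by 10."""
    return [{**r, "value": r["value"] * 10} for r in records]

def label(records):
    """Add a "label" property based on the scaled value."""
    return [{**r, "label": "high" if r["value"] >= 20 else "low"} for r in records]

# "clean" hides two steps behind one name and is then used as a subprotocol.
clean = Protocol("clean", drop_missing, scale)
top = Protocol("top", clean, label)

result = top([{"value": 1}, {"value": None}, {"value": 3}])
print(result)
```

The top-level protocol only refers to `clean` and `label`; the details of cleaning are hidden one level down, which is the abstraction benefit the paragraph describes.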

Component collections

Pipeline Pilot features a number of add-ons called "collections". Collections are groups of specialized functions, such as processing genetic information or analyzing polymers, that are offered to end users for an additional licensing fee. Currently, a number of these collections are offered:[8]

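Group | Domain | Component collection
Science specific | Chemistry | Chemistry
Science specific | Chemistry | ADMET
Science specific | Chemistry | Cheminformatics
Science specific | Biology | Gene Expression
Science specific | Biology | Sequence Analysis
Science specific | Biology | Mass Spectrometry for Proteomics
Science specific | Biology | Next Generation Sequencing
Science specific | Materials Modeling & Simulation | Materials Studio
Science specific | Materials Modeling & Simulation | Polymer Properties (Synthia)
Generic | Reporting & Visualization | Reporting
Generic | Database & Application Integration | Integration
Generic | Imaging | Imaging
Generic | Analysis & Statistics | Data Modeling
Generic | Analysis & Statistics | Advanced Data Modeling
Generic | Analysis & Statistics | R Statistics
Generic | Document Search & Analysis | Chemical Text Mining
Generic | Document Search & Analysis | Text Analytics
Generic | Laboratory | Plate Data Analytics
Generic | Laboratory | Analytical Instrumentation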

Given the number of different add-ons now offered by BIOVIA, Pipeline Pilot's use cases are very broad and difficult to summarize succinctly.

PilotScript and custom scripts

As with other ETL and analytics solutions, Pipeline Pilot is often used when one or more large (1 TB+) or complex datasets must be processed. In these situations, end users may want to use programming scripts that they have written. Early in the product's development, its developers created a simplified, pared-down scripting language called PilotScript, which lets end users write basic scripts that can be incorporated into a Pipeline Pilot protocol. Later releases added support for a variety of programming languages, including Python, .NET, MATLAB, Perl, SQL, Java, VBScript and R.[9]

The syntax of PilotScript is based on PL/SQL. It can be used in components such as the Custom Manipulator (PilotScript) or the Custom Filter (PilotScript). As an example, the following script adds a property named "Hello" to each record passing through a custom scripting component in a Pipeline Pilot protocol. The value of the property is the string "Hello World!".

Hello := "Hello World!";
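For readers unfamiliar with PilotScript, the effect of the script above can be approximated in plain Python, with each data record modeled as a dictionary. This is only an analogy: the function `add_hello` and the sample records are hypothetical, and the code does not use Pipeline Pilot's Python integration.

```python
def add_hello(records):
    """Yield a copy of each record with a property named "Hello" added."""
    for record in records:
        updated = dict(record)             # copy so the input record is not modified
        updated["Hello"] = "Hello World!"
        yield updated

# Hypothetical records passing through the component.
records = [{"id": 1}, {"id": 2}]
result = list(add_hello(records))
print(result)
```

The PilotScript version is shorter because the component applies the script to every record implicitly; here the loop over records is written out explicitly.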

Currently, the product provides APIs for a number of programming languages, which allow protocols to be executed without the program's graphical user interface.

References

  1. Hassan, Moises; Brown, Robert D.; Varma-O'Brien, Shikha; Rogers, David (2007). "Cheminformatics Analysis and Learning in a Data Pipelining Environment". ChemInform. 38 (12). doi:10.1002/chin.200712278. ISSN 0931-7597.
  2. Hu, Ye; Lounkine, Eugen; Bajorath, Jürgen (2009). "Improving the Search Performance of Extended Connectivity Fingerprints through Activity-Oriented Feature Filtering and Application of a Bit-Density-Dependent Similarity Function". ChemMedChem. 4 (4): 540–548. doi:10.1002/cmdc.200800408. ISSN 1860-7179. PMID 19263458.
  3. Warr, Wendy A. (2012). "Scientific workflow systems: Pipeline Pilot and KNIME". Journal of Computer-Aided Molecular Design. 26 (7): 801–804. Bibcode:2012JCAMD..26..801W. doi:10.1007/s10822-012-9577-7. ISSN 0920-654X. PMC 3414708. PMID 22644661.
  4. "Accelrys Enters Next Generation Sequencing Market with NGS Collection for Pipeline Pilot". Business Wire. 2011-02-23. Retrieved 15 February 2013.
  5. Rabal, Obdulia; Link, Wolfgang; G. Serelde, Beatriz; Bischoff, James R.; Oyarzabal, Julen (2010). "An integrated one-step system to extract, analyze and annotate all relevant information from image-based cell screening of chemical libraries". Molecular BioSystems. 6 (4): 711–20. doi:10.1039/b919830j. ISSN 1742-206X. PMID 20237649.
  6. Paveley, Ross A.; Mansour, Nuha R.; Hallyburton, Irene; Bleicher, Leo S.; Benn, Alex E.; Mikic, Ivana; Guidi, Alessandra; Gilbert, Ian H.; Hopkins, Andrew L.; Bickle, Quentin D. (2012). "Whole Organism High-Content Screening by Label-Free, Image-Based Bayesian Classification for Parasitic Diseases". PLoS Neglected Tropical Diseases. 6 (7): e1762. doi:10.1371/journal.pntd.0001762. ISSN 1935-2735. PMC 3409125. PMID 22860151.
  7. Vellay, SG; Latimer, NE; Paillard, G (2009). "Interactive text mining with Pipeline Pilot: a bibliographic web-based tool for PubMed". Infectious Disorders Drug Targets. 9 (3): 366–74. doi:10.2174/1871526510909030366. PMID 19519489.
  8. "Pipeline Pilot Component Collections". Accelrys. Archived from the original on January 15, 2013. Retrieved 26 January 2013.
  9. "Pipeline Pilot Integration Component Collection Datasheet" (PDF). Accelrys. Retrieved 8 February 2013.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.