Data integration refers to the process of combining, sharing, or synchronizing data from multiple sources to provide users with a unified view.[1] There is a wide range of possible applications for data integration, from commercial (such as when a business merges multiple databases) to scientific (such as combining research data from different bioinformatics repositories).
The decision to integrate data tends to arise when the volume and complexity of data (that is, big data) and the need to share existing data explode.[2] Data integration has become the focus of extensive theoretical work, and numerous problems in the field remain open.
Data integration encourages collaboration between internal and external users. The data being integrated must be drawn from heterogeneous database systems and transformed into a single coherent data store that provides synchronized data to clients across a network of files.[3] A common use of data integration is in data mining, where information useful to a business is analyzed and extracted from existing databases.[4]
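To make the idea of transforming heterogeneous sources into one coherent view concrete, here is a minimal Python sketch. It assumes two hypothetical sources with different schemas, a CRM export stored as dictionaries and a billing table stored as tuples. The `Customer` schema and the adapter functions `from_crm`, `from_billing` and `unified_view` are illustrative names only, not part of any standard API. Each source gets a small adapter that translates its records into the shared schema, and the unified view is simply the union of the translated records.

```python
from dataclasses import dataclass


# Unified (global) schema that clients query; the field names are hypothetical.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    city: str


# Source A: a hypothetical CRM export that uses its own field names.
crm_rows = [
    {"cust_no": "A-1", "full_name": "Ada Lovelace", "town": "London"},
]

# Source B: a hypothetical billing system storing (id, last, first, city) tuples.
billing_rows = [
    ("B-7", "Turing", "Alan", "Manchester"),
]


def from_crm(row: dict) -> Customer:
    # Map source A's field names onto the unified schema.
    return Customer(row["cust_no"], row["full_name"], row["town"])


def from_billing(row: tuple) -> Customer:
    # Source B splits the name into parts; join them to match the unified schema.
    cid, last, first, city = row
    return Customer(cid, f"{first} {last}", city)


def unified_view() -> list:
    # Combine the translated records from both sources into one collection.
    return [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]


if __name__ == "__main__":
    for customer in unified_view():
        print(customer)
```

This sketch only performs schema mapping. It does not attempt to detect the same real-world customer appearing in both sources, which a practical integration system would usually also need to handle.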