What’s schema matching?

Schema matching is a technique for merging complex databases by mapping the corresponding parts of each database onto one another. It breaks the work into four stages: pre-integration, comparison, conformation, and merge. The goal is to automate the process and make it more efficient.

Schema matching is a technique used to merge two or more complex databases or sets of information. As databases and the electronic storage of information become more extensive and complex through the Internet, methods are needed to join the data sets of one database to those of another, and schema matching is one such technique. The concept is simple, but in practice blending data is quite complex.

The term “schema matching” is often used interchangeably with “schema mapping,” because what users actually do is map data rather than match it. Two or more databases are brought together, and the similar aspects of each are mapped onto one another. The most common way to join data is through exact references: for example, combining a name column from one database with a name column from another.
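The exact-reference case can be sketched in a few lines of Python. This is only an illustration: the function name `exact_matches` and the two sample column lists are assumptions made for the example, not part of any particular tool.

```python
def exact_matches(schema_a, schema_b):
    """Return the column names that appear in both schemas, in schema_a's order."""
    names_b = set(schema_b)
    return [name for name in schema_a if name in names_b]


# Hypothetical column lists for two databases.
students = ["id", "name", "enrollment year"]
alumni = ["name", "graduation year"]

print(exact_matches(students, alumni))  # ['name']
```

Only columns with identical names are paired, which is why this approach stops working as soon as two databases label the same information differently.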

Merging is rarely that simple, for people or for computers. With so much data to filter, combine and use, it is far more practical to work from one database than from several. Schema matching focuses on automating this tedious process and making it more efficient. For example, one database might have a “student major” field and another a “student field of study” field. They hold the same information, but the slightly different titles complicate efforts to merge them.
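One simple way a program might pair such fields is to score how many words two field names share. The sketch below uses Jaccard overlap of the words in each name; the helper names `name_similarity` and `best_candidate` and the candidate list are assumptions for illustration, and real matchers typically combine several signals such as synonyms, data types and sample values.

```python
def name_similarity(a, b):
    """Jaccard overlap of the lowercase word sets of two field names."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_candidate(field, candidates):
    """Return (candidate, score) for the most similar name, or (None, 0.0)."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = name_similarity(field, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical field names from the second database.
other_fields = ["name", "graduation year", "student field of study"]

print(best_candidate("student major", other_fields))
# ('student field of study', 0.2)
```

The score is low because only the word "student" is shared, so a result like this is better treated as a suggestion for a person to confirm than as a final mapping.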

Schema matching breaks the complex process of merging databases into four stages: pre-integration, comparison, conformation, and merge. Before multiple databases can be merged, they must be analyzed for similarities and differences; in schema matching, this is known as pre-integration. At this stage the computer begins to determine the most efficient integration method.

Next, the computer evaluates the schemas by comparing them with each other at a more detailed level. In the comparison phase, it examines each entry in the databases and determines where conflicts might arise. For example, one database’s “student interest” field might list “doctor” where another lists “physician”. A person would probably recognize the information as identical but, to database tools, they are two separate entities.
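A minimal sketch of this kind of comparison, assuming the two fields have already been paired, is to list the values that appear in only one of them. The function name `find_value_conflicts` and the sample values are hypothetical.

```python
def find_value_conflicts(values_a, values_b):
    """Return the values found in only one of the two columns, as two sorted lists."""
    set_a, set_b = set(values_a), set(values_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical contents of the paired "student interest" fields.
interests_a = ["doctor", "engineer", "teacher"]
interests_b = ["physician", "engineer", "teacher"]

only_a, only_b = find_value_conflicts(interests_a, interests_b)
print(only_a, only_b)  # ['doctor'] ['physician']
```

The program can only report that the two values differ. Deciding that they mean the same thing still needs a synonym list or a human reviewer.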

Once the computer has identified all potential conflicts, it can try to resolve them, which is the conformation stage. This can be as simple as changing every instance of “physician” to “doctor”. In reality, the process is substantially more complex.

Once all conflicts are resolved, the computer can merge the data. In this stage, two or more databases are combined into one large database. Ideally, no conflicts or errors arise during integration or in later access to the database.
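The last two stages can be sketched together. In the example below, `conform` renames fields and rewrites values using lookup tables, and `merge` combines records that share a key. All names, the lookup tables and the sample records are assumptions for illustration. The value table is applied to every field for brevity, whereas a real tool would scope it per field.

```python
def conform(record, field_map, value_map):
    """Rename fields and normalize values so the record fits the target schema."""
    return {
        field_map.get(field, field): value_map.get(value, value)
        for field, value in record.items()
    }


def merge(records_a, records_b, key):
    """Combine records by key; on a clash keep the first value and report it."""
    merged = {r[key]: dict(r) for r in records_a}
    clashes = []
    for record in records_b:
        existing = merged.setdefault(record[key], {})
        for field, value in record.items():
            if field in existing and existing[field] != value:
                clashes.append((record[key], field, existing[field], value))
            else:
                existing[field] = value
    return list(merged.values()), clashes


# Hypothetical sample data and mapping tables.
db_a = [{"id": 1, "name": "Ana", "student major": "biology"}]
db_b = [
    {"id": 1, "student field of study": "biology", "student interest": "physician"},
    {"id": 2, "student field of study": "history", "student interest": "teacher"},
]
field_map = {"student field of study": "student major"}
value_map = {"physician": "doctor"}

conformed_b = [conform(r, field_map, value_map) for r in db_b]
merged, clashes = merge(db_a, conformed_b, "id")
print(merged)
print(clashes)  # []
```

Returning the clashes instead of silently overwriting values leaves room for a person to review anything the conformation step missed.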



