Presented by:
Erhard Rahm
Date:
Wednesday 14th September 2016 - 09:00 to 10:00
Venue:
INI Seminar Room 1
Abstract:
Data integration is a key challenge for Big Data applications, which need to
semantically enrich and combine large sets of heterogeneous data for enhanced
data analysis. In many cases, they must also handle a very large number of data
sources, e.g., product offers from many e-commerce websites. We will discuss
approaches to the key data integration tasks of (large-scale) entity resolution
and schema matching. In particular, we discuss parallel blocking and entity
resolution on Hadoop platforms, together with load balancing techniques to
handle data skew. We also discuss challenges and recent approaches for holistic
data integration of many data sources, e.g., to create knowledge graphs or to
make use of huge collections of web tables.