
2DM
GIS Data Watermarking
|
We developed a method for watermarking 2D-vector data which are used in Geographical Information Systems (GIS). GIS-data represent a high material value due to high efforts and costs in the acquisition of the point coordinates, which is either performed manually or through analysis of aerial photography. Due to the high value and the wide distribution of such data inside e.g. vehicle navigation systems or via the internet there exists an interest of the copyright holders to protect these data against unlicensed copying. We deal with two kinds of 2D data sets with different tolerance levels due to different acquisition methods. Both of them do represent point clouds and (optional) attributed information. The first kind represents objects like "real estates" and have tolerances in the dm/cm range and the other represents e.g. roads in a vehicle navigation system with tolerances in the meter range. The tolerance range of the coordinates is used to embed the information. As an embedding constraint no violation of the tolerance range at each point may occur. Computing data presentations of GIS-data may require generalization of the available information, as in the transition from finer to coarser maps the limited information per space requires simplified symbols and data objects. Watermarking concepts should take this into account. Furthermore a common modification of the GIS-data set is made up of the so-called Douglas-Peucker algorithm which is a polygon simplification procedure. Due to this change of the GIS-data set correlation based watermarking methods are difficult to realize and we do favor a statistical formulation of the problem. Due to the large coordinate values and the high precision resp. low tolerance range of the measurements there is a poor signal to noise ratio (SNR) and we have to relate them to a new coordinate system. In the first step of our approach we lay a rectangular grid over the map, where the origin of this grid (relative to the old one) is only known by the embedder and the absolute size of each grid element (e.g. 10km x 10km) depends on the actual scale of the map. Based on a secret key we choose two disjunctive sets A and B of grid elements namely "patches." Now these selected elements are further divided into smaller subareas or "subpatches" (e.g. of size 10m x 10m). This defined number of subpatches will operate as information carrier units. The point coordinates in each subpatch are transformed to a new coordinate values, varying between e.g. 0 and 10m for the given example of a 10m x 10m subpatch size. As GIS coordinates are given in a defined tolerance range, that provides us with sufficient space to embed our information and to essentially improve the SNR. In subpatches belonging to set A the distances of points to a reference line are either unchanged or increased while the distances of subpatches of set B are decreased. As an embedding constraint no violation of the tolerance range at each point may occur. Figure (a) visualizes the embedding procedure for subpatches of the set B.
In the retrieving procedure we collect the points belonging to the sets A and B and calculate for each of them their sample variance . Considering the variance of the patches corresponds to the problem in communication theory, where two sources with different power randomly transmit a signal at a time and the task is to decide which one is sending the signal. A sufficient statistic for this problem is the power of the received measurement signal resp. its sample variance. The decision if a map is watermarked or not is then made with the help of a variance hypothesis test. The attacks or modification of the data can be divided into two groups. The one which might happen in everydays work and the other one with a bad intention. To the first one belongs a widespread procedure for simplification of polylines or polygons, namely the so-called Douglas-Peucker algorithm. Depending on a predetermined parameter this algorithm reduces the number of points of a polygon whereby the characteristic shape is preserved in certain limits. The parameter defines the strength of the simplification. For the case that we want to be robust against dropouts of maximal 50% of the data points we have to increase the sets A and B for doubling their points. Another attack is cropping the map. For the case that the cropped part contains enough patches of both sets A and B this is treated like a polygonsimplification. An attack we would count to the second group of attacks is the translation attack. Since the attacker do not know the marked patches he probably will shift all the data by a certain amount. We are robust against such attacks if the shift lies within the tolerance range. As mentioned at the beginning the choice of the two point sets A and B is based on a secret key and it defines which patch is used for embedding. Since it is not sure that every patch contains data at first a list will be generated out of the set of all patches, where every patch contains data and the key selection is actually applied to this list. The key defines not only the patches but also the direction of the coordinate in which the information is embedded, as shown in Figure (b) below.
Figures (c) and (d) show the behavior of the detection block for different keys and after a translation attack, which shifts the coordinates by the tolerance value of 1m. So actually we have an attack of sqr(2) times tolerance. We offer 250 randomly generated keys to the detector. The key length is 10 patches with an average number of 150 points in the sets B, where the patches of the set A are not explicitly watermarked. Figure (c) gives the variance vs. key diagram. Figure (d) shows the yes/no results after the detection phase.
In summary, we chose a statistical approach, where we modify the variances of certain point patches. The selection of the patches depends on a secret key. The decision if a map is watermarked or not is made with the help of the F-variance test. As attacks we considered translations and polygonsimplification. The method is robust against shift-attacks within the tolerance range. Since it is a statistical method, we encounter no problems for practical rates of polygonsimplifications. We did vary the patchsizes and certain patchsizes seem to be optimal for certain application scenarios. The presented method embeds a one-bit watermark, but a generalization to n bits is straightforward. |