Need a report on the IEEE paper attached below. The report must be 9 pages long and in IEEE format. Plagiarism shouldn’t exceed 9%.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 5, MAY 2016
1377
Enabling Smart Transportation Systems:
A Parallel Spatio-Temporal Database Approach
Zhiming Ding, Bin Yang, Yuanying Chi, and Limin Guo
Abstract—We are witnessing increasing interest in developing “smart cities,” which help improve the efficiency, reliability, and
security of a traditional city. An important aspect of developing smart cities is to enable “smart transportation,” which improves the
efficiency, safety, and environmental sustainability of city transportation means. Meanwhile, the increasing use of GPS devices has led
to the emergence of big trajectory data that consists of large amounts of historical trajectories and real-time GPS data streams that
reflect how the transportation networks are used or being used by moving objects, e.g., vehicles, cyclists, and pedestrians. Such big
trajectory data provides a solid data foundation for developing various smart transportation applications, such as congestion
avoidance, reducing greenhouse gas emissions, and effective traffic accident response, etc. Instead of proposing yet another specific
smart transportation application, we propose the parallel-distributed network-constrained moving objects database (PD-NMOD), a
general framework that manages big trajectory data in a scalable manner and provides an infrastructure that is able to support a
wide variety of smart transportation applications, thus benefiting the smart city vision as a whole. The PD-NMOD manages both
transportation networks and trajectories in a distributed manner. In addition, the PD-NMOD is designed to support general SQL queries
over moving objects and to efficiently process the SQL queries on big trajectory data in parallel. Such a design helps smart
transportation applications retrieve relevant trajectory data and conduct statistical analyses. Empirical studies on a large trajectory
data set collected from 3,500 taxis in Beijing offer insight into the design properties of the PD-NMOD and offer evidence that the
PD-NMOD is efficient and scalable.
Index Terms—Spatial temporal, moving objects, database, parallel-distributed, general SQL query, large volume
1 INTRODUCTION
We are now in the age of extreme urbanization. The
United Nations predicts that the world urban population will reach approximately 4.9 billion in 2030. In China,
about 300 million rural inhabitants will move to urban areas
in the next 15 years. The extreme urbanization urges us to
develop smart cities that improve the efficiency, reliability,
and security of traditional cities. Among other things, urban
transportation plays an important role in a city and has a
great influence on the development of urbanization. Thus,
an important aspect of developing smart cities is to develop
smart transportation systems that are able to provide
urban inhabitants with faster, cheaper, and greener ways to
travel in cities.
With the increasing use of GPS devices, moving objects
with GPS devices are able to report real-time GPS records
that reflect dynamic traffic conditions [22]. The collected,
historical GPS records, typically organized in trajectories,
contain detailed information on historical traffic conditions [4], [5]. Luckily, with such trajectory data as a data
foundation, a wide variety of smart transportation systems
(a.k.a. intelligent transportation systems) have appeared. For example, time-dependent (e.g., peak versus off-peak hours) travel
time information can be obtained from GPS trajectories,
which provides accurate travel time estimation [47]. GPS
trajectories can also be employed to estimate vehicles’
greenhouse gas emissions [23], [24], which enables eco-routing [3], [25]. The impact of traffic accidents can also be captured by GPS trajectories, which in turn enables effective
accident response [11].

Z. Ding is with the College of Computer Science, Beijing University of Technology, Beijing 100124, China. E-mail: zmding@bjut.edu.cn.
B. Yang is with the Department of Computer Science, Aalborg University, Aalborg Øst 9220, Denmark. E-mail: byang@cs.aau.dk.
Y. Chi is with the College of Economics and Management, Beijing University of Technology, Beijing 100124, China. E-mail: goodcyy@bjut.edu.cn.
L. Guo is with the Institute of Software, Chinese Academy of Sciences, Beijing 100190, China. E-mail: limin@nfs.iscas.ac.cn.
Manuscript received 31 Jan. 2015; revised 20 Aug. 2015; accepted 27 Aug. 2015. Date of publication 16 Sept. 2015; date of current version 13 Apr. 2016.
Recommended for acceptance by S. Hu, G. Betis, R. Ranjan, and L. Wang.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2015.2479596

Various smart transportation applications would benefit
from a Moving Objects Database (MOD) that is able to efficiently and effectively manage large amounts of trajectories
and real-time GPS streams. In particular, MODs employ non-conventional data types including moving points, moving
lines, and moving regions [27] to describe the time-dependent locations and spatial extensions of moving objects. The
MOD helps smart transportation applications
retrieve relevant trajectories from huge amounts of historical trajectories. In addition, the MOD is able to support
efficient computation of various traffic statistics under different, and possibly sophisticated, conditions.
Earlier work on MODs mainly focuses on freely moving
objects in a Euclidean space [20], [27], [44].
Recently, research interest has increasingly focused on
network-constrained moving objects [15], [16], [17], [28], [39], e.g.,
vehicles in road networks. Various issues in both Euclidean-based and network-based MODs have been studied, including data models, physical storage, query processing
methods, and indexing structures.
Most existing MODs are centralized, i.e., they use
a single-node architecture, and are thus unable to efficiently
and effectively manage large amounts of moving objects’
current locations and historical trajectories. In centralized
solutions, all location update messages and queries are
sent to and processed at a single database server. Thus, the
overall performance may significantly decrease when the
number of moving objects increases. For instance, in a metropolis such as Beijing, there may exist millions of moving
objects to be managed in an MOD. If every moving object
updates its location once per minute, the database server
has to deal with millions of location updates per minute,
which could be a very heavy workload for the server. In
addition, since all queries, e.g., those issued by intelligent systems, have to be processed at the same server over a
large volume of trajectory data, query processing may not be efficient enough to support the smart
transportation applications.

0018-9340 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

To manage the huge amount of GPS data, some efforts
have been made towards managing massive GPS data based on
the MapReduce framework [2], [26], [35], [48]. However,
they do not consider dynamic location updates, i.e., real-time GPS streams. In fact, the MapReduce framework is not
suited to processing streaming data; it is suited to processing
huge amounts of static data in batch mode. Since supporting frequent location updates from moving objects is one of
the key characteristics of MODs, MapReduce-based systems
are not suitable for MODs.
A parallel and distributed MOD is highly desirable, since
it makes it possible to share location updates and query processing
workloads among multiple MOD servers. However, the
research on parallel and distributed MODs is still rather limited. Although there are some studies on query processing
in distributed database frameworks [29], [33], [43], [45], they
rarely address system architectures and primitive operators. Thus, they are only suitable for processing specific query types
on moving objects, such as continuous range queries [43]
and aggregation queries [45]. None of the existing solutions
supports general SQL query processing, especially when join
operators are involved. This greatly limits the usability
of these models.
To solve the aforementioned problems, we propose a Parallel Distributed Network-constrained Moving Objects Database (PD-NMOD) mechanism. The PD-NMOD employs a
parallel and distributed architecture, but does not use the
MapReduce framework. In addition, the PD-NMOD is able
to support frequent location updates from moving objects
and is able to process general SQL queries over moving
objects. In the PD-NMOD, multiple moving objects database
nodes make up a distributed database system, with each
node managing the moving objects and the underlying
transportation network in a certain geographical area. Location tracking and query processing are conducted in a distributed and parallel manner so that the workload can be
shared among multiple MOD nodes and the performance
can be significantly improved compared to a centralized
MOD. In addition, the PD-NMOD is able to support general
SQL queries such as point queries, range queries, and join
queries over moving objects.
This paper makes three main contributions. First, a new
parallel distributed MOD architecture that is able to manage
transportation networks and moving object trajectories
is proposed. In addition, a distributed location tracking
strategy is presented. Second, the complete database model,
including the data types and operators, for the management
of moving objects over the distributed MOD architecture is
designed. Third, the parallel distributed query processing
mechanism for general SQL queries over moving object trajectories is proposed. The experimental results show that
the proposed query processing mechanism achieves high
performance.
The remainder of this paper is organized as follows.
Section 2 briefly reviews the related work. Section 3 presents the
architecture of the PD-NMOD system. Section 4 describes
the data modeling and distributing mechanisms in the PD-NMOD. Section 5 describes the parallel query processing
mechanism of the PD-NMOD. Section 6 discusses implementation issues and performance evaluation results, and
Section 7 concludes the paper.
2 RELATED WORK
2.1 Centralized MODs
The research on MODs started in the mid-1990s. Earlier work on
MODs mainly focused on Euclidean-based solutions.
Wolfson et al. propose a Moving Objects Spatio-Temporal
(MOST) model [44], which is capable of tracking both current and near-future positions of moving objects. Su et al.
present a data model for moving objects based on linear
constraint databases [40]. Güting et al. present a data model
and data structures for managing moving objects based on
abstract data types [20], [27]. Indexing structures for moving
objects in Euclidean spaces [38] are also studied. Increasing
research interests focus on network-constrained moving
objects. A framework that manages moving objects on fixed
road networks is proposed [42]. Later, a rich set of data
types and operations on a fixed-network based MOD model
are defined [28]. A computational data model for
network-constrained moving objects is proposed [39]. In addition,
the index structures of network-constrained moving objects
are studied [13], [15], [21]. Further, a few recent studies consider data management issues for indoor moving objects [6],
[8]. Note that all the aforementioned studies are single-node
oriented; they are suitable only when the number of moving objects is limited and are unable to deal efficiently with
huge amounts of moving objects.
2.2 Parallel and Distributed MODs
Existing parallel and distributed MODs mainly support predefined query types for specific applications. For example,
continuous range queries [43], time-specific queries [33],
and aggregate queries [45] have been studied in the context
of distributed MODs. None of the existing studies supports
general SQL query processing, especially when join operators are involved.
Fig. 1. Architecture of the PD-NMOD.
Alternatively, parallel, relational databases [9] are suitable for storing massive structured data and can be used to
store moving objects’ GPS records. However, two major
problems exist when storing GPS data in parallel, relational
databases. First, the spatial-temporal joins in MODs cannot
be supported by existing parallel join methods. In parallel,
relational databases, joins are often conducted based on
divide-and-conquer techniques, such as ABJ+ [30], CMDJoin [34], Grace Hash Join [32], SBABJ+ [37], EHJA [50],
DER [46] and BSP [9]. However, spatial-temporal joins in
MODs often make the data set un-dividable. Consider, for instance, the query
“Query all the moving object pairs whose distance at time t
is larger than 1,000 meters”. In processing such a query, it is
impossible to partition moving objects into groups such that the distance between moving objects of different groups is larger than 1,000
meters. Second, due to the transaction management mechanisms in parallel, relational databases, when location
updates occur, the database tuples are locked so that other
users cannot access the data, which may greatly affect the
overall performance of the system.
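The effect of partitioning on such a distance join can be illustrated with a small Python sketch. It is not taken from the paper: the object positions, the 2,000-meter grid cells, and the function names are all hypothetical, and the "partitioned" variant deliberately joins only within each cell, as a naive divide-and-conquer join might.

```python
from itertools import combinations
from math import hypot

# Hypothetical positions (in meters) of four moving objects at time t.
positions = {
    "a": (0.0, 0.0),
    "b": (300.0, 400.0),
    "c": (5000.0, 0.0),
    "d": (5000.0, 1200.0),
}


def far_pairs(objs, threshold=1000.0):
    """All unordered pairs whose Euclidean distance exceeds the threshold."""
    return {
        (p, q)
        for p, q in combinations(sorted(objs), 2)
        if hypot(objs[p][0] - objs[q][0], objs[p][1] - objs[q][1]) > threshold
    }


def cell_of(xy, cell_size):
    """Grid cell containing a position (assumed partitioning scheme)."""
    return (int(xy[0] // cell_size), int(xy[1] // cell_size))


def partitioned_far_pairs(objs, threshold=1000.0, cell_size=2000.0):
    """Naive divide-and-conquer: join only objects that share a grid cell."""
    groups = {}
    for oid, xy in objs.items():
        groups.setdefault(cell_of(xy, cell_size), {})[oid] = xy
    result = set()
    for members in groups.values():
        result |= far_pairs(members, threshold)
    return result


exact = far_pairs(positions)
local_only = partitioned_far_pairs(positions)
missed = exact - local_only  # qualifying pairs that span two partitions
print(sorted(exact))
print(sorted(local_only))
print(sorted(missed))
```

In this toy data set every pair that straddles two cells qualifies for the result but is never compared by the partition-local join, which is the sense in which the data set cannot simply be divided.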
2.3 MODs in Cloud Data Management Platforms
Cloud data management systems are designed to manage
massive data sets. A few systems extend the MapReduce
framework to support large-scale spatial data [2], [19]. In
addition, a few MapReduce-based algorithms for processing spatial join queries have been proposed, in particular for two-way
spatial joins [49], multi-way spatial joins [26], and theta-joins [36]. However, MapReduce-based methods focus on
static spatial data and do not consider frequent location
updates. Due to the characteristics of the MapReduce framework, MapReduce-based systems are not suitable for processing
real-time tasks or handling streaming data.
Key-value stores, such as Bigtable [10] and Dynamo [14],
cannot adequately support the complicated queries involved in
MODs, e.g., spatio-temporal joins [7]. More recently, some
efforts have been made to utilize the advantages of both
parallel, relational databases and key-value stores, such as
HadoopDB [1], PNUTS [12], and HIVE [41]. However, most
of this work is based on transformations that are outside of database kernels. Thus, the query processing performance may
be significantly affected. Moreover, since either key-value
stores or parallel databases are used at the bottom layer, the major
limitations of key-value stores and parallel databases
discussed above still exist.
To sum up, no existing system is able to handle large volumes of location updates and historical trajectories and to
process general SQL queries over moving objects
in parallel.
3 ARCHITECTURE OVERVIEW OF THE PD-NMOD
Fig. 1 gives an architecture overview of the PD-NMOD. The
PD-NMOD has a two-layered architecture—the bottom
layer consists of multiple node servers and multiple sampling
receivers and the top layer consists of a single master server.
Each moving object is registered in a sampling receiver.
Each sampling receiver keeps receiving GPS records
reported from the registered moving objects. Each node
server is supposed to store GPS records reported from moving objects that travel in a predefined geographic area. The
area is called the service area of the node server. To achieve
this, sampling receivers transfer received GPS records to
node servers according to their service areas.1 The master
server maintains global information, e.g., node servers with
their corresponding service areas. However, it does not
store GPS data.
The PD-NMOD employs a space-based distribution strategy.2 The area of interest $G$, e.g., a city, is partitioned into $n$
sub-areas, where $n$ is the number of available node servers.
Each node server $i$ corresponds to a sub-area, denoted
as $a(i)$, which is the node server’s service area. The node
server manages the traffic network and the GPS data
sampled in its service area.
Given $n$ node servers $1, 2, \ldots, n$ in the PD-NMOD system, the following conditions should be met.
1) $\forall i, j$ with $i \neq j$: $a(i) \cap a(j) = \emptyset$;
2) $\bigcup_{i=1}^{n} a(i) = G$.
The master server keeps a service-area-partitioning Table
(SAP-Table) that records the relationships between node
servers and their service areas.
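A minimal Python sketch of one way such a partition and its SAP-Table could look is given below. It assumes a uniform grid over a rectangular area of interest, which is only one of the partition strategies the system could use; `Rect`, `build_sap_table`, and `lookup_node` are hypothetical names introduced here, not part of the PD-NMOD. Half-open rectangles are used so that the sub-areas are pairwise disjoint and together cover the whole area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned, half-open rectangle [xmin, xmax) x [ymin, ymax)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x, y):
        return self.xmin <= x < self.xmax and self.ymin <= y < self.ymax

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def overlaps(self, other):
        return (self.xmin < other.xmax and other.xmin < self.xmax
                and self.ymin < other.ymax and other.ymin < self.ymax)


def build_sap_table(region, rows, cols):
    """Split the region into rows * cols cells; map node id -> service area."""
    w = (region.xmax - region.xmin) / cols
    h = (region.ymax - region.ymin) / rows
    table = {}
    for r in range(rows):
        for c in range(cols):
            node_id = r * cols + c + 1
            table[node_id] = Rect(region.xmin + c * w, region.ymin + r * h,
                                  region.xmin + (c + 1) * w,
                                  region.ymin + (r + 1) * h)
    return table


def lookup_node(table, x, y):
    """Return the node server whose service area contains (x, y), if any."""
    for node_id, area in table.items():
        if area.contains(x, y):
            return node_id
    return None


G = Rect(0.0, 0.0, 40.0, 20.0)  # hypothetical area of interest
sap_table = build_sap_table(G, rows=2, cols=2)
ids = sorted(sap_table)

# Condition 1: service areas are pairwise disjoint.
disjoint = all(not sap_table[i].overlaps(sap_table[j])
               for i in ids for j in ids if i < j)
# Condition 2: disjoint cells inside G whose areas sum to G's area cover G.
covers = sum(a.area() for a in sap_table.values()) == G.area()
print(disjoint, covers, lookup_node(sap_table, 25.0, 15.0))
```

The linear scan in `lookup_node` is adequate for a handful of node servers; a real deployment with many nodes would more plausibly use a spatial index.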
The master server and the node servers have full-fledged moving objects database systems that support
standard data types and operators as well as spatial and
MOD data types and operators. A detailed description
of the data types and operators in the PD-NMOD is provided in Section 4.
The transportation network and the network-constrained
moving objects that travel on the network are managed
through three relational tables, Routes, Juncts, and MObjs.
These tables are stored in the node servers in a distributed
manner. That is, a node server only keeps the routes, the
junctions, and the moving object trajectory segments that
are related to its service area, as explained in Section 4.2. To
provide fast data access, each route, junction, or moving
object is associated with a unique Object IDentifier (OID),
through which the related tuple can be quickly accessed.
According to certain location tracking policies [18], a
moving object sends its location update message to its registered sampling receiver, which is determined by the identifier of the
moving object. Based on the
current and last location update messages, the sampling
receiver transfers the location update message to the node
servers whose service areas cover the moving object’s trajectory since the last location update. In this way, location
updates can be processed by different MOD nodes in parallel and the related costs can be shared among them, as
shown in Fig. 1. A detailed description of location updates
can be found in Section 4.2.
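The forwarding rule described above can be sketched in Python as follows. This is a simplified illustration under loudly stated assumptions rather than the paper's algorithm: service areas are taken to be vertical strips along the x axis, and the path since the last update is approximated by the straight segment between the two reported positions, whereas the PD-NMOD follows the road network. `BOUNDS`, `target_nodes`, and `SamplingReceiver` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical strip boundaries (km): node k serves BOUNDS[k-1] <= x < BOUNDS[k].
BOUNDS = [0.0, 10.0, 20.0, 30.0]


def node_for_x(x):
    """Node server whose strip contains the x coordinate."""
    if not BOUNDS[0] <= x < BOUNDS[-1]:
        raise ValueError(f"x={x} lies outside the area of interest")
    return bisect_right(BOUNDS, x)


def target_nodes(last_xy, current_xy):
    """Nodes whose strips are touched by the segment last_xy -> current_xy."""
    first = node_for_x(min(last_xy[0], current_xy[0]))
    last = node_for_x(max(last_xy[0], current_xy[0]))
    return list(range(first, last + 1))


class SamplingReceiver:
    """Remembers each object's last position and forwards updates to nodes."""

    def __init__(self):
        self.last_position = {}  # object id -> (x, y)
        self.outbox = {}         # node id -> list of forwarded updates

    def on_update(self, oid, xy, t):
        # A first update has no predecessor, so only the current strip is used.
        previous = self.last_position.get(oid, xy)
        nodes = target_nodes(previous, xy)
        for node_id in nodes:
            self.outbox.setdefault(node_id, []).append((oid, xy, t))
        self.last_position[oid] = xy
        return nodes


receiver = SamplingReceiver()
first = receiver.on_update("taxi1", (4.0, 1.0), 0)     # stays inside node 1
second = receiver.on_update("taxi1", (12.0, 1.0), 60)  # crosses from node 1 to 2
third = receiver.on_update("taxi1", (25.0, 2.0), 120)  # crosses from node 2 to 3
print(first, second, third)
```

An update that crosses a boundary is delivered to every node whose strip the segment touches, so each node can extend the trajectory portion that falls in its own service area.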
1. The GPS records on each node server may have multiple physical
copies among different node servers to ensure fault tolerance.
2. The PD-NMOD is able to employ any space partition strategy,
e.g., grid-based [19], [48] and tree-based [19] partition strategies. Based
on a chosen partition strategy, the PD-NMOD has a mechanism to distribute trajectories and road networks to different nodes, which is
discussed in Section 4.2.
Fig. 2. A junction and its connectivity matrix.
All queries are sent to the master server. When executing
a query, the master server parses it into a query operator
tree, optimizes the tree, and divides it into two layers. The
lower layer includes a set of sub-trees that are multicast to node servers for parallel execution, while the upper
layer of the tree is executed locally at the master server
based on the materialized results of the lower-layer sub-trees. The final result is sent to the querying user through
the master server.
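The two-layer division can be sketched in Python as follows. The sketch assumes, purely for illustration, that each operator carries a flag saying whether it can run entirely at a node server; the paper's actual criteria are part of its query processing mechanism (Section 5) and are not reproduced here. `Op`, `pushable`, and `split` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A node of the query operator tree."""
    name: str
    children: list = field(default_factory=list)
    pushable: bool = False  # assumption: operator can run at a node server


def fully_pushable(op):
    """True if the operator and its whole sub-tree can run at node servers."""
    return op.pushable and all(fully_pushable(c) for c in op.children)


def split(op, lower):
    """Return the upper-layer tree; collect lower-layer sub-trees in `lower`."""
    if fully_pushable(op):
        lower.append(op)
        # The master later reads this sub-tree's materialized result.
        return Op(f"materialized#{len(lower)}")
    return Op(op.name, [split(c, lower) for c in op.children], op.pushable)


# Hypothetical plan: join two filtered scans; only the join stays at the master.
plan = Op("join", [
    Op("select", [Op("scan MObjs", pushable=True)], pushable=True),
    Op("select", [Op("scan Routes", pushable=True)], pushable=True),
])

lower_layer = []
upper_layer = split(plan, lower_layer)
print(upper_layer.name, [c.name for c in upper_layer.children])
print([sub.name for sub in lower_layer])
```

Here the two `select` sub-trees form the lower layer that would be sent to the node servers, and the master executes the `join` over their materialized results.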
4 DATA MODELING AND DATA DISTRIBUTING MECHANISM
4.1 Modeling Traffic Networks and Network-Constrained Moving Objects
The PD-NMOD employs an edge-based model to represent
transportation networks. A transportation network $\mathit{Net}$ is
modeled as a graph $\mathit{Net} = (E, J)$, where $E$ is a set of
directed edges and $J$ is a set of junctions.
A directed edge (or simply an edge) $e \in E$ is defined in the
form $e = (eid, geo, len, jid_s, jid_e)$, where $eid \in$ string is the
identifier of edge $e$; $geo = (p_1, p_2, \ldots, p_n) \in$ polyline represents the geometry of edge $e$, where $p_i$ $(1 \le i \le n) \in$ point
is the $i$th point in a polyline; $len \in$ real is the length of edge
$e$, and $jid_s$, j …