Data Flow
Information flows through real-time data systems in three possible directions: to the
mobile device owner, to someone else (Big Brother!) monitoring the mobile device owner,
or between two or more participants in a data exchange. For instance, a mobile device
owner could identify the least-congested roads within a mile of his or her current
location. Or, parents could receive notification when their commuting child has arrived
safely at school. Most technically challenging, though, is the mobile-to-mobile information
exchange, such as when a taxicab driver calls a special number and is automatically
connected to the nearest strolling pedestrian who needs a ride. Or, consider a law-enforcement
scenario (see Figure 2).
Parolee #432’s car and Protected Witness #567’s PDA are now within 50 meters
of each other -- send an SMS warning to the bodyguard’s cell phone!
Figure 1: Two real-time events of several tracked vehicles in central Paris set off an alarm
(a database trigger) when they become closer than 50 meters to each other. Paris basemap courtesy of
TeleAtlas.
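The alarm in the figure can be sketched in a few lines of Python. This is only an illustration of the idea, not the database trigger itself: the function names, the unit identifiers, and the use of the haversine formula on latitude/longitude pairs are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def proximity_alerts(positions, watch_pairs, threshold_m=50.0):
    """Return (unit_a, unit_b, distance) for watched pairs closer than threshold_m.

    positions   -- dict mapping a (hypothetical) unit id to its latest (lat, lon)
    watch_pairs -- iterable of (unit_a, unit_b) pairs that must be kept apart
    """
    alerts = []
    for a, b in watch_pairs:
        if a in positions and b in positions:
            d = haversine_m(*positions[a], *positions[b])
            if d < threshold_m:
                alerts.append((a, b, d))
    return alerts
```

In a real deployment the check would run each time a new coordinate arrives, and a non-empty result would send the SMS warning.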
In the first two of these information flows, a solitary database system can process the
input from 100 mobile units at acceptable speeds, even when their coordinates are entered
into the system as rapidly as every 5 seconds. Real-time data processing systems for small
collections of events, such as a fleet of 100 trucks, do not have to be complicated or
outrageously expensive. A company called Locarta (www.locarta.com), for instance, sells $700
dashboard units approximately the size of cigar boxes that are each their own wireless Web
site. Visiting the site returns the unit’s real-time GPS coordinates. Add a spatial database,
a connection to mapping software, and the only remaining expense is the application
development.
Reality bites. This same system, however, will crawl or break if the collection of events is
large, such as 100,000 simultaneously moving units. Likewise, even with just 100 moving units,
performance of a mobile-to-mobile query may be unacceptably slow. Because telecommunications
firms are likely to offer real-time services to customer bases of 1 million or more people, a
solution to the problems raised by large, mobile-to-mobile, real-time datasets is an important
goal.
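One way to see why scale hurts is to count the comparisons a naive mobile-to-mobile check would need. The sketch below assumes, purely for illustration, that every unit is compared with every other unit on each update; real systems would normally use a spatial index to avoid this, so treat the numbers as a worst case rather than a description of any particular product.

```python
def pair_count(n):
    """Number of distinct unit-to-unit pairs among n moving units."""
    return n * (n - 1) // 2


for n in (100, 100_000, 1_000_000):
    print(f"{n:>9,} units -> {pair_count(n):>15,} pairs per update cycle")
```

A fleet of 100 units yields 4,950 pairs, while 100,000 units yields roughly 5 billion, which is why an approach that is comfortable for a small fleet stops being practical for a telecommunications-sized customer base.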
Furthermore, spatio-temporal data itself presents an interesting new challenge to data-conversion
experts -- namely, it is often irregular. Figure 3 simulates how dissimilar each of three
real-time events can be. All three points represent consumers subscribing to the same LBS
provider, which captures and stores their event histories.
Figure 3a: This simulation of three people moving over time in downtown Oakland, California,
is available online as a Java applet at
www.giswebsite.com/java/PeetTrak/PeepTrak.htm. Its original intent was to identify contamination
cluster points for biological or chemical terrorist attacks based on hospitalized patients’ recent
location histories.
Figure 3b: Traditional GIS attributes store the irregular real-time data associated with Figure 3a
inefficiently.
Figure 3c: IBM’s time-series solution handles irregular data in variable-length arrays
for each moving point.
Notice how the red object only changes position twice during the observation period, the
green object changes five times, and the purple object temporarily fails to transmit a
signal at all because of interference or battery drain. All three points make their moves
at irregular intervals, sometimes staying in one place for several hours, sometimes moving
twice within two hours.
The height problem. Elevation also affects proximity -- are two people with the same x and
y coordinates standing looking at each other, or are they each in their own corner offices
on different floors of the same building, and not really in close proximity at all? What’s
the best way to store these events so they can be overlaid with static basemap data in a GIS
or imaging program? Irregular real-time data can be stored and manipulated in a variety of ways,
but there isn’t yet a real-time storage standard in use by the spatial community.
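The elevation question is easy to illustrate. The sketch below assumes projected coordinates in meters with a third value for height; the function names and the sample points are hypothetical.

```python
import math


def distance_2d(a, b):
    """Horizontal distance between two (x, y, z) points, ignoring height."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def distance_3d(a, b):
    """Straight-line distance between two (x, y, z) points, including height."""
    return math.sqrt(
        (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2 + (b[2] - a[2]) ** 2
    )


# Two people in the same corner of a building, ten floors apart
# (assuming about 3 meters per floor).
ground_floor = (100.0, 200.0, 3.0)
tenth_floor = (100.0, 200.0, 33.0)

print(distance_2d(ground_floor, tenth_floor))  # 0.0
print(distance_3d(ground_floor, tenth_floor))  # 30.0
```

A proximity test that looks only at x and y would report these two people as standing in the same spot.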
Mining time. Coincidentally, this kind of irregular behavior is similar to stock market
ticker data for a portfolio of disparate stocks. Stock values change rapidly in real time,
cease to be active on weekends and holidays, and may disappear entirely from an index if they
fall below a limit. Stock market data miners use custom solutions to capture and then analyze
regular and irregular time-based data.
Such time-series database products can also be applied to spatial data, storing a variable-length
array of locations and their offsets from a starting time for each moving point in a collection.
This storage format and its built-in analysis functions reveal patterns over time, a function
strangely absent from most LBS Web sites. Maybe data mining event histories raises privacy issues.
But, rather than needing to watch events moving in real-time, won’t many users want to see event
histories such as where the fleet’s vehicles were two hours ago, or the path of an important
moving object over the past day, month, or year?
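The storage idea described above, a variable-length array of locations and time offsets per moving point, can be sketched as follows. This is a minimal illustration and not the layout of any particular time-series product; the `Track` class and its method names are invented for the example, and a dropped signal is recorded as `None`.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Track:
    """One moving point: a start time plus parallel variable-length arrays."""
    start: datetime
    offsets: list = field(default_factory=list)    # seconds since start, ascending
    locations: list = field(default_factory=list)  # (x, y) tuple, or None if no signal

    def append(self, when, location):
        """Record an observation; only changes need to be stored."""
        offset = (when - self.start).total_seconds()
        if self.offsets and offset < self.offsets[-1]:
            raise ValueError("observations must arrive in time order")
        self.offsets.append(offset)
        self.locations.append(location)

    def position_at(self, when):
        """Last recorded location at or before `when`, or None if unknown."""
        offset = (when - self.start).total_seconds()
        i = bisect_right(self.offsets, offset) - 1
        return self.locations[i] if i >= 0 else None
```

Because each track stores only the moments when something changed, an object that moves twice costs two entries and one that moves five times costs five. A history question such as "where was this vehicle two hours ago?" becomes a binary search over the offsets.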
Storing incoming real-time data over time enables users to ask such complex spatio-temporal
questions as, "What was the average flow of people past my store between 10:00 AM and 12:00
PM over the past three months?" Using a time-series format within a database preserves the
original irregular nature of real-time data (including nulls when signals drop out) but can
streamline temporal analysis. For instance, users conducting spatio-temporal analysis of leisure
travel patterns might define calendar templates that limit a year-long collection of real-time
data to only holidays and weekends.
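A calendar template of this kind can be approximated as a filter over days. The sketch below is an assumption-laden illustration: `average_flow` and `is_weekend` are hypothetical names, each timestamp stands for one person passing the store, and holidays are left out because their dates would have to be supplied by the user.

```python
from collections import Counter
from datetime import date, datetime, time, timedelta


def is_weekend(day):
    """Calendar template: keep only Saturdays and Sundays."""
    return day.weekday() >= 5  # Monday is 0, so Saturday is 5 and Sunday is 6


def average_flow(timestamps, first_day, last_day, template=lambda day: True,
                 start=time(10, 0), end=time(12, 0)):
    """Average number of events per selected day inside the daily time window.

    Days in [first_day, last_day] that pass `template` are counted even when
    they have no events, so quiet days pull the average down.
    """
    counts = Counter(
        ts.date() for ts in timestamps if start <= ts.time() < end
    )
    days = []
    day = first_day
    while day <= last_day:
        if template(day):
            days.append(day)
        day += timedelta(days=1)
    if not days:
        return 0.0
    return sum(counts[d] for d in days) / len(days)
```

Calling `average_flow(events, first_day, last_day, template=is_weekend)` answers the store-traffic question for weekends only, while omitting the template averages over every day in the period.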