RSS FeedWhite Papers

White Paper Download

Communication Efficient Distributed Monitoring of Thresholded Counts

Many emerging systems are fundamentally distributed in nature

Category: Networking

Date: , 10:00

Monitoring is an issue of primary concern in current and next generation networked systems. For example, the objective of sensor networks is to monitor their surroundings for a variety of different applications like atmospheric conditions, wildlife behavior, and troop movements among others. Similarly, monitoring in data networks is critical not only for accounting and management, but also for detecting anomalies and attacks.

Such monitoring applications are inherently continuous and distributed, and must be designed to minimize the communication overhead that they introduce. In this context we introduce and study a fundamental class of problems called “thresholded counts” where we must return the aggregate frequency count of an event that is continuously monitored by distributed nodes with a user-specified accuracy whenever the actual count exceeds a given threshold value.

Communication-Ef cient DistributedMonitoring of Thresholded CountsRam KeralapuraECE DepartmentUniversity of California, Davisrkeralapura@ucdavis.eduGraham CormodeBell LaboratoriesMurray Hill, NJcormode@lucent.comJai RamamirthamBell LaboratoriesBangalore, Indiajai@lucent.comABSTRACTMonitoring is an issue of primary concern in current and next gen-eration networked systems.  For example, the objective of sensornetworks is to monitor their surroundings for a variety of differ-ent applications like atmospheric conditions, wildlife behavior, andtroop movements among others. Similarly, monitoring in data net-works is critical not only for accounting and management, but alsofor detecting anomalies and attacks. Such monitoring applicationsare inherently continuous and distributed, and must be designed tominimize the communication overhead that they introduce. In thiscontext we introduce and study a fundamental class of problemscalled thresholded counts where we must return the aggregatefrequency count of an event that is continuously monitored by dis-tributed nodes with a user-speci ed accuracy whenever the actualcount exceeds a given threshold value.In this paper we propose to address the problem of thresholdedcounts by setting local thresholds at each monitoring node and initi-ating communication only when the locally observed data exceedsthese local thresholds.  We explore algorithms in two categories:static thresholds and adaptive thresholds.  In the static case, weconsider thresholds based on a linear combination of two alternatestrategies, and show that there exists an optimal blend of the twostrategies that results in minimum communication overhead.  Wefurther show that this optimal blend can be found using a steepestdescent search.  In the adaptive case, we propose algorithms thatadjust the local thresholds based on the observed distributions ofupdated information in the distributed monitoring system. We useextensive simulations not only to verify the accuracy of our algo-rithms and validate our theoretical results, but also to evaluate theperformance of the two approaches. We nd that both approachesyield signi cant savings over the naive approach of performing pro-cessing at a centralized location.1.   INTRODUCTIONMany emerging systems are fundamentally distributed in nature.The current and next generation of networks are large-scale, andwidespread.  Within this distributed networked systems, a princi-pal concern is monitoring: either monitoring the environment sur-Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for pro t or commercial advantage and that copiesbear this notice and the full citation on the rst page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior speci cpermission and/or a fee.SIGMOD 2006 June 27 29,2006,Chicago,Illinois,USACopyright 2006 ACM 1-59593-256-9/06/0006 ... 5.00.rounding each of the network nodes, or monitoring the behavior ofthe network itself. Two prototypical applications are in: (i) Sensornetworks, whose raison d etre is for monitoring and collating in-formation on atmospheric conditions, wildlife behavior, and troopmovements in military applications among others, and (ii) Net-work traf c monitoring in (wired or wireless) data networks, fortraf c management, routing optimization, and anomaly and attackdetection.Over the past few years, the de ning characteristics of these ap-plications have been identi ed to pose new challenges that are notanswered by traditional data management systems. The challengesin monitoring arise mainly because the systems are fundamentally:" Continuous:  Unlike the traditional database on-demand view of the world, where queries are posed in SQL and ananswer returned to the user, queries in these monitoring situ-ations are typically long running queries over the data, whichmust continuously run and return answers as and when theyare found." Distributed: The data required to answer the monitoring queriesis distributed throughout the network. Typically, a query re-quires information to be collated and aggregated from many,if not all, nodes in the network." Resource-constrained: Ef ciency of operation is absolutelyvital in the distributed monitoring world. In sensor networks,we must ensure that the life of the network is extended aslong as possible by minimizing the energy drain of runningthe monitoring protocol; in data networks, we must ensurethat the protocol does not hinder the principal operation ofthe network, allowing the delivery of messages unencum-bered by the monitoring overhead. These concerns manifestthemselves principally as a requirement to minimize (to theextent possible) the communication cost of the monitoringprotocols: communication is the principal energy drain for asensor, and excess communication in a data network reducesthe capacity for normal operation. As a secondary concern,computation and memory usage should also be minimal foref cient execution of the monitoring.Thresholded Counts.  Within this framework of continuous, dis-tributed, and resource-constrained systems, there are many possi-ble types of monitoring queries that can be posed. Prior work haslooked at particular query types such as top-k monitoring [3], setexpression cardinality [12], and holistic aggregates such as quan-tiles [8].  But many queries rely at heart on monitoring sums orcounts of values, in combination with thresholds (lower bounds).Consider the following queries from a variety of different domains:" (from [22]) Report when at least 40 soldiers have crossed aspeci ed boundary.Untitled Document" Raiseanalertwhenthetotalnumberofcarsonthehighwayexceeds4000andreportthenumberofvehiclesdetected."Whichspecieshavemorethan50memberswithinacertainregionformorethananhour?"Identifyalldestinationsthatreceivemorethan2GBoftraf cfromthemonitorednetworkinaday,andreporttheirtransfertotals"(QueryQ1of[21])Monitorthevolumeofremotelogin(tel-net,ssh,ftpetc.)requestreceivedbyhostswithintheorga-nizationthatoriginatefromtheexternalhosts."Whichuserswithinthemonitorednetworkreceivemorethan1000differentconnections?Ineachcasetherearetwopartstothequery:arequestforasumorcountofaparticularquantity(vehicles,animals,networkcon-nectionsetc.),andaminimumthreshold,ortrigger,forwhenweneedtoknowthisinformation.Suchthresholdsarevitalinfocusingtheinformationreturnedtotheuser,andinreducingthemonitoringburdenonthenetwork.Inalmosteveryapplicationinvolvingmea-suringquantitieslikethese,itisonlyimportanttoknowthequan-titieswhentheyexceedaspeci edlevel.Smallcounts(ofremotelogins,humanactivity,networktraf c)areprevalentandcanbeig-noredtoreducethereportingburdenonthemonitoringsystem.Butinalltheabovesituationswecande neathreshold,suchthatitiscriticaltoknowwhenthisthresholdhasbeencrossed.Ourfocusinthispaperisondesigningprotocolsandalgorithmstomonitorsuchsumsandcountswiththresholds.Intheextremecase,thesethresholdscanbetrivial(i.e.zeroorone),butinallthescenarioswehaveoutlined,thereexistsanon-trivialthresholdwhichcanbeusedtoreducethecommunicationcost.Ingeneral,thethresholdscanbespeci edeitheraspartofthequery,orlearnedbythesysteminresponsetotheobserveddata.Forthiswork,withoutlossofgen-eralityweassumethatthethresholdis xedaprioriandfocusonansweringqueriesforthresholdedcountsgivensuchathreshold;dynamicthresholdscanbeaccommodated,butwedonotdiscussthemhere.Thesecondcomponentofthesetypesofqueriesistoreturnacountofaparticularsetofvalues.Here,onecanmaketheobser-vationthatanapplicationrarelyneedstoknowtheexactcountsolongastheanswerisgivenwithreasonableprecision:itisnotnec-essarytoknowwhetherthenumberofcarsonthehighwayis4237or4251,ifeitheranswerisaccuratetowithin1%.Soinsteadofdemandingexactresults,wecanexplorethetradeoffbetweenac-curacyandcommunication:clearly,allowinglargeruncertaintiesaboutcountsallowsmonitoringsitestobemoreconservativeaboutwhentheysendtheirupdatestoacentralmonitor.Thisbene tbe-comesclearinourexperiments,whichshowsigni cantsavingsastheallowableuncertaintyincreases.OurContributions.Weaddresstheproblemofcontinuouslymon-itoringthresholdedcountsinadistributedenvironment.Ourcon-tributionsareasfollows:1.Weintroduceandformalizethethresholdedcountingprob-lem,whichisfundamentaltoseveraldistributedmonitoringscenar-ios.Wegiveguaranteedsolutionstotheproblembasedonmonitorscomparingtheirlocalcountstolocalthresholds,andpostponingcommunicationuntilthesethresholdsareviolated.Weconsidertwoapproaches,dependingonwhetherthethresholdsaredeter-minedstaticallyinadvance,orcanbeallocatedadaptivelyasthedistributionofupdatedinformationisobserved.2.Inthestaticcase,weshowtwodifferentfundamentaltech-niquesforsettingthelocalthresholds. Weintroduceablendedapproach,basedonalinearcombinationofthetwofundamentalmethodswhileretainingthecorrectnessguarantee.Wegiveacare-fulanddetailedanalysisoftheoptimalsettingofthisblendwhichdependsonlyoncoarsepropertiesofthetotalcount.3.Intheadaptivecase,weintroduceavarietyofincreasinglysophisticatedalgorithmsthatattempttocapturetheobserveddis-tributionofcountupdates,andhencereducetheoverallnumberofmessagessentwithinthesystem.4.Weshowthatourstaticandadaptivealgorithmscanbeeasilyextendedtoincludenegativeupdates,slidingwindows,approxi-matecounts,andtime-dependentthresholdvalues.5.Weconductathoroughanddetailedsetofexperimentstoverifytheef cacyofourmethodsforgivingalowcostmonitoringschemeforthresholdedqueries. Wecomparetoapplicationsofpriorworkonavarietyofrealandsyntheticdata,andshowthattherearesigni cantsavingsforusingourmethods.Outline.InSection2,wereviewpriorworkincontinuous,dis-tributedmonitoring.Weformallyde netheproblemofthresholdedcountsinSection3.Weproposealgorithmswhichsetstaticthresh-oldsinSection4,andadaptivethresholdsinSection5.WediscusssomeextensionstoourcurrentworkinSection6.Wepresentre-sultsfromourexperimentalevaluationinSection7andconcludeinSection8.2. RELATEDWORKOurworkisrelatedtothedesignandconstructionofsystemstohandlelargestreamsofdata. Thereareseveralgeneralpur-pose DataStreamManagementSystems (DSMSs),suchasAu-rora[1],Stream[2],andTelegraph[4],andspecialpurposesys-tems(suchasfornetworkdatastreams)includingGigascope[11]andTribeca[23].However,ourproblemisinherentlydistributed,whereasthesesystemsareprimarilyfocusedoncentralizedmoni-toring.Recently,distributeddatastreammanagementsystemshavebeenproposedsuchasBorealis[14]andMedusa[24,5].Themainfocusofthesesystemsishowtoeffectivelydistributetheworkofthesystem,andhencebalancetheload;itisthereforeassumedpos-sibletoredirectdatastreams.Inourmodel,weexplicitlyruleoutthepossibilityofsendingallobserveddatatoadifferentprocessor:inresourceconstrainedsituations,ouronlypossibilityistoprocessthestreamofdataatthemonitorwhereitisobserved.(Inourex-periments,weareabletoimproveonthecostofsendingthewholedatastreambythreeordersofmagnitudebyprocessingatthepointofobservation).Themodelofcontinuousdistributedcomputationhasbeenadoptedbyaseriesofpapersoverrecentyears.Withinthiscontext,anum-berofdifferentproblemshavebeenaddressed,includingmonitor-ingthe(exact)top-kset[3],evaluatingsetexpressions[12],andquantilesandheavyhitters[8,16].Techniqueshavebeenproposedwhichallowthecontinuous,distributedmonitoringofavarietyofqueriesbasedonusingsketches[7](fromwhichjoin-sizesandmanyotherfunctionscanbeestimatedaccurately)andbasedontechniquesthatareresilienttoduplicationofinformation[10].Theproblemweconsider,ofmaintainingaccuratecountsaboveathreshold,isnotcoveredbyanyoftheseapproaches.Wenowdiscusspreviousworkthatisclosertothisproblem.Variousworkshaveconsideredaccuratemaintenanceofsumsandcountsinadis-tributedsensornetworksetting,butthesearetypicallyforduplicate-resilient on-demand computations,ratherthancontinuous[19,17,6].The heavyhitters problemisto ndallitemswhosecountisabovesomefraction ofthecountofallitems(e.g.usersinawirednetworkconsumingmorethan1%ofthetotalbandwidth).Whilesemanticallyrelatedtoourthresholdbasedproblem,con-tinuousmonitoringoftheheavyhittersisasomewhateasierprob-Untitled DocumentC  oordinatorsiteRem otesite, 1Rem otesite, iRem otesite, mFigure1:DistributedMonitoringArchitecturelem,sincetherecanbeatmost1/ heavyhittersatanyonetime,whereastherecanbeanunboundednumberofmonitoredcountsexceedingtheirthresholds.ThepermittederrorforHeavyHittersalsoscaleswiththesumofthecounts,ratherthanthecountoftheindividualitem,aswedemandinourformalde nitionofthisproblem(seenextsection).Acontinuous,distributedalgorithmforheavyhittersfollowsasaspecialcaseofquantilemonitoring[8],butthiscannotbeappliedtoourproblemsincetheguaranteesareinsuf cienttomeetourrequirements.MorecloselyrelatedtoourproblemistheworkofJainetal[15],whichconsiderstriggers.Theauthorsmakeacompellingargumentforcontinuousdistributedmonitoringoftriggerconditionsbasedonlocalcountsfi,andsuggestsomepotentialapproaches.ThemethodoutlinedrelatestotriggersoftheformPifi>C,i.e.thetriggershouldberaisedifthetotalofthefisatdifferentsitesexceedsthethresholdC.Thisisclearlyasimpli edversionofourproblem.Thealgorithmsgivenarerandomized,andinexpectationgiveerrorsofconstantfactorstimesC,whicharemuchlargerthanthe guaranteesweprovide.Olstonetal[21]proposea lterbasedapproachtomaintainingaccuratecounts.There,thecentralsitewishestoknoweachsumFj=Pifi,jwithuncertaintyatmost jfora xed j.Theap-proachistoallocate lters (allowableranges)ateachsite,andwhenvaluesareobservedthatliewithinthese lters,noactionistaken;iftheyfalloutsidetherange,thentheyarepasseduptothecentralsite.Toadapttotheobserveddistribution, ltersareperi-odicallyresizedbasedonhowlongthe lterhasbeenvalid.Thisapproachcanbemodi edtoapplytoourproblem,andwecompareagainstthisalgorithminourexperimentalsection.3. PROBLEMDEFINITIONANDAPPROACHWede netheproblemofef cientlymaintainingapproximatecountsinadistributedscenariointhissectionanddescribeourap-proach.3.1 SystemArchitectureWeconsideradistributedmonitoringsystemasshowninFig-ure1.Thesystemconsistsofmremotesitesandacentralco-ordinatorsite.Theremotesitesobserveacontinuousstreamofupdates,whichtakentogetherde neadistributionofvalues.Theremotesitescancommunicatewiththecoordinatorinordertoen-surethatthecoordinatorisabletoaccuratelyanswerqueriesovertheunionoftheupdatestreams.Ingeneral,theremotesitescancommunicateamongstthemselvesaswellaswiththecoordinator;however,inlinewithpreviouswork,weonlyconsiderprotocolsthathave(pairwise)communicationsbetweenthecoordinatorandremotesites.Thisisthemodelthatisadoptedinmostpriorworkoncontinuous,distributedmonitoringproblems[3,21,12,8,7]1.Eachsitei {1...m}monitorsasetofkvaluesNv,i,v {1...k},whicharede nedincrementally.Wemodeleachstreamofupdatesobservedattheremotesiteiasasequenceoftuplesi,v,t,ci,v,t.Thisisinterpretedasanupdateofci,v,ttoNvinsiteiattimet.Weassumethatupdatesareorderedbytimestamp,andsiteionlyseesupdatestoitself2.ThenNv,i(t),thevalueofthecountinsiteiattimet,isde nedasNv,i(t)=Ptci,v,t.Theglobalcount,Nv(t),isde nedasNv(t)=Pi {1...m}Nv,i(t).OurgoalistomonitorthevalueofeachNv(t)withinspeci edac-curacybounds.Sinceweareinterestedonlyinthe current valueofcounts,intherestofthepaperwedropreferencetotanduseNvandNv,itorepresenttheglobalandlocalcounts.Thismodelaccuratelycapturesthescenarioswedescribedintheintroduction.Forexample,innetworktraf cmonitoring,eachup-datemightcorrespondtotheobservationofapacketataremotesite(ormonitor).Inthiscontext,tisthecurrenttime,iistheidenti erofthemonitor,andvandci,v,tarepropertiesofthepacket,suchasdestinationIPaddressandsizeofpacket,respectively.Basedontheinputsfromalltheremotesites,thecoordinatortrackstheag-gregatetraf ctovariousdestinationsandraisesanalarmwhenthetotaltraf cbecomeshigh,indicativeofanunusualactivity(likeaDDoSattack).Monitoringinsensornetworkscanalsobemappedontothismodelinanaturalway.Ingeneral,theupdates,ci,v,t,canbenegative(correspondingtoadecreaseinNv,i,suchastemperatureupdatesinsensornetworks)orfractional(likerainfallmeasurements).Allourmethodswillhandlesuchsettings,butforclaritywefocusonthecasewhereci,v,tsarepositiveintegers,andpostponediscussionofnegativeupdatestoSection6.3.2 TheThresholdedCountProblemOurfocusisonmonitoringtheNvatthecentralcoordinator.SinceNvisde nedbyupdatestoremotesites,ifwerequiretoknowNvexactly,thenwemusteagerlysendeveryupdatefromaremotesitetothecoordinatorassoonasitisobserved.Thisen-suresaccuratevaluesatthecoordinatoratalltimes,butcomeswithhugecommunicationoverhead.AsobservedinSection1,such neaccuracyisnotneededinpractice.Anotherpossibilityisforthere-motesitestosendtheircountsperiodicallytothecoordinatorsite.Thisreducesthecommunicationburden,butstillhassomeissuesinpractice:updatesinrealsystemsaretypicallybursty,i.e.,countschangerapidlyinsometimeperiodswhileitmayhardlychangeinothers.Theformerresultsininaccuratevaluesatthecoordinator,whilethelatterresultsinunnecessarycommunications.Inthispa-per,wede netheproblemofcontinuouslymonitoringthresholdedcounts,whichensuresthatthecoordinatoralwayshasanaccuratecountwithminimaldelay3.DEFINITION1.GivenathresholdTvandanerrorguarantee v,the v-de cientthresholdedcount, Nvsatis esthefollowingproperties"0d NvvwhenNvv1Asinthecitedpriorwork,weconcentrateonthekeyalgorith-micchallengesindesigningeffectiveprotocolsforthissetting,andhencewedonotconsiderissuesofmessageloss.Instead,weas-sumethattherearesuf cientmechanismsinplacesuchasmes-sageacknowledgments,retransmissions,andtimestampingtoen-surethatthecorrectbehaviorisobserved.2Thisisnotastrongassumption;typically,thesiteobservesonlythepairv,ci,v,t,andsuppliestandiitself.3Sincenetworkdelayfromremotesitestocoordinatorisunavoid-able,everytrackingalgorithmmustincursomeminimumdelay.Untitled Document" Nv(1 v) Nvvwhen Nve TvWhere it is clear from context, we will drop the quali cation vand refer to N,T, .  Note that this de nition is distinct from theHeavy Hitter de nition in data streams [18], which requires an ad-ditive error that scales as the sum of all monitored counts; instead,we have a much more demanding requirement to monitor all countswith relative error on each count, above the threshold. Without athreshold, T, the communication overhead is high to begin with,as low counts require every update to be pushed to the coordinatorin order to maintain the error guarantee .  Since low counts aretypically uninteresting for monitoring applications, by supressingcommunication for these counts, the overhead of the monitoringcan be kept low.The value of the threshold depends on individual applications.For applications in network monitoring that track anomalous be-havior, like DDoS attacks, the value of the threshold can be high,while applications like traf c accounting [13] that count the traf csent by hosts or networks beyond a certain initial minimum can usea lower threshold value.3.3   Our ApproachOur basic approach is to set local thresholds at each remote sitesuch that the current count is bounded by the local threshold. Whena local threshold in a remote site is violated, the remote site willcommunicate this to the coordinator and sets a new threshold. Theith remote site maintains local thresholds, ti,j, j = 0, 1, . .. , andensures that ti,f(i)d Nv,ii,f(i)+1for some threshold f(i) thatis known to the coordinator. If the ith remote site s count violatesthis condition, it sends an update to the coordinator with a newf(i) and ti,f(i)such that ti,f(i)d  Nv,ii,f(i)+1for thecurrent value of Nv,i. The coordinator can use the set of ti,f(i)s toestimate any global count as Nv=Pmi=1ti,f(i).Note that while the count at a remote site obeys the ti,f(i)dNv,ii,f(i)+1bounds, the remote site does not send any up-dates until the count is outside these bounds.  Until the coordi-nator receives the next threshold update the actual count can lieanywhere between the two threshold values. Hence, the maximumerror contributed to the global count error by remote site i is givenby ti,f(i)+1 ti,f(i). An algorithm that tracks counts must ensurethat the error is within the -de cient requirement when the countis greater than the speci ed threshold.  Formally, we must ensurethat 0 Pi {1...m}ti,f(i)+1 ti,f(i)vwhen Nv> T.Thus, adjacent thresholds need to be chosen to be close enough tosatisfy this requirement. The total number of updates sent from re-mote sites to the coordinator corresponds to the number of thresh-old boundaries crossed at the remote sites.  This means we wantto set the local thresholds as far apart as possible to minimize thecommunication overhead.Algorithms that track the -de cient thresholded count of anitem need to balance the error requirement with minimal communi-cation overhead. We consider two fundamental categories for set-ting threshold: static thresholding, and adaptive thresholding.  Instatic thresholding methods, each remote site is assigned a prede-termined set of thresholds that do not change over the entire courseof tracking the count. It simply tracks between which pair of thresh-olds its count currently lies, and informs the coordinator when thischanges.  In the adaptive case, when old thresholds are violated,new thresholds at the remote sites are chosen by the central siteaccording to the observed conditions, to dynamically reduce thecommunication overhead.While the adaptive thresholding methods can be expected to per-form better than the static methods, the static methods are desir-able when the capabilities of the remote sites and the coordinatorare limited.  The adaptive thresholding places additional process-ing overhead and additional functional requirements on the remotesites and the coordinator. The coordinator needs to recompute newthresholds and export them to the remote sites, in addition to pro-cessing updates from the remote sites to maintain the count.  Incertain cases, like sensor networks or high speed routers, this ad-ditional processing overhead may be too expensive to accommo-date. A further practical issue with using adaptive thresholding isthat the system has to be more resilient to network delays. Speci -cally, the coordinator may need to collect current values from sites,and send out many new thresholds, which incurs appreciable delaywhere the current counts may be outdated. The static thresholdingscheme does not have this problem because the communication isperformed from the remote site to the coordinator only. Thus thechoice of adaptive or static thresholds will depend not only on theirrelative cost (which we analyze in detail in subsequent sections),but also on the underlying network properties and performance.4.   STATIC THRESHOLDSWe now describe the static thresholding scheme to maintain the -de cient thresholded counts. In these schemes, the threshold val-ues in the remote sites are predetermined and do not change overthe period of tracking.  We present three such threshold assign-ment regimes to determine the local threshold values at the remotesites and discuss their complexity in terms of communication over-head. In this work we consider all the remote sites to be symmetricand hence use the same set of static threshold values. Our focus ison determining the local threshold values in the remote sites for agiven value of and T. The static threshold assignment problemcan be formally stated as:DEFINITION  2. Given m remote sites, a global threshold T, anerror guarantee , and f(i) (the current threshold level at site i),determine threshold values, tj, j = [0, ), such that the followingconstraints are satis ed" j e 0 : tj+1> tj, and t0= 0" f     Nm:Pmi=1tf(i)+1 tf(i)d   Pmi=1tf(i), whenPmi=1tf(i)+1e TThe rst constraint ensures that the threshold values are increas-ing.  The second constraint captures the error requirement of thethresholded count problem. The maximum error in the ith remotesite when f(i) is the threshold in force at site i is tf(i)+1 tf(i).Thus, the second constraint states that the total error in the count atthe coordinator must satisfy the thresholded error guarantee for allpossible threshold values at the remote sites.4.1   Uniform Threshold AssignmentThe simplest solution is to keep the maximum global error levelat T at all times even when the global count, N, is much greaterthan T.  This can be accomplished by setting the threshold levelsin each monitor as tj=j Tm.  When N  e T, we have the totalerrorPmi=1tf(i)+1 tf(i)d T  d N, thus satisfying the -de cient thresholded count constraints.  If the global count is N,the maximum number of updates sent to the coordinator is givenby mN T.  This simplicity comes at a price.  The method workswell for counts that are small (below T or only above T by a smallamount), since the threshold gaps are relatively large.  But as Nincreases above T, the cost scales linearly with N, as the overlytight error guarantee is maintained. In summary,LEMMA  1. The total number of messages from all remote sitesto the coordinator with uniform threshold assignments is O(mN T).Untitled Document4.2   Proportional Threshold AssignmentA more scalable solution is to assign threshold values propor-tional to the local count at the remote site. The thresholds at the re-mote sites are assigned as tj= (1 + )tj 1and t0= 0, t1= 1. Ifthe threshold value reported by site i to the coordinator is tf(i), themaximum possible error from the site is tf(i)+1 tf(i)= tf(i).The maximum error at the coordinator is:mXi=1tf(i)+1 tf(i)=mXi=1 tf(i)d mXi=1Ni= Nwhere N is the global count.  This assignment satis es the errorrequirement even when the global count is less than the thresholdT.LEMMA  2. The total number of messages from all remote sitesto the coordinator with proportional threshold assignments is O(m logNm).PROOF.  If tf(i)d Nif(i)+1, the number of updates fromremote site i is given by f(i). Since tf(i)= (1 + )f(i) 1we getf(i) = 1 +log(tf(i))log(1 + )d 1 +log(Ni)log(1 + )The total number of messages is bounded bymXi=1f(i) d m +mXi=1log(Ni)log(1 + )d m + mlog`Nm log(1 + )We use the fact thatPmi=1Ni= N,Pmi=1log Ni= log`Qmi=1Ni ,andQmi=1Niis maximized when i : Ni=Nm. Since for log 1(1 + ) = O(1 ), the stated bound follows.This method of assignment performs well when N   T. Therelative cost of the uniform assignment to the proportional assign-ment is O(m/ log(N/m))/O(Nm/( T)) = O(T/N log(N/m)).When T is greater than N, the uniform spread assignment performsbetter, but as N increases above T, the proportional assignment re-quires fewer communications.4.3   Blended Threshold AssignmentThe main idea of blended threshold assignment is to exploit thebest features of the previous two assignments and provide a mech-anism to tune the performance for different values of N.DEFINITION  3. The blended assignment sets the local thresh-old values as" tj= (1+ )tj 1+(1 ) Tm, for a parameter 0 d d 1" t0= 0, and when = 1, t1= 1Note that = 0 corresponds to the uniform assignment while = 1 corresponds to the proportional assignment.  Varying thevalue of helps in tuning the threshold values to combine uniformand proportional thresholds.THEOREM  4. The blended threshold assignment satis es the -de cient thresholded error guarantee for all values of [0, 1].PROOF.  Using the blended threshold assignment, the maximumerror in the ith remote site istf(i)+1 tf(i)= tf(i)+ (1 ) TmThus, the total error in the global count is given byPmi=1tf(i)+1 tf(i)=    (Pmi=1tf(i)) + (1 ) Td    N + (1 ) Td    N, when N > TLEMMA  3. The total number of messages from all remote sitesto the coordinator with blended threshold assignments and 0   m log(1 + (NT 1)))PROOF.  The threshold values using the blended assignment for (0, 1) can be written astj= (1 + )j 1 (1 ) TmThus, the number of updates from remote site i when the thresholdvalue exceeded is f(i) isf(i) =log(1 + tf(i) m(1 )T)log(1 + )dlog(1 + Ni m(1 )T)log(1 + ), since tf(i)d Ni=log(1 + hi) log(1 )log(1 + ), where hi=NimT 1Note that givenPmi=1hi=NmT m, the expressionQmi=1(1 + hi)is maximized when i : hi= h =NT 1. The total number ofupdates from all remote sites ismXi=1f(i) =log(Qmi=1(1 + hi)) m log(1 )log(1 + )(1)dmlog(1 + h) log(1 )log(1 + )(2)Upper bounding this expression gives the stated worst case bound.Determining the optimum value of . For small values of N T, = 0 gives us the best possible assignment and for large valuesof N T, = 1 gives us the best assignment. For intermediatevalues of N, the best value of can be determined by minimizingthe number of updates.Note that the communication cost in Lemma 3 is dependent onthe global count, N.  Hence, the optimal value of depends onN. We advocate two approaches to determining the best value of .  The rst approach is to track the global count and determinean expected value of N, Neafter a long period of observation,and use this value to determine the optimal value of . This can beexpected to result in good performance if the actual value of N doesnot vary a lot from the estimate Ne. A more sophisticated approachis to track the distribution of N over a large set of observations anddetermine the value of that minimizes the expected number ofupdate messages over this distribution.THEOREM  5. The total number of updates (from Eqn 2), KN=mlog(1 + h) log(1 )log(1 + )is a convex function in in the range (0, 1) for small values of .The proof of this theorem is presented in Appendix A.THEOREM  6. Given an expected value of N or a discrete prob-ability distribution of N, we can nd a value of that minimizesthe number of messages with blended threshold assignments.PROOF.  First, observe that if p(N) is the probability densityfunction of N, then the expected maximum number of updatesgiven by K =P N=1p(N)KNis a convex function in in therange      (0, 1).  Since K is a convex combination of convexfunctions KN, K is itself convex.Untitled DocumentSince K and KNare convex functions in in the range (0, 1),there exists a single minimum for K and KNthat can be searchedby using techniques like gradient descent.  The descent algorithmcan be used to determine the optimal values of for both the pro-posed approaches. In the rst approach where we are given the ex-pected value Ne, we determine the optimal value of by minimiz-ing KNe. In the second approach where we are given the distribu-tion of N, we can use the descent method to determine the optimalvalue of by minimizing the function K as de ned above.5.   ADAPTIVE THRESHOLDSUnlike the static thresholding scheme, in the adaptive threshold-ing scheme the thresholds in the monitoring nodes are adaptivelyset by the coordinator every time there is a threshold violation in anode. In other words, the coordinator not only receives the thresh-old violations from the monitoring nodes, but also reacts to themby sending new thresholds back. This gives the coordinator morepower to set thresholds based on more information about how thedistributions at each site are evolving, and hence try to reduce thenumber of threshold violations. In a general scenario, the coordi-nator may wait for multiple violations before resetting thresholds,and may reset thresholds for arbitrary subsets of the nodes basedon a complete history of past violations. In this work, we react toeach threshold violation, and consider only recent history.5.1   Adaptive Threshold Assignment ProblemIn the adaptive thresholding scheme, two levels of thresholds,lower and higher thresholds, are maintained at every node at alltimes.  The lower threshold at node i is denoted by tiL, and thehigher threshold by tiH, so that at all times tiLd NiiH. Ifthese thresholds are violated, i.e. if this condition is no longer true,then the site i contacts the coordinator site with its current countNi, and it resets its lower threshold tiL= Ni. The coordinator es-timates the count as the sum of the reported counts from the remotesites, N  =Pmi=1tiL.  The coordinator then updates the tiHfornode i, and possibly those of other nodes, to ensure that its countstill meets the -de cient requirement. To minimize the communi-cation in the system, the coordinator needs to set the upper thresh-olds to as high a value as possible. Note that the maximum errorcontributed by site i is tiH tiL.The problem of setting the upper thresholds of the remote sitesby the coordinator can be formally stated as follows.DEFINITION  7. Given m remote monitoring nodes, a globalthreshold T, an error guarantee , and a threshold violation fromnode j, our objective is to determine the higher threshold values,tiH, in all m monitoring nodes such that the number of messages inthe monitoring system is kept as low as possible, and the followingconstraints are satis ed:" 1 d i d m, tiH> tiL"Pmi=1tiH tiLd Pmi=1tiL, whenPmi=1tiHe TSimilar to the static thresholding scheme, the rst constraint en-sures that the higher thresholds are greater than the lower thresholdsin all the nodes and the second constraint ensures that the total er-ror in the count at the coordinator must satisfy the thresholded errorguarantee.In the static threshold method, the remote sites do not know ifthe current global count is greater than T or lesser at any time.Hence, the thresholds need to be set to handle both these cases.A key advantage of the adaptive algorithm is that when the globalcount is less than the threshold, the coordinator can afford to setBASICADAPT( , T, m)1: tiL 0; tiH Tm; N 02: loop {receive update (i,Ni); }3:    if (( N N +Ni tiLe (1 )T)) then4:poll all sites j for Nj; tjL Nj; send tjH (1+ )tjL;5:    tiL Ni; N Pmj=1Nj;6:    if ( N 7:for all j send tjH tjLT Nto j;8:    else9:send tiH tiL(1 + ) to iFigure 2: Basic Adaptive Thresholding Algorithmhigher thresholds at the remote sites than in the static algorithm.To illustrate this, de ne the slack in the system as the differencebetween the threshold and the current estimate of the global count,S  =  T N.  The coordinator can now split this slack amongthe remote sites in any manner and still be able to satisfy the -de cient error requirement.  Assume that the slack is split amongthe remote sites as i, i  =  1, . .. , m, such thatPmi=1 id  S.Thus, tiH= tiL+ i. If the counts at all the remote sites are lessthan their respective upper thresholds, then the global count mustbe lesser than the global threshold because N Pmi=1tiHd T.If at any point the global count exceeds the threshold, at least oneof the thresholds in the remote sites will be exceeded. This allowsthe coordinator to determine when the count exceeds the thresholdand switch to the case when N e T and track the count closely tosatisfy the -de cient error requirement.5.2   Basic Adaptive AlgorithmWhen the total count estimated at the central site, N, is less thanT, a naive approach is to split the slack equally among all the nodes;instead, we propose to split the difference proportional to the cur-rent count in the nodes, since nodes that have larger counts thanothers are likely to grow larger.  We set the new tiH=  tiL+(T Pmj=1tjL)tiLPmj=1tiL= tiLT N.If N e (1 )T, we set tiH tiL= tiL, so that the maximumerror in each node is tiL. This approach is similar to the propor-tional spread threshold assignment algorithm for static thresholdingproblem presented in Section 4.2. The adaptive thresholding algo-rithm is presented in Figure 2.  Line 7 performs the proportionalsplit when the counts are small, and line 9 performs the propor-tional growth when the counts are large. Lines 3 and 4 handle thecase when we switch from having N N e (1 )T.LEMMA  4. The adaptive thresholding assignment algorithm pre-sented in Figure 2 satis es the -de cient thresholded count con-straints.PROOF.  When N =Pmi=1tiLPmi=1NiPmi=1tiH=T NPmi=1tiL= T, so we know that thetotal count is less than the threshold T. When N e (1 )T, weknow that the total count exceeds (1 )T and the algorithm issimilar to the proportional spread threshold assignment algorithmfor the static thresholding scheme. In this case, tiH tiL= tiLand soPmi=1(tiH tiL) = N d N, as required.Although this algorithm is simple and intuitive, it has some draw-backs: The rst time there is a threshold violation from some re-mote site i, the tiHvalue at the node is set to T while the value atUntitled DocumentMODIFIEDADAPT( , T, m)1: tiL 0; tiH Tm; R ; N = 0;2: loop {receive update (i,Ni); }3:    if ( N N tiL+ Nie (1 )T) then4:poll all sites j for Nj; tjL Nj; send tjH (1+ )tjL;5:    if (R = ) then6:for j = 1 to m do7:Poll site j for Nj; tjL Nj; sj max{tjL, Tm};8:if Nj Tmthen send tjH Tmto j9:    tiL Ni; N PjtjL; si tiL; R R * {i};10:    if ( N 11:S Pmr=1sr; tmin minj{tjLPr RtrL(T S)};12:for all j E do13:if tmin Tmthen send tjH Tmto j;14:else send tjH tjL+tjLPr StrL(T S)} to j15:    else send tiH (1 + )tiLto i;Figure 3: Modi ed Adaptive Thresholding Algorithmall other nodes will be set to 0, since Nj= 0 at the coordinatorsite initially.  This could unnecessarily trigger a lot of communi-cations especially when several nodes have non-zero counts. Sec-ondly, when the estimated aggregate count at the central node isclose to T the new threshold will be very close to the old thresh-old, thus triggering a lot of threshold violations. In the followingsection we present a modi ed algorithm that addresses the aboveshortcomings.5.3   Modi ed Adaptive AlgorithmIn order to avoid the problems in the original algorithm for adap-tive thresholds we modify the original algorithm, which is illus-trated in Figure 3.  There are two main differences between theoriginal and modi ed algorithms: (a) As soon as the central nodereceives the rst threshold violation the tiHvalues in all the nodeswhose counts Niare below Tmare initialized to Tm, and (b) whenthe difference between global threshold and the estimated aggre-gate count is small (i.e., below Tm), instead of using the adaptivestrategy of distributing the difference to all the nodes, we maintaina constant difference between the upper and lower thresholds, i.e.,tiH tiL= Tm. In our algorithm in Figure 3, we maintain a setR of nodes whose count exceeds Tm. Lines 5 9 deal with the rstthreshold violation, by polling all nodes to initialize S, and settingupper bounds for the nodes not in R.  If the total count is suf -ciently below T, lines 10-14 allocate the slack in proportion to thecounts; however, we ensure that the difference between higher andlower thresholds is at least Tm, using extra variables si, to ensurethat the total amount of slack allocated stays within the permittedbounds. Lines 3 4 deal with the case when the count rst exceedsT, and from that point on, we switch to proportionally increasingcounts (line 15) as before.LEMMA  5. The modi ed adaptive thresholding assignment al-gorithm presented in Figure 3 satis es the -de cient thresholdedcount constraints.PROOF.  Consider the case when N  site i does not belong to R (i / R), tiH= Tm. In line 11, the restof the available slack, T S, is proportionally divided to the restof the sites R. tmindenotes the minimum of the slack values. Iftmin Tm, then all sites R are allocated a slack of Tmin line 13of the algorithm. Hence N Pmi=1tiH= N +Pmi=1 TmIf tmin> Tm, then the slacks are proportionally allocated to thesites.  Hence N  Pmi=1tiH= T, because the algorithm allo-cates the slack in the system to the sites. Thus, if N N N e (1 )T, the total count exceeds (1 )Tand the algorithm follows the proportional spread threshold assign-ment in the static scheme, and the proof is the same as the previousLemma. Thus, the modi ed algorithm satis es the -de cient errorconstraints.THEOREM  8. The total number of messages from all remotesites to the coordinator using the modi ed adaptive algorithm isO(m (m+log(N Tm)) when N > T, and O(m2N T) when N d T.PROOF.  We split our analysis into two parts, rst when the totalcount is less than T, and the second when it exceeds T.  In the rst part, the algorithm ensures that the slack in each threshold,i.e.  tiH tiLis always at least Tm.  Thus, there can be at mostmN Tthreshold violations before the count reaches T, simplifyingtom when N rst exceeds T. Each threshold violation causes atmost O(m) messages to be sent, to inform the sites of their newhigh thresholds tiH.  When the count is above T, the algorithmmimics the proportional threshold assignment case in Section 4.2,and adapting Lemma 2, the number of messages between remotessites and the central site to go from T to N is O(m logN Tm). Theresult follows by summing these two bounds.Note that one can easily force &(m2+m logN Tm) messages by rst making one site have countTm, then settingm2counts Ni= Tm(to set up the adaptive thresholds).  Then for each of the samem2sites in turn, set their local count to Ni=Tm: each of these settingscauses (m) messages, overm2sites gives the &(m2) bound. Us-ing the remainingm2sites (currently with zero local count each),one can then elicit them2+m2log2(N T )mlog(1+ )=  &(m log(N Tm))cost from the proportional threshold settings. However, in general,we expect to do much better than this worst case bound, since theanalysis is somewhat pessimistic.6.   EXTENSIONSNegative Updates. Thus far we have assumed that all updates re-ceived at remote sites are non-negative.  However, a simple ob-servation is that our static protocols remain correct when negativeupdates are permitted. Instead of checking for thresholds being ex-ceeded, we must check that the upper threshold remains an upperbound, and also that the lower threshold remains a lower bound.Similarly, our adaptive protocols can also handle negative updateswith minor modi cations.  The analysis in previous sections thatrelates the cost of the protocol to the value of the global count nolonger applies:  positive and negative updates can cause a lot ofcommunication but leave the global count quite low, thus the com-munication bound cannot still hold.  Indeed, if the updates causecounts to repeatedly cross the same threshold boundaries (in thestatic case), then the best bound we can state is one that is linear inthe number of updates.Sliding Windows.  Being able to handle negative updates meansthat we can apply our methods to other models of computing counts.Typically, we do not want to monitor counts which increase inde -nitely. Indeed, in several of the queries outlined in the Introduction,time windows were implicitly given in the form of within an hour or in a day . There are several models for dealing with such time-windowed queries:  (a) Periodic reset.  After the time period hasUntitled Documentelapsed, reset all counts to zero, and restart the protocol. (b) Slid-ing window.  Ensure that the current count covers exactly the lasthour, for example, by keeping track of past updates, and applyingupdates older than one hour as negative updates.  In the case thatthere is insuf cient storage to retain this many updates, then ap-proximate information can be kept, as explained below. (c) Over-lapping window. A compromise between periodic reset and slidingwindow is to apply the overlapping window approach: for exam-ple, the window consists of one hour s data, and the start of thewindow is advanced by ve minute intervals every ve minutes (sothe window contains between 1 hour and 1 hour and ve minutesof updates). Now we just have to record the sum of updates in each ve minutes and apply these as a single negative update when thestart of the window is advanced.Approximate Counts.  So far we have assumed that there is suf- cient storage capacity at the remote nodes to store all local countvalues.  But in the case when there are very many updates of dif-ferent values (for example, tracking network activity), we cannotmake this assumption. We may use the same (static) thresholds and value for all counts to reduce space usage, but still there may betoo many counts to store.  The natural solution is to adopt an ap-proximate way of storing the counts, such as Lossy Counting [18]or Count-Min sketch [9]. However, using such approximate struc-tures mean that the guarantees that we can give are much weaker:instead of the -de cient guarantee, we must now give a guaranteerelative to Nv+ PvNv, since the approximate counting meth-ods return counts of each item with error PvNv. Although wecan reduce (at the cost of more space), in general it is not possi-ble to set a non-zero value of that gives a -de cient guarantee.Hence, the result appear more in line with those that follow forHeavy-hitter style problems [8].Time-dependent Thresholds. Prior work has built models of howdata varies with time in order to reduce the communication costfurther [8, 7]. We could apply a similar approach to our methods:the result is time-dependent local thresholds.  Now we would setour thresholds so that they can increase or decrease as time passes,so that we ensure that the total uncertainty still remains within thesame bounds. The idea is that the varying thresholds predict wherethe true count will lie at time t; if this prediction is correct, then nocommunication cost is incurred. If any (now time dependent) localthreshold is broken, then communication is triggered with the coor-dinator, and the model can be recalibrated with the recent history.We hope to investigate such extensions in future work.7.   EXPERIMENTAL STUDYIn this section we present the experimental evaluation of ourstatic and adaptive algorithms. We also compare the performanceof these algorithms with a technique proposed by Olston et al in [21](referred to as the OJW algorithm in the rest of this work) wherethe authors try to minimize communication overhead while main-taining a certain accuracy for continuous queries over a distributeddata stream.We begin by presenting our experimental setup in Section 7.1.We then experimentally show that our algorithms always satisfythe problem requirements. We also validate our blended approachfor static thresholds by comparing the theoretical results from themodel with the results from our experiments.  Then we presentsome observations that provide more insights about the usefulnessof our algorithms.7.1   Experimental SetupTo gain a better understanding of our theoretical analysis, webuilt a simulator with m monitoring nodes and one central node.Data Sets. Although the de nition of thresholded counts problemis applicable in a variety of different scenarios (as pointed out inSection 1), our focus in the experiments is on a distributed networkmonitoring system. In this scenario, every node monitors traf c ona link for all the registered events and increments the count for allthe events that are observed. We de ne an event as the occurrenceof a combination of destination IP address and the destination portnumber in a packet seen by a monitoring node.We use publicly available link traces from NLANR [20] as inputto our distributed monitoring system. These traces are for a singleingress link, and we transform this data for our distributed systemby assigning a probability distribution for distributing packets ran-domly to the various monitors. By using different probability dis-tributions, we can simulate various scenarios that can occur in realnetworks. For example, a skewed probability distribution functionrepresents a scenario where a few nodes (that are monitoring largeinter-domain peering links) receive large number of events whileothers do not.  Similarly, a uniform distribution represents a sce-nario where events are equally likely to occur in any of the mon-itoring nodes.  Although we track all the events that occur in thelink traces from NLANR, for the ease of illustration, we presentthe results for tracking one event whose overall count was 960000.Implementation Issues. We implemented the static and adaptivealgorithms described earlier in Sections 4 and 5. Since the OJW al-gorithm was not proposed to address the thresholded counts prob-lem, we need to set certain parameters of the algorithm in [21] toapply it to our problem. The main issues are:" The OJW algorithm assumes that a single node can monitorall the updates for a given object/event and a single query caninclude multiple objects. Since we are interested in trackingthe same objects/events in multiple monitors, we treat eachitem in each site as a separate object/event that is the subjectof a single query." In the thresholded counts problem de nition, the error valuesare relative, i.e., the maximum error allowed for an event inthe system depends on its current count. The original OJWalgorithm uses absolute errors (i.e., the total error in the sys-tem is required to be below a certain constant value), so toapply to our problem, we set the maximum allowed error foreach count to be xed at T, divided evenly between all siteswhere it can occur.These parameter settings of the OJW algorithm are our best ef-fort to make the algorithm apply to our -de cient thresholdedcount problem.  They ensure that the algorithm generates resultsthat are correct according to our problem de nition (and the algo-rithm falls into our class of adaptive algorithms); however, we willsee that the cost is much higher than our algorithms that were de-signed for this problem.7.2   Performance AccuracyCount Accuracy. In Figure 4(a) we examine the total error in thedistributed monitoring system as packets arrive at various monitor-ing nodes while using the blended static threshold assignment. Weset the values of T, , and m to be 10000, 5%, and 20 respectively.When the count of the event is less than T the error in the systemcan be as high as 50%, but after the count exceeds the value of Tthe error is always less than the value speci ed by (indicated bythe heavy line on the gure). Different parameter settings yieldedsimilar results.Untitled Document00.511.522.5x 1040 1020304050 =0.0 =0.3 =0.7 =1.0T = 10000 = 0.05 m =20 E rror = 5% (a)NVsErrorpercentageinthestaticcase   !  !  ! "  "  " (b)NVsErrorpercentageintheadaptivecaseFigure4:Testingaccuracyforstaticandadaptivecases     Figure5:Comparingtheoptimaltheoretical valueswiththeresultsobtainedfromsimulation.Thex-axisrepresentsdiffer-entcombinationsofT, ,andN.Eachcombinationisreferredtoasa parametersetting andrepresentsapointonthex-axis.Theresultsforthesameexperimentperformedwiththeadaptivealgorithms(butvaryingvaluesofT)areshowninFigure4(b).Thedistinctiveshapeofthecurveforthemodi edadaptivealgorithmisexplainedbythedifferentpartsofthealgorithm:theinitialhigherrorisduetoallocatingtiH=Tmintheinitialphaseoftheal-gorithm.Theerrordropstozerowhenthecentralnodepollsallthemonitoringnodesandhencehasaccuratecountinformation.Theerrorgraduallyincreaseswhennodesareallocatedadaptivethresholds,whichallowsthetotalerrortogrow(withintheallowedbounds),untilthecountreaches(1 )T.Finally,thealgorithmswitchestoproportionallygrowingthresholds,whichkeepthefrac-tionalerrorwithinthenecessarybounds.Thusweobservethatthetotalerrorinthesystemislessthanthevaluespeci edby afterthetotalcountexceedsT.Meanwhile,thebasicadaptivealgorithmhasconsistentlyhighererrorforNnicationcost.Setting fortheStaticAlgorithm.Tovalidateourtheoreticalresults,wecomparetheoptimalvalueof obtainedfromourthe-oreticalmodelusingagradientdescentapproachtotheidealvalueof obtainedfromexperiments.Forthisexperiment,weuseauniformdistributiontosendagivenpacketfromtheinput letothemonitoringnodes.Thisisbecausethestaticthresholdassign-mentalgorithmshaveaworstcasewhenthepacketsareuniformlydistributedacrosstheremotesites.Werepeattheexperiment100timestoensurethattheoutcomeisnotbiasedbyoutliers(whilegeneratinguniformdistribution),andtheexperimentalresultspre-sentedherearetheaveragevaluefromallthe100runs.InourexperimentsweconsiderseveraldifferentvaluesforTand .TherangeofvaluesforTwas[100,100000]andtherangeofvalues was[0.01,0.1].AsweshowedinSection4.3,thetotalnumberofmessagesinamonitoringsystemusingblendedstaticthresholdingapproachalsodependsonNi,thecountoftheeventinthemonitori.Henceinourexperimentswealsovarytheoverallcountoftheeventthatwetrackintherange[2500,960000].ForeachcombinationofvaluesforT, ,andN(referredtoasaparam-etersetting),wevarythevalueof from0to1withincrementsof0.001.Foreveryparametersettingwecomputethevalueof thatresultedintheminimumnumberofcommunications,bothusingsimulationsandthetheoreticalmodel.Thecomparisonoftheidealvaluesof fromthesimulationsandtheoreticalmodelisshowninthetopofFigure5.Althoughthetheoreticalresultscloselymatchtheexperimentalvaluesinmostofthecases,thereareafewcaseswherethedifferencebetweenthetwoissigni cant.However,thesehaveminimalimpactontheoverallcost,asshowninthelowerhalfofthe gure.Thediscrep-anciesaremainlyduetothefactthatweuseintegervaluesinoursimulatorwhileourtheoreticalmodelignoresthisconditionandconsidersthresholdstoberealvalues.Thedifferencebetweentheexperimentalandtheoreticalresultsaresigni cantonlywhenthevaluesofbothTand aresmall(i.e.,whenthesystemrequireshighaccuracy).Weseethatinmostcases,thecostusingthetheo-reticallypredictedalphaisasgoodasorbetterthanthevaluefoundbysimulations,andinonlyafewcasesisthereaslightbene tfortheempiricallyfoundvalue.7.3 CommunicationCostUniformlyDistributedEvents.Figure6(a)examinestheimpactUntitled Document00.20.40.60.810500010000 00.20.40.60.810200040006000T=30000T=50000T=100000T=500000T=30000T=50000T=100000T=500000 = 0.05 = 0.1 (a)Communicationcostas variesinstaticcase0246810x 10401000200030004000T02468104102104106108TB asic A daptive A lg; = 0.05B asic A daptive A lg; = 0.1M odified A daptive A lg; = 0.05M odified A daptive A lg; = 0.1M odified A daptive A lg; = 0.05M odified A daptive A lg; = 0.1O JW A lg; = 0.05O JW A lg; = 0.1(b)CommunicationcaseoftheadaptivealgorithmsFigure6:CommunicationCostofStaticandAdaptiveAlgorithms00.20.40.60.81010002000300040005000600070008000 T=50000 T=100000 T=500000 = 0.05 Figure7:Variationincommunicationcostwithvarying inthestaticmodelover500repetitionswithrandomincomingpacketdistribution.of onthenumberofmessagesexchangedinamonitoringsystemusingstaticthresholds.Thetotalcount,N,oftheeventthatistrackedis960000.ForasmallvalueofT(i.e.,ahighvalueoftheratioNT),asthevalueof increases,thenumberofmessagesexchangedinthesystemdecreases.Inotherwords,theoptimalvalueof iscloseto1.ForalargervalueofT(i.e.,asmallvalueofNT),theoptimalvalueof iscloserto0.Inessence,astheratioNTdecreases,theoptimalvalueof movestowards0.However,thereisabroadrangeofsettingsof whichachievesimilarlylowcosts,showingthatanapproximatevalueof willoftensuf ce.Lastly,inlinewithexpectations,decreasing increasestotalcost.InFigure6(b)wecomparetheperformanceofthebasicadap-tive,modi edadaptive,andtheOJWalgorithms.Thetopgraphinthe gurecomparesthetotalnumberofmessagesexchangedinthesystem(fromthemonitorstothecentralnodeandviceversa)usingbasicadaptiveandmodi edadaptivealgorithms.Themodi edal-gorithmoutperformedthebasicalgorithmbyanappreciablefactorinallourexperiments.Thebottomgraphinthe gurecomparesthe0501001500500100015002000250030003500400045005000S taticA daptiveFigure8:Comparingcostofadaptiveandstaticthresholdset-tings. EachcombinationofT, ,andNiisreferredtoasa parametersetting andrepresentsasinglepointonthex-axis.performanceofOJWalgorithmwithmodi edadaptivealgorithm.Notethatthey-axisinthisgraphisinlogscale.Thegraphshowsthatthemodi edalgorithmperformsatleasttwoordersofmag-nitudebetterthantheOJWalgorithmcon rmingthattheexistingtechniquesareinsuf cientforthisproblem.RandomlyDistributedEvents.Previousresultswerebasedonse-lectingwhichnodetoupdateuniformly.InFigure7weexploretheeffectofusingrandomdistributionstoupdatedifferentnodes.Ran-domdistributionsarecreatedbygeneratingrandomprobabilitiesassociatedwitheachofthemonitoringnodes.Theseprobabilitiesareusedtosendtheupdatesfromtheinputtracetothemonitoringnodes.Notethatadifferentrandomdistributionisgeneratedforeverysimulationrun.Werepeatthesimulation500timestoensurethatwecaptureavarietyofdifferentrandomdistributions.InFig-ure7weplottheaveragenumberofmessagesexchangedduetotheserandomdistributions.Theerrorbarsinthe gurerepresenttherangeofvaluesforthenumberofmessagesgeneratedbythe500runsofthesimulator.Wecanseethattheeffectofusingran-Untitled Document (a)NumberofmessagesVs. forstaticthresholdsusingonehour sdatafromalivenetwork.102103104105101102103104105106107 = 0.01 = 0.05 = 0.1(b)NumberofmessagesVs.Tforadaptivethresholdsusingonehour sdatafromalivenetwork.Figure9:Experimentsonrealnetworkdatadomdistributionsisrelativelysmall.Asbefore,theoptimalvalueof thatresultsintheminimumnumberofmessagesinthesystemdecreasesastheratioNTdecreases.Notethatthetotalnumberofmessagesinthebestcase(1000-2000)isapproximately0.1%ofthetotalnumberofupdates.Henceweobserveathousand-foldre-ductionincostcomparedtothecostofsendingeveryupdatetothecentralsite.ComparingCostsofStaticandAdaptiveAlgorithms.Figure8comparestheblendedstaticthresholdingalgorithmandthemod-i edadaptivealgorithmintermsofthenumberofmessagesinthemonitoringsystemfordifferentparametersettings.Forboththealgorithms,wevarythevaluesofT, ,andNintherange[100,100000],[0.01,0.1],and[2500,960000]respectively.Inthestaticalgorithm,weusedtheempiricallydeterminedoptimalvalueof forthegivenparametersetting.Wecanseethattheperfor-manceoftheadaptivealgorithmisalwaysslightlybetterthanthestaticalgorithm.However,whichmethodisbestwilldependonthescenarioinwhichtheyarebeingapplied:everymessageinthestaticalgorithmisonlyafewbyteslong(toindicatethecurrentthresholdbeingusedbythesite),whilethemessagesarelongerinadaptivealgorithmssincethecentralsitemustgivemoreinforma-tion(thetypeofmessage,thenewthresholdbeingsent,etc.).Inpowerconstrainedsensornetworks,theenergyconsumptionoftheadaptivealgorithmsmaythereforebehigher,whereasinmoretra-ditionalwirednetworks,thesizeofmessageheaderswillmakethedifferenceinsizeofthemessagesinsigni cant.ExperimentsonRealNetworkData.Theresultsfromourexper-imentspresenteduntilnowtrackedasingleevent.Inordertoex-ploreamorerealisticandpracticalscenario,weobtainedcompletenetworkpackettracesfromaresearchnetwork.Thenetworkcon-sistsofseveralroutersandweobtainedanonymizedtracesofallthepacketsthatenteredthenetworkateachoftheroutersforonehouronAug15,2005.4Ournetworkmonitoringsystemarchitectureconsistedofmonitoringnodes,onecollocatedwitheveryrouterinthenetwork.Weusedthetracesfromthecollocatedrouterastheinputtothemonitoringnode.Themonitoringnodestrackedallthe4Foranonymityandproprietaryreasons,wecannotgivefurtherdetailsoftheexperimentalsetup.incomingeventsforonehour,approximately8millionintotal.Figure9(a)showsthenumberofmessagesrequiredbythestaticmonitoringsystemtotrackalltheeventswith -accuracy.WecanseethatwhenthevalueofTissmall,ahighvalueof resultsinminimumcommunicationoverhead,andathighervaluesofT,thebestvalueof reduces.Figure9(b)showsthenumberofmes-sagesinthemonitoringsystemusingadaptivethresholds.Com-paringFigures9(a)and9(b)wecanseethat,atlargevaluesofT,theadaptivealgorithmperformssigni cantlybetterthanthestaticalgorithm.SinceFigure9(b)isplottedonalog-logscale,itshowsanapproximatelylinearrelationbetweenthelogarithmofthenum-berofmessagesandlogT,implyinganinversepolynomialdepen-dencyonT.Thisagreeswithouranalysisoftheadaptivealgo-rithm,suggestingthatthebulkofthecostisduetoitemswhosecountNvtionaltoNv T,whichagreeswiththeobservedbehavior.8. CONCLUSIONSANDFUTUREWORKWehavegivenanumberofalgorithmstoef cientlymonitordistributedsetsofcountsinacontinuousfashion,afundamentalproblemattheheartofmanynetworkandsensormonitoringprob-lems.Inourexperimentalevaluation,weobservedthatouradaptivealgorithmstypicallyoutperformthosebasedonmaintainingstaticthresholds.However,theadaptivealgorithmsmaybemoreexpen-siveintermsofresourcesrequiredtorunandcomputationalpoweroftheparticipants.Severalproblemsremainopentostudyinfuturework: rstly,tocomparethecostofdeployingthethesealgorithmsinrealnet-workscenariossuchasaparticularsensornetworkenvironmentorIPnetworkmonitoringsetting(sofarwehavefocusedontheper-vasivecostissuesratherthanconsideringanyparticularsituation).Inthesesituations,wecanconsiderothernetworktopologies,suchasmorehierarchicalapproachestoavoidoverwhelmingthecentralcoordinator.Itwillbeofinteresttoextendourapproachtootherquerytypeswithasimilarthresholdednature,suchasarithmeticcombinationsofthresholds( reportx+ywhenx+y>T or reportx ywhenx y>T )orapplyacrosssites( reportthenumberofsitesobservingeventEwhenEisobservedbymorethannsites ).Untitled Document9. REFERENCES[1]D.Abadi,D.Carney,U.C etintemel,M.Cherniack,C.Convey,C.Erwin,E.Galvez,M.Hatoun,A.Maskey,A.Rasin,A.Singer,M.Stonebraker,N.Tatbul,Y.Xing,R.Yan,andS.Zdonik.Aurora:adatastreammanagementsystem.InProceedingsofACMSIGMOD,page666,2003.[2]A.Arasu,B.Babcock,S.Babu,M.Datar,K.Ito,I.Nishizawa,J.Rosenstein,andJ;Widom.STREAM:theStanfordStreamDataManager(demonstrationdescription).InProceedingsofACMSIGMOD,pages665 665,2003.[3]B.BabcockandC.Olston.Distributedtop-kmonitoring.InProceedingsofACMSIGMOD,2003.[4]S.Chandrasekaran,O.Cooper,A.Deshpande,M.J.Franklin,J.M.Hellerstein,W.Hong,S.Krishnamurthy,S.R.Madden,F.Reiss,andM.A.Shah.TelegraphCQ:continuousdata owprocessing.InProceedingsofACMSIGMOD,page668,2003.[5]M.Cherniack,H.Balakrishnan,M.Balazinska,D.Carney,U.Cetintemel,Y.Xing,andS.Zdonik.Scalabledistributedstreamprocessing.InProccedingsofConferenceonInnovativeDataSystemsResearch,2003.[6]J.Considine,F.Li,G.Kollios,andJ.Byers.Approximateaggregationtechniquesforsensordatabases.InIEEEICDE,2004.[7]G.CormodeandM.Garofalakis.Sketchingstreamsthroughthenet:Distributedapproximatequerytracking.InProceedingsofVLDB,2005.[8]G.Cormode,M.Garofalakis,S.Muthukrishnan,andR.Rastogi.Holisticaggregatesinanetworkedworld:Distributedtrackingofapproximatequantiles.InProceedingsofACMSIGMOD,2005.[9]G.CormodeandS.Muthukrishnan.Animproveddatastreamsummary:Thecount-minsketchanditsapplications.JournalofAlgorithms55(1),pages58 75,2005.[10]G.Cormode,S.Muthukrishnan,andW.Zhuang.What sdifferent:Distributed,continuousmonitoringofduplicateresilientaggregatesondatastreams.InIEEEICDE,2006.[11]C.Cranor,T.Johnson,O.Spatscheck,andV.Shkapenyuk.Gigascope:Astreamdatabasefornetworkapplications.InProceedingsofACMSIGMOD,pages647 651,2003.[12]A.Das,S.Ganguly,M.Garofalakis,andR.Rastogi.Distributedset-expressioncardinalityestimation.InProceedingsofVLDB,2004.[13]C.EstanandG.Varghese.Newdirectionsintraf cmeasurementandaccounting.InProceedingsofACMSIGCOMM,volume32,4ofComputerCommunicationReview,pages323 338,2002.[14]Ahmadet.al.Distributedoperationintheborealisstreamprocessingengine.InProceedingsofACMSIGMOD,2005.[15]A.Jain,J.Hellerstein,S.Ratnasamy,andD.Wetherall.Awakeupcallforinternetmonitoringsystems:Thecasefordistributedtriggers.InProceedingsofHotnets,2004.[16]N.Jain,P.Yalagandula,M.Dahlin,andY.Zhang.INSIGHT:Adistributedmonitoringsystemfortrackingcontinuousqueries.InWork-in-progresssessionatACMSOSP,2005.[17]A.Manjhi,S.Nath,andP.Gibbons.Tributariesanddeltas:Ef cientandrobustaggregationinsensornetworkstreams.InProceedingsofACMSIGMOD,2005.[18]G.S.MankuandR.Motwani.Approximatefrequencycountsoverdatastreams.InProceedingsofVLDB,pages346 357,2002.[19]S.Nath,P.B.Gibbons,S.Seshan,andZ.R.Anderson.Synopsisdiffusionforrobustaggrgationinsensornetworks.InACMSenSys,2004.[20]NationalLaboratoryforAppliedNetworkResearch.http://www.nlanr.net/.[21]C.Olston,J.Jiang,andJ.Widom.Adaptive ltersforcontinuousqueriesoverdistributeddatastreams.InProceedingsofACMSIGMOD,2003.[22]Stanfordstreamdatamanager.http://www-db.stanford.edu/stream/sqr.[23]M.SullivanandA.Heybey.Asystemformanaginglargedatabasesofnetworktraf c.InProceedingsofUSENIX,1998.[24]S.Zdonik,M.Stonebraker,M.Cherniack,andU.Cetintemel.TheAuroraandMedusaprojects.BulletinoftheTechnicalCommitteeonDataEngineering,pages3 10,March2003.APPENDIXA. PROOFOFCONVEXITYOFKNTHEOREM5.KN=mlog(1+ h) log(1 )log(1+ )isacon-vexfunctionin intherange (0,1)forsmallvaluesof .PROOF.Weprovethatkisconvexin byshowingthatthesecondderivativeke0,wherek=KNm=log(1+ h) log(1 )log(1+ )=log(1+ h) log(1 ) Sinceh=NT 1andN,T>0,h ( 1, )orh+1>0.Also, , (0,1),andassumingthat isverysmall,weapproximateln(1+ )as .Differentiatingk,wegetthe rstandsecondderivatesas k=h1+h +11 k(3) k=1(1 )2 h2(1+h )2 2 k(4)=1(1 )2 h2(1+h )2 2 h1+h +11 k (5)UsingLemma6,ln(1+h ) ln(1 )=ln 1+h 1 e2 (h+1)2+h Notethat1+h 1 >1,sinceh+1, >0.Thus,ke1 2 (h+1)2+h =2 h+12+h SubstitutingthisisinEquation(5),weget ke(h+1)(1+2h h)(1 )2(1+h )2 2 h+1(1+h )(1 ) 2(h+1)2+h =(h+1)(1+2h h)2(1 )2(1+h )2(2+h )e0Sinceke0,kisconvexin .LEMMA6.ln(x)e2 x 1x+1 ,whenx>1PROOF.UsingTaylor sseries,wegetln(1x)= ln(x)=`1x 1 12`1x 1 2+13`1x 1 3 ...Soln(x) =`1 1x +12`1 1x 2+13`1 1x 3+...e`1 1x +12`1 1x 2+14`1 1x 3+...=`1 1x 1+1 1/x2+ 1 1/x2 2+... =`1 1x 11 1 1/x2 =2 1 1/x1+1/x = 2 x 1x+1

You must have an account to access this white paper. Please register below. If you already have an account, please login.

Already registered?

Login

Forgot password?

New customer?

White paper download

ComputerworldUK Webcast

ComputerworldUK
Share
x
Open
* *