CN102819664A

CN102819664A - Influence maximization parallel accelerating method based on graphic processing unit

Info

Publication number: CN102819664A
Application number: CN2012102487323A
Authority: CN
Inventors: 李姗姗; 廖湘科; 刘晓东; 吴庆波; 戴华东; 彭绍亮; 王蕾; 付松龄; 鲁晓佩; 郑思
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2012-07-18
Filing date: 2012-07-18
Publication date: 2012-12-12
Anticipated expiration: 2032-07-18
Also published as: CN102819664B

Abstract

The invention discloses a GPU (graphics processing unit) based parallel acceleration method for influence maximization. Its purpose is to use the parallel computing capability of the GPU to accelerate the algorithm and shorten its execution time. The method comprises the following steps. In each Monte Carlo simulation, first find the strongly connected components of the network graph and merge all nodes of each strongly connected component into a single node whose weight is the sum of the weights of the merged nodes. Then compute the influence value of every node in parallel with a bottom-up traversal strategy, in which the GPU computing cores run separate threads that compute the influence values of different nodes concurrently, and finally obtain the K most influential nodes. Because the graph is converted into a directed acyclic graph, the amount of influence computation is significantly reduced; at the same time, scheduling the per-node computations onto the GPU computing cores to the greatest possible extent shortens the overall running time.

Description

GPU-based parallel acceleration method for influence maximization

Technical field

The present invention relates to methods for solving the influence maximization problem of social networks in the field of massive data mining, and in particular to a GPU-based parallel acceleration method proposed for mining the massive user bases of large-scale social networks.

Background technology

The rapid development of Web 2.0 technology has driven the boom of social media. Social networking sites of all kinds keep emerging, and the user numbers of sites such as Facebook and Twitter abroad and Renren and Sina Weibo in China are growing very rapidly; Facebook currently has more than 850 million active users. Social networking sites are not only a bridge for communication between people but have also become an important medium for spreading and diffusing information. Research shows that 68% of customers ask family members and friends for advice before buying a product. Viral marketing exploits exactly this word-of-mouth effect between users to carry out brand promotion and other forms of network marketing. With the continued rapid growth of social network users, viral marketing has become a highly efficient way of spreading information.

Influence maximization is a classic problem concerning influence propagation in social network analysis. Consider the following scenario: a company wants to promote a new product. Its strategy is to let K selected customers try the product for free and then rely on these K customers to promote it, so that influence propagation attracts more customers to buy the product and the company's benefit is maximized. The influence maximization problem can be formalized as follows. For a social network graph G = (V, E, W), V = {v_0, v_1, ..., v_{n-1}} is the node set and n is the number of nodes in V; E is the set of directed edges between nodes in V, i.e. E ⊆ V × V, and m is the number of directed edges in E; W is the set of node weights in G, which characterizes the influence of each node (the initial value is 1, i.e. a node can influence only itself). Given the network graph G and the number K of nodes in the initial active set, the influence maximization problem is to select the K best nodes from V as the initial active node set S so that, after influence propagation, the final spread of influence is maximized.

The core of the problem is how to locate the K most influential members of the network, i.e. its opinion leaders, so that viral marketing maximizes the number of users who are eventually influenced. Research on influence maximization is of great practical significance not only for marketing but also for applications such as public opinion early warning and epidemic detection. Since Pedro Domingos and Matt Richardson proposed the influence maximization problem in the paper "Mining the network value of customers" published at ACM SIGKDD 2001, the problem has attracted more and more researchers. In the paper "Maximizing the Spread of Influence through a Social Network" published at ACM SIGKDD 2003, David Kempe et al. proved that influence maximization is NP-hard and proposed a hill-climbing greedy algorithm that obtains an approximately optimal solution. Although the hill-climbing greedy algorithm achieves an approximation ratio of 1-1/e (where e is the base of the natural logarithm), Kempe computes the influence value of each node with a large number of Monte Carlo simulations (for example 20000), so the algorithm is very time-consuming and cannot scale to large networks.
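The hill-climbing greedy algorithm with Monte Carlo estimation can be sketched as follows. This is a minimal illustration, not Kempe et al.'s code: it assumes the independent cascade model with a single uniform propagation probability p, and the function names (simulate_ic, estimate_spread, greedy_im) are hypothetical. The nested loops show where the cost comes from: every candidate node in every round needs `runs` fresh simulations.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run; returns how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly activated node gets one chance to activate each inactive out-neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, p, runs, rng):
    """Average spread over `runs` Monte Carlo simulations."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs

def greedy_im(graph, nodes, k, p=0.1, runs=1000, seed=0):
    """Hill-climbing greedy: k times, add the node whose addition gives the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            spread = estimate_spread(graph, chosen + [v], p, runs, rng)
            if spread > best_spread:
                best, best_spread = v, spread
        chosen.append(best)
    return chosen
```

For example, `greedy_im({0: [1, 2, 3], 3: [4]}, list(range(5)), k=1, p=1.0, runs=10)` picks node 0, which reaches every node when all edges fire.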

Many researchers have worked on new methods that improve the efficiency of influence maximization. The key bottleneck of the hill-climbing greedy algorithm is the repeated Monte Carlo simulation needed to compute the influence values of all nodes. To address it, Jure Leskovec et al., in the paper "Cost-effective Outbreak Detection in Networks" published at ACM SIGKDD 2007, exploited the submodularity of the influence spread function to design a new optimization method, CELF, which greatly reduces the number of Monte Carlo simulations and hence the computing time. Later, Wei Chen et al. published "Efficient Influence Maximization in Social Networks" at ACM SIGKDD 2009 and proposed MixGreedy, currently the best greedy algorithm. Its improvement is that each Monte Carlo simulation computes the influence values of all nodes of the network at once, which further reduces the complexity of the algorithm. MixGreedy also incorporates CELF, greatly reducing execution time. However, because influence maximization is computationally very expensive, even the current best algorithm, MixGreedy, is still very time-consuming on large-scale social networks; for example, selecting the 50 most influential users from a social network of 37154 nodes takes more than 2 hours. Therefore, how to quickly mine the most influential users from the massive user base of a large-scale social network has become an urgent problem.
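The lazy-forward idea behind CELF can be sketched as below. This is a hypothetical minimal version, not the code of Leskovec et al.: `spread` is any caller-supplied monotone submodular set function (for influence maximization it would be a Monte Carlo estimate), and a node's cached marginal gain is recomputed only when that node reaches the top of the priority queue. Submodularity makes a stale gain an upper bound, so a node whose gain is fresh and still on top can be selected without re-evaluating the others.

```python
import heapq

def celf(nodes, k, spread):
    """Lazy-forward greedy for a monotone submodular set function `spread`."""
    chosen, current = [], 0.0
    # Max-heap via negated gains: (-gain, node, size of `chosen` when the gain was computed).
    heap = [(-spread([v]), v, 0) for v in nodes]
    heapq.heapify(heap)
    while len(chosen) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # Fresh gain on top of the queue: no stale entry can beat it.
            chosen.append(v)
            current += -neg_gain
        else:
            # Stale upper bound: recompute the marginal gain and push it back.
            gain = spread(chosen + [v]) - current
            heapq.heappush(heap, (-gain, v, len(chosen)))
    return chosen
```

With a set-coverage function as `spread`, the sketch returns the same picks as plain greedy while skipping most re-evaluations.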

On the other hand, the many-core, multi-threaded, high-bandwidth architecture of the GPU (Graphics Processing Unit) gives it very strong parallel computing capability, and GPUs are widely used for general-purpose computing. Many graph algorithms, such as breadth-first search and minimum spanning tree, can be accelerated with the parallel capability of the GPU. Making full use of this capability to exploit the concurrency potential of the influence maximization problem, and designing a parallel acceleration method for influence maximization based on the GPU architecture, is therefore a feasible way to solve influence maximization in large-scale social networks.

In summary, the efficiency of influence maximization is a widely studied problem in social network analysis. Existing methods cannot locate the most influential users within a reasonable time and scale poorly, so they are unsuitable for large-scale social networks. Research on efficient and scalable influence maximization methods is therefore a technical problem of great concern to those skilled in the art. No published literature on influence maximization has used the parallel computing capability of the GPU to reduce running time.

Summary of the invention

The technical problem to be solved by the present invention is to propose, for the influence maximization problem in social networks, a novel GPU-based parallel influence maximization method that fully exploits the parallelizable parts of the greedy algorithm and the parallel computing capability of the GPU, so as to accelerate the algorithm and reduce its execution time.

To solve the above technical problem, the technical solution of the present invention is as follows. In each Monte Carlo simulation, first find the strongly connected components of the network graph. Because all nodes in the same strongly connected component reach each other and therefore have the same influence value, the nodes of each strongly connected component are merged into one node whose weight is the sum of their weights. Then a bottom-up traversal strategy is used to compute the influence value of every node in parallel: exploiting the parallel computing capability of the GPU, each GPU computing core runs its own thread to compute the influence values of different nodes concurrently. Scheduling the per-node computations onto the GPU computing cores to the greatest possible extent reduces the overall running time.

The specific technical solution is:

Step 1: initialize the influence-maximizing node set S to empty.

Step 2: set the current Monte Carlo simulation count Num = 0.

Step 3: select edges of the graph with the Monte Carlo simulation method of Wei Chen et al. in the paper "Efficient Influence Maximization in Social Networks" (ACM SIGKDD 2009), obtaining graph G'.
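Step 3 produces one sampled graph G' per Monte Carlo simulation. A minimal sketch, assuming the independent cascade model with a single propagation probability p for every edge (the text does not state which probability is used); the function names are hypothetical. Each directed edge survives independently with probability p.

```python
import random

def sample_live_edge_graph(edges, p, rng):
    """Keep each directed edge (u, v) independently with probability p."""
    return [(u, v) for (u, v) in edges if rng.random() < p]

def to_adjacency(n, edges):
    """Adjacency lists for nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    return adj
```

A typical call per simulation would be `to_adjacency(n, sample_live_edge_graph(edges, 0.1, rng))` with one shared `random.Random` instance.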

Step 4: find the strongly connected components of G'. In a directed graph, two nodes v_e and v_f are strongly connected if there is a directed path from v_e to v_f and also a directed path from v_f to v_e. If every two nodes of a directed graph are strongly connected, the graph is strongly connected. The Tarjan algorithm, which is based on depth-first search and was proposed by Robert Tarjan in the paper "Depth-first search and linear graph algorithms" (SIAM Journal on Computing, 1972), is used to find all strongly connected components SCC_i of G', where i ranges from 0 to j-1 and j is the number of strongly connected components of G'.
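A minimal recursive sketch of Tarjan's algorithm as used in Step 4, written for illustration only: nodes are assumed to be numbered 0..n-1 and G' is given as adjacency lists. Components come out in reverse topological order (sinks first). Recursion depth grows with the longest DFS path, so very large graphs would need an iterative variant.

```python
def tarjan_scc(n, adj):
    """Return the strongly connected components of nodes 0..n-1 as lists of node ids."""
    index = [None] * n      # DFS discovery order
    low = [0] * n           # smallest index reachable through the DFS subtree
    on_stack = [False] * n
    stack, sccs = [], []
    counter = 0

    def strongconnect(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in adj[v]:
            if index[w] is None:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the whole component off the stack.
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in range(n):
        if index[v] is None:
            strongconnect(v)
    return sccs
```

For the cycle 0→1→2→0 with the tail 2→3→4, the sketch yields `[[4], [3], [2, 1, 0]]`.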

Step 5: convert G' into a directed acyclic graph G* according to its strongly connected components SCC_i, as follows:

5.1: initialize i = 0.

5.2: replace SCC_i with a new node v_{n+i}, where n is the number of nodes in G'.

Specifically:

5.2.1: for SCC_i, add a new node v_{n+i}. The in-edge set of v_{n+i} is the union of the in-edge sets of all nodes in SCC_i, its out-edge set is the union of the out-edge sets of all nodes in SCC_i, and its weight is the sum of the weights of the nodes in this strongly connected component.

5.2.2: set the in-edge sets and out-edge sets of all nodes in SCC_i to empty and their weights to zero, as follows:

5.2.2.1: initialize the integer variable l to 0.

5.2.2.2: for node v_l of SCC_i, set the in-edge set and the out-edge set of v_l to the empty set ∅ and set its weight to 0.

5.2.2.3: l = l+1. If l < n_i, where n_i is the number of nodes in SCC_i, go to 5.2.2.2. If l ≥ n_i, go to 5.3.

5.3: i = i+1. If i < j, go to 5.2. If i ≥ j, all strongly connected components have been replaced by new nodes and G' has become the directed acyclic graph G*; go to Step 6.
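The effect of Step 5 is the condensation of G' into a DAG. The sketch below simplifies the bookkeeping: instead of appending new nodes v_{n+i} and emptying the original ones, it builds the condensed graph directly over component indices, which gives the same DAG. Edges between two nodes of the same component would become self-loops on the merged node, so they are dropped; the text does not say this explicitly, so treat it as an assumption. `sccs` is a list of components such as a Tarjan-style routine returns, and the function name is hypothetical.

```python
def condense(n, edges, weight, sccs):
    """Merge each SCC into one node; returns (comp_of, dag_adj, dag_weight)."""
    comp_of = [0] * n
    for cid, comp in enumerate(sccs):
        for v in comp:
            comp_of[v] = cid
    dag_weight = [0] * len(sccs)
    for v in range(n):
        # The merged node's weight is the sum of its members' weights (5.2.1).
        dag_weight[comp_of[v]] += weight[v]
    dag_adj = [set() for _ in sccs]
    for u, v in edges:
        cu, cv = comp_of[u], comp_of[v]
        if cu != cv:  # intra-component edges would be self-loops; drop them
            dag_adj[cu].add(cv)
    return comp_of, [sorted(s) for s in dag_adj], dag_weight
```

Every original node inherits the influence value later computed for its component `comp_of[v]`.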

Step 6: starting from the nodes with out-degree 0, traverse all nodes of the directed acyclic graph G* bottom-up and use the GPU to compute the influence values of all nodes, as follows:

6.1: define and initialize variables, as follows:

6.1.1: a Boolean array Visited[] records whether each node has been visited: Visited[v_p] = true means node v_p has been visited and Visited[v_p] = false means it has not, where 0 ≤ p ≤ n-1. Visited[] is initialized to all false, meaning that no node has been visited;

6.1.2: an integer array Count[] records the number of visited children of each node, where 0 ≤ Count[v_x] ≤ outdegree[v_x], 0 ≤ x ≤ n-1, and outdegree[v_x] is the out-degree of node v_x. Count[] is initialized to all 0, meaning that no child has been visited;

6.1.3: an integer array Inf[] records the influence value of each node, where 0 ≤ Inf[v_x] ≤ n, 0 ≤ x ≤ n-1. Inf[] is initialized to all 0;

6.1.4: a string array Label[] records the label of each node. Label[v_x] marks the positions where node v_x may overlap with other nodes; nodes v_a and v_b overlap at node v_c if and only if there is at least one path from v_a to v_c and at least one path from v_b to v_c, where 0 ≤ a, b, c ≤ n-1. Label[] is initialized to all NULL.

6.1.5: a Boolean variable Stop records whether the threads have finished computing: Stop = true means that the influence values of all nodes in this simulation have been computed, and Stop = false means that they have not. Stop is a global variable whose content every GPU thread can modify. Stop is initialized to false.

6.2: if the stop flag Stop is false, the influence computation of this simulation is not finished; go to 6.3 and compute with multiple GPU threads in parallel. If Stop is true, the influence values of all nodes in this simulation have been computed; go to Step 7.

6.3: the GPU uses single-instruction multiple-data (SIMD) execution to compute node influence values with multiple threads in parallel. Multi-threaded parallelism here means that the GPU assigns one thread to compute the influence value of each node and computes the influence values of y nodes at a time with y parallel threads, where y is the number of stream processors in the GPU (it differs between GPU models). After the influence values of the current y nodes have been computed, if the influence values of other nodes remain to be computed, the GPU computes the remaining nodes in the same multi-threaded parallel way through GPU thread scheduling, until the influence values of all nodes have been computed. Under SIMD execution, all stream processors in the same stream processor unit share one instruction fetch unit, instructions are issued in order, and there is no branch prediction; that is, different threads execute the same instruction on different data, which achieves parallel computation. The method by which a GPU thread computes the influence value of node v_p is:

6.3.1: set the stop flag Stop to true.

6.3.2: if Visited[v_p] is false, execute 6.3.3; otherwise node v_p has already been visited, so go to 6.2.

6.3.3: if Count[v_p] equals the out-degree of node v_p, all children of v_p have been visited; execute 6.3.4 to compute the influence value of v_p. Otherwise some children of v_p are still unprocessed; set the stop flag Stop to false and go to 6.2.

6.3.4: compute the sum of the influence values of all children of node v_p:

$$sum = \sum_{v_q \in Out[v_p]} Inf[v_q]$$

where Out[v_p] is the set of all children of node v_p.

6.3.5: compute the label Label(v_p) of node v_p. Label(v_p) is the union of the contributions of all children of v_p to v_p:

$$Label(v_p) = \bigcup_{v_q \in Out[v_p]} Con(v_q)$$

where Con(v_q) is the contribution of child v_q to node v_p, defined as follows. If the in-degree of v_q is greater than 1, overlap may occur at v_q, and the contribution of v_q to v_p is v_q itself, i.e. Con(v_q) = {v_q}. If the in-degree of v_q is at most 1, overlap cannot occur at v_q, and the contribution of v_q is its label, i.e. Con(v_q) = Label(v_q).
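Step 6.3.5 can be sketched as below. Assumptions: labels are stored as Python sets of node ids instead of the string array Label[] of the text, an empty set stands for NULL, and `children`/`indegree` describe G*. The function name is hypothetical.

```python
def compute_label(p, children, indegree, label):
    """Label(v_p) = union over children v_q of Con(v_q), where
    Con(v_q) = {v_q} if indegree(v_q) > 1, else Label(v_q)."""
    result = set()
    for q in children[p]:
        if indegree[q] > 1:
            result.add(q)        # overlap may occur at v_q itself
        else:
            result |= label[q]   # v_q has one parent: pass its label upward
    return result
```

On the diamond 0→1, 0→2, 1→3, 2→3, node 3 has in-degree 2, so it appears in the labels of nodes 1 and 2, and node 0 inherits {3} through both single-parent children.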

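6.3.6: compute the overlap influence value Overlap(Out[v_p]) of the set Out[v_p] of all children of node v_p, as follows:

6.3.6.1: initialize Overlap(Out[v_p]) to 0 and initialize the overlap range set Range to Out[v_p].

6.3.6.2: for any node v_a ∈ Range, if there is a node v_b ∈ Range with v_b ≠ v_a such that a path exists from v_a to v_b, overlap occurs at v_b; then Overlap(Out[v_p]) = Overlap(Out[v_p]) + Inf[v_b], and v_b is deleted from Range, i.e. Range = Range - {v_b}.

6.3.6.3: a string array Extra[] records the duplicated items in Range and is initialized to the empty set ∅; a string array Filter[] records the single items that remain in Range after duplicates are removed and is initialized to the empty set ∅. For any node v_a ∈ Range and any element u ∈ Label(v_a): if u already belongs to Filter, u is a duplicated item, so u is added to Extra and its influence value Inf[u] is added to Overlap(Out[v_p]), i.e. Extra = Extra ∪ {u} and Overlap(Out[v_p]) = Overlap(Out[v_p]) + Inf[u]; if u does not belong to Filter, u is added to Filter, i.e. Filter = Filter ∪ {u}.

6.3.6.4: because elements of Extra[] and Filter[] may still repeat, the overlap influence value Overlap(Out[v_p]) of all children of v_p is finally increased by the difference of the two overlap values, i.e. Overlap(Out[v_p]) = Overlap(Out[v_p]) + (Overlap(Filter) - Overlap(Extra)).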
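A minimal sketch of steps 6.3.6.1 to 6.3.6.3 plus the influence formula of 6.3.7, under stated assumptions: `reaches(a, b)` is a hypothetical caller-supplied path oracle, labels are Python sets, and the 6.3.6.4 correction Overlap(Filter) - Overlap(Extra) is left to the caller, because the text does not spell out how the overlap of a label set is evaluated. The function names are hypothetical.

```python
def overlap_of_children(children_p, reaches, inf, label):
    """Steps 6.3.6.1-6.3.6.3; returns (overlap, extra, filt).
    The 6.3.6.4 correction is NOT applied here."""
    overlap = 0
    range_set = set(children_p)            # 6.3.6.1: Range = Out[v_p]
    # 6.3.6.2: a child reachable from another child is already counted inside that child's Inf.
    for a in sorted(children_p):
        if a not in range_set:
            continue
        for b in sorted(range_set):
            if b != a and reaches(a, b):
                overlap += inf[b]
                range_set.discard(b)
    # 6.3.6.3: a label element seen under more than one remaining child is a duplicate.
    extra, filt = set(), set()
    for a in sorted(range_set):
        for u in label[a]:
            if u in filt:
                extra.add(u)
                overlap += inf[u]
            else:
                filt.add(u)
    return overlap, extra, filt

def influence(p, children, inf, weight, overlap):
    """Step 6.3.7: Inf[v_p] = sum + weight(v_p) - Overlap(Out[v_p])."""
    return sum(inf[q] for q in children[p]) + weight[p] - overlap
```

On the diamond 0→1, 0→2, 1→3, 2→3 with unit weights, node 3 is found in both children's labels, so one copy of Inf[3] is subtracted and node 0 gets influence 4.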

6.3.7: compute the influence value of node v_p: Inf[v_p] = sum + weight(v_p) - Overlap(Out[v_p]), where weight(v_p) is the weight of v_p. Then TotalInf[v_p] = TotalInf[v_p] + Inf[v_p], where TotalInf[v_p] is the total influence value of v_p over the R Monte Carlo simulations and R is the total number of simulations, generally set to 20000.

6.3.8: if node v_p has no parent, go to 6.3.9. Otherwise, for every parent v_s of v_p, increment its count of visited children by 1, i.e. Count[v_s] = Count[v_s] + 1, and set the stop flag Stop to false.

6.3.9: mark node v_p as visited, i.e. Visited[v_p] = true, and go to 6.2.
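The thread schedule of 6.2 to 6.3.9 can be emulated sequentially to show how Visited[], Count[] and Stop drive the bottom-up order. In this sketch each pass of the while loop stands for one parallel sweep in which thread p handles node p, and nodes whose children were all finished at the start of the sweep are processed in it. To stay short and exact, it swaps in a different per-node rule than the label/overlap computation of the text: each node's reachable set is kept as a bitmask and Inf[v_p] is the total weight of that set. On a real GPU the Count[] increments of 6.3.8 would need atomic updates, because several children can update the same parent at once. The function name is hypothetical.

```python
def bottom_up_influence(adj, weight):
    """Sequential emulation of steps 6.1-6.3.9 on a DAG given as adjacency lists."""
    n = len(adj)
    parents = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            parents[v].append(u)
    visited = [False] * n          # 6.1.1
    count = [0] * n                # 6.1.2: number of visited children
    inf = [0] * n                  # 6.1.3
    reach = [0] * n                # swapped in for Label[]: bitmask of reachable nodes
    stop = False                   # 6.1.5
    while not stop:                # 6.2: one iteration = one parallel sweep
        stop = True                # 6.3.1
        ready = [p for p in range(n) if not visited[p] and count[p] == len(adj[p])]
        if len(ready) < visited.count(False):
            stop = False           # 6.3.3: some node is still waiting for its children
        for p in ready:            # on a GPU these threads would run concurrently
            mask = 1 << p
            for q in adj[p]:
                mask |= reach[q]
            reach[p] = mask
            inf[p] = sum(weight[v] for v in range(n) if (mask >> v) & 1)
            for s in parents[p]:   # 6.3.8
                count[s] += 1
                stop = False
            visited[p] = True      # 6.3.9
    return inf
```

Because G* is acyclic, every sweep finds at least one ready node, so the loop terminates after at most (longest path length + 1) sweeps.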

Step 7: increment the Monte Carlo simulation count Num by 1. If Num < R, go to Step 3; otherwise go to Step 8.

Step 8: among all nodes in V - S, select the node v with the largest TotalInf[] value and add it to S.

Step 9: if the number of nodes in S satisfies |S| < K, go to Step 2; otherwise the K most influential nodes have been selected and the method ends.
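Steps 7 to 9 wrap the per-simulation work in two loops. A minimal sketch: `simulate_once` is a hypothetical callable standing for Steps 3 to 6 (sample G', condense it, run the bottom-up pass) that returns the influence value of every node. The text does not say whether TotalInf[] is cleared when the method returns to Step 2; this sketch assumes it is. K must not exceed the number of nodes.

```python
def select_seeds(nodes, k, r, simulate_once):
    """Steps 7-9: accumulate TotalInf[] over r simulations, then add the best node of V - S to S."""
    seeds = []
    while len(seeds) < k:                          # Step 9
        total_inf = {v: 0 for v in nodes}          # assumption: cleared on every return to Step 2
        for _ in range(r):                         # Steps 2 and 7: Num = 0, ..., r-1
            inf = simulate_once()                  # stands in for Steps 3-6
            for v in nodes:
                total_inf[v] += inf[v]
        candidates = [v for v in nodes if v not in seeds]            # V - S
        seeds.append(max(candidates, key=total_inf.__getitem__))     # Step 8
    return seeds
```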

Compared with the prior art, the present invention achieves the following beneficial effects:

1. Step 4 computes the strongly connected components of the graph. Because all nodes in a strongly connected component have the same influence value, Step 5 replaces each strongly connected component with a single node, converting the graph into a directed acyclic graph and significantly reducing the amount of influence computation.

2. In Step 6 the invention computes the influence value of each node by bottom-up traversal. Because the influence value of a parent depends directly on the influence values of all its children, the influence values of all nodes can be obtained with a single traversal of the whole graph, which reduces complexity.

3. The invention fully exploits the parallelism of the original greedy algorithm and the parallel computing capability of the GPU, in particular by assigning one GPU thread to each node. Parallel execution across GPU threads significantly reduces program execution time, so larger social networks can be handled and the method scales well.

Description of the drawings

Fig. 1 is a flowchart of the best existing greedy algorithm, MixGreedy;

Fig. 2 is an overall flowchart of the present invention.

Embodiment

Fig. 1 is a flowchart of the best existing greedy algorithm, MixGreedy.

Step 1: initialize the node set S to empty.

Step 2: set the current Monte Carlo simulation count Num = 0.

Step 3: select edges of the graph with the Monte Carlo simulation method, obtaining graph G'.

Step 4: perform a breadth-first search from each node to compute its influence value.

Step 5: increment the Monte Carlo simulation count Num by 1. If Num < R, go to Step 3; otherwise go to Step 6.

Step 6: select the node v with the largest influence value TotalInf[] in V - S and add it to S.

Step 7: if the number of nodes in S satisfies |S| < K, go to Step 2; otherwise the K most influential nodes have been selected and the program exits.

Fig. 2 is an overall flowchart of the present invention.

Step 1: initialize the node set S to empty.

Step 2: set the current Monte Carlo simulation count Num = 0.

Step 3: select edges of the graph with the Monte Carlo simulation method, obtaining graph G'.

Step 4: find the strongly connected components of G'.

Step 5: convert G' into the directed acyclic graph G* according to its strongly connected components.

Step 6: starting from the nodes with out-degree 0, traverse all nodes of the directed acyclic graph G* bottom-up, computing the influence values of all nodes in parallel with different GPU threads.

Step 7: increment the Monte Carlo simulation count Num by 1. If Num < R, go to Step 3; otherwise go to Step 8.

Step 8: select the node v with the largest influence value TotalInf[] in V - S and add it to S.

Step 9: if the number of nodes in S satisfies |S| < K, go to Step 2; otherwise the K most influential nodes have been selected and the program exits.

Claims

1. A GPU-based parallel acceleration method for influence maximization, comprising the following steps:

Step 1: initialize the influence-maximizing node set S to empty;

Step 2: set the current Monte Carlo simulation count Num = 0;

Step 3: select edges of the graph with the Monte Carlo simulation method, obtaining graph G';

characterized by further comprising the following steps:

Step 4: use the Tarjan algorithm, which is based on depth-first search, to find all strongly connected components SCC_i of G', where i ranges from 0 to j-1 and j is the number of strongly connected components of G';

Step 5: convert G' into a directed acyclic graph G* according to its strongly connected components SCC_i, as follows:

5.1: initialize i = 0;

5.2: replace SCC_i with a new node v_{n+i}, where n is the number of nodes in G';

5.3: i = i+1; if i < j, go to 5.2; if i ≥ j, go to Step 6;

Step 6: starting from the nodes with out-degree 0, traverse all nodes of the directed acyclic graph G* bottom-up and use different GPU threads to compute the influence values of all nodes, the thread numbered p being responsible for computing the influence value of node v_p, where 0 ≤ p ≤ n-1; specifically:

6.1: define and initialize variables, as follows:

6.1.1: a Boolean array Visited[] records whether each node has been visited: Visited[v_p] = true means node v_p has been visited and Visited[v_p] = false means it has not; Visited[] is initialized to all false, meaning that no node has been visited;

6.1.2: an integer array Count[] records the number of visited children of each node, where 0 ≤ Count[v_x] ≤ outdegree[v_x], 0 ≤ x ≤ n-1, and outdegree[v_x] is the out-degree of node v_x; Count[] is initialized to all 0, meaning that no child has been visited;

6.1.3: an integer array Inf[] records the influence value of each node, where 0 ≤ Inf[v_x] ≤ n, 0 ≤ x ≤ n-1; Inf[] is initialized to all 0;

6.1.4: a string array Label[] records the label of each node; Label[v_x] marks the positions where node v_x may overlap with other nodes, nodes v_a and v_b overlapping at node v_c if and only if there is at least one path from v_a to v_c and at least one path from v_b to v_c, where 0 ≤ a, b, c ≤ n-1; Label[] is initialized to all NULL;

6.1.5: a Boolean variable Stop records whether the threads have finished computing; Stop = true means that the influence values of all nodes in this simulation have been computed and Stop = false means that they have not; Stop is a global variable whose content every GPU thread can modify, and Stop is initialized to false;

6.2: if the stop flag Stop is false, go to 6.3; if Stop is true, go to Step 7;

6.3: the GPU uses single-instruction multiple-data execution to compute node influence values with multiple threads in parallel; multi-threaded parallelism means that the GPU assigns one thread to compute the influence value of each node and computes the influence values of y nodes at a time with y parallel threads, y being the number of stream processors in the GPU; after the influence values of the current y nodes have been computed, if the influence values of other nodes remain to be computed, the GPU computes the remaining nodes in the same multi-threaded parallel way through GPU thread scheduling, until the influence values of all nodes have been computed; the method by which a GPU thread computes the influence value of node v_p is:

6.3.1: set the stop flag Stop to true;

6.3.2: if Visited[v_p] is false, execute 6.3.3; otherwise node v_p has already been visited, so go to 6.2;

6.3.3: if Count[v_p] equals the out-degree of node v_p, all children of v_p have been visited, so execute 6.3.4 to compute the influence value of v_p; otherwise some children of v_p are still unprocessed, so set the stop flag Stop to false and go to 6.2;

6.3.4: compute the sum of the influence values of all children of node v_p:

$$sum = \sum_{v_q \in Out[v_p]} Inf[v_q]$$

where Out[v_p] is the set of all children of node v_p;

6.3.5: compute the label Label(v_p) of node v_p, which is the union of the contributions of all children of v_p to v_p:

$$Label(v_p) = \bigcup_{v_q \in Out[v_p]} Con(v_q)$$

where Con(v_q) is the contribution of child v_q to node v_p: if the in-degree of v_q is greater than 1, the contribution of v_q to v_p is v_q itself, i.e. Con(v_q) = {v_q}; if the in-degree of v_q is at most 1, the contribution of v_q is its label, i.e. Con(v_q) = Label(v_q);

6.3.6: compute the overlap influence value Overlap(Out[v_p]) of the set Out[v_p] of all children of node v_p, as follows:

6.3.6.1: initialize Overlap(Out[v_p]) to 0 and initialize the overlap range set Range to Out[v_p];

6.3.6.2: for any node v_a ∈ Range, if there is a node v_b ∈ Range with v_b ≠ v_a such that a path exists from v_a to v_b, overlap occurs at v_b; then Overlap(Out[v_p]) = Overlap(Out[v_p]) + Inf[v_b], and v_b is deleted from Range, i.e. Range = Range - {v_b};

6.3.6.3: a string array Extra[] records the duplicated items in Range and is initialized to the empty set ∅;

a string array Filter[] records the single items that remain in Range after duplicates are removed and is initialized to the empty set ∅;

for any node v_a ∈ Range and any element u ∈ Label(v_a): if u already belongs to Filter, u is a duplicated item, so u is added to Extra and its influence value Inf[u] is added to Overlap(Out[v_p]), i.e. Extra = Extra ∪ {u} and Overlap(Out[v_p]) = Overlap(Out[v_p]) + Inf[u]; if u does not belong to Filter, u is added to Filter, i.e. Filter = Filter ∪ {u};

6.3.6.4: because elements of Extra[] and Filter[] may still repeat, the overlap influence value Overlap(Out[v_p]) of all children of v_p is finally increased by the difference of the two overlap values, i.e. Overlap(Out[v_p]) = Overlap(Out[v_p]) + (Overlap(Filter) - Overlap(Extra));

6.3.7: compute the influence value of node v_p: Inf[v_p] = sum + weight(v_p) - Overlap(Out[v_p]), where weight(v_p) is the weight of v_p; TotalInf[v_p] = TotalInf[v_p] + Inf[v_p], where TotalInf[v_p] is the total influence value of v_p over the R Monte Carlo simulations; R is the total number of simulations and is a positive integer;

6.3.8: if node v_p has no parent, go to 6.3.9; otherwise, for every parent v_s of v_p, increment its count of visited children by 1, i.e. Count[v_s] = Count[v_s] + 1, and set the stop flag Stop to false;

6.3.9: mark node v_p as visited, i.e. Visited[v_p] = true, and go to 6.2;

Step 7: increment the Monte Carlo simulation count Num by 1; if Num < R, go to Step 3; otherwise go to Step 8;

Step 8: among all nodes in V - S, select the node v with the largest TotalInf[] value and add it to S;

Step 9: if the number of nodes in S satisfies |S| < K, go to Step 2; otherwise the K most influential nodes have been selected and the method ends.

2. The GPU-based parallel acceleration method for influence maximization according to claim 1, characterized in that in Step 5 the method of replacing the strongly connected component SCC_i with the new node v_{n+i} is:

5.2.1: for SCC_i, add a new node v_{n+i}; the in-edge set of v_{n+i} is the union of the in-edge sets of all nodes in SCC_i, its out-edge set is the union of the out-edge sets of all nodes in SCC_i, and its weight is the sum of the weights of the nodes in this strongly connected component;

5.2.2: set the in-edge sets and out-edge sets of all nodes in SCC_i to empty and their weights to zero, as follows:

5.2.2.1: initialize the integer variable l to 0;

5.2.2.2: for node v_l of SCC_i, set the in-edge set and the out-edge set of v_l to the empty set ∅ and set its weight to 0;

5.2.2.3: l = l+1; if l < n_i, where n_i is the number of nodes in SCC_i, go to 5.2.2.2; if l ≥ n_i, end.

3. The GPU-based parallel acceleration method for influence maximization according to claim 1, characterized in that the total number of simulations R is 20000.