Open Problems in Data-Sharing Peer-to-Peer Systems
Neil Daswani, Hector Garcia-Molina, Beverly Yang

- P2P has lots of advantages
  - You know the list
- But, challenges to widespread (lasting) acceptance
  - Security, efficiency, QoS, transactions, etc.
- Old distributed systems techniques don’t apply to the scale and nature of P2P systems
- This paper looks at search and security

- Not an exhaustive survey
  - Other applications besides data sharing
  - Other issues besides search and security
  - Other issues within search and security
- Based on work within the Stanford Peers Group

- Assume “pure” P2P
  - Their definition of “hybrid” is the Napster example
- Challenges
  - Scale
  - Unreliability
- Topology
  - How peers connect to each other
  - Autonomy vs. efficiency
- Data placement
  - Both data and metadata
- Message routing
  - How queries are propagated
  - Can utilize both topology and data placement

- Expressiveness
  - How powerful is the query language?
- Comprehensiveness
  - All results vs. top K vs. single
- Autonomy
  - Peers may want to connect only to trusted peers

- Efficiency
  - Bandwidth + processing + storage + …
- Quality of Service (QoS)
  - User-perceived qualities
- Robustness
  - All of the above should stay good during churn

Expressiveness
- Key lookup
  - DHTs
- Keyword
  - Can DHTs handle this?
- Ranked keyword
  - Want to do ranking in the network if top K is less than total results
- Aggregates
  - Want to do this in the network as well
- SQL
  - PIER and PeerDB
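The "ranked keyword" bullet above (do the ranking in the network when top K is smaller than the full result set) can be sketched as follows. This is a minimal illustration, not a method taken from the paper: it assumes every peer scores documents on a globally comparable scale, and the names `local_top_k`, `merge_top_k`, and the sample peers are hypothetical.

```python
import heapq

def local_top_k(scored_docs, k):
    """Return the k highest-scoring (score, doc_id) pairs held by one peer."""
    return heapq.nlargest(k, scored_docs)

def merge_top_k(partial_results, k):
    """Merge partial top-k lists from several peers into a single top-k list."""
    merged = [item for partial in partial_results for item in partial]
    return heapq.nlargest(k, merged)

# Hypothetical peers, each holding (score, doc_id) pairs for one query.
peer_a = [(0.9, "a1"), (0.2, "a2"), (0.5, "a3")]
peer_b = [(0.7, "b1"), (0.8, "b2")]
peer_c = [(0.1, "c1"), (0.6, "c2"), (0.95, "c3")]

k = 2
# Each peer prunes locally, so only k results per peer cross the network.
partials = [local_top_k(p, k) for p in (peer_a, peer_b, peer_c)]
top = merge_top_k(partials, k)
print(top)
```

Because any document in the global top K must also be in its own peer's local top K, pruning at each peer loses nothing, and here 6 results are forwarded instead of 8. The same merge step could run at intermediate nodes along the query path.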
Autonomy vs. efficiency
- With less autonomy, can bound the lookup cost (Chord)
- By designating some nodes more equal than others, there are some nodes guaranteed to have the answer (super-peers)
- Replication increases the chance of finding the answer on a random node

- Decoupling autonomy and efficiency is a large challenge
- By imposing rigid requirements on the system, it becomes hard to maintain
- SkipNet makes progress by allowing the user to tune the autonomy vs. efficiency tradeoff

QoS
- Different metrics:
  - Number of results
  - Response time
  - Relevance (precision and recall)
- Application specific
- Example: Gnutella
  - Tradeoff between # results and cost
  - Directed BFS and concept clustering address this
- What is the best technique to optimize this tradeoff?

Security
- Challenging because of the nature of P2P systems
  - Open
  - Autonomous
- Have to assume a hostile environment
- Address:
  - Availability
  - File authenticity
  - Anonymity
  - Access control
- Want to prevent, detect, manage, and recover from attacks
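The Chord bullet above (less autonomy buys a bounded lookup cost) can be illustrated with a simplified finger-table lookup. This is a sketch under stated assumptions, not the real protocol: it computes fingers from a global sorted node list instead of maintaining them per node, it ignores joins and failures, and the 6-bit ring and node identifiers are illustrative values.

```python
import bisect

M = 6            # identifier bits (assumed small for the demo)
RING = 2 ** M    # size of the identifier ring

def successor(nodes, ident):
    """First node at or clockwise after ident (nodes must be sorted)."""
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def dist(a, b):
    """Clockwise distance from a to b, in 1..RING (so dist(a, a) == RING)."""
    return (b - a) % RING or RING

def lookup(nodes, start, key):
    """Route from node `start` toward `key`; return (owner, forwarding hops)."""
    n, hops = start, 0
    while True:
        succ = successor(nodes, n + 1)
        if dist(n, key) <= dist(n, succ):   # key lies in (n, succ]
            return succ, hops
        # Forward to the closest preceding finger: the highest finger
        # successor(n + 2**i) that lies strictly between n and key.
        for i in reversed(range(M)):
            f = successor(nodes, n + 2 ** i)
            if dist(n, f) < dist(n, key):
                n, hops = f, hops + 1
                break

NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
owner, hops = lookup(NODES, 8, 54)
print(owner, hops)
```

The placement rule is what makes the bound possible: a key must live on its successor node, so each forwarding step can at least halve the remaining distance, and no lookup in this sketch takes more than `M` forwarding hops. A peer that wants to choose where its data lives gives up exactly this guarantee.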
Availability
- Each node should be able to accept messages as well as offer services to the network
- DoS attack
  - Chosen-victim attack in Gnutella
    - A node directs all search queries it gets to a victim node
  - Adversaries take advantage of loose protocols
- Need to prevent amplification and back-door access

- Malicious nodes create Byzantine failures
  - Current approaches are unpopular because of complexity and overhead
  - They also assume complete and secure communication between nodes
- How to deal with general node failures?
  - Being addressed by DHTs
- Other issues:
  - Malicious query/storage flooding
  - File availability
- No mention of OceanStore, etc.

File authenticity
- What is the definition of authenticity?
  - Different than integrity, which is solved with checksums/signatures
- Oldest document: the first submitted
- Expert-based: a single expert deems a document authentic
- Voting-based: majority of expert opinions determine authenticity
- Reputation-based: weigh votes of some experts more

Anonymity
- Good for:
  - “Borrowing” music
  - Censorship resistance
  - Freedom of speech
  - Privacy protection
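The distinction above between integrity (checksums) and authenticity (voting-based or reputation-based) can be sketched as follows. This is an illustration under assumptions, not a scheme from the paper: experts are assumed to vote by naming the digest of the copy they consider authentic, and the function names, expert names, and weights are hypothetical.

```python
import hashlib
from collections import defaultdict

def checksum(data: bytes) -> str:
    """SHA-256 digest of the file contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def integrity_ok(data: bytes, expected_digest: str) -> bool:
    """Integrity: the bytes received match a known digest."""
    return checksum(data) == expected_digest

def weighted_vote(votes, weights=None):
    """Authenticity: pick the version with the most (weighted) expert votes.

    votes   -- {expert: version they consider authentic}
    weights -- {expert: reputation}; None means one vote each (plain majority)
    Ties go to the version that was voted for first.
    """
    tally = defaultdict(float)
    for expert, version in votes.items():
        tally[version] += 1.0 if weights is None else weights.get(expert, 0.0)
    return max(tally, key=tally.get)

data = b"original file contents"
digest = checksum(data)
print(integrity_ok(data, digest), integrity_ok(b"tampered", digest))

votes = {"e1": "A", "e2": "B", "e3": "B"}
reputation = {"e1": 5.0, "e2": 1.0, "e3": 1.0}
print(weighted_vote(votes), weighted_vote(votes, reputation))
```

A checksum only shows that a copy matches some digest; it does not say which digest is the authentic one. The vote decides that, and the two calls above disagree ("B" by majority, "A" by reputation), which is why the choice of definition matters.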
- For anonymity, should not be able to determine which node an object is stored at
- vs. for efficiency, should be able to determine exactly which node is responsible for an object
- Onion routing/Crowds address anonymity through forwarding
  - Still have problems if nodes collude

Access control
- Utility is limited if there are restrictions on data sharing, but some level is needed for legality
- Endpoint vs. P2P network enforcement

- What are the most pressing issues for P2P to become widely acceptable?
- P2P vs. centralized?
- Structured vs. unstructured?
- Hybrid vs. pure P2P?
- Where will P2P make an impact?
- …
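The "anonymity through forwarding" bullet above can be illustrated with a Crowds-style random forwarding path. This is a simplified sketch under assumptions: it models only path selection (no encryption, no reply path, no colluding members), and the function name, member names, and forwarding probability are hypothetical.

```python
import random

def crowds_path(members, initiator, p_forward, rng):
    """Return the crowd members a request passes through before delivery."""
    path = [initiator]
    current = rng.choice(members)       # the initiator always forwards once
    path.append(current)
    while rng.random() < p_forward:     # biased coin flipped at each member
        current = rng.choice(members)   # may pick itself or the initiator
        path.append(current)
    return path                         # path[-1] submits to the destination

members = [f"n{i}" for i in range(10)]
rng = random.Random(0)                  # seeded so the demo is repeatable
path = crowds_path(members, "n0", 0.75, rng)
print(path)
```

The destination only sees the last member on the path, and each member only knows its predecessor, who may or may not be the initiator. The cost is extra hops per request, which is the anonymity vs. efficiency tension noted above, and members that collude can compare what they saw to narrow down the initiator.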