PlanetLab Roadmap
This roadmap describes four broad development efforts that are currently underway:
- Federation
- PlanetLab Backbone
- PlanetLab OS and Xen
- Node Manager
1. Federation
As outlined in the PlanetLab Architecture Document, PlanetLab is designed to support decentralized management. This is accomplished in two ways. The first, unbundled management, decouples the distributed services used to manage PlanetLab from the core of the system. There are already several infrastructure services running on PlanetLab. The second is through federation, whereby different organizations or geographic regions retain autonomous control over their nodes. We are currently in the process of enabling federation, which will have the tangible outcome of spreading responsibility for PlanetLab across multiple PLC-like entities. The roadmap identifies the stages of this process. It assumes the reader is familiar with the PlanetLab Architecture Document.
- Stage 0 [Done]
-
Create and distribute custom BootCDs that enable alternative boot servers.
- Stage 1 [Done]
-
Enable the installation and management of private PlanetLab instances.
- Stage 2 [Q1 2006]
-
Standardize and document multiple slice namespace scheme.
-
Standardize and document interfaces:
- SA user interfaces (slice namespace creation and management)
- MA user interfaces (user/node creation and management)
- SCS to SA interfaces (actual slice instance creation and management)
- NodeManager/SCS interfaces
-
Update implementation of SA and MA to use above documented interfaces.
-
Update NodeManager/SCS to support:
- Communication to multiple SA via above SCS to SA interfaces
- Authenticating SAs
-
Enable localization of content, such as user guides and announcements.
- Stage 3 [Q2 2006]
-
Federate private PlanetLab instances by including the PlanetLab Central namespace in their SA.
Developers: Aaron Klingaman (lead), Marc Fiuczynski, Steve Muir.
Collaborators: Aaron Harwood (Melbourne), Timur Friedman (UPMC), Thierry Parmentelat (Inria), He Tao (Tsinghua), Aki Nakao (Tokyo), Danny Bickson (HUJI), Jeff Sedayo (Intel), Yoichi Shinoda (JAIST).
2. PlanetLab Backbone
To better support research on new routing protocols and network architectures, we plan to allow users to build virtual ISPs (vISP) consisting of PlanetLab nodes (as routers), dedicated capacity between certain pairs of nodes (as links) and direct connections to the rest of the Internet (as access links, with BGP sessions to one or more providers). Initially, this effort will leverage Internet2: we will connect the PlanetLab nodes at the Abilene PoPs with MPLS tunnels, and connect one or more of these Abilene sites to commercial ISPs via MAE-East and MAE-West. In addition to this general facility, we will also build a reference vISP implementation, called the PlanetLab Backbone (PLB), and make it available for use by any slice running on PlanetLab. This project involves four parallel efforts.
- Kernel:
-
Change the PlanetLab OS to give slices access to pseudo-devices that are logically connected to the virtual ISP. [To be deployed in v3.2, described below.]
- Connectivity:
-
Form a backbone topology with dedicated bandwidth by connecting PlanetLab nodes at Abilene PoPs with 1Gbps MPLS tunnels.
-
Establish connectivity to the commercial Internet via MAE-East and MAE-West.
-
Obtain address space for our vISPs.
- Control Plane:
-
Deploy a routing daemon that can handle BGP sessions with commercial ISPs.
-
Develop a routing implementation that can handle PLB traffic.
-
Develop a routing platform that can be used for experimentation.
- Data Plane:
-
Develop an API and interface for slices to access PLB: PPTP, OpenVPN and local sockets. [Done, to be deployed in v3.2]
-
Deploy a statically routed demonstration of PLB that uses existing BGP sites as egress points.
-
Determine whether the current demo-quality implementation of PLB, which uses Click and TUN/TAP, can scale to support the incentive model of PLB.
Developers: Mark Huang, Andy Bavier.
Collaborators: Nick Feamster (Princeton), Jen Rexford (Princeton), Rick Summerhill (Internet2).
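The vISP structure described in this section (nodes as routers, dedicated capacity as links, BGP-speaking access links to providers) can be pictured as a small data structure. The following is only an illustrative sketch: the class name `VirtualISP`, its methods, and the router and provider names are hypothetical and are not part of any PlanetLab interface.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualISP:
    """Hypothetical model of a vISP: routers, backbone links, access links."""
    routers: set = field(default_factory=set)
    links: dict = field(default_factory=dict)         # frozenset({a, b}) -> capacity in Mbps
    access_links: dict = field(default_factory=dict)  # router -> list of provider names

    def add_link(self, a, b, mbps):
        """Record dedicated capacity between two distinct routers."""
        if a == b:
            raise ValueError("a link needs two distinct routers")
        self.routers.update((a, b))
        self.links[frozenset((a, b))] = mbps

    def add_access_link(self, router, provider):
        """Record a connection (e.g. a BGP session) from a router to a provider."""
        self.routers.add(router)
        self.access_links.setdefault(router, []).append(provider)

    def backbone_is_connected(self):
        """Return True if every router is reachable over backbone links alone."""
        if not self.routers:
            return True
        neighbours = {r: set() for r in self.routers}
        for pair in self.links:
            a, b = pair
            neighbours[a].add(b)
            neighbours[b].add(a)
        start = next(iter(self.routers))
        seen, stack = {start}, [start]
        while stack:
            for n in neighbours[stack.pop()]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen == self.routers


# Usage with made-up PoP names; 1000 Mbps mirrors the 1Gbps tunnels mentioned above.
visp = VirtualISP()
visp.add_link("pop-a", "pop-b", 1000)
visp.add_link("pop-b", "pop-c", 1000)
visp.add_access_link("pop-a", "provider-x")
```

A connectivity check of this kind is one simple way to confirm that a proposed set of tunnels forms a usable backbone before any routing is configured.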
3. PlanetLab OS and Xen
We will continue to develop and enhance the PlanetLab OS, with the goal of supporting both vserver- and Xen-based slices. An integral part of this effort will be to improve our ability to remotely manage PlanetLab nodes. This effort will proceed in four stages:
-
Stage 1: Version 3.2 [Q4 2005]
The first stage entails moving to a kernel based on Fedora Core's 1.1398 release for Linux 2.6.12 (or above), vserver 2.0, and a custom CPU scheduler that supports both fair share and guarantees.
During this upgrade, two existing pieces of software will be deprecated: CKRM and resman. CKRM support has been dropped completely from the kernel because its CPU scheduler no longer suits our needs and no other useful or stable CKRM resource controllers have materialized. The resman utility was intended to serve as a thin veneer to hide resource management details from other software. It has turned out to be slow, and more of a burden than a service to its primary user, the Node Manager (NM).
Tasks:
- Kernel:
-
Upgrade from 2.6.10 to 2.6.12; remove CKRM modifications from the source tree. [Done]
-
Upgrade from vserver 1.9.3 to 2.0, upgrade util-vserver to latest release. [Done]
-
Migrate resman software to a Python-based implementation; improve integration with NM. [Done]
- Scheduler:
-
New Scheduler code checked into 2.6.12 kernel tree. [Done]
-
Run on alpha nodes. [Done]
-
Scheduler evaluation on uniprocessor. [Done]
-
Write press release for the PlanetLab user community, detailing the need for the new scheduler and presenting evaluation results.
- VNET:
-
Overlay socket / Tunnel support. [Done]
-
No bandwidth limits. [Done]
- Node Manager:
-
Root resource allocation. [Done]
-
Separate PL_conf into a slice. [Done]
-
rspec definition. [Done]
Developers: Steve Muir, Marc Fiuczynski, Mark Huang, Andy Bavier.
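As a rough illustration of what "both fair share and guarantees" can mean for the Stage 1 scheduler, the sketch below computes per-slice CPU fractions: guaranteed fractions are reserved first, and the remaining capacity is divided in proportion to shares. This is an assumed policy written for illustration only, not the kernel scheduler itself; the function name `allocate_cpu` and the `guarantee` and `shares` fields are hypothetical.

```python
def allocate_cpu(slices):
    """Return a dict mapping slice name -> fraction of one CPU.

    `slices` maps a slice name to a dict with optional keys:
      "guarantee": reserved CPU fraction between 0 and 1 (default 0.0)
      "shares":    weight used to divide the unreserved CPU (default 0)
    Both keys are assumptions made for this sketch.
    """
    reserved = sum(s.get("guarantee", 0.0) for s in slices.values())
    if reserved > 1.0:
        raise ValueError("guarantees exceed total CPU capacity")

    remaining = 1.0 - reserved
    total_shares = sum(s.get("shares", 0) for s in slices.values())

    allocation = {}
    for name, s in slices.items():
        # Fair-share portion: proportional slice of whatever is not reserved.
        fair = remaining * s.get("shares", 0) / total_shares if total_shares else 0.0
        allocation[name] = s.get("guarantee", 0.0) + fair
    return allocation


# Usage with made-up slices: "a" holds a 25% guarantee and also competes for the rest.
example = allocate_cpu({
    "a": {"guarantee": 0.25, "shares": 1},
    "b": {"shares": 1},
    "c": {"shares": 2},
})
```

With these inputs, slice "a" receives 0.4375 of the CPU, "b" 0.1875, and "c" 0.375. A real scheduler enforces such targets over time rather than computing them once, so this shows only the intended steady-state split.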
-
Stage 2: PLK on Xen (PLXD) [Q4 2005]
Once we upgrade to the 2.6.12 kernel, we will be able to run PLK on top of Xen. This allows sites to run PlanetLab in a Xen domain without the need to dedicate any physical hardware. This is especially useful for sites that are willing to contribute unused cluster capacity to PlanetLab.
Tasks:
-
Port vserver support to Xen arch subsystem. [Done]
-
Validate kexec support works when using a xenU-based PlanetLab kernel.
-
Automate task of creating a PLXD filesystem image and make alpha release for Xen-3.0-Devel.
Developers: Marc Fiuczynski (lead), Mark Huang, Aaron Klingaman.
Collaborators: David Becker (Duke), Ian Foster (Chicago/Argonne), Amin Vahdat & Stefan Savage (UCSD).
-
Stage 3: Xen Management Domain (XRM) [Q4 2005]
The next step is to use Xen for better remote management. Currently, a large number of sites do not have network-addressable PCUs for PlanetLab. When the PlanetLab Kernel (PLK) hangs on these systems, we need to ask someone to power cycle the node manually. Our experience is that for some sites this can take days and sometimes weeks to happen. The goal is to have 100% PCU-style coverage across PlanetLab by running a reboot service in domain 0 and PLK in domain 1. Moreover, XRM will make it possible to smooth the rollout of PLK on.
Developers: Marc Fiuczynski (lead), Mark Huang.
-
Stage 4: Allocate Xen Slivers (Xlice) [Q1 2006]
The ultimate goal is to allow PlanetLab to support slices running in their own Xen domains. This capability will co-exist with vserver-based slices.
Tasks:
-
Port NM to support Xen domains as slices. [Done. Deploy with v3.2]
-
Make PlanetFlow work with separate Xen slices.
-
Support all user-expected services for Xen-based slices (ssh, etc.).
-
Figure out how to distribute images to PlanetLab nodes.
Developers: Steve Muir (lead), Stephen Soltesz.
Collaborators: Kevin Lai (HP), Mic Bowman (Intel), Rob Knauerhase (Intel).
4. Node Manager
The PlanetLab node manager needs to be extended in a number of ways in order to support the other tasks described above. Other changes will need to be made in the future in order to conform more closely to the proposed architecture and to expose various new features.
- Stage 1: Centralised Suspend/Restart of Slices [Q4 2005]
The node manager currently provides a node-local mechanism for suspending and restarting slices, but this needs to be extended into the PLC agent so that operations can be invoked remotely. There will also be support for placing a slice into the suspended state centrally (i.e., at PLC) and having that change applied on all nodes.
- Stage 2: Support for Multiple PLC Agent Slice Pools [Q4 2005]
The PLC agent uses a slice pool allocated as part of each node's root resource allocations in order to create new slices. As part of our support for federation it will be necessary for the PLC agent to be able to use multiple slice pools, one for each slice authority.
- Stage 3: Expose CPU Guarantees to Resource Brokers [Q1 2006]
The new CPU scheduler deployed as part of the 3.2 rollout includes the ability to allocate a guaranteed fraction of the CPU to some slices. This feature needs to be exposed to the PLC agent and to existing resource brokerage services such as Sirius.
- Stage 4: Integrate Proper into Node Manager [Q1 2006]
Proper currently exists as a standalone service using its own access control list and RPC mechanism. It should be integrated into Node Manager such that rcaps and rspecs are used to provide privileged operations to the appropriate slices.
Developer: Steve Muir.
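The Stage 2 requirement above, one slice pool per slice authority, can be sketched as a small registry keyed by authority. This is a hypothetical illustration only: the class `SlicePoolRegistry`, its methods, and the idea of a fixed per-pool capacity are assumptions made for the example and do not describe the actual PLC agent.

```python
class SlicePoolRegistry:
    """Hypothetical per-node registry holding one slice pool per slice authority."""

    def __init__(self):
        # authority name -> {"capacity": int, "slices": set of slice names}
        self._pools = {}

    def add_pool(self, authority, capacity):
        """Allocate a pool for a slice authority (assumed to come from root resources)."""
        if authority in self._pools:
            raise ValueError(f"pool for {authority!r} already exists")
        self._pools[authority] = {"capacity": capacity, "slices": set()}

    def create_slice(self, authority, slice_name):
        """Create a slice out of the pool belonging to the given authority."""
        pool = self._pools[authority]  # KeyError if the authority has no pool
        if slice_name in pool["slices"]:
            raise ValueError(f"slice {slice_name!r} already exists for {authority!r}")
        if len(pool["slices"]) >= pool["capacity"]:
            raise RuntimeError(f"slice pool for {authority!r} is exhausted")
        pool["slices"].add(slice_name)

    def slices(self, authority):
        """Return the sorted slice names created from an authority's pool."""
        return sorted(self._pools[authority]["slices"])


# Usage with made-up authority names: the same slice name can exist under
# two authorities because each authority draws from its own pool.
registry = SlicePoolRegistry()
registry.add_pool("plc", capacity=2)
registry.add_pool("private", capacity=1)
registry.create_slice("plc", "s1")
registry.create_slice("private", "s1")
```

Keeping the pools separate means that one authority exhausting its pool does not prevent another authority from creating slices on the same node.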