Mirroring Puppet Forge with Pulp
Introduction
One of the biggest changes in Pulp v2 is the ability to manage non-RPM content types. The first use case we had in mind for this is to inventory Puppet modules and serve them from the Pulp server. This includes both uploading modules into the Pulp server as well as selectively downloading and keeping up to date modules served at Puppet Forge.
Keep in mind that this is an early preview of code I’m still working on and is subject to changes in the coming weeks.
Repository Creation & Configuration
Pulp organizes content into repositories. The granularity of a repository is up to the user. A typical usage is to stage content from a live stream (such as Puppet Forge), to a testing environment, ultimately migrating the tested modules into a production usage. This paradigm can be modeled using separate Pulp repositories and copying the modules between them as appropriate.
Pulp will use the modules.json file located at Puppet Forge to determine which modules should be downloaded. Technically speaking, there is nothing that limits Pulp to using Puppet Forge. Any host can be used so long as the modules.json format is supported. Additionally, modules.json offers a limited query syntax for scoping which modules are included in its metadata. Pulp uses this query syntax to limit a repository to contain only certain modules.
For existing Pulp users, the pulp-admin client is undergoing some changes to support both RPM and Puppet content (and any further additions we support in the future). All of the Puppet repository related commands are located under the puppet
section in the root of the client.
$ pulp-admin puppet repo Usage: pulp-admin repo [SUB_SECTION, ..] COMMAND Description: repository lifecycle commands Available Sections: copy - copy modules from one repository into another group - repository group lifecycle commands publish - run, schedule, or view the status of publish tasks remove - remove copied or uploaded modules from a repository sync - run, schedule, or view the status of sync tasks uploads - upload modules into a repository Available Commands: create - creates a new repository delete - deletes a repository list - lists repositories on the Pulp server search - searches for Puppet repositories on the server update - changes metadata on an existing repository |
Creating a repository is done, not surprisingly, through the create
command. This command requires a ID to be specified to uniquely identify the repository in Pulp. While all other configuration is optional, when mirroring Puppet Forge there are two options in particular of interest:
The feed
option is used to indicate the URL of the host from which to download modules. This must refer to the location in which the modules.json file resides. For Puppet Forge, this is simply http://forge.puppetlabs.com
.
The query
argument is used to limit which modules are downloaded into the repository. It may be specified multiple times if a single query is not enough to fully describe all of the desired modules.
For example, the following command will create a repository that will download Apache and MySQL related modules:
$ pulp-admin puppet repo create --repo-id blog-repo --feed http://forge.puppetlabs.com/ --query httpd --query mysql Successfully created repository [blog-repo] |
There are two options related to the serving of Puppet modules from the Pulp server itself. The serve-http
and serve-https
options are used to indicate which of these protocols the Pulp server will make the repository available over. These may be enabled individually or both may be active at once (technically, none can be active in the event a repository should contain modules for use in copying to other repositories but never itself be exposed). By default, Puppet repositories will be served over HTTP but not HTTPS.
The list of repositories is retrieved using the list
command. A more advanced repository search is supported but I won’t cover it here.
$ pulp-admin puppet repo list +----------------------------------------------------------------------+ Repositories +----------------------------------------------------------------------+ Id: blog-repo Display Name: blog-repo Description: None Content Unit Count: 0 |
The repository reflects that there are zero content units (where “unit” is the generic Pulp term for a piece of content it manages) because the modules have not been downloaded yet. This is done through the Pulp “sync” process.
Sync and Publish
A repository sync operation is the process through which the external feed for a repository is contacted to check for changes to its content. On the initial sync, all modules (matching any queries if specified) will be downloaded to the Pulp server and inventoried into its database. On subsequent sync operations, only new modules and new versions of existing modules will be downloaded. Additionally, any modules that were once present in the feed but have been removed will be removed from the Pulp repository as well.
The publish process is the opposite. When Pulp publishes a Puppet repository, it takes the current modules in the repository and serves them over HTTP and/or HTTPS from the Pulp server. This includes modules added to a repository from a sync operation as well as those uploaded by a user or copied from another Pulp repository.
By default, this publish operation happens automatically on the tail end of a successful sync operation. Users may trigger a publish explicitly in cases where only local changes to a repository have been made (upload, copy) or if there is no external feed configured for the repository.
A sync can be run immediately using the sync run
command. By default, the command will continue to poll the Pulp server and display the progress as it synchronizes and publishes the repository. This output can be skipped using the --bg
command or simply by pressing ctrl+c; the sync will continue on the Pulp server.
Below is a sample output from the sync command. Remember that the repository was configured with two queries (httpd and mysql) and is set to only publish the modules over HTTP.
$ pulp-admin puppet repo sync run --repo-id blog-repo +----------------------------------------------------------------------+ Synchronizing Repository [blog-repo] +----------------------------------------------------------------------+ This command may be exited by pressing ctrl+c without affecting the actual operation on the server. Downloading metadata... [==================================================] 100% Metadata Query: 2/2 items ... completed Downloading new modules... [==================================================] 100% Module: 20/20 items ... completed Publishing modules... [==================================================] 100% Module: 20/20 items ... completed Generating repository metadata... [-] ... completed Publishing repository over HTTP... ... completed Publishing repository over HTTPS... ... skipped |
Each repository has a separate location under /pulp/puppet
. Within that directory, the published repository will use the same format as Puppet Forge. Modules are located under a directory with the first letter of the author’s name and then futher under the author’s name. The module name will be–.tar.gz. Additionally, Pulp will generate a modules.json file in the root of the repository, allowing systems that use this for parsing to use a Pulp server instead of Puppet Forge. The modules.json file will be customized to include only modules in the repository when it was published.
Below are some screenshots of navigating the published repository. Firefox hides the “http” portion, but the port 80 line shows the serve-http flag was honored.
Digging into the system and releases directories shows the breakdown by the first character of each author name:
Finally, digging all the way down the hierarchy reveals the modules themselves:
Conclusion
This article only covered the basic concepts for mirroring Puppet Forge: repository creation, configuration, and synchronization/publishing. Pulp offers a number of other features in the areas of searching for modules, uploading user-defined modules, and copying modules between repositories. Future work will involve registering a puppet master with the Pulp server and using Pulp to control the deployment of modules to the puppet master.
This functionality is not yet available in a released version of Pulp (to be honest, I just finished writing the bulk of it in the past few days). However, there are options for playing with this functionality before it makes it into a formal Community Release. See the repository definition files in our repositories for information on how to enable the weekly testing builds or Community Release candidate builds. This early in the feature development, all input is welcome and will likely be incorporated.
For more information, see the Pulp mailing lists or ping me in our chat room (jdob in #pulp on Freenode).