//! A `Source` for registry-based packages.
//! # What's a Registry?
//! [Registries] are central locations where packages can be uploaded to,
//! discovered, and searched for. The purpose of a registry is to have a
//! location that serves as permanent storage for versions of a crate over time.
//! Compared to git sources (see [`GitSource`]), a registry provides many
//! packages as well as many versions simultaneously. Git sources can also
//! have commits deleted through rebasing, whereas registries cannot have
//! their versions deleted.
//! In Cargo, [`RegistryData`] is an abstraction over each kind of actual
//! registry, and [`RegistrySource`] connects those implementations to the
//! [`Source`] trait. Two prominent features these abstractions provide are:
//! * A way to query the metadata of a package from a registry. The metadata
//! comes from the index.
//! * A way to download package contents (a.k.a. source files) that are required
//! when building the package itself.
//! We'll cover each functionality later.
//! [Registries]:
//! [`GitSource`]: super::GitSource
//! # Different Kinds of Registries
//! Cargo provides multiple kinds of registries. Each of them serves the index
//! and package contents in a slightly different way. Namely,
//! * [`LocalRegistry`] --- Serves the index and package contents entirely on
//! a local filesystem.
//! * [`RemoteRegistry`] --- Serves the index ahead of time from a Git
//! repository, and package contents are downloaded as needed.
//! * [`HttpRegistry`] --- Serves both the index and package contents on demand
//! over an HTTP-based registry API. This is the default starting from 1.70.0.
//! Each registry has its own [`RegistryData`] implementation, and can be
//! created from either [`RegistrySource::local`] or [`RegistrySource::remote`].
//! [`LocalRegistry`]: local::LocalRegistry
//! [`RemoteRegistry`]: remote::RemoteRegistry
//! [`HttpRegistry`]: http_remote::HttpRegistry
//! # The Index of a Registry
//! One of the major difficulties with a registry is that hosting so many
//! packages may quickly run into performance problems when dealing with
//! dependency graphs. It's infeasible for cargo to download the entire contents
//! of the registry just to resolve one package's dependencies, for example. As
//! a result, cargo needs some efficient method of querying what packages are
//! available on a registry, what versions are available, and what the
//! dependencies for each version are.
//! To solve the problem, a registry must provide an index of package metadata.
//! The index of a registry is essentially an easily query-able version of the
//! registry's database for a list of versions of a package as well as a list
//! of dependencies for each version. The exact format of the index is
//! described later.
//! See the [`index`] module for topics about the management, parsing, caching,
//! and versioning for the on-disk index.
//! ## The Format of The Index
//! The index is a store for the list of versions for all packages known, so its
//! format on disk is optimized slightly to ensure that `ls registry` doesn't
//! produce a list of all packages ever known. The index also wants to ensure
//! that there's not a million files which may actually end up hitting
//! filesystem limits at some point. To this end, a few decisions were made
//! about the format of the registry:
//! 1. Each crate will have one file corresponding to it. Each version for a
//! crate will just be a line in this file (see [`IndexPackage`] for its
//! representation).
//! 2. There will be two tiers of directories for crate names, under which
//! crates corresponding to those tiers will be located.
//! (See [`cargo_util::registry::make_dep_path`] for the implementation of
//! this layout hierarchy.)
//! For example, an index hierarchy might look like this:
//! ```notrust
//! .
//! ├── 3
//! │   └── u
//! │       └── url
//! ├── bz
//! │   └── ip
//! │       └── bzip2
//! ├── config.json
//! ├── en
//! │   └── co
//! │       └── encoding
//! └── li
//!     ├── bg
//!     │   └── libgit2
//!     └── nk
//!         └── link-config
//! ```
//! The root of the index contains a `config.json` file with a few entries
//! corresponding to the registry (see [`RegistryConfig`] below).
//! Otherwise, there are three numbered directories (1, 2, 3) for crates with
//! names 1, 2, and 3 characters in length. The 1/2 directories simply have the
//! crate files underneath them, while the 3 directory is sharded by the first
//! letter of the crate name.
//! Otherwise the top-level directory contains many two-letter directory names,
//! each of which has many sub-folders with two letters. At the end of all these
//! are the actual crate files themselves.
//! The purpose of this layout is to cut down on `ls` sizes and to allow
//! efficient lookup based on the crate name itself.
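/*!

The sharding rules described above can be sketched as a small function.
This is only an illustration under stated assumptions: `index_rel_path` is a
hypothetical name, not the `cargo_util::registry::make_dep_path`
implementation, and it assumes crate names are non-empty ASCII.

```rust
// Hypothetical helper, not Cargo's API: maps a crate name to its relative
// path inside the index, following the layout described in this section.
fn index_rel_path(name: &str) -> Option<String> {
    // Assumption: names are non-empty ASCII, so byte slicing is safe.
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    Some(match name.len() {
        // One- and two-character names sit directly under `1/` and `2/`.
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        // Three-character names are sharded by their first letter.
        3 => format!("3/{}/{name}", &name[..1]),
        // Everything else uses two tiers of two-letter directories.
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    })
}
```

With this sketch, `index_rel_path("url")` gives `3/u/url` and
`index_rel_path("bzip2")` gives `bz/ip/bzip2`, matching the example tree.

*/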
//! See [The Cargo Book: Registry Index][registry-index] for the public
//! interface on the index format.
//! [registry-index]:
//! ## The Index Files
//! Each file in the index is the history of one crate over time. Each line in
//! the file corresponds to one version of a crate, stored in JSON format (see
//! the [`IndexPackage`] structure).
//! As new versions are published, new lines are appended to this file. **The
//! only modifications to this file that should happen over time are yanks of a
//! particular version.**
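/*!

The append-only rule can be modeled in memory as follows. This is a rough
sketch, not how Cargo or a registry stores data: `VersionLine` and
`CrateFile` are hypothetical types, and each real JSON line is reduced to the
two fields the sketch needs.

```rust
// Hypothetical in-memory model of one index file; the real entries are
// JSON lines, reduced here to a version string and a yanked flag.
#[derive(Debug, Clone, PartialEq)]
struct VersionLine {
    vers: String,
    yanked: bool,
}

#[derive(Debug, Default)]
struct CrateFile {
    lines: Vec<VersionLine>,
}

impl CrateFile {
    // Publishing appends a new line; an existing version is never replaced.
    fn publish(&mut self, vers: &str) -> Result<(), String> {
        if self.lines.iter().any(|l| l.vers == vers) {
            return Err(format!("version {vers} already exists"));
        }
        self.lines.push(VersionLine {
            vers: vers.to_string(),
            yanked: false,
        });
        Ok(())
    }

    // Yanking is the only in-place change: it flips a flag and keeps the line.
    // Returns whether the version was found.
    fn set_yanked(&mut self, vers: &str, yanked: bool) -> bool {
        match self.lines.iter_mut().find(|l| l.vers == vers) {
            Some(line) => {
                line.yanked = yanked;
                true
            }
            None => false,
        }
    }
}
```

The point of the design is that a yank never removes a line, so lock files
that already reference a yanked version can still be resolved.

*/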
//! # Downloading Packages
//! The purpose of the index was to provide an efficient method to resolve the
//! dependency graph for a package. After resolution has been performed, we need
//! to download the contents of packages so we can read the full manifest and
//! build the source code.
//! To accomplish this, [`RegistryData::download`] will "make" an HTTP request
//! per-package requested to download tarballs into a local cache. These
//! tarballs will then be unpacked into a destination folder.
//! Note that because versions uploaded to the registry are frozen forever, the
//! HTTP download and unpacking can be skipped entirely if the version has
//! already been downloaded and unpacked. This caching allows us to download a
//! package only when absolutely necessary.
//! # Filesystem Hierarchy
//! Overall, `$HOME/.cargo` looks like this with respect to the registry
//! (remote registries, specifically):
//! ```notrust
//! # A folder under which all registry metadata is hosted (similar to
//! # $HOME/.cargo/git)
//! $HOME/.cargo/registry/
//!
//!     # For each registry that cargo knows about (keyed by hostname + hash)
//!     # there is a folder which is the checked out version of the index for
//!     # the registry in this location. Note that this is done so cargo can
//!     # support multiple registries simultaneously
//!     index/
//!         registry1-<hash>/
//!         registry2-<hash>/
//!         ...
//!
//!     # This folder is a cache for all downloaded tarballs (`.crate` file)
//!     # from a registry. Once downloaded and verified, a tarball never changes.
//!     cache/
//!         registry1-<hash>/<pkg>-<version>.crate
//!         ...
//!
//!     # Location in which all tarballs are unpacked. Each tarball is known to
//!     # be frozen after downloading, so transitively this folder is also
//!     # frozen once it's unpacked (it's never unpacked again)
//!     # CAVEAT: They are not read-only. See rust-lang/cargo#9455.
//!     src/
//!         registry1-<hash>/<pkg>-<version>/...
//!         ...
//! ```
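/*!

As a rough illustration of the hierarchy above, the three per-package
locations can be derived from the cargo home directory. `PackagePaths` and
`package_paths` are hypothetical names used only for this sketch, and
`registry_dir` stands for the `<name>-<hash>` directory of one registry.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper type: where one package's files live under the
// cargo home directory, following the hierarchy shown above.
struct PackagePaths {
    index: PathBuf,
    crate_file: PathBuf,
    src_dir: PathBuf,
}

fn package_paths(
    cargo_home: &Path,
    registry_dir: &str,
    pkg: &str,
    version: &str,
) -> PackagePaths {
    let registry = cargo_home.join("registry");
    PackagePaths {
        // Checked-out index of this registry.
        index: registry.join("index").join(registry_dir),
        // Cached tarball, never changed once downloaded and verified.
        crate_file: registry
            .join("cache")
            .join(registry_dir)
            .join(format!("{pkg}-{version}.crate")),
        // Directory the tarball is unpacked into.
        src_dir: registry
            .join("src")
            .join(registry_dir)
            .join(format!("{pkg}-{version}")),
    }
}
```

Keying every directory by the same `<name>-<hash>` segment is what lets
several registries coexist under one cargo home.

*/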
//! [`IndexPackage`]: index::IndexPackage
use std::collections::HashSet;
use std::fs;
use std::fs::{File, OpenOptions};
use std::io;
use std::io::Read;
use std::io::Write;
use std::path::{Path, PathBuf};
use std::task::{ready, Poll};
use anyhow::Context as _;
use cargo_util::paths::{self, exclude_from_backups_and_indexing};
use flate2::read::GzDecoder;
use serde::Deserialize;
use serde::Serialize;
use tar::Archive;
use tracing::debug;
use crate::core::dependency::Dependency;
use crate::core::global_cache_tracker;
use crate::core::{Package, PackageId, SourceId};
use crate::sources::source::MaybePackage;
use crate::sources::source::QueryKind;
use crate::sources::source::Source;
use crate::sources::PathSource;
use crate::util::cache_lock::CacheLockMode;
use crate::util::interning::InternedString;
use crate::util::network::PollExt;
use crate::util::{hex, VersionExt};
use crate::util::{restricted_names, CargoResult, Filesystem, GlobalContext, LimitErrorReader};
/// The `.cargo-ok` file is used to track if the source is already unpacked.
/// See [`RegistrySource::unpack_package`] for more.
/// Not to be confused with `.cargo-ok` file in git sources.
const PACKAGE_SOURCE_LOCK: &str = ".cargo-ok";
pub const CRATES_IO_INDEX: &str = "";
pub const CRATES_IO_HTTP_INDEX: &str = "sparse+";
pub const CRATES_IO_REGISTRY: &str = "crates-io";
pub const CRATES_IO_DOMAIN: &str = "";
/// The content inside `.cargo-ok`.
/// See [`RegistrySource::unpack_package`] for more.
#[derive(Deserialize, Serialize)]
#[serde(rename_all = "kebab-case")]
struct LockMetadata {
    /// The version of `.cargo-ok` file
    v: u32,
}
/// A [`Source`] implementation for a local or a remote registry.
/// This contains common functionality that is shared between each registry
/// kind, with the registry-specific logic implemented as part of the
/// [`RegistryData`] trait referenced via the `ops` field.
/// For general concepts of registries, see the [module-level documentation](crate::sources::registry).
pub struct RegistrySource<'gctx> {
/// A unique name of the source (typically used as the directory name
/// where its cached content is stored).
name: InternedString,
/// The unique identifier of this source.
source_id: SourceId,
/// The path where crate files are extracted (`$CARGO_HOME/registry/src/$REG-HASH`).
src_path: Filesystem,
/// Local reference to [`GlobalContext`] for convenience.
gctx: &'gctx GlobalContext,
/// Abstraction for interfacing to the different registry kinds.
ops: Box<dyn RegistryData + 'gctx>,
/// Interface for managing the on-disk index.
index: index::RegistryIndex<'gctx>,
/// A set of packages that should be allowed to be used, even if they are
/// yanked.
/// This is populated from the entries in `Cargo.lock` to ensure that
/// `cargo update somepkg` won't unlock yanked entries in `Cargo.lock`.
/// Otherwise, the resolver would think that those entries no longer
/// exist, and it would trigger updates to unrelated packages.
yanked_whitelist: HashSet<PackageId>,
/// Yanked versions that have already been selected during queries.
/// Currently, this is used to avoid emitting the `--precise <yanked>`
/// warning twice, on the assumption that (`dep.package_name()` + `--precise`
/// version) is sufficient to uniquely identify the same query result.
selected_precise_yanked: HashSet<(InternedString, semver::Version)>,
}
/// The [`config.json`] file stored in the index.
/// The config file may look like:
/// ```json
/// {
/// "dl": "{crate}/{version}/download",
/// "api": "",
/// "auth-required": false # unstable feature (RFC 3139)
/// }
/// ```
/// [`config.json`]:
#[derive(Deserialize, Debug, Clone)]
#[serde(rename_all = "kebab-case")]
pub struct RegistryConfig {
/// Download endpoint for all crates.
/// The string is a template which will generate the download URL for the
/// tarball of a specific version of a crate. The substrings `{crate}` and
/// `{version}` will be replaced with the crate's name and version
/// respectively. The substring `{prefix}` will be replaced with the
/// crate's prefix directory name, and the substring `{lowerprefix}` will
/// be replaced with the crate's prefix directory name converted to
/// lowercase. The substring `{sha256-checksum}` will be replaced with the
/// crate's sha256 checksum.
/// For backwards compatibility, if the string does not contain any
/// markers (`{crate}`, `{version}`, `{prefix}`, or `{lowerprefix}`), it
/// will be extended with `/{crate}/{version}/download` to
/// support registries like crates.io which were created before the
/// templating setup was created.
/// For more on the template of the download URL, see Index Configuration
/// in The Cargo Book.
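/**

The substitution rules described here can be sketched as below. This is an
illustration only: `expand_dl` is a hypothetical function, not Cargo's
implementation, and it follows the marker list exactly as documented in this
comment. It also does not guard against a substituted value that itself
contains a marker.

```rust
// Hypothetical sketch of expanding the `dl` template; not Cargo's code.
fn expand_dl(dl: &str, name: &str, version: &str, prefix: &str, cksum: &str) -> String {
    const MARKERS: [&str; 4] = ["{crate}", "{version}", "{prefix}", "{lowerprefix}"];
    // Backwards compatibility: without any marker, the legacy suffix is appended.
    let template = if MARKERS.iter().any(|m| dl.contains(*m)) {
        dl.to_string()
    } else {
        format!("{dl}/{{crate}}/{{version}}/download")
    };
    template
        .replace("{crate}", name)
        .replace("{version}", version)
        .replace("{prefix}", prefix)
        .replace("{lowerprefix}", &prefix.to_lowercase())
        .replace("{sha256-checksum}", cksum)
}
```

For a marker-free value such as the made-up `https://example.com/dl`, the
sketch yields `https://example.com/dl/<crate>/<version>/download`.

*/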
pub dl: String,
/// API endpoint for the registry. This is what's actually hit to perform
/// operations like yanks, owner modifications, publish new crates, etc.
/// If this is None, the registry does not support API commands.
pub api: Option<String>,
/// Whether all operations require authentication. See [RFC 3139].
/// [RFC 3139]:
pub auth_required: bool,
}
/// Result from loading data from a registry.
pub enum LoadResponse {
/// The cache is valid. The cached data should be used.
/// The cache is out of date. Returned data should be used.
Data {
raw_data: Vec<u8>,
/// Version of this data to determine whether it is out of date.
index_version: Option<String>,
/// The requested crate was found.
/// An abstract interface to handle both a local and remote registry.
/// This allows [`RegistrySource`] to abstractly handle each registry kind.
/// For general concepts of registries, see the [module-level documentation](crate::sources::registry).
pub trait RegistryData {
/// Performs initialization for the registry.
/// This should be safe to call multiple times; the implementation is
/// expected to not do any work if it is already prepared.
fn prepare(&self) -> CargoResult<()>;
/// Returns the path to the index.
/// Note that different registries store the index in different formats
/// (remote = git, http & local = files).
fn index_path(&self) -> &Filesystem;
/// Loads the JSON for a specific named package from the index.
/// * `root` is the root path to the index.
/// * `path` is the relative path to the package to load (like `ca/rg/cargo`).
/// * `index_version` is the version of the requested crate data currently
/// in cache. This is useful for checking if a local cache is outdated.
fn load(
&mut self,
root: &Path,
path: &Path,
index_version: Option<&str>,
) -> Poll<CargoResult<LoadResponse>>;
/// Loads the `config.json` file and returns it.
/// Local registries don't have a config, and return `None`.
fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>>;
/// Invalidates locally cached data.
fn invalidate_cache(&mut self);
/// If quiet, the source should not display any progress or status messages.
fn set_quiet(&mut self, quiet: bool);
/// Is the local cached data up-to-date?
fn is_updated(&self) -> bool;
/// Prepare to start downloading a `.crate` file.
/// Despite the name, this doesn't actually download anything. If the
/// `.crate` is already downloaded, then it returns [`MaybeLock::Ready`].
/// If it hasn't been downloaded, then it returns [`MaybeLock::Download`]
/// which contains the URL to download. The [`crate::core::package::Downloads`]
/// system handles the actual download process. After downloading, it
/// calls [`Self::finish_download`] to save the downloaded file.
/// `checksum` is currently only used by local registries to verify the
/// file contents (because local registries never actually download
/// anything). Remote registries will validate the checksum in
/// `finish_download`. For already downloaded `.crate` files, it does not
/// validate the checksum, assuming the filesystem does not suffer from
/// corruption or manipulation.
fn download(&mut self, pkg: PackageId, checksum: &str) -> CargoResult<MaybeLock>;
/// Finish a download by saving a `.crate` file to disk.
/// After [`crate::core::package::Downloads`] has finished a download,
/// it will call this to save the `.crate` file. This is only relevant
/// for remote registries. This should validate the checksum and save
/// the given data to the on-disk cache.
/// Returns a [`File`] handle to the `.crate` file, positioned at the start.
fn finish_download(&mut self, pkg: PackageId, checksum: &str, data: &[u8])
-> CargoResult<File>;
/// Returns whether or not the `.crate` file is already downloaded.
fn is_crate_downloaded(&self, _pkg: PackageId) -> bool {
/// Validates that the global package cache lock is held.
/// Given the [`Filesystem`], this will make sure that the package cache
/// lock is held. If not, it will panic. See
/// [`GlobalContext::acquire_package_cache_lock`] for acquiring the global lock.
/// Returns the [`Path`] to the [`Filesystem`].
fn assert_index_locked<'a>(&self, path: &'a Filesystem) -> &'a Path;
/// Block until all outstanding `Poll::Pending` requests are `Poll::Ready`.
fn block_until_ready(&mut self) -> CargoResult<()>;
/// The status of [`RegistryData::download`] which indicates if a `.crate`
/// file has already been downloaded, or if not then the URL to download.
pub enum MaybeLock {
/// The `.crate` file is already downloaded. [`File`] is a handle to the
/// opened `.crate` file on the filesystem.
/// The `.crate` file is not downloaded, here's the URL to download it from.
/// `descriptor` is just a text string to display to the user of what is
/// being downloaded.
Download {
url: String,
descriptor: String,
authorization: Option<String>,
mod download;
mod http_remote;
mod index;
pub use index::IndexSummary;
mod local;
mod remote;
/// Generates a unique name for a [`SourceId`] so that each source has a
/// unique path in which to store its index files.
fn short_name(id: SourceId, is_shallow: bool) -> String {
// CAUTION: This should not change between versions. If you change how
// this is computed, it will orphan previously cached data, forcing the
// cache to be rebuilt and potentially wasting significant disk space. If
// you change it, be cautious of the impact. See `test_cratesio_hash` for
// a similar discussion.
let hash = hex::short_hash(&id);
let ident = id.url().host_str().unwrap_or("").to_string();
let mut name = format!("{}-{}", ident, hash);
if is_shallow {
impl<'gctx> RegistrySource<'gctx> {
/// Creates a [`Source`] of a "remote" registry.
/// It could be either an HTTP-based [`http_remote::HttpRegistry`] or
/// a Git-based [`remote::RemoteRegistry`].
/// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
pub fn remote(
source_id: SourceId,
yanked_whitelist: &HashSet<PackageId>,
gctx: &'gctx GlobalContext,
) -> CargoResult<RegistrySource<'gctx>> {
let name = short_name(
.map_or(false, |features| features.shallow_index)
&& !source_id.is_sparse(),
let ops = if source_id.is_sparse() {
Box::new(http_remote::HttpRegistry::new(source_id, gctx, &name)?) as Box<_>
} else {
Box::new(remote::RemoteRegistry::new(source_id, gctx, &name)) as Box<_>
/// Creates a [`Source`] of a local registry, with [`local::LocalRegistry`] under the hood.
/// * `path` --- The root path of a local registry on the file system.
/// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
pub fn local(
source_id: SourceId,
path: &Path,
yanked_whitelist: &HashSet<PackageId>,
gctx: &'gctx GlobalContext,
) -> RegistrySource<'gctx> {
let name = short_name(source_id, false);
let ops = local::LocalRegistry::new(path, gctx, &name);
RegistrySource::new(source_id, gctx, &name, Box::new(ops), yanked_whitelist)
/// Creates a source of a registry. This is an inner helper function.
/// * `name` --- Name of a path segment which may affect where `.crate`
/// tarballs, the registry index and cache are stored. Expected to be unique.
/// * `ops` --- The underlying [`RegistryData`] type.
/// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked.
fn new(
source_id: SourceId,
gctx: &'gctx GlobalContext,
name: &str,
ops: Box<dyn RegistryData + 'gctx>,
yanked_whitelist: &HashSet<PackageId>,
) -> RegistrySource<'gctx> {
RegistrySource {
name: name.into(),
src_path: gctx.registry_source_path().join(name),
index: index::RegistryIndex::new(source_id, ops.index_path(), gctx),
yanked_whitelist: yanked_whitelist.clone(),
selected_precise_yanked: HashSet::new(),
/// Decode the [configuration](RegistryConfig) stored within the registry.
/// This requires that the index has been at least checked out.
pub fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>> {
/// Unpacks a downloaded package into a location where it's ready to be
/// compiled.
/// No action is taken if the source looks like it's already unpacked.
/// # History of interruption detection with `.cargo-ok` file
/// Cargo has always included a `.cargo-ok` file ([`PACKAGE_SOURCE_LOCK`])
/// to detect if extraction was interrupted, but it was originally empty.
/// In 1.34, Cargo was changed to create the `.cargo-ok` file before it
/// started extraction to implement fine-grained locking. After it was
/// finished extracting, it wrote two bytes to indicate it was complete.
/// It would use the length check to detect if it was possibly interrupted.
/// In 1.36, Cargo changed to not use fine-grained locking, and instead used
/// a global lock. The use of `.cargo-ok` was no longer needed for locking
/// purposes, but was kept to detect when extraction was interrupted.
/// In 1.49, Cargo changed to not create the `.cargo-ok` file before it
/// started extraction to deal with `.crate` files that inexplicably had
/// a `.cargo-ok` file in them.
/// In 1.64, Cargo changed to detect `.crate` files with `.cargo-ok` files
/// in them in response to [CVE-2022-36113], which dealt with malicious
/// `.crate` files making `.cargo-ok` a symlink causing cargo to write "ok"
/// to any arbitrary file on the filesystem it has permission to.
/// In 1.71, `.cargo-ok` changed to contain a JSON `{ v: 1 }` to indicate
/// the version of it. A failure of parsing will result in a heavy-hammer
/// approach that unpacks the `.crate` file again. This is in response to a
/// security issue that the unpacking didn't respect umask on Unix systems.
/// This is all a long-winded way of explaining the circumstances that might
/// cause a directory to contain a `.cargo-ok` file that is empty or
/// otherwise corrupted. If it was extracted by a version of Rust before
/// 1.34, everything should be fine. However, an empty file created by
/// versions 1.36 to 1.49 indicates that the extraction was interrupted and
/// that we need to start again.
/// Another possibility is that the filesystem is simply corrupted, in
/// which case deleting the directory might be the safe thing to do. That
/// is probably unlikely, though.
/// To be safe, we delete the directory and start over again if an empty
/// `.cargo-ok` file is found.
/// [CVE-2022-36113]:
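/**

The decision this history leads to can be summarized in a small sketch.
`LockState` and `classify` are hypothetical names for illustration; the
method itself parses the file as JSON, which the sketch approximates with
an exact string comparison.

```rust
// Hypothetical classification of `.cargo-ok` content.
#[derive(Debug, PartialEq)]
enum LockState {
    // No file: nothing has been unpacked here yet.
    Missing,
    // Current format: unpacking completed, the directory can be reused.
    Complete,
    // Anything else (empty, legacy `ok`, garbage): clear and unpack again.
    Stale,
}

fn classify(content: Option<&str>) -> LockState {
    match content {
        None => LockState::Missing,
        // Assumption: version 1 serializes exactly as `{"v":1}`.
        Some(s) if s.trim() == r#"{"v":1}"# => LockState::Complete,
        Some(_) => LockState::Stale,
    }
}
```

Treating every unrecognized content as stale is the "heavy-hammer" choice
described above: it costs a re-extraction but never trusts a partial unpack.

*/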
fn unpack_package(&self, pkg: PackageId, tarball: &File) -> CargoResult<PathBuf> {
let package_dir = format!("{}-{}", pkg.name(), pkg.version());
let dst = self.src_path.join(&package_dir);
let path = dst.join(PACKAGE_SOURCE_LOCK);
let path = self
.assert_package_cache_locked(CacheLockMode::DownloadExclusive, &path);
let unpack_dir = path.parent().unwrap();
match fs::read_to_string(path) {
Ok(ok) => match serde_json::from_str::<LockMetadata>(&ok) {
Ok(lock_meta) if lock_meta.v == 1 => {
.mark_registry_src_used(global_cache_tracker::RegistrySrc {
package_dir: package_dir.into(),
size: None,
return Ok(unpack_dir.to_path_buf());
_ => {
if ok == "ok" {
tracing::debug!("old `ok` content found, clearing cache");
} else {
tracing::warn!("unrecognized .cargo-ok content, clearing cache: {ok}");
// See comment of `unpack_package` about why removing all stuff.
Err(e) if e.kind() == io::ErrorKind::NotFound => {}
Err(e) => anyhow::bail!("unable to read .cargo-ok file at {path:?}: {e}"),
let mut tar = {
let size_limit = max_unpack_size(self.gctx, tarball.metadata()?.len());
let gz = GzDecoder::new(tarball);
let gz = LimitErrorReader::new(gz, size_limit);
let mut tar = Archive::new(gz);
set_mask(&mut tar);
let mut bytes_written = 0;
let prefix = unpack_dir.file_name().unwrap();
let parent = unpack_dir.parent().unwrap();
for entry in tar.entries()? {
let mut entry = entry.with_context(|| "failed to iterate over archive")?;
let entry_path = entry
.with_context(|| "failed to read entry path")?
// We're going to unpack this tarball into the global source
// directory, but we want to make sure that it doesn't accidentally
// (or maliciously) overwrite source code from other crates. Cargo
// itself should never generate a tarball that hits this error, and
// should also block uploads with these sorts of tarballs,
// but be extra sure by adding a check here as well.
if !entry_path.starts_with(prefix) {
"invalid tarball downloaded, contains \
a file at {:?} which isn't under {:?}",
// Prevent unpacking the lockfile from the crate itself.
if entry_path
.map_or(false, |p| p == PACKAGE_SOURCE_LOCK)
// Unpacking failed
bytes_written += entry.size();
let mut result = entry.unpack_in(parent).map_err(anyhow::Error::from);
if cfg!(windows) && restricted_names::is_windows_reserved_path(&entry_path) {
result = result.with_context(|| {
"`{}` appears to contain a reserved Windows path, \
it cannot be extracted on Windows",
.with_context(|| format!("failed to unpack entry at `{}`", entry_path.display()))?;
// Now that we've finished unpacking, create and write to the lock file to indicate that
// unpacking was successful.
let mut ok = OpenOptions::new()
.with_context(|| format!("failed to open `{}`", path.display()))?;
let lock_meta = LockMetadata { v: 1 };
write!(ok, "{}", serde_json::to_string(&lock_meta).unwrap())?;
.mark_registry_src_used(global_cache_tracker::RegistrySrc {
package_dir: package_dir.into(),
size: Some(bytes_written),
/// Turns the downloaded `.crate` tarball file into a [`Package`].
/// This unconditionally sets the checksum for the returned package, so it
/// should only be called after an integrity check. That is to say,
/// you need to call either [`RegistryData::download`] or
/// [`RegistryData::finish_download`] before calling this method.
fn get_pkg(&mut self, package: PackageId, path: &File) -> CargoResult<Package> {
let path = self
.unpack_package(package, path)
.with_context(|| format!("failed to unpack package `{}`", package))?;
let mut src = PathSource::new(&path, self.source_id, self.gctx);
let mut pkg = match src.download(package)? {
MaybePackage::Ready(pkg) => pkg,
MaybePackage::Download { .. } => unreachable!(),
// After we've loaded the package configure its summary's `checksum`
// field with the checksum we know for this `PackageId`.
let cksum = self
.hash(package, &mut *self.ops)
.expect("a downloaded dep now pending!?")
.expect("summary not found");
impl<'gctx> Source for RegistrySource<'gctx> {
fn query(
&mut self,
dep: &Dependency,
kind: QueryKind,
f: &mut dyn FnMut(IndexSummary),
) -> Poll<CargoResult<()>> {
let mut req = dep.version_req().clone();
// Handle `cargo update --precise` here.
if let Some((_, requested)) = self
.filter(|(c, to)| {
if to.is_prerelease() && self.gctx.cli_unstable().unstable_options {
} else {
let mut called = false;
let callback = &mut |s| {
called = true;
// If this is a locked dependency, then it came from a lock file and in
// theory the registry is known to contain this version. If, however, we
// come back with no summaries, then our registry may need to be
// updated, so we fall back to performing a lazy update.
if kind == QueryKind::Exact && req.is_locked() && !self.ops.is_updated() {
debug!("attempting query without update");
.query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| {
if dep.matches(s.as_summary()) {
// We are looking for a package from a lock file so we do not care about yank
if called {
} else {
debug!("falling back to an update");
} else {
let mut precise_yanked_in_use = false;
.query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| {
let matched = match kind {
QueryKind::Exact => {
if req.is_precise() && self.gctx.cli_unstable().unstable_options {
} else {
QueryKind::Alternatives => true,
QueryKind::Normalized => true,
if !matched {
// Next filter out all yanked packages. Some yanked packages may
// leak through if they're in a whitelist (aka if they were
// previously in `Cargo.lock`).
if !s.is_yanked() {
} else if self.yanked_whitelist.contains(&s.package_id()) {
} else if req.is_precise() {
precise_yanked_in_use = true;
if precise_yanked_in_use {
let name = dep.package_name();
let version = req
.expect("--precise <yanked-version> in use");
if self.selected_precise_yanked.insert((name, version.clone())) {
let mut shell = self.gctx.shell();
"selected package `{name}@{version}` was yanked by the author"
shell.note("if possible, try a compatible non-yanked version")?;
if called {
return Poll::Ready(Ok(()));
let mut any_pending = false;
if kind == QueryKind::Alternatives || kind == QueryKind::Normalized {
// Attempt to handle misspellings by searching for a chain of related
// names to the original name. The resolver will later
// reject any candidates that have the wrong name, and along the way
// this will produce helpful "did you mean?" suggestions.
// For now we only try canonicalizing `-` to `_` and vice versa.
// More advanced fuzzy searching may come in the future.
for name_permutation in [
dep.package_name().replace('-', "_"),
dep.package_name().replace('_', "-"),
] {
let name_permutation = InternedString::new(&name_permutation);
if name_permutation == dep.package_name() {
any_pending |= self
.query_inner(name_permutation, &req, &mut *self.ops, f)?
if any_pending {
} else {
fn supports_checksums(&self) -> bool {
fn requires_precise(&self) -> bool {
fn source_id(&self) -> SourceId {
fn invalidate_cache(&mut self) {
fn set_quiet(&mut self, quiet: bool) {
fn download(&mut self, package: PackageId) -> CargoResult<MaybePackage> {
let hash = loop {
match self.index.hash(package, &mut *self.ops)? {
Poll::Pending => self.block_until_ready()?,
Poll::Ready(hash) => break hash,
match self.ops.download(package, hash)? {
MaybeLock::Ready(file) => self.get_pkg(package, &file).map(MaybePackage::Ready),
MaybeLock::Download {
} => Ok(MaybePackage::Download {
fn finish_download(&mut self, package: PackageId, data: Vec<u8>) -> CargoResult<Package> {
let hash = loop {
match self.index.hash(package, &mut *self.ops)? {
Poll::Pending => self.block_until_ready()?,
Poll::Ready(hash) => break hash,
let file = self.ops.finish_download(package, hash, &data)?;
self.get_pkg(package, &file)
fn fingerprint(&self, pkg: &Package) -> CargoResult<String> {
fn describe(&self) -> String {
fn add_to_yanked_whitelist(&mut self, pkgs: &[PackageId]) {
fn is_yanked(&mut self, pkg: PackageId) -> Poll<CargoResult<bool>> {
self.index.is_yanked(pkg, &mut *self.ops)
fn block_until_ready(&mut self) -> CargoResult<()> {
// Before starting to work on the registry, make sure that
// `<cargo_home>/registry` is marked as excluded from indexing and
// backups. Older versions of Cargo didn't do this, so we do it here
// regardless of whether `<cargo_home>` exists.
// This does not use `create_dir_all_excluded_from_backups_atomic` for
// the same reason: we want to exclude it even if the directory already
// exists.
// IO errors in creating and marking it are ignored, e.g. in case we're on a
// read-only filesystem.
let registry_base = self.gctx.registry_base_path();
let _ = registry_base.create_dir();
impl RegistryConfig {
/// File name of [`RegistryConfig`].
const NAME: &'static str = "config.json";
/// Get the maximum unpack size that Cargo permits
/// based on a given `size` of your compressed file.
/// Returns the larger one between `size * max compression ratio`
/// and a fixed max unpacked size.
/// In reality, the compression ratio usually falls in the range of 2:1 to 10:1.
/// We choose 20:1 in the hope of covering almost all realistic cases.
/// Any ratio higher than this is considered a zip bomb.
/// In the future we might want to introduce a configurable size.
/// Some of the real world data from common compression algorithms:
/// * <>
/// * <>
/// * <>
/// * <>
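/**

The rule "the larger of a fixed cap and `size * ratio`" can be sketched in
isolation. `unpack_limit` is a hypothetical simplification: it leaves out the
test-only environment overrides, and the saturating multiplication is an
addition of this sketch rather than something taken from the function below.

```rust
const MAX_UNPACK_SIZE: u64 = 512 * 1024 * 1024; // 512 MiB
const MAX_COMPRESSION_RATIO: u64 = 20; // 20:1

// Hypothetical simplification of the limit computation.
fn unpack_limit(compressed_size: u64) -> u64 {
    // Small archives get the fixed cap; large ones scale with their size.
    u64::max(
        MAX_UNPACK_SIZE,
        compressed_size.saturating_mul(MAX_COMPRESSION_RATIO),
    )
}
```

A 1 KiB tarball is therefore still allowed to unpack up to 512 MiB, while a
100 MiB tarball may unpack up to 2000 MiB.

*/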
fn max_unpack_size(gctx: &GlobalContext, size: u64) -> u64 {
const MAX_UNPACK_SIZE: u64 = 512 * 1024 * 1024; // 512 MiB
const MAX_COMPRESSION_RATIO: usize = 20; // 20:1
let max_unpack_size = if cfg!(debug_assertions) && gctx.get_env(SIZE_VAR).is_ok() {
// For integration test only.
.expect("a max unpack size in bytes")
} else {
let max_compression_ratio = if cfg!(debug_assertions) && gctx.get_env(RATIO_VAR).is_ok() {
// For integration test only.
.expect("a max compression ratio in bytes")
} else {
u64::max(max_unpack_size, size * max_compression_ratio as u64)
/// Set the current [`umask`] value for the given tarball. No-op on non-Unix
/// platforms.
/// On Windows, tar only looks at user permissions and tries to set the "read
/// only" attribute, so no-op as well.
/// [`umask`]:
fn set_mask<R: Read>(tar: &mut Archive<R>) {