Add EXPECTED_UPSTREAM file and corresponding tools

See tools/expected_upstream/README.md for documentation on the file
format and tool usage.

Bug: 111603149
Test: ojluni_modify_expectation add jdk8u/jdk8u121-b13 java.lang.String
Test: tools/expected_upstream/ojluni_refresh_files.py
Change-Id: I0d17cfa6424aa095bded08bca49d5b30d677c1c6
diff --git a/EXPECTED_UPSTREAM b/EXPECTED_UPSTREAM
new file mode 100644
index 0000000..69129aa
--- /dev/null
+++ b/EXPECTED_UPSTREAM
@@ -0,0 +1,19 @@
+# Copyright (C) 2021 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This table has 3 columns, i.e.
+# <destination path in ojluni>,<upstream release version / git-tag>,<source path in the upstream repository>
+
+ojluni/src/main/java/java/lang/NoSuchMethodException.java,jdk11u/jdk-11+28,src/java.base/share/classes/java/lang/NoSuchMethodException.java
+ojluni/src/main/java/java/lang/NullPointerException.java,jdk11u/jdk-11+28,src/java.base/share/classes/java/lang/NullPointerException.java
diff --git a/tools/expected_upstream/README.md b/tools/expected_upstream/README.md
new file mode 100644
index 0000000..e69bd02
--- /dev/null
+++ b/tools/expected_upstream/README.md
@@ -0,0 +1,130 @@
+This folder contains tools to update the files in the aosp/expected_upstream
+branch.
+
+# Prerequisites
+* python3
+* pip3
+* A remote `aosp` is set up in your local git repository
+
+# Directory Layout
+1. ojluni/
+    * It has the same layout as the ojluni/ files in aosp/master.
+    * A file should exist only if the same file path exists in aosp/master
+    and its content comes from the OpenJDK upstream.
+2. EXPECTED_UPSTREAM file
+    * The file format is CSV-like with a `,` separator; lines starting with `#` are comments
+    * The table has 3 columns:
+        1. Destination path in ojluni/
+        2. Expected upstream version. Normally, it's a git tag in the upstream
+        git repositories.
+        3. File path in the git tree specified in the 2nd column.
+3. tools/expected_upstream/
+    * Contains the tools
+
+# Tools
+## tools/expected_upstream/install_tools.sh
+* Installs the dependency libraries and fetches the upstream-openjdk* branches from `aosp`
+* Defines the other tools as aliases in your current shell, so run it with `source`
+
+## ojluni_modify_expectation
+* A command-line tool for modifying the EXPECTED_UPSTREAM file
+
+## ojluni_refresh_files
+* Reads the EXPECTED_UPSTREAM file and updates the file contents in ojluni/
+accordingly
+
+# Command-line workflow
+## Setup
+1. Switch to the expected_upstream branch
+```shell
+git branch local_expected_upstream aosp/expected_upstream
+git checkout local_expected_upstream
+```
+
+2. Install tools
+```shell
+source ./tools/expected_upstream/install_tools.sh
+```
+## Upgrade a java class to a higher OpenJDK version
+For example, to upgrade `java.lang.String` to the jdk-11+28 version:
+
+```shell
+ojluni_modify_expectation modify java.lang.String jdk11u/jdk-11+28
+ojluni_refresh_files
+```
+
+Or, if `java.lang.String` is missing from EXPECTED_UPSTREAM:
+```shell
+ojluni_modify_expectation add jdk11u/jdk-11+28 java.lang.String
+ojluni_refresh_files
+```
+Two commits should be created to update `ojluni/src/main/java/java/lang/String.java`.
+You can verify and view the diff with the following command:
+
+```shell
+git diff aosp/expected_upstream -- ojluni/src/main/java/java/lang/String.java
+```
+
+You can then upload your change to AOSP gerrit.
+```shell
+repo upload --cbr -t . # -t sets a topic to the CLs in the gerrit
+```
+
+Remember to commit your EXPECTED_UPSTREAM change in a new commit:
+```shell
+git commit -- EXPECTED_UPSTREAM
+```
+
+Then upload your change to AOSP gerrit.
+```shell
+repo upload --cbr -t . # -t sets a topic to the CLs in the gerrit
+```
+
+Then you can switch back to your local `master` branch to apply the changes
+```shell
+git checkout <local_master_branch>
+git merge local_expected_upstream
+# Resolve any merge conflict
+git commit --amend # Amend the commit message and add the bug number you are working on
+repo upload .
+```
+
+## Add a java test from the upstream
+
+The process is similar to the above commands, but needs to run
+`ojluni_modify_expectation` with an `add` subcommand.
+
+For example, to add a test for the `String.isEmpty()` method:
+```shell
+ojluni_modify_expectation add jdk8u/jdk8u121-b13 java.lang.String.IsEmpty
+```
+Note: java.lang.String.IsEmpty is a test class in the upstream repository.
+
+
+# Known bugs
+* `repo upload` may fail because Gerrit returns an error.
+    1. Just try to run `repo upload` again!
+        * The initial upload takes a long time because it tries to sync with the
+          remote AOSP gerrit server. The second upload is much faster and thus
+          it may succeed.
+    2. `repo upload` may report a TimeOutException even though the CL has been
+       uploaded. Find your CL at http://r.android.com/. See http://b/202848945
+    3. Try to upload the merge commits one by one:
+    ```shell
+    git rev-parse HEAD # a sha is printed and you will need it later
+    git reset HEAD~1 # reset to an earlier commit
+    repo upload --cbr . # try to upload it again
+    git reset <the sha printed above>
+    ```
+* After `ojluni_modify_expectation add` and `ojluni_refresh_files`, `git commit -a`
+  may include more files than just EXPECTED_UPSTREAM, because `git` (e.g. `git status`)
+  isn't aware of the changes the tools made in the working tree. This can lead to
+  an error when checking out a branch based on master.
+    1. Run `git reset --hard <initial commit before the add>`
+    2. Rerun the `ojluni_modify_expectation add` and `ojluni_refresh_files`
+    3. `git stash && git stash pop`
+    4. Commit the updated EXPECTED_UPSTREAM and proceed
+
+# Report bugs
+* Report bugs if the git repository is corrupt!
+    * Sometimes, you can recover the repository by running `git reset aosp/expected_upstream`
diff --git a/tools/expected_upstream/common_util.py b/tools/expected_upstream/common_util.py
new file mode 100644
index 0000000..49d39c6
--- /dev/null
+++ b/tools/expected_upstream/common_util.py
@@ -0,0 +1,114 @@
+# Copyright 2021 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Common Utils."""
+
+
+# pylint: disable=g-importing-member
+from dataclasses import dataclass
+from pathlib import Path
+import sys
+from typing import List
+
+# pylint: disable=g-import-not-at-top
+try:
+  from git import Tree
+except ModuleNotFoundError:
+  print(
+      'ERROR: Please install GitPython by `pip3 install GitPython`.',
+      file=sys.stderr)
+  exit(1)
+
+THIS_DIR = Path(__file__).resolve().parent
+LIBCORE_DIR = THIS_DIR.parent.parent.resolve()
+
+
+@dataclass
+class ExpectedUpstreamEntry:
+  """A map entry in the EXPECTED_UPSTREAM file."""
+  dst_path: str  # destination path
+  git_ref: str  # a git reference to an upstream commit
+  src_path: str  # source path in the commit pointed to by the git_ref
+  comment_lines: str = ''  # The comment lines above the entry line
+
+
+class ExpectedUpstreamFile:
+  """A file object representing the EXPECTED_UPSTREAM file."""
+
+  def __init__(self, file_path: str = LIBCORE_DIR / 'EXPECTED_UPSTREAM'):
+    self.path = Path(file_path)
+
+  def read_all_entries(self) -> List[ExpectedUpstreamEntry]:
+    """Read all entries from the file."""
+    result: List[ExpectedUpstreamEntry] = []
+    with self.path.open() as file:
+      comment_lines = ''  # Comment lines to attach to the next entry
+      for line in file:
+        stripped = line.strip()
+        # Collect empty lines and '#' comments; they attach to the next entry
+        if not stripped or stripped.startswith('#'):
+          comment_lines += line
+          continue
+
+        entry = self.parse_line(stripped, comment_lines)
+        result.append(entry)
+        comment_lines = ''
+
+    return result
+
+  def write_all_entries(self, entries: List[ExpectedUpstreamEntry]) -> None:
+    """Write all entries into the file."""
+    with self.path.open('w') as file:
+      for e in entries:
+        file.write(e.comment_lines)
+        file.write(','.join([e.dst_path, e.git_ref, e.src_path]))
+        file.write('\n')
+
+  def write_new_entry(self,
+                      entry: ExpectedUpstreamEntry,
+                      entries: List[ExpectedUpstreamEntry] = None) -> None:
+    if entries is None:
+      entries = self.read_all_entries()
+
+    entries.append(entry)
+    self.sort_and_write_all_entries(entries)
+
+  def sort_and_write_all_entries(self,
+                                 entries: List[ExpectedUpstreamEntry]) -> None:
+    header = entries[0].comment_lines
+    entries[0].comment_lines = ''
+    entries.sort(key=lambda e: e.dst_path)
+    # Keep the header above the first entry
+    entries[0].comment_lines = header + entries[0].comment_lines
+    self.write_all_entries(entries)
+
+  @staticmethod
+  def parse_line(line: str, comment_lines: str) -> ExpectedUpstreamEntry:
+    items = line.split(',')
+    size = len(items)
+    if size != 3:
+      raise ValueError(
+          f"The size must be 3, but is {size}. The line is '{line}'")
+
+    return ExpectedUpstreamEntry(items[0], items[1], items[2], comment_lines)
+
+
+def has_file_in_tree(path: str, tree: Tree) -> bool:
+  """Returns True if the directory / file exists in the tree."""
+  try:
+    # pylint: disable=pointless-statement
+    tree[path]
+    return True
+  except KeyError:
+    return False
diff --git a/tools/expected_upstream/install_tools.sh b/tools/expected_upstream/install_tools.sh
new file mode 100755
index 0000000..7d877c7
--- /dev/null
+++ b/tools/expected_upstream/install_tools.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+
+# prerequisite to run the script
+pip3 install GitPython
+
+git fetch aosp upstream-openjdk7u
+git fetch aosp upstream-openjdk8u
+git fetch aosp upstream-openjdk9
+git fetch aosp upstream-openjdk11u
+
+THIS_DIR=$(realpath "$(dirname "${BASH_SOURCE[0]}")")
+alias ojluni_refresh_files=${THIS_DIR}/ojluni_refresh_files.py
+alias ojluni_modify_expectation=${THIS_DIR}/ojluni_modify_expectation.py
+
+
+_ojluni_modify_expectation ()
+{
+  COMPREPLY=( $(ojluni_modify_expectation --autocomplete $COMP_CWORD ${COMP_WORDS[@]:1}))
+
+  return 0
+}
+
+complete -o nospace -F _ojluni_modify_expectation ojluni_modify_expectation
\ No newline at end of file
diff --git a/tools/expected_upstream/ojluni_modify_expectation.py b/tools/expected_upstream/ojluni_modify_expectation.py
new file mode 100755
index 0000000..07e9451
--- /dev/null
+++ b/tools/expected_upstream/ojluni_modify_expectation.py
@@ -0,0 +1,458 @@
+#!/usr/bin/python3 -B
+
+# Copyright 2021 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""ojluni_modify_expectation is a command-line tool for modifying the EXPECTED_UPSTREAM file."""
+
+import argparse
+# pylint: disable=g-importing-member
+from pathlib import PurePath
+import sys
+# pylint: disable=g-multiple-import
+from typing import (
+    Set,
+    Sequence,
+    List,
+)
+
+from common_util import (
+    ExpectedUpstreamEntry,
+    ExpectedUpstreamFile,
+    LIBCORE_DIR,
+    has_file_in_tree,
+)
+
+# Import git only after common_util, because common_util prints an
+# informative error if GitPython is not installed
+from git import (Blob, Commit, Repo)
+from gitdb.exc import BadName
+
+LIBCORE_REPO = Repo(LIBCORE_DIR.as_posix())
+
+UPSTREAM_JAVA_BASE_PATHS = [
+    'jdk/src/share/classes/',
+    'src/java.base/share/classes/',
+]
+
+UPSTREAM_TEST_PATHS = [
+    'jdk/test/',
+    'test/jdk/',
+]
+
+UPSTREAM_SEARCH_PATHS = UPSTREAM_JAVA_BASE_PATHS + UPSTREAM_TEST_PATHS
+
+OJLUNI_JAVA_BASE_PATH = 'ojluni/src/main/java/'
+OJLUNI_TEST_PATH = 'ojluni/src/'
+
+AUTOCOMPLETE_TAGS = [
+    'jdk7u/jdk7u40-b60',
+    'jdk8u/jdk8u121-b13',
+    'jdk8u/jdk8u60-b31',
+    'jdk9/jdk-9+181',
+    'jdk11u/jdk-11+28',
+]
+
+
+def error_and_exit(msg: str) -> None:
+  print(f'Error: {msg}', file=sys.stderr)
+  sys.exit(1)
+
+
+def get_commit_or_exit(git_ref: str) -> Commit:
+  try:
+    return LIBCORE_REPO.commit(git_ref)
+  except BadName as e:
+    error_and_exit(f'{e}')
+
+
+def translate_from_class_name_to_ojluni_path(class_or_path: str) -> str:
+  # if it contains '/', then it's a path
+  if '/' in class_or_path:
+    return class_or_path
+
+  base_path = OJLUNI_TEST_PATH if class_or_path.startswith(
+      'test.') else OJLUNI_JAVA_BASE_PATH
+
+  relative_path = class_or_path.replace('.', '/')
+  return f'{base_path}{relative_path}.java'
+
+
+def translate_src_path_to_ojluni_path(src_path: str) -> str:
+  """Returns None if the path can't be translated into an ojluni/ path."""
+  relative_path = None
+  for base_path in UPSTREAM_TEST_PATHS:
+    if src_path.startswith(base_path):
+      length = len(base_path)
+      relative_path = src_path[length:]
+      break
+
+  if relative_path:
+    return f'{OJLUNI_TEST_PATH}test/{relative_path}'
+
+  for base_path in UPSTREAM_JAVA_BASE_PATHS:
+    if src_path.startswith(base_path):
+      length = len(base_path)
+      relative_path = src_path[length:]
+      break
+
+  if relative_path:
+    return f'{OJLUNI_JAVA_BASE_PATH}{relative_path}'
+
+  return None
+
+
+def find_src_path_from_class(commit: Commit, class_or_path: str) -> str:
+  """Finds a valid source path given a valid class name or path."""
+  # if it contains '/', then it's a path
+  if '/' in class_or_path:
+    if has_file_in_tree(class_or_path, commit.tree):
+      return class_or_path
+    else:
+      return None
+
+  relative_path = class_or_path.replace('.', '/')
+  src_path = None
+  full_paths = []
+  for base_path in UPSTREAM_SEARCH_PATHS:
+    full_path = f'{base_path}{relative_path}.java'
+    full_paths.append(full_path)
+    if has_file_in_tree(full_path, commit.tree):
+      src_path = full_path
+      break
+
+  return src_path
+
+
+def find_src_path_from_ojluni_path(commit: Commit, ojluni_path: str) -> str:
+  """Returns a source path guessed from the ojluni_path, or None."""
+  base_paths = None
+  relative_path = None
+  if ojluni_path.startswith(OJLUNI_JAVA_BASE_PATH):
+    base_paths = UPSTREAM_JAVA_BASE_PATHS
+    length = len(OJLUNI_JAVA_BASE_PATH)
+    relative_path = ojluni_path[length:]
+  elif ojluni_path.startswith(OJLUNI_TEST_PATH + 'test/'):
+    base_paths = UPSTREAM_TEST_PATHS
+    length = len(OJLUNI_TEST_PATH + 'test/')
+    relative_path = ojluni_path[length:]
+  else:
+    return None
+
+  for base_path in base_paths:
+    full_path = base_path + relative_path
+    if has_file_in_tree(full_path, commit.tree):
+      return full_path
+
+  return None
+
+
+def autocomplete_existing_ojluni_path(input_path: str,
+                                      existing_paths: List[str]) -> Set[str]:
+  """Returns a set of existing file paths matching the given partial path."""
+  path_matches = list(
+      filter(lambda path: path.startswith(input_path), existing_paths))
+  result_set: Set[str] = set()
+  # if it's found, just return the result
+  if input_path in path_matches:
+    result_set.add(input_path)
+  else:
+    input_ojluni_path = PurePath(input_path)
+    # If the input ends with '/', the autocompletion result contains its
+    # children instead of the entries matching the prefix in its parent dir
+    input_path_parent_or_self = input_ojluni_path
+    if not input_path.endswith('/'):
+      input_path_parent_or_self = input_path_parent_or_self.parent
+    n_parts = len(input_path_parent_or_self.parts)
+    for match in path_matches:
+      path = PurePath(match)
+      # path.parts[n_parts] should not exceed the index and should be
+      # a valid child path because input_path_parent_or_self must be a
+      # valid directory
+      child = list(path.parts)[n_parts]
+      result = (input_path_parent_or_self / child).as_posix()
+      # if result is not exact, the result represents a directory.
+      if result != match:
+        result += '/'
+      result_set.add(result)
+
+  return result_set
+
+
+def convert_path_to_java_class_name(path: str, base_path: str) -> str:
+  base_len = len(base_path)
+  result = path[base_len:]
+  if result.endswith('.java'):
+    result = result[0:-5]
+  result = result.replace('/', '.')
+  return result
+
+
+def autocomplete_existing_class_name(input_class_name: str,
+                                     existing_paths: List[str]) -> List[str]:
+  """Returns a list of package / class names given the partial class name."""
+  # If '/' exists, it's probably a path, not a partial class name
+  if '/' in input_class_name:
+    return []
+
+  result_list = []
+  partial_relative_path = input_class_name.replace('.', '/')
+  for base_path in [OJLUNI_JAVA_BASE_PATH, OJLUNI_TEST_PATH]:
+    partial_ojluni_path = base_path + partial_relative_path
+    result_paths = autocomplete_existing_ojluni_path(partial_ojluni_path,
+                                                     existing_paths)
+    # pylint: disable=cell-var-from-loop
+    result_list.extend(
+        map(lambda path: convert_path_to_java_class_name(path, base_path),
+            list(result_paths)))
+
+  return result_list
+
+
+def autocomplete_tag_or_commit(str_tag_or_commit: str) -> List[str]:
+  """Returns a list of tags / commits matching the given partial string."""
+  if str_tag_or_commit is None:
+    str_tag_or_commit = ''
+  return list(
+      filter(lambda tag: tag.startswith(str_tag_or_commit), AUTOCOMPLETE_TAGS))
+
+
+def autocomplete_upstream_path(input_path: str, commit: Commit,
+                               excluded_paths: Set[str]) -> List[str]:
+  """Returns a list of source paths matching the given partial string."""
+  result_list = []
+
+  def append_if_not_excluded(path: str) -> None:
+    nonlocal result_list, excluded_paths
+    if path not in excluded_paths:
+      result_list.append(path)
+
+  search_tree = commit.tree
+  path_obj = PurePath(input_path)
+  is_exact = has_file_in_tree(path_obj.as_posix(), search_tree)
+  search_word = ''
+  if is_exact:
+    git_obj = search_tree[path_obj.as_posix()]
+    if isinstance(git_obj, Blob):
+      append_if_not_excluded(input_path)
+      return result_list
+    else:
+      # git_obj is a tree
+      search_tree = git_obj
+  elif len(path_obj.parts) >= 2:
+    parent_path = path_obj.parent.as_posix()
+    if has_file_in_tree(parent_path, search_tree):
+      search_tree = search_tree[parent_path]
+      search_word = path_obj.name
+    else:
+      # Return empty list because no such path is found
+      return result_list
+  else:
+    search_word = input_path
+
+  for tree in search_tree.trees:
+    tree_path = PurePath(tree.path)
+    if tree_path.name.startswith(search_word):
+      append_if_not_excluded(tree.path)
+
+  for blob in search_tree.blobs:
+    blob_path = PurePath(blob.path)
+    if blob_path.name.startswith(search_word):
+      append_if_not_excluded(blob.path)
+
+  return result_list
+
+
+def autocomplete_upstream_class(input_class_name: str, commit: Commit,
+                                excluded_paths: Set[str]) -> List[str]:
+  """Return a list of package / class names from given commit and input."""
+  # If '/' exists, it's probably a path, not a class name.
+  if '/' in input_class_name:
+    return []
+
+  result_list = []
+  for base_path in UPSTREAM_SEARCH_PATHS:
+    base_len = len(base_path)
+    path = base_path + input_class_name.replace('.', '/')
+    path_results = autocomplete_upstream_path(path, commit, excluded_paths)
+    for p in path_results:
+      relative_path = p[base_len:]
+      if relative_path.endswith('.java'):
+        relative_path = relative_path[0:-5]
+      result_list.append(relative_path.replace('/', '.'))
+
+  return result_list
+
+
+COMMAND_ACTIONS = ['add', 'modify', 'sort']
+
+
+def autocomplete_action(partial_str: str) -> None:
+  result_list = list(
+      filter(lambda action: action.startswith(partial_str), COMMAND_ACTIONS))
+  print('\n'.join(result_list))
+  exit(0)
+
+
+def main(argv: Sequence[str]) -> None:
+  is_auto_complete = len(argv) >= 2 and argv[0] == '--autocomplete'
+  # argparse can't help autocomplete subcommand. We implement this without
+  # argparse here.
+  if is_auto_complete and argv[1] == '1':
+    action = argv[2] if len(argv) >= 3 else ''
+    autocomplete_action(action)
+
+  # If it's for autocompletion, then all arguments are optional.
+  parser_nargs = '?' if is_auto_complete else 1
+
+  main_parser = argparse.ArgumentParser(
+      description='A command line tool modifying the EXPECTED_UPSTREAM file.')
+  # --autocomplete <int> is an 'int' argument because the value is the raw
+  # index of the argument being autocompleted, as received from the shell.
+  # This number is not always equal to the number of arguments received
+  # here, i.e. len(argv), e.g. when an argument is empty or when a middle
+  # argument, rather than the last one, is being autocompleted.
+  main_parser.add_argument(
+      '--autocomplete', type=int, help='flag when tabbing in command line')
+  subparsers = main_parser.add_subparsers(
+      dest='command', help='sub-command help')
+
+  add_parser = subparsers.add_parser(
+      'add', help='Add a new entry into the EXPECTED_UPSTREAM '
+      'file')
+  add_parser.add_argument(
+      'tag_or_commit',
+      nargs=parser_nargs,
+      help='A git tag or commit in the upstream-openjdkXXX branch')
+  add_parser.add_argument(
+      'class_or_source_file',
+      nargs=parser_nargs,
+      help='Fully qualified class name or upstream source path')
+  add_parser.add_argument(
+      'ojluni_path', nargs='?', help='Destination path in ojluni/')
+
+  modify_parser = subparsers.add_parser(
+      'modify', help='Modify an entry in the EXPECTED_UPSTREAM file')
+  modify_parser.add_argument(
+      'class_or_ojluni_path', nargs=parser_nargs, help='File path in ojluni/')
+  modify_parser.add_argument(
+      'tag_or_commit',
+      nargs=parser_nargs,
+      help='A git tag or commit in the upstream-openjdkXXX branch')
+  modify_parser.add_argument(
+      'source_file', nargs='?', help='An upstream source path')
+
+  subparsers.add_parser(
+      'sort', help='Sort the entries in the EXPECTED_UPSTREAM file')
+
+  args = main_parser.parse_args(argv)
+
+  expected_upstream_file = ExpectedUpstreamFile()
+  expected_entries = expected_upstream_file.read_all_entries()
+
+  if is_auto_complete:
+    no_args = args.autocomplete
+
+    autocomp_result = []
+    if args.command == 'modify':
+      if no_args == 2:
+        input_class_or_ojluni_path = args.class_or_ojluni_path
+        if input_class_or_ojluni_path is None:
+          input_class_or_ojluni_path = ''
+
+        existing_dst_paths = list(
+            map(lambda entry: entry.dst_path, expected_entries))
+        # Case 1: Treat the input as file path
+        autocomp_result += autocomplete_existing_ojluni_path(
+            input_class_or_ojluni_path, existing_dst_paths)
+
+        # Case 2: Treat the input as java package / class name
+        autocomp_result += autocomplete_existing_class_name(
+            input_class_or_ojluni_path, existing_dst_paths)
+      elif no_args == 3:
+        autocomp_result += autocomplete_tag_or_commit(args.tag_or_commit)
+    elif args.command == 'add':
+      if no_args == 2:
+        autocomp_result += autocomplete_tag_or_commit(args.tag_or_commit)
+      elif no_args == 3:
+        commit = get_commit_or_exit(args.tag_or_commit)
+        class_or_src_path = args.class_or_source_file
+        if class_or_src_path is None:
+          class_or_src_path = ''
+
+        existing_src_paths = set(map(lambda e: e.src_path, expected_entries))
+        autocomp_result += autocomplete_upstream_path(class_or_src_path, commit,
+                                                      existing_src_paths)
+
+        autocomp_result += autocomplete_upstream_class(class_or_src_path,
+                                                       commit,
+                                                       existing_src_paths)
+
+    print('\n'.join(autocomp_result))
+    exit(0)
+
+  if args.command == 'modify':
+    dst_class_or_file = args.class_or_ojluni_path[0]
+    dst_file = translate_from_class_name_to_ojluni_path(dst_class_or_file)
+    matches = list(filter(lambda e: dst_file == e.dst_path, expected_entries))
+    if not matches:
+      error_and_exit(f'{dst_file} is not found in the EXPECTED_UPSTREAM.')
+    entry: ExpectedUpstreamEntry = matches[0]
+    str_tag_or_commit = args.tag_or_commit[0]
+    is_src_given = args.source_file is not None
+    src_path = args.source_file if is_src_given else entry.src_path
+    commit = get_commit_or_exit(str_tag_or_commit)
+    if has_file_in_tree(src_path, commit.tree):
+      pass
+    elif not is_src_given:
+      guessed_src_path = find_src_path_from_ojluni_path(commit, dst_file)
+      if guessed_src_path is None:
+        error_and_exit('[source_file] argument is required.')
+      src_path = guessed_src_path
+    else:
+      error_and_exit(f'{src_path} is not found in the {str_tag_or_commit}')
+    entry.git_ref = str_tag_or_commit
+    entry.src_path = src_path
+    expected_upstream_file.write_all_entries(expected_entries)
+    print(f'Modified the entry {entry}')
+  elif args.command == 'add':
+    class_or_src_path = args.class_or_source_file[0]
+    str_tag_or_commit = args.tag_or_commit[0]
+    commit = get_commit_or_exit(str_tag_or_commit)
+    src_path = find_src_path_from_class(commit, class_or_src_path)
+    if src_path is None:
+      error_and_exit(f'{class_or_src_path} is not found in {commit}. '
+                     f'The search paths are:\n{UPSTREAM_SEARCH_PATHS}')
+    ojluni_path = args.ojluni_path
+    # Guess the ojluni destination path if it's not given in the argument
+    if ojluni_path is None:
+      ojluni_path = translate_src_path_to_ojluni_path(src_path)
+    if ojluni_path is None:
+      error_and_exit('The ojluni destination path is not given.')
+
+    matches = list(
+        filter(lambda e: ojluni_path == e.dst_path, expected_entries))
+    if matches:
+      error_and_exit(f"Can't add the file {ojluni_path} because it "
+                     'already exists in the EXPECTED_UPSTREAM')
+
+    new_entry = ExpectedUpstreamEntry(ojluni_path, str_tag_or_commit, src_path)
+    expected_upstream_file.write_new_entry(new_entry, expected_entries)
+  elif args.command == 'sort':
+    expected_upstream_file.sort_and_write_all_entries(expected_entries)
+  else:
+    error_and_exit(f'Unknown subcommand: {args.command}')
+
+
+if __name__ == '__main__':
+  main(sys.argv[1:])
diff --git a/tools/expected_upstream/ojluni_refresh_files.py b/tools/expected_upstream/ojluni_refresh_files.py
new file mode 100755
index 0000000..3211180
--- /dev/null
+++ b/tools/expected_upstream/ojluni_refresh_files.py
@@ -0,0 +1,236 @@
+#!/usr/bin/python3 -B
+
+# Copyright 2021 The Android Open Source Project
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Read the EXPECTED_UPSTREAM and update the files from the upstream."""
+
+import logging
+# pylint: disable=g-importing-member
+from pathlib import Path
+import sys
+from typing import List
+
+# pylint: disable=g-multiple-import
+from common_util import (
+    ExpectedUpstreamEntry,
+    ExpectedUpstreamFile,
+    has_file_in_tree,
+    LIBCORE_DIR,
+)
+
+from git import (
+    Blob,
+    IndexFile,
+    Repo,
+)
+
+# Enable INFO logging for errors emitted by GitPython
+logging.basicConfig(level=logging.INFO)
+
+# Pick an arbitrary existing commit with an empty tree
+EMPTY_COMMIT_SHA = "d85bc16ba1cdcc20bec6fcbfe46dc90f9fcd2f78"
+
+
+def validate_and_remove_updated_entries(
+    entries: List[ExpectedUpstreamEntry],
+    repo: Repo) -> List[ExpectedUpstreamEntry]:
+  """Returns a list of entries of which the file content needs to be updated."""
+  head_tree = repo.head.commit.tree
+  result: List[ExpectedUpstreamEntry] = []
+
+  for e in entries:
+    try:
+      # The following steps validate each entry by querying the git database
+      commit = repo.commit(e.git_ref)
+      source_blob = commit.tree.join(e.src_path)
+      if not has_file_in_tree(e.dst_path, head_tree):
+        # Add the entry if the file is missing in the HEAD
+        result.append(e)
+        continue
+
+      dst_blob = head_tree.join(e.dst_path)
+      # Add the entry if the content is different.
+      # data_stream will be closed during GC.
+      if source_blob.data_stream.read() != dst_blob.data_stream.read():
+        result.append(e)
+    except:
+      print(f"ERROR: reading entry: {e}", file=sys.stderr)
+      raise
+
+  return result
+
+
+def partition_entries_by_ref(
+    entries: List[ExpectedUpstreamEntry]) -> List[List[ExpectedUpstreamEntry]]:
+  result_map = {}
+  for e in entries:
+    if result_map.get(e.git_ref) is None:
+      result_map[e.git_ref] = []
+    result_map[e.git_ref].append(e)
+
+  return list(result_map.values())
+
+
+THIS_TOOL_PATH = Path(__file__).relative_to(LIBCORE_DIR)
+MSG_FIRST_COMMIT = ("Import {summary} from {ref}\n"
+                    "\n"
+                    "List of files:\n"
+                    "  {files}\n"
+                    "\n"
+                    f"Generated by {THIS_TOOL_PATH}"
+                    "\n"
+                    "Test: N/A")
+
+MSG_SECOND_COMMIT = ("Merge {summary} from {ref} into the "
+                     "expected_upstream branch\n"
+                     "\n"
+                     "List of files:\n"
+                     "  {files}\n"
+                     "\n"
+                     f"Generated by {THIS_TOOL_PATH}"
+                     "\n"
+                     "Test: N/A")
+
+
+def merge_files_and_create_commit(entry_set: List[ExpectedUpstreamEntry],
+                                  repo: Repo) -> None:
+  r"""Create the commits importing the given files into the current branch.
+
+  `--------<ref>---------------   aosp/upstream_openjdkXXX
+             \
+        <first_commit>
+              \
+  -------<second_commit>------   expected_upstream
+
+  This function creates the 2 commits, i.e. first_commit and second_commit, in
+  the diagram. The goal is to check out the subset of files specified in the
+  entry_set and merge them into the expected_upstream branch in order to keep
+  the git-blame history of the individual files. first_commit is needed in
+  order to move the files specified in the entry_set.
+
+  In the implementation, first_commit is not derived from the ref's tree; it
+  is built from an empty tree containing only the files in entry_set, with
+  the ref as its parent. second_commit is a merge commit whose tree starts
+  from its parent in the expected_upstream branch; the file contents from
+  first_commit override the corresponding files in that tree.
+
+  The following git commands should create equivalent commits and may help in
+  understanding this function. The Python implementation is cleaner because it
+  neither checks out another ref nor creates a temporary branch.
+  first_commit:
+      git checkout -b temp_branch <entry.git_ref>
+      rm -r * .jcheck/ .hgignore .hgtags # Remove all files, incl. hidden ones
+      git checkout <entry.git_ref> <entry.src_path>
+      mkdir -p "$(dirname <entry.dst_path>)"
+      git mv <entry.src_path> <entry.dst_path>
+      git commit -a
+  second_commit:
+      git merge temp_branch
+      git checkout HEAD -- ojluni/ # Force checkout to resolve merge conflict
+      git checkout temp_branch -- <entry.dst_path>
+      git commit
+
+  Args:
+    entry_set: a list of entries
+    repo: the repository object
+  """
+  ref = entry_set[0].git_ref
+  upstream_commit = repo.commit(ref)
+
+  # Start from an empty index, i.e. no staged files.
+  # Note that the empty commit is not the parent. The parents can be set later.
+  first_index = IndexFile.from_tree(repo, repo.commit(EMPTY_COMMIT_SHA))
+  for entry in entry_set:
+    src_blob = upstream_commit.tree[entry.src_path]
+    # Write into the file system directly because GitPython provides no API
+    # for writing a file into the index in memory. IndexFile.move doesn't help
+    # here, because it also requires the file to exist in the working tree.
+    # This is harmless: HEAD later moves to the second commit, which contains
+    # the same file, so the user sees the file in the working tree and it shows
+    # up as neither staged nor untracked.
+    (LIBCORE_DIR / entry.dst_path).parent.mkdir(parents=True, exist_ok=True)
+    with open(LIBCORE_DIR / entry.dst_path, "wb") as file:
+      file.write(src_blob.data_stream.read())
+    first_index.add(entry.dst_path)
+
+  dst_paths = [e.dst_path for e in entry_set]
+  str_dst_paths = "\n  ".join(dst_paths)
+  summary_msg = "files"
+  if len(entry_set) == 1:
+    summary_msg = Path(entry_set[0].dst_path).stem
+  msg = MSG_FIRST_COMMIT.format(
+      summary=summary_msg, ref=ref, files=str_dst_paths)
+
+  first_commit = first_index.commit(
+      message=msg, parent_commits=[upstream_commit], head=False)
+
+  # The second commit is a merge commit. It doesn't use the current index,
+  # i.e. repo.index, to avoid affecting the current staged files.
+  prev_head = repo.active_branch.commit
+  second_index = IndexFile.from_tree(repo, prev_head)
+  blob_filter = lambda obj, i: isinstance(obj, Blob)
+  blobs = first_commit.tree.traverse(blob_filter)
+  second_index.add(blobs)
+  msg = MSG_SECOND_COMMIT.format(
+      summary=summary_msg, ref=ref, files=str_dst_paths)
+  second_commit = second_index.commit(
+      message=msg, parent_commits=[prev_head, first_commit], head=True)
+
+  # HEAD now points to the second commit. Reset the current index, i.e.
+  # repo.index, for the imported paths; otherwise it would still show these
+  # files as deleted.
+  repo.index.reset(paths=dst_paths)
+
+  print(f"New merge commit {second_commit} contains:")
+  print(f"  {str_dst_paths}")
+
+
+def create_commits(repo: Repo) -> None:
+  """Create the commits importing files according to the EXPECTED_UPSTREAM."""
+  tracking_branch = repo.active_branch.tracking_branch()
+  tracking_name = tracking_branch.name if tracking_branch is not None else None
+  if tracking_name != "aosp/expected_upstream":
+    print("WARNING: This script should only run on the aosp/expected_upstream "
+          f"branch. Currently on {repo.active_branch} tracking {tracking_name}")
+
+  print("Reading EXPECTED_UPSTREAM file...")
+  expected_upstream_entries = ExpectedUpstreamFile().read_all_entries()
+
+  outdated_entries = validate_and_remove_updated_entries(
+      expected_upstream_entries, repo)
+
+  if not outdated_entries:
+    print("All files are up to date. Nothing to update.")
+    return
+
+  print("The following entries will be updated from upstream:")
+  for e in outdated_entries:
+    print(f"  {e.dst_path}")
+
+  entry_sets_to_be_merged = partition_entries_by_ref(outdated_entries)
+
+  for entry_set in entry_sets_to_be_merged:
+    merge_files_and_create_commit(entry_set, repo)
+
+
+def main():
+  repo = Repo(LIBCORE_DIR.as_posix())
+  try:
+    create_commits(repo)
+  finally:
+    repo.close()
+
+
+if __name__ == "__main__":
+  main()
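The overall flow of `create_commits` (keep only outdated entries, group them by git ref, then merge each group so that the imported files override the previous `expected_upstream` tree) can be modeled without git. The following is a minimal pure-Python sketch under stated assumptions, not the tool's code. Trees are plain dicts mapping a path to file bytes. `Entry` is a hypothetical stand-in for `ExpectedUpstreamEntry` that keeps only the fields used here. `find_outdated`, `partition_by_ref` and `merge_group` are illustrative names. The paths and contents in the demo are made up.

```python
from collections import namedtuple
from typing import Dict, List

# Hypothetical stand-in for ExpectedUpstreamEntry (only the fields used here).
Entry = namedtuple("Entry", ["dst_path", "git_ref", "src_path"])

# A "tree" is modeled as a dict mapping a path to the file contents.
Tree = Dict[str, bytes]


def find_outdated(entries: List[Entry], upstream: Dict[str, Tree],
                  head: Tree) -> List[Entry]:
  """Keep entries whose destination is missing in HEAD or differs upstream."""
  result = []
  for e in entries:
    source = upstream[e.git_ref][e.src_path]  # KeyError if src is missing
    if e.dst_path not in head or head[e.dst_path] != source:
      result.append(e)
  return result


def partition_by_ref(entries: List[Entry]) -> List[List[Entry]]:
  """Group entries by git_ref, keeping the first-seen order of the refs."""
  groups: Dict[str, List[Entry]] = {}
  for e in entries:
    groups.setdefault(e.git_ref, []).append(e)
  return list(groups.values())


def merge_group(group: List[Entry], upstream: Dict[str, Tree],
                head: Tree) -> Tree:
  """Model the tree of the second (merge) commit for one group."""
  upstream_tree = upstream[group[0].git_ref]
  # first_commit: starts from an empty tree and holds only the moved files.
  first_tree = {e.dst_path: upstream_tree[e.src_path] for e in group}
  # second_commit: the previous HEAD tree, overridden by first_commit's files.
  return {**head, **first_tree}


upstream = {
    "jdk11u/jdk-11+28": {"src/A.java": b"A11", "src/B.java": b"B11"},
    "jdk8u/jdk8u121-b13": {"src/C.java": b"C8"},
}
head = {"ojluni/A.java": b"A11", "ojluni/B.java": b"old", "README": b"r"}
entries = [
    Entry("ojluni/A.java", "jdk11u/jdk-11+28", "src/A.java"),
    Entry("ojluni/B.java", "jdk11u/jdk-11+28", "src/B.java"),
    Entry("ojluni/C.java", "jdk8u/jdk8u121-b13", "src/C.java"),
]

outdated = find_outdated(entries, upstream, head)
for group in partition_by_ref(outdated):
  # Each merge builds on the previous HEAD, as the tool re-reads HEAD per set.
  head = merge_group(group, upstream, head)
  print(group[0].git_ref, [e.dst_path for e in group])
```

In this model, `ojluni/A.java` is skipped because it already matches upstream. The changed `B.java` and the missing `C.java` land in one merge per ref, and unrelated files such as `README` survive because the merge only overlays the imported paths.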