Skip to content
This repository was archived by the owner on Sep 27, 2019. It is now read-only.

Invalid Index Scan Plan Selection with Multi-column Indexes #1299

Open
vkonagar opened this issue Apr 14, 2018 · 2 comments
Open

Invalid Index Scan Plan Selection with Multi-column Indexes #1299

vkonagar opened this issue Apr 14, 2018 · 2 comments
Assignees
Labels

Comments

@vkonagar
Copy link

As part of our class project "Index Selection", we are trying to generate multi-column hypothetical indexes and checking the cost of using them for a query.

The optimizer does not select a valid plan with multi-column indexes. If we create a multiple-column index say Index1 on columns say (a, b, c) on a table T, and if a query has a where clause on only (b), then the optimizer chooses Index1 in its best plan for the query. Same goes with (c) as well. Ideally, Index1 should only help queries with (a) or (a, b) or (a, b, c) as the where clause predicates. Surprisingly, even with the wrong plan, the query executed successfully.

Steps to reproduce:

  • CREATE table T(a int, b int, c int, d int, e int);
  • INSERT into T VALUES(1, 2, 3, 4, 5);
  • INSERT into T VALUES(2, 3, 4, 5, 6);
  • INSERT into T VALUES(3, 4, 5, 6, 7);
  • INSERT into T VALUES(4, 5, 6, 7, 8);
  • INSERT into T VALUES(5, 6, 7, 8, 9);
  • CREATE INDEX Index1 on T(a, c, d, e);
  • EXPLAIN SELECT * FROM T where e = 8;
    Query plan
    Plan Type: INDEXSCAN
    Info: IndexScan
    NumChildren: 0
    (5 rows)
  • SELECT * FROM T where e = 8;
    a | b | c | d | e
    ---+---+---+---+---
    4 | 5 | 6 | 7 | 8
    (1 row)
@GustavoAngulo
Copy link
Member

GustavoAngulo commented Apr 16, 2018

bool GetToIndexScan::Check(std::shared_ptr<OperatorExpression> plan,
OptimizeContext *context) const {
// If there is a index for the table, return true,
// else return false
(void)context;
const LogicalGet *get = plan->Op().As<LogicalGet>();
bool index_exist = false;
if (get != nullptr && get->table != nullptr &&
!get->table->GetIndexObjects().empty()) {
index_exist = true;
}
return index_exist;
}

This is probably the cause of the issue. The Check function essentially checks if we can use an index scan for a "Get" operator. We're only checking the existence of an index, not if we can actually use the index if there is one. I can probably take a look.

@GustavoAngulo GustavoAngulo self-assigned this Apr 16, 2018
@vkonagar
Copy link
Author

Gustavo, Thanks! I looked at the code you pointed and I think the issue is here.

if (index_col_set.find(col_id) != index_col_set.end()) {

Here, we are just checking if the given column is present in the column set of the index object but not enforcing any order in the matches. Instead, we should take the vector of column oids present in index catalog object (this preserves multi-column order as specified by the user) and check in the given order if there is any match with the columns given in the query predicates.

Something like this?

// Find match index for the predicates
auto index_objects = get->table->GetIndexObjects();
for (auto &index_id_object_pair : index_objects) {
  auto &index_id = index_id_object_pair.first;
  auto &index_object = index_id_object_pair.second;
  std::vector<oid_t> index_key_column_id_list;
  std::vector<ExpressionType> index_expr_type_list;
  std::vector<type::Value> index_value_list;

  // Only pick the index if the query columns match the index's columns in order.
  auto index_id_list = index_object->GetKeyAttrs();
  for (size_t offset = 0; (offset < key_column_id_list.size()) && (offset < index_id_list.size()); offset++) {
    auto col_id = key_column_id_list[offset];
    if (index_id_list[offset] == col_id) {
      index_key_column_id_list.push_back(col_id);
      index_expr_type_list.push_back(expr_type_list[offset]);
      index_value_list.push_back(value_list[offset]);
    } else {
      break;
    }
  }

  // Add transformed plan
  if (!index_key_column_id_list.empty()) {
    auto index_scan_op = PhysicalIndexScan::make(
        get->get_id, get->table, get->table_alias, get->predicates,
        get->is_for_update, index_id, index_key_column_id_list,
        index_expr_type_list, index_value_list);
    transformed.push_back(
        std::make_shared<OperatorExpression>(index_scan_op));
  }
}

This seems to work.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants