Skip to content

Feature/schema endpoint #21836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 33 commits into
base: devel
Choose a base branch
from
Open

Feature/schema endpoint #21836

wants to merge 33 commits into from

Conversation

knakatasf
Copy link

@knakatasf knakatasf commented Jul 2, 2025

Scope & Purpose

Added schema endpoint to show the overview of a database collection.
(Please describe the changes in this PR for reviewers, motivation, rationale - mandatory)

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests
    • Regression tests
    • C++ Unit tests
    • integration tests
    • resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    • Backport for 3.12.0: (Please link PR)
    • Backport for 3.11: (Please link PR)
    • Backport for 3.10: (Please link PR)

Related Information

(Please reference tickets / specification / other PRs etc)

@cla-bot cla-bot bot added the cla-signed label Jul 2, 2025
@knakatasf knakatasf requested a review from mchacki July 2, 2025 17:37
@knakatasf knakatasf self-assigned this Jul 2, 2025
Copy link
Member

@mchacki mchacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I did not complete all files, I will continue tomorrow.
I already have some feedback in the comments.

In general what I have seen looks already pretty good 👍

@@ -1,5 +1,9 @@
devel
-----
* Add new API GET /_api/schema to show graphs, views, collections along with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Add new API GET /_api/schema to show graphs, views, collections along with
* Add new API GET /_api/schema to show the sampled schema of graphs, views, collections along with

Comment on lines 4 to 5
/// Copyright 2014-2025 ArangoDB GmbH, Cologne, Germany
/// Copyright 2004-2014 triAGENS GmbH, Cologne, Germany
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Copyright 2014-2025 ArangoDB GmbH, Cologne, Germany
/// Copyright 2004-2014 triAGENS GmbH, Cologne, Germany
/// Copyright 2025-2025 ArangoDB GmbH, Cologne, Germany

}

auto const& suffix = _request->suffixes();
switch (suffix.size()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I think we forgot to talk about user rights to access this.

You can see how it would be done here:
https://github.com/arangodb/arangodb/blob/b14b418811b3f4e292429310a5de41f78325b957/arangod/RestHandler/RestAdminClusterHandler.cpp#L2367C1-L2372C4

My suggestion right now:
RO on "canUseDatabase" and "RW" on canUseCollection" for the collection schema.
And RW on "canUseDatabase" for everything else.

We may lift this later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added permission check if blocks and will show them on Friday.

return RestStatus::DONE;
}

RestStatus RestSchemaHandler::lookupSchema(uint64_t sampleNum,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this also return a Result, and move the complete "write response or write error" into the execute method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic of those helper methods and will show them on Friday.

Comment on lines 148 to 150
if (graphsRes.fail()) {
return RestStatus::DONE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is correct when looking into above method.
However this looks strange, as suggested above I would not have any function return the RestStatus.
But all functions return a Result.
And then in the execute react on the Result.fail() in one central place one time.
Then it is clear that on any error case the error message is written.

WDYT?

Copy link
Author

@knakatasf knakatasf Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that one central place should react all the Result.fail() so it will be clear that where the error came from. On the other hand, I called generateError() in the stemmed method so the error message is concrete.

Comment on lines 486 to 490
if (!data.hasKey("links")) {
Result res{TRI_ERROR_ARANGO_DATA_SOURCE_NOT_FOUND,
std::format("View {} has no links", viewName)};
generateError(ResponseCode::NOT_FOUND, res.errorNumber(),
res.errorMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think having a view without links is valid.
it is not useful in production though, but we may just look at a bad point in time (e.g. during creation of the view)
I think I would let this through without an error.
Then of course if asking for the view it should give an empty list of collections.

Copy link
Author

@knakatasf knakatasf Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. So, I changed the method so when data doesn't have links yet, it will be an empty JSON array instead of generateError().

linksBuilder.openArray();
for (auto li : velocypack::ObjectIterator(data.get("links"))) {
auto colName = li.key.copyString();
auto colValue = li.value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is valid to have a View with "includeAllFields: true" would this error out here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't been aware of includeAllFields actually. I implemented a method to fetch all the attributes and its analyzers with the limit of sampleNum, but I have some questions on this.

Comment on lines 526 to 532
velocypack::Builder colBuilder;
colBuilder.openObject();
colBuilder.add("collectionName", velocypack::Value(colName));
colSet.insert(colName);
colBuilder.add("fields", fieldsBuilder.slice());
colBuilder.close();
linksBuilder.add(colBuilder.slice());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is correct.

However it would need to copy the information from the fieldsBuilder into the colBuilder, and then yet again into the linksBuilder.

Can you instead write the information directly into the links builder?
I think if you start with linksBuilder.openObject(); at the beginning of the loop you can write into it directly.

Maybe you want to even check the ObjectGuard (this directly does a close() for you when getting out of scope, so you cannot forget it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed so it directly writes to one viewBuilder instead of instantiating many Builder object and copying.

@dothebart
Copy link
Contributor

Please remember adding rta-makedata tests, so dump & restore can be tested as well.

Copy link
Member

@mchacki mchacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is already in a very good shape!
I have some mostly minor requests to change.

@@ -693,6 +694,11 @@ void GeneralServerFeature::defineRemainingHandlers(
RestHandlerCreator<RestSimpleHandler>::createData<aql::QueryRegistry*>,
queryRegistry);

f.addPrefixHandler(
"/_api/schema",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically define those as static strings.

Please look at arangod/RestHandler/RestVocbaseBaseHandler.cpp and arangod/RestHandler/RestVocbaseBaseHandler.h
And add the Schema path there.

);
gm._create(
GRAPH_MANUFACTURE,
[ gm._relation(EDGE_MANUFACTURED, COLLECTION_COMPANIES, COLLECTION_PRODUCTS) ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the sake of completeness:

  1. Can you make one of these two graphs hold two Relations e.g. add EDGE_CONTAINS, COLLECTION_PRODUCTS, COLLECTION_MATERIALS to the GRAPH_MANUFACTURE.
    So we can see the API can handle multiple edge collections, and handle overlapping vertex collections.

  2. I think I have not talked about orphan collections yet.
    A graph can have Vertex collections that do not have any connected Edge Collection. They are added in the third parameter which is an Array. Like this:
    gm._create("myGraph", [gm._relation("e", "v","v")], ["orphan"]);
    It should be visible as Orphans in the GraphSchema.
    And more importantly the orphan needs to be part of the Collection list when asked for the Graph Schema.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the code:
(1) GraphSchema now has the attribute orphans and orphan collections are also shown in the collections list.
image

(2) Added the case where a graph has 2 relations, from and to has 2 documents, and orphans has 2 documents.
image

(3) Added assertions for the case of (3).
image

(4) This line ensures that no duplicated collections are shown.
image

Comment on lines 159 to 160
productsCollection.dropIndex(prodIndex);
customersCollection.dropIndex(cusIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is okay, but not required.
Dropping the collection will drop all Indexes

Comment on lines 62 to 66
db._drop(COLLECTION_CUSTOMERS);
db._drop(COLLECTION_PRODUCTS);
db._drop(COLLECTION_COMPANIES);
db._drop(EDGE_PURCHASED);
db._drop(EDGE_MANUFACTURED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good approach, but this does not cover the Indexes or Graphs.

I would suggest the following:
Create a function "tearDown" in Line 59.
Which takes the full implementation of "tearDownAll" below (that one is good and complete!)
Then at this place call tearDown();
And below you set:

tearDownAll: tearDown
This way you need to implement the cleanup only once.
And the setup makes sure nothing is left from any previously failed test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great idea! I changed.

assertEqual(descView.links.length, 2, 'descView should have 2 links');
},

test_InvalidSampleNumValues_ShouldReturn400: function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the following three tests:

test_InvalidSampleNumValues_ShouldReturn400
test_InvalidExampleNumValues_ShouldReturn400
test_ExampleNumGreaterThanSampleNum_ShouldReturn400

They only test the base "schema" endpoint.
I would like to make sure that collection, graph and view are also following this approach.

Therefore can you add another loop in those tests

path = ["", "/collection/products", "/graph/purchaseHistory", "/view/descView"]
path.forEach(p => {
const doc = arango.GET_RAW(api + path + param);
<< your code here >>
})

Just make sure collection / graph / view exists and I did not add a typo in the suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this also goes for:

test_ExampleNumZero_ShouldReturnNoExamples
test_SampleNumZero_ShouldReturn400

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of that! I added forEach for all the parameter validation tests not only your suggestions.


const auto indType = ind.get("type").stringView();
if (indType != "primary" && indType != "edge") {
builder.add(velocypack::Value(velocypack::ValueType::Object));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is correct.

This is not a strict suggestion, just please check if this makes sense or not.

VelocyPack allows for a keep Method.
From the Tests it works like this:

TEST(CollectionTest, KeepSomeAttributes) {
std::string const value(
"{"foo":"bar","baz":"quux","number":1,"boolean":true,"
""empty":null}");

Parser parser;
parser.parse(value);
Slice s(parser.start());

std::vectorstd::string const toKeep = {"foo", "baz", "empty"};
Builder b = Collection::keep(s, toKeep);
s = b.slice();
ASSERT_TRUE(s.isObject());
ASSERT_EQ(3U, s.length());

ASSERT_TRUE(s.hasKey("foo"));
ASSERT_EQ("bar", s.get("foo").copyString());

ASSERT_TRUE(s.hasKey("baz"));
ASSERT_EQ("quux", s.get("baz").copyString());

ASSERT_TRUE(s.hasKey("empty"));
ASSERT_TRUE(s.get("empty").isNull());

ASSERT_FALSE(s.hasKey("number"));
ASSERT_FALSE(s.hasKey("boolean"));
ASSERT_FALSE(s.hasKey("quetzalcoatl"));
}

It should help to keep exact attributes.
Mabye that is easier to use for you?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convenient method! I changed the code. I thought I could use it for edgeDefinition but it is not a json object so I couldn't use it.

Comment on lines 451 to 457
const std::string graphQueryString = R"(
FOR g IN _graphs
RETURN {
name: g._key,
relations: g.edgeDefinitions
}
)";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to get away without implementing this query here.
This functionality should be provided by the GraphManager.
I know it does not right now.
But I think it would be easy to add there.

It has an implementation of:
ResultT<std::unique_ptr> lookupGraphByName(
std::string const& name) const;

  Which gives us one Graph Object.
  
  And   Result readGraphs(velocypack::Builder& builder) const;

Which gives us all Graphs in JSON format.

Could you check if you can implement a lookupAllGraphs method?

e.g. ResultT<std::vector<std::unique_ptr>> lookupAllGraphs() const;
By combining the two?
I am okay if the GraphManager would need to run an AQL inside that function.

But this would encapsulate the logic into that one Class only, and it does not leak the information on where Graphs are stored and in which format to any other files.

Copy link
Author

@knakatasf knakatasf Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the code:

  1. Implemented lookupAllGraphs() in GraphManager class.
image
  1. getGraphAndCollections() takes Graph& graph and calls graph.edgeDefinitions() to populate the builder object.
image
  1. getAllGraphsAndCollections() calls _graphManager.lookupAllGraphs() to obtain vector<unique_ptr<Graph>>, and then calls getGraphAndCollections().

Comment on lines 389 to 396
const std::string graphQueryString = R"(
FOR g IN _graphs
FILTER g._key == @graphName
RETURN {
name: g._key,
relations: g.edgeDefinitions
}
)";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the graphManager lookepGraphByName here?

constexpr std::string_view moduleName("graph management");
}

const std::string RestSchemaHandler::queryStr = R"(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this query 👍

using namespace arangodb::rest;

namespace {
constexpr std::string_view moduleName("graph management");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr std::string_view moduleName("graph management");
constexpr std::string_view moduleName("schema endpoint");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module name is to describe where the call came from.
So this should be the name of the Schema endpoint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants