-
Notifications
You must be signed in to change notification settings - Fork 857
Feature/schema endpoint #21836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Feature/schema endpoint #21836
Conversation
… type; added tests for schema/view endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I did not complete all files, I will continue tomorrow.
I already have some feedback in the comments.
In general what I have seen looks already pretty good 👍
@@ -1,5 +1,9 @@ | |||
devel | |||
----- | |||
* Add new API GET /_api/schema to show graphs, views, collections along with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Add new API GET /_api/schema to show graphs, views, collections along with | |
* Add new API GET /_api/schema to show the sampled schema of graphs, views, collections along with |
/// Copyright 2014-2025 ArangoDB GmbH, Cologne, Germany | ||
/// Copyright 2004-2014 triAGENS GmbH, Cologne, Germany |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Copyright 2014-2025 ArangoDB GmbH, Cologne, Germany | |
/// Copyright 2004-2014 triAGENS GmbH, Cologne, Germany | |
/// Copyright 2025-2025 ArangoDB GmbH, Cologne, Germany |
} | ||
|
||
auto const& suffix = _request->suffixes(); | ||
switch (suffix.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, I think we forgot to talk about user rights to access this.
You can see how it would be done here:
https://github.com/arangodb/arangodb/blob/b14b418811b3f4e292429310a5de41f78325b957/arangod/RestHandler/RestAdminClusterHandler.cpp#L2367C1-L2372C4
My suggestion right now:
RO on "canUseDatabase" and "RW" on canUseCollection" for the collection schema.
And RW on "canUseDatabase" for everything else.
We may lift this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added permission check if blocks and will show them on Friday.
return RestStatus::DONE; | ||
} | ||
|
||
RestStatus RestSchemaHandler::lookupSchema(uint64_t sampleNum, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this also return a Result, and move the complete "write response or write error" into the execute method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the logic of those helper methods and will show them on Friday.
if (graphsRes.fail()) { | ||
return RestStatus::DONE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is correct when looking into above method.
However this looks strange, as suggested above I would not have any function return the RestStatus.
But all functions return a Result.
And then in the execute react on the Result.fail() in one central place one time.
Then it is clear that on any error case the error message is written.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that one central place should react all the Result.fail() so it will be clear that where the error came from. On the other hand, I called generateError() in the stemmed method so the error message is concrete.
if (!data.hasKey("links")) { | ||
Result res{TRI_ERROR_ARANGO_DATA_SOURCE_NOT_FOUND, | ||
std::format("View {} has no links", viewName)}; | ||
generateError(ResponseCode::NOT_FOUND, res.errorNumber(), | ||
res.errorMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I think having a view without links is valid.
it is not useful in production though, but we may just look at a bad point in time (e.g. during creation of the view)
I think I would let this through without an error.
Then of course if asking for the view it should give an empty list of collections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. So, I changed the method so when data
doesn't have links
yet, it will be an empty JSON array instead of generateError().
linksBuilder.openArray(); | ||
for (auto li : velocypack::ObjectIterator(data.get("links"))) { | ||
auto colName = li.key.copyString(); | ||
auto colValue = li.value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is valid to have a View with "includeAllFields: true" would this error out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't been aware of includeAllFields
actually. I implemented a method to fetch all the attributes and its analyzers with the limit of sampleNum
, but I have some questions on this.
velocypack::Builder colBuilder; | ||
colBuilder.openObject(); | ||
colBuilder.add("collectionName", velocypack::Value(colName)); | ||
colSet.insert(colName); | ||
colBuilder.add("fields", fieldsBuilder.slice()); | ||
colBuilder.close(); | ||
linksBuilder.add(colBuilder.slice()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is correct.
However it would need to copy the information from the fieldsBuilder into the colBuilder, and then yet again into the linksBuilder.
Can you instead write the information directly into the links builder?
I think if you start with linksBuilder.openObject(); at the beginning of the loop you can write into it directly.
Maybe you want to even check the ObjectGuard (this directly does a close() for you when getting out of scope, so you cannot forget it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed so it directly writes to one viewBuilder
instead of instantiating many Builder object and copying.
Please remember adding rta-makedata tests, so dump & restore can be tested as well. |
…r permission check, and includeAllFields option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already in a very good shape!
I have some mostly minor requests to change.
@@ -693,6 +694,11 @@ void GeneralServerFeature::defineRemainingHandlers( | |||
RestHandlerCreator<RestSimpleHandler>::createData<aql::QueryRegistry*>, | |||
queryRegistry); | |||
|
|||
f.addPrefixHandler( | |||
"/_api/schema", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically define those as static strings.
Please look at arangod/RestHandler/RestVocbaseBaseHandler.cpp and arangod/RestHandler/RestVocbaseBaseHandler.h
And add the Schema path there.
); | ||
gm._create( | ||
GRAPH_MANUFACTURE, | ||
[ gm._relation(EDGE_MANUFACTURED, COLLECTION_COMPANIES, COLLECTION_PRODUCTS) ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for the sake of completeness:
-
Can you make one of these two graphs hold two Relations e.g. add EDGE_CONTAINS, COLLECTION_PRODUCTS, COLLECTION_MATERIALS to the GRAPH_MANUFACTURE.
So we can see the API can handle multiple edge collections, and handle overlapping vertex collections. -
I think I have not talked about orphan collections yet.
A graph can have Vertex collections that do not have any connected Edge Collection. They are added in the third parameter which is an Array. Like this:
gm._create("myGraph", [gm._relation("e", "v","v")], ["orphan"]);
It should be visible as Orphans in the GraphSchema.
And more importantly the orphan needs to be part of the Collection list when asked for the Graph Schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the code:
(1) GraphSchema now has the attribute orphans
and orphan collections are also shown in the collections list.
(2) Added the case where a graph has 2 relations, from
and to
has 2 documents, and orphans
has 2 documents.
(3) Added assertions for the case of (3).
(4) This line ensures that no duplicated collections are shown.
productsCollection.dropIndex(prodIndex); | ||
customersCollection.dropIndex(cusIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is okay, but not required.
Dropping the collection will drop all Indexes
db._drop(COLLECTION_CUSTOMERS); | ||
db._drop(COLLECTION_PRODUCTS); | ||
db._drop(COLLECTION_COMPANIES); | ||
db._drop(EDGE_PURCHASED); | ||
db._drop(EDGE_MANUFACTURED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good approach, but this does not cover the Indexes or Graphs.
I would suggest the following:
Create a function "tearDown" in Line 59.
Which takes the full implementation of "tearDownAll" below (that one is good and complete!)
Then at this place call tearDown();
And below you set:
tearDownAll: tearDown
This way you need to implement the cleanup only once.
And the setup makes sure nothing is left from any previously failed test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great idea! I changed.
assertEqual(descView.links.length, 2, 'descView should have 2 links'); | ||
}, | ||
|
||
test_InvalidSampleNumValues_ShouldReturn400: function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the following three tests:
test_InvalidSampleNumValues_ShouldReturn400
test_InvalidExampleNumValues_ShouldReturn400
test_ExampleNumGreaterThanSampleNum_ShouldReturn400
They only test the base "schema" endpoint.
I would like to make sure that collection, graph and view are also following this approach.
Therefore can you add another loop in those tests
path = ["", "/collection/products", "/graph/purchaseHistory", "/view/descView"]
path.forEach(p => {
const doc = arango.GET_RAW(api + path + param);
<< your code here >>
})
Just make sure collection / graph / view exists and I did not add a typo in the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this also goes for:
test_ExampleNumZero_ShouldReturnNoExamples
test_SampleNumZero_ShouldReturn400
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of that! I added forEach for all the parameter validation tests not only your suggestions.
|
||
const auto indType = ind.get("type").stringView(); | ||
if (indType != "primary" && indType != "edge") { | ||
builder.add(velocypack::Value(velocypack::ValueType::Object)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is correct.
This is not a strict suggestion, just please check if this makes sense or not.
VelocyPack allows for a keep Method.
From the Tests it works like this:
TEST(CollectionTest, KeepSomeAttributes) {
std::string const value(
"{"foo":"bar","baz":"quux","number":1,"boolean":true,"
""empty":null}");
Parser parser;
parser.parse(value);
Slice s(parser.start());
std::vectorstd::string const toKeep = {"foo", "baz", "empty"};
Builder b = Collection::keep(s, toKeep);
s = b.slice();
ASSERT_TRUE(s.isObject());
ASSERT_EQ(3U, s.length());
ASSERT_TRUE(s.hasKey("foo"));
ASSERT_EQ("bar", s.get("foo").copyString());
ASSERT_TRUE(s.hasKey("baz"));
ASSERT_EQ("quux", s.get("baz").copyString());
ASSERT_TRUE(s.hasKey("empty"));
ASSERT_TRUE(s.get("empty").isNull());
ASSERT_FALSE(s.hasKey("number"));
ASSERT_FALSE(s.hasKey("boolean"));
ASSERT_FALSE(s.hasKey("quetzalcoatl"));
}
It should help to keep exact attributes.
Mabye that is easier to use for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convenient method! I changed the code. I thought I could use it for edgeDefinition but it is not a json object so I couldn't use it.
const std::string graphQueryString = R"( | ||
FOR g IN _graphs | ||
RETURN { | ||
name: g._key, | ||
relations: g.edgeDefinitions | ||
} | ||
)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to get away without implementing this query here.
This functionality should be provided by the GraphManager.
I know it does not right now.
But I think it would be easy to add there.
It has an implementation of:
ResultT<std::unique_ptr> lookupGraphByName(
std::string const& name) const;
Which gives us one Graph Object.
And Result readGraphs(velocypack::Builder& builder) const;
Which gives us all Graphs in JSON format.
Could you check if you can implement a lookupAllGraphs method?
e.g. ResultT<std::vector<std::unique_ptr>> lookupAllGraphs() const;
By combining the two?
I am okay if the GraphManager would need to run an AQL inside that function.
But this would encapsulate the logic into that one Class only, and it does not leak the information on where Graphs are stored and in which format to any other files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the code:
- Implemented
lookupAllGraphs()
in GraphManager class.

getGraphAndCollections()
takesGraph& graph
and callsgraph.edgeDefinitions()
to populate the builder object.

getAllGraphsAndCollections()
calls_graphManager.lookupAllGraphs()
to obtainvector<unique_ptr<Graph>>
, and then callsgetGraphAndCollections()
.
const std::string graphQueryString = R"( | ||
FOR g IN _graphs | ||
FILTER g._key == @graphName | ||
RETURN { | ||
name: g._key, | ||
relations: g.edgeDefinitions | ||
} | ||
)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the graphManager lookepGraphByName here?
constexpr std::string_view moduleName("graph management"); | ||
} | ||
|
||
const std::string RestSchemaHandler::queryStr = R"( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this query 👍
using namespace arangodb::rest; | ||
|
||
namespace { | ||
constexpr std::string_view moduleName("graph management"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr std::string_view moduleName("graph management"); | |
constexpr std::string_view moduleName("schema endpoint"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module name is to describe where the call came from.
So this should be the name of the Schema endpoint
Scope & Purpose
Added schema endpoint to show the overview of a database collection.
(Please describe the changes in this PR for reviewers, motivation, rationale - mandatory)
Checklist
Related Information
(Please reference tickets / specification / other PRs etc)