todo:
    Evaluate go 1.7's flate library for fix
    Handle missing external refs (full pull, "TODO mark repo as iffy")
    Determine what happens on renamed repos, figure out what to do internally
    Detect deleted repos
    Handle DMCA'd repos
    Don't die on repeated influxdb errors?
    Consider tagging influxdb writes with operation type (pack, update)
    Don't log object writes for update queue: they won't happen

done:
    Don't close globpack as soon as a gitpack is done.
    Rotate globpacks on the hour
    Delete packs from (queue|disk) when done
    Double-check what happens on external reference error
    Don't unconditionally delete refpacks when closing them

Update procedure:
    * Queue a repository update
    * Claim the repo update
    * Get last known refs
    * Get new refs
    * Calculate difference, store delta (but don't update last known refs!)
    * Download new commits
        * If failing with an empty response, retry up to twice, then queue a new update.
        * If failing with a hash error, log an error, don't queue a new update.
        * If it succeeds:
            * Queue reading the pack.
            * Update last known refs
    * Delete update from queue
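
The "calculate difference" step could look roughly like the sketch below. It is only an illustration: RefDelta and diffRefs are hypothetical names, refs are modelled as a map from ref name to a hex hash string, and the new/changed/deleted split is assumed from the column names in refs_history rather than taken from the real code.

```go
package main

import (
	"fmt"
	"sort"
)

// RefDelta is a hypothetical container for the difference between two ref
// snapshots. It is not a type from the actual code base.
type RefDelta struct {
	New     map[string]string // refs present only in the fresh snapshot
	Changed map[string]string // refs whose hash differs; value is the new hash
	Deleted []string          // refs present only in the last known snapshot
}

// diffRefs compares the last known refs with freshly fetched refs.
// Both maps go from ref name to object hash.
func diffRefs(last, fresh map[string]string) RefDelta {
	d := RefDelta{New: map[string]string{}, Changed: map[string]string{}}
	for name, hash := range fresh {
		old, ok := last[name]
		switch {
		case !ok:
			d.New[name] = hash
		case old != hash:
			d.Changed[name] = hash
		}
	}
	for name := range last {
		if _, ok := fresh[name]; !ok {
			d.Deleted = append(d.Deleted, name)
		}
	}
	sort.Strings(d.Deleted) // map iteration order is random; keep output stable
	return d
}

func main() {
	last := map[string]string{
		"refs/heads/master": "aaaa",
		"refs/heads/old":    "bbbb",
	}
	fresh := map[string]string{
		"refs/heads/master": "cccc",
		"refs/heads/topic":  "dddd",
	}
	d := diffRefs(last, fresh)
	fmt.Println(d.New, d.Changed, d.Deleted)
}
```

Note that the delta is computed and stored before the download, while the last known refs are only replaced once the download succeeds, so a failed fetch can be retried against the same baseline.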

Multistep pack reading:
    Read whole pack:
        For non-delta objects:
            Store location in baseObjects[]
        For delta objects:
            Store location in descendedFrom[basehash][]
    For each loc := range baseObjects:
        Read object at loc
        Write object
        DFS search descendedFrom:
            if obj.hash in descendedFrom, for loc := range descendedFrom[obj.hash]:
                Read object at loc
                Apply delta
                Write object
                DFS search descendedFrom.
    For each basehash, objs := descendedFrom:
        Lookup basehash in existing globpacks:
            If present:
                Read object at loc
                Read base object
                Apply delta
                DFS search descendedFrom
            If not present:
                Next iteration
    If len(descendedFrom) > 0, queue a full update of the repository, and exit
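
A toy version of the passes above, to check that the DFS terminates and leaves only truly orphaned deltas behind. Everything here is a stand-in: entry, hashOf, applyDelta and resolvePack are hypothetical names, the "pack" is a slice in memory, the existing globpacks are a plain map, and the delta format is just "append to base" instead of git's copy/insert instructions.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// entry is a hypothetical in-memory stand-in for one object in a pack.
// For a delta object, base is the hash it is descended from and data holds
// the delta. For a non-delta object, base is empty and data is the content.
type entry struct {
	base string
	data string
}

func hashOf(content string) string {
	return fmt.Sprintf("%x", sha1.Sum([]byte(content)))
}

// applyDelta is a toy delta that appends to the base content.
func applyDelta(base, delta string) string { return base + delta }

// resolvePack writes every object it can resolve into store (hash -> content,
// standing in for the globpacks) and reports whether unresolved deltas remain.
func resolvePack(pack []entry, store map[string]string) (unresolved bool) {
	// Pass 1: read the whole pack, recording locations only.
	var baseObjects []int
	descendedFrom := map[string][]int{}
	for loc, e := range pack {
		if e.base == "" {
			baseObjects = append(baseObjects, loc)
		} else {
			descendedFrom[e.base] = append(descendedFrom[e.base], loc)
		}
	}

	// descend applies every delta recorded against hash h, writes the result,
	// and recurses depth-first into deltas based on that result.
	var descend func(h, content string)
	descend = func(h, content string) {
		locs := descendedFrom[h]
		delete(descendedFrom, h)
		for _, loc := range locs {
			obj := applyDelta(content, pack[loc].data)
			oh := hashOf(obj)
			store[oh] = obj
			descend(oh, obj)
		}
	}

	// Pass 2: write non-delta objects, then everything descended from them.
	for _, loc := range baseObjects {
		obj := pack[loc].data
		h := hashOf(obj)
		store[h] = obj
		descend(h, obj)
	}

	// Pass 3: deltas whose base already exists outside this pack.
	for h := range descendedFrom {
		if content, ok := store[h]; ok {
			descend(h, content)
		}
	}

	// Anything left has no reachable base: the caller should queue a full update.
	return len(descendedFrom) > 0
}

func main() {
	old := "old "
	store := map[string]string{hashOf(old): old} // object from an earlier pack

	pack := []entry{
		{data: "blob "},                           // non-delta
		{base: hashOf("blob "), data: "v2 "},      // delta on the blob
		{base: hashOf("blob v2 "), data: "v3"},    // delta on a delta
		{base: hashOf(old), data: "patched"},      // delta on an external base
		{base: hashOf("missing"), data: "orphan"}, // base not available anywhere
	}
	fmt.Println(len(store), resolvePack(pack, store), len(store))
}
```

Entries are deleted from descendedFrom as soon as they are visited, so whatever remains at the end is exactly the set of deltas with no reachable base.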

Hypothetical Index (doesn't exist, may not ever):
    bytes    | description
    0-7      | Static header "gidx\x00\x0d\x0a\xa5"
    8-11     | Version number "\x00\x00\x00\x09"
    12       | Number of bytes in offset
    Records:
    20 bytes | Object ID
    N bytes  | Offset in packfile
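
If the index ever gets built, reading and writing it might look something like this. The sketch assumes big-endian offsets (the version bytes suggest it, but the layout above doesn't say), limits the offset width to 1-8 bytes so an offset fits in a uint64, and uses made-up names (record, encodeIndex, decodeIndex).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Layout constants taken from the table above.
const (
	magic      = "gidx\x00\x0d\x0a\xa5"
	version    = 9
	headerSize = 13 // 8 bytes magic + 4 bytes version + 1 byte offset width
)

// record pairs a 20-byte object ID with its offset in the packfile.
type record struct {
	ID     [20]byte
	Offset uint64
}

// encodeIndex serializes recs using offsetWidth bytes per offset.
// Big-endian byte order is an assumption, not something the notes specify.
func encodeIndex(recs []record, offsetWidth int) ([]byte, error) {
	if offsetWidth < 1 || offsetWidth > 8 {
		return nil, fmt.Errorf("offset width %d out of range 1-8", offsetWidth)
	}
	var buf bytes.Buffer
	var tmp [8]byte
	buf.WriteString(magic)
	binary.BigEndian.PutUint32(tmp[:4], version)
	buf.Write(tmp[:4])
	buf.WriteByte(byte(offsetWidth))
	for _, r := range recs {
		if offsetWidth < 8 && r.Offset>>(8*uint(offsetWidth)) != 0 {
			return nil, fmt.Errorf("offset %d does not fit in %d bytes", r.Offset, offsetWidth)
		}
		buf.Write(r.ID[:])
		binary.BigEndian.PutUint64(tmp[:], r.Offset)
		buf.Write(tmp[8-offsetWidth:]) // keep only the low-order bytes
	}
	return buf.Bytes(), nil
}

// decodeIndex parses data produced by encodeIndex.
func decodeIndex(data []byte) ([]record, error) {
	if len(data) < headerSize || string(data[:8]) != magic {
		return nil, fmt.Errorf("bad header")
	}
	if v := binary.BigEndian.Uint32(data[8:12]); v != version {
		return nil, fmt.Errorf("unsupported version %d", v)
	}
	w := int(data[12])
	if w < 1 || w > 8 {
		return nil, fmt.Errorf("bad offset width %d", w)
	}
	body := data[headerSize:]
	if len(body)%(20+w) != 0 {
		return nil, fmt.Errorf("truncated record")
	}
	recs := make([]record, 0, len(body)/(20+w))
	for ; len(body) > 0; body = body[20+w:] {
		var r record
		var tmp [8]byte
		copy(r.ID[:], body[:20])
		copy(tmp[8-w:], body[20:20+w]) // left-pad the offset to 8 bytes
		r.Offset = binary.BigEndian.Uint64(tmp[:])
		recs = append(recs, r)
	}
	return recs, nil
}

func main() {
	recs := []record{
		{ID: [20]byte{0xde, 0xad}, Offset: 12},
		{ID: [20]byte{0xbe, 0xef}, Offset: 70000},
	}
	data, err := encodeIndex(recs, 4)
	if err != nil {
		panic(err)
	}
	back, err := decodeIndex(data)
	fmt.Println(len(data), len(back), err)
}
```

Records are fixed width, so if they were written sorted by object ID a lookup could binary-search the file directly; the notes don't say whether that is the intent.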

CREATE TABLE queued_packs
(
    filename text NOT NULL,
    repo_path text NOT NULL,
    repo_id integer NOT NULL,
    queue_time timestamp with time zone NOT NULL DEFAULT now(),
    in_progress smallint NOT NULL DEFAULT 0,
    CONSTRAINT pk_filename PRIMARY KEY (filename)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE queued_packs
    OWNER TO gitglob;

CREATE INDEX idx_queue_time
    ON queued_packs
    USING btree
    (queue_time);

CREATE TABLE refs_latest
(
    repo_id bigint NOT NULL,
    "timestamp" timestamp with time zone NOT NULL,
    fetches integer NOT NULL DEFAULT 1,
    ref_names text[],
    ref_hashes bytea[],
    CONSTRAINT refs_latest_pkey PRIMARY KEY (repo_id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE refs_latest
    OWNER TO gitglob;

CREATE TABLE refs_history
(
    id bigserial NOT NULL,
    repo_id bigint NOT NULL,
    "timestamp" timestamp with time zone NOT NULL,
    type smallint NOT NULL,
    from_ts timestamp with time zone,
    new_names text[],
    new_hashes bytea[],
    changed_names text[],
    changed_hashes bytea[],
    deleted_names text[],
    CONSTRAINT refs_history_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE refs_history
    OWNER TO gitglob;

CREATE INDEX refs_history_repo_id_idx
    ON refs_history
    USING btree
    (repo_id);

CREATE INDEX refs_history_repo_id_timestamp_idx
    ON refs_history
    USING btree
    (repo_id, "timestamp");